Early next week, I plan on correcting a mistake I made some time ago. This will take a bit of explaining, so bear with me.
A customer has a branch office store. Originally, back in the mists of history, a T1 line connected the two stores' telephone systems and provided some data transfer. I can vaguely remember tying into those data channels many years ago. That was just dumb terminals talking to the Unix box at the main store; we did it through Digiboard multiport serial boxes.
Time went on, and pcs replaced terminals. It was still telnet across the T1, but it evolved to TCP/IP and of course a few shared folders were added. To my surprise, this didn't choke the data lines until the Internet started to be important. The pcs at the branch store were getting their Internet access from the main store through the T1, and it was too slow.
To fix that, we gave the branch its own internet access, and that's where I made the mistake.
At that time the pcs in the branch were getting DHCP from the router in the main store. Of course that gave them that router as their default gateway, and we needed their local router to serve that function instead. There are a few things I could have done at that point.
I could have put the branch pcs in a different subnet and routed them through the T1 for access to the original LAN. That would have required more advanced routers than we were currently using.
I could have put those pcs into another subnet and created a VPN back to the store, bypassing the T1. This was easy enough to do, but the owner didn't like it because Internet access was far from rock solid then and he rightfully felt that the T1 was much more reliable. He still feels that is true, and although the Internet has become more reliable in the passing years, he's right.
I could have put static ip addresses and static gateways on all the pcs at the branch. That's annoying, so I went for the only other choice.
I enabled DHCP on their local router.
Of course I used a different scope. My reasoning went like this: DHCP works by the pcs broadcasting a request to be given an address. A server that can hand out addresses responds, and whatever response the pc sees first is what it will take.
There's nothing wrong with having two DHCP servers. People will insist otherwise, but they are wrong. A second server for failover is fine. You'd usually configure them with different scopes as I did, but it is even possible to configure the same scope on both.
That is unusual and tricky stuff though; usually you'd split the scope.
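To make "split the scope" concrete, here is a minimal sketch of carving one address pool into two non-overlapping ranges, one per server. The addresses, the 80/20 ratio, and the function name `split_scope` are all made up for illustration; they are not the values used at this customer.

```python
import ipaddress

def split_scope(first, last, primary_percent=80):
    """Split an inclusive address range into two non-overlapping pools."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    total = end - start + 1
    primary_count = total * primary_percent // 100
    if not 0 < primary_count < total:
        raise ValueError("each server needs at least one address")
    cut = start + primary_count  # first address of the secondary pool
    primary = (ipaddress.IPv4Address(start), ipaddress.IPv4Address(cut - 1))
    secondary = (ipaddress.IPv4Address(cut), ipaddress.IPv4Address(end))
    return primary, secondary

# Hypothetical pool: 100 addresses in a single subnet.
primary, secondary = split_scope("192.168.1.100", "192.168.1.199")
print("primary server hands out  ", primary[0], "-", primary[1])
print("secondary server hands out", secondary[0], "-", secondary[1])
```

Because the two ranges never overlap, either server can answer a request without risking a duplicate address.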
Of course in the usual case, you'd be handing out the same gateway and DNS from each server, but that's not what I was doing. What would prevent a pc in the branch from getting its address and gateway from the main store or vice versa?
Well, the T1 should keep that in order, I thought. Because the T1 is a relatively slow path, the pcs at each site should get the offer from their local server before seeing one from the other end of the line.
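That reasoning can be written down as a toy model. This is only a sketch of the assumption, not a DHCP implementation: it assumes a pc simply takes whichever offer arrives first, and the gateway addresses and delay figures are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    server: str
    gateway: str
    delay_ms: float  # time for the offer to reach the pc

# Hypothetical gateway addresses, one DHCP server per site.
SERVERS = {
    "main": "192.168.1.1",
    "branch": "192.168.1.254",
}

def offers_seen_by(site, lan_ms=1.0, t1_ms=20.0, down=()):
    """Offers reaching a pc at `site`; the far server pays the T1 delay."""
    offers = []
    for name, gateway in SERVERS.items():
        if name in down:
            continue  # a server that is down makes no offer
        delay = lan_ms if name == site else lan_ms + t1_ms
        offers.append(Offer(name, gateway, delay))
    return offers

def first_offer(offers):
    """The offer that arrives first, or None if nobody answered."""
    return min(offers, key=lambda o: o.delay_ms, default=None)

for site in SERVERS:
    winner = first_offer(offers_seen_by(site))
    print(site, "pc takes the offer from", winner.server)
```

In this model the far server can only win when the near one does not answer at all, which is what passing `down=("main",)` simulates.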
And so it seemed to be. I deployed the router, rebooted the pcs, and yes, they got the gateway I wanted them to have. Life was good.
The first hint of trouble didn't turn up until we deployed OpenDNS in the main store. I configured their router to use OpenDNS, but we noticed that a few machines were not being restricted. I was shocked when I saw why - they were getting DHCP from the branch!
How could that be? Well, maybe at some point we had to reboot the router just as these machines were booting. Not getting a local response, of course they took what they got from the branch. Simple fix, I thought, repair the ip or reboot.
Nope. Did not work. Well, ok, that makes sense, it has an ip and it wants to renew with that same ip if it can, so it must prefer that server.
I tried giving it a static ip from the scope I wanted and then changing back to DHCP. Nope, it went right back to the branch. Does it store this stuff somewhere? I don't know. Is there any place you can tell it to prefer one server over others? Not that I could find.
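One possible explanation, going by RFC 2131 rather than anything verified on these machines: a client that remembers its old address can skip discovery after a reboot and broadcast a request for that same address. A server that has no record of the client is supposed to stay silent, so only the server that issued the lease answers. The sketch below is a simplified reading of that rule. It assumes both pools sit in the same subnet, and the client name, address, and function name are hypothetical.

```python
def server_reply(server_leases, client_id, requested_ip):
    """Simplified reply of one server to a rebooting client's request."""
    recorded = server_leases.get(client_id)
    if recorded is None:
        return None   # no record of this client: stay silent
    if recorded == requested_ip:
        return "ACK"  # the lease is ours and still matches: confirm it
    return "NAK"      # we know the client, but not at that address

# Hypothetical state: the branch router issued this pc's current lease.
main_leases = {}
branch_leases = {"pc-17": "192.168.1.210"}

for name, leases in (("main", main_leases), ("branch", branch_leases)):
    print(name, "router replies:", server_reply(leases, "pc-17", "192.168.1.210"))
```

If real clients and routers behave like this, the main router never gets a chance to make an offer, which would match what I saw. Actual implementations vary, so treat it as a guess.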
It seemed that the only way to force this was to disconnect the branch router and renew the ip addresses while it was non-functional. That's a long way from a good solution.
Upon further investigation, I found one pc at the branch that had latched on to the main router at some point. Enough - this has to be fixed.
So, next week I will be putting static info on the branch office pcs. Then I will disable DHCP on its router and reorient the main office pcs that are incorrectly using it. That will put things back as they should be.