A longtime customer called on Thursday. "We had a power failure
in part of the building yesterday and now some stuff doesn't
work. Can you come down?"
"Just reboot the Windows machines", I said. My assumption was
that a Windows machine or two had held on to an IP address that
had since been handed out to another machine. This can happen if
the router (it's their DHCP server) gets knocked out and some
of the Windows machines do not.
"No, that's not it", he said. "It's the new server and the printer
in the Parts department".
Hmmm. That is different. That printer definitely has a static
address. I'm not so sure about the new server because I didn't put it in -
it was supplied by somebody having them evaluate new POS software - but
it certainly should have been static. I asked him to check
the network setup in that server. "I already did", he said. "It was
set to 'Obtain an IP address Automatically'. I changed it to the
static address. It works internally, but the software guy can't get
to it from his office. He says our router must be confused."
Yeah, like he knows anything about your routers, I thought to myself.
Well, that I can check from here, so I did. No problem, and
still set to forward the right ports to the right places. But he
was right about one thing: outside traffic wasn't getting to that
server.
"Are you sure you set it to .103?", I asked.
OK, I'll come look. What the heck, I haven't been out all week
anyway. It's a nice ride.
When I arrived, I decided to look at the printer first. The
software guy getting in remotely is of less importance than the
people in Parts being able to print. I first printed out the config
page and noted that the IP address ended in .122. That didn't sound
right to me, so I checked the hosts file on the Unix box and yeah,
it should be .170. Hmmm, what the heck is this? The printer was not
set for DHCP, so somebody had put that address in manually.
I changed it back to .170 and tried to ping it. No response. I
power-cycled it and tried again. No response. OK, time for some
forensic history. I tracked down the store manager and asked him
what he knew about the printer. He knew a lot.
"It wasn't working, so I set it to 'Automatic' IP. That didn't
work."
Well, yeah, of course not. The Windows machines may have
been able to find it by NetBIOS name, but the Unix box would be
looking for .170, so that couldn't work.
"So then I tried setting it to the same IP as the server."
That's more common than you might think. Of course you cannot
use duplicate IPs on the same network, but people sometimes
think "Well, this magic number works over there, so maybe it will work
HERE". I explained the folly of that. And then what?
"And then I remembered I have a spreadsheet with all the IP
numbers in it, so I set it to that."
I had joked about not being able to make much money if customers
were going to write things down, but I guess if the wrong thing gets
written down it's almost worse, and it looks like that's what
happened here. His spreadsheet was wrong. But the printer
still wasn't working even with the right IP.
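A written record is only useful if it agrees with what the machines
are actually configured to expect. As a minimal sketch of that kind of
cross-check, the Python below compares a spreadsheet-style table with
hosts-file text and also flags any address listed for two devices. The
host names, the 192.168.1 prefix, and the function names are all
hypothetical; only the final octets (.103, .170, .122) come from the
story.

```python
import ipaddress


def parse_hosts(text):
    """Return {hostname: ip} from hosts-file text, ignoring comments."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        for name in names:
            table[name] = ip
    return table


def audit(spreadsheet, hosts):
    """List disagreements between the spreadsheet and the hosts file,
    plus any address the spreadsheet gives to more than one device."""
    problems = []
    for name, ip in sorted(spreadsheet.items()):
        expected = hosts.get(name)
        if expected is not None and expected != ip:
            problems.append(
                f"{name}: spreadsheet says {ip}, hosts file says {expected}"
            )
    seen = {}
    for name, ip in sorted(spreadsheet.items()):
        if ip in seen:
            problems.append(f"{ip} is assigned to both {seen[ip]} and {name}")
        else:
            seen[ip] = name
    return problems


# Hypothetical data: the names and network prefix are made up.
HOSTS_TEXT = """
192.168.1.103  posserver
192.168.1.170  partsprinter   # printer in Parts
"""
SPREADSHEET = {
    "posserver": "192.168.1.103",
    "partsprinter": "192.168.1.122",
}

for problem in audit(SPREADSHEET, parse_hosts(HOSTS_TEXT)):
    print(problem)
```

With this made-up data the audit reports one mismatch, for the printer.
Which source is authoritative is a judgment call; here the hosts file
wins because that is what the Unix box actually uses to reach the
printer.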
I went back for another look. This time I brought my cable testing
tools from my car. A power failure isn't going to hurt a cable, but
I didn't think it had blown the printer either as the failure hadn't been
in that part of the building. Might as well check.
So I did. The cables were fine. I plugged them back into the
switch and tried a ping again. It worked.
Hmmm. Loosely plugged into the switch maybe? I hadn't noticed
that when I had unplugged to test. I took the cable out and plugged
it back in. I tried the ping - it didn't work. I unplugged the cable
again and moved it down the switch to another free port. The
ping worked and the stacked-up print jobs started coming out.
I went back to the manager and explained what I had found (bad
port on the switch). "Oh, I tried that too", he said.
Ahh. A troubleshooting mistake. Who knows what the original problem
was, but it could have gone something like this: a port on the switch
goes bad coincident with the power failure. He then changes the IP
address on the printer. He then moves the cable to a different port.
That would have worked, except that by then he had the wrong IP. He
then moves it back. The problem is not controlling your variables:
you have to change ONE thing at a time.
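The reasoning can be played out in a few lines of Python. This is only
a toy model under a stated assumption: printing works when the address
is .170 AND the cable is in a good port. The port labels, the function
names, and the exact order of the manager's steps are hypothetical.

```python
def printer_works(ip, port):
    """Toy model: both conditions must hold for printing to work."""
    return ip == ".170" and port == "spare port"


# State right after the power failure, as guessed in the text above.
baseline = {"ip": ".170", "port": "original port"}
candidates = {
    "ip": [".170", ".122"],
    "port": ["original port", "spare port"],
}


def one_at_a_time(baseline, candidates):
    """Change a single setting per trial, holding the rest at baseline."""
    results = []
    for key, values in candidates.items():
        for value in values:
            if value == baseline[key]:
                continue  # not a change, so nothing to learn from it
            trial = dict(baseline, **{key: value})
            works = printer_works(trial["ip"], trial["port"])
            results.append((key, value, works))
    return results


# What may have happened instead: the IP was already wrong when the
# cable was moved, so the one good change never got a fair test.
manager_sequence = [
    (".122", "original port"),  # changed the IP first
    (".122", "spare port"),     # then moved the cable
    (".122", "original port"),  # then moved it back
]

for key, value, works in one_at_a_time(baseline, candidates):
    print(f"change only {key} to {value}: works={works}")
for ip, port in manager_sequence:
    print(f"ip={ip}, port={port}: works={printer_works(ip, port)}")
```

In the model, changing only the port finds the fix in one step, while
every state in the guessed sequence fails because two things differ
from the working configuration at once.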
OK, printing is fixed but I warned them to keep an eye on that switch.
It's fairly new, so it shouldn't be failing. On the other hand, it's
just something cheap he bought at Staples or wherever - a little
5-port, so no big deal to replace. On to the Terminal Server problem.
We knew that terminal services were working locally. I knew that the
router was probably fine - if it were not, email and other inbound services would
probably be dead also. But outside connections didn't work. I checked
the server firewall settings first; they were fine. I looked in the
logs and saw no indication of problems. I scratched my head for a
second and then
I thought "Spreadsheet!" and opened up the network configuration. Sure
enough, he had an incorrect default gateway set. I corrected that, and
of course inbound services now worked.
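A wrong default gateway produces exactly this symptom: machines on the
same subnet talk to the server directly, but replies to outside clients
have to leave through the gateway, so those connections die. As a
hedged sketch of a sanity check, the Python below uses the standard
`ipaddress` module. All of the addresses and the function name are
assumptions for illustration; the story gives only the server's final
octet (.103).

```python
import ipaddress


def gateway_problems(host_ip, netmask, gateway, router_ip):
    """Return a list of reasons the configured gateway looks wrong."""
    iface = ipaddress.ip_interface(f"{host_ip}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    problems = []
    if gw not in iface.network:
        # The host cannot reach a gateway that is not on its own subnet.
        problems.append(f"gateway {gw} is outside {iface.network}")
    if gw == iface.ip:
        problems.append("gateway is the host's own address")
    if gw != ipaddress.ip_address(router_ip):
        # On the right subnet is not enough; it must be the real router.
        problems.append(f"gateway {gw} is not the router ({router_ip})")
    return problems


# Hypothetical values: 192.168.1.1 stands in for the real router.
print(gateway_problems("192.168.1.103", "255.255.255.0",
                       "192.168.1.254", "192.168.1.1"))
print(gateway_problems("192.168.1.103", "255.255.255.0",
                       "192.168.1.1", "192.168.1.1"))
```

The first call reports that the gateway is not the router even though
it sits on the right subnet; the second returns an empty list. The
router address to compare against would have to come from a machine
that is known to be working.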
Writing things down can be helpful, even if the things written are
just "magic" to the person writing them. But when the "magic" is
wrong, not understanding the actual reality leaves you with nowhere
to go. If you understand why the gateway is necessary, you'd also
know how to check it and how to get the proper gateway from any
I had him update his spreadsheet for the next time, though I don't
know why this machine would have lost its address. I wonder if it
really did. It's possible that it was just slow recovering from
the power failure and that when he changed it, he broke it by having
the wrong gateway. It doesn't really matter; we'll see what happens
next time - or they could put better UPS devices in place!
© 2011-03-10 Anthony Lawrence