A new machine kills an old server (network problem)
A few days ago I had an email from a long-time customer telling me that she
had been trying to get a new Dell configured, but it just couldn't
seem to see the network. Her email said the machine was getting
an IP address but just wasn't accessing network resources. She
said she'd had enough for the day, but would call me in the morning.
I assumed this would be some variation of a Windows authentication problem and didn't give it any more thought.
The next day's email brought a new message: mysteriously the
machine had fixed itself overnight. Everything was fine, have a
nice day, and so on. OK, great: a lot of problems do fix themselves,
though I thought this was a little
bit odd considering the symptoms described. Oh well, I had plenty
of other work to do, including a programming project that has
been incomplete for several weeks. I had just logged into that customer's
machine to get reoriented with that code when the phone rang.
It was the customer with the misbehaving Dell. A new problem,
she said: all the Remote Desktop clients were down. She had rebooted
the Terminal Server, but clients still could not connect. I confirmed that by
trying to connect from my Mac: no dice. I could ssh into
their Linux box, though, so it wasn't their internet connection.
Time to dig deeper.
After logging into Linux, I tried pinging the Terminal Server. No
response. Unreachable. Dead. I told my customer that. "But
it thinks everything is fine," she said, "except that no packets..."

OK, maybe we have a bad switch port. I had her unplug the cable.
The server immediately noticed that the cable was unplugged, but plugging
it into a different port didn't help. I had her try a different switch
entirely; no change. Hmmm.
Well, this server has another NIC that we don't currently use. I
had her unconfigure the current card and move the IP address to the other card.
We switched the cable, but no change. Unreachable. Dead. Maybe a bad
cable? Unlikely, since the server noticed plug/unplug events, but worth a try.
I was about to suggest that when the customer said, "It must have something
to do with the new Dell."
Honestly, that seemed unlikely unless she had tried to configure it
with the same IP address or the same network name. But I knew she hadn't. I asked
her if that machine was running. She said no, but it was still plugged
into the network. What the heck, unplug it, I said, not expecting
any change from that action. To my surprise, the moment she unplugged
it, the server responded to a ping. Plug it back in, no response. Unplug,
all was fine. I tried Remote Desktop; it came right up.
It was consistently repeatable, no question about it: the problem was this
new Dell, the machine that wasn't even running! She unplugged
it again because users needed to get work done. Since our job was
solving this problem, we booted the new Dell (leaving it unplugged
from the network) to see what we could see.
My immediate suspicion was that this card had a one-in-a-million
duplicate MAC address. Hardware addresses are supposed to be
unique, but screwups can happen, so I wanted to
know what the new machine thought its NIC hardware address was. I knew
the Terminal Server's address from "arp -an" on the Linux box, so
I just needed to get it from the new box.
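Comparing addresses by eye works, but it is easy to misread two 12-digit hex strings. The following is a minimal sketch (not the method used here) of automating that check in Python. It assumes the Linux net-tools `arp -an` output format, `? (IP) at MAC [ether] on IFACE`; other systems, macOS for instance, print MACs without leading zeros and would need a different pattern. It also normalizes the hyphenated, uppercase "Physical Address" form that Windows `ipconfig /all` prints. The function names and sample addresses are hypothetical.

```python
from __future__ import annotations

import re

# Assumed Linux net-tools `arp -an` line format, e.g.:
#   ? (192.168.1.10) at 00:1a:2b:3c:4d:5e [ether] on eth0
# Lines such as "at <incomplete>" simply do not match and are skipped.
ARP_LINE = re.compile(r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9a-fA-F:]{17})")


def normalize_mac(mac: str) -> str:
    """Turn '00-1A-2B-3C-4D-5E' (Windows style) or '00:1a:...' into lowercase colon form."""
    return mac.strip().replace("-", ":").lower()


def parse_arp(output: str) -> dict[str, str]:
    """Return {ip: normalized_mac} for every resolved entry in `arp -an` output."""
    table: dict[str, str] = {}
    for line in output.splitlines():
        match = ARP_LINE.search(line)
        if match:
            table[match.group("ip")] = normalize_mac(match.group("mac"))
    return table


def find_conflicts(arp_table: dict[str, str], new_mac: str) -> list[str]:
    """Return the IPs already using new_mac; a non-empty list means a duplicate."""
    target = normalize_mac(new_mac)
    return sorted(ip for ip, mac in arp_table.items() if mac == target)


if __name__ == "__main__":
    # Hypothetical sample output; in practice, capture the real `arp -an` text.
    sample = (
        "? (192.168.1.10) at 00:1a:2b:3c:4d:5e [ether] on eth0\n"
        "? (192.168.1.20) at <incomplete> on eth0\n"
        "? (192.168.1.30) at 00:aa:bb:cc:dd:ee [ether] on eth0"
    )
    print(find_conflicts(parse_arp(sample), "00-1A-2B-3C-4D-5E"))  # ['192.168.1.10']
```

An empty list means the new machine's address is not already in the ARP table. As this story shows, that rules out only one failure mode, not every one.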
Stupid #$!@% Windows! If the cable is unplugged, you can't get
XP to give you the status of the connection, and Device Manager
doesn't show the hardware address at all, so that's no help.
Fortunately you can still get to a command line, and "ipconfig /all"
will give you the physical address. Idiots.
Anyway, that wasn't the problem: this machine really does have
a unique and proper MAC address.
I suppose it could be putting an incorrect voltage on the line that leaks
over and disrupts the server if their wiring runs close together, but experimenting
with that by moving the machine would just interrupt more work,
so we decided to let it be. I told her she could go buy another NIC,
but that this could be a motherboard problem that might show up
somewhere else later, so my best advice was to get Dell to replace
it. She agreed, though the employee who had been suffering with an
old Windows 95 machine for years wasn't happy to see her new toy disappear
so suddenly. But her old machine regained its place, and the network
was fine again.

When all else fails, start unplugging. After last week's bad storm here
in the Northeast, I had a similar case where a server wouldn't come up
because it insisted that it saw a duplicate name on the network. The
customer checked every machine; there were no conflicts. I then had
her unplug every network cable except those to the three servers. Rebooting
the troubled server still gave the same message. We unplugged the other two
servers. No change. In desperation, I had her unplug the router
too. Still no change. At this point, there was nothing connected
to the switch but this server. I had her move the cable to another
switch, but after a reboot it still complained. Obviously there was something
wrong with the card: it was seeing itself! We swapped in a new NIC,
and the problem went away.
Bad NICs can do very strange things.
© 2011-06-29 Anthony Lawrence