New Dell machine kills server



A few days ago I had an email from a long-time customer telling me that she had been trying to get a new Dell configured, but it just couldn't seem to see the network. Her email said the machine was getting an IP address, but just wasn't accessing network resources. She said she'd had enough for the day, but would call me in the morning. I assumed this would be some variation of a Windows authentication problem and didn't give it any more thought.

The next day's email brought a new message: mysteriously the machine had fixed itself overnight. Everything was fine, have a nice day, and so on. OK, great: a lot of problems do fix themselves, though I thought this was a little bit odd considering the symptoms described. Oh well, I had plenty of other work to do, including a programming project that has been incomplete for several weeks. I had just logged into that customer's machine to get reoriented with that code when the phone rang.

It was the customer with the misbehaving Dell. A new problem, she said: all the remote desktop clients are down. She had rebooted the Terminal Server, but clients still could not connect. I confirmed that by trying to connect from my Mac - no dice. I could ssh in to their Linux box though, so it wasn't their internet connection. Time to dig deeper.

After logging into Linux, I tried pinging the Terminal Server. No response. Unreachable. Dead. I told my customer that. "But it thinks everything is fine", she said, "except that no packets are flowing."

OK, maybe we have a bad switch port. I had her unplug the cable. The server immediately noticed that the cable was unplugged, but plugging it into a different port didn't help. I had her try a different switch entirely; no change. Hmmm.

Well, this server has another NIC that we don't currently use. I had her unconfigure the current card and transfer the IP address to the other card. We switched the cable, but no change. Unreachable. Dead. Maybe a bad cable? Unlikely since it noticed plug/unplug events, but worth a try. I was about to suggest that when the customer said "It must have something to do with the new Dell."

Honestly, that seemed unlikely unless she had tried to configure that machine with the same IP address or the same network name. But I knew she hadn't. I asked her if that machine was running. She said no, but it was still plugged into the network. What the heck, unplug it, I said, not expecting any change from that action. To my surprise, the moment she unplugged it, the server responded to a ping. Plug it back in, no response. Unplug, all was fine. I tried the Remote Desktop; it came right up. Consistently repeatable, no question about it: the problem was this new Dell - the machine that wasn't even running! She unplugged it again because users needed to get work done. Since our job now was to solve this problem, we booted the new Dell (leaving it unplugged from the network) to see what we could see.

My immediate suspicion was that this card was the one-in-a-million case of a duplicate MAC address. Hardware addresses are supposed to be unique but screwups can happen, so I wanted to know what that new machine thought its NIC hardware address was. I knew what the Terminal Server's address was from "arp -an", so I just needed to get it from the new box.
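For anyone who wants to automate that kind of check, here is a rough sketch of how you might scan "arp -an" output for a hardware address that shows up behind more than one IP. It is not what I ran that day: the function names and the sample output are invented for illustration, and it assumes the Linux-style line format. Keep in mind that one host with several IP aliases will legitimately show up this way too, so a hit is a lead, not proof.

```python
import re
from collections import defaultdict

# Matches Linux-style `arp -an` lines such as:
#   ? (192.168.1.10) at 00:11:22:33:44:55 [ether] on eth0
ARP_LINE = re.compile(
    r"\((?P<ip>[\d.]+)\) at (?P<mac>(?:[0-9a-fA-F]{1,2}:){5}[0-9a-fA-F]{1,2})"
)

def normalize_mac(mac):
    """Lowercase the address and zero-pad each octet so formats compare equal."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))

def shared_macs(arp_output):
    """Return {mac: [ips]} for every MAC that appears with more than one IP."""
    ips_by_mac = defaultdict(list)
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match:  # "<incomplete>" entries carry no MAC and are skipped
            ips_by_mac[normalize_mac(match.group("mac"))].append(match.group("ip"))
    return {mac: ips for mac, ips in ips_by_mac.items() if len(ips) > 1}

# Made-up sample output; the addresses are illustrative only.
SAMPLE_ARP = """\
? (192.168.1.10) at 00:11:22:33:44:55 [ether] on eth0
? (192.168.1.57) at 00:11:22:33:44:55 [ether] on eth0
? (192.168.1.1) at 00:aa:bb:cc:dd:01 [ether] on eth0
? (192.168.1.99) at <incomplete> on eth0
"""

if __name__ == "__main__":
    for mac, ips in shared_macs(SAMPLE_ARP).items():
        print(mac, "is claimed by", ", ".join(ips))
```

In real use you would feed it the text captured from "arp -an" on a machine that has recently talked to both suspects.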

Stupid #$!@% Windows! If the cable is unplugged, you can't get XP to give you the status of the connection. Device Manager doesn't bother to tell you that data at all, so that's unhelpful. Fortunately you can still get to a command line and "ipconfig /all" will give you the physical address. Idiots.
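If you have saved the "ipconfig /all" text to a file, a small script can pull out the physical address and put it in the same colon form that "arp -an" uses, which makes the comparison less error-prone than eyeballing it. This is only a sketch: the function name is mine, the sample text is invented, and it assumes the English-language "Physical Address" label.

```python
import re

# Matches lines such as:
#   Physical Address. . . . . . . . . : 00-11-22-33-44-66
PHYSICAL = re.compile(
    r"Physical Address[ .]*:\s*((?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})"
)

def physical_addresses(ipconfig_output):
    """Return every adapter MAC found in the text, as lowercase colon form."""
    return [mac.replace("-", ":").lower() for mac in PHYSICAL.findall(ipconfig_output)]

# Invented sample; the adapter name and address are illustrative only.
SAMPLE_IPCONFIG = """\
Ethernet adapter Local Area Connection:

        Media State . . . . . . . . . . . : Media disconnected
        Description . . . . . . . . . . . : Example Ethernet Adapter
        Physical Address. . . . . . . . . : 00-11-22-33-44-66
"""

if __name__ == "__main__":
    print(physical_addresses(SAMPLE_IPCONFIG))
```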

Anyway, that wasn't the problem. This machine really does have a unique and proper MAC address. So that's not it. I suppose it could be putting out incorrect voltage on the line, and that could be leaking over to disrupt the server if its wiring is close by, but experimenting with that by moving the machine would just interrupt more work, so we decided to let it be. I told her she could go buy another NIC, but that this could be a motherboard problem that might manifest itself somewhere else later, so my best advice was to get Dell to replace it. She agreed, though the employee who had been suffering with an old Windows 95 machine for years wasn't happy to see her new toy disappear so suddenly. But her old machine regained its place, and the network remained happy.

When all else fails, start unplugging. After last week's bad storm here in the Northeast, I had a similar case where a server wouldn't come up because it insisted that it saw a duplicate name on the network. The customer checked every machine; there were no conflicts. I then had her unplug all network cables except those of the three servers. Rebooting the troubled server still gave the same message. We unplugged the other two servers. No change. In desperation, I had her unplug the router also. Still no change. At this point, there was nothing connected to the switch but this server. I had her move the cable to another switch, but the reboot still complained. Obviously there was something wrong with the card: it was seeing itself! We swapped in a new NIC, and the problem went away.





Bad NICs can do very strange things.







Wed Feb 1 15:03:05 2006: Subject:   BigDumbDinosaur


I told her she could go buy another NIC, but that this could be a motherboard problem that might manifest itself somewhere else later, so my best advice was to get Dell to replace it.

This is actually a fairly common problem with Dell boxes and is caused by a defect in the transceiver that generates and receives the LAN signals. We usually "fix" the problem by disabling the onboard LAN hardware (which is low quality to begin with) and installing a 3Com 3C905 type NIC. The owner is happy because s/he sees a substantial improvement in the machine's network performance, doesn't have the box out of service for several weeks waiting on Dell to R&R the motherboard and doesn't spend a small fortune to get the machine fixed. There's little likelihood of an outright motherboard failure, although if the machine is under warranty there is no reason to put more of your own money into it to fix a factory defect.

BTW, your client got what she "deserved" by buying Dell. I'm sure the low price was the deciding factor. You couldn't give me a Dell, let alone convince me to actually pay for one. We regularly get calls from Dell owners whose machines have simply quit working due to hard drive and power supply failures. In all cases, they are out of warranty but not by much. An inferior product, built from inferior parts, using an inferior processor, loaded with an inferior operating system (Win XP) and backed by inferior service. Repeat after me: there is no such thing as a good, cheap computer.



Wed Mar 8 14:37:02 2006: Subject: fix for weird mac problems with dell servers   anonymous


We had this problem and we found a solution. It seems the BMC features of the Dell servers provide SNMP and remote control features even when the machine is powered off. To do this they keep the network interfaces alive. These interfaces in some cases have a different set of MAC addresses than the "real" NIC interfaces.

We found this using arping under Linux, and when we did we found that the first reply would come back with the correct MAC, and then subsequent replies would have the MAC increased by 2.

In our case we had 3 servers which always replied with the correct MAC. Looking at the BMC controller showed the MAC matched the MAC of the NIC. In another case, on 3 identical servers, the BMC controller MAC addresses were exactly 2 higher than the real NICs. This was throwing our router into confusion and causing connections to drop.
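The "exactly 2 higher" pattern is easy to miss by eye, so here is a minimal sketch of how one might check for it. It assumes you have already collected, by whatever means, the list of MAC addresses that answered for a single IP; the function names and the sample addresses are hypothetical and not taken from the machines described above.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' (or dash-separated) to a 48-bit integer."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    value = 0
    for part in parts:
        value = (value << 8) | int(part, 16)
    return value

def mac_offsets(reply_macs):
    """Given the MACs seen answering for ONE IP, return each distinct MAC
    mapped to its numeric distance from the first MAC seen."""
    if not reply_macs:
        return {}
    first = mac_to_int(reply_macs[0])
    offsets = {}
    for mac in reply_macs:
        key = mac.lower()
        if key not in offsets:
            offsets[key] = mac_to_int(mac) - first
    return offsets

# Hypothetical replies for one IP: the NIC answers first, then a second
# interface whose address is 2 higher.
SAMPLE_REPLIES = ["00:11:22:33:44:50", "00:11:22:33:44:52", "00:11:22:33:44:52"]

if __name__ == "__main__":
    offsets = mac_offsets(SAMPLE_REPLIES)
    if len(offsets) > 1:
        print("More than one MAC answers for this IP:")
    for mac, delta in offsets.items():
        print(f"  {mac}  offset {delta:+d}")
```

More than one entry in the result means two interfaces are answering for the same IP; a small fixed offset like +2 hints that they live on the same board.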

The fix was to set the IP address of the BMC controller (in the BIOS) to something different from that of the actual machine. The other option of course was to disable the BMC controller (SNMP).

We wasted months trying to troubleshoot machines which would simply disappear and reappear on the network at random. Calling tech support was no help. We figured this one out on our own.





Wed Mar 8 14:38:47 2006: Subject:   TonyLawrence

Great! Thanks for sharing!



Sun Apr 5 02:11:22 2009: Subject:   oldmacminiman

What a bizarre circumstance ... never heard of adding a machine to the network and having it take down the Terminal Server.

Dells, it seems, have become far crappier in the past few years. Every now and then a good build comes along, I guess ... I have a five-year-old Dell Inspiron 5100 laptop that's still running strong ...
