APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed


Network Troubleshooting



I was helping someone troubleshoot a network problem last week. As it turned out, the problem was very physical: a light fixture had fallen down and loosened some connections to a switch. Both the existence and the location of that switch in relation to the rest of the network had vanished from institutional memory, so much of the testing we were doing didn't make sense and was hard to interpret. In fact, it was so puzzling that I drove on-site with more sophisticated testing equipment, only to find that someone had noticed the fallen fixture before I arrived. Problem identified and solved, but it did prompt me to look for a general "Network Troubleshooting" page here, and I was a bit surprised not to find one.

Oh, sure, there is a lot here about network problems, troubleshooting and solutions. I also have a chapter in my Unix/Linux Troubleshooting E-book but even that isn't quite what I wanted. This post attempts to provide that resource. It is a general overview with links to more detailed information.

The Basics

If you are going to do network troubleshooting, you need basic knowledge and basic tools. If you don't understand basic TCP/IP communications and routing, that's where you need to start. Here are some articles on this site that can help:


If you aren't sure of your understanding, you might at least review those articles. You might also want to look at these - if you don't understand them, you need to back up and do some review:


Experience and Expectations

You also need a little real-world experience. For example, I was helping someone with a strange problem over the phone. We checked a ping between the two machines: the response time was 300 milliseconds. That's not particularly fast, but it's certainly possible across a long network path. However, these two machines were on the same local network - the response time should have been a fraction of a millisecond, not nearly a third of a second! The funny thing is that it was so far off that neither of us caught it instantly: because we were expecting something like .319 ms, seeing 300 ms didn't immediately look wrong.

Slowness, intermittent problems vs. loss of connectivity

Obviously these are very different problems. However, very slow connections can be misinterpreted as no connection - the client or user may give up before the underlying network actually has. A 300 ms delay on a local network could contribute to that.

If you had no idea that 300 ms is ridiculous on a local network, you would have missed something important here. You obviously (I hope it's obvious) have to know the local infrastructure too - is this wireless, 10 Mbps, 100 Mbps or gigabit? If you don't know, you can't tell whether ping responses are even in the ballpark.
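To make the "is this in the ballpark?" check concrete, here is a minimal Python sketch that pulls the average round-trip time out of ping's summary line and compares it with a threshold you choose. It assumes the summary formats printed by Linux ("rtt min/avg/max/mdev = ...") and by Mac OS X/BSD ("round-trip min/avg/max/stddev = ..."); the function names are hypothetical, and the 1 ms threshold for a wired, switched LAN is an illustrative assumption, not a standard.

```python
import re

# Matches the summary line printed by Linux ("rtt min/avg/max/mdev = ...")
# and by Mac OS X/BSD ("round-trip min/avg/max/stddev = ...").
SUMMARY = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)

def average_rtt_ms(ping_output: str) -> float:
    """Return the average round-trip time in milliseconds from ping's summary."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return float(match.group(2))

def looks_wrong_for_lan(avg_ms: float, threshold_ms: float = 1.0) -> bool:
    """Flag a LAN round trip far slower than expected.

    The 1 ms default is an assumption for a wired switched LAN; raise it
    for wireless links or anything that crosses a WAN.
    """
    return avg_ms > threshold_ms

local = "rtt min/avg/max/mdev = 0.211/0.319/0.402/0.071 ms"
odd = "round-trip min/avg/max/stddev = 290.1/300.4/312.8/8.2 ms"
print(average_rtt_ms(local), looks_wrong_for_lan(average_rtt_ms(local)))  # 0.319 False
print(average_rtt_ms(odd), looks_wrong_for_lan(average_rtt_ms(odd)))      # 300.4 True
```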

If ping does seem odd, what does traceroute (Windows "tracert") show? It might show an unexpected packet path (that's what was happening with the 300 ms pings).

Ping is ICMP, not TCP. Ping can work when TCP protocols do not, and vice versa. You may want to test transferring files (especially if slow file transfers are what made you suspect a network problem in the first place). Again, you have to have some idea of how fast data should travel. It's hard to be precise (too many variables), but tools like this download calculator or charts such as those found at Explaining Network Speeds can be helpful.
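The rough arithmetic behind such calculators is simple: bits to move divided by usable bits per second. This Python sketch assumes decimal units (1 MB = 8,000,000 bits) and an efficiency factor of 0.8 for protocol overhead; that factor is an illustrative guess, not a measured value, and the function name is hypothetical.

```python
def expected_transfer_seconds(size_megabytes: float, link_mbps: float,
                              efficiency: float = 0.8) -> float:
    """Estimate how long a file transfer should take.

    size_megabytes: file size in decimal megabytes (1 MB = 1,000,000 bytes).
    link_mbps: nominal link speed in megabits per second.
    efficiency: assumed fraction of the nominal speed actually usable
                after protocol overhead (0.8 is a rough assumption).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_megabytes * 8_000_000
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second

# A 100 MB file on 10 Mbps, 100 Mbps and gigabit links.
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbps: {expected_transfer_seconds(100, mbps):.1f} s")
```

If a transfer takes many times longer than an estimate like this on a quiet network, that's a hint worth following up.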

You might have a physical problem. Test for a bad port on a switch by swapping patch cables or swapping out the whole switch. Wires in the wall can be chewed, burned, stretched or just badly done to start with. This is where you need some inexpensive testing equipment.

[Photo: Ethernet cable tester]

That tester cost me a few hundred dollars, but you don't need to get quite that fancy. You need to be able to verify that the ends are properly wired, and it's good to have at least minimal length-testing capability. You don't need great accuracy: if a cable should be about 20 feet long and your tester says it's 150, you have a problem.

A tone generator is very inexpensive and very helpful for tracing wiring.

If you don't have a tester, a visual inspection can sometimes spot gross errors. If you take a properly made straight-through CAT-5 Ethernet cable and hold both ends so the plugs are side by side, all the colors should run in the same order left to right. If the orange and green pairs have traded places between the two ends, that's a crossover cable - not what you usually want unless you are connecting two computers to each other.
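The same visual check can be written down as a comparison of the color order at each plug. This Python sketch assumes the standard T568A and T568B color assignments for pins 1 through 8: the same standard on both ends is a straight-through cable, T568A on one end and T568B on the other is a classic 10/100 crossover, and anything else is miswired. The color strings and function name are hypothetical conventions for illustration.

```python
# Wire colors for pins 1-8 under the T568A and T568B standards.
T568A = ("white-green", "green", "white-orange", "blue",
         "white-blue", "orange", "white-brown", "brown")
T568B = ("white-orange", "orange", "white-green", "blue",
         "white-blue", "green", "white-brown", "brown")

def classify_cable(end1: tuple, end2: tuple) -> str:
    """Classify a cable from the color order seen at each plug (pin 1 first)."""
    standards = {T568A: "T568A", T568B: "T568B"}
    std1 = standards.get(tuple(end1))
    std2 = standards.get(tuple(end2))
    if std1 is None or std2 is None:
        return "miswired (an end matches neither T568A nor T568B)"
    if std1 == std2:
        return f"straight-through ({std1} on both ends)"
    return "crossover (T568A on one end, T568B on the other)"

print(classify_cable(T568B, T568B))  # straight-through (T568B on both ends)
print(classify_cable(T568A, T568B))  # crossover (T568A on one end, T568B on the other)
```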

[Images: straight-through cable and crossover cable]

Incomplete or very wrong wiring is all too common, even from professional installers.

Because we often have mixed 10/100/1000 networks, it's possible for two devices to misnegotiate their connection. See "I don't understand all this half/full duplex negotiation stuff" for an introduction and overview.

In addition to all of the reasons mentioned so far, we have to add bad NIC hardware and faulty drivers. Beyond outright "bad", I'd also like to mention "cheap": it astonishes me when I see a server loaded with RAM, sporting the biggest and fastest CPU and drives, protected with dual power supplies, extra fans and anything else you can think of - and the NIC is a $10 piece of junk! I've solved many a network problem by replacing junk.

Take advantage of the blinking lights. Every network device has them. If the lights are out or show orange or red, the device may be bad, may not be connected properly, or may not be receiving a signal from the network.

Every system, Windows or Linux, has tools that can help you diagnose network problems. Here are the basics you should know about:

Ping: (Windows and Unix/Linux). Note that by default, Unix/Linux ping doesn't stop - you need to either tell it how many pings to send (for example "ping -c 4 host" on Linux, Mac OS X and BSD) or interrupt it (usually CTRL-C, but often Delete on SCO Unix).

Netstat: (W and U/L). Note that Windows netstat has a "-b" flag which will display the executable responsible for each connection. If you add "-v", it "will display sequence of components involved in creating the connection or listening port for all executables." Interface statistics ("netstat -i") show error counts: before switches were common, we'd be looking for collisions and other errors and wouldn't be at all surprised to see at least a few. Today, on a switched network, we wouldn't expect to see any.
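On Linux, the interface counters that netstat and ifconfig report are also exposed in /proc/net/dev. This minimal Python sketch assumes that file's layout (after two header lines, each interface has eight receive fields: bytes, packets, errs, drop, fifo, frame, compressed, multicast; then eight transmit fields: bytes, packets, errs, drop, fifo, colls, carrier, compressed) and lists interfaces with nonzero error or collision counts. It is Linux-specific, and the function name is hypothetical.

```python
RX_FIELDS = ("rx_bytes", "rx_packets", "rx_errs", "rx_drop",
             "rx_fifo", "rx_frame", "rx_compressed", "rx_multicast")
TX_FIELDS = ("tx_bytes", "tx_packets", "tx_errs", "tx_drop",
             "tx_fifo", "tx_colls", "tx_carrier", "tx_compressed")
# Counters that should normally stay at zero on a healthy switched network.
WATCHED = ("rx_errs", "rx_frame", "tx_errs", "tx_colls", "tx_carrier")

def interface_problems(proc_net_dev: str) -> dict:
    """Return {interface: {counter: value}} for nonzero error counters.

    Expects the text of Linux's /proc/net/dev: two header lines, then
    one "name: <8 receive fields> <8 transmit fields>" line per interface.
    """
    problems = {}
    for line in proc_net_dev.splitlines()[2:]:
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        values = dict(zip(RX_FIELDS + TX_FIELDS, map(int, counters.split())))
        bad = {k: values[k] for k in WATCHED if values.get(k, 0) > 0}
        if bad:
            problems[name.strip()] = bad
    return problems
```

On a Linux box you would pass it the contents of the real file, e.g. interface_problems(open("/proc/net/dev").read()); an empty result means no errors or collisions have been counted since the interfaces came up.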

Netsh: (W) http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/netsh.mspx. This can be run one command at a time or as a (weak) interactive shell, and it can be very helpful if you don't have Unix tools available. See Introduction to Netsh.

ifconfig (U/L), ipconfig (W) (also see Mac OS X ipconfig). Win 95/98: winipcfg.

traceroute (U/L), tracert (W) (also see Windows "pathping").

nslookup (W, U/L), dig, host (U/L). See Dig and Host.

arp (W, U/L). See arp and Dealing with Duplicate IP addresses.
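One way to look for a duplicate IP address is to take several snapshots of the ARP table ("arp -a") a little while apart and watch for an address whose MAC address changes. This Python sketch assumes the "? (IP) at MAC ..." line format that arp -a prints on Linux (net-tools) and Mac OS X; the function names are hypothetical. A changing MAC can also have innocent causes (a replaced NIC, a failover pair), so treat a hit as a lead, not proof.

```python
import re
from collections import defaultdict

# "? (192.168.1.10) at 00:11:22:33:44:55 [ether] on eth0"        (Linux net-tools)
# "? (192.168.1.10) at 0:11:22:33:44:55 on en0 ifscope [ethernet]" (Mac OS X)
# Incomplete entries ("at <incomplete>" / "at (incomplete)") do not match.
ENTRY = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\) at ([0-9A-Fa-f:]+)")

def macs_per_ip(snapshots: list) -> dict:
    """Map each IP to the set of MAC addresses seen for it across snapshots."""
    seen = defaultdict(set)
    for text in snapshots:
        for ip, mac in ENTRY.findall(text):
            seen[ip].add(mac.lower())
    return seen

def suspected_duplicates(snapshots: list) -> dict:
    """Return only the IPs that were answered by more than one MAC address."""
    return {ip: macs for ip, macs in macs_per_ip(snapshots).items()
            if len(macs) > 1}
```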

lsof (U/L). Specifically, "lsof -i:25" would show you processes using port 25 ("lsof -i TCP:25" limits that to TCP). The -b switch to Windows netstat gives similar information. Mark Russinovich's Windows Process Explorer is a more general purpose tool, but it can be useful for network issues; see, for example, The Case of the Slow Logons.

telnet (W, U/L). First, don't confuse using the telnet client to test networks with running a telnet daemon. The client is available even when you are NOT accepting telnet connections (but see the Vista/Win 7 note below). You can use telnet to test network problems:


Special note on Vista and Win 7: Microsoft does not install the telnet client by default on these operating systems. To correct that stupidity:

  • Open Control Panel -> Programs.
  • Click "Turn Windows features on or off". On the list that appears, check the box "Telnet Client".
  • Click OK. You now have "telnet"

Special note on testing ssh, scp, telnet, rcp, ftp and similar protocols: I see people fighting this type of problem often. The very first thing you should do is test whether it works locally. If you can't ssh to a server that you do have keyboard access to, the first thing you should test is "ssh localhost" (or "ftp localhost", etc.). If you can't do that, there's no point in looking for firewall or security configuration issues.

If the protocol doesn't have diagnostics like "ssh -v", using "telnet" to the port can tell you whether it is blocked outright or something within the protocol is blocking you. For example, if "telnet somebox 21" returns

Trying x.x.x.x
Connected to somebox.
Escape character is '^]'.
220 (vsFTPd 2.0.5)

you are NOT being blocked by a firewall.
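The same "is anything answering on that port?" test can be scripted. This minimal Python sketch uses the standard socket module to connect, waits briefly for a banner (FTP, SMTP and SSH servers send one first; HTTP servers wait for the client), and separates a refused connection (usually nothing listening, or an active reject) from a timeout (often a firewall silently dropping packets). The function name, verdict strings and 3-second default are assumptions for illustration.

```python
import socket

def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Roughly what 'telnet host port' tells you, as a one-line verdict."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            try:
                banner = sock.recv(256).decode(errors="replace").strip()
            except socket.timeout:
                banner = ""  # connected, but the service expects us to speak first
            return f"open: {banner}" if banner else "open (no banner)"
    except ConnectionRefusedError:
        return "refused: nothing listening, or actively rejected"
    except socket.timeout:
        return "timed out: possibly filtered by a firewall"
    except OSError as exc:  # unreachable network, name lookup failure, etc.
        return f"error: {exc}"

# Following the advice above, test locally before blaming the network:
#   probe_port("localhost", 21)  then  probe_port("somebox", 21)
```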

Network printer issues can be approached the same way. LPD/LPR runs on port 515, IPP on port 631 and HP JetDirect (raw printing) on 9100; some other devices are listed at Print Server Port Numbers.
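For a printer, a quick sweep of those ports shows which print protocols the device is actually accepting connections on. This Python sketch is a convenience under stated assumptions, not a full diagnostic: an open port only means something is listening, not that the print queue works, and the timeout and function name are arbitrary choices for illustration.

```python
import socket

# Common network printing ports (from the paragraph above).
PRINTER_PORTS = {515: "LPD/LPR", 631: "IPP", 9100: "HP JetDirect (raw)"}

def printer_ports_open(host: str, timeout: float = 2.0) -> dict:
    """Return {protocol name: True/False} for each printing port on host."""
    results = {}
    for port, name in PRINTER_PORTS.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:  # refused, timed out, unreachable or unresolvable
            results[name] = False
    return results

# Example (hypothetical printer address):
#   print(printer_ports_open("192.0.2.50"))
```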

Don't confuse DNS resolver issues with actual network problems. Don't confuse application problems with network problems.
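One way to keep DNS trouble separate from connectivity trouble is to do the two steps explicitly: resolve the name first, then connect to the resulting address. This Python sketch uses socket.getaddrinfo for the lookup; the function name and verdict strings are illustrative. A failure at the first step points at the resolver (or the name itself), not at cables or routing.

```python
import socket

def resolve_then_connect(name: str, port: int, timeout: float = 3.0) -> str:
    """Report whether a failure is in name resolution or in reaching the host."""
    try:
        infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS problem: could not resolve {name!r} ({exc})"
    errors = []
    # Try every address the resolver returned (IPv4 and/or IPv6).
    for family, socktype, proto, _canonname, address in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(address)
                return f"ok: {name} -> {address[0]} port {port} reachable"
        except OSError as exc:
            errors.append(f"{address[0]}: {exc}")
    return "resolved, but not reachable: " + "; ".join(errors)
```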

Special note on DNS caching:

ipconfig /flushdns (Windows)
dscacheutil -flushcache (OS X)
/etc/init.d/nscd restart (Linux)

(Note: Unix systems running "bind" are also caching DNS; "rndc flush" clears that cache.)

Timeout problems

There are conflicting desires here: some things time out too quickly, some don't time out quickly enough. See Keep in touch (tcp keepalives etc.) for a discussion.
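As one illustration of the "doesn't time out quickly enough" side, TCP keepalives can be enabled per socket. This Python sketch turns on SO_KEEPALIVE and, where the platform exposes them (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT exist on Linux but not on every system or Python build), shortens the idle time before probes start. The values are assumptions for illustration; with them, a dead peer would be detected after roughly 60 + 5 × 10 = 110 seconds of silence instead of the much longer system default. System-wide defaults are normally tuned in the OS instead.

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalives and, where supported, tune their timing.

    idle: seconds of silence before the first probe (assumed value).
    interval: seconds between unanswered probes (assumed value).
    count: unanswered probes before the connection is declared dead.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These options are platform-specific; skip any this system lacks.
    for option_name, value in (("TCP_KEEPIDLE", idle),
                               ("TCP_KEEPINTVL", interval),
                               ("TCP_KEEPCNT", count)):
        option = getattr(socket, option_name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
```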

Packet tracing

All of the above is basic and simple. Unfortunately, not all networking problems can be found that easily. Sometimes, you need to get down and examine network packets. See, for example, Lan sniffing with a DualComm port mirroring switch, tcpdump and Windump.
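Once you have a capture (for example from "tcpdump -w capture.pcap"), even a tiny script can summarize it. This Python sketch reads the classic libpcap file format (a 24-byte global header, then a 16-byte record header of ts_sec, ts_usec, incl_len, orig_len before each packet) using only the struct module. It does not handle the newer pcapng format, and the file name in the usage comment is hypothetical.

```python
import struct

def count_pcap_packets(data: bytes) -> tuple:
    """Return (packet_count, total_captured_bytes) for a classic pcap file."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    offset, count, total = 24, 0, 0  # skip the 24-byte global header
    while offset + 16 <= len(data):
        _sec, _frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        if offset + 16 + incl_len > len(data):
            break  # truncated final record, e.g. capture interrupted mid-write
        offset += 16 + incl_len
        count += 1
        total += incl_len
    return count, total

# Usage (hypothetical file name):
#   with open("capture.pcap", "rb") as f:
#       print(count_pcap_packets(f.read()))
```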

As I said at the beginning, this is meant to be a page that will guide you to the things you need to understand to be successful. Troubleshooting is mostly common sense, though apparently it's not all that common. Most troubleshooting failures come from ignorance: you don't know how the arp cache works, you don't understand DNS, basic subnets, routing or something equally basic. It's rare for problems to require esoteric knowledge - it happens, but it is rare.

I hope this helped get you started with whatever networking problem is vexing you. Please feel free to leave comments with suggestions or questions.

You might also be interested in another article I wrote about basic TCP/IP troubleshooting.








Tue Mar 16 04:42:18 2010: 8222   BillMohrhardt



By default, Windows XP boxes do not return ICMP echoes. In order to
get them to acknowledge your "ping", you right-click on the LAN connection,
then Properties --> Advanced --> Settings (Windows Firewall) --> Advanced
--> Settings (ICMP). Then check the box "Allow Incoming Echo Requests". I think
anything prior to XP allows them by default, and I don't know about Windows
7 or Vista. Ping is always my first attempt to see if a device is alive, and verify that the cabling is okay.






Tue Mar 16 17:23:03 2010: 8225   BigDumbDinosaur
http://bcstechnology.net


The Windows "firewall" is close to useless baggage and just another source of connectivity headaches with which to deal. That so-called "firewall" is too much like the chickens trying to protect themselves from the foxes.

We routinely disable the Windows "firewall" and insist that the client use a DSL/broadband router or a Linux/UNIX server as an intermediary between the local network and the Internet. We also block outbound port 25 in the router to prevent a mail zombie on a Windows box from being able to spew spam onto the Internet.

Something else that should be considered with regard to slow networks is examining the physical routing of Ethernet cables. A cable that is close to a heavy power consumer (e.g., air conditioning equipment) or EMI generator (e.g., fluorescent lighting) can introduce difficult-to-diagnose errors that will cause repeated packet transmission, resulting in slow performance. I've fixed many a slow network by relocating one or two cables away from lighting and machinery. I won't even mention the case where some guy's kid wired up the office network (using a hub, not a switch) and decided that since only two of the four pairs in the cable are actually carrying data, it was okay to use the "unused" pairs to connect telephones to the PBX. The network went crazy every time a phone rang. <Grin>



Tue Mar 16 17:56:13 2010: 8226   TonyLawrence



Good point about physical routing - I haven't seen any of that in recent years (I think most installers have learned their lessons!) but I do still see incomplete wiring now and then.







Tue Mar 16 21:20:59 2010: 8227   AndrewSmallshaw



Sadly for a consultant, the single biggest thing is in the customer's hands. Firstly, a little foresight - label and document everything, and keep that documentation where it can be found when it is needed. Secondly, there comes a point where getting the professionals in is the cost-effective route, and that is BEFORE you have a problem. Network troubleshooting isn't really the kind of thing I do, but when you see those networks assembled in an ad hoc manner from a hundred grey, unlabelled patch leads, you know things are going to take forever.

I remember one case in particular at a local charity - their network was just this kind of thing, and it covered three floors and probably a dozen rooms to boot. As it turned out, it was a simple problem: a cheap NAS appliance had hung, taking the entire network segment with it. On a good network, where you can quickly isolate individual switches, that is perhaps half an hour to find. As it was, it took most of the day to track down, by which time I had probably traced through (and labelled) half the network, but still didn't fully know which cable ran to each switch.

If I had been charging for my time the bill would have come to more than enough to professionally wire the whole network, and twice that to cover perhaps 90% of it. Then you would have easily understood wiring that is less likely to fail in the first place - cables in trunking don't get ripped up on a whim, or run over by chairs. I heard a couple of months later they spent just that kind of money on consultancy for yet another simple problem. That time it was an unknown device acting as a rogue DHCP server that they had no way of tracking down. They still don't have a robust network that is easy to work on, and never will until they do the job properly.

But they are not unique in that respect - it seems many places have networks initially set up in an ad hoc fashion that just grow and grow. Eventually the point arrives where you have to scrap what was there before and put in a properly designed network, rather than plugging in yet another switch wherever it is convenient.
