Failure to resolve

A common distress call I get is something like "Help! Our Internet is down!"

Sometimes it really is. There's not much I can do if a router in Connecticut went up in flames or a backhoe in Cambridge just cut through a major data line. If it is a dead or malfunctioning router the problem will probably correct itself very quickly, but a cable break can sometimes cause fairly long outages.

Surprisingly often the Internet itself really isn't down at all. My client has access; what they don't have is name resolution. To test that, I might ask them to try reaching some IP address that I know is up, typed as a number rather than a name. If that works, then name resolution is the issue, not Internet access itself.
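That test is easy to script. What follows is only a minimal sketch in Python, not a finished tool. The address 1.1.1.1 (a public DNS resolver) and port 53 are my assumptions standing in for "some IP address that I know is up", and the function names are made up for illustration; substitute whatever address you trust.

```python
import socket

# Assumptions for illustration: a public resolver stands in for
# "an IP address I know is up"; the name is this site's.
TEST_IP = "1.1.1.1"
TEST_PORT = 53
TEST_NAME = "www.aplawrence.com"


def can_connect(ip, port, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(name):
    """Return True if the system resolver can turn name into an address."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def diagnose(ip_ok, name_ok):
    """Classify the outage from the two test results."""
    if ip_ok and name_ok:
        return "connectivity and name resolution both work"
    if ip_ok:
        return "connectivity works; name resolution is failing"
    if name_ok:
        return "names resolve, but the test address is unreachable"
    return "no connectivity at all"


if __name__ == "__main__":
    print(diagnose(can_connect(TEST_IP, TEST_PORT), can_resolve(TEST_NAME)))
```

The second outcome is the "Internet is down" call that isn't: packets get out, names don't resolve.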

Of course the name resolution problem can be caused by someone else having a real connectivity problem. For example, my DNS comes from NS23.WORLDNIC.COM and NS24.WORLDNIC.COM. If a physical or routing problem prevents access to those servers, sooner or later you won't be able to resolve this site (and quite a few others, though I'm sure you are most concerned about this one, right?). In that case, your symptoms are that some sites resolve but others do not.
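One way to check whether particular name servers can be reached is to ask each of them directly. Here is a rough sketch that wraps the dig command from Python. It assumes dig is installed, the helper names are hypothetical, and the server names are simply the ones mentioned above, which may not serve this zone by the time you read this. Note also that dig has to resolve the server's own name first, so this only helps when your local resolver still works for that much.

```python
import subprocess

# Server names taken from the text above; they may no longer serve this zone.
NAME_SERVERS = ["ns23.worldnic.com", "ns24.worldnic.com"]


def dig_command(server, name):
    """Build a dig command that asks one specific server for an A record."""
    # @server: query this server directly instead of the local resolver
    # +short: print only the answers
    # +time / +tries: give up quickly if the server is unreachable
    return ["dig", f"@{server}", name, "A", "+short", "+time=3", "+tries=1"]


def ask_server(server, name):
    """Return the answers from one server, or None if dig reported a failure."""
    done = subprocess.run(dig_command(server, name),
                          capture_output=True, text=True)
    if done.returncode != 0:
        return None
    return done.stdout.split()
```

Calling ask_server(server, "www.aplawrence.com") for each entry in NAME_SERVERS shows which servers answer. If one responds and the other does not, the trouble is on the path to that one server rather than with your own connection.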

A traceroute might show you where the breakdown is, particularly if you can do it from different starting points. You might be able to reach a DNS box from one location but not from another, just because of the route the packets travel.

By the way, do the traceroute with an IP address, not a name. It makes no difference if DNS is working, but using the address guarantees that a resolution failure can't get in the way of the trace. The first thing you have to be able to get to is your ISP's default gateway - if you can't get to that, you won't get anywhere.
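If you script your traces, you can enforce that rule. The sketch below assumes a Unix-style traceroute is installed (Windows uses tracert instead), and the function names are mine. It refuses anything that isn't a literal IP address and passes -n so the hops are printed numerically and the trace itself makes no lookups.

```python
import ipaddress
import subprocess


def traceroute_command(target):
    """Build a traceroute command, refusing anything that is not an IP literal."""
    ipaddress.ip_address(target)  # raises ValueError for host names
    # -n: print hop addresses numerically, so no DNS lookups are needed
    return ["traceroute", "-n", target]


def run_traceroute(target, timeout=120):
    """Run the trace and return its text output."""
    done = subprocess.run(traceroute_command(target),
                          capture_output=True, text=True, timeout=timeout)
    return done.stdout
```

Calling run_traceroute with a name such as "www.aplawrence.com" raises ValueError before anything is sent, which is the point: look up the address some other way first, then trace to the number.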

There can be more subtle problems. I had email yesterday from a customer who asked if www.aplawrence.com was down. I knew it wasn't, but I happened to be logged into his machine at that moment, so I tried pinging www.aplawrence.com. It timed out, unable to resolve. I tried just aplawrence.com and that responded. I then logged off his machine and tried the same two pings from my home: again the www name failed to resolve but aplawrence.com did. Obviously Worldnic's servers were doing something incorrectly at that time, but within half an hour it had cleared up.
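Checking the two names side by side takes only a few lines. This is a sketch using the system resolver through Python's socket module. The function names are invented for illustration, and the resolver argument exists only so the logic can be exercised without a network.

```python
import socket


def resolve_each(names, resolver=socket.gethostbyname):
    """Map each name to its address, or to None if it fails to resolve."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results


def report(results):
    """Turn the results into one readable line per name."""
    return [f"{name}: {address if address else 'failed to resolve'}"
            for name, address in results.items()]


if __name__ == "__main__":
    for line in report(resolve_each(["www.aplawrence.com", "aplawrence.com"])):
        print(line)
```

Run from two different machines, as I did with ping, this shows quickly whether the failure follows the name or the location.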

The dig and host commands can help you dig into the truth of these access problems. Usually simple patience is the answer: no doubt many other people are affected by whatever the problem is, and alarms are probably going off at many places. In most cases, the issue will be fixed before you can even make a phone call to complain about it.

