APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

A simple remote site monitor

© July 2003 Dirk Hart
Email: dhart@mailstarusa.com

(This is part 2 of a two part article. Part One is here.)

In the last installment we made a simple script for monitoring remote sites. It seemed to work quite well, but after a week or so I knew I was getting too many results for them to be accurate. Something had to be done to improve the quality of the results.

The first thing I noticed is that sometimes all the sites I was checking were reported down at once. I would get a string of messages and then things would be calm again. Clearly my own connection was having occasional dropouts, rather than mailstarusa.com or my client sites. I reasoned that I should report outages only if machines I knew to be highly reliable were reachable. That is, if I could not ping certain DNS servers then I could safely keep quiet about the rest.

ping -c 2 -w 10 $chk1 > /dev/null 2>&1; chk1=$?
ping -c 2 -w 10 $chk2 > /dev/null 2>&1; chk2=$?

if [ $chk1 -gt 0 -a $chk2 -gt 0 ]
then
       echo "reliable hosts are down. no sites checks performed." `date` >>$logfile
else
{ ...

chk1 and chk2 start out holding the addresses of DNS servers belonging to Ultranet (now RCN) and UUNet, chosen only because I knew them to be reliable and had their addresses memorized. The exit status of each ping is then stored back in the same variable, and if both are non-zero then the local connection must be down and we can keep quiet about the rest.
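The same idea can be wrapped in a small function that accepts any number of reference hosts. This is only a sketch: probe, uplink_ok and the PING_CMD override are hypothetical names that do not appear in the original script, and the addresses in the usage comment are placeholders.

```shell
# Sketch only: probe, uplink_ok and PING_CMD are illustrative names.
# PING_CMD lets a stub stand in for ping during testing; it defaults to ping.
probe() {
    ${PING_CMD:-ping} -c 2 -w 10 "$1" > /dev/null 2>&1
}

# Succeeds as soon as one reference host answers; fails if none do.
uplink_ok() {
    for host in "$@"
    do
        probe "$host" && return 0
    done
    return 1
}

# Usage (addresses are placeholders):
#   uplink_ok 192.0.2.1 192.0.2.2 || echo "reliable hosts are down" >>"$logfile"
```

Returning on the first success means the check costs one ping run when the connection is healthy.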

Much to my interest, something had escaped my notice the first time I read the man page for ping. ping -c 2 pings a site twice and then seemed to wait endlessly for results, which led me to add -w 10, thinking that I had made ping time out after 2 pings and 10 seconds. This is not quite the case. When the two parameters are combined, ping keeps sending (about once a second, so roughly 10 times) and the command finishes either when 2 replies have been received (no error is reported) or when the 10 seconds are up (an error is recorded). That means as many as 8 pings can go astray and we are still willing to say the site is up. You could certainly argue that the results are of poor quality because of that. On the other hand, the results seem to match the reality of the situation pretty accurately.
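If the number of lost pings matters, one option is to look at the loss percentage in ping's summary line instead of relying on the exit status alone. The sketch below assumes the summary wording used by the Linux ping ("... 80% packet loss ..."); other ping implementations word it differently. The function names and the 50 percent threshold are made up for illustration.

```shell
# Sketch only: loss_percent and too_lossy are illustrative names.
# loss_percent reads ping output on stdin and prints the packet-loss figure
# from a summary line such as:
#   10 packets transmitted, 2 received, 80% packet loss, time 9012ms
loss_percent() {
    sed -n 's/.*[ ,]\([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Succeeds when the loss in the ping output on stdin exceeds $1 percent.
# Output with no recognisable summary line counts as too lossy.
too_lossy() {
    loss=$(loss_percent)
    loss=${loss%%.*}            # drop any fractional part
    [ -z "$loss" ] || [ "$loss" -gt "$1" ]
}

# Usage:
#   ping -c 10 -w 10 "$pingsite" 2>&1 | too_lossy 50 && echo "poor link to $pingsite"
```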

This gave good results, but I still got messages about sites being down. I would immediately ping the sites again and they would not be down at all. I decided, after some research, that things on the net were just slow and the packets were not returning before the 10 seconds (-w 10) were up. I carefully read the man pages again. I didn't want to make -w 10 much larger in case the list of sites grew large and cron kicked off the script again before it had finished.
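One way to guard against cron starting a second copy while the first is still running is a lock directory. This is a sketch, not something the original script does; LOCKDIR, acquire_lock and release_lock are hypothetical names and the /tmp path is only an example.

```shell
# Sketch only: LOCKDIR, acquire_lock and release_lock are illustrative names.
LOCKDIR=${LOCKDIR:-/tmp/pink.lock}

# mkdir either creates the directory or fails, in one step,
# so only one run at a time can hold the lock.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# Usage at the top of the monitor script:
#   acquire_lock || exit 0          # previous run still busy; skip this one
#   trap release_lock EXIT
```

A directory is used rather than a file because testing for a file and then creating it takes two steps, and another run could slip in between them.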

There was some reference in the man pages to QoS bits, and it turns out that if we set -Q 0x04 (the type-of-service value traditionally meaning "reliability") we get a more reliable result. This is a good thing, as checking SMS messages while travelling in the geek-mobile is a Bad Thing.

I also changed the script so that different people could be notified for each site. If my clients are interested in the results I can email them without delay. I edited pink.sites to match:

Fake site    5085551212@vtext.com    monitor@yourdomain.com
pcunix.com    pcunix.com    5085551212@vtext.com    monitor@pcunix.com
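To see how a line of pink.sites is split up, the sketch below prints each line's four whitespace-separated fields using the same variable names the script reads them into. show_sites is a hypothetical helper, and skipping blank lines is an addition of mine.

```shell
# Sketch only: show_sites is an illustrative name.
# Prints how each line of a sites file maps onto the four fields.
show_sites() {
    while read pingsite pingname smsnotify mailnotify
    do
        [ -n "$pingsite" ] || continue      # skip blank lines
        echo "host=$pingsite name=$pingname sms=$smsnotify mail=$mailnotify"
    done < "$1"
}

# Usage:
#   show_sites /usr/lib/pink/pink.sites
```

Because read splits on whitespace, a site name containing a space would spill into the next field, so names need to be single words.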

Finally, here is the whole script - not pretty, but functional:

# ping a site if no response then email a message
# chk1 and chk2 (reliable hosts) and logfile must be set before this point

ping -c 2 -w 10 -Q 0x04 $chk1 > /dev/null 2>&1; chk1=$?
ping -c 2 -w 10 -Q 0x04 $chk2 > /dev/null 2>&1; chk2=$?

if [ $chk1 -gt 0 -a $chk2 -gt 0 ]
then
    echo "reliable hosts are down. no sites checks performed." `date` >>$logfile
else
{
    cat /usr/lib/pink/pink.sites|while read pingsite pingname smsnotify mailnotify
    do
        ping -c 2 -w 10 -Q 0x04 $pingsite > /dev/null 2>&1
        #non-zero exit status (fewer than 2 replies in 10 seconds) - a bad ping
        if [ $? -gt 0 ]
        then
            echo no reply from $pingsite \($pingname\) on `date` >>$logfile
            echo $pingname $pingsite "Alert" `date`| mail -s "$pingname" $mailnotify
            echo no reply from $pingname $pingsite | mail -s "$pingname" $smsnotify
        else
            touch "$logfile"
        fi
    done
}
fi

Here is a bit of the log file:
no reply from mydomain.com (mydomain) on Fri Mar 21 20:01:06 EST 2003
reliable hosts are down. no sites checks performed. Mon Mar 24 05:15:17 EST 2003
no reply from mydomain.com (mydomain) on Mon Mar 24 18:45:18 EST 2003

Using this script I recorded results over a period of a few weeks and noted that my customer had far more dropouts and of longer duration than did my mailserver. I was able to show the results to my client and suggest that they contact their DSL provider.
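A quick way to compare sites is to count the "no reply from" lines per host in the log. This is a sketch under the assumption that the log uses exactly the line format shown above; outage_counts is a hypothetical name.

```shell
# Sketch only: outage_counts is an illustrative name.
# Counts "no reply from" lines per site in a log file written by the script
# and lists the sites with the most outages first.
outage_counts() {
    awk '/^no reply from / { count[$4]++ }
         END { for (site in count) print count[site], site }' "$1" | sort -rn
}

# Usage:
#   outage_counts "$logfile"
```

The host name is the fourth word of each "no reply from" line, which is why the awk program keys on $4.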

Editor's note: If you really want to test connectivity, and want the script to be able to tell you where the problems are when connectivity is lacking, you need a more procedural approach. See Testing for network connectivity in a script.


---August 27, 2004

Very useful.
Been using it with '-c 5' and '-w 5' and found these return values:
0 = site responds to ping
1 = site didn't respond
2 = the DNS name can't be resolved (interestingly)
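
Those exit statuses could be turned into distinct log messages, so that a name-resolution failure is not reported as an outage. A minimal sketch, assuming the statuses behave as described in the comment above; describe_status is a hypothetical name and the message wording is made up.

```shell
# Sketch only: describe_status is an illustrative name.
# Maps a ping exit status to a phrase suitable for the log.
describe_status() {
    case "$1" in
        0) echo "reply received" ;;
        1) echo "no reply" ;;
        2) echo "ping error (for example, name not resolved)" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Usage:
#   ping -c 2 -w 10 "$pingsite" > /dev/null 2>&1
#   echo "$pingsite: `describe_status $?`" >>"$logfile"
```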

