A 404 error is what you get when your browser tries to access a
page that doesn't exist. Maybe you mistyped something, or the link
you followed was mistyped by someone else, or maybe the webmaster
moved it or renamed it or just deleted it. It's annoying for you,
and sites that care about your visit try to avoid it happening.
Well, we can't stop 404's entirely, and frankly dealing with them
is an annoyance for those of us maintaining the website too. It's
bad enough that other sites cause us problems with incorrect links,
but it is really annoying when we cause our own problems.
Unfortunately, tracking these things down and fixing them is a
bit of a pain. The "Custom 404" page and associated script referred
to above corrects a lot of common errors automatically, and tries
to offer help when it can't just redirect you to the right page,
but I need to keep updating it as I find new sources of errors.
Sometimes the fix is as simple as just making a symbolic link, but
if it is from an outside source, I want to correct it if I can.
Even if it was caused by my own error, I may still want to add
correction code in case that original error gets picked up by
So, to help me find errors, I have a Perl script that reads in
the error_log, and compares it to a log of "corrections" already
made by the Custom 404 script (this is necessary because the 404
ends up in my logs even though it was corrected). The script
ignores pages that have already been corrected, and spits out a
list of 404's I need to at least investigate. Many of these will be
confused web spiders - it's really amazing how dumb some of these
things are. For example, the article "CUPS (Common Unix Printing
System) print to file - the hard way!" contains this text:
sudo lpadmin -p tofile -E -v socket://localhost:12000 -m raw
Dumb spiders regularly think that is a link:
[Sun Jul 11 07:07:05 2004] [error] [client 220.127.116.11] File
does not exist:
I have the script count the number of uncorrected 404 occurrences
so that I can devote immediate effort to the more serious problems.
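The approach described here (read the error_log, skip anything the
Custom 404 script already corrected, count what is left) could be
sketched roughly as follows. This is Python rather than the Perl the
real script is written in, and it is only an illustration: the
function name, the regular expression, and the assumption that the
corrections log holds one already-corrected path per line are all
mine, not taken from the actual script.

```python
import re
from collections import Counter

# Matches an Apache error_log entry of the form shown above, e.g.
#   [date] [error] [client 1.2.3.4] File does not exist: /some/path
# The captured path stops at the first whitespace (an assumption).
NOT_FOUND = re.compile(
    r"^\[[^\]]*\] \[error\] \[client [^\]]*\] File does not exist: (\S+)"
)


def uncorrected_404s(error_lines, corrected_paths):
    """Return (path, count) pairs for 404s not yet corrected, most frequent first.

    error_lines     -- iterable of lines from the error_log
    corrected_paths -- iterable of paths already handled by the 404 script
                       (hypothetical format: one path per line)
    """
    corrected = {p.strip() for p in corrected_paths if p.strip()}
    counts = Counter()
    for line in error_lines:
        match = NOT_FOUND.match(line)
        if match and match.group(1) not in corrected:
            counts[match.group(1)] += 1
    return counts.most_common()
```

Both arguments can be open file objects, so something like
uncorrected_404s(open("error_log"), open("corrections.log")) would
work, with both file names being placeholders. Sorting by count puts
the most frequently hit missing pages at the top of the list.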
The output of the script might look something like this:
This does generate some extra garbage now and then; it doesn't
need to be perfect - it's just a helper script that saves me
Well, I've got a few hundred 404's I need to go look at. Most of
them will probably be spider errors, or things I can easily fix,
but invariably there will be some new 404 mixup to deal with, and
the Custom 404 code will grow some more.