APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Fixing 404 errors

© July 2004 Tony Lawrence

Referencing: Custom 404 Pages

A 404 error is what you get when your browser tries to access a page that doesn't exist. Maybe you mistyped something, or the link you followed was mistyped by someone else, or maybe the webmaster moved it or renamed it or just deleted it. It's annoying for you, and sites that care about your visit try to avoid it happening.

Well, we can't stop 404's completely, and frankly dealing with them is an annoyance for those of us maintaining the website too. It's bad enough that other sites cause us problems with incorrect links, but it is really annoying when we cause our own problems.

Unfortunately, tracking these things down and fixing them is a bit of a pain. The "Custom 404" page and associated script referred to above corrects a lot of common errors automatically, and tries to offer help when it can't just redirect you to the right page, but I need to keep updating it as I find new sources of errors. Sometimes the fix is as simple as just making a symbolic link, but if it is from an outside source, I want to correct it if I can. Even if it was caused by my own error, I may still want to add correction code in case that original error gets picked up by someone else.

So, to help me find errors, I have a Perl script that reads in the error_log and compares it to a log of "corrections" already made by the Custom 404 script (this is necessary because the 404 ends up in my logs even though it was corrected). The script ignores pages that have already been corrected, and spits out a list of 404's I need to at least investigate. Many of these will be confused web spiders - it's really amazing how dumb some of these things are. For example, the article "CUPS (Common Unix Printing System) print to file - the hard way!" contains this text:

sudo lpadmin -p tofile -E -v socket://localhost:12000 -m raw

Dumb spiders regularly think that is a link:

[Sun Jul 11 07:07:05 2004] [error] [client] File does not exist:

I have the script count the number of uncorrected 404 occurrences so that I can devote immediate effort to the more serious problems. The output of the script might look something like this:

/blog/b930.html 2
/SCOFAQ/news:comp.unix.admin 1
/cgi-bin/fmail.pl 1
/Books/creatingcoolwebsites.html 10
/e51/SCOFAQ/FAQ_scotec8xsession.html 1

Obviously I need to jump on that "creatingcoolwebsites.html" problem right away.

See that "fmail.pl"? That's a script kiddy trying to break in:

- - [12/Jul/2004:12:22:04 +0000] "POST /cgi-bin/fmail.pl HTTP/1.0" 404 2317 "https://aplawrence.com/" "-"

Checking his other attempts proves it:

- - [12/Jul/2004:12:21:05 +0000] "POST /cgi-bin/formmail.pl HTTP/1.0" 404 2320 "https://aplawrence.com/" "-"
- - [12/Jul/2004:12:22:04 +0000] "POST /cgi-bin/fmail.pl HTTP/1.0" 404 2317 "https://aplawrence.com/" "-"

Nothing to worry about there.
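One way to tell probes like this apart from genuine broken links is to look at the request method in the access log: a POST to a CGI script that never existed is rarely a reader following a bad link. Here is a rough sketch of that idea, in Python rather than Perl. It assumes Apache's common/combined log format; the regular expression, the function name `probable_probes` and the sample address (from the documentation range 192.0.2.0/24) are all made up for illustration.

```python
import re

# Pulls the method, path and status out of a common/combined log line.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')

def probable_probes(access_lines):
    """Return the paths of 404 requests that were POSTs into /cgi-bin/."""
    probes = []
    for line in access_lines:
        m = REQUEST.search(line)
        if not m or m.group("status") != "404":
            continue  # not parseable, or not a 404
        if m.group("method") == "POST" and m.group("path").startswith("/cgi-bin/"):
            probes.append(m.group("path"))
    return probes

if __name__ == "__main__":
    # Hypothetical sample lines; the client address is invented.
    sample = [
        '192.0.2.1 - - [12/Jul/2004:12:21:05 +0000] "POST /cgi-bin/formmail.pl HTTP/1.0" 404 2320 "-" "-"',
        '192.0.2.1 - - [12/Jul/2004:12:22:04 +0000] "GET /blog/b930.html HTTP/1.0" 404 2317 "-" "-"',
    ]
    for path in probable_probes(sample):
        print(path)
```

Anything this flags can probably be dropped from the list of 404's that need a real fix, though it is only a heuristic and would miss probes that use GET.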

The actual script is pretty simple:

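# ck404.pl
# Assumed: the two file names, the "File does not exist:" pattern, and that both logs name pages alike.
use strict; use warnings;
my (%foo, %foo2);
open(C, "<", "corrections.log") or die "corrections.log: $!";
while(<C>) {               # pages the Custom 404 script already corrected
 s/^\s+//;
 s/\s+$//;                 # also strips the newline
 $foo{$_} = 1;
}
close C;
open(LOG, "<", "error_log") or die "error_log: $!";
while(<LOG>) {
  next unless s/^.*File does not exist:\s*//;  # keep only the missing page
  s/\s+$//;
  next if $foo{$_};        # already corrected, skip it
  $foo2{$_}++;             # count each uncorrected 404
}
close LOG;
foreach (keys %foo2)  {
  print "$_ $foo2{$_}\n";
}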

This does generate some extra garbage now and then; it doesn't need to be perfect - it's just a helper script that saves me time.
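Since the whole point of the counts is to see the worst offenders first, a variant that sorts the tally may be handy. What follows is only a sketch of the same comparison, in Python rather than Perl, and not the script I actually run. The function name `tally_404s` and the sample lines are invented for illustration, and it assumes that error_log lines contain "File does not exist:" followed by the page, with the corrections log holding one page per line.

```python
from collections import Counter

MARKER = "File does not exist:"  # assumed error_log wording

def tally_404s(error_lines, corrected_lines):
    """Count uncorrected 404s and return (page, count) pairs, biggest first."""
    corrected = {line.strip() for line in corrected_lines if line.strip()}
    counts = Counter()
    for line in error_lines:
        _, found, page = line.partition(MARKER)
        page = page.strip()
        if not found or not page or page in corrected:
            continue  # not a 404 line, nothing after the marker, or already handled
        counts[page] += 1
    return counts.most_common()

if __name__ == "__main__":
    # Hypothetical sample data; the client address is invented.
    errors = [
        "[Sun Jul 11 07:07:05 2004] [error] [client 192.0.2.1] File does not exist: /blog/b930.html\n",
        "[Sun Jul 11 07:08:10 2004] [error] [client 192.0.2.1] File does not exist: /Books/creatingcoolwebsites.html\n",
        "[Sun Jul 11 07:09:12 2004] [error] [client 192.0.2.1] File does not exist: /Books/creatingcoolwebsites.html\n",
    ]
    for page, count in tally_404s(errors, []):
        print(page, count)
```

With real logs you would pass the two open file handles instead of lists; both are just iterables of lines.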

Well, I've got a few hundred 404's I need to go look at. Most of them will probably be spider errors, or things I can easily fix, but invariably there will be some new 404 mixup to deal with, and the Custom 404 code will grow some more.

