APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

2005/03/09 Comment Spam

© March 2005 Tony Lawrence

Web site owners like comment systems. Aside from letting visitors share their opinions of our probable IQ, genealogy, and sexual habits, comments also provide a way to correct errors and add new information.

Unfortunately, spammers like to use comment systems for their own purposes, adding links to adult or other sites that don't quite fit the theme of our sites. People who use content management systems to produce their sites are apt to get a lot of attention from spammers, simply because the interface to their comment systems is public knowledge and it's easy for spammers to write automated form submissions. The content management producers have begun to fight back, applying various spam-fighting techniques. Of course the spammers will try to learn how those work and thwart them, and on it goes.

What about those of us who write our own code? What can we do? Actually, quite a bit. For example, here's a little snippet of Perl that looks at comments here:

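 # $SPAM1, $SPAM2 and $SPAM3 (and presumably $toomuch) are set earlier in the script.
 my $spam = 1;     # link counter; starts at 1, as in the original script
 my $words = 0;    # approximate word count
 if (length($in) > 5) {
   $words++ while $in =~ /\w /g;      # a word character followed by a space
   $spam++ while $in =~ /http:/g;     # each "http:" counts as one link
   $in = " (looks like spam)" if $toomuch < $SPAM1 and $words > $SPAM1;
   $in = " (looks like spam - too many links) " if $spam > $SPAM2;
   $in = " (looks like spam ?) " if (($words - $spam) < $SPAM3 and $words > $SPAM3);
 }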

That code counts words and the number of times "http:" appears among them. It then makes some judgements based on the relative values ($SPAM1 and the other thresholds are set earlier in the script to values I think are reasonable). The math could be different depending on what type of comments you typically get, but this is the basic idea.

You can also look for certain words. If your site gets hit by a lot of spammers, the necessary words aren't too hard to figure out.
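A word check along those lines might look like the following sketch. The word list, the `count_spam_words` name, and the choice to count hits rather than reject on the first one are illustrative assumptions, not taken from the script that runs here.

```perl
use strict;
use warnings;

# Hypothetical word list; build yours from the spam your own site receives.
my @spam_words = ('casino', 'viagra', 'poker');

# One case-insensitive pattern that matches any listed word as a whole word.
my $pattern = join '|', map { quotemeta($_) } @spam_words;
my $spam_re = qr/\b(?:$pattern)\b/i;

# Return how many times a listed word appears in the comment text.
sub count_spam_words {
    my ($text) = @_;
    my $hits = 0;
    $hits++ while $text =~ /$spam_re/g;
    return $hits;
}

my $in = "Play POKER at our casino tonight";
print count_spam_words($in), "\n";    # prints 2
```

The count could feed the same kind of threshold test used for links, so that one stray word in a legitimate comment doesn't get it flagged.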

Additionally, I implement a time-based requirement. If you make a post or a comment, you have to wait a certain number of seconds before posting again. The required time goes up with the cube of the number of posts made in the last 24 hours, so with two posts already made, your third requires 8 units of time, the next 27, then 64, and so on. This cuts down on multiple submissions that might otherwise get by the spam filtering.
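The cubic delay can be sketched as below. The 10-second unit, the function names, and passing post times in as a plain list of epoch seconds are assumptions for illustration; the actual length of a "unit" and the way posts are stored will differ from site to site.

```perl
use strict;
use warnings;

my $UNIT_SECONDS = 10;             # assumed length of one "unit of time"
my $WINDOW       = 24 * 60 * 60;   # only posts from the last 24 hours count

# Seconds a poster must wait after their latest post, given the current
# time and the epoch times of their earlier posts.
sub required_wait {
    my ($now, @post_times) = @_;
    my $recent = grep { $now - $_ < $WINDOW } @post_times;   # count of recent posts
    return $recent**3 * $UNIT_SECONDS;
}

# True if enough time has passed since the most recent post.
sub may_post {
    my ($now, @post_times) = @_;
    return 1 unless @post_times;
    my ($last) = sort { $b <=> $a } @post_times;
    return ($now - $last) >= required_wait($now, @post_times);
}

my $now   = time;
my @posts = ($now - 3600, $now - 60);      # two posts in the last 24 hours
print required_wait($now, @posts), "\n";   # 2**3 units = 80 seconds
print may_post($now, @posts) ? "ok\n" : "wait\n";
```

With two recent posts the third needs 8 units, matching the numbers above; a genuine visitor rarely notices the first few delays, while a script submitting in a loop is quickly locked out.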

Finally, I change the interface every now and then. An automated submission will set form values based upon its knowledge of your form. So, the form might be looking for "authtext" today, but next week it will be "inputtext". You can do this manually or programmatically.
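One way to rotate the field name programmatically is to derive it from the current week and a secret kept on the server, so the page that prints the form and the script that handles it agree without storing anything. This is only a sketch: the secret, the `field_` prefix, the weekly period, and the function names are all assumptions, not how this site does it. `Digest::SHA` ships with Perl.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

my $SECRET = 'change-me';          # hypothetical server-side secret
my $PERIOD = 7 * 24 * 60 * 60;     # rotate once a week

# Field name for the period containing the given epoch time.
sub field_name {
    my ($when) = @_;
    my $period_no = int($when / $PERIOD);
    return 'field_' . substr(hmac_sha256_hex($period_no, $SECRET), 0, 10);
}

# Pull the comment text out of the submitted form parameters. The previous
# period's name is accepted too, so a form loaded just before a rotation
# still works.
sub comment_from_params {
    my ($params, $now) = @_;
    for my $name (field_name($now), field_name($now - $PERIOD)) {
        return $params->{$name} if defined $params->{$name};
    }
    return;    # wrong or stale field name: likely an automated submission
}

my $now  = time;
my %form = (field_name($now) => 'Nice article');
print comment_from_params(\%form, $now), "\n";
```

A spammer who scrapes the form before every submission will still find the current name, so this mainly defeats scripts that hard-code field names.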

Got something to add? Send me email.

1 comment

Thu Mar 10 20:43:31 2005: 161   anonymous

There has been some talk of people writing code to do SURBL.org lookups for blog spam. Also the devs of SpamAssassin have been talking about an offshoot called BlogAssassin to fight this kind of stuff.

Most likely the best way would be to code lookups for SURBL.

