APLawrence - Information and Resources for Unix and Linux Systems, Bloggers and the self-employed

Comment Spam

2005/03/09


Web site owners like comment systems. Aside from letting visitors tell us what they think of our probable IQ, genealogy, and sexual habits, comments also give readers a way to correct errors and add new information.

Unfortunately, spammers like comment systems too, and use them for their own purposes: adding links to porn or other sites that don't quite fit the theme of our sites. People who use content management systems to produce their sites are apt to get a lot of attention from spammers, simply because the interface to their comment systems is public knowledge, which makes it easy for spammers to write automated form submissions. The makers of content management systems have begun to fight back with various spam-fighting techniques. Of course, the spammers will try to learn how those work and thwart them, and on it goes.

What about those of us who write our own code? What can we do? Actually, quite a bit. For example, here's a little snippet of Perl that checks comments posted on this site:

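 # $in holds the submitted comment text; $SPAM1, $SPAM2 and $SPAM3
 # are thresholds set earlier in the script.
 my $spam  = 1;   # start at 1 so the division below can't divide by zero
 my $words = 0;
 if (length($in) > 5) {
   $words++ while $in =~ /\w /g;     # rough word count: a word character followed by a space
   $spam++  while $in =~ /http:/g;   # add one for each link
   my $toomuch = $words / $spam;     # words divided by (links + 1)
   $spam--;                          # now the actual number of links
   $in = " (looks like spam)" if $toomuch < $SPAM1 and $words > $SPAM1;
   $in = " (looks like spam - too many links) " if $spam > $SPAM2;
   $in = " (looks like spam ?) " if ($words - $spam) < $SPAM3 and $words > $SPAM3;
 }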
 

That code counts the words in a comment and the number of times "http:" appears in it. It then makes some judgements based on the relative values: too few words per link, too many links outright, or too little text besides the links. ($SPAM1, $SPAM2 and $SPAM3 are set earlier in the script to values I think are reasonable.) The thresholds could be different depending on what type of comments you typically get, but this is the basic idea.

You can also look for certain words. If your site gets hit by a lot of porn spammers, the necessary words aren't too hard to figure out.
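A word check can be as simple as one case-insensitive pattern built from a list. This is only an illustrative sketch: the sub name `bad_word_count` is hypothetical, and the words in `@bad_words` are placeholders you would replace with whatever actually shows up in your own spam.

```perl
use strict;
use warnings;

# Hypothetical placeholder list: fill in the words your spam actually uses.
my @bad_words = qw(viagra casino poker);

# Join the words into one alternation (quotemeta escapes any regex
# metacharacters) and anchor it with \b so a listed word isn't matched
# inside a longer word.
my $alternatives = join '|', map { quotemeta($_) } @bad_words;
my $bad_re       = qr/\b(?:$alternatives)\b/i;

# Return how many listed words appear in a comment.
sub bad_word_count {
    my ($text) = @_;
    my $count = 0;
    $count++ while $text =~ /$bad_re/g;
    return $count;
}

print bad_word_count("Cheap VIAGRA and casino bonuses"), "\n";   # prints 2
```

How many hits should count as spam is up to you; a single hit might only flag the comment for review.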

Additionally, I implement a time-based requirement. If you make a post or a comment, you have to wait a certain number of seconds before posting again. The required time goes up with the cube of the number of posts made in the last 24 hours, so the wait before your third post is 8 units of time (2 cubed, for your two earlier posts), before the fourth it's 27, then 64, and so on. This cuts down on multiple submissions that might otherwise get past the spam filtering.
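Here is a minimal sketch of that rule in Perl, not the code this site actually runs. It assumes you already have the epoch times of a visitor's earlier posts (how you identify a visitor, by IP address or a cookie, is left open), and the names `$UNIT`, `required_wait` and `may_post` are hypothetical. With `$UNIT` set to 1, one unit is one second.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical: number of seconds in one "unit" of waiting.
my $UNIT = 1;

# Seconds required between the most recent post and a new one:
# the cube of the number of posts made in the last 24 hours.
sub required_wait {
    my ($now, @post_times) = @_;
    my $recent = grep { $now - $_ < 24 * 60 * 60 } @post_times;   # count in scalar context
    return ($recent ** 3) * $UNIT;
}

# True if enough time has passed since this visitor's last post.
sub may_post {
    my ($now, @post_times) = @_;
    return 1 unless @post_times;
    my $last = max(@post_times);
    return $now - $last >= required_wait($now, @post_times);
}

my $now   = time();
my @posts = ($now - 100, $now - 5);           # two posts in the last day
print required_wait($now, @posts), "\n";      # prints 8
print may_post($now, @posts) ? "ok\n" : "wait\n";   # prints "wait": only 5 seconds have passed
```

Posts older than 24 hours drop out of the count, so a visitor who comes back the next day starts over with no wait.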












Finally, I change the interface every now and then. An automated submission fills in form fields based on what it already knows about your form, so if the comment box is called "authtext" today and "inputtext" next week, submissions that still use the old name won't get through. You can make the change by hand or programmatically.
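One way to do it programmatically is to derive the field name from the date. The sketch below is one possible approach, not necessarily what this site does: it hashes a secret string together with the current week number using the core Digest::MD5 module, so the name changes every week. `$SECRET`, `field_name`, `form_field`, `accepted_fields` and `comment_text` are hypothetical names. It also accepts last week's name, so someone who loaded the form just before the switch isn't rejected.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical secret; it never appears in the page, so the names can't be predicted.
my $SECRET = 'change-me';

# Field name for a given week: "f" plus 8 hex digits of a hash.
sub field_name {
    my ($week) = @_;
    return 'f' . substr(md5_hex("$SECRET:$week"), 0, 8);
}

# Weeks since the Unix epoch.
sub current_week { int(time() / (7 * 24 * 60 * 60)) }

# Name to use when printing the comment form.
sub form_field { field_name(current_week()) }

# Names accepted on submission: this week's and last week's.
sub accepted_fields {
    my $w = current_week();
    return (field_name($w), field_name($w - 1));
}

# %param holds the submitted fields, however your script parses them.
# Returns the comment text, or undef if no accepted field name was used
# (probably an automated submission working from an old copy of the form).
sub comment_text {
    my (%param) = @_;
    for my $name (accepted_fields()) {
        return $param{$name} if defined $param{$name};
    }
    return;
}

print form_field(), "\n";
```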


Article by Tony Lawrence

1 comment

Thu Mar 10 20:43:31 2005:   anonymous


There has been some talk of people writing code to do SURBL.org lookups for blog spam. Also the devs of SpamAssassin have been talking about an offshoot called BlogAssassin to fight this kind of stuff.

Most likely the best way would be to code lookups for SURBL.

This post tagged:

       - Malware
       - Programming
       - Web/HTML
