Web site owners like comment systems. Aside from giving visitors a place to share their opinions of our probable IQ, genealogy, and sexual habits, comments also provide a way to correct errors and add new information.
Unfortunately, spammers like to use comment systems for their own purposes, adding links to porn or other sites that don't quite fit the theme of our sites. People who use content management systems to produce their sites are apt to get a lot of attention from spammers, because the interface to their comment systems is public knowledge and it's easy for spammers to write automated form submissions. The content management producers have begun to fight back, applying various spam-fighting techniques. Of course the spammers will try to learn how those work and thwart them, and on it goes.
What about those of us who write our own code? What can we do? Actually, quite a bit. For example, here's a little snippet of Perl that looks at comments here:
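    my $spam  = 1;    # starts at 1 so the division below never divides by zero
    my $words = 0;
    if (length($in) > 5) {
        $words++ while $in =~ /\w /g;     # word characters followed by a space
        $spam++  while $in =~ /http:/g;   # occurrences of "http:"
        my $toomuch = $words / $spam;     # roughly, words per link
        $spam--;                          # back to the actual link count
        # Later rules override earlier ones.
        $in = " (looks like spam)" if $toomuch < $SPAM1 and $words > $SPAM1;
        $in = " (looks like spam - too many links) " if $spam > $SPAM2;
        $in = " (looks like spam ?) " if (($words - $spam) < $SPAM3 and $words > $SPAM3);
    }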
That code counts words and the number of times "http:" appears among them. It then makes some judgements based on the relative values (SPAM1, SPAM2, and SPAM3 are thresholds set earlier in the script to values I think are reasonable). The right math depends on what type of comments you typically get, but this is the basic idea.
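To experiment with thresholds, it can help to wrap the same rules in a function you can call on sample comments. This is only a sketch: the threshold values and the names check_comment and $links are made up for illustration and are not the values used on this site.

```perl
use strict;
use warnings;

# Hypothetical thresholds; the values actually used on the site are not shown.
my ($SPAM1, $SPAM2, $SPAM3) = (10, 3, 5);

# Returns the comment unchanged, or a "(looks like spam ...)" marker.
sub check_comment {
    my ($in) = @_;
    return $in unless length($in) > 5;

    my $words = () = $in =~ /\w /g;       # word characters followed by a space
    my $links = () = $in =~ /http:/g;     # occurrences of "http:"
    my $ratio = $words / ($links + 1);    # words per link; +1 avoids dividing by zero

    # Same order as the snippet above, so a later rule overrides an earlier one.
    my $out = $in;
    $out = " (looks like spam)" if $ratio < $SPAM1 and $words > $SPAM1;
    $out = " (looks like spam - too many links) " if $links > $SPAM2;
    $out = " (looks like spam ?) " if ($words - $links) < $SPAM3 and $words > $SPAM3;
    return $out;
}
```

Feeding it a handful of real comments and a handful of known spam is a quick way to see whether a given set of thresholds is too strict or too loose.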
You can also look for certain words. If your site gets hit by a lot of porn spammers, the necessary words aren't too hard to figure out.
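A word check can be as simple as one compiled pattern. The sketch below assumes a made-up word list and a hypothetical function name, has_bad_word; substitute whatever words actually show up in the spam you receive.

```perl
use strict;
use warnings;

# Hypothetical word list; build yours from the spam your own site receives.
my @BAD_WORDS = qw(viagra casino poker);

# One case-insensitive pattern that matches any listed word as a whole word.
my $bad_re = do {
    my $alternatives = join '|', map { quotemeta($_) } @BAD_WORDS;
    qr/\b(?:$alternatives)\b/i;
};

# True if the comment contains any listed word.
sub has_bad_word {
    my ($text) = @_;
    return $text =~ $bad_re ? 1 : 0;
}
```

The word boundaries keep the pattern from firing on longer words that merely contain a listed one, which matters if legitimate comments on your site use similar vocabulary.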
Additionally, I implement a time-based requirement. If you make a post or a comment, you have to wait a certain number of seconds before posting again. The required time goes up with the cube of the number of posts you have already made in the last 24 hours, so while your third post requires 8 units of time (2 cubed, for the two posts before it), the next is 27, then 64 and so on. This cuts down on multiple submissions that might otherwise get by the spam filtering.
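The arithmetic might look something like this. It is a sketch under assumptions: the base unit of 10 seconds and the names required_wait and may_post are invented here, and how you store each visitor's post times (by IP address, cookie, or account) is left to you.

```perl
use strict;
use warnings;

# Hypothetical base unit in seconds; the real value is not stated above.
my $UNIT_SECONDS = 10;

# Seconds a poster must wait, given how many posts they made in the last 24 hours.
sub required_wait {
    my ($recent_posts) = @_;
    return $UNIT_SECONDS * $recent_posts ** 3;
}

# True if enough time has passed since the most recent post.
# @post_times holds epoch timestamps of this visitor's earlier posts.
sub may_post {
    my ($now, @post_times) = @_;
    my @recent = grep { $now - $_ < 24 * 60 * 60 } @post_times;
    return 1 unless @recent;
    my ($last) = sort { $b <=> $a } @recent;    # newest first
    return ($now - $last) >= required_wait(scalar @recent) ? 1 : 0;
}
```

With a 10 second unit, someone with two recent posts waits 80 seconds before the third, and 270 seconds before the fourth.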
Finally, I change the interface every now and then. An automated submission will set form values based upon its knowledge of your form. So, the form might be looking for "authtext" today, but next week it will be "inputtext". You can do this manually or programmatically.
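One way to do it programmatically is to derive the field name from the current week and a secret, so the form and the script that reads it always agree without anyone editing them. This is just one possible approach, not necessarily what this site does; the secret and the names field_name and accepted_names are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical secret; keep the real one out of the page source.
my $SECRET = 'change-me';

my $WEEK = 7 * 24 * 60 * 60;    # seconds in a week

# Field name for the week containing the given epoch time.
sub field_name {
    my ($time) = @_;
    my $week = int($time / $WEEK);    # whole weeks since the epoch
    return 'f' . substr(sha1_hex("$SECRET:$week"), 0, 10);
}

# When reading a submission, accept this week's name and last week's,
# so a form loaded just before the rollover still works.
sub accepted_names {
    my ($time) = @_;
    return (field_name($time), field_name($time - $WEEK));
}
```

When printing the form you would call field_name(time) for the input's name, and when processing a submission you would look for any of accepted_names(time) among the posted parameters.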
More Articles by Tony Lawrence - Find me on Google+
Thu Mar 10 20:43:31 2005: anonymous
There has been some talk of people writing code to do SURBL.org lookups for blog spam. Also the devs of SpamAssassin have been talking about an offshoot called BlogAssassin to fight this kind of stuff.
Most likely the best way would be to code lookups for SURBL.