APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

Spammer Control

This morning I logged in to find some jerk had posted four spam comments to various articles here. They were all identical:

"Found this site on http://www.google.com - it is interesting to see that the most popular articles are the ones with quality, unique content. I have been trying to get my articles on the top board that advertise my Linux online poker website http://(deleted) and my Mac online poker website http://(deleted) but it just isn't happening :("

(I deleted the links he gave - I'm not giving him any free advertising!)

He probably would have tried to do more than those four, but I have a throttle in place that tracks time between comments and keeps a corresponding counter of minimum allowed time. That counter ratchets up exponentially with the number of comments made, so you just cannot make multiple comments in a row without waiting a long time in between.

You can post a comment and then post another one 10 seconds later. To post a third, you need to wait 60 seconds. The fourth will be rejected if you haven't waited ninety seconds, and then the throttle gets really aggressive: ten minutes for the fifth, twenty for the sixth, and more than 30 minutes for the seventh. That stops spammers like this one from polluting dozens or hundreds of articles, but it shouldn't bother someone who is actually reading pages and commenting on a couple.
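Here is a minimal Python sketch of a throttle that follows the schedule above. It is not the site's actual code: the MIN_WAIT table, the 31-minute stand-in for "more than 30 minutes", the CommentThrottle class, and keying visitors by IP address are all assumptions. This sketch also never resets a visitor's count, and because it uses a lookup table rather than a formula, it only approximates an exponential ratchet.

```python
import time

# Hypothetical schedule based on the description above: minimum wait in
# seconds before the Nth comment from the same visitor. Index = comments
# already made. 31 minutes is an assumed stand-in for "more than 30 minutes".
MIN_WAIT = [0, 10, 60, 90, 10 * 60, 20 * 60, 31 * 60]


class CommentThrottle:
    """Track per-visitor comment counts and the time of the last comment."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.state = {}  # visitor key (e.g. IP) -> (comments_made, last_time)

    def required_wait(self, comments_made):
        # Past the end of the table, keep using the last (largest) wait.
        return MIN_WAIT[min(comments_made, len(MIN_WAIT) - 1)]

    def allow(self, visitor):
        now = self.clock()
        made, last = self.state.get(visitor, (0, None))
        if last is not None and now - last < self.required_wait(made):
            return False  # too soon: reject, and don't count the attempt
        self.state[visitor] = (made + 1, now)
        return True
```

A table keeps the schedule easy to tune by hand; a formula such as `base * 2 ** n` would give a true exponential ramp instead.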

What really irks me about these people is their stupidity. Did this guy really think that I would leave his junk here? Maybe so, but wouldn't you think that he'd have a little more creativity and not just post the same thing over and over?

I have other filters as well: you can't just post links. I count the links, compare that count to the amount of text, and reject the comment if the ratio of links to text is too high. I also look for certain keywords (you know, *redacted*, etc.) and reject the comment if it contains any of them.
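A rough Python sketch of the two content filters just described: the link-to-text ratio and the keyword check. The threshold, the regular expressions, and the keyword list are placeholders chosen for illustration; the real values and banned words aren't given here.

```python
import re

# Hypothetical values: the real threshold and keyword list aren't published.
MAX_LINKS_PER_100_WORDS = 2.0
BANNED_KEYWORDS = {"casino", "poker"}

LINK_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z']+")


def looks_like_spam(text):
    links = LINK_RE.findall(text)
    # Count words with the links stripped out, so URLs don't pad the total.
    words = WORD_RE.findall(LINK_RE.sub(" ", text))
    if links:
        if not words:
            return True  # nothing but links
        if len(links) * 100 / len(words) > MAX_LINKS_PER_100_WORDS:
            return True  # too many links for the amount of text
    # Reject if any banned keyword appears as a word.
    lowered = {w.lower() for w in words}
    return bool(lowered & BANNED_KEYWORDS)
```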

Finally, if someone has attempted abuse, their IP address gets locked out for at least a few months. I unlock it eventually because the IP is often dynamic and there is no point in keeping the block.
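The timed lockout can be sketched the same way. The 90-day LOCKOUT_SECONDS value is an assumed stand-in for "at least a few months", and the IPBlocklist class is hypothetical; an expired block is simply dropped the next time that address is checked.

```python
import time

# Assumed lockout length standing in for "at least a few months".
LOCKOUT_SECONDS = 90 * 24 * 60 * 60


class IPBlocklist:
    def __init__(self, clock=time.time):
        self.clock = clock
        self.blocked_until = {}  # IP address -> expiry timestamp

    def block(self, ip):
        self.blocked_until[ip] = self.clock() + LOCKOUT_SECONDS

    def is_blocked(self, ip):
        expiry = self.blocked_until.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # Dynamic IPs get reassigned, so stale blocks are dropped.
            del self.blocked_until[ip]
            return False
        return True
```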

So far, these controls have managed to keep us fairly free of comment spam, but I do have to manually delete a few now and then as I did this morning. I doubt the spammer reads this, but if he is, he should know this: I'm NOT going to let you post junk in comments. If you get by my controls, I'll delete you and ban you. If you come at me from another IP, I'll filter on the actual words you are trying to post: you WILL NOT get a free ride here.



© Anthony Lawrence







Mon May 1 14:36:59 2006: 1987   BigDumbDinosaur


"He probably would have tried to do more than those four, but I have a throttle in place that tracks time between comments and keeps a corresponding counter of minimum allowed time."

Those who use Sendmail on their servers and are not very familiar with it may be interested to know that Sendmail has a built-in feature somewhat similar to what Tony describes. It can throttle incoming connections to N per second and limit the number of child processes it will spawn -- both of which limit how many foreign servers (e.g., spam broadcasters) can connect in any given time. Take a look at the ConnectionRateThrottle and MaxDaemonChildren parameters, respectively, in your sendmail.cf configuration file. Unfortunately, neither works exponentially, but it's better than nothing.

Completely stopping spammers, whether in E-mail or on a site like this, is impossible. As with almost everything else, there will always be a small percentage of the population who think they are better than everyone else and that what they have to say is more important -- and who will be aggressive about shoving their nonsense into your face. The only thing you can do is try to keep a lid on it.

I have practiced aggressive E-mail spam policing on our company mail server for a number of years, which has kept the junk mail inflow down to a manageable level. But the spam continues to roll in, often from the same servers but with different (and usually fake) domain names. When I see that happening, I block the offending server by IP address, and the bounced mail tells the sender so. Of course, a lot of the junk comes from spam robots on infected Windows machines with dynamic IP addresses. Not much can be done about those! In any case, the unrelenting spam barrage that clogs the networks today completely disproves the old "monkeys on typewriters" theory.



Mon May 1 14:59:26 2006: 1988   TonyLawrence

I really like exponential controls like this. The same concept could be applied to email: let a few messages through so that a conversational flurry wouldn't be bothered, but get more aggressive as the number of messages in the past hour grows.
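Applying that exponential ratchet to mail could look roughly like this Python sketch. FREE_PER_HOUR, BASE_DELAY, the doubling rule, and the MailThrottle class are all hypothetical; none of them is a Sendmail or Kerio setting, and how a mail server would actually defer a message (for example with a temporary 4xx SMTP reply) is left out. Every message counts toward the one-hour window, including delayed ones.

```python
import time
from collections import defaultdict, deque

# Hypothetical parameters: the first FREE_PER_HOUR messages from a sender in
# any one-hour window pass with no delay; after that the delay doubles.
FREE_PER_HOUR = 5
BASE_DELAY = 30  # seconds for the first message over the free allowance
WINDOW = 3600    # one hour, in seconds


class MailThrottle:
    def __init__(self, clock=time.time):
        self.clock = clock
        self.recent = defaultdict(deque)  # sender -> timestamps in the window

    def delay_for(self, sender):
        """Return how many seconds to defer this message (0 = accept now)."""
        now = self.clock()
        times = self.recent[sender]
        # Forget messages older than the window.
        while times and now - times[0] >= WINDOW:
            times.popleft()
        count = len(times)
        times.append(now)
        if count < FREE_PER_HOUR:
            return 0
        return BASE_DELAY * 2 ** (count - FREE_PER_HOUR)
```

A conversational burst of a handful of messages never sees a delay, while a sender pushing dozens per hour quickly faces delays measured in hours.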

By the way, Kerio Mail Server has similar controls (see various articles here). One I really like lets you set a delay before responding to a connection attempt. You can set it to 20 seconds or so, which doesn't bother legitimate email at all but really frustrates spammers (their software doesn't want to wait around a long time for a response).

Kerio also lets you set things like Maximum Unknown recipients and so on. It's a very nice product, and I sell quite a few of them.






Mon May 1 19:52:42 2006: 1991   mt1955


I have any comments, replies, guestbook entries, etc. automatically emailed to me with a "commit" link that I have to click manually before anything actually shows up on the site. I know it's not a practical approach for a really busy site, but it is a good fit for me. Yet even so, with nothing showing up on the site for a spammer to see except 'Thank you, your message will be processed...' and the prominent note 'html tags will be discarded', I still get up to a dozen spam attempts a day. Sometimes there will be a series of attempts from the same IP address, but more often there is a different one with each entry. I suppose that means an anonymizing proxy is being used.

Good luck with your efforts.
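The commit-link workflow described in that comment can be sketched in Python like this. The ModerationQueue class, the base URL, and the token format are hypothetical; a real version would also need to e-mail the link and store pending entries somewhere more durable than an in-memory dictionary.

```python
import secrets


class ModerationQueue:
    """Hold submissions until the owner clicks a one-time 'commit' link."""

    def __init__(self, base_url):
        self.base_url = base_url  # hypothetical URL of the approval handler
        self.pending = {}         # token -> submitted text
        self.published = []

    def submit(self, text):
        token = secrets.token_urlsafe(16)
        self.pending[token] = text
        # On a real site this link would be e-mailed to the owner.
        return f"{self.base_url}?token={token}"

    def commit(self, token):
        text = self.pending.pop(token, None)
        if text is None:
            return False  # unknown or already-used token
        self.published.append(text)
        return True
```

`secrets.token_urlsafe` produces an unguessable token, so a spammer cannot approve his own entry by guessing the link.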



Mon May 1 20:59:37 2006: 1993   TonyLawrence

Yeah, we might have to get to that some day, but so far what I'm doing works. It helps to be able to write your own code.
