Detecting Comment Spam Part 1

Suppose you were writing a commenting system for a website and you wanted to check user input against a list of words that might indicate spam. You'd probably keep the suspicious words in a file, so the list can grow without changing the code, and test each comment against every word in it. An easy way to do that in Perl is with Perl's built-in grep function.
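A minimal sketch of that approach follows. It assumes a plain-text word list with one word or phrase per line. The command-line file argument, the skipping of '#' comment lines, and the sub names load_spam_words and spam_words_in are all hypothetical and not taken from the original script. grep runs its test block once for each listed word and returns the words that matched:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read suspicious words from an open filehandle, one word or phrase per line.
# Assumption: blank lines and lines starting with '#' are ignored.
sub load_spam_words {
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
        next if $line eq '' or $line =~ /^#/;
        push @words, $line;
    }
    return @words;
}

# Return every listed word that occurs in the comment.
# grep runs the block once per word ($_ is the word) and keeps the matches;
# \Q...\E quotes regex metacharacters such as '.' or '+' in the word,
# and /i makes the test case-insensitive.
sub spam_words_in {
    my ($comment, @words) = @_;
    return grep { $comment =~ /\Q$_\E/i } @words;
}

# Word-list file named on the command line (hypothetical); without one,
# fall back to a small in-memory sample so the sketch runs as-is.
my $sample = "viagra\ncasino\n# a comment line\n\ncheap pills\n";
my $fh;
if (@ARGV) {
    open $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
} else {
    open $fh, '<', \$sample or die "Cannot open sample list: $!";
}
my @spam_words = load_spam_words($fh);
close $fh;

my $comment = 'Win big at our CASINO - cheap pills too!';
my @hits = spam_words_in($comment, @spam_words);
if (@hits) {
    print "Suspicious: @hits\n";
} else {
    print "Looks clean\n";
}
```

If you run it without an argument, the script uses the built-in sample and prints "Suspicious: casino cheap pills". Because grep returns a count in scalar context, you can also use spam_words_in directly as a yes/no test, as in `if (spam_words_in($text, @spam_words))`. The match is a plain case-insensitive substring test, so a short entry can also flag innocent words that happen to contain it. Choose the list entries with that in mind.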



Title  Last Comment
Understanding Perl's map function  
- Understanding Perl's underappreciated map function - basic usage and examples -

Shell script cannot test for the existence of files  
- My shell script needs to move certain files to another location. How do I test that the files exist? -

Perl directory listing  
- Learning Perl basics by producing a 'pretty' directory listing with 'File::Find': a "pretty" or custom directory listing is a good place to start developing your scripting skills. -

Learning Perl - why you should even if you'll never use it   2013/07/18 TonyLawrence
- That 'you should learn Perl even if you'll never use it' may seem like an odd thing to say: why learn something that you might never use? -

How do I find out what IP address a user logged in from?   2013/11/13 TonyLawrence
- Obtaining a client IP address on Linux/Unix is not always direct or easy. You may have to dig it out of shell command output. -

Another Perl link checker  
- I'm still trying to write the perfect link checker for my site. -

Script to block DOS attacks   2013/06/03 TonyLawrence
- People steal content. If you run a website, you almost certainly know that; here is a simple script to block annoying abusers and possibly stop a little theft, too. -

Learning Spanish with a little help from Perl   2013/04/16 TonyLawrence
- A little Perl script helps me refresh my Spanish knowledge. -

Early reminders for first Monday of month events  
- I kept forgetting to send our club meeting notices to the newsletter. Google Calendar couldn't help me, so I wrote this script. -

Here files (shell scripting)   2012/10/09 TonyLawrence
- Once again, I've been bitten by not having read the manual recently. This bite really annoyed me. -

A Perl script for tracking nutrition   2012/09/11 BigDumbDinosaur
- This is a very simple Perl script designed to track nutritional information. I wrote it because I got a very bad cholesterol test result recently. -

Perl Profiling with Devel::NYTProf  
- I don't think I have ever used a profiler on my own code. The reasons are simple: I don't write much that is very complicated, so any bottlenecks are usually rather easy to spot. Most of what I do is ad hoc and limited use anyway, so speed is seldom a consideration. -

Perl Date::Manip for date validation  
- Validating dates can be tricky, but Date::Manip makes it easy (at the cost of a little speed). -

KCMENU (Kevin Clark's menu generator) in Perl
- Translate old kcmenu files to Perl scripts - a simple Perl-based menu script. -

Slightly Scrambled - unsorting a file   2011/04/05 TonyLawrence
- Here is a typical way to approach the problem. It uses Perl's associative arrays and (somewhat ironically) uses -

Snarling Panda site cleanup  
- The problem is "low value content". That's tough to define absolutely, but some pages here definitely fall into that category -

Smarter HTML Link Extractor   2011/07/20 TonyLawrence
- Checking links is not really hard; you can actually do it with just a few lines of Perl. -

Locking files for shared access  
- Multiple users require some sort of mechanism to give exclusive access to data. It's trivial to demonstrate advisory locking with Perl. -

Quick and simple web log grepper in Perl  
- Get quick pageviews/uniques from web logs with this short Perl script -

Simple XML POST and reply   2010/03/24 TonyLawrence
- A customer has an app that needs to post and get XML data from a website. This task was being handled by .asp scripts on a Windows box, but now they want it moved to Linux and Perl. -

Detecting Comment Spam Part 3   2009/12/31 TonyLawrence
- In the previous posts in this series, I've said that spammers' habits allow us to detect their attempts to leave inappropriate comments. I use these techniques here and am able to send most spam comments directly to the bit-bucket without ever having to examine them myself. -

Detecting Comment Spam Part 2   2010/02/05 TonyLawrence
- Links are much more difficult for spammers to mangle - they can use redirection at the destination site, but the site itself is static. -

Detecting Comment Spam Part 1   2009/12/22 TonyLawrence
- Suppose you were writing a commenting system for a website and you wanted to check user input against a list of words that might indicate spam. -

Awk vs. Perl   2012/08/15 TonyLawrence
- Sure, I used to use awk. When I used it, you weren't likely to find Perl on most Unix systems, so for a lot of text mangling, awk was at least easier than writing in C or anything else. It did the job, and you'd get used to its quirks. -

Fishing for an unknown device  
- If you have a DHCP server anywhere in the network, the device will have obtained an IP address. But what is it? -

Easy file editing with Tie::File and perl   2010/07/10 StavanShah
- This is a delightful way to do in-place editing of files. You don't have to save a copy in /tmp under a unique file name and then delete it. A truly delightful experience. -

The Genius of the Perl programming language  
- Perl is great not just because of its intrinsic features, syntax or semantics. Perl is great because it brought about the CPAN culture. -

Fetching RSS info with the Awareness API  
- Using Google FeedBurner's Awareness API to fetch feed data -

Destroying Twitter Friendships with twitterdeaf.pl  
- I'm unfollowing those who follow too many other Twitterers -

Adding Gravatars with Perl  
- I've added Gravatar support to the comments system -

Perl Reporting  
- It's been so long since I have used any of these reporting features that I had to drag out my big Camel Book to review the whole subject. -

mod_perl on Debian   2010/12/09 Questorian
- I've ignored mod_perl because I see no point in doing half a job. It could offer many advantages, but I'd need to rewrite many, many scripts to take full advantage. -

Using Multiple Submits with Perl CGI   2010/11/20 TonyLawrence
- You don't have to limit yourself to one submit, but you do have to be careful -

Perl script to get Numly  
- Using Perl LWP to get Numly tags (I stopped using this sometime back but am leaving this code for others). -

Perl 5.10  
- Perl 5.10 introduces new features and fixes things that have bitten me -

Random errors in Perl  
- Don't make this programming mistake in Perl or any other language. -

Writing a Twitter getter Widget  
- I did not like the Twitter Widget, so I wrote my own. I didn't like that either, but maybe you will. -

Power failure changes my habits  
- I need to be more careful in editing scripts - a power failure catches me at exactly the wrong time! -

Equal height CSS columns with filler text   2010/04/15 w3cvalidation
- Making column length match with text. The problem is knowing when to stop writing text in the left column. -

How many RSS readers do you really have?  
- Do those RSS readers come to the site? I sometimes subscribe to a site and only after months of ignoring it do I get around to actually getting rid of it. -

Random subroutines in Perl  
- How to call subroutines randomly with Perl - and why you might want to do something that seems so silly. -

Handling missing data  
- I have an old Perl project that goes out to a Government web site, ftp's some files, massages them in various ways, and spits out some output. Over time, the project grew, and now does more than it used to. To keep it generic and simple, let's pretend that originally it went out and got "temperature" files every hour. That was simple enough: about the only error condition would be the inability to get one or more of the files it wanted. My program simply kept track of what it actually got and only ran the rest of the process on new files. -

Perl loop causes strange read-only error  
- I don't understand this. It must have something to do with anonymous arrays in Perl (no, it doesn't, I realize now), but I don't grok the connection. I ran into this in attempting a seemingly simple change in some customer's code; -

Continuation Lines   2010/05/28 anonymous
- There's been a long standing Unix convention of breaking long lines with a "\" to make them easier to read. You'd almost always see this in files like /etc/printcap, but there are plenty of other places where this convention is used. -

Net::FTP $ftp->put problem!  
- Using $ftp->put to upload a file to an FTP server: I am using $ftp->put to upload a file to an FTP server, but it is not working... -

Lady and the Scamp (SCO does the Web)  
- SCO tackles LAMP - SCAMP against LAMP! Those rascals! Don't you just want to hug 'em? -

Simple Schedule  
- I often get asked for web-based scheduling programs. I've done quite a few of them over the years, sometimes using scripts available from the web, but more often writing my own simply because I don't like modifying other people's code. -

Wicked Cool Perl Scripts  
- Bah, humbug. Well, maybe not that bad. Actually, not "bad" at all: I have no real complaints about this book, but I didn't like it and can't imagine handing it to anyone with a hearty "Here, read this, you'll love it". -

A ps problem with BBX  
- The actual problem related to BBX. Apparently this gets run with very long command lines, which could not be seen in 'ps' -

Copying Mac Resource Forks with Perl  
- We have the problem that if we move a Mac executable to our Linux web server, we lose the resource fork unless we use Stuffit first. -

Controlling concurrent runs with Perl  
- Sometimes you have a program that can't be run by more than one person, or that must run frequently but you are not sure how long an instance of it will take. -

Sitemaps: Influencing Google for Web Site Promotion and Adsense Revenue  
- There really isn't much you can do directly with Google in the area of web site promotion. However, there are a few tools you can use. -

File date comparison  
- For this example, we'll use the case where a file shouldn't be overwritten if it was created or changed today. But what does "today" mean? -

Rounding time  
- Rounding time to the nearest fifteen minutes isn't all that difficult with Perl. Perhaps there are faster ways? -

Beginning Web Development with Perl  
- Beginning Web Development with Perl: while Perl may not be the 'cool' language for websites anymore, there are some of us who prefer to work with it because we use it for so many other tasks. -

Content Management Systems  
- Most bloggers use Content Management Systems (CMS from now on) of some sort. These do what they promise: they manage your pages. -

Pro Perl Parsing  
- I thoroughly enjoyed this. It may not be everyone's cup of tea; the subject matter is a bit esoteric. -

Perl Range Operator (.. and ...)  
- In a list context, this operator is easy to use and understand. It is much more confusing in a scalar context, and is often badly explained in books and webpages. -

Creating Perl Modules for web sites  
- When you are writing your own code, you are more apt to use someone else's module than write your own, unless your project gets fairly large and complex. Small scripting tasks just don't need the advantages modules offer. However, there is a case where modules might make perfect sense: web server CGI scripts often repeat the same tasks. Putting those common features into a module can make your web scripting easier. -


