Email spam is an ever-growing problem. This article covers getting SpamAssassin working on Mac OS X. It is probably useful for doing this on any Unix/Linux, but Mac OS X presents special challenges.
Let's do the grand overview first.
The first problem is that there is no security in plain old POP3. Some ISPs offer secure POP, but that can have its own set of problems, so we'll stick with POP but tunnel it through ssh for security. See the related article Securing POP in Mac OS X. (You don't NEED to do this to use SpamAssassin.)
The second problem is that Mail.app in OS X 10.2.4 can't simply read its mail from a local mailbox. It (stupidly) has to use a protocol like POP or IMAP. So we need a POP or IMAP server running on the Mac, and unfortunately Mac OS X 10.2.4 doesn't have either.
Once we solve these two problems, the flow is pretty simple. Regularly go out and download mail (through an ssh tunnel; again, that part is entirely optional and unrelated to SpamAssassin), run it through SpamAssassin as it comes in, and store it in a non-Mail.app location. Then have a POP or IMAP server read that mailbox upon request from Mail.app, which sorts the mail based on the flags SpamAssassin adds (and by its own filters).
Simple enough? Let's get started.
To install SpamAssassin, I used the writeup at http://www.stupidfool.org/docs/sa.html. This is the hardest part of the whole project. There are a TON of Perl modules that you need to download, make, and install. I ran into no glitches on any of these.
Part of this is the "popread" program. I modified it both to add SpamAssassin and to add an ssh tunnel. Here is my popread after the changes:
#!/usr/bin/perl
# REMEMBER THAT YOU NEED TO EDIT THE SUBROUTINE "filter" BEFORE
# DEPLOYING!
use strict;
use Mail::POP3Client;

my %midcache;
my $tunnel;
my @accounts = (
    {
        USER      => "user1",
        AUTH_MODE => "PASS",
        PASSWORD  => "fgojhgc6783fg",
        PORT      => "11110",
        HOST      => "localhost"
    },
    {
        USER      => "user2",
        AUTH_MODE => "PASS",
        PASSWORD  => "fgu7h9#",
        PORT      => "11110",
        HOST      => "localhost"
    },
    # More accounts here.
);

# Load recently seen Message-Ids so duplicates can be skipped.
%midcache = map {chomp; $_ => 1} `tail -50 $ENV{HOME}/.msgidcache`;
$|=1;

for (@accounts) {
    # assumes ssh-agent is running
    # ssh will quit after 20 seconds but won't quit if we're still
    # reading through the tunnel
    #
    sleep 20;
    system('ssh -f -L 11110:isphost.com:110 -l user isphost.com sleep 20 ');
    # if you have accounts on different servers, you need to
    # modify this slightly. I'd put the servers/user info in the
    # accounts hashes..
    print "\nConnecting to $$_{HOST}...";
    my $pop = new Mail::POP3Client (%$_);
    unless ($pop) { warn "Couldn't connect\n"; next; }
    my $count = $pop->Count;
    if ($count < 0) { warn "Authorization failed"; next; }
    print "\n";
    print "New messages: $count\n";

    # First pass: read headers only and weed out duplicates.
    my %down = map {$_ => 1} (1..$count);
    my @mails;
    for my $num (1..$count) {
        print "\n";
        my @head = $pop->Head($num);
        for (@head) {
            /^(From|Subject):\s+(.*)/i and do {
                print "$1\t$2\n";
                # Normalize the key so "from:" and "FROM:" are found later.
                $mails[$num]->{ucfirst lc $1} = $2;
            };
            /^Message-Id:\s+(\S+)/i and do {
                my $mid = $1;
                # Remember the id for every message, so that a failed
                # download can be removed from the cache below.
                $mails[$num]->{mid} = $mid;
                if (exists $midcache{$mid}) {
                    print "(Duplicate)\n";
                    delete $down{$num};
                    $pop->Delete($num);
                }
                $midcache{$mid}++;
            }
        }
    }
    next unless keys %down;

    # Second pass: download, filter, and delete what is left.
    my @tocome = sort {$a <=> $b} keys %down;
    print "Downloading: @tocome\n";
    for my $num (@tocome) {
        print "Downloading message $num (", $mails[$num]->{From}, ":",
            $mails[$num]->{Subject}, ")...";
        my @mail = $pop->Retrieve($num);
        print "\n";
        # Check for failure before touching $mail[0]; modifying it first
        # would create an element and hide the empty result.
        if (!@mail) {
            print "Ugh, something went wrong!\n";
            delete $midcache{$mails[$num]->{mid}}
                if defined $mails[$num]->{mid};
            next;
        }
        $_ .= "\n" for @mail;
        my $now = scalar localtime;
        # Turn the Return-Path header into an mbox "From " separator line.
        $mail[0] =~ s/Return-Path:\s+<([^>]+)>/From $1 $now/;
        filter(@mail);
        $pop->Delete($num);
    }
    $pop->Close;
}

open OUT, ">$ENV{HOME}/.msgidcache" or die $!;
print OUT "$_\n" for keys %midcache;
close OUT;

use Mail::Audit;
use Mail::Send;
use Mail::SpamAssassin;

sub filter {
    my @data = @_;
    my $item = Mail::Audit->new(data => \@data, noexit => 1, nomime => 1);
    my $spam = Mail::SpamAssassin->new;
    my $status = $spam->check($item);
    if ($status->is_spam) {
        print "Spam..\n";
        $status->rewrite_mail;
    }
    # Replace <username> with your own login name.
    $item->accept("/var/mail/<username>");
    $status->finish;
}
Note that the user accounts are set up to read from "localhost" on port 11110. That's actually going to be reading the remote machine by way of an ssh tunnel. We start up a new tunnel for every account, and sleep 20 seconds before starting each one just to make sure that the last tunnel is done. The trick is that the ssh tunnel includes a "sleep 20" that it runs as a command. When that command finishes, the tunnel will be torn down, but only if it isn't still in use. So we get all the time we need to get our messages as long as we START reading within 20 seconds.
I run popread from a script that does this:
#!/bin/bash
TUN=`lsof -i:11110 -Fp | head -1 | sed s/p//`
if [ "$TUN" ]
then
    lsof -i:11110
    kill $TUN
fi
ping -c5 pcunix.com
sleep 30
while true
do
    echo "Connecting.."
    ~/bin/popread
    echo "Sleeping..`date`"
    sleep 310
done
So every 310 seconds this goes out and puts new mail into /var/mail/apl (I'm logged in as "apl"). I chose 310 seconds because Mail.app will be checking every 5 minutes and I don't want these two to accidentally get locked into the same exact time span. Concurrent access is not the problem, but if these were running at exactly the same time my access to new mail could be delayed.
The other stuff just kills off any hung port 11110 processes.
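The effect of choosing 310 rather than 300 seconds can be checked with a little arithmetic. This is a minimal sketch, assuming both timers start at the same instant and ignoring the time popread itself takes (which in practice adds to the 310 seconds); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: how many seconds into Mail.app's 5-minute cycle the
# fetch loop fires after a given number of cycles.
# Assumes both start together and that popread takes no time to run.
offset_after() {
    cycles=$1
    fetch_interval=310   # the "sleep 310" in the wrapper loop
    mail_interval=300    # Mail.app checking every 5 minutes
    echo $(( (cycles * fetch_interval) % mail_interval ))
}

for n in 1 2 3 30; do
    echo "cycle $n: offset $(offset_after "$n")s"
done
```

Under those assumptions the fetch slides 10 seconds later within each Mail.app cycle and only lines up again on every 30th cycle, so the two never stay locked together.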
I will need to have run ssh-agent before starting this script unless I want to be bothered for passwords every time it runs. See SSH Basics.
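The tunnel that popread opens for each account can be sketched as a small shell function. This is only an illustration: the function name and its arguments are hypothetical, and the host and login are the placeholder values from the script above. It prints the command instead of running it, so it can be inspected safely.

```shell
#!/bin/sh
# Hypothetical helper: build the ssh command for one account's tunnel.
# Arguments: local port, remote POP host, ssh login, grace period (seconds).
tunnel_cmd() {
    # -f puts ssh in the background once it has authenticated.
    # -L forwards the local port to port 110 (POP3) on the remote host.
    # The remote "sleep" holds the session open for the grace period; ssh
    # then stays up only while a forwarded connection is still in use.
    printf 'ssh -f -L %s:%s:110 -l %s %s sleep %s\n' \
        "$1" "$2" "$3" "$2" "${4:-20}"
}

# Print (do not run) the command for the article's example values.
tunnel_cmd 11110 isphost.com user
```

To actually open the tunnel you would run the printed command in a shell where ssh-agent already holds your key; otherwise ssh will stop and ask for a password.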
For the local POP or IMAP server, I used the article at http://www.stepwise.com/Articles/Workbench/eart.2.0.html. Ignore all the Perl scripts; the ONLY change to the source you need to make is to find the line that defines where your mailbox is. You can ignore all the rest of his configuration if you are using Mail.app as I've done here. In src/osdep/unix/env_unix.c, I set
static char *myHomeDir = "/var/mail/apl"; /* home directory name */
(If you need a quick intro to vi, see Vi Primer)
The compilation was completely smooth, and the rest of the instructions (adding the services to Netinfo and inetd.conf) are fine. Get that running and confirm it's all good by following the instructions on testing a pop or imap server at: How do I test IMAP? and How do I test POP3?
Netinfo is gone as of Leopard (October 2007). Good riddance.
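A quick manual check of the local POP server amounts to sending a handful of POP3 commands (USER, PASS, STAT, QUIT, as defined in RFC 1939). The sketch below only generates that session text. The function name is hypothetical, and piping it into nc is an assumption about what is installed; typing the same commands into telnet works too.

```shell
#!/bin/sh
# Hypothetical helper: emit a minimal POP3 session for a user and password.
# POP3 lines end in CRLF, hence the \r\n.
pop3_probe() {
    printf 'USER %s\r\nPASS %s\r\nSTAT\r\nQUIT\r\n' "$1" "$2"
}

# Example use (assumes nc is available and the server listens on port 110):
#   pop3_probe apl 'your-login-password' | nc localhost 110
```

A working server should answer each command with a line starting with +OK; a line starting with -ERR points at the login or the mailbox path.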
With that all working, it's time to edit accounts in Mail's preferences. Just tell it that the host is localhost, it's pop or imap (whichever you installed - I used pop), and it's your user name and login password. Add a filter, and use the "Edit Header List" in the drop-down to add X-Spam-Flag as a header that you can write rules for.
Note that there's no reason to turn off Mail's own junk filtering. The two of these can work together quite happily.
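Before writing Mail rules against X-Spam-Flag, it is worth confirming that the header is actually reaching the mailbox. This is a rough sketch, assuming SpamAssassin marks spam with the usual "X-Spam-Flag: YES" header line; the function name is hypothetical, and a message body that happens to contain the same line would be counted too.

```shell
#!/bin/sh
# Hypothetical helper: count lines reading "X-Spam-Flag: YES" in an mbox file.
count_flagged() {
    grep -c '^X-Spam-Flag: YES' "$1"
}

# Example use on the mailbox popread delivers to:
#   count_flagged /var/mail/apl
```

If this prints 0 after known spam has come in, the problem is on the popread/SpamAssassin side rather than in Mail's rules.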
Of course, you'll want to train SpamAssassin. I started that (after deleting all junk) by running this:
find /Users/apl/library/mail/mailboxes -name mbox -exec sa-learn --ham --mbox {} \;
The "--ham" tells sa-learn that these are messages I like. A "--spam" tells it the opposite, so after I have made sure I have no false positives in my "junk" box, I run:
sa-learn --spam --mbox /Users/apl/library/mail/mailboxes/action/junk.mbox/mbox
(You have to at least LOOK at these messages first, or they will be in Incoming_Mail rather than mbox)
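Since a training run touches every mailbox it finds, it can help to see what would be fed to sa-learn first. The following is a dry-run sketch only: the function name is hypothetical, and it prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print, without running, the sa-learn command for each
# file named "mbox" under a directory. The first argument is "ham" or "spam".
plan_training() {
    kind=$1
    dir=$2
    find "$dir" -name mbox -type f | sort | while IFS= read -r f; do
        echo "sa-learn --$kind --mbox $f"
    done
}

# Example use:
#   plan_training ham /Users/apl/library/mail/mailboxes
```

Any junk mailbox that shows up in the ham list is a sign it should be emptied or excluded before training for real.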
I particularly like that SpamAssassin shows why it calls something spam:
This mail is probably spam. The original message has been attached
along with this report, so you can recognize or block similar unwanted
mail in future. See http://spamassassin.org/tag/ for more details.
Content preview: *NEW-Special Package Deal!* Norton SystemWorks 2003
Software Suite -Professional Edition- ATTN: This is a MUST for ALL
Computer Users!!! [...]
Content analysis details: (10.30 points, 5 required)
ONLY_COST (0.2 points) BODY: Only $$$
OFFERS_ETC (0.6 points) BODY: Stop with the offers, coupons, discounts etc!
FOR_JUST_SOME_AMT (0.2 points) BODY: Contains 'for only' some amount of cash
DATE_IN_PAST_03_06 (0.3 points) Date: is 3 to 6 hours before Received: date
HABEAS_HIL (4.0 points) RBL: Sender is on www.habeas.com Habeas Infringer List
[RBL check: found 220.232.178.218.hil.habeas.com., type: 127.1.0.6]
RCVD_IN_SBL (0.6 points) RBL: Received via SBLed relay, see http://www.spamhaus.org/sbl/
[RBL check: found 220.232.178.218.sbl.spamhaus.org.]
FORGED_MUA_OUTLOOK (3.3 points) Forged mail pretending to be from MS Outlook
FROM_HAS_UNDERLINE_NUMS (0.6 points) From: contains an underline and numbers/letters
MISSING_MIMEOLE (0.5 points) Message has X-MSMail-Priority, but no X-MimeOLE
The original message did not contain plain text, and may be unsafe to
open with some email clients; in particular, it may contain a virus,
or confirm that your address can receive spam. If you wish to view
it, it may be safer to save it to a file and open it with an editor.
Notice the point scoring: that's what makes SpamAssassin so good at trapping spam. It doesn't just use all or nothing. For example, a message gets points for being from a domain known to harbor spammers, but that by itself isn't enough. That stops false spam tagging of innocent emails from people who unluckily live in the same IP block as known spam producers.
That's it. A fair amount of work, but with both SpamAssassin's and Mail.app's spam filters tackling my mail, the job is a little easier. I found that SpamAssassin made an immediate difference. In spite of being trained for months, Mail.app still misses a lot of spam that SpamAssassin identifies. I hope that lasts.
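The scoring in the report above can be reproduced by adding up the per-test points and comparing the total with the required score. This is a small sketch, not SpamAssassin's own code: the function name and the "spam"/"ham" labels are made up for illustration, and it treats a total at or above the threshold as spam.

```shell
#!/bin/sh
# Hypothetical helper: sum the "(N points)" figures from a report on stdin
# and compare the total with a threshold (default 5, as in the report above).
score_report() {
    threshold=${1:-5}
    awk -v t="$threshold" '
        match($0, /\([0-9.]+ points\)/) {
            # Drop the leading "(" and the trailing " points)".
            total += substr($0, RSTART + 1, RLENGTH - 9)
        }
        END { printf "%.2f %s\n", total, (total >= t ? "spam" : "ham") }
    '
}

# Example use: save the report to a file, then
#   score_report 5 < report.txt
```

The summary line "(10.30 points, 5 required)" is not counted, because it does not have the "(N points)" shape. Each test on its own stays under 5 here; it is the combination that pushes the message over.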
See also POPFile
(A packaged mail server that has SpamAssassin preconfigured is available at http://www.kerio.com/kms_home.html.)
By Tony Lawrence, 2003-03-01
I am a Kerio reseller. Articles here related to Kerio products reflect my honest opinion, but I do have an obvious interest in selling those products also.
(Sorry: I don't think I was clear on what I asked earlier)
The Stupidfool article explains the advantage of using Popread over Fetchmail.
But can I configure Popread to do the same things as Fetchmail? I UNDERSTAND THAT IT DOWNLOADS POP MAIL. I mean, can I configure OPTIONS analogous to Fetchmail's --keep, --erase, etc.?
For example, I'd like to try this solution by leaving email on my server until I'm sure I have everything working. That's not a problem in Fetchmail, but how do I accomplish that in Popread?
I can't find very much documentation for Popread, and my Perl skills are too weak to infer it from the sample script or modules.
--
There are no "options", but you can do whatever you like. Unfortunately, you do need some understanding of Perl. You can choose whether to delete or leave messages, but that is scripting YOU do, not options. It's not particularly hard; documentation for Mail::POP3Client is at various places on the net, including
http://search.cpan.org/~sdowd/Mail-POP3Client-2.14/POP3Client.pm
Specifically, the above script deletes messages from the server with the line:
$pop->Delete($num);
If that line were not there, the messages would be left.
--TonyLawrence
That was helpful. Still, doing more than that requires more Perl knowledge than I've got (never got to the chapter in Camel (or is it Llama?) about pointers and modules).
Anyway, my mail is being downloaded by popread and served by IMAP beautifully, but as far as I can tell, the spamassassin filter isn't working. There's nothing in my mails' headers to suggest that spamassassin touched them. Not sure if it's a problem in the user_templates or what. Darn it.
Fri Jul 8 00:33:23 2005: 761 anonymous
I'm getting this:
Can't locate auto/Mail/Audit/MailInternet/extract_mes.al in @INC (@INC contains: lib /sw/lib/perl5/5.8.6/darwin-thread-multi-2level /sw/lib/perl5/5.8.6 /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 .) at /Library/Perl/5.8.6/Mail/SpamAssassin/PerMsgStatus.pm line 1261
Fri Jul 8 10:13:34 2005: 762 TonyLawrence
That tells you that extract_mes didn't get installed properly.
Sometimes things change. This article was written a while ago; spamassassin is surely very different now and may require more work than is noted here.