Email spam is an ever-growing problem. This article covers getting SpamAssassin working on Mac OS X. Most of it should apply to any Unix or Linux system, but Mac OS X presents special challenges.
Let's do the grand overview first.
The first problem is that plain old POP3 has no security: user names and passwords cross the network in clear text. Some ISPs offer secure POP, but that can have its own set of problems, so we'll stick with POP but tunnel it through ssh for security. See the related article Securing POP in Mac OS X. (You don't NEED to do this to use SpamAssassin.)
The second problem is that Mail.app in OS X 10.2.4 doesn't understand the concept of just getting its mail from a local mailbox. It (stupidly) has to use a protocol like POP or IMAP. So we need a POP or IMAP server running on the Mac, and unfortunately Mac OS X 10.2.4 doesn't include either.
Once we solve these two problems, the flow is pretty simple. Regularly go out and download mail (through an ssh tunnel; again, that part is entirely optional and unrelated to SpamAssassin), run it through SpamAssassin as it comes in, and store it in a non-Mail.app location. Then have a POP or IMAP server read that mailbox upon request from Mail.app, which sorts the mail based on the flags SpamAssassin adds (and by its own filters).
Simple enough? Let's get started..
To install SpamAssassin, I used the writeup at https://www.stupidfool.org/docs/sa.html. This is the hardest part of the whole project. There are a TON of Perl modules that you need to download, make, and install. I ran into no glitches on any of these.
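Before moving on, it can help to confirm that Perl can actually find the modules. The following is a minimal sketch, not part of the original writeup: the function name check_perl_modules is made up for illustration, and the module list is simply the four modules that the popread script loads with "use".

```shell
#!/bin/bash
# Hypothetical helper (not from the original writeup): report which of the
# given Perl modules fail to load with the perl found on PATH.
check_perl_modules() {
    local mod missing=0
    for mod in "$@"; do
        if perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
            missing=1
        fi
    done
    return "$missing"
}

# The modules that the popread script pulls in with "use".
check_perl_modules Mail::POP3Client Mail::Audit Mail::Send Mail::SpamAssassin \
    || echo "Install the missing modules before continuing."
```

A module that loads cleanly here can still be the wrong version for your SpamAssassin, so treat this as a first check only.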
Part of that writeup is the "popread" program. I modified it both to add SpamAssassin and to add an ssh tunnel. Here is my popread after the changes:
#!/usr/bin/perl
# REMEMBER THAT YOU NEED TO EDIT THE SUBROUTINE "filter" BEFORE
# DEPLOYING!
use strict;
use Mail::POP3Client;

my %midcache;
my $tunnel;
my @accounts = (
    {
        USER      => "user1",
        AUTH_MODE => "PASS",
        PASSWORD  => "fgojhgc6783fg",
        PORT      => "11110",
        HOST      => "localhost"
    },
    {
        USER      => "user2",
        AUTH_MODE => "PASS",
        PASSWORD  => "fgu7h9#",
        PORT      => "11110",
        HOST      => "localhost"
    },
    # More accounts here.
);

%midcache = map {chomp; $_ => 1} `tail -50 $ENV{HOME}/.msgidcache`;
$|=1;

for (@accounts) {
    # assumes ssh-agent is running
    # ssh will quit after 20 seconds but won't quit if we're still
    # reading through the tunnel
    #
    sleep 20;
    system('ssh -f -L 11110:isphost.com:110 -l user isphost.com sleep 20 ');
    # if you have accounts on different servers, you need to
    # modify this slightly. I'd put the servers/user info in the
    # accounts hashes..
    print "\nConnecting to $$_{HOST}...";
    my $pop = new Mail::POP3Client (%$_);
    unless ($pop) {
        warn "Couldn't connect\n";
        next;
    }
    my $count = $pop->Count;
    if ($count < 0) {
        warn "Authorization failed";
        next;
    }
    print "\n";
    print "New messages: $count\n";
    my %down = map {$_ => 1} (1..$count);
    my @mails;

    for my $num (1..$count) {
        print "\n";
        my @head = $pop->Head($num);
        for (@head) {
            /^(From|Subject):\s+(.*)/i and do {
                print "$1\t$2\n";
                $mails[$num]->{$1} = $2;
            };
            /^Message-Id:\s+(\S+)/i and do {
                # Remember the id for every message, so that a failed
                # download can be removed from the cache further down.
                $mails[$num]->{mid} = $1;
                if (exists $midcache{$1}) {
                    print "(Duplicate)\n";
                    delete $down{$num};
                    $pop->Delete($num);
                }
                $midcache{$1}++;
            }
        }
    }

    next unless keys %down;
    my @tocome = sort {$a <=> $b} keys %down;
    print "Downloading: @tocome\n";
    for my $num (@tocome) {
        my @mail;
        print "Downloading message $num (",
            $mails[$num]->{From}, ":", $mails[$num]->{Subject}, ")...";
        @mail = $pop->Retrieve($num);
        print "\n";
        # Check for a failed download before touching $mail[0]; the
        # substitution below would otherwise create a first element
        # and hide the failure.
        if (!@mail) {
            print "Ugh, something went wrong!\n";
            delete $midcache{$mails[$num]->{mid}};
            next;
        }
        $_ .= "\n" for @mail;
        my $now = scalar localtime;
        $mail[0] =~ s/Return-Path:\s+<([^>]+)>/From $1 $now/;
        filter(@mail);
        $pop->Delete($num);
    }
    $pop->Close;
}

open OUT, ">$ENV{HOME}/.msgidcache" or die $!;
print OUT "$_\n" for keys %midcache;
close OUT;

use Mail::Audit;
use Mail::Send;
use Mail::SpamAssassin;

sub filter {
    my @data = @_;
    my $item = Mail::Audit->new(data => \@data, noexit => 1, nomime => 1);
    my $spam = Mail::SpamAssassin->new;
    my $status = $spam->check($item);
    if ($status->is_spam) {
        print "Spam..\n";
        $status->rewrite_mail;
    }
    $item->accept("/var/mail/<username>");
    $status->finish;
}
Note that the user accounts are set up to read from "localhost" on port 11110. That is actually reading the remote machine by way of an ssh tunnel. We start a new tunnel for every account, and sleep 20 seconds before starting each one just to make sure that the previous tunnel is done. The trick is that the ssh tunnel includes a "sleep 20" that it runs as a remote command. When that command finishes, the tunnel is torn down, but only if it isn't still in use. So we get all the time we need to get our messages as long as we START reading within 20 seconds.
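If your accounts live on different servers, the tunnel command has to be built per account. Here is a rough sketch of that idea as a dry run that only prints the commands. The function name tunnel_cmd, the second host (mail.example.org) and the second user are placeholders I made up; the ssh options are the same ones popread uses (-f to go to the background, -L to forward the local port, -l for the remote login name).

```shell
#!/bin/bash
# Build the ssh command that forwards a local port to a remote POP3 server
# (port 110) and keeps the tunnel open for a grace period.
# Arguments: local_port remote_host remote_user [grace_seconds]
tunnel_cmd() {
    local port=$1 host=$2 user=$3 grace=${4:-20}
    echo "ssh -f -L ${port}:${host}:110 -l ${user} ${host} sleep ${grace}"
}

# Dry run: print the commands instead of executing them.
# Host and user names are placeholders.
tunnel_cmd 11110 isphost.com user
tunnel_cmd 11111 mail.example.org otheruser 30
```

Each account would then point its PORT at the matching local port. Using a different local port per server avoids waiting for the previous tunnel to go away.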
I run popread from a script that does this:
#!/bin/bash
TUN=`lsof -i:11110 -Fp | head -1| sed s/p//`
if [ "$TUN" ]
then
    lsof -i:11110
    kill $TUN
fi
ping -c5 pcunix.com
sleep 30
while true
do
    echo "Connecting.."
    ~/bin/popread
    echo "Sleeping..`date`"
    sleep 310
done
So every 310 seconds this goes out and puts new mail into /var/mail/apl (I'm logged in as "apl"). I chose 310 seconds because Mail.app will be checking every 5 minutes and I don't want these two to accidentally get locked into the same exact time span. Concurrent access is not the problem, but if these were running at exactly the same time my access to new mail could be delayed.
The other stuff just kills off any hung port 11110 processes.
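The reasoning behind 310 seconds is easy to see with a little arithmetic. This sketch assumes Mail.app checks exactly every 300 seconds and ignores the time popread itself takes to run (which only adds to the drift); the function name phase_offsets is made up for illustration.

```shell
#!/bin/bash
# Offset (in seconds) between the fetch loop and a 300-second checker after
# each cycle. Ignores the time popread itself takes to run.
fetch_interval=310
mail_interval=300

phase_offsets() {
    local cycles=$1 k
    for ((k = 1; k <= cycles; k++)); do
        echo "cycle $k: $(( (k * fetch_interval) % mail_interval ))s"
    done
}

phase_offsets 4
```

The offset grows by 10 seconds per cycle, so even if the two happen to line up once, they drift apart again on the next pass instead of staying locked together.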
I will need to have run ssh-agent before starting this script unless I want to be bothered for passwords every time it runs. See SSH Basics.
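A small guard at the top of the wrapper script can catch a missing agent before the loop starts. This is only a sketch: the function name agent_ready is made up, and it relies on ssh-agent advertising its socket in the SSH_AUTH_SOCK environment variable, which is how OpenSSH's agent normally works.

```shell
#!/bin/bash
# Return success if an ssh-agent socket is advertised in the environment.
agent_ready() {
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]
}

if agent_ready; then
    echo "ssh-agent found at $SSH_AUTH_SOCK"
else
    echo 'No ssh-agent found. Start one first, for example:'
    echo '  eval "$(ssh-agent -s)" && ssh-add'
fi
```

Note that this only checks that an agent is reachable, not that it holds the key your ISP's server expects.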
For the POP/IMAP server, I used the article at https://www.stepwise.com/Articles/Workbench/eart.2.0.html. Ignore all the Perl scripts; the ONLY change to the source you need to make is to find the line that defines where your mailbox is. You can ignore all the rest of his configuration if you are using Mail.app as I've done here. In src/osdep/unix/env_unix.c, I set
static char *myHomeDir = "/var/mail/apl"; /* home directory name */
(If you need a quick intro to vi, see Vi Primer)
The compilation was completely smooth, and the rest of the instructions (adding the services to Netinfo and inetd.conf) are fine. Get that running and confirm it's all good by following the instructions on testing a POP or IMAP server: see How do I test IMAP? and How do I test POP3?
Netinfo is gone as of Leopard (October 2007). Good riddance.
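For the POP3 test mentioned above, the conversation is short enough to script. The sketch below only prints a minimal session using standard POP3 commands (USER, PASS, STAT, QUIT); the function name pop3_probe_script and the sample credentials are made up, and the nc invocation in the comment assumes netcat is installed.

```shell
#!/bin/bash
# Print a short POP3 session for a given user and password.
# Each line ends in CRLF, as the protocol expects.
pop3_probe_script() {
    local user=$1 pass=$2
    printf 'USER %s\r\n' "$user"
    printf 'PASS %s\r\n' "$pass"
    printf 'STAT\r\n'
    printf 'QUIT\r\n'
}

# Show the commands (tr strips the carriage returns for display only).
pop3_probe_script apl secret | tr -d '\r'

# To try it against the local server, something like this should work:
#   pop3_probe_script apl "$PASSWORD" | nc localhost 110
```

A healthy server answers each command with a line starting "+OK". This sends everything at once without waiting for replies, so if the server objects, type the same commands by hand in a telnet session to port 110 instead.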
With that all working, it's time to edit accounts in Mail's preferences. Just tell it that the host is localhost, that the protocol is POP or IMAP (whichever you installed; I used POP), and give it your user name and login password. Then add a filter, and use "Edit Header List" in the drop-down to add X-Spam-Flag as a header that you can write rules for.
Note that there's no reason to turn off Mail's own junk filtering. The two of these can work together quite happily.
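To see from the command line what such a rule keys on, here is a small sketch that tests a single message for the X-Spam-Flag header described above. The function name is_flagged_spam and the sample message are made up; Mail.app does its own matching, so this is only for checking what SpamAssassin actually wrote into a message.

```shell
#!/bin/bash
# Succeed if the message on stdin carries "X-Spam-Flag: YES" in its headers.
# Only the header block (everything before the first blank line) is examined,
# so the same text quoted in a message body does not count.
is_flagged_spam() {
    awk '
        /^\r?$/ { exit }                      # blank line ends the headers
        tolower($0) ~ /^x-spam-flag:[ \t]*yes/ { found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}

sample=$'From: someone@example.com\nX-Spam-Flag: YES\nSubject: offer\n\nbody text'
if printf '%s\n' "$sample" | is_flagged_spam; then
    echo "junk"
else
    echo "inbox"
fi
```

If messages that should be spam never show the header at all, the problem is on the SpamAssassin side rather than in the Mail.app rule.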
Of course, you'll want to train SpamAssassin. I started that (after deleting all junk) by running this:
find /Users/apl/library/mail/mailboxes -name mbox -exec sa-learn --ham --mbox {} \;
The "--ham" tells sa-learn that these are messages I like. A "--spam" tells it the opposite, so after I have made sure I have no false positives in my "junk" box, I run:
sa-learn --spam --mbox /Users/apl/library/mail/mailboxes/action/junk.mbox/mbox
(You have to at least LOOK at these messages first, or they will be in Incoming_Mail rather than mbox)
I particularly like that SpamAssassin shows why it calls something spam:
This mail is probably spam. The original message has been attached
along with this report, so you can recognize or block similar unwanted
mail in future. See https://spamassassin.org/tag/ for more details.

Content preview: *NEW-Special Package Deal!* Norton SystemWorks 2003
Software Suite -Professional Edition- ATTN: This is a MUST for ALL
Computer Users!!! [...]

Content analysis details: (10.30 points, 5 required)
ONLY_COST (0.2 points) BODY: Only $$$
OFFERS_ETC (0.6 points) BODY: Stop with the offers, coupons, discounts etc!
FOR_JUST_SOME_AMT (0.2 points) BODY: Contains 'for only' some amount of cash
DATE_IN_PAST_03_06 (0.3 points) Date: is 3 to 6 hours before Received: date
HABEAS_HIL (4.0 points) RBL: Sender is on www.habeas.com Habeas Infringer List
    [RBL check: found 220.232.178.218.hil.habeas.com., type: 127.1.0.6]
RCVD_IN_SBL (0.6 points) RBL: Received via SBLed relay, see https://www.spamhaus.org/sbl/
    [RBL check: found 220.232.178.218.sbl.spamhaus.org.]
FORGED_MUA_OUTLOOK (3.3 points) Forged mail pretending to be from MS Outlook
FROM_HAS_UNDERLINE_NUMS (0.6 points) From: contains an underline and numbers/letters
MISSING_MIMEOLE (0.5 points) Message has X-MSMail-Priority, but no X-MimeOLE

The original message did not contain plain text, and may be unsafe to
open with some email clients; in particular, it may contain a virus, or
confirm that your address can receive spam. If you wish to view it, it
may be safer to save it to a file and open it with an editor.
Notice the point scoring: that's what makes SpamAssassin so good at trapping spam. It isn't all or nothing. For example, a message gets points for coming from a domain known to harbor spammers, but that by itself isn't enough to cross the threshold. That prevents false spam tagging of innocent emails from people who unluckily live in the same IP block as known spam producers.
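The arithmetic behind that report can be reproduced in a few lines. This is a sketch of the scoring idea only, not SpamAssassin's actual code: the function name score_message is made up, the rule names and points are copied from the report above, and I am assuming a message counts as spam once its total reaches the required score.

```shell
#!/bin/bash
# Sum "RULE_NAME points" pairs from stdin and compare against a threshold.
# Usage: score_message THRESHOLD < rules.txt
score_message() {
    awk -v required="$1" '
        NF >= 2 { total += $2 }
        END {
            verdict = (total >= required) ? "spam" : "not spam"
            printf "%.1f points, %s required: %s\n", total, required, verdict
        }
    '
}

score_message 5 <<'EOF'
ONLY_COST 0.2
OFFERS_ETC 0.6
FOR_JUST_SOME_AMT 0.2
DATE_IN_PAST_03_06 0.3
HABEAS_HIL 4.0
RCVD_IN_SBL 0.6
FORGED_MUA_OUTLOOK 3.3
FROM_HAS_UNDERLINE_NUMS 0.6
MISSING_MIMEOLE 0.5
EOF
```

Feed it only the RCVD_IN_SBL line and the total is 0.6, well under the threshold. That is the point of the paragraph above: one blocklist hit alone does not condemn a message.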
That's it. A fair amount of work, but with both SpamAssassin and Mail.app's spam filters tackling my mail, the job is a little easier. I found that SpamAssassin made an immediate difference. In spite of being trained for months, Mail.app still misses a lot of spam that SpamAssassin identifies. I hope that lasts.
See also POPFile
(A packaged mailserver that has SpamAssassin preconfigured is available at https://www.kerio.com/kms_home.html.)
© 2012-07-09 Tony Lawrence
(Sorry: I don't think I was clear on what I asked earlier)
The Stupidfool article explains the advantage of using Popread over Fetchmail.
But can I configure Popread to do the same things as Fetchmail? I UNDERSTAND THAT IT DOWNLOADS POP MAIL. I mean, can I configure OPTIONS analogous to Fetchmail's --keep --erase etc.?
For example, I'd like to try this solution by leaving email on my server until I'm sure I have everything working. That's not a problem in Fetchmail, but how do I accomplish that in Popread?
I can't find very much documentation for Popread, and my Perl skills are too weak to infer it from the sample script or modules.
--
There are no "options", but you can do whatever you like. Unfortunately, you do need some understanding of Perl. Whether messages are deleted or left on the server is decided by scripting YOU do, not by options. It's not particularly hard; documentation for Mail::POP3Client is at various places on the net, including
https://search.cpan.org/~sdowd/Mail-POP3Client-2.14/POP3Client.pm
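Specifically, the above script deletes messages from the server with this line, which appears twice (once for duplicates, and once after a message has been downloaded and filtered):
$pop->Delete($num);
If those lines were not there, the messages would be left on the server.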
--TonyLawrence
That was helpful. Still, doing more than that requires more Perl knowledge than I've got (never got to the chapter in Camel (or is it Llama?) about pointers and modules).
Anyway, my mail is being downloaded by popread and served by IMAP beautifully, but as far as I can tell, the spamassassin filter isn't working. There's nothing in my mails' headers to suggest that spamassassin touched them. Not sure if it's a problem in the user_templates or what. Darn it.
Fri Jul 8 00:33:23 2005: 761 anonymous
I'm getting this:
Can't locate auto/Mail/Audit/MailInternet/extract_mes.al in @INC (@INC contains: lib /sw/lib/perl5/5.8.6/darwin-thread-multi-2level /sw/lib/perl5/5.8.6 /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 .) at /Library/Perl/5.8.6/Mail/SpamAssassin/PerMsgStatus.pm line 1261
Fri Jul 8 10:13:34 2005: 762 TonyLawrence
That tells you that extract_mes.al (an autoloaded piece of Mail::Audit::MailInternet) didn't get installed properly.
Sometimes things change. This article was written a while ago; spamassassin is surely very different now and may require more work than is noted here.
Spamassassin on Mac OS X Copyright © March 2003 Tony Lawrence