SpamAssassin on Mac OS X


Email spam is an ever-growing problem. This article covers getting SpamAssassin working on Mac OS X. It is probably useful for doing this on any Unix/Linux, but Mac OS X presents special challenges.



Let's do the grand overview first. This is what I'm working with:

  • Mac OS X 10.2.4
  • Multiple POP mailboxes on the internet
  • Mail.app to read mail



The first problem is that there is no security in plain old POP3. Some ISPs offer secure POP, but that can sometimes have its own set of problems, so we'll stick with POP but tunnel through ssh for security. See the related article Securing POP in Mac OS X. (You don't NEED to do this to use SpamAssassin.)

The second problem is that Mail.app in OS X 10.2.4 doesn't understand the concept of just getting its mail from a local mailbox. It (stupidly) has to use a protocol like POP or IMAP. So we need a POP or IMAP server running on the Mac, and unfortunately Mac OS X 10.2.4 doesn't ship with either.

Once we solve these two problems, the flow is pretty simple. Regularly go out and download mail (through an ssh tunnel; again, that part is entirely optional and unrelated to SpamAssassin), run it through SpamAssassin as it comes in, and store it in a non-Mail.app location. Then have a POP or IMAP server read that mailbox upon request from Mail.app, which sorts the mail based on the flags SpamAssassin adds (and on its own filters).

Simple enough? Let's get started..

SpamAssassin

I used the writeup at http://www.stupidfool.org/docs/sa.html. This is the hardest part of the whole project. There are a TON of Perl modules that you need to download, make, and install. I ran into no glitches on any of these.






Part of this is the "popread" program. I modified it both to add SpamAssassin and to add an ssh tunnel. Here is my popread after the changes:



#!/usr/bin/perl
# REMEMBER THAT YOU NEED TO EDIT THE SUBROUTINE "filter" BEFORE
# DEPLOYING!
use strict;
use Mail::POP3Client;
my %midcache;
my $tunnel;



my @accounts = (
    {
     USER => "user1",
     AUTH_MODE => "PASS",
     PASSWORD => "fgojhgc6783fg",
     PORT => "11110",
     HOST => "localhost"
    },
    {
     USER => "user2",
     AUTH_MODE => "PASS",
     PASSWORD => "fgu7h9#",
     PORT => "11110",
     HOST => "localhost"
    },
    # More accounts here.
);



%midcache = map {chomp; $_ => 1} `tail -50 $ENV{HOME}/.msgidcache`;
$|=1;
for (@accounts) {
# assumes ssh-agent is running
# ssh will quit after 20 seconds but won't quit if we're still
# reading through the tunnel
#
    sleep 20;
    system('ssh  -f -L 11110:isphost.com:110 -l user isphost.com sleep 20 ');
    # if you have accounts on different servers, you need to 
    # modify this slightly.  I'd put the servers/user info in the
    # accounts hashes..


    
    print "\nConnecting to $$_{HOST}...";



    my $pop = new Mail::POP3Client (%$_);
    unless ($pop) { warn "Couldn't connect\n"; next; }



    my $count = $pop->Count;
    if ($count <0) { warn "Authorization failed"; next; }
    print "\n";
    print "New messages: $count\n";



    my %down = map {$_ => 1} (1..$count); 
    my @mails;
    for my $num (1..$count) {
        print "\n";
        my @head = $pop->Head($num);
        for (@head) {
             /^(From|Subject):\s+(.*)/i and do {
                print "$1\t$2\n";
                $mails[$num]->{$1} = $2;
             };
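             /^Message-Id:\s+(\S+)/i and do {
                # Remember the id for every message, not just duplicates,
                # so a failed download can be dropped from the cache later.
                my $mid = $1;
                $mails[$num]->{mid} = $mid;
                if (exists $midcache{$mid}) {
                    print "(Duplicate)\n";
                    delete $down{$num};
                    $pop->Delete($num);
                }
                $midcache{$mid}++;
             }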
        }
    }



    next unless keys %down;
    my @tocome = sort {$a <=> $b} keys %down;
    print "Downloading: @tocome\n";
    for my $num (@tocome) {
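        my @mail;
        print "Downloading message $num (", $mails[$num]->{From}, ":",
        $mails[$num]->{Subject}, ")...";
        @mail = $pop->Retrieve($num);
        print "\n";
        # Check for an empty download before touching $mail[0]: modifying
        # $mail[0] on an empty list would create that element and hide
        # the failure.
        if (!@mail) { 
            print "Ugh, something went wrong!\n"; 
            delete $midcache{$mails[$num]->{mid}} if defined $mails[$num]->{mid};
            next;
        }
        $_ .= "\n" for @mail;
        my $now = scalar localtime;
        # Turn the Return-Path header into an mbox "From " separator line
        $mail[0] =~ s/Return-Path:\s+<([^>]+)>/From $1 $now/;
        filter(@mail);
        $pop->Delete($num);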
    }



    $pop->Close;
}



open OUT, ">$ENV{HOME}/.msgidcache" or die $!;
print OUT "$_\n" for keys %midcache;
close OUT;



use Mail::Audit;
use Mail::Send;



use Mail::SpamAssassin;
sub filter {
    my @data = @_;
    my $item = Mail::Audit->new(data => \@data, noexit => 1, nomime => 1);
    my $spam = Mail::SpamAssassin->new;
    my $status = $spam->check($item);
    if ($status->is_spam) {
        print "Spam..\n";
        $status->rewrite_mail;
    }
    $item->accept("/var/mail/<username>");
    $status->finish;
}



Note that the user accounts are set up to read from "localhost" on port 11110. That's actually going to be reading the remote machine by way of an ssh tunnel. We start up a new tunnel for every account, and sleep 20 seconds before starting each one just to make sure that the last tunnel is done. The trick is that the ssh tunnel includes a "sleep 20" that it runs as a command. When that command finishes, the tunnel will be torn down, but only if it isn't still in use. So we get all the time we need to get our messages as long as we START reading within 20 seconds.
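As a rough sketch of that trick, the helper below only assembles the ssh command line that popread hands to system(), so the individual pieces are easier to see. The function name tunnel_cmd and its argument order are made up for illustration; the host and user are the same placeholders used in the script.

```shell
#!/bin/bash
# Hypothetical helper: print the ssh command that opens a short-lived tunnel.
#   $1 local port, $2 remote host, $3 remote port, $4 remote user,
#   $5 how many seconds the remote "sleep" holds the session open
tunnel_cmd() {
    local lport=$1 rhost=$2 rport=$3 user=$4 hold=$5
    # -f : drop into the background once authenticated
    # -L : forward localhost:lport to rhost:rport
    # The remote "sleep" keeps the session alive; when it ends, ssh still
    # waits for any forwarded connection that is in use before exiting.
    echo "ssh -f -L ${lport}:${rhost}:${rport} -l ${user} ${rhost} sleep ${hold}"
}

# Print the command for the example account (nothing is executed here)
tunnel_cmd 11110 isphost.com 110 user 20
```

If your accounts live on different servers, something like this could be called once per account with that account's host and user.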

I run this in a script that does this:



#!/bin/bash
TUN=`lsof -i:11110 -Fp | head -1| sed s/p//`
if [ "$TUN" ]
then
lsof -i:11110
kill $TUN
fi
ping -c5 pcunix.com
sleep 30
while true
do
echo "Connecting.."
~/bin/popread
echo "Sleeping..`date`"
sleep 310
done



So every 310 seconds this goes out and puts new mail into /var/mail/apl (I'm logged in as "apl"). I chose 310 seconds because Mail.app will be checking every 5 minutes and I don't want these two to accidentally get locked into the same exact time span. Concurrent access is not the problem, but if these were running at exactly the same time my access to new mail could be delayed.
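To put a number on that, here is a small sketch of the arithmetic. It assumes both timers are exact, which they are not (the loop really takes 310 seconds plus however long popread runs), so treat it as an illustration only. The function names are mine.

```shell
#!/bin/bash
# Greatest common divisor by Euclid's algorithm
gcd() {
    local a=$1 b=$2 t
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# Seconds until two periodic timers that start together line up again
# (the least common multiple of the two periods)
realign_after() {
    local p=$1 q=$2
    echo $(( p / $(gcd "$p" "$q") * q ))
}

secs=$(realign_after 310 300)
echo "310s and 300s timers coincide every $secs seconds ($((secs / 60)) minutes)"
```

With both periods at 300 seconds the same calculation gives 300, meaning the two would collide on every single cycle if they ever started together.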

The other stuff just kills off any hung port 11110 processes.
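The PID extraction in that script relies on lsof's -F field output, where a process line starts with the letter "p". A minimal sketch of just the parsing step, fed with sample text rather than live lsof output (the PID and the extra field line are invented for the demo):

```shell
#!/bin/bash
# Pull the first PID out of "lsof -Fp" style output read from stdin.
# Process lines look like "p1234"; any other field lines are skipped.
first_pid() {
    sed -n 's/^p//p' | head -1
}

# Sample input standing in for: lsof -i:11110 -Fp
printf 'p1234\nf5\n' | first_pid
```

Matching only lines that begin with "p" is slightly more defensive than stripping the first "p" from whatever line comes first.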

I will need to have run ssh-agent before starting this script unless I want to be bothered for passwords every time it runs. See SSH Basics.
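A quick way to notice a missing agent before the loop starts is to look for the socket that ssh-agent advertises in SSH_AUTH_SOCK. This is only a sketch: the function name is mine, and it does not check whether any keys have actually been added with ssh-add.

```shell
#!/bin/bash
# Return 0 if an ssh-agent socket is named in the environment and exists.
agent_available() {
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]
}

if agent_available; then
    echo "ssh-agent socket found: $SSH_AUTH_SOCK"
else
    echo "no ssh-agent socket; run ssh-agent and ssh-add first"
fi
```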

Imap and Pop

For this, I used the article at http://www.stepwise.com/Articles/Workbench/eart.2.0.html. Ignore all the Perl scripts; the ONLY change to the source you need to make is to find the line that defines where your mailbox is. You can ignore all the rest of his configuration if you are using Mail.app as I've done here. In src/osdep/unix/env_unix.c, I set



static char *myHomeDir = "/var/mail/apl";       /* home directory name */


(If you need a quick intro to vi, see Vi Primer)

The compilation was completely smooth, and the rest of the instructions (adding the services to Netinfo and inetd.conf) are fine. Get that running and confirm it's all good by following the instructions on testing a pop or imap server at: /SCOFAQ/scotec4.html#testimap and /SCOFAQ/scotec4.html#popper
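For a quick idea of what such a manual POP test looks like, the sketch below prints a minimal POP3 conversation using the standard USER, PASS, STAT and QUIT commands. The function name, the "apl"/"secret" credentials, and the use of netcat to deliver it are my assumptions, not part of the linked instructions.

```shell
#!/bin/bash
# Print a minimal POP3 login-and-status conversation.
#   $1 user name, $2 password; each line ends in CRLF as POP3 expects
pop3_script() {
    printf 'USER %s\r\n' "$1"
    printf 'PASS %s\r\n' "$2"
    printf 'STAT\r\n'
    printf 'QUIT\r\n'
}

# Example, not run here: send it to the local server with netcat, if installed
#   pop3_script apl 'your-password' | nc localhost 110

# Show the commands (carriage returns stripped for display)
pop3_script apl secret | tr -d '\r'
```

A working server should answer each command with a line starting with "+OK"; typing the same commands by hand in a telnet session to port 110 does the same job.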

Netinfo is gone as of Leopard (October 2007). Good riddance.

Mail.app

With that all working, it's time to edit accounts in Mail's preferences. Just tell it that the host is localhost, it's pop or imap (whichever you installed - I used pop), and it's your user name and login password. Add a filter, and use the "Edit Header List" in the drop-down to add X-Spam-Flag as a header that you can write rules for.
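Before writing the Mail.app rule, it can be reassuring to confirm that the header is really arriving in the mailbox. A minimal sketch, assuming SpamAssassin marks spam with a line reading "X-Spam-Flag: YES"; the sample mbox contents are invented, and a body line that happens to start with the same text would also be counted.

```shell
#!/bin/bash
# Count lines in an mbox file that read "X-Spam-Flag: YES".
count_flagged() {
    grep -c '^X-Spam-Flag: YES' "$1"
}

# Build a tiny two-message sample mbox for the demo
sample=$(mktemp)
cat > "$sample" <<'EOF'
From a@example.com Mon Jan  1 00:00:00 2003
Subject: hello

body
From b@example.com Mon Jan  1 00:00:01 2003
X-Spam-Flag: YES
Subject: buy now

body
EOF

count_flagged "$sample"
rm -f "$sample"
```

On the real setup the argument would be the spool file that popread writes to.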

Note that there's no reason to turn off Mail's own junk filtering. The two of these can work together quite happily.


Training Day

Of course, you'll want to train SpamAssassin. I started that (after deleting all junk) by running this:



find /Users/apl/library/mail/mailboxes -name mbox -exec sa-learn --ham --mbox {} \;


The "--ham" tells sa-learn that these are messages I like. A "--spam" tells it the opposite, so after I have made sure I have no false positives in my "junk" box, I run:



sa-learn --spam --mbox /Users/apl/library/mail/mailboxes/action/junk.mbox/mbox


(You have to at least LOOK at these messages first, or they will be in Incoming_Mail rather than mbox)
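Before feeding a whole tree of mailboxes to sa-learn, it may help to see which mbox files the find will pick up and roughly how many messages each holds. A sketch under the assumption that messages are separated by lines starting with "From " (the usual mbox convention); the function name and the sample directory are mine.

```shell
#!/bin/bash
# Print "count path" for every file named mbox under a directory.
# Messages are counted by their "From " separator lines.
mbox_counts() {
    find "$1" -name mbox -type f | while IFS= read -r f; do
        echo "$(grep -c '^From ' "$f") $f"
    done
}

# Demo on a throwaway directory laid out like a Mail.app mailbox
demo=$(mktemp -d)
mkdir -p "$demo/junk.mbox"
printf 'From a\n\nfirst\nFrom b\n\nsecond\n' > "$demo/junk.mbox/mbox"
mbox_counts "$demo"
rm -rf "$demo"
```

An mbox that reports 0 here is a hint that its messages are still sitting in Incoming_Mail.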

I particularly like that Spamassassin shows why it calls something spam:



This mail is probably spam.  The original message has been attached
along with this report, so you can recognize or block similar unwanted
mail in future.  See http://spamassassin.org/tag/ for more details.



Content preview:  *NEW-Special Package Deal!* Norton SystemWorks 2003
  Software Suite -Professional Edition- ATTN: This is a MUST for ALL
  Computer Users!!! [...]

Content analysis details:   (10.30 points, 5 required)
ONLY_COST          (0.2 points)  BODY: Only $$$
OFFERS_ETC         (0.6 points)  BODY: Stop with the offers, coupons, discounts etc!
FOR_JUST_SOME_AMT  (0.2 points)  BODY: Contains 'for only' some amount of cash
DATE_IN_PAST_03_06 (0.3 points)  Date: is 3 to 6 hours before Received: date
HABEAS_HIL         (4.0 points)  RBL: Sender is on www.habeas.com Habeas Infringer List
                   [RBL check: found 220.232.178.218.hil.habeas.com., type: 127.1.0.6]
RCVD_IN_SBL        (0.6 points)  RBL: Received via SBLed relay, see http://www.spamhaus.org/sbl/
                   [RBL check: found 220.232.178.218.sbl.spamhaus.org.]
FORGED_MUA_OUTLOOK (3.3 points)  Forged mail pretending to be from MS Outlook
FROM_HAS_UNDERLINE_NUMS (0.6 points)  From: contains an underline and numbers/letters
MISSING_MIMEOLE    (0.5 points)  Message has X-MSMail-Priority, but no X-MimeOLE

The original message did not contain plain text, and may be unsafe to
open with some email clients; in particular, it may contain a virus,
or confirm that your address can receive spam.  If you wish to view
it, it may be safer to save it to a file and open it with an editor.





Notice the point scoring: that's what makes SpamAssassin so good at trapping spam. It doesn't just use all or nothing. For example, a message gets points for being from a domain known to harbor spammers, but that by itself isn't enough. That stops false spam tagging of innocent emails from people who unluckily live in the same IP block as known spam producers.
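The arithmetic is easy to replay. The sketch below adds up the per-rule points from the report above and compares the total with the required score of 5. It assumes a message is tagged once the total reaches the required score; the function name is mine, and this is not how SpamAssassin itself is implemented.

```shell
#!/bin/bash
# Sum per-rule points and compare the total against the required score.
#   $1 is the required score; the remaining arguments are rule points
score_mail() {
    local required=$1
    shift
    printf '%s\n' "$@" | awk -v req="$required" '
        { total += $1 }
        END { printf "%.2f %s\n", total, (total >= req ? "spam" : "ham") }'
}

# All nine rules from the report: prints "10.30 spam"
score_mail 5 0.2 0.6 0.2 0.3 4.0 0.6 3.3 0.6 0.5

# The Spamhaus blocklist hit on its own: prints "0.60 ham"
score_mail 5 0.6
```

Even the 4.0 points for the Habeas list would not be enough alone; it takes several independent signs of spam to cross the line.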

That's it. A fair amount of work, but with both SpamAssassin's and Mail.app's spam filters tackling my mail, the job is a little easier. I found that SpamAssassin made an immediate difference. In spite of being trained for months, Mail.app still misses a lot of spam that SpamAssassin identifies. I hope that lasts.

See also POPFile

(A packaged mailserver that has Spamassassin preconfigured is http://www.kerio.com/kms_home.html. It's also easy to add this to E-Smith and most other mail servers)



Comments


Fri Jul 8 00:33:23 2005: Subject:   anonymous
Im getting this:



Can't locate auto/Mail/Audit/MailInternet/extract_mes.al in @INC (@INC contains: lib /sw/lib/perl5/5.8.6/darwin-thread-multi-2level /sw/lib/perl5/5.8.6 /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 .) at /Library/Perl/5.8.6/Mail/SpamAssassin/PerMsgStatus.pm line 1261




Fri Jul 8 10:13:34 2005: Subject:   TonyLawrence
That tells you that extract_mes didn't get installed properly.



Sometimes things change. This article was written a while ago; spamassassin is surely very different now and may require more work than is noted here.


