APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed






Spamassassin on Mac OS X


Email spam is an ever-growing problem. This article covers getting SpamAssassin working on Mac OS X. It is probably useful on any Unix or Linux system too, but Mac OS X presents special challenges.

Let's do the grand overview first.

  • Mac OS X 10.2.4
  • Multiple pop mailboxes on the internet
  • Mail.app to read mail



The first problem is that there is no security in plain old POP3. Some ISPs offer secure POP, but that can have its own set of problems, so we'll stick with POP and tunnel it through ssh for security. See the related article Securing POP in Mac OS X. (You don't NEED to do this to use SpamAssassin.)

The second problem is that Mail.app in OS X 10.2.4 doesn't understand the concept of just getting its mail from a local mailbox. It (stupidly) has to use a protocol like pop or imap. So, we need a pop or imap server running on the Mac, and unfortunately Mac OS X 10.2.4 doesn't have either.

Once we solve these two problems, the flow is pretty simple. Regularly go out and download mail (through an ssh tunnel; again, that part is entirely optional and unrelated to SpamAssassin), run it through SpamAssassin as it comes in, and store it in a non-Mail.app location. A pop or imap server then reads that mailbox upon request from Mail.app, which sorts the mail based on the flags SpamAssassin adds (and by its own filters).

Simple enough? Let's get started..

SpamAssassin

I used the writeup at http://www.stupidfool.org/docs/sa.html. This is the hardest part of the whole project. There are a TON of Perl modules that you need to download, make, and install. I ran into no glitches on any of these.

Part of this is the "popread" program. I modified it both to add SpamAssassin and to add an ssh tunnel. Here is my popread after the changes:

#!/usr/bin/perl
# REMEMBER THAT YOU NEED TO EDIT THE SUBROUTINE "filter" BEFORE
# DEPLOYING!
use strict;
use Mail::POP3Client;
my %midcache;
my $tunnel;

my @accounts = (
    {
     USER => "user1",
     AUTH_MODE => "PASS",
     PASSWORD => "fgojhgc6783fg",
     PORT => "11110",
     HOST => "localhost"
    },
    {
     USER => "user2",
     AUTH_MODE => "PASS",
     PASSWORD => "fgu7h9#",
     PORT => "11110",
     HOST => "localhost"
    },
    # More accounts here.
);

%midcache = map {chomp; $_ => 1} `tail -50 $ENV{HOME}/.msgidcache`;
$|=1;
for (@accounts) {
# assumes ssh-agent is running
# ssh will quit after 20 seconds but won't quit if we're still
# reading through the tunnel
#
    sleep 20;
    system('ssh  -f -L 11110:isphost.com:110 -l user isphost.com sleep 20 ');
    # if you have accounts on different servers, you need to 
    # modify this slightly.  I'd put the servers/user info in the
    # accounts hashes..
    
    print "\nConnecting to $$_{HOST}...";

    my $pop = new Mail::POP3Client (%$_);
    unless ($pop) { warn "Couldn't connect\n"; next; }

    my $count = $pop->Count;
    if ($count <0) { warn "Authorization failed"; next; }
    print "\n";
    print "New messages: $count\n";

    my %down = map {$_ => 1} (1..$count); 
    my @mails;
    for my $num (1..$count) {
        print "\n";
        my @head = $pop->Head($num);
        for (@head) {
             /^(From|Subject):\s+(.*)/i and do {
                print "$1\t$2\n";
                $mails[$num]->{$1} = $2;
             };
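             /^Message-Id:\s+(\S+)/i and do {
                # Remember the id for every message, not just duplicates,
                # so a failed download can be dropped from the cache later.
                $mails[$num]->{mid} = $1;
                if (exists $midcache{$1}) {
                    print "(Duplicate)\n";
                    delete $down{$num};
                    $pop->Delete($num);
                }
                $midcache{$1}++;
             }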
        }
    }

    next unless keys %down;
    my @tocome = sort {$a <=> $b} keys %down;
    print "Downloading: @tocome\n";
    for my $num (@tocome) {
        my @mail;
        print "Downloading message $num (", $mails[$num]->{From}, ":",
        $mails[$num]->{Subject}, ")...";
        @mail = $pop->Retrieve($num);
        print "\n";
        # Check for an empty result before touching $mail[0]: the
        # substitution below would create that element and hide the failure.
        if (!@mail) { 
            print "Ugh, something went wrong!\n"; 
            delete $midcache{$mails[$num]->{mid}};
            next;
        }
        $_ .= "\n" for @mail;
        my $now = scalar localtime;
        $mail[0] =~ s/Return-Path:\s+<([^>]+)>/From $1 $now/;
        filter(@mail);
        $pop->Delete($num);
    }

    $pop->Close;
}

open OUT, ">$ENV{HOME}/.msgidcache" or die $!;
print OUT "$_\n" for keys %midcache;
close OUT;

use Mail::Audit;
use Mail::Send;

use Mail::SpamAssassin;
sub filter {
    my @data = @_;
    my $item = Mail::Audit->new(data => \@data, noexit => 1, nomime => 1);
    my $spam = Mail::SpamAssassin->new;
    my $status = $spam->check($item);
    if ($status->is_spam) {
        print "Spam..\n";
        $status->rewrite_mail;
    }
    $item->accept("/var/mail/<username>");
    $status->finish;
}
 

Note that the user accounts are set up to read from "localhost" on port 11110. That's actually going to be reading the remote machine by way of an ssh tunnel. We start up a new tunnel with every account and sleep 20 seconds before starting a new tunnel just to make sure that the last tunnel is done. The trick to that is that the ssh tunnel includes a "sleep 20" that it runs as a command. When that command finishes, the tunnel will be torn down, but only if it isn't still in use. So we get all the time we need to get our messages as long as we START reading within 20 seconds.
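The tunnel-with-a-grace-period trick can be sketched on its own in shell. This is only an illustration: the function name tunnel_cmd and its argument order are made up here, and the host and login are the same placeholders used in popread. It prints the command rather than running it, so you can inspect it first.

```shell
#!/bin/sh
# Build (but do not run) the ssh command for a short-lived tunnel.
# tunnel_cmd is a hypothetical helper, not part of popread.
# Arguments: local port, remote host, remote port, login name, grace seconds.
tunnel_cmd() {
    local_port=$1 remote_host=$2 remote_port=$3 login=$4 grace=$5
    # -f : go to the background once authentication is done
    # -L : forward local_port on this machine to remote_host:remote_port
    # The trailing "sleep" keeps the session alive for $grace seconds;
    # after that ssh exits once no forwarded connection is still in use.
    printf 'ssh -f -L %s:%s:%s -l %s %s sleep %s\n' \
        "$local_port" "$remote_host" "$remote_port" "$login" \
        "$remote_host" "$grace"
}

# Prints the same command that popread runs for each account.
tunnel_cmd 11110 isphost.com 110 user 20
```

If your accounts live on different servers, calling a helper like this once per account with that account's host and login is one way to do the "modify this slightly" step mentioned in the script's comments.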

I run this in a script that does this:

#!/bin/bash
TUN=`lsof -i:11110 -Fp | head -1| sed s/p//`
if [ "$TUN" ]
then
lsof -i:11110
kill $TUN
fi
ping -c5 pcunix.com
sleep 30
while true
do
echo "Connecting.."
~/bin/popread
echo "Sleeping..`date`"
sleep 310
done
 

So every 310 seconds this goes out and puts new mail into /var/mail/apl (I'm logged in as "apl"). I chose 310 seconds because Mail.app will be checking every 5 minutes and I don't want these two to accidentally get locked into the same exact time span. Concurrent access is not the problem, but if these were running at exactly the same time my access to new mail could be delayed.
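To see how rarely the two timers line up, here is a rough worked example. It assumes both loops start at the same moment and tick at exact intervals; in practice the time popread itself takes shifts things, so treat the numbers as an illustration only. The helper names gcd and lcm are chosen here.

```shell
#!/bin/sh
# Greatest common divisor by Euclid's algorithm.
gcd() {
    a=$1 b=$2
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# Least common multiple: the number of seconds until two timers that
# started together fire at the same moment again.
lcm() {
    echo $(( $1 / $(gcd "$1" "$2") * $2 ))
}

lcm 300 310   # prints 9300: the timers coincide once every 155 minutes
lcm 300 300   # prints 300: identical intervals coincide on every cycle
```

With the 10 second offset the two checks drift apart after each coincidence instead of colliding every time.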

The other stuff just kills off any hung port 11110 processes.
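The PID extraction in that script relies on lsof's field output (-Fp), where each process line starts with the letter "p". A slightly more defensive variant, assuming that output format, might look like the sketch below. The name first_pid is made up for illustration.

```shell
#!/bin/sh
# Read lsof field output (as produced by "lsof -i:PORT -Fp") on stdin
# and print the first process ID, or nothing if there is none.
# Process lines look like "p1234"; any other field lines are skipped.
first_pid() {
    sed -n 's/^p\([0-9][0-9]*\)$/\1/p' | head -n 1
}

# Intended use (commented out because it needs lsof and a live tunnel):
#   TUN=$(lsof -i:11110 -Fp | first_pid)
#   [ -n "$TUN" ] && kill "$TUN"
```

Matching only lines that are a "p" followed by digits avoids stripping a "p" from some unrelated line if lsof prints extra fields.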

I will need to have run ssh-agent before starting this script unless I want to be bothered for passwords every time it runs. See SSH Basics.
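A small guard at the top of the loop script can catch a missing agent before ssh starts prompting. This sketch only checks the SSH_AUTH_SOCK environment variable that ssh-agent sets; it cannot tell whether any keys have been added. The name agent_available is chosen here.

```shell
#!/bin/sh
# Succeed only if SSH_AUTH_SOCK is set and points at a socket.
agent_available() {
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]
}

if agent_available; then
    echo "ssh-agent socket found"
else
    echo "no ssh-agent; try: eval \"\$(ssh-agent -s)\" && ssh-add"
fi
```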

Imap and Pop

For this, I used the article at http://www.stepwise.com/Articles/Workbench/eart.2.0.html. Ignore all the Perl scripts; the ONLY change to the source you need to make is to find the line that defines where your mailbox is. You can ignore all the rest of his configuration if you are using Mail.app as I've done here. In src/osdep/unix/env_unix.c, I set

static char *myHomeDir = "/var/mail/apl";       /* home directory name */
 

(If you need a quick intro to vi, see Vi Primer)

The compilation was completely smooth, and the rest of the instructions (adding the services to Netinfo and inetd.conf) are fine. Get that running and confirm it's all good by following the instructions on testing a pop or imap server at: How do I test IMAP? and How do I test POP3?
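If you would rather script the POP3 check than type it by hand, the sketch below prints the client side of a minimal session: USER, PASS, STAT and QUIT, each ending in CRLF as the protocol expects. The function name pop3_session is made up, and the usage line assumes a netcat (nc) is installed and that the server listens on port 110.

```shell
#!/bin/sh
# Print the client side of a minimal POP3 login-and-count session.
# pop3_session is a hypothetical helper; arguments are user and password.
pop3_session() {
    printf 'USER %s\r\n' "$1"
    printf 'PASS %s\r\n' "$2"
    printf 'STAT\r\n'      # a working server answers "+OK <count> <bytes>"
    printf 'QUIT\r\n'
}

# Intended use (commented out because it needs a running server):
#   pop3_session apl 'your-password' | nc localhost 110
```

Note that this sends every command at once without waiting for each reply. Many servers tolerate that, but if yours does not, typing the same four commands into an interactive telnet session is the safer test.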

Netinfo is gone as of Leopard (October 2007). Good riddance.

Mail.app

With that all working, it's time to edit accounts in Mail's preferences. Just tell it that the host is localhost, it's pop or imap (whichever you installed - I used pop), and it's your user name and login password. Add a filter, and use the "Edit Header List" in the drop-down to add X-Spam-Flag as a header that you can write rules for.

Note that there's no reason to turn off Mail's own junk filtering. The two of these can work together quite happily.
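Before writing Mail.app rules, it can help to confirm that SpamAssassin is tagging anything at all. The sketch below counts lines that look like an "X-Spam-Flag: YES" header in an mbox file. It is a rough count (a message body could contain the same text), and count_flagged is an illustrative name; the path in the usage comment is the mailbox used in this article.

```shell
#!/bin/sh
# Print how many lines in the given mbox file start with the
# X-Spam-Flag: YES header that SpamAssassin adds to tagged mail.
count_flagged() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, which is not an error for our purposes.
    grep -c '^X-Spam-Flag: YES' "$1" || true
}

# Intended use:
#   count_flagged /var/mail/apl
```

A count of 0 after known spam has arrived suggests the filter subroutine is not being reached or SpamAssassin is not rewriting the message.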


Training Day

Of course, you'll want to train SpamAssassin. I started that (after deleting all junk) by running this:

find /Users/apl/library/mail/mailboxes -name mbox -exec sa-learn --ham --mbox {} \;
 

The "--ham" tells sa-learn that these are messages I like. A "--spam" tells it the opposite, so after I have made sure I have no false positives in my "junk" box, I run:

sa-learn --spam --mbox /Users/apl/library/mail/mailboxes/action/junk.mbox/mbox
 

(You have to at least LOOK at these messages first, or they will be in Incoming_Mail rather than mbox)

I particularly like that Spamassassin shows why it calls something spam:

This mail is probably spam.  The original message has been attached
along with this report, so you can recognize or block similar unwanted
mail in future.  See http://spamassassin.org/tag/ for more details.

Content preview:  *NEW-Special Package Deal!* Norton SystemWorks 2003
  Software Suite -Professional Edition- ATTN: This is a MUST for ALL
  Computer Users!!! [...]

Content analysis details:   (10.30 points, 5 required)
ONLY_COST          (0.2 points)  BODY: Only $$$
OFFERS_ETC         (0.6 points)  BODY: Stop with the offers, coupons, discounts etc!
FOR_JUST_SOME_AMT  (0.2 points)  BODY: Contains 'for only' some amount of cash
DATE_IN_PAST_03_06 (0.3 points)  Date: is 3 to 6 hours before Received: date
HABEAS_HIL         (4.0 points)  RBL: Sender is on www.habeas.com Habeas Infringer List
                   [RBL check: found 220.232.178.218.hil.habeas.com., type: 127.1.0.6]
RCVD_IN_SBL        (0.6 points)  RBL: Received via SBLed relay, see http://www.spamhaus.org/sbl/
                   [RBL check: found 220.232.178.218.sbl.spamhaus.org.]
FORGED_MUA_OUTLOOK (3.3 points)  Forged mail pretending to be from MS Outlook
FROM_HAS_UNDERLINE_NUMS (0.6 points)  From: contains an underline and numbers/letters
MISSING_MIMEOLE    (0.5 points)  Message has X-MSMail-Priority, but no X-MimeOLE

The original message did not contain plain text, and may be unsafe to
open with some email clients; in particular, it may contain a virus,
or confirm that your address can receive spam.  If you wish to view
it, it may be safer to save it to a file and open it with an editor.

Notice the point scoring: that's what makes SpamAssassin so good at trapping spam. It doesn't just use all or nothing. For example, a message gets points for being from a domain known to harbor spammers, but that by itself isn't enough. That stops false spam tagging of innocent emails from people who unluckily live in the same IP block as known spam producers.
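The arithmetic behind that report can be reproduced in a few lines of shell and awk. This is only a sketch of the additive idea, not SpamAssassin's own code: sum_points and over_threshold are names invented here, and the parser simply looks for the "(N points)" text shown in the report above.

```shell
#!/bin/sh
# Add up every "(N points)" entry found on stdin and print the total.
# The summary line "(10.30 points, 5 required)" is not counted because
# its number is followed by "points," rather than "points)".
sum_points() {
    awk '
        match($0, /\([0-9.]+ points\)/) {
            # Skip the "(" and let awk read the leading number.
            total += substr($0, RSTART + 1, RLENGTH - 1) + 0
        }
        END { printf "%.2f\n", total }
    '
}

# Succeed when the total reaches the required score.
over_threshold() {
    awk -v total="$1" -v required="$2" \
        'BEGIN { exit !(total + 0 >= required + 0) }'
}

# Three of the rules from the report above: no single one reaches 5,
# but together they do.
total=$(printf '%s\n' \
    'ONLY_COST          (0.2 points)  BODY: Only $$$' \
    'HABEAS_HIL         (4.0 points)  RBL: Sender is on Habeas Infringer List' \
    'FORGED_MUA_OUTLOOK (3.3 points)  Forged mail pretending to be from MS Outlook' \
    | sum_points)
if over_threshold "$total" 5; then
    echo "$total points: spam"
else
    echo "$total points: not spam"
fi
```

Feeding the whole "Content analysis details" section through sum_points gives the same 10.30 total that the report shows.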

That's it. A fair amount of work, but with both SpamAssassin and Mail.app's spam filters tackling my mail, the job is a little easier. I found that SpamAssassin made an immediate difference. In spite of being trained for months, Mail.app still misses a lot of spam that SpamAssassin identifies. I hope that lasts.

See also POPFile

(A packaged mailserver that has SpamAssassin preconfigured is http://www.kerio.com/kms_home.html.)


2 comments





(Sorry: I don't think I was clear on what I asked earlier)

The Stupidfool article explains the advantage of using Popread over Fetchmail.

But can I configure Popread to do the same things as Fetchmail? I UNDERSTAND THAT IT DOWNLOADS POP MAIL. I mean, can I configure OPTIONS analogous to Fetchmail's --keep --erase etc.

For example, I'd like to try this solution by leaving email on my server until I'm sure I have everything working. That's not a problem in Fetchmail, but how do I accomplish that in Popread?

I can't find very much documentation for Popread, and my Perl skills are too weak to infer it from the sample script or modules.

--

There are no "options", but you can do whatever you like. Unfortunately, you do need some understanding of Perl. You can choose whether to delete or leave messages, but that is scripting YOU do, not options. It's not particularly hard; documentation for Mail::POP3Client is at various places on the net, including
http://search.cpan.org/~sdowd/Mail-POP3Client-2.14/POP3Client.pm

Specifically, the above script deletes messages from the server with the line:

$pop->Delete($num);

If that line were not there, the messages would be left.

--TonyLawrence



That was helpful. Still, doing more than that requires more Perl knowledge than I've got (never got to the chapter in Camel (or is it Llama?) about pointers and modules).

Anyway, my mail is being downloaded by popread and served by IMAP beautifully, but as far as I can tell, the spamassassin filter isn't working. There's nothing in my mails' headers to suggest that spamassassin touched them. Not sure if it's a problem in the user_templates or what. Darn it.






Fri Jul 8 00:33:23 2005: 761   anonymous


I'm getting this:

Can't locate auto/Mail/Audit/MailInternet/extract_mes.al in @INC (@INC contains: lib /sw/lib/perl5/5.8.6/darwin-thread-multi-2level /sw/lib/perl5/5.8.6 /sw/lib/perl5 /sw/lib/perl5/darwin /System/Library/Perl/5.8.6/darwin-thread-multi-2level /System/Library/Perl/5.8.6 /Library/Perl/5.8.6/darwin-thread-multi-2level /Library/Perl/5.8.6 /Library/Perl /Network/Library/Perl/5.8.6/darwin-thread-multi-2level /Network/Library/Perl/5.8.6 /Network/Library/Perl /System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level /System/Library/Perl/Extras/5.8.6 /Library/Perl/5.8.1/darwin-thread-multi-2level /Library/Perl/5.8.1 .) at /Library/Perl/5.8.6/Mail/SpamAssassin/PerMsgStatus.pm line 1261






Fri Jul 8 10:13:34 2005: 762   TonyLawrence

That tells you that extract_mes didn't get installed properly.

Sometimes things change. This article was written a while ago; spamassassin is surely very different now and may require more work than is noted here.




