May 25

Improving SpamAssassin

John Gruber was recently discussing his experience with SpamAssassin. The best part about SpamAssassin is that you can tweak it to significantly improve its accuracy. Out of the 50+ spams I receive per day, perhaps one or two a week make it through my filters, and even those aren't pure spam: they are unrequested mailings from companies I've done business with rather than wholly unsolicited crap, and each one earns a prompt addition to my blacklist.

There are three important things you can do to improve SpamAssassin's already excellent filtering:

  1. Use sa-learn to give the Bayesian filter examples of the spam and ham (non-spam) messages you receive. I have a simple script I run every so often to update the filter after I've verified that I don't have any false negatives or positives:

    echo Learning from INBOX
    nice -20 sa-learn --ham --dir Library/Mail/HOME/INBOX.imapmbox/CachedMessages/
    nice -20 sa-learn --ham --dir Library/Mail/HOME/INBOX/Archive/*.imapmbox/CachedMessages/
    nice -20 sa-learn --ham --dir Library/Mail/WORK/INBOX.imapmbox/CachedMessages/
    echo Learning from Spam
    nice -20 sa-learn --spam --dir Library/Mail/HOME/INBOX/Spam.imapmbox/CachedMessages/
    nice -20 sa-learn --spam --dir Library/Mail/WORK/INBOX/Spam.imapmbox/CachedMessages/

    The nice thing about sa-learn is that it keeps track of the messages it has already seen, so rescanning your mailboxes repeatedly costs nothing in accuracy and very little in performance.

  2. Restrict messages to the languages and character sets you actually use by adding something like this to your .spamassassin/user_prefs:

    ok_locales en
    ok_languages en

  3. Use more DNS blacklists in your .spamassassin/user_prefs:

    header RCVD_IN_RFC_PM eval:check_rbl('relay', 'postmaster.rfc-ignorant.org.') 
    describe RCVD_IN_RFC_PM Received via a relay in postmaster.rfc-ignorant.org 
    score RCVD_IN_RFC_PM 2.0 

    header X_CHINESE_RELAY eval:check_rbl('relay', 'cn.rbl.cluecentral.net.') 
    describe X_CHINESE_RELAY Received via a relay in China
    score X_CHINESE_RELAY 1.5

    header X_KOREAN_RELAY eval:check_rbl('relay', 'korea.services.net.') 
    describe X_KOREAN_RELAY Received via a relay in Korea
    score X_KOREAN_RELAY 1.5 

    header X_MONKEY_FORMMAIL eval:check_rbl('relay', 'formmail.relays.monkeys.com.')
    describe X_MONKEY_FORMMAIL Received via relay in monkeys.com's open formmail scripts list
    score X_MONKEY_FORMMAIL 1.5

    header X_MONKEY_PROXY eval:check_rbl('relay', 'proxies.relays.monkeys.com.')
    describe X_MONKEY_PROXY Received via relay in monkeys.com's open proxy list
    score X_MONKEY_PROXY 1.5

    header X_OSIRUSOFT_SPAMHAUS eval:check_rbl('relay','spamhaus.relays.osirusoft.com.')
    describe X_OSIRUSOFT_SPAMHAUS Received via relay in Spamhaus Blacklist
    score X_OSIRUSOFT_SPAMHAUS 1.5

    # Not Just Another BlackList tests from http://njabl.org/use.html
    header IN_NJABL_ORG rbleval:check_rbl('njabl','dnsbl.njabl.org.')
    describe IN_NJABL_ORG Received via a relay in dnsbl.njabl.org
    tflags IN_NJABL_ORG net

    header NJABL_OPEN_RELAY rbleval:check_rbl_results_for('njabl', '127.0.0.2')
    describe NJABL_OPEN_RELAY DNSBL: sender is Confirmed Open Relay
    tflags NJABL_OPEN_RELAY net

    header NJABL_DUL rbleval:check_rbl_results_for('njabl', '127.0.0.3')
    describe NJABL_DUL DNSBL: sender IP address is in a dialup block
    tflags NJABL_DUL net

    header NJABL_SPAM_SRC rbleval:check_rbl_results_for('njabl', '127.0.0.4')
    describe NJABL_SPAM_SRC DNSBL: sender is Confirmed Spam Source
    tflags NJABL_SPAM_SRC net

    header NJABL_MULTI_STAGE rbleval:check_rbl_results_for('njabl', '127.0.0.5')
    describe NJABL_MULTI_STAGE DNSBL: sent through multi-stage open relay
    tflags NJABL_MULTI_STAGE net

    header NJABL_CGI rbleval:check_rbl_results_for('njabl', '127.0.0.8')
    describe NJABL_CGI DNSBL: sender is an open formmail
    tflags NJABL_CGI net

    header NJABL_PROXY rbleval:check_rbl_results_for('njabl', '127.0.0.9')
    describe NJABL_PROXY DNSBL: sender is an open proxy
    tflags NJABL_PROXY net

    score IN_NJABL_ORG 0.38
    score NJABL_DUL 0.62
    score NJABL_MULTI_STAGE 0.75
    score NJABL_PROXY 3.00
    score NJABL_OPEN_RELAY 3.00
    score NJABL_CGI 1.50
    score NJABL_SPAM_SRC 3.00
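
For anyone wondering what those rules actually look up: a DNS blacklist is conventionally queried by reversing the octets of the relay's IPv4 address, appending the list's zone, and asking for an A record. The address that comes back (127.0.0.2, 127.0.0.3 and so on) is what the check_rbl_results_for lines above match against. Here is a rough sketch of just the naming step and the NJABL code table from the rules above. The function names dnsbl_query_name and njabl_rule_for are made up for illustration, and nothing here performs a DNS lookup or reproduces what SpamAssassin does internally.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, no network access.

# dnsbl_query_name IP ZONE
# Print the DNS name a DNSBL check would look up for an IPv4 address:
# the octets reversed, followed by the list's zone.
dnsbl_query_name() {
    ip=$1
    zone=${2%.}          # drop a trailing dot such as in 'dnsbl.njabl.org.'
    old_ifs=$IFS
    IFS=.
    set -- $ip           # split the address into its octets
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1   # not a dotted quad
    echo "$4.$3.$2.$1.$zone"
}

# njabl_rule_for RETURN_CODE
# Map an NJABL return address to the rule name used in the user_prefs above.
njabl_rule_for() {
    case $1 in
        127.0.0.2) echo NJABL_OPEN_RELAY ;;
        127.0.0.3) echo NJABL_DUL ;;
        127.0.0.4) echo NJABL_SPAM_SRC ;;
        127.0.0.5) echo NJABL_MULTI_STAGE ;;
        127.0.0.8) echo NJABL_CGI ;;
        127.0.0.9) echo NJABL_PROXY ;;
        *) return 1 ;;   # code not covered by the rules above
    esac
}
```

For example, dnsbl_query_name 192.0.2.1 dnsbl.njabl.org. prints 1.2.0.192.dnsbl.njabl.org, and njabl_rule_for 127.0.0.9 prints NJABL_PROXY.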

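The sa-learn script in step 1 repeats the same command once per mailbox. One possible way to tidy it up is a small shell function that takes the message type and a list of directories, and skips any directory that doesn't exist. This is only a sketch: the function name learn and the SA_LEARN variable are my own inventions rather than anything SpamAssassin provides, and it assumes your sa-learn accepts --dir the way the script above uses it.

```shell
#!/bin/sh
# Sketch only: 'learn' and SA_LEARN are hypothetical names, not part of SpamAssassin.

# Command used for training; override it (e.g. SA_LEARN=echo) for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn KIND DIR...
# KIND is "ham" or "spam"; each existing DIR is passed to sa-learn at low priority.
learn() {
    kind=$1
    shift
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "Learning $kind from $dir"
            # $SA_LEARN is left unquoted on purpose so it may carry extra arguments
            nice -n 20 $SA_LEARN "--$kind" --dir "$dir"
        else
            echo "Skipping missing directory: $dir" >&2
        fi
    done
}
```

With that in place, the script from step 1 shrinks to two calls:

    learn ham Library/Mail/HOME/INBOX.imapmbox/CachedMessages/ Library/Mail/HOME/INBOX/Archive/*.imapmbox/CachedMessages/ Library/Mail/WORK/INBOX.imapmbox/CachedMessages/
    learn spam Library/Mail/HOME/INBOX/Spam.imapmbox/CachedMessages/ Library/Mail/WORK/INBOX/Spam.imapmbox/CachedMessages/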