Improving SpamAssassin
John Gruber was recently discussing his experience with SpamAssassin. The best part about SpamAssassin is that you can tweak it to significantly improve accuracy. Out of 50 or more spams per day, only one or two a week make it through my filters, and those weren't pure spam: they were unrequested mailings from companies I've done business with rather than wholly unsolicited crap, and they earned prompt additions to my blacklist.
There are three important things you can do to improve SpamAssassin's already excellent filtering:
Use sa-learn to give the Bayesian filter examples of the spam and ham (non-spam) messages you receive. I have a simple script I run every so often to update the filter after I've verified that I don't have any false negatives or positives:
    echo Learning from INBOX
    nice -20 sa-learn --ham --dir Library/Mail/HOME/INBOX.imapmbox/CachedMessages/
    nice -20 sa-learn --ham --dir Library/Mail/HOME/INBOX/Archive/*.imapmbox/CachedMessages/
    nice -20 sa-learn --ham --dir Library/Mail/WORK/INBOX.imapmbox/CachedMessages/

    echo Learning from Spam
    nice -20 sa-learn --spam --dir Library/Mail/HOME/INBOX/Spam.imapmbox/CachedMessages/
    nice -20 sa-learn --spam --dir Library/Mail/WORK/INBOX/Spam.imapmbox/CachedMessages/

The nice thing about sa-learn is that it keeps track of the messages it's seen before so there's no accuracy or significant performance hit from rescanning your mailboxes repeatedly.
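If the list of mailboxes grows, the same training run can be written as a loop. This is only a rough sketch, not the script I actually use: the `learn_from` function and the `SA_LEARN` override are names made up for illustration (the override exists so the loop can be dry-run with `echo`), and `nice -n 19` is the POSIX spelling for running at the lowest priority.

```shell
#!/bin/sh
# Sketch: feed a list of mail directories to sa-learn as ham or spam.
# SA_LEARN is a hypothetical override, e.g. SA_LEARN=echo for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn_from KIND DIR... where KIND is "ham" or "spam" (hypothetical helper)
learn_from() {
    kind="$1"
    shift
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Run at the lowest CPU priority so training stays in the background.
            nice -n 19 $SA_LEARN "--$kind" --dir "$dir"
        else
            # A missing mailbox is reported on stderr and skipped, not fatal.
            echo "Skipping missing directory: $dir" >&2
        fi
    done
}
```

Calling it would look something like `learn_from ham Library/Mail/HOME/INBOX.imapmbox/CachedMessages/` followed by `learn_from spam Library/Mail/HOME/INBOX/Spam.imapmbox/CachedMessages/`, run from your home directory as with the script above.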
Restrict messages to the languages and character sets you actually use by adding something like this to your .spamassassin/user_prefs:
    ok_locales en
    ok_languages en

Use more DNS blacklists in your .spamassassin/user_prefs:
    header RCVD_IN_RFC_PM eval:check_rbl('relay', 'postmaster.rfc-ignorant.org.')
    describe RCVD_IN_RFC_PM Received via a relay in postmaster.rfc-ignorant.org
    score RCVD_IN_RFC_PM 2.0

    header X_CHINESE_RELAY eval:check_rbl('relay', 'cn.rbl.cluecentral.net.')
    describe X_CHINESE_RELAY Received via a relay in China
    score X_CHINESE_RELAY 1.5

    header X_KOREAN_RELAY eval:check_rbl('relay', 'korea.services.net.')
    describe X_KOREAN_RELAY Received via a relay in Korea
    score X_KOREAN_RELAY 1.5

    header X_MONKEY_FORMMAIL eval:check_rbl('relay', 'formmail.relays.monkeys.com.')
    describe X_MONKEY_FORMMAIL Received via relay in monkeys.com's open formmail scripts list
    score X_MONKEY_FORMMAIL 1.5

    header X_MONKEY_PROXY eval:check_rbl('relay', 'proxies.relays.monkeys.com.')
    describe X_MONKEY_PROXY Received via relay in monkeys.com's open proxy list
    score X_MONKEY_PROXY 1.5

    header X_SPAMHAUS_RELAY eval:check_rbl('relay', 'spamhaus.relays.osirusoft.com.')
    describe X_SPAMHAUS_RELAY Received via relay in Spamhaus Blacklist
    score X_SPAMHAUS_RELAY 1.5

    # Not Just Another BlackList tests from http://njabl.org/use.html
    header IN_NJABL_ORG rbleval:check_rbl('njabl','dnsbl.njabl.org.')
    describe IN_NJABL_ORG Received via a relay in dnsbl.njabl.org
    tflags IN_NJABL_ORG net

    header NJABL_OPEN_RELAY rbleval:check_rbl_results_for('njabl', '127.0.0.2')
    describe NJABL_OPEN_RELAY DNSBL: sender is Confirmed Open Relay
    tflags NJABL_OPEN_RELAY net

    header NJABL_DUL rbleval:check_rbl_results_for('njabl', '127.0.0.3')
    describe NJABL_DUL DNSBL: sender IP address is in a dialup block
    tflags NJABL_DUL net

    header NJABL_SPAM_SRC rbleval:check_rbl_results_for('njabl', '127.0.0.4')
    describe NJABL_SPAM_SRC DNSBL: sender is Confirmed Spam Source
    tflags NJABL_SPAM_SRC net

    header NJABL_MULTI_STAGE rbleval:check_rbl_results_for('njabl', '127.0.0.5')
    describe NJABL_MULTI_STAGE DNSBL: sent through multi-stage open relay
    tflags NJABL_MULTI_STAGE net

    header NJABL_CGI rbleval:check_rbl_results_for('njabl', '127.0.0.8')
    describe NJABL_CGI DNSBL: sender is an open formmail
    tflags NJABL_CGI net

    header NJABL_PROXY rbleval:check_rbl_results_for('njabl', '127.0.0.9')
    describe NJABL_PROXY DNSBL: sender is an open proxy
    tflags NJABL_PROXY net

    score IN_NJABL_ORG 0.38
    score NJABL_DUL 0.62
    score NJABL_MULTI_STAGE 0.75
    score NJABL_PROXY 3.00
    score NJABL_OPEN_RELAY 3.00
    score NJABL_CGI 1.50
    score NJABL_SPAM_SRC 3.00
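For anyone curious what a blacklist rule like the ones above boils down to: a DNSBL is conventionally queried by reversing the octets of the relay's IPv4 address and appending the list's zone, then looking that name up in DNS. The sketch below only builds the query name; `dnsbl_query_name` is a hypothetical helper of mine, not part of SpamAssassin, and I'm assuming (not asserting) that `check_rbl` does something equivalent internally.

```shell
#!/bin/sh
# Sketch: build the DNS name a DNSBL client would look up for an IPv4 address.
# dnsbl_query_name IP ZONE (hypothetical helper, not a SpamAssassin command)
dnsbl_query_name() {
    ip="$1"
    zone="$2"
    # Split the dotted quad on "." into the positional parameters $1..$4.
    old_ifs="$IFS"
    IFS=.
    set -- $ip
    IFS="$old_ifs"
    # Reverse the octets and append the blacklist zone.
    echo "$4.$3.$2.$1.$zone"
}

dnsbl_query_name 192.0.2.1 dnsbl.njabl.org
```

You could pass the resulting name to a lookup tool such as `host`. A listed address typically comes back as a 127.0.0.x answer, which is what the `check_rbl_results_for` rules above match on.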

