Script email filtering with Ruby
Sterling Camden
I’ve used all sorts of email filters since my very first internet email account in the early 90s – and none of them have been quite right. I’d like to be able to block anything about Viagra, but not when a friend or family member uses the word. Pure Bayesian filters always seem to block something from someone I know, while letting a few of the real spam messages through. But whitelists and blacklists suffer from a “which rule comes first” problem.
I recently moved to FreeBSD as my primary workstation OS, and I now read my email with mutt after getmail delivers it. Getmail makes inline filters pretty easy to configure, so I built a rules engine for filtering messages the way I want. I wrote it in Ruby, which naturally led to a simple embedded domain-specific language (EDSL) for examining email content and approving or rejecting an incoming message. Since it’s intended for use with getmail, I call it “getlessmail”.
By connecting the getlessmail.rb script (which you can download below) into getmail as an external filter, you can write a user-specific script in Ruby to specify your filtering rules, like so:
keep if from "mybestfriend@example.com"
spam if from "@example.com"
spam if subject "viagra|cialis"
spam if body '(?m:\bnude\b.*\bpics\b)'
With this ordering, mybestfriend@example.com is automatically approved, while anybody else from that domain is considered spam, because the first rule that matches decides the message’s fate. Likewise, mybestfriend can use viagra or cialis in the subject line, or “nude” followed by “pics” in the body, and the message will still be approved, but the same content from anyone else is rejected.
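To make the ordering concrete, here is a minimal sketch of how a first-match-wins evaluator for rules like these could work. It is not the getlessmail source: the class name RuleSketch, the evaluate method, the verdict_for helper, and the default of keeping unmatched messages are all assumptions made for illustration.

```ruby
# Hypothetical sketch of a first-match-wins rules evaluator.
# NOT the getlessmail implementation; all names here are assumptions.
class RuleSketch
  def initialize(raw_message)
    # Split the raw message into headers and body at the first blank line.
    head, body = raw_message.split(/\r?\n\r?\n/, 2)
    @head = head.to_s
    @body = body.to_s
  end

  # Predicates: each returns true when the pattern matches in its context.
  def from(pattern)
    header_match?("From", pattern)
  end

  def subject(pattern)
    header_match?("Subject", pattern)
  end

  def body(pattern)
    Regexp.new("(#{pattern})", Regexp::IGNORECASE).match?(@body)
  end

  # Verdicts: throwing stops evaluation, so later rules never run.
  def keep
    throw :verdict, :keep
  end

  def spam
    throw :verdict, :spam
  end

  # Run a rules script against this message.
  # Assumption: a message that matches no rule is kept.
  def evaluate(rules)
    catch(:verdict) do
      instance_eval(rules)
      :keep
    end
  end

  private

  def header_match?(name, pattern)
    Regexp.new("^#{name}:.*(#{pattern})", Regexp::IGNORECASE).match?(@head)
  end
end

# The example rules, held in a non-interpolating heredoc so that
# backslashes reach the rules script untouched.
RULES = <<~'RULES'
  keep if from "mybestfriend@example.com"
  spam if from "@example.com"
  spam if subject "viagra|cialis"
  spam if body '(?m:\bnude\b.*\bpics\b)'
RULES

def verdict_for(raw_message)
  RuleSketch.new(raw_message).evaluate(RULES)
end
```

Because keep and spam leave the script as soon as they run, moving the first rule below the second would send mybestfriend’s mail to spam along with everyone else at that domain. Note also that in Ruby source a double-quoted "\b" is a backspace character, so a pattern that needs a word boundary is safest in single quotes.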
As you can see, the patterns are regular expression fragments. These get sewn into larger expressions that isolate their intended context. By default, they’re treated as case-insensitive and not multi-line – but you can turn any options on or off using the contextual options grouping supported by Ruby regexen (as I have with “(?m:)” in the last example entry above). Patterns are always automatically parenthesized to avoid issues with operator precedence, so don’t add enclosing parentheses of your own unless you need them for other reasons.
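The following sketch shows why the automatic parentheses matter and how the inline option groups behave. The helper names header_regex and unwrapped_regex are hypothetical, and the way the fragment is embedded is an assumption about the general technique, not a copy of what getlessmail does internally.

```ruby
# Hypothetical helper: embed a user fragment in a header-specific regex,
# parenthesized and case-insensitive by default.
def header_regex(header, fragment)
  Regexp.new("^#{Regexp.escape(header)}:.*(#{fragment})", Regexp::IGNORECASE)
end

# The same thing WITHOUT the parentheses, only to show what goes wrong.
def unwrapped_regex(header, fragment)
  Regexp.new("^#{Regexp.escape(header)}:.*#{fragment}", Regexp::IGNORECASE)
end

headers = "From: pharmacy@example.net\nSubject: Quarterly report\nX-Note: cialis\n"

# Parenthesized: the alternation stays inside the Subject line context.
wrapped_hit = header_regex("Subject", "viagra|cialis").match?(headers)

# Unparenthesized: "|" splits the whole expression in two, so a bare
# "cialis" anywhere in the headers counts as a match.
unwrapped_hit = unwrapped_regex("Subject", "viagra|cialis").match?(headers)

# Inline option groups override the defaults for just that fragment.
case_sensitive = header_regex("Subject", "(?-i:URGENT)")
dot_crosses_lines = Regexp.new('((?m:\bnude\b.*\bpics\b))', Regexp::IGNORECASE)
dot_stays_on_line = Regexp.new('(\bnude\b.*\bpics\b)', Regexp::IGNORECASE)
```

Here wrapped_hit is false and unwrapped_hit is true, which is the precedence problem the parentheses prevent. The (?-i:...) group turns case-insensitivity off for its contents, and (?m:...) lets the dot match newlines, so “nude” and “pics” can sit on different lines of the body.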
But there’s more. I’ve included methods for moving messages to folders automatically, and for manipulating message headers. The folder operations assume that your mailboxes are stored as mbox files, so don’t use them if you’re using maildir format instead.
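The mbox restriction makes sense once you see what filing a message into an mbox folder involves: the folder is a single file, and each message is appended after a “From ” separator line. The sketch below illustrates that general technique only. The append_to_mbox helper, its parameters, and the default sender are hypothetical and are not taken from getlessmail.

```ruby
# Hypothetical helper: append one raw message to an mbox file.
# Illustrates the mbox layout only; not getlessmail's folder code.
def append_to_mbox(path, raw_message, sender: "MAILER-DAEMON", time: Time.now)
  # Quote lines that start with "From " so that readers do not mistake
  # them for the start of the next message.
  escaped = raw_message.gsub(/^From /, ">From ")

  File.open(path, "a") do |f|
    # Advisory lock; assumption: your mail reader may use a different
    # locking scheme, in which case this does not protect against it.
    f.flock(File::LOCK_EX)
    # Separator line: "From ", an envelope sender, and an asctime-style date.
    f.write("From #{sender} #{time.utc.strftime('%a %b %e %H:%M:%S %Y')}\n")
    f.write(escaped)
    f.write("\n") unless escaped.end_with?("\n")
    # A blank line ends the message.
    f.write("\n")
  end
end
```

A maildir folder, by contrast, is a directory holding one file per message, so appending to a single file like this would not produce anything a maildir reader could use.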
But that’s not all. Since your rules script is interpreted as Ruby code, you can go crazy. Log events, change the contents of the message, translate attachments, write your own Bayesian filter, or anything else you can do with Ruby.
I’ll probably extend the core functions at some point to deal more easily with multi-part messages. My number one beef is the ms-tnef MIME format, which merely wraps attachments in a Microsoft-specific container. There’s a tnef utility for unpacking that, so I should be able to strip out attachments in that format, pipe them to tnef, and then sew the resulting files back together in regular MIME multi-part.
See the README file for full documentation. The download below is in tar.bz2 format, since it’s really only useful on Unix or Linux, where most tar implementations should be able to read it as is.