Chip's Tips for Developers

Contains coding, but not narcotic.

Script email filtering with Ruby

April 22nd, 2010 5:32:49 pm pst by Sterling Camden

I’ve used all sorts of email filters since my very first internet email account in the early 90s – and none of them have been quite right.  I’d like to be able to block anything about Viagra, but not when a friend or family member uses the word.  Pure Bayesian filters always seem to block something from someone I know, while letting a few of the real spam messages through.  But whitelists and blacklists suffer from a “which rule comes first” problem.

I recently moved to FreeBSD as my primary workstation OS, and I’m now reading my email with mutt, after delivery by getmail.  Getmail has a pretty easy configuration for inline filters, so I decided to create a rules engine for filtering messages the way I want.  I wrote it in Ruby, which naturally led to a simple embedded domain-specific language (EDSL) for inspecting and manipulating email content and for approving or rejecting an incoming message.  Since it’s intended for use with getmail, I call it “getlessmail”.
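As a rough picture of what an inline filter has to do, here is a minimal sketch of a filter's core loop. It is not taken from getlessmail.rb. It assumes getmail's external-filter conventions as I understand them from its documentation: the message arrives on stdin, the (possibly modified) message goes back on stdout, exit code 0 keeps the message, and 99 or 100 drops it. The names `run_filter` and `wanted?` are made up for illustration.

```ruby
# Assumed getmail conventions (check the getmail documentation for your version):
# exit 0 keeps the message, exit 99 or 100 drops it.
EXIT_KEEP = 0
EXIT_DROP = 100

# Hypothetical stand-in for a real set of rules.
def wanted?(raw_message)
  raw_message !~ /^Subject:.*\bviagra\b/i
end

# Read one message, decide, and report the decision as an exit code.
def run_filter(input, output)
  raw = input.read
  return EXIT_DROP unless wanted?(raw)

  output.write(raw) # pass the message through unchanged
  EXIT_KEEP
end
```

A real script would finish with something like `exit run_filter($stdin, $stdout)`.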

By connecting the getlessmail.rb script (which you can download below) into getmail as an external filter, you can write a user-specific script in Ruby to specify your filtering rules, like so:

keep if from "mybestfriend@example.com"
spam if from "@example.com"
spam if subject "viagra|cialis"
# single quotes, so Ruby leaves \b as a regex word boundary instead of a backspace
spam if body '(?m:\bnude\b.*\bpics\b)'

With this ordering, mybestfriend@example.com is automatically approved, while anybody else from that domain is considered spam.  Likewise, mybestfriend can use viagra or cialis in the subject line, or “nude” followed by “pics” in the body, and it will still be approved – but not if from anyone else.
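To make the first-match-wins ordering concrete, here is a minimal sketch of how rules like these could be evaluated. It is not the code from getlessmail.rb: the class name `RuleSketch`, the `run` method, and the default of keeping unmatched mail are all assumptions made for illustration.

```ruby
# Hypothetical sketch of a first-match-wins rule evaluator.
# None of these names come from getlessmail.rb itself.
class RuleSketch
  attr_reader :verdict

  def initialize(raw_message)
    # Headers end at the first blank line; the rest is the body.
    @headers, @body = raw_message.split(/\r?\n\r?\n/, 2)
    @headers ||= ""
    @body ||= ""
    @verdict = :keep # assumption: unmatched mail is kept
  end

  # Evaluate a rules script; the first keep/spam that fires ends evaluation.
  def run(rules_source)
    @verdict = catch(:verdict) do
      instance_eval(rules_source)
      :keep
    end
    self
  end

  # Verdicts unwind straight out of the rules script.
  def keep
    throw :verdict, :keep
  end

  def spam
    throw :verdict, :spam
  end

  # Tests return true or false, so they read naturally after "if".
  def from(pattern)
    header_match?("From", pattern)
  end

  def subject(pattern)
    header_match?("Subject", pattern)
  end

  def body(pattern)
    !!(@body =~ /(?:#{pattern})/i)
  end

  private

  def header_match?(name, pattern)
    !!(@headers =~ /^#{name}:.*(?:#{pattern})/i)
  end
end

rules = <<~'RULES'
  keep if from "mybestfriend@example.com"
  spam if from "@example.com"
RULES
message = "From: Someone <someone@example.com>\nSubject: hello\n\nHi there\n"
puts RuleSketch.new(message).run(rules).verdict # prints "spam"
```

Because `keep` and `spam` use `throw`, no later rule runs once a verdict is reached, which is what lets the rule for mybestfriend override everything below it.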

As you can see, the patterns are regular expression fragments.  These get sewn into larger expressions that isolate their intended context.  By default, they’re treated as case-insensitive and not multi-line – but you can turn any options on or off using the contextual options grouping supported by Ruby regexen (as I have with “(?m:)” in the last example entry above).  Patterns are always automatically parenthesized to avoid issues with operator precedence, so don’t add enclosing parentheses of your own unless you need them for other reasons.
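A small sketch of why the automatic grouping matters, and of how an inline option group overrides the defaults. The helper names are hypothetical, and getlessmail.rb may well build its expressions differently. This only illustrates the idea with Ruby's standard `Regexp` class.

```ruby
# Hypothetical helpers; getlessmail.rb may build its expressions differently.

# Wrap the fragment in a non-capturing group and anchor it to one header line.
def subject_regex(fragment)
  Regexp.new("^Subject:.*(?:#{fragment})", Regexp::IGNORECASE)
end

# The same thing without the group, to show what goes wrong.
def unwrapped_subject_regex(fragment)
  Regexp.new("^Subject:.*#{fragment}", Regexp::IGNORECASE)
end

message = "From: cialis-fan@example.org\nSubject: lunch on Friday\n"

# Grouped: the alternation stays inside the Subject line, so no match.
p subject_regex("viagra|cialis").match?(message)           # false
# Ungrouped: "|" splits the whole expression, so "cialis" matches anywhere.
p unwrapped_subject_regex("viagra|cialis").match?(message) # true

# Inline option groups override the defaults for just that fragment.
p subject_regex("(?-i:URGENT)").match?("Subject: urgent request\n") # false
p subject_regex("(?-i:URGENT)").match?("Subject: URGENT request\n") # true
```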

But there’s more.  I’ve included methods for moving messages to folders automatically, and for manipulating message headers.  The folder operations assume that your mailboxes are stored as mbox files, so don’t use them if you’re using maildir format instead.
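For a sense of what filing a message into an mbox folder involves, here is a minimal sketch. The helper name `append_to_mbox` and its parameters are hypothetical, not getlessmail's API. It assumes the simple mbox convention of quoting body lines that begin with "From " by prefixing ">", and it assumes `flock` is an acceptable locking method on your system.

```ruby
# Hypothetical helper, not taken from getlessmail.rb.
def append_to_mbox(path, raw_message, sender: "MAILER-DAEMON", time: Time.now)
  # Quote lines that would otherwise look like a message separator.
  quoted = raw_message.gsub(/^From /, ">From ")
  quoted += "\n" unless quoted.end_with?("\n")

  File.open(path, "a") do |mbox|
    # Assumption: flock is enough here. Other mail programs may expect
    # dotlock files or fcntl locks instead.
    mbox.flock(File::LOCK_EX)
    stamp = time.getutc.strftime("%a %b %e %H:%M:%S %Y")
    # Separator line, then the message, then a blank line before the next one.
    mbox.write("From #{sender} #{stamp}\n" + quoted + "\n")
  end
end
```

A maildir folder stores one message per file instead, so none of the separator or quoting logic applies there.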

But that’s not all.  Since your rules script is interpreted as Ruby code, you can go crazy.  Log events, change the contents of the message, translate attachments, write your own Bayesian filter, or anything else you can do with Ruby.

I’ll probably extend the core functions at some point to deal more easily with multi-part messages.  My number one beef is the ms-tnef MIME format, which merely wraps attachments in a Microsoft-specific container.  There’s a tnef utility for unpacking that, so I should be able to strip out attachments in that format, pipe them to tnef, and then sew the resulting files back together in regular MIME multi-part.

See the README file for full documentation.  The download below is in tar.bz2 format, since it’s really only useful on Unix or Linux, where most tar implementations should be able to read it as is.

download

Posted in Ruby, Unix, Wildly popular | 4 Comments

Subscribe to comment edits plugin for WordPress

June 23rd, 2008 4:20:36 pm pst by Sterling Camden

For blog surfers who still rely on their email Inbox, the Subscribe to Comments plugin (originally by scriptygoddess, updated by Mark Jaquith) is truly a goddesssend.  Just by ticking a box, commenters can be notified by email whenever additional comments are added to a post.  But, it doesn’t notify them when comments are edited.

OK, you’re thinking “Why do I care to know when user ‘teratroll’ corrects his/her misspelling of ‘turd-for-brains’?”  Ah, but that’s not all the “edit comment” functionality can be used for.

The ever so conversational Teeni uses the “edit comment” facility to respond to comments.  She adds her response in bold right underneath the original.  While I prefer to use Brian’s Threaded Comments and take my place among all the other respondents, Teeni’s technique does feel more intimate.  The only problem with it, from a commenter perspective, is that the “Notify me of followup comments via e-mail” checkbox would notify me of all the comments except those from Teeni herself.  That is, until now.

I created the “Subscribe to Comment Edits” plugin (which you can download below) to piggyback on the “Subscribe to Comments” plugin.  It adds notification to all subscribers whenever a comment on a subscribed post is edited.  It does nothing if “Subscribe to Comments” is not activated.

Even though PHP is mostly a baling-wire and bubblegum excuse for a programming language, it does support some dynamic programming features – a couple of which can be seen in the few lines of code that comprise this plugin.  For instance, function_exists is used to determine whether the “Subscribe to Comments” plugin has been activated.  And create_function does just what its name implies — it creates an anonymous function to pass as a parameter.  That’s nice, because the anonymous function doesn’t introduce any possible name collisions — but on the other hand, if the original function had been named I wouldn’t have to duplicate it here.

I would have also made the ‘init’ function anonymous, but the escaping of quotes gets ridiculous.  That’s one case where less dynamic, more functional would be preferable.  In languages like JavaScript and Ruby (and of course Lisp), you can write functions in-line wherever a datum can go, without any quoting.  But in PHP, functions aren’t really first-class objects: they’re always passed by name, and create_function returns a generated name.
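For contrast, here is a minimal Ruby sketch of the in-line style. The `HookRegistry` class is a made-up stand-in for a plugin hook system, not WordPress's API or anything from the plugin. It only shows a callback written as ordinary code and passed around as a value.

```ruby
# Hypothetical mini hook registry, for illustration only.
class HookRegistry
  def initialize
    @hooks = Hash.new { |hash, name| hash[name] = [] }
  end

  # Accept either a block or any object that responds to call.
  def add_action(name, callable = nil, &block)
    @hooks[name] << (callable || block)
  end

  # Run every callback registered under the name and collect the results.
  def do_action(name, *args)
    @hooks[name].map { |hook| hook.call(*args) }
  end
end

registry = HookRegistry.new

# The callback is written in-line as ordinary code, with no string quoting.
registry.add_action(:edit_comment) { |id| "notify subscribers of comment #{id}" }

# A lambda is a first-class value: name it, store it, pass it, reuse it.
shout = ->(id) { "COMMENT #{id} EDITED" }
registry.add_action(:edit_comment, shout)
```

Calling `registry.do_action(:edit_comment, 7)` runs both callbacks in the order they were registered.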

Also note that the fact that $sg_subscribe is a global object (yuck) is exactly what makes it possible for me to piggyback onto its functionality without modifying the original plugin (yay).

Posted in PHP, Web, WordPress | 4 Comments

Better Tag Cloud