Chip's Tips for Developers

Contains coding, but not narcotic.

GetLessMail gets more info

July 5th, 2010 2:43:35 pm pst by Sterling Camden

In one of my curious moods, I began to wonder how difficult it would be to figure out the location of an email sender based on the IP address shown in the “Received” header fields. It turns out to be more difficult than you may have thought, because:

  1. An email often contains multiple “Received” headers, one for each relay point. Each relay adds its header at the top, so the last (bottom-most) one records the hop closest to the original sender.
  2. However, the original transmission often happens within a local network, so the first hop or few (the bottom-most headers) may show IPs in the reserved private ranges.
  3. No free, global, authoritative database exists that contains the location of all IPs. At least, not that I’ve found. However, there are some free databases you can download that are updated from time to time.
  4. The owner of the IP address may not be located at the same place as the connection. In fact, it usually isn’t, but it may be close.
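To make points 1 and 2 concrete, here is a minimal Ruby sketch of picking out the sender-side IP. It is not taken from IPGeo.rb: the helper names received_ips, public_ip? and origin_ip are hypothetical, it only looks at IPv4 addresses written in square brackets (a common but not universal Received format), and it relies on IPAddr#private?, which needs Ruby 2.5 or later.

```ruby
require 'ipaddr'

# Hypothetical helpers, not IPGeo.rb's API: list the bracketed IPv4 addresses
# in an email's "Received" headers, oldest hop (bottom header) first.
def received_ips(raw_message)
  header_block = raw_message.split(/\r?\n\r?\n/, 2).first.to_s
  # Unfold continuation lines so each header sits on a single line.
  header_block = header_block.gsub(/\r?\n[ \t]+/, ' ')
  received = header_block.lines.grep(/\AReceived:/i)
  # Relays add headers at the top, so reverse to walk from the sender outward.
  received.reverse.flat_map { |h| h.scan(/\[(\d{1,3}(?:\.\d{1,3}){3})\]/).flatten }
end

# True for addresses outside the private, loopback, and link-local ranges.
def public_ip?(ip)
  addr = IPAddr.new(ip)
  !(addr.private? || addr.loopback? || addr.link_local?)
rescue ArgumentError # IPAddr raises ArgumentError subclasses for bad input
  false
end

# The first public address nearest the original sender, or nil.
def origin_ip(raw_message)
  received_ips(raw_message).find { |ip| public_ip?(ip) }
end
```

Walking the headers from the bottom up and skipping private ranges follows the reasoning above; it still cannot tell you whether that first public hop belongs to the sender or to the sender's mail provider.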

Despite these impediments, I have implemented IP Geolocation for Ruby, and created a method specialized for GetLessMail that uses it.

The two scripts IPGeo.rb and IPGeoMail.rb should be placed somewhere in your Ruby require path. The example database, which I downloaded from http://linuxbox.co.uk/ip-address-whois-database.php, should be placed in /usr/local/share/IPGeo (or you can modify the script to access it wherever you choose). The included dot.getlessmail shows how you could use it to add an “X-IP-Location” header that provides the IP Location data, if found.
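The X-IP-Location idea boils down to inserting one more header line just before the blank line that separates headers from body. The sketch below shows that in plain Ruby; add_header is a hypothetical name rather than getlessmail's own header method, and it assumes the message already contains that blank line and that an LF line ending is acceptable for the new header.

```ruby
# Hypothetical helper, not getlessmail's header API: insert "Name: value" as
# the last header line, just before the blank line that starts the body.
def add_header(raw_message, name, value)
  # partition splits at the first blank line: [headers, separator, body].
  head, separator, body = raw_message.partition(/\r?\n\r?\n/)
  "#{head}\n#{name}: #{value}#{separator}#{body}"
end
```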

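As I intimated, you could also use IPGeo.rb outside of the context of email. It would be trivial to write a script that accepts IP addresses and prints out the information. Like so:

require 'IPGeo'
# For each line read from the files named on the command line (or from stdin),
# extract the IP address and print its location data.
$<.each do |line|
  puts IPGeo.locate IPGeo.get_ip(line)
end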

Of course, this information is only as good as your database. The one I've included hasn't been updated since August 2009. You can probably find better databases out there, if you're willing to spend some money on them. I'm not.
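A common way to query such range databases (not necessarily how IPGeo.rb does it) is to convert each dotted quad to an integer and binary-search a table of address ranges sorted by their first address. In this sketch the GeoRow struct, build_table, lookup and the sample rows are all hypothetical; a real database file would need its own parser.

```ruby
require 'ipaddr'

# Each row covers an inclusive block of IPv4 addresses (hypothetical layout).
GeoRow = Struct.new(:low, :high, :location)

# Build a table from [first_ip, last_ip, location] string triples,
# sorted by the integer value of first_ip so it can be binary-searched.
def build_table(rows)
  rows.map { |a, b, loc| GeoRow.new(IPAddr.new(a).to_i, IPAddr.new(b).to_i, loc) }
      .sort_by(&:low)
end

# Return the location whose range contains ip, or nil if none does.
def lookup(table, ip)
  n = IPAddr.new(ip).to_i
  # Index of the first row starting after n; the candidate is the row before it.
  idx = table.bsearch_index { |row| row.low > n } || table.size
  return nil if idx.zero?
  row = table[idx - 1]
  n <= row.high ? row.location : nil
end
```

Because the table is sorted, each lookup needs only O(log n) comparisons, which matters once a database holds hundreds of thousands of ranges.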

You can get the updated tarball using the button below, or pull it from the BitBucket repository.

download

Posted in Ruby, Unix

Getlessmail gets more scripts

May 3rd, 2010 3:24:24 pm pst by Sterling Camden

After using my getlessmail filter for getmail for a few days, I began to notice a usability issue.  Whenever I identified a message as spam while viewing it in mutt, I had to do the following to add the sender to my filter:

  1. Highlight the sender’s address
  2. !vim ~/.getlessmail
  3. Go (jump to the last line and open a new line below it), type spam if from '\b, middle-click to paste the address, type \b', then ESC :x

Since mutt supports piping messages through a filter, and allows you to create macros that bind keys to lengthy key sequences, I decided to write some supporting scripts.  I’ve added these to getlessmail, which you can download below.

By the way, Chad Perrin also created a BitBucket repository for this project, so you can pull it from there if you prefer.

The first script, glmpipe.rb, reads stdin, parses it as an email, and looks for the sender’s address.  You can pass it a switch to indicate what you want to do with that address:

  • -k:  keep if from this address
  • -K:  keep if from this domain
  • -s:  spam if from this address
  • -S:  spam if from this domain
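The "find the sender’s address" step that glmpipe.rb performs can be approximated in a few lines. This is a hypothetical helper, not glmpipe.rb’s code: it reads only the From: header (skipping the mbox "From " envelope line), prefers an address inside angle brackets, and does not handle every RFC 5322 edge case, such as quoted local parts containing spaces.

```ruby
# Hypothetical helper, not glmpipe.rb's code: return the sender address from
# a raw message's From: header, or nil if none is found.
def sender_address(raw_message)
  header_block = raw_message.split(/\r?\n\r?\n/, 2).first.to_s
  header_block = header_block.gsub(/\r?\n[ \t]+/, ' ') # unfold wrapped headers
  # /\AFrom:/ requires the colon, so an mbox "From " envelope line is skipped.
  from = header_block.lines.find { |line| line.match?(/\AFrom:/i) }
  return nil unless from
  value = from.sub(/\AFrom:\s*/i, '').strip
  # Prefer "Name <user@host>"; otherwise take the first bare user@host token.
  value[/<([^>\s]+@[^>\s]+)>/, 1] || value[/[^\s<>"]+@[^\s<>"]+/]
end

# glmpipe-style usage: read the piped message from stdin.
# puts sender_address($stdin.read)
```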

glmpipe.rb in turn calls another script (which you can also invoke from a command line):

glmadd.rb address -switch

where address is the email address, and switch is one of the options listed above, or -a to ask you for the option (which also happens if you don’t pass a switch).
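Here is a hypothetical sketch of the kind of line a glmadd.rb-style helper could append, mapping each switch to a keep/spam rule in the same '\b...\b' style used in the manual steps earlier in this post. SWITCHES, rule_for and add_rule are illustrative names, the -a prompt is left out, and the real script’s output may differ.

```ruby
# Hypothetical switch table: letter => [rule keyword, what to match on].
SWITCHES = {
  'k' => ['keep', :address],
  'K' => ['keep', :domain],
  's' => ['spam', :address],
  'S' => ['spam', :domain]
}.freeze

# Build one rule line for an address and a switch such as "-k" or "S".
def rule_for(address, switch)
  action, scope = SWITCHES.fetch(switch.delete_prefix('-')) do
    raise ArgumentError, "unknown switch: #{switch}"
  end
  pattern =
    if scope == :domain
      '@' + Regexp.escape(address.split('@', 2).last)
    else
      # Single-quoted '\b' keeps the backslash for the regex word boundary.
      '\b' + Regexp.escape(address) + '\b'
    end
  "#{action} if from '#{pattern}'"
end

# Append the rule to the user's rules script (no locking, no duplicate check).
def add_rule(address, switch, path = File.expand_path('~/.getlessmail'))
  File.open(path, 'a') { |f| f.puts rule_for(address, switch) }
end
```

Escaping the address keeps its dots literal, so a rule for bob@example.com will not also match, say, bob@exampleXcom.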

So now, I’ve mapped ‘M’ (capital M, not the lowercase m, which is used for composing new mail) in mutt to launch the command |glmpipe.rb -, which leaves the command line open so I can finish the desired switch and press Enter.  I can’t use -a\n here, because stdin is already occupied by the piped message, so the script has no way to read my answer.  You can, of course, map individual keys to each function and include the newline if you so desire.  See the README for details.  I like to be asked, because that gives me a chance to back out if I hit capital M by mistake.

download

Posted in Ruby, Unix

Script email filtering with Ruby

April 22nd, 2010 5:32:49 pm pst by Sterling Camden

I’ve used all sorts of email filters since my very first internet email account in the early 90s, and none of them have been quite right.  I’d like to be able to block anything about Viagra, but not when a friend or family member uses the word.  Pure Bayesian filters always seem to block something from someone I know, while letting a few of the real spam messages through.  But whitelists and blacklists suffer from a “which rule comes first” problem.

I recently moved to FreeBSD as my primary workstation OS, and I’m now reading my email with mutt, after delivery by getmail.  Getmail has a pretty easy configuration for inline filters, so I decided to create a rules engine for filtering messages the way I want to.  I decided to write it in Ruby, which naturally led to the creation of a simple EDSL in Ruby for manipulating email content and approving or rejecting an incoming message.  Since it’s intended for use with getmail, I decided to call it “getlessmail”.

By connecting the getlessmail.rb script (which you can download below) into getmail as an external filter, you can write a user-specific script in Ruby to specify your filtering rules, like so:

keep if from 'mybestfriend@example.com'
spam if from '@example.com'
spam if subject 'viagra|cialis'
spam if body '(?m:\bnude\b.*\bpics\b)'

With this ordering, mybestfriend@example.com is automatically approved, while anybody else from that domain is considered spam.  Likewise, mybestfriend can use viagra or cialis in the subject line, or “nude” followed by “pics” in the body, and the message will still be approved, but not if it comes from anyone else.
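The first-match-wins behaviour can be sketched with Ruby’s catch/throw: each keep or spam throws a verdict out of the rules block, so later rules never run. This is a minimal illustration, not getlessmail.rb’s implementation; RuleSet, judge and the default of :keep when no rule fires are assumptions.

```ruby
# Hypothetical first-match-wins rules engine in the spirit of getlessmail.
class RuleSet
  def self.judge(raw_message, &rules)
    set = new(raw_message)
    catch(:verdict) do
      set.instance_eval(&rules) # rules call from/subject/body/keep/spam on set
      :keep                     # assumed default when no rule fires
    end
  end

  def initialize(raw_message)
    @headers, @body = raw_message.split(/\r?\n\r?\n/, 2)
    @headers ||= ''
    @body ||= ''
  end

  def from(fragment)
    header?('From', fragment)
  end

  def subject(fragment)
    header?('Subject', fragment)
  end

  def body(fragment)
    Regexp.new("(?:#{fragment})", Regexp::IGNORECASE).match?(@body)
  end

  # The first verdict thrown ends evaluation of the rules block.
  def keep
    throw :verdict, :keep
  end

  def spam
    throw :verdict, :spam
  end

  private

  def header?(name, fragment)
    Regexp.new("^#{name}:.*(?:#{fragment})", Regexp::IGNORECASE).match?(@headers)
  end
end
```

Evaluating the block with instance_eval is what lets a plain Ruby file read like the rule list above.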

As you can see, the patterns are regular expression fragments.  These get sewn into larger expressions that isolate their intended context.  By default, they’re treated as case-insensitive and not multi-line – but you can turn any options on or off using the contextual options grouping supported by Ruby regexen (as I have with “(?m:)” in the last example entry above).  Patterns are always automatically parenthesized to avoid issues with operator precedence, so don’t add enclosing parentheses of your own unless you need them for other reasons.
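To see why the automatic parentheses matter, here is a hypothetical header_pattern helper (not getlessmail.rb’s code) that sews a fragment into a header-scoped expression. In Ruby, ^ matches at the start of any line, and without the m option .* cannot cross a newline, so the match stays on the named header.

```ruby
# Hypothetical sketch: scope a fragment to one header line.
def header_pattern(name, fragment)
  # (?: ) groups the fragment so an alternation like "a|b" cannot escape
  # the "^Name:" context; IGNORECASE gives the case-insensitive default.
  Regexp.new("^#{Regexp.escape(name)}:.*(?:#{fragment})", Regexp::IGNORECASE)
end

headers = "From: cialis-fan@example.net\nSubject: Lunch on Friday?"
header_pattern('Subject', 'viagra|cialis').match?(headers)                 # => false
Regexp.new('^Subject:.*viagra|cialis', Regexp::IGNORECASE).match?(headers) # => true
```

Without the group, the bare cialis branch matches anywhere, including the From line.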

But there’s more.  I’ve included methods for moving messages to folders automatically, and for manipulating message headers.  The folder operations assume that your mailboxes are stored as mbox files, so don’t use them if you’re using maildir format instead.
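Appending to an mbox folder mostly comes down to writing a "From " separator line and escaping body lines that could be mistaken for one. The sketch below uses mboxrd-style quoting; mbox_entry and mbox_append are hypothetical names, not getlessmail’s folder methods, and the sketch does no file locking, which a real delivery agent should.

```ruby
# Hypothetical sketch: format a message as one mbox entry.
def mbox_entry(raw_message, sender, time = Time.now)
  # mboxrd quoting: prefix ">" to any line matching /^>*From /.
  text = raw_message.gsub(/^(>*From )/, '>\1')
  text += "\n" unless text.end_with?("\n")
  # asctime-style timestamp in UTC; getutc leaves the caller's Time unchanged.
  stamp = time.getutc.strftime('%a %b %e %H:%M:%S %Y')
  "From #{sender} #{stamp}\n#{text}\n"
end

# Append the entry to an mbox file (no locking in this sketch).
def mbox_append(path, raw_message, sender)
  File.open(path, 'a') { |f| f.write(mbox_entry(raw_message, sender)) }
end
```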

But that’s not all.  Since your rules script is interpreted as Ruby code, you can go crazy.  Log events, change the contents of the message, translate attachments, write your own Bayesian filter, or anything else you can do with Ruby.

I’ll probably extend the core functions at some point to deal more easily with multi-part messages.  My number one beef is the ms-tnef MIME format, which merely wraps attachments in a Microsoft-specific container.  There’s a tnef utility for unpacking that, so I should be able to strip out attachments in that format, pipe them to tnef, and then sew the resulting files back together in regular MIME multi-part.

See the README file for full documentation.  The download below is in tar.bz2 format, since it’s really only useful on Unix or Linux, where most tar implementations should be able to read it as is.

download

Posted in Ruby, Unix, Wildly popular
