Best way to parse email recipient lists?

Hey all,

So… I’m trying to parse email recipient lists that users type by hand
into the “to”, “cc” and “bcc” fields of a mail app.

These can obviously come in a wild variety of formats, and I’d like to
support as many as possible.

The other gotcha is that I’d like to keep as much of the name metadata
as possible.

I was under the impression that TMail’s parser strips the name portion
of the “to”, “cc” and “bcc” fields, leaving just an array of email
addresses (otherwise we could simply use TMail). Please let me know
if this is incorrect or if there’s a workaround.

Here are a few example scenarios (from relatively easy to a little
harder):
[email protected]
[email protected]
[email protected]
“Bob S.” [email protected]
Bob S. [email protected]
“Jones, Craig” [email protected]
“Summer Thomas” [email protected]; “Al Franken” [email protected]
“Clinton, Bill” [email protected]; “Obama, Barack” [email protected]; “Jenny McCarthy” [email protected]
Bob [email protected], [email protected], James Blunt [email protected]

etc…

Any ideas?

I’ve been writing regexes like crazy, but my regex-fu isn’t quite
what it used to be. Are there any shortcuts, or do I need one big regex
or many specific ones to match the various scenarios?

We’re currently using this RegEx to detect when we have a single
properly formatted address (w/o a name attached):
http://tfletcher.com/lib/rfc822.rb
…but that’s only one small portion of the problem.
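One way to avoid a single giant regex is to split the list into entries first and then parse each entry with a small pattern. The sketch below assumes recipients are separated by “,” or “;”, that display names use straight ASCII double quotes (typographic quotes are not handled), and that the address is the last token of each entry. `split_recipients`, `parse_recipient` and the example.com addresses are hypothetical, and this is nowhere near a full RFC 822 parser.

```ruby
# Sketch only: assumes "," or ";" separators, straight ASCII double quotes
# around display names, and the address as the last token of each entry.

# One entry = any run of quoted strings and non-separator characters, so a
# comma inside "Jones, Craig" does not split the entry.
ENTRY = /(?:"[^"]*"|[^,;"])+/

# The address is a whitespace-free token containing one "@", optionally
# wrapped in angle brackets, at the very end of the entry.
ADDRESS = /<?([^\s<>"@]+@[^\s<>"@]+)>?\s*\z/

# Hypothetical helper: break the raw field into trimmed, non-empty entries.
def split_recipients(text)
  text.scan(ENTRY).map(&:strip).reject(&:empty?)
end

# Hypothetical helper: returns { name:, address: } or nil if no address.
def parse_recipient(entry)
  match = ADDRESS.match(entry)
  return nil unless match

  # Everything before the address is treated as the display name,
  # with one pair of surrounding double quotes removed.
  name = entry[0...match.begin(0)].strip
  name = name.delete_prefix('"').delete_suffix('"').strip
  { name: name.empty? ? nil : name, address: match[1] }
end

input = '"Jones, Craig" <craig@example.com>; Bob S. bob@example.com, <amy@example.com>'
split_recipients(input).each { |entry| p parse_recipient(entry) }
```

Splitting only on separators outside quotes is what keeps “Jones, Craig” in one piece. An unquoted `Clinton, Bill <bill@example.com>` would still be split in two, because without quotes the comma is ambiguous.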

- Shanti

Shanti,

Try:

/(\W?([\w\s]+)\W+)?(\w[\w+\-.]+@[\w\-.]+)\W?/i
(allows “+” signs in the mailbox, as in [email protected]; these are
valid per RFC 822)

/(\W?([\w\s]+)\W+)?(\w[\w\-.]+@[\w\-.]+)\W?/i
(without “+” signs)

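With String#scan, these break the recipients down into arrays of
matches: group 2 holds the display name and group 3 the full address,
which you can split on “@”. Together that gives you:
display name
mailbox
domain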
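A minimal sketch of applying the first pattern with `String#scan` and splitting group 3 on “@”. The pattern is written here with “-” escaped inside the character classes, because an unescaped `+-.` is read as the range “+” to “.”, which also matches “,”. `recipients` and the example.com addresses are hypothetical. Note that the display-name group only accepts word characters and spaces, so `"Bob S."` comes back as `Bob S`, and `"Jones, Craig"` is not captured whole.

```ruby
# The "+"-friendly pattern, with "-" escaped so "+-." is not a character range.
RECIPIENT = /(\W?([\w\s]+)\W+)?(\w[\w+\-.]+@[\w\-.]+)\W?/i

# Hypothetical helper: one hash per match. Group 1 is the optional
# name-plus-punctuation prefix, group 2 the display name, group 3 the address.
def recipients(text)
  text.scan(RECIPIENT).map do |_prefix, name, address|
    mailbox, domain = address.split("@", 2)
    { name: name&.strip, mailbox: mailbox, domain: domain }
  end
end

list = '"Summer Thomas" <summer@example.com>; "Al Franken" <al@example.com>'
recipients(list).each { |r| p r }
```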

Let me know if this doesn’t pass the tests. Better yet, send me a unit
test and I’ll make it work. :-)

also: Ruby | zenspider.com | by ryan davis

Michael Fleet
Disinnovate
http://www.disinnovate.com/

On Fri, 10 Nov 2006, Shanti B. wrote:

“Bob S.” [email protected]
Bob S. [email protected]
“Jones, Craig” [email protected]
“Summer Thomas” [email protected]; “Al Franken” [email protected]
“Clinton, Bill” [email protected]; “Obama, Barack” [email protected]; “Jenny McCarthy” [email protected]
Bob [email protected], [email protected], James Blunt [email protected]

 harp:~ > cat a.rb
 require 'tmail'
 require 'yaml'

 tmail = TMail::Mail.parse <<-msg
 From [email protected] Thu Nov  9 08:55:15 2006
 Date: Fri, 10 Nov 2006 00:52:17 +0900
 From: Shanti B. <[email protected]>
 Reply-To: [email protected]
 To: ruby-talk ML <[email protected]>
 Newsgroups: comp.lang.ruby
 Subject: Best way to parse email recipient lists?

 Hey all,

 So... I'm trying to parse email recipient lists (entered by hand into
 the "to", "cc" and "bcc" fields of a mail app by users).
 msg

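 %w( to from cc bcc ).each do |field|
   # to_addrs, cc_addrs, etc. return TMail::Address objects, or nil
   # when the header is absent (hence the || [])
   list = tmail.send("#{ field }_addrs") || []
   # Address#phrase is the display name; Address#to_s is "Name <addr>"
   phrases = list.map{|a| a.phrase}

   y field => phrases.zip(list.map{|a| a.to_s})
 end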



 harp:~ > ruby a.rb
 to:
 - - ruby-talk ML
   - ruby-talk ML <[email protected]>
 from:
 - - Shanti B.
   - Shanti B. <[email protected]>
 cc: []

 bcc: []

-a