Sample code for displaying a MIME email message on a web page

jballanc · August 4, 2007, 5:09am

Can anyone point me to a good source of sample code for taking a raw
email message (multipart, with attachments) and displaying it on a web
page? Also same thing for composing a new message with attachments?

~Josh

jballanc · August 4, 2007, 12:38pm

Josh, I found the following very helpful:

http://www.ruby-doc.org/stdlib/libdoc/net/pop/rdoc/classes/Net/POP3.html

Cheers, --Kip

jballanc · August 4, 2007, 7:06pm

Thanks for the pointer! That looks useful, but doesn’t address what
I’m looking for. I want to be able to read in an mbox file, parse off
a message, and display it on a web page similar to how you would view
it in Hotmail or Gmail. That means decoding the MIME but also
“sanitizing” the HTML.

~Josh
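
For the "read in an mbox file, parse off a message" step, here is a minimal sketch using only Ruby's standard library. It assumes the common mbox convention in which every message begins with a separator line starting with "From " and body lines that would otherwise look like a separator were written as ">From ". The name split_mbox is hypothetical, and real mbox variants differ, so treat this as a starting point rather than a complete parser.

```ruby
require "stringio"

# Split an mbox stream into an array of raw message strings.
# Assumption: separator lines start with "From " and body lines that
# begin with "From " were escaped as ">From " by the writing program.
def split_mbox(io)
  messages = []
  current = nil
  io.each_line do |line|
    if line.start_with?("From ")
      # A separator line closes the previous message and opens a new one.
      messages << current.join if current
      current = []
    elsif current
      current << line
    end
  end
  messages << current.join if current
  messages
end

sample = <<~MBOX
  From josh@example.com Sat Aug  4 05:09:00 2007
  Subject: first

  Hello
  From kip@example.com Sat Aug  4 12:38:00 2007
  Subject: second

  >From the top
MBOX

raw = split_mbox(StringIO.new(sample))
puts raw.size               # prints 2
puts raw.first.lines.first  # prints "Subject: first"
```

Each returned string is one raw message (headers plus body) that can then be handed to a MIME parser.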

jballanc · August 5, 2007, 2:39am

Josh, I have done a very simple wrapper around this message parsing
bit. This will let you get at the parts of a MIME multipart message.
I guess you could use sanitize(), strip_tags(), and strip_links() to
help make the HTML more displayable. Note that I just put this together
as a proof of concept for emailing articles and images to a content
system I’m writing, so I sure wouldn’t cut and paste it!

require 'hermes_messages.rb'
my_messages = HermesMessages.new
my_messages.get_messages(host, user, password, port) # your POP server parameters
my_messages.messages.each do |m|
  # Each message has many parts, but we simplify with some helper methods
  puts m.subject
  puts m.from
  # Note the 'body' method retrieves the text/plain part for my needs;
  # modify as required to get the html part
  puts m.body
end

Hope this helps a little more,

–Kip

class HermesMessages
  require 'net/pop'
  require 'action_mailer'
  require 'pp' # process_part below calls pp

  EMPTY_PART = " \r\n\r\n"
  TEXT_PLAIN = 'text/plain'
  MULTIPART_ALTERNATIVE = 'multipart/alternative'

  attr_reader :messages

  # Fetches all mail from the POP3 server into @messages.
  # Pass :delete => true as the last argument to delete mails after reading.
  # Returns the number of mails on the server, or -1 on failure.
  def get_messages(host, user, password, port, *args)
    options = args.last.is_a?(Hash) ? args.pop : {}
    @messages = []
    begin
      Net::POP3.start(host, port, user, password) do |pop|
        if pop.mails.empty?
          puts 'No mail.'
        else
          puts "#{pop.mails.size} messages will be processed."
          pop.each_mail do |m|
            @messages << IncomingImageHandler.receive(m.pop)
            m.delete if options[:delete] == true
          end
        end
        puts "#{pop.mails.size} mails popped."
        return pop.mails.size
      end
    rescue
      puts "Exception detected - could not retrieve email"
      return -1
    end
  end

  class Message
    attr_accessor :subject, :from
    attr_reader :parts

    def initialize
      @parts = []
    end

    def add_part(p)
      @parts << p
    end

    def has_jpeg?
      parts.each {|p| return true if p.sub_type == "jpeg"}
      return false
    end

    def has_body?
      parts.each {|p| return true if p.main_type == "text" && p.sub_type == "plain"}
      return false
    end

    def body
      parts.each {|p| return p.body if p.main_type == "text" && p.sub_type == "plain"}
      return ""
    end

    def jpeg
      parts.each {|p| return p.body if p.main_type == "image" && p.sub_type == "jpeg"}
      return ""
    end

    def image_filename
      parts.each {|p| return p.filename if p.main_type == "image" && p.sub_type == "jpeg"}
      return ""
    end

  end

  class Content
    attr_accessor :main_type, :sub_type, :body, :filename, :file_extension, :index
  end

  class IncomingImageHandler < ActionMailer::Base
    # email is a TMail::Mail
    def receive(email)
      @message = Message.new
      @message.from = email.from[0]
      @message.subject = email.subject
      puts "Processing message: #{@message.subject}"
      process(email)
    end

    protected

    def process(part)
      if part.multipart? then
        part.parts.each {|p| process(p) } unless
          part.content_type == MULTIPART_ALTERNATIVE &&
          plain_part_is_empty(part)
      else
        process_part(part)
      end
      return @message
    end

    def process_part(part)
      puts "   part: #{part.content_type} with size #{part.body.length}"
      pp "   content is: '#{part.body}'" if part.body.length <= 10
      content = Content.new
      content.main_type = part.main_type
      content.sub_type = part.sub_type
      content.body = part.body
      content.filename = part_filename(part)
      content.file_extension = ext(part)
      @message.add_part(content)
    end

    private

    def plain_part_is_empty(part)
      part.parts.each do |p|
        return true if p.content_type == TEXT_PLAIN && p.body == EMPTY_PART
      end
      return false
    end

    def part_filename(part)
      # get filename
      if part['content-location'] != nil && part['content-location'].body.length != 0
        filename = part['content-location'].body
      elsif part.type_param('name') != nil && part.type_param('name').length != 0
        filename = part.type_param('name')
      elsif part.disposition_param('filename') != nil && part.disposition_param('filename').length != 0
        filename = part.disposition_param('filename')
      else
        filename = nil
      end
    end

    CTYPE_TO_EXT = {
      'image/jpeg' => 'jpg',
      'image/gif'  => 'gif',
      'image/png'  => 'png',
      'image/tiff' => 'tif'
    }

    def ext( mail )
      CTYPE_TO_EXT[mail.content_type] || 'txt'
    end

  end #class
end
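
Kip's body method returns only the text/plain part and suggests modifying it to get the HTML part. One way that could look is sketched below, assuming parts that respond to main_type, sub_type, and body like the Content class above. The Part struct and the display_body name are hypothetical stand-ins so the example runs on its own without ActionMailer.

```ruby
# Hypothetical stand-in for the Content class, for illustration only.
Part = Struct.new(:main_type, :sub_type, :body)

# Return the body of the first text part whose sub-type matches,
# trying the preferred sub-types in order (HTML first by default).
def display_body(parts, preferred = %w[html plain])
  preferred.each do |sub_type|
    part = parts.find { |p| p.main_type == "text" && p.sub_type == sub_type }
    return part.body if part
  end
  "" # no displayable text part was found
end

parts = [
  Part.new("text", "plain", "Hi there"),
  Part.new("text", "html", "<p>Hi there</p>"),
  Part.new("image", "jpeg", "...binary...")
]

puts display_body(parts)             # prints "<p>Hi there</p>"
puts display_body(parts, %w[plain])  # prints "Hi there"
```

Whatever HTML comes back from a helper like this still needs sanitizing before it is rendered on a page.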

jballanc · August 5, 2007, 3:08am

What’s the best place to find documentation on sanitize, strip_links,
etc. and find other functions like this? I’m looking through Ruby Core
and Ruby Standard Libs and don’t see it.

~Josh

jballanc · August 5, 2007, 3:01am

Thanks Kip!

I just found sanitize() and that seems to be a big part of what I’m
looking for. I’ll check out strip_links and strip_tags also!

~Josh

jballanc · August 5, 2007, 3:16am

Josh, here:
http://rails.rubyonrails.org/classes/ActionView/Helpers/TextHelper.html

Cheers, --Kip

jballanc · August 5, 2007, 3:55am

That was exactly what I was looking for. Thanks!

~Josh

jballanc · August 5, 2007, 4:08am

Sorry one last question on TextHelper… I can call sanitize from
within a view (.rhtml file) no problem, but when I try to use it
within a controller or model it complains that the method is unknown.
What do I need to require/include to reference sanitize from within a
controller or model?

~Josh
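
Helpers such as sanitize are module methods mixed into views, which is why they are not visible in a controller or model by default. Independent of Rails, a conservative fallback that works anywhere is to escape the text with Ruby's standard library so that no markup survives at all. This is a sketch of that swapped-in technique, not a replacement for sanitize, and the escape_for_display name is hypothetical.

```ruby
require "cgi"

# Hypothetical helper: escape every HTML metacharacter so the browser
# displays the text literally instead of interpreting any tags.
def escape_for_display(text)
  CGI.escapeHTML(text.to_s)
end

puts escape_for_display("<script>alert(1)</script>Hello & welcome")
# prints: &lt;script&gt;alert(1)&lt;/script&gt;Hello &amp; welcome
```

Escaping throws away all formatting, so it suits plain-text previews rather than a full Gmail-style HTML view.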

jballanc · August 17, 2007, 11:20am

Joshua B. wrote:

Sorry one last question on TextHelper… I can call sanitize from
within a view (.rhtml file) no problem, but when I try to use it
within a controller or model it complains that the method is unknown.
What do I need to require/include to reference sanitize from within a
controller or model?

~Josh

Just copy this into your application.rb:

def strip_tags(html)
  return html if html.blank?
  if html.index("<")
    text = ""
    tokenizer = HTML::Tokenizer.new(html)

    while token = tokenizer.next
      node = HTML::Node.parse(nil, 0, 0, token, false)
      # result is only the content of any Text nodes
      text << node.to_s if node.class == HTML::Text
    end
    # strip any comments, and if they have a newline at the end (ie. line with
    # only a comment) strip that too
    text.gsub(/<!--(.*?)-->[\n]?/m, "")
  else
    html # already plain text
  end
end

jballanc · August 17, 2007, 11:23am

Olivier Dirrenberger wrote:

Joshua B. wrote:

Sorry one last question on TextHelper… I can call sanitize from
within a view (.rhtml file) no problem, but when I try to use it
within a controller or model it complains that the method is unknown.
What do I need to require/include to reference sanitize from within a
controller or model?

~Josh

Just copy this into your application.rb:

def strip_tags(html)
  return html if html.blank?
  if html.index("<")
    text = ""
    tokenizer = HTML::Tokenizer.new(html)

    while token = tokenizer.next
      node = HTML::Node.parse(nil, 0, 0, token, false)
      # result is only the content of any Text nodes
      text << node.to_s if node.class == HTML::Text
    end
    # strip any comments, and if they have a newline at the end (ie. line with
    # only a comment) strip that too
    text.gsub(/<!--(.*?)-->[\n]?/m, "")
  else
    html # already plain text
  end
end

Oops, if you want sanitize, here is the sample code:

# Note: VERBOTEN_TAGS and VERBOTEN_ATTRS are constants defined next to
# sanitize in the Rails source; they must be in scope here as well.
def sanitize(html)
  # only do this if absolutely necessary
  if html.index("<")
    tokenizer = HTML::Tokenizer.new(html)
    new_text = ""

    while token = tokenizer.next
      node = HTML::Node.parse(nil, 0, 0, token, false)
      new_text << case node
        when HTML::Tag
          if VERBOTEN_TAGS.include?(node.name)
            node.to_s.gsub(/</, "&lt;")
          else
            if node.closing != :close
              node.attributes.delete_if { |attr,v| attr =~ VERBOTEN_ATTRS }
              %w(href src).each do |attr|
                node.attributes.delete attr if node.attributes[attr] =~ /^javascript:/i
              end
            end
            node.to_s
          end
        else
          node.to_s.gsub(/</, "&lt;")
        end
    end

    html = new_text
  end

  html
end