The problem I’m having is removing all HTML tags and getting past the
header information. Then I want to extract all the information per row
to put into a database.
I have tried TMail, but can’t seem to extract just the body. Then I
tried Hpricot and wasn’t sure what to use before the .inner_html. So
basically I’m very lost on where to start.
Any help is appreciated.
Thanks!
It would help if you posted the code that didn’t work, so people can
have a better idea of what you’re trying to do. TMail should have been
able to parse that without a problem. Either way, extracting the body is
easy: the body follows the first empty line. You could use something like
split, but duplicating such huge strings could be slow. Instead, when you
read the mail, read a line at a time until you reach the empty line, then
read the rest into a buffer for Hpricot.
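A rough sketch of that line-at-a-time approach, using only the standard library. The helper name split_headers_and_body and the sample message are made up for illustration; they are not part of TMail or Hpricot, and this assumes a simple single-part message with no transfer encoding.

```ruby
require 'stringio'

# Hypothetical helper (not part of TMail or Hpricot): reads a raw
# message from any IO and returns [header_lines, body].
def split_headers_and_body(io)
  headers = []
  while (line = io.gets)
    # The first blank line ("\n" or "\r\n") ends the header block.
    break if line.strip.empty?
    headers << line.chomp
  end
  # Whatever is left in the stream is the body.
  body = io.read || ''
  [headers, body]
end

# Made-up sample message, for illustration only.
raw = "From: someone@example.com\r\n" \
      "Subject: Report\r\n" \
      "\r\n" \
      "<html><body><table><tr><td>a</td></tr></table></body></html>\r\n"

headers, body = split_headers_and_body(StringIO.new(raw))
puts headers.length
puts body
```

With a real file you would pass an open File instead of the StringIO, and the body string is what you would then hand to Hpricot. If the mail is multipart or quoted-printable encoded, the body still needs decoding first, which this sketch does not attempt.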
Below is the code I am using to try to get the body out of the HTML
email (copy of the email: http://pastie.org/265259).
require 'rubygems'
require 'tmail'
email = TMail::Mail.load('emailhtml.eml')
puts email['body'] # comes back nil
puts email['from']
puts email['Delivered-To']
puts email['to'] # comes back nil
puts email['subject']
puts email['date']
puts email['X-Originalarrivaltime']
I got TMail to extract the body of my email. The solution (very simple
and embarrassing) is below. Now I’m trying to figure out Hpricot, but
examples seem to be fairly thin. If anyone knows of a good tutorial for
beginners, please post it. I have been using
http://code.whytheluckystiff.net/doc/hpricot/, but could use something
more basic.