Charset Detection

Are there any existing functions or external gems that can take a generic
string, parse it for charset markers, and convert those parts of the
string to their appropriate charsets?

For example, there's an email with the headers:
Date: Sat, 03 Jul 2010 04:00:29 EDT
From: =?iso-8859-1?B?U0FU?= [email protected]
Subject: =?iso-8859-1?B?VGhlIE9mZmljaWFsIFNBVCBRdWVzdGlvbiBvZiB0aGUgRGF5?=

How can I convert this to what it should be:
Date: Sat, 03 Jul 2010 04:00:29 EDT
From: SAT remov[email protected]
Subject: The Official SAT Question of the Day

Not answering your question directly, but this syntax is specific to
MIME encoding of E-mail headers:
http://en.wikipedia.org/wiki/MIME#Encoded-Word

So if you look at Ruby MIME toolkits, you may find what you’re looking
for.
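
To make the syntax concrete: an encoded word has the shape =?charset?encoding?encoded-text?=, where the encoding is B (Base64) or Q (a quoted-printable variant). Here is a minimal sketch that pulls the three fields out of the From: value in the question. The names ENCODED_WORD and split_encoded_word are made up for this example and do not come from any library.

```ruby
# Anatomy of an encoded word: =?charset?encoding?encoded-text?=
# ENCODED_WORD and split_encoded_word are hypothetical names for this sketch.
ENCODED_WORD = /=\?([^?]+)\?([BbQq])\?([^?]*)\?=/

def split_encoded_word(word)
  match = ENCODED_WORD.match(word)
  return nil unless match            # not an encoded word at all

  charset, encoding, text = match.captures
  { charset: charset, encoding: encoding.upcase, text: text }
end

parts = split_encoded_word("=?iso-8859-1?B?U0FU?=")

# The "B" marker says the payload is Base64, which unpack("m") decodes.
decoded = parts[:text].unpack("m").first
puts "#{parts[:charset]} / #{parts[:encoding]} / #{decoded}"
```

The decoded bytes are still in the charset named by the first field, so converting them to UTF-8 (or whatever you work in) is a separate step.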

Here's the method I created, in case it is useful to anyone else:

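    def convert_mime_encoded_word(mime_encoded_word)
      # Split "=?charset?encoding?text?=" into its three fields.
      match = mime_encoded_word.match(/=\?([^?]+)\?([BQ])\?([^?]+)\?=/i)
      return mime_encoded_word unless match  # not an encoded word; leave it alone
      from_charset, from_encoding, encoded_word = match.captures
      decoded_word =
        if from_encoding.upcase == "Q"
          # Q encoding is quoted-printable with "_" standing in for a space.
          encoded_word.tr("_", " ").unpack("M").first
        else
          # B encoding is Base64.
          encoded_word.unpack("m").first
        end
      # String#encode is used here because Iconv is no longer part of the
      # standard library in Ruby 2.0 and later.
      decoded_word.force_encoding(from_charset).encode("UTF-8")
    end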
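
A real header line can mix plain text with one or more encoded words, so it may help to decode every occurrence in place rather than one word at a time. The following is only a sketch under a few assumptions: decode_header and ENCODED_WORD_RE are hypothetical names, the second sample header (name and address) is invented for illustration, and String#encode stands in for Iconv. Charsets that Ruby does not know will raise an error here.

```ruby
# decode_header and ENCODED_WORD_RE are hypothetical names for this sketch.
ENCODED_WORD_RE = /=\?([^?]+)\?([BbQq])\?([^?]*)\?=/

def decode_header(header)
  # Whitespace that only separates two adjacent encoded words is dropped.
  header = header.gsub(/(\?=)\s+(?==\?)/, '\1')

  header.gsub(ENCODED_WORD_RE) do
    charset, encoding, text = $1, $2, $3
    bytes =
      if encoding.upcase == "B"
        text.unpack("m").first                 # Base64
      else
        text.tr("_", " ").unpack("M").first    # Q: "_" means a space
      end
    # Label the raw bytes with their charset, then convert to UTF-8.
    bytes.force_encoding(charset).encode("UTF-8")
  end
end

subject = decode_header(
  "Subject: =?iso-8859-1?B?VGhlIE9mZmljaWFsIFNBVCBRdWVzdGlvbiBvZiB0aGUgRGF5?="
)
puts subject

# Invented example of a Q-encoded display name.
puts decode_header("From: =?iso-8859-1?Q?Andr=E9_Dupont?= <andre@example.com>")
```

For the Subject line from the question this prints "Subject: The Official SAT Question of the Day". Text outside the encoded words, such as the header name or an address, passes through unchanged.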
