Converting escaped HTML to UTF-8


#1

Hi everyone,

I’ve looked around online for a solution, but I’m pretty new to Ruby
and programming in general, so I feel like I’m hitting a wall here.

I’m retrieving data from Hpricot that I’d like to store in UTF-8, but
I can’t find a function to convert hex NCRs like:

&#xE1;

Surely somebody who has had to do this before could point me in
the right direction? Thanks!


#2

Well, after some more googling, I found a solution. In case anyone is
curious:

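require 'cgi'
require 'iconv'

n = "&#xE1;"
# Decode the NCR; for code points below 256 this gives one ISO-8859-1 byte
n = CGI.unescapeHTML(n)
# Re-encode that ISO-8859-1 byte as UTF-8
n = Iconv.conv("UTF-8", "ISO-8859-1", n)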
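
Note for anyone on a newer Ruby: Iconv was dropped from the standard library in Ruby 2.0, so the snippet above will not load there as written. As a rough sketch of the same idea without any library, numeric character references can be decoded by hand. The helper name decode_ncrs is made up for illustration and is not part of CGI or any gem; it only handles numeric references (decimal and hex), not named entities like &amp;amp;.

```ruby
# Hypothetical helper (not from any library): decodes decimal and hex
# numeric character references such as &#225; or &#xE1; into UTF-8.
# Named entities (&amp;, &eacute;, ...) are left untouched.
def decode_ncrs(text)
  text.gsub(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/) do
    # $1 is set for hex references, $2 for decimal ones
    codepoint = $1 ? $1.hex : $2.to_i
    # pack("U") encodes the code point as a UTF-8 string
    [codepoint].pack("U")
  end
end

puts decode_ncrs("&#xE1; and &#225;")
```

This assumes the input string is already UTF-8 (or plain ASCII). An out-of-range code point makes pack raise a RangeError, so untrusted input would need a rescue around it.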


#3

Chris Worrall wrote:

Well, after some more googling, I found a solution. If anyone was
curious –

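require 'cgi'
require 'iconv'

n = "&#xE1;"
# Decode the NCR; for code points below 256 this gives one ISO-8859-1 byte
n = CGI.unescapeHTML(n)
# Re-encode that ISO-8859-1 byte as UTF-8
n = Iconv.conv("UTF-8", "ISO-8859-1", n)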

I’m surprised no one mentioned it, but you could also use the
htmlentities gem:

require "rubygems"
require "htmlentities"
puts HTMLEntities.decode_entities("&#x100; &#x108; &#x10E;")
=> Ā Ĉ Ď

Daniel