HTMLEntities.decode_entities - problems with output



I am trying to use the HTMLEntities library to translate HTML entities
into their character equivalents so that I can print a text version of
some HTML to a file.

However, I am having trouble understanding how to successfully emit the
converted text as a string without ending up with weird UTF-8 characters
in front of the converted characters.

Referencing the irb session below, I’m attempting to do the Iconv
conversion from ASCII to UTF-8 because I’m assuming that I have to in
order to use the HTMLEntities calls. I am trying to convert back to
ISO-8859-1 because I’m assuming that I need to.

If I just try to print the output of HTMLEntities.decode_entities to a
file without doing any iconv conversions, I get A-circumflex before
every modified character. A-circumflex is the ISO-8859-1 equivalent of
\302.
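
To make the A-circumflex symptom concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where strings carry an encoding (the session in this post appears to be an older Ruby that uses Iconv). The helper name misread_as_latin1 is made up for illustration and is not part of HTMLEntities.

```ruby
# Reinterpret the UTF-8 bytes of +text+ as ISO-8859-1, then convert the
# result back to UTF-8 so it can be displayed. This mimics a viewer or
# editor that assumes the wrong encoding for the file.
def misread_as_latin1(text)
  text.encode("UTF-8").dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

nbsp = "\u00A0"                    # the character that &nbsp; decodes to
p nbsp.bytes                       # the two UTF-8 bytes \302 \240 (0xC2, 0xA0)
p misread_as_latin1(nbsp + "xyz")  # A-circumflex, then NBSP, then "xyz"
```

If this reading is right, the decoded string is probably fine as UTF-8, and the stray character only appears when the bytes are displayed as ISO-8859-1.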

What am I missing here? How can I successfully display &nbsp; as a
space in a file that I am writing to? I’d rather not have to do my own
gsubs on each character entity, although I am prepared to do that.
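
One possible way to get ISO-8859-1 bytes into the file is to let Ruby transcode on write. This is a sketch only, assuming Ruby 1.9 or later and assuming the decoded text is a UTF-8 string. The helper write_latin1 is a hypothetical name, and a temporary file stands in for the real output file.

```ruby
require "tempfile"

# Write +text+ to +path+. The "w:ISO-8859-1" mode sets the file's external
# encoding, so Ruby transcodes the string to ISO-8859-1 as it is written.
def write_latin1(path, text)
  File.open(path, "w:ISO-8859-1") { |f| f.write(text) }
end

Tempfile.create("entities") do |tmp|
  write_latin1(tmp.path, "\u00A0xyz")  # stand-in for the decoded output
  p File.binread(tmp.path).bytes       # a single 0xA0 byte, then "xyz"
end
```

Characters that have no ISO-8859-1 equivalent would raise Encoding::UndefinedConversionError here, so this only suits text that fits in that character set.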

I also thought of substituting the \302 character with '' (if I could
only figure out how to do that).
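
For reference, the substitution idea could be sketched as follows, again assuming Ruby 1.9 or later. Both helper names are invented for this example. The first swaps the decoded non-breaking space for an ordinary space; the second drops every \302 byte, which is what the paragraph above describes.

```ruby
# Replace each non-breaking space (U+00A0) with an ordinary ASCII space.
def nbsp_to_space(text)
  text.gsub("\u00A0", " ")
end

# Drop every \302 (0xC2) byte, working on a binary copy of the string.
# Shown only to illustrate the idea; see the caveat below.
def strip_c2_bytes(text)
  text.b.delete("\xC2".b)
end

decoded = "\u00A0xyz"         # stand-in for the "\302\240xyz" result
p nbsp_to_space(decoded)      # an ordinary space followed by "xyz"
p strip_c2_bytes(decoded)     # the lone \240 byte followed by "xyz"
```

Stripping \302 happens to leave valid ISO-8859-1 for the non-breaking space, but it would not generalize: an e-acute, for example, is \303\251 in UTF-8, and removing the lead byte leaves \251, which is the copyright sign in ISO-8859-1.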

Any help is appreciated.


irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> require 'rubygems'
=> false
irb(main):003:0> require 'HTMLEntities'
=> true
irb(main):004:0> x = '&nbsp;xyz'
=> "&nbsp;xyz"
irb(main):006:0> conv = Iconv.new("ASCII", "UTF-8")
=> #<Iconv:0x2c297f8>
irb(main):007:0> y = conv.iconv(x)
=> "&nbsp;xyz"
irb(main):008:0> HTMLEntities.decode_entities(y)
=> "\302\240xyz"
irb(main):009:0> conv = Iconv.new("UTF-8", "ISO-8859-1")
=> #<Iconv:0x2c12ed8>
irb(main):010:0> conv.iconv(y)
=> "&nbsp;xyz"