Forum: Ruby on Rails HTMLEntities.decode_entities - problems with output

Wes Gamble (weyus)
on 2006-05-25 00:16
All,

I am trying to use the HTMLEntities library to translate HTML entities
into their character equivalents so that I can print a text version of
some HTML to a file.

However, I am having trouble understanding how to successfully emit the
converted text as a string without ending up with weird UTF-8 characters
in front of the converted characters.

Referencing the irb session below, I'm attempting to do the Iconv
conversion from ASCII to UTF-8 because I'm assuming that I have to in
order to use the HTMLEntities calls.  I am trying to convert back to
ISO-8859-1 because I'm assuming that I need to.

If I just try to print the output of HTMLEntities.decode_entities to a
file without doing any iconv conversions, I get A-circumflex before
every modified character.  A-circumflex is the ISO-8859-1 equivalent of
\302.

What am I missing here?  How can I successfully display &nbsp; as a
space in a file that I am writing to?  I'd rather not have to do my own
gsubs on each character entity, although I am prepared to do that.

I also thought of substituting the \302 character with '' (if I could
only figure out how to do that).

Any help is appreciated,
Wes

==============================================================================

C:\eclipse\workspace>irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> require 'rubygems'
=> false
irb(main):003:0> require 'HTMLEntities'
=> true
irb(main):004:0> x = '&nbsp;xyz'
=> "&nbsp;xyz"
irb(main):006:0> conv = Iconv.new("ASCII", "UTF-8")
=> #<Iconv:0x2c297f8>
irb(main):007:0> y = conv.iconv(x)
=> "&nbsp;xyz"
irb(main):008:0> HTMLEntities.decode_entities(y)
=> "\302\240xyz"
irb(main):009:0> conv = Iconv.new("UTF-8", "ISO-8859-1")
=> #<Iconv:0x2c12ed8>
irb(main):010:0> conv.iconv(y)
=> "&nbsp;xyz"
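
A note on the session above: the bytes \302\240 are the UTF-8 encoding of U+00A0 (no-break space), so the decode step itself appears to be working. The A-circumflex shows up when those two bytes are read as ISO-8859-1. Note also that Iconv.new takes the target encoding first and the source encoding second, and that prompt 010 converts y (the undecoded string) rather than the decoded result. The following is only a minimal sketch, assuming Ruby 1.9 or later, where String#encode is available (this is not the Ruby version used in the session). The variable names and the file name out.txt are illustrative, and the string literal stands in for the output of HTMLEntities.decode_entities.

```ruby
require "tmpdir"

# Stand-in for the decoded result in the session: U+00A0 followed by "xyz".
# In UTF-8 this is the byte sequence 0xC2 0xA0 0x78 0x79 0x7A ("\302\240xyz").
decoded = "\u00A0xyz"
utf8_bytes = decoded.bytes

# Where the A-circumflex comes from: the same bytes read as ISO-8859-1
# are U+00C2 (A-circumflex) followed by U+00A0.
misread = decoded.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Option 1: replace the no-break space with an ordinary space.
plain = decoded.gsub("\u00A0", " ")

# Option 2: transcode to ISO-8859-1, where U+00A0 is the single byte 0xA0.
latin1 = decoded.encode("ISO-8859-1")

# Option 3: let the file handle transcode on write by naming its encoding.
written = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "out.txt") # hypothetical output file
  File.open(path, "w:ISO-8859-1") { |f| f.write(decoded) }
  written = File.binread(path).bytes
end
```

If the text file only needs a visible gap, option 1 is probably the simplest. Options 2 and 3 keep the no-break space but store it as one byte, which only displays correctly if the file is then read as ISO-8859-1.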