All,

I am trying to use the HTMLEntities library to translate HTML entities into their character equivalents so that I can print a text version of some HTML to a file. However, I am having trouble understanding how to emit the converted text as a string without ending up with weird UTF-8 characters in front of the converted characters.

Referencing the irb session below, I'm attempting the Iconv conversion from ASCII to UTF-8 because I'm assuming that I have to in order to use the HTMLEntities calls. I am trying to convert back to ISO-8859-1 because I'm assuming that I need to.

If I just print the output of HTMLEntities.decode_entities to a file without doing any Iconv conversions, I get an A-circumflex before every modified character. A-circumflex is the ISO-8859-1 equivalent of \302.

What am I missing here? How can I successfully display &nbsp; as a space in a file that I am writing to?

I'd rather not have to do my own gsubs on each character entity, although I am prepared to do that. I also thought of substituting the \302 character with '' (if I could only figure out how to do that).

Any help is appreciated,

Wes

==============================================================================

C:\eclipse\workspace>irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> require 'rubygems'
=> false
irb(main):003:0> require 'HTMLEntities'
=> true
irb(main):004:0> x = '&nbsp;xyz'
=> "&nbsp;xyz"
irb(main):006:0> conv = Iconv.new("ASCII", "UTF-8")
=> #<Iconv:0x2c297f8>
irb(main):007:0> y = conv.iconv(x)
=> "&nbsp;xyz"
irb(main):008:0> HTMLEntities.decode_entities(y)
=> "\302\240xyz"
irb(main):009:0> conv = Iconv.new("UTF-8", "ISO-8859-1")
=> #<Iconv:0x2c12ed8>
irb(main):010:0> conv.iconv(y)
=> "&nbsp;xyz"
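
For reference, here is a minimal sketch of the two workarounds mentioned above, written against a hard-coded stand-in for the decoded string so that it does not depend on HTMLEntities or Iconv being installed. It assumes Ruby 1.9 or later, where String#encode is available. The helper names to_latin1 and nbsp_to_space are hypothetical and only for illustration. The bytes \302\240 are the UTF-8 encoding of U+00A0 (the non-breaking space), which is why a viewer that reads the file as ISO-8859-1 shows an A-circumflex in front of it.

```ruby
# Hypothetical helper: transcode a UTF-8 string to ISO-8859-1 so that
# U+00A0 is written as the single byte \240 instead of \302\240.
def to_latin1(utf8_text)
  utf8_text.encode("ISO-8859-1")
end

# Hypothetical helper: replace each non-breaking space (U+00A0)
# with an ordinary ASCII space, leaving the string in UTF-8.
def nbsp_to_space(utf8_text)
  utf8_text.gsub("\u00A0", " ")
end

# Stand-in for what decode_entities returned in the session: "\302\240xyz".
decoded = "\xC2\xA0xyz".dup.force_encoding("UTF-8")

latin1 = to_latin1(decoded)     # bytes: 0xA0, then "xyz"
plain  = nbsp_to_space(decoded) # " xyz" with a regular space
```

On the Ruby 1.8 setup shown in the session, the Iconv equivalent of the first helper would presumably be Iconv.new("ISO-8859-1", "UTF-8") applied to the decoded string, since Iconv.new takes the target encoding first and the source encoding second. Which of the two results is wanted depends on whether the output file should keep a true non-breaking space or just a plain one.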
on 2006-05-25 00:16