Forum: Ruby XML, WebService and Character Encoding issue

Gonzalo R. (Guest)
on 2006-01-31 21:18
I created a Ruby proxy for a FoxPro app that needs to fetch data from a
WebService (which returns it as XML) and read it in CSV format (I parse
the XML with REXML and write the CSV by hand).
The WebService returns Base64-encoded XML, which I then decode and
process.

Everything works fine until the XML data contains non-ASCII characters
(anything beyond 7 bits, e.g. Western European accented characters):
the REXML parser then dies complaining that a closing tag was not found.
I looked for an entity processor or a character encoding converter in
the standard library and couldn't find one.

I ended up with an ugly hack: I fill a Hash with the accented character
as the key and the entity as the value, then replace back and forth in
the returned data.
My function looks like this:

def iso2entities(str, inverse)
  rep = Hash.new
  rep['á'] = '&aacute;'
  # ... snipped code ...
  rep['©'] = '&copy;'

  unless inverse
    rep.each{|code, entity| str.gsub!(code, entity) }
  else
    rep.each{|code, entity| str.gsub!(entity, code) }
  end
  return str
end

It works, but filling the Hash by hand is time-consuming and the code
obviously looks like an ugly workaround... is there a "Ruby standard"
way to do it?
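
One way to avoid maintaining the Hash by hand, sketched under
assumptions: if the decoded data is ISO-8859-1 (the post does not say
which encoding the service uses), every non-ASCII character can be
turned into a numeric character reference computed from its code point.
The helper names latin1_to_char_refs and char_refs_to_text are
hypothetical, and String#encode needs Ruby 1.9 or later.

```ruby
# Hypothetical helper: assumes the input bytes are ISO-8859-1.
def latin1_to_char_refs(bytes)
  # Tag the raw bytes as ISO-8859-1, then convert to UTF-8 so each
  # accented character is a single character with a known code point.
  text = bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  # Replace every non-ASCII character with a decimal reference.
  text.gsub(/[^\x00-\x7F]/) { |char| format('&#%d;', char.ord) }
end

# Hypothetical inverse: turns decimal references back into characters.
def char_refs_to_text(text)
  text.gsub(/&#(\d+);/) { Regexp.last_match(1).to_i.chr(Encoding::UTF_8) }
end

puts latin1_to_char_refs("caf\xE9".b)  # prints caf&#233;
```

Numeric references such as &#233; are plain ASCII, and XML parsers
resolve them on their own, so the inverse step may only be needed for
text that never goes through the parser.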
Russell R. Rutledge (Guest)
on 2006-02-25 14:46
Hey Gonzalo.  I was having the same problem.  I don't have a final
solution yet, but part of what worked for me was changing the character
encoding declared in the XML (you'll find it on the first line of your
XML file) from UTF-8 to ISO-8859-1.  For some reason REXML can then
parse characters that need more than 7 bits (like é).  Hope that helps.


Russ
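
A possible explanation for why changing the declaration helps, assuming
the service really sends ISO-8859-1 bytes under a UTF-8 declaration
(the thread does not confirm this): a lone accented byte such as 0xE9
is not a valid UTF-8 sequence, so a parser that trusts the declaration
runs into bytes it cannot decode. The sketch below checks this and
shows the alternative of converting the data to real UTF-8. The sample
document and the helper names valid_utf8? and latin1_to_utf8 are made
up for illustration.

```ruby
# Hypothetical sample: ISO-8859-1 bytes (0xE9 is "é") declared as UTF-8.
RAW = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>Jos\xE9</name>".b

# True if the bytes form valid UTF-8 (hypothetical helper).
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Reinterpret the bytes as ISO-8859-1 and convert them to UTF-8,
# so the data matches a UTF-8 declaration (hypothetical helper).
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

puts valid_utf8?(RAW)                  # prints false
puts valid_utf8?(latin1_to_utf8(RAW))  # prints true
```

Either fix makes the declaration and the bytes agree: change the
declaration to ISO-8859-1 as described above, or keep UTF-8 and convert
the data before parsing.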