Hi,
I’ve been trying to use REXML to process an XML file with entities,
but I can’t seem to get it to leave my entities alone even with the
:raw context set. The simplified example looks something like this:
song.xml:
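<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Song [
  <!ENTITY convoy "we got a great big convoy">
  <!ENTITY rubberduck "ain't she a beautiful sight">
]>
<Song>
  <lyric>&convoy;</lyric>
  <lyric>&rubberduck;</lyric>
</Song>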
script.rb:
require 'rexml/document'
include REXML

doc = Document.new File.open('song.xml', 'r'), { :raw => :all }
doc.elements.each('Song/lyric') do |lyric|
  puts lyric.raw   # This prints 'true'
  puts lyric.text  # This always has its entities decoded!
end
output:
true
we got a great big convoy
true
ain't she a beautiful sight
desired output:
true
&convoy;
true
&rubberduck;
When I take out the { :raw => :all } part, the lyric.raw line returns
nil, but the output isn’t changed. Am I misunderstanding how this is
supposed to work, or is it broken? How can I get the entities back in
an unencoded form? This is driving me crazy.
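One thing that may be worth trying, as a minimal sketch only: as far as I can tell, Element#text returns the value of the first text node with entities expanded, while calling to_s on the Text node itself (via Element#get_text) should hand back the text as it appeared in the source. The raw_lyrics helper and the inline SONG_XML string are made up for illustration, and the Song/lyric layout is assumed from the example above.

```ruby
require 'rexml/document'

# Hypothetical inline copy of song.xml, so the sketch runs without a file.
SONG_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE Song [
    <!ENTITY convoy "we got a great big convoy">
    <!ENTITY rubberduck "ain't she a beautiful sight">
  ]>
  <Song>
    <lyric>&convoy;</lyric>
    <lyric>&rubberduck;</lyric>
  </Song>
XML

# Hypothetical helper: collect the text of each Song/lyric element
# without expanding entity references.
def raw_lyrics(xml)
  doc = REXML::Document.new(xml)
  lyrics = []
  doc.elements.each('Song/lyric') do |lyric|
    node = lyric.get_text            # first REXML::Text child, or nil
    lyrics << (node ? node.to_s : nil)
  end
  lyrics
end

raw_lyrics(SONG_XML).each { |text| puts text }
```

If this behaves the way I expect, the difference comes from which accessor is called on the text node (to_s versus value), not from the :raw setting.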
Thanks!
-Pawel