Preventing "collapse" of HTML tags during XML parse

rhalff · August 30, 2007, 11:11pm

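I'm trying to avoid having an empty element such as <textarea id="foo"></textarea> turn into a self-closing tag like <textarea id="foo"/>. But this is exactly what happens to me in both XmlSimple and REXML: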

irb(main):028:0> print XmlSimple.xml_out(XmlSimple.xml_in('<textarea id="foo"></textarea>'))

The problem is that the self-closing form doesn't render in a browser.

Does anyone know of a way to avoid this in either library? Or, should I
be doing this another way?

Thanks,
Rob

rhalff · August 30, 2007, 11:30pm

HTML is not an XML language; it is an SGML application. Use an HTML parser
instead of an XML parser, for example Hpricot.

Regards
Stefan

rhalff · August 31, 2007, 3:48pm

On 8/30/07, Rob H. wrote:

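I'm trying to avoid having an empty element such as <textarea id="foo"></textarea>
turn into a self-closing tag like <textarea id="foo"/>.
But this is exactly what happens to me in both XmlSimple and REXML: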

<textarea id="foo"></textarea> and <textarea id="foo"/> are equivalent in the XML spec, so this
is correct behavior.
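A small REXML check of this point, assuming a current Ruby where the rexml gem is available: both spellings parse to an element with no children, so the serializer has nothing left to distinguish them and writes the empty-element tag in both cases. REXML's default writer uses single-quoted attributes, which is why the output shows id='foo'.

```ruby
require 'rexml/document'

# Parse the two spellings of the same empty element.
pair_form  = REXML::Document.new('<textarea id="foo"></textarea>')
empty_form = REXML::Document.new('<textarea id="foo"/>')

# Neither root element has any child nodes, so the trees are equivalent.
[pair_form, empty_form].each do |doc|
  puts doc.root.children.empty?   # => true
end

# Serializing either document therefore yields the same empty-element tag.
puts pair_form.to_s    # => <textarea id='foo'/>
puts empty_form.to_s   # => <textarea id='foo'/>
```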

Does anyone know of a way to avoid this in either library? Or, should I
be doing this another way?

If you’re using REXML, you might investigate the last argument to
write():

http://www.ruby-doc.org/stdlib/libdoc/rexml/rdoc/classes/REXML/Element.html#M002971

ie_hack: Internet Explorer is the worst piece of crap to have ever
been written, with the possible exception of Windows itself. Since IE
is unable to parse proper XML, we have to provide a hack to generate
XML that IE’s limited abilities can handle. This hack inserts a space
before the /> on empty tags. Defaults to false
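A sketch of what that flag does, assuming a current rexml gem. Recent REXML releases mark Element#write as deprecated in favour of REXML::Formatters, so this uses REXML::Formatters::Default, whose constructor takes the same ie_hack flag. Note that ie_hack only adds a space; the textarea is still self-closed. The second half is a hypothetical workaround, not something suggested in this thread: giving each childless element an empty text node makes the formatter emit an explicit end tag. The helper name keep_open! is made up for illustration.

```ruby
require 'rexml/document'
require 'rexml/formatters/default'

doc = REXML::Document.new('<textarea id="foo"></textarea>')

# ie_hack = true inserts a space before "/>", but the element is still
# written as an empty-element tag.
ie_out = +''
REXML::Formatters::Default.new(true).write(doc, ie_out)
puts ie_out   # => <textarea id='foo' />

# Hypothetical helper: add an empty text child to every childless element,
# so the formatter writes "<tag></tag>" instead of "<tag/>".
# Caution: this also affects elements that HTML expects to stay void (e.g. br).
def keep_open!(element)
  element.add_text('') if element.children.empty?
  element.each_element { |child| keep_open!(child) }
end

keep_open!(doc.root)
open_out = +''
REXML::Formatters::Default.new.write(doc, open_out)
puts open_out # => <textarea id='foo'></textarea>
```

If only a few tags (such as textarea or script) need an explicit end tag, restricting keep_open! to those element names would avoid turning void elements into start/end pairs.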

HTH,
Keith