Parsing xhtml with libxml

If you get errors complaining of undefined entities like &nbsp; when
parsing xhtml, it means you need to install the DTD for xhtml 1.0 or
1.1.
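
For context: XML itself predefines only five named entities (amp, lt,
gt, quot, apos). Everything else, such as nbsp or copy, is declared by
the XHTML DTD, which is why the parser complains when it cannot load
it. A minimal sketch in plain Ruby (no libxml involved; the helper name
dtd_entities and the sample string are made up for illustration) that
lists the entity references in a document that depend on a DTD:

```ruby
# The five entities every XML parser knows without a DTD.
PREDEFINED = %w[amp lt gt quot apos].freeze

# Return the named entity references in +xml+ that need a DTD to resolve.
# Numeric references such as &#160; are skipped because they never need one.
def dtd_entities(xml)
  xml.scan(/&([A-Za-z][A-Za-z0-9]*);/).flatten.uniq - PREDEFINED
end

# Hypothetical sample fragment.
SAMPLE = '<p>Fish&nbsp;&amp;&nbsp;chips &copy; 2005 &#160;</p>'

p dtd_entities(SAMPLE)  # => ["nbsp", "copy"]
```

If this returns an empty list for your documents, the missing DTD is
probably not the cause of the error.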

Example of a doctype for xhtml 1.1:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

You want to install the DTDs locally following the model in /etc/xml.
If you don’t, libxml will fetch the DTD from www.w3.org each time you
parse a document. Needing to install these DTDs was not obvious to me
and should be part of the documentation. There is an rpm for xhtml 1.0,
“xhtml1-dtds-1.0-7”. I couldn’t find one for xhtml 1.1, so I downloaded
it piecemeal from w3.org.
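
To make "following the model in /etc/xml" a bit more concrete: libxml2
looks up DTDs through XML catalog files (by default /etc/xml/catalog),
which map a doctype's public and system identifiers to a local file.
The sketch below, in plain Ruby, pulls the two identifiers out of a
doctype and prints the matching catalog entries. The helper names and
the local path /usr/share/xml/xhtml11/xhtml11.dtd are assumptions for
illustration; use wherever you actually unpacked the DTD.

```ruby
# Matches a PUBLIC doctype and captures its public and system identifiers.
DOCTYPE_RE = /<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]+)"\s+"([^"]+)"\s*>/

# Returns [public_id, system_id], or nil if no PUBLIC doctype is found.
def doctype_ids(xml)
  m = DOCTYPE_RE.match(xml)
  m && [m[1], m[2]]
end

# Builds <public> and <system> catalog entries pointing at a local copy.
def catalog_entries(public_id, system_id, local_uri)
  [
    %(<public publicId="#{public_id}" uri="#{local_uri}"/>),
    %(<system systemId="#{system_id}" uri="#{local_uri}"/>)
  ]
end

XHTML11 = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" ' \
          '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'

# Hypothetical install location of the downloaded DTD.
LOCAL_DTD = 'file:///usr/share/xml/xhtml11/xhtml11.dtd'

ENTRIES = catalog_entries(*doctype_ids(XHTML11), LOCAL_DTD)
puts ENTRIES
```

The printed lines belong inside the
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"> element
of a catalog file. The xhtml 1.1 DTD is modular, so the module files it
pulls in need to sit next to the driver file (or get entries of their
own).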

Installing the DTD does not automatically turn on validation. If you
want to validate, you need to turn it on:

  XML::Parser::default_validity_checking = TRUE

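XML::Parser::default_load_external_dtd controls the loading of the
‘external subset’ (the definitions for character entities like
&amp;). It defaults to TRUE.

XML::Parser::default_load_external_dtd is broken: one check tests
xmlSubstituteEntitiesDefaultValue where it should test
xmlLoadExtDtdDefaultValue, and the getter and setter functions are
swapped in the method registrations. This patch fixes it: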

Index: ruby_xml_parser.c
===================================================================
RCS file: /var/cvs/xml-tools/libxml-ruby/ruby_xml_parser.c,v
retrieving revision 1.1.1.1
diff -r1.1.1.1 ruby_xml_parser.c
274c274
<   if (xmlSubstituteEntitiesDefaultValue)
---
>   if (xmlLoadExtDtdDefaultValue)
916c916
<                        ruby_xml_parser_default_load_external_dtd_set, 0);
---
>                        ruby_xml_parser_default_load_external_dtd_get, 0);
918c918
<                        ruby_xml_parser_default_load_external_dtd_get, 1);
---
>                        ruby_xml_parser_default_load_external_dtd_set, 1);

Sam’s patches for libxml are also needed:
http://www.intertwingly.net/blog/2005/11/05/Patch-for-libxml2s-Ruby-binding

Jon S. wrote:

If you get errors complaining of undefined entities like &nbsp; when
parsing xhtml it means you need to install the DTD for xhtml 1.0 or
1.1.

Example of a doctype for xhtml 1.1:

<snip explanation & code due to ruby-forum.com />

Sam’s patches for libxml are also needed:
Patch for libxml2's Ruby binding

Thank you for this!

E

This document is NOT valid XHTML 1.0!

On Fri, 16 Dec 2005 23:18:54 -0000, Jon S. wrote:

If you get errors complaining of undefined entities like &nbsp; when
parsing xhtml it means you need to install the DTD for xhtml 1.0 or
1.1.

Thanks for that. I’ve been gathering up problems and patches in a quiet
sort of way, but I’m not sure at the moment what’s happening with the
project. I’m planning to get proactive this week and see if we can at
least get these issues sorted and the patches I have (including yours
and Sam’s) in.

Thanks,
Ross