If you get errors about undefined entities (for example &nbsp;) when
parsing xhtml, it means you need to install the DTD for xhtml 1.0 or
1.1.
Example of a doctype for xhtml 1.1:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
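To see which DTD a given document actually asks for, you can pull the public and system identifiers out of its doctype. The following is only a rough sketch in plain Ruby: doctype_ids and DOCTYPE_RE are hypothetical names, not part of libxml-ruby, and the regex only handles the common double-quoted PUBLIC form.

```ruby
# Rough sketch: extract the public and system identifiers from a DOCTYPE.
# doctype_ids is a hypothetical helper, not a libxml-ruby call.
# Only the double-quoted PUBLIC form is handled.
DOCTYPE_RE = /<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"/m

def doctype_ids(xml)
  m = DOCTYPE_RE.match(xml)
  # Returns nil when the document has no matching DOCTYPE.
  m && { public_id: m[1], system_id: m[2] }
end

sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>t</title></head>
    <body><p>&nbsp;</p></body>
  </html>
XML

ids = doctype_ids(sample)
puts ids[:public_id]
puts ids[:system_id]
```

The public identifier tells you which DTD to install, and the system identifier is the URL libxml will fetch if no local copy is found.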
Install the DTDs locally, following the model in /etc/xml.
If you don’t, libxml will fetch the DTD from www.w3.org each time you
parse a document. Needing to install these DTDs was not obvious to me
and should be part of the documentation. There is an rpm for xhtml 1.0,
“xhtml1-dtds-1.0-7”. I couldn’t find one for xhtml 1.1, so I downloaded
it piecemeal from w3.org.
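For a hand-installed xhtml 1.1 DTD, the catalog entry might look something like what this Ruby sketch prints. The directory /usr/share/xml/xhtml11 and the helper name xhtml11_catalog are assumptions for illustration; use wherever you actually saved the files. The modules that xhtml11.dtd pulls in would need similar entries of their own.

```ruby
# Sketch only: build an OASIS XML catalog that maps the xhtml 1.1 public
# identifier, and the w3.org directory it lives in, to local copies.
# LOCAL_DTD_DIR is an assumed location, not a standard path.
LOCAL_DTD_DIR = "/usr/share/xml/xhtml11"

# xhtml11_catalog is a hypothetical helper; it returns the catalog as a string.
def xhtml11_catalog(dtd_dir)
  <<~XML
    <?xml version="1.0"?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <public publicId="-//W3C//DTD XHTML 1.1//EN"
              uri="file://#{dtd_dir}/xhtml11.dtd"/>
      <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml11/DTD/"
                     rewritePrefix="file://#{dtd_dir}/"/>
    </catalog>
  XML
end

puts xhtml11_catalog(LOCAL_DTD_DIR)
```

libxml2 reads /etc/xml/catalog by default, or the files named in the XML_CATALOG_FILES environment variable, so the output has to end up in, or be referenced from, one of those.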
Installing the DTD does not automatically turn on validation. If you
want to validate you need to turn it on:
XML::Parser::default_validity_checking = TRUE
XML::Parser::default_load_external_dtd controls the loading of the
‘external subset’ (the definitions of character entities such as
&nbsp;). It defaults to TRUE.
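XML::Parser::default_load_external_dtd is broken: it tests the wrong
libxml variable, and its getter and setter are registered the wrong way
round. This patch fixes it: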
Index: ruby_xml_parser.c
RCS file: /var/cvs/xml-tools/libxml-ruby/ruby_xml_parser.c,v
retrieving revision 1.1.1.1
diff -r1.1.1.1 ruby_xml_parser.c
274c274
< if (xmlSubstituteEntitiesDefaultValue)
---
> if (xmlLoadExtDtdDefaultValue)
916c916
< ruby_xml_parser_default_load_external_dtd_set, 0);
---
> ruby_xml_parser_default_load_external_dtd_get, 0);
918c918
< ruby_xml_parser_default_load_external_dtd_get, 1);
---
> ruby_xml_parser_default_load_external_dtd_set, 1);
Sam’s patches for libxml are also needed:
http://www.intertwingly.net/blog/2005/11/05/Patch-for-libxml2s-Ruby-binding