Forum: Ruby Parsing xhtml with libxml

Jon S. (Guest)
on 2005-12-17 01:19
(Received via mailing list)
If you get errors complaining of undefined entities like &nbsp; when
parsing xhtml it means you need to install the DTD for xhtml 1.0 or
1.1.
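
For context: XML itself predefines only five entities (&amp;, &lt;, &gt;, &apos; and &quot;). Anything else, such as &nbsp;, has to be declared by the DTD. Below is a rough sketch in plain Ruby, with no libxml involved, that lists the named entities in a document which depend on the DTD being available. The method name entities_needing_dtd is made up for illustration.

```ruby
# The five entities every XML parser knows without a DTD.
PREDEFINED_ENTITIES = %w[amp lt gt apos quot].freeze

# Returns the named entity references in +xml+ that are NOT predefined,
# i.e. the ones a parser can only resolve once the DTD has been loaded.
# Numeric references such as &#160; are skipped; they never need a DTD.
# This is a naive regex scan, so it also looks inside comments and CDATA.
def entities_needing_dtd(xml)
  xml.scan(/&([A-Za-z_][\w.-]*);/).flatten.uniq - PREDEFINED_ENTITIES
end

sample = '<p>Tom &amp; Jerry&nbsp;&copy; 2005 &#160;&nbsp;</p>'
p entities_needing_dtd(sample)  # prints ["nbsp", "copy"]
```

If this returns an empty list for your documents, the missing DTD is probably not what is causing the errors.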

Example of a doctype for xhtml 1.1:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

You want to install the DTDs locally following the model in /etc/xml.
If you don't, libxml will fetch the DTD from www.w3.org each time you
parse a document. Needing to install these DTDs was not obvious to me
and should be part of the documentation. There is an rpm for xhtml 1.0,
"xhtml1-dtds-1.0-7". I couldn't find one for xhtml 1.1 so I downloaded
it piecemeal from w3.org.
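
For anyone unsure what "the model in /etc/xml" refers to: libxml2 consults an XML catalog (by default /etc/xml/catalog) to map a doctype's public identifier to a local file. Here is a small Ruby sketch that prints such a catalog. The local path /usr/share/xml/xhtml11/xhtml11.dtd and the helper names are assumptions, so adjust them to wherever you actually unpacked the DTD.

```ruby
# Escape the characters that may not appear raw inside an XML attribute value.
def xml_attr(value)
  value.gsub('&', '&amp;').gsub('<', '&lt;').gsub('"', '&quot;')
end

# Build a minimal OASIS XML catalog mapping one public identifier to a local DTD.
def catalog_for(public_id, local_uri)
  <<~XML
    <?xml version="1.0"?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
      <public publicId="#{xml_attr(public_id)}"
              uri="#{xml_attr(local_uri)}"/>
    </catalog>
  XML
end

# Hypothetical install location; use the directory you copied the DTD to.
catalog = catalog_for('-//W3C//DTD XHTML 1.1//EN',
                      'file:///usr/share/xml/xhtml11/xhtml11.dtd')
puts catalog
```

Saving the output to a file and pointing the XML_CATALOG_FILES environment variable at it should let you try this out without touching the system catalog.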

Installing the DTD does not automatically turn on validation. If you
want to validate you need to turn it on:
XML::Parser::default_validity_checking = true

XML::Parser::default_load_external_dtd controls the loading of the
'external subset' (the definitions for character entities like
&nbsp;). It defaults to true.

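XML::Parser::default_load_external_dtd is broken: the getter tests
xmlSubstituteEntitiesDefaultValue instead of xmlLoadExtDtdDefaultValue,
and the getter and setter are registered the wrong way round. This
patch fixes it.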

Index: ruby_xml_parser.c
==========================================================
RCS file: /var/cvs/xml-tools/libxml-ruby/ruby_xml_parser.c,v
retrieving revision 1.1.1.1
diff -r1.1.1.1 ruby_xml_parser.c
274c274
<   if (xmlSubstituteEntitiesDefaultValue)
---
>   if (xmlLoadExtDtdDefaultValue)
916c916
<                            ruby_xml_parser_default_load_external_dtd_set, 0);
---
>                            ruby_xml_parser_default_load_external_dtd_get, 0);
918c918
<                            ruby_xml_parser_default_load_external_dtd_get, 1);
---
>                            ruby_xml_parser_default_load_external_dtd_set, 1);


Sam's patches for libxml are also needed:
http://www.intertwingly.net/blog/2005/11/05/Patch-...
Eero S. (Guest)
on 2005-12-17 01:42
Jon S. wrote:
> If you get errors complaining of undefined entities like &nbsp; when
> parsing xhtml it means you need to install the DTD for xhtml 1.0 or
> 1.1.
>
> Example of a doctype for xhtml 1.1:
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
>
> <snip explanation & code due to ruby-forum.com />
>
> Sam's patches for libxml are also needed:
> http://www.intertwingly.net/blog/2005/11/05/Patch-...

Thank you for this!



E
--
This document is NOT valid XHTML 1.0!
Ross B. (Guest)
on 2005-12-17 02:13
(Received via mailing list)
On Fri, 16 Dec 2005 23:18:54 -0000, Jon S. 
<removed_email_address@domain.invalid>
wrote:

> If you get errors complaining of undefined entities like &nbsp; when
> parsing xhtml it means you need to install the DTD for xhtml 1.0 or
> 1.1.
>

Thanks for that. I've been gathering up problems and patches in a quiet
sort of way, but I'm not sure at the moment what's happening with the
project. I'm planning to get proactive this week and see if we can at
least get these issues sorted and the patches I have (including yours
and Sam's) in.

Thanks,
Ross