I have a Ruby/Rails script that receives XML data passed in as a parameter.
The data is packaged up “remotely” and I have no control over what it
contains, other than that the HTML entities in the element contents have
been encoded as text. The problem I’m having is that libxml is choking on
some characters. The instance that brought the problem to my attention is
a string containing ASCII character 11 (the vertical tab control character,
which some code pages display as ♂).
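
A small diagnostic can confirm which control characters are actually in the string. The sketch below is plain Ruby with no libxml involved, and the helper name `find_control_chars` is hypothetical. It lists every C0 control character other than tab, newline and carriage return (the only three that XML 1.0 allows), along with its index and code point.

```ruby
# Hypothetical diagnostic helper: reports every C0 control character
# (U+0000..U+001F) except tab, LF and CR, as [index, code point] pairs.
def find_control_chars(str)
  hits = []
  str.each_char.with_index do |ch, i|
    cp = ch.ord
    next unless cp < 0x20                      # only C0 controls
    next if [0x09, 0x0A, 0x0D].include?(cp)    # tab, LF, CR are allowed in XML
    hits << [i, cp]
  end
  hits
end

sample = "Status: \v pending"  # "\v" is ASCII 11, the vertical tab
p find_control_chars(sample)   # => [[8, 11]]
```
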
I have searched around and found several sites that address the problem,
but none of them seems to do the “whole job”. Obviously I can strip out
this specific character, but libxml must have a clearly defined set of
valid characters it can handle. How do I process the string before passing
it to libxml so that it is guaranteed not to contain any characters libxml
can’t process, without overdoing it and stripping characters that are
actually valid?
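
XML 1.0 spells out the allowed set in its `Char` production: tab, newline, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. One possible approach is to delete everything outside that set before parsing. This sketch assumes libxml is parsing the data as XML 1.0 and that the incoming bytes are UTF-8. `strip_invalid_xml_chars` and `INVALID_XML_CHARS` are hypothetical names, and `String#scrub` requires Ruby 2.1 or later.

```ruby
# Characters allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This regex matches any character OUTSIDE that set.
INVALID_XML_CHARS = /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

# Hypothetical helper: returns a copy of +str+ containing only characters
# XML 1.0 permits. ASSUMPTION: the input bytes are UTF-8. Invalid byte
# sequences are dropped by String#scrub; disallowed characters are replaced
# with +replacement+ (removed by default).
def strip_invalid_xml_chars(str, replacement = "")
  utf8 = str.dup.force_encoding(Encoding::UTF_8).scrub("")
  utf8.gsub(INVALID_XML_CHARS, replacement)
end

raw = "Status:\v pending \u2642 ok\u0000"
clean = strip_invalid_xml_chars(raw)
# The vertical tab (ASCII 11) and NUL are removed; a real U+2642 "♂" is kept.
puts clean == "Status: pending \u2642 ok"  # prints "true"
```

Because the pattern is derived from the spec's own character set rather than from a hand-picked blacklist, it removes only what XML 1.0 forbids, so it should not "overdo it". The cleaned string would then be handed to libxml as before. This handles raw characters only: a numeric character reference such as `&#11;` is also illegal in XML 1.0 and would need separate handling.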