I have a Ruby on Rails script that receives XML data as a request parameter. The data is packaged remotely, and I have no control over its contents beyond knowing that the HTML entities in the element contents have been encoded as text. The problem is that libxml chokes on some characters. The instance that brought this to my attention is a string containing the ♂ character (ASCII 11). I have searched around and found several sites that address the problem, but none seems to do the whole job. I can obviously strip out this specific character, but libxml must have a clearly defined set of characters it can handle. How do I process the string before passing it to libxml so that it is guaranteed to contain no characters libxml can't process, without stripping more than necessary?
on 2008-10-23 15:35
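
To make the question concrete, here is a minimal sketch of the kind of pre-filter being asked about. It assumes that the "clearly defined set" is the `Char` production of the XML 1.0 specification (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF), that the incoming string is meant to be UTF-8, and that Ruby 2.1 or later is available for `String#scrub`. The names `INVALID_XML_1_0_CHARS` and `strip_invalid_xml_chars` are hypothetical, chosen only for illustration.

```ruby
# Matches any character NOT permitted by the XML 1.0 "Char" production.
# Assumption: the parser rejects exactly the characters outside this set.
INVALID_XML_1_0_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

# Hypothetical helper: returns a copy of +text+ with malformed UTF-8 byte
# sequences and XML-1.0-invalid characters removed. Everything else,
# including legitimate non-ASCII text, is left untouched.
def strip_invalid_xml_chars(text)
  text.scrub("").gsub(INVALID_XML_1_0_CHARS, "")
end

xml = "<name>Bob\u000B &amp; Alice</name>"  # contains ASCII 11 (vertical tab)
puts strip_invalid_xml_chars(xml)           # prints "<name>Bob &amp; Alice</name>"
```

This only removes literal characters. A numeric character reference to a forbidden code point, such as `&#11;`, is also rejected by XML 1.0 parsers and would need separate handling.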