I’ve written a small script to parse an xml doc with SaxParser and
everything goes well until the parser encounters a Unicode character.
For example, in the following snippet:
In case it doesn’t come through correctly, the “’” character above is an
apostrophe, represented as <80><99> when I view the xml with less.
When the on_characters method is called for the string “90’s Music”, the
buffer only contains “90”, with no error or warning being presented.
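
One possibility worth ruling out: SAX parsers are generally allowed to deliver a single run of text across several on_characters calls, so a handler that assigns to its buffer on each call, instead of appending, would keep only one piece. The sketch below is not the script from this post. TextCollector and the "title" element name are hypothetical, and the split after "90" is simulated by hand, not observed from libxml. Only the callback names follow libxml-ruby's SAX conventions.

```ruby
# Hypothetical handler: accumulates character data per element instead of
# overwriting it. Callback names mirror libxml-ruby's SAX callbacks.
class TextCollector
  attr_reader :texts

  def initialize
    @texts = []
    @buffer = nil
  end

  def on_start_element(name, attributes)
    @buffer = +""                 # fresh, mutable buffer for this element
  end

  def on_characters(chars)
    @buffer << chars if @buffer   # append each chunk; never overwrite
  end

  def on_end_element(name)
    @texts << @buffer if @buffer  # the element's text is complete only here
    @buffer = nil
  end
end

# Simulated callbacks (assumption: the parser splits the text after "90").
handler = TextCollector.new
handler.on_start_element("title", {})
handler.on_characters("90")
handler.on_characters("’s Music")
handler.on_end_element("title")
puts handler.texts.first
```

If the real handler sees a second on_characters call carrying the rest of the string, the text is arriving in pieces and not being lost, and appending as above would be enough.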
After this is encountered, parsing continues normally; the first I saw of
the bug was when I noticed some of my strings being truncated. Is there
some setting of libxml or ruby that I’ve overlooked to cause this