I’ve tried a few XML parsers such as xml-simple, libxml and quixml, but
all reject this data as badly formed. One answer would, of course, be
for the data to be regenerated as well-formed XML. Meanwhile, is
there anything that can be done with the existing files? Or is it a case
of having to write regexps to parse this sort of thing?
Note that there should be no closing tag to match the first line: the
line at the top is a declaration, not an opening tag. Where did that
closing tag come from? What happens if you remove it from the data?
Good point about the XML. Unfortunately, these are the files I have
received, and I have to deal with them for now.
Removing the final tag gives:
.file.xml:3: parser error : Extra content at the end of the document
<server_name>myserver.edu</server_name>
^
rake aborted!
You should have done two things: 1. add a root node around the content
(with its closing tag just before the trailing tag) AND 2. remove the
trailing tag.
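As a rough sketch of those two steps in Ruby, using plain string handling: the helper name repair_xml, the root element name "records" and the stray tag "</xml>" are all assumptions made for illustration, since the thread does not show the real names, and the sample data is invented apart from the server_name line.

```ruby
# Hypothetical helper: makes a rootless file well-formed by wrapping its
# content in a single root element and dropping a stray closing tag.
# ROOT and STRAY_TAG are assumed names; adjust them to the real files.
ROOT = "records"
STRAY_TAG = "</xml>"

def repair_xml(text, root: ROOT, stray_tag: STRAY_TAG)
  body = text.dup
  # Pull off the <?xml ... ?> declaration so it can stay on the first line.
  declaration = body.slice!(/\A\s*<\?xml.*?\?>\s*/m).to_s.strip
  # Step 2: remove the trailing stray closing tag, if there is one.
  body.sub!(/#{Regexp.escape(stray_tag)}\s*\z/, "")
  # Step 1: wrap what is left in a root element.
  parts = []
  parts << declaration unless declaration.empty?
  parts << "<#{root}>" << body.strip << "</#{root}>"
  parts.join("\n") + "\n"
end

# Invented sample in the shape described in this thread.
sample = <<~XML
  <?xml version="1.0"?>
  <server_name>myserver.edu</server_name>
  <server_port>8080</server_port>
  </xml>
XML

puts repair_xml(sample)
```

The repaired string can then be handed to whichever parser you prefer. This reads the whole file into memory, so it suits small and medium files only.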
Great, thanks.
That should sort out the “legacy” files, and future ones can be
corrected.
I have also been parsing each line with IO.foreach and
/<(.+)[^>]*>(.+?)<(\/.+)>/ (the slash has to be escaped inside a regexp
literal), which, though not as nice as a proper XML parser, does avoid
loading huge files into memory in one go.
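For anyone wanting to try the line-by-line approach, here is a minimal sketch. It is not the exact regexp above but a slightly tightened variant: it assumes one simple element per line and uses a backreference so the closing tag has to match the opening name. The names LINE_ELEMENT and each_element are made up for this example.

```ruby
# Matches one simple element on a line: <name attrs>value</name>.
# The \1 backreference requires the closing tag to repeat the opening name.
# %r{...} is used so the slash does not need escaping.
LINE_ELEMENT = %r{<([^\s<>/]+)[^>]*>(.+?)</\1>}

# Yields [name, value] for each matching line of the file at +path+.
# File.foreach reads one line at a time, so the file is never fully loaded.
def each_element(path)
  return enum_for(:each_element, path) unless block_given?

  File.foreach(path) do |line|
    match = LINE_ELEMENT.match(line)
    # Lines that do not hold a complete element (the declaration, stray
    # closing tags, blank lines) are skipped silently.
    yield match[1], match[2] if match
  end
end
```

Usage would be along the lines of each_element("file.xml") { |name, value| puts "#{name}: #{value}" }. Nested elements and values that span several lines are not handled; those really do need a proper parser.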