Ruby method to strip out XML codes?

I am trying to process an XML file that includes various codes. The
problem I am running into is that some of these codes are inserted in
the middle of an encrypted string. If I display the file in a browser,
these codes do not show up, and copying and pasting the string works
fine. The problem occurs when I try to extract the string in a program:
these “extraneous” XML codes are included, which of course makes the
decryption routine crash.

What I am looking for is a simple way to read through the file and
remove all the XML codes, leaving just plain text. I could probably
write a series of regular expressions to remove each code that I can
find in my text, but I am afraid I might miss some and it will come
back to haunt me later.

On Dec 5, 6:13 pm, “Michael W. Ryder”
wrote:

str.gsub(/<\/?[^>]+>/, '')
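A minimal, self-contained version of that one-liner, wrapped in a hypothetical `strip_tags` helper (the name is only for illustration). It removes anything shaped like `<...>` and nothing else, so entity and character references such as `&amp;` are left untouched:

```ruby
# Naive tag stripper: deletes every run that starts with "<" and ends at
# the next ">", i.e. opening, closing and self-closing tags.
# Entity/character references (&amp;, &#xA;, ...) are NOT touched.
def strip_tags(str)
  str.gsub(/<\/?[^>]+>/, '')
end

puts strip_tags("<field>ab<br/>cd</field>")
```

Because the pattern runs from the first `<` to the next `>`, a whole `<![CDATA[...]]>` section is matched as one "tag" and its content disappears along with it, which is one more reason to prefer a real parser for arbitrary input.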

This will only be a problem if your XML file is legal and has a CDATA
section containing a literal < character (not &lt;), like:

for ( var i=0, len=a.length; i<len; ++i )

In that case you probably want to use a proper XML parser (like
REXML).
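A sketch of the parser route using REXML from Ruby's standard distribution. The helper names `xml_plain_text` and `collect_text` are hypothetical; the code walks the element tree and concatenates the decoded value of every text and CDATA node, skipping comments and processing instructions:

```ruby
require 'rexml/document'

# Return the plain-text content of an XML document: every text node is
# decoded (&lt; becomes "<"), CDATA content is taken literally.
def xml_plain_text(xml)
  doc = REXML::Document.new(xml)
  collect_text(doc.root)
end

def collect_text(node)
  node.children.map do |child|
    case child
    when REXML::Text    then child.value          # REXML::CData is a subclass of REXML::Text
    when REXML::Element then collect_text(child)  # recurse into nested elements
    else ''                                       # comments, processing instructions, ...
    end
  end.join
end

puts xml_plain_text("<a>x &lt; y<b><![CDATA[i<len]]></b><!-- note --></a>")
```

Unlike the regex, this treats `&lt;` and a literal `<` inside CDATA correctly, at the cost of requiring well-formed input.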

Do you really want to remove the XML, or would it suffice to just:

str.gsub! '&', '&amp;'
str.gsub! '<', '&lt;'
str.gsub! '>', '&gt;'
# (and maybe even)
str.gsub! '"', '&quot;'
str.gsub! "'", '&#39;'

to make your string valid and escaped for use in an HTML context?
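If escaping is all that is needed, that chain of `gsub!` calls can be collapsed into a single pass with `String#gsub` and a replacement hash, so `&` no longer has to be handled first. This is a sketch; `escape_html` and `HTML_ESCAPES` are hypothetical names. Ruby's standard library also provides `CGI.escapeHTML`, which performs the same five substitutions:

```ruby
# Each special character maps to its HTML entity. Because every character
# is replaced exactly once, "&" is never double-escaped, regardless of order.
HTML_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&#39;'
}.freeze

def escape_html(str)
  str.gsub(/[&<>"']/, HTML_ESCAPES)
end

puts escape_html(%q{a < b & "c" 'd'})
```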

Phrogz wrote:

My problem is that the XML file includes a string of codes in the
middle of a couple of fields, especially the encrypted fields. If I just
extract the encrypted field and try to decrypt it, the program crashes
because the key is invalid. I have to remove the “bad” character strings
before sending the field to my decryption program. I would prefer to do
this removal before sending the file to my programs so that I don’t have
to deal with these codes.
I assume that the string I am seeing is XML’s way of saying CR/LF, as
0D 0A in hex is CR/LF, and the output in a browser shows the field being
broken at that point. The problem is that these are only the ones I have
noticed, and there may be others hiding in the data. The XML file is
being parsed for conversion to our accounts.
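A sketch of two ways to clean such a field, under loud assumptions: the field is a hypothetical `<encrypted>` element, the stray codes are numeric character references such as `&#xD;&#xA;` (CR LF; the sample data below is made up), and the genuine encrypted payload never contains whitespace. `clean_field` lets REXML decode every reference and then drops all whitespace; `strip_char_refs` works on the raw file text before any parsing. Both names are hypothetical:

```ruby
require 'rexml/document'

# Parser route: REXML decodes &#xD;, &#xA;, &amp;, ... into real characters,
# then all whitespace (CR, LF, tabs, spaces) is removed from the value.
# ASSUMPTION: the genuine payload never contains whitespace.
def clean_field(xml, xpath)
  element = REXML::XPath.first(REXML::Document.new(xml), xpath)
  return nil unless element

  (element.text || '').gsub(/\s+/, '')
end

# Raw-text route: delete every numeric character reference, decimal (&#13;)
# or hex (&#xD;), before the file reaches other programs.
# CAUTION: this also drops references that were meant to stay, e.g. &#38;.
NUMERIC_CHAR_REF = /&#(?:x\h+|\d+);/

def strip_char_refs(raw)
  raw.gsub(NUMERIC_CHAR_REF, '')
end

xml = "<rec><encrypted>AbC1&#xD;&#xA;dEf2</encrypted></rec>"
puts clean_field(xml, "//encrypted")
puts strip_char_refs(xml)
```

The raw-text route is closer to cleaning the file before other programs see it, but it cannot tell noise from intended references; the parser route only touches the field you name.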
