Ruby method to strip out XML codes?

I am trying to process an XML file that includes various codes. The
problem I am running into is that some of these codes are inserted into
the middle of an encrypted string. If I display the file in a
browser these codes do not show up, and copying and pasting the string
works fine. The problem occurs when I try to extract the string in a
program and these “extraneous” XML codes are included. This of course
makes the decryption routine crash.
What I am looking for is a simple way to read through the file and
remove all the XML codes, leaving just plain text. I could probably
write a series of regular expressions to remove each code that I can
find in my text, but I am afraid I might miss some and that it will come
back to haunt me later.

On Dec 5, 6:13 pm, “Michael W. Ryder” wrote:

str.gsub(/<\/?[^>]+>/, '')

This will only be a problem if your XML file is legal and has a CDATA
section containing a literal < character (not &lt;), like:

for ( var i=0, len=a.length; i<len; ++i )

In that case you likely want a proper XML parser (like REXML) and to
use it.
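As a rough sketch of the parser route, the snippet below runs REXML over a small made-up document (the element names `root`, `script`, and `note` are hypothetical, not from the original file). It assumes REXML is available, which is the case for a standard Ruby install.

```ruby
require 'rexml/document'

# Hypothetical sample: a CDATA section holding a literal '<',
# plus an ordinary element with an escaped '<'.
xml = <<~XML
  <root>
    <script><![CDATA[for ( var i=0, len=a.length; i<len; ++i )]]></script>
    <note>a &lt; b</note>
  </root>
XML

doc = REXML::Document.new(xml)

# Element#text returns the decoded content of the element's first text
# node; a CDATA section counts as a text node and is returned verbatim.
script_text = doc.root.elements['script'].text
note_text   = doc.root.elements['note'].text

# For comparison, the tag-stripping regex treats everything from
# "<![CDATA[" up to the closing "]]>" as one tag and removes it.
stripped = xml.gsub(/<\/?[^>]+>/, '')

puts script_text
puts note_text
```

With the parser, the CDATA content survives intact and `&lt;` comes back as `<`; with the regex, the loop text is lost entirely.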

Do you really want to remove the XML, or would it suffice to just:

str.gsub! '&', '&amp;'
str.gsub! '<', '&lt;'
str.gsub! '>', '&gt;'
(and maybe even)
str.gsub! '"', '&quot;'
str.gsub! "'", '&#39;'

to make your string valid and escaped for use in an HTML context?
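If escaping really is all that is needed, the standard library can do the same five substitutions (&amp;, &lt;, &gt;, &quot;, and the apostrophe) in one call. This is only a sketch using `CGI.escapeHTML`; the `raw` string is an invented example.

```ruby
require 'cgi'

# Invented input containing all five characters that get escaped.
raw = %q{if (a < b && c > d) puts "it's true"}

# CGI.escapeHTML replaces &, <, >, " and ' with entity references.
# It handles '&' correctly, so already-inserted entities are not
# double-processed within a single call.
escaped = CGI.escapeHTML(raw)

puts escaped
```

Doing it by hand with gsub! works too, as long as the `&` replacement runs first; otherwise the ampersands introduced by the later replacements would themselves be escaped.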

Phrogz wrote:

My problem is that the XML file includes &#xD;&#xA; in the middle of a
couple of fields, especially in the encrypted fields. If I just extract
the encrypted field and try to decrypt it, the program crashes because the
key is invalid. I have to remove the “bad” character strings before
sending it to my decryption program. I would prefer to do this removal
before sending the file to my programs so that I don’t have to deal with
these codes.
I assume that the string I am seeing is XML’s way of saying CR/LF, since
D and A in hex are CR and LF, and the output in a browser shows the field
being broken at that point. The problem is that these are only the ones I
have noticed, and there may be others hiding in the data. The XML file is
being parsed for conversion to our accounts.
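
One way to handle this without listing every code by hand is to decode all numeric character references generically and then drop whitespace from the field. The sketch below assumes the encrypted value is something like Base64 or hex, where whitespace is never significant; the sample `field` value is made up.

```ruby
# Made-up field value with CR/LF character references embedded in it.
field = 'QUJDREVG&#xD;&#xA;R0hJSktM'

# Decode any numeric character reference, hex (&#xD;) or decimal (&#13;),
# into the character it stands for.
decoded = field.gsub(/&#(?:x([0-9a-fA-F]+)|(\d+));/) do
  hex = Regexp.last_match(1)
  dec = Regexp.last_match(2)
  code = hex ? hex.hex : dec.to_i
  [code].pack('U')
end

# Assumption: whitespace is not part of the encrypted payload,
# so CR, LF, tabs and spaces can all be removed.
clean = decoded.gsub(/\s+/, '')

puts clean
```

Because the pattern matches every numeric reference rather than a fixed list, codes that have not been noticed yet are decoded as well. A real XML parser such as REXML performs the same decoding when it returns an element's text.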