REXML difficulties

Hello all, just wondering whether someone else has observed this.

I am using REXML and at times, it seems to have difficulties “seeing” a
closing tag. I am wrapping XML-escaped binary data in XML and as a
result there might be a lot of special characters between tags. I
assume some of those special characters are causing problems…

I added ‘\n’ chars at the end of the binary stream, which seemed to
help, but not completely solve the problem.

Anybody else observed this or has suggestions on how to overcome this
problem?

Christian

[email protected] wrote:

Hello all, just wondering whether someone else has observed this.

I am using REXML and at times, it seems to have difficulties “seeing” a
closing tag. I am wrapping XML-escaped binary data in XML and as a
result there might be a lot of special characters between tags. I
assume some of those special characters are causing problems…

What do you mean by “XML escaped”?


James B.

http://www.ruby-doc.org - Ruby Help & Documentation
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.rubyaz.org - Hacking in the Desert

[email protected] wrote:

Anybody else observed this or has suggestions on how to overcome this
problem?

Show us the problem. There are many kinds of character sequences that
are not allowed in XML data fields, and there are a number of ways to
escape the data fields, but they have to be applied in order to work.
Arbitrary data can’t simply be dropped between XML delimiters without
certain precautions being taken.

On 10/21/06, Paul L. [email protected] wrote:

http://www.arachnoid.com

Hi Paul,

It sounds like you might have some experience in this area. Not to
hijack the OP, but could you possibly describe the process you would
go through if you had a completely random pile of binary barf that you
wanted to store as an XML attribute?

Would your process include using Base64?

Also, let’s pretend that small size is desirable, but time spent
zipping is unacceptable.

Thanks in advance for any insight,
-Harold

Harold H. wrote:

It sounds like you might have some experience in this area. Not to
hijack the OP, but could you possibly describe the process you would
go through if you had a completely random pile of binary barf that you
wanted to store as an XML attribute?

Would your process include using Base64?

“Base64 encoding, specified in RFC 2045 - MIME (Multipurpose Internet
Mail Extensions) uses a 64-character subset (A-Za-z0-9+/) to represent
binary data and = for padding. Base64 processes data as 24-bit groups,
mapping this data to four encoded characters. It is sometimes referred
to as 3-to-4 encoding. Each 6 bits of the 24-bit group is used as an
index into a mapping table (the base64 alphabet) to obtain a character
for the encoded data. According to the MIME specification the encoded
data has line lengths limited to 76 characters, but this line length
restriction does not apply when transmitting binary data as part of XML
document.”

It’s a common, practical approach.
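
A minimal sketch of that approach with REXML, for illustration only. The
element name "blob", the "encoding" attribute and the two helper methods are
made up for this example; pack('m0') is plain Ruby and produces Base64
without line breaks.

```ruby
require 'rexml/document'

# Encode raw bytes as Base64 and store them as the text of an element.
# "blob" and the "encoding" attribute are hypothetical names.
def wrap_binary(bytes)
  doc = REXML::Document.new
  blob = doc.add_element('blob')
  blob.add_attribute('encoding', 'base64')
  # pack('m0') emits Base64 with no line feeds.
  blob.text = [bytes].pack('m0')
  doc.to_s
end

# Parse the XML again and decode the element text back into raw bytes.
def unwrap_binary(xml)
  encoded = REXML::Document.new(xml).root.text
  encoded.unpack1('m0')
end

# Every possible byte value, plus a CDATA terminator for good measure.
sample = (0..255).to_a.pack('C*') + ']]>'
xml = wrap_binary(sample)
puts unwrap_binary(xml) == sample
```

Because the encoded text only uses A-Z, a-z, 0-9, +, / and =, nothing in it
can be mistaken for markup, whatever the original bytes were.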


James B.

“Every object obscures another object.”
- Luis Bunuel

This is something that I personally use for a Ruby routine I have that
stores stock item images for the retail jewelry company I work for. I
extract JPG images and store them as Base64-encoded elements in an XML
file. Then I port that into a SQL database. To extract the images I
just decode them. Works like a charm and perhaps saves some SQL
resources since I’m not storing the images as actual BLOB items…

I was just calling REXML::Text.normalize… I guess that is not
sufficient. I’ll give Base64 encoding a try.

Thanks for your help.
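
For anyone following along, a small sketch of why normalize alone is probably
not enough here. As far as I can tell it only replaces markup characters with
entities and leaves control bytes untouched, and most of those are not legal
in XML 1.0 at all. The helper name escape_markup is made up for this example.

```ruby
require 'rexml/document'

# Thin wrapper around the call mentioned above (hypothetical helper name).
def escape_markup(raw)
  REXML::Text.normalize(raw)
end

# Markup characters followed by two control bytes (NUL and 0x01).
raw = "<tag>&\x00\x01"
escaped = escape_markup(raw)

# Markup characters are turned into entities ...
puts escaped.include?('&lt;')
# ... but the control bytes are still present in the result.
puts escaped.include?("\x00")
```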

Christian

On 10/21/06, Paul L. [email protected] wrote:

wanted to store as an XML attribute?

You should realize that another, possibly better, approach for truly large
binary globs is to store them as files, and store links to the files in the
XML data set, rather than the raw data itself.

Thanks for this insight.

It’s funny, to me, how laziness has become a defense mechanism. I
think I personally kind of like it. (:

Storing the binary as a separate file is a great solution. In our
particular case we like to have the data in one big XML file for the
purposes of source control. I’m sure I don’t need to expound on the
greatness of plain text on the Ruby list, but the source control
system we use doesn’t play exceptionally well with binary files.

Thanks again,
-Harold

Harold H. wrote:

Paul L.
http://www.arachnoid.com

Hi Paul,

It sounds like you might have some experience in this area. Not to
hijack the OP, but could you possibly describe the process you would
go through if you had a completely random pile of binary barf that you
wanted to store as an XML attribute?

Okay, you need to know I am famously lazy. In fact, I think Larry Wall
was describing me when he made his well-known remark about programmer
laziness and hubris. Being lazy, the first simple approach I would take
is to enclose the binary data like this:

<enclosing XML tag>
<![CDATA[
(binary data)
]]>
</enclosing XML tag>

The next step would be to make sure neither the starting nor the ending
CDATA tag appears in the enclosed binary data, otherwise this strategy
will fail.

The next step after that is to escape (and later unescape) the binary
data if needed to assure the uniqueness of the delimiters.

You need to understand that, with a sufficiently large and varied binary
data set, every imaginable character string will appear in the data,
eventually including the delimiters.

This, in turn, means that escaping the data is eventually a requirement,
and escaping the data means it will be larger than if this step were not
needed.
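
A rough sketch of the delimiter-escaping step, assuming the usual trick of
splitting any "]]>" in the payload across two adjacent CDATA sections. The
method names are made up for this example. Note that in XML only the closing
delimiter can end a section, and that this does nothing about bytes that are
not legal XML characters (NUL, for instance), which would still need an
encoding such as Base64.

```ruby
CDATA_OPEN  = '<![CDATA['
CDATA_CLOSE = ']]>'

# Wrap text in a CDATA section. Any "]]>" inside the payload is split across
# two adjacent sections so it can no longer terminate the first one.
def cdata_wrap(payload)
  safe = payload.gsub(CDATA_CLOSE, ']]' + CDATA_CLOSE + CDATA_OPEN + '>')
  CDATA_OPEN + safe + CDATA_CLOSE
end

# Naive inverse for checking the result: join the contents of all sections.
def cdata_unwrap(wrapped)
  wrapped.scan(/<!\[CDATA\[(.*?)\]\]>/m).flatten.join
end

wrapped = cdata_wrap('a]]>b')
puts wrapped
puts cdata_unwrap(wrapped) == 'a]]>b'
```

The wrapped form is longer than the input by the length of the extra
delimiters, which is the size cost mentioned above.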

You should realize that another, possibly better, approach for truly
large binary globs is to store them as files, and store links to the
files in the XML data set, rather than the raw data itself.

Harold H. wrote:

/ …

Storing the binary as a separate file is a great solution. In our
particular case we like to have the data in one big XML file for the
purposes of source control. I’m sure I don’t need to expound on the
greatness of plain text on the Ruby list,

Or anywhere else IMHO. It’s the ultimate in reusability and portability.

but the source control
system we use doesn’t play exceptionally well with binary files.

At its base, this problem is one of statistics. The longer a pure-binary
data block becomes, the more likely that there will be an appearance of
the character sequence required to terminate the block. And if the
obvious solution is applied, that of using some coding that cannot
deviate from a safe syntax (like hexadecimal ASCII characters), the
block becomes at least twice as large as the original, seriously cutting
into storage and time efficiency.
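
To put rough numbers on both points, a back-of-the-envelope sketch. It assumes
uniformly random bytes and a 3-byte terminator such as "]]>"; the method names
are made up for this example. Base64 is included for comparison with
hexadecimal.

```ruby
# Encoded size in characters for n input bytes.
def hex_size(n)
  n * 2 # two hex digits per byte
end

def base64_size(n)
  ((n + 2) / 3) * 4 # four characters per 3-byte group, padded
end

# Expected number of times one particular 3-byte sequence shows up
# in n uniformly random bytes.
def expected_delimiter_hits(n)
  (n - 2) / (256.0**3)
end

data = Random.new(42).bytes(3_000)
puts data.unpack1('H*').size   # 6000
puts [data].pack('m0').size    # 4000
puts expected_delimiter_hits(16 * 1024 * 1024).round(2)
```

So hexadecimal doubles the size, Base64 adds about a third, and under these
assumptions a random block of around 16 MB is expected to contain the
terminator about once.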