Disabling XML character escaping for to_xml


#1

Currently, it appears to_xml will automatically escape any entities
into their corresponding &XXX representation. There’s a piece in the
documentation that says “If $KCODE is set to u and encoding set to
UTF8, then escaping will NOT be performed.”

Unfortunately, this doesn’t appear to be the case. Even after
following the docs and ensuring that default_charset is indeed UTF-8
(actually the default for Rails nowadays), we still get encoded
characters in to_xml output.

Since our client is UTF-8 aware, we need to pass thru the UTF-8 data
intact. The only way we’ve found to do this is thru the following
horrible monkey-patch:

module Builder
  class XmlBase
    # Return the text untouched, bypassing Builder's escaping entirely.
    def _escape(text)
      text
    end
  end
end

What’s the proper way to do this?

Thanks,
Nate


#2

I had the same issue, but eventually putting

$KCODE = 'UTF8'

in my config/environment.rb solved the issue.

Greetings,

Wouter


#3

Just deployed to a production server, but it doesn’t work there,
although the rails version is the same. Maybe it’s the ruby version
(1.8.7 locally and 1.8.6 on the server)


#4

I have the same issue.
$KCODE = 'UTF8' is the default, but I set it anyway in environment.rb.
This didn’t solve my problem. I applied the patch and it worked.
It’s not the ideal solution, but it gets the job done :)
I’ve tried the multibyte chars thing and it didn’t work either.

May the source be with you


#5

Any word on if this is fixed in Edge/Rails 2.2?

Cheers,
Walter


#6

Actually, the monkey patch solution sort of sucks. It turns off ALL
escaping, not just the escaping of UTF-8 characters to entities.

So this is fine:

<dc:description>mâori</dc:description>

but this is not:

dc:description

âçîôû

 

The HTML tags SHOULD be escaped, while the Unicode characters
shouldn’t be. My workaround will simply be to strip out the embedded
HTML, but this is a problem that people should be aware of when using
the monkey patch.
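
To make the difference concrete, here is a minimal sketch in plain Ruby that does not load Builder at all. Both method names are hypothetical: patched_escape stands in for the monkey-patched _escape, and default_like_escape is only a rough approximation of the stock behaviour (markup escaped, non-ASCII characters turned into numeric entities), not Builder's actual code.

```ruby
# Stand-in for the monkey-patched Builder::XmlBase#_escape:
# the text comes back untouched, markup included.
def patched_escape(text)
  text
end

# Rough approximation (an assumption, not Builder's real implementation)
# of the default behaviour: escape &, < and >, and replace every
# non-ASCII code point with a numeric character reference.
def default_like_escape(text)
  text.unpack('U*').map do |code_point|
    case code_point
    when 38 then '&amp;'
    when 60 then '&lt;'
    when 62 then '&gt;'
    else code_point < 128 ? code_point.chr : "&##{code_point};"
    end
  end.join
end

puts patched_escape('<p>text</p>')       # markup passes straight through
puts default_like_escape('<p>text</p>')  # markup is escaped
```

Neither result is what we want for UTF-8 content with embedded HTML: the first leaves the tags live, the second also converts the accented characters.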

Cheers,
Walter


#7

On Nov 11, 10:16 pm, mcginniwa removed_email_address@domain.invalid wrote:

The HTML tags SHOULD be escaped, while the Unicode characters
shouldn’t be. My workaround will simply be to strip out the embedded
HTML, but this is a problem that people should be aware of when using
the monkey patch.

Many moons ago I overrode the String#to_xs method that Builder adds, so
that it escapes just the vitals (i.e. & < > ' ") instead of all the
extra stuff it does.
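
Something along these lines, as a rough sketch from memory rather than the exact code. It assumes the Builder version in use routes its escaping through String#to_xs; XML_VITALS and escape_vitals are names made up for this example, and the splat argument is there only because some Builder versions pass a flag to to_xs.

```ruby
# Hypothetical table of the five characters XML actually requires
# to be escaped; everything else is left as-is.
XML_VITALS = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&apos;'
}.freeze

# Escape only the vitals, leaving multibyte UTF-8 characters untouched.
def escape_vitals(text)
  text.to_s.gsub(/[&<>"']/) { |char| XML_VITALS[char] }
end

class String
  # Assumption: Builder calls to_xs on text nodes and attribute values.
  # Any arguments Builder passes are ignored in this sketch.
  def to_xs(*_ignored)
    escape_vitals(self)
  end
end

puts '<b>bold</b> & "quoted"'.to_xs
```

Unlike the _escape patch, this still escapes embedded tags, so HTML inside an element comes out as text rather than markup.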

Fred


#8

Yeah, I ended up doing that basically, but in some specific helpers. My
coworker refined it though, using the htmlentities plugin. You can see
it here:
http://github.com/kete/kete/tree/master/lib/oai_dc_helpers.rb#L135

Long term we may do this for all the XML values, not just our
dc:description element. So it might move up to monkey patching Builder
or some more general spot.

Cheers,
Walter

On Wed, Nov 12, 2008 at 1:20 PM, Frederick C. <