Disabling XML character escaping for to_xml

nwiger · October 14, 2008, 11:01pm

Currently, it appears to_xml will automatically escape characters into
their corresponding &XXX; entity representations. There’s a piece in the
documentation that says “If $KCODE is set to u and encoding set to
UTF8, then escaping will NOT be performed.”

Unfortunately, this doesn’t appear to be the case. Even after
following the docs and ensuring that default_charset is indeed UTF-8
(actually the default for Rails nowadays), we still get encoded
characters in to_xml output.
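To make the symptom concrete, here is a simplified, standalone model of the escaping involved. It assumes the Builder 2.x library that Rails vendored at the time, which maps `&`, `<` and `>` to named entities and turns every non-ASCII character into a decimal numeric reference such as `&#226;`. This is not Builder's actual code: the method name `builder_style_escape` is hypothetical, and it ignores the special cases Builder handles, such as invalid XML characters.

```ruby
# Hypothetical helper: a simplified model of how Builder 2.x escapes text.
# Markup characters become named entities; anything outside ASCII becomes
# a decimal numeric character reference (e.g. "â" -> "&#226;").
PREDEFINED = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def builder_style_escape(text)
  text.each_char.map do |ch|
    if PREDEFINED.key?(ch)
      PREDEFINED[ch]              # XML-special character: named entity
    elsif ch.ord > 127
      "&##{ch.ord};"              # non-ASCII: numeric character reference
    else
      ch                          # plain ASCII passes through
    end
  end.join
end

puts builder_style_escape("mâori & <b>")
# prints: m&#226;ori &amp; &lt;b&gt;
```

The numeric references are valid XML, but a UTF-8 aware client then receives `m&#226;ori` instead of the raw `mâori` bytes.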

Since our client is UTF-8 aware, we need to pass the UTF-8 data through
intact. The only way we’ve found to do this is the following horrible
monkey-patch:

    module Builder
      class XmlBase
        def _escape(text)
          text
        end
      end
    end

What’s the proper way to do this?

Thanks,
Nate

nwiger · October 16, 2008, 7:12pm

I had the same issue, but eventually putting

    $KCODE = 'UTF8'

in my config/environment.rb solved it.

Greetings,

Wouter

nwiger · October 16, 2008, 9:24pm

I just deployed to a production server, but it doesn’t work there,
although the Rails version is the same. Maybe it’s the Ruby version
(1.8.7 locally and 1.8.6 on the server).

nwiger · October 21, 2008, 11:29am

I have the same issue. $KCODE is 'UTF8' by default, but I set it in
environment.rb anyway. That didn’t solve my problem; I applied the patch
and it worked. It’s not the ideal solution, but it gets the job done.
I’ve tried the multibyte chars thing and it didn’t work either.

May the source be with you

nwiger · November 11, 2008, 11:10pm

Any word on whether this is fixed in Edge/Rails 2.2?

Cheers,
Walter

nwiger · November 11, 2008, 11:17pm

Actually, the monkey-patch solution sort of sucks. It turns off ALL
escaping, not just the UTF-8-to-entity escaping.

So this is fine:

    <dc:description>mâori</dc:description>

but this is not, because the description also contains embedded HTML tags:

    <dc:description>
    âçîôû
    </dc:description>

The HTML tags SHOULD be escaped, while the Unicode characters
shouldn’t be. My workaround will simply be to strip out the embedded
HTML, but this is a problem that people should be aware of when using
the monkey patch.
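The strip-the-HTML workaround mentioned above could look roughly like the sketch below. The helper name `strip_embedded_html` is hypothetical, and the regular expression is deliberately naive: it removes anything shaped like `<...>` and does not handle comments, CDATA, or a `>` inside an attribute value, so a proper HTML sanitizer would be more robust.

```ruby
# Hypothetical helper sketching the "strip out the embedded HTML" workaround.
# The regex is intentionally naive: it deletes every <...> sequence and
# leaves the text (including non-ASCII characters) untouched.
def strip_embedded_html(text)
  text.gsub(/<[^>]*>/, '')
end

description = "<p><em>âçîôû</em></p>"
puts strip_embedded_html(description)
# prints: âçîôû
```

With the non-escaping monkey patch still in place, a bare `&` left in the text would also go out unescaped, so stripping tags narrows the problem rather than removing it.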

Cheers,
Walter

nwiger · November 12, 2008, 1:21am

On Nov 11, 10:16 pm, mcginniwa wrote:

> The HTML tags SHOULD be escaped, while the Unicode characters
> shouldn’t be. My workaround will simply be to strip out the embedded
> HTML, but this is a problem that people should be aware of when using
> the monkey patch.

Many moons ago I overrode the String#to_xs method that Builder adds so
that it escapes just the vitals (i.e. & < > ' ") instead of all the
extra stuff it does.
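A minimal sketch of that kind of override, assuming the no-argument `String#to_xs` that Builder 2.x adds (later Builder releases changed this method, so check your version before patching it). This is an illustration, not Fred's actual code: it escapes only `&`, `<`, `>`, `'` and `"`, and passes every other character, including non-ASCII UTF-8 text, through unchanged. The constant name `XML_VITALS` is hypothetical.

```ruby
# Assumption: Builder 2.x calls String#to_xs (no arguments) to escape text.
# Reopening String replaces that method so that only the five XML-special
# characters are escaped and UTF-8 text such as "mâori" passes through intact.
class String
  XML_VITALS = {
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    "'" => '&apos;',
    '"' => '&quot;'
  }.freeze

  def to_xs
    # Hash form of gsub: each matched character is replaced by its entity.
    gsub(/[&<>'"]/, XML_VITALS)
  end
end

puts '<b>mâori & "âçîôû"</b>'.to_xs
# prints: &lt;b&gt;mâori &amp; &quot;âçîôû&quot;&lt;/b&gt;
```

Because the last definition of a Ruby method wins, a patch like this has to be loaded after Builder itself, otherwise Builder's own `to_xs` would replace it again.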

Fred

nwiger · November 14, 2008, 5:17am

Yeah, I basically ended up doing that, but in some specific helpers. My
coworker refined it, though, using the htmlentities plugin. You can see
it here:

github.com/kete/kete/blob/master/lib/oai_dc_helpers.rb#L135

    def oai_dc_xml_dc_title(xml, options = {})
      xml.send('dc:title', title, options)
    end

    def oai_dc_xml_dc_publisher(xml, publisher = nil)
      # this website is the publisher by default
      if publisher.nil?
        xml.send('dc:publisher', simulated_request[:host])
      else
        xml.send('dc:publisher', publisher)
      end
    end

    def oai_dc_xml_dc_description(xml, passed_description = nil, options = {})

Long term, we may do this for all the XML values, not just our
dc:description element, so it might move up into a Builder monkey patch
or some more general spot.

Cheers,
Walter

On Wed, Nov 12, 2008 at 1:20 PM, Frederick C. <