ActiveRecord to_json encoding

Question:
Hi, our company is using Ruby 1.8.6 with Rails 2.2.2. Does anyone
know how we can explicitly specify what encoding to use when calling
.to_json on an ActiveRecord?

Description:
We have some multibyte characters in our database. For example, we
have a table with a name column that contains a French accented e: Café
Records. When we serialize this object using ActiveRecord’s to_xml()
everything looks fine in the browser and with our json objects.

When we render JSON using to_json() we are seeing problems where the
accented ‘e’ character is getting mangled and causes our calling web
client to fail since it’s expecting properly UTF-8 encoded characters.

If we use the browser to submit an HTTP GET request for the JSON format,
save the response to a file, and view it in a hex editor, this is what
we get. It looks like this is using extended ASCII.

Bytes             Text
43 61 66 E9       C a f (should be an accented e, but we get an unprintable block character)

If we save that same file from above and convert it to UTF-8, we get an
extra byte that seems to be proper UTF-8 encoding as shown below.

Bytes             Text
43 61 66 C3 A9    C a f é (we correctly get the accented e)
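
The two hex dumps above differ only in how the accented e is encoded. The following is a minimal sketch of that difference, assuming the mangled byte is ISO-8859-1 (Windows-1252 maps 0xE9 to the same character). It uses String#force_encoding and String#encode, which need Ruby 1.9 or later; on Ruby 1.8.6 the closest counterpart would be Iconv.conv("UTF-8", "ISO-8859-1", str). The hex_dump helper is made up for illustration.

```ruby
# The byte sequence from the first hex dump: "Caf" followed by 0xE9.
# Assumption: these bytes are ISO-8859-1 (Latin-1).
latin1_bytes = [0x43, 0x61, 0x66, 0xE9]

# Build a string from the raw bytes and label it with its assumed encoding.
latin1 = latin1_bytes.pack("C*").force_encoding("ISO-8859-1")

# Transcode to UTF-8; the single byte E9 becomes the two bytes C3 A9.
utf8 = latin1.encode("UTF-8")

# Hypothetical helper: render a string's bytes as space-separated hex.
def hex_dump(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

puts hex_dump(latin1) # 43 61 66 E9
puts hex_dump(utf8)   # 43 61 66 C3 A9
```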

Can someone tell me how they’ve made to_json() UTF-8 compliant?
Thanks in advance, Calvin.

On Aug 10, 10:02 pm, Calvin N. wrote:

When we render JSON using to_json() we are seeing problems where the
accented ‘e’ character is getting mangled and causes our calling web
client to fail since it’s expecting properly UTF-8 encoded characters.

What’s actually stored in the database? If you open up a console and
find the relevant object in the database, what does the name attribute
contain? I don’t think that to_json does much more than spit out the
data ActiveRecord already has.

Fred

Frederick C. wrote:

What’s actually stored in the database? If you open up a console and
find the relevant object in the database, what does the name attribute
contain? I don’t think that to_json does much more than spit out the
data ActiveRecord already has.

Fred

Here is our record in the database:
322 Café Records NULL

While debugging, we set up a web client to stream the bytes, and the
result matches what I summarized previously:
Bytes             Text
43 61 66 E9       C a f (should be an accented e, but we get an unprintable block character)

On Aug 10, 11:55 pm, Calvin N. wrote:

Here is our record in the database:
322 Café Records NULL

That’s not what I meant. What are the actual bytes stored in the
database? What encoding does the database think this column is in?
If, in a Ruby console, you inspect the bytes contained in the name
column, what do you see?

Fred

Hi Fred,
I appreciate the reply. We are using SQL Server 2005 and our database
record looks like this:

name            cast(name as varbinary)
Café Records    0x436166E9205265636F726473
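
The varbinary value above can be decoded to check how many bytes each character occupies. This is only a sketch: the hex string is copied from the query result, and the variable names are illustrative. One byte per character, with the accented e stored as 0xE9, suggests a single-byte code page such as Latin-1 or Windows-1252 rather than UTF-8 (which would store C3 A9) or UCS-2 (two bytes per character).

```ruby
# Hex digits from the varbinary cast above, without the leading "0x".
hex = "436166E9205265636F726473"

# pack("H*") turns each pair of hex digits back into one raw byte.
raw = [hex].pack("H*")

puts raw.length          # 12 bytes for the 12 characters of "Café Records"
puts raw.unpack("C*")[3] # 233 (0xE9): a single byte for the accented e
```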

If I use the Ruby console and print each byte, I get this. There must be
a better way to show the byte stream in Ruby 1.8.6 than this…

l.name[0]
=> 67
l.name[1]
=> 97
l.name[2]
=> 102
l.name[3]
=> 233
l.name[4]
=> 32
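
One shorter way to get the same view as the console session above is String#unpack, which is available in Ruby 1.8.6 as well as later versions. In this sketch, name is a stand-in for l.name, built from the byte values printed above. Note that on Ruby 1.9 and later, l.name[0] returns a one-character string rather than an integer, so unpack is also the more portable choice.

```ruby
# Stand-in for l.name; the bytes match the console output above.
name = [67, 97, 102, 233, 32].pack("C*")

# "C*" reads every byte as an unsigned 8-bit integer.
p name.unpack("C*")

# "H*" renders the whole string as one hex string, high nibble first.
puts name.unpack("H*").first
```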

The database table defines the name column as varchar(255). I cannot
find the column-level encoding, but I read
(Microsoft SQL Server 2005: A Beginner's Guide - Dusan Petkovic - Google Books)
that SQL Server 2005 uses Unicode by default (it doesn’t say which
Unicode encoding, though). I believe that when you specify a column as
nvarchar, it defaults to UCS-2?
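
If the varchar column really does hand back single-byte text, one possible workaround is to transcode string values to UTF-8 before serializing. The thread does not confirm this as the fix, so treat the following as a sketch under stated assumptions: the source encoding is guessed to be Windows-1252, utf8_attributes is a hypothetical helper, and row stands in for a record's attributes hash. It relies on String#encode (Ruby 1.9 or later); on Ruby 1.8.6 the transcoding step would instead use Iconv.conv("UTF-8", "WINDOWS-1252", value).

```ruby
require "json"

# Hypothetical helper: transcode every string value in an attributes hash
# from an assumed source encoding to UTF-8, leaving other values untouched.
def utf8_attributes(attrs, source_encoding = "Windows-1252")
  attrs.each_with_object({}) do |(key, value), out|
    out[key] = if value.is_a?(String)
      # dup so the caller's string is not relabelled in place
      value.dup.force_encoding(source_encoding).encode("UTF-8")
    else
      value
    end
  end
end

# Stand-in for record.attributes; the name bytes match the varbinary dump.
row = { "id" => 322, "name" => ["436166E9205265636F726473"].pack("H*") }

json = JSON.generate(utf8_attributes(row))
puts json
```

A longer-term alternative would be to make the database connection itself return UTF-8, but how to do that depends on the SQL Server adapter in use, which the thread does not name.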