Working with Rails and Unicode


#1

I’m trying to get basic Unicode support working using the “Iteration
A1” sample application from the “Agile Web Development with Rails”
book.

Following the “HowToUseUnicodeStrings” wiki document, I have made the
following changes:

config/environment.rb:

# Include your application configuration below
$KCODE = 'u'
require 'jcode'

admin.rhtml:

database.yml:
development:
  adapter: mysql
  encoding: utf8

application.rb:
class ApplicationController < ActionController::Base
  before_filter :set_charset
  after_filter :fix_unicode_for_safari

  def set_charset
    @headers["Content-Type"] = "text/html; charset=utf-8"
  end

  # automatically and transparently fixes utf-8 bug
  # with Safari when using xmlhttp
  def fix_unicode_for_safari
    if @headers["Content-Type"] == "text/html; charset=utf-8" and
       @request.env['HTTP_USER_AGENT'].to_s.include? 'AppleWebKit' and
       request.xhr?
      @response.body = @response.body.gsub(/([^\x00-\xa0])/u) { |s|
        "&#x%x;" % $1.unpack('U')[0] }
    end
  end
end
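
To make the effect of the gsub in fix_unicode_for_safari concrete, here is a minimal standalone sketch of the same escaping in plain Ruby. The method name escape_above_a0 is hypothetical and is not part of the filter above. It walks the string character by character instead of using the original regexp (which relies on the $KCODE setup), but the intent should be the same: every character above U+00A0 becomes a hexadecimal numeric character reference.

```ruby
# Hypothetical helper (not from the original post): replaces every character
# above U+00A0 with a hexadecimal numeric character reference, mirroring the
# gsub in fix_unicode_for_safari.
def escape_above_a0(html)
  html.each_char.map { |ch|
    ch.ord > 0xA0 ? format('&#x%x;', ch.ord) : ch
  }.join
end

# "caf\u00e9" is "café"; "\u4e2d\u6587" is two Chinese characters.
puts escape_above_a0("caf\u00e9 \u4e2d\u6587")
```

Plain ASCII input passes through unchanged, so the filter only touches responses that actually contain non-ASCII text.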

And finally create.sql:
) ENGINE=MyISAM DEFAULT CHARSET=utf8

The last step was not mentioned in the wiki guide, but was
nevertheless required in my testing.

So, having performed the above steps, I can now successfully view,
add, and edit entries with Unicode text (try, for example, inserting
Chinese characters).

I run into problems, however, when I try to import existing Unicode
data into the MySQL table. I have a UTF-8 encoded SQL query file that
inserts some Unicode records into the table. When I try to view these
new records with the admin interface of the sample application, I get
garbage instead of the correct Unicode characters. Editing a record,
pasting in the correct Unicode strings, and updating again (all
through the interface) works correctly.

At this point I’m not sure what is causing the problem. It seems
that any data handled outside of the Rails application is treated
incorrectly once inside Rails.

Has anyone had any experience with this?

Cheers, M


#2

martin aatmaa wrote:

> And finally create.sql:
> ) ENGINE=MyISAM DEFAULT CHARSET=utf8

This step isn’t necessary if you set the default character set for your
schema to utf8:

CREATE DATABASE mydatabase DEFAULT CHARSET utf8

Now every table you create will use utf8 unless you specify otherwise.

> I run into problems, however, when I try to import existing unicode
> data into the mysql table. I have a utf8 encoded sql query file that
> inserts some unicode records into the table. When I try to view these
> new records with the admin interface of the sample application, I get
> garbage instead of the correct unicode characters. Editing a record
> and pasting the correct unicode strings and updating again (all
> through the interface) works correctly.

My guess is that your data is not being inserted correctly by your
“utf8 encoded sql”. If you view the data within the MySQL Query
Browser, does it appear as you would expect? I suspect that you’ll see
the same garbled data that you see in the admin interface.

The character set handling within MySQL is complicated, to say the least.
It took me quite a lot of reading the documentation and experimenting to
work it out. If your problem is what I think it is, then MySQL is
treating your utf8 encoded SQL as if it were ISO-8859-1 encoded (latin1
in MySQL terms). You can tell it that it’s utf8 by adding this line to
the top of your SQL:

SET NAMES utf8;

It’s worth reading the MySQL documentation about character sets
carefully. The way that it works isn’t obvious (to me, at least!).
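
For anyone wondering what that kind of mix-up looks like, here is a small Ruby sketch that simulates it outside of MySQL. It is only an illustration under assumptions: the method names are hypothetical, and ISO-8859-1 stands in for whatever character set the server or client actually assumed. The UTF-8 bytes are reinterpreted one byte per character, which is roughly how the garbage arises.

```ruby
# Hypothetical illustration (not MySQL itself): reinterpret UTF-8 bytes as
# ISO-8859-1, then transcode to UTF-8 again, as a misconfigured connection
# effectively does.
def misread_as_latin1(utf8_text)
  utf8_text.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Reverses the damage, as long as nothing was lost along the way.
def repair_misread(garbled)
  garbled.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
end

original = "\u4e2d\u6587"            # two Chinese characters, six UTF-8 bytes
garbled  = misread_as_latin1(original)

puts original.length                 # character count of the real text
puts garbled.length                  # one "character" per original byte
puts repair_misread(garbled) == original
```

The point of the sketch is that the bytes themselves are usually intact; it is the label attached to them that is wrong, which is why telling MySQL the right character set up front avoids the problem.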

Hope this is some help,

paul.butcher->msgCount++



#3

Paul, thank you for the response.

Your suggestion that it is a MySQL problem turned out to be correct. It
seems that I am using the command-line mysql client incorrectly: its
“source” command somehow misreads my SQL script (located in a file).

For the time being I run the script from mySQLFront, which handles the
data correctly.
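
One way to apply the SET NAMES suggestion from #2 without hand-editing each script is to prepend the statement programmatically. The following is a rough Ruby sketch; the method name with_set_names and the sample INSERT statement are hypothetical, and whether this resolves the “source” behaviour described above was not confirmed in this thread.

```ruby
# Hypothetical helper: make sure a SQL script starts with SET NAMES utf8;
# so the client declares the script's encoding before any data is sent.
SET_NAMES_LINE = "SET NAMES utf8;\n"

def with_set_names(sql)
  # Leave the script alone if it already begins with a SET NAMES statement.
  return sql if sql.match?(/\A\s*SET\s+NAMES\b/i)
  SET_NAMES_LINE + sql
end

# Illustrative table and column names only.
script = "INSERT INTO products (title) VALUES ('\u4e2d\u6587');\n"
puts with_set_names(script)
```

Calling the helper on a script that already starts with SET NAMES returns it unchanged, so it is safe to run more than once over the same file contents.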

Cheers,
Martin