Forum: Ruby on Rails The dreaded Unicode issue

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Dave H. (Guest)
on 2007-01-02 19:04
(Received via mailing list)
Hi all,

I'm trying to store the following in a mysql DB but am having no luck:

1. Pär
2. Björn
3. gösta
(hopefully the encodings came out correct in this message)

I've looked into Ruby's Unicode library and have tried things such as:

Unicode.normalize_D(nickname)
Unicode.normalize_KD(nickname)
Unicode.normalize_C(nickname)
Unicode.normalize_KC(nickname)

All those methods either seem to strip out the European characters or
replace them with their English equivalents. All I want to do is store
the
text in its original form.

If I try to store the string as it currently is, I get question marks
showing up in all my views where the European characters are.


Does anyone have any ideas on how to accomplish such a thing?

Thanks!
-Dave H.
Rimantas L. (Guest)
on 2007-01-02 19:13
(Received via mailing list)
> I'm trying to store the following in a mysql DB but am having no luck:
>
> 1. Pär
> 2. Björn
> 3. gösta
<...>
> If I try to store the string as it currently is, I get question marks
> showing up in all my views where the European characters are.
> Does anyone have any ideas on how to accomplish such a thing?

It may well be, that your characters are ok in DB, but there is
different charset specified in
your HTTP headers. Check what headers server is sending (you can do it
with the help of LiveHTTPHeaders or Firebug extensions for Firefox).
Or the simplest test - check out, what
menu item is checked on View-Character Encoding menu in your browser.
If UTF is not checked, try to check it and see if characters are OK.
If yes, that means you have to fix HTTP headers. If no, then problems
may indeed be in the database.

Regards,
Rimantas
--
http://rimantas.com/
Dave H. (Guest)
on 2007-01-02 19:57
(Received via mailing list)
Thanks Rimantas,

I checked out the headers and they are sending:
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*,q=0.7

My database is using ISO Latin 1.

I failed to mention before that I'm loading a ruby script of users in a
text
file and executing it from the terminal.

If I were to enter Unicode characters through a form in my Rails view
then
everything works fine. "Bjorn" is properly saved as "Bj\303\266rn". It's
just loading the text file and looping through the records that seems to
be
the problem.

The important parts of the script:

$KCODE = 'u'
require 'jcode'
require 'unicode'

arr = IO.readlines(member_file)  # member_file is actually a text file
member = Member.new(nickname)
member.save

That obviously happens within a loop. I just wanted to uber-simplify it.

Still kind of confused...


-Dave
Richard C. (Guest)
on 2007-01-02 20:17
(Received via mailing list)
On 1/2/07, Dave H. <removed_email_address@domain.invalid> wrote:
> Thanks Rimantas,

> My database is using ISO Latin 1.

You might want to change that to UTF-8. Your accented unicode characters
(western european) use a single byte under latin-1 and 2 under utf-8.

Every link in the unicode chain (the round-trip) has to be fixed.
Rimantas L. (Guest)
on 2007-01-02 22:05
(Received via mailing list)
>
> I checked out the headers and they are sending:
> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*,q=0.7

These are sent by browser, how does server response look like?
Look for the 'Content-type' header.

>
> My database is using ISO Latin 1.
<...>

This may or may not be the problem. It is possible (but not advisable)
to keep utf data in
database tables with Latin 1 encoding.
However it would be better to convert your tables (and data) to UTF,
then make sure that you
have line 'encoding: utf8' in your config/database.yml

This part is confusing, so I'll try to describe some scenarios:
a) Database's encoding is set as Latin 1 (default) and there is no
encoding line in config/database.yml. This will work, at least data
can be saved/retrieved and displayed correctly
in the browser - given the HTTP headers are right.

b) Database's encoding is Latin 1, but database.yml sets encoding to
UTF. This will not work, because that line makes database client to
issue "SET NAMES UTF8" when connecting to the database. In this case
MySQL will try to convert data which it thinks is in Latin 1 to UTF.
Alas, data is already UTF, but wrong encoding was specified–what you
get is useless conversion and character junk.

c) Like b), but in this case database is set to UTF, and data is in
UTF, but there is no encoding line in database.yml. In this case
connection's encoding will be Latin 1 and MySQL will try to convert
data from UTF to Latin 1. This will mess things, assuming HTTP
headers/meta tag says that encoding is UTF.

So what would I do in your situation:
1) Try to change view-encoding to UTF-8 in the browser. If that fixes
how characters are displayed,  then, most likely, HTTP headers are
messed up.
2) Check 'Content-type' header of the server response. If it says
anything else than UTF - that must be fixed.
3) When headers are ok and page is displayed the way it should -
convert tables and data  in DB to UTF and then add encoding: utf8 to
config/database.yml
4) Enjoy.

Of course other scenarios are likely too, which would require
additional exploration.

Regards,
Rimantas
--
http://rimantas.com/
Dave H. (Guest)
on 2007-01-02 22:44
(Received via mailing list)
Thank you very much Rimantas for that explanation. That was just what I
needed. I will try your suggestions.

-Dave
Rob B. (Guest)
on 2007-01-07 22:57
(Received via mailing list)
On Jan 2, 2007, at 3:34 PM, Dave H. wrote:

> Look for the 'Content-type' header.
> have line 'encoding: utf8' in your config/database.yml
> MySQL will try to convert data which it thinks is in Latin 1 to UTF.
> 1) Try to change view-encoding to UTF-8 in the browser. If that fixes
> additional exploration.
>
> Regards,
> Rimantas
> --
> http://rimantas.com/
>

This response is someone late, but if you're using migrations, here's
one to put your MySql tables and database to UTF8:

class UseUtf8AsDefaultCharset < ActiveRecord::Migration
   def self.my_number_is
     say File.basename(__FILE__).to_i
   end

   def self.database_charset(charset='utf8')
     execute "ALTER DATABASE `#{current_database}` DEFAULT CHARACTER
SET #{charset}"
   end

   def self.table_charset(table, charset='utf8')
     execute "ALTER IGNORE TABLE `#{table}` CONVERT TO CHARACTER SET #
{charset}, DEFAULT CHARACTER SET #{charset}"
   end

   def self.up
     my_number_is
     database_charset('utf8')
     select_values("SHOW TABLES").each do |table|
       table_charset(table, 'utf8')
     end
   end

   def self.down
     my_number_is
     database_charset('latin1')
     select_values("SHOW TABLES").each do |table|
       table_charset(table, 'latin1')
     end
   end
end

Note that the down migration puts the tables back to their "original"
character set which, in my case, was latin1.

-Rob

Rob B.    http://agileconsultingllc.com
removed_email_address@domain.invalid
Dave H. (Guest)
on 2007-01-08 18:31
(Received via mailing list)
Thanks Rob! That will come in handy.

-Dave
This topic is locked and can not be replied to.