Forum: Ruby MySql+UTF8 woes

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
C667185722eaecf12461cb423dc8b410?d=identicon&s=25 Ronald Fischer (Guest)
on 2007-07-26 13:53
(Received via mailing list)
I am experiencing a strange problem when retrieving Unicode values from
my MySql5 database on Windows. Here is what I have:

* All programs/modules in my application require in the beginning a file
containing

    $KCODE='u'
    require 'jcode'

in the beginning, ensuring that "everything is done in Unicode".

* I checked that the MySql database is "as UTF-8 as possible", in
particular
 - in the table editor of the MySql Administrator, the character set
   for the rows is listed as utf8 (with sorting order utf8_general_ci)
 - The "start variables" tab of the Administrator says utf8 for the
default
   character set
 - mysql.ini says
    default-character-set=utf8
 - On the mysql command line, I see

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+


Indeed, the following works well:

    When I store (from Ruby, via 'mysql' gem) a UTF-8 string
    into the database, and later retrieve it, I get back exactly
    what I had put in. This works equally well, whether I type
    (using my UTF-8 enabled text editor) the character (for
    example, a German umlaut) directly into the Ruby string,
    which I then store, i.e. (table X having only one column)

      sqlstatement="INSERT INTO X VALUES ('ö')"
      dbhandle.query(sqlstatement)

    or whether I write the code point in hexadecimal:

      uml_o="\xc3\xb6" # ö, i.e. German umlaut-o in UTF-8 encoding
      sqlstatement="INSERT INTO X VALUES ('#{uml_o}')"
      dbhandle.query(sqlstatement)

   In both cases, when I retrieve the rows with
      dbhandle.query("SELECT * FROM X")
   I eventually get back the row with my "ö" as a UTF-8
   2-Byte-sequence C3 B6.

So far OK. But there are several things which do not work:

(1) When I look at the database table using the SQL Administrator,
    I do not see the letter 'ö', but instead the characters corresponding
    to the Windows ANSI code sequence for C3 and B6 characters.

(2) When I manually enter data into the database using MySql
    Administrator, by submitting the SQL statement
       INSERT INTO X VALUES
('ö')    the ö is displayed fine when looking at the table with a SELECT
    using the Administrator, but when I read the table from Ruby, I do
    not get back a 2 byte UTF-8 sequence for the umlaut, but only one
    byte with hex code F6. Note that this F6 corresponds to the ANSI
    code for ö in Windows.

(3) When I put the unicode data into a file, and use a Java application
    to read the file and write the data into the database (using JDBC),
and
    I look at the database then with MySql Administrator, I can see the
    German characters; and when I then read them back with my Ruby
    application, I get it encoded in Windows ANSI, not Unicode.

Observations (1) and (2) could be explained in the way that maybe the
MySql administrator interface was not written with UTF-8 in minde. What
surprises me more is that when writing data using Java (which is
inherently
Unicode aware) and reading it back with Ruby, causes somewhere a
conversion
from UTF-8 into Windows ANSI.

Does someone have experienced a similar problem and can point out a
possible
reason for it?


Ronald
This topic is locked and can not be replied to.