UTF8 String looks great in MySQL, garbage in browser

Hi,

I have some seed data in MySql which contains non-ASCII characters (as
you’d imagine, since these are worldwide placenames that I’m storing). In
MySql on the command line they are all formatted perfectly, including
the Arabic and Chinese characters.

However, when I ask my Rails model for one of these fields, and render
that in the browser, all the lovely work is undone, and I get garbage
character spew.

E.g.:

In the commandline, mySQL: “Чанчунь,长春” (Good!)
In the browser: “K’ua-ch’eng-tzu,Чанчунь,长春” (Bad!!)

So clearly this is an encoding issue, here’s my setup:

Table created via migration with “:options => "DEFAULT CHARSET=utf8"”
Database.yml contains “encoding: utf8”
HTML is served with the meta tag containing “charset=utf-8”

If I puts directly to the command line in the controller, I get the same
(bad) spew, so I don’t think it’s the browser screwing up. All I’m doing
to fetch the string is something like “@name = City.first.name”

Any ideas?

Thanks!

-Nex

What encoding is being set in the HTTP headers, if any?

What is the exact meta tag you are using?

Michael G. wrote:

What encoding is being set in the HTTP headers, if any?

What is the exact meta tag you are using?

The meta tag is:

I’m not sure how to inspect the HTTP headers as they’re returned from
Rails. Short of breaking out tcpdump, what’s the best way to see the
headers Rails is creating?

Thanks.

- Peter

continued…

I tried printing out the response object just before my controller falls
through to the view, and the header just has one value set:

{“Cache-Control”=>“no-cache”}

On Thu, Oct 22, 2009 at 2:03 PM, Peter L. wrote:

I’m not sure how to inspect the HTTP headers as they’re returned from
rails, without breaking out tcpdump what’s the best way to see the
header Rails is creating?

The Firebug plugin to Firefox works pretty well. Or just use telnet
from the command line.


Hassan S.
twitter: @hassan

Thanks,

Firebug reports the response header content-type as:

“Content-Type text/html; charset=utf-8”

On Thu, Oct 22, 2009 at 2:03 PM, Peter L. wrote:

I’m not sure how to inspect the HTTP headers as they’re returned from
rails, without breaking out tcpdump what’s the best way to see the
header Rails is creating?

The Firebug plugin to Firefox works pretty well. Or just use telnet
from the command line.

curl -I http://…/

is my favorite. Only prints back the headers the server sends and not
the body…
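
Once you have the header, the part that matters here is the charset parameter. A minimal Ruby sketch for pulling it out of a Content-Type value follows; the helper name `charset_from_content_type` is made up for illustration, and the sample value is the one Firebug reported in this thread.

```ruby
# Hypothetical helper (not part of Rails or Net::HTTP): extract the
# charset parameter from a Content-Type header value, or nil if absent.
def charset_from_content_type(value)
  return nil if value.nil?
  # Accept both charset=utf-8 and charset="utf-8", case-insensitively.
  match = value.match(/charset\s*=\s*"?([^";\s]+)"?/i)
  match && match[1].downcase
end

puts charset_from_content_type("text/html; charset=utf-8")
```

With Net::HTTP, the header value should be available as response["Content-Type"] on the response object.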

Peter L. wrote:
[…]

Firebug reports the response header content-type as:

“Content-Type text/html; charset=utf-8”

So your HTTP headers are OK and the database is in the right encoding.
Now for my stupid question: have you got the text encoding in the
browser set to something other than auto-detect or UTF-8?
Best,

Marnen Laibow-Koser
http://www.marnen.org

On Thu, Oct 22, 2009 at 2:26 PM, Peter L. wrote:

Firebug reports the response header content-type as:

“Content-Type text/html; charset=utf-8”

Excellent :)

Just to cover all the DB bases – see if everything is using utf-8,
e.g.

mysql> show variables like '%char%';
+--------------------------+-----------------------------------------------------------+
| Variable_name            | Value                                                     |
+--------------------------+-----------------------------------------------------------+
| character_set_client     | utf8                                                      |
| character_set_connection | utf8                                                      |
| character_set_database   | utf8                                                      |
| character_set_filesystem | binary                                                    |
| character_set_results    | utf8                                                      |
| character_set_server     | utf8                                                      |
| character_set_system     | utf8                                                      |
| character_sets_dir       | /usr/local/mysql-5.0.86-osx10.5-x86/share/mysql/charsets/ |
+--------------------------+-----------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> show variables like '%coll%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.01 sec)

mysql>
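
If you would rather check this from Ruby than by eye, here is a rough sketch that filters such rows for anything that is not utf8. The helper name `non_utf8_settings` and the sample rows are made up for illustration; character_set_filesystem and character_sets_dir are skipped because, as the output above shows, they are not expected to be utf8.

```ruby
# Settings that are not expected to read "utf8" even on a clean setup.
IGNORED_SETTINGS = ["character_set_filesystem", "character_sets_dir"]

# Hypothetical helper: given [name, value] rows as returned by
# SHOW VARIABLES, keep only the settings whose value is not utf8.
def non_utf8_settings(rows)
  rows.reject do |name, value|
    IGNORED_SETTINGS.include?(name) || value.to_s.start_with?("utf8")
  end
end

# Made-up sample rows, standing in for a misconfigured server.
rows = [
  ["character_set_client", "utf8"],
  ["character_set_server", "latin1"],
  ["character_set_filesystem", "binary"],
  ["collation_server", "latin1_swedish_ci"]
]

non_utf8_settings(rows).each { |name, value| puts "#{name} = #{value}" }
```

In a Rails app the rows could probably be fetched with ActiveRecord::Base.connection.select_rows("SHOW VARIABLES LIKE 'character_set%'"), though I have not tried that against this particular setup.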


Hassan S.
twitter: @hassan

On Thu, Oct 22, 2009 at 3:32 PM, Peter L. wrote:

Bingo! Thanks Hassan, most of my entries are ‘latin1’.

I’m Googling furiously for the answer, but is there a place where I can
set these values? My.cnf?

Yep, exactly, set them in your /etc/my.cnf and relaunch mysqld.


Hassan S.
twitter: @hassan

Bingo! Thanks Hassan, most of my entries are ‘latin1’.

However, I tried to set them with the set command (e.g., “set
character_set_results = utf8;”). That didn’t seem to have any effect, so
I restarted MySQL, and the settings went back to latin1 again.

I’m Googling furiously for the answer, but is there a place where I can
set these values? My.cnf?

Thanks!
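
For anyone reading along, here is a small Ruby sketch of one common way this kind of garbage arises: UTF-8 bytes get treated as latin1 somewhere along the way and are then converted to UTF-8 a second time. It assumes Ruby 2.0 or later (UTF-8 source files, strings that carry an encoding), uses ISO-8859-1 as a stand-in for MySQL's latin1, and is not a diagnosis of exactly what happened in this setup.

```ruby
# Assumption: Ruby 2.0+, so this literal is a UTF-8 string.
original = "长春"

# Relabel the UTF-8 bytes as ISO-8859-1 (no bytes change), then convert
# that "latin1" text to UTF-8, as a mislabelled connection might do.
garbled = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Undoing the two steps in reverse order recovers the original text.
recovered = garbled.encode("ISO-8859-1").force_encoding("UTF-8")

puts garbled.length          # one character per original byte
puts recovered == original
```

The garbled string has one character per byte of the original, which is why correctly stored text can balloon into a longer run of accented letters and control characters.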

Thanks a lot to Hassan, I have finally sorted this problem out properly.
There were a few more steps:

  1. I had to make sure that the right values for utf8 were set in my.cnf.
    Looking for the right set of values I found the following:

[mysqld]
init_connect='SET collation_connection = utf8_general_ci'
init_connect='SET NAMES utf8'
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake

  2. I learned that once created, each database, table, and column has a
    charset assigned, and these were not automatically changed by the
    edit to my.cnf. To complete the fix, I would have had to run:

alter database db_name charset=utf8;
alter table t_name charset=utf8;
AND a similar alter for the columns.

Instead I just dropped everything and recreated the databases (which
then picked up the new charset) and it worked, finally. Wow, that was an
interesting bit of setup!
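
For anyone who cannot simply drop and recreate, the ALTER route could be sketched roughly as below. This is not what I actually ran. The helper name `utf8_conversion_sql` is made up, db_name and t_name are placeholders, and it uses ALTER TABLE ... CONVERT TO CHARACTER SET, which changes the existing text columns as well as the table default. Note that CONVERT TO re-encodes the stored data, so if a latin1 column already holds UTF-8 bytes it may make things worse; take a backup first.

```ruby
# Hypothetical helper: build the SQL statements for switching one
# database and a list of its tables over to utf8. Nothing is executed.
def utf8_conversion_sql(database, tables)
  statements = [
    "ALTER DATABASE `#{database}` CHARACTER SET utf8 COLLATE utf8_general_ci;"
  ]
  tables.each do |table|
    # CONVERT TO also converts the table's existing text columns.
    statements << "ALTER TABLE `#{table}` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;"
  end
  statements
end

puts utf8_conversion_sql("db_name", ["t_name"])
```

The statements are only printed here, so they can be reviewed before being pasted into the mysql client.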

Thanks to all for the help.