Encoding problems with Rails 3 + Ruby 1.9.1 (big surprise)


#1

I have kind of an interesting problem.

I have a form wherein people enter information. Big surprise. If they
enter any “weird” characters like ø or é or whatever, the form will
submit and all is well. However, I have a select box for the state
which, if you’re looking at Spain, has states like A Coruña, Cádiz and
País Vasco. These are pulled from the database which is set to have
everything encoded in UTF-8. Everything we’re doing is in UTF-8.

However… when it renders the template IF someone used a non-ASCII
character in a field that appears BEFORE the select I get this error:

incompatible character encodings: ASCII-8BIT and UTF-8 (on the same
line as f.select :state)

If one of the fields AFTER the state field (like the postal code)
contains a non-ASCII character the error is reversed:

incompatible character encodings: UTF-8 and ASCII-8BIT (on the same
line as f.select :postal_code)
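
For anyone who wants to see the same error outside Rails, here is a minimal sketch in plain Ruby. It assumes the earlier field value arrives tagged as ASCII-8BIT (binary) while the database strings are UTF-8; the helper name concat_error is made up for illustration. The order of the two encodings in the message follows the order of the operands, which would explain why the error "reverses" depending on which field is rendered first.

```ruby
# UTF-8 bytes for "Cádiz", but labeled as binary (ASCII-8BIT),
# which is how a raw, untagged form parameter might look.
binary = "C\xC3\xA1diz".dup.force_encoding(Encoding::ASCII_8BIT)

# A properly tagged UTF-8 string, e.g. a state name from the database.
utf8 = "A Coru\u00F1a"

# Hypothetical helper: returns the error message, or nil if concatenation works.
def concat_error(left, right)
  left + right
  nil
rescue Encoding::CompatibilityError => e
  e.message
end

binary_first = concat_error(binary, utf8) # binary content comes first
utf8_first   = concat_error(utf8, binary) # UTF-8 content comes first

# Pure-ASCII binary strings are compatible with UTF-8, so this works,
# which is why the problem only shows up with non-ASCII input.
ascii_ok = "Madrid".dup.force_encoding(Encoding::ASCII_8BIT) + utf8

puts binary_first
puts utf8_first
puts ascii_ok.encoding
```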

The more I work with encodings in Rails and Ruby in general, the more
I find myself confused and frustrated. I added config.encoding =
Encoding::UTF_8 to my application.rb, but that doesn’t appear to
affect templates at all. The problem, so far as I can see, is in one
of two places:

I either need to tell Rack to make all my string parameters encoded in
UTF-8 or I need to set my template default encoding to UTF-8. A quick
fix is:

params[:form].each { |k, v| v.force_encoding('UTF-8') if v.is_a?(String) }

I know this is not ideal, but I don’t understand how the view works
well enough to do this better.
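
A slightly more general version of the quick fix above could walk nested hashes and arrays as well. This is only a sketch: force_utf8 is a hypothetical helper name, and it assumes the bytes really are UTF-8, because force_encoding only relabels a string and never converts it.

```ruby
# Hypothetical helper: relabel every string in a nested params-like
# structure as UTF-8 without changing any bytes.
def force_utf8(value)
  case value
  when String
    value.dup.force_encoding(Encoding::UTF_8)
  when Hash
    value.each_with_object({}) { |(k, v), out| out[k] = force_utf8(v) }
  when Array
    value.map { |v| force_utf8(v) }
  else
    value # numbers, nil, etc. pass through untouched
  end
end

# Simulated form parameters whose strings arrived tagged as binary.
raw = {
  "city"  => "C\xC3\xA1diz".dup.force_encoding(Encoding::ASCII_8BIT),
  "tags"  => ["Pa\xC3\xADs Vasco".dup.force_encoding(Encoding::ASCII_8BIT)],
  "count" => 3
}

fixed = force_utf8(raw)

# valid_encoding? is worth checking, since relabeling non-UTF-8 bytes
# would produce a string that is tagged UTF-8 but still broken.
all_valid = fixed["city"].valid_encoding? && fixed["tags"].all?(&:valid_encoding?)
```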

What should I do to fix this problem? (Oh, and I’m using ERB, as an
FYI.)


#2

On Thu, Jul 1, 2010 at 12:09 PM, cult hero removed_email_address@domain.invalid
wrote:

I know this is not ideal, but I don’t understand how the view works
well enough to do this better.

What should I do to fix this problem? (Oh, and I’m using ERB, as an
FYI.)

Could you try latest Rails master?

If you’re using the ‘mysql’ driver, please try mysql2 or ruby-mysql
instead.

jeremy


#3

cult hero wrote:

I have kind of an interesting problem.

No, this problem is boring and has been known since at least 2008. It
just bit my ass with Rails 2.3.8.

Here is a beautiful fix: http://redmine.ruby-lang.org/issues/show/1238

I don’t know exactly how to fix all this mess.


#4

It’s definitely not mysql related. I’m not using MySQL. I’m using
PostgreSQL and I’m using Sequel. All the strings coming from the
database are UTF-8.

And I saw A LOT about the magic comment, but where do I put it in a
template? And there’s no way to basically set a “default” magic
comment?


#5

And I saw A LOT about the magic comment, but where do I put it in a
template? And there’s no way to basically set a “default” magic
comment?

Same problem for me. I fixed the models, controllers and helpers by
adding the magic comments, but I don’t know how to fix the problem in
the view. Anyone?


#6

Oh, and I should add I’m using beta3.


#7

On Thu, Jul 1, 2010 at 12:09 PM, cult hero removed_email_address@domain.invalid
wrote:

I know this is not ideal, but I don’t understand how the view works
well enough to do this better.

What should I do to fix this problem? (Oh, and I’m using ERB, as an
FYI.)

Hi, I would recommend using Rails 3 Beta 4 and Ruby 1.9.2. This has
worked well for me for the last 4 months. Next, I would recommend using
the mysql2 gem if you’re using MySQL.

Good luck,

-Conrad


#8

Hi Conrad, thanks for the tip. Yeah I’m eagerly waiting for Rails 3 to
get released!

In the meantime I managed to make Rails 2.3.8 play nicely with Ruby
1.9.1, and that’s very painful to do. I would suggest that people stick
to Ruby 1.8 for some time until everything gets settled. 3rd party gems
also have to be updated to be compatible with the new 1.9 encoding
handling.


#9

Adding to environment.rb:

Encoding.default_external = Encoding::UTF_8

Helps fix a few problems until it explodes somewhere else.
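
To illustrate what that setting changes, here is a small sketch in plain Ruby. It assumes Encoding.default_internal is left unset (nil), and read_with_default_external is a made-up helper that restores the previous value afterwards. The same UTF-8 file comes back tagged differently depending on the default external encoding.

```ruby
require "tempfile"

# Hypothetical helper: read a file under a temporary default_external,
# then restore whatever was configured before.
def read_with_default_external(path, encoding)
  previous = Encoding.default_external
  Encoding.default_external = encoding
  File.read(path)
ensure
  Encoding.default_external = previous
end

# Write UTF-8 bytes for "País Vasco" to a scratch file.
file = Tempfile.new("encoding-demo")
file.binmode
file.write("Pa\u00EDs Vasco")
file.close

as_utf8  = read_with_default_external(file.path, Encoding::UTF_8)
as_ascii = read_with_default_external(file.path, Encoding::US_ASCII)
file.unlink

# Same bytes, different labels: only the UTF-8 one is a valid string.
puts "#{as_utf8.encoding}: valid=#{as_utf8.valid_encoding?}"
puts "#{as_ascii.encoding}: valid=#{as_ascii.valid_encoding?}"
```

The US-ASCII result is the kind of string that later blows up when it meets real UTF-8 text, which may be why the setting fixes some places and not others.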

Ruby 1.9 is a catastrophe!


#10

On Jul 3, 2010, at 11:52 AM, Fernando P. removed_email_address@domain.invalid
wrote:

also have to be updated to be compatible with the new 1.9 encoding
handling.

I have also been using Rails 2.3.5 and Ruby 1.9.2 for one project for
6+ months. It has been super simple to get everything working by using
RVM, which should also make it simple to migrate this project to
Rails 3. Also, if you’re using Ruby 1.9.1, then you’re definitely using
the wrong version of Ruby because it does have bugs. Furthermore, Ruby
1.9.2 is the first C Ruby version to pass 100% of the RubySpec. Last but
not least, Ruby 1.9.2 cleans up the Ruby syntax and provides a much
needed speed boost in production.

Good luck,

-Conrad


#11

Ruby 1.9.2 is not yet released; I’ll wait until it goes final to update
my FreeBSD port. Until then I’ll be running buggy 1.9.1.

My main problem was handling differently encoded strings. So I had to
add magic comments all over the place, and force_encoding of rdiscount’s
output, which is US-ASCII.

Moreover, my original language uses accented characters, so if you only
write English you might not have run into the same issues as me. But if
one of your users posts an accented char I guess your app will
explode. Have you tried?


#12

On Jul 3, 2010, at 11:52 AM, Fernando P. removed_email_address@domain.invalid
wrote:

also have to be updated to be compatible with the new 1.9 encoding
handling.

Many gems have been updated to support Ruby 1.9 and it should be super
simple to fix the ones that are not compatible. I had a very large code
base using a lot of gems and plugins. The ones that had associated
tests were much easier to fix in general. Getting up to speed with the
syntax and semantic changes made porting easiest for me as I worked
through the various issues. In short, you’ll have to make changes to
your code either now or later. Thus, I prefer to make incremental
improvements over time, for example moving to Ruby 1.9.2 first. Next, I
plan to move to Rails 3. I go in knowing that some things will not work
and will need to be fixed, which is a part of software engineering. Just
create another branch and just do it. :)

Good luck,

-Conrad


#13

On Sat, Jul 3, 2010 at 4:06 PM, Fernando P. removed_email_address@domain.invalid
wrote:

Ruby 1.9.2 is not yet released; I’ll wait until it goes final to update
my FreeBSD port. Until then I’ll be running buggy 1.9.1.

1.9.2 is currently in preview and I’m using it on several production
applications with great success. For me, it works better than 1.9.1.

My main problem was handling differently encoded strings. So I had to
add magic comments all over the place, and force_encoding of rdiscount’s
output, which is US-ASCII.

Moreover, my original language uses accented characters, so if you only
write English you might not have run into the same issues as me. But if
one of your users posts an accented char I guess your app will
explode. Have you tried?

The application that I’m working on supports German, Spanish, Russian,
Japanese, French, Portuguese, and Chinese.

Good luck,

-Conrad


#14

Hi, do you have a test case that I can run locally? I have done a lot of
work in this regard.

-Conrad


#15

The application that I’m working on supports German, Spanish, Russian,
Japanese, French, Portuguese, and Chinese.

Uh? Did you have to add plenty of magic comments to your files? Do you
need to force_encoding on certain strings such as those returned by
rdiscount or hpricot?

What changes did you make to your rails app when you moved from Ruby 1.8
to 1.9.x to avoid the dreaded US-ASCII conflict?


#16

On Sun, Jul 4, 2010 at 6:00 PM, Fernando P. removed_email_address@domain.invalid
wrote:

Last question, do you currently use Rails 3 or Rails 2 with Ruby 1.9?
It’s not clear from your previous posts. Thx

The current version (i.e. production) uses Rails 2.3.5 and Ruby 1.9.2,
and the development version uses Rails 3.0 beta 4 and Ruby 1.9.2.
Furthermore, both applications currently use the mysql2 Ruby gem.

Good luck,

-Conrad


#17

Last question, do you currently use Rails 3 or Rails 2 with Ruby 1.9?
It’s not clear from your previous posts. Thx


#18

On Sun, Jul 4, 2010 at 5:58 PM, Fernando P. removed_email_address@domain.invalid
wrote:

The application that I’m working on supports German, Spanish, Russian,
Japanese, French, Portuguese, and Chinese.

Uh? Did you have to add plenty of magic comments to your files? Do you
need to force_encoding on certain strings such as those returned by
rdiscount or hpricot?

I did not use either hpricot or rdiscount within our development or
production application. Next, I did not have to force encoding because
the underlying OS environment is UTF-8 by default. Thus, you need to
make sure that your external encoding (i.e. the encoding used for files)
and internal encoding (i.e. the encoding that’s used for the creation of
new strings) match up. BTW, I found this information when I did my
initial research on putting together a multilingual application using
Ruby/Rails. The default source encoding used in Ruby 1.9 is US-ASCII
(7-bit ASCII), which is the same as for Ruby 1.8. Did you read the
relevant chapters in “Programming Ruby 1.9”? If not, I would highly
recommend reading them because they provide a wealth of information.

What changes did you make to your rails app when you moved from Ruby 1.8
to 1.9.x to avoid the dreaded US-ASCII conflict?

I remember having to set the default internal encoding and I was good to
go. Thus, I had to do the following:

Encoding.default_internal = 'utf-8'

Next, I’m using HTML 5 technologies and the view templates are set to
utf-8 within the head tag.
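
For comparing two machines, a quick diagnostic along these lines can help. It is only a sketch, and encoding_report is a made-up helper name. As far as I understand it, default_external is the label Ruby puts on data read from files and sockets, and default_internal (when set) is the encoding that data gets transcoded to on read.

```ruby
# Hypothetical helper: collect the encoding settings that differ most
# often between a development machine and a server.
def encoding_report
  {
    source:           __ENCODING__,              # encoding of this source file
    default_external: Encoding.default_external, # label for data read from IO
    default_internal: Encoding.default_internal, # transcoding target, or nil
    locale:           Encoding.find("locale")    # derived from LANG / LC_* settings
  }
end

report = encoding_report
report.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Running this in both environments (for example from the console in development and production) should show whether the locale is what makes them behave differently.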

Good luck,

-Conrad


#19

Wait!

I noticed that on my FreeBSD box, the locale environment variables are
not set.

$ locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=

Also this problem does not happen in development mode!

$ ./script/console
Loading development environment (Rails 2.3.8)

puts Encoding.default_internal
UTF-8
=> nil

$ RAILS_ENV=production ./script/console
Loading production environment (Rails 2.3.8)
/usr/local/lib/ruby/gems/1.9/gems/activesupport-2.3.8/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:244:in
`read': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)

Any idea?
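
One possible reading of that error, sketched in plain Ruby: with a "C" locale the external encoding falls back to US-ASCII, and once an internal encoding of UTF-8 is set, Ruby tries to convert what it reads from US-ASCII to UTF-8. The first non-ASCII byte then fails the conversion. The example below reproduces the same class of error with String#encode instead of a file read, so it is an approximation of what happens inside the i18n backend, not a trace of it.

```ruby
# UTF-8 bytes for "café", mislabeled as US-ASCII, the way a file's
# content would be tagged under a "C" locale.
mislabeled = "caf\u00E9".dup.force_encoding(Encoding::US_ASCII)

# Converting (not relabeling) US-ASCII -> UTF-8 fails on byte 0xC3.
error = nil
begin
  mislabeled.encode(Encoding::UTF_8)
rescue Encoding::InvalidByteSequenceError => e
  error = e
end

# Relabeling instead of converting keeps the bytes and yields valid UTF-8.
relabeled = mislabeled.dup.force_encoding(Encoding::UTF_8)

puts error.message if error
puts relabeled.valid_encoding?
```

If this reading is right, setting a UTF-8 locale for the server process (or setting Encoding.default_external to UTF-8 early in boot) would be the thing to try, rather than default_internal alone.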


#20

Hi Conrad,

My files are all encoded in UTF-8 because I use TextMate and I double
checked on my server with:

$ file --mime-encoding app/views/layout/application.html.erb

My layout is defined with an HTML5 doctype and a UTF-8 charset meta
tag. I tested it with the W3C validator and it detects utf-8.

Now if I remove the magic comment <%# # -*- coding: UTF-8 -*- %> from
application.html.erb and put Encoding.default_internal = Encoding::UTF_8
at the top of environment.rb, I get the following error:

=> Booting WEBrick
=> Rails 2.3.8 application starting on http://0.0.0.0:3000
[gem_path/activesupport-2.3.8/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:244:in
`read': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)

It happens with Passenger as well.

So you’re telling me you managed to fix that? I don’t understand how to
do it; I have tried nearly every trick in the book.