Forum: Ruby on Rails Encoding problems with Rails 3 + Ruby 1.9.1 (big surprise)

Posted by cult hero (Guest)
on 2010-07-01 21:10
(Received via mailing list)
I have kind of an interesting problem.

I have a form wherein people enter information. Big surprise. If they
enter any "weird" characters like ø or é or whatever, the form will
submit and all is well. However, I have a select box for the state
which, if you're looking at Spain, has states like A Coruña, Cádiz and
País Vasco. These are pulled from the database which is set to have
everything encoded in UTF-8. Everything we're doing is in UTF-8.

However... when it renders the template IF someone used a non-ASCII
character in a field that appears BEFORE the select I get this error:

incompatible character encodings: ASCII-8BIT and UTF-8 (on the same
line as f.select :state)

If one of the fields AFTER the state field (like the postal code)
contains a non-ASCII character the error is reversed:

incompatible character encodings: UTF-8 and ASCII-8BIT (on the same
line as f.select :postal_code)

The more I work with encodings in Rails and Ruby in general, the more
I find myself confused and frustrated. I added config.encoding =
Encoding::UTF_8 to my application.rb, but that doesn't appear to
affect templates at all. The problem, so far as I can see, is in one
of two places:

I either need to tell Rack to make all my string parameters encoded in
UTF-8 or I need to set my template default encoding to UTF-8. A quick
fix is:

params[:form].each { |k, v| v.force_encoding('UTF-8') if v.is_a?(String) }

I know this is not ideal, but I don't understand how the view works
well enough to do this better.
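
For anyone following along, both the error and the quick fix above can be reproduced outside Rails. This is only a minimal sketch, assuming the form value arrives tagged ASCII-8BIT while the database value is tagged UTF-8; the constant and method names are made up for illustration.

```ruby
# encoding: UTF-8

# Bytes of "José" tagged as binary, as a form parameter might arrive (assumption).
FROM_FORM = "Jos\xC3\xA9".b
# A value tagged UTF-8, like the database strings described in this post.
FROM_DB = "A Coru\u00F1a"

# Concatenate two fields; report the exception class instead of raising.
def join_fields(a, b)
  a + ", " + b
rescue Encoding::CompatibilityError => e
  "error: #{e.class}"
end

# Relabel the bytes as UTF-8 without changing them (hypothetical helper).
def relabel_utf8(value)
  value.is_a?(String) ? value.dup.force_encoding(Encoding::UTF_8) : value
end

puts join_fields(FROM_FORM, FROM_DB)
puts join_fields(relabel_utf8(FROM_FORM), FROM_DB)
```

Note that force_encoding only changes the label on the string, not its bytes, so it is only safe when the bytes really are UTF-8.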

What should I do to fix this problem? (Oh, and I'm using ERB, as an
FYI.)
Posted by Jeremy Kemper (Guest)
on 2010-07-01 23:53
(Received via mailing list)
On Thu, Jul 1, 2010 at 12:09 PM, cult hero <binarypaladin@gmail.com> 
wrote:
> I know this is not ideal, but I don't understand how the view works
> well enough to do this better.
>
> What should I do to fix this problem? (Oh, and I'm using ERB, as an
> FYI.)

Could you try latest Rails master?

If you're using the 'mysql' driver, please try mysql2 or ruby-mysql 
instead.

jeremy
Posted by Fernando Perez (fernando)
on 2010-07-02 01:38
cult hero wrote:
> I have kind of an interesting problem.

No, this problem is boring and has been known since at least 2008. It just
bit my ass with Rails 2.3.8.

Here is a beautiful fix: http://redmine.ruby-lang.org/issues/show/1238

I don't know exactly how to fix all this mess.
Posted by cult hero (Guest)
on 2010-07-02 02:14
(Received via mailing list)
It's definitely not mysql related. I'm not using MySQL. I'm using
PostgreSQL and I'm using Sequel. All the strings coming from the
database are UTF-8.

And I saw A LOT about the magic comment, but where do I put it in a
template? And there's no way to basically set a "default" magic
comment?
Posted by cult hero (Guest)
on 2010-07-02 02:17
(Received via mailing list)
Oh, and I should add I'm using beta3.
Posted by Fernando Perez (fernando)
on 2010-07-03 11:31
> And I saw A LOT about the magic comment, but where do I put it in a
> template? And there's no way to basically set a "default" magic
> comment?

Same problem for me. I fixed the models, controllers and helpers by 
adding the magic comments, but I don't know how to fix the problem in 
the view. Anyone?
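
One thing that may be worth trying in a view: ERB from Ruby's standard library recognises a magic comment written as an ERB comment at the very start of the template. The sketch below uses plain ERB, not Rails' template handler, so whether ActionView honours the same comment in a given Rails version is an assumption to verify. The template text and method name are invented for illustration.

```ruby
require 'erb'

# The magic comment is placed at the very start of the template.
TEMPLATE = "<%# -*- coding: UTF-8 -*- %>\nState: <%= state %>\n"

# Render the template with a local variable named `state`.
def render_state(state)
  ERB.new(TEMPLATE).result(binding)
end

output = render_state("A Coru\u00F1a")
puts output
puts output.encoding
```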
Posted by Fernando Perez (fernando)
on 2010-07-03 13:16
Adding to environment.rb:

Encoding.default_external = Encoding::UTF_8

Helps fix a few problems until it explodes somewhere else.

Ruby 1.9 is a catastrophe!
Posted by Conrad Taylor (conradwt)
on 2010-07-03 19:25
(Received via mailing list)
On Thu, Jul 1, 2010 at 12:09 PM, cult hero <binarypaladin@gmail.com> 
wrote:

> I know this is not ideal, but I don't understand how the view works
> well enough to do this better.
>
> What should I do to fix this problem? (Oh, and I'm using ERB, as an
> FYI.)
>
>
Hi, I would recommend using Rails 3 Beta 4 and Ruby 1.9.2.  This has worked
well for me for the last 4 months.  Next, I would recommend using the
mysql2 gem if you're using MySQL.

Good luck,

-Conrad
Posted by Fernando Perez (fernando)
on 2010-07-03 20:52
> Good luck,
> 
> -Conrad

Hi Conrad, thanks for the tip. Yeah I'm eagerly waiting for Rails 3 to 
get released!

In the meantime I managed to make Rails 2.3.8 play nicely with Ruby 
1.9.1, and that was very painful to do. I would suggest people stick to 
Ruby 1.8 for some time until everything gets settled. 3rd party gems 
also have to be updated to be compatible with the new 1.9 encoding 
handling.
Posted by Conrad Taylor (conradwt)
on 2010-07-04 00:23
(Received via mailing list)
Sent from my iPhone

On Jul 3, 2010, at 11:52 AM, Fernando Perez <lists@ruby-forum.com> 
wrote:

> also have to be updated to be compatible with the new 1.9 encoding 
> handling.

I have also been using Rails 2.3.5 and Ruby 1.9.2 for one project 
for 6+ months.  It has been super simple to get everything working 
by using RVM, which should also make it super simple to migrate this 
project to Rails 3.  Lastly, if you're using Ruby 1.9.1, then you're 
definitely using the wrong version of Ruby because it does have bugs. 
Furthermore, Ruby 1.9.2 is the first C Ruby version to pass 100% of the 
RubySpec.  Last but not least, Ruby 1.9.2 cleans up the Ruby syntax and 
provides a much needed speed boost in production.

Good luck,

-Conrad
Posted by Conrad Taylor (conradwt)
on 2010-07-04 00:39
(Received via mailing list)
Sent from my iPhone

On Jul 3, 2010, at 11:52 AM, Fernando Perez <lists@ruby-forum.com> 
wrote:

> also have to be updated to be compatible with the new 1.9 encoding 
> handling.

Many gems have been updated to support Ruby 1.9 and it should be super 
simple to fix the ones that are not compatible.  I had a very large code 
base using a lot of gems and plugins.  The ones that had associated 
tests were much easier to fix in general.  Lastly, getting up to speed 
with the syntax and semantic changes made porting easiest for me as 
I worked through the various issues.  In short, you'll have to make 
changes to your code either now or later.  Thus, I prefer to make 
incremental improvements over time.  For example, moving to Ruby 1.9.2. 
Next, I plan to move to Rails 3.  I go in knowing that some things will 
not work and will need to be fixed, which is a part of software 
engineering.  Just create another branch and just do it. :-)

Good luck,

-Conrad
Posted by Fernando Perez (fernando)
on 2010-07-04 01:06
Ruby 1.9.2 is not yet released, I'll wait until it goes final to update my 
FreeBSD port. Until then I'll be running buggy 1.9.1.

My main problem was handling differently encoded strings. So I had to 
add magic comments all over the place, and force_encoding of rdiscount's 
output which is US-ASCII.

Moreover, my original language uses accented characters, so if you only 
write English you might not have run into the same issues as me. But if 
one of your users posts an accented char I guess your app will 
explode. Have you tried?
Posted by Conrad Taylor (conradwt)
on 2010-07-04 01:45
(Received via mailing list)
On Sat, Jul 3, 2010 at 4:06 PM, Fernando Perez <lists@ruby-forum.com> 
wrote:

> Ruby 1.9.2 is not yet released, I'll wait until it goes final to update my
> FreeBSD port. Until then I'll be running buggy 1.9.1.
>
>
1.9.2 is currently in preview and I'm using it on several production
applications with great success.  For me, it works better than 1.9.1.


> My main problem was handling differently encoded strings. So I had to
> add magic comments all over the place, and force_encoding of rdiscount's
> output which is US-ASCII.
>
> Moreover, my original language uses accented characters, so if you only
> write English you might not have run into the same issues as me. But if
> one of your users posts an accented char I guess your app will
> explode. Have you tried?
>
>
The application that I'm working on supports German, Spanish, Russian,
Japanese, French, Portuguese, and Chinese.

Good luck,

-Conrad
Posted by Conrad Taylor (conradwt)
on 2010-07-04 01:56
(Received via mailing list)
Hi, do you have a test case that I can run locally?  I have done a lot of
work in this regard.

-Conrad
Posted by Fernando Perez (fernando)
on 2010-07-05 02:58
> The application that I'm working on supports German, Spanish, Russian,
> Japanese, French, Portuguese, and Chinese.
> 

uh? Did you have to add plenty of magic comments to your files? Do you need 
to force_encoding on certain strings such as those returned by rdiscount 
or hpricot?

What changes did you make to your rails app when you moved from Ruby 1.8 
to 1.9.x to avoid the dreaded US-ASCII conflict?
Posted by Fernando Perez (fernando)
on 2010-07-05 03:00
Last question, do you currently use Rails 3 or Rails 2 with Ruby 1.9? 
It's not clear from your previous posts. Thx
Posted by Conrad Taylor (conradwt)
on 2010-07-05 05:32
(Received via mailing list)
On Sun, Jul 4, 2010 at 6:00 PM, Fernando Perez <lists@ruby-forum.com> 
wrote:

> Last question, do you currently use Rails 3 or Rails 2 with Ruby 1.9?
> It's not clear from your previous posts. Thx


The current version (i.e. production) uses Rails 2.3.5 and Ruby 1.9.2 and
the development version uses Rails 3.0 beta 4 and Ruby 1.9.2.  Furthermore,
both applications currently use the mysql2 Ruby gem.

Good luck,

-Conrad
Posted by Conrad Taylor (conradwt)
on 2010-07-05 05:32
(Received via mailing list)
On Sun, Jul 4, 2010 at 5:58 PM, Fernando Perez <lists@ruby-forum.com> 
wrote:

>
> > The application that I'm working on supports German, Spanish, Russian,
> > Japanese, French, Portuguese, and Chinese.
> >
>
> uh? Did you have to add plenty of magic comments to your files? Do you need
> to force_encoding on certain strings such as those returned by rdiscount
> or hpricot?
>
>
I did not use either hpricot or rdiscount within our development or
production application.  Next, I did not have to force encoding because
the underlying OS environment is UTF-8 by default.  Thus, you need to make
sure that your external encoding (i.e. the encoding used for files) and
internal encoding (i.e. the encoding that's used for the creation of new
strings) match up.  BTW, I found this information when I did my initial
research on putting together a multilingual application using Ruby/Rails.
The default source encoding used in Ruby 1.9 is US-ASCII (7-bit ASCII),
which is the same for Ruby 1.8.  Did you read the relevant chapters in
"Programming Ruby 1.9"?  If not, I would highly recommend reading them
because they provide a wealth of information.


> What changes did you make to your rails app when you moved from Ruby 1.8
> to 1.9.x to avoid the dreaded US-ASCII conflict?
>

I remember having to set the default internal encoding and I was good to
go.  Thus, I had to do the following:

Encoding.default_internal = 'utf-8'

Next, I'm using HTML 5 technologies and the view templates are set to
utf-8 within the head tag.
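
To illustrate the external versus internal encoding distinction described above, here is a small sketch. It sets the encodings per file handle instead of changing the global defaults, and the helper name and sample bytes are invented for the example.

```ruby
require 'tempfile'

# Write raw bytes to a temp file, then read them back with an explicit
# external encoding and, optionally, an internal encoding to transcode to.
def read_back(bytes, external, internal = nil)
  Tempfile.create('encoding-demo') do |file|
    file.binmode
    file.write(bytes)
    file.flush
    mode = internal ? "r:#{external}:#{internal}" : "r:#{external}"
    File.open(file.path, mode) { |io| io.read }
  end
end

latin1_bytes = "C\xE1diz".b # "Cádiz" as ISO-8859-1 bytes

as_tagged = read_back(latin1_bytes, 'ISO-8859-1')          # labelled only
as_utf8   = read_back(latin1_bytes, 'ISO-8859-1', 'UTF-8') # transcoded

puts as_tagged.encoding
puts as_utf8.encoding
puts as_utf8
```

The external encoding says how the bytes on disk are to be interpreted, and the internal encoding says what the strings handed to your code get converted to.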

Good luck,

-Conrad
Posted by Fernando Perez (fernando)
on 2010-07-05 12:02
Hi Conrad,

My files are all encoded in UTF-8 because I use TextMate and I double 
checked on my server with:

$ file --mime-encoding app/views/layout/application.html.erb

My layout is defined with an html5 doctype and <meta charset="utf-8">. I 
tested it with w3c validator and it detects utf-8.

Now if I remove the magic comment <%# # -*- coding: UTF-8 -*- %> from 
application.html.erb and put Encoding.default_internal = Encoding::UTF_8 
at the top of environment.rb, I get the following error:

=> Booting WEBrick
=> Rails 2.3.8 application starting on http://0.0.0.0:3000
[gem_path/activesupport-2.3.8/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:244:in 
`read': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)

It happens with Passenger as well.

So you're telling me you managed to fix that? I don't understand how to 
do it, I have tried nearly every trick in the book.
Posted by Fernando Perez (fernando)
on 2010-07-05 12:16
Wait!

I noticed that on my FreeBSD box, the locale environment variables are 
not set.

$ locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=


Also this problem does not happen in development mode!

$ ./script/console
Loading development environment (Rails 2.3.8)
>> puts Encoding.default_internal
UTF-8
=> nil

$ RAILS_ENV=production ./script/console
Loading production environment (Rails 2.3.8)
/usr/local/lib/ruby/gems/1.9/gems/activesupport-2.3.8/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:244:in 
`read': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)


Any idea?
Posted by Conrad Taylor (conradwt)
on 2010-07-05 12:43
(Received via mailing list)
On Mon, Jul 5, 2010 at 3:02 AM, Fernando Perez <lists@ruby-forum.com> 
wrote:

> Now if I remove the magic comment <%# # -*- coding: UTF-8 -*- %> from
> application.html.erb and put Encoding.default_internal = Encoding::UTF_8
> at the top of environment.rb, I get the following error:
>
> => Booting WEBrick
> => Rails 2.3.8 application starting on http://0.0.0.0:3000
>
> [gem_path/activesupport-2.3.8/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:244:in
> `read': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)
>
>
Actually, I'm using bundler, but I would put this statement at the bottom
of environment.rb.  Also, you can set the external and the internal
encodings by doing the following:

ruby -E <external_encoding>:<internal_encoding>

For example, you could try using something similar to the following:

PassengerRuby <path_to_ruby_executable>/ruby -E utf-8:utf-8
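
A quick way to check what the -E option does, sketched under the assumption that the Ruby running the script is the one you care about. The helper name is invented; it starts a child Ruby and asks it for its default encodings.

```ruby
require 'rbconfig'

# Start a child Ruby with the given command-line options and return the
# names of its default external and internal encodings ("" when unset).
def child_default_encodings(*options)
  script = 'print Encoding.default_external.to_s, "|", Encoding.default_internal.to_s'
  output = IO.popen([RbConfig.ruby, *options, '-e', script], &:read)
  output.split('|', 2)
end

external, internal = child_default_encodings('-E', 'UTF-8:UTF-8')
puts "external=#{external} internal=#{internal}"
```

Calling child_default_encodings with no options shows what the child derives from the environment (LANG, LC_ALL and so on) instead.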

Next, the w3c validator detection is not all that relevant in regards to
how Ruby processes the file.  The w3c validator will parse the file from
top to bottom checking that the file is syntactically correct.  The ERB
engine will parse the file looking for relevant tags and replace them
accordingly with the appropriate HTML.

-Conrad
Posted by Conrad Taylor (conradwt)
on 2010-07-05 13:01
(Received via mailing list)
On Mon, Jul 5, 2010 at 3:16 AM, Fernando Perez <lists@ruby-forum.com> 
wrote:

> LC_NUMERIC="C"
> LC_MONETARY="C"
> LC_MESSAGES="C"
> LC_ALL=
>
>
>
Have you tried setting the LANG environment variable for your OS?  It is
currently set within my environment.
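
A possible explanation for the `"\xC3" on US-ASCII` error, offered as a sketch and not as a diagnosis: with the locale set to C, Ruby 1.9 typically picks US-ASCII as the external encoding, and once the internal encoding is UTF-8, file contents get transcoded from US-ASCII to UTF-8, which fails on the first non-ASCII byte. The snippet imitates that with String#encode instead of real file I/O, and the method name is invented.

```ruby
# Transcode a string from its current label to UTF-8, reporting failures
# as text instead of raising.
def transcode_to_utf8(string)
  string.encode(Encoding::UTF_8)
rescue Encoding::InvalidByteSequenceError => e
  "#{e.class}: #{e.message}"
end

utf8_bytes = "C\xC3\xA1diz".b # the UTF-8 bytes of "Cádiz"

mislabeled = utf8_bytes.dup.force_encoding(Encoding::US_ASCII)
relabeled  = utf8_bytes.dup.force_encoding(Encoding::UTF_8)

puts transcode_to_utf8(mislabeled) # reports the transcoding failure
puts transcode_to_utf8(relabeled)  # already UTF-8, passes through
```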


>
> /usr/local/lib/ruby/gems/1.9/gems/activesupport-2.3.8/lib/active_support/vendor/i18n-0.3.7/i18n/backend/base.rb:244:in
> `read': "\xC3" on US-ASCII (Encoding::InvalidByteSequenceError)
>
>
> Any idea?
>

In regards to Rails 2.3, I'm using 2.3.5 and Rails 2.3.8 as well as the
mysql2 Ruby gem which does UTF-8 by default.  Do you have a small test
case or small application which reproduces the issue?

-Conrad

Posted by Fernando Perez (fernando)
on 2010-07-05 13:14
Thanks for your messages. I'll dig into them thoroughly. I guess the 
LANG is the key.

> Do you have a small test case or small application which reproduces the 
> issue?
> 
> -Conrad

Well, my app doesn't even start up, so there is no better test case to 
provide.
Posted by Fernando Perez (fernando)
on 2010-07-05 13:51
All this is just boring. We're in 2010 now, not 1983.

I stumbled upon 
http://rvdh.de/blog/2010/01/06/why-you-cant-run-rails-23-apps-on-ruby-19/ 
and I'm experiencing the exact same problems.

So I'll just ditch Ruby 1.9 and get back to 1.8.7 until all this mess 
gets fixed and documented.

Thanks for your valuable help.
Posted by Fernando Perez (fernando)
on 2010-07-20 14:21
Hi,

It's me again. Now running Rails 3 beta4 and Ruby 1.9.1, and that 
encoding thing is, as I said, a gigantic catastrophe! It still does not 
work!

I added default_encoding, and it's as if it doesn't care about it.
Posted by Fernando Perez (fernando)
on 2010-07-20 15:14
Rails 3 beta4 is still buggy. It should be fixed in RC1: 
https://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/4807-error-encodingundefinedconversionerror-xc3-from-ascii-8bit-to-utf-8