Ruby 1.9.2 with Rails 3.0.1

Dobai-Pataky_BSSSSl · November 3, 2010, 9:07pm

Is there a problem with Ruby 1.9.2 p0 processing Unicode
strings? We have an application that uses Unicode characters.
Ruby 1.8.7 has no problems with it, whereas Ruby 1.9.2 refuses to process
them with both SQLite3 and Postgres databases in a Rails 3.0.1
application.

Thanks.

Bharat

bruparel · November 3, 2010, 9:28pm

On Wed, Nov 3, 2010 at 10:07 PM, Bharat R. wrote:

Is there a problem with Ruby 1.9.2 p0 processing Unicode
strings? We have an application that uses Unicode characters.
Ruby 1.8.7 has no problems with it, whereas Ruby 1.9.2 refuses to process
them with both SQLite3 and Postgres databases in a Rails 3.0.1
application.

What do you mean by “refuses to process them”? Are you seeing
mojibake? Or nothing at all?

Some questions come to mind:

- In the case of Postgres, is the DB connection set to use UTF-8? I'm not
  sure how this is set for SQLite, but I presume there is a way.
- Is your environment somehow using an encoding other than UTF-8?
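One quick way to answer the second question is to ask Ruby itself which encodings are in effect. This is a minimal sketch using only core Ruby calls (`__ENCODING__`, `Encoding.default_external`, `Encoding.default_internal`, `Encoding.locale_charmap`); the variable names and printed labels are illustrative, and it does not inspect the database side, which depends on the adapter.

```ruby
# Print the encodings Ruby is working with in this process.

# Source encoding of this file: set by a magic comment if present,
# otherwise the interpreter's default for source files.
source_encoding = __ENCODING__

# Encoding Ruby assumes for data read from files, pipes and stdin when
# nothing else is specified; normally derived from the locale (LANG/LC_ALL).
external = Encoding.default_external

# If set, data read from IO is transcoded into this encoding; usually nil.
internal = Encoding.default_internal

puts "source (__ENCODING__):     #{source_encoding}"
puts "Encoding.default_external: #{external}"
puts "Encoding.default_internal: #{internal.inspect}"
puts "locale charmap:            #{Encoding.locale_charmap}"
puts "LANG:                      #{ENV['LANG'].inspect}"
```

On many Unix systems an unset LANG, or the "C" locale, typically shows up here as a US-ASCII default_external.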

Regards,
Ammar

bruparel · November 3, 2010, 11:56pm

On Nov 3, 6:07pm, Bharat R. wrote:

Is there a problem with Ruby 1.9.2 p0 processing Unicode
strings? We have an application that uses Unicode characters.
Ruby 1.8.7 has no problems with it, whereas Ruby 1.9.2 refuses to process
them with both SQLite3 and Postgres databases in a Rails 3.0.1
application.

Please read this:

http://www.chiark.greenend.org.uk/~sgtatham/bugs.html

Show us some code: show us the exact problem with a sample program.
Tell us the exact versions, not just of Ruby, but also of your operating
system and the SQLite3 you’re using.

Help us help you.

bruparel · November 4, 2010, 3:20am

On Nov 3, 10:47 pm, Bharat R. wrote:


The solution was to put the following line at the top of this file
(seeds.rb)

# encoding: utf-8

If someone can articulate some simple rules for character encoding in
Ruby 1.9.2 p0 and Rails 3.0.1 environment, that will be quite useful.

Ruby reads each file’s ‘encoding’ magic comment to decide
which encoding to use for that particular file.

If the file lacks an encoding comment, Ruby assumes the one provided by
Encoding.default_external, which in your case seems to be US-ASCII.

Since version 1.3.0, sqlite3-ruby is quite aware of character encodings
and should work properly.

If Rails is not doing the right thing, that is another question.

You can double-check that by running:

ActiveRecord::Base.connection.execute 'PRAGMA encoding'

That tells you which encoding the SQLite3 database was opened with.

Beyond that, for Rails-specific issues, ask on Rails-Talk:

http://groups.google.com/group/rubyonrails-talk

bruparel · November 4, 2010, 1:47am

Sorry about that. I was tired towards the end of the day, hence the
not-so-bright post. I owe it to the group to clear things up.

To make a long story short, it was a character encoding problem with
Ruby 1.9.2.

The following is a snippet of code from the seeds.rb file:

courses = [ {:title => 'Principles of Good Cooking 1', :course_code => 'PGC1',
  :lessons => [{:title => 'Getting Started'},
    {:title => 'Sautéing',
      :topics => [ {:tag => "Lecture", :title => "Introduction to Sautéing",
                     :pages => [ {:title => "Video Lecture" }] },
                   {:tag => "Quiz", :title => "Test Your Sautéing IQ",
                     :pages => [ {:title => "Questions" }] },
                   {:tag => "Taste Test", :title => "Cooking With Wine",
                     :pages => [ {:title => "Introduction"},
                                 {:title => "Instructions"},
                                 {:title => "Taste Wine"},
                                 {:title => "Reduce Wine"},
                                 {:title => "Taste Reduced Wine"},
                                 {:title => "Your Results" },

See that 'Sautéing' string?

That is "Sauteing" with the French accent over the e. It was causing the
rake db:seed command to fail (throw an exception) as follows:

bruparel:~/school
→ rake db:seed
(in /Users/bruparel/school)
rake aborted!
/Users/bruparel/school/db/seeds.rb:3: invalid multibyte char (US-ASCII)
/Users/bruparel/school/db/seeds.rb:3: invalid multibyte char (US-ASCII)
/Users/bruparel/school/db/seeds.rb:3: syntax error, unexpected $end, expecting '}'
{:title => 'Sautéing',
^

The solution was to put the following line at the top of this file
(seeds.rb)

# encoding: utf-8
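To see what the magic comment changes, here is a minimal sketch of a file that starts with it. Ruby only honours the comment on the first line of the file (or the second, after a `#!` line). The variable name `title` is just for illustration. On Ruby 1.9 the comment is what makes the accented literal legal; Ruby 2.0 and later already default to UTF-8 for source files, so there it is redundant but harmless.

```ruby
# encoding: utf-8
# The magic comment above must be the first line (or the second, after a #! line).

title = 'Sautéing'

puts title.encoding  # UTF-8
puts title.length    # 8 characters
puts title.bytesize  # 9 bytes: "é" takes two bytes in UTF-8
```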

Now rake db:seed ran fine and populated the tables. I could see the
correctly encoded characters in both databases (SQLite3 and Postgres),
but the display showed plain “Sauteing” instead of the accented French
“é”. That was because of the following line in the database.yml file:

development:
  adapter: sqlite3
  pool: 5
  timeout: 5000
  encoding: utf8      # <- because of this
  database: db/atk_school_development

Instead it should be as follows:

development:
  adapter: sqlite3
  pool: 5
  timeout: 5000
  encoding: unicode   # <- this works
  database: db/atk_school_development

If someone can articulate some simple rules for character encoding in
Ruby 1.9.2 p0 and Rails 3.0.1 environment, that will be quite useful.

Thanks.

Bharat

bruparel · November 4, 2010, 11:17am

Luis L. wrote in post #959207:

The solution was to put the following line at the top of this file
(seeds.rb)

# encoding: utf-8

If someone can articulate some simple rules for character encoding in
Ruby 1.9.2 p0 and Rails 3.0.1 environment, that will be quite useful.

Ruby reads each file’s ‘encoding’ magic comment to decide
which encoding to use for that particular file.

If the file lacks an encoding comment, Ruby assumes the one provided by
Encoding.default_external, which in your case seems to be US-ASCII.

That answer is wrong - but I don’t blame you for giving a wrong answer,
since the whole encoding nonsense in ruby 1.9 is ridiculously
complicated.

The correct answer is: the encoding of a ruby 1.9 source file (and hence
the String literals within that file) is always US-ASCII, unless you
tag it with a #encoding line which says otherwise.

I have so far collected about 200 rules for how encodings work in ruby
1.9: string19/string19.rb at master · candlerb/string19 · GitHub

Unfortunately, this list is just the tip of the iceberg. To be complete,
it would have to describe the encoding-related behaviour of every method
on String, every method which accepts a String, and every method which
returns a String.

Regards,

Brian.

bruparel · November 4, 2010, 12:06pm

On Thu, Nov 4, 2010 at 12:18 PM, Brian C. wrote:

I have so far collected about 200 rules for how encodings work in ruby
1.9: string19/string19.rb at master · candlerb/string19 · GitHub

That’s a great collection of tips and rules. Thanks for sharing.

Cheers,
Ammar

bruparel · November 4, 2010, 1:53pm

Hello Brian,
You wrote:
“The correct answer is: the encoding of a ruby 1.9 source file (and
hence the String literals within that file) is always US-ASCII, unless
you tag it with a #encoding line which says otherwise.”

This works for me and is consistent with my observation. Rails does set
a default encoding in one of its files, config/application.rb, as shown
below:

# Configure the default encoding used in templates for Ruby 1.9.
config.encoding = "utf-8"

It seems that the seeds.rb file, which is conventionally used to
initialize data, is unaware of this setting. Further, it seems that this
is not what the Rails team intended.
Regards,
Bharat

bruparel · November 4, 2010, 6:15pm

On Nov 4, 7:18am, Brian C. wrote:

Thank you, Brian, for correcting me. Encoding has long been on my TODO
list.

bruparel · November 4, 2010, 2:30pm

Bharat R. wrote in post #959309:

This works for me and is consistent with my observation. Rails does set
a default encoding in one of the files config/application.rb as shown
below:

# Configure the default encoding used in templates for Ruby 1.9.
config.encoding = "utf-8"

It seems that the seeds.rb file, which is conventionally used to
initialize data, is unaware of this setting. Further, it seems that this
is not what the Rails team intended.

As the comment says, that setting is used for templates, but seeds.rb is
ruby source code.

When you read a ruby 1.9 source file using load() or require(), then the
encoding is always forced to US-ASCII unless you tag it with a
#encoding line. That is actually a sane default - imagine what would happen
if the same source file were parsed differently depending on what system
it ran on (*).
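The load()/require() rule can be checked directly. The sketch below writes two throwaway files, one untagged and one with a magic comment, loads each, and records what `__ENCODING__` reports inside it. The helper `source_encoding_of` and the global `$loaded_encoding` are hypothetical names made up for this example. Keep in mind that the US-ASCII default is specific to Ruby 1.9: Ruby 2.0 changed the default source encoding to UTF-8, so on a current interpreter the untagged file reports UTF-8.

```ruby
require 'tmpdir'

# Hypothetical helper: write `code` to a temporary .rb file, load it,
# and return whatever the loaded file stored in $loaded_encoding.
def source_encoding_of(code)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sample.rb')
    File.binwrite(path, code)
    $loaded_encoding = nil
    load path
    $loaded_encoding
  end
end

# No magic comment: the interpreter's default source encoding applies.
untagged = source_encoding_of("$loaded_encoding = __ENCODING__\n")

# A magic comment on the first line overrides the default.
tagged = source_encoding_of("# encoding: iso-8859-1\n$loaded_encoding = __ENCODING__\n")

puts untagged  # US-ASCII on Ruby 1.9, UTF-8 on Ruby 2.0 and later
puts tagged    # ISO-8859-1
```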

It gets more complex if instead of using load() or require(), you read
the file into a String and then eval() that String. In that case, the
encoding of the String is used as the source encoding, unless overridden
by a #encoding line.
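The eval() case can be sketched the same way: the String's own encoding becomes the source encoding unless the evaluated code starts with a magic comment. The variable names are illustrative; each snippet evaluates `__ENCODING__`, so the result is the source encoding the parser used for that String.

```ruby
# The same one-line program, carried in Strings with different encodings.
utf8_src  = '__ENCODING__'.encode(Encoding::UTF_8)
ascii_src = '__ENCODING__'.encode(Encoding::US_ASCII)

# A magic comment inside the evaluated code overrides the String's encoding.
tagged_src = "# encoding: iso-8859-1\n__ENCODING__".encode(Encoding::UTF_8)

puts eval(utf8_src)    # UTF-8
puts eval(ascii_src)   # US-ASCII
puts eval(tagged_src)  # ISO-8859-1
```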

Regards,

Brian.

(*) However, the same program may still behave differently on different
systems, even if parsed identically. This is because the default is to
allow the environment to decide the encoding of data files. You need to
explicitly override this if you want your program to behave in a sane
fashion, and that’s what Rails is doing: whenever it reads a template,
it applies its own config.encoding setting instead of letting Ruby pick
an (essentially arbitrary) encoding.
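The footnote's point about data files can be illustrated with File.read: without options the result is tagged according to the environment, while an explicit encoding makes the program independent of it. This is a minimal sketch of the idea only, not Rails' actual template-loading code; the file name `template.txt` is made up, and it assumes `Encoding.default_internal` is nil (the usual case), since otherwise Ruby would also transcode what it reads.

```ruby
require 'tmpdir'

implicit, explicit = Dir.mktmpdir do |dir|
  path = File.join(dir, 'template.txt')
  File.binwrite(path, "Saut\xC3\xA9ing\n") # the UTF-8 bytes of "Sautéing"

  [
    File.read(path),                   # tagged with Encoding.default_external
    File.read(path, encoding: 'utf-8') # tagged with the encoding the program chose
  ]
end

puts implicit.encoding         # depends on the locale, e.g. UTF-8 or US-ASCII
puts explicit.encoding         # UTF-8
puts explicit.valid_encoding?  # true
```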