I18n when?

Harm_de_Laat · January 30, 2006, 10:38am

Hi all,

Just wondering… Are there any plans to include i18n support in Rails
anytime soon?
I guess this is about the only feature I’m really missing in Rails.

Any thoughts?

Regards,

Harm de Laat

Harm_de_Laat · January 30, 2006, 10:57am

second that!

Harm_de_Laat · January 30, 2006, 12:54pm

Just wondering… Are there any plans to include i18n support in Rails
anytime soon?

Take a look at http://www.globalize-rails.org/wiki/

Kasper

Harm_de_Laat · January 30, 2006, 1:43pm

Nicolai Reuschling wrote:

David stated in the “Snakes and Rubies” video
(Snakes and Rubies downloads | Django), I18N is not difficult in a
technical way. It’s mostly string replacement.

No, it’s not that easy: I18N is a bit more complicated, and requires
enforcement of some rules about code and data organization; see
FAQ - Basic Questions, FAQ - Basic Questions for example.

At the present time Rails can’t even handle UTF8 properly, and you ask for I18N?

I18N will be in Rails only once you implement it yourself; don’t
rely on the core team, they have a lot more things to improve and fix, and
I18N isn’t what they are paid for.

Harm_de_Laat · January 30, 2006, 2:11pm

dseverin wrote:

At the present time Rails can’t even handle UTF8 properly, and you ask for I18N?

Could you elaborate on this? It seems to be working alright for me.

Jeroen

Harm_de_Laat · January 30, 2006, 2:31pm

Jeroen H. wrote:

dseverin wrote:

At the present time Rails can’t even handle UTF8 properly, and you ask for I18N?

Could you elaborate on this? It seems to be working alright for me.

Jeroen

It also seemed to work alright for me, until at some point it began to fail:

http://www.fngtps.com/2006/01/encoding-in-rails#comment24

Could you just review the code of the components mentioned there and tell me
that it will never break on UTF8?

E.g. validates_length_of is definitely broken,
String#blank? doesn’t take into account ALL Unicode space chars,
active_support/core_ext/string/access.rb is completely broken,
action_view/helpers/text_helper.rb has “fixed” truncate(), but broken
excerpt()

– and that list may still be incomplete…
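
The String#blank? point can be illustrated on a current Ruby (2.4 or later, which postdates this thread). This is only a sketch: ascii_blank? and unicode_blank? are hypothetical helper names, not the Rails implementation. It relies on the fact that \s in a Ruby regexp matches ASCII whitespace only, while the POSIX bracket class [[:space:]] also matches Unicode whitespace in a UTF-8 string.

```ruby
# Hypothetical helpers for illustration; not the Rails String#blank? code.

# Treats a string as blank only if it consists of ASCII whitespace.
def ascii_blank?(str)
  str.match?(/\A\s*\z/)
end

# Treats a string as blank if it consists of any Unicode whitespace.
def unicode_blank?(str)
  str.match?(/\A[[:space:]]*\z/)
end

# U+3000 IDEOGRAPHIC SPACE, the full-width space common in Japanese text.
ideographic_space = "\u3000"

puts ascii_blank?(ideographic_space)    # false: \s does not match U+3000
puts unicode_blank?(ideographic_space)  # true: [[:space:]] does
```

A form field containing only full-width spaces would pass an ASCII-only blank check, which is the kind of failure described above.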

Harm_de_Laat · January 30, 2006, 2:46pm

dseverin wrote:

– and that list may still be incomplete…

Yes, I’m afraid you’re absolutely right. There’s also a wiki entry about
this: Peak Obsession

You would think a Japanese language would handle this properly though?
Couldn’t you globally alias all string methods to use their UTF-8
equivalents once kcode=‘utf-8’ ? I still know very little about rails
internals and ruby so I don’t really know if it’s hard to fix.

Jeroen

Harm_de_Laat · January 30, 2006, 2:58pm

Jeroen H. wrote:

You would think a Japanese language would handle this properly though?
I’m no expert, but apparently UTF-8 is far from the most popular
Japanese encoding, and there are allegedly some very cogent arguments
against it that aren’t a matter of NIH syndrome. That’s why Ruby
doesn’t do Unicode well - not because it’s not been thought about, but
because it’s been thought about too much.

Couldn’t you globally alias all string methods to use their UTF-8
equivalents once kcode=‘utf-8’ ? I still know very little about rails
internals and ruby so I don’t really know if it’s hard to fix.
It is. Once you lose the assumption that a codepoint is one byte,
the concept of String#length gets really murky, for example.

Harm_de_Laat · January 30, 2006, 3:13pm

Alex Y. wrote:

Couldn’t you globally alias all string methods to use their UTF-8
equivalents once kcode=‘utf-8’ ? I still know very little about rails
internals and ruby so I don’t really know if it’s hard to fix.
It is. Once you lose the assumption that a codepoint is one byte,
the concept of String#length gets really murky, for example.

Yes, and you can have up to three lengths: String#byte_length,
String#codepoint_length, String#grapheme_cluster_length (e.g. NFD in
MacOS filenames).
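
The three lengths can be seen directly on a current Ruby (2.5 or later, which postdates this thread). A minimal sketch: three_lengths is a hypothetical helper, and the method names in the post (String#byte_length and so on) are mapped here onto the methods Ruby actually ships, namely bytesize, codepoints and grapheme_clusters.

```ruby
# Hypothetical helper: reports the three notions of "length" for a string.
def three_lengths(str)
  {
    bytes: str.bytesize,                          # encoded size in bytes
    codepoints: str.codepoints.size,              # Unicode codepoints
    grapheme_clusters: str.grapheme_clusters.size # user-perceived characters
  }
end

# "é" in decomposed form (NFD): the letter e followed by U+0301,
# COMBINING ACUTE ACCENT, as found in Mac OS filenames.
decomposed = "e\u0301"

lengths = three_lengths(decomposed)
puts lengths[:bytes]             # 3
puts lengths[:codepoints]        # 2
puts lengths[:grapheme_clusters] # 1
```

Which of the three a length validation should use depends on what the limit is meant to protect: storage size, codepoint count, or what the user sees.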

Besides:

Most of Rails’ internal processing expects ASCII strings (routing and
template path magic, SQL model_table_name <-> ModelTableName etc.), and
assumes that one byte is one ASCII char.

Globally overriding String methods with their UTF8 equivalents is
risky and can cause unpredictable faults (Julian’s unicode_hacks caused
Webrick to work improperly, and in my application ActionMailer failed), as
different parts, and the libraries they reference, can expect exactly the
byte-oriented String methods.

So I think, as it will hardly be fixed (YAGNI, men, YAGNI, web 2.0 is
ASCII!), i18n is still a distant prospect.

Harm_de_Laat · January 30, 2006, 3:36pm

dseverin wrote:

Yes, and you can have up to three lengths: String#byte_length,
String#codepoint_length, String#grapheme_cluster_length (e.g. NFD in
MacOS filenames).
My point precisely. And then what do you do about the byte order mark?

Besides:

So I think, as it will hardly be fixed (YAGNI, men, YAGNI, web 2.0 is
ASCII!), i18n is still a distant prospect.

Maybe not that far off. According to this:
http://redhanded.hobix.com/inspect/futurismUnicodeInRuby.html
YARV should handle strings a little more sensibly. Once that’s in
place, the string replacing aspect should be a little simpler.

Harm_de_Laat · January 30, 2006, 12:57pm

David stated in the “Snakes and Rubies” video
(Snakes and Rubies downloads | Django), I18N is not difficult in a
technical way. It’s mostly string replacement.

Harm_de_Laat · January 30, 2006, 6:56pm

Hi,

On Mon, 30 Jan 2006 14:31:57 +0100
dseverin wrote:

It also seemed to work alright for me, until at some point it began to fail:

Encoding in Rails

Could you just review the code of the components mentioned there and tell me
that it will never break on UTF8?

E.g. validates_length_of is definitely broken,

I’ve used the patch below.
It works well, at least for Japanese/UTF-8.
# You need to set $KCODE = "u" first.

--- validations.rb.old 2006-01-31 02:22:42.000000000 +0900
+++ validations.rb 2006-01-31 02:41:03.000000000 +0900
@@ -459,7 +459,7 @@
           message = (options[:message] || options[message_options[option]]) % option_value
 
           validates_each(attrs, options) do |record, attr, value|
-            record.errors.add(attr, message) unless !value.nil? and value.size.method(validity_checks[option])[option_value]
+            record.errors.add(attr, message) unless !value.nil? and value.split(//).size.method(validity_checks[option])[option_value]
           end
         end
       end

It may be better to add a String#char_length, something like this:

class String
  def char_length
    split(//).size
  end
end
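
For comparison, here is how that helper behaves on a current Ruby (1.9 or later, which postdates this thread), where strings carry an encoding and $KCODE is no longer needed. A sketch only: char_length is the name proposed in the post, and within_limit? is a hypothetical stand-in for a length validation, not the Rails validates_length_of code.

```ruby
class String
  # Helper proposed in the post: counts characters rather than bytes
  # by splitting the string on the empty pattern.
  def char_length
    split(//).size
  end
end

# Hypothetical length check that limits characters, not bytes.
def within_limit?(value, max_chars)
  !value.nil? && value.char_length <= max_chars
end

# "Japanese language" written in kanji: three characters, nine UTF-8 bytes.
nihongo = "\u65E5\u672C\u8A9E"

puts nihongo.char_length       # 3
puts nihongo.bytesize          # 9
puts within_limit?(nihongo, 5) # true, although the byte count exceeds 5
```

A byte-based check with the same limit of 5 would reject this three-character value, which is the breakage the patch works around.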

Harm_de_Laat · February 2, 2006, 1:11pm

On 30-jan-2006, at 14:44, Jeroen H. wrote:

You would think a Japanese language would handle this properly
though? Couldn’t you globally alias all string methods to use their
UTF-8 equivalents once kcode=‘utf-8’ ? I still know very little
about rails internals and ruby so I don’t really know if it’s hard
to fix.

I did exactly that in my Unicode Hacks plugin, however it breaks
other software in a nasty, unpredictable way.
The article by Thijs on Fingertips (incl. the comments) analyzes the
breakage more thoroughly.

–
Julian ‘Julik’ Tarkhanov
me at julik.nl

Harm_de_Laat · February 2, 2006, 1:36am

On Mon, 30 Jan 2006, Alex Y. wrote:

Jeroen H. wrote:

You would think a Japanese language would handle this properly though?

I’m no expert, but apparently UTF-8 is far from the most popular
Japanese encoding, and there are allegedly some very cogent arguments
against it that aren’t a matter of NIH syndrome.

I feel that I’m pretty familiar with this. I live in Japan, speak, read
and write some Japanese, and have done a fair amount of I18N work for
English+Japanese web sites.

The arguments Japanese critics have against Unicode are, for the most
part, complete rubbish. They sum up to more or less the following:

 1. Unicode 1.0 has issue X. Answer: err...try using maybe only a
 five-year-old spec., like Unicode 2.0? Or get really genki and
 upgrade to 3.0, maybe even before 4.0 comes out!

 2. Important characters are missing. Answer: they're missing from
 JIS character sets, too. And more are missing from JIS character
 sets than Unicode. Unicode has *every single character* available
 in Shift-JIS, EUC-JP and ISO-2022-JP, so if you're switching from
 those (which is what almost every system in Japan uses), you lose
 not one single character you had before.

 3. You can't tell the difference between Chinese and Japanese
 in Unicode. Answer: in the same sense that you can't tell the
 difference between French and English in ISO-8859-1. We have other
 methods for doing that that don't involve adding characters to a
 character set. (Some Japanese apparently feel that Unicode should
 do the equivalent of having different "French" and "English"
 versions of the letter "a".)

 4. UTF-8 takes more space. Answer: on a typical web page, it takes
 7% more space than Shift-JIS or EUC-JP. It's not a big deal, and
 is actually a very small price to pay for the benefits you get.
 However, there are other encodings available.

The Japanese critics tend to ignore other things that benefit them.
Some, they just don’t care about, such as the ability to have French and
Japanese on the same page. Others, such as having “generic” web-based
message board systems and other programs “just work” without any extra
effort on the part of a foreign developer, they really ought to care
about, because it will save them money in a very direct way.

Anyway, there’s my rant for the day.

cjs

Curt S. +81 90 7737 2974
The power of accurate observation is commonly called cynicism
by those who have not got it. --George Bernard Shaw

Harm_de_Laat · February 2, 2006, 1:11pm

On 30-jan-2006, at 15:13, dseverin wrote:

So I think, as it will hardly be fixed (YAGNI, men, YAGNI, web 2.0 is
ASCII!), i18n is still a distant prospect.

I wonder what you were smoking

Julian ‘Julik’ Tarkhanov
me at julik.nl

Harm_de_Laat · February 2, 2006, 6:31pm

Look at Localization package

Rgds,
–Siva J.
http://www.varcasa.com/
My First Rails Project.
Education Through Collabration