String#upcase/downcase with UTF-8 strings in Ruby 1.9

Hello,

in Ruby 1.9 I get the following behaviour:

>> "aoueäöüé".upcase
=> "AOUEäöüé"
>> "AOUEÄÖÜÉ".downcase
=> "aoueÄÖÜÉ"

However, I can’t find a report for this in the bug tracking system.
Doesn’t this qualify as a bug?

Cheers, Stefan

Hi,

In message “Re: String#upcase/downcase with UTF-8 strings in Ruby 1.9”
on Thu, 10 Jul 2008 07:09:29 +0900, “Stefan S.”
writes:

|in Ruby 1.9 I get the following behaviour:
|
|>> "aoueäöüé".upcase
|=> "AOUEäöüé"
|>> "AOUEÄÖÜÉ".downcase
|=> "aoueÄÖÜÉ"
|
|However, I can’t find a report for this in the bug tracking system.
|Doesn’t this qualify as a bug?

The document for String#upcase says:

call-seq:
str.upcase => new_str

Returns a copy of str with all lowercase letters replaced with their
uppercase counterparts. The operation is locale insensitive---only
characters ``a'' to ``z'' are affected.
Note: case replacement is effective only in ASCII region.

 "hEllO".upcase   #=> "HELLO"

See “Note:”. Tim B. has persuaded me to do so, since case
conversion outside of ASCII region is highly dependent on country,
language, culture and script.

          matz.
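
For readers following along, here is a minimal sketch of what "effective only in ASCII region" means in practice. The helper names `ascii_upcase` and `ascii_downcase` are made up for this illustration and are not part of Ruby; they only touch the letters a-z and A-Z and pass everything else through unchanged.

```ruby
# encoding: utf-8

# Hypothetical helper (not part of Ruby): upcases only the ASCII letters
# a-z, mirroring the ASCII-only behaviour described in the documentation.
def ascii_upcase(str)
  str.tr("a-z", "A-Z")
end

# Hypothetical counterpart: downcases only the ASCII letters A-Z.
def ascii_downcase(str)
  str.tr("A-Z", "a-z")
end

ascii_upcase("aoueäöüé")    # => "AOUEäöüé"
ascii_downcase("AOUEÄÖÜÉ")  # => "aoueÄÖÜÉ"
```

The results match the irb session quoted at the top of the thread: the accented letters are outside the ASCII range, so they are left alone.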

The document for String#upcase says:

Yes, sorry, I should have read the documentation.

See “Note:”. Tim B. has persuaded me to do so, since case
conversion outside of ASCII region is highly dependent on country,
language, culture and script.

So basically the Python guys are going down the wrong route?

# -*- coding: utf-8 -*-

import string
print string.upper(u"aoueäöüé")
print string.lower(u"AOUEÄÖÜÉ")

works as expected.

Cheers, Stefan

On Jul 9, 2008, at 8:17 PM, Stefan S. wrote:

# -*- coding: utf-8 -*-

import string
print string.upper(u"aoueäöüé")
print string.lower(u"AOUEÄÖÜÉ")

works as expected.

Cheers, Stefan

No.
They’re going down a different route.
Seriously, the language handling is something that could easily be
handled by extensions. It does not need to be a core part of the
language.
Even operating systems handle these things with proprietary and very
sophisticated techniques based on the language in question.
In most cases, what you are expecting to be the correct upper case
characters may be ‘correct’ but it will ultimately depend on the
language and the context.
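
To make the language dependence concrete, here is a small sketch with two well-known cases. The table and the method names are hand-written assumptions for this example and do not come from any library: Turkish uppercases "i" to a dotted "İ", and German "ß" has traditionally been written "SS" in capitals, which changes the length of the string.

```ruby
# encoding: utf-8

# Hand-written illustration; this table is an assumption made for the
# example, not data taken from any library.
UPPERCASE_OF_I = {
  :english => "I",
  :turkish => "İ"   # Turkish keeps the dot: i -> İ (and dotless ı -> I)
}

# Returns the uppercase form of "i" for one of the language keys above.
def upcase_i(language)
  UPPERCASE_OF_I.fetch(language)
end

# Replaces German ß with "SS", so the result can be longer than the input.
def german_upcase_eszett(str)
  str.gsub("ß", "SS")
end

upcase_i(:english)              # => "I"
upcase_i(:turkish)              # => "İ"
german_upcase_eszett("straße")  # => "straSSe"
```

A single character-for-character table cannot cover both cases, which is one reason a built-in default is hard to pick.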

This leaves the perfect opening for people to contribute locale or
language specific extensions to String.
It would make a great gem with a plug-in architecture.
Just add options for the language you want to use.
In any case it can get very tricky to do character conversions with
different languages.
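
A rough sketch of what such an extension could look like, assuming a simple registry of per-language letter tables. The `LocaleCase` module, its methods and the two tables are hypothetical and deliberately incomplete; a real gem would need proper Unicode data rather than hand-typed letters.

```ruby
# encoding: utf-8

# Hypothetical module: a tiny registry of per-language case tables.
module LocaleCase
  TABLES = {}

  # Registers lowercase and uppercase letters for a language. The two
  # strings must list the letters in the same order, as String#tr expects.
  def self.register(lang, lower, upper)
    TABLES[lang] = [lower, upper]
  end

  # Upcases str using the table registered for lang.
  def self.upcase(str, lang)
    lower, upper = TABLES.fetch(lang)
    str.tr(lower, upper)
  end

  # Downcases str using the table registered for lang.
  def self.downcase(str, lang)
    lower, upper = TABLES.fetch(lang)
    str.tr(upper, lower)
  end
end

# Illustrative, incomplete tables (assumptions for this example).
LocaleCase.register(:de, "a-zäöü", "A-ZÄÖÜ")
LocaleCase.register(:fr, "a-zàâçéèêëîïôùûü", "A-ZÀÂÇÉÈÊËÎÏÔÙÛÜ")

LocaleCase.upcase("aoueäöüé", :de)   # => "AOUEÄÖÜé" (é is not in the :de table)
LocaleCase.downcase("AOUEÄÖÜ", :de)  # => "aoueäöü"
LocaleCase.upcase("été", :fr)        # => "ÉTÉ"
```

Because the tables map one character to one character, this approach cannot express mappings such as ß to "SS"; those would need a separate substitution step.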

No.
They’re going down a different route.
Seriously, the language handling is something that could easily be
handled by extensions. It does not need to be a core part of the
language.

Is Nikolai W.'s Ruby Character Encodings Library [1] currently the
best way to go?

Stefan

[1] http://bitwi.se/software/ruby/character-encodings/

Seriously, the language handling is something that could easily be
handled by extensions. It does not need to be a core part of the
language.

Are there any working extensions for Ruby 1.9 that offer Unicode support
for String#downcase/upcase and/or Array#sort?

Stefan