UTF-8 character downcase!

Hello,
Can anyone help me with a problem?

I have a word = “ПРИВЕТ”, it’s in Russian, and I want to downcase this
word (=привет). But the standard downcase method does not work with
non-English letters.

Thank you for your reply

From: [email protected] [mailto:[email protected]] On Behalf Of Igor K.
Sent: Saturday, September 01, 2007 1:24 PM

Hello,
Who can help me with problem?

I have a word = “ПРИВЕТ”, it’s in Russian, and I want to downcase this
word (=привет). But the standard downcase method does not work with
non-English letters.

Thank you for reply

First of all, do you know about the ror2ru Google group? You’ll be much
more comfortable there with Unicode questions…

The brief answer: there’s the unicode gem, which allows things like
Unicode::downcase(string).
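
For illustration, a minimal sketch of the gem call mentioned above. It assumes the unicode gem is installed (gem install unicode). The helper name downcase_unicode is made up for this example, and the fallback to the built-in String#downcase is an addition that only handles Cyrillic on Ruby 2.4 or later.

```ruby
# encoding: utf-8
# Use the unicode gem when it is installed; otherwise fall back to the
# built-in String#downcase (Unicode-aware only on Ruby 2.4+).
begin
  require 'unicode'
  HAVE_UNICODE_GEM = true
rescue LoadError
  HAVE_UNICODE_GEM = false
end

# Hypothetical helper, not part of the gem.
def downcase_unicode(str)
  HAVE_UNICODE_GEM ? Unicode.downcase(str) : str.downcase
end

word = "\u041F\u0420\u0418\u0412\u0415\u0422"  # "ПРИВЕТ"
puts downcase_unicode(word)
```

The word is written with \u escapes so the snippet does not depend on the encoding of the source file.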

The longer answer: there’s also Julian Tarkhanov’s unicode_hacks, which
uses this gem for more comfortable things like String#downcase.

The even longer answer: the Russian RoR community has something more
organized for this, but I don’t know it personally, as I do nothing with
RoR. I use unicode + unicode_hacks.

V.

Hi Igor ☺

  • Igor K.:

I have a word = “ПРИВЕТ”, it’s in Russian, and I want to downcase this
word (=привет). But the standard downcase method does not work with
non-English letters.

If no tool is available to do downcase, upcase, and the like for
Russian, why not implement it yourself? Doing so is not very complicated.
If you do, please support all characters of the Cyrillic script (quite a
number of them are not used in Russian).

The full list of Cyrillic characters and their Unicode code points is
available at unicode.org:

Cyrillic: http://www.unicode.org/charts/PDF/U0400.pdf
Cyrillic supplement: http://www.unicode.org/charts/PDF/U0500.pdf

IANAL, but to my understanding it is perfectly legal to use these sheets
(as opposed to having to buy expensive standards documents) for
implementing a conversion tool.

  • From these sheets you can create an array of all lowercase letters
    and an array of all uppercase letters. From those, build regular
    expressions that match exactly one Cyrillic letter, plus two hashes:
    one that maps each lowercase character to its uppercase counterpart
    and one that maps each uppercase character to its lowercase
    counterpart. If some lowercase or uppercase character has no
    counterpart (I am not completely sure whether this happens), simply
    exclude it.

Consider applying the standard method first, so that ASCII letters are
handled as well, and then the mapping described above.
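
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the module name CyrillicCase is made up, Ruby 1.9 or later is assumed for the \u escapes, and only the two contiguous blocks U+0400..U+042F are covered. The remaining paired letters from the charts would have to be added to the table in the same way.

```ruby
# encoding: utf-8
# Hypothetical module; nothing here comes from an existing library.
module CyrillicCase
  # Uppercase code-point ranges and the distance to their lowercase
  # counterparts, read off the U0400 chart. Deliberately incomplete.
  UPPER_RANGES = {
    (0x0400..0x040F) => 0x50,  # U+0400..U+040F -> U+0450..U+045F
    (0x0410..0x042F) => 0x20   # U+0410..U+042F -> U+0430..U+044F
  }

  # Hash mapping each uppercase character to its lowercase one.
  UPPER_TO_LOWER = {}
  UPPER_RANGES.each do |range, offset|
    range.each do |cp|
      UPPER_TO_LOWER[[cp].pack("U")] = [cp + offset].pack("U")
    end
  end

  # Hash for the opposite direction.
  LOWER_TO_UPPER = UPPER_TO_LOWER.invert

  # Regular expressions that match exactly one mapped letter.
  UPPER_RE = Regexp.union(UPPER_TO_LOWER.keys)
  LOWER_RE = Regexp.union(LOWER_TO_UPPER.keys)

  # Standard method first, then the Cyrillic mapping.
  def self.downcase(str)
    str.downcase.gsub(UPPER_RE) { |ch| UPPER_TO_LOWER[ch] }
  end

  def self.upcase(str)
    str.upcase.gsub(LOWER_RE) { |ch| LOWER_TO_UPPER[ch] }
  end
end

word = "\u041F\u0420\u0418\u0412\u0415\u0422 Ruby"  # "ПРИВЕТ Ruby"
puts CyrillicCase.downcase(word)
puts CyrillicCase.upcase(word)
```

Building the hashes from code-point ranges avoids typing every letter by hand, at the cost of only working for blocks where uppercase and lowercase sit at a fixed distance from each other.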

Hope that helps a bit,

Josef ‘Jupp’ Schugt


Blog available at http://www.mynetcologne.de/~nc-schugtjo/blog/

UTF-8 character downcase!!
Posted by Igor K. (demoversion) on 01.09.2007 12:24
Hello,
Can anyone help me with a problem?

I have a word = “ПРИВЕТ”, it’s in Russian, and I want to downcase this
word (=привет). But the standard downcase method does not work with
non-English letters.

Thank you for your reply

Did you try using the character-encodings gem?

http://rubyforge.org/projects/char-encodings/

http://snippets.dzone.com/posts/show/2786


Thank you for the reply.

But I can’t install this plugin on Windows (I think the problem is the
‘make’ command: it’s missing on Windows XP, so I can’t compile the source).

Does anyone know a solution to this problem?

Thanks

Re: UTF-8 character downcase!!
Posted by Igor K. (demoversion) on 02.09.2007 11:08

But I can’t install this plugin on Windows (I think the problem is the
‘make’ command: it’s missing on Windows XP, so I can’t compile the source).

As already indicated above, you could also roll your own
UTF-8-aware downcase method.

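class String

  # UTF-8-aware downcase. Only the six letters of the example word
  # "ПРИВЕТ" are mapped; add one `when` branch per further letter.
  def downcase_utf8
    # /./mu matches one whole UTF-8 character at a time, newlines included.
    gsub(/./mu) do |char|
      case char
      when /\320\237/u then "\320\277"  # П -> п
      when /\320\240/u then "\321\200"  # Р -> р
      when /\320\230/u then "\320\270"  # И -> и
      when /\320\222/u then "\320\262"  # В -> в
      when /\320\225/u then "\320\265"  # Е -> е
      when /\320\242/u then "\321\202"  # Т -> т
      else
        char  # everything else is left unchanged
      end
    end  # gsub
  end
end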

Cheers,

J.K.

From: [email protected] [mailto:[email protected]] On Behalf Of Igor K.
Sent: Sunday, September 02, 2007 12:09 PM

Install Visual C++ (the Express edition is free), then use ‘nmake’.

V.