Re: unicode in ruby


#1

Dear Michal,

hmm.
I have written the most complex Czech text file I could think of containing
the words Cesky and cesky - with hooks, that is, and saved it in UTF8
encoding as file utf8.txt.
Now, having installed Library No. 8 from
http://www.yoshidam.net/Ruby.html,
the following script


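#!/c/axelhome/ruby -Ku -rjcode
require "unicode"  # Unicode extension from yoshidam.net

# Read the UTF-8 file as an array of lines.
text = IO.readlines("/c/axelhome/ruby/utf8.txt")
p 'mytext'
p text
# Join the lines and split on whitespace to get the individual words.
c = text.join.split
p c
d = Unicode::downcase(c[0])
e = Unicode::downcase(c[1])
puts "are the first letters equal ? #{d==e}"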


#2

On 3/8/06, removed_email_address@domain.invalid wrote:

> Dear Michal,
>
> hmm.
> I have written the most complex Czech text file I could think of containing
> the words Cesky and cesky - with hooks, that is, and saved it in UTF8

hmm, I wonder why C with a hook is called Ccaron in keymaps…

> encoding as file utf8.txt.
> Now, having installed Library No. 8 from http://www.yoshidam.net/Ruby.html

This is a much smaller library than ICU, and it could be useful for some
other text processing.
Thanks for the link.

But it is still a library one has to install in addition to Ruby. Some
people who install Debian won’t even have a compiler, and almost
nobody has a compiler on Windows.
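
For anyone without a compiler: Ruby releases from 2.4 onward (well after this thread) do Unicode-aware case mapping in String#downcase itself, so the comparison from the script above can be sketched without any extension. This is only a sketch under that version assumption; the two words are written inline instead of being read from utf8.txt, and the helper name same_ignoring_case? is made up for illustration.

```ruby
# encoding: utf-8
# Sketch assuming Ruby 2.4 or later, where String#downcase maps non-ASCII letters.

# Hypothetical helper: true if the two words match once both are downcased.
def same_ignoring_case?(a, b)
  a.downcase == b.downcase
end

# Stand-in for the contents of utf8.txt: the two Czech words with hooks.
words = "Česky česky".split
puts "are the words equal ignoring case? #{same_ignoring_case?(words[0], words[1])}"
```

On an older Ruby such as 1.8, downcase only touches ASCII letters, which is why the extra library was needed in the first place.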

Also note that unless you know the language you are processing, you
cannot get correct upcase/downcase. For one, in most languages using
Latin letters, I is the uppercase form of i. But I heard that Turkish
uses I, I with dot, i, and i without dot. Consequently, in Turkish I is
the uppercase form of i without dot …
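
To make the Turkish case concrete, here is a small sketch, again assuming Ruby 2.4 or later, where String#upcase and String#downcase accept a :turkic option. The helper case_forms is a made-up name used only to print both mappings side by side.

```ruby
# encoding: utf-8
# Sketch assuming Ruby 2.4 or later (String#upcase/#downcase with :turkic).

# Hypothetical helper: upper and lower forms of ch under the given options.
def case_forms(ch, *options)
  { up: ch.upcase(*options), down: ch.downcase(*options) }
end

# Default mapping: i <-> I, as in most Latin-script languages.
p case_forms("i")
p case_forms("I")

# Turkic mapping: i pairs with I-with-dot, and I pairs with i-without-dot.
p case_forms("i", :turkic)
p case_forms("I", :turkic)
```

The caller still has to know that the text is Turkish and pass the option; nothing in the string itself tells Ruby which mapping to use.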

Thanks

Michal

