Forum: Ruby Re: unicode in ruby

unknown (Guest)
on 2006-03-08 20:26
(Received via mailing list)
Dear Michal,

hmm.
I have written the most complex Czech text file I could think of, containing
the words Cesky and cesky (with hooks, that is), and saved it in UTF-8
encoding as the file utf8.txt.
Now, having installed Library No. 8 from
http://www.yoshidam.net/Ruby.html,
the following script

-------------------------------------------------------------------

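#!/c/axelhome/ruby -Ku -rjcode
# -Ku sets $KCODE to UTF-8 and -rjcode loads jcode (Ruby 1.8 options).
require "unicode"  # extension library from yoshidam.net

# Read the file as an array of lines.
text = IO.readlines("/c/axelhome/ruby/utf8.txt")
p 'mytext'
p text
# Join the lines, then split on whitespace into words.
c = text.join.split
p c
# Downcase both words with the Unicode-aware routine and compare them.
d = Unicode::downcase(c[0])
e = Unicode::downcase(c[1])
puts "are the downcased words equal? #{d==e}"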
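
For comparison, here is a minimal sketch of the same check without the extension library. It assumes Ruby 2.4 or later, which did not exist when this was posted and whose built-in String#downcase handles non-ASCII letters. The two words are written with escapes instead of being read from utf8.txt, and the variable names are only illustrative.

```ruby
# Assumption: Ruby 2.4+, where String#downcase applies Unicode case mapping.
upper = "\u010Cesky"  # "Cesky" with a hook on the capital C
lower = "\u010Desky"  # "cesky" with a hook on the small c

# Downcase both words and compare, as the script above does via Unicode::downcase.
same = upper.downcase == lower.downcase
puts "are the downcased words equal? #{same}"
```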
Michal Suchanek (Guest)
on 2006-03-08 21:29
(Received via mailing list)
On 3/8/06, Nuralanur@aol.com <Nuralanur@aol.com> wrote:
> Dear Michal,
>
> hmm.
> I have written the most complex Czech text file I could think of  containing
> the words Cesky and cesky - with hooks, that is, and saved it in  UTF8

hmm, I wonder why C with a hook is called Ccaron in keymaps..

> encoding as file utf8.txt.
> Now, having installed Library No. 8 from _http://www.yoshidam.net/Ruby.html_

This is a much smaller library than ICU, and it could be useful for some
other text processing.
Thanks for the link.

But it is still a library one has to install in addition to Ruby. Some
people who install Debian won't even have a compiler, and almost
nobody has a compiler on Windows.

Also note that unless you know the language you are processing, you
cannot get correct upcase/downcase. For one, in most languages using
Latin letters, I is the uppercase form of i. But I heard that Turkish
uses four letters: I, I with a dot, i, and i without a dot. Consequently,
in Turkish I is the uppercase form of i without a dot, and I with a dot
is the uppercase form of i.
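
To make the Turkish case concrete, here is a small sketch of the difference between language-neutral and Turkic case mapping. It assumes Ruby 2.4 or later, where String#downcase and String#upcase accept a :turkic option; that is newer than the Ruby discussed in this thread, and the variable names are only illustrative.

```ruby
# Assumption: Ruby 2.4+, which accepts :turkic for Turkic case mapping.
default_lower = "I".downcase           # language-neutral mapping
turkic_lower  = "I".downcase(:turkic)  # i without a dot (U+0131)
default_upper = "i".upcase             # language-neutral mapping
turkic_upper  = "i".upcase(:turkic)    # I with a dot (U+0130)

# The two mappings disagree, so the right answer depends on the language.
puts "downcase agrees? #{default_lower == turkic_lower}"
puts "upcase agrees?   #{default_upper == turkic_upper}"
```

So the same input string gives different results depending on which language's rules are chosen, and the program has to be told which one applies.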

Thanks

Michal


--
             Support the freedom of music!
Maybe it's a weird genre  ..  but weird is *not* illegal.
Maybe next time they will send a special forces commando
to your picnic .. because they think you are weird.
 www.music-versus-guns.org  http://en.policejnistat.cz