Ruby, irb and iconv with translit


#1

Greetings.

I’ve been trying to find out why sorting a list of German names failed on
both my local Gentoo box and my remote Debian server. Can somebody please
explain the following in simple words?

$ ruby -v
ruby 1.8.5 (2006-12-25 patchlevel 12) [x86_64-linux]
$ irb -v
irb 0.9.5(05/04/13)

$ cat test.rb | irb

$KCODE='u'
=> "u"

require "iconv"
=> true

$conv = Iconv.new("ASCII//TRANSLIT", "UTF-8")
=> #<Iconv:0x2ace62d57090>

$arg = "ärger"
=> "ärger"

$asc = $conv.iconv($arg)
=> "?rger"

puts $asc.size
5

OK, so translit fails here. This might be a bug somewhere,
but then why does the following work, where I call
iconv interactively with the same string?

$ irb -r test.rb
5

watch_this = $conv.iconv($arg)
=> "aerger"

watch_this.size
=> 6

Thanks,
s.


#2

On Sat, 03 Mar 2007 18:33:18 +0100, Stefan S. wrote:

=> "?rger"
=> "aerger"

watch_this.size
=> 6

Another facet of the problem:

$ irb -r test
5
$ irb

require "test"
6
=> true

I’d really like to know what irb does to make iconv behave…
s.


#3

On Sat, 03 Mar 2007 18:47:44 +0100, Stefan S. wrote:

I’ve been trying to find out why sorting a list of German names failed
on both my local Gentoo box and my remote Debian server. Can somebody
please explain the following in simple words?

The folks at #ruby-de helped me out with some brain waves: the
problem originates with locale settings. To wit:

$ echo Ärger | LC_CTYPE=de iconv -f utf8 -t ascii//translit
?rger
$ echo Ärger | LC_CTYPE=de_DE iconv -f utf8 -t ascii//translit
AErger

But while you can get the iconv tool to behave by setting LC_CTYPE,
there’s no such luck with ruby:

$ LC_CTYPE=C ruby test.rb
"?rger"
$ LC_CTYPE=de_DE ruby test.rb
"?rger"

For the record, test.rb (saved as utf8) looks like:

$KCODE='u'
require "iconv"
$conv = Iconv.new("ASCII//TRANSLIT", "UTF-8")
p($conv.iconv("Ärger"))
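
For the original goal of sorting German names, one locale-independent workaround might be to skip iconv and expand the umlauts by hand. This is only a sketch under assumptions: it expects Ruby 1.9 or later with a UTF-8 source file, GERMAN_TRANSLIT and translit_de are hypothetical names (not part of iconv or any library), and the mapping simply mirrors the ae/AE style output that the iconv tool produced under de_DE above.

```ruby
# encoding: utf-8

# Hypothetical lookup table (an assumption, not taken from iconv):
# expands German umlauts and sharp s the way the de_DE locale did above.
GERMAN_TRANSLIT = {
  "ä" => "ae", "ö" => "oe", "ü" => "ue",
  "Ä" => "AE", "Ö" => "OE", "Ü" => "UE",
  "ß" => "ss"
}.freeze

# Hypothetical helper: replaces each listed character via the table,
# leaving every other character untouched.
def translit_de(str)
  str.gsub(/[äöüÄÖÜß]/, GERMAN_TRANSLIT)
end

names = ["Zimmer", "Ärger", "Adler", "Öl"]

# Sort on the transliterated, lowercased key; the original names are kept.
sorted = names.sort_by { |name| translit_de(name).downcase }
puts sorted.join(", ")
```

Since the table lives in the script itself, the result does not depend on LC_CTYPE or on which locales are installed on the machine. The price is that it only covers the characters listed in the table.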

s.