Iconv problems with different machines

Hi,

I have the following piece of code:

require 'iconv'

ic = Iconv.new('US-ASCII//TRANSLIT', 'UTF-8')
puts ic.iconv("Aüthor")

  1. on my local machine (OS X 10.5) when I run this, I get the output:
    A"uthor

  2. when I run this same code on my Debian server (via rake executed
    through a Capistrano task) I get the output: A?thor

  3. when I run this same code on my Debian server (via irb), I get:
    Author

Both 1 and 3 are acceptable output to me, but I can’t figure out how
to get my program to output the correct result on my server when I run
it through a Capistrano task. Is there some environment variable I need
to set? From reading other posts, I’ve tried adding this at the top of my
file:
$KCODE = "u"
require 'jcode'
ENV['LANG'] = 'en_US.UTF-8'
ENV['LC_CTYPE'] = 'en_US.UTF-8'

That still doesn’t fix the issue. Any help would be greatly appreciated.

Thanks,
Ray
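
One way to narrow this down is to print what each launch path (local shell, irb on the server, rake via Capistrano) actually inherits, before the script assigns anything to ENV. A minimal sketch; the helper name and the choice of variables are assumptions made for illustration, and a non-interactive SSH session may not load the same profile as an interactive login.

```ruby
# Hypothetical helper: reports the locale-related variables a process inherited.
LOCALE_VARS = %w[LANG LC_ALL LC_CTYPE].freeze

def locale_report(env = ENV)
  # nil shows up as "nil", so unset variables are easy to spot
  LOCALE_VARS.map { |name| "#{name}=#{env[name].inspect}" }.join(' ')
end

puts locale_report
```

Running the same line from irb and from the rake task should show whether the two processes start with different locale settings.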

Actually, I found some other posts about this same issue from a while
ago. It appears there’s no solution.

I stopped using the iconv library and instead switched to the iconv
system command and that seems to work. Not the best solution, but at
least it works…
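
For anyone wanting to try the same workaround, here is a rough sketch of piping text through the command-line tool. The post does not say how the command was invoked, so the helper name, the use of Open3, and the locale passed to the child process are all assumptions; it also assumes an iconv binary is on the PATH.

```ruby
require 'open3'

# Hypothetical helper: pipes UTF-8 text through the system iconv command.
# Assumes an `iconv` binary is on PATH; the locale argument is an assumption
# and only has an effect if that locale is installed on the machine.
def iconv_command(text, locale = 'en_US.UTF-8')
  env = { 'LC_ALL' => locale }
  out, status = Open3.capture2(env, 'iconv', '-f', 'UTF-8', '-t', 'ASCII//TRANSLIT',
                               stdin_data: text)
  raise "iconv exited with #{status.exitstatus}" unless status.success?
  out
end

puts iconv_command('Aüthor')
```

The exact transliteration still depends on the iconv implementation installed, so the result can differ between OS X and Debian just as it does in the original report.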

On Dec 5, 2007, at 12:14 PM, Raymond O’Connor wrote:

> Actually I found some other posts about this same issue from awhile
> ago… Appears there’s no solution.
>
> I stopped using the iconv library and instead switched to the iconv
> system command and that seems to work. Not the best solution, but at
> least it works…

I have not been able to pin down exactly where the difference lies,
but it looks like the transliteration tables simply differ depending on
the system, the iconv version, or something similar. At ASPgems we wrote
this hand-crafted normalizer, which we know is portable (note that
it uses Rails #chars and does a bit more stuff, but you get the idea):

def self.normalize(str)
  return '' if str.nil?
  # Rails #chars gives multibyte-aware downcase and strip
  n = str.chars.downcase.strip.to_s
  n.gsub!(/[àáâãäåāă]/, 'a')
  n.gsub!(/æ/, 'ae')
  n.gsub!(/[ďđ]/, 'd')
  n.gsub!(/[çćčĉċ]/, 'c')
  n.gsub!(/[èéêëēęěĕė]/, 'e')
  n.gsub!(/ƒ/, 'f')
  n.gsub!(/[ĝğġģ]/, 'g')
  n.gsub!(/[ĥħ]/, 'h')
  n.gsub!(/[ìíîïīĩĭ]/, 'i')
  n.gsub!(/[įıijĵ]/, 'j')
  n.gsub!(/[ķĸ]/, 'k')
  n.gsub!(/[łľĺļŀ]/, 'l')
  n.gsub!(/[ñńňņʼnŋ]/, 'n')
  n.gsub!(/[òóôõöøōőŏ]/, 'o')
  n.gsub!(/œ/, 'oe')
  n.gsub!(/ą/, 'q')
  n.gsub!(/[ŕřŗ]/, 'r')
  n.gsub!(/[śšşŝș]/, 's')
  n.gsub!(/[ťţŧț]/, 't')
  n.gsub!(/[ùúûüūůűŭũų]/, 'u')
  n.gsub!(/ŵ/, 'w')
  n.gsub!(/[ýÿŷ]/, 'y')
  n.gsub!(/[žżź]/, 'z')
  n.gsub!(/\s+/, ' ')          # collapse runs of whitespace
  n.delete!('^ a-z0-9_/\-')    # drop everything outside the allowed set
  n
end

– fxn
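
The same idea of not relying on the platform's transliteration tables can also be sketched without iconv or Rails. This is not the ASPgems code: it swaps in String#unicode_normalize and the fallback option of String#encode, both of which are much newer than this thread (Ruby 2.2 or later is assumed). The method name and the table entries are illustrative assumptions, and the table is far from complete.

```ruby
# Characters with no canonical decomposition need an explicit table.
# These entries are illustrative assumptions, not a complete list.
EXTRA_ASCII = {
  'æ' => 'ae', 'Æ' => 'AE', 'œ' => 'oe', 'Œ' => 'OE',
  'ø' => 'o',  'Ø' => 'O',  'ł' => 'l',  'Ł' => 'L',
  'đ' => 'd',  'Đ' => 'D',  'ß' => 'ss'
}.freeze

# Hypothetical helper: returns an ASCII-only approximation of a UTF-8 string.
def to_ascii(str)
  return '' if str.nil?
  # NFD splits "ü" into "u" plus a combining diaeresis (U+0308).
  decomposed = str.unicode_normalize(:nfd)
  # Drop the combining marks, leaving only the base letters.
  stripped = decomposed.gsub(/\p{Mn}/, '')
  # Map what is left through the table; anything unknown becomes "?".
  stripped.encode('US-ASCII', fallback: ->(ch) { EXTRA_ASCII.fetch(ch, '?') })
end

puts to_ascii('Aüthor')   # prints "Author"
```

Because the decomposition data ships with Ruby itself, the result should not depend on which machine or locale the code runs under, which is the property the hand-written table above was after.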

Raymond O’Connor said…

> 2. when I run this same code on my debian server (via rake executed
> [...]
> $KCODE = "u"
> require 'jcode'
> ENV['LANG'] = 'en_US.UTF-8'
> ENV['LC_CTYPE'] = 'en_US.UTF-8'
>
> still doesn’t fix the issue. Any help would be greatly appreciated.

I’ve found a lot of bugs with the MRI Iconv and now only use it with
JRuby, which, I suspect, uses the Java SE converters.

Hi Xavier,

I like that solution even better. Thanks for sharing!

Best,
Ray
