A question about Iconv arguments

Dear all,

I need to convert some accented text, and I would like to know
what arguments I have to pass to Iconv to produce the desired output.
E.g., in Italian, the word for Friday is “venerdì”, where the
final “i” carries a grave accent (ì, small i with grave).
If you type this into the Italian Wikipedia search
(which I believe to be in UTF-8 encoding),
it will load:

Venerdì - Wikipedia,

yet this call:

converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)

gives me “venerd\303\254” when I convert from latin1 encoding.

What arguments should I use?

Thank you,

Best regards,

Axel

Axel E. wrote:

Venerdì - Wikipedia ,

yet this syntax:

converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)

gives me “venerd\303\254” when I convert from latin1 encoding.

That looks right to me - if I write that into a UTF-8 HTML document, it
displays correctly. What are you expecting?
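To see why that output is correct: in Latin-1 the letter ì is the single byte 0xEC (octal \354), while in UTF-8 it is the two bytes 0xC3 0xAC, which Ruby 1.8's inspect prints in octal as \303\254. Iconv was removed from Ruby's standard library in Ruby 2.0 (it lives on as the iconv gem), so this sketch swaps in String#encode for the same Latin-1 to UTF-8 conversion; it assumes Ruby 2.0 or later and a UTF-8 source file.

```ruby
# "venerd\xEC" is "venerdì" as Latin-1 bytes: ì is the single byte 0xEC (octal 354).
latin1 = "venerd\xEC".dup.force_encoding("ISO-8859-1")

# String#encode stands in for Iconv.new("utf-8", "latin1").iconv(...) here.
utf8 = latin1.encode("UTF-8")

# Show the last two bytes in octal, the way Ruby 1.8 displayed them.
octal = utf8.bytes.last(2).map { |b| format("\\%o", b) }.join
puts octal              # prints \303\254
puts utf8 == "venerdì"  # prints true
```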

Dear Alex,

thank you for responding.
If I try to fetch a web page that has accented characters in its address,
like this:

require "rubygems"
require "rio"
require 'iconv'
output_encoding = 'utf-8'
doc = "Venerdì"
converted_doc = Iconv.new(output_encoding, 'latin1').iconv(doc)
rio("http://www.wikipedia.org/wiki/" + converted_doc) > rio("a.html")

I get an error message:

/usr/local/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?): http://www.wikipedia.org/wiki/venerdì (URI::InvalidURIError)
        from /usr/local/lib/ruby/1.8/uri/common.rb:485:in `parse'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/withpath.rb:285:in `uri_from_string_'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:74:in `arg0_info_'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:83:in `init_from_args_'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:56:in `initialize'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `new'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `parse'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/builder.rb:111:in `build'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/factory.rb:412:in `create_state'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:65:in `initialize'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `new'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `rio'
        from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/kernel.rb:42:in `rio'

This doesn’t happen if I type in:

rio("http://www.wikipedia.org/wiki/Venerd%C3%AC") > rio("a.html")

So I need to know what conversion arguments I need to give Iconv to
turn “Venerdì” into “Venerd%C3%AC”.
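The %C3%AC form is not something an Iconv argument can produce on its own: Iconv only changes the character encoding, which yields the raw UTF-8 bytes 0xC3 0xAC. Turning those bytes into a URL is a separate percent-encoding step, in which each byte outside a safe set is written as % followed by two uppercase hex digits. The helper `percent_encode` below is hypothetical, a minimal sketch assuming Ruby 2.4+ (for String#match?) and treating only ASCII letters, digits and `-._~` (the RFC 3986 unreserved characters) as safe.

```ruby
# Hypothetical helper: convert +str+ to UTF-8, then percent-encode every byte
# except the RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~).
def percent_encode(str)
  str.encode("UTF-8").bytes.map do |b|
    c = b.chr
    c.match?(/\A[A-Za-z0-9\-._~]\z/) ? c : format("%%%02X", b)
  end.join
end

puts percent_encode("Venerdì")  # prints Venerd%C3%AC
```

Percent-encoding the Latin-1 bytes without converting first would give Venerd%EC instead, which is why the helper converts to UTF-8 before escaping.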

Best regards,

Axel

I’ve managed to solve this problem like this:

require "rubygems"
require "rio"
require 'iconv'

def to_hex(number)
  number=number.abs
  binary=''
  while number>0
    digit=number%16
    if digit<10
      binary<<digit.to_s
    elsif digit==10
      binary<<'A%'
    elsif digit==11
      binary<<'B%'
    elsif digit==12
      binary<<'C%'
    elsif digit==13
      binary<<'D%'
    elsif digit==14
      binary<<'E%'
    elsif digit==15
      binary<<'F%'
    end
    number=(number-digit)/16
  end
  return binary.reverse.gsub(/%([A-F])%([A-F])/,'%\1\2')
end

class String
  def wiki_addr
    converted_doc = Iconv.new('utf-8', 'latin1').iconv(self)
    res=''
    # Ruby 1.8 (default $KCODE): split(//) yields single bytes and x[0] is an Integer
    converted_doc.split(//).each{|x|
      if /[a-zA-Z0-9_ ]/.match(x)
        res<<x
      else
        res<<to_hex(x[0])
      end
    }
    return res
  end
end
doc = "venerdì"
doc.wiki_addr
rio("http://it.wikipedia.org/wiki/" + doc.wiki_addr) > rio("a.html")

Best regards,

Axel

Dear Stefan,

thank you for pointing this out!
(To paraphrase Pascal, I might
have been able to write a shorter
program had I had more leisure and
more knowledge.)
I’ll try your suggestion.
Best regards,

Axel

Hi,

At Sun, 10 Jun 2007 18:05:49 +0900,
Axel E. wrote in [ruby-talk:254981]:

I’ve managed to solve this problem like this:

$ ruby -riconv -rcgi -e 'puts CGI.escape(Iconv.conv("utf-8", "latin1", "venerd\354"))'
venerd%C3%AC
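On Ruby 2.0 and later, where Iconv is no longer part of the standard library, the same one-liner can be sketched with String#encode doing the conversion. This version swaps in URI.encode_www_form_component for CGI.escape; both write non-ASCII bytes as uppercase %XX and turn spaces into +, which suits query strings better than path segments.

```ruby
require "uri"

# "venerd\xEC" holds the Latin-1 bytes of "venerdì", as in the one-liner above.
latin1 = "venerd\xEC".dup.force_encoding("ISO-8859-1")

# Convert Latin-1 to UTF-8 (the Iconv.conv step), then percent-encode the bytes.
escaped = URI.encode_www_form_component(latin1.encode("UTF-8"))
puts escaped  # prints venerd%C3%AC
```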

Axel E. wrote:

I’ve managed to solve this problem like this:

require "rubygems"
require "rio"
require 'iconv'

def to_hex(number)
  number=number.abs
  binary=''
  while number>0
    digit=number%16
    if digit<10
      binary<<digit.to_s
    elsif digit==10

I guess you’re aware of neither:
1234.to_s(16)
nor:
"%x" % 1234

For situations like the above, even a lookup-array or a case/when would
be better.
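Stefan's suggestions can be sketched as follows, assuming Ruby 2.0+; the names HEX_DIGITS and to_hex_byte are hypothetical. Integer#to_s(16) and "%x" give lowercase hex without padding, so a byte in a URL is better served by the "%02X" directive (two uppercase digits), and a lookup array replaces the if/elsif chain. Both variants also always emit the leading %, whereas the hand-written to_hex only puts % in front of the letters A-F, so a byte whose hex digits are both numeric, such as 0x28 for "(", comes out as a bare 28.

```ruby
puts 1234.to_s(16)    # prints 4d2
puts "%x" % 1234      # prints 4d2
puts "%%%02X" % 0xAC  # prints %AC

# Hypothetical lookup array: index 0..15 maps to its uppercase hex digit.
HEX_DIGITS = %w[0 1 2 3 4 5 6 7 8 9 A B C D E F].freeze

# Hypothetical helper: render one byte (0..255) as a percent escape such as "%C3".
def to_hex_byte(byte)
  "%" + HEX_DIGITS[byte / 16] + HEX_DIGITS[byte % 16]
end

puts "venerdì".bytes.last(2).map { |b| to_hex_byte(b) }.join  # prints %C3%AC
```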

Regards
Stefan