In 1.9.2, with force_encoding, do we still need Iconv?

I have this code in Ruby 1.8:
Iconv.iconv('gbk', 'utf-8', string)

Now that Ruby 1.9.2 has force_encoding('utf-8'), can I just call
force_encoding('utf-8') instead?

On 2011/02/01 11:55, Zhenning G. wrote:

> I got code in ruby 1.8.
> Iconv.iconv('gbk', 'utf-8', string)
>
> now, ruby 1.9.2 has force_encoding('utf-8'), so can I just
> force_encoding('utf-8') ?

No. force_encoding just changes the encoding label, but leaves the bytes
in the string as they are. That would result in garbage (unless
everything is ASCII anyway). The main use of force_encoding is to set
the encoding label on raw byte strings (e.g. coming from outside) when
you already know what the encoding is.
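
To make the difference concrete, here is a small sketch (the variable
names utf8, relabeled and converted are just made up for illustration).
It compares the bytes after force_encoding with the bytes after encode:

```ruby
# "中文" written with \u escapes, so the literal is UTF-8 regardless of
# the encoding of the source file.
utf8 = "\u4E2D\u6587"

# force_encoding changes only the label; the bytes stay the same.
relabeled = utf8.dup.force_encoding('gbk')

# encode transcodes: the bytes are rewritten for the target encoding.
converted = utf8.encode('gbk')

p utf8.bytes.to_a       # the UTF-8 bytes
p relabeled.bytes.to_a  # the same bytes, now mislabelled as GBK
p converted.bytes.to_a  # different bytes: the actual GBK representation
```

Only the last string is real GBK data; the relabelled one would display
as unrelated characters.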

The equivalent of your Iconv call, in Ruby 1.9, is:

string.encode('gbk', 'utf-8')

But I’m a bit wary about the order of the arguments. Both
Iconv.iconv('gbk', 'utf-8', string)
string.encode('gbk', 'utf-8')

convert from UTF-8 to GBK, but the result of force_encoding('utf-8') is
labelled UTF-8, so if you want the result to be UTF-8, you have to turn
the order of the parameters around. I was never happy with the TO-FROM
order in Iconv, and I’m also not happy with the TO-FROM order in
String#encode, but String#encode can also be used with just the TO
parameter, e.g.
string.encode('gbk')
if the string already has the correct encoding label at this point. So
when we (Matz and I, mainly) designed String#encode, unfortunately
TO-FROM was the only order that made sense.
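
As a rough sketch of the opposite direction (GBK in, UTF-8 out), assuming
the bytes arrive unlabelled from outside; raw, via_two_args and via_label
are hypothetical names:

```ruby
# GBK bytes for "中文", as they might arrive from a file or socket.
# pack('C*') yields a binary (ASCII-8BIT) string.
raw = [0xD6, 0xD0, 0xCE, 0xC4].pack('C*')

# TO first, FROM second: interpret the bytes as GBK, produce UTF-8.
via_two_args = raw.encode('utf-8', 'gbk')

# Or set the correct label first, then pass only the TO encoding.
via_label = raw.dup.force_encoding('gbk').encode('utf-8')

p via_two_args == via_label
p via_two_args.encoding
```

The second form is where force_encoding is appropriate: it states what
the bytes already are, and encode then does the actual conversion.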

Please also note that there might be slight differences between Iconv
and String#encode for some characters, but these should be very small in
number.
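
This does not demonstrate an Iconv difference as such, but if a
character turns out to have no mapping in the target encoding, a sketch
like the following (names made up, and U+1F600 picked only because it has
no GBK equivalent) shows what String#encode does by default and how the
:undef and :replace options change that:

```ruby
text = "a\u{1F600}"   # U+1F600 cannot be represented in GBK

# By default an unmappable character raises an exception.
strict_error = nil
begin
  text.encode('gbk')
rescue Encoding::UndefinedConversionError => e
  strict_error = e
end
p strict_error.class

# With :undef => :replace, the character is replaced instead.
lenient = text.encode('gbk', :undef => :replace, :replace => '?')
p lenient.bytes.to_a
```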

Regards, Martin.


#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:[email protected]