jruby-1.6.0.RC2 --1.9 vs. external_encoding and internal_encoding and *actual* internal_encoding

Hi, there,

I want to read a file (a Google Mail address book exported as CSV) that
is encoded as UTF-16LE, and I want to treat its contents as UTF-8 within
my Ruby script.

To demonstrate the problem, I wrote a few lines of Ruby code, shown
below.

The encoding of a line read from that file should be identical to the
file's “internal_encoding”. Do you agree?
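The expectation above can be stated as a small self-contained check. This is a sketch for MRI 1.9.2 or later: it writes a tiny UTF-16LE file (the sample rows and the temp-file prefix are made up, not the real Google export), reads it back with the same "r:UTF-16LE:UTF-8" mode, and checks that the line's encoding equals f.internal_encoding.

```ruby
require 'tempfile'

# Hypothetical sample rows standing in for the real Google export.
sample = "Name,E-mail Address\nJ\u00fcrgen,juergen@example.com\n"

# Write the sample as raw UTF-16LE bytes to a temporary file.
tmp = Tempfile.new('google-utf16le')
tmp.binmode
tmp.write(sample.encode('UTF-16LE'))
tmp.close

line = nil
File.open(tmp.path, 'r:UTF-16LE:UTF-8') do |f|
  line = f.gets
  STDERR.puts "external=#{f.external_encoding} internal=#{f.internal_encoding} line=#{line.encoding}"
  # The expectation under discussion: lines come back in the internal encoding.
  raise 'line was not transcoded' unless line.encoding == f.internal_encoding
end
```

On MRI this passes silently apart from the STDERR line; a runtime that tags the line as UTF-16LE instead would raise.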

That works fine with ruby-1.9.2-p180,
but it doesn’t work properly with “jruby-1.6.0.RC2 --1.9”.

Is this something worth fixing in JRuby (--1.9)?

Or am I doing something wrong?

J.

======================================================================

# $ ~/.rvm/bin/jruby-1.6.0.RC2 --1.9 -w test-utf.01.rb
#
# $ ~/.rvm/bin/ruby-1.9.2-p180 -w test-utf.01.rb

# External encoding UTF-16LE, internal (in-script) encoding UTF-8.
f = open('google.UTF-16LE.csv', "r:UTF-16LE:UTF-8")

STDERR.printf("=%d: %s=>{%s},%s=>{%s} // %s\n", __LINE__,
  'f.external_encoding', f.external_encoding,
  'f.internal_encoding', f.internal_encoding,
  '...')

line_1 = f.gets

# Expected: the same encoding as f.internal_encoding.
STDERR.printf("=%d: %s=>{%s} // %s\n", __LINE__,
  'line_1.encoding.name', line_1.encoding.name,
  '...')

f.close

$ ~/.rvm/bin/ruby-1.9.2-p180 -w test-utf.01.rb

=8: f.external_encoding=>{UTF-16LE},f.internal_encoding=>{UTF-8} // ...

=15: line_1.encoding.name=>{UTF-8} // ...

$ $HOME/.rvm/bin/jruby-1.6.0.RC2 --1.9 -w test-utf.01.rb

=8: f.external_encoding=>{UTF-16LE},f.internal_encoding=>{UTF-8} // ...

=15: line_1.encoding.name=>{UTF-16LE} // ...
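One workaround to try while IO-level transcoding is incomplete is to skip the internal encoding and transcode explicitly: read the raw bytes, tag them as UTF-16LE with force_encoding, and convert with String#encode. This is a sketch that runs on MRI 1.9.2 or later; it assumes the runtime's String#encode handles UTF-16LE, which is not verified here for JRuby 1.6.0.RC2 (the reply below says transcoding support there was incomplete). The helper name read_utf16le_lines and the sample data are hypothetical, and the BOM stripping is only a precaution in case the export starts with a byte-order mark.

```ruby
require 'tempfile'

# Hypothetical helper: decode explicitly instead of relying on the IO's
# internal encoding.
def read_utf16le_lines(path)
  raw  = File.open(path, 'rb') { |f| f.read }           # raw bytes (ASCII-8BIT)
  text = raw.force_encoding('UTF-16LE').encode('UTF-8') # explicit transcoding
  text = text.sub(/\A\uFEFF/, '')                        # drop a leading BOM, if any
  text.each_line.to_a
end

# Self-contained demo with hypothetical sample data (BOM included).
tmp = Tempfile.new('google-utf16le')
tmp.binmode
tmp.write("\uFEFFName,E-mail Address\nJ\u00fcrgen,juergen@example.com\n".encode('UTF-16LE'))
tmp.close

lines = read_utf16le_lines(tmp.path)
puts lines.first.encoding
```

Reading the whole file first keeps line splitting in UTF-8, so it does not depend on how gets treats an ASCII-incompatible stream.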

On Wed, Feb 23, 2011 at 6:24 AM, Jochen H. wrote:

> That works fine with ruby-1.9.2-p180,
> but it doesn't work properly with “jruby-1.6.0.RC2 --1.9”.
>
> Is this something worth fixing in JRuby (--1.9)?
>
> Or am I doing something wrong?

You’re not doing anything wrong, and I am looking into supporting this
as we speak. Our original port of the encoding code from MRI was
missing all transcoding features; however, we do have code written
that can transcode using Java’s charset support (it is not hooked up
yet). So initially, transcoding between the internal and external
encodings will be done via Java charsets. 1.6.1 will hopefully have
the same transcoding mechanisms as Ruby 1.9.2.
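For reference, MRI 1.9 accepts the external/internal pair in several equivalent forms. The sketch below reads a tiny hypothetical UTF-16LE file three ways; it reflects MRI behavior only, since whether JRuby 1.6.0.RC2 honors each form is exactly what this thread is about.

```ruby
require 'tempfile'

# Hypothetical one-line UTF-16LE file.
tmp = Tempfile.new('enc-demo')
tmp.binmode
tmp.write("a,b\n".encode('UTF-16LE'))
tmp.close

results = []

# 1. Encodings in the mode string: "external:internal".
File.open(tmp.path, 'r:UTF-16LE:UTF-8') { |f| results << f.gets }

# 2. Encodings passed as open options.
File.open(tmp.path, 'r', :external_encoding => 'UTF-16LE',
                         :internal_encoding => 'UTF-8') { |f| results << f.gets }

# 3. Encodings set on an already-open IO, before anything is read.
File.open(tmp.path, 'r') do |f|
  f.set_encoding('UTF-16LE', 'UTF-8')
  results << f.gets
end

results.each { |s| puts "#{s.inspect} #{s.encoding}" }
```

All three should yield a UTF-8 string on MRI; a runtime lacking IO transcoding would report UTF-16LE instead, as in the JRuby output above.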

Summary: I’m hoping to add basic transcoding for 1.6.0.RC3, but in more
complicated scenarios, where the internal and external encodings do not
match up, we may not have perfect parity with 1.9.2 transcoding.

-Tom

blog: http://blog.enebo.com twitter: tom_enebo