1.6.7 encoding issue while reading CSV file

Hi,

I'm reading a CSV file which contains UTF-8 characters with JRuby 1.6.7
in 1.9 mode, and I'm getting ASCII-8BIT strings instead of UTF-8 ones.

Example:

#---------------------------------

CSV.foreach(wb_filepath, :encoding => 'UTF-8:UTF-8', :headers => true,
            :return_headers => true, :col_sep => ',') do |row|

  # :encoding => 'UTF-8:UTF-8' should not be necessary anyhow, because
  # Encoding.default_external = 'UTF-8' is already set.
  # I just added it to test whether it would help. It didn't.

  # ...

  test_string = row.field('test header')

  puts "test_string: #{test_string.encoding.name}"
  # ==> here I get 'test_string: ASCII-8BIT'

  # ...
end

#---------------------------------

Am I doing something wrong?

Thanks,
Manfred

Manfred,

Can you also provide a data file for us to run this on? It always
helps to use real data…

-Tom

On Wed, Mar 14, 2012 at 12:16 PM, Manfred U.
[email protected] wrote:



To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email


blog: http://blog.enebo.com twitter: tom_enebo
mail: [email protected]

Hi Tom,

Attached is a small test case.

Thanks,
Manfred

On 14.03.2012 21:47, Thomas E Enebo wrote:

On 14.03.2012 23:30, Don W. wrote:

Do you have ‘# coding: UTF-8’ at the top of the script that parses
the CSV? This has bitten me in the past as well…

Yes. Moreover, I've also set default_external to UTF-8, which should be
used when files are read; at least, that is my understanding. But CSV
seems to ignore it and treats the content as ASCII-8BIT (binary). Or I'm
just doing something wrong. Did you have a look at the small example
attached to my previous mail?
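A minimal sketch of the distinction being discussed, assuming a standard Ruby 1.9+ interpreter (not JRuby 1.6.7 specifically): the "# coding: UTF-8" magic comment only sets the encoding of string literals in the script, while data read from a file is tagged with Encoding.default_external unless the file is read in binary mode. The temp file and sample text are made up for illustration.

```ruby
# coding: UTF-8
# Sketch: the magic comment governs string literals; file reads are governed
# by Encoding.default_external (or an explicit encoding), not the comment.
require 'tempfile'

literal = "Grüße"
puts "literal:      #{literal.encoding}"   # tagged by the magic comment
puts "__ENCODING__: #{__ENCODING__}"       # source encoding of this script

Encoding.default_external = Encoding::UTF_8

# Write some made-up UTF-8 sample text to a temporary file.
tmp = Tempfile.new('enc-demo')
tmp.write("Grüße\n")
tmp.close

# File.read without an encoding argument tags the result with default_external.
from_file = File.read(tmp.path)
puts "from file:    #{from_file.encoding}"

# A binary read ignores default_external and yields ASCII-8BIT (binary) strings,
# which is the tag Manfred is seeing on the CSV fields.
binary = File.binread(tmp.path)
puts "binary read:  #{binary.encoding}"

tmp.unlink
```

If the CSV fields come back as ASCII-8BIT even though a plain File.read returns UTF-8, that points at how the library opens or reads the file rather than at the script's own settings.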

Btw., my current workaround is to call force_encoding('UTF-8') on
every CSV string field value.
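A minimal sketch of that workaround, assuming the file really contains UTF-8 bytes. force_utf8_fields is a hypothetical helper, not part of the CSV library, and the ASCII-8BIT row below is built by hand only to mimic the symptom. Note that force_encoding relabels the bytes without converting them.

```ruby
require 'csv'

# Hypothetical helper: re-tag every String field of a CSV::Row as UTF-8.
# Only correct when the underlying bytes are already valid UTF-8.
def force_utf8_fields(row)
  row.size.times do |i|
    value = row[i]                       # Integer index -> field by position
    next unless value.is_a?(String)
    # dup first, since CSV may hand out frozen strings (e.g. header values)
    row[i] = value.dup.force_encoding(Encoding::UTF_8)
  end
  row
end

# Simulate the reported symptom: a field tagged as binary (ASCII-8BIT).
row = CSV::Row.new(['test header'], ['Grüße'.dup.force_encoding(Encoding::ASCII_8BIT)])
puts row['test header'].encoding         # binary tag before the workaround

force_utf8_fields(row)
puts row['test header'].encoding         # UTF-8 tag after the workaround
puts row['test header'].valid_encoding?
```

Inside the CSV.foreach block this would be a single call on each row before reading fields.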

Manfred


This is fixed on master, but I can confirm it is broken on 1.6.7, so
1.7.0 will address this. If you really need the fix now, you can bisect
to the changeset that fixed it on master and create a patch for the 1.6
branch, or just wait for 1.7.0 (May timeframe).

-Tom

On Thu, Mar 15, 2012 at 2:56 AM, Manfred U.
[email protected] wrote:


Actually, getting master and trying your stuff out would be helpful in
case something else is broken (plus it would be another confirmation
that master fixes this problem).
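A self-contained check along those lines, as a sketch: it writes a tiny UTF-8 CSV to a temp file (the sample data is made up, not Manfred's attachment), reads it back with the same CSV.foreach options as in the original mail, and aborts with a non-zero exit status if any header or field is not UTF-8. Because failure means a non-zero exit, a script like this could in principle also serve as the command for git bisect run.

```ruby
# coding: UTF-8
require 'csv'
require 'tempfile'

Encoding.default_external = Encoding::UTF_8

# Made-up UTF-8 sample data, so the check needs no attachment.
tmp = Tempfile.new(['encoding-check', '.csv'])
tmp.write("test header,other\nGrüße,naïve\n")
tmp.close

# Collect every header or field whose encoding is not UTF-8.
bad_fields = []
CSV.foreach(tmp.path, :headers => true, :return_headers => true, :col_sep => ',') do |row|
  row.each do |header, value|
    [header, value].each do |s|
      bad_fields << s if s.is_a?(String) && s.encoding != Encoding::UTF_8
    end
  end
end
tmp.unlink

puts "non-UTF-8 fields: #{bad_fields.size}"
abort('FAIL: CSV returned non-UTF-8 strings') unless bad_fields.empty?
puts 'OK: all CSV fields are UTF-8'
```

Running it once on 1.6.7 and once on master would give the before/after confirmation Tom asks for.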

-Tom

On Thu, Mar 15, 2012 at 9:12 AM, Thomas E Enebo [email protected]
wrote:
