Hi all,
I’m using Ruby 1.9.2p0 and I’m trying to read a CSV file encoded in
UTF-16LE using the following script:
# encoding: utf-8
require 'csv'
CSV.foreach("file_path", :col_sep => ";", :encoding => "UTF-16LE:UTF-8") { |row|
  p row
}
When I run this, I get the following exception:
/usr/lib/ruby/1.9.1/csv.rb:2020:in `=~': invalid byte sequence in UTF-8 (ArgumentError)
        from /usr/lib/ruby/1.9.1/csv.rb:2020:in `init_separators'
        from /usr/lib/ruby/1.9.1/csv.rb:1570:in `initialize'
        from /usr/lib/ruby/1.9.1/csv.rb:1335:in `new'
        from /usr/lib/ruby/1.9.1/csv.rb:1335:in `open'
        from /usr/lib/ruby/1.9.1/csv.rb:1201:in `foreach'
        from test.rb:3:in `<main>'
The csv module reads a sample from the file (using IO.read(),
csv.rb:2309) and tries to match it against a Regexp of possible line
endings. This sample has its encoding forced to the encoding I’ve
chosen (UTF-8), but the result is sample.valid_encoding? == false. When
the regexp match takes place, it raises the exception shown above.
Am I missing something here, or is this a bug in the csv module?
Thanks in advance,
Daniel
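The failure described above can be reproduced without the csv library. What follows is only a minimal sketch, not the code in csv.rb: the sample string, the byte order mark in it, and the line-ending regexp are all assumptions chosen for illustration, since the post does not show the file's contents. It relabels UTF-16LE bytes as UTF-8 with force_encoding, which leaves the bytes untouched, and then contrasts that with actually transcoding them.

```ruby
# encoding: utf-8

# Hypothetical sample: a byte order mark plus one CSV row, as UTF-16LE bytes.
utf16_bytes = "\uFEFFa;b\r\n".encode("UTF-16LE")

# Relabel the raw bytes as UTF-8 without converting them.
# The BOM bytes (FF FE) are not valid UTF-8, so this string is broken.
sample = utf16_bytes.dup.force_encoding("UTF-8")
p sample.valid_encoding?

# Matching a regexp against a broken string is what triggers the error.
begin
  sample =~ /\r\n|\n|\r/
rescue ArgumentError => e
  puts "match failed: #{e.message}"
end

# Transcoding (instead of relabeling) yields a valid UTF-8 string
# that can be matched safely.
converted = utf16_bytes.encode("UTF-8")
p converted.valid_encoding?
p converted =~ /\r\n|\n|\r/
```

Note that a UTF-16LE sample containing only ASCII characters would still be valid UTF-8 after relabeling (it is just ASCII bytes interleaved with NUL bytes), so the error probably depends on what the file contains.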
On Jan 13, 2011, at 1:58 PM, Daniel de Angelis Cordeiro wrote:
> Am I missing something here, or is this a bug in the csv module?
It does look like it’s probably a bug. I think it only affects the line
ending guessing though, so set :row_sep manually to avoid it for now.
Sorry!
James Edward G. II
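A minimal sketch of that workaround, assuming the file uses "\r\n" line endings and ";" as the column separator. The temporary file, its name, and its contents are made up for the example; only :col_sep, :row_sep and :encoding come from the thread.

```ruby
# encoding: utf-8
require 'csv'
require 'tmpdir'

rows = []

Dir.mktmpdir do |dir|
  # Hypothetical input file: two rows, written as UTF-16LE without a BOM.
  path = File.join(dir, "sample.csv")
  File.open(path, "wb") { |f| f.write("a;b\r\nc;d\r\n".encode("UTF-16LE")) }

  # Giving :row_sep explicitly means csv does not have to guess the
  # line ending from a sample of the file.
  CSV.foreach(path, :col_sep => ";", :row_sep => "\r\n",
                    :encoding => "UTF-16LE:UTF-8") { |row|
    rows << row
  }
end

p rows
```

The "UTF-16LE:UTF-8" value asks Ruby to read the file as UTF-16LE and hand the fields back as UTF-8 strings.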
Hi,
On Thu, Jan 13, 2011 at 18:11, James Edward G. II wrote:
> It does look like it’s probably a bug. I think it only affects the line ending
> guessing though, so set :row_sep manually to avoid it for now. Sorry!
Exactly, setting :row_sep manually works.
Since I also don’t know which line ending the file has, I was thinking
of using IO.readline() instead (maybe reading only 1024 bytes at a time,
as the csv code does) and looking at the end of the string to see which
line separator the file uses. I don’t know if there is a more efficient
solution…
Thanks for the great work on the csv module!
Best regards,
Daniel
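One possible way to sketch the detection idea above. The helper name guess_row_sep, its parameters, and the "\n" fallback are hypothetical and not part of the csv library. Instead of IO.readline(), this version reads the first 1024 raw bytes, transcodes them to UTF-8, and takes the first line ending it finds. It is a rough sketch with a known limitation: if the sample happens to end between a "\r" and its "\n", it will report "\r".

```ruby
# encoding: utf-8
require 'csv'
require 'tmpdir'

# Hypothetical helper: guess the row separator of a file stored in the
# given external encoding by inspecting the first sample_bytes bytes.
def guess_row_sep(path, external = "UTF-16LE", sample_bytes = 1024)
  raw = File.open(path, "rb") { |io| io.read(sample_bytes) }
  # Transcode rather than relabel; replace anything cut off at the
  # end of the sample so the regexp match cannot raise.
  sample = raw.to_s.encode("UTF-8", external,
                           :invalid => :replace, :undef => :replace)
  sample[/\r\n|\n|\r/] || "\n" # fall back to "\n" if nothing is found
end

rows = []
detected = nil

Dir.mktmpdir do |dir|
  # Hypothetical input file, written as UTF-16LE without a BOM.
  path = File.join(dir, "sample.csv")
  File.open(path, "wb") { |f| f.write("a;b\r\nc;d\r\n".encode("UTF-16LE")) }

  detected = guess_row_sep(path)
  CSV.foreach(path, :col_sep => ";", :row_sep => detected,
                    :encoding => "UTF-16LE:UTF-8") { |row|
    rows << row
  }
end

p detected
p rows
```

Reading raw bytes and transcoding them avoids the force_encoding step that produced the invalid string in the first place.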
Excellent.
Thank you very much!
Best regards,
Daniel
On Thu, Jan 13, 2011 at 18:57, James Edward G. II wrote:
On Jan 13, 2011, at 2:48 PM, Daniel de Angelis Cordeiro wrote:
> Exactly, setting :row_sep manually works.
> Since I also don’t know which line ending the file has, I was thinking
> of using IO.readline() instead (maybe reading only 1024 bytes at a time,
> as the csv code does) and looking at the end of the string to see which
> line separator the file uses. I don’t know if there is a more efficient
> solution…
Yeah, I’ll fix that code to be more encoding friendly, but that’s what
needs doing.
James Edward G. II