CSV parsing issue

luislavena · August 31, 2011, 9:30pm

I’ve generated a CSV Document using Open Office csv export.
When I read it, with ruby 1.8.7 everything is fine.

Code:

require 'csv'
reader = CSV.open("C:tmp/document.csv", "r")

headline = reader.shift
reader.each do |row|
  puts row
end
reader.close()

When I read the same document with ruby 1.9.2, then I get the following
error:

C:/Ruby192/lib/ruby/1.9.1/csv.rb:1886:in `block (2 levels) in shift': CSV::MalformedCSVError (CSV::MalformedCSVError)
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1863:in `each'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1863:in `block in shift'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1825:in `loop'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1825:in `shift'
from C:/Dokumente und Einstellungen/josemi1/Eigene Dateien/NetBeansProjects/Test/lib/main.rb:8:in `'

And with jruby 1.9 I get the following error message:

CSV::MalformedCSVError: Unquoted fields do not allow \r or \n (line 2).
shift at /home/michael/Jruby/jruby-1.6.0/lib/ruby/1.9/csv.rb:1893
each at org/jruby/RubyArray.java:1572
shift at /home/michael/Jruby/jruby-1.6.0/lib/ruby/1.9/csv.rb:1863
loop at org/jruby/RubyKernel.java:1417
shift at /home/michael/Jruby/jruby-1.6.0/lib/ruby/1.9/csv.rb:1825
each at /home/michael/Jruby/jruby-1.6.0/lib/ruby/1.9/csv.rb:1768
(root) at /home/michael/NetBeansProjects/Test/lib/main.rb:12

Hint:
ruby -e 'p File.read("/tmp/document.csv")'

"\"Projekt-ID\",<< cut off some data >>,\"letzte
Anderung\"\r\n\n\"\",\"HSW G04\",\"Prim\303\244rprojekt\",<< cut off
some data>>,\"zlebpa1\",07.03.2011\r\n"

Note: I have cut out some irrelevant data above and marked it with '<<
cut off some data >>'.

Questions:
Is the problem the '\r\n\n' above?
Is it a ruby error or an open office error?
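
One way to check the first question is to count which line terminators actually occur in the raw data. The JRuby message complains about a stray \r or \n on line 2, which is consistent with the lone \n that follows the first \r\n in the dump above. The following is only a sketch: SAMPLE is made-up data shaped like that dump, and line_ending_counts is a hypothetical helper, not part of the CSV library.

```ruby
# SAMPLE is invented data that mimics the "\r\n\n" sequence in the dump;
# it is not the real file.
SAMPLE = "\"Projekt-ID\",\"Name\"\r\n\n\"\",\"HSW G04\"\r\n"

# Hypothetical helper: count each kind of line terminator in a raw string.
def line_ending_counts(raw)
  counts = Hash.new(0)
  # "\r\n" is listed first so it is not counted as a separate "\r" and "\n".
  raw.scan(/\r\n|\r|\n/) { |eol| counts[eol] += 1 }
  counts
end

line_ending_counts(SAMPLE).each do |eol, n|
  puts "#{eol.inspect}: #{n}"
end
```

If more than one kind of terminator shows up, the file mixes line endings, and a parser that settled on "\r\n" as its row separator may treat the extra "\n" as part of an unquoted field.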

michaelb · August 31, 2011, 10:20pm

On Aug 31, 2011, at 3:31 PM, Michael Blue wrote:

I’ve generated a CSV Document using Open Office csv export.
When I read it, with ruby 1.8.7 everything is fine.

Code:

require 'csv'
reader = CSV.open("C:tmp/document.csv", "r")

Try using mode "rb" so the line-endings are handled by CSV rather than
the OS.

headline = reader.shift

You can also pass a :headers => true on the open

reader = CSV.open("C:tmp/document.csv", "rb", :headers => true)

reader.each do |row|
puts row
end
reader.close()

And even better for your example, use the .foreach method:

CSV.foreach("C:tmp/document.csv", "rb", :headers => true) do |row|
  puts row
end

When I read the same document with ruby 1.9.2, then I get the
following
error:

Aha! The CSV code in 1.9 is what used to be FasterCSV on earlier
versions (gem install fastercsv).

And with jruby 1.9 I get the following error message:

Questions:
Is the problem the '\r\n\n' above?
Is it a ruby error or an open office error?

It's entirely possible that the error is from OO, but the use of the
"rb" mode might solve the problem, too. (In which case, the blame is
moot.)

-Rob

Rob B.
http://AgileConsultingLLC.com/
http://GaslightSoftware.com/
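
For reference, here is a small sketch of what the :headers => true option mentioned above does. It parses a string instead of a file so that it runs anywhere; the two-column data is invented for illustration and is not the poster's document.

```ruby
require 'csv'

# Invented two-column data with well-formed "\r\n" line endings.
data = "Projekt-ID,Name\r\n1,HSW G04\r\n2,Other\r\n"

# With :headers => true the first line is consumed as the header row,
# and each remaining row can be indexed by column name.
table = CSV.parse(data, :headers => true)

table.each do |row|
  puts "#{row['Projekt-ID']}: #{row['Name']}"
end
```

With headers enabled there is no need to call shift by hand to skip the first line.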

michaelb · August 31, 2011, 11:09pm

Hello Rob,

Thank you for the answer.
No. Using the rb-mode does not solve the problem. The error still
occurs.

michaelb · September 1, 2011, 4:00pm

On Aug 31, 2011, at 5:09 PM, Michael Blue wrote:

Hello Rob,

Thank you for the answer.
No. Using the rb-mode does not solve the problem. The error still
occurs.

Did any of the other suggestions work? In particular, the fact that
CSV in 1.8.x and CSV in 1.9 are different code.

If you stay with 1.8.7, try using FasterCSV.

-Rob

michaelb · September 1, 2011, 7:59am

Michael Blue wrote in post #1019459:

Hint:
ruby -e 'p File.read("/tmp/document.csv")'

"\"Projekt-ID\",<< cut off some data >>,\"letzte
Anderung\"\r\n\n\"\",\"HSW G04\",\"Prim\303\244rprojekt\",<< cut off
some data>>,\"zlebpa1\",07.03.2011\r\n"

You really couldn't come up with a 2 column record that duplicates the
problem? I realize that creating a new Open Office document could take
you upwards of five minutes.

Hint: post something legible.

michaelb · September 1, 2011, 10:39pm

Did any of the other suggestions work?
Surprisingly, the "rb" option did work on Windows (ruby); on Linux
(jruby) the error remained.

In particular, the fact that CSV in 1.8.x and CSV in 1.9 are different code.
If you stay with 1.8.7, try using FasterCSV.

On 1.8.7 (Linux, jruby) with FasterCSV I get the same error as on
1.9.2 (Windows, ruby) with CSV.
It seems to be an issue with FasterCSV. I am not sure whether there is
an additional issue with jruby.

I have attached 2 test-files that I have prepared with a hex editor from
the originally very large file.
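
A possible workaround, which nobody in this thread has confirmed, is to normalize the line endings before handing the data to CSV. The sketch below assumes the whole file fits in memory and uses invented sample data rather than the attached test files; parse_with_normalized_newlines is a hypothetical helper name.

```ruby
require 'csv'

# Invented sample containing the same "\r\n\n" sequence as the dump.
RAW = "\"Projekt-ID\",\"Name\"\r\n\n\"\",\"HSW G04\"\r\n"

# Hypothetical helper: unify line endings, then parse.
def parse_with_normalized_newlines(raw)
  # Turn "\r\n" and lone "\r" into "\n" so only one terminator remains.
  unified = raw.gsub(/\r\n?/, "\n")
  # An explicit :row_sep avoids auto-detection; :skip_blanks drops the
  # empty line left behind by the doubled newline.
  CSV.parse(unified, :row_sep => "\n", :skip_blanks => true)
end

parse_with_normalized_newlines(RAW).each { |row| p row }
```

Note that this also rewrites line breaks inside quoted fields, so it is only appropriate when embedded newlines do not need to be preserved byte for byte.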