Bug in 1.9.0 csv parser

Hi,
I’ve found a bug in the 1.9.0 CSV parser. I’ve got a script and data
that effectively breaks it, but it is 567 KB. Is that too large for this
list?

The Ruby instance takes 100% of the CPU while processing this file, and I
had to stop it after 5 minutes…

the code is

#!/usr/local/bin/ruby1.9.0

require 'csv'

filename = 'broken.csv'

# Print each row's field count to stderr, then dump the fields themselves.
CSV.foreach(filename) do |row|
  STDERR.puts row.length
  row.each do |entry|
    puts entry
  end
  puts "\n####################################\n"
end

I would try to debug it further, but the debugger seems broken in
1.9.0.

Regards,
Nicko

On Jan 10, 2008, at 2:15 AM, Nicko Kaltner wrote:

> I’ve found a bug in the 1.9.0 CSV parser. I’ve got a script and data
> that effectively breaks it, but it is 567 KB. Is that too large for
> this list?

Probably, but you are welcome to email it to me privately. I maintain
the CSV library in Ruby 1.9.

> The Ruby instance takes 100% of the CPU while processing this file, and
> I had to stop it after 5 minutes…

I’m 95% sure this is an issue of your CSV file being invalid.
Because of the way the format works, a simple file like:

"…10 Gigs of data without another quote…

can only be detected as invalid by reading the entire thing. I’ve
considered putting limits in the library to control how far it would
read ahead, but those would break some valid data. Then the problem
becomes: do I make the limits default to on? That’s the only way they
would have helped here, but that would break some otherwise valid code.
It’s a tough problem to solve correctly.
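
Here’s a minimal sketch of that failure mode (the data string below is
made up for illustration); the unclosed quote forces the parser to scan
everything that follows before it can raise CSV::MalformedCSVError:

require 'csv'

# One opening quote with no matching close: every byte after it, newlines
# included, is still part of that single quoted field as far as CSV knows.
data = "a,b,c\n\"unterminated,d,e\n" + ("x,y,z\n" * 1000)

begin
  CSV.parse(data)
rescue CSV::MalformedCSVError => e
  # The error can only be raised once the parser has scanned all the way
  # to end-of-input looking for the closing quote.
  STDERR.puts "invalid CSV: #{e.message}"
end

With 10 Gigs after that opening quote instead of a few kilobytes, that
scan is the five minutes of 100% CPU you saw.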

James Edward G. II