1.9 CSV Parsing Issues

I’m currently porting a script to 1.9 and I’m having problems getting
CSV parsing to work. This script worked fine in 1.8.7 and used the
FasterCSV library for parsing. After playing around in the IRB, I have
determined that the current parser seems incapable of handling newlines
as row separators (a rather basic and important feature).

I tested with a simple file whose contents are:
field1,field2
field3,field4

This file was created using a basic text editor and does not contain any
unorthodox newline characters. Attempting to parse this file results in
the following error:

C:/Ruby192/lib/ruby/1.9.1/csv.rb:1885:in `block (2 levels) in shift': Unquoted fields do not allow \r or \n (line 1). (CSV::MalformedCSVError)
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1856:in `each'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1856:in `block in shift'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1818:in `loop'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1818:in `shift'
from C:/Ruby192/lib/ruby/1.9.1/csv.rb:1760:in `each'

The return value of the opened csv file shows row_sep to be “\r\n” which
seems correct. I have tried manually setting the value of row_sep when
calling CSV::open but I get the same issue.

Once again, I do not have this problem with FasterCSV under 1.8.7 (which
as I understand, is the same code used in 1.9’s csv library). I’m using
Ruby 1.9.2p0 on Windows XP. I would greatly appreciate any help.

On Nov 4, 2010, at 1:40 PM, Kenny Lam wrote:

I’m currently porting a script to 1.9 and I’m having problems getting
CSV parsing to work.

I tested with a simple file whose contents are:
field1,field2
field3,field4

CSV should definitely handle that data. Indeed it does for me:

$ ruby -v -r csv -e 'p CSV.parse("field1,field2\r\nfield3,field4\r\n")'
ruby 1.9.2dev (2010-04-28 trunk 27536) [x86_64-darwin10.3.0]
[["field1", "field2"], ["field3", "field4"]]

This file was created using a basic text editor and does not contain any
unorthodox newline characters.

Can we see exactly what the file does contain, with code like:

$ ruby -e 'p File.read("path/to/file.csv")'

?

James Edward G. II

On Nov 4, 2010, at 2:26 PM, Kenny Lam wrote:

File.read shows "field1,field2\nfield3,field4\n"

Great. That’s what we expected to see. You are right about the
content.

I have played around with some of the other methods and have
determined that this problem only seems to occur when using CSV::open
and then looped through with CSV::each. CSV::foreach and CSV::parse
seem fine.

Ah, and let me guess, you always pass a read mode of 'r' to open(),
right? CSV is clever: if you don't specify a mode, it opens the file
with 'rb', which shuts off Ruby's line ending translation on Windows.
By specifying a mode yourself, you leave that translation on, which
allows Ruby to switch \r\n to \n as it did with the read above. The
parser still expects \r\n as the row separator, so the bare \n it now
sees is rejected.
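One way to confirm what is actually on disk, independent of any text-mode translation, is to read the file in binary mode. The following is only a sketch to illustrate the point: the temp file and its contents are made up for the demo, and the difference between the two reads is only expected to show up on Windows.

```ruby
require "tempfile"

# Hypothetical sample file written with Windows-style line endings.
sample = Tempfile.new(["sample", ".csv"])
sample.binmode
sample.write("field1,field2\r\nfield3,field4\r\n")
sample.close

# Binary read: the bytes exactly as stored, on every platform.
raw = File.binread(sample.path)
p raw

# Text-mode read: on Windows Ruby translates \r\n to \n here,
# on other platforms the string is left unchanged.
text = File.read(sample.path)
p text
```

If the binary read shows \r\n while the text-mode read shows \n, the file itself is fine and the translation is happening at read time.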

Unfortunately, I need to use CSV::open because I need a
reference to the opened file object in order to do some file cursor
manipulation.

No worries, open() is going to work for you.

The double backslashes make me question if the escape character is being
processed correctly. I am relatively new to Ruby, am I using the
language incorrectly or is this a bug?

You have a misunderstanding of Ruby Strings. Double quotes allow for
escapes like \r or \n, but single quotes do not. You’ve set the
:row_sep to literally slash, r, slash, and n.
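The difference between the two quoting styles is easy to check in isolation. This is a small illustrative sketch (the variable names are arbitrary), not code from the original script:

```ruby
single = '\r\n' # four characters: backslash, r, backslash, n
double = "\r\n" # two characters: carriage return, line feed

p single.length # 4
p double.length # 2
p single.bytes  # [92, 114, 92, 110]
p double.bytes  # [13, 10]

# The single-quoted form is the same string as a double-quoted
# literal with escaped backslashes, which is what inspect prints.
p single == "\\r\\n" # true
```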

I imagine all you need to do is switch your open() call to:

CSV.open('path/to/file')

The library should take it from there.
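As a rough sketch of how that could look when the open handle is also needed for repositioning: the example below opens a made-up temp file without a mode, iterates it, then rewinds and reads the first row again. The file name and contents are assumptions for the demo; `CSV#rewind` and `CSV#shift` are used here as one possible form of cursor manipulation, which may differ from what the original script does.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample file with Windows-style line endings.
sample = Tempfile.new(["sample", ".csv"])
sample.binmode
sample.write("field1,field2\r\nfield3,field4\r\n")
sample.close

rows = []
first_again = nil

# No mode argument: CSV chooses how to open the file itself.
CSV.open(sample.path) do |csv|
  csv.each { |row| rows << row }

  # Go back to the start of the underlying file and read one row again.
  csv.rewind
  first_again = csv.shift
end

p rows
p first_again
```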

Hope that helps.

James Edward G. II

File.read shows "field1,field2\nfield3,field4\n"

I have played around with some of the other methods and have
determined that this problem only seems to occur when using CSV::open
and then looped through with CSV::each. CSV::foreach and CSV::parse
seem fine. Unfortunately, I need to use CSV::open because I need a
reference to the opened file object in order to do some file cursor
manipulation.

Another thing I have noted is that running CSV.open('file','r') shows
the result:
<#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
col_sep:"," row_sep:"\r\n" quote_char:"\"">

While CSV.open('test.log','r',:row_sep => '\r\n') shows the result:
<#CSV io_type:File io_path:"/log/test.log" encoding:CP850 lineno:0
col_sep:"," row_sep:"\\r\\n" quote_char:"\"">

The double backslashes make me question if the escape character is being
processed correctly. I am relatively new to Ruby, am I using the
language incorrectly or is this a bug?

On Nov 4, 2010, at 2:52 PM, Kenny Lam wrote:

Excellent, that works perfectly. Thanks a lot for your help.

My pleasure.

James Edward G. II

I’m running into this same error, file reads like so: (Client
Uploaded CSV)

"field1,field2\rfield3,field4\r\n"

Is this an issue with how the CSV file was generated, or is there
some setting I can use to avoid this error?

Appreciate any assistance!

Excellent, that works perfectly. Thanks a lot for your help.

On Wed, Dec 11, 2013 at 4:53 PM, a grave robber wrote:

I’m running into this same error, file reads like so: (Client
Uploaded CSV)

What same error?

"field1,field2\rfield3,field4\r\n"

Is this an issue with how the CSV file was generated, or is there
some setting I can use to avoid this error?

It looks like your CSV is broken. I do not know what your goal is but
I assume you want the piece above to be treated as a single record.
You could manually parse:

irb(main):013:0> s="field1,field2\rfield3,field4\r\n"
=> "field1,field2\rfield3,field4\r\n"
irb(main):014:0> CSV.parse_line(s.gsub(/[\r\n]+/, ''), col_sep: ',')
=> ["field1", "field2field3", "field4"]
irb(main):015:0> CSV.parse_line(s.gsub(/[\r\n]+/, ','), col_sep: ',')
=> ["field1", "field2", "field3", "field4", nil]

Kind regards

robert

Robert K. wrote in post #1130373:

On Wed, Dec 11, 2013 at 4:53 PM, a grave robber wrote:

I’m running into this same error, file reads like so: (Client
Uploaded CSV)

What same error?

Unquoted fields do not allow \r or \n (line 152344).
(CSV::MalformedCSVError)

It looks like your CSV is broken. I do not know what your goal is but
I assume you want the piece above to be treated as a single record.

Actually the \r is the record separator, and it processes the correct
number of records but errors on the \n at the end of the file.

Wouldn’t your gsub line possibly strip any \r\n’s from quoted text
contained in the CSV?

On Wed, Dec 11, 2013 at 6:30 PM, Mark W. wrote:

It looks like your CSV is broken. I do not know what your goal is but
I assume you want the piece above to be treated as a single record.

Actually the \r is the record separator, and it processes the correct
number of records but errors on the \n at the end of the file.

Wouldn’t your gsub line possibly strip any \r\n’s from quoted text
contained in the CSV?

Yes. For production code that needs to be made more robust of course.

Cheers

robert
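A somewhat more careful variant, sketched here under the assumption that \r really is the record separator and only the final \r\n is stray, is to trim just the trailing line ending and tell the parser about the separator. Line breaks inside quoted fields are then left alone. The sample data is invented for illustration.

```ruby
require "csv"

# Invented sample: \r separates records, one quoted field contains \r\n,
# and the data ends with a stray \r\n.
raw = "field1,\"two\r\nlines\"\rfield3,field4\r\n"

# Remove only the line ending at the very end of the data.
trimmed = raw.sub(/\r?\n\z/, "")

# Name the record separator explicitly instead of relying on detection.
rows = CSV.parse(trimmed, row_sep: "\r")
p rows
```

Unlike a global gsub over \r and \n, this touches nothing but the last two characters, so quoted multi-line values survive.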

On Thu, Dec 12, 2013 at 5:47 AM, tamouse pontiki wrote:

How did you generate this file?

I was going to suggest to fix the generation code instead of adjusting
the parsing side - but that seemed too obvious. :)

Cheers

robert

On Wed, Dec 11, 2013 at 9:53 AM, Mark W. wrote:

Appreciate any assistance!

Robert’s responses notwithstanding, I’ve never seen a CSV file that uses
just CR ("\r") as a record separator. But even then, the CSV parser
wouldn’t know what to do with the LF ("\n") at the very end: is that a
new record? How would it parse that as data? And it doesn’t have the
same number of fields as the others.

How did you generate this file?

Can you show your code for how you are currently reading and parsing
this file?

You could always “chomp” the file before passing it to the parser, or
make that the default action.

The issue is that I don’t have control over the generation of the file
(client uploaded).

Here is the solution I came up with. Since the file is stored on S3, I
have to write a new tempfile and then edit that, unless someone can
suggest how to read the file minus the last 2 chars if they match \r\n.

csv_file = open(path_to_file, "r:windows-1251:utf-8")
csv_file.seek(-2, IO::SEEK_END) # go to 2 bytes before the end of the file
if csv_file.read == "\r\n"
  uri = URI.parse(path_to_file)
  tempfile = Tempfile.new File.basename(uri.path), "#{Rails.root}/tmp"
  csv_file.seek 0
  tempfile.write csv_file.read
  tempfile.seek(-2, IO::SEEK_END)
  tempfile.write " "
  tempfile.seek 0
  csv_file = tempfile
end
::CSV.new(csv_file, :headers => :first_row).each do |row|

You can chomp a file? I thought that was only for strings.

Well, if you read the file into a string, you can chomp it.
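A minimal sketch of that idea, assuming the file is small enough to hold in memory and that \r is the record separator as described earlier in the thread. The temp file, its contents, and the column names are invented for the demo; a real upload would also need its own encoding handling.

```ruby
require "csv"
require "tempfile"

# Invented stand-in for the uploaded file: \r between records,
# a stray \r\n at the very end.
upload = Tempfile.new(["upload", ".csv"])
upload.binmode
upload.write("name,city\ralice,Oslo\rbob,Lima\r\n")
upload.close

# Read the raw bytes into a string, then chomp the trailing line ending.
text = File.binread(upload.path).force_encoding("UTF-8").chomp

# Parse the string with the first row as headers.
records = CSV.parse(text, headers: true, row_sep: "\r").map(&:to_h)
p records
```

This avoids writing and patching a second tempfile: the only modification is the chomp on the in-memory string.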
