Double quote problem in CSV


#1

Hi all

I am using the following method to read the CSV file, and I want to save
it to the database.

@parsed_file = CSV.open("filename.csv", 'r', "#{col_sep}")
@parsed_file.each_with_index do |row, index|

     # more code here

end

My problem is that I am unable to read the data correctly when my data
fields are as follows:
"abc", "sdfds"string"ddsfdsf", "xyz", "pqr"

It gives me an "illegal file format" error at the above line.

Regards
Salil


#2

Salil G. wrote:

My problem is that I am unable to read the data correctly when my data
fields are as follows:
"abc", "sdfds"string"ddsfdsf", "xyz", "pqr"

It gives me an "illegal file format" error at the above line.

That would be because that line is indeed wrongly formatted.

CSV files which have quote-delimited fields must double any quotes which
appear within the field. Your line above should be

"abc","sdfds""string""ddsfdsf","xyz","pqr"

This is the output you’ll get if you export as CSV from Excel, for
example.
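
As a rough illustration of the doubling rule, here is a small sketch using Ruby's standard csv library. It uses the current CSV.parse_line / CSV.generate_line calls rather than the older CSV.open(path, mode, separator) form from the question, and the variable names are only for the example.

```ruby
require "csv"

valid   = '"abc","sdfds""string""ddsfdsf","xyz","pqr"'
invalid = '"abc", "sdfds"string"ddsfdsf", "xyz", "pqr"'

# Doubled quotes inside a quoted field come back as single quotes.
fields = CSV.parse_line(valid)
p fields

# Writing goes the other way: embedded quotes are doubled on output.
puts CSV.generate_line(fields, force_quotes: true)

# The undoubled version is rejected by the parser.
begin
  CSV.parse_line(invalid)
rescue CSV::MalformedCSVError => e
  puts "rejected: #{e.message}"
end
```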

So I suggest you fix up your data source to use valid CSV. If it is not
valid CSV, you will end up having to write your own parser for it.

But there are good reasons for the doubling-up rule. Imagine, for
example, what happens if a field contains the sequence
quote-comma-quote. How could you distinguish between that being data
within one field, or the end of the field and the start of the next one?
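
To make the quote-comma-quote case concrete, the sketch below (again with Ruby's standard csv library, and made-up sample data) writes a row whose first field contains that sequence, then reads it back. The last call shows how the same characters are read when the quotes are not doubled.

```ruby
require "csv"

# One field whose data happens to contain quote-comma-quote, plus a second field.
row  = ['a","b', "c"]
line = CSV.generate_line(row)
puts line # the embedded quotes are written doubled

# Because the embedded quotes are doubled, the parser recovers two fields.
p CSV.parse_line(line)

# Written without doubling, the same characters read as three separate fields.
p CSV.parse_line('"a","b","c"')
```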


#3

Thanks Brian,
I know it's a wrongly formatted CSV file, but I just want to know whether
there is any possibility of reading a file like this. Actually, I upload a
lot of files in my application, and the files I receive are formatted as
above. So, thanks again… Please reply if you know anything related to this
so that I can move in the right direction.

Regards
Salil


#4

On Mar 3, 2009, at 9:13 AM, Salil G. wrote:

I know it's a wrongly formatted CSV file, but I just want to know whether
there is any possibility of reading a file like this. Actually, I upload a
lot of files in my application, and the files I receive are formatted as
above. So, thanks again… Please reply if you know anything related to this
so that I can move in the right direction.

As multiple people have told you for days now, you will need to build
your own parser for non-CSV data. I wish there was some shortcut we
could give you, but that’s still our best answer.

James Edward G. II


#5

Salil G. wrote:

I know it's a wrongly formatted CSV file, but I just want to know whether
there is any possibility of reading a file like this. Actually, I upload a
lot of files in my application, and the files I receive are formatted as
above. So, thanks again… Please reply if you know anything related to this
so that I can move in the right direction.

If the fields don’t contain commas, maybe

line.split(/\s*,\s*/)

will be sufficient.
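
A minimal sketch of that approach, assuming no field contains a comma. The step that strips the outer quotes is an addition for illustration and is not part of the one-liner above.

```ruby
line = '"abc", "sdfds"string"ddsfdsf", "xyz", "pqr"'

# Split on commas (with optional surrounding whitespace), then drop the
# one opening and one closing quote around each field. Quotes in the
# middle of a field are left exactly as they are.
fields = line.strip.split(/\s*,\s*/).map do |field|
  field.sub(/\A"/, "").sub(/"\z/, "")
end

p fields
```

Any comma inside a field would be treated as a separator here, which is why this only works for data known to be comma-free.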

Otherwise, you’ll need to write something yourself, and to do this
you’ll need to start by working out what rules you want to apply in
order to parse this strange format. For example, what would you expect
from parsing these?

"abc","def"","ghi"
"abc","def",","ghi"
"abc","def,","ghi"
"abc","def,",","
"abc","def,",","ghi"
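
As one possible answer, here is a sketch of a hand-written parser built on a single assumed rule: a quote closes a field only when it is followed by optional whitespace and then a comma or the end of the line. Both the rule and the name parse_sloppy_line are hypothetical, and a different rule would give different results for the lines above.

```ruby
require "strscan"

# Hypothetical parser for quoted fields whose embedded quotes are not doubled.
# Assumed rule: a quote ends the field only if a comma or end of line follows.
def parse_sloppy_line(line)
  scanner = StringScanner.new(line.chomp)
  fields = []
  loop do
    scanner.skip(/\s*/)
    if scanner.skip(/"/)
      # Take as little as possible up to a quote that is followed by a separator.
      body = scanner.scan(/.*?(?="\s*(?:,|\z))/)
      raise ArgumentError, "unterminated quoted field" if body.nil?
      scanner.skip(/"\s*/)
      fields << body
    else
      # Unquoted field: everything up to the next comma.
      fields << scanner.scan(/[^,]*/).rstrip
    end
    break unless scanner.skip(/,/)
  end
  fields
end

p parse_sloppy_line('"abc", "sdfds"string"ddsfdsf", "xyz", "pqr"')
p parse_sloppy_line('"abc","def,","ghi"')
p parse_sloppy_line('"abc","def"","ghi"')
```

Under this rule a field can contain a comma or a quote, but never a quote immediately followed by a comma, which is exactly the ambiguity the doubling rule exists to remove.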


#6

I found a solution to this problem, but I don't know whether it's the
correct way, nor how to do it programmatically…

My problem is that I am unable to read the data correctly when my data
fields are as follows:
"abc", "sdfds "string"ddsfdsf", "xyz", "pqr"

It gives me an "illegal file format" error at the above line.

But when I open my file in Excel, then save it and close it, my data
looks like this…
abc, "sdfds string""ddsfdsf""", xyz, pqr

Though I don't get the data I wanted, it also doesn't give me the "illegal
file format" error.

The data above is an exceptional case, but because of it I can't read my
whole file, so getting some manipulated data for such an exceptional case
is OK.

My question is: is it possible using Rails?


#7

On Mar 3, 2009, at 11:55 PM, Salil G. wrote:

look like this…
ok.

My question is: is it possible using Rails?

Oh, sit back down James, let me take this reply. ;-)

I don’t know what version of Excel you are using, but I’ve got
Microsoft Office 2004 for Mac on my system and if I start with your
text in /tmp/notcsv.csv, open it in Excel, put =LEN(A1) in cell A2 and
'=LEN(A1) in cell A3 (and similarly for B2…C3), save it as
~/Documents/fromExcel.csv as a “CSV (Comma-delimited)” type, then I get
this:

$ head /tmp/notcsv.csv ~/Documents/fromExcel.csv
==> /tmp/notcsv.csv <==
"abc", "sdfds "string"ddsfdsf", "xyz", "pqr"

==> /Users/rab/Documents/fromExcel.csv <==
abc," ""sdfds ""string""ddsfdsf"""," ""xyz"""," ""pqr"""
3,24,7,6
=LEN(A1),=LEN(B1),=LEN(C1),=LEN(D1)

So you seem to be missing some quotes in your output. Note that the
first field had quotes in the original, but they are not required (no
leading spaces, no comma or quote in the value). The original file,
while technically invalid, is read by Excel and interpreted in a
reasonable way. The output is strictly conforming to the CSV spec
(although yours seems not to be).

As others have said, you’ll have to parse it yourself if you need to
accept a sloppy input format.

You also seem to have the impression that rails is somehow “more” than
ruby. It isn’t “more” than ruby, it just is written in ruby like any
other ruby program you might write to handle the data. Just decide
how each of the line types ought to be interpreted, write down the
rules that result, and turn them into code.
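
As a sketch of what "turn them into code" could look like in plain Ruby: parse each line as strict CSV, and pass only the lines the parser rejects to a block holding your own rules. The helper name parse_with_fallback and the sample rule in the block are hypothetical, and reading line by line assumes no valid field contains an embedded newline.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: parse each line as strict CSV, and hand any line the
# parser rejects to the block, which applies whatever rules you decided on.
def parse_with_fallback(io)
  rows = []
  io.each_line do |line|
    next if line.strip.empty?
    begin
      rows << CSV.parse_line(line)
    rescue CSV::MalformedCSVError
      rows << yield(line.chomp)
    end
  end
  rows
end

input = StringIO.new(<<~DATA)
  "a","b,c","d"
  "abc", "sdfds"string"ddsfdsf", "xyz", "pqr"
DATA

rows = parse_with_fallback(input) do |line|
  # Example rule only: split on commas and drop the outer quotes.
  line.strip.split(/\s*,\s*/).map { |f| f.sub(/\A"/, "").sub(/"\z/, "") }
end
rows.each { |row| p row }
```

This way one bad line no longer stops the whole file from loading, and well-formed lines still go through the real CSV parser.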

If you insist on asking questions here, then at least LISTEN to the
responses that you receive. Follow-up questions should include CODE
that demonstrates two things: that you have absorbed the previous
response and that you have made some additional progress which has led
you to a new problem. (Or you don’t understand the response and need a
particular aspect clarified.)

-Rob

Rob B. http://agileconsultingllc.com


#8

On Mar 3, 2009, at 11:31 PM, Rob B. wrote:

Oh, sit back down James, let me take this reply. ;-)

Thanks very much. :-)

James Edward G. II