CSV::IllegalFormatError (Need to club data entry person)


#1

I’m using the csv module to read and parse 76,000 rows of patient data
in a CSV file. I use the line below to open the file and loop through
the rows.

 CSV.open("patientfile.txt", "r") do |row|

When I get to a row like the one below, the script blows up:
/usr/local/lib/ruby/1.8/csv.rb:639:in `get_row': CSV::IllegalFormatError (CSV::IllegalFormatError)
from /usr/local/lib/ruby/1.8/csv.rb:556:in `each'
from /usr/local/lib/ruby/1.8/csv.rb:531:in `parse'
from /usr/local/lib/ruby/1.8/csv.rb:311:in `open_reader'
from /usr/local/lib/ruby/1.8/csv.rb:85:in `open'
from sync.rb:1

The row is similar to the one below. Note the embedded "B" within the
address field.

"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"

Is there a way to get around this error and escape the “B” properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?


#2

On 5/5/06, Sean C. removed_email_address@domain.invalid wrote:

Is there a way to get around this error and escape the “B” properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?

It’s just a guess, but maybe you could try replacing every
double-quote character that isn’t either preceded or followed by a
comma with a single quote? Something like the untested code below:

line.gsub(/[^,]"[^,]/, "'")

It would probably require reading the whole file first, writing out a
corrected version, and then calling the CSV methods on that, but it
beats doing it by hand :).
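
The read, correct, write, then parse workflow described above could be sketched roughly as follows. `write_corrected_copy` is a hypothetical helper name, not part of the csv library, and the repair itself is left to a block so that the sketch does not commit to any particular pattern. It streams line by line rather than loading the whole file, and assumes no field legitimately contains a line break.

```ruby
require "csv"

# Hypothetical helper: copy src to dst one line at a time, passing each
# line through the block, which is expected to return the repaired text.
def write_corrected_copy(src, dst)
  File.open(dst, "w") do |out|
    File.foreach(src) { |line| out.write(yield(line)) }
  end
  dst
end

# Possible usage (file names and the fix_line method are placeholders):
#   write_corrected_copy("patientfile.txt", "patientfile.fixed.txt") { |l| fix_line(l) }
#   CSV.foreach("patientfile.fixed.txt") { |row| p row }
```

Keeping the original file untouched and parsing the corrected copy makes it easy to diff the two and check what the repair actually changed.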


#3

On May 5, 2006, at 11:48 AM, Sean C. wrote:

"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"

Well, the long and the short of this story is that the above line is
not valid CSV. Gotta fix that somehow: by hand, with a
preprocessor, or by fixing the broken software that spit it out. :(

James Edward G. II
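
For the "flag this record and move on" option raised in the question, one possible sketch is to parse each physical line separately and collect the lines the parser rejects. This assumes a current Ruby, where the csv library raises `CSV::MalformedCSVError` (the `CSV::IllegalFormatError` in the trace above comes from the Ruby 1.8 library), and it assumes no field contains an embedded line break. `parse_flagging_bad_rows` is a hypothetical name.

```ruby
require "csv"

# Hypothetical helper: parse line by line, keeping good rows and
# recording [line number, raw text, error message] for rejected ones.
def parse_flagging_bad_rows(io)
  good = []
  bad = []
  io.each_line.with_index(1) do |line, lineno|
    begin
      good << CSV.parse_line(line)
    rescue CSV::MalformedCSVError => e
      bad << [lineno, line.chomp, e.message]
    end
  end
  [good, bad]
end

# Possible usage:
#   good, bad = File.open("patientfile.txt") { |f| parse_flagging_bad_rows(f) }
#   bad.each { |lineno, text, _| warn "line #{lineno}: #{text}" }
```

The flagged lines can then go back to whoever owns the data, which fits the point that the input itself is what needs fixing.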


#4

Sean C. wrote:


"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"

Is there a way to get around this error and escape the “B” properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?

Escape quotes by doubling them:

gsub('"B"', '""B""')

Cheers,
Dave
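
A small check of the doubling rule, assuming a current Ruby csv library: a quoted field that contains doubled quotes parses back to a single embedded quote, and writing a row through the library applies the same escaping. The sample values are made up for illustration.

```ruby
require "csv"

# A quote inside a quoted field is written as two quotes.
row = '"Doe","321 NORTH ""B"" ST",""'
fields = CSV.parse_line(row)
puts fields[1]   # the address, with the quotes around B restored

# Generating CSV through the library escapes embedded quotes the same way.
line_out = ["Doe", '321 NORTH "B" ST'].to_csv
puts line_out
```

Note that a literal `gsub('"B"', ...)` only repairs this one value; other embedded quotes in the file would need a more general pattern.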


#5

All great ideas! Thanks.


#6

Bira wrote:

line.gsub(/[^,]"[^,]/, "'")

Bira, I’m testing your idea with the script below, but I’m having
problems. Thanks for the start, though.
TEST PROGRAM:
line = '"NAME","610 "A" STREET","STATE","POSTAL_CODE"'
puts line

# If a double quote is not preceded by a comma and not followed
# by a comma, then replace the quotation mark with a single quote.
new_line = line.gsub(/[^,]"[^,]/, "'")
puts new_line

OUTPUT:
"NAME","610 "A" STREET","STATE","POSTAL_CODE"
"NAME","610'" STREET","STATE","POSTAL_CODE"

Any ideas, regular expression masters?
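
One likely cause of the output above: `[^,]"[^,]` matches the characters on either side of the quote as well, so the space, the quote, and the A are all replaced together, and the quote after the A is then skipped because the A it needed in front of it has already been consumed. A possible fix, sketched here with lookarounds, checks the neighbours without consuming them. This assumes Ruby 1.9 or later (the 1.8 regex engine has no lookbehind) and assumes a quote next to a comma is always a field delimiter, which will not hold for every file. `soften_inner_quotes` and `INNER_QUOTE` are hypothetical names.

```ruby
# Match a double quote only when a character exists on both sides and
# neither neighbour is a comma or a line break. The lookbehind and
# lookahead test the neighbours but leave them out of the match.
INNER_QUOTE = /(?<=[^,\r\n])"(?=[^,\r\n])/

def soften_inner_quotes(line, replacement = "'")
  line.gsub(INNER_QUOTE, replacement)
end

line = '"NAME","610 "A" STREET","STATE","POSTAL_CODE"'
puts soften_inner_quotes(line)
# Passing '""' instead doubles the quotes, which keeps the original
# characters in the parsed value.
puts soften_inner_quotes(line, '""')
```

Excluding line breaks in the character classes keeps the closing quote of the last field intact when the line still carries its trailing newline.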