Forum: Ruby CSV::IllegalFormatError (Need to club data entry person)

Sean Clark (smc7000)
on 2006-05-05 18:48
I'm using the csv module to read and parse 76,000 rows of patient data
in a CSV file.  I use the line below to read in the file and loop
through the rows.

     CSV.open("patientfile.txt", "r") do |row|

When I get to a row like the one shown further down, the script blows up:
/usr/local/lib/ruby/1.8/csv.rb:639:in `get_row': CSV::IllegalFormatError
(CSV::IllegalFormatError)
        from /usr/local/lib/ruby/1.8/csv.rb:556:in `each'
        from /usr/local/lib/ruby/1.8/csv.rb:531:in `parse'
        from /usr/local/lib/ruby/1.8/csv.rb:311:in `open_reader'
        from /usr/local/lib/ruby/1.8/csv.rb:85:in `open'
        from sync.rb:1

The row is similar to the one below.  Note the embedded "B" within the
address field.

"M1234567","John","A","Doe","321 NORTH "B"
ST","","Sometown","ST","55555"

Is there a way to get around this error and escape the "B" properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?
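
For the "flag this record and move on" option, a minimal sketch follows. It assumes a current Ruby, where the csv library raises CSV::MalformedCSVError (the Ruby 1.8 library in the trace above raised CSV::IllegalFormatError), and it assumes every record sits on one physical line. The helper name split_rows and the second sample row are made up for illustration.

```ruby
require "csv"

# Parse each line on its own so one bad row cannot abort the whole run.
# Returns [good_rows, bad_rows]; bad rows keep their 1-based line number.
def split_rows(lines)
  good = []
  bad = []
  lines.each_with_index do |line, index|
    begin
      good << CSV.parse_line(line.chomp)
    rescue CSV::MalformedCSVError
      bad << [index + 1, line.chomp]
    end
  end
  [good, bad]
end

sample = [
  # The problem row from this post, joined onto one line.
  '"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"',
  # A hypothetical well-formed row for contrast.
  '"M7654321","Jane","B","Roe","12 MAIN ST","","Sometown","ST","55555"'
]

good, bad = split_rows(sample)
puts "parsed #{good.size}, flagged #{bad.size}"
bad.each { |number, text| warn "line #{number}: #{text}" }
```

For the real file, something like File.foreach("patientfile.txt") could supply the lines instead of the sample array.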
Bira (Guest)
on 2006-05-05 18:58
(Received via mailing list)
On 5/5/06, Sean Clark <smc7000@gmail.com> wrote:
> Is there a way to get around this error and escape the "B" properly
> before opening the file in CSV.open, or would I be better to just flag
> this record and move on?

It's just a guess, but maybe you could try replacing every
double-quote character that isn't either preceded or followed by a
comma with a single quote? Something like the untested code below:

line.gsub(/[^,]"[^,]/,"'")

It would probably require reading the whole file first, writing out a
corrected version, and then calling the CSV methods on that, but it
beats doing it by hand :).
James Gray (bbazzarrakk)
on 2006-05-05 20:38
(Received via mailing list)
On May 5, 2006, at 11:48 AM, Sean Clark wrote:

> "M1234567","John","A","Doe","321 NORTH "B"
> ST","","Sometown","ST","55555"

Well, the long and the short of this story is that the above line is
not valid CSV.  Gotta fix that somehow:  by hand, with a
preprocessor, or by fixing the broken software that spit it out.  :(

James Edward Gray II
Dave Burt (Guest)
on 2006-05-05 23:41
(Received via mailing list)
Sean Clark wrote:
> ...
> "M1234567","John","A","Doe","321 NORTH "B"
> ST","","Sometown","ST","55555"
>
> Is there a way to get around this error and escape the "B" properly
> before opening the file in CSV.open, or would I be better to just flag
> this record and move on?

Escape quotes by doubling them:

  gsub('"B"', '""B""')

Cheers,
Dave
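
A quick check of the doubling approach, as a sketch only: it assumes a current Ruby with the csv library available and uses the sample row from the original post, joined onto one line. The hard-coded '"B"' only covers this one record; other stray quotes would need their own fix.

```ruby
require "csv"

# The problem row from the original post, on a single line.
row = '"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"'

# Double the embedded quotes so they count as escaped quotes inside the field.
fixed = row.gsub('"B"', '""B""')

fields = CSV.parse_line(fixed)
puts fields.size   # number of fields recovered
puts fields[4]     # the address, with its inner quotes restored
```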
Sean Clark (smc7000)
on 2006-05-06 18:02
All great ideas! Thanks.
Sean Clark (smc7000)
on 2006-07-10 00:18
Bira wrote:
> line.gsub(/[^,]"[^,]/,"'")

Bira, I'm testing your idea with the script below, but I'm having
problems. Thanks for the start though.
TEST PROGRAM:
line = "\"NAME\",\"610 \"A\" STREET\",\"STATE\",\"POSTAL_CODE\""
puts line
# If a double quote is not preceded by a comma and not followed
# by a comma, then replace the quotation mark with a single quote.
new_line = line.gsub(/[^,]"[^,]/,"'")
puts new_line

OUTPUT:
"NAME","610 "A" STREET","STATE","POSTAL_CODE"
"NAME","610'" STREET","STATE","POSTAL_CODE"


Any ideas, regular expression masters?
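
One likely explanation for the output above: /[^,]"[^,]/ matches three characters (the one before the quote, the quote, and the one after) and gsub replaces all three, so ` "A` becomes `'` and the second quote is never examined with its left neighbour. A possible fix, sketched below, is to test the neighbours with lookarounds so that only the quote itself is replaced. This assumes Ruby 1.9 or later (lookbehind is not available in 1.8) and a line with its trailing newline removed. It will still miss a stray quote that happens to sit next to a comma.

```ruby
line = '"NAME","610 "A" STREET","STATE","POSTAL_CODE"'

# Original pattern: consumes the neighbouring characters along with the quote.
broken = line.gsub(/[^,]"[^,]/, "'")

# Lookbehind/lookahead check the neighbours without consuming them,
# so only the quote character is replaced. chomp keeps a trailing
# newline from counting as "a character that is not a comma".
fixed = line.chomp.gsub(/(?<=[^,])"(?=[^,])/, "'")

puts broken  # "NAME","610'" STREET","STATE","POSTAL_CODE"
puts fixed   # "NAME","610 'A' STREET","STATE","POSTAL_CODE"
```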