I'm using the csv module to parse 76,000 rows of patient data in a CSV
file. I use the line below to read in the file and loop through the
rows.
CSV.open("patientfile.txt", "r") do |row|
When it reaches a row like the one shown after the traceback, the script blows up:
/usr/local/lib/ruby/1.8/csv.rb:639:in `get_row': CSV::IllegalFormatError
(CSV::IllegalFormatError)
from /usr/local/lib/ruby/1.8/csv.rb:556:in `each'
from /usr/local/lib/ruby/1.8/csv.rb:531:in `parse'
from /usr/local/lib/ruby/1.8/csv.rb:311:in `open_reader'
from /usr/local/lib/ruby/1.8/csv.rb:85:in `open'
from sync.rb:1
The row is similar to the one below. Note the embedded "B" within the
address field.
"M1234567","John","A","Doe","321 NORTH "B"
ST","","Sometown","ST","55555"
Is there a way to get around this error and escape the "B" properly
before opening the file in CSV.open, or would I be better to just flag
this record and move on?
on 2006-05-05 18:48
on 2006-05-05 18:58
On 5/5/06, Sean Clark <smc7000@gmail.com> wrote:
> Is there a way to get around this error and escape the "B" properly
> before opening the file in CSV.open, or would I be better to just flag
> this record and move on?
It's just a guess, but maybe you could try replacing every double-quote
character that isn't either preceded or followed by a comma with a
single quote? Something like the untested code below:
line.gsub(/[^,]"[^,]/,"'")
It would probably require reading the whole file first, writing out a
corrected version, and then calling the CSV methods on that, but it
beats doing it by hand :).
on 2006-05-05 20:38
On May 5, 2006, at 11:48 AM, Sean Clark wrote:
> "M1234567","John","A","Doe","321 NORTH "B"
> ST","","Sometown","ST","55555"
Well, the long and the short of this story is that the above line is
not valid CSV. Gotta fix that somehow: by hand, with a preprocessor,
or by fixing the broken software that spit it out. :(
James Edward Gray II
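For the "flag this record and move on" option raised in the original question, a minimal sketch might look like the following. It assumes a current Ruby, where the standard csv library raises CSV::MalformedCSVError (the 1.8 library in the traceback raised CSV::IllegalFormatError instead), and it assumes every record sits on one physical line. The helper name partition_rows and the second sample row are made up for illustration.

```ruby
require "csv"

# Hypothetical helper: parse each line on its own so that one malformed
# record does not abort the whole run. Assumes one record per line.
def partition_rows(lines)
  good = []
  bad = []
  lines.each_with_index do |line, index|
    begin
      row = CSV.parse_line(line.chomp)
      good << row unless row.nil?      # parse_line returns nil for a blank line
    rescue CSV::MalformedCSVError
      bad << [index + 1, line.chomp]   # remember the 1-based line number
    end
  end
  [good, bad]
end

sample = [
  # The problem row from the thread, joined onto one line.
  %q{"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"},
  # Invented well-formed row, for contrast only.
  %q{"M7654321","Jane","C","Roe","12 MAIN ST","","Sometown","ST","55555"}
]
good, bad = partition_rows(sample)
puts "parsed #{good.size} row(s), flagged #{bad.size} row(s)"
bad.each { |number, text| puts "line #{number}: #{text}" }
```

With a real file, the lines could come from File.foreach("patientfile.txt"). Note that parsing line by line would break any legitimate field that contains a quoted newline.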
on 2006-05-05 23:41
Sean Clark wrote:
> ...
> "M1234567","John","A","Doe","321 NORTH "B"
> ST","","Sometown","ST","55555"
>
> Is there a way to get around this error and escape the "B" properly
> before opening the file in CSV.open, or would I be better to just flag
> this record and move on?
Escape quotes by doubling them:
gsub('"B"', '""B""')
Cheers,
Dave
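The doubling idea can be generalized beyond the literal "B". The sketch below is one way to do it, under these assumptions: each record is on a single line, every field is quoted, and an embedded quote never sits directly next to a comma (such a case is ambiguous and would be missed). It uses lookbehind, so it assumes Ruby 1.9 or later rather than the 1.8 shown in the traceback. The constant and method names are hypothetical.

```ruby
require "csv"

# Matches a double quote that has a non-comma character on both sides,
# i.e. one that cannot be opening or closing a field. The lookarounds are
# zero-width, so the neighbouring characters are left in place.
EMBEDDED_QUOTE = /(?<=[^,])"(?=[^,\r\n])/

# Hypothetical helper: double every embedded quote, which is the CSV escape.
def escape_embedded_quotes(line)
  line.gsub(EMBEDDED_QUOTE, '""')
end

# The problem row from the thread, with the address joined onto one line.
line = %q{"M1234567","John","A","Doe","321 NORTH "B" ST","","Sometown","ST","55555"}
fixed = escape_embedded_quotes(line)
puts fixed
p CSV.parse_line(fixed)
```

Empty fields such as "" are left alone, because each of their quotes touches a comma or the end of the line.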
on 2006-07-10 00:18
Bira wrote:
> line.gsub(/[^,]"[^,]/,"'")
Bira, I'm testing your idea with the script below, but I'm having
problems. Thanks for the start, though.
TEST PROGRAM:
line = "\"NAME\",\"610 \"A\" STREET\",\"STATE\",\"POSTAL_CODE\""
puts line
# If a double quote is not preceded by a comma and not followed
# by a comma, then replace the quotation mark with a single quote.
new_line = line.gsub(/[^,]"[^,]/,"'")
puts new_line
OUTPUT:
"NAME","610 "A" STREET","STATE","POSTAL_CODE"
"NAME","610'" STREET","STATE","POSTAL_CODE"
Any ideas, regular expression masters?
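One likely explanation for the output above: the pattern [^,]"[^,] consumes the characters on both sides of the quote. The first match is the three characters space, quote, A, and all three are replaced by one single quote. Because the A has already been consumed, the quote after it has no preceding character left to match, so it is skipped. A possible variant, offered as a sketch rather than a tested fix for the real data, keeps the preceding character with a capture group and only looks ahead at the following one. It assumes one record per line and that embedded quotes never touch a comma.

```ruby
line = "\"NAME\",\"610 \"A\" STREET\",\"STATE\",\"POSTAL_CODE\""

# Original attempt: the character class on each side is part of the match,
# so ' "A' is replaced as a whole and the second quote is skipped.
first_try = line.gsub(/[^,]"[^,]/, "'")
puts first_try

# Variant: put the preceding character back via a capture group and use a
# zero-width lookahead for the following one, so nothing extra is consumed.
new_line = line.gsub(/([^,])"(?=[^,\r\n])/) { "#{$1}'" }
puts new_line
```

The second puts should print the line with the address as 610 'A' STREET, while the field-delimiting quotes stay untouched.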