FasterCSV parsing issues

I’m using FasterCSV to do an import into my DB, and the CSV file
contains European words. I have French, Italian, and German words which
contain accents and such. When I try the import it throws a
FasterCSV::MalformedCSVError, but if I remove just the letters with
accents on them, it will upload just fine.

Here is a sample row:

Universal,ID,Kir,"Commonly, white wine with Cassis. Traditionally, the
cocktail kir (also known as vin blanc cassis in French) is made with
Aligoté. Kir Royal is made with Champagne instead of Aligoté."

Notice the 2 “e” with accents on them. I can remove these and it’s fine.
I’m assuming this is an encoding issue. The CSV file could be created by
any number of people in any number of different locations using any
number of programs. Do I need to do something like use Iconv to convert
to a standard encoding first, then upload?

Thanks

~Jeremy

I’ve had similar issues recently, and they are due to character
encodings. Something like Iconv will probably be necessary to convert
the files to a standard encoding

On Wed, Dec 1, 2010 at 11:52 AM, Jeremy W.

Nathaniel Smith wrote in post #965441:

I’ve had similar issues recently, and they are due to character
encodings. Something like Iconv will probably be necessary to convert
the files to a standard encoding

On Wed, Dec 1, 2010 at 11:52 AM, Jeremy W.

I’ve never actually used Iconv before, but I was just reading
http://blog.grayproductions.net/articles/encoding_conversion_with_iconv
and I did a test. I converted from ISO8859-1 to UTF8, and that actually
changes the characters, so it changes the meaning of the words. Now,
this is assuming that the CSV files I’m getting are all ISO8859-1
encoded (which I think they are).

I tried a test to just tell FasterCSV to read it as 'ISO8859-1' using the
first 3 lines of this CSV file:

Universal,ID,Kir,"Commonly, white wine with Cassis. Traditionally, the
cocktail kir (also known as vin blanc cassis in French) is made with
AligotÈ. Kir Royal is made with Champagne instead of AligotÈ."
Universal,GRAPE,MourvËdre / Monastrell / Mataro,"Grape: MourvËdre,
MatarÛ, or Monastrell is variety of grape used to make both strong, dark
red wines and rosÈs. It is grown in many regions around the world.
Universal,Tasting,Leafy,Specific aroma/taste descriptor: Having the
smell or taste sensation of Leaves.

ruby-1.8.7-p302 > file = File.open(File.join(Rails.root, 'public', 'sample.csv'))
=> #<File:/Users/jeremywoertink/Sites/winovations/public/sample.csv>
ruby-1.8.7-p302 > csv = FasterCSV.new(file, :encoding => 'ISO8859-1')
=> <#FasterCSV io_type:File
io_path:"/Users/jeremywoertink/Sites/winovations/public/sample.csv"
lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"" encoding:"ISO8859-1">
ruby-1.8.7-p302 > csv.each { |row| puts row }
Universal
ID
Kir
Commonly, white wine with Cassis. Traditionally, the cocktail kir (also
known as vin blanc cassis in French) is made with AligotÈ. Kir Royal is
made with Champagne instead of AligotÈ.
FasterCSV::MalformedCSVError: Unclosed quoted field on line 2.
from /Users/jeremywoertink/.rvm/gems/ruby-1.8.7-p302/gems/fastercsv-1.5.3/lib/faster_csv.rb:1663:in `shift'
from /Users/jeremywoertink/.rvm/gems/ruby-1.8.7-p302/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `loop'
from /Users/jeremywoertink/.rvm/gems/ruby-1.8.7-p302/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `shift'
from /Users/jeremywoertink/.rvm/gems/ruby-1.8.7-p302/gems/fastercsv-1.5.3/lib/faster_csv.rb:1526:in `each'
from (irb):28

I’m not seeing any unclosed quotes… Also, I thought that when you
iterate through the returned CSV object, it gives you rows, but this one
seems to be giving me the columns of the first row, then dies when it
hits the second row.

On Dec 1, 2010, at 10:52 AM, Jeremy W. wrote:

I’m using FasterCSV to do an import into my DB, and the CSV file
contains European words. I have French, Italian, and German words which
contain accents and such. When I try the import it throws a
FasterCSV::MalformedCSVError, but if I remove just the letters with
accents on them, it will upload just fine.

The CSV file could be created by
any number of people in any number of different locations using any
number of programs. Do I need to do something like use Iconv to convert
to a standard encoding first, then upload?

Yes, that’s exactly the strategy you need to adopt.

James Edward G. II

scratch that… I found the missing quote (-_-) my bad.

On Dec 1, 2010, at 12:16 PM, Jeremy W. wrote:

I’ve never actually used Iconv before, but I was just reading
http://blog.grayproductions.net/articles/encoding_conversion_with_iconv
and I did a test. I converted from ISO8859-1 to UTF8, and that actually
changes the characters, so it changes the meaning of the words. Now,
this is assuming that the CSV files I’m getting are all ISO8859-1
encoded (which I think they are).

You probably want to hit the files with some encoding guessing script to
be sure.
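
A minimal sketch of such a guessing step, assuming the only candidates are UTF-8 and ISO-8859-1 and a Ruby with string encodings (1.9 or later). The helper name `guess_encoding` is hypothetical. Because every byte sequence is valid Latin-1, this check can only confirm or rule out UTF-8; the Latin-1 answer is a fallback assumption, not a detection.

```ruby
# Hypothetical helper: returns "UTF-8" when the bytes form valid UTF-8,
# otherwise falls back to "ISO-8859-1" (an assumption, not a detection).
def guess_encoding(bytes)
  candidate = bytes.dup.force_encoding("UTF-8")
  candidate.valid_encoding? ? "UTF-8" : "ISO-8859-1"
end

utf8_sample   = "Aligot\xC3\xA9".b # "é" stored as two UTF-8 bytes
latin1_sample = "Aligot\xE9".b     # "é" stored as one Latin-1 byte

puts guess_encoding(utf8_sample)
puts guess_encoding(latin1_sample)
```

Files from other sources (Windows-1252, MacRoman) would need a real detection library or a known list of candidate encodings.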

I tried a test to just tell FasterCSV to read it as 'ISO8859-1' using the
first 3 lines of this CSV file:

ruby-1.8.7-p302 >

On Ruby 1.8.7, FasterCSV supports only four encodings (the same four
Ruby does) and Latin-1 (ISO-8859-1) isn’t one of them. You need to
transcode the data to UTF-8 on the way in or use the standard CSV
library in Ruby 1.9 (which can parse Latin-1 directly).

James Edward G. II
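
To illustrate the "transcode to UTF-8 on the way in" suggestion above, here is a small sketch for Ruby 1.9 or later, where `String#encode` can do the conversion without Iconv. It assumes the input really is ISO-8859-1; the sample line is made up for the example.

```ruby
require "csv"

# One CSV line as raw Latin-1 bytes: 0xE9 is "é" in ISO-8859-1.
latin1_bytes = "Universal,ID,Kir,\"made with Aligot\xE9\"\n".b

# Label the bytes with the encoding we believe they are in (an assumption
# about the input), then transcode to UTF-8 before parsing.
utf8_text = latin1_bytes.force_encoding("ISO-8859-1").encode("UTF-8")

rows = CSV.parse(utf8_text)
puts rows.first.last
```

If the label is wrong (for example the file is actually MacRoman), the transcoding still succeeds but produces the wrong accented characters, which is why guessing or confirming the source encoding first matters.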

On Dec 2, 2010, at 11:48 AM, Jeremy W. wrote:

I’ve upgraded to Ruby 1.9.2 now, but I’m still running into weird
issues. How come I can only parse a file once?

For the same reason you can only read from an IO object once: it’s
tracking your position. After the first pass you’re now at the end.
However, you can “rewind” it:

csv = CSV.open(File.join(Rails.root, 'public', 'sample.csv'))
csv.each { |row| }
csv.rewind
csv.each { |row| }

Hope that helps.

James Edward G. II
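
As an alternative to rewinding, the rows can be read into an Array once and walked as often as needed. This is a sketch using an in-memory `StringIO` in place of the file from the thread; the sample data is made up, and loading everything into memory is only reasonable for files of modest size.

```ruby
require "csv"
require "stringio"

# In-memory stand-in for the file from the thread (sample data is made up).
io = StringIO.new("Universal,ID,Kir\nUniversal,Tasting,Leafy\n")

# CSV#read consumes the IO once and returns an Array of rows.
rows = CSV.new(io).read

first_pass  = rows.map { |row| row[1] }
second_pass = rows.map { |row| row[1] } # the Array can be walked again

puts first_pass.inspect
puts second_pass.inspect
```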

Thanks for the info, James.

I’ve upgraded to Ruby 1.9.2 now, but I’m still running into weird
issues. How come I can only parse a file once?

ruby-1.9.2-p0 > file = File.open(File.join(Rails.root, 'public', 'sample.csv'))
=> #<File:/Users/jeremywoertink/Sites/winovations/public/sample.csv>
ruby-1.9.2-p0 > csv = CSV.new(file)
=> <#CSV io_type:File
io_path:"/Users/jeremywoertink/Sites/winovations/public/sample.csv"
encoding:ISO-8859-1 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
ruby-1.9.2-p0 > csv.each { |row| puts row[1] }


ruby-1.9.2-p0 > csv.each { |row| puts row[1] }
=> nil

Thanks,
~Jeremy

Oh. I guess I don’t spend enough time with IO stuff :P I wasn’t aware of
that. Makes sense though!

Ok, sorry to throw all these out here, but I’m trying to understand this
whole thing :P

Ok, so in my sample.csv, I have 1481 lines (according to TextMate). When
I print out the rows and line numbers in the console, it gets to line
1409 then stops and returns nil. There’s no error or anything. Is there
a limitation, or would this be caused by a malformed CSV file?

ok, actually… I think I get that last one. It’s reporting 1409 rows,
not lines, because some fields seem to contain line breaks.

duh… Ok, now if I can just figure out this “Unclosed quoted field”
error and how to avoid it, I’ll be good!

Thanks!

On Dec 2, 2010, at 12:09 PM, Jeremy W. wrote:

Oh. I guess I don’t spend enough time with IO stuff :P I wasn’t aware of
that. Makes sense though!

Ok, sorry to throw all these out here, but I’m trying to understand this
whole thing :P

No worries.

Ok, so in my sample.csv, I have 1481 lines (according to TextMate). When
I print out the rows and line numbers in the console, it gets to line
1409 then stops and returns nil. There’s no error or anything. Is there
a limitation, or would this be caused by a malformed CSV file?

It would probably be due to CSV content like:

one,"multi-line
two",three

TextMate would count that as two lines (it is) but it’s only one row of
CSV data.

James Edward G. II
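
The difference between lines and rows can be checked directly. This sketch, written for Ruby 1.9 or later, reuses the snippet above plus one made-up extra row and compares the editor-style line count with what the parser yields.

```ruby
require "csv"

# Second row ("four,five,six") is made up for the example.
data = "one,\"multi-line\ntwo\",three\nfour,five,six\n"

line_count = data.lines.count # what a text editor counts
rows       = CSV.parse(data)  # what the CSV parser yields

puts "lines: #{line_count}, rows: #{rows.size}"
puts rows.first[1].inspect
```

The quoted field keeps its embedded newline, so three physical lines collapse into two rows.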

On Dec 2, 2010, at 12:15 PM, Jeremy W. wrote:

Ok, now if I can just figure out this “Unclosed quoted field”
error and how to avoid it, I’ll be good!

That most likely stems from some invalid CSV data.

James Edward G. II
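
Since the input files come from many sources, one way to cope with invalid data is to rescue the parser's error and report it instead of letting the import crash. A minimal sketch on Ruby 1.9 or later with the standard CSV library follows; the sample data is made up, with a quoted field that is opened on line 1 and never closed.

```ruby
require "csv"

# Line 1 opens a quoted field and never closes it (made-up sample data).
bad_data = "Universal,GRAPE,\"Grape: dark red wines and roses.\n" \
           "Universal,Tasting,Leafy\n"

error = nil
begin
  CSV.parse(bad_data)
rescue CSV::MalformedCSVError => e
  # Keep the error so the caller can tell the uploader what went wrong.
  error = e
  warn "Invalid CSV: #{e.message}"
end
```

The error message names the line where the parser gave up, which is usually a good place to start looking for the missing quote.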

James Edward G. II wrote in post #965490:

ruby-1.8.7-p302 >

On Ruby 1.8.7, FasterCSV supports only four encodings (the same four
Ruby does) and Latin-1 (ISO-8859-1) isn’t one of them.

But binary (-Kn) is one of them, and that should be fine for ISO-8859-1,
shouldn’t it?

OP, are you running on a Mac by any chance? Apple built Ruby for OSX
with a non-standard configuration so that $KCODE="UTF8" by default. Try
using:

ruby -e 'puts $KCODE'

If it says UTF8, then try running your script again with ruby -Kn

On Dec 2, 2010, at 5:02 PM, Brian C. wrote:

James Edward G. II wrote in post #965490:

ruby-1.8.7-p302 >

On Ruby 1.8.7, FasterCSV supports only four encodings (the same four
Ruby does) and Latin-1 (ISO-8859-1) isn’t one of them.

But binary (-Kn) is one of them, and that should be fine for ISO-8859-1,
shouldn’t it?

Ah, yes. Excellent point.

James Edward G. II