FasterCSV problem

markh · August 28, 2006, 10:42pm

Is there any way to make the faster CSV library parse this line?

20 6" Multibrand Pricer Insert 2 4

I know I can use the :col_sep option to change the column separator to a
tab, but it fails to parse this because of an unclosed quoted field. It
seems like there should be an option to say that the fields are not quoted.

Thanks,

markh · August 28, 2006, 11:24pm

On Tue, 29 Aug 2006, Mark Van H. wrote:

Is there any way to make the faster CSV library parse this line?

20 6" Multibrand Pricer Insert 2 4

I know I can use the :col_sep option to change the column separator to a
tab, but it fails to parse this because of an unclosed quoted field. It
seems like there should be an option to say that the fields are not quoted.

Thanks,

If that’s indeed the case, why not simply do it yourself?

 harp:~ > cat a.rb
 require 'rubygems'
 require 'fastercsv'

 def munge line
   line.gsub!(%r/"+/){|q| q.size % 2 == 0 ? q : '"' + q}
   line.gsub!(%r/\ *\t\ */, '","')
   "%s%s%s" % ['"', line, '"']
 end

 def show line
   puts line
   munged = munge line
   puts munged
   p(FCSV.parse(munged).first)
   puts
 end

 lines = <<-lines
 20      6"      Multibrand      Pricer  Insert  2       4
 20      6""     Multibrand      Pricer  Insert  2       4
 20      6"""    Multibrand      Pricer  Insert  2       4
 20      6""""   Multibrand      Pricer  Insert  2       4
 lines

 lines.each_line{|line| show line.strip}

 harp:~ > ruby a.rb
 20      6"      Multibrand      Pricer  Insert  2       4
 "20","6""","Multibrand","Pricer","Insert","2","4"
 ["20", "6\"", "Multibrand", "Pricer", "Insert", "2", "4"]

 20      6""     Multibrand      Pricer  Insert  2       4
 "20","6""","Multibrand","Pricer","Insert","2","4"
 ["20", "6\"", "Multibrand", "Pricer", "Insert", "2", "4"]

 20      6"""    Multibrand      Pricer  Insert  2       4
 "20","6""""","Multibrand","Pricer","Insert","2","4"
 ["20", "6\"\"", "Multibrand", "Pricer", "Insert", "2", "4"]

 20      6""""   Multibrand      Pricer  Insert  2       4
 "20","6""""","Multibrand","Pricer","Insert","2","4"
 ["20", "6\"\"", "Multibrand", "Pricer", "Insert", "2", "4"]

If FasterCSV handled all the ‘simple’ exceptions it would be slow and
complicated to maintain.

kind regards.

-a
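
For readers on a current Ruby, the same idea can be sketched without mutating the input string. This assumes Ruby 1.9 or later, where FasterCSV was adopted as the standard csv library; the helper name quote_tsv_line is made up for illustration and is not part of any library.

```ruby
require 'csv'  # assumption: Ruby 1.9+, where FasterCSV became the standard csv library

# Hypothetical helper: turn a tab-separated line with stray quotes into strict CSV.
def quote_tsv_line(line)
  # Pad every odd-length run of quotes to an even length,
  # so each pair reads as one escaped quote.
  escaped = line.gsub(/"+/) { |q| q.size.even? ? q : '"' + q }
  # Split on tabs (trimming spaces around them) and wrap every field in quotes.
  fields = escaped.split(/ *\t */, -1)
  fields.map { |f| %("#{f}") }.join(',')
end

csv_line = quote_tsv_line("20\t6\"\tMultibrand\tPricer\tInsert\t2\t4")
puts csv_line
p CSV.parse_line(csv_line)
```

Like the original munge, this treats an even run of quotes as already escaped, so 6" and 6"" both come out as a field containing a single quote.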

markh · August 28, 2006, 11:26pm

On Aug 28, 2006, at 3:40 PM, Mark Van H. wrote:

quoted.
Well, if quotes aren’t quoted, it’s not CSV, and all the parser you
really need is:

line.split("\t")

right?

FasterCSV uses a very strict parser, so no, it won’t allow this. Sorry.
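
A minimal sketch of the split approach on the sample row, for anyone following along. The helper name parse_tsv_line is hypothetical, and the -1 limit is an addition to the one-liner above so that trailing empty fields are kept.

```ruby
# Hypothetical helper: split one tab-separated line into fields.
# There is no quote handling at all, so a bare " is just another character.
def parse_tsv_line(line)
  # chomp removes the line ending; the -1 limit keeps trailing empty fields,
  # which a plain split("\t") would silently drop.
  line.chomp.split("\t", -1)
end

row = parse_tsv_line("20\t6\"\tMultibrand\tPricer\tInsert\t2\t4\n")
p row  # ["20", "6\"", "Multibrand", "Pricer", "Insert", "2", "4"]
```

This only holds as long as no field contains a tab or a line break, which is the assumption the thread is working under.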

James Edward G. II

markh · August 29, 2006, 12:23am

I did end up cleaning the row myself. I just wondered if I was missing
the option somewhere. The only reason I ask is because Excel/OOCalc allow
you to say whether or not fields are surrounded by "'s. It would be a nice
option.

mark

markh · August 29, 2006, 12:43am

FasterCSV is for parsing CSV. Without quoting, we are not talking
about CSV.

Technically, yes.

Can you please explain how fields = line.split("\t") fails you?

This would work in my situation just fine.

If there’s a real need for this, I’ll consider it. But right now I
would implement it as the above and I hope that’s not what you’re
asking for.

If this is something you don’t think should be in the CSV library
because it is not actually “official” CSV, then that is fine. I look at
that file as being “almost” CSV (with the exception of putting "'s around
fields). The only reason I even ran into this is because mysql outputs
bad CSV.

mark

markh · August 29, 2006, 12:31am

On Aug 28, 2006, at 5:22 PM, Mark Van H. wrote:

I did end up cleaning the row myself. I just wondered if I was missing
the option somewhere. The only reason I ask is because Excel/OOCalc allow
you to say whether or not fields are surrounded by "'s. It would be a nice
option.

I guess I’m dense today…

FasterCSV is for parsing CSV. Without quoting, we are not talking
about CSV.

Can you please explain how fields = line.split("\t") fails you?

If there’s a real need for this, I’ll consider it. But right now I
would implement it as the above and I hope that’s not what you’re
asking for.

James Edward G. II

markh · August 29, 2006, 12:59am

On Aug 28, 2006, at 5:42 PM, Mark Van H. wrote:

If this is something you don’t think should be in the CSV library
because it is not actually “official” CSV, then that is fine.

Well, it’s more that I don’t see what I can give you that split()
doesn’t. Hard for me to improve on that, you know?

I look at that file as
being “almost” CSV (with the exception of putting "'s around fields).

In proper CSV the 6" field would really be:

"6"""

It’s pretty different. Without the quotes it’s illegal to use \t,
\r, and \n in fields (I assume). There’s just really nothing there
you need a parser for, in my opinion.
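
To make the quoting rule concrete, here is a small sketch using the standard csv library that succeeded FasterCSV (an assumption about the reader's Ruby version; the thread itself predates it). It shows what a CSV writer emits for the 6" field, with either a comma or a tab as the separator.

```ruby
require 'csv'  # assumption: the standard csv library that succeeded FasterCSV

fields = ["20", '6"', "Multibrand"]

# The writer quotes any field containing a quote character and doubles the quote.
comma_line = CSV.generate_line(fields)
tab_line = CSV.generate_line(fields, col_sep: "\t")

puts comma_line  # 20,"6""",Multibrand
puts tab_line

# Both lines parse back to the original fields.
p CSV.parse_line(comma_line)
p CSV.parse_line(tab_line, col_sep: "\t")
```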

James Edward G. II