Re: Q about the FasterCSV


#1

Dave B. [mailto:removed_email_address@domain.invalid] :

No, it’s not a bug. CSV is a simple delimited format. It’s delimited by
a comma character, not a comma then some arbitrary whitespace. That’s
how Microsoft’s Excel and Access and SQL Server parsers deal with it, too.

Dave, you’re a cool rubyist. I think you are cooler than microsoft’s.

and maybe, fastercsv can be more “intelligent” than other csv by

  1. ignoring extra spaces in a captured separated value

eg,

test, "1" ==> ["test","1"]

iow, quotes rule (as in shellwords)

  2. not ignore spaces yet escape the quotes

eg,

test, "1" ==> ["test"," \"1\""]
test, "1"111 ==> ["test"," \"1\"111"]

  3. or maybe, fastercsv can include an option/flag to allow the above

Also, it would be nice if fastercsv could show what particular field it
balked at.

kind regards -botp


#2

I agree!

I’m really missing this feature for my current work!


#3

On Apr 27, 2006, at 2:32 AM, Peña, Botp wrote:

and maybe, fastercsv can be more “intelligent” than other csv by

FasterCSV is intentionally a strict parser. For one thing, that
helps a lot with the speed.

Also, it would be nice if fastercsv could show what particular
field it balked at.

Sheesh, I just got it doing line numbers very recently. (Hard in CSV,
where \n can be embedded in a field.) It’s never enough… :wink:

James Edward G. II


#4

Peña wrote:

Dave, you’re a cool rubyist. I think you are cooler than microsoft’s.

Thanks, I think. (I don’t know how many Rubyists Microsoft has - I don’t
recall anyone on this list signing their email with an MS
certification.)

  2. not ignore spaces yet escape the quotes

eg,

test, "1" ==> ["test"," \"1\""]
test, "1"111 ==> ["test"," \"1\"111"]

  3. or maybe, fastercsv can include an option/flag to allow the above

Let’s choose option 1. Ruby lets you modify classes from libraries.
Let’s call this “lenient_and_still_a_little_bit_faster_csv.rb”:

  require 'faster_csv'

  class FasterCSV
    # Pre-compiles parsers and stores them by name for access during
    # reads, just like the official FasterCSV version, BUT the central
    # parser allows arbitrary whitespace before and after the column
    # separator.
    def init_parsers( options )
      # prebuild Regexps for faster parsing
      @parsers = {
        :leading_fields =>
          /\A#{Regexp.escape(@col_sep)}+/,         # for empty leading fields
        :csv_row        =>
          ### The Primary Parser ###
          / \G(?:^|#{Regexp.escape(@col_sep)})     # anchor the match
            \s*                        # <-----    # ignore some whitespace
            (?: "((?>[^"]*)(?>""[^"]*)*)"          # find quoted fields
                |                                  # ... or ...
                ([^"#{Regexp.escape(@col_sep)}]*)  # unquoted fields
                )/x,
          ### End Primary Parser ###
        :line_end       =>
          /#{Regexp.escape(@row_sep)}\Z/           # safer than chomp!()
      }
    end
  end

All that code except for the line consisting entirely of “\s*” was taken
from FasterCSV 0.2.0, and I should have asked Gray Productions for
permission to republish it, but I don’t think Mr. Gray will mind this
particular use of his excellent work.
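
For anyone who wants to see what the extra \s* does without patching the library, here is a minimal standalone sketch. It assumes a fixed comma separator, and COL_SEP, LENIENT_FIELD and lenient_split are names made up for this illustration, not part of FasterCSV. It only splits a single line, does none of FasterCSV's error checking, and skips the separate :leading_fields step, so lines that start with a separator are not handled.

```ruby
# Illustration only: these names are hypothetical, not FasterCSV API.
COL_SEP = ","

LENIENT_FIELD = /
  \G(?:^|#{Regexp.escape(COL_SEP)})   # anchor at line start or after a separator
  \s*                                 # the added part: skip leading whitespace
  (?: "((?>[^"]*)(?>""[^"]*)*)"       # quoted field, a doubled quote is an escape
    |                                 # ... or ...
    ([^"#{Regexp.escape(COL_SEP)}]*)  # unquoted field
  )/x

# Splits one line into fields, tolerating whitespace before each field.
def lenient_split(line)
  fields = []
  line.chomp.scan(LENIENT_FIELD) do |quoted, unquoted|
    # Exactly one of the two captures is set for each match.
    fields << (quoted ? quoted.gsub('""', '"') : unquoted)
  end
  fields
end

p lenient_split('test, "1"')  # => ["test", "1"]
```

Note that only whitespace in front of a field is skipped. Trailing whitespace stays part of an unquoted field.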

Cheers,
Dave


#5

James Edward G. II wrote:

On Apr 27, 2006, at 10:28 AM, Dave B. wrote:

  :leading_fields =>
    /\A#{Regexp.escape(@col_sep)}+/,      # for empty leading fields

You should modify the above line too. It takes both to correctly parse
some lines:

/\A\s*#{Regexp.escape(@col_sep)}+/

I looked at this, but I deduced from [1] that a number of fields equal
to the match size are added, so (I guess) " , foo" would get extra
leading fields: [nil, nil, nil, nil, “foo”]. So I skipped it. I’m also
guessing the OP doesn’t need it, anyway.
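
A small sketch of that concern, assuming the library really does add one nil per character of the stripped prefix as in [1]. The constant and method names here are made up for the illustration, and the "by separator" variant is only a hypothetical alternative, not something FasterCSV does.

```ruby
# Illustration only: names are hypothetical, not FasterCSV API.
SEP             = ","
STRICT_LEADING  = /\A#{Regexp.escape(SEP)}+/
LENIENT_LEADING = /\A\s*#{Regexp.escape(SEP)}+/

# Mimics the logic quoted in [1]: one nil per character of the stripped prefix.
def leading_nils_by_length(line, pattern)
  rest = line.dup
  rest.sub!(pattern, "") ? [nil] * $&.length : []
end

# Hypothetical alternative: one nil per separator in the stripped prefix.
def leading_nils_by_separator(line, pattern)
  rest = line.dup
  rest.sub!(pattern, "") ? [nil] * $&.scan(SEP).length : []
end

p leading_nils_by_length(",,foo", STRICT_LEADING)         # => [nil, nil]
p leading_nils_by_length("   , foo", LENIENT_LEADING)     # => [nil, nil, nil, nil]
p leading_nils_by_separator("   , foo", LENIENT_LEADING)  # => [nil]
```

With the strict pattern the prefix contains only separators, so its length and the number of empty fields agree. Once whitespace is allowed into the prefix, they no longer do.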

Looks good to me. Just don’t hold your breath waiting on the patch… :wink:

Oh, I don’t want the patch. It’s a terrible idea! “foo, bar, ‘baz’”
aren’t CSV, they’re CASWSSV (comma and some white-space separated
values). That’s got to be a whole new library :slight_smile:

Cheers,
Dave

[1] faster_csv.rb lines 1114…1115:
  csv = if parse.sub!(@parsers[:leading_fields], "")
    [nil] * $&.length

P.S.: There’s a bug here, and not just here, I think. Maybe
init_separators should raise an exception if @col_sep.size != 1, or use
options[:col_sep][0,1]. It currently barfs late and in various
interesting ways for multi-character values of col_sep.
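
The first suggestion could look something like the following. This is a minimal sketch only: single_char_col_sep is a hypothetical helper, not a FasterCSV method, and the default of "," is an assumption.

```ruby
# Hypothetical helper, not part of FasterCSV: fail early on a bad :col_sep
# instead of letting the parser misbehave later.
def single_char_col_sep(options)
  col_sep = options.fetch(:col_sep, ",")  # assumed default separator
  unless col_sep.is_a?(String) && col_sep.size == 1
    raise ArgumentError,
          ":col_sep must be a single character, got #{col_sep.inspect}"
  end
  col_sep
end

p single_char_col_sep(:col_sep => ";")  # => ";"
```

Raising is the stricter of the two choices. Silently truncating with options[:col_sep][0,1] would hide the caller's mistake.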


#6

On Apr 27, 2006, at 10:28 AM, Dave B. wrote:

# prebuild Regexps for faster parsing
@parsers    = {
  :leading_fields =>
    /\A#{Regexp.escape(@col_sep)}+/,      # for empty leading fields

You should modify the above line too. It takes both to correctly
parse some lines:

/\A\s*#{Regexp.escape(@col_sep)}+/

  :line_end       =>
    /#{Regexp.escape(@row_sep)}\Z/           # safer than chomp!()
}

end
end

All that code except for the line consisting entirely of “\s*” was taken
from FasterCSV 0.2.0, and I should have asked Gray Productions for
permission to republish it, but I don’t think Mr. Gray will mind this
particular use of his excellent work.

Looks good to me. Just don’t hold your breath waiting on the
patch… :wink:

James Edward G. II


#7

On Apr 27, 2006, at 10:52 AM, Dave B. wrote:

/\A\s*#{Regexp.escape(@col_sep)}+/

aren’t CSV, they’re CASWSSV (comma and some white-space separated
values).

init_separators should raise an exception if @col_sep.size != 1, or use
options[:col_sep][0,1]. It currently barfs late and in various
interesting ways for multi-character values of col_sep.

Good points all around. Dave knows this code better than I do,
clearly. :wink:

James Edward G. II


#8

I’m trying to load data into the database. I’ve managed to upload the
CSV file using Paperclip, but I am having trouble loading its contents
into the database using FasterCSV. I am using Hobo, but I suppose that
once the CSV file is uploaded, the rest is standard RoR.

This is my model:

import.rb:

class Import < ActiveRecord::Base

  hobo_model # Don't put anything above this

  fields do
    datatype :string
    abu      :string
    paul     :string
    age      :integer
    timestamps
  end

  # Paperclip
  has_attached_file :csv
  validates_attachment_presence :csv
  validates_attachment_content_type :csv, :content_type =>
    ['text/csv', 'text/comma-separated-values', 'application/csv',
     'application/excel', 'application/vnd.ms-excel',
     'application/vnd.msexcel', 'text/anytext', 'text/plain']
end

This works fine, and it saves the CSV file under public/systems/csvs.

I am having trouble using Fastercsv to load the contents into the
database.

Can you point me in the right direction with this, please?

Thanks in advance.

Abu


#9

On Feb 8, 2011, at 11:33 AM, Abu A. wrote:

I am having trouble using Fastercsv to load the contents into the
database.

I’m not totally sure I understand the question, but loading data with
FasterCSV is usually done something like:

FasterCSV.foreach( path, :headers           => true,
                         :header_converters => :symbol ) do |row|
  SomeModel.create!(row.to_hash)
end
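
To see what each row looks like before it reaches the model, here is a minimal sketch using Ruby's standard csv library, which is what FasterCSV became as of Ruby 1.9. The sample data is made up, the column names are only for illustration, and the create! step is left out so the snippet runs without Rails.

```ruby
require "csv"

# Made-up sample data standing in for the uploaded file's contents.
data = <<~CSV_TEXT
  Name,Age
  Abu,30
  Paul,41
CSV_TEXT

# headers: true treats the first line as column names;
# header_converters: :symbol turns "Name" into :name, and so on.
rows = CSV.parse(data, headers: true, header_converters: :symbol).map(&:to_h)

rows.each { |attributes| p attributes }
```

Each hash is keyed by the symbolized column names, so it can be passed straight to create! as long as the headers match the model's attribute names. Values arrive as strings unless a converter is specified.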

Hope that helps.

James Edward G. II


#10

James Edward G. II wrote:

Good points all around. Dave knows this code better than I do,
clearly. :wink:

Thanks, but credit to you – IIRC part of your stated aim for FasterCSV
was to make it short, legible, and therefore maintainable, and if I can
pick up this stuff in literally one minute of looking at the code,
you’ve succeeded. Well done.

Cheers,
Dave