Forum: Ruby Re: Q about the FasterCSV

Peña, Botp (Guest)
on 2006-04-27 09:35
(Received via mailing list)
Dave Burt [mailto:dave@burt.id.au] :

# No, it's not a bug. CSV is a simple delimited format. It's delimited by
# a comma character, not a comma then some arbitrary whitespace. That's
# how Microsoft's Excel and Access and SQL Server parsers deal with it, too.

Dave, you're a cool rubyist. I think you are cooler than microsoft's.

and maybe, fastercsv can be more "intelligent" than other csv by

1) ignoring extra spaces in a captured separated value

  eg,

  test, "1" ==> ["test","1"]

  iow, quotes rule (as in shellwords)

2) not ignore spaces yet escape the quotes

  eg,

  test, "1" ==> ["test"," \"1\""]
  test, "1"111 ==> ["test"," \"1\"111"]

3) or maybe, fastercsv can include an option/flag to allow the above


Also, it would be nice if fastercsv could show which particular field it
balked at

kind regards -botp
Eric Luo (Guest)
on 2006-04-27 10:03
(Received via mailing list)
I agree!

I'm really missing this feature for my current work!
James Gray (bbazzarrakk)
on 2006-04-27 13:59
(Received via mailing list)
On Apr 27, 2006, at 2:32 AM, Peña, Botp wrote:

> and maybe, fastercsv can be more "intelligent" than other csv by

FasterCSV is intentionally a strict parser.  For one thing, that
helps a lot with the speed.

> Also, it would be nice if fastercsv could show which particular
> field it balked at

Sheesh, I just got it doing line numbers very recently.  (Hard in CSV
where \n can be embedded in a field.)  It's never enough...  ;)

James Edward Gray II
Dave Burt (Guest)
on 2006-04-27 17:30
(Received via mailing list)
Peña wrote:
> Dave, you're a cool rubyist. I think you are cooler than microsoft's.

Thanks, I think. (I don't know how many Rubyists Microsoft has - I don't
recall anyone on this list signing their email with an MS
certification.)

> 2) not ignore spaces yet escape the quotes
>
>   eg,
>
>   test, "1" ==> ["test"," \"1\""]
>   test, "1"111 ==> ["test"," \"1\"111"]
>
> 3) or maybe, fastercsv can include an option/flag to allow the above

Let's choose option 1. Ruby lets you modify classes from libraries.
Let's call this "lenient_and_still_a_little_bit_faster_csv.rb":

require 'faster_csv'
class FasterCSV
  # Pre-compiles parsers and stores them by name for access during
  # reads, just like the official FasterCSV version, BUT the central
  # parser allows arbitrary whitespace before and after the column
  # separator.
  def init_parsers( options )
    # prebuild Regexps for faster parsing
    @parsers    = {
      :leading_fields =>
        /\A#{Regexp.escape(@col_sep)}+/,      # for empty leading fields
      :csv_row        =>
        ### The Primary Parser ###
        / \G(?:^|#{Regexp.escape(@col_sep)})     # anchor the match
          \s*                          # <----- # ignore some whitespace
          (?: "((?>[^"]*)(?>""[^"]*)*)"          # find quoted fields
              |                                  # ... or ...
              ([^"#{Regexp.escape(@col_sep)}]*)  # unquoted fields
              )/x,
        ### End Primary Parser ###
      :line_end       =>
        /#{Regexp.escape(@row_sep)}\Z/           # safer than chomp!()
    }
  end
end

All that code except for the line consisting entirely of "\s*" was taken
from FasterCSV 0.2.0, and I should have asked Gray Productions for
permission to republish it, but I don't think Mr. Gray will mind this
particular use of his excellent work.

Cheers,
Dave
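
To see what the added "\s*" line changes without loading FasterCSV at all, here is a minimal standalone sketch. It copies the :csv_row pattern from the post above and applies it with String#scan. The names COL_SEP, LENIENT_ROW and lenient_fields are made up for this illustration and are not part of FasterCSV; the leftover-text check is also an assumption of the sketch, not a copy of the library's own error handling.

```ruby
# Standalone sketch of the lenient row pattern (hypothetical names throughout).
COL_SEP = ","
SEP     = Regexp.escape(COL_SEP)

LENIENT_ROW = / \G(?:^|#{SEP})                  # anchor at the previous match
                \s*                             # the one added line: skip whitespace
                (?: "((?>[^"]*)(?>""[^"]*)*)"   # quoted field
                    |                           # ... or ...
                    ([^"#{SEP}]*)               # unquoted field
                )/x

# Split one line into fields, un-doubling "" inside quoted fields.
# Raises if part of the line could not be matched (e.g. text after a
# closing quote), since scan would otherwise drop it silently.
def lenient_fields(line)
  consumed = 0
  fields   = []
  line.scan(LENIENT_ROW) do |quoted, unquoted|
    consumed += Regexp.last_match(0).length
    fields << (quoted ? quoted.gsub('""', '"') : unquoted)
  end
  unless consumed == line.length
    raise ArgumentError, "unparsed text after column #{consumed}"
  end
  fields
end

p lenient_fields('test, "1"')   # the quoted field now survives the leading space
```

With this pattern, test, "1"111 is still rejected by the sketch, because the text after the closing quote is never matched; that is Botp's option 1 rather than option 2.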
James Gray (bbazzarrakk)
on 2006-04-27 17:36
(Received via mailing list)
On Apr 27, 2006, at 10:28 AM, Dave Burt wrote:

>     # prebuild Regexps for faster parsing
>     @parsers    = {
>       :leading_fields =>
>         /\A#{Regexp.escape(@col_sep)}+/,      # for empty leading
> fields

You should modify the above line too.  It takes both to correctly
parse some lines:

/\A\s*#{Regexp.escape(@col_sep)}+/

>       :line_end       =>
>         /#{Regexp.escape(@row_sep)}\Z/           # safer than chomp!()
>     }
>   end
> end
>
> All that code except for the line consisting entirely of "\s*" was
> taken
> from FasterCSV 0.2.0, and I should have asked Gray Productions for
> permission to republish it, but I don't think Mr. Gray will mind this
> particular use of his excellent work.

Looks good to me.  Just don't hold your breath waiting on the
patch...  ;)

James Edward Gray II
Dave Burt (Guest)
on 2006-04-27 17:55
(Received via mailing list)
James Edward Gray II wrote:
> On Apr 27, 2006, at 10:28 AM, Dave Burt wrote:
>>       :leading_fields =>
>>         /\A#{Regexp.escape(@col_sep)}+/,      # for empty leading fields
>
> You should modify the above line too.  It takes both to correctly parse
> some lines:
>
> /\A\s*#{Regexp.escape(@col_sep)}+/

I looked at this, but I deduced from [1] that a number of fields equal
to the match size are added, so (I guess) "  ,, foo" would get extra
leading fields: [nil, nil, nil, nil, "foo"]. So I skipped it. I'm also
guessing the OP doesn't need it, anyway.

> Looks good to me.  Just don't hold your breath waiting on the patch...  ;)

Oh, I don't want the patch. It's a terrible idea! "foo, bar, 'baz'"
aren't CSV, they're CASWSSV (comma and some white-space separated
values). That's got to be a whole new library :)

Cheers,
Dave

[1] faster_csv.rb lines 1114..1115:
      csv = if parse.sub!(@parsers[:leading_fields], "")
        [nil] * $&.length
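
The effect of those two lines can be checked in isolation. This is only a sketch of the quoted logic, with a made-up helper name (leading_nils), comparing the original :leading_fields pattern with the whitespace-tolerant one suggested above:

```ruby
col_sep = ","
strict  = /\A#{Regexp.escape(col_sep)}+/      # pattern from FasterCSV 0.2.0
lenient = /\A\s*#{Regexp.escape(col_sep)}+/   # suggested whitespace-tolerant variant

# Mirrors the quoted logic: one nil per *character* of the stripped prefix.
def leading_nils(line, pattern)
  rest = line.dup
  rest.sub!(pattern, "") ? [nil] * Regexp.last_match(0).length : []
end

p leading_nils(",,foo", strict).length      # 2: one nil per separator, as intended
p leading_nils("  ,, foo", lenient).length  # 4: the two spaces are counted as fields too
```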

P.S.: There's a bug here, and not just here, I think. Maybe
init_separators should raise an exception if @col_sep.size != 1, or use
options[:col_sep][0,1]. It currently barfs late and in various
interesting ways for multi-character values of col_sep.
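
A small sketch of why multi-character separators go wrong with patterns built this way, plus the kind of early check suggested in the P.S. The helper check_col_sep! is hypothetical and not something FasterCSV provides:

```ruby
col_sep = "::"
sep     = Regexp.escape(col_sep)

# Inside [...] the separator is treated as a set of single characters,
# so an unquoted field stops at the first ":" rather than at "::".
unquoted = /[^"#{sep}]*/
p "a:b::c"[unquoted]          # "a", not "a:b"

# Here "+" applies to the last character only, and the prefix length
# counts characters, so one "::" would be turned into two nil fields.
leading = /\A#{sep}+/
p "::x"[leading].length       # 2

# Hypothetical early check along the lines of the P.S. above.
def check_col_sep!(col_sep)
  unless col_sep.size == 1
    raise ArgumentError, "col_sep must be a single character, got #{col_sep.inspect}"
  end
  col_sep
end
```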
James Gray (bbazzarrakk)
on 2006-04-27 18:44
(Received via mailing list)
On Apr 27, 2006, at 10:52 AM, Dave Burt wrote:

>> /\A\s*#{Regexp.escape(@col_sep)}+/
> aren't CSV, they're CASWSSV (comma and some white-space separated
> init_separators should raise an exception if @col_sep.size != 1, or
> use
> options[:col_sep][0,1]. It currently barfs late and in various
> interesting ways for multi-character values of col_sep.

Good points all around.  Dave knows this code better than I do,
clearly.  ;)

James Edward Gray II
Dave Burt (Guest)
on 2006-04-27 20:46
(Received via mailing list)
James Edward Gray II wrote:
> Good points all around.  Dave knows this code better than I do,
> clearly.  ;)

Thanks, but credit to you -- IIRC part of your stated aim for FasterCSV
was to make it short, legible, and therefore maintainable, and if I can
pick up this stuff in literally one minute of looking at the code,
you've succeeded. Well done.

Cheers,
Dave
Abu A. (abu_a)
on 2011-02-08 18:33
I'm trying to load data into the database. I've uploaded the CSV file
using Paperclip, but I am having trouble loading its contents into the
database using FasterCSV.  I am using Hobo, but I suppose that once the
CSV file is uploaded it's standard RoR.

This is my model:

import.rb:

class Import < ActiveRecord::Base

  hobo_model # Don't put anything above this

  fields do
    datatype :string
    abu      :string
    paul     :string
    age      :integer
    timestamps
  end

  # Paperclip
  has_attached_file :csv
  validates_attachment_presence :csv
  validates_attachment_content_type :csv, :content_type => [
    'text/csv', 'text/comma-separated-values', 'application/csv',
    'application/excel', 'application/vnd.ms-excel',
    'application/vnd.msexcel', 'text/anytext', 'text/plain'
  ]
end

This works fine, and it stores the CSV file in public/systems/csvs

I am having trouble using Fastercsv to load the contents into the
database.

Can you point me in the right direction with this, please?

Thanks in advance.

Abu
James Edward Gray II (Guest)
on 2011-02-08 20:43
(Received via mailing list)
On Feb 8, 2011, at 11:33 AM, Abu A. wrote:

> I am having trouble using Fastercsv to load the contents into the
> database.

I'm not totally sure I understand the question, but loading data with
FasterCSV is usually done something like:

 FasterCSV.foreach( path, :headers           => true,
                          :header_converters => :symbol ) do |row|
   SomeModel.create!(row.to_hash)
 end

Hope that helps.

James Edward Gray II
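
For readers on Ruby 1.9 or later, where FasterCSV became the standard CSV library, the same pattern can be sketched without a database. The column names below are taken from the Import model above; the sample row and the use of plain hashes in place of SomeModel.create! are assumptions made only for illustration.

```ruby
require "csv"

# Made-up sample data using the column names from the Import model.
data = <<~CSV_TEXT
  datatype,abu,paul,age
  sample,x,y,30
CSV_TEXT

# headers: true treats the first line as column names;
# header_converters: :symbol turns them into symbols such as :age.
rows = CSV.parse(data, headers: true, header_converters: :symbol).map(&:to_h)

# Each hash has the shape one would pass to a model's create! call.
rows.each { |attrs| p attrs }
```

Note that every value arrives as a string (age is "30" here) unless field converters are requested as well.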