Forum: Ruby FasterCSV 0.1.6 -- With Header Support!

James G. (Guest)
on 2006-02-26 03:20
(Received via mailing list)
FasterCSV 0.1.6 Released
========================

The first couple of releases brought raw speed and plenty of it.
This release puts all those spare cycles to work with a ton of new
data-centric features.

Want to access your CSV files by the names of the header rows or get
back real data objects instead of just Strings?  Then this release
has what you need.

This release also fixes the number one complaint with automatic line
ending detection (now the default).

What is FasterCSV?
------------------

(from the README)

FasterCSV is intended as a replacement to Ruby's standard CSV
library.  It was designed to address concerns users of that library
had and it has three primary goals:

1.  Be significantly faster than CSV while remaining a pure Ruby
library.
2.  Use a smaller and easier to maintain code base.  (We're about even
now, but not if you compare the features!)
3.  Improve on the CSV interface.

What's New?
-----------

(highlights from the CHANGELOG)

* Added built-in and custom data converters.  Built-in handle numbers
and dates.
* Added Array#to_csv and String#parse_csv.  Both accept normal options.
* Added auto-discovery for :row_sep (now the default).
* Added FasterCSV::filter() for easy Unix-like CSV filters.
* Added support for accessing fields by headers.
   * Headers can have their own converters.
   * Headers can be skipped or returned as needed.
   * FasterCSV::Row allows index or header access while retaining order
     and allowing for duplicate headers.
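
A minimal sketch of a few of these additions. It is written against Ruby's standard csv library, which took over the FasterCSV code base in Ruby 1.9; with the gem on Ruby 1.8 the assumption is that `require "faster_csv"` and `FasterCSV` stand in for `require "csv"` and `CSV`. The sample data is invented for illustration.

```ruby
require "csv"

# Array#to_csv and String#parse_csv handle a single line of CSV.
line   = ["Text Editor", 1, 25.0].to_csv   # one CSV line, newline included
fields = line.parse_csv                    # fields come back as Strings

# Both accept the normal options, such as a column separator
# or the built-in :numeric converter.
tabbed  = ["a", "b"].to_csv(:col_sep => "\t")
numbers = "1,2.5,three".parse_csv(:converters => :numeric)

# With :headers => true, rows can be read by header name or by index.
table = CSV.parse("Name,Qty\nWidget,3\n", :headers => true)
row   = table[0]
by_name  = row["Name"]
by_index = row[1]
```

Fields that no converter matches (such as "three" above) are left as Strings.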

Migrating from CSV to FasterCSV?
--------------------------------

The README includes a section on the differences and you can read
that here:

http://fastercsv.rubyforge.org/

You can also see general usage in the documentation of the
interface, right here:

http://fastercsv.rubyforge.org/classes/FasterCSV.html

If FasterCSV isn't meeting your needs, I want to hear about it:

removed_email_address@domain.invalid

Where can I learn more?
-----------------------

FasterCSV is hosted on RubyForge.

Project page:   http://rubyforge.org/projects/fastercsv/
Documentation:  http://fastercsv.rubyforge.org/
Downloads:      http://rubyforge.org/frs/?group_id=1102

How do I get FasterCSV?
-----------------------

FasterCSV is a gem, so as long as you have RubyGems installed it's as
simple as:

$ sudo gem install fastercsv

If you need to install RubyGems, you can download it from:

http://rubyforge.org/frs/?group_id=126&release_id=2471

FasterCSV can also be installed manually.  Just download the latest
release and follow the instructions in INSTALL:

http://rubyforge.org/frs/?group_id=1102&release_id=4438

James Edward G. II
Gregory B. (Guest)
on 2006-02-26 03:23
(Received via mailing list)
On 2/25/06, James Edward G. II <removed_email_address@domain.invalid> wrote:
>
> This release also fixes the number one complaint with automatic line
> ending detection (now the default).

Cool James!  I just ran the units on Ruport, for which you are an
(optional) dependency.  Your new release didn't break anything :)
James G. (Guest)
on 2006-02-26 07:26
(Received via mailing list)
On Feb 25, 2006, at 7:17 PM, James Edward G. II wrote:

> FasterCSV 0.1.6 Released
> ========================

I was in such a hurry to get this out (was almost late to the
symphony!), I almost forgot the most important part!  The features in
this release were partially funded by B-Tree Technology and Stone
Code Productions.  How cool is that to get paid to write free
software?  Thanks guys!

James Edward G. II
PA (Guest)
on 2006-02-26 15:42
(Received via mailing list)
On Feb 26, 2006, at 02:17, James Edward G. II wrote:

> 1.  Be significantly faster than CSV while remaining a pure Ruby
> library.

Couldn't resist the temptation to check how much faster 8^)

Here are some numbers parsing ip-to-country.csv (3MB, 63726 lines)[1].


[ruby csv]
% ruby -v
ruby 1.8.4 (2005-12-24) [powerpc-darwin7.9.0]
% /usr/bin/time ruby TestCSV.rb
        79.06 real        74.42 user         0.30 sys


[ruby faster_csv]
% ruby -v
ruby 1.8.4 (2005-12-24) [powerpc-darwin7.9.0]
% /usr/bin/time ruby TestFasterCSV.rb
         7.56 real         7.16 user         0.19 sys


[lua LUCSV][2][3]
% lua -v
Lua 5.1  Copyright (C) 1994-2006 Lua.org, PUC-Rio
% /usr/bin/time lua TestCSV.lua
         2.79 real         2.45 user         0.09 sys


[python csv][4][5]
% python -V
Python 2.4.2
% /usr/bin/time python TestCSV.py
         0.85 real         0.79 user         0.04 sys


Cheers

--
PA, Onnay Equitursay
http://alt.textdrive.com/


[1] http://ip-to-country.webhosting.info/node/view/6
[2] http://www.lua.org/about.html
[3] http://dev.alt.textdrive.com/browser/lu/LUCSV.lua
[4] http://www.python.org/doc/2.4.2/lib/module-csv.html
[5] Python provides a C implementation of the CSV parser.
James G. (Guest)
on 2006-02-27 18:20
(Received via mailing list)
On Feb 25, 2006, at 7:22 PM, Gregory B. wrote:

>> has what you need.
>>
>> This release also fixes the number one complaint with automatic line
>> ending detection (now the default).
>
> Cool James!  I just ran the units on Ruport, for which you are an
> (optional) dependency.  Your new release didn't break anything :)

That's either a miracle or a sign of poor test coverage, because I
broke quite a bit.  :)

Following my annoying new habit of release-then-make-it-work,
FasterCSV 0.1.8 is now out, and resolves all of the issues found so
far...

Sorry about the hassle.

James Edward G. II
Gregory B. (Guest)
on 2006-02-27 20:24
(Received via mailing list)
On 2/27/06, James Edward G. II <removed_email_address@domain.invalid> wrote:
> On Feb 25, 2006, at 7:22 PM, Gregory B. wrote:

> > Cool James!  I just ran the units on Ruport, for which you are an
> > (optional) dependency.  Your new release didn't break anything :)
>
> That's either a miracle or a sign of poor test coverage, because I
> broke quite a bit.  :)

It's a result of simple needs. I only call two FasterCSV functions.

> Following my annoying new habit of release-then-make-it-work,
> FasterCSV 0.1.8 is now out, and resolves all of the issues found so
> far...

Still passing tests on 0.1.8, FYI

Is this release-then-make-it-work an extension of your svn/cvs double
commit habit?
;)
Brian Moelk (Guest)
on 2006-02-27 20:36
(Received via mailing list)
I've found some discussions about support for Windows CE devices
(http://tinyurl.com/hq23g), I was wondering if there was any new
information regarding Windows Mobile 5.0 support?

Regards,
Brian Moelk
James G. (Guest)
on 2006-02-27 21:25
(Received via mailing list)
On Feb 27, 2006, at 12:21 PM, Gregory B. wrote:

> Is this release-then-make-it-work an extension of your svn/cvs double
> commit habit?
> ;)

Hey, that move is my unique signature.  We all need at least one.  ;)

Sadly, FasterCSV required one more release to get it working
everywhere:  0.1.9 is out now.  :(

A big thanks to Michael Schoen for help in resolving all of these new
issues!

James Edward G. II
Wilson B. (Guest)
on 2006-02-27 22:09
(Received via mailing list)
On 2/27/06, James Edward G. II <removed_email_address@domain.invalid> wrote:
> >> Want to access you CSV files by the names of the header rows or get
> broke quite a bit.  :)
>
> Following my annoying new habit of release-then-make-it-work,
> FasterCSV 0.1.8 is now out, and resolves all of the issues found so
> far...
>

Ruport has something like 3400 unit tests, if I recall, so I'm going
to cast my vote for 'miracle'. :)
Gregory B. (Guest)
on 2006-02-28 02:15
(Received via mailing list)
On 2/27/06, Wilson B. <removed_email_address@domain.invalid> wrote:

> Ruport has something like 3400 unit tests, if I recall, so I'm going
> to cast my vote for 'miracle'. :)
>
>

34000+ assertions, almost all of them for Ruport::Parser.  (Which are
the units from Parse::Input that James wrote! ;) )

Lest people say 'wow', there are only 60ish tests.

Test coverage went from near 1:1 to about 60% in the last release. :-/
However, the tests covering CSV/FasterCSV are so simple I'd be amazed
if they were faulty.

I'm releasing again either tonight or tomorrow with a large chunk of
code cleanup and a whole lot more tests...

I am making a promise that if anyone finds a problem in Ruport, from
now on I'll at least write a failing test cornering it, so um... let's
try to avoid miracles AND poor test coverage? ;)

Still passing here, despite the triple shot :)
Sascha E. (Guest)
on 2006-02-28 02:46
(Received via mailing list)
Hi James,

first, thanks for FasterCSV. It is very useful. I have been wildly using
it the last couple of days.

> Sadly, FasterCSV required one more release to get it working
> everywhere:  0.1.9 is out now.  :(

Could you please make a couple of small examples of how each of those
new features is supposed to be used?

Another idea. I am no C expert. But maybe it is worth to do an optional
parser as a C module. Maybe with the help of RubyInline? Especially the
parse method would be a great target.

Saša Ebach
Sascha E. (Guest)
on 2006-02-28 03:47
(Received via mailing list)
I just had a look into the FasterCSV class and I must admit that I
didn't expect such a large and highly engineered class. That is a nice
piece of work. Nothing I can fully understand in 2 minutes... ;)

I ran a profile on the test_data.csv file.

$ cat prof.rb
require 'faster_csv'
require 'profile'
FasterCSV.read(ARGV.first)

$ ruby prof.rb /lib/ruby/gems/1.8/gems/fastercsv-0.1.9/test/test_data.csv
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  56.47   131.37    131.37    16161     8.13    11.69  String#gsub!
  10.04   154.73     23.36    16162     1.45    14.11  Kernel.loop
   5.13   166.67     11.94   210287     0.06     0.06  String#count
   5.12   178.59     11.92   242415     0.05     0.05  Array#<<
   4.86   189.90     11.31   210287     0.05     0.05  Fixnum#zero?
   4.55   200.50     10.59   242609     0.04     0.04  String#empty?
   4.11   210.06      9.57   210289     0.05     0.05  NilClass#nil?
   1.23   212.92      2.85    16162     0.18    14.29  FasterCSV#shift
   1.20   215.70      2.79    32322     0.09     0.09  String#sub!
   1.03   218.09      2.39    48485     0.05     0.05  Hash#[]
   0.96   220.31      2.22    32128     0.07     0.07  String#gsub
   0.91   222.43      2.11    16166     0.13     0.19  Class#new
   0.76   224.19      1.76        1  1762.00 232640.00  FasterCSV#each
   0.66   225.74      1.55    16161     0.10     0.14  Kernel.dup
   0.56   227.03      1.30    32128     0.04     0.04  Kernel.nil?
   0.44   228.06      1.03    16163     0.06     0.06  Array#initialize
   0.41   229.03      0.97    16161     0.06     0.06  Array#empty?
   0.41   229.99      0.96    16162     0.06     0.06  IO#gets
   0.41   230.95      0.96    16161     0.06     0.06  FasterCSV#header_row?
   0.41   231.90      0.95    16162     0.06     0.06  String#+
   0.32   232.64      0.74    16161     0.05     0.05  String#initialize_copy
   0.00   232.64      0.00        2     0.00     0.00  String#sub
   0.00   232.64      0.00        2     0.00     0.00  FasterCSV#init_converters
   0.00   232.64      0.00        1     0.00     0.00  IO#read
   0.00   232.64      0.00        1     0.00     0.00  Array#include?
   0.00   232.64      0.00        1     0.00     0.00  Hash#initialize
   0.00   232.64      0.00        1     0.00     0.00  FasterCSV#close
   0.00   232.64      0.00        1     0.00     0.00  Array#last
   0.00   232.64      0.00        1     0.00     0.00  Hash#merge
   0.00   232.64      0.00        1     0.00     0.00  Exception#backtrace
   0.00   232.64      0.00        6     0.00     0.00  Hash#delete
   0.00   232.64      0.00        1     0.00     0.00  Kernel.block_given?
   0.00   232.64      0.00        1     0.00     0.00  Exception#initialize
   0.00   232.64      0.00        1     0.00     0.00  FasterCSV#init_parsers
   0.00   232.64      0.00        6     0.00     0.00  Kernel.==
   0.00   232.64      0.00        1     0.00     0.00  IO#open
   0.00   232.64      0.00        1     0.00     0.00  File#initialize
   0.00   232.64      0.00        1     0.00     0.00  Kernel.puts
   0.00   232.64      0.00        1     0.00     0.00  Array#length
   0.00   232.64      0.00        1     0.00     0.00  Exception#set_backtrace
   0.00   232.64      0.00        1     0.00     0.00  IO#pos
   0.00   232.64      0.00        2     0.00     0.00  Kernel.method
   0.00   232.64      0.00        1     0.00     0.00  String#==
   0.00   232.64      0.00        1     0.00     0.00  String#[]
   0.00   232.64      0.00        2     0.00     0.00  IO#write
   0.00   232.64      0.00        1     0.00     0.00  Kernel.__send__
   0.00   232.64      0.00        1     0.00     0.00  IO#close
   0.00   232.64      0.00        2     0.00 232640.00  FasterCSV#read
   0.00   232.64      0.00        1     0.00     0.00  IO#seek
   0.00   232.64      0.00        1     0.00     0.00  Hash#initialize_copy
   0.00   232.64      0.00        1     0.00 232640.00  Enumerable.to_a
   0.00   232.64      0.00        4     0.00     0.00  Regexp#escape
   0.00   232.64      0.00        1     0.00     0.00  FasterCSV#initialize
   0.00   232.64      0.00        2     0.00     0.00  Kernel.instance_variable_set
   0.00   232.64      0.00        1     0.00     0.00  Array#first
   0.00   232.64      0.00        4     0.00     0.00  Symbol#to_s
   0.00   232.64      0.00        1     0.00     0.00  IO#eof?
   0.00   232.64      0.00        1     0.00 232640.00  FasterCSV#open
   0.00   232.64      0.00        1     0.00     0.00  FasterCSV#init_separators
   0.00   232.64      0.00        1     0.00     0.00  Fixnum#to_s
   0.00   232.64      0.00        1     0.00     0.00  Hash#empty?
   0.00   232.64      0.00        1     0.00     0.00  FasterCSV#init_headers
   0.00   232.64      0.00        1     0.00     0.00  Array#pop
   0.00   232.64      0.00        2     0.00     0.00  Kernel.is_a?
   0.00   232.64      0.00        1     0.00 232640.00  #toplevel

I am still trying to find where exactly the bottleneck is. Anyway, it is
getting late. I'll have another look tomorrow. Mainly because I want to
see if I can actually use RubyInline for something useful.

Saša Ebach
James G. (Guest)
on 2006-02-28 04:37
(Received via mailing list)
On Feb 27, 2006, at 7:45 PM, Sascha E. wrote:

> I just had a look into the FasterCSV class and I must admit that I
> didn't expect such a large and highly engineered class.

Most of it is just interface or the new headers feature
(FasterCSV::Row).  FasterCSV.shift() is the entire parser.  It should
be commented well enough to make sense of, I hope.

You could also read the thread on Ruby Core that spawned it, for
further insight.  Here's my first post in it:

http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-core/6471

Feel free to ask questions if you have them...

James Edward G. II
James G. (Guest)
on 2006-02-28 04:57
(Received via mailing list)
On Feb 27, 2006, at 6:43 PM, Sascha E. wrote:

> Hi James,

Hello.

> first, thanks for FasterCSV. It is very useful. I have been wildly
> using it the last couple of days.

Always nice to hear.  Thank you!

>> Sadly, FasterCSV required one more release to get it working
>> everywhere:  0.1.9 is out now.  :(
>
> Could you please make a couple of small examples of how each of
> those new
> features is supposed to be used?

I am working on adding examples to the project tarball, but here is
one I sent to Michael Schoen earlier today:

Neo:~/Desktop$ ls
csv_filter.rb   purchase.csv
Neo:~/Desktop$ cat purchase.csv
Quantity,Product Description,Price
1,Text Editor,25.00
2,MacBook Pros,2499.00
Neo:~/Desktop$ ruby csv_filter.rb purchase.csv > invoice.csv
Neo:~/Desktop$ cat invoice.csv
Quantity,Product Description,Price,Running Total
1,Text Editor,25.0,25.0
2,MacBook Pros,2499.0,5023.0
Neo:~/Desktop$ cat csv_filter.rb
#!/usr/local/bin/ruby -w

require "rubygems"
require "faster_csv"

running_total = 0
FasterCSV.filter( :headers           => true,
                  :return_headers    => true,
                  :header_converters => :symbol,
                  :converters        => :numeric ) do |row|
  if row.header_row?
    row << "Running Total"
  else
    row << (running_total += row[:quantity] * row[:price])
  end
end

__END__

The above is using quite a few of the new features.  Data converters
are used to switch the numbers to Integers and Floats and header
converters are used to convert the headers to Symbols for easy
access.  These are just built-ins, but you can supply lambdas for
custom conversions.
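
As a rough illustration of the custom lambdas mentioned above, here is a minimal sketch using Ruby's standard csv library (the FasterCSV code base as adopted in Ruby 1.9; with the gem, `FasterCSV` is assumed to take the place of `CSV`). The `dollars` converter and the sample data are hypothetical.

```ruby
require "csv"

# A custom converter receives each String field and returns either the
# converted value or the field unchanged when it does not apply.
dollars = lambda do |field|
  match = /\A\$(\d+(?:\.\d+)?)\z/.match(field)
  match ? match[1].to_f : field
end

data = "Item,Qty,Price\nText Editor,2,$25.00\n"

# Built-in and custom converters can be combined in one list.
rows = CSV.parse(data, :converters => [:numeric, dollars])
```

Converters are tried in order, so the built-in :numeric converter turns "2" into an Integer and the lambda then picks up the dollar amount that :numeric left alone.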

Obviously, this also makes use of the new headers functionality.
FasterCSV is told to convert the first row to headers and allow us to
index columns by them.  You can see that at work when I calculate the
price.  The advantage is that we didn't have to use any indices and
if the column order changes, everything will still work fine.  I also
ask for the headers to be returned to me as a row, so I can add the
new one and print them out.  You can let FasterCSV skip them instead,
if you are just mining data.
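
A minimal sketch of both points, again assuming Ruby's standard csv library in place of the gem (`CSV` instead of `FasterCSV`). The `reordered` data and the `invoice_total` helper are invented for illustration.

```ruby
require "csv"

original  = "Quantity,Product Description,Price\n" \
            "1,Text Editor,25.00\n2,MacBook Pros,2499.00\n"
reordered = "Price,Quantity,Product Description\n" \
            "25.00,1,Text Editor\n2499.00,2,MacBook Pros\n"

# Sum quantity * price using header names only, never column positions.
# The header row is skipped here because :return_headers is not set.
invoice_total = lambda do |data|
  total = 0
  CSV.parse(data, :headers           => true,
                  :header_converters => :symbol,
                  :converters        => :numeric) do |row|
    total += row[:quantity] * row[:price]
  end
  total
end

totals = [invoice_total.call(original), invoice_total.call(reordered)]

# With :return_headers => true the header row is yielded first and
# identifies itself through header_row?.
flags = []
CSV.parse(original, :headers => true, :return_headers => true) do |row|
  flags << row.header_row?
end
```

Both totals come out the same even though the column order differs, and only the first yielded row reports itself as a header row.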

You can see the new FasterCSV::filter() method at work here too.
This is just Unix filters for CSV streams.  You can alter the row
after it is read and it will be sent back out after the block returns.

The other big feature at work here is automatic row separator
detection, but I hope you never notice it.  In this case, it used $/
for input and output because that makes the most sense for STDIN and
STDOUT.  If actual files had been involved, it would have tried to
auto-detect the separator.  This should make the code more portable.
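
To make the detection visible, here is a small sketch that parses the same two rows with three different line endings. It assumes Ruby's standard csv library (`CSV` in place of `FasterCSV`) and made-up in-memory data rather than real files.

```ruby
require "csv"

unix    = "a,b\n1,2\n"
windows = "a,b\r\n1,2\r\n"
old_mac = "a,b\r1,2\r"

# No :row_sep is given, so each input is checked for its own line ending.
parsed = [unix, windows, old_mac].map { |data| CSV.parse(data) }

# An explicit :row_sep switches the detection off.
piped = CSV.parse("a,b|1,2|", :row_sep => "|")
```

All four calls produce the same two rows of fields.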

Hope this helps.

> Another idea. I am no C expert. But maybe it is worth to do an
> optional parser as a C module. Maybe with the help of RubyInline?
> Especially the parse method would be a great target.

One of my design goals is keeping FasterCSV pure Ruby.  This makes it
trivial to bundle with your app, if needed, and makes it more fun for
me to maintain.

If you want to build a C version though, best of luck to you.

James Edward G. II
Sascha E. (Guest)
on 2006-02-28 14:13
(Received via mailing list)
Hi James,

> One of my design goals is keeping FasterCSV pure Ruby.  This makes it
> trivial to bundle with your app, if needed, and makes it more fun for me
> to maintain.
>
> If you want to build a C version though, best of luck to you.

No, I certainly don't want to build a complete C version. I was thinking
more like *if* RubyInline is installed, the main loop could be
translated to C. That's all. So the C version would be completely
optional. You know, if it is there use it, else go with the default Ruby
only. The question is how much faster can it actually be? If I look at
PA's benchmark it looks like what you did is make it 10 times faster.
But the Python version is still 10 times faster. Maybe it's possible to
make FasterCSV like 2-3 times faster with the use of RubyInline.

Saša Ebach
James G. (Guest)
on 2006-02-28 15:51
(Received via mailing list)
On Feb 28, 2006, at 6:12 AM, Sascha E. wrote:

> completely optional. You know, if it is there use it else go with
> the default Ruby only. The question is how much faster can it
> actually be? If I look at PA's benchmark it looks like what you did
> is make it 10 times faster. But the Python version is still 10
> times faster. Maybe it's possible to make FasterCSV like 2-3 times faster
> with the use of RubyInline.

Patches welcome.  ;)

James Edward G. II
Sascha E. (Guest)
on 2006-02-28 16:06
(Received via mailing list)
> Patches welcome.  ;)

If I can do it I will. But don't hold your breath ;)

Saša Ebach
zdennis (Guest)
on 2006-03-01 04:24
(Received via mailing list)
Thanks for this James, this is really great! We replaced a partial csv
codebase with FasterCSV today, for maintainability and to reuse code
that you have put into play, rather than maintaining our own
internally. Thanks again,

Zach
James G. (Guest)
on 2006-03-01 06:28
(Received via mailing list)
On Feb 28, 2006, at 8:21 PM, zdennis wrote:

> Thanks for this James, this is really great! We replaced a partial
> csv codebase with FasterCSV today, for maintainability and to reuse
> code that you have put into play, rather than maintaining our own
> internally. Thanks again,

Awesome.  You let me know if you find any problems.

James Edward G. II
Sky Y. (Guest)
on 2006-03-10 20:36
(Received via mailing list)
Hi James,

I'm using ruby1.8.4 on win32. When I require 'faster_csv', it just
fails. I've found the problem comes from 'forwardable' module. I have
no idea why the 'forwardable' fails to be required as well. A bug of
ruby?
unknown (Guest)
on 2006-03-10 20:39
(Received via mailing list)
On Sat, 11 Mar 2006, removed_email_address@domain.invalid wrote:

> Hi James,
>
> I'm using ruby1.8.4 on win32. When I require 'faster_csv', it just
> fails. I've found the problem comes from 'forwardable' module. I have
> no idea why the 'forwardable' fails to be required as well. A bug of
> ruby?


are you sure you are running ruby-1.8.4 and not another ruby on the
system?

-a
James G. (Guest)
on 2006-03-10 21:06
(Received via mailing list)
On Mar 10, 2006, at 12:33 PM, removed_email_address@domain.invalid wrote:

> Hi James,
>
> I'm using ruby1.8.4 on win32. When I require 'faster_csv', it just
> fails. I've found the problem comes from 'forwardable' module. I have
> no idea why the 'forwardable' fails to be required as well. A bug of
> ruby?

Forwardable was standard long before 1.8.4 and I know some people are
using FasterCSV under Windows.  I think something is goofy with your
setup.  :(

James Edward G. II
Gregory B. (Guest)
on 2006-03-10 21:43
(Received via mailing list)
On 3/10/06, removed_email_address@domain.invalid <removed_email_address@domain.invalid> wrote:
> Hi James,
>
> I'm using ruby1.8.4 on win32. When I require 'faster_csv', it just
> fails. I've found the problem comes from 'forwardable' module. I have
> no idea why the 'forwardable' fails to be required as well. A bug of
> ruby?

No problems here:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Gregory B.>ruby -v
ruby 1.8.4 (2005-12-24) [i386-mswin32]

C:\Documents and Settings\Gregory B.>gem install fastercsv
Attempting local installation of 'fastercsv'
Local gem file not found: fastercsv*.gem
Attempting remote installation of 'fastercsv'
Updating Gem source index for: http://gems.rubyforge.org
Successfully installed fastercsv-0.1.9
Installing RDoc documentation for fastercsv-0.1.9...

C:\Documents and Settings\Gregory B.>ruby -e "require 'faster_csv'"

C:\Documents and Settings\Gregory B.>
Sky Y. (Guest)
on 2006-03-10 23:11
(Received via mailing list)
Sorry for the confusion, the tests run fine from command line. The
require failure only occurs in IRB. Anyway, I'm gonna test the water
now...

Thank you for the faster library.
James G. (Guest)
on 2006-03-10 23:14
(Received via mailing list)
On Mar 10, 2006, at 3:08 PM, removed_email_address@domain.invalid wrote:

> The require failure only occurs in IRB.

Does it "fail" or just return false?  Try using it after the
require.  (This is a known RubyGems issue.)

James Edward G. II
Sky Y. (Guest)
on 2006-03-11 00:09
(Received via mailing list)
It returns 'false', and "require 'forwardable'" returns the same
'false' in IRB as well. The other dependent modules used by FasterCSV
are fine. Thank you for telling me about that issue, which did confuse
me for a while.
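
For reference, a `false` from `require` only means the library was already loaded; it is not an error. A minimal sketch in plain Ruby (whether the first call returns `true` or `false` depends on what the session has loaded beforehand, for example through RubyGems):

```ruby
first  = require "forwardable"   # true on a fresh load, false if already loaded
second = require "forwardable"   # false: the library is already loaded

# Either way the library is available and can be used.
usable = defined?(Forwardable) == "constant"
```

A genuine failure raises LoadError instead of returning a value.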