FasterCSV RCR?

bbazzarrakk · May 27, 2006, 1:11am

I’m considering submitting my first RCR to add FasterCSV to the
Standard Library.

It’s a pretty mature library now, has a CSV compatibility mode, is
very feature rich (including many features CSV lacks), and is wicked fast in
comparison. I see it recommended regularly and get lots of positive
feedback.

What do others think? Worth adding?

James Edward G. II

bbazzarrakk · May 27, 2006, 1:18am

On 5/26/06, James Edward G. II wrote:

I’m considering submitting my first RCR to add FasterCSV to the
Standard Library.

Sweet.

It’s a pretty mature library now, has a CSV compatibility mode, is
very feature rich (including many features CSV lacks), and is wicked fast in
comparison. I see it recommended regularly and get lots of positive
feedback.

What do others think? Worth adding?

Yes, absolutely. I brought it in house here, and we used it pretty widely until someone made an issue out of the fact that it’s not in the standard library.

Since we run through fairly large CSVs multiple times a day, I enjoy the
speed FasterCSV gives us and I really don’t want to have to go back.

bbazzarrakk · May 27, 2006, 1:46am

James Edward G. II wrote:

I’m considering submitting my first RCR to add FasterCSV to the Standard
Library.

It should replace the current CSV library.

Regards,

Dan

bbazzarrakk · May 27, 2006, 7:34am

pat eyler wrote:

On 5/26/06, James Edward G. II wrote:

It’s a pretty mature library now, has a CSV compatibility mode, is
very feature rich (including many features CSV lacks), and is wicked fast in
comparison. I see it recommended regularly and get lots of positive
feedback.

What do others think? Worth adding?

I’d suggest changing the name to CSV. And possibly defaulting to
compat-mode or perhaps issuing a warning if it’s detected that
the user is trying to use the Old Library.

Hal

bbazzarrakk · May 27, 2006, 5:28pm

On May 27, 2006, at 12:31 AM, Hal F. wrote:

I’d suggest changing the name to CSV. And possibly defaulting to
compat-mode or perhaps issuing a warning if it’s detected that
the user is trying to use the Old Library.

FasterCSV loses much of its speed in compatibility mode. I think
we want to encourage people to use the new interface, especially
since I think it’s superior.

James Edward G. II

bbazzarrakk · May 27, 2006, 1:59am

On May 26, 2006, at 6:46 PM, Daniel B. wrote:

James Edward G. II wrote:

I’m considering submitting my first RCR to add FasterCSV to the
Standard Library.

It should replace the current CSV library.

I assume we have to keep CSV for backwards compatibility. We still have ftools, even though fileutils is preferred, and runit too.

James Edward G. II

bbazzarrakk · May 27, 2006, 6:23pm

Hi,

In message “Re: FasterCSV RCR?”
on Sat, 27 May 2006 14:31:50 +0900, Hal F. writes:

|I’d suggest changing the name to CSV. And possibly defaulting to
|compat-mode or perhaps issuing a warning if it’s detected that
|the user is trying to use the Old Library.

I agree. I don’t want to have two independent CSV readers in the distribution. It’s OK if the compatibility mode is slow or gives an obsoletion warning. But we have to discuss when it should happen: during 1.8.x or for 1.9.
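To make the "obsoletion warning" idea concrete, here is a minimal sketch. It assumes a modern Ruby whose bundled `csv` library (FasterCSV's successor) plays the role of the new parser; `LegacyCSV` is a hypothetical name for an old entry point kept only for compatibility and is not part of any real library.

```ruby
require "csv"

# Hypothetical compatibility facade (not a real library API): keeps an
# old entry point working, warns that it is obsolete, and delegates to
# the new parser.
module LegacyCSV
  def self.parse(str)
    # Kernel#warn writes to $stderr; uplevel: 1 (Ruby 2.5+) points the
    # warning at the caller's file and line.
    warn "LegacyCSV.parse is obsolete; use CSV.parse instead", uplevel: 1
    CSV.parse(str)
  end
end

rows = LegacyCSV.parse("a,b\n1,2\n")
p rows  # => [["a", "b"], ["1", "2"]]
```

Old code keeps running unchanged, but every call nudges its author toward the new interface.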

						matz.

bbazzarrakk · May 28, 2006, 12:39am

On 5/26/06, James Edward G. II wrote:

I’m considering submitting my first RCR to add FasterCSV to the
Standard Library.

I bugged you about doing this off list, which may be why you posted
this, but just so people know, I use FasterCSV a lot in my work (and
in Ruport) and it has been very pleasant to work with!

bbazzarrakk · May 28, 2006, 2:35am

On May 27, 2006, at 11:20 AM, Yukihiro M. wrote:

I agree. I don’t want to have two independent CSV readers in the distribution. It’s OK if the compatibility mode is slow or gives an obsoletion warning.

Alright, let me take another crack at the compatibility mode then. I
can probably speed it up since I know it’s about to gain importance.

But we have to discuss when it should happen: during 1.8.x or for 1.9.

I trust your judgement on what is best.

I guess I should warn you that the compatibility mode is not a 100%
CSV replacement. It works for the majority of applications using
just the CSV.* methods, but I don’t even try to support all the
reader and writer objects. I’ve never seen code that uses those, but
it could exist.
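To illustrate the distinction, here is a sketch of the two styles using the `csv` library bundled with modern Ruby (FasterCSV's successor), which offers both. The 2006 csv.rb reader and writer classes had different names and APIs, so this only shows the general shape: one-shot class methods versus reader and writer objects wrapped around an IO.

```ruby
require "csv"
require "stringio"

# Class-method style: one-shot helpers called on CSV itself.
fields = CSV.parse_line("a,b,c")
text   = CSV.generate_line(["x", "y"])

# Object style: a reader and a writer wrapped around IO objects.
reader = CSV.new(StringIO.new("a,b,c\n"))
first  = reader.shift

out    = StringIO.new
writer = CSV.new(out)
writer << ["x", "y"]

# Both styles parse and generate the same data.
p fields == first     # => true
p text == out.string  # => true
```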

James Edward G. II

bbazzarrakk · May 28, 2006, 3:27am


Hi all,

Long time no post.

James Edward G. II wrote:

I agree. I don’t want to have two independent CSV readers in the distribution. It’s OK if the compatibility mode is slow or gives an obsoletion warning.

Alright, let me take another crack at the compatibility mode then. I
can probably speed it up since I know it’s about to gain importance.

Please don’t spend any more time on it. (Sorry for writing this; I know you spend a lot of time helping people use CSV in Ruby.) The gaps come from the difference between our CSV standpoints, so it can’t be 100% compatible anyway. Just replace csv.rb with faster_csv.rb.

Replacement (in my opinion):
On 1.9: replace csv.rb with faster_csv.rb.

On 1.8: Never mind. Replace csv.rb with faster_csv.rb, with no compatibility mode.

As a bundled library (in my opinion):

One thing I don’t like about faster_csv.rb is String#parse_csv and Array#to_csv. Please do not pollute the standard classes.

Kernel.CSV should be discussed thoroughly before it is introduced. Is it needed? (We already have Kernel.URI, though…)
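For readers who haven't seen them, the helpers under discussion look roughly like this. This is a sketch assuming a modern Ruby, where the bundled `csv` library (the descendant of FasterCSV) still defines `String#parse_csv`, `Array#to_csv`, and a `Kernel#CSV` method; FasterCSV's 2006 versions may differ in details.

```ruby
require "csv"
require "stringio"

# String#parse_csv: parse a single line into an array of fields.
fields = "1,2,\"3,4\",5".parse_csv
p fields  # => ["1", "2", "3,4", "5"]

# Array#to_csv: generate one CSV line, quoting fields as needed.
line = ["name", "note, with comma"].to_csv
p line    # => "name,\"note, with comma\"\n"

# Kernel#CSV: wrap an IO in a CSV object and yield it to the block.
out = StringIO.new
CSV(out) { |csv| csv << ["a", "b"] }
p out.string  # => "a,b\n"
```

These are the methods added to core classes (and to Kernel) that the objection is about.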

Regards,
// NaHi

bbazzarrakk · May 28, 2006, 3:54am

On May 27, 2006, at 8:25 PM, NAKAMURA, Hiroshi wrote:

obsoletion warning.

Alright, let me take another crack at the compatibility mode then. I
can probably speed it up since I know it’s about to gain importance.

Please don’t spend any more time on it. (Sorry for writing this; I know you spend a lot of time helping people use CSV in Ruby.) The gaps come from the difference between our CSV standpoints, so it can’t be 100% compatible anyway.

Do we have different standpoints? I hope not too different. We’re
just using different parsing techniques, right?

Other than to_csv() and parse_csv(), are there things you don’t like
about FasterCSV? I’m open to suggestions.

Just replace csv.rb with faster_csv.rb.

I just don’t want to break a lot of software.

As a bundled library (in my opinion):

One thing I don’t like about faster_csv.rb is String#parse_csv and Array#to_csv. Please do not pollute the standard classes.

Kernel.CSV should be discussed thoroughly before it is introduced. Is it needed? (We already have Kernel.URI, though…)

Maybe I’m alone in this thinking, but I’m not bothered by conversion
methods like this. It’s also fairly common (to_set(), to_yaml(), etc.).

James Edward G. II

bbazzarrakk · May 28, 2006, 4:09am


Hi James,

James Edward G. II wrote:

Please don’t spend any more time on it. (Sorry for writing this; I know you spend a lot of time helping people use CSV in Ruby.) The gaps come from the difference between our CSV standpoints, so it can’t be 100% compatible anyway.

Do we have different standpoints? I hope not too different. We’re just
using different parsing techniques, right?

As you wrote in your documentation, I think the following come from our different standpoints:

streaming
record terminator handling
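The two differences named above can be sketched with the `csv` library that ships with modern Ruby (FasterCSV's successor). This illustrates the general ideas only; it does not reproduce how csv.rb or FasterCSV behaved in 2006.

```ruby
require "csv"
require "stringio"

# Streaming: CSV#each pulls rows from the IO one at a time, so the
# whole input never needs to be loaded into an array first.
io = StringIO.new("id,name\r\n1,Alice\r\n2,Bob\r\n")
names = []
CSV.new(io, row_sep: "\r\n").each do |row|
  names << row[1]
end
p names  # => ["name", "Alice", "Bob"]

# Record terminator handling: with the default row_sep (:auto), the
# parser guesses "\r\n", "\n", or "\r" by looking at the data itself.
rows = CSV.parse("a,b\r\nc,d\r\n")
p rows   # => [["a", "b"], ["c", "d"]]
```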

I don’t think faster_csv is wrong. I just wrote csv.rb from a slightly different viewpoint six years ago.

Other than to_csv() and parse_csv(), are there things you don’t like
about FasterCSV? I’m open to suggestions.

No, that’s all for now. (Sorry, I haven’t looked into the new CSV features yet.)

Just replace csv.rb with faster_csv.rb.

I just don’t want to break a lot of software.

I understand that it’s a trade-off for speed.

As a bundled library (in my opinion):

One thing I don’t like about faster_csv.rb is String#parse_csv and Array#to_csv. Please do not pollute the standard classes.

Kernel.CSV should be discussed thoroughly before it is introduced. Is it needed? (We already have Kernel.URI, though…)

Maybe I’m alone in this thinking, but I’m not bothered by conversion
methods like this. It’s also fairly common (to_set(), to_yaml(), etc.).

I don’t like those either. We should wait for selector namespaces. (IMO)

Regards,
// NaHi

bbazzarrakk · May 28, 2006, 5:01am

On May 27, 2006, at 9:09 PM, NAKAMURA, Hiroshi wrote:

Just replace csv.rb with faster_csv.rb.

I just don’t want to break a lot of software.

I understand that it’s a trade-off for speed.

Only if you go through that interface. The FasterCSV interface is
still quite quick.

Let me rethink it a little. It was optimized for developer
productivity when I built it. I might be able to do better looking
at it from the idea of easy transitioning for the users.

Of course, I handle open() quite differently, so we’re going to have
problems merging both models of that method in the CSV class. Hmm…

James Edward G. II

bbazzarrakk · May 29, 2006, 3:33am

Sorry to stray off topic here, but are there any tutorials for FasterCSV?

I could not find one from Mr Google and I would like to use it for a small project I’m working on.

Thanx
Dan

bbazzarrakk · May 29, 2006, 4:16am

On May 28, 2006, at 9:30 PM, Daniel N wrote:

Sorry to stray off topic here, but are there any tutorials for FasterCSV?

I could not find one from Mr Google and I would like to use it for a small project I’m working on.

Thanx
Dan

It’s not exactly a tutorial, but the examples in the docs [1] should
be enough to get you started.

[1] http://fastercsv.rubyforge.org/classes/FasterCSV.html

bbazzarrakk · May 29, 2006, 6:09am

On May 28, 2006, at 10:39 PM, Daniel N wrote:

Thanx Logan,

Sorry, I should have been a bit clearer. I read those, but what I had trouble with was when I receive the CSV file from a web form. It gives me a StringIO object and I don’t know what to do with it.

Any help is greatly appreciated.

FasterCSV handles StringIO objects just fine:

```ruby
require "stringio"
# => true

require "fastercsv"
# => true

data = StringIO.new(%Q{1,2,"3,4",5})
# => #<StringIO:0x6ce300>

FasterCSV.parse(data)
# => [["1", "2", "3,4", "5"]]
```
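If the uploaded file has a header row, the same idea extends to reading columns by name. A minimal sketch, assuming modern Ruby's bundled `csv` library (where FasterCSV ended up) and a made-up `item,qty` layout standing in for the web form upload:

```ruby
require "csv"
require "stringio"

# Stand-in for the StringIO object a web framework hands you for an
# upload (the item,qty layout is hypothetical).
upload = StringIO.new("item,qty\nwidget,3\ngadget,5\n")

# headers: true treats the first row as column names, so each yielded
# CSV::Row can be indexed by header instead of by position.
total = 0
CSV.new(upload, headers: true).each do |row|
  total += row["qty"].to_i
end
p total  # => 8
```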

Hope that helps.

James Edward G. II

bbazzarrakk · May 29, 2006, 5:42am

Thanx Logan,

Sorry, I should have been a bit clearer. I read those, but what I had trouble with was when I receive the CSV file from a web form. It gives me a StringIO object and I don’t know what to do with it.

Any help is greatly appreciated.

bbazzarrakk · May 29, 2006, 6:13am

Cheers. Thanx so much for that.

I’ll get out of your thread now.

bbazzarrakk · May 31, 2006, 2:16am

On May 27, 2006, at 11:20 AM, Yukihiro M. wrote:

I agree. I don’t want to have two independent CSV readers in the distribution. It’s OK if the compatibility mode is slow or gives an obsoletion warning. But we have to discuss when it should happen: during 1.8.x or for 1.9.

Alright, I’ve thought a lot about this and there is really one big issue here: CSV and FasterCSV are not 100% compatible. If it were just the method arguments, we could get pretty close to perfect, but CSV does some odd things, like confusing open() with foreach(), that I chose to avoid in FasterCSV. Because of that, I can’t always be sure what to do when user code calls a given method.
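For contrast, here is how the two methods divide the work in the `csv` library bundled with modern Ruby, which grew out of FasterCSV: `open()` wraps a file in a CSV object, while `foreach()` simply iterates over its rows. This is a sketch using a throwaway temp file, not a description of the 2006 csv.rb behavior.

```ruby
require "csv"
require "tempfile"

rows = []
Tempfile.create(["example", ".csv"]) do |file|
  file.close  # CSV.open reopens the file by its path

  # open(): get a CSV object wrapped around a file (here, for writing).
  CSV.open(file.path, "w") do |csv|
    csv << ["id", "name"]
    csv << ["1", "Alice"]
  end

  # foreach(): iterate over the rows of a file, one at a time.
  CSV.foreach(file.path) { |row| rows << row }
end
p rows  # => [["id", "name"], ["1", "Alice"]]
```

Keeping the two apart means a call to open() always means "give me a CSV object", which is what lets code know what to expect back.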

That leaves two options, in my opinion:

1. FasterCSV’s CSV compatibility mode handles most of the issues very well, and I’m pretty sure I can remove most of the speed penalty. If we go with that, we have a pretty workable solution right now with one big gotcha: you can require a file named csv.rb and use CSV just fine, but the good stuff will actually be hiding under FasterCSV (in the same file). I have to keep them separate because of the compatibility issues mentioned above. This, to me, is the only sane way to go if we want to target the 1.8.x branch. It would still break some software that uses the unusual features of CSV, but I suspect this is quite rare.

2. We could drop compatibility and rename FasterCSV to CSV. This way people get all the good stuff where they expect it. However, this would break a lot of CSV software (most of it, in fact), so it only seems reasonable when targeting 1.9.x and up.

My thought is that the second option seems preferable. If we train
people to use FasterCSV, then we just have to switch them again down
the road if we want to revert to CSV. We don’t really gain many big
advantages from the switch either (speed, if I can eliminate the
penalty, but not header parsing or the other good FasterCSV
features). That doesn’t sound like it’s worth breaking software over.

In summary, I recommend targeting 1.9.x with no compatibility mode
and renaming FasterCSV to CSV. Am I making sense here?

James Edward G. II

bbazzarrakk · May 31, 2006, 2:35am

On Wed, 31 May 2006, James Edward G. II wrote:

get all the good stuff where they expect it. However, this would break a lot
In summary, I recommending targeting 1.9.x with no compatibility mode and
renaming FasterCSV to CSV. Am I making sense here?

James Edward G. II

i know matz is against it, but i really think we should have both. we have

ftools and fileutils

date and date2

getoptlong, getopts, parsearg, and optparse

monitor, mutex, and sync

runit and test/unit

and so on.

i have quite a bit of code that does things like

CSV::Row

CSV::Cell

etc. i’d be very upset if i had to re-write all of it. it does little to help people love ruby when scripts that worked stop working after an upgrade. that said, i’m 100% for having faster csv in the dist. i just don’t see what’s wrong with a few extra pure ruby files in there; they are very tiny.

cheers.

-a