FasterCSV RCR?

Hi,

In message “Re: FasterCSV RCR?”
on Wed, 31 May 2006 09:34:32 +0900, [email protected] writes:

|i know matz is against it, but i really think we should have both. we have
|
| ftools and fileutils
|
| date and date2
|
| getoptlong, getopts, parsearg, and optparse
|
| monitor, mutex, and sync
|
| runit and test/unit
|
|and so on.

They are the mistakes that I try to avoid making again.

| ftools and fileutils
| getoptlong, getopts, parsearg, and optparse

They are unfortunate mistakes I (we) made.

| date and date2

date2 = date + extra libraries.

| monitor, mutex, and sync

They are (somewhat) different.

| runit and test/unit

runit is a compatibility library based on test/unit.

						matz.

On May 30, 2006, at 7:13 PM, James Edward G. II wrote:

  1. We could drop compatibility and rename FasterCSV to CSV. This
    way people get all the good stuff where they expect it. However,
    this would break a lot of CSV software (most of it, in fact), so it
    only seems reasonable when targeting 1.9.x and up.

I have created an RCR for this option:

http://www.rcrchive.net/rcr/show/338

Those in favor (or against) may wish to vote.

James Edward G. II

Hi,

I have started to think that csv.rb itself should just be made faster.

James Edward G. II wrote:

method arguments, we could get pretty close to perfect, but CSV does
some odd things like confuse open() with foreach() that I chose to avoid
in FasterCSV. Because of that, I can’t always be sure what to do when

Can you please explain what is “odd”? FasterCSV.build_csv_interface
seems to be a simple delegator.

  1. We could drop compatibility and rename FasterCSV to CSV. This way
    people get all the good stuff where they expect it. However, this would

Can you please explain what is “good”? I’ll introduce those features
into csv.rb. Do those features depend on behavior specific to
faster_csv.rb?

Regards,
// NaHi

On Jun 4, 2006, at 5:30 AM, NAKAMURA, Hiroshi wrote:

James Edward G. II wrote:

method arguments, we could get pretty close to perfect, but CSV does
some odd things like confuse open() with foreach() that I chose to avoid
in FasterCSV. Because of that, I can’t always be sure what to do when

Can you please explain what is “odd”?

My biggest complaint with CSV is that open() behaves “oddly” and thus
defeats all my normal expectations:

>> File.open("example.csv", "w") do |csv|
?>   csv.puts "1,2,3"
>>   csv.puts "a,b,c"
>> end
=> nil
>> require "csv"
=> true
>> # typical Ruby style reading...
?> File.open("example.csv") do |file|
?>   file.each { |row| p row }
>> end
"1,2,3\n"
"a,b,c\n"
=> #<File:example.csv (closed)>
>> # or...
?> File.foreach("example.csv") do |row|
?>   p row
>> end
"1,2,3\n"
"a,b,c\n"
=> nil
>> # CSV's "odd" open() method...
?> CSV.open("example.csv", "r") do |row|  # "r" required
?>   p row  # we get rows, not the file object
>> end
["1", "2", "3"]
["a", "b", "c"]
=> nil

Of course, if you open in a writing mode, you do get a file like
object. It’s inconsistent.

I’m confused about why CSV does this, since it offers the foreach()
method, which normally fills this role.
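
For comparison, here is a minimal sketch of the File-like split argued for above: open() hands the block a file-like object, foreach() hands it rows. It assumes Ruby 1.9 or later, where the standard csv library follows the FasterCSV interface. The helper names and sample data are made up for illustration.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: writes two sample rows to a temp file, returns its path.
def write_sample_csv
  file = Tempfile.create(["example", ".csv"])
  file.puts "1,2,3"
  file.puts "a,b,c"
  file.close
  file.path
end

# open() yields a file-like CSV object, just as File.open yields a File.
def rows_via_open(path)
  rows = []
  CSV.open(path, "r") do |csv|
    csv.each { |row| rows << row }
  end
  rows
end

# foreach() yields each parsed row, just as File.foreach yields each line.
def rows_via_foreach(path)
  rows = []
  CSV.foreach(path) { |row| rows << row }
  rows
end

path = write_sample_csv
p rows_via_open(path)     # [["1", "2", "3"], ["a", "b", "c"]]
p rows_via_foreach(path)  # [["1", "2", "3"], ["a", "b", "c"]]
File.delete(path)
```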

Other CSV oddities (my opinion):

  • I always have to think, “Now do I want the *_line() method or the
    *_row() method here…”
  • Most methods take a field separator and a row separator, but
    foreach() and readlines() only take the row separator.
  • I have to set a field separator when I really just want to set a
    row separator.
  • A method called “generate_line()” doesn’t involve a line ending.

  1. We could drop compatibility and rename FasterCSV to CSV. This way
    people get all the good stuff where they expect it. However, this would

Can you please explain what is “good”? I’ll introduce those features
into csv.rb.

Here’s a selection of some features from my CHANGELOG that I am not
aware of in CSV:

  • Added built-in and custom data converters. Built-in handle numbers
    and dates.
  • Added auto-discovery for :row_sep (now the default).
  • Added FasterCSV::filter() for easy Unix-like CSV filters.
  • Added support for accessing fields by headers.
    • Headers can have their own converters.
    • Headers can be skipped or returned as needed.
    • FasterCSV::Row allows index or header access while retaining
      order and allowing for duplicate headers.
  • :headers can now be set to an Array of headers to use.
  • :headers can now be set to an external CSV String of
    headers to use.
  • Provided support for the serialization of custom Ruby objects using
    CSV.
  • Added FasterCSV::instance and FasterCSV()/FCSV() shortcuts for easy
    output.
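
A few of the features listed above can be sketched as follows. This assumes Ruby 1.9 or later, where the same options are available on the standard CSV class; in FasterCSV itself the class name would differ. The sample data and the `parse_sample` helper are invented for illustration.

```ruby
require "csv"

# Sample data with a duplicate "score" header (illustrative only).
SAMPLE = "name,joined,score,score\nJames,2006-05-30,1,2.5\n"

# Parses with header access, built-in numeric converters for fields,
# and a built-in converter that turns headers into Symbols.
def parse_sample(data = SAMPLE)
  CSV.parse(data, headers: true, converters: :numeric,
                  header_converters: :symbol)
end

row = parse_sample[0]
p row[:name]    # "James"  (access by header)
p row[:score]   # 1        (first of the duplicate headers, now an Integer)
p row[3]        # 2.5      (access by index reaches the second "score")
p row.headers   # [:name, :joined, :score, :score]
```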

James Edward G. II

Hi,

James Edward G. II wrote:

File.open("example.csv", "w") do |csv|
"1,2,3\n"
CSV.open("example.csv", "r") do |row| # "r" required
?> p row # we get rows, not the file object
end
["1", "2", "3"]
["a", "b", "c"]
=> nil

Of course, if you open in a writing mode, you do get a file like
object. It’s inconsistent.

I can understand your frustration on this point. When I first wrote
csv.rb, I thought that if I defined a reader style, all csv users would
end up writing the following.

CSV.open("filename.csv", "r") do |reader|
  reader.each do |row|
    …do something…
  end
end

Why don’t we just write it like this?

CSV.open("filename.csv", "r") do |row|
  …do something…
end

I know you consider IO-ish methods important. But I
don’t think a CSV object should handle IO methods like fcntl, fileno,
seek, tell, tty?, and so on. Would you please give me some typical,
pragmatic examples of reader-style use other than ‘each’?

I’m confused about why CSV does this, since it offers the foreach()
method, which normally fills this role.

foreach and readlines were added recently, following IO. Now I think
that was a bad choice, though…

Other CSV oddities (my opinion):

Thanks!

  • I always have to think, “Now do I want the *_line() method or the
    *_row() method here…”

Users don’t need to use *_line and *_row methods I think. When do you
use generate_line?

  • Most methods take a field separator and a row separator, but
    foreach() and readlines() only take the row separator.

See IO.foreach and IO.readlines. But as I wrote above, CSV should not
have these methods…

  • I have to set a field separator when I really just want to set a row
    separator.

The csv.rb in my svn repository supports a pseudo-keyword method-argument
style. I’ll merge it into ruby’s csv repository before the next release.
http://dev.ctor.org/csv/browser/trunk/lib/csv.rb

I defined the keywords :fs and :rs, but they should be :col_sep and
:row_sep, in conformity with faster_csv.

  • A method called “generate_line()” doesn’t involve a line ending.

Do not use it. :) At least, users rarely use it, I think.

I hope that “csv.rb’s open + read + block does not work as you
expected” is the one big point of frustration with csv.rb (…if csv.rb
is fast enough :)

  1. We could drop compatibility and rename FasterCSV to CSV. This way
    people get all the good stuff where they expect it. However, this would

Can you please explain what is “good”? I’ll introduce those features
into csv.rb.

Here’s a selection of some features from my CHANGELOG that I am not
aware of in CSV:

Thanks. I’ll look into this. I hope those features are pluggable into
csv.rb and other modules like DBI, spreadsheet related things, HTML
table formatters, etc. I think some of these features are table
specific, not CSV.

Regards,
// NaHi

On Jun 4, 2006, at 7:55 PM, NAKAMURA, Hiroshi wrote:

Why don’t we just write it like this?

CSV.open("filename.csv", "r") do |row|
  …do something…
end

That’s why we have foreach(). Better to use that and gain all the
familiarity of Ruby programmers who are used to things working that way.

I know you consider IO-ish methods important. But I
don’t think a CSV object should handle IO methods like fcntl, fileno,
seek, tell, tty?, and so on. Would you please give me some typical,
pragmatic examples of reader-style use other than ‘each’?

If people only did what I could think of, programming would be very
boring. ;) It took me five or ten minutes to make all those methods
available and now they are there if someone needs them.

I can tell you that it has already come in handy. I got a bug report
that the line numbers in errors were off, because CSV allows embedded
\n characters in fields. To fix it, I overrode IO’s lineno() method
with correct behavior. This seems very natural and the added bonus
is that you can now get a CSV aware line number.

I’m confused about why CSV does this, since it offers the foreach()
method, which normally fills this role.

foreach and readlines were added recently, following IO. Now I think
that was a bad choice, though…

That makes me sad to hear. foreach() is easily my most used method
with CSV and FasterCSV. I like readlines() too.

I still can’t think of any good reason not to just follow Ruby’s
interface as much as is possible and natural. To do anything else
forces programmers to adapt their expectations for no reason I can
understand.

  • I always have to think, “Now do I want the *_line() method or the
    *_row() method here…”

Users don’t need to use *_line and *_row methods I think. When do you
use generate_line?

I’m pretty sure we want to have our CSV library support data not in
files. Am I missing something? Is there a better way to get a CSV
string with your library?

  • Most methods take a field separator and a row separator, but
    foreach() and readlines() only take the row separator.

See IO.foreach and IO.readlines.

That’s comparing apples and oranges. IO.foreach() doesn’t need to be
aware of fields, but CSV.foreach() does. IO.open() doesn’t support a
field separator or a row separator, but your CSV.open() does because
it is needed.
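
As a sketch of the point about separators, the call below passes both a field separator and a row separator to foreach(). It assumes Ruby 1.9 or later, where the standard CSV class accepts the FasterCSV-style :col_sep and :row_sep options. The helper name and the sample file are hypothetical.

```ruby
require "csv"
require "tempfile"

# Reads a semicolon-separated file whose rows end in CRLF.
def read_semicolon_rows(path)
  rows = []
  CSV.foreach(path, col_sep: ";", row_sep: "\r\n") { |row| rows << row }
  rows
end

# Build a small sample file (binary mode keeps the CRLF endings as written).
file = Tempfile.create(["semi", ".csv"])
file.binmode
file.write("1;2;3\r\na;b;c\r\n")
file.close

p read_semicolon_rows(file.path)  # [["1", "2", "3"], ["a", "b", "c"]]
File.delete(file.path)
```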

in conformity with faster_csv.

:fs and :rs are fine with me. It’s consistent with your interface.

Here’s a selection of some features from my CHANGELOG that I am not
aware of in CSV:

Thanks. I’ll look into this. I hope those features are pluggable into
csv.rb and other modules like DBI, spreadsheet related things, HTML
table formatters, etc. I think some of these features are table
specific, not CSV.

This leads me naturally to the question: is there any good reason to
reinvent FasterCSV, when we could just use FasterCSV? ;)

James Edward G. II

On Jun 5, 2006, at 8:42 PM, NAKAMURA, Hiroshi wrote:

I think I still have not managed to explain the difference between our
viewpoints. You think a CSV object is an IO. I don’t, which is why I
defined Writer and Reader in csv.rb. Following IO is not ‘natural’ from
my viewpoint. That’s why I think ‘foreach’ and ‘readlines’ should not be
added.

Yeah, to me CSV is just another data source I want to read from/write
to with slightly special handling of the lines.

The good news is that our users probably don’t care what we think.
If we give them a quick and convenient way to read and write CSV, I
think they’ll be happy. ;)

Best of luck with your upgrades!

James Edward G. II

Hi,

James Edward G. II wrote:

That’s why we have foreach(). Better to use that and gain all the
familiarity of Ruby programmers who are used to things working that way.

I still can’t think of any good reason not to just follow Ruby’s
interface as much as is possible and natural. To do anything else
forces programmers to adapt their expectations for no reason I can
understand.

I think I still have not managed to explain the difference between our
viewpoints. You think a CSV object is an IO. I don’t, which is why I
defined Writer and Reader in csv.rb. Following IO is not ‘natural’ from
my viewpoint. That’s why I think ‘foreach’ and ‘readlines’ should not be
added.

The sentence “Comma Separated Values is an IO” sounds strange to me.
What do you think? Shouldn’t FasterCSV be called CSVIO or CSV::IO?

I know you consider IO-ish methods important. But I
don’t think a CSV object should handle IO methods like fcntl, fileno,
seek, tell, tty?, and so on. Would you please give me some typical,
pragmatic examples of reader-style use other than ‘each’?

If people only did what I could think of, programming would be very
boring. ;) It took me five or ten minutes to make all those methods
available and now they are there if someone needs them.

I agree with the first sentence. But I don’t think we should do
everything we can just because it’s easy.

I can tell you that it has already come in handy. I got a bug report
that the line numbers in errors were off, because CSV allows embedded \n
characters in fields. To fix it, I overrode IO’s lineno() method with
correct behavior. This seems very natural and the added bonus is that
you can now get a CSV aware line number.

Thank you for the example. CSVIO#lineno or CSV::IO#lineno seems
reasonable to me.

But half of the methods you defined as delegators still do not seem
meaningful to me.

* binmode()
* close()
* close_read()
* close_write()
* closed?()
* eof()
* eof?()
* fcntl()
* fileno()
* flush()
* fsync()
* ioctl()
* isatty()
* pid()
* pos()
* reopen()
* rewind()
* seek()
* stat()
* sync()
* sync=()
* tell()
* to_i()
* to_io()
* tty?()

above is excerpted from faster_csv.rb/0.2.0
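
The thread does not show how faster_csv.rb builds these delegators, so the following is only a sketch of the general technique, using Ruby's standard Forwardable module. `DelegatingReader` is a hypothetical class that splits naive comma-separated lines (no quoting support) and forwards a small subset of the methods listed above to the wrapped IO.

```ruby
require "forwardable"
require "stringio"

# Hypothetical reader: naive comma splitting, plus a few IO methods
# forwarded unchanged to the underlying object.
class DelegatingReader
  extend Forwardable

  # A small subset of the delegated methods named in the list above.
  def_delegators :@io, :close, :closed?, :eof?, :rewind, :pos

  def initialize(io)
    @io = io
  end

  # Returns the next row as an Array of Strings, or nil at end of input.
  def shift
    line = @io.gets
    line && line.chomp.split(",", -1)
  end
end

reader = DelegatingReader.new(StringIO.new("1,2,3\na,b,c\n"))
p reader.shift    # ["1", "2", "3"]
reader.rewind     # forwarded to the StringIO
p reader.shift    # ["1", "2", "3"]
p reader.eof?     # false
reader.close
p reader.closed?  # true
```

Whether each forwarded method is meaningful for a CSV reader is exactly the question under debate; the sketch only shows that adding them is cheap.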

  • I always have to think, “Now do I want the *_line() method or the
    *_row() method here…”

Users don’t need to use *_line and *_row methods I think. When do you
use generate_line?

I’m pretty sure we want to have our CSV library support data not in
files. Am I missing something? Is there a better way to get a CSV
string with your library?

Please use CSV::Writer for that.

str = ''
writer = CSV::Writer.create(str)
writer << [1, 2, 3]
writer << ['x', 'y', 'z']
writer.close
puts str
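
For comparison, here is a minimal sketch of building a CSV string in memory with the FasterCSV-style interface. It assumes Ruby 1.9 or later, where these calls live on the standard CSV class; CSV::Writer above belongs to the older csv.rb. The `build_csv_string` helper is made up for illustration.

```ruby
require "csv"

# Builds a CSV String in memory, one row per << call.
def build_csv_string
  CSV.generate do |csv|
    csv << [1, 2, 3]
    csv << %w[x y z]
  end
end

puts build_csv_string
# 1,2,3
# x,y,z

# A single row as a String, including its line ending.
p CSV.generate_line([1, 2, 3])  # "1,2,3\n"
```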

  • Most methods take a field separator and a row separator, but
    foreach() and readlines() only take the row separator.

See IO.foreach and IO.readlines.

That’s comparing apples and oranges. IO.foreach() doesn’t need to be
aware of fields, but CSV.foreach() does. IO.open() doesn’t support a
field separator or a row separator, but your CSV.open() does because it
is needed.

Hmm. I think “same name and different method arguments” is a bad design
because it confuses users. But you already use (pseudo) keyword
argument style so you are thinking “but just adding arguments could be a
good design”, right?

It could be. I need more time to think about it.

Here’s a selection of some features from my CHANGELOG that I am not
aware of in CSV:

Thanks. I’ll look into this. I hope those features are pluggable into
csv.rb and other modules like DBI, spreadsheet related things, HTML
table formatters, etc. I think some of these features are table
specific, not CSV.

This leads me naturally to the question: is there any good reason to
reinvent FasterCSV, when we could just use FasterCSV? :wink:

I wrote ‘introduce’ and meant ‘I won’t reinvent table specific
implementations. I’ll just get it from faster_csv, if it is pluggable’.

Regards,
// NaHi