Comma separating output from array.to_s

jansenh · December 2, 2006, 9:50pm

hi comp.lang.ruby

what is the ruby-way of comma separating the output from array.to_s?

[I have been playing with ruby for some time now, but I often find myself solving problems in a Java/C# manner… That’s me, a non-idiomatic ruby programmer… ]

regards, Henning

jansenh · December 2, 2006, 9:52pm

jansenh wrote:

what is the ruby-way of comma separating the output from array.to_s?
ri Array#join

Devin

jansenh · December 2, 2006, 9:56pm

jansenh wrote:

hi comp.lang.ruby

what is the ruby-way of comma separating the output from array.to_s?

[I have been playing with ruby for some time now, but I often find myself solving problems in a Java/C# manner… That’s me, a non-idiomatic ruby programmer… ]

regards, Henning

[2,4,6,8].join(',')
=> "2,4,6,8"
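A few more `Array#join` calls, as a quick sketch of how the core method behaves (the sample arrays are made up for illustration):

```ruby
numbers = [2, 4, 6, 8]

# join calls to_s on each element and puts the separator between them
puts numbers.join(",")      # 2,4,6,8

# any string works as a separator, e.g. comma plus space for display
puts numbers.join(", ")     # 2, 4, 6, 8

# nested arrays are joined recursively with the same separator
puts [1, [2, 3]].join(",")  # 1,2,3
```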

jansenh · December 2, 2006, 9:58pm

On 12/2/06, jansenh wrote:

hi comp.lang.ruby

what is the ruby-way of comma separating the output from array.to_s?

some_array.join(",")

but if you are thinking of doing CSV output, you should not use a
naive approach like this, since it will likely produce malformed CSV
when dealing with quoted text.

There is the CSV standard library, or, if you'd like better performance
and a cleaner interface, FasterCSV, available from RubyForge.
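To see the problem with the naive approach, join a field that itself contains a comma and split the result again: the field boundary is lost. A minimal sketch (the sample row is invented):

```ruby
row = ["Smith", "Jones, Jr.", 42]

line = row.join(",")
puts line            # Smith,Jones, Jr.,42

# a reader splitting on commas now sees four fields instead of three
fields = line.split(",")
puts fields.length   # 4
```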

jansenh · December 2, 2006, 11:36pm

Gregory B. wrote:

On 12/2/06, jansenh wrote:

hi comp.lang.ruby

what is the ruby-way of comma separating the output from array.to_s?

some_array.join(",")

but if you are thinking of doing CSV output, you should not use a
naive approach like this, since it will likely produce malformed CSV
when dealing with quoted text.

puts [22,'He said, "No!"'].map{|x| x=x.to_s
x =~ /["\n]/ ? '"' + x.gsub(/"/,'""') + '"' : x }.join(',')
22,"He said, ""No!"""
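The one-liner above is dense, so here is the same logic spread over a small method, as a sketch (the name `quote_field` is hypothetical). It quotes a field only when it contains a double quote or a newline, doubling any embedded quotes. As later replies point out, this test does not catch commas or carriage returns:

```ruby
# Hypothetical helper: the one-liner above, unrolled.
def quote_field(value)
  text = value.to_s
  if text =~ /["\n]/
    # wrap in quotes and double every embedded quote character
    '"' + text.gsub(/"/, '""') + '"'
  else
    text
  end
end

puts [22, 'He said, "No!"'].map { |x| quote_field(x) }.join(",")
# 22,"He said, ""No!"""
```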

jansenh · December 2, 2006, 10:06pm

jansenh wrote:

what is the ruby-way of comma separating the output from array.to_s?

You might be looking for Array#join. In fact, I believe Array#to_s is
Array#join.

m.

jansenh · December 3, 2006, 1:21am

William J. wrote:

[2,4,6,8].join(',')
=> "2,4,6,8"

That’s it! Thanx to all of you.

regards, Henning

jansenh · December 2, 2006, 11:40pm

Gregory B. wrote:

On 12/2/06, jansenh wrote:

hi comp.lang.ruby

what is the ruby-way of comma separating the output from array.to_s?

some_array.join(",")

but if you are thinking of doing CSV output, you should not use a
naive approach like this, since it will likely produce malformed CSV
when dealing with quoted text.

C:\>irb --prompt xmp
puts [22,'He said, "No!"'].map{|x| x=x.to_s
x =~ /["\n]/ ? '"' + x.gsub(/"/,'""') + '"' : x }.join(',')
22,"He said, ""No!"""

jansenh · December 3, 2006, 5:25am

Gregory B. wrote:

naive approach like this, since it will likely produce malformed CSV
when dealing with quoted text.

puts [22,'He said, "No!"'].map{|x| x=x.to_s
x =~ /["\n]/ ? '"' + x.gsub(/"/,'""') + '"' : x }.join(',')
22,"He said, ""No!"""

Yeah, now deal with edge cases, and try to run that on 100k rows

It’s easy to handle all cases since CSV is a simple format;
no pompous prolixity is needed:

puts ['x',' y ','He said, "No!"'].map{|x| x=x.to_s
x =~ /["\n]|^\s|\s$/ ? '"' + x.gsub(/"/,'""') + '"' : x }.join(',')
x," y ","He said, ""No!"""

If that won’t handle 100k rows, then fasterCsv probably won’t either.

jansenh · December 3, 2006, 2:07am

On 12/2/06, William J. wrote:

when dealing with quoted text.

puts [22,'He said, "No!"'].map{|x| x=x.to_s
x =~ /["\n]/ ? '"' + x.gsub(/"/,'""') + '"' : x }.join(',')
22,"He said, ""No!"""

Yeah, now deal with edge cases, and try to run that on 100k rows

jansenh · December 3, 2006, 6:27am

On Dec 2, 2006, at 10:25 PM, William J. wrote:

x =~ /["\n]|^\s|\s$/

What is that regex doing? Quoting any field with a quote or a
newline, in addition to any field beginning or ending with whitespace?

That fails on a field containing a comma. Carriage returns also need
to be escaped in CSV. I have no idea what the whitespace tricks are
for either.

A better test is:

x.count(%Q{\r\n,"}).nonzero?

James Edward G. II
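James's test fits into the same kind of helper. A minimal sketch, assuming a comma separator (the name `csv_quote` is hypothetical). `String#count` with a character-set argument returns how many characters of the string fall in that set, and `Integer#nonzero?` returns nil for zero, so the condition is truthy only when a CR, LF, comma, or double quote appears. Note that under this rule a field like `" y "` stays unquoted, which is the point William disputes below:

```ruby
# Hypothetical helper built around the String#count test above.
def csv_quote(value)
  text = value.to_s
  # count the characters from the set CR, LF, comma, double quote
  if text.count(%Q{\r\n,"}).nonzero?
    %Q{"#{text.gsub('"', '""')}"}  # quote, doubling embedded quotes
  else
    text
  end
end

puts csv_quote("plain")           # plain
puts csv_quote("fee, fi")         # "fee, fi"
puts csv_quote('He said, "No!"')  # "He said, ""No!"""
```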

jansenh · December 3, 2006, 6:25am

On Sat, 02 Dec 2006 20:22:20 -0800, William J. wrote:

but if you are thinking of doing CSV output, you should not use a

It’s easy to handle all cases since CSV is a simple format;
no pompous prolixity is needed:

Most of us like pompous prolixity, as it helps us understand what the
heck is going on. If I ran across your CSV regexp line in code I was
debugging, I wouldn't know what it meant.

jansenh · December 3, 2006, 7:48am

On 12/2/06, William J. wrote:

It’s easy to handle all cases since CSV is a simple format;
no pompous prolixity is needed:

puts ['x',' y ','He said, "No!"'].map{|x| x=x.to_s
x =~ /["\n]|^\s|\s$/ ? '"' + x.gsub(/"/,'""') + '"' : x }.join(',')
x," y ","He said, ""No!"""

If that won’t handle 100k rows, then fasterCsv probably won’t either.

It is indeed faster by a long shot, but it doesn’t conform to the CSV
spec. (See JEG2’s response)
Also, even in these trivial examples, I sure think that the code which
uses FasterCSV is pretty, even if I’m using the likely-to-be-slowest
form of generating rows in the library…

I’d be interested in seeing a pure ruby CSV implementation which
conforms to the spec and does better than FasterCSV, though I think
James has it pretty fine tuned, given the edge cases he considers and
the strictness of the library.

seltzer:~ sandal$ time ruby -rubygems fcsv.rb

real 0m11.111s
user 0m10.970s
sys 0m0.078s
seltzer:~ sandal$ cat fcsv.rb
require "fastercsv"
a = %w[some row data]
100000.times { a.to_csv }
seltzer:~ sandal$ time ruby william.rb

real 0m0.525s
user 0m0.515s
sys 0m0.007s
seltzer:~ sandal$ cat william.rb

a = %w[some row data]

100000.times {
a.map{|x| x=x.to_s
x =~ /["\n]|^\s|\s$/ ? '"' + x.gsub(/"/,'""') + '"' : x }.join(',')
}
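The shell's `time` measures the whole process, including interpreter startup and any `require`. To time only the loop, Ruby's standard `Benchmark` module can be used instead. A sketch timing the regexp version above (absolute numbers depend on the machine and Ruby version, so none are shown):

```ruby
require "benchmark"

a = %w[some row data]

# Benchmark.realtime returns the elapsed wall-clock seconds as a Float
elapsed = Benchmark.realtime do
  100_000.times do
    a.map { |x|
      x = x.to_s
      x =~ /["\n]|^\s|\s$/ ? '"' + x.gsub(/"/, '""') + '"' : x
    }.join(",")
  end
end

puts "regexp version: #{elapsed.round(3)}s"
```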

jansenh · December 3, 2006, 6:50am

On 12/2/06, jansenh wrote:

what is the ruby-way of comma separating the output from array.to_s?

A quick way to generate a CSV file from an array of arrays is to take
advantage of the fact that a Ruby array literal looks a lot like a CSV
record:

irb(main):001:0> a = [1,'two',"III",4]
=> [1, "two", "III", 4]
irb(main):002:0> a.inspect
=> "[1, \"two\", \"III\", 4]"
irb(main):003:0> puts a.inspect
[1, "two", "III", 4]
=> nil
irb(main):004:0> puts a.inspect[1...-1]
1, "two", "III", 4
=> nil

I learned this trick reading Hal F.'s excellent “Ruby Way (2nd ed)”.

Cheers,

Luciano
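The same trick as a small script, with one caveat worth checking before relying on it: `inspect` escapes an embedded double quote with a backslash, whereas CSV readers following RFC 4180 expect the quote to be doubled, and `inspect` also puts a space after every comma. A sketch with invented sample data:

```ruby
a = [1, 'two', "III", 4]

# inspect produces Ruby literal syntax; [1...-1] drops the surrounding brackets
line = a.inspect[1...-1]
puts line    # 1, "two", "III", 4

# an embedded quote is backslash-escaped, not doubled as CSV expects
tricky = ['He said, "No!"'].inspect[1...-1]
puts tricky  # "He said, \"No!\""
```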

jansenh · December 3, 2006, 5:40pm

James Edward G. II wrote:

On Dec 2, 2006, at 10:25 PM, William J. wrote:

x =~ /["\n]|^\s|\s$/

What is that regex doing? Quoting any field with a quote or a
newline, in addition to any field beginning or ending with whitespace?

That fails on a field containing a comma. Carriage returns also need
to be escaped in CSV.

Easily remedied.

I have no idea what the whitespace tricks are for either.

The standard states:

Leading and trailing space-characters adjacent to comma field
separators are ignored.

So quotes must be used to preserve that whitespace.

A better test is:

x.count(%Q{\r\n,"}).nonzero?

As noted above, this fails to preserve leading and trailing whitespace.

Gregory B. wrote:

It is indeed faster by a long shot, but it doesn’t conform to the CSV
spec. (See JEG2’s response)

After the addition of the comma (and possibly the carriage return),
it conforms. Remember that the de facto standard is based on how
Microsoft’s programs handle CSV files.
See CSV Comma Separated Value File Format - How To - Creativyst - Explored,Designed,Delivered.(sm)

puts [9, ' y ', "fee, fi", "one\ntwo", 'He said, "No!"'].
map{|x| x=x.to_s
x =~ /[",\n\r]|^\s|\s$/ ? '"' + x.gsub(/"/,'""') + '"' : x
}.join(',')

9," y ","fee, fi","one
two","He said, ""No!"""
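William's final rule, written out as a method for readability (a sketch; the name `william_quote` is hypothetical). It quotes on a double quote, comma, LF, or CR, and also on leading or trailing whitespace, following the Creativyst description he cites. RFC 4180, which James quotes in his reply, treats spaces as part of the field, so under that spec the whitespace branch is not required:

```ruby
# Hypothetical helper: William's final regexp rule, unrolled.
def william_quote(value)
  text = value.to_s
  # quote on ", comma, LF or CR, or on leading/trailing whitespace
  if text =~ /[",\n\r]|^\s|\s$/
    '"' + text.gsub(/"/, '""') + '"'
  else
    text
  end
end

row = [9, " y ", "fee, fi", "one\ntwo", 'He said, "No!"']
line = row.map { |x| william_quote(x) }.join(",")
puts line
# 9," y ","fee, fi","one
# two","He said, ""No!"""
```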

jansenh · December 3, 2006, 5:53pm

On Dec 3, 2006, at 10:40 AM, William J. wrote:

James Edward G. II wrote:
I have no idea what the whitespace tricks are for either.
The standard states:

Leading and trailing space-characters adjacent to comma field
separators are ignored.

So quotes must be used to preserve that whitespace.

Please show me where the CSV RFC states this. If you need a link to
the document, it is at:

http://www.ietf.org/rfc/rfc4180.txt

Quoting from that document:

Spaces are considered part of a field and should not be ignored.

James Edward G. II

jansenh · December 3, 2006, 6:13pm

On Dec 3, 2006, at 12:46 AM, Gregory B. wrote:

Also, even in these trivial examples, I sure think that the code which
uses FasterCSV is pretty, even if I’m using the likely-to-be-slowest
form of generating rows in the library…

Writing CSV is very easy (unlike reading it correctly). FasterCSV
uses something close to the code shown in this thread, though William
James and I obviously disagree about the CSV format. Here’s the code
from the library:

 @io << row.map do |field|
   if field.nil?  # represent +nil+ fields as empty unquoted fields
     ""
   else
     field = String(field)  # Stringify fields
     # represent empty fields as empty quoted fields
     if field.empty? or field.count(%Q{\r\n#{@col_sep}"}).nonzero?
       %Q{"#{field.gsub('"', '""')}"}  # escape quoted fields
     else
       field  # unquoted field
     end
   end
 end.join(@col_sep) + @row_sep  # add separators

If you want a lot of speed, use something like that directly. Ruby’s
method calls are expensive and using the code directly shaves that off.
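For readers without FasterCSV installed, the library fragment James quoted can be lifted into a self-contained method. This is a sketch based only on that fragment: `generate_row` is a hypothetical name, and the `col_sep:`/`row_sep:` keyword defaults stand in for the library's `@col_sep` and `@row_sep` instance variables:

```ruby
# Hypothetical standalone version of the quoted row-writing fragment.
def generate_row(row, col_sep: ",", row_sep: "\n")
  row.map do |field|
    if field.nil?
      ""                                # nil becomes an empty unquoted field
    else
      field = String(field)             # stringify everything else
      if field.empty? || field.count(%Q{\r\n#{col_sep}"}).nonzero?
        %Q{"#{field.gsub('"', '""')}"}  # quote, doubling embedded quotes
      else
        field                           # safe to leave unquoted
      end
    end
  end.join(col_sep) + row_sep
end

print generate_row([nil, "", "a", 'b"c', "d,e"])
# ,"",a,"b""c","d,e"
```

Note that an empty string comes out as `""` while `nil` comes out as nothing at all, so a reader can tell the two apart.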

Of course, I still feel it is better to use the library, though I
appear to be outnumbered in that thinking these days.

100000.times { a.to_csv }

Just FYI, that’s probably the slowest way to use FasterCSV to write
CSV, though it is my favorite too. Here’s the implementation of
to_csv():

class Array
  # Equivalent to FasterCSV::generate_line(self, options).
  def to_csv(options = Hash.new)
    FasterCSV.generate_line(self, options)
  end
end

As you can see, calling FasterCSV.generate_line() yourself saves a
layer of indirection.

If you are generating many lines and want to go as fast as possible
with the library use one of the following:

# to an IO...
FasterCSV.open(...) do |csv|
  csv << [...]
  csv << [...]
  ...
end

# to a String
FasterCSV.generate(...) do |csv|
  csv << [...]
  csv << [...]
  ...
end

I try not to lose too much sleep over optimizations like this until
I really need them, though, and I favor Array.to_csv() in my own code.

James Edward G. II
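The block form of `FasterCSV.generate` collects rows as they are pushed with `<<`. Purely to illustrate that calling pattern (this is not FasterCSV's implementation), here is a hypothetical `generate_csv` that yields a small collector object and returns the accumulated String. `RowCollector` is also a made-up name, and the quoting rule is the `count` test from the library code quoted above:

```ruby
# Hypothetical collector that mimics the `csv << [...]` calling style.
class RowCollector
  attr_reader :output

  def initialize
    @output = +""  # unary plus guarantees a mutable String
  end

  # append one row, quoting fields that contain CR, LF, comma, or quote
  def <<(row)
    @output << row.map { |field|
      text = field.to_s
      text.count(%Q{\r\n,"}).nonzero? ? %Q{"#{text.gsub('"', '""')}"} : text
    }.join(",") << "\n"
    self
  end
end

# Hypothetical generate-style helper: yields the collector, returns the String.
def generate_csv
  collector = RowCollector.new
  yield collector
  collector.output
end

csv_text = generate_csv do |csv|
  csv << ["name", "note"]
  csv << ["Henning", "hi, comp.lang.ruby"]
end
print csv_text
# name,note
# Henning,"hi, comp.lang.ruby"
```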

jansenh · December 3, 2006, 7:11pm

On 12/3/06, James Edward G. II wrote:

Of course, I still feel it is better to use the library, though I
appear to be outnumbered in that thinking these days.

I read through the other CSV related thread that seemed that way… I
have to say, I usually don’t even dip down to FasterCSV as Ruport puts
very little overhead on it, so I just use Ruport to keep my apps
looking consistent and clean.

Different strokes for different folks.


I mentioned I expected it to be the slowest, though when I benchmarked it
against generate_line, it looks like the method call cost is lost in the
noise at high numbers of iterations:

seltzer:~ sandal$ cat fcsv.rb
require "fastercsv"
a = %w[some row data]
100000.times { a.to_csv }

seltzer:~ sandal$ time ruby -rubygems fcsv.rb

real 0m11.135s
user 0m10.974s
sys 0m0.079s

seltzer:~ sandal$ time ruby -rubygems fcsv_no_shortcut.rb

real 0m11.083s
user 0m10.920s
sys 0m0.080s

seltzer:~ sandal$ cat fcsv_no_shortcut.rb
require "fastercsv"
a = %w[some row data]
100000.times { FasterCSV.generate_line a }

As you can see, calling FasterCSV.generate_line() yourself saves a
layer of indirection.

Yes, this is why we use it in Ruport’s CSV writing implementation
rather than a call to to_csv.
(Best to avoid two layers of indirection.)

# to a String
FasterCSV.generate(...) do |csv|
  csv << [...]
  csv << [...]
  ...
end

I’ll keep this in mind.

I try not to lose too much sleep over optimizations like this until
I really need them, though, and I favor Array.to_csv() in my own code.

Me too.
