FasterCSV vs File + split: why is it not faster?

pabloq · November 21, 2008, 6:59pm

Hi folks,

Why am I getting this result? Is it just due to this specific problem?

The file has 293858 records. Here are some sample records:

"MARCOS, LUIS","547 N LAKE ST","","MUNDELEIN","IL","000000000"
"BALDWIN, T & S","4732 NE 203RD ST","","LAKE FOREST
PARK","WA","000000000"
"RYBOLT, C","401 CEDAR DR","","CLINTON","IL","000000000"
"WELDT, KRISTINA","1945 N ORLEANS ST","","MCHENRY","IL","000000000"
…

CODE

require 'benchmark'
require 'rubygems'
require 'faster_csv'

# Full CSV parser
Benchmark.bm do |x|
  x.report do
    FasterCSV.foreach("data_test/match.csv") do |row|
    end
  end
end

# Hand-rolled split; assumes every field is quoted
Benchmark.bm do |x|
  x.report do
    File.foreach("data_test/match.csv") do |line|
      row = line.split('","', -1)
      row[0].gsub!('"', '')
      row[row.length - 1].gsub!('"', '')
    end
  end
end

RESULTS

      user     system      total        real
 16.180000   0.740000  16.920000 ( 17.246190)
      user     system      total        real
  5.830000   0.120000   5.950000 (  6.028469)
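
For scale, the two wall-clock figures can be turned into a per-record cost. This is only a rough sketch using the numbers reported above; it assumes the first timing belongs to FasterCSV.foreach and the second to File + split (the order of the code), and the constant and method names are made up for illustration.

```ruby
# Figures taken from the post: record count and the "real" column of each run.
RECORDS = 293_858
REAL_SECONDS = {
  'FasterCSV.foreach' => 17.246190, # assumed to be the first run
  'File + split'      => 6.028469   # assumed to be the second run
}

# Hypothetical helper: average wall-clock cost of one record, in microseconds.
def microseconds_per_record(seconds, records)
  seconds / records * 1_000_000
end

REAL_SECONDS.each do |label, seconds|
  puts format('%-18s %.1f us per record', label, microseconds_per_record(seconds, RECORDS))
end

# Ratio of the two wall-clock times.
SPEEDUP = REAL_SECONDS['FasterCSV.foreach'] / REAL_SECONDS['File + split']
puts format('split is about %.1fx faster on this file', SPEEDUP)
```

Both figures are tiny per record; the gap only becomes visible because the file has a few hundred thousand rows.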

is this true?

pabloq · November 21, 2008, 7:12pm

On Nov 21, 2008, at 11:55 AM, Pablo Q. wrote:

RESULTS
      user     system      total        real
 16.180000   0.740000  16.920000 ( 17.246190)
      user     system      total        real
  5.830000   0.120000   5.950000 (  6.028469)

is this true?

Is it true that reading the File and calling split() is faster than FasterCSV? Yeah, I bet it
is. Likely reasons are:

split() is written in C, while FasterCSV is pure Ruby
Your split code doesn’t handle all types of CSV data, so it has less work to do

To give some examples, your split code doesn’t parse this valid CSV data:

no,quotes

Or this:

"embedded
newlines"
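
Those two cases can be checked with a small script. This is a sketch, not the code from the original post: `naive_split` is a hypothetical helper that mirrors the split approach (with a `chomp` added), the sample strings are made up, and it assumes Ruby 1.9 or later, where FasterCSV's code ships as the standard `csv` library. With the gem, the matching calls would be `FasterCSV.parse_line` and `FasterCSV.parse`.

```ruby
require 'csv'

# Hypothetical helper mirroring the split-based code from the first post.
# It assumes every field is quoted and every record fits on one line.
def naive_split(line)
  row = line.chomp.split('","', -1)
  row[0] = row[0].delete('"')
  row[-1] = row[-1].delete('"')
  row
end

QUOTED    = %("RYBOLT, C","401 CEDAR DR","","CLINTON","IL","000000000"\n)
UNQUOTED  = "no,quotes\n"
MULTILINE = %("embedded\nnewlines","second field"\n)

# Fully quoted, single-line record: both approaches return the same fields.
p naive_split(QUOTED) == CSV.parse_line(QUOTED)

# Unquoted fields: split never sees a '","' separator, so the whole
# line comes back as a single field, while the parser finds two.
p naive_split(UNQUOTED)
p CSV.parse_line(UNQUOTED)

# Embedded newline: reading line by line cuts one record into two rows,
# while the parser keeps the newline inside the first field.
p MULTILINE.each_line.map { |line| naive_split(line) }.length
p CSV.parse(MULTILINE).length
```

The split version is fine as long as the file never contains either case, which is exactly the work the parser cannot skip.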

Hope that explains things a bit.

James Edward G. II

pabloq · November 21, 2008, 7:28pm

I thought so…

I’m just comparing code written for a single special case against the full
implementation of the library.
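
For anyone repeating the comparison, here is a sketch that times both approaches in one labeled Benchmark.bm run. It is not the original test: the data is generated (10,000 copies of one sample record written to a temp file), the method names are made up, and it assumes Ruby 1.9 or later, where FasterCSV's code is available as the standard `csv` library.

```ruby
require 'benchmark'
require 'csv'
require 'tempfile'

# Hypothetical sample data: one fully quoted record, repeated ROWS times.
ROWS = 10_000
SAMPLE = %("RYBOLT, C","401 CEDAR DR","","CLINTON","IL","000000000"\n)

# Full CSV parser: handles quoting, escaped quotes and embedded newlines.
def count_with_csv(path)
  count = 0
  CSV.foreach(path) { |_row| count += 1 }
  count
end

# Special-case parser: only valid when every field is quoted and
# no field contains a newline.
def count_with_split(path)
  count = 0
  File.foreach(path) do |line|
    row = line.chomp.split('","', -1)
    row[0] = row[0].delete('"')
    row[-1] = row[-1].delete('"')
    count += 1
  end
  count
end

Tempfile.create('match') do |file|
  file.write(SAMPLE * ROWS)
  file.flush
  # The argument to bm is the label column width.
  Benchmark.bm(12) do |x|
    x.report('CSV.foreach')  { count_with_csv(file.path) }
    x.report('File + split') { count_with_split(file.path) }
  end
end
```

Labeling the reports keeps the two timings in one table, so it is clear which line belongs to which approach.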

Thank you for your time!

2008/11/21 James G.