If I want the number of lines of a text file, I may use
File.readlines(file).size
but this builds a useless extra Array, or
%x(wc -l #{file}).to_i
but this needs to be on a *nix system (or have a system command wc.exe
on Windows). Or else a File.read followed by a grep for '\n'…
I feel there should be a simpler way to do that…
_md
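For reference, the pure-Ruby options mentioned in the question can be put side by side. This is only a sketch: the helper names (count_with_readlines, count_with_read) and the sample file are made up for illustration and do not come from the thread.

```ruby
require 'tempfile'

# Hypothetical helpers, named only for this illustration.

# Reads every line into an Array, then asks for its size.
def count_with_readlines(path)
  File.readlines(path).size
end

# Reads the whole file into one String and counts newline characters.
# A last line without a trailing "\n" is not counted this way.
def count_with_read(path)
  File.read(path).count("\n")
end

# Small sample file so the sketch runs anywhere, without wc.
sample = Tempfile.new('lines')
sample.write("one\ntwo\nthree\n")
sample.flush

puts count_with_readlines(sample.path)
puts count_with_read(sample.path)
```

Both variants hold the whole file in memory at once, which is the cost the question is trying to avoid.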
On 2013-10-19, at 10:02 AM, Michel D. wrote:
I feel there should be a simpler way to do that…
_md
Have you looked at Enumerable’s count method?
mike$ wc -l /etc/passwd
83 /etc/passwd
mike$ ruby -e "puts File.open('/etc/passwd') { |f| f.count }"
83
Hope this helps,
Mike
–
Mike S.
http://www.stok.ca/~mike/
The “`Stok’ disclaimers” apply.
On Sat, Oct 19, 2013 at 4:02 PM, Michel D. wrote:
I feel there should be a simpler way to do that…
lines = File.foreach(file).count
Kind regards
robert
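A small sketch of why this one-liner works: called without a block, File.foreach returns an Enumerator, and count walks it one line at a time instead of collecting the lines first. The count_lines name and the sample file are assumptions made for the illustration.

```ruby
require 'tempfile'

# Hypothetical wrapper around the one-liner suggested above.
def count_lines(path)
  File.foreach(path).count
end

# Sample file whose last line has no trailing newline.
sample = Tempfile.new('foreach')
sample.write("a\nb\nc")
sample.flush

enum = File.foreach(sample.path)
puts enum.class               # Enumerator
puts count_lines(sample.path) # 3
```

Unlike wc -l, which counts newline characters, this also counts a final line that lacks a trailing newline.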
Robert K. wrote in post #1124923:
lines = File.foreach(file).count
Thanks, Robert, using ‘foreach’ is cleaner.
FWIW, I benchmarked. The File methods are equivalent and much faster.
require 'benchmark'
file = __FILE__   # count the lines of this script itself
n = 10000
Benchmark.bm do |rep|
  rep.report("readlines") { n.times { File.readlines(file).size } }
  # backticks run the external command; "83 name".to_i keeps the leading number
  rep.report("wc -l    ") { n.times { `wc -l #{file}`.to_i } }
  rep.report("foreach  ") { n.times { File.foreach(file).count } }
end
gives
                user     system      total        real
readlines   0.219000   0.499000   0.718000 (  0.752043)
wc -l       2.542000   5.257000   7.799000 ( 83.502776)
foreach     0.219000   0.531000   0.750000 (  0.761044)
_md
Robert K. wrote in post #1124958:
It would be interesting to see how that works out for a large file. I
would expect the last version to be more efficient than the first
one.
I would guess so, but the test below shows the same pattern: readlines
is a bit faster.
require 'benchmark'
file = File.join(File.dirname(__FILE__), 'test.txt')
# Write a 3000-line test file; the block variable is named f so it does not shadow file.
File.open(file, 'w') do |f|
  3000.times { f.puts 'bla' * 10 }
end
n = 10000
Benchmark.bm do |rep|
  rep.report("readlines") { n.times { File.readlines(file).size } }
  rep.report("foreach  ") { n.times { File.foreach(file).count } }
end
                user     system      total        real
readlines  11.341000   1.217000  12.558000 ( 12.686726)
foreach    12.433000   1.264000  13.697000 ( 13.871793)
On Sun, Oct 20, 2013 at 10:37 AM, Michel D. wrote:
Robert K. wrote in post #1124923:
lines = File.foreach(file).count
Thanks, Robert, using ‘foreach’ is cleaner.
Yes, and it avoids building an Array for the whole file in memory.
FWIW, I benchmarked. The File methods are equivalent and much faster.
Naturally, since they avoid the overhead of forking and IPC.
                user     system      total        real
readlines   0.219000   0.499000   0.718000 (  0.752043)
wc -l       2.542000   5.257000   7.799000 ( 83.502776)
foreach     0.219000   0.531000   0.750000 (  0.761044)
It would be interesting to see how that works out for a large file. I
would expect the last version to be more efficient than the first
one.
Kind regards
robert
Michel D. wrote in post #1124962:
                user     system      total        real
readlines  11.341000   1.217000  12.558000 ( 12.686726)
foreach    12.433000   1.264000  13.697000 ( 13.871793)
With 300_000 lines and 100 iterations, instead of 3_000 lines and 10_000
iterations, one gets the same pattern:
                user     system      total        real
readlines  11.622000   1.060000  12.682000 ( 12.692726)
foreach    12.246000   0.858000  13.104000 ( 13.156753)
but the difference is smaller…
_md
On Oct 20, 2013, at 4:13 PM, Robert K. wrote:
user system total real
rep.report("readlines") { n.times { File.readlines(tmp.path).size } }
–
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
What about space? That’s also a huge consideration here, isn’t it?
foreach should win that by lots and lots, too.
On Sun, Oct 20, 2013 at 5:14 PM, Michel D. wrote:
readlines  11.622000   1.060000  12.682000 ( 12.692726)
foreach    12.246000   0.858000  13.104000 ( 13.156753)
but the difference is smaller…
$ ruby x.rb
                user     system      total        real
readlines  56.831000   7.597000  64.428000 ( 64.241000)
foreach    50.357000   5.476000  55.833000 ( 56.153000)
$ cat x.rb
require 'tempfile'
require 'benchmark'
LINE = 'x' * 99
n = 100
Tempfile.open(ENV['TMP'] || '/tmp') do |tmp|
  1_000_000.times { tmp.puts LINE }
  tmp.flush # push buffered lines to disk before the file is re-read by path
  Benchmark.bm do |rep|
    rep.report("readlines") { n.times { File.readlines(tmp.path).size } }
    rep.report("foreach  ") { n.times { File.foreach(tmp.path).count } }
  end
end
So with even larger files the difference shows: here foreach comes out ahead.
Kind regards
robert
tamouse m. wrote in post #1124992:
What about space? That’s also a huge consideration here, isn’t it?
foreach should win that by lots and lots, too.
Sure.
_md
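On the space point, a rough sketch of a count that never holds more than one fixed-size buffer, for anyone who wants a hard bound on memory. It is not something posted in the thread: the count_newlines_chunked name and the 64 KiB chunk size are assumptions, and like wc -l it counts newline characters rather than lines.

```ruby
require 'tempfile'

CHUNK_SIZE = 64 * 1024 # assumed buffer size, not taken from the thread

# Hypothetical helper: reads the file in fixed-size chunks and counts "\n"
# in each chunk, so memory use stays bounded by chunk_size.
def count_newlines_chunked(path, chunk_size = CHUNK_SIZE)
  total = 0
  File.open(path, 'rb') do |f|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      total += chunk.count("\n")
    end
  end
  total
end

# Sample file of 1000 lines, each ending in a newline.
sample = Tempfile.new('chunked')
1_000.times { sample.puts 'x' * 99 }
sample.flush

puts count_newlines_chunked(sample.path) # 1000
puts File.foreach(sample.path).count     # 1000
```

File.foreach(file).count already keeps only one line at a time, so the chunked variant mainly matters for files with extremely long lines.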