Is there a built-in method to determine the number of lines in a file?
I tried file.readlines.length, but it is very slow (I'm dealing with files
of more than 1 million lines).
thanks,
DAN
On Tue, Aug 21, 2007 at 07:24:03AM +0900, blufur wrote:
Is there a built-in method to determine the number of lines in a file?
I tried file.readlines.length, but it is very slow (I'm dealing with files
of more than 1 million lines).
Here are a few alternatives that use less memory than File.readlines
(which slurps the entire file into memory):
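require 'benchmark'

big_file = '/usr/share/dict/words'

Benchmark.bm do |x|
  x.report('streaming') do
    lines = 0
    # The block form of File.open closes the file when the block ends.
    File.open(big_file) do |f|
      f.each_line { |line| lines += 1 }
    end
  end

  x.report('shelling out') do
    # /\d+/ rather than /^\d+/: some wc implementations pad the count with spaces.
    lines = Integer(%x(wc -l '#{big_file}')[/\d+/])
  end
end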
On my machine:

                  user     system      total        real
streaming     0.270000   0.010000   0.280000 (  0.293957)
shelling out  0.000000   0.000000   0.020000 (  0.052078)

(The file is 234936 lines long.)
marcel
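A more compact form of the streaming approach: File.foreach called without a block returns an Enumerator over the lines, so Enumerable#count can tally them while only one line is held in memory at a time. This is a sketch; `count_lines_streaming` is a hypothetical helper name, not a core method. Unlike `wc -l`, it also counts a final line that has no trailing newline, which matches what `File.readlines(path).length` returns.

```ruby
# Streams the file line by line; only the current line is kept in memory.
# count_lines_streaming is a hypothetical helper name, not part of core Ruby.
def count_lines_streaming(path)
  # Without a block, File.foreach returns an Enumerator; #count drains it.
  File.foreach(path).count
end

# Example usage with the file from the benchmark above:
# count_lines_streaming('/usr/share/dict/words')
```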
On 8/21/07, Jano S. wrote:
There was a recent thread about how to process a file as fast as possible
– search the archives.
That thread is “How to reclaim memory without GC.start”.
On Tue, Aug 21, 2007 at 07:34:26AM +0900, Jano S. wrote:
On 8/21/07, blufur wrote:
Is there a built-in method to determine the number of lines in a file?
I tried file.readlines.length, but it is very slow (I'm dealing with files
of more than 1 million lines).
If on Unix:
`wc -l #{filename}`
or similar (I don't remember the exact syntax for wc).
Your syntax is correct there. The -l option tells wc to report only
the number of lines.
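One way to call wc from Ruby without going through a shell at all, so a file name containing spaces or quotes needs no escaping, is the array form of IO.popen. This is a sketch assuming a Unix-like system with wc on the PATH; `wc_line_count` is a hypothetical helper name. Note that wc -l counts newline characters, so a last line without a trailing newline is not counted.

```ruby
# Runs `wc -l PATH` directly (no shell), so the path needs no quoting.
# Assumes wc is installed; wc_line_count is a hypothetical helper name.
def wc_line_count(path)
  # Array form of IO.popen execs wc itself; the block reads all its output.
  output = IO.popen(["wc", "-l", path], &:read)
  # IO.popen sets $? once the block finishes and the pipe is closed.
  raise "wc failed for #{path}" unless $?.success?
  # Output looks like "  234936 /usr/share/dict/words"; String#to_i skips
  # leading whitespace and stops at the first non-digit, ignoring the name.
  output.to_i
end
```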
On 8/21/07, blufur wrote:
Is there a built-in method to determine the number of lines in a file?
I tried file.readlines.length, but it is very slow (I'm dealing with files
of more than 1 million lines).
If on Unix:
`wc -l #{filename}`
or similar (I don't remember the exact syntax for wc).
Otherwise:
try counting \r\n or \n. Read the file in a loop, counting the occurrences.
There was a recent thread about how to process a file as fast as possible
– search the archives.
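The "read the file in a loop, counting the occurrences" idea can be sketched as below, reading fixed-size chunks so memory stays bounded. Counting "\n" alone is enough for both conventions, because every "\r\n" contains exactly one "\n"; files that use bare "\r" endings would not be counted, though. `count_newlines` and the 64 KiB default chunk size are hypothetical choices for illustration. Like wc -l, it does not count a final line that lacks a trailing newline.

```ruby
# Counts "\n" bytes while reading the file in fixed-size chunks.
# count_newlines and the 64 KiB default are hypothetical choices.
def count_newlines(path, chunk_size = 64 * 1024)
  count = 0
  # Binary mode avoids any newline or encoding conversion.
  File.open(path, "rb") do |f|
    # IO#read(n) returns nil at end of file, which ends the loop.
    while (chunk = f.read(chunk_size))
      count += chunk.count("\n")
    end
  end
  count
end
```

A "\n" is a single byte, so it can never be split across two chunks; any chunk size gives the same result.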
On Aug 20, 2007, at 4:33 PM, Marcel Molina Jr. wrote:
streaming     0.270000   0.010000   0.280000 (  0.293957)
shelling out  0.000000   0.000000   0.020000 (  0.052078)

(The file is 234936 lines.)
My attempt:

cfp:~ > cat a.rb && ruby a.rb Documents/words.txt && wc -l Documents/words.txt
require 'benchmark'

big_file = ARGV.shift || '/usr/share/dict/words'

Benchmark.bm do |x|
  x.report('streaming') do
    lines = 0
    File.open(big_file) do |f|
      f.each_line { |line| lines += 1 }
    end
  end

  x.report('shelling out') do
    lines = Integer(%x(wc -l '#{big_file}')[/\d+/])
  end

  x.report('letting ruby do the counting') do
    # IO#each walks every line; IO#lineno then holds the number of lines read.
    lines = open(big_file){|fd| fd.each{} and fd.lineno}
  end

  x.report('wow') do
    # Read the whole file in one call and count the newline characters.
    lines = open(big_file){|fd| fd.read(fd.stat.size).count "\n"}
  end

  x.report('smart') do
    class File
      # Slurp and count "\n" unless the file is larger than way_too_big bytes
      # (2 ** 30, i.e. 1 GiB, by default); then fall back to line-by-line.
      def number_of_lines way_too_big = 2 ** 30
        stat.size > way_too_big ?
          (each{} and lineno) : read(stat.size).count("\n")
      end
    end
    lines = open(big_file){|fd| fd.number_of_lines}
  end
end
                                  user     system      total        real
streaming                     0.420000   0.010000   0.430000 (  0.436458)
shelling out                  0.000000   0.000000   0.010000 (  0.028870)
letting ruby do the counting  0.290000   0.010000   0.300000 (  0.296236)
wow                           0.010000   0.010000   0.020000 (  0.025010)
smart                         0.010000   0.020000   0.030000 (  0.029373)
  483523 Documents/words.txt
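The "wow" and "smart" variants count "\n" characters, exactly as wc -l does, so for a file whose last line has no trailing newline they report one fewer than File.readlines(path).length. If the readlines number is what's wanted, a sketch under that assumption: count "\n" in chunks, then add one when the file is non-empty and its final byte is not "\n". `line_count_like_readlines` is a hypothetical helper name.

```ruby
# Matches File.readlines(path).length (default "\n" separator) without
# building an array of lines. line_count_like_readlines is hypothetical.
def line_count_like_readlines(path, chunk_size = 64 * 1024)
  File.open(path, "rb") do |f|
    count = 0
    # Bounded-memory pass over the file, counting newline bytes.
    while (chunk = f.read(chunk_size))
      count += chunk.count("\n")
    end
    if f.size > 0
      # Inspect the last byte: no trailing "\n" means one more (partial) line.
      f.seek(-1, IO::SEEK_END)
      count += 1 unless f.read(1) == "\n"
    end
    count
  end
end
```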
From: Ronald F.
`wc -l <#{filename}`.chomp.to_i

It's OK to lose the #chomp there: String#to_i stops at the first non-digit,
so the trailing newline is ignored anyway.

kind regards -botp
If on Unix:
`wc -l #{filename}`
or similar (I don't remember the exact syntax for wc).
Your syntax is correct there. The -l option tells wc to report only
the number of lines.
Nearly correct: it also prints the file name. A better approach
when calling from Ruby would be
linecount = `wc -l <#{filename}`.chomp.to_i
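A slightly hardened version of the `wc -l <` redirection approach: with input redirection wc reads standard input and prints only the count, and Shellwords.escape (from Ruby's standard library) protects against spaces or shell metacharacters in the path. This sketch assumes a POSIX shell and wc are available; `wc_stdin_line_count` is a hypothetical helper name.

```ruby
require "shellwords"

# With "<" redirection, wc reads stdin and prints just the number.
# Assumes a POSIX shell and wc; wc_stdin_line_count is hypothetical.
def wc_stdin_line_count(path)
  # Backticks run the command through the shell, so escape the path.
  output = `wc -l < #{Shellwords.escape(path)}`
  raise "wc failed for #{path}" unless $?.success?
  # No #chomp needed: String#to_i ignores leading spaces and the newline.
  output.to_i
end
```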