No speedup...!

Hello,

The Code:

====================================
def look_for_begin
  while line = gets
    if line =~ /^begin/
      puts line
      # return
    end
  end
end

ARGF.each { look_for_begin }

I have files with uuencoded and yencoded
data, and some text-only files: 188 files in all,
with a total size of about 16 MB.

The tool needs 3.6 seconds to look for /^begin/
in all files.
When using exceptions, or break, or return (see the
comment above) to stop reading a file after a /^begin/
was found, I got no speedup!

I tried Perl, OCaml and C, and all are a lot faster.
OK, if Ruby is slower, so be it… and I have to live
with that.
But what I can NOT accept is that the code needs the same
time with and without the statements that stop the
further reading of the files!

That seems very strange to me!

Can someone explain this to me?

Thanks In Advance,
Oliver

Oliver B. wrote:

   # return
 end

end
end

ARGF.each { look_for_begin }

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Hi –

On Sat, 26 Aug 2006, William J. wrote:

   puts line
   # return
 end

end
end

ARGF.each { look_for_begin }

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

puts ARGF.find {|s| /^begin/.match(s) }

David

[email protected] wrote:

ARGF.each { look_for_begin }

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

puts ARGF.find {|s| /^begin/.match(s) }
[…]

Both of these look as if they would search for all
occurrences of “begin”, not just the first one.

I am also thinking of looking only at the first 1000 lines or so…

Ciao,
Oliver

P.S.: But I have now also found files that contain more than one
uuencoded section…
… so maybe reading the files completely could also make sense…
(I hadn’t found such files before, so I thought it would make
sense to read only until the first occurrence of /^begin/)

Oliver B. wrote:

end
puts ARGF.find {|s| /^begin/.match(s) }
No, this only finds one instance. Mine finds the first
in each file.

[…]

Both of these look as if they would search for all
occurrences of “begin”, not just the first one.

You know too little of Ruby to tell what the code will do
just by looking at it. Try both if you want to know what
they will do.

I am also thinking of looking only at the first 1000 lines or so…

ARGV.each{|f| count = 0
  IO.foreach(f) {|line|
    if line =~ /^begin/
      print line
      break
    end
    count += 1
    break if 1000 == count
  }
}
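The same cap can also be sketched with a lazy enumerator, assuming Ruby 2.0 or later for Enumerator::Lazy. The method name first_begin and its limit parameter are made up for illustration, and start_with? stands in for the /^begin/ test on a single line:

```ruby
# Hypothetical helper: return the first line starting with "begin" among
# the first `limit` lines of the file at `path`, or nil if there is none.
# Assumes Ruby 2.0+ (Enumerator::Lazy).
def first_begin(path, limit = 1000)
  File.foreach(path).lazy.take(limit).find { |line| line.start_with?('begin') }
end

# One result per file named on the command line.
ARGV.each do |f|
  line = first_begin(f)
  print line if line
end
```

Reading stops as soon as a match is found or the limit is reached, so the rest of the file is never touched.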

Hi –

On Sun, 27 Aug 2006, Oliver B. wrote:

end
puts ARGF.find {|s| /^begin/.match(s) }
[…]

Both of these look as if they would search for all
occurrences of “begin”, not just the first one.

Well, if you know how Enumerable#find works, then they look like they
find the first one :) (Though, as William pointed out, my code
answers the wrong question, because it only finds one for all the
files instead of one for each.)

David

On Aug 26, 2006, at 4:11 AM, [email protected] wrote:

end

ARGF.each { look_for_begin }

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

puts ARGF.find {|s| /^begin/.match(s) }

No, don’t use match, it is slow: [ruby-talk:204747]
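A rough way to check this on your own machine, as a sketch using the standard Benchmark library. The sample data and the names sample_lines and compare are invented, no timings are claimed here, and Regexp#match? only exists on Ruby 2.4 and later:

```ruby
require 'benchmark'

# Invented sample data: mostly uuencode-looking lines, a few "begin" lines.
def sample_lines(n = 50_000)
  Array.new(n) { |i| (i % 100).zero? ? "begin 644 file#{i}\n" : "M#{'x' * 60}\n" }
end

PATTERN = /^begin/

# Time three ways of testing each line; only the relative numbers matter.
def compare(lines)
  Benchmark.bm(8) do |bm|
    bm.report('=~')     { lines.each { |l| l =~ PATTERN } }
    bm.report('match')  { lines.each { |l| PATTERN.match(l) } }
    bm.report('match?') { lines.each { |l| PATTERN.match?(l) } } # Ruby 2.4+
  end
end

compare(sample_lines)
```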


Eric H. - [email protected] - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

Hi –

On Sun, 27 Aug 2006, William J. wrote:

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

puts ARGF.find {|s| /^begin/.match(s) }

No, this only finds one instance. Mine finds the first
in each file.

Whoops; so it does.

David

On Mon, 28 Aug 2006, Eric H. wrote:

No, don’t use match, it is slow: [ruby-talk:204747]

Snipped & adapted for
http://rubygarden.org:3000/Ruby/page/show/RubyOptimization

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : [email protected]
New Zealand

“We have more to fear from
The Bungling of the Incompetent
Than from the Machinations of the Wicked.” (source unknown)

On 8/25/06, Oliver B. [email protected] wrote:

   # return
 end

end
end

ARGF.each { look_for_begin }

ARGF.each iterates over the LINES of the files passed on the command line, not over the files.
Try ARGV for the file names.

You can see the behaviour if you add
puts "new file"
before the while loop.

Oliver B. wrote:

ARGF.each { look_for_begin }

The problem is that you’ve got two loops here. ARGF.each calls
look_for_begin once for each line of each file passed in. Then within
look_for_begin, it has another loop that runs until there are no more
lines to process. So what happens is this: without the return
statement, the look_for_begin function is called once, and its while
loop runs through all of the lines until there are no more to
process. The function is not called again, because the ARGF.each loop
terminates immediately, because all lines have been read.

If you put in the return, the while loop runs until it finds the first
“begin”. Then the function returns. Then the ARGF.each loop reads the
next line and calls look_for_begin again, and the function picks up
where reading left off, continuing with the lines after the one where
“begin” was found.

So, either way, every line of every file gets read. The
only difference you cause by adding and removing the return statement
is whether the lines are read in one call to look_for_begin,
or over multiple calls.
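The effect can be reproduced without any files. A minimal sketch, assuming a StringIO can stand in for ARGF (in both cases the outer and inner loops read from one shared stream); lines_consumed and the sample text are made up for illustration:

```ruby
require 'stringio'

# Count how many lines are pulled off a shared stream when an outer
# each_line loop and an inner gets loop both read from it.
def lines_consumed(text, stop_at_first_match)
  io = StringIO.new(text)
  consumed = 0
  io.each_line do |_outer_line|   # the outer loop takes one line ...
    consumed += 1
    while (line = io.gets)        # ... and the inner loop keeps reading
      consumed += 1
      break if stop_at_first_match && line =~ /^begin/
    end
  end
  consumed
end

sample = "text\nbegin 644 a.bin\ndata\nbegin 644 b.bin\ndata\nend\n"
puts lines_consumed(sample, false)  # prints 6
puts lines_consumed(sample, true)   # prints 6
```

Stopping the inner loop early changes how the six lines are split between the two loops, not how many of them are read.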

I think what you wanted to do is use ARGV.each instead of ARGF.each, to
iterate over the list of file names, and pass each file name into the
look_for_begin function. Within the function, you’d process only the
lines in that file. In other words, like this:

def look_for_begin(fn)
  IO.foreach(fn) do |line|
    if line =~ /^begin/
      puts line
      return
    end
  end
end

ARGV.each {|fn| look_for_begin(fn) }

Oliver B. wrote:

   # return
 end

end
end

ARGF.each { look_for_begin }

I have files with uuencoded and yencoded
data, and some text-only files: 188 files in all,
with a total size of about 16 MB.

If you have enough RAM to slurp whole files:

while text = gets( nil )
  # text contains the entire contents of one file.
  if text =~ /^begin.*/
    puts "In #{ $FILENAME }, found:"
    puts $&
  end
end
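The same idea can be sketched per file name rather than through ARGF. File.binread slurps one file, and String#[] with a regexp returns the first match or nil. The name first_begin_slurped is made up, and binread is used on the assumption that yEnc data is not valid text in the default encoding:

```ruby
# Hypothetical helper: read the whole file into memory and return the
# first line that starts with "begin" (without its newline), or nil.
def first_begin_slurped(path)
  File.binread(path)[/^begin.*/]
end

# Report the first match for each file named on the command line.
ARGV.each do |f|
  found = first_begin_slurped(f)
  puts "In #{f}, found:", found if found
end
```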

Karl von Laudermann wrote:

Oliver B. wrote:

end

end
end

ARGV.each {|fn| look_for_begin(fn) }

I think Oliver wanted to iterate over all lines in the files whose names
were given as command line arguments. Something like:

ARGF.each do |line|
  if line =~ /^begin/
    puts line
    break
  end
end

Kind regards

robert