Help me understand why the Ruby block is slower than without


#1

I just wrote my first Ruby script. I’m an experienced C and perl
programmer, so please, if it looks too much like these languages and not
Ruby, let me know. I’ve got a 100K word list (Linux dictionary) on my
Mac and am opening it then looking for any words that are exactly 10
letters long with no letters repeating (‘profligate\n’) == 11 is a
match. After I wrote my first version I did some playing. I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array. I then tried putting the File.open in a block and found
that this was much slower, even if I subtract out the time for the open,
which I assume is an error in how the profile is counting total time.

Here’s the faster version:

f = File.open("./words")
begin
while f.gets
if $.length == 11
ar = $
.split(//)
if ar.uniq! == nil
print “#{ar.to_s}”
end
end
end
rescue EOFError
f.close
end

And here’s the slower block version:

File.open("./words") { |f|
while f.gets
if $.length == 11
ar = $
.split(//)
if ar.uniq! == nil
print “#{ar.to_s}”
end
end
end
}

Again, the words file is just a list of about 100K unique words from the
dict command or similar on *nix…

Any critique welcome and enlightenment is encouraged.
Thanks!


#2

Alan B. wrote:

that this was much slower, even if I subtract out the time for the open,
print “#{ar.to_s}”
while f.gets
dict command or similar on *nix…

Any critique welcome and enlightenment is encouraged.
Thanks!

File.open(“wordlist”) { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}


#3

File.open(“wordlist”) { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?
AB


#4

On Mar 10, 2006, at 5:57 PM, Alan B. wrote:

rescue EOFError
f.close
end

I’m guessing that

print "#{ar.to_s}"

is what is slowing you down. It results in converting
each element of the array into a string (at least 11
extra method calls) and then concatenating the results.
Kind of a waste when you’ve got the result already sitting in $_.

Also, calling to_s to convert an object to a string within a string
interpolation
block is redundant.

print "#{ar}"

works and then you realize that you don’t need the interpolation so

print ar

is even better. Understanding this is what David Black called a
‘Ruby right of passage’. At least I think it was David who said that
recently. I’m too lazy to google for the reference at the moment.

Gary W.


#5

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?
AB

I mis-spoke. Not a factor of 10 faster, just marginally. I had
“wordlist” in my directory as a list of the unique 10 letter words.
I do like the code better still, but with out the block, it’s still much
faster. Also using uniq! rather than size is quicker than taking the
size twice.

My current fastest script:

f= File.open("./words")
begin
while w = f.gets
puts w if w.size == 11 && w.split(//).uniq! == nil
end
rescue EOFError
f.close
end

Not measurably faster than the first one, but seems better and more Ruby
like to me.


#6

Alan B. wrote:

I mis-spoke. Not a factor of 10 faster, just marginally. I had
“wordlist” in my directory as a list of the unique 10 letter words.
I do like the code better still, but with out the block, it’s still much
faster. Also using uniq! rather than size is quicker than taking the
size twice.

Solely for my own amusement, since I’m still trying teach myself Ruby…

File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact!.each {|x| puts x }


#7

Mark Devlin wrote:

Solely for my own amusement, since I’m still trying teach myself Ruby…

File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact!.each {|x| puts x }

Mark:
Thanks for doing this way. I had thought about trying to read it in and
split it up, but didn’t know how to do it as I’ve only read a bit of the
pikaxe book. On my two processor G5 with 4 gb of memory, your version
is about 30% slower than the fastest method above. 18.69 vs 12.52
seconds. I intend to look a bit closer at your code and see if I can
see another way to speed it up.

Gary:
Thanks for your input also. I saw the redundancy when William J.
gave me input, but really don’t fully understand arrays vs strings in
Ruby yet and also the differences in print vs puts and other types of
output. I’ll read through pikaxe a bit more right now.

Others:
Any input as to why it runs slower inside the file block? Have I
overlooked something simple?


#8

Alan B. removed_email_address@domain.invalid writes:

File.open(“wordlist”) { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?

I don’t see much of a slowdown.


g@crash:~/tmp$ cat read-slow.rb
File.open("./words") { |f|
while f.gets
if $.length == 11
ar = $
.split(//)
if ar.uniq! == nil
print “#{ar.to_s}”
end
end
end
}
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.56user 0.01system 0:02.64elapsed 97%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.55user 0.01system 0:02.57elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.54user 0.01system 0:02.56elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ cat read-fast.rb
f = File.open("./words")
begin
while f.gets
if $.length == 11
ar = $
.split(//)
if ar.uniq! == nil
print “#{ar.to_s}”
end
end
end
rescue EOFError
f.close
end
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.51user 0.01system 0:02.54elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.50user 0.01system 0:02.56elapsed 97%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.51user 0.01system 0:02.53elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps


There’s a bit of a slowdown, but note that in your “fast” algo, the
stream is never closed, since IO#gets never throws EOFError. Do `ri
IO#gets’ for the method’s documentation. :slight_smile:

Another speedup: replace:

w.split(//).uniq.size == 11

with:

w !~ /(.).*\1/

It’s faster since there’s less intermediate diddlage, but
theoretically it shouldn’t scale as well. You’d have to increase your
“11” quite a lot to notice it though I think.

More shell dump.


g@crash:~/tmp$ cat read-one.rb
File.open(“words”) { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.54user 0.02system 0:02.57elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.54user 0.01system 0:02.56elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.55user 0.01system 0:02.58elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ cat read-two.rb
File.open(“words”) { |f|
while w = f.gets
puts w if w.size==11 && w !~ /(.).*\1/
end
}
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.23user 0.01system 0:01.25elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.27user 0.01system 0:01.29elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.27user 0.02system 0:01.30elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$
g@crash:~/tmp$
g@crash:~/tmp$ diff out-one out-two
g@crash:~/tmp$


#9

Hi –

On Sat, 11 Mar 2006, removed_email_address@domain.invalid wrote:

is even better. Understanding this is what David Black called a
‘Ruby right of passage’. At least I think it was David who said that
recently. I’m too lazy to google for the reference at the moment.

You’re right but spelled rite wrong, Wright :slight_smile:

David


David A. Black (removed_email_address@domain.invalid)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

“Ruby for Rails” chapters now available
from Manning Early Access Program! http://www.manning.com/books/black


#10

Hi,

From: “Mark Devlin” removed_email_address@domain.invalid

Solely for my own amusement, since I’m still trying teach myself Ruby…

File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact!.each {|x| puts x }

One detail here is the file handle is not being closed. A few
alternatives
that close the file:

open with block

File.open("./words"){|f| f.read.split.collect! {|x| x if x.length == 10
&& x.split(//).uniq! == nil}.compact.each {|x| puts x } }

File.read method

File.read("./words").split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact.each {|x| puts x }

IO.readlines method

IO.readlines("./words").collect! {|x| x if x.length == 11 &&
x.split(//).uniq! == nil}.compact.each {|x| puts x }

Note, used length 11 because readlines keeps linefeeds; also changed all
to
non-bang form of compact, as compact! would return nil if it didn’t do
any work.
(I.e. if all words in the input satisfied the criteria, collect! would
have returned
nil, and we’d have gotten a NoMethodError: undefined method `each’ for
nil:NilClass.)

Regards,

Bill


#11

George O. wrote:

Another speedup: replace:

w.split(//).uniq.size == 11

with:

w !~ /(.).*\1/

It’s faster since there’s less intermediate diddlage, but
theoretically it shouldn’t scale as well. You’d have to increase your
“11” quite a lot to notice it though I think.

George:
Much thanks, I think that you’ve proved what I suspected, that Ruby is
counting the time wrong with the profile (ruby -r profile script.rb) as
when I subtract the profile time for the File.open block it’s only a bit
slower than the faster call. I appreciate all the help and will try to
ask a more difficult question next time.
I’ve always been fairly strong with regexes, but I’d have never thought
to use one here. Thanks for that as well.

David:
Thanks for chiming in, I’ll check out your links as well.

Alan


#12

Stephen W. wrote:

On Mar 10, 2006, at 4:08 PM, Alan B. wrote:

Not measurably faster than the first one, but seems better and more
Ruby
like to me.

I’m curious why you see it so? Personally, seems less Ruby-like to me.

–Steve

Steve:
Let me be the first to say that I certainly don’t understand Ruby idiom
yet, that’s why I’m using the word seems in my earlier reply. I may
feel quite different about the newer code in a few days… I like it
better because it’s more succinct as there’re fewer intermediate steps
and because I can see what it’s doing quite quickly. My code looks a
bit like that K&R C that I learned in the early 80s. You can even see
my OTB style and probably guess that I still use vi(m) :). I’d rather
learn the Ruby idiom and that’s part of what I was asking here.
Learning that ‘puts’ doesn’t throw the EOF as I was expecting, was an
important lesson and so was the rite of passage that Mr Wright pointed
out. The Ruby language feels good to me, and I’d like it to feel as
comfortable as C does to me, so thanks again for all the kind help.

Bill:
Thanks for adding a bit more to reading it all in and processing script.

I like code that’s quite readable and will take that over code that’s
faster but not so readable in any case where I can get away with it.
That said, I still really like the regex that checks to see if the
string is unique. No conversion to an array and no need to re-check the
size of it all so the current piece of code I like best is:

f= File.open("./words").each { |w|
puts w if w.size == 11 && w !~ /(.).*\1/
}

It’s also the fastest that I’ve tested. I would, however, add a comment
to explain what the regex does so I wouldn’t have to stare at it for a
minute or two to figure it out after a few months away.
Alan


#13

On Mar 10, 2006, at 4:08 PM, Alan B. wrote:

Not measurably faster than the first one, but seems better and more
Ruby
like to me.

I’m curious why you see it so? Personally, seems less Ruby-like to me.

–Steve


#14

Alan B. wrote:

f= File.open("./words").each { |w|
puts w if w.size == 11 && w !~ /(.).*\1/
}

Oops, don’t need the f= on this one. me bad.


#15

On 11 Mar 2006, at 02:13, George O. wrote:

Another speedup: replace:

w.split(//).uniq.size == 11

with:

w !~ /(.).*\1/

!? :slight_smile: How on earth does that work? Every time I think I’ve sort of
got the hang of regexp, they spring something new on me.

I was also going to ask why everyone was doing “split( // )” instead
of “split( ‘’ )”?

  • oooh, coffee’s ready…

Cheers,
Benjohn


#16

On Mar 10, 2006, at 5:28 PM, William J. wrote:

File.open(“wordlist”) { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}

That’s what the foreach() iterator is for:

File.foreach(“wordlist”) do |word|
puts word if word.chomp.split("").uniq.size == 10
end

James Edward G. II


#17

On Mar 10, 2006, at 4:57 PM, Alan B. wrote:

I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array.

This sounds like premature optimization. Remember, you start
worrying about speed when the code gets too slow. Not before.

James Edward G. II


#18

Benjohn B. wrote:

On 11 Mar 2006, at 02:13, George O. wrote:

Another speedup: replace:

w.split(//).uniq.size == 11

with:

w !~ /(.).*\1/

!? :slight_smile: How on earth does that work? Every time I think I’ve sort of
got the hang of regexp, they spring something new on me.

I was also going to ask why everyone was doing “split( // )” instead
of “split( ‘’ )”?

  • oooh, coffee’s ready…

Cheers,
Benjohn

Benjohn:
A . matches any char except the \n, putting it in (), makes it save in
\1 each time it matches, the .* matches zero or more chars that’s not a
\n, so if we try to match a string such as “profligate\n” the regex
would first look for §.\p, with the second p being anywhere in the
string then ®.
®, etc. A string with a repeating set of letters
“wristwatch\n” matches (w)rist\w and returns a match. I highly
recommend O’reilly’s “Mastering Regular Expressions”, I’ve only read the
first edition, but it’s an eye-opener (or maybe the opposite if you try
to read it in bed :))

A note for those following along in the DOS world, the dos string ends
\r\n and won’t return the expected result as a matching DOS string will
need to be 12 long. This sacrifices portability for speed (I didn’t
want to use chop after each gets).

As to split, I just used what I’m used to from perl. It’s an empty
pattern and makes sense to me that way.
Alan


#19

James G. wrote:

On Mar 10, 2006, at 5:28 PM, William J. wrote:

File.open(“wordlist”) { |f|
while w = f.gets
puts w if w.size==11 && w.split(//).uniq.size == 11
end
}

That’s what the foreach() iterator is for:

File.foreach(“wordlist”) do |word|
puts word if word.chomp.split("").uniq.size == 10
end

James Edward G. II

James:
This code doesn’t work on my Mac. I do have a version that uses the
file block and each/foreach above, but I’m suspecting that when the
string becomes an array after the split something’s breaking down as I
get words of all sizes out???
Thanks,
Alan


#20

On Mar 11, 2006, at 10:09 AM, Alan B. wrote:

This code doesn’t work on my Mac.

Something is fishy there, for it works just fine on my own Mac:

Neo:~/Desktop$ ls
tens.rb wordlist
Neo:~/Desktop$ cat wordlist
one
two
three
0123456789
five
0123456789
Neo:~/Desktop$ cat tens.rb
#!/usr/local/bin/ruby -w

File.foreach(“wordlist”) do |word|
puts word if word.chomp.split("").uniq.size == 10
end

END
Neo:~/Desktop$ ruby tens.rb
0123456789
0123456789

James Edward G. II