Forum: Ruby Help me understand why the Ruby block is slower than without

Alan Burch (orotone)
on 2006-03-10 23:57
I just wrote my first Ruby script.  I'm an experienced C and perl
programmer, so please, if it looks too much like these languages and not
Ruby, let me know.  I've got a 100K word list (Linux dictionary) on my
Mac and am opening it, then looking for any words that are exactly 10
letters long with no letters repeating (so 'profligate\n', with a length
of 11, is a match).  After I wrote my first version I did some playing.  I first saw
that the array class mixed in enumerable and that I could use the to_a
call from there, but a quick check using -r profile showed that my
original call to split was a much quicker way to convert from a string
to an array.  I then tried putting the File.open in a block and found
that this was much slower, even if I subtract out the time for the open,
which I assume is an error in how the profile is counting total time.

Here's the faster version:

f = File.open("./words")
begin
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
rescue EOFError
  f.close
end

And here's the slower block version:

File.open("./words") { |f|
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
}

Again, the words file is just a list of about 100K unique words from the
dict command or similar on *nix....

Any critique welcome and enlightenment is encouraged.
Thanks!
William James (Guest)
on 2006-03-11 00:31
(Received via mailing list)
Alan Burch wrote:
> that this was much slower, even if I subtract out the time for the open,
>         print "#{ar.to_s}"
>   while f.gets
> dict command or similar on *nix....
>
> Any critique welcome and enlightenment is encouraged.
> Thanks!


File.open("wordlist") { |f|
  while w = f.gets
     puts w  if w.size==11 && w.split(//).uniq.size == 11
  end
}
Alan Burch (orotone)
on 2006-03-11 00:38
> File.open("wordlist") { |f|
>   while w = f.gets
>      puts w  if w.size==11 && w.split(//).uniq.size == 11
>   end
> }

Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
Others, any comments on the block slow down?
AB
unknown (Guest)
on 2006-03-11 01:03
(Received via mailing list)
On Mar 10, 2006, at 5:57 PM, Alan Burch wrote:
> rescue EOFError
>   f.close
> end

I'm guessing that

	print "#{ar.to_s}"

is what is slowing you down. It results in converting
each element of the array into a string (at least 11
extra method calls) and then concatenating the results.
Kind of a waste when you've got the result already sitting in $_.

Also, calling to_s to convert an object to a string within a string
interpolation
block is redundant.

	print "#{ar}"

works and then you realize that you don't need the interpolation so

	print ar

is even better.  Understanding this is what David Black called a
'Ruby right of passage'.  At least I think it was David who said that
recently.  I'm too lazy to google for the reference at the moment.


Gary Wright
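
A minimal sketch of the redundancy described above, assuming a current Ruby. The names `word`, `chars`, `via_to_s`, `via_interp` and `rejoined` are illustrative and not from the original script. One version caveat: on the Ruby of 2006 (1.8) `Array#to_s` concatenated the elements, while later versions return the `inspect` form, so `join` is the portable way to get the word back out of the array.

```ruby
word  = "profligate\n"
chars = word.split(//)            # ["p", "r", "o", ...] plus the trailing "\n"

# Interpolation already calls to_s on the object, so the explicit call adds nothing.
via_to_s   = "#{chars.to_s}"
via_interp = "#{chars}"

# join rebuilds the original string on any Ruby version,
# but that string was already available before the split.
rejoined = chars.join

puts via_to_s == via_interp
puts rejoined == word
```

Since the line is already in hand as a string, printing it directly avoids both the split-and-rejoin and the interpolation.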
Alan Burch (orotone)
on 2006-03-11 01:08
>
> Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
> Others, any comments on the block slow down?
> AB

I mis-spoke.  Not a factor of 10 faster, just marginally.  I had
"wordlist" in my directory as a list of the unique 10 letter words.
I do like the code better still, but without the block, it's still much
faster.  Also using uniq! rather than size is quicker than taking the
size twice.

My current fastest script:

f= File.open("./words")
begin
  while w = f.gets
    puts w if w.size == 11 && w.split(//).uniq! == nil
  end
rescue EOFError
  f.close
end

Not measurably faster than the first one, but seems better and more Ruby
like to me.
Mark Devlin (Guest)
on 2006-03-11 01:58
Alan Burch wrote:

> I mis-spoke.  Not a factor of 10 faster, just marginally.  I had
> "wordlist" in my directory as a list of the unique 10 letter words.
> I do like the code better still, but without the block, it's still much
> faster.  Also using uniq! rather than size is quicker than taking the
> size twice.

Solely for my own amusement, since I'm still trying to teach myself Ruby...

File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact!.each {|x| puts x }
Alan Burch (orotone)
on 2006-03-11 02:32
Mark Devlin wrote:

>
> Solely for my own amusement, since I'm still trying to teach myself Ruby...
>
> File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
> x.split(//).uniq! == nil}.compact!.each {|x| puts x }

Mark:
Thanks for doing it this way.  I had thought about trying to read it in and
split it up, but didn't know how to do it as I've only read a bit of the
Pickaxe book.  On my two-processor G5 with 4 GB of memory, your version
is about 50% slower than the fastest method above:  18.69 vs 12.52
seconds.  I intend to look a bit closer at your code and see if I can
see another way to speed it up.

Gary:
Thanks for your input also.  I saw the redundancy when William James
gave me input, but really don't fully understand arrays vs strings in
Ruby yet and also the differences in print vs puts and other types of
output.  I'll read through the Pickaxe a bit more right now.

Others:
Any input as to why it runs slower inside the file block? Have I
overlooked something simple?
George Ogata (Guest)
on 2006-03-11 03:14
(Received via mailing list)
Alan Burch <orotone@gmail.com> writes:

>> File.open("wordlist") { |f|
>>   while w = f.gets
>>      puts w  if w.size==11 && w.split(//).uniq.size == 11
>>   end
>> }
>
> Ok, factor of 10 faster, and more Ruby like, much and many Thanks!
> Others, any comments on the block slow down?

I don't see much of a slowdown.

----------------------------------------------------------------------

g@crash:~/tmp$ cat read-slow.rb
File.open("./words") { |f|
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
}
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.56user 0.01system 0:02.64elapsed 97%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.55user 0.01system 0:02.57elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-slow.rb > out-slow
2.54user 0.01system 0:02.56elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+550minor)pagefaults 0swaps
g@crash:~/tmp$ cat read-fast.rb
f = File.open("./words")
begin
  while f.gets
    if $_.length == 11
      ar = $_.split(//)
      if ar.uniq! == nil
        print "#{ar.to_s}"
      end
    end
  end
rescue EOFError
  f.close
end
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.51user 0.01system 0:02.54elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.50user 0.01system 0:02.56elapsed 97%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-fast.rb > out-fast
2.51user 0.01system 0:02.53elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+544minor)pagefaults 0swaps

----------------------------------------------------------------------

There's a bit of a slowdown, but note that in your "fast" algo, the
stream is never closed, since IO#gets never throws EOFError.  Do `ri
IO#gets' for the method's documentation.  :-)
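
A small sketch of that behaviour, using a `StringIO` as a stand-in for the words file (the sample contents and variable names are assumptions made for the example). It shows `gets` returning `nil` at end of input, and `readline` as the variant that does raise `EOFError`.

```ruby
require "stringio"

io = StringIO.new("alpha\nbeta\n")   # stands in for the words file

lines = []
while (line = io.gets)               # gets returns nil at end of input, which ends the loop
  lines << line
end

after_eof = io.gets                  # still nil; no exception is raised

# readline is the variant that raises at end of input.
raised = false
begin
  io.readline
rescue EOFError
  raised = true
end

p lines
p after_eof
p raised
```

So a `rescue EOFError` around a `gets` loop is dead code: the loop simply ends, and whatever sits in the rescue clause never runs.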

Another speedup:  replace:

  w.split(//).uniq.size == 11

with:

  w !~ /(.).*\1/

It's faster since there's less intermediate diddlage, but
theoretically it shouldn't scale as well.  You'd have to increase your
"11" quite a lot to notice it though I think.

More shell dump.

----------------------------------------------------------------------

g@crash:~/tmp$ cat read-one.rb
File.open("words") { |f|
  while w = f.gets
     puts w  if w.size==11 && w.split(//).uniq.size == 11
  end
}
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.54user 0.02system 0:02.57elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.54user 0.01system 0:02.56elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-one.rb > out-one
2.55user 0.01system 0:02.58elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+548minor)pagefaults 0swaps
g@crash:~/tmp$ cat read-two.rb
File.open("words") { |f|
  while w = f.gets
    puts w  if w.size==11 && w !~ /(.).*\1/
  end
}
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.23user 0.01system 0:01.25elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.27user 0.01system 0:01.29elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$ /usr/bin/time ruby read-two.rb > out-two
1.27user 0.02system 0:01.30elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (0major+713minor)pagefaults 0swaps
g@crash:~/tmp$
g@crash:~/tmp$
g@crash:~/tmp$ diff out-one out-two
g@crash:~/tmp$
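
The empty `diff` above says the two tests select the same lines. As a rough cross-check, here is a sketch that wraps each test in a method and runs both over a few sample lines. The method names and the sample words are made up for the illustration; no timings are claimed here.

```ruby
# Two ways to test "11 characters, none repeated" on a line that still ends in "\n".
def unique_by_array?(line)
  line.size == 11 && line.split(//).uniq.size == 11
end

def unique_by_regex?(line)
  # (.) captures one character; .* skips ahead; \1 demands the same character again.
  line.size == 11 && line !~ /(.).*\1/
end

samples = ["profligate\n", "wristwatch\n", "background\n", "short\n", "abbreviations\n"]
samples.each do |line|
  puts "#{line.chomp}: #{unique_by_array?(line)} #{unique_by_regex?(line)}"
end
```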
unknown (Guest)
on 2006-03-11 03:38
(Received via mailing list)
Hi --

On Sat, 11 Mar 2006, gwtmp01@mac.com wrote:

> is even better.  Understanding this is what David Black called a
> 'Ruby right of passage'.  At least I think it was David who said that
> recently.  I'm too lazy to google for the reference at the moment.

You're right but spelled rite wrong, Wright :-)


David

--
David A. Black (dblack@wobblini.net)
Ruby Power and Light, LLC (http://www.rubypowerandlight.com)

"Ruby for Rails" chapters now available
from Manning Early Access Program! http://www.manning.com/books/black
Alan Burch (orotone)
on 2006-03-11 04:10
George Ogata wrote:

> Another speedup:  replace:
>
>   w.split(//).uniq.size == 11
>
> with:
>
>   w !~ /(.).*\1/
>
> It's faster since there's less intermediate diddlage, but
> theoretically it shouldn't scale as well.  You'd have to increase your
> "11" quite a lot to notice it though I think.
>

George:
Much thanks, I think that you've proved what I suspected, that Ruby is
counting the time wrong with the profile (ruby -r profile script.rb) as
when I subtract the profile time for the File.open block it's only a bit
slower than the faster call.  I appreciate all the help and will try to
ask a more difficult question next time.
I've always been fairly strong with regexes, but I'd have never thought
to use one here.  Thanks for that as well.

David:
Thanks for chiming in, I'll check out your links as well.

Alan
Bill Kelly (Guest)
on 2006-03-11 05:42
(Received via mailing list)
Hi,

From: "Mark Devlin" <OnlyMostlyDead@gmail.com>
>
> Solely for my own amusement, since I'm still trying to teach myself Ruby...
>
> File.open("./words").read.split.collect! {|x| x if x.length == 10 &&
> x.split(//).uniq! == nil}.compact!.each {|x| puts x }

One detail here is the file handle is not being closed.  A few
alternatives that close the file:

# open with block
File.open("./words"){|f| f.read.split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact.each {|x| puts x } }

# File.read method
File.read("./words").split.collect! {|x| x if x.length == 10 &&
x.split(//).uniq! == nil}.compact.each {|x| puts x }

# IO.readlines method
IO.readlines("./words").collect! {|x| x if x.length == 11 &&
x.split(//).uniq! == nil}.compact.each {|x| puts x }

Note, used length 11 because readlines keeps linefeeds; also changed all to
non-bang form of compact, as compact! would return nil if it didn't do any work.
(I.e. if all words in the input satisfied the criteria, compact! would have
returned nil, and we'd have gotten a NoMethodError: undefined method `each' for
nil:NilClass.)
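
A short sketch of the bang-method convention behind that note, with made-up sample arrays (all variable names are illustrative). It also shows `select`, offered here only as one possible way to avoid the collect!/compact pairing.

```ruby
clean = %w[alpha beta]
holes = ["alpha", nil, "beta"]

bang_on_clean  = clean.dup.compact!   # nil: nothing was removed
bang_on_holes  = holes.dup.compact!   # ["alpha", "beta"]: something was removed
plain_on_clean = clean.compact        # ["alpha", "beta"]: always an array

# The same convention is why `uniq! == nil` works as a "no duplicates" test.
no_dups  = "abc".split(//).uniq!      # nil
had_dups = "abca".split(//).uniq!     # ["a", "b", "c"]

# select keeps only the elements for which the block is true, so no nils appear.
picked = %w[ab abc abcd].select { |w| w.length == 3 }

p bang_on_clean, bang_on_holes, plain_on_clean, no_dups, had_dups, picked
```

Chaining a call onto a bang method is only safe when a `nil` result has been ruled out, which is why the non-bang `compact` is used in the alternatives above.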


Regards,

Bill
Stephen Waits (Guest)
on 2006-03-11 06:06
(Received via mailing list)
On Mar 10, 2006, at 4:08 PM, Alan Burch wrote:

>
> Not measurably faster than the first one, but seems better and more
> Ruby
> like to me.

I'm curious why you see it so?  Personally, seems less Ruby-like to me.

--Steve
Alan Burch (orotone)
on 2006-03-11 07:40
Stephen Waits wrote:
> On Mar 10, 2006, at 4:08 PM, Alan Burch wrote:
>
>>
>> Not measurably faster than the first one, but seems better and more
>> Ruby
>> like to me.
>
> I'm curious why you see it so?  Personally, seems less Ruby-like to me.
>
> --Steve

Steve:
Let me be the first to say that I certainly don't understand Ruby idiom
yet, that's why I'm using the word seems in my earlier reply.  I may
feel quite different about the newer code in a few days...  I like it
better because it's more succinct as there're fewer intermediate steps
and because I can see what it's doing quite quickly.  My code looks a
bit like that K&R C that I learned in the early 80s.  You can even see
my OTB style and probably guess that I still use vi(m) :).  I'd rather
learn the Ruby idiom and that's part of what I was asking here.
Learning that 'gets' doesn't throw the EOFError as I was expecting, was an
important lesson and so was the rite of passage that Mr Wright pointed
out.  The Ruby language feels good to me, and I'd like it to feel as
comfortable as C does to me, so thanks again for all the kind help.

Bill:
Thanks for adding a bit more to reading it all in and processing script.

I like code that's quite readable and will take that over code that's
faster but not so readable in any case where I can get away with it.
That said, I still really like the regex that checks to see if the
string is unique.  No conversion to an array and no need to re-check the
size of it all so the current piece of code I like best is:

f= File.open("./words").each { |w|
    puts w if w.size == 11 && w !~ /(.).*\1/
}

It's also the fastest that I've tested.  I would, however, add a comment
to explain what the regex does so I wouldn't have to stare at it for a
minute or two to figure it out after a few months away.
Alan
Alan Burch (orotone)
on 2006-03-11 07:45
Alan Burch wrote:

> f= File.open("./words").each { |w|
>     puts w if w.size == 11 && w !~ /(.).*\1/
> }
>

Oops, don't need the f= on this one.  me bad.
Benjohn Barnes (Guest)
on 2006-03-11 11:39
(Received via mailing list)
On 11 Mar 2006, at 02:13, George Ogata wrote:

> Another speedup:  replace:
>
>   w.split(//).uniq.size == 11
>
> with:
>
>   w !~ /(.).*\1/

!? :) How on earth does that work? Every time I think I've sort of
got the hang of regexp, they spring something new on me.

I was also going to ask why everyone was doing "split( // )" instead
of "split( '' )"?

- oooh, coffee's ready...

Cheers,
	Benjohn
James Gray (bbazzarrakk)
on 2006-03-11 16:38
(Received via mailing list)
On Mar 10, 2006, at 5:28 PM, William James wrote:

> File.open("wordlist") { |f|
>   while w = f.gets
>      puts w  if w.size==11 && w.split(//).uniq.size == 11
>   end
> }

That's what the foreach() iterator is for:

File.foreach("wordlist") do |word|
   puts word if word.chomp.split("").uniq.size == 10
end

James Edward Gray II
James Gray (bbazzarrakk)
on 2006-03-11 16:42
(Received via mailing list)
On Mar 10, 2006, at 4:57 PM, Alan Burch wrote:

> I first saw
> that the array class mixed in enumerable and that I could use the to_a
> call from there, but a quick check using -r profile showed that my
> original call to split was a much quicker way to convert from a string
> to an array.

This sounds like premature optimization.  Remember, you start
worrying about speed when the code gets too slow.  Not before.

James Edward Gray II
Alan Burch (orotone)
on 2006-03-11 16:44
Benjohn Barnes wrote:
> On 11 Mar 2006, at 02:13, George Ogata wrote:
>
>> Another speedup:  replace:
>>
>>   w.split(//).uniq.size == 11
>>
>> with:
>>
>>   w !~ /(.).*\1/
>
> !? :) How on earth does that work? Every time I think I've sort of
> got the hang of regexp, they spring something new on me.
>
> I was also going to ask why everyone was doing "split( // )" instead
> of "split( '' )"?
>
> - oooh, coffee's ready...
>
> Cheers,
> 	Benjohn

Benjohn:
A . matches any char except the \n, putting it in () makes it save in
\1 each time it matches, and the .* matches zero or more chars that are not
\n.  So if we try to match a string such as "profligate\n", the regex
first looks for (p).*p, with the second p being anywhere later in the
string, then (r).*r, etc.  A string with a repeated letter such as
"wristwatch\n" matches (w)rist followed by w and returns a match.  I highly
recommend O'Reilly's "Mastering Regular Expressions", I've only read the
first edition, but it's an eye-opener (or maybe the opposite if you try
to read it in bed :))
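
To make the walk-through concrete, here is a small sketch that inspects what the pattern actually matches. The variable names are illustrative; the two sample words are the ones used in the explanation.

```ruby
pattern = /(.).*\1/

m = pattern.match("wristwatch\n")
puts m[0]     # prints "wristw", the span from the first w to its repeat
puts m[1]     # prints "w", the captured character

no_match = pattern.match("profligate\n")
p no_match    # prints nil, because no character occurs twice
```

The trailing "\n" plays no part in either result, since `.` does not match a newline by default.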

A note for those following along in the DOS world: a DOS line ends in
\r\n, so this won't return the expected result, as a matching DOS line
will be 12 long.  This sacrifices portability for speed (I didn't
want to use chop after each gets).
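
For anyone who does need the portable version, a sketch of one way to do it, trading a little speed for `chomp`. The method name `ten_unique_letters?` is made up for the example, and it assumes the default record separator is unchanged.

```ruby
def ten_unique_letters?(line)
  word = line.chomp                  # strips "\n", "\r\n" or "\r" when $/ is the default
  word.length == 10 && word !~ /(.).*\1/
end

puts ten_unique_letters?("profligate\n")      # Unix line ending
puts ten_unique_letters?("profligate\r\n")    # DOS line ending
puts ten_unique_letters?("profligate")        # last line without a terminator
puts ten_unique_letters?("wristwatch\r\n")    # repeated letters
```

Checking the chomped length against 10 also covers a final line that has no terminator at all.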

As to split, I just used what I'm used to from perl.  It's an empty
pattern and makes sense to me that way.
Alan
Alan Burch (orotone)
on 2006-03-11 16:58
James Gray wrote:
> On Mar 10, 2006, at 4:57 PM, Alan Burch wrote:
>
>> I first saw
>> that the array class mixed in enumerable and that I could use the to_a
>> call from there, but a quick check using -r profile showed that my
>> original call to split was a much quicker way to convert from a string
>> to an array.
>
> This sounds like premature optimization.  Remember, you start
> worrying about speed when the code gets too slow.  Not before.
>
> James Edward Gray II

James:
I'm going to have to respectfully disagree.  I guess maybe I'm getting
too old to code, but I first learned assembler and then C.  Assembler
served me well in that I knew how to write the fastest, least resource
intensive C. Back in the early 80s on a VAX running V7 UNIX that was
more important than maintainability.  As I've continued my craft and
learned many other languages, I've found that truly understanding what's
happening "under the hood" of any language was the key to writing code
that didn't break, executed quickly, and kept clients happy.
Furthermore, there's something in me that makes me better love the
language when I completely master it.  I don't believe the language is
mastered until one understands things such as why one construct executes
quicker than another.  I can now picture how a mix-in works and why
calling the to_a mix-in is a slower construct.  I don't understand all
the nuances of that yet, but I intend to and that will make Ruby that
much more enjoyable to me.
Thanks for a different insight,
Alan
Alan Burch (orotone)
on 2006-03-11 17:09
James Gray wrote:
> On Mar 10, 2006, at 5:28 PM, William James wrote:
>
>> File.open("wordlist") { |f|
>>   while w = f.gets
>>      puts w  if w.size==11 && w.split(//).uniq.size == 11
>>   end
>> }
>
> That's what the foreach() iterator is for:
>
> File.foreach("wordlist") do |word|
>    puts word if word.chomp.split("").uniq.size == 10
> end
>
> James Edward Gray II

James:
This code doesn't work on my Mac.  I do have a version that uses the
file block and each/foreach above, but I'm suspecting that when the
string becomes an array after the split something's breaking down as I
get words of all sizes out???
Thanks,
Alan
James Gray (bbazzarrakk)
on 2006-03-11 17:46
(Received via mailing list)
On Mar 11, 2006, at 10:09 AM, Alan Burch wrote:

> This code doesn't work on my Mac.

Something is fishy there, for it works just fine on my own Mac:

Neo:~/Desktop$ ls
tens.rb         wordlist
Neo:~/Desktop$ cat wordlist
one
two
three
0123456789
five
0123456789
Neo:~/Desktop$ cat tens.rb
#!/usr/local/bin/ruby -w

File.foreach("wordlist") do |word|
    puts word if word.chomp.split("").uniq.size == 10
end

__END__
Neo:~/Desktop$ ruby tens.rb
0123456789
0123456789

James Edward Gray II
James Gray (bbazzarrakk)
on 2006-03-11 17:50
(Received via mailing list)
On Mar 11, 2006, at 9:58 AM, Alan Burch wrote:

>>
>> This sounds like premature optimization.  Remember, you start
>> worrying about speed when the code gets too slow.  Not before.
>>
>> James Edward Gray II
>
> James:
> I'm going to have to respectfully disagree.

Well, I'm pretty darn sure you are in the minority on that one:  ;)

http://www.google.com/search?q=%22premature+optimization%22

James Edward Gray II
ts (Guest)
on 2006-03-11 17:50
(Received via mailing list)
>>>>> "J" == James Edward Gray <james@grayproductions.net> writes:

J> Something is fishy there, for it works just fine on my own Mac:

 Try it with

J> Neo:~/Desktop$ cat wordlist
J> one
J> two
J> three
J> 0123456789

   01234567890123456789

J> five
J> 0123456789
J> Neo:~/Desktop$ cat tens.rb


Guy Decoux
Mike Stok (Guest)
on 2006-03-11 17:53
(Received via mailing list)
On 11-Mar-06, at 11:09 AM, Alan Burch wrote:

>>
> get words of all sizes out???
>

Doesn't James Gray's code print out words which contain exactly 10
different letters e.g.

abbreviations - 13 characters + \n, but because it wasn't checked for
size before splitting this boils down to 10 different characters.

irb(main):001:0> s = 'abbreviations'
=> "abbreviations"
irb(main):002:0> s.split('').uniq
=> ["a", "b", "r", "e", "v", "i", "t", "o", "n", "s"]
irb(main):003:0> s.split('').uniq.size
=> 10


Interesting.  I crudely benchmarked this (using time on my mac):

#!/usr/bin/env ruby

File.foreach("K6wordlist.txt") do |word|
    # puts word if word.size==11 && word.split(//).uniq.size == 11
      puts word if word.length == 11 and word.chomp.split(//).uniq.size == 10
    # puts word if word.length == 11 and not word =~ /(.).*\1/
end

and then ran each of the three sending output to /dev/null (after
checking that they all worked the same on my test file).  In order:

real    0m0.347s
user    0m0.294s
sys     0m0.017s

real    0m0.334s
user    0m0.288s
sys     0m0.018s

real    0m0.177s
user    0m0.137s
sys     0m0.015s

There may be interesting behaviour if the last line in the file
doesn't have a trailing \n, I would probably go for something more like

File.foreach("K6wordlist.txt") do |word|
    word.chomp!
    puts word if word.length == 10 and not word =~ /(.).*\1/
end

(timing intentionally omitted :-)

Mike

--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.
James Gray (bbazzarrakk)
on 2006-03-11 17:59
(Received via mailing list)
On Mar 11, 2006, at 10:48 AM, ts wrote:

> J> 0123456789
>
>    01234567890123456789

Ah, yes, duh.  Thanks Guy.

Obviously, you do need a check for word size in there, as others have
used.

My point was actually to show that foreach() is open, read loop, and
close combined though.
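
A self-contained sketch of that combination with the length check added. The word list is a throwaway temp file with made-up contents, and the names `list` and `matches` are illustrative.

```ruby
require "tempfile"

# A throwaway word list; the contents are made up for the example.
list = Tempfile.new("wordlist")
list.write("one\n0123456789\n01234567890123456789\nprofligate\nwristwatch\n")
list.close

matches = []
# foreach opens the file, yields each line, and closes it when the block finishes.
File.foreach(list.path) do |line|
  word = line.chomp
  # Check the length first; uniq.size alone also accepts longer words
  # that happen to contain ten distinct characters.
  matches << word if word.length == 10 && word.split("").uniq.size == 10
end

puts matches
list.unlink
```

Here the twenty-character line is rejected by the length check even though it contains exactly ten distinct characters.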

James Edward Gray II
Stephen Waits (Guest)
on 2006-03-11 18:03
(Received via mailing list)
On Mar 11, 2006, at 8:47 AM, James Edward Gray II wrote:

>>> This sounds like premature optimization.  Remember, you start
>>> worrying about speed when the code gets too slow.  Not before.
>>
>> I'm going to have to respectfully disagree.
>
> Well, I'm pretty darn sure you are in the minority on that one:  ;)

Well, in this case, being in the majority doesn't necessarily make
you right.  Like many things, I think we've got several shades of
gray here.. err... Gray?  :)  I'm all for not prematurely
optimizing.  But in this case, Alan is attempting to better
understand Ruby's inner-workings which is a perfectly fine example of
playing with performance.

Additionally, the "no premature optimization ideal" is often taken a
little too far.  I intentionally call it an "ideal".  I work on video
games.  A good portion of our job is optimization.  If we didn't do
*some* premature optimization, we'd be in bad shape.

--Steve
Gary Wright (Guest)
on 2006-03-11 18:10
(Received via mailing list)
On Mar 11, 2006, at 12:00 PM, Stephen Waits wrote:
> Additionally, the "no premature optimization ideal" is often taken
> a little too far.  I intentionally call it an "ideal".  I work on
> video games.  A good portion of our job is optimization.  If we
> didn't do *some* premature optimization, we'd be in bad shape.

It isn't really premature optimization if you are dealing with a
known problem domain and you already have a reasonable
sense of the performance issues that you will face.  It is
nonsensical to throw away the knowledge you've gained from past
experience in the matter.

But when crafting new software, where you don't have any
particular knowledge of the performance issues, it
makes more sense to get something working correctly and in
a timely manner than to make premature assumptions about the
bottlenecks.
Alan Burch (orotone)
on 2006-03-11 18:26
James Gray wrote:
> On Mar 11, 2006, at 9:58 AM, Alan Burch wrote:
>
>>>
>>> This sounds like premature optimization.  Remember, you start
>>> worrying about speed when the code gets too slow.  Not before.
>>>
>>> James Edward Gray II
>>
>> James:
>> I'm going to have to respectfully disagree.
>
> Well, I'm pretty darn sure you are in the minority on that one:  ;)
>
> http://www.google.com/search?q=%22premature+optimization%22
>
> James Edward Gray II
James:
I'm going to make a leap of faith here and guess that we're in agreement
on this one--it's just a difference in where we are in understanding the
language.  I'm just learning it and need to run the profiler, debugger,
take timing measurements, and read lots of examples to fully understand
it still.  I wouldn't peddle my (lack of) Ruby skills to any client at
this time, but it's by taking these steps that I will become a good Ruby
software developer.  Others may be able to make the transition from a
developer who can make the code work to one who is actually good at it
(accurate, maintainable, resource appropriate code done quickly) without
taking these steps, but I cannot.

From your google search I have:
http://www.cookcomputing.com/blog/archives/000084.html

Premature Optimization....suggests the famous quote originating from
Tony Hoare and restated by Donald Knuth: "Premature optimization is the
root of all evil". I've always thought this quote has all too often led
software designers into serious mistakes because it has been applied to
a different problem domain to what was intended.

The full version of the quote is "We should forget about small
efficiencies, say about 97% of the time: premature optimization is the
root of all evil." and I agree with this. It's usually not worth spending
a lot of time micro-optimizing code before it's obvious where the
performance bottlenecks are. But, conversely, when designing software at
a system level, performance issues should always be considered from the
beginning. A good software developer will do this automatically, having
developed a feel for where performance issues will cause problems. An
inexperienced developer will not bother, misguidedly believing that a
bit of fine tuning at a later stage will fix any problems.

Knowing the language well enough will cause me as an experienced
software developer to automatically build the best code, while the
inexperienced developer will continue to write code that gives people
like me a well above average income :)

Take care,
Alan
Eric Young (Guest)
on 2006-03-12 08:00
(Received via mailing list)
Benjohn Barnes wrote:
>
> !? :) How on earth does that work? Every time I think I've sort of got
> the hang of regexp, they spring something new on me.
>
> I was also going to ask why everyone was doing "split( // )" instead of
> "split( '' )"?
>
> - oooh, coffee's ready...
>
> Cheers,
>     Benjohn


I was going to ask why everyone is not doing
  w.unpack("C*").uniq! == nil
which seems faster than split(//) but slower than /(.).*\1/

3.2s      puts l if l !~ /(.).*\1/
3.6s      puts l unless l.unpack("C*").uniq!
7.3s      puts l if l.split(//).uniq.size == 11

perhaps I deal with C code too much, but unpack seems a better String ->
array mechanism.
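
A sketch of what the two conversions produce, assuming a current Ruby with UTF-8 strings (variable names are illustrative). The caveat in the second half is an addition, not something measured in this thread: `unpack("C*")` yields bytes, so it only agrees with a per-character split on single-byte text.

```ruby
word = "profligate"

bytes = word.unpack("C*")      # one integer per byte: [112, 114, 111, ...]
chars = word.split(//)         # one single-character string per character

# uniq! returns nil when it removes nothing, so nil means "no repeats".
bytes_unique = bytes.dup.uniq!.nil?
chars_unique = chars.dup.uniq!.nil?
puts bytes_unique
puts chars_unique

# For multibyte text the two differ, because unpack("C*") counts bytes.
accented = "\u00e9"                  # e with an acute accent, two bytes in UTF-8
puts accented.unpack("C*").size      # number of bytes
puts accented.split(//).size         # number of characters
```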

eric (rather new to ruby)
William James (Guest)
on 2006-03-12 13:40
(Received via mailing list)
Alan Burch wrote:
> that this was much slower, even if I subtract out the time for the open,
>         print "#{ar.to_s}"
>   while f.gets
>     if $_.length == 11
>       ar = $_.split(//)
>       if ar.uniq! == nil
>         print "#{ar.to_s}"
>       end
>     end
>   end
> }

IO.foreach('words'){|s|puts s if s=~/(?!.*(.).*\1)^.{10}$/}
unknown (Guest)
on 2006-03-12 14:44
(Received via mailing list)
Hi --

On Sun, 12 Mar 2006, James Edward Gray II wrote:

>>>
> http://www.google.com/search?q=%22premature+optimization%22
But one doesn't want to suppress one's knowledge.  When I write a
program, if I happen to know that, say:

   puts a

will run faster than:

   eval "puts #{97.chr}"

then I can't really be blamed for using that knowledge, just because
the knowledge pertains to speed.

In other words, I don't think that avoiding premature optimization
means that one should never knowingly take speed into account when
choosing what to put in one's code.  In fact, I would find it really
difficult to do that, because I wouldn't know how to choose among the
various alternatives available in a way that paid no attention to
execution speed.

This isn't an argument in favor of premature optimization; rather, I'm
suggesting that having and using some rough-cut knowledge of execution
speed (as between, say, split and unpack, or something like that)
isn't premature :-)  Nor is it optimization; it's really melioration.


David

Robert Klemme (Guest)
on 2006-03-12 15:40
(Received via mailing list)
Alan Burch <orotone@gmail.com> wrote:
> Others:
> Any input as to why it runs slower inside the file block? Have I
> overlooked something simple?

I'm not sure whether this question has been answered yet.  It's probably
slower because you do not close the file handle in your first version
(more precisely, you close it only if there is an error, which doesn't
happen the way you did it).

f = File.open("./words")
begin
 ...
rescue EOFError
  f.close
end

If you wanted it to be equivalent with the block version "f.close" would
have to go to an "ensure" section.
Generally the block form is preferred because it closes the file handle.
Your first version didn't do that.
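
To make the "ensure" point concrete, here is a minimal sketch of a non-block
version that always closes the handle. The helper name each_line_with_ensure
and the three-word temp file are made up for illustration; they are not from
the original script.

```ruby
require "tempfile"

# Hypothetical helper, not from the thread: open without a block,
# but close the handle in an ensure clause.
def each_line_with_ensure(path)
  f = File.open(path)
  begin
    while (line = f.gets)   # gets returns nil at end of file
      yield line
    end
  ensure
    f.close                 # runs on normal exit and when an exception is raised
  end
end

# Small stand-in for the real word list.
sample = Tempfile.new("words")
sample.write("profligate\nbookkeeper\ncat\n")
sample.close

ensure_lines = []
each_line_with_ensure(sample.path) { |line| ensure_lines << line }
```

With the close in "ensure", this should behave like File.open with a block as
far as closing the file is concerned.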

This is probably what I'd do

IO.foreach("wordlist") do |line|
  line.scan(/\b\w{10}\b/) do |word|
    puts word unless /(.).*\1/ =~ word
  end
end

The first regexp finds all words of length 10 (the 11 in your script
includes the trailing newline) and the second excludes all words that
contain repeating characters. HTH

Kind regards

    robert
3e8342236bfa0a91f23e8a8de8363806?d=identicon&s=25 Alan Burch (orotone)
on 2006-03-12 16:37
> This isn't an argument in favor of premature optimization; rather, I'm
> suggesting that having and using some rough-cut knowledge of execution
> speed (as between, say, split and unpack, or something like that)
> isn't premature :-)  Nor is it optimization; it's really melioration.
>
>
> David
>
Thanks David and others who stated what I wanted to say better than I
did.  Again, I don't think James and I disagree.  I concede that
premature optimization is not a good thing, but that's not what I was
trying to do here.  I'm trying to understand Ruby to the level that I
understand C.  To me that means I know exactly why I use every call,
every construct.  When I'm able to do this, I'll know Ruby the way I
want to and I'll be able to use Ruby to accomplish non-trivial tasks.
Knowing to use a faster or more easily understood construct at the time
of coding is what one would expect any experienced programmer to do.
Trying to optimize beyond that from the beginning is silly and
wrong--just as James pointed out.

Yes, Robert, I assumed that the gets threw an EOFError when it found
EOF, I just haven't read and understood all I need to yet.  I really
appreciate all the input on this thread.  It's proved to me that the
Ruby community is everything good that is being said about it on the
web.
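
For anyone else who made the same assumption, a small sketch of the
difference: gets returns nil at end of file, while readline is the call that
raises EOFError. StringIO is used here only as a stand-in for an opened file.

```ruby
require "stringio"

# StringIO stands in for an opened file; it follows the same gets/readline contract.
io = StringIO.new("profligate\n")

first  = io.gets      # the only line in the stream
at_eof = io.gets      # gets signals end of file by returning nil

raised = begin
  io.readline         # readline raises EOFError at end of file
  false
rescue EOFError
  true
end
```

So a "while f.gets" loop simply ends when gets returns nil, and a
"rescue EOFError" around it never fires.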

That said, using the block form is trivially slower, according to the
time calls that I'm making on my Mac, no matter which solution.  I'm not
really concerned about that, and agree with Robert's statement above
about the block being preferred.  To not use the block form would be a
prime example of premature optimization.
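
A rough way to repeat the comparison with the standard Benchmark library,
as a sketch only. The generated 20,000-line temp file and the helper name
candidate? are assumptions for illustration, not the real word list or the
original script, and the timings will vary by machine.

```ruby
require "benchmark"
require "tempfile"

# Stand-in word list; the real file in this thread has about 100K lines.
words = Tempfile.new("words")
20_000.times { |i| words.puts(i.even? ? "profligate" : "bookkeeper") }
words.close

# Hypothetical helper: 10 letters plus "\n", with no character repeated.
def candidate?(line)
  line.length == 11 && line.split(//).uniq!.nil?
end

plain_hits = 0
block_hits = 0

Benchmark.bm(10) do |bm|
  bm.report("no block") do
    f = File.open(words.path)
    begin
      while (line = f.gets)
        plain_hits += 1 if candidate?(line)
      end
    ensure
      f.close
    end
  end

  bm.report("block") do
    File.open(words.path) do |f|
      while (line = f.gets)
        block_hits += 1 if candidate?(line)
      end
    end
  end
end
```

Both variants close the file here, so any remaining difference is down to
the block call itself.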

Thanks again for all the input,
Alan
F5b3c1ebfb2e9fc5f67bb48b119f6054?d=identicon&s=25 Randy Kramer (Guest)
on 2006-03-12 20:14
(Received via mailing list)
On Sunday 12 March 2006 08:44 am, dblack@wobblini.net wrote:
> This isn't an argument in favor of premature optimization; rather, I'm
> suggesting that having and using some rough-cut knowledge of execution
> speed (as between, say, split and unpack, or something like that)
> isn't premature :-)  Nor is it optimization; it's really melioration.

Thank you!

Randy Kramer
12a71a456ac3d464914a8267f11d43b3?d=identicon&s=25 semmons99@gmail.com (Guest)
on 2006-03-13 13:41
(Received via mailing list)
File.open( "./words" ).readlines.each do |line|

    print line if line.length == 11 and line.split.uniq

end


testing length 11 first makes it so that we don't split the string
unless needed.
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-03-13 14:55
(Received via mailing list)
On Mar 13, 2006, at 6:38 AM, semmons99@gmail.com wrote:

> File.open( "./words" ).readlines.each do |line|
>
>     print line if line.length == 11 and line.split.uniq
>
> end

That can be shortened to:

File.readlines("words").each do |line|
   # ...
end

I think that's more Ruby-like.

Also, wouldn't line.split.uniq always return a true value?
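
A quick sketch of why: split with no arguments splits on whitespace, so a
one-word line becomes a one-element array, and uniq returns an array, which
is always true in a condition. The helper name ten_unique_letters? below is
made up for illustration.

```ruby
line = "bookkeeper\n"            # 10 letters, several of them repeated

# split with no arguments splits on whitespace: one element, and uniq
# returns an array, which is truthy even though letters repeat.
always_truthy = line.split.uniq

# Hypothetical helper following the original script: split into characters
# and rely on uniq! returning nil when nothing had to be removed.
def ten_unique_letters?(line)
  line.length == 11 && line.split(//).uniq!.nil?
end

unique_ok  = ten_unique_letters?("profligate\n")
repeats_ok = ten_unique_letters?(line)
```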

James Edward Gray II