Weird performance problem with "find"

I noticed some strange issues in an application I’m working on where it
works speedy at first, and then gets progressively slower and slower.
After
a little bit of investigation, it seems that Enumerable#find is the
culprit. The more times you run a find statement the slower it gets.
Does
anybody have any ideas why this is?

Here’s a sample that demonstrates the issue:

require ‘mscorlib’
require ‘C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\System.dll’

a = (1…1000).to_a
s = System::Diagnostics::Stopwatch.new()

100.times {
s.start
a.find {|x| x == 1000}
s.stop
puts “find took: #{s.elapsed.ToString}”
}

which outputs:

find took: 00:00:00.0743446
…snip…
find took: 00:00:00.2290894
…snip…
find took: 00:00:00.4942105

I understand you guys aren’t optimizing yet, but I thought you might be
interested to see growth where there shouldn’t be any. Also, if you
have
any ideas off the top of your head, I wouldn’t mind looking into it a
little.

-Eric

Whoops, had the stopwatch setup wrong in that last message

require ‘mscorlib’
require ‘C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\System.dll’

a = (1…1000).to_a
1000.times {
s = System::Diagnostics::Stopwatch.new()
s.start
a.find {|x| x == 1000}
s.stop
puts “find took: #{s.elapsed.ToString}”
}

and the output is:

find took: 00:00:00.0732210
find took: 00:00:00.0048562
find took: 00:00:00.0022469
find took: 00:00:00.0021351
… snip …

find took: 00:00:00.0196016
find took: 00:00:00.0201994
find took: 00:00:00.0218765
… snip …
find took: 00:00:00.0371834
find took: 00:00:00.0367817
find took: 00:00:00.0361911

On Wed, 06 Feb 2008 12:38:46 -0700, Eric N.
[email protected]
wrote:

puts "find took: #{s.elapsed.ToString}"

}

You’re reusing the same Stopwatch instance, so you have to add a call to
the Reset method at the end of each iteration. In doing so you should
find your timings are as they should be.


/M:D

M. David P.
Co-Founder & Chief Architect, 3rd&Urban, LLC
Email: [email protected] | [email protected]
Mobile: (206) 418-9027
http://3rdandUrban.com | http://amp.fm |
http://www.oreillynet.com/pub/au/2354

He’s right about the issue however:

find took: 00:00:00.0081136
find took: 00:00:00.0022516
find took: 00:00:00.0021312

[snip]

find took: 00:00:00.0220530
find took: 00:00:00.0245721
find took: 00:00:00.0238102

[snip]

find took: 00:00:00.0377458
find took: 00:00:00.0453373
find took: 00:00:00.0369415

On Feb 6, 2008 3:13 PM, M. David P. [email protected] wrote:

s.start

Ironruby-core mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ironruby-core


Michael L.
[Polymath Programmer]
http://michaeldotnet.blogspot.com

On Wed, 06 Feb 2008 13:23:10 -0700, Michael L.
[email protected] wrote:

He’s right about the issue however:

find took: 00:00:00.0377458
find took: 00:00:00.0453373
find took: 00:00:00.0369415

Whoops. My bad. I was testing via Ruby.NET which, coincidentally,
doesn’t showcase this type of behavior,

find took: 00:00:00.0216940
find took: 00:00:00.0012342
find took: 00:00:00.0012423

find took: 00:00:00.0013994
find took: 00:00:00.0013016
find took: 00:00:00.0017697

find took: 00:00:00.0012989
find took: 00:00:00.0014228
find took: 00:00:00.0013389

I guess maybe Ruby.NET still has its place in this world afterall. :wink:
:smiley:


(the modified code in case anyone wants to run the comparison on their
own
machine)

require ‘mscorlib’
require ‘C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\System.dll’

a = (1…1000).to_a
s = System::Diagnostics::Stopwatch.new()

100.times {
s.Start
a.find {|x| x == 1000}
s.Stop
puts “find took: #{s.Elapsed}”
s.Reset
}


/M:D

M. David P.
Co-Founder & Chief Architect, 3rd&Urban, LLC
Email: [email protected] | [email protected]
Mobile: (206) 418-9027
http://3rdandUrban.com | http://amp.fm |
http://www.oreillynet.com/pub/au/2354

Michael L.:

m = Proc.new {|x| x == 1000 }
find took: 00:00:00.0012915
find took: 00:00:00.0141006
find took: 00:00:00.0106432

This is a strange; nothing obvious jumps out right now. We’ll have to
look at this down the road when we do our perf push. Clearly this is a
bug. Can someone open up a bug for this?

Thanks,
-John

I have some ideas on what some of the issues may be, primarily related
to how the Enumerable class creates a new Proc object every time
through the loop. Anyway, doing the following seems to help to some
extent:

require ‘mscorlib’
require ‘C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\System.dll’

a = (1…1000).to_a
s = System::Diagnostics::Stopwatch.new()
m = Proc.new {|x| x == 1000 }
1000.times {
s.start
y = a.find &m
s.stop
puts “find took: #{s.elapsed.ToString}”
s.reset
}

find took: 00:00:00.0012998
find took: 00:00:00.0012719
find took: 00:00:00.0012915

[snip]

find took: 00:00:00.0049048
find took: 00:00:00.0052548
find took: 00:00:00.0051403

[snip]

find took: 00:00:00.0127650
find took: 00:00:00.0141006
find took: 00:00:00.0106432

You can see the times still going up (rules are being created based on
Proc.Id?) but not nearly the same pace.

On Feb 6, 2008 3:31 PM, M. David P. [email protected] wrote:

doesn’t showcase this type of behavior,
find took: 00:00:00.0014228
require ‘mscorlib’
s.Reset
http://3rdandUrban.com | http://amp.fm |
http://www.oreillynet.com/pub/au/2354


Ironruby-core mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ironruby-core


Michael L.
[Polymath Programmer]
http://michaeldotnet.blogspot.com

So “Enumerable#find” is definitely a red herring. I’m guessing it has
something to do with block creation. Here’s a better example without
the
find:

10000.times { | iter |
s.reset
s.start
x = 0
1000.times { x += 1}
s.stop
puts “iteration #{iter} took: #{s.elapsed.ToString}” if iter % 100
== 0
}
Output:

iteration 0 took: 00:00:00.0471160
iteration 100 took: 00:00:00.0024187
iteration 200 took: 00:00:00.0037267
iteration 300 took: 00:00:00.0039901
iteration 400 took: 00:00:00.0054012
iteration 500 took: 00:00:00.0063639
iteration 600 took: 00:00:00.0070567
iteration 700 took: 00:00:00.0103230
iteration 800 took: 00:00:00.0093428
iteration 900 took: 00:00:00.0095257
[snip]
iteration 9700 took: 00:00:00.0892543
iteration 9800 took: 00:00:00.0899337
iteration 9900 took: 00:00:00.0963234

It’s very consistent linear growth. Several iterations took much
longer
presumably due to garbage collection.

Compare that to this version without the inner block:

1000.times do |iter|
s.reset
s.start
x = 0
x+= 1 while x < 1000
s.stop
puts “iteration #{iter} took: #{s.elapsed.ToString}” if iter % 100
== 0
iter += 1
end

Output:

iteration 0 took: 00:00:00.1119421
iteration 100 took: 00:00:00.0001190
iteration 200 took: 00:00:00.0001430
iteration 300 took: 00:00:00.0001326
iteration 400 took: 00:00:00.0001212
iteration 500 took: 00:00:00.0001173
iteration 600 took: 00:00:00.0001190
iteration 700 took: 00:00:00.0001178
iteration 800 took: 00:00:00.0001176
iteration 900 took: 00:00:00.0001178
[snip]
iteration 9900 took: 00:00:00.0001142

So maybe block creation/parsing/execution is a little slower each time?

I don’t know if i’d call it a “red herring” as the performance with
find dropped considerably faster, it’s probably a combination.

On Feb 6, 2008 6:56 PM, Eric N. [email protected] wrote:

puts "iteration #{iter} took: #{s.elapsed.ToString}" if iter % 100 == 0

iteration 700 took: 00:00:00.0103230
Compare that to this version without the inner block:

iteration 800 took: 00:00:00.0001176

Bug has been submitted. I’ll probably poke it with a stick some while

to how the Enumerable class creates a new Proc object every time
s.start
[snip]

[email protected]


Ironruby-core mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ironruby-core


Michael L.
[Polymath Programmer]
http://michaeldotnet.blogspot.com

Bug has been submitted. I’ll probably poke it with a stick some while
I’m at it.

On Feb 6, 2008 5:58 PM, John L. (DLR) [email protected] wrote:

[snip]


Ironruby-core mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ironruby-core


Michael L.
[Polymath Programmer]
http://michaeldotnet.blogspot.com

Michael L.:

I don’t know if i’d call it a “red herring” as the performance with
find dropped considerably faster, it’s probably a combination.

Probably because Enumerable#find uses a block internally when it calls
#each, so you get two block dispatches instead of one.

  • John

The example with a nested times call also consists of two block
dispatches and doesn’t ramp up nearly as bad, or am I missing
something?

On Feb 6, 2008 7:41 PM, John M. [email protected] wrote:

Ironruby-core mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ironruby-core


Michael L.
[Polymath Programmer]
http://michaeldotnet.blogspot.com