Ruby 1.9 threading performance goes non-linear

dubstep · December 16, 2011, 10:45pm

Two threads connected by a queue, passing one fixnum at a time, doing
trivial calculations. Run time should be linear in number of inputs,
right? Well, yeah, for 1.8.7 and jruby, but not for 1.9.3p0.

Here’s the code, with timing outputs in the DATA section:

require ‘thread’

N = Integer(ARGV.shift || 1000)

q = Queue.new

th = Thread.new do
n = 0
while x = q.pop
n += 1
end
puts “n = #{n}”
end

time0 = Process.times

N.times do
100.times do |x|
q.push x
end
end
q.push nil

th.join

time1 = Process.times

puts “utime = #{time1.utime - time0.utime}”

END

$ ruby -v
ruby 1.9.3p0 (2011-10-30) [x86_64-linux]
$ ruby th.rb 10
n = 1000
utime = 0.01
$ ruby th.rb 100
n = 10000
utime = 0.01
$ ruby th.rb 1000
n = 100000
utime = 0.33999999999999997
$ ruby th.rb 10000
n = 1000000
utime = 44.65

Memory usage is pretty flat–seems to stay under 15MB.

For comparison, here’s the same series on ruby 1.8.7:

$ ruby1.8 -v
ruby 1.8.7 (2010-06-23 patchlevel 299) [x86_64-linux]
$ ruby1.8 th.rb 10
n = 1000
utime = 0.01
$ ruby1.8 th.rb 100
n = 10000
utime = 0.14
$ ruby1.8 th.rb 1000
n = 100000
utime = 0.16
$ ruby1.8 th.rb 10000
n = 1000000
utime = 1.43

And FWIW the same series on jruby, but the comparison is probably
muddled by the JIT compiler:

$ jruby -v -e ‘p RUBY_VERSION’
jruby 1.6.5 (ruby-1.9.2-p136) (2011-10-25 9dcd388) (Java HotSpot™
64-Bit Server VM 1.6.0_26) [linux-amd64-java]
“1.9.2”
$ jruby th.rb 10
n = 1000
utime = 0.0349998474121094
$ jruby th.rb 100
n = 10000
utime = 0.293999910354614
$ jruby th.rb 1000
n = 100000
utime = 1.52300000190735
$ jruby th.rb 10000
n = 1000000
utime = 2.90700006484985

Joel_VanderWerf · December 16, 2011, 11:28pm

Joel VanderWerf [email protected] wrote:

Two threads connected by a queue, passing one fixnum at a time,
doing trivial calculations. Run time should be linear in number of
inputs, right? Well, yeah, for 1.8.7 and jruby, but not for 1.9.3p0.

N.times do
100.times do |x|
q.push x

Adding “Thread.pass” after q.push seems to improve things as it
keeps the queue smaller.

Part of the problem is the Queue in MRI 1.9.x uses an Array internally.
Calling Array#shift is expensive on larger Arrays as it needs to move
all the unshifted elements. Using a linked list as the insternal
structure should improve things.

The array was growing larger due to thread scheduling being different
and less likely to switch threads.

I think there was an implementation of Queue for 1.9 in
bugs.ruby-lang.org in C which may improve things, but I think there
are some improvements to the Ruby-only version Queue that can be
fixed.

Joel_VanderWerf · December 17, 2011, 12:06am

Peter Z. [email protected] wrote:

O(n) only if the array grows bigger. Are there any implications I
don’t see?

It would require a lot of changes to existing C code in both Ruby itself
and C extensions that depend on RARRAY_PTR().

Better to start with a new data structure (implemented entirely in Ruby)
for an uncommon case.

Joel_VanderWerf · December 17, 2011, 12:12am

Peter Z. wrote in post #1037076:

I wonder if we can make a more efficient Array by making it a circular
buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I don’t
see?

Hi,

Is there some C++ STL-like data structure in Ruby? Or is it a good idea
if I (or some other people) try to make one? With STL-like data
structure you can have the best structure for the problem at hand, and
we don’t have to use Array and Hash all the time.

Regards,

Bill

Joel_VanderWerf · December 16, 2011, 11:34pm

Eric W. писал 17.12.2011 02:27:

keeps the queue smaller.

Part of the problem is the Queue in MRI 1.9.x uses an Array
internally.
Calling Array#shift is expensive on larger Arrays as it needs to move
all the unshifted elements. Using a linked list as the insternal
structure should improve things.

I wonder if we can make a more efficient Array by making it a circular
buffer,
so that #shift and #pop would always be O(1), and #unshift and #push be
O(n) only if the array grows bigger. Are there any implications I don’t
see?

Joel_VanderWerf · December 17, 2011, 12:15am

Eric W. писал 17.12.2011 03:05:

circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push
be
O(n) only if the array grows bigger. Are there any implications I
don’t see?

It would require a lot of changes to existing C code in both Ruby
itself
and C extensions that depend on RARRAY_PTR().

Yes, RARRAY_PTR will break. This is unacceptable.

Better to start with a new data structure (implemented entirely in
Ruby)
for an uncommon case.

I’m not sure that a pure Ruby implementation will be even nearly as
efficient
as Array on MRI. How would you implement it?

Joel_VanderWerf · December 17, 2011, 12:31am

Peter Z. [email protected] wrote:

Eric W. писал 17.12.2011 03:05:

Better to start with a new data structure (implemented entirely in
Ruby)
for an uncommon case.

I’m not sure that a pure Ruby implementation will be even nearly as
efficient
as Array on MRI. How would you implement it?

Tried a singly linked list, but object allocation overhead sucks :<

Maybe keeping the array small, swapping Thread#wakeup for Thread#run
and calling Thread#run outside of the Mutex#synchronize block
will be better…

Joel_VanderWerf · December 17, 2011, 12:18am

Admin T. писал 17.12.2011 03:12:

Hi,
Bill
Array is std::vector, and Hash, again, is exactly std::map, if you
leave out
the strong typing part. This whole discussion is about implementing a
construct
like std::queue (or deque).

Joel_VanderWerf · December 17, 2011, 1:30am

Eric W. [email protected] wrote:

Tried a singly linked list, but object allocation overhead sucks :<

Btw, it’s in the “ll-queue” branch of git://bogomips.org/ruby
(http://bogomips.org/ruby.git/commit/?h=ll-queue) against trunk.

Performance sucks, but at least it appears to be linear as it grows:

$ for i in 1000 10000 100000; do ./trunk/ruby -I lib /tmp/zzz.rb $i;
done
n = 100000
utime = 0.27999999999999997
n = 1000000
utime = 6.08
in = 10000000
utime = 57.82

(/tmp/zzz.rb is Joel’s original script without Thread.pass)

Maybe keeping the array small, swapping Thread#wakeup for Thread#run
and calling Thread#run outside of the Mutex#synchronize block
will be better…

Nope… Throwing Thread.pass/Thread#run in various places didn’t seem
to significantly improve over the Ruby linked list implementation.

Maybe this is worth revisiting: Feature #3620: Add Queue, SIzedQueue and ConditionVariable implementations in C in addition to ruby ones - Ruby master - Ruby Issue Tracking System

Joel_VanderWerf · December 19, 2011, 12:10am

On Fri, Dec 16, 2011 at 4:27 PM, Eric W. [email protected]
wrote:

are some improvements to the Ruby-only version Queue that can be
fixed.

FWIW, JRuby’s Queue is implemented in Java, primarily because we it’s
faster to use Java’s synchronization primitives directly. And indeed,
it uses a LinkedList.

Charlie

Joel_VanderWerf · June 26, 2012, 9:50am

Peter Z. wrote in post #1037082:

Eric W. писал 17.12.2011 03:05:

I wonder if we can make a more efficient Array by making it a circular buffer,
so that #shift and #pop would always be O(1), and #unshift and #push
be
O(n) only if the array grows bigger. Are there any implications I
don’t see?

It would require a lot of changes to existing C code in both Ruby
itself
and C extensions that depend on RARRAY_PTR().

Yes, RARRAY_PTR will break. This is unacceptable.

Well, I actually did it, see array.c: steel shared array's container when ARY_SHARED_NUM == 1 by funny-falcon · Pull Request #133 · ruby/ruby · GitHub
(corresponding issue Feature #6638: Array as queue - Ruby master - Ruby Issue Tracking System )
and see tests here Test patch for increasing "array as queue" speed · GitHub

And it doesn’t break RARRAY_PTR() cause Array#shift already moves array
to shared copy, I just used that fact in a full way.

Strictly speaking, it still some times do whole memmove, but very very
rare, so that it could be considered as real circular buffer in
practice.

I’ve only implemented it for #push/#shift pattern, cause most libraries
actually use it. But it could be extended for #unshift/#pop as well.

Sokolov Yura aka funny-falcon

Joel_VanderWerf · June 26, 2012, 11:35am

Ok, I’ve updated patch, and now array is a real circular buffer both
with
#push/#shift pattern and #unshift/#pop pattern

patch for 1.9.3 and tests are here Test patch for increasing "array as queue" speed · GitHub
pull request array.c: steel shared array's container when ARY_SHARED_NUM == 1 by funny-falcon · Pull Request #133 · ruby/ruby · GitHub
issue Feature #6638: Array as queue - Ruby master - Ruby Issue Tracking System

Sokolov Yura aka funny-falcon