Making Ruby faster

Here’s a patch that makes the Ruby interpreter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):

And here’s a blog post with details:

I think you need to post that to ruby-lang, actually. Sounds very
interesting though.

On 2/21/07, Tomasz W. [email protected] wrote:

Here’s a patch that makes the Ruby interpreter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):

And here’s a blog post with details:

Great article. The optimization of ‘switch’ is cute.
And the optimization of ‘fix_equal’ is awesome.

On 2/21/07, Tomasz W. [email protected] wrote:

Definitely a great read. I can’t judge the technical quality in depth,
but from what I understand this is reasonable stuff.
I have a stupid question too: did you try the debugger after using -O3?

However, you should post this to ruby-core if you hope for your patches
to be accepted.
It might not be worth doing, though, with Ruby 2 on the horizon and the
improvement being small.
But I’m ignorant about the process, so just try your luck there :wink:

Again this is good stuff even if it might not be adopted.

Cheers
Robert

On 21/02/07, Robert D. [email protected] wrote:

I have a stupid question too: did you try the debugger after using -O3?

I never use a C debugger for single-stepping. If I want to know what’s
happening inside a program, I usually use printf or strace or efence or
ltrace or fenris or objdump or LD_PRELOAD or something like that.
The only use I have for a C debugger is backtrace printing on segfaults.
It works fine with -O3; you just need to pass the -g option instead of
-fomit-frame-pointer (on some platforms -O* implies -fomit-frame-pointer,
but not on x86).
In any case, there’s not much difference between -O and -O3 when it comes
to debugging - any optimization makes single-stepping difficult.

On Feb 20, 11:32 pm, “Tomasz W.”
[email protected] wrote:

Here’s a patch that makes the Ruby interpreter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here’s a blog post with details:
taw's blog: Making Ruby faster


Tomasz W. [http://t-a-w.blogspot.com/]

3% is a modest improvement. Let’s see what can be
achieved by switching to a language that’s designed
to be fast.

            Ruby    Lua  LuaJIT
tarai         0%   408%   1918%
fib           0%   497%   2874%
mandelbrot    0%   920%   1347%

The source:

local function tarai( x, y, z )
  if x <= y then
    return y
  else
    return tarai( tarai(x-1, y, z),
                  tarai(y-1, z, x),
                  tarai(z-1, x, y))
  end
end

tarai(12, 6, 0)

local function fib( n )
  if n < 3 then
    return 1
  else
    return fib(n-1) + fib(n-2)
  end
end

local sqrt = math.sqrt
local insert = table.insert

local function c_mul( c1, c2 )
  local a,b = unpack(c1)
  local c,d = unpack(c2)
  return { a*c - b*d, a*d + b*c }
end

local function c_abs( c )
  local a, b = unpack( c )
  return sqrt( a*a + b*b )
end

local function is_mandelbrot( z )
  for i = 1, 100 do
    z = c_mul( z, z )
    if c_abs( z ) > 2 then
      return false
    end
  end
  return true
end

ary = {}

for xx = 0, 100 do
  for yy = 0, 100 do
    local x, y = xx / 50.0, yy / 50.0
    local c = {x, y}
    if is_mandelbrot( c ) then
      insert( ary, c )
    end
  end
end

On Thu, 22 Feb 2007 03:25:10 +0100, William J. [email protected]
wrote:

3% is a modest improvement. Let’s see what can be
achieved by switching to a language that’s designed
to be fast.

-1, Troll.

David V.

Hi,

In message “Re: Making Ruby faster”
on Thu, 22 Feb 2007 11:25:10 +0900, “William J.”
[email protected] writes:

|3% is a modest improvement. Let’s see what can be
|achieved by switching to a language that’s designed
|to be fast.
|

            Ruby    Lua  LuaJIT
tarai         0%   408%   1918%
fib           0%   497%   2874%
mandelbrot    0%   920%   1347%

Here are my numbers, meaningful or not:

            Ruby   YARV
tarai         0%   547%
fib           0%   540%
mandelbrot    0%   172%
total         0%   472%

def tarai( x, y, z )
  if x <= y then
    return y
  else
    return tarai( tarai(x-1, y, z),
                  tarai(y-1, z, x),
                  tarai(z-1, x, y))
  end
end

p tarai(12, 6, 0)

def fib( n )
  if n < 3 then
    return 1
  else
    return fib(n-1) + fib(n-2)
  end
end

p fib(12)

include Math

def c_mul( c1, c2 )
  a,b = c1
  c,d = c2
  return a*c - b*d, a*d + b*c
end

def c_abs(c)
  a, b = c
  return sqrt( a*a + b*b )
end

def is_mandelbrot( z )
  for i in 1...100 do
    z = c_mul( z, z )
    if c_abs( z ) > 2 then
      return false
    end
  end
  return true
end

ary = []

for xx in 0...100 do
  for yy in 0...100 do
    c = [xx / 50.0, yy / 50.0]
    if is_mandelbrot( c ) then
      ary.push(c)
    end
  end
end

On 21 feb, 02:32, “Tomasz W.”
[email protected] wrote:

Here’s a patch that makes the Ruby interpreter slightly faster (~3%) by
optimizing a few commonly used functions (rb_call0, st_*, fix_equal):
http://zabor.org/taw/ruby-performance-patch-2007-02-21.patch.gz

And here’s a blog post with details:
taw's blog: Making Ruby faster


Tomasz W. [http://t-a-w.blogspot.com/]

Looks cool. A couple of comments…

  1. if-then-else vs. switch… It’s not clear to me why it would be faster
    than the switch. Also, most of that stuff is gone in ruby1.9. It seems
    you were using ruby1.8, right? You should probably redo this patch
    against yarv (see vm.c, for example).

  2. Fixnum improvements. Kind of a good idea, but maybe a problematic
    implementation. FIX2LONG() and similar are macros that hide the
    implementation details. This could allow Ruby in the future to treat
    Fixnums as true objects (like Python does), without changing the
    source code. Currently, yes, they are pretty inefficient,
    as they do a shift each time, even when it’s unneeded.
    Removing the use of the macro may not be such a good idea, as
    fix_equal as you have written it could break as soon as Fixnums are
    no longer represented as longs, but as a class. You might work around
    it by creating your own pseudo FIX2LONG_SANS_SHIFT() macro. That way,
    if things change, the breakage will become obvious, and changing the
    macro to be equivalent to FIX2LONG() is all it should take.
    Also, if I read your code correctly, fix_equal() as you presented it
    is slightly different in that it does not check that the pointers are
    the same when the argument is not a Fixnum (this means something like
    a == a may perform slower – albeit I’ll admit that is a weird case).
    I would probably ask on the dev list why that check was needed in the
    first place.

  3. Judy is in general not faster than a hash lookup. It is just more
    memory efficient.
    Ruby might benefit from a better hash, though, like Google’s
    dense_hash_map (albeit that is C++ code, which Matz is
    against).

  4. You should probably make a separate patch for each improvement you
    made, instead of all of them together. That will make it easier to
    merge (and evaluate) them in case one of them is rejected but the
    others are accepted.

Other than that, pretty cool. You should keep working on it and join
the ruby dev list to start merging your changes.
Here’s also the docs for sending patches:
http://www.ruby-lang.org/en/community/ruby-core/

On 22/02/07, gga [email protected] wrote:

  1. Judy is in general not faster than a hash lookup. It is just more
    memory efficient.

I cannot fully agree with that characterization, even if it’s correct
considering only the operation count.

It’s not something you see in typical benchmarks [1], as they tend to
access all memory with the same frequency, but real programs’ memory
access patterns more or less follow a power law: very few data structures
are accessed all the time, gradually bigger groups of data structures are
accessed less frequently, and most of the program’s memory is accessed
very rarely.

For typical programs, computers are pretty good at keeping the most
commonly accessed things in the faster caches. This means even a modest
decrease in data structure sizes will pull frequently accessed data
structures into faster caches. In most benchmarks all items are equally
probable, so the items pulled into the freed cache space tend to be
uninteresting.

I’d like to see a Judy vs. hash tables performance comparison on real
programs, or at least on more realistic benchmarks.

One more thing - Ruby uses numeric hash tables for symbol lookup.
Symbol ids are highly non-random:

irb> ids = ObjectSpace.enum_for(:each_object,
Module).inject([]){|ms,md| ms + md.instance_methods}.uniq.map{|x|
x.to_sym.object_id}.sort; [ids.min, ids.max, ids.size]
=> [378, 269218, 1061]

That’s exactly the situation Judy is optimized for.

[1] - A performance comparison of Judy to hash tables