Benchmark for Ruby

Ok let us get off our nice host thread, which is much better of course.

Austin, what you are suggesting seems very interesting to me; you claim
that we do not know anything about benchmarking.
For myself I accept this as a safe and comfortable working theory.
I am more than willing to learn, though (to know even less afterwards, but
philosophy can wait, unless Ara is with us ;).
So it is Ed, if I read correctly, who could teach us some tricks. R U with
us, Ed?

Links?

I am looking forward to this.

Cheers
Robert


Two things are infinite: the universe and human stupidity; and as far as
the universe is concerned, I have not acquired absolute certainty.

  • Albert Einstein

Robert D. wrote:

Links?

I am looking forward to this.

Cheers
Robert

Yeah, I’m with you. I actually took a look at the shootout page. First
of all, it isn’t as bad a site as some people make it out to be. Second,
they are running Debian and Gentoo, which means almost anyone could
duplicate their work (assuming the whole enchilada can be downloaded as
a tarball).

Analysis Phase (Trick 1):

  1. Collect the whole matrix of benchmarks. The rows will be benchmark
    names and the columns will be languages, and the cells in the matrix
    will be benchmark run times. Pick a language to be the “standard”. C is
    probably the obvious choice, since it’s likely to be the most “practical
    low-level language” (meaning not as many folks know Forth.) :-)

  2. Now you compute the natural log of ratios of the times for all the
    languages to the standard for each of the benchmarks. In some convenient
    statistics package (A spreadsheet works fine, but I’d do it in R because
    the kernel density estimators, boxplots, etc. are built in), compute the
    histograms (or kernel density estimators, or boxplots, or all of the
    above) of the ratios for each language. That tells you how the ratios
    are distributed.

Example (raw times; rows are benchmarks, columns are languages):

         Ruby   Perl   Python   PHP    C
Bench1   tr1    tp1    ty1      th1    tc1
Bench2   tr2    tp2    ty2      th2    tc2
Bench3   tr3    tp3    ty3      th3    tc3

Log ratios against C (note that the C column becomes ln(tcN/tcN) = 0):

         Ruby          Perl          Python   PHP   C
Bench1   ln(tr1/tc1)   ln(tp1/tc1)   ...      ...   0
Bench2   ln(tr2/tc2)   ln(tp2/tc2)   ...      ...   0
Bench3   ln(tr3/tc3)   ln(tp3/tc3)   ...      ...   0

And then take the histograms of the columns (smaller is better).
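
A minimal sketch of the bookkeeping in Ruby, with made-up run times rather
than real shootout data (R or a spreadsheet is still the right tool for the
density plots; this only builds the log-ratio table and pulls out a
per-language median):

# Hypothetical run times in seconds; rows are benchmarks, columns follow LANGS.
LANGS = %w(ruby perl python php c)
TIMES = {
  'bench1' => [12.0,  9.5,  8.0, 10.0, 1.0],
  'bench2' => [30.0, 22.0, 25.0, 28.0, 2.5],
  'bench3' => [ 4.0,  6.0,  3.5,  5.0, 0.5]
}

c_index = LANGS.index('c')

# Natural log of each language's time divided by the C time, per benchmark.
log_ratios = {}
TIMES.each do |bench, times|
  c_time = times[c_index]
  log_ratios[bench] = times.map { |t| Math.log(t / c_time) }
end

# The median of each column is a rough stand-in for the "typical" benchmark.
LANGS.each_with_index do |lang, i|
  column = log_ratios.values.map { |row| row[i] }.sort
  printf("%-7s median log ratio: %6.3f\n", lang, column[column.size / 2])
end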

Tuning Phase (Trick 2):

Find the midpoints on the density curves, boxplots or histograms. These
are the “typical” benchmarks. They are more representative than the
“outliers”. I saw one, for example, where Ruby was over 100 times as
fast as Perl. That’s not worth investing any time in – it’s some kind
of fluke: either something Perl sucks at, something Ruby is wonderful at,
or a better implementation in the Ruby code than in the Perl code.

Now you build a “profiling Ruby”, run the mid-range benchmarks with
profiling, and see where Ruby is spending its time. If you happen to
have a friend on the YARV team or the Cardinal team, have them run the
benchmarks too.
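
The real thing here is a gprof-enabled interpreter build, but as a
zero-setup first cut you can run a mid-range benchmark under the stdlib
profiler, which at least shows where the Ruby-level time goes (the fib
function below is just a placeholder workload, not one of the shootout
benchmarks):

# profile_fib.rb -- hypothetical example script
require 'profile'   # everything below runs under the profiler; report prints at exit

def naive_fib(n)
  n < 2 ? n : naive_fib(n - 1) + naive_fib(n - 2)
end

naive_fib(22)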

Some other tricks:

Once you know where Ruby is spending its time, play with compiler flags.
gcc has oodles of possible optimizations, and gcc itself was tuned by
processes like this. It’s worth spending a lot of time compiling the
Ruby interpreter, since it’s going to be run often.

Those are simple “low-hanging fruit” tricks … stuff you can do without
actually knowing what’s going on inside the Ruby interpreter. It will be
painfully obvious from the profiles, I think, where the opportunities
are.

M. Edward (Ed) Borasky wrote:

duplicate their work (assuming the whole enchilada can be downloaded as
a tarball).

Grab the CVS tree
http://shootout.alioth.debian.org/gp4/faq.php#downsource

If you have problems building and measuring
http://shootout.alioth.debian.org/gp4/faq.php#talk

Or try something different
http://shootout.alioth.debian.org/gp4/faq.php#similar

Ed, Isaac

Hopefully I have not wasted your time; I read your posts with interest,
but that is not what I wanted, or understood.
I really should have been clearer, sorry, sorry!

What I want is a benchmark site for Ruby: Ruby vs. Ruby, Ruby only,
teaching people how to benchmark code and giving them a good idea of what
is fast and what is slow.

An example: inject vs. each (a minimal Benchmark sketch follows after the
list below).

Austin pointed out that this is more complicated than one might think, so
what I was interested in (and I should have said so; no more posts after
23:00 local time!!!) is:

  • are there problems with Ruby’s Benchmark module
  • how to use or enhance it correctly
  • what OS conditions do we have to ensure for a fair comparison
  • maybe more
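
Something like this is what I have in mind, as a first cut with the stdlib
Benchmark module (array size and repeat count are arbitrary choices, and
the absolute numbers mean nothing outside one machine):

require 'benchmark'

NUMBERS = (1..100_000).to_a

Benchmark.bmbm do |bm|
  bm.report('each') do
    10.times { sum = 0; NUMBERS.each { |n| sum += n }; sum }
  end
  bm.report('inject') do
    10.times { NUMBERS.inject(0) { |sum, n| sum + n } }
  end
end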

Cheers
Robert


Two things are infinite: the universe and human stupidity; and as far as
the universe is concerned, I have not acquired absolute certainty.

  • Albert Einstein

On Friday 15 September 2006 03:48, M. Edward (Ed) Borasky wrote:

Once you know where Ruby is spending its time, play with compiler flags.
gcc has oodles of possible optimizations, and gcc itself was tuned by
processes like this. It’s worth spending a lot of time compiling the
Ruby interpreter, since it’s going to be run often.

I compiled Ruby on my system without --enable-pthreads and had a ~15-20%
performance increase in real-world runs of my application, which makes no
use of threads or external libraries. I emphasise that this is specific to
my system (linux kernel 2.6, nptl-only) and my application, but that’s
still a non-insignificant performance increase for a real application run
(rather than a micro-benchmark).

Regards, Alex

On 9/14/06, M. Edward (Ed) Borasky [email protected] wrote:

Once you know where Ruby is spending its time, play with compiler flags.
gcc has oodles of possible optimizations, and gcc itself was tuned by
processes like this. It’s worth spending a lot of time compiling the
Ruby interpreter, since it’s going to be run often.

Playing with compiler flags is of limited utility across the board.
The compiler flags for each platform will differ, and GCC
optimizations are different than native compiler optimizations (and
GCC isn’t well-optimized off PPC and Intel; I will never compile
something with a non-native compiler if I can avoid it for any
reason).

I’m not interested in things that require compiler flag tweaks –
that’s far too variant and should be done on a per-system basis. I’m
looking for a benchmark suite that shows areas where the implementation
can be improved. I’m not looking for artificially limited benchmarks
where a simple tweak of an operating system option (e.g., ulimit)
enables the benchmark to run or run faster.

There’s the difference.

-austin

On Fri, 15 Sep 2006, M. Edward (Ed) Borasky wrote:

Analysis Phase (Trick 1):

  1. Collect the whole matrix of benchmarks. The rows will be benchmark
    names and the columns will be languages, and the cells in the matrix
    will be benchmark run times. Pick a language to be the “standard”. C is
    probably the obvious choice, since it’s likely to be the most “practical
    low-level language” (meaning not as many folks know Forth.) :-)

I’ve tried, but FORTH still hasn’t clicked with me yet…
[…]

Some other tricks:

Once you know where Ruby is spending its time, play with compiler flags.
gcc has oodles of possible optimizations, and gcc itself was tuned by
processes like this. It’s worth spending a lot of time compiling the
Ruby interpreter, since it’s going to be run often.

There exists at least this effort to use Genetic Algorithms for
tuning compiler options. I’ve not explored it yet.

http://www.coyotegulch.com/products/acovea/index.html

One may need a cluster of machines (of many platforms?) to do this
usefully, but still. Maybe Rinda can help us all contribute…

Those are simple “low-hanging fruit” tricks … stuff you can do without
actually knowing what’s going on inside the Ruby interpreter. It will be
painfully obvious from the profiles, I think, where the opportunities are.

    Hugh

On 9/15/06, Austin Z. [email protected] wrote:

I’m not interested in things that require compiler flag tweaks –
that’s far too variant and should be done on a per-system basis. I’m
looking for a benchmark suite that shows areas where the implementation
can be improved. I’m not looking for artificially limited benchmarks
where a simple tweak of an operating system option (e.g., ulimit)
enables the benchmark to run or run faster.

There’s the difference.

I’m with Austin on this. Raw performance improvements on particular
systems are interesting and worth doing and can be achieved in a whole
range of good ways. But it would be really interesting to all Ruby
programmers to understand where the implementation itself falls short (or,
less provocatively, where it can be improved). I recently went through a
ruby-prof exercise with Net::LDAP’s search function and found a whole raft
of surprising things. In the first place, there were no “hot spots” where
the code was spending a double-digit percentage of its time. But there
were a lot of opportunities for 2% and 5% improvements, and they added up
to about a 60-70% improvement overall (meaning that a query which used to
execute in x time now takes about 0.4x).
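
For anyone who wants to repeat that kind of exercise, the skeleton looks
roughly like this (the profiled block below is a placeholder, not the
actual Net::LDAP search code):

require 'rubygems'
require 'ruby-prof'

# Profile whatever you want to examine; this block is just a stand-in.
result = RubyProf.profile do
  10_000.times { { :a => 1, :b => 2 }.map { |k, v| [k.to_s, v * 2] } }
end

# Flat profile (per-method self time), printed to stdout.
RubyProf::FlatPrinter.new(result).print(STDOUT)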

Some of the surprises: Symbol#=== is really slow. Replace case statements
against Symbols with if/then constructions. Accessing hash tables is
really slow (no big surprise), so in really hot loops look for an
algorithmic alternative. And there are quite a few more. Maybe they ought
to be compiled and published.
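
To illustrate the hash point with a made-up micro-benchmark (not from the
Net::LDAP code): when the keys happen to be small integers, an Array
lookup in a hot loop can stand in for a Hash lookup.

require 'benchmark'

KEYS  = (0...100).to_a
HASH  = KEYS.inject({}) { |h, k| h[k] = k * 2; h }
ARRAY = KEYS.map { |k| k * 2 }

N = 50_000

Benchmark.bmbm do |bm|
  bm.report('Hash lookup')  { N.times { KEYS.each { |k| HASH[k]  } } }
  bm.report('Array lookup') { N.times { KEYS.each { |k| ARRAY[k] } } }
end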

And recently I was discussing GC with Kirk H. and decided to test my
oft-expressed feeling that Ruby performance degrades very rapidly with
working-set size. And I turned up some strong hints (perhaps not
surprising) that Ruby would be a whole lot faster with generational GC.
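
A crude way to see the working-set effect, sketched from memory rather
than being the actual test I ran: time a full mark-and-sweep pass as the
live heap grows.

require 'benchmark'

live = []

[50_000, 200_000, 800_000].each do |n|
  live.concat(Array.new(n) { 'x' * 16 })   # grow the live working set
  t = Benchmark.realtime { GC.start }      # time one full mark-and-sweep pass
  printf("live strings: %8d   GC.start: %.4f s\n", live.size, t)
end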

On Fri, 2006-09-15 at 21:43 +0900, Francis C. wrote:

Some of the surprises: Symbol#=== is really slow. Replace case statements
against Symbols with if/then constructions.

I have a lot of requirements of the kind shown below, where :asdf would
be passed in as a parameter. I have written trivial versions of both
‘case’ and ‘if/elsif/else’. The difference is very small, even over 10
million comparisons. But, in fact, ‘case’ seems to be faster.

js@srinivasj:~> cat tmp/test/1.rb
def timer
  b = Time.now
  yield
  e = Time.now
  puts "Time taken is #{e - b} seconds."
end

timer do
  10_000_000.times do |__|
    x = case :asdf
        when :a
          'a'
        when :b
          'b'
        when :c
          'c'
        when :d
          'd'
        when :e
          'e'
        else
          'asdf'
        end
  end
end

timer do
  10_000_000.times do |__|
    x = if :asdf == :a
          'a'
        elsif :asdf == :b
          'b'
        elsif :asdf == :c
          'c'
        elsif :asdf == :d
          'd'
        elsif :asdf == :e
          'e'
        else
          'asdf'
        end
  end
end

Were you referring to some other kind of ‘if/then’? I am very interested
in this, since, as I mentioned above, I need this construct several
times.

Greetings,
JS

M. Edward (Ed) Borasky wrote:

/ …

Speaking of such, if you are set up to rebuild your Linux kernel,
single-processor machines tend to run faster if you turn off SMP when
you rebuild the kernel.

Further, I have seen single-processor machines lock up when running some
builds of SMP kernels, to the degree that I never allow them to run.

A. S. Bradbury wrote:

system (linux kernel 2.6, nptl-only) and my application, but that’s still a
non-insignificant performance increase for a real application run (rather
than a micro-benchmark).

Regards, Alex

Despite the fact that gizmos like hyperthreading and dual-core
processors are the “default” in new boxes, a lot of “us” are still
running on quite serviceable single-processor machines. For such
machines, turning off pthreads when you recompile is usually a good
thing, for Ruby and quite a few other languages and applications that
implement their own threading models.

Speaking of such, if you are set up to rebuild your Linux kernel,
single-processor machines tend to run faster if you turn off SMP when
you rebuild the kernel.

On 9/15/06, Srinivas JONNALAGADDA [email protected] wrote:

 x = case :asdf
     when :a
       'a'

You have a literal Symbol in the case statement. Try it with a variable
that refers to an object of type Symbol. I got nearly a three percent
speed improvement by changing this out, and it was only over a few hundred
thousand iterations, not ten million. I haven’t looked at the
implementation (yet) so I have no clue why this behaves as it does.
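
Something like this, reusing the timer helper from the post above (a
sketch of the shape of the change, not the exact code I timed):

sym = :asdf   # the Symbol now arrives in a variable rather than as a literal

timer do
  500_000.times do
    x = case sym
        when :a then 'a'
        when :b then 'b'
        when :c then 'c'
        when :d then 'd'
        when :e then 'e'
        else 'asdf'
        end
  end
end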

Hugh S. wrote:

I’ve tried, but FORTH still hasn’t clicked with me yet…

Check out the gForth and vmgen manuals at

http://www.ugcs.caltech.edu/manuals/devtool/vmgen-0.6.2/index.html

There was a project to build a Ruby virtual machine using vmgen.

There exists at least this effort to use Genetic Algorithms for
tuning compiler options. I’ve not explored it yet.

http://www.coyotegulch.com/products/acovea/index.html

One may need a cluster of machines (of many platforms?) to do this
usefully, but still. Maybe Rinda can help us all contribute…

I think I installed acovea once – it’s part of Gentoo – but I don’t
remember doing anything with it. But the concept is certainly
intriguing, and might be more so to the folks on this list who are
always talking about how machine cycles are cheaper than programmer
cycles. Of course, if the programmer has to spend his or her cycles
waiting for a genetic algorithm to converge …

:-)

I’ve had bad experiences in the past with this sort of optimization.
“Real” compiler optimization is a hard problem in the complexity sense,
plus there’s all the time you have to spend correctness-testing the
optimized versions. My experience has been it’s far better to pluck the
low-hanging fruit, which is what gcc does by itself, and which is what
the designers of virtual machines do.

Those are simple “low-hanging fruit” tricks … stuff you can do without
actually knowing what’s going on inside the Ruby interpreter. It will be
painfully obvious from the profiles, I think, where the opportunities are.

    Hugh

Yeah …

On Sat, 16 Sep 2006, M. Edward (Ed) Borasky wrote:

Hugh S. wrote:

I’ve tried, but FORTH still hasn’t clicked with me yet…

    [...]

Thanks, I’ll reply off list about that.

remember doing anything with it. But the concept is certainly
intriguing, and might be more so to the folks on this list who are
always talking about how machine cycles are cheaper than programmer
cycles. Of course, if the programmer has to spend his or her cycles
waiting for a genetic algorithm to converge …

:-)

GA’s aren’t that quick, and would not be for a ruby build. But it’s
something to explore, just because it might teach us something.

I’ve had bad experiences in the past with this sort of optimization.
“Real” compiler optimization is a hard problem in the complexity sense,
plus there’s all the time you have to spend correctness-testing the

Well, at least we have a set of tests for ruby, and we can use that
as part of the fitness function.

optimized versions. My experience has been it’s far better to pluck the
low-hanging fruit, which is what gcc does by itself, and which is what
the designers of virtual machines do.

People have stated that the implementation of method despatch in ruby
is naive:

http://smallthought.com/avi/?p=16

that creating Procs and continuations is slow (a quick micro-benchmark
sketch is at the end of this post):

http://lambda-the-ultimate.org/node/1470

and other people have mentioned the garbage collection system.

I’m certainly not in a position to suggest what might be done about
these things, or to denigrate the implementations as they stand, but
these are about the only specific things I can find people pointing
to (other than the general remarks about ruby being slow, which add
more heat than light). So I think we have some juicy pieces of fruit
to bite into here, but I don’t think they are low-hanging, not for
me anyway. :-)
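
The Proc claim, at least, is cheap to poke at with a throwaway
micro-benchmark (iteration count arbitrary; it only shows relative cost,
nothing about why):

require 'benchmark'

N = 1_000_000

Benchmark.bmbm do |bm|
  bm.report('block iteration')     { N.times { [1, 2, 3].each { |i| i } } }
  bm.report('Proc creation')       { N.times { Proc.new { |i| i } } }
  bm.report('Proc create + call')  { N.times { Proc.new { |i| i }.call(1) } }
end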

    Hugh

On 9/15/06, Kenosis [email protected] wrote:

something to explore, just because it might teach us something.

the designers of virtual machines do.

I can’t recall whether I read about this on this site or in some
magazine article, but I recall it being interesting to me: I think it
was called profile-driven optimization. My vague recollection is that
gcc can optimize based on a runtime profile. So, you run ruby over
your application while profiling it all together, essentially profiling
ruby in the context of your application. Then you use the profile to
guide gcc to rebuild a version of ruby optimized for your application.
Might be worth some research.

RedHat Magazine had an article on GCC optimizations that talked about
this:

http://www.redhat.com/magazine/011sep05/features/gcc/

it’s also a recurring topic at the GCC summit:
http://www.gccsummit.org/2005/view_abstract.php?content_key=7
http://www.gccsummit.org/2006/view_abstract.php?content_key=17

pat eyler wrote:

this:

http://www.redhat.com/magazine/011sep05/features/gcc/

it’s also a recurring topic at the GCC summit:
http://www.gccsummit.org/2005/view_abstract.php?content_key=7
http://www.gccsummit.org/2006/view_abstract.php?content_key=17

Well … the good news is that I have gcc 4.1.1 and a “test suite”
consisting of a single benchmark, plus scripts to build Ruby and
YARV-Ruby with “gprof” enabled. The bad news is that I have very little
play time this weekend because I’m attending a couple of workshops on
… well … other programming languages. :-)

The test suite can be found at

http://rubyforge.org/cgi-bin/viewvc.cgi/MatrixBenchmark/?root=cougar

By the way, the kind of code I’m interested in running efficiently in
Ruby is well represented by the matrix benchmark. Pretty much everything
I want to do can be expressed ultimately in terms of matrix
multiplication, and I’m on the verge of filing a Ruby Change Request to
get the Mathn, Rational, Complex and Matrix libraries coded up in C and
made part of the base Ruby language.
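
As a trivial stand-in for that kind of workload (not the MatrixBenchmark
linked above, just the stdlib Matrix class at an arbitrary size):

require 'matrix'
require 'benchmark'

n = 60   # arbitrary; the real benchmark linked above is far more thorough
a = Matrix.rows(Array.new(n) { Array.new(n) { rand } })
b = Matrix.rows(Array.new(n) { Array.new(n) { rand } })

puts Benchmark.realtime { a * b }            # pure-Ruby matrix multiply
puts Benchmark.realtime { (a * b).inverse }  # and an inverse, as in the benchmark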

I’m well aware of the dozens of C and C++ math libraries that have been
interfaced with Ruby, and dozens more that could be interfaced, given
some love (and SWIG). :-) However, the “pure Ruby” libraries I listed
above are exactly what I need.

Hugh S. wrote:

Thanks, I’ll reply off list about that.

remember doing anything with it. But the concept is certainly
I’ve had bad experiences in the past with this sort of optimization.
People have stated that the implementation of method despatch in ruby is naive.
these things, or to denigrate the implementations as they stand, but
these are about the only specific things I can find people pointing
to (other than the general remarks about ruby being slow, which add
more heat than light). So I think we have some juicy pieces of fruit
to bite into here, but I don’t think they are low-hanging, not for
me anyway. :-)

    Hugh

I can’t recall whether I read about this on this site or in some
magazine article, but I recall it being interesting to me: I think it
was called profile-driven optimization. My vague recollection is that
gcc can optimize based on a runtime profile. So, you run ruby over
your application while profiling it all together, essentially profiling
ruby in the context of your application. Then you use the profile to
guide gcc to rebuild a version of ruby optimized for your application.
Might be worth some research.

Ken

For those who are interested, I already measured the effect of GCC
optimization flags on Ruby speed using the MatrixBenchmark, which was the
only one I had at the time. Results here:
http://www.jhaampe.org/software/ruby-gcc

Quoting Robert D. [email protected]:

Well, I do not believe in the right of the OP to have the thread follow the
[...]
and Ruby 2 to have a sound base for discussion.

Thanks for all contributions (including the future ones ;-)

  1. In his specific case, the choice of a built-in matrix multiply and
    inverse was deliberate for two reasons. First, the natural expression
    of the problems I’m ultimately trying to solve in Ruby is in terms of
    matrices. I don’t consider that a performance blunder. Second, I wanted
    something to benchmark that exercised a lot of the Ruby internals.

  2. The overall objective of the project from which this benchmark came
    is to build a performance modeling tool set in pure Ruby. Given that,
    one can only improve performance by tweaking the way the Ruby
    interpreter is compiled. I don’t know when YARV and Ruby 2 will be as
    stable as Ruby 1.8.5 is now. YARV is indeed four times as fast on this
    benchmark, and I like that. I don’t see why the same speed improvements
    can’t be made in the base Ruby 1.8 interpreter.

  3. I am exploring other options, such as interfacing with existing
    C/C++ math libraries, rewriting Matrix, Rational, Complex and Mathn in
    C, etc. But I personally don’t see why a built-in function to process
    vectors and matrices can’t run at the same speed as C code. Of course,
    to do so would require rational and complex numbers, matrices and
    vectors to be built-in data types just like Fixnums, Floats, Bignums
    and Strings, and that means convincing the Ruby community that it’s a
    good idea and waiting for the version of Ruby where such semantic
    changes are part of the language definition.

In short, there are many ways to do it. Implementing efficient matrix and
vector operations in the language is what I would consider the “best way”,
assuming we could get the Ruby community to agree on whether or not the
elements of a vector or matrix can be changed once the object has been
created. :-) My opinion is unaltered on that subject – it should be
possible, with an optional warning.

On 9/15/06, Sylvain J. [email protected] wrote:

For those who are interested, I already measured the effect of GCC
optimization flags on Ruby speed using the MatrixBenchmark, which was the
only one I had at the time. Results here:
http://www.jhaampe.org/software/ruby-gcc

Sylvain J.

I find all that very interesting, completely OT, but interesting.
Well, I do not believe in the right of the OP to have the thread follow the
topic in a ML; the irony is that I just got off another thread because some
other people do not share that idea, and I thought it might be a good idea
to respect their beliefs.

However, what I really wanted to say: very interesting, but is there
nothing which can be said about Ruby itself? Everybody is talking about
implementations and tweaks. I thought it might be a good idea to talk
about crimes and blunders, performance-wise.
I am not sure any more that this is possible; maybe we should wait for
YARV and Ruby 2 to have a sound base for discussion.

Thanks for all contributions (including the future ones ;-)

Robert


Two things are infinite: the universe and human stupidity; and as far as
the universe is concerned, I have not acquired absolute certainty.

  • Albert Einstein