Forum: Ruby Huge performance gap

Alexis Reigel (Guest)
on 2006-02-23 23:04
(Received via mailing list)
Hi all

I've ported the following c++ code to ruby. It is a recursive
(backtracking) sudoku-solving algorithm. Well, I was rather surprised by
the execution time I got:

c++ code:   0.33 seconds
ruby code: 27.65 seconds

The implementations should do the same thing; at least, they both run
through the method/function "next_state"/"nextone" 127989 times.
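(The attachment itself isn't reproduced here; purely as a rough sketch
of the approach, with made-up names and not the attached code, such a
solver looks something like this:)

  # Illustrative sketch only; 0 marks an empty cell.
  class SudokuSketch
    attr_reader :count

    def initialize(grid)
      @grid  = grid   # 9x9 array of arrays of Integers
      @count = 0      # how many times the recursive step runs
    end

    def allowed?(row, col, n)
      return false if @grid[row].include?(n)
      return false if @grid.any? { |r| r[col] == n }
      r0, c0 = row - row % 3, col - col % 3
      (r0...r0 + 3).each do |r|
        (c0...c0 + 3).each { |c| return false if @grid[r][c] == n }
      end
      true
    end

    # Plays the role of the "next_state"/"nextone" step mentioned above.
    def solve(cell = 0)
      @count += 1
      return true if cell == 81
      row, col = cell / 9, cell % 9
      return solve(cell + 1) unless @grid[row][col] == 0
      (1..9).each do |n|
        next unless allowed?(row, col, n)
        @grid[row][col] = n
        return true if solve(cell + 1)
        @grid[row][col] = 0   # backtrack
      end
      false
    end
  end

  # usage sketch:
  #   s = SudokuSketch.new(grid)   # grid: 9x9 array, 0 for blanks
  #   s.solve and puts "solved; recursive step ran #{s.count} times"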

Now how can it be that the ruby code is so awfully slow?
Is that normal for ruby?
Or is my implementation so horribly bad?
I am aware that the non-native and object-oriented ruby code won't be as
fast as the c++ one, but I didn't expect such a gap.


Thanks for comments.

Alexis.
E. Saynatkari (Guest)
on 2006-02-23 23:07
Alexis Reigel wrote:
> Hi all
>
> I've ported the following c++ code to ruby. It is a recursive
> (backtracking) sudoku-solving algorithm. Well, I was rather surprised by
> the execution time I got:
>
> c++ code:   0.33 seconds
> ruby code: 27.65 seconds
>
> The implementation should do the same, at least they run through the
> method/function "next_state"/"nextone" both 127989 times.
>
> Now how can it be that the ruby code is so awfully slow?
> Is that normal for ruby?
> Or is my implementation so horribly bad?
> I am aware that the non-native and object-oriented ruby code won't be as
> fast as the c++ one, but I didn't expect such a gap.

Post the code somewhere, there might be room for improvement
in the algorithm though it will still be considerably slower.

> Thanks for comments.
>
> Alexis.


E
Matthew Moss (Guest)
on 2006-02-23 23:38
(Received via mailing list)
On 2/23/06, Alexis Reigel <mail@koffeinfrei.org> wrote:
> method/function "next_state"/"nextone" both 127989 times.
>
> Now how can it be that the ruby code is so awfully slow?
> Is that normal for ruby?
> Or is my implementation so horribly bad?
> I am aware that the non-native and object-oriented ruby code won't be as
> fast as the c++ one, but I didn't expect such a gap.

Others can better speak on ruby specifics, but...
Ruby is interpreted (inefficiently?), C++ is compiled.
And the algorithm is something like O(n^3)? Or worse?
Seems like a reasonable difference to me...
Meinrad Recheis (Guest)
on 2006-02-23 23:38
(Received via mailing list)
On 2/23/06, Alexis Reigel <mail@koffeinfrei.org> wrote:
> [...]
>

ruby code may be up to 100 times slower than c++ binaries. i think it is
normal.
also i don't see any common performance crimes (such as using += with
strings) in your code.
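for the record, that particular crime is easy to see with the stdlib
Benchmark module (the string and the iteration count below are
arbitrary):

  require 'benchmark'

  n = 50_000
  Benchmark.bm(10) do |bm|
    bm.report('s += "x"') { s = ''; n.times { s += 'x' } }   # builds a brand-new String every pass
    bm.report('s << "x"') { s = ''; n.times { s << 'x' } }   # appends to the same String in place
  end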
-- henon
Stephen Waits (Guest)
on 2006-02-23 23:41
(Received via mailing list)
E. Saynatkari wrote:
>
> Post the code somewhere, there might be room for improvement
> in the algorithm though it will still be considerably slower.

It looks, to me, like he attached his code to the OP.

Regardless, it doesn't matter.  Algorithmic improvements may help both
the C++ and Ruby versions - but it's not going to change the fact that
one is a relatively low-level language, compiled to native machine code,
and the other is an interpreted dynamic language.  To compare them is
either ridiculous, or more likely in this case, simply ignorant.

--Steve
E. Saynatkari (Guest)
on 2006-02-23 23:45
Stephen Waits wrote:
> E. Saynatkari wrote:
>>
>> Post the code somewhere, there might be room for improvement
>> in the algorithm though it will still be considerably slower.
>
> It looks, to me, like he attached his code to the OP.

Ah, caveat forum-user!

> Regardless, it doesn't matter.  Algorithmic improvements may help both
> the C++ and Ruby versions - but it's not going to change the fact that
> one is a relatively low-level language, compiled to native machine code,
> and the other is an interpreted dynamic language.  To compare them is
> either ridiculous, or more likely in this case, simply ignorant.

In general, sure. Ruby will afford doing some things better
than most C++ coders would (or would bother to), so it might
be worth looking into.

Plus, if one were to get the Ruby time down to 15 seconds, it
would still be worth it even if the C++-version were cut to 0.15
seconds (mainly because it would probably take at least twice
as long to implement in C++).

> --Steve


E
pat eyler (Guest)
on 2006-02-23 23:50
(Received via mailing list)
On 2/23/06, Meinrad Recheis <meinrad.recheis@gmail.com> wrote:
> >
> > [...]
> >
>
> ruby code may be up to 100 times slower than c++ binaries. i think it is
> normal.
> also i don't see any common performance crimes (such as using += with
> strings) in your code.

there are a couple of (really minor) things like using for loops instead
of uptos, but those won't buy the kind of time Alexis wants to see.

RubyInline might be just the ticket though.   Run the code through
the profiler (a long process) and Inline the method that's making the
biggest hit.
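A rough sketch of what the RubyInline route looks like (the class,
method and C body here are made up for illustration and assume the
inline gem is installed):

  require 'rubygems'
  require 'inline'

  class HotSpot
    inline do |builder|
      # hypothetical inner-loop helper rewritten in C
      builder.c <<-'EOC'
        int square(int n) {
          return n * n;
        }
      EOC
    end
  end

  puts HotSpot.new.square(9)   # => 81, dispatched to compiled C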
Alexis Reigel (Guest)
on 2006-02-23 23:56
(Received via mailing list)
Stephen Waits wrote:
> the C++ and Ruby versions - but it's not going to change the fact that
> one is a relatively low-level language, compiled to native machine code,
> and the other is an interpreted dynamic language.  To compare them is
> either ridiculous, or more likely in this case, simply ignorant.
>
> --Steve
>
Why should that be ridiculous or ignorant?
I stated that I was aware of the differences between interpreted and
compiled languages. But that does not change the fact that I believe
that this does not explain the performance gap. An execution time of
27.65 seconds against 0.33 seconds is not just nothing, is it? It's a
factor of over 80. Besides, I implemented the same code in java too,
which isn't native code either and also runs in a virtual machine, and
it executed in about the same time as c++.
E. Saynatkari (Guest)
on 2006-02-24 00:02
Alexis Reigel wrote:
> Stephen Waits wrote:
>> the C++ and Ruby versions - but it's not going to change the fact that
>> one is a relatively low-level language, compiled to native machine code,
>> and the other is an interpreted dynamic language.  To compare them is
>> either ridiculous, or more likely in this case, simply ignorant.
>>
>> --Steve
>>
> Why should that be ridiculous or ignorant?
> I stated that I was aware of the differences between interpreted and
> compiled languages. But that does not change the fact that I believe
> that this does not explain the performance gap. An execution time of
> 27.65 seconds against 0.33 seconds is not just nothing is it? It's a
> factor of over 80 times. Besides, I implemented the same code in java
> too, which isn't native code as well and runs in a virtual machine too,
> and it executed in about the same time as c++.

Well, Ruby is strictly interpreted using the parse tree instead
of VM opcodes which may or may not (depending on who you ask)
make a difference. Ruby is pretty slow but usually Fast Enough(tm).

You could try to run your script on YARV[1] to see if it helps.

[1] http://atdot.net/yarv


E
Marcin Mielżyński (Guest)
on 2006-02-24 00:08
(Received via mailing list)
Alexis Reigel wrote:
>> one is a relatively low-level language, compiled to native machine code,
> factor of over 80 times. Besides, I implemented the same code in java
> too, which isn't native code as well and runs in a virtual machine too,
> and it executed in about the same time as c++.
>
>

Which version of java did you use? Since 1.4 there is a JIT compiler, so
java _IS_ compiled into native code unless you disable it explicitly.

lopex
Marcin Mielżyński (Guest)
on 2006-02-24 00:11
(Received via mailing list)
E. Saynatkari wrote:

>
> You could try to run your script on YARV[1] to see if it helps.
>
> [1] http://atdot.net/yarv
>
>
> E
>

And turn the magic opcodes on :D

http://eigenclass.org/hiki.rb?yarv+ueber+algorithm...

lopex
Logan Capaldo (Guest)
on 2006-02-24 00:11
(Received via mailing list)
On Feb 23, 2006, at 5:55 PM, Alexis Reigel wrote:

> Why should that be ridiculous or ignorant?
> I stated that I was aware of the differences between interpreted and
> compiled languages. But that does not change the fact that I believe
> that this does not explain the performance gap. An execution time of
> 27.65 seconds against 0.33 seconds is not just nothing is it? It's a
> factor of over 80 times. Besides, I implemented the same code in java
> too, which isn't native code as well and runs in a virtual machine
> too,
> and it executed in about the same time as c++.

Ruby's method (function) lookup is gonna be slower no matter what
because of the typing situation. That's probably the biggest hit
here. The C++ code can for the most part turn function calls into
jumps at compile time (excluding virtual methods, although even there
there is less indirection than in ruby). Similarly for Java. Have you
tried running it in YARV?
William James (Guest)
on 2006-02-24 00:14
(Received via mailing list)
Alexis Reigel wrote:
> > Regardless, it doesn't matter.  Algorithmic improvements may help both
> that this does not explain the performance gap. An execution time of
> 27.65 seconds against 0.33 seconds is not just nothing is it? It's a
> factor of over 80 times. Besides, I implemented the same code in java
> too, which isn't native code

More ignorance.  Java has a JIT compiler which produces
machine code.
Bill Kelly (Guest)
on 2006-02-24 00:26
(Received via mailing list)
From: "William James" <w_a_x_man@yahoo.com>
>> >
>> compiled languages. But that does not change the fact that I believe
>> that this does not explain the performance gap. An execution time of
>> 27.65 seconds against 0.33 seconds is not just nothing is it? It's a
>> factor of over 80 times. Besides, I implemented the same code in java
>> too, which isn't native code
>
> More ignorance.  Java has a JIT compiler which produces
> machine code.

Alexis,

This is usually a friendly community, by the way.    :rolleyes:

But yes, it's harder to make a language like Ruby, which is highly
dynamic at runtime, fast like C++ and Java, which are primarily
statically compiled.  The Smalltalk folks have reportedly done pretty
well though, so there exists the possibility that Ruby may get
substantially faster in the future.  YARV is already making some
headway.

Regards,

Bill
Alexis Reigel (Guest)
on 2006-02-24 00:33
(Received via mailing list)
>
> More ignorance.  Java has a JIT compiler which produces
> machine code.
>

Why are you being so mean? I wasn't aware of that.
I was just wondering why my results were so significantly different. All
I was asking was if someone had some explanations and comments on why
that is like that. I got some nice and reasonable answers, but I got
some unkind answers too... I didn't ask for bitter and unconstructive
comments. Did I somehow offend your honor? I did not say that ruby is
crap compared to c++ or java. I find ruby an absolutely fantastic
language. I was just surprised by my results.
Stephen Waits (Guest)
on 2006-02-24 00:57
(Received via mailing list)
Alexis Reigel wrote:
>  >
>> More ignorance.  Java has a JIT compiler which produces
>> machine code.
>
> Why are you being so mean? I wasn't aware of that.

Calm down, Alexis... nobody is being mean.  When someone doesn't
understand something because they aren't educated about it, they are
considered ignorant.  There's nothing wrong with that.

Enough people have explained the reasons for the performance gap by this
point that it should be clear.  If you still need more help with this
issue, please let us know.

Meanwhile, I'm certain that none of us intended to be mean, or otherwise
attack you in any way.

--Steve
Daniel Nugent (Guest)
on 2006-02-24 03:43
(Received via mailing list)
Yikes!  What's with the snarkiness guys?

One thing you should consider in particular, Alexis, is the
performance impact of Ruby's objects.  In your C++ code, it
appears that everything's running on the stack.  The Ruby interpreter
is allocating every object in the Ruby code on the heap, and disposing
of them again through garbage collection.
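A rough way to watch that allocation pressure from inside MRI
(ObjectSpace and GC.disable are MRI-specific, and the workload below is
just a made-up stand-in):

  # Roughly count how many objects a piece of code allocates (MRI only).
  GC.disable
  before = ObjectSpace.each_object { }            # returns the number of live objects
  10_000.times { [1, 2, 3].map { |n| n.to_s } }   # hypothetical workload
  after = ObjectSpace.each_object { }
  GC.enable
  puts "allocated roughly #{after - before} objects"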

It might be worth attempting to at least change the code to
accept Sudoku puzzles of any size (bear in mind I haven't looked at
Sudoku solvers very much myself, so this may have all sorts of
technical challenges I'm unaware of).
John W. Kennedy (Guest)
on 2006-02-24 03:46
(Received via mailing list)
Alexis Reigel wrote:
>> one is a relatively low-level language, compiled to native machine code,
> factor of over 80 times. Besides, I implemented the same code in java
> too, which isn't native code as well and runs in a virtual machine too,
> and it executed in about the same time as c++.

Most modern Java implementations (on full computers, not PDAs and the
like) are /not/ purely interpreted. The runtime compiles the bytecode
into machine code.

Furthermore, even when interpreted, Java has typed variables. A Java int
is always a 32-bit 2's-complement integer. "i = j + k;", where each of
i, j, and k is an int, is a simple operation involving about three
instructions in either the Java Virtual Machine or the real machine. A
Ruby variable could be an integer, a big-integer, a floating-point
number, a character string, or even something to which "+" doesn't
apply, and, every time an expression is evaluated, that all has to be
worked out.
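A small illustration of that point (the operand pairs below are
arbitrary): the very same + has to be resolved against whatever operand
types turn up at runtime.

  # One expression, four different meanings of +, chosen at runtime:
  [[1, 2], [2 ** 70, 1], [1.5, 2], ["foo", "bar"]].each do |a, b|
    puts "#{a.class} + #{b.class} => #{(a + b).inspect}"
  end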

The convenience of Ruby, Perl, REXX, JavaScript, and similar languages
is considerable. But it comes at a price. If the bottleneck in the
program is the speed of your disk, or of your IP connection, that price
probably doesn't matter. But if you're doing substantial calculations in
RAM, it may not be worth it.

You can't always generalize, though. Ruby is faster than Java at finding
perfect numbers (probably because Ruby's implementation of big integers
is faster than Java's), and both are considerably faster than Perl
(probably because Perl forces /all/ numbers to be big integers, if any
are) (and GNU Common LISP is faster than Ruby).
Sky Yin (cookoo)
on 2006-02-24 04:41
(Received via mailing list)
100 times is normal. Just use ruby where it works well and use a
compiled language for computation-intensive tasks. If you look at other
languages, there's still a gap (about 5 times) between VMs without JIT
and ones with JIT. The good news is that now we have a .Net-ruby bridge.
Prototyping in ruby and optimizing the critical parts in .Net sounds
very efficient and productive.

Sky
Wilson Bilkovich (Guest)
on 2006-02-24 17:05
(Received via mailing list)
On 2/23/06, Bill Kelly <billk@cts.com> wrote:
> >> > It looks, to me, like he attached his code to the OP.
> >> I stated that I was aware of the differences between interpreted and
>
> This is usually a friendly community, by the way.    :rolleyes:
>
> But yes, it's harder to make a language like Ruby, which is highly
> dynamic at runtime, fast like C++ and Java, which are primarily
> statically compiled.  The Smalltalk folks have reportedly done pretty
> well though, so there exists the possiblilty that Ruby may get
> substantially faster in the future.  YARV is already making some
> headway.
>

Yep. YARV does do better on this test:

# YARV 0.4.0
e:\yarv\bin\ruby sudoku-solver.rb
time elapsed: 18.953 sec.
count: 127989
3 6 2 4 9 5 7 8 1
9 7 1 6 2 8 5 3 4
8 5 4 1 3 7 9 6 2
2 9 3 5 6 4 1 7 8
5 1 7 3 8 2 4 9 6
6 4 8 9 7 1 2 5 3
7 2 9 8 1 3 6 4 5
1 8 5 7 4 6 3 2 9
4 3 6 2 5 9 8 1 7

# Regular ruby 1.8.4
ruby sudoku-solver.rb
time elapsed: 27.812 sec.
count: 127989
3 6 2 4 9 5 7 8 1
9 7 1 6 2 8 5 3 4
8 5 4 1 3 7 9 6 2
2 9 3 5 6 4 1 7 8
5 1 7 3 8 2 4 9 6
6 4 8 9 7 1 2 5 3
7 2 9 8 1 3 6 4 5
1 8 5 7 4 6 3 2 9
4 3 6 2 5 9 8 1 7
unknown (Guest)
on 2006-02-25 00:04
(Received via mailing list)
Alexis Reigel wrote:
> comments. Did I somehow offend your honor? I did not say that ruby is
> crap compared to c++ or java. I find ruby is an absolute fantastic
> language. I was just surprised about my results.

If you have the curiosity, there are all kinds of results to wonder
about, for example Ruby compared to a C interpreter ;-)
http://shootout.alioth.debian.org/gp4sandbox/bench...
SASADA Koichi (Guest)
on 2006-06-30 03:33
(Received via mailing list)
Hi,

I found this article today :)

E. Saynatkari wrote:
> Well, Ruby is strictly interpreted using the parse tree instead
> of VM opcodes which may or may not (depending on who you ask)
> make a difference. Ruby is pretty slow but usually Fast Enough(tm).
>
> You could try to run your script on YARV[1] to see if it helps.

On my Linux (on VMware) machine:

  Ruby: time elapsed: 27.811268 sec.
  YARV: time elapsed: 2.892428 sec.

(YARV with some special optimization option)

Regards,
Paul Battley (Guest)
on 2006-06-30 11:10
(Received via mailing list)
On 23/02/06, Alexis Reigel <mail@koffeinfrei.org> wrote:
> Hi all
>
> I've ported the following c++ code to ruby. It is a recursive
> (backtracking) sudoku-solving algorithm. Well, I was rather surprised by
> the execution time I got:
>
> c++ code:   0.33 seconds
> ruby code: 27.65 seconds

Ruby *is* relatively slow, but the algorithm is the problem. This Ruby
program I wrote a while ago can solve it in under half a second:

http://po-ru.com/files/sudoku/1.0/sudoku.tar.gz

It's quite a bit more complicated, however.

Paul.
Reggie Mr (rpw)
on 2006-07-01 13:02
Here is a simple graph of performance by different platforms.

http://www.usenetbinaries.com/doc/Web_Platform_Ben...
Austin Ziegler (austin)
on 2006-07-01 13:56
(Received via mailing list)
On 7/1/06, Reggie Mr <buppcpp@yahoo.com> wrote:
> Here is a simple graph of performance by different platforms.
>
> http://www.usenetbinaries.com/doc/Web_Platform_Ben...

I can't think of a more useless "test" other than anything put out by
the Alioth shootout.

The numbers have limited interest because they're not necessarily
using the *power* of the frameworks and merely measure the response to
what is essentially a static return value. It would be more
interesting to benchmark a simple guestbook in all of the things they
did. Guestbooks are relatively easy to write and would be
significantly more "test-worthy" than a "hello world" implementation.

-austin
M. Edward (Ed) Borasky (Guest)
on 2006-07-01 19:10
(Received via mailing list)
Austin Ziegler wrote:
> I can't think of a more useless "test" other than anything put out by
> the Alioth shootout.
>
> The numbers have limited interest because they're not necessarily
> using the *power* of the frameworks and merely measure the response to
> what is essentially a static return value. It would be more
> interesting to benchmark a simple guestbook in all of the things they
> did. Guestbooks are relatively easy to write and would be
> significantly more "test-worthy" than a "hello world" implementation.
My experience with web application benchmarks (nearly all
Windows/IIS/ASP/SQL Server and Linux/Apache 1.3/PostgreSQL/PHP) has
shown that the two most likely bottlenecks in a *real* web application
are the network bandwidth and the database. If the application generates
too much network traffic, or if its database design is poorly done,  it
doesn't matter all that much whether the underlying glue logic is in C,
Perl, PHP, Python, Java or Ruby, whether the OS is Windows or Linux or
some other, or what the web server itself is. The database, on the other
hand, does matter -- a *lot*. And that, my friends, explains why Larry
Ellison is such a rich man. :)

Having said that, the benchmark in the original poster's link, on the
other hand, measures the web server and a single module within it. I
think it's a perfectly valid benchmark for the specific web server/glue
language *component* of a web application, and I think there is a lesson
or two in the results for all of us:

1. C scales better than Perl, PHP, Python and Ruby, all other things
being equal. Make sure your C skills are up to date. :)

2. You probably want to hold off upgrading from PHP 4 to PHP 5, and you
want to performance test before you do.

3. Unless there is some compelling *business* reason to use one of the
other technologies, you probably want to avoid anything below PHP 4 in
this chart. Programmer time to implement the application is certainly a
compelling business reason. :)

4. The Ruby community needs to get Ruby's performance up where PHP 4 is
on benchmarks like this. It would be wonderful if it was better than
Perl and PHP, but a bare minimum is to be competitive with PHP 4.

On 4, I'm not sure a "virtual machine" is the answer, by the way.
"Virtual machines", or as I prefer to call them, "abstract machines",
were primarily intended for portability, not performance. C happens to
be a great abstract machine, and GCC happens to be a great way to
achieve portability and performance.


--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
M. Edward (Ed) Borasky (Guest)
on 2006-07-01 19:16
(Received via mailing list)
SASADA Koichi wrote:
>
> (YARV with some special optimization option)
>
Guest machines under VMware are pretty much useless as a performance
profiling platform, for a variety of reasons.

--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
unknown (Guest)
on 2006-07-01 19:43
(Received via mailing list)
On Sun, 2 Jul 2006, M. Edward (Ed) Borasky wrote:

> On 4, I'm not sure a "virtual machine" is the answer, by the way.  "Virtual
> machines", or as I prefer to call them, "abstract machines", were primarily
> intended for portability, not performance. C happens to be a great abstract
> machine, and GCC happens to be a great way to achieve portability and
> performance.

amen!

-a
M. Edward (Ed) Borasky (Guest)
on 2006-07-01 20:39
(Received via mailing list)
ara.t.howard@noaa.gov wrote:
>
> amen!
>
> -a
I really *should* learn how to program in C. :) I can just barely read
C, actually.

Then again, at least a generation of C programmers have implemented GCC,
Perl, R, Linux, Ruby, etc., so I haven't felt the need. And some of the
Lisp and Scheme environments seem to be just as efficient abstract
machines as C/GCC. And then there's Forth -- another efficient abstract
machine. Choices, choices, too many choices. :)



--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
Robert Klemme (Guest)
on 2006-07-02 00:05
(Received via mailing list)
2006/7/1, M. Edward (Ed) Borasky <znmeb@cesmail.net>:
> 4. The Ruby community needs to get Ruby's performance up where PHP 4 is
> on benchmarks like this. It would be wonderful if it was better than
> Perl and PHP, but a bare minimum is to be competitive with PHP 4.
>
> On 4, I'm not sure a "virtual machine" is the answer, by the way.
> "Virtual machines", or as I prefer to call them, "abstract machines",
> were primarily intended for portability, not performance. C happens to
> be a great abstract machine, and GCC happens to be a great way to
> achieve portability and performance.

There's a significant difference between GCC and the JVM for example:
VM's can collect performance data while the application is running
whereas GCC has to optimize at compile time. This yields advantages
for the VM approach because it can better target optimizations.
Depending on the application, the performance of a Java app doesn't
differ significantly from that of a C app, but the programming model is
more convenient, more robust and thus more efficient.

Kind regards

robert
unknown (Guest)
on 2006-07-02 00:20
(Received via mailing list)
On Sun, 2 Jul 2006, Robert Klemme wrote:

> There's a significant difference between GCC and the JVM for example: VM's
> can collect performance data while the application is running whereas GCC
> has to optimize at compile time. This yields advantages for the VM approach
> because it can better target optimizations.  Depending on application
> performance of a Java app doesn't differ significantly from a C app but the
> programming model is more convenient, more robust and thus more efficient.
>
> Kind regards
>
> robert

how is this performance data available significantly different from
that made transparent by gcc/gprof/gdb/dmalloc/etc - gcc can encode
plenty of information for tools like these to dump reams of info at
runtime.  or are you referring to a vm's ability to actually adapt the
runtime code?  if so then it seems like even compiled languages can
accomplish this if the language is a first class data type and the code
segment can be manipulated as data

   http://64.233.167.104/search?q=cache:mNkjHYGIbE4J:...

is an interesting read.  if one accepts that compile time metaprogramming
is useful then it's a small leap to execute compile time metaprogramming
to enhance performance based on runtime characteristics.  but maybe i'm
way off base here...

kind regards.

-a
Charles O Nutter (Guest)
on 2006-07-02 00:42
(Received via mailing list)
On 7/1/06, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:
> > performance of a Java app doesn't differ significantly from a C app but
> transparent by gcc/gprof/gdb/dmalloc/etc - gcc can encode plenty of
>
> is an interesting read.  if one accepts that compile time metaprogamming
> is
> useful that it's a small leap to execute compile time metaprogramming to
> enhance performance based on runtime characteristics.  but maybe i'm way
> off
> base here...
>

Ahhh, venturing into a domain I love talking about.

Runtime-modification of code is exactly what sets the JVM apart from
static compilation and optimization in something like GCC. The link Ara
posted above is another way to look at the same kind of runtime
modification. The JVM, because it JIT compiles code rather than AOT, can
change the parameters of that compilation whenever it likes. If a
particular piece of JIT-compiled code is heavier on integer math than on
memory allocation, it may re-JIT to avoid processor-intensive aspects of
object creation and garbage collection. If a piece of code is used
heavily, there may be opportunities to dynamically inline or reorder
subroutines at runtime. All the tricks C coders might have to do by hand
or decide on at compile time can be done as needed based on runtime
performance profiling.

You could go through a gcc/gdb/gprof and so on cycle, but the point of
the JVM is that you don't have to burn those hours. You write the code
once, and the JVM gobbles it up, runs it interpreted for a (short)
while, and then starts generating native machine code that fits the
runtime profile it has gathered. As that profile changes, code can be
regenerated, and indeed the longer an application runs, the faster it
gets.

It is for this reason that many algorithms running in the JVM run as
fast as C or C++ equivalents.

Ruby has great potential to make these same kinds of optimizations at
runtime, and as I understand it, YARV will do quite a bit of "smart"
optimization at runtime.

The JVM got a bad rap because in versions 1.2 and earlier, it really was
slow. Since 1.3, however, it has increased tremendously in performance.
1.3 was many times faster than 1.2, 1.4 was twice as fast as 1.3, and
1.5 and 1.6 are each another 20-25% faster again. I think the JVM and
the recent success of the .NET CLR have shown that a VM approach is a
great way to go.
M. Edward (Ed) Borasky (Guest)
on 2006-07-02 01:21
(Received via mailing list)
Robert Klemme wrote:
>
> There's a significant difference between GCC and the JVM for example:
> VM's can collect performance data while the application is running
> whereas GCC has to optimize at compile time. This yields advantages
> for the VM approach because it can better target optimizations.
Ah, but at least for the multiprogramming case, so can (and *does*) the
operating system! And the *interpreter* can "collect performance data
while the application is running" and optimize just as easily -- maybe
even more easily -- than some underlying abstract machine.

In any event:

1. The hardware is optimized to statistical properties of the workloads
it is expected to run.

2. The operating system is optimized to statistical properties of the
workloads it is expected to run and the hardware it is expected to run
on.

3. Compilers are optimized to statistical properties of the programs
they are expected to compile and the hardware the compiled programs are
expected to run on.

As a result, I don't see the need for another layer of abstraction. It's
something else that needs to be optimized!


--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
Charles O Nutter (Guest)
on 2006-07-02 01:46
(Received via mailing list)
On 7/1/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:
>
> Ah, but at least for the multiprogramming case, so can (and *does*) the
> operating system! And the *interpreter* can "collect performance data
> while the application is running" and optimize just as easily -- maybe
> even more easily -- than some underlying abstract machine.


An interpreter *is* an underlying abstract machine. Ruby and Java both
have interpreters that do basically the same thing. Java additionally
has a JIT compiler that takes code the next step toward native.
unknown (Guest)
on 2006-07-02 03:44
(Received via mailing list)
Quoting Charles O Nutter <headius@headius.com>:

> An interpreter *is* an underlying abstract machine. Ruby and Java both have
> interpreters that do basically the same thing. Java additionally has a JIT
> compiler that takes code the next step toward native.
Exactly! Now when Java first came about, the underlying hardware was
fairly diverse and the designers of the Java runtime chose to implement
an extra layer for the sake of portability. After all, they had to run
on almost any flavor of OS, including some ancient models of Windows,
Tru64, OS, ancient models of MacOS, Solaris, and IIRC the IBM mainframes
as well. IIRC, Windows NT also ran on Alphas and MIPS back then. :) And
to top it off, memories were smaller and machines were slower.

A lot of those options don't make a lot of sense any more. The hardware
is dominated by x86 and x86-64, with SPARC and PowerPC running well
behind. There's Windows, there's Linux, there's Solaris and MacOS among
the living, breathing operating systems. And there's GCC, Microsoft's
compiler chain, perhaps a native Sun compiler, and a few specialized
compilers. "Write once, run anywhere" doesn't have to worry about as
many "anywheres" as it used to. :)

So Java can afford a just-in-time compiler. But do they *need* an
intermediate abstract machine? Probably not any more. It probably would
cost them more to refactor around it than to leave it in place. But I
think if they were designing Java today, there wouldn't need to be a
JVM.

Just putting on my statistician's hat, I'd say Ruby should have a JIT
compiler for x86 and x86-64 built by the "Ruby community", that an extra
abstract machine layer isn't necessary, and that only Windows, Linux and
MacOS need a complete Ruby environment built and maintained by the "Ruby
community". Just about anything more might make sense to do if some
business thought they could make a profit from the effort, but I don't
see it as a viable community project.
Robert Klemme (Guest)
on 2006-07-02 11:56
(Received via mailing list)
2006/7/2, M. Edward (Ed) Borasky <znmeb@cesmail.net>:
> Robert Klemme wrote:
> > There's a significant difference between GCC and the JVM for example:
> > VM's can collect performance data while the application is running
> > whereas GCC has to optimize at compile time. This yields advantages
> > for the VM approach because it can better target optimizations.
> Ah, but at least for the multiprogramming case, so can (and *does*) the
> operating system! And the *interpreter* can "collect performance data
> while the application is running" and optimize just as easily -- maybe
> even more easily -- than some underlying abstract machine.

As Charles pointed out the interpreter *is* a virtual machine and thus
equivalent to a JVM with regard to the runtime information it can
collect (AFAIK the current Ruby runtime does not, but it could).

> expected to run on.
>
> As a result, I don't see the need for another layer of abstraction. It's
> something else that needs to be optimized!

I'm not sure whether you read Charles' excellent posting about the
properties of a VM. All optimizations you mention are *static*, which
is reflected in the fact that they are based on statistical
information of a large set of applications, i.e. there is basically
just one application that those optimizations can target. A VM on the
other hand (and this is especially true for the JVM) has more precise
information about the current application's behavior and thus can
target optimizations better.

I'll try an example: consider method inlining. With C++ you can have
methods inlined at compile time. This will lead to code bloat and the
developer will have to decide which methods he wants inlined.  This
takes time, because he has to do tests and profile the application.
Even then it might be that his tests do not reflect the production
behavior due to some error in the setup or wrong assumptions about the
data to be processed etc. In the worst case method inlining can have
an advers
Robert Klemme (Guest)
on 2006-07-02 12:08
(Received via mailing list)
** sorry for the last incomplete post, I accidentally hit the wrong
button **

2006/7/2, M. Edward (Ed) Borasky <znmeb@cesmail.net>:
> Robert Klemme wrote:
> > There's a significant difference between GCC and the JVM for example:
> > VM's can collect performance data while the application is running
> > whereas GCC has to optimize at compile time. This yields advantages
> > for the VM approach because it can better target optimizations.
> Ah, but at least for the multiprogramming case, so can (and *does*) the
> operating system! And the *interpreter* can "collect performance data
> while the application is running" and optimize just as easily -- maybe
> even more easily -- than some underlying abstract machine.

As Charles pointed out the interpreter *is* a virtual machine and thus
equivalent to a JVM with regard to the runtime information it can
collect (AFAIK the current Ruby runtime does not, but it could).

> expected to run on.
>
> As a result, I don't see the need for another layer of abstraction. It's
> something else that needs to be optimized!

I'm not sure whether you read Charles' excellent posting about the
properties of a VM. All optimizations you mention are *static*, which
is reflected in the fact that they are based on statistical
information of a large set of applications, i.e. there is basically
just one application that those optimizations can target. A VM on the
other hand (and this is especially true for the JVM) has more precise
information about the current application's behavior and thus can
target optimizations better.

I'll try an example: consider method inlining. With C++ you can have
methods inlined at compile time. This will lead to code bloat and the
developer will have to decide which methods he wants inlined.  This
takes time, because he has to do tests and profile the application.
Even then it might be that his tests do not reflect the production
behavior due to some error in the setup or wrong assumptions about the
data to be processed etc. In the worst case method inlining can have
an adverse effect on performance due to the increased size of the
memory image.

A VM on the other hand profiles the running application and decides
*then* which hot spots to optimize.  Even more so it can undo
optimizations later if the behavior changes over time or new classes
are loaded that change the game.  It has more accurate information
about the application at hand and thus can react / optimize more
appropriately.  The point is that there is not the single optimal
optimization for a piece of code.  And a VM can do much, much better
at picking this optimum than any compiled language can just because of
the more accurate set of information it has about a running
application.

Another advantage of a VM is that it makes instrumentation of code
much easier. Current JVM's have a sophisticated API that provides
runtime statistical data to analysis tools. With compiled applications
you typically have to compile for instrumentation. So you're
effectively profiling a different application (although the overhead
may be negligible).

From all I can see in current developments in CS the trend seems to go
towards virtual machines as they provide a lot of advantages over
traditional compiled code.  I personally prefer to implement in Java
over C++ for example. YMMV though.

Kind regards

robert
Reggie Mr (rpw)
on 2006-07-02 14:55
Austin Ziegler wrote:
> On 7/1/06, Reggie Mr <buppcpp@yahoo.com> wrote:
>> Here is a simple graph of performance by different platforms.
>>
>> http://www.usenetbinaries.com/doc/Web_Platform_Ben...
>
> I can't think of a more useless "test" other than anything put out by
> the Alioth shootout.
>

I would agree...except Ruby did VERY poorly in this "useless" test.
Austin Ziegler (austin)
on 2006-07-02 15:12
(Received via mailing list)
On 7/2/06, Reggie Mr <buppcpp@yahoo.com> wrote:
> Austin Ziegler wrote:
> > On 7/1/06, Reggie Mr <buppcpp@yahoo.com> wrote:
> >> Here is a simple graph of performance by different platforms.
> >> http://www.usenetbinaries.com/doc/Web_Platform_Ben...
> > I can't think of a more useless "test" other than anything put out by
> > the Alioth shootout.
> I would agree...except Ruby did VERY poorly in this "useless" test.

No except. It's a useless test. I wouldn't trust a single thing about
it. What this is *essentially* measuring, especially with CGI-style
output, is startup time. Ruby does have a slower start-up time than
other options.

This is *particularly* true of the nonsensical test of running Rails
in a CGI mode.

When real-world tests are done, Ruby *is* slower for now. But it's not
as slow as this would suggest. I never ended up doing an apache bench
on it, but at least subjectively, Ruwiki rendered as fast or faster
than most PHP wikis.

That's why I said that a guestbook would be a far better test and more
reliable. It's simple enough to implement in C, yet complex enough
that you're going to get more interesting results. You'd also get a
decent measure of code size differences.

-austin
unknown (Guest)
on 2006-07-02 16:13
(Received via mailing list)
On Sun, 2 Jul 2006, Reggie Mr wrote:

>>> http://www.usenetbinaries.com/doc/Web_Platform_Ben...
>>
>> I can't think of a more useless "test" other than anything put out by
>> the Alioth shootout.
>>
>
> I would agree...except Ruby did VERY poorly in this "useless" test.

Except...the test results have to be taken with a very large grain of
salt.

Just a moment ago I did a couple quick tests.

A "Hello, World" in PHP4:

<?php echo "Hello, World" ?>


and a "Hello, World" delivered via IOWA.

This is running on a machine which is not unloaded.  It's also nowhere
near the power of the machine the above test was done on, being a simple
AMD Athlon box with slower RAM running a Linux 2.4 kernel and a 2.0.58
Apache, and it is not running the fastest configuration for IOWA (which
is through FastCGI), but, rather, is running through mod_ruby.

Okay, enough with the disclaimers.

Multiple runs of ab -n 1000 produced a mean of about 700 requests per
second from the PHP page, and about 200 from the IOWA page.

The ratio there is much better than on the Web_Platform_Benchmarks.html,
and if I were to setup a test using fastcgi, it would improve further.

However, this is a weak comparison, because like things are not really
being compared.

So, to get something a little more similar, I dropped a "Hello World"
into an existing CakePHP (1.1.3.2967) site that I have, and likewise
dropped a "Hello World" into an IOWA (0.99.2.6) site with a comparable
page layout and final page size.

CakePHP, if you are unfamiliar, is an MVC framework for PHP that is
stylistically similar to Rails.  So we're at least comparing frameworks
to frameworks here, in the respective languages.

IOWA beats CakePHP handily, and I would expect RoR and Nitro to, as
well, given what I know about their performance.

On an ab -n 1000 -c 1

I average 18 requests per second with CakePHP and 60 per second with
IOWA. Again, with similarly sized pages, though the navigation in the
IOWA example is generated dynamically, while it is static in the CakePHP
example.

Playing with different levels of concurrency, I managed to get 35/second
out of the CakePHP app and 80/second out of the IOWA one, which is still
a ratio that falls dramatically in Ruby's favor when comparing actual
frameworks.

Performance comparison can be an entertaining exercise, but with
something with as many variables as web page delivery, all performance
comparisons need to be interpreted with a bit of skepticism, including
those I present above.  Still, Ruby doesn't strike me as surprisingly
slow in any comparisons that I have ever done.


Kirk Haines
Daniel DeLorme (Guest)
on 2006-07-02 16:20
(Received via mailing list)
Reggie Mr wrote:
> Austin Ziegler wrote:
>> On 7/1/06, Reggie Mr <buppcpp@yahoo.com> wrote:
>>> Here is a simple graph of performance by different platforms.
>>>
>>> http://www.usenetbinaries.com/doc/Web_Platform_Ben...
>> I can't think of a more useless "test" other than anything put out by
>> the Alioth shootout.

Sure it's a simple test but that doesn't make it useless. Systemic
performance testing is the most relevant, but "unit" performance testing
also has its use.

> I would agree...except Ruby did VERY poorly in this "useless" test.

Ruby didn't do poorly. With fastcgi, it compares with PHP5. I think
that's quite respectable. What did poorly was RoR. ruby+fastcgi has good
performance, but it drops by over an order of magnitude if you add rails
to the mix. Now *that* is quite telling.

Daniel
Francis Cianfrocca (Guest)
on 2006-07-02 16:44
(Received via mailing list)
We recently did a simple hello world test with Rails on a very low-end
machine and compared it with a Ruby framework that we built for our
commercial apps. Both apps had no database, and simply served the phrase
"Hello, world" with a text/plain mime type. The test client was running
on localhost to minimize TCP and network effects. Rails was running in
fast-cgi mode (one process for the whole run) and our framework was
running in CGI mode (one fork per request).

Rails did 20 pages per second. The other app did 200 per second.
(Straight-run apache with a cached static page of similar size could
probably do 1000/second or more on this machine.)

Bear in mind, both of these frameworks are *Ruby*. This tells me the
comparison to other languages is misleading at best.
Lothar Scholz (Guest)
on 2006-07-02 20:02
(Received via mailing list)
Hello Charles,

CON> Ruby has great potential to make these same kinds of optimizations at
CON> runtime, and as I understand it, YARV will do quite a bit of "smart"
CON> optimization at runtime.

The last time i looked into papers about YARV it does nothing about
this. It optimizes the control flow but doesn't do anything about data
based optimizations. So YARV is pretty simple, it's just that the
current implementation of ruby is so weak that there is a lot of room
for simple optimizations.

Okay this is one year ago but i doubt a lot changed as there is still
no official YARV release today.
Charles O Nutter (Guest)
on 2006-07-02 20:02
(Received via mailing list)
On 7/2/06, Austin Ziegler <halostatue@gmail.com> wrote:
>
> On 7/2/06, Reggie Mr <buppcpp@yahoo.com> wrote:
> > I would agree...except Ruby did VERY poorly in this "useless" test.
>
> No except. It's a useless test. I wouldn't trust a single thing about
> it. What this is *essentially* measuring, especially with CGI-style
> output, is startup time. Ruby does have a slower start-up time than
> other options.
>

Austin makes a good point; I'd expect they'd all blow away Java in CGI
mode :)
Mauricio Fernandez (Guest)
on 2006-07-02 20:54
(Received via mailing list)
On Sun, Jul 02, 2006 at 07:38:42AM +0900, Charles O Nutter wrote:
> Ruby has great potential to make these same kinds of optimizations at
> runtime, and as I understand it, YARV will do quite a bit of "smart"
> optimization at runtime.

IIRC it just did use inline caches for ... constant lookup.  No inline
caches for method calls, let alone PICs. And no inlining either.

Some opcodes are (statically) optimized: hardcoded operations are used
(instead of a full method call) for Fixnum, Float, [String, Array,
Hash... for the opcodes for which they make sense] (tested in roughly
that order) if the corresponding methods have not been redefined (this
code invalidation wasn't implemented last time I read the sources
though, but it's been a while). And IIRC there was also some sort of
optimization for Integer#times and some other methods one often uses in
synthetic benchmarks.

But at any rate it was far from doing lots of dynamic optimizations.
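As a quick illustration of why those hardcoded fast paths need the
redefinition check at all (1.8-era class name, and the redefinition is
deliberately silly):

  class Fixnum
    alias_method :original_plus, :+
    def +(other)
      original_plus(other) * 2   # nonsense semantics, but perfectly legal
    end
  end

  puts 1 + 1   # => 4, so a hardcoded integer + would now give the wrong answer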
Charles O Nutter (Guest)
on 2006-07-02 21:07
(Received via mailing list)
On 7/2/06, Mauricio Fernandez <mfp@acm.org> wrote:
> Some opcodes are (statically) optimized: hardcoded operations are used
> methods one often uses in synthetic benchmarks.
>
> But at any rate it was far from doing lots of dynamic optimizations.
>

Well, that's too bad, but many of the optimizations you mention do sound
similar to what we're doing in JRuby. Of course, JRuby has other issues
to tidy up before these optimizations will be very fruitful (like our
remaining yet-to-be-implemented-in-Java native libraries) but it's good
to see we're going down similar paths. We're also planning on doing some
mixed-mode JIT, however, once I find time to work on the compilation
side of things. All told there should be plenty of excellent VM options
for Ruby in the future.
M. Edward (Ed) Borasky (Guest)
on 2006-07-02 21:38
(Received via mailing list)
Robert Klemme wrote:
> I'm not sure whether you read Charles excellent posting about the
> properties of a VM. All optimizations you mention are *static*, which
> is reflected in the fact that they are based on statistical
> information of a large set of applications, i.e. there is basically
> just one application that those optimizations can target. A VM on the
> other hand (and this is especially true for the JVM) has more precise
> information about the current application's behavior and thus can
> target optimizations better.
So, in fact, does a CISC chip with millions of transistors at its
disposal. :) Real machines are pretty smart too, at least the ones from
Intel are. The point of my comment was the emphasis on *statistical*
properties of applications. Since this is the area I've spent quite a
bit of time in, it's a more natural approach to me than, say, the
niceties of discrete math required to design an optimizing compiler or
interpreter.

In the end, most of the "interesting" discrete math problems in
optimization are either totally unsolvable or NP complete, and you end
up making statistical / probabilistic compromises anyhow. You end up
solving problems you *can* solve for people who behave reasonably
rationally, and you try to design your hardware, OS, compilers,
interpreters and languages so rational behavior is rewarded with
*satisfactory* performance, not necessarily optimal performance. And you
try to design so that irrational behavior is detected and prevented from
injuring the rational people.
>
> I'll try an example: consider method inlining. With C++ you can have
> methods inlined at compile time. This will lead to code bloat and the
> developer will have to decide which methods he wants inlined.  This
> takes time, because he has to do tests and profile the application.
> Even then it might be that his tests do not reflect the production
> behavior due to some error in the setup of wrong assumptions about the
> data to be processed etc. In the worst case method inlining can have
> an adversary effect on performance due to the increases size of the
> memory image.
Not to mention what happens with the "tiny" caches most of today's
machines have.
>
> Another advantage of a VM is that it makes instrumentation of code
> much easier. Current JVM's have a sophisticated API that provides
> runtime statistical data to analysis tools. With compiled applications
> you typically have to compile for instrumentation. So you're
> effectively profiling a different application (although the overhead
> may be neglectible).
Don't get me wrong, the Sun *Intel x86* JVM is a marvelous piece of
software engineering. Considering how many person-years of tweaking it's
had, that's not surprising. But the *original* goal of Java and the
reason for using a VM was "write once, run anywhere". "Anywhere" no
longer includes the Alpha, and may have *never* included the MIPS or
HP-PARISC. IIRC "anywhere" no longer includes MacOS. And since I've
never tested it, I don't know for a fact that the Solaris/SPARC version
of the JVM is as highly tuned as the Intel one.

To bring this back to Ruby, my recommendations stand:

1. Focus on building a smart(er) interpreter rather than an extra
virtual machine layer.
2. Focus optimizations on the Intel x86 and x86-64 architectures for the
"community" projects. Leverage off of GCC for *all* platforms; i.e.,
don't use Microsoft's compilers on Windows. And don't be afraid of a
little assembler code. It works for Linux, it works for ATLAS
(Automatically Tuned Linear Algebra Subroutines) and I suspect there's
some in the Sun JVM.
3. Focus on Windows, Linux and MacOS for complete Ruby environments for
the "community" projects.

--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
Robert Mela (Guest)
on 2006-07-02 22:00
(Received via mailing list)
What tools exist for profiling Ruby?

What might answer a lot of these questions is something along the lines
of a call tree showing time spent ( clock/sys/user) or CPU cycles for
each node of the tree (node and node+children).

Other questions:

Does a Rails development do more checking and recompiling than a rails
production environment?  If so, by how much does that affect results?

Is your CGI loading and initializing the Ruby interpreter each time it's
invoked?
Francis Cianfrocca (Guest)
on 2006-07-02 22:12
(Received via mailing list)
Ruby has a built-in profiler. Fair enough, let's run it, it would be
interesting. You started a new thread, but my comment was part of a
different thread comparing (and disparaging) Ruby against other
languages (and frameworks) typically used for Web development. And my
point was that Ruby itself isn't the problem.
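For reference, a rough sketch of driving that built-in profiler from
code (the 1.8-era stdlib 'profiler' library; running the whole script
with ruby -rprofile amounts to the same thing, and the profiled call
below is a hypothetical stand-in):

  require 'profiler'

  Profiler__::start_profile
  # ... the code under test goes here, e.g. some hypothetical app.render_page ...
  Profiler__::stop_profile
  Profiler__::print_profile($stdout)   # per-method times and call counts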

My experience suggests that Ruby's performance "problems" are negligible
with small working sets and are very serious with large ones (program
size and number of code points seem to matter relatively little). My
hypothesis (no proof adduced) is that this is a necessary consequence of
Ruby's extremely dynamic nature and is only somewhat amenable to
improvements like YARV and automatic optimizations (as the many
Java-like VM proponents suggest). So in my own work I tend to design
small Ruby processes performing carefully-circumscribed tasks and
knitting them together with a message-passing framework. I think you can
make Ruby perform just as fast as anything else but a style change is
required.
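As a sketch of that style (the names, the port and the choice of DRb are
just stand-ins for whatever message-passing layer one actually uses; the
two halves below run as two separate processes):

  # worker.rb -- a small process exposing one carefully-circumscribed task
  require 'drb'

  class Worker
    def heavy_task(data)
      data.reverse   # stand-in for the real work
    end
  end

  DRb.start_service('druby://localhost:9000', Worker.new)
  DRb.thread.join

  # client.rb -- knits the workers together
  require 'drb'

  DRb.start_service
  worker = DRbObject.new_with_uri('druby://localhost:9000')
  puts worker.heavy_task('hello')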

And of course the point of the effort is to get Ruby's productivity
improvements without losing too much at the other end. Time-to-market
is a measurable quality dimension too.
Charles O Nutter (Guest)
on 2006-07-02 22:32
(Received via mailing list)
On 7/2/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:
> So, in fact, does a CISC chip with millions of transistors at its
> disposal. :) Real machines are pretty smart too, at least the ones from
> Intel are. The point of my comment was the emphasis on *statistical*
> properties of applications. Since this is the area I've spent quite a
> bit of time in, it's a more natural approach to me than, say, the
> niceties of discrete math required to design an optimizing compiler or
> interpreter.


Which VMs also benefit from when they compile to native code. Isn't
that why we compile, JIT or AOT, in the first place?

VMs also benefit from online profiling BEFORE compile to ensure the
generated code is closer to optimal. That runtime profiling allows a VM
to leverage the underlying processor *better* than you could by just
guessing at it up front, since it makes decisions based on realtime
data, rather than statistical averages. Yes, there are times it has to
guess or go with a "typical" model, but as execution proceeds it can
adjust compilation parameters to re-optimize code.

There's a body of research on this stuff online; I don't really need to
defend it.

> Don't get me wrong, the Sun *Intel x86* JVM is a marvelous piece of
> software engineering. Considering how many person-years of tweaking it's
> had, that's not surprising. But the *original* goal of Java and the
> reason for using a VM was "write once, run anywhere". "Anywhere" no
> longer includes the Alpha, and may have *never* included the MIPS or
> HP-PARISC. IIRC "anywhere" no longer includes MacOS. And since I've
> never tested it, I don't know for a fact that the Solaris/SPARC version
> of the JVM is as highly tuned as the Intel one.
>

JVM discussions are fairly OT, but I have to knock this one down. Sun
has JVM implementations for x86, x86-64, Sparc, and Itanium, running
Solaris, Linux or Windows (except Linux on Sparc). IBM has JVMs for
Linux on IA32, AMD64, POWER 64-bit, and z-Series 31-bit and 64-bit.
Apple has a JVM for OS X on PowerPC and for x86. There's a whole slew of
open source JVMs at gnu.org/software/classpath/stories.html and bunches
of other commercial JVMs for everything from absurdly small devices
(like aJile's native Java chips, ajile.com) to absurdly large ones (Azul
Systems network-attached processing, azulsystems.com).

Fighting the VM tide seems a little silly to me. YARV is on the right
track.
Francis Cianfrocca (Guest)
on 2006-07-02 22:35
(Received via mailing list)
On 7/2/06, Robert Mela <rmela@rcn.com> wrote:
>
> Is your CGI loading and initializing the Ruby interpreter each time its
> invoked?


No it doesn't. The web server is in Ruby so it's integrated into the
framework. (Sorry, it was discourteous of me not to actually answer
your question in my original response.)

We did this benchmark with a production-config of Rails. I think Rails just
does a tremendous amount of work, which isn't surprising considering how
much value it adds. There may be subdomains of web-development that could
benefit from a different set of feature choices than the ones Rails made.
Cb75e9a5b18ad023ab1cce64e7cdebab?d=identicon&s=25 Lothar Scholz (Guest)
on 2006-07-02 23:07
(Received via mailing list)
Hello M.,

MEEB> HP-PARISC. IIRC "anywhere" no longer includes MacOS. And since I've
MEEB> never tested it, I don't know for a fact that the Solaris/SPARC
MEEB> version of the JVM is as highly tuned as the Intel one.

It runs much better on Solaris/SPARC than on XXX/Intel.
3bb23e7770680ea44a2d79e6d10daaed?d=identicon&s=25 M. Edward (Ed) Borasky (Guest)
on 2006-07-03 01:03
(Received via mailing list)
Francis Cianfrocca wrote:
> My experience suggests that Ruby's performance "problems" are negligible
> with small working sets and are very serious with large ones (program size
> and number of code points seem to matter relatively little). My hypothesis
> (no proof adduced) is that this is a necessary consequence of Ruby's
> extremely dynamic nature and is only somewhat amenable to improvements like
> YARV and automatic optimizations (as the many Java-like VM proponents
> suggest).
Interesting ... maybe we shouldn't be profiling Ruby code with the
*Ruby* profiler, but profiling the Ruby *interpreter* with "gprof" or
"oprofile". I had assumed that had already been done, though. :)

I personally don't think it a "necessary consequence of Ruby's extremely
dynamic nature." There are a couple of things it could be:

1. Page faulting with large working sets. There are things you can do to
the interpreter to enhance locality and minimize page faulting, but if
you have two 256 MB working sets in a 256 MB real memory, something's
gotta give.

2. Some process in the run-time environment that grows faster than N log
N, where N is the number of bytes in the working set. Again, putting on
my statistician's hat, you want the interpreter to exhibit N log N or
better behavior on the average.


> So in my own work I tend to design small Ruby processes performing
> carefully-circumscribed tasks and knitting them together with a
> message-passing framework. I think you can make Ruby perform just as
> fast as anything else but a style change is required.
>
> And of course the point of the effort is to get Ruby's productivity
> improvements without losing too much at the other end. Time-to-market
> is a measurable quality dimension too.
I think this is good advice regardless of the language or the
application. Still, that does pass some of the burden on to the
interpreter and OS, and it doesn't mean we shouldn't use your large
working set codes as test cases to make the Ruby run-time better. :)

--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
0b81f42cf440f1377fbf38f73be16a9c?d=identicon&s=25 Robert Mela (Guest)
on 2006-07-03 01:10
(Received via mailing list)
Francis Cianfrocca wrote:
> Ruby has a built-in profiler. Fair enough, let's run it, it would be
> interesting. You started a new thread, but my comment was part of a
> different thread comparing (and disparaging) Ruby against other languages
> (and frameworks) typically used for Web development. And my point was
> that
> Ruby itself isn't the problem.
>
Agreed, and in fact, my intent was to point the profiler at the
framework itself to locate the problem.

And thank you for the rest of your post.   It's going to save me a lot
of agony going forward!!
F1d37642fdaa1662ff46e4c65731e9ab?d=identicon&s=25 Charles O Nutter (Guest)
on 2006-07-03 01:10
(Received via mailing list)
On 7/2/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:
> > YARV and automatic optimizations (as the many Java-like VM proponents
> > suggest).
> Interesting ... maybe we shouldn't be profiling Ruby code with the
> *Ruby* profiler, but profiling the Ruby *interpreter* with "gprof" or
> "oprofile". I had assumed that had already been done, though. :)


I heard a rumor that Ruby's heavy use of setjmp/longjmp interferes with
profiling in some way, but I could have been misinformed or confused. Can
anyone confirm that?
481b8eedcc884289756246e12d1869c1?d=identicon&s=25 Francis Cianfrocca (Guest)
on 2006-07-03 01:56
(Received via mailing list)
On 7/2/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:
> N, where N is the number of bytes in the working set. Again, putting on
> my statistician's hat, you want the interpreter to exhibit N log N or
> better behavior on the average.


Ok, but these are problems that affect any program regardless of what it's
written in. If that's your theory, then you still need to explain why Ruby
in particular seems to be so slow ;-).

I could figure this out if I were frisky enough (but someone probably
already knows), but it seems like Ruby takes a Smalltalk-like approach to
method-dispatch. Meaning, it searches for the method to send a message to,
on behalf of each object. Whereas a language like Java views method-dispatch
as calling a function pointer in a dispatch table that is associated with
each class, and can easily be optimized. That's what I meant by Ruby's
"extremely dynamic nature." And the fact that classes and even objects are
totally open throughout runtime makes it all the more challenging. As a
former language designer, I have a hard time imagining how you would
automatically optimize such fluid data structures at runtime. You mentioned
page faulting, but it's even more important (especially on multiprocessors)
not to miss L1 or L2 caches or mispredict branches either. If you're writing
C++, you have control over this, but not in Ruby.
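
A toy illustration of that openness, in plain Ruby (nothing beyond standard
features), showing why a fixed dispatch slot can't simply be compiled in:

class Widget
  def render; "plain"; end
end

w = Widget.new
w.render                      # => "plain"

class Widget                  # class reopened at runtime
  def render; "themed"; end
end

def w.render; "special"; end  # singleton method on just this one object

w.render                      # => "special" -- the "slot" a static dispatch
                              # table would have compiled in no longer exists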

The more I work with Ruby, the more I find myself metaprogramming almost
everything I do. This seems to put such a burden on Ruby's runtime that I'm
looking for simpler and more automatic ways to run Ruby objects in
automatically-distributed containers, to minimize the working sets. The
problem is worth solving because the productivity-upside is just so
attractive.
3bb23e7770680ea44a2d79e6d10daaed?d=identicon&s=25 M. Edward (Ed) Borasky (Guest)
on 2006-07-03 02:36
(Received via mailing list)
Francis Cianfrocca wrote:
>> 2. Some process in the run-time environment that grows faster than N log
>> N, where N is the number of bytes in the working set. Again, putting on
>> my statistician's hat, you want the interpreter to exhibit N log N or
>> better behavior on the average.
>
>
> Ok, but these are problems that affect any program regardless of what
> it's
> written in.
1 affects any program regardless of what it's written in. 2 could be
either some fundamental constraint of the language semantics (which I
doubt) or an optimization opportunity in the run-time to deal more
efficiently with the semantics of the language.
> I could figure this out if I were frisky enough (but someone probably
> already knows), but it seems like Ruby takes a Smalltalk-like approach to
> method-dispatch. Meaning, it searches for the method to send a message
> to,
> on behalf of each object.
Ah ... now searching is something we *can* optimize!
> Whereas a language like Java views method-dispatch
> as calling a function pointer in a dispatch table that is associated with
> each class, and can easily be optimized. That's what I meant by Ruby's
> "extremely dynamic nature." And the fact that classes and even objects
> are
> totally open throughout runtime makes it all the more challenging. As a
> former language designer, I have a hard time imagining how you would
> automatically optimize such fluid data structures at runtime.
I suspect the dynamic nature means you have to keep *more* data
structures, and that they need to be *larger*, but it's still pretty
much known techniques in computer science.
> You mentioned
> page faulting, but it's even more important (especially on
> multiprocessors)
> not to miss L1 or L2 caches or mispredict branches either. If you're
> writing
> C++, you have control over this, but not in Ruby.
Yes, you've given this task to the run time environment. At least one
poster claims, and I have no reason to doubt him, that the Sun JVM is
smart enough to do this kind of thing, though I don't recall this
specific task being given as one that it does in fact do. If the Sun JVM
can do it, a Ruby interpreter should be able to do it as well.
> The more I work with Ruby, the more I find myself metaprogramming almost
> everything I do. This seems to put such a burden on Ruby's runtime
> that I'm
> looking for simpler and more automatic ways to run Ruby objects in
> automatically-distributed containers, to minimize the working sets. The
> problem is worth solving because the productivity-upside is just so
> attractive.
I'm not sure what you mean here, both in terms of "objects in
automatically-distributed containers" and "productivity-upside". Are you
looking for something like "lightweight processes/threads" or what is
known as "tasks" in classic FORTH? Little chunks of code, sort of like
an interrupt service routine, that do a little bit of work, stick some
results somewhere and then give up the processor to some "master
scheduler"?

I don't know Ruby well enough to figure out how to do that sort of
thing. Then again, if I wanted to write something that was an ideal
FORTH application, I'd probably write it in FORTH. :)

In any event, I'm working on a Ruby project in my spare time, and I can
certainly dig into the workings of the Ruby run-time if I find that it's
too slow. The application area is matrix calculation for the most part,
so I expect "mathn", "rational", "complex" and "matrix" are going to be
the bottlenecks. I suspect the places where Ruby needs be tuned
underneath will stick out like the proverbial sore thumbs for the kind
of application I have in mind.

--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
481b8eedcc884289756246e12d1869c1?d=identicon&s=25 Francis Cianfrocca (Guest)
on 2006-07-03 03:35
(Received via mailing list)
On 7/2/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:
>
> ....could be
> either some fundamental constraint of the langauge semantics (which I
> doubt) or an optimization opportunity in the run-time to deal more
> efficiently with the semantics of the language.

If the opportunity is there, why hasn't someone seen it yet? I'll take even
incremental improvements, but it seems unlikely that something really major
has been missed.



> Ah ... now searching is something we *can* optimize!

Searching can be improved but even so, it's a lot of work to do at runtime.
Languages that treat method dispatch as lookups into indexed tables have a
big edge. Even Python does this.



> If the Sun JVM
> can do it, a Ruby interpreter should be able to do it as well.

No knock against Sun's engineers, some of the sharpest folks in the
business. But that poster was referring to the Solaris/Sparc JVM, which in
my experience is perhaps the least well-executed JVM around. Ugh. There's a
limit to the amount of server RAM I'm willing to buy, power, and cool just
to run bloatware.



> I'm not sure what you mean here, both in terms of "objects in
> automatically-distributed containers" and "productivity-upside". Are you
> looking for something like "lightweight processes/threads" or what is
> known as "tasks" in classic FORTH?

Nothing like that. I'm trying to make my life easier, not harder ;-). The
main reason I'm attracted to Ruby is the promise of developing a lot more of
the unbelievably large amount of code that has to get written while reducing
the critical dependency on high-quality personnel, which is a
highly-constrained resource in uncertain supply. (I'm eliding some of my
thinking here of course, so you may well challenge that statement.)
I'd like to run plain-old Ruby objects in container processes that know how
to distribute loads adaptively and keep individual process sizes small, and
interoperate with objects written in other languages. In general I work with
applications that require extremely high throughputs but can tolerate
relatively large latencies (milliseconds as opposed to microseconds) as long
as all the system resources are fully utilized. I want to take advantage of
the coming multiprocessor hardware, but I don't want to do it with
multithreading (life's too short for that).

> The application area is matrix calculation for the most part,

That sounds like the kind of thing I would use Ruby only to prototype. But
you've been around the block a few times, so who am I to say? ;-)
3bb23e7770680ea44a2d79e6d10daaed?d=identicon&s=25 M. Edward (Ed) Borasky (Guest)
on 2006-07-03 04:57
(Received via mailing list)
Francis Cianfrocca wrote:
> major
> has been missed.
As far as I know, at least since I've been reading this list, you're the
first person to come up with a "clue", in the form of noticing that big
working sets were slower than small ones. That's something a performance
engineer can take his or her measuring and analysis tools and do
something with, unlike "it's slower than Python". :)
> Ah ... now searching is something we *can* optimize!
>
> Searching can be improved but even so, it's a lot of work to do at
> runtime.
> Languages that treat method dispatch as lookups into indexed tables
> have a
> big edge. Even Python does this.
Is that language-specific or interpreter-specific? I don't know much
about Ruby and I know even less about Python. Does Python build the
tables once at "compile time", or is it dynamic enough to require table
rebuilds at run time?
> No knock against Sun's engineers, some of the sharpest folks in the
> business. But that poster was referring to the Solaris/Sparc JVM,
> which in
> my experience is perhaps the least well-executed JVM around. Ugh.
> There's a
> limit to the amount of server RAM I'm willing to buy, power, and cool
> just
> to run bloatware.
Ah, but there's also a limit to how many developer-hours you're willing
to buy as well, so you've chosen to use Ruby rather than C. :) Of
course, C is pretty much as fast as it's going to get, but the Ruby
run-time is probably "laden with low-hanging fruit".
> Nothing like that. I'm trying to make my life easier, not harder ;-). The
> main reason I'm attracted to Ruby is the promise of developing a lot
> more of
> the unbelievably large amount of code that has to get written while
> reducing
> the critical dependency on high-quality personnel, which is a
> highly-constrained resource in uncertain supply. (I'm eliding some of my
> thinking here of course, so you may well challenge that statement.)
Developing "a large amount of code?" Why is there so much code required?
Are there a lot of detailed special cases that can't be made into data?

> advantage of
> the coming multiprocessor hardware, but I don't want to do it with
> multithreading (life's too short for that).
Boy, you sure don't ask for much! :) But ... hang on for a moment ...
let me type a few magic words in a terminal window:

$ cd ~/PDFs/Pragmatic
$ acroread Pickaxe.pdf

Searching for "Rinda", we find, on page 706:

"Library Rinda ... Tuplespace Implementation

"Tuplespaces are a distributed blackboard system. Processes may add
tuples to the blackboard, and other processes may remove tuples from the
blackboard that match a certain pattern. Originally presented by David
Gelernter, tuplespaces offer an interesting scheme for distributed
cooperation among heterogeneous processes.

"Rinda, the Ruby implementation of tuplespaces, offers some interesting
additions to the concept. In particular, the Rinda implementation uses
the === operator to match tuples. This means that tuples may be matched
using regular expressions, the classes of their elements, as well as the
element values."
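
For concreteness, a tiny single-process Rinda round trip (templates match
with ===, so a class such as Integer acts as a wildcard; in real use the
tuplespace would be shared between processes over DRb):

require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new
ts.write([:square, 7])                  # a producer posts a tuple
tag, n = ts.take([:square, Integer])    # a consumer matches it via ===
ts.write([:answer, n * n])
p ts.take([:answer, nil])               # => [:answer, 49]
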
> The application area is matrix calculation for the most part,

> That sounds like the kind of thing I would use Ruby only to prototype.
> But
> you've been around the block a few times, so who am I to say? ;-)
Yes ... the "natural" implementation of it is using Axiom for the
prototyping and R as the execution engine. But I want to use Ruby for
both, as a learning exercise and as a benchmark of Ruby's math
capabilities. I know it's going to be slow -- matrices are stored as
arrays and the built-in LU decomposition is done using "BigDecimal"
operations, for example.

It will be interesting -- to me, anyhow -- to see just how much slower
the built-in Ruby LU decomposition is than the same process using the
ATLAS library, which contains assembly language kernels. Think of ATLAS
as a "virtual machine" for numerical linear algebra. :) In the end, I'll
wish I had used ATLAS right from the beginning. But maybe Ruby will get
better in the process.

There are other things this application needs to do besides number
crunching and symbolic math. It has to have a GUI, draw various graphs,
both node-edge type and 2d/3d plot type, and handle objects large enough
that they probably will end up in a PostgreSQL database. Curiously
enough, all of these exist as add-on packages in the R library already.
:) But I think Ruby is more "natural" for those pieces of the
application, and I can always "shell out to R" if I need to.
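
A minimal sketch of the "shell out to R" route, assuming only that a
command-line R binary is on the PATH (the expression is just an example):

r_code = "cat(solve(matrix(c(2, 0, 0, 2), 2, 2)))"
output = IO.popen("R --slave --vanilla", "r+") do |r|
  r.puts r_code      # feed the expression to R on stdin
  r.close_write
  r.read             # capture whatever R prints
end
p output.split.map(&:to_f)   # => [0.5, 0.0, 0.0, 0.5]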

--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2006-07-03 09:57
(Received via mailing list)
2006/7/2, M. Edward (Ed) Borasky <znmeb@cesmail.net>:
> disposal. :) Real machines are pretty smart too, at least the ones from
> Intel are.

True, they do quite a lot of smart things nowadays.  I feel however
that optimizing on a higher level of abstraction can yield better
improvements (i.e. removing an operation from a loop vs. just making
it as fast as possible).

> The point of my comment was the emphasis on *statistical*
> properties of applications. Since this is the area I've spent quite a
> bit of time in, it's a more natural approach to me than, say, the
> niceties of discrete math required to design an optimizing compiler or
> interpreter.

Well, VMs also use statistical data - but that's derived from a
different set of data points. :-)

> In the end, most of the "interesting" discrete math problems in
> optimization are either totally unsolvable or NP complete, and you end
> up making statistical / probabilistic compromises anyhow. You end up
> solving problems you *can* solve for people who behave reasonably
> rationally, and you try to design your hardware, OS, compilers,
> interpreters and languages so rational behavior is rewarded with
> *satisfactory* performance, not necessarily optimal performance. And you
> try to design so that irrational behavior is detected and prevented from
> injuring the rational people.

Agree.

> Don't get me wrong, the Sun *Intel x86* JVM is a marvelous piece of
> software engineering. Considering how many person-years of tweaking it's
> had, that's not surprising. But the *original* goal of Java and the
> reason for using a VM was "write once, run anywhere". "Anywhere" no
> longer includes the Alpha, and may have *never* included the MIPS or
> HP-PARISC. IIRC "anywhere" no longer includes MacOS. And since I've
> never tested it, I don't know for a fact that the Solaris/SPARC version
> of the JVM is as highly tuned as the Intel one.

I once had a link to an article that came from Sun development where
they claimed that their Solaris JVM fared poorly compared with the
Windows version...  Unfortunately I cannot dig it up at the moment.

> To bring this back to Ruby, my recommendations stand:

We're probably less far away from each other than it seemed:

> 1. Focus on building a smart(er) interpreter rather than an extra
> virtual machine layer.

I don't care what it's called or whether it uses bytecode or what not.
My basic point was that a runtime environment (aka VM aka interpreter)
is a good architecture because it provides better options for runtime
optimization.

> 2. Focus optimizations on the Intel x86 and x86-64 architectures for the
> "community" projects. Leverage off of GCC for *all* platforms; i.e.,
> don't use Microsoft's compilers on Windows.

I can't comment on MS compiler vs. GCC - all I've heard in the past is
that some compilers yield better performance characteristics than
others so the platform's native compiler seems to have an edge there.

> And don't be afraid of a
> little assembler code. It works for Linux, it works for ATLAS
> (Automatically Tuned Linear Algebra Subroutines) and I suspect there's
> some in the Sun JVM.

Yes.

> 3. Focus on Windows, Linux and MacOS for complete Ruby environments for
> the "community" projects.

Sounds reasonable.  For more server oriented apps Solaris might be an
option, too.  But I have the feeling that it's on the decline...

Kind regards

robert
481b8eedcc884289756246e12d1869c1?d=identicon&s=25 Francis Cianfrocca (Guest)
on 2006-07-03 11:45
(Received via mailing list)
On 7/2/06, M. Edward (Ed) Borasky <znmeb@cesmail.net> wrote:
>
>
> > There's a
> > limit to the amount of server RAM I'm willing to buy, power, and cool
> > just
> > to run bloatware.
> Ah, but there's also a limit to how many developer-hours you're willing
> to buy as well, so you've chosen to use Ruby rather than C. :) Of
> course, C is pretty much as fast as it's going to get, but the Ruby
> run-time is probably "laden with low-hanging fruit".

<<<<<
You've made several interesting points here in just two sentences. Part of
my business is building appliances, and another part is running them in
farms. I'm getting really aware of the marginal costs of production
computing. Pure machine cycles are still getting cheaper (not nearly as fast
as they once did) but the ancillary costs of running a cycle are NOT getting
cheaper. This is starting to have an effect on the standard calculus and
it's no longer unambiguously true that "iron is cheaper than programmers."
That's a big discussion on its own, but then you go on to make your point
about the Ruby runtime. I've expressed the intuition (it's nothing more than
that) that re-basing Ruby on a VM will not solve much of the real-world
performance problems we experience with Ruby. If I'm right and you're right,
then we all may indeed be missing some opportunities just by looking in the
wrong place.



> Developing "a large amount of code?" Why is there so much code required?
> Are there a lot of detailed special cases that can't be made into data?

<<<<<
No, I just meant the business needs for software are enormous and getting
larger every day. Every extra day it takes you to come to market (and I'm
talking about internal applications, not just commercial ones) is forgone
business value. We have so much software we need to write in my little
company that I'm obsessing over finding much better ways to do it. (I know
this is not a new problem. That doesn't mean it's not an urgent one.)



>
> Boy, you sure don't ask for much! :)

<<<
If you don't ask for much, you won't get it ;-)


Rinda/Linda: yes, but. I've been working with Linda spaces and actor spaces
for over ten years now. It's a high-level abstraction which is interesting
and powerful, but we need a few steps in between before it (or something
based on it, or inspired by it) can be used for large-scale development.

Your application: you've neatly described a split between things that belong
in Ruby and things (the raw matrix calculations) that belong in something
else. It's real nice that Ruby makes this easy to do.
C914fa463a2b1b067586c6432b12a824?d=identicon&s=25 Juergen Strobel (Guest)
on 2006-07-03 17:55
(Received via mailing list)
On Mon, Jul 03, 2006 at 10:31:55AM +0900, Francis Cianfrocca wrote:
>
>
>
> Ah ... now searching is something we *can* optimize!
>
> Searching can be improved but even so, it's a lot of work to do at runtime.
> Languages that treat method dispatch as lookups into indexed tables have a
> big edge. Even Python does this.

Lisp (CLOS) has an even more complicated method dispatch than Ruby,
since it may have to search up the parent classes of all parameters
(multi dispatch) and MI is allowed. History shows this type of method
dispatch can be highly optimized and be made very performant.

It sure took some time for Lisp to reach this stage though.

-Jürgen
Dd76a12d66f843de5c5f8782668e7127?d=identicon&s=25 Mauricio Fernandez (Guest)
on 2006-07-03 17:59
(Received via mailing list)
On Mon, Jul 03, 2006 at 04:57:53AM +0900, Robert Mela wrote:
> What tools exist for profiling Ruby?

You definitely want ruby-prof
(http://rubyforge.org/projects/ruby-prof/),
but this has little to do with Ruby's performance or how to make its
implementation faster in general...
Dd76a12d66f843de5c5f8782668e7127?d=identicon&s=25 Mauricio Fernandez (Guest)
on 2006-07-03 18:09
(Received via mailing list)
On Tue, Jul 04, 2006 at 12:53:09AM +0900, Juergen Strobel wrote:
> (multi dispatch) and MI is allowed. History shows this type of method
> dispatch can be highly optimized and be made very performant.

Ruby's method cache hit rate was over 98% IIRC so full searches are
relatively rare, but... method dispatching is still fairly slow. I'd bet
YARV will use inline method caches at some point (it did already have ICs
for constant lookup last time I read the sources) ;)
481b8eedcc884289756246e12d1869c1?d=identicon&s=25 Francis Cianfrocca (Guest)
on 2006-07-03 18:13
(Received via mailing list)
On 7/3/06, Juergen Strobel <strobel@secure.at> wrote:
>
>  Lisp (CLOS) has an even more complicated method dispatch than Ruby,
> since it may have to search up the parent classes of all parameters
> (multi dispatch) and MI is allowed. History shows this type of method
> dispatch can be highly optimized and be made very performant.


I think about this quite a lot. I've never implemented LISP or anything
LISP-like, so I'm not really qualified to speak, but I have implemented some
other lambda-based languages (ML, parts of Haskell). One of the huge issues
is tail-recursion elimination. The runtime environment of these languages is
fundamentally about mapping functionality to lists, and if you can do that
without building a stack frame for each function-application, you win big.
This doesn't apply at all to Ruby, which is an Algol-derivative.

I've avoided asking this question because I assume that all the rest of you
have already clawed through the Ruby interpreter line by line, squeezing
cycles out of it.
I have an unproven hunch that it spends much of its time doing hash lookups,
in place of the work that other languages do by de-indexing function-pointer
tables. If so, that's the thing to optimize. Any comments?
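
One crude way to get a feel for the lookup cost at the Ruby level, using the
standard benchmark library -- this measures Ruby-level calls, not the
interpreter's internals, so it is only suggestive:

require 'benchmark'

class Point
  def norm; 25; end
end

point = Point.new
table = { :norm => point.method(:norm) }   # an explicit "dispatch table"
n = 1_000_000

Benchmark.bm(12) do |bm|
  bm.report('direct call') { n.times { point.norm } }
  bm.report('hash + call') { n.times { table[:norm].call } }
end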
3bb23e7770680ea44a2d79e6d10daaed?d=identicon&s=25 M. Edward (Ed) Borasky (Guest)
on 2006-07-04 05:13
(Received via mailing list)
Juergen Strobel wrote:
> Lisp (CLOS) has an even more complicated method dispatch than Ruby,
> since it may have to search up the parent classes of all parameters
> (multi dispatch) and MI is allowed. History shows this type of method
> dispatch can be highly optimized and be made very performant.
>
> It sure took some time for Lisp to reach this stage though.
>
And most Lisp processors have the ability to compile to machine code.

--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
Cb75e9a5b18ad023ab1cce64e7cdebab?d=identicon&s=25 Lothar Scholz (Guest)
on 2006-07-04 05:40
(Received via mailing list)
Hello M.,

MEEB> Juergen Strobel wrote:
>> Lisp (CLOS) has an even more complicated method dispatch than Ruby,
>> since it may have to search up the parent classes of all parameters
>> (multi dispatch) and MI is allowed. History shows this type of method
>> dispatch can be highly optimized and be made very performant.
>>
>> It sure took some time for Lisp to reach this stage though.
>>
MEEB> And most Lisp processors have the ability to compile to machine
MEEB> code.

Well, but Lisp-compiled machine code crashes on the simplest error
because, to get speed, everything is cast to void*.

And it also needs - at least in the implementations I know - a sealed
universe to optimize the method dispatching. I also don't know of any
transparent JIT compiler that cleanly handles eval and recompiles the
necessary parts on its own (but my last look at LISP was 2003). AFAIK all
Lisps need manual help from the programmer here.

So we are not really talking about the same thing here.
D36eff3004b39abc4b93fe8a410d8bd3?d=identicon&s=25 Ron M (Guest)
on 2006-07-05 02:23
(Received via mailing list)
Charles O Nutter wrote:
> On 7/1/06, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:
>> how is this performance data available significantly different
>> from that made transparent by gcc/gprof/gdb/dmalloc/etc -
>> gcc can encode plenty of information for tools like these
>> to dump reams of info at runtime.
...
> Ahhh, venturing into a domain I love talking about.
>
> Runtime-modification of code is exactly what sets the JVM apart from static
> compilation and optimization in something like GCC.

An even more interesting example of what JIT compilers
can do is "dynamic deoptimization" [1,2].

A JIT compiler can optimistically perform some very aggressive
optimizations and undo them later on if something (like a new
class being loaded, or a dynamic class modified) would make
these optimizations no longer valid.


One example is inlining of virtual methods or methods from
dynamic classes.

A VM can optimistically treat virtual function calls as normal
calls and even inline them -- and later if it notices that
someone did create a derived class or modify the dynamic
class it could de-optimize those function calls when the
class is modified.

I think this could be *extremely* interesting in as dynamic
a language as Ruby.
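
A Ruby-flavoured illustration of what such a VM has to guard against
(illustrative only -- no claim that any current Ruby implementation
inlines like this):

class Greeter
  def hello; "hi"; end
end

g = Greeter.new
g.hello            # a JIT could inline this call to return "hi" directly...

class Greeter      # ...until the class is reopened at runtime
  def hello; "hello again"; end
end

g.hello            # now any such inlining must be thrown away
                   # ("dynamic deoptimization") and the call site recompiled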

   Ron

[1] http://research.sun.com/self/papers/dynamic-deopti...
[2]
http://portal.acm.org/citation.cfm?id=143114&dl=AC...
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2006-07-05 07:28
(Received via mailing list)
2006/7/5, Ron M <rm_rails@cheapcomplexdevices.com>:
> An even more interesting example of what JIT compilers
> can do is "dynamic deoptimization" [1,2].

> I think this could be *extremely* interesting in as dynamic
> a language as Ruby.

Although I too find this very amazing, I'm not so sure about the
"extreme" in the case of Ruby. Since Ruby is a whole lot more dynamic
than Java, too much of this optimization/deoptimization churn might occur
and thus degrade performance. It's a tradeoff - as always.

Kind regards

robert
D36eff3004b39abc4b93fe8a410d8bd3?d=identicon&s=25 Ron M (Guest)
on 2006-07-05 19:45
(Received via mailing list)
Robert Klemme wrote:
> 2006/7/5, Ron M <rm_rails@cheapcomplexdevices.com>:
>> An even more interesting example of what JIT compilers
>> can do is "dynamic deoptimization" [1,2].
>
> Although I too find this very amazing

IBM has an even better description of the technique here
where they show nice examples of dynamic deoptimization
in Java with some benchmarking.
  http://www-128.ibm.com/developerworks/library/j-jtp12214/
They also have a great conclusion:
    "So, what can we conclude from this "benchmark?" Virtually
    nothing, except that benchmarking dynamically compiled
    languages is much more subtle than you might think."

> I'm not so sure about the
> "extreme" in the case of Ruby. Since Ruby is a whole lot more dynamic
> than Java, too much of this optimization deoptimization might occur
> and thus degrade performance.

Well, it's great for long running programs (Rails) and bad for
short-lived ones.   For long running programs you'll end up in
a steady-state where all the methods someone never changes can
be inlined and all the methods someone changes in subclasses
aren't.  I have web servers that have been running for 3 years.
Surely most classes that did get subclassed or dynamically
modified would have done so in the first few months.

For example, if someone never touches 90% of the methods
in String or Array in a Rails application, it would help
quite a bit to apply every optimization technique known
including inlining to those.

> It's a tradeoff - as always.

In this case I think it could be made to always be a win.

For example - don't apply the aggressive optimizations
until after some period of time (a minute?, a week?,
a month?) after the program started running.

Of course a simpler way is what Java apparently does -- provide
a runtime switch to indicate that something's a long running
process.  I believe the way they enable the technique is with
the server vs. client versions of their VM.   Here's[1] how Sun
describes it:
http://java.sun.com/products/hotspot/docs/whitepap...
Bec38d63650c8912b6ba9b557fb953b9?d=identicon&s=25 Roger Pack (rogerdpack)
on 2008-05-12 08:54
Bill Kelly wrote:
> But yes, it's harder to make a language like Ruby, which is highly
> dynamic at runtime, fast like C++ and Java, which are primarily
> statically compiled.  The Smalltalk folks have reportedly done pretty
> well though, so there exists the possibility that Ruby may get
> substantially faster in the future.  YARV is already making some
> headway.

My question is what does 1.9 exactly do with its "Inline (Method)
cache"[1]?  Is there room for more improvement with a better JIT
compiler?  How much would this help scripts?
Thanks!
-R
[1] http://www.atdot.net/yarv/rc2006_sasada_yarv_on_rails.pdf
Caf3d97ceff60d6d105c45305d34658c?d=identicon&s=25 Vidar Hokstad (Guest)
on 2008-05-12 15:35
(Received via mailing list)
On May 12, 7:54 am, Roger Pack <rogerpack2...@gmail.com> wrote:
> compiler?  How much would this help scripts?
There's still a huge amount of room for improvement. I haven't seen any of
the Ruby implementations even try to apply more sophisticated VM/JIT
techniques such as tracing and polymorphic inline caches properly.

There's nothing inherent in Ruby preventing "near C" performance (at least
within the same magnitude, but probably a lot closer), though there are lots
of things that make it a lot of work to get there (the level of dynamism,
certainly).

Even though Ruby programs are in theory extremely dynamic, most paths
through a program will be heavily dominated by the same types over and
over, and that can be exploited to massively reduce overhead, with some
pretty cheap checks to shunt execution over to a fallback if certain
assumptions don't hold, etc.
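
The shape of that "cheap check plus fallback" idea, sketched as a toy
monomorphic inline cache in plain Ruby (real implementations do this inside
the VM, per call site):

CallSite = Struct.new(:klass, :meth)

def cached_send(site, receiver, name, *args)
  unless site.klass.equal?(receiver.class)         # the cheap guard
    site.klass = receiver.class                    # fallback: full lookup
    site.meth  = receiver.class.instance_method(name)
  end
  site.meth.bind(receiver).call(*args)             # the fast path
end

site = CallSite.new
cached_send(site, "abc", :upcase)   # first call fills the cache
cached_send(site, "def", :upcase)   # same class: guard passes, no lookup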

It's a question of when, not if, we see far faster Ruby implementations
than the current range of VMs.

Vidar
Ede2aa10c6462f1d825143879be59e38?d=identicon&s=25 Charles Oliver Nutter (Guest)
on 2008-05-13 05:27
(Received via mailing list)
Vidar Hokstad wrote:
>> compiler?  How much would this help scripts?
>
> There's a huge amount of room for improvement in the still. I haven't
> seen any of the Ruby implementations even try to apply more
> sophisticated
> VM/JIT techniques such as tracing and polymorphic inline caches
> properly.

I've experimented with PICs in JRuby, and for most tests I ran they did
not help very much. Granted, there's an unfortunate lack of nontrivial
polymorphic benchmarks in the wild, so it's possible a good PIC would
help more on real apps.

What does make a huge difference for JRuby is eliminating some of the
extra overhead related to rarely-used Ruby features. For example, the
difference between a normal run and a run that eliminates an unnecessary
frame object, uses fast dispatch for math operations, and eliminates
some thread checkpointing:

~/NetBeansProjects/jruby ➔ jruby -J-server test/bench/bench_fib_recursive.rb
   0.746000   0.000000   0.746000 (  0.746000)
   0.371000   0.000000   0.371000 (  0.370000)
   0.357000   0.000000   0.357000 (  0.357000)
   0.357000   0.000000   0.357000 (  0.357000)
   0.358000   0.000000   0.358000 (  0.357000)
~/NetBeansProjects/jruby ➔ jruby -J-server -J-Djruby.compile.fastest=true test/bench/bench_fib_recursive.rb
   0.960000   0.000000   0.960000 (  0.959000)
   0.243000   0.000000   0.243000 (  0.243000)
   0.238000   0.000000   0.238000 (  0.238000)
   0.237000   0.000000   0.237000 (  0.237000)
   0.235000   0.000000   0.235000 (  0.235000)

And Ruby 1.9:

~/NetBeansProjects/jruby ➔ ../ruby1.9/ruby -I ../ruby1.9/lib test/bench/bench_fib_recursive.rb
   0.400000   0.010000   0.410000 (  0.412421)
   0.400000   0.000000   0.400000 (  0.407236)
   0.400000   0.000000   0.400000 (  0.415222)
   0.400000   0.010000   0.410000 (  0.417042)
   0.400000   0.000000   0.400000 (  0.452934)
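
For context, the recursive-fib benchmark being run here is roughly this
shape (a sketch from memory, not the exact bench_fib_recursive.rb):

require 'benchmark'

def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

5.times { puts Benchmark.measure { fib(30) } }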

So yes, there's definitely room to improve all the implementations...

- Charlie
Bec38d63650c8912b6ba9b557fb953b9?d=identicon&s=25 Roger Pack (rogerdpack)
on 2008-05-23 21:57
Charles Oliver Nutter wrote:
> What does make a huge difference for JRuby is eliminating some of the
> extra overhead related to rarely-used Ruby features. For example, the
> difference between a normal run and a run that eliminates an unnecessary
> frame object, uses fast dispatch for math operations, and eliminates
> some thread checkpointing:

Maybe something is possible along the lines of

vm_optimized :no_frame_pointer, :fast_math, :no_thread_checkpointing do
      # some code that should run very fast
end

:)
Thanks for your work :)
-R
Ede2aa10c6462f1d825143879be59e38?d=identicon&s=25 Charles Oliver Nutter (Guest)
on 2008-06-01 03:15
(Received via mailing list)
Roger Pack wrote:
> Maybe something is possible along the lines of
>
> vm_optimized :no_frame_pointer, :fast_math, :no_thread_checkpointing do
>       # some code that should run very fast
> end
>
> :)
> Thanks for your work :)
> -R

Yeah, I'm looking into those possibilities, trying to find a
nonintrusive way to introduce compiler pragmas that we could use for
implementing parts of JRuby in Ruby code (or that others could use). For
example, something like this (a bogus name...I don't want to reveal any
pragmas yet):

def foo
   ____NO_FRAMING = true
end

- Charlie
Bec38d63650c8912b6ba9b557fb953b9?d=identicon&s=25 Roger Pack (rogerdpack)
on 2008-06-01 03:21
> Yeah, I'm looking into those possibilities, trying to find a
> nonintrusive way to introduce compiler pragmas that we could use for
> implementing parts of JRuby in Ruby code (or that others could use). For
> example, something like this (a bogus name...I don't want to reveal any
> pragmas yet):
>
> def foo
>    ____NO_FRAMING = true
> end

One option that YARV could use is compiler definitions. :)
Programmatically would seem easier on the user.
-R
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2008-06-01 14:05
(Received via mailing list)
On 01.06.2008 03:14, Charles Oliver Nutter wrote:
>
> Yeah, I'm looking into those possibilities, trying to find a
> nonintrusive way to introduce compiler pragmas that we could use for
> implementing parts of JRuby in Ruby code (or that others could use). For
> example, something like this (a bogus name...I don't want to reveal any
> pragmas yet):
>
> def foo
>    ____NO_FRAMING = true
> end

That's an interesting idea, but I'd rather separate Ruby code from
platform specific options.  So I'd prefer command line arguments to the
engine (as you presented before) or a smart mechanism to store switches
externally (e.g. .rc file).  Ideally there would even be a mechanism
that finds optimal switches automatically, but I guess that would be
really hard. :-)

Kind regards

  robert
Ede2aa10c6462f1d825143879be59e38?d=identicon&s=25 Charles Oliver Nutter (Guest)
on 2008-06-02 00:20
(Received via mailing list)
Roger Pack wrote:
>       # some code that should run very fast
> end

A flag is problematic for a couple reasons:

- Most of the optimizations I'm trying to specify are not compatible
with everything in Ruby; when enabled they'll limit available features
to a subset that doesn't incur as much runtime overhead (and this is
overhead both JRuby and MRI contend with).
- For the cases where there are optimizations that can be applied
globally...we just spin a new release and apply them globally. There's
not really a need to hold back on such things if they don't break Ruby.

Ideally, we'd be able to apply these optimizations everywhere they'll be
safe automatically, but that's a very hard problem with Ruby's dynamic
nature. So I see these as more a "programmer promise" that they won't
use certain higher-overhead features in exchange for better performance.
It's not something I'd see a lot of people using for general apps, but
it might be useful when building JRuby internal or core framework code.

I'm open to all thoughts on this though. There's lots and lots of things
we can optimize by incrementally shutting down particular features. And
I think it's a reasonable choice to offer people...if you want something
faster and are willing to give up a little, it should be your choice.
And yes, I fully appreciate the compatibility aspect of this...so it's
definitely not intended for the uninitiated and probably not for general
use.

- Charlie
Fe6a008c1e3065327d1f1b007d8f1362?d=identicon&s=25 Paul Brannan (cout)
on 2008-06-02 17:31
(Received via mailing list)
On Tue, May 13, 2008 at 12:26:48PM +0900, Charles Oliver Nutter wrote:
> So yes, there's definitely room to improve all the implementations...
Similar speedups with ludicrous, which isn't even very smart about its
optimizations:

cout@bean:~/download/jruby/jruby/test/bench$ ruby1.9
bench_fib_recursive.rb
  1.450000   0.030000   1.480000 (  1.628318)
  0.930000   0.020000   0.950000 (  1.956004)
  0.920000   0.020000   0.940000 (  0.937877)
  0.930000   0.020000   0.950000 (  0.958286)
  0.930000   0.020000   0.950000 (  0.946474)

cout@bean:~/download/jruby/jruby/test/bench$ ludicrous
bench_fib_recursive.rb
  0.670000   0.010000   0.680000 (  0.679764)
  0.670000   0.020000   0.690000 (  0.688893)
  0.680000   0.010000   0.690000 (  0.695944)
  0.680000   0.010000   0.690000 (  0.694999)
  0.680000   0.020000   0.700000 (  0.696761)

Paul