Huge performance gap

alexisrichardson · February 25, 2006, 12:04am

Alexis R. wrote:

comments. Did I somehow offend your honor? I did not say that ruby is
crap compared to c++ or java. I find ruby is an absolute fantastic
language. I was just surprised about my results.

If you have the curiosity, there are all kinds of results to wonder
about, for example Ruby compared to a C interpreter
http://shootout.alioth.debian.org/gp4sandbox/benchmark.php?test=all&lang=ruby&lang2=ch

alexisrichardson · June 30, 2006, 3:33am

Hi,

I found this article today

E. Saynatkari wrote:

Well, Ruby is strictly interpreted using the parse tree instead
of VM opcodes which may or may not (depending on who you ask)
make a difference. Ruby is pretty slow but usually Fast Enough™.

You could try to run your script on YARV[1] to see if it helps.

On my Linux (on VMware) machine:

Ruby: time elapsed: 27.811268 sec.
YARV: time elapsed: 2.892428 sec.

(YARV with some special optimization option)

Regards,
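The "time elapsed" figures above are wall-clock timings. A minimal way to take such a measurement in Ruby is `Benchmark.realtime` from the standard `benchmark` library. In this sketch the `work` method is a hypothetical stand-in for the script being timed, not the original poster's program.

```ruby
require 'benchmark'

# Hypothetical stand-in for the script being timed:
# sums the squares of 1..200_000.
def work
  (1..200_000).reduce(0) { |sum, i| sum + i * i }
end

# Benchmark.realtime returns the elapsed wall-clock time in seconds (a Float).
elapsed = Benchmark.realtime { work }
printf("time elapsed: %f sec.\n", elapsed)
```

Wall-clock time includes anything else the machine is doing, so repeated runs on an otherwise idle machine give more trustworthy numbers.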

alexisrichardson · July 1, 2006, 1:56pm

On 7/1/06, Reggie Mr wrote:

Here is a simple graph of performance by different platforms.

UsenetBinaries.com is for sale | HugeDomains

I can’t think of a more useless “test” other than anything put out by
the Alioth shootout.

The numbers have limited interest because they’re not necessarily
using the power of the frameworks and merely measure the response to
what is essentially a static return value. It would be more
interesting to benchmark a simple guestbook in all of the things they
did. Guestbooks are relatively easy to write and would be
significantly more “test-worthy” than a “hello world” implementation.

-austin

alexisrichardson · July 1, 2006, 1:02pm

Here is a simple graph of performance by different platforms.

alexisrichardson · June 30, 2006, 11:10am

On 23/02/06, Alexis R. wrote:

Hi all

I’ve ported the following c++ code to ruby. It is a recursive
(backtracking) sudoku-solving algorithm. Well, I was rather surprised by
the execution time I got:

c++ code: 0.33 seconds
ruby code: 27.65 seconds

Ruby is relatively slow, but the algorithm is the problem. This Ruby
program I wrote a while ago can solve it in under half a second:

http://po-ru.com/files/sudoku/1.0/sudoku.tar.gz

It’s quite a bit more complicated, however.

Paul.
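For readers who want to see what a plain recursive backtracking solver looks like, here is a minimal Ruby sketch. It is a hypothetical illustration, not the original poster's port and not Paul's program; the 81-element array representation (0 for an empty cell) and the names `candidates` and `solve` are assumptions. It tries each legal digit in the first empty cell and undoes the choice when it leads to a dead end.

```ruby
# Hypothetical backtracking Sudoku solver. The grid is an array of
# 81 integers in row-major order; 0 marks an empty cell.

# Returns the digits that can legally be placed at position idx.
def candidates(grid, idx)
  row, col = idx.divmod(9)
  used = []
  9.times do |i|
    used << grid[row * 9 + i]   # same row
    used << grid[i * 9 + col]   # same column
  end
  br = row - row % 3
  bc = col - col % 3
  3.times do |r|
    3.times { |c| used << grid[(br + r) * 9 + bc + c] } # same 3x3 box
  end
  (1..9).to_a - used
end

# Fills the grid in place; returns true if a solution was found.
def solve(grid)
  idx = grid.index(0)
  return true if idx.nil? # no empty cells left: solved
  candidates(grid, idx).each do |digit|
    grid[idx] = digit
    return true if solve(grid)
    grid[idx] = 0 # undo and try the next digit
  end
  false # no digit fits here: backtrack
end

puzzle = "530070000600195000098000060800060003400803001700020006060000280000419005000080079"
grid = puzzle.chars.map(&:to_i)
grid.each_slice(9) { |row| puts row.join(" ") } if solve(grid)
```

Scanning for the first empty cell keeps the sketch short; choosing the cell with the fewest candidates first usually prunes the search much more, which is one way a better algorithm can outweigh a faster language.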

alexisrichardson · July 1, 2006, 7:16pm

SASADA Koichi wrote:

(YARV with some special optimization option)

Guest machines under VMware are pretty much useless as a performance
profiling platform, for a variety of reasons.

–
M. Edward (Ed) Borasky

alexisrichardson · July 1, 2006, 7:10pm

Austin Z. wrote:

I can’t think of a more useless “test” other than anything put out by
the Alioth shootout.

The numbers have limited interest because they’re not necessarily
using the power of the frameworks and merely measure the response to
what is essentially a static return value. It would be more
interesting to benchmark a simple guestbook in all of the things they
did. Guestbooks are relatively easy to write and would be
significantly more “test-worthy” than a “hello world” implementation.

My experience with web application benchmarks (nearly all
Windows/IIS/ASP/SQL Server and Linux/Apache 1.3/PostgreSQL/PHP) has
shown that the two most likely bottlenecks in a real web application
are the network bandwidth and the database. If the application generates
too much network traffic, or if its database design is poorly done, it
doesn’t matter all that much whether the underlying glue logic is in C,
Perl, PHP, Python, Java or Ruby, whether the OS is Windows or Linux or
something else, or what the web server itself is. The database, on the
other hand, matters a lot. And that, my friends, explains why Larry
Ellison is such a rich man.

Having said that, the benchmark in the original poster’s link measures
the web server and a single module within it. I think it’s a perfectly
valid benchmark for the specific web server/glue language component of
a web application, and I think there is a lesson or two in the results
for all of us:

1. C scales better than Perl, PHP, Python and Ruby, all other things
   being equal. Make sure your C skills are up to date.
2. You probably want to hold off upgrading from PHP 4 to PHP 5, and you
   want to performance-test before you do.
3. Unless there is some compelling business reason to use one of the
   other technologies, you probably want to avoid anything below PHP 4 in
   this chart. Programmer time to implement the application is certainly a
   compelling business reason.
4. The Ruby community needs to get Ruby’s performance up to where PHP 4 is
   on benchmarks like this. It would be wonderful if it were better than
   Perl and PHP, but the bare minimum is to be competitive with PHP 4.

On 4, I’m not sure a “virtual machine” is the answer, by the way.
“Virtual machines”, or as I prefer to call them, “abstract machines”,
were primarily intended for portability, not performance. C happens to
be a great abstract machine, and GCC happens to be a great way to
achieve portability and performance.

–
M. Edward (Ed) Borasky

alexisrichardson · July 1, 2006, 7:43pm

On Sun, 2 Jul 2006, M. Edward (Ed) Borasky wrote:

On 4, I’m not sure a “virtual machine” is the answer, by the way. “Virtual
machines”, or as I prefer to call them, “abstract machines”, were primarily
intended for portability, not performance. C happens to be a great abstract
machine, and GCC happens to be a great way to achieve portability and
performance.

amen!

-a

alexisrichardson · July 1, 2006, 8:39pm

[email protected] wrote:

amen!

-a

I really should learn how to program in C. I can just barely read
C, actually.

Then again, at least a generation of C programmers have implemented GCC,
Perl, R, Linux, Ruby, etc., so I haven’t felt the need. And some of the
Lisp and Scheme environments seem to be just as efficient abstract
machines as C/GCC. And then there’s Forth – another efficient abstract
machine. Choices, choices, too many choices.

–
M. Edward (Ed) Borasky

alexisrichardson · July 2, 2006, 12:05am

2006/7/1, M. Edward (Ed) Borasky:

The Ruby community needs to get Ruby’s performance up where PHP 4 is
on benchmarks like this. It would be wonderful if it was better than
Perl and PHP, but a bare minimum is to be competitive with PHP 4.

On 4, I’m not sure a “virtual machine” is the answer, by the way.
“Virtual machines”, or as I prefer to call them, “abstract machines”,
were primarily intended for portability, not performance. C happens to
be a great abstract machine, and GCC happens to be a great way to
achieve portability and performance.

There’s a significant difference between GCC and the JVM, for example:
VMs can collect performance data while the application is running,
whereas GCC has to optimize at compile time. This gives the VM approach
an advantage, because it can target optimizations better. Depending on
the application, the performance of a Java app doesn’t differ
significantly from that of a C app, but the programming model is more
convenient and more robust, and thus more efficient.

Kind regards

robert

alexisrichardson · July 2, 2006, 12:20am

On Sun, 2 Jul 2006, Robert K. wrote:

There’s a significant difference between GCC and the JVM, for example: VMs
can collect performance data while the application is running, whereas GCC
has to optimize at compile time. This gives the VM approach an advantage,
because it can target optimizations better. Depending on the application,
the performance of a Java app doesn’t differ significantly from that of a C
app, but the programming model is more convenient and more robust, and thus
more efficient.

Kind regards

robert

how is this performance data significantly different from that made
transparent by gcc/gprof/gdb/dmalloc/etc? gcc can encode plenty of
information for tools like these to dump reams of info at runtime. or are you
referring to a vm’s ability to actually adapt the runtime code? if so, then it
seems like even compiled languages can accomplish this if code in the
language is a first-class data type and the code segment can be manipulated
as data:

http://64.233.167.104/search?q=cache:mNkjHYGIbE4J:tratt.net/laurie/research/publications/papers/tratt__compile-time_meta-programming_in_a_dynamically_typed_oo_language.pdf+lisp+compiled+metaprogramming&hl=en&gl=us&ct=clnk&cd=1

is an interesting read. if one accepts that compile-time metaprogramming is
useful, then it’s a small leap to use compile-time metaprogramming to
enhance performance based on runtime characteristics. but maybe i’m way off
base here…

kind regards.

-a

alexisrichardson · July 2, 2006, 12:42am

On 7/1/06, [email protected] [email protected] wrote:

performance of a Java app doesn’t differ significantly from a C app but
transparent by gcc/gprof/gdb/dmalloc/etc - gcc can encode plenty of

is an interesting read. if one accepts that compile-time metaprogramming is
useful, then it’s a small leap to use compile-time metaprogramming to
enhance performance based on runtime characteristics. but maybe i’m way off
base here…

Ahhh, venturing into a domain I love talking about.

Runtime modification of code is exactly what sets the JVM apart from static
compilation and optimization in something like GCC. The link Ara posted
above is another way to look at the same kind of runtime modification. The
JVM, because it JIT-compiles code rather than compiling it ahead of time
(AOT), can change the parameters of that compilation whenever it likes. If
a particular piece of JIT-compiled code is heavier on integer math than on
memory allocation, it may re-JIT it to avoid processor-intensive aspects of
object creation and garbage collection. If a piece of code is used heavily,
there may be opportunities to dynamically inline or reorder subroutines at
runtime. All the tricks C coders might have to do by hand or decide on at
compile time can be done as needed, based on runtime performance profiling.

You could go through a gcc/gdb/gprof (and so on) cycle, but the point of
the JVM is that you don’t have to burn those hours. You write the code
once, and the JVM gobbles it up, runs it interpreted for a (short) while,
and then starts generating native machine code that fits the runtime
profile it has gathered. As that profile changes, code can be regenerated,
and indeed the longer an application runs, the faster it gets.

It is for this reason that many algorithms running in the JVM run as fast
as their C or C++ equivalents.

Ruby has great potential to make these same kinds of optimizations at
runtime, and as I understand it, YARV will do quite a bit of “smart”
optimization at runtime.

The JVM got a bad rap because in versions 1.2 and earlier, it really was
slow. Since 1.3, however, its performance has increased tremendously. 1.3
was many times faster than 1.2. 1.4 was twice as fast as 1.3. 1.5 and 1.6
are each another 20-25% faster again. I think the JVM and the recent
success of the .NET CLR have shown that a VM approach is a great way to go.

alexisrichardson · July 2, 2006, 1:46am

On 7/1/06, M. Edward (Ed) Borasky wrote:

Ah, but at least for the multiprogramming case, so can (and does) the
operating system! And the interpreter can “collect performance data
while the application is running” and optimize just as easily as (maybe
even more easily than) some underlying abstract machine.

An interpreter is an underlying abstract machine. Ruby and Java both have
interpreters that do basically the same thing. Java additionally has a JIT
compiler that takes code the next step toward native.

alexisrichardson · July 2, 2006, 1:21am

Robert K. wrote:

There’s a significant difference between GCC and the JVM for example:
VMs can collect performance data while the application is running
whereas GCC has to optimize at compile time. This yields advantages
for the VM approach because it can better target optimizations.
Ah, but at least for the multiprogramming case, so can (and does) the
operating system! And the interpreter can “collect performance data
while the application is running” and optimize just as easily as (maybe
even more easily than) some underlying abstract machine.

In any event:

1. The hardware is optimized to statistical properties of the workloads
   it is expected to run.
2. The operating system is optimized to statistical properties of the
   workloads it is expected to run and the hardware it is expected to run
   on.
3. Compilers are optimized to statistical properties of the programs
   they are expected to compile and the hardware the compiled programs are
   expected to run on.

As a result, I don’t see the need for another layer of abstraction. It’s
something else that needs to be optimized!

–
M. Edward (Ed) Borasky

alexisrichardson · July 2, 2006, 3:44am

Quoting C. O Nutter:

An interpreter is an underlying abstract machine. Ruby and Java both have
interpreters that do basically the same thing. Java additionally has a JIT
compiler that takes code the next step toward native.
Exactly! Now, when Java first came about, the underlying hardware was fairly
diverse, and the designers of the Java runtime chose to implement an extra
layer for the sake of portability. After all, they had to run on almost any
flavor of OS, including some ancient models of Windows, Tru64, OS, ancient
models of MacOS, Solaris, and IIRC the IBM mainframes as well. IIRC, Windows
NT also ran on Alphas and MIPS back then. And to top it off, memories were
smaller and machines were slower.

A lot of those options don’t make a lot of sense any more. The hardware is
dominated by x86 and x86-64, with SPARC and PowerPC running well behind.
There’s Windows, there’s Linux, and there’s Solaris and MacOS among the
living, breathing operating systems. And there’s GCC, Microsoft’s compiler
chain, perhaps a native Sun compiler, and a few specialized compilers.
“Write once, run anywhere” doesn’t have to worry about as many “anywheres”
as it used to.

So Java can afford a just-in-time compiler. But do they need an intermediate
abstract machine? Probably not any more. It would probably cost them more to
refactor around it than to leave it in place. But I think if they were
designing Java today, there wouldn’t need to be a JVM.

Just putting on my statistician’s hat, I’d say Ruby should have a JIT
compiler for x86 and x86-64 built by the “Ruby community”, that an extra
abstract machine layer isn’t necessary, and that only Windows, Linux and
MacOS need a complete Ruby environment built and maintained by the “Ruby
community”. Just about anything more might make sense to do if some
business thought they could make a profit from the effort, but I don’t see
it as a viable community project.

alexisrichardson · July 2, 2006, 12:08pm

** sorry for the last incomplete post, I accidentally hit the wrong
button **

2006/7/2, M. Edward (Ed) Borasky:

Robert K. wrote:

There’s a significant difference between GCC and the JVM for example:
VMs can collect performance data while the application is running
whereas GCC has to optimize at compile time. This yields advantages
for the VM approach because it can better target optimizations.
Ah, but at least for the multiprogramming case, so can (and does) the
operating system! And the interpreter can “collect performance data
while the application is running” and optimize just as easily as (maybe
even more easily than) some underlying abstract machine.

As Charles pointed out, the interpreter is a virtual machine and is thus
equivalent to a JVM with regard to the runtime information it can
collect (AFAIK the current Ruby runtime does not collect it, but it could).

expected to run on.

As a result, I don’t see the need for another layer of abstraction. It’s
something else that needs to be optimized!

I’m not sure whether you read Charles’s excellent posting about the
properties of a VM. All the optimizations you mention are static, which
is reflected in the fact that they are based on statistical
information about a large set of applications, i.e. there is basically
just one application that those optimizations can target. A VM, on the
other hand (and this is especially true for the JVM), has more precise
information about the current application’s behavior and thus can
target optimizations better.

I’ll try an example: consider method inlining. With C++ you can have
methods inlined at compile time. This leads to code bloat, and the
developer has to decide which methods he wants inlined. This takes
time, because he has to run tests and profile the application. Even
then, his tests might not reflect production behavior due to some
error in the setup or wrong assumptions about the data to be
processed, etc. In the worst case, method inlining can have an adverse
effect on performance due to the increased size of the memory image.

A VM, on the other hand, profiles the running application and then
decides which hot spots to optimize. Moreover, it can undo
optimizations later if the behavior changes over time or if new classes
are loaded that change the game. It has more accurate information
about the application at hand and thus can react and optimize more
appropriately. The point is that there is no single optimal
optimization for a piece of code, and a VM can do much, much better
at picking the optimum than any compiled language can, simply because
of the more accurate information it has about a running application.

Another advantage of a VM is that it makes instrumentation of code
much easier. Current JVMs have a sophisticated API that provides
runtime statistical data to analysis tools. With compiled applications
you typically have to compile for instrumentation, so you’re
effectively profiling a different application (although the overhead
may be negligible).

From all I can see in current developments in CS, the trend seems to be
toward virtual machines, as they provide a lot of advantages over
traditional compiled code. I personally prefer implementing in Java
over C++, for example. YMMV though.

Kind regards

robert
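The profile-then-optimize (and later undo) idea described above can be illustrated with a toy Ruby sketch. It is a hypothetical illustration only: the `AdaptiveCall` class, its call-count threshold and the two lambdas are assumptions, and a real VM such as HotSpot makes these decisions about generated machine code, not about Ruby objects. The sketch counts calls at one site, switches to a specialized implementation once the site is "hot", and falls back when told its assumptions no longer hold.

```ruby
# Toy model of adaptive optimization at a single call site (hypothetical).
class AdaptiveCall
  attr_reader :calls, :optimized

  # threshold:   number of calls after which the site counts as "hot"
  # generic:     implementation used while the site is cold
  # specialized: implementation swapped in once the site is hot
  def initialize(threshold, generic, specialized)
    @threshold = threshold
    @generic = generic
    @specialized = specialized
    @calls = 0
    @optimized = false
  end

  def call(*args)
    @calls += 1
    # The "profile" is just a call counter; crossing the threshold
    # triggers the switch to the specialized version.
    @optimized = true if !@optimized && @calls >= @threshold
    (@optimized ? @specialized : @generic).call(*args)
  end

  # Undo the optimization, e.g. when new code invalidates its assumptions,
  # and start profiling again from scratch.
  def deoptimize!
    @optimized = false
    @calls = 0
  end
end

square = AdaptiveCall.new(3, ->(x) { x ** 2 }, ->(x) { x * x })
4.times { |i| square.call(i) }
puts square.optimized # true: the site crossed the threshold of 3 calls
```

Both lambdas compute the same result, which mirrors the requirement that an optimized path must be observably equivalent to the generic one; only the decision of when to use it is driven by runtime data.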

alexisrichardson · July 2, 2006, 3:12pm

On 7/2/06, Reggie Mr wrote:

Austin Z. wrote:

On 7/1/06, Reggie Mr wrote:

Here is a simple graph of performance by different platforms.
UsenetBinaries.com is for sale | HugeDomains
I can’t think of a more useless “test” other than anything put out by
the Alioth shootout.
I would agree…except Ruby did VERY poorly in this “useless” test.

No except. It’s a useless test. I wouldn’t trust a single thing about
it. What this is essentially measuring, especially with CGI-style
output, is startup time. Ruby does have a slower start-up time than
other options.

This is particularly true of the nonsensical test of running Rails
in a CGI mode.

When real-world tests are done, Ruby is slower for now. But it’s not
as slow as this would suggest. I never ended up running ApacheBench
on it, but at least subjectively, Ruwiki rendered as fast as or faster
than most PHP wikis.

That’s why I said that a guestbook would be a far better test and more
reliable. It’s simple enough to implement in C, yet complex enough
that you’re going to get more interesting results. You’d also get a
decent measure of code size differences.

-austin
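Austin's point that a CGI-style test mostly measures interpreter startup can be checked directly. A minimal sketch, assuming the current Ruby interpreter is reachable via `RbConfig.ruby`: it launches a fresh interpreter that runs an empty script and averages the wall-clock time, which roughly approximates the fixed cost each CGI request pays before any application code runs. The method name `mean_startup_seconds` and the run count are hypothetical.

```ruby
require 'rbconfig'

# Launches a fresh interpreter running an empty script `runs` times and
# returns the mean wall-clock time in seconds. Under CGI, every request
# pays roughly this cost before any application code runs.
def mean_startup_seconds(runs = 5)
  total = 0.0
  runs.times do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    system(RbConfig.ruby, "-e", "") or raise "interpreter failed to start"
    total += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  total / runs
end

printf("mean startup: %.3f sec\n", mean_startup_seconds)
```

A persistent setup (FastCGI, mod_ruby) pays this cost once per process rather than once per request, which is why the deployment mode can matter as much as the language in such benchmarks.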

alexisrichardson · July 2, 2006, 2:55pm

Austin Z. wrote:

On 7/1/06, Reggie Mr wrote:

Here is a simple graph of performance by different platforms.

UsenetBinaries.com is for sale | HugeDomains

I can’t think of a more useless “test” other than anything put out by
the Alioth shootout.

I would agree…except Ruby did VERY poorly in this “useless” test.

alexisrichardson · July 2, 2006, 4:13pm

On Sun, 2 Jul 2006, Reggie Mr wrote:

UsenetBinaries.com is for sale | HugeDomains

I can’t think of a more useless “test” other than anything put out by
the Alioth shootout.

I would agree…except Ruby did VERY poorly in this “useless” test.

Except…the test results have to be taken with a very large grain of
salt.

Just a moment ago I did a couple of quick tests.

A “Hello, World” in PHP4:

<?php echo "Hello, World" ?>

and a “Hello, World” delivered via IOWA.

This is running on a machine which is not unloaded. It’s also nowhere
near the power of the machine the above test was done on, being a simple
AMD Athlon box with slower RAM running a Linux 2.4 kernel and Apache
2.0.58, and it is not running the fastest configuration for IOWA (which is
through FastCGI) but is instead running through mod_ruby.

Okay, enough with the disclaimers.

Multiple runs of ab -n 1000 produced a mean of about 700 requests per
second from the PHP page, and about 200 from the IOWA page.

The ratio there is much better than on the Web_Platform_Benchmarks.html
page, and if I were to set up a test using FastCGI, it would improve further.

However, this is a weak comparison, because like is not really being
compared with like.

So, to get something a little more similar, I dropped a “Hello World” into
an existing CakePHP (1.1.3.2967) site that I have, and likewise dropped a
“Hello World” into an IOWA (0.99.2.6) site with a comparable page layout
and final page size.

CakePHP, if you are unfamiliar with it, is an MVC framework for PHP that is
stylistically similar to Rails. So we’re at least comparing frameworks to
frameworks here, in the respective languages.

IOWA beats CakePHP handily, and I would expect RoR and Nitro to as well,
given what I know about their performance.

On an ab -n 1000 -c 1 run, I average 18 requests per second with CakePHP
and 60 per second with IOWA. Again, these are similarly sized pages,
though the navigation in the IOWA example is generated dynamically, while
it is static in the CakePHP example.

Playing with different levels of concurrency, I managed to get 35/second
out of the CakePHP app and 80/second out of the IOWA one, which is still a
ratio that falls dramatically in Ruby’s favor when comparing actual
frameworks.

Performance comparison can be an entertaining exercise, but with something
with as many variables as web page delivery, all performance comparisons
need to be interpreted with a bit of skepticism, including those I present
above. Still, Ruby doesn’t strike me as surprisingly slow in any
comparisons that I have ever done.

Kirk H.