Compiler for Ruby

Hi,

Complete newbie here; is there anything in the works as far as a
compiler for Ruby?

Thanks.

n/a wrote:

Hi,

Complete newbie here; is there anything in the works as far as a compiler
for Ruby?

Thanks.

http://www.catb.org/~esr/faqs/smart-questions.html


Phillip “CynicalRyan” Gawlowski
http://cynicalryan.110mb.com/

Rule of Open-Source Programming #37:

Duplicate effort is inevitable. Live with it.

On Sat, 31 Mar 2007 07:46:49 +0900, Phillip G. wrote:

I meant under windows and/or Linux environments.

And source code to machine code.

I had done a search of this group but I guess didn’t download enough
headers to see the previous threads.

I see there are various forms of compilers that work at different
levels of code, e.g. XRuby to Java Bytecode, etc.

On 3/30/07, n/a [email protected] wrote:

I meant under windows and/or Linux environments.

And source code to machine code.

I had done a search of this group but I guess didn’t download enough
headers to see the previous threads.

I see there are various forms of compilers that work at different levels of
code, e.g. XRuby to Java Bytecode, etc.

Yes there are a variety of approaches to compiling Ruby to either Java
bytecodes (e.g. XRuby) or bytecodes more specifically tuned to Ruby
semantics (e.g. YARV which is now in Ruby 1.9). I think most people
these days think that bytecode == Java bytecode, but the idea preceded
Java.

As of today YARV seems to be the best performing, at least according
to the benchmarks I’ve seen.
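(For readers on 1.9: YARV's bytecode can be inspected from Ruby itself
via MRI's RubyVM::InstructionSequence. A small illustration, MRI-specific:)

```ruby
# Compile a snippet to YARV bytecode and print the instruction listing.
# RubyVM::InstructionSequence is specific to MRI (Ruby 1.9 and later).
iseq = RubyVM::InstructionSequence.compile("a = 1 + 2")
puts iseq.disasm # human-readable listing of YARV instructions
```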

As for compiling directly to machine code, it could be done I suppose,
but it’s not clear that it would be the best approach. Why?

  • The dynamic nature of Ruby means that methods can be dynamically
    created at run-time and would therefore need to be compiled at
    run-time. Additional bookkeeping would be required to make sure all
    the semantic effects on the compiled code are properly implemented.

  • Previous experience with compiling dynamic OO languages has shown
    that the much smaller code representation of byte codes compared to
    machine code can actually lead to better performance on machines with
    virtual memory (almost all machines these days) due to the smaller
    working set. Digitalk tried direct compilation of Smalltalk to
    machine code, because they were sick of getting blasted for being
    ‘interpreted’ and found that the byte coded stuff ran significantly
    faster. The practice these days is to do two-stage compilation, first
    to byte-codes, and then to machine code for selected code when the
    run-time detects that that code is frequently executed.
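The first point can be made concrete in a few lines of Ruby (the class
and attribute names here are invented for illustration): methods
manufactured at run time from data have no source text for an
ahead-of-time compiler to work from.

```ruby
# Methods built at run time from a list of names: an ahead-of-time
# compiler never sees source text for #x, #y, #x= or #y=.
class Point
  [:x, :y].each do |name|
    define_method(name) { instance_variable_get("@#{name}") }
    define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
  end
end

pt = Point.new
pt.x = 3
pt.y = 4
puts pt.x + pt.y # => 7
```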


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

On 3/30/07, n/a [email protected] wrote:

On Sat, 31 Mar 2007 07:46:49 +0900, Phillip G. wrote:

n/a wrote:

Hi,

Complete newbie here; is there anything in the works as far as a compiler
for Ruby?

No problem. I guess your question was very nicely answered by a nice
and competent member; I do not agree with Phil’s welcome message.

Welcome to the group
Robert

On Sun, 01 Apr 2007 01:12:09 +0900, Robert D. wrote:

competent member I do not agree with Phil’s welcome message.

Welcome to the group
Robert

Robert,
It’s not always easy being brand new to programming AND to Ruby, so
a bit of a friendly attitude such as yours goes a long way, IMHO. And it
is very much appreciated.

Rick DeNatale wrote:

Yes there are a variety of approaches to compiling Ruby to either Java
bytecodes (e.g. XRuby) or bytecodes more specifically tuned to Ruby
semantics (e.g. YARV which is now in Ruby 1.9). I think most people
these days think that bytecode == Java bytecode, but the idea preceded
Java.

I first encountered the idea of a “virtual machine” for reasons of
portability in the early 1960s. However, the idea probably predates that
and goes back to the very early days of computer languages.

The practice these days is to do two-stage compilation, first to
byte-codes, and then to machine code for selected code when the
run-time detects that that code is frequently executed.

The prototype for a lot of this is (most implementations of) Forth.
There is an “inner interpreter”, which was originally indirect threaded
for portability. However, it can be direct threaded, which is faster,
subroutine threaded, which is still faster, or “token” threaded, which
is the most compact. This last corresponds most closely to what we think
of as “byte code”.
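To make the threading styles concrete, here is a toy token-threaded
inner interpreter sketched in Ruby (the opcode set is invented for
illustration): each token in the code stream indexes a table of
primitive operations, which is what makes this the most compact style.

```ruby
# A toy token-threaded inner interpreter for a stack machine.
# Each token is an index into a table of primitives; PUSH takes its
# operand from the following code slot. Opcode numbers are made up.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3

def run(code)
  stack = []
  pc = 0
  table = {
    PUSH => -> { stack.push(code[pc]); pc += 1 }, # inline operand
    ADD  => -> { stack.push(stack.pop + stack.pop) },
    MUL  => -> { stack.push(stack.pop * stack.pop) },
  }
  while (op = code[pc]) != HALT
    pc += 1
    table[op].call # "token threading": dispatch through the table
  end
  stack.pop
end

# (2 + 3) * 4
puts run([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]) # => 20
```

A direct-threaded version would store the lambdas themselves in the
code stream instead of the small integer tokens; faster to dispatch,
but each slot then costs a full pointer rather than a byte.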

Yes, compactness of code is indeed a virtue on “modern machines”,
although I suspect it’s more an issue of caching than virtual memory. By
the way, in “reality”, I don’t think Ruby is any more “dynamic” than
languages we normally think of as “static”. Almost any decent-sized
program or collection of programs is going to have things that are bound
early and things that aren’t bound till run time, regardless of what
languages the implementors used.


M. Edward (Ed) Borasky, FBG, AB, PTA, PGS, MS, MNLP, NST, ACMC(P)
http://borasky-research.blogspot.com/

If God had meant for carrots to be eaten cooked, He would have given
rabbits fire.

On 3/31/07, n/a [email protected] wrote:

very much appreciated.

Oh, do not mention it; if it had not been me, somebody else would have
said the same. It is the group, really… glad you feel welcome here.

Cheers
Robert

On Sat, 31 Mar 2007 23:42:31 +0900, Rick DeNatale wrote:

machine code can actually lead to better performance on machines with
virtual memory (almost all machines these days) due to the smaller
working set. Digitalk tried direct compilation of Smalltalk to
machine code, because they were sick of getting blasted for being
‘interpreted’ and found that the byte coded stuff ran significantly
faster. The practice these days is to do two-stage compilation, first
to byte-codes, and then to machine code for selected code when the
run-time detects that that code is frequently executed.

Rick,
Thanks for the informative reply. Very helpful.

Rick DeNatale wrote:

  • Previous experience with compiling dynamic OO languages has shown
    that the much smaller code representation of byte codes compared to
    machine code can actually lead to better performance on machines with
    virtual memory (almost all machines these days) due to the smaller
    working set.

Absolutely correct… but there’s even more to this
than meets the eye. The working set that matters most
is the cache, not the RAM. When you have an instruction
cycle time that’s more than 50x the RAM cycle time, you
can do a lot of work on something that’s already in the
cache while you’re waiting for the next cache line to
fill.

Reducing the working set on boxes with GB of RAM typically
has more effect through decreasing cache spills than via
reductions in page faults. The byte-codes also go in
d-cache while the interpreter itself is in I-cache.

Clifford H…

Clifford H. wrote:

Reducing the working set on boxes with GB of RAM typically
has more effect through decreasing cache spills than via
reductions in page faults. The byte-codes also go in
d-cache while the interpreter itself is in I-cache.

You need a very carefully designed inner interpreter for this to be
useful. See
http://dec.bournemouth.ac.uk/forth/euro/ef03/ertl-gregg03.pdf and
http://dec.bournemouth.ac.uk/forth/euro/ef02/ertl02.pdf for some
interesting ways this can be done with the inner interpreter still in C
(although it does exploit some features of GCC that not all C compilers
know about).


M. Edward (Ed) Borasky, FBG, AB, PTA, PGS, MS, MNLP, NST, ACMC(P)
http://borasky-research.blogspot.com/


M. Edward (Ed) Borasky wrote:

The byte-codes also go in
d-cache while the interpreter itself is in I-cache.
You need a very carefully designed inner interpreter for this to be
useful.

Good stuff, Ed, but not really what I meant.
They’re modifying direct-threaded code to
aggregate common sequences of functions AIUI,
where I wasn’t really talking about threaded
code at all, but byte-code. I’ve used aggressive
inlining to build an interpreter with nearly all
the primitives in one function, leaving normal
C register variables available as registers, and
found that worked quite well (for emulating a
small microprocessor on a 386, rather than for
byte code).

The interesting thing is what a good
compiler can do with such a large function if
it’s built this way. You can avoid most call
overhead and have a compact switch table if you
have a well-designed byte-code. Even if the byte
code is highly dense, so that each code needs to
be looked at several times to be executed, that
isn’t a problem once it’s in cache, as the very
next thing you’re often going to do is to fetch
more data or byte-code, and you’ll have to wait
for that - so using some of those CPU cycles
decoding the byte-code doesn’t hurt much.

Clifford H…