How can we make a Ruby compiler?

On Fri, Oct 1, 2010 at 7:52 PM, Ryan D. [email protected]
wrote:

What’s “awesome” is that not a soul on this thread has asked you a single
clarifying question…

WHY do you want one and WHAT do you want to do with it?

Perhaps the most straightforward answer for wanting a HipHop or
Starkiller-like compiler is FOR SPEED! For example, you’re a company on
the scale of Facebook who has written a large web site in a dynamic
language and suddenly realize you could be saving a lot of money on
servers if your app ran a lot faster. Source translation to C++ is a
potential way to go.

I don’t think that applies to anyone who uses Ruby, though. Maybe
Twitter…

On Oct 4, 2010, at 17:02 , Tony A. wrote:

On Fri, Oct 1, 2010 at 7:52 PM, Ryan D. [email protected] wrote:

What’s “awesome” is that not a soul on this thread has asked you a single
clarifying question…

WHY do you want one and WHAT do you want to do with it?

Perhaps [… lots of speculation …]

My point was that none of your speculation matters in the absence of
clarifying questions to the OP.

On Sun, Oct 3, 2010 at 1:43 PM, Philip R. [email protected]
wrote:

OK, I have been lurking but will respond now: I have an actual existing
application (C/C++) for population genetics simulations which I would dearly
love to convert to Ruby. It was originally all in C and, as a learning
exercise, I converted parts of it to C++ - but it was such a pain… it
would have been so pleasant to re-write in Ruby. However, I would have to
get the resulting code converted back to C or compiled somehow to get the
performance back to something usable.

Couldn’t you just profile and write the bottlenecks in C? Also, take a
look at OCaml for Ruby-like expressiveness with
reasonably-close-to-C++ performance.
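
For what it’s worth, the measuring step needn’t involve C at all. Here is a
rough sketch using only the standard library’s Benchmark module; the
simulate_generation method is just a made-up stand-in for whatever your hot
loop actually is, so treat the shape of it, not the details, as the point.

require 'benchmark'

# Hypothetical stand-in for the expensive part of the simulation --
# swap in the real inner loop before trusting any numbers.
def simulate_generation(population)
  population.map { |allele_freq| allele_freq * rand }
end

population = Array.new(100_000) { rand }

Benchmark.bm(18) do |x|
  x.report("1 generation:")    { simulate_generation(population) }
  x.report("100 generations:") { 100.times { simulate_generation(population) } }
end

If one report dominates, that is the piece worth rewriting as a C extension;
the rest can stay in Ruby.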

So it appears that there is still no libre software ready for prime time in
terms of being able to write code nicely in Ruby and have some sort of
converted run time that has near-C performance…

I believe projects like ruby2c and zenobfuscate wouldn’t get you
near-C performance, since they’d still be doing a lot of what the Ruby
runtime is doing under the hood.

martin

On Mon, Oct 4, 2010 at 6:20 PM, Ryan D. [email protected]
wrote:

My point was that none of your speculation matters in the absence of
clarifying questions to the OP.

Given the OP hasn’t responded to any of the discussion, the world may
never know…

Tony A. wrote:

… FOR SPEED! For example, you’re a company on the
scale of Facebook who has written a large web site in a dynamic language and
suddenly realize you could be saving a lot of money on servers if your app
ran a lot faster. Source translation to C++ is a potential way to go

And in all the world, there are how many companies that have a
web site that large (in traffic)? Maybe not even one other.
And they could afford to rewrite the site in whatever language
they chose, instead of relying on a dodgy translator to give
them an unreadable result.

I can’t believe how many people use the old “but what if we
become as large as Facebook?” argument to drive their technology
choices. Like the number of places using NoSQL databases to produce
sites that are even less reliable than they used to be…

On 5 October 2010 07:10, Clifford H. [email protected] wrote:

And they could afford to rewrite the site in whatever language
they chose, instead of relying on a dodgy translator to give
them an unreadable result.

I can’t believe how many people use the old “but what if we
become as large as Facebook?” argument to drive their technology
choices. Like the number of places using NoSQL databases to produce
sites that are even less reliable than they used to be…

The lack of reliability is not caused by using a less traditional
database but by programming errors, either in the site itself or in the
tools it uses.

Some databases not associated with NoSQL, such as MySQL, tend to be very
unreliable.

Using new and experimental technology is not always a good idea, but
using old and known-broken technology never is, and many people do
that anyway.

Thanks

Michal

On 5/10/2010, at 9:42 PM, Michal S. wrote:

On 5 October 2010 07:10, Clifford H. [email protected] wrote:

Tony A. wrote:

… FOR SPEED! For example, you’re a company on the
scale of Facebook who has written a large web site in a dynamic
language and suddenly realize you could be saving a lot of money on
servers if your app ran a lot faster. Source translation to C++ is a
potential way to go

I thought I’d throw in my 2 cents since I’ve actually done a little bit
of profiling.

There are two main reasons for “compilation”:

  • Faster execution for code that is repeated many times
  • Better foreign function interface

Other parts of modern compilers which are separate from compilation
include:

  • Syntax correctness
  • Type checking / program validity

From my experience, a fast interpreter is better than a compiler in many
cases, unless you are optimising an inner loop for code which is pushing
the CPU. I found this out from writing an interpreter and a compiler and
doing the tests for myself. I was surprised by the result.

Many problems are algorithmic in nature, and a compiler will only
provide a marginal improvement in speed.
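
A quick Ruby illustration of that point (the data sizes here are picked
arbitrarily): switching from a linear scan to a Hash lookup changes the
complexity class, which dwarfs anything a compiler could do to the scan
itself.

require 'benchmark'

items  = (1..50_000).to_a
lookup = items.each_with_object({}) { |n, h| h[n] = true }  # same data, indexed
wanted = Array.new(1_000) { rand(1..50_000) }

Benchmark.bm(14) do |x|
  x.report("Array#include?") { wanted.each { |n| items.include?(n) } }  # O(n) scan
  x.report("Hash#key?")      { wanted.each { |n| lookup.key?(n) } }     # O(1) lookup
end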

If you have a truly dynamic language (like Ruby), it is almost
impossible to compile it adequately. This is because compilation is all
about making assumptions. A dynamic language makes it very hard to make
assumptions (there is quite a bit of research in this area, it is worth
reading about it).
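
As a concrete (if contrived) sketch of what I mean, where the Tally class and
its method are invented purely for illustration: nothing stops a Ruby program
from replacing a method after call sites that depend on it have already run,
so there is very little a compiler could have assumed about those calls.

class Tally
  def bump(n)
    n + 1
  end
end

t = Tally.new
puts t.bump(1)   # => 2

# Later -- perhaps from a plugin loaded at run time -- the method is
# replaced, and the "same" expression now means something else entirely.
class Tally
  def bump(n)
    n + 100
  end
end

puts t.bump(1)   # => 101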

From testing code that is either interpreted or compiled, I found that
you had to run the code at least 10 times before you saw any kind of
parity. For an inner loop on some complex function, this could be
beneficial.

Every specific situation is different, and this is my experience.

Kind regards,
Samuel

Dear Ryan,

Thanks for your comments. In this discussion there are many opinions, so
please keep in mind this is simply my perspective based on my
experience.

On 5/10/2010, at 11:01 PM, Ryan D. wrote:

On Oct 5, 2010, at 02:49 , Samuel W. wrote:

If you have a truly dynamic language (like Ruby), it is almost impossible to compile it adequately. This is because compilation is all about making assumptions. A dynamic language makes it very hard to make assumptions (there is quite a bit of research in this area, it is worth reading about it).

Well this just isn’t true (or is overly vague and my tired brain is reading more into it than it should). Look at anything written by David Ungar, or the research done on self, smalltalk, the latest javascript engines, etc…

I’m aware of most of this work; however, I don’t consider many of these
languages to be completely dynamic. By dynamic, I mean that it is not
possible to make an assumption about the result of an expression unless
it is executed. If you can make an assumption about an expression, I
don’t consider it to be dynamic.

For example, in most of those languages, the name of a function is
specified explicitly and can’t change due to the environment or scope of
execution. We also know that all arguments to a function will be
evaluated in the current scope. We can do some analysis and determine
that an expression won’t change in a loop, and then optimise for this
case. Many of these languages provide some semantic models which allow
the interpreter to make assumptions.

A good indication of a non-dynamic programming language is the presence
of semantically meaningful keywords, especially those that have fixed
behaviour. Examples include “def”, “if”, “while”, “switch” and “try”.
These expressions can all be analysed with the knowledge of a given
semantic model. A truly dynamic language has no such luxury…

In the case of a dynamic language we are reduced to statistical analysis
at run time. Compilation becomes a method of speeding up the interpreter
execution rather than optimising based on assumptions in the code
itself. Few, if any, programming languages are completely dynamic.
Scheme would be one language that I would consider very dynamic, as an
example.

An interpreter is simply a high-level processor (i.e. a CPU). However,
there are intrinsic semantic structures which cannot be lowered.
“Sufficiently smart compilers”, and all that. Programming languages
range from completely dynamic to completely static, depending on the
semantic and execution model.

Kind regards,
Samuel

On Oct 5, 2010, at 02:49 , Samuel W. wrote:

If you have a truly dynamic language (like Ruby), it is almost impossible to compile it adequately. This is because compilation is all about making assumptions. A dynamic language makes it very hard to make assumptions (there is quite a bit of research in this area, it is worth reading about it).

Well this just isn’t true (or is overly vague and my tired brain is
reading more into it than it should). Look at anything written by David
Ungar, or the research done on self, smalltalk, the latest javascript
engines, etc…

On 6/10/2010, at 1:31 AM, Samuel W. wrote:

Scheme would be one language that I would consider very dynamic, as an example.

Sorry, I meant to say “Scheme would be one language that I would
consider very close to being completely dynamic”.

:P

(2010/10/05 21:31), Samuel W. wrote:

For example, in most of those languages, the name of a function is specified explicitly and can’t change due to the environment or scope of execution.

Oh yes they can. For instance:

rhino
Rhino 1.7 release 2 2010 01 20
js> foo = {
      foo: function() {
        this.foo = function () {
          return "bar";
        };
        return "foo";
      }
    };
[object Object]
js> foo.foo();
foo
js> foo.foo();
bar

On Tue, Oct 5, 2010 at 8:31 AM, Samuel W.
[email protected] wrote:

Well this just isn’t true (or is overly vague and my tired brain is reading more into it than it should). Look at anything written by David Ungar, or the research done on self, smalltalk, the latest javascript engines, etc…

I’m aware of most of this work, however I don’t consider many of these languages to be completely dynamic. By dynamic, I mean that it is not possible to make an assumption about the result of an expression unless it is executed. If you can make an assumption about an expression, I don’t consider it to be dynamic.

What assumptions can you make about the result of an expression in any
of these languages without running it?

The result of a method send in Smalltalk, Ruby, or Self depends on the
runtime state of the receiver of the message, and in the case of Ruby
and Self, where compile time is the same as run time, methods can
change as the program runs.

In Ruby this evolution happens as the program requires new code, mixes
in new modules, defines singleton methods, redefines methods, and uses
meta-programming techniques such as the alias_method_chain found in
Rails…
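
A quick sketch of that (the Polite module and the method names here are
made up on the spot): the very same message sent to two instances of the
same class can resolve to different code, or to none at all, depending
entirely on what has happened to each receiver at run time.

greeter = Object.new

def greeter.hello               # a singleton method on just this object
  "hello from a singleton method"
end

module Polite
  def hello
    "hello from a module mixed in at run time"
  end
end

other = Object.new
other.extend(Polite)            # lookup for `other' now includes Polite

p greeter.hello                 # => "hello from a singleton method"
p other.hello                   # => "hello from a module mixed in at run time"
p Object.new.respond_to?(:hello)  # => false -- same class, different answer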

In Self static analysis of a method can’t even determine if the
reference to an ‘instance variable’ is really just a value reference
or a method call.

For example, in most of those languages, the name of a function is specified explicitly and can’t change due to the environment or scope of execution. We also know that all arguments to a function will be evaluated in the current scope. We can do some analysis and determine that an expression won’t change in a loop, and then optimise for this case.

No, I don’t think we can. Not for Ruby, nor Smalltalk, nor Self, nor,
as Shyouhei-san has pointed out, for JavaScript.

Smalltalk is a bit more static than Ruby, Self, or JavaScript, in
that, although run-time and development take place in the same
environment, code changes happen mostly when a programmer changes a
method/class definition in the IDE, which causes an incremental
compilation of the affected methods. Smalltalk classes are
‘statically’ defined in this sense.

Many of these languages provide some semantic models which allow the interpreter to make assumptions.

A good indication of a non-dynamic programming language is the presence of semantically meaningful keywords, especially those that have fixed behaviour. Examples include “def”, “if”, “while”, “switch” and “try”. These expressions can all be analysed with the knowledge of a given semantic model. A truly dynamic language has no such luxury…

Smalltalk at the language level has no such keywords, control flow is
defined in terms of method sends.

For example, if is implemented by ifTrue:, ifFalse:, and
ifTrue:ifFalse: messages, and the boolean classes define these methods
to evaluate one (or none) of the block arguments.

One could model this in Ruby with something like:

class Object
  # define the if methods for truthy values

  def if_true(eval_if_true)
    eval_if_true.call
  end

  def if_true_else(eval_if_true, eval_if_false)
    eval_if_true.call
  end

  def if_false(eval_if_false)
    nil
  end
end

module FalsyIfMethods
  # define the if methods for falsy values

  def if_true(eval_if_true)
    nil
  end

  def if_true_else(eval_if_true, eval_if_false)
    eval_if_false.call
  end

  def if_false(eval_if_false)
    eval_if_false.call
  end
end

class NilClass
  include FalsyIfMethods
end

class FalseClass
  include FalsyIfMethods
end

is_truthy = lambda { "truthy" }
is_falsy  = lambda { "falsy" }

1.if_true(is_truthy)                        # => "truthy"
1.if_true_else(is_truthy, is_falsy)         # => "truthy"
1.if_false(is_falsy)                        # => nil

nil.if_true(is_truthy)                      # => nil
nil.if_true_else(is_truthy, is_falsy)       # => "falsy"
nil.if_false(is_falsy)                      # => "falsy"

true.if_true(is_truthy)                     # => "truthy"
true.if_true_else(is_truthy, is_falsy)      # => "truthy"
true.if_false(is_falsy)                     # => nil

(1 == 2).if_true_else(is_truthy, is_falsy)  # => "falsy"

Now most Smalltalk implementations do cheat on things like this and
treat methods like ifTrue: and its brethren as special cases,
compiling them to test-and-branch bytecodes, with or without an escape
if the receiver turns out not to be a boolean. But Self came about
primarily because Dave Ungar, whose PhD dissertation was on Smalltalk
performance, wanted to explore doing away with such cheats, with the
static determination of whether something was an instance variable or
a method, and with the reliance on the ‘static’ class definitions in
Smalltalk, to see if dynamic runtime techniques could achieve
equivalent if not better performance.

And that work led to things like the JIT implementations in the Java
HotSpot VM, and Self appears to have been a strong influence on
JavaScript, which uses the same kind of prototype technique for
implementation sharing, rather than a class hierarchy.

In the case of a dynamic language we are reduced to statistical analysis at run time. Compilation becomes a method of speeding up the interpreter execution rather than optimising based on assumptions in the code itself. Few, if any, programming languages are completely dynamic. Scheme would be one language that I would consider very dynamic, as an example.

Which you later correct to say ‘Sorry, I meant to say “Scheme would be
one language that I would consider very close to being completely
dynamic”.’

Which I interpret to mean that you put Scheme as being even more
dynamic than Smalltalk, Ruby, Self or JavaScript.

An interpreter is simply a high level processor (i.e. CPU). However, there are intrinsic semantic structures which cannot be lowered. “Sufficiently smart compilers”, and all that. Programming languages range from completely dynamic to completely static, depending on the semantic and execution model.

I’m confused by your argument at this point, because there are quite a
few compilers for Scheme
(http://en.wikipedia.org/wiki/Category:Scheme_compilers); some of these
compile to an intermediate ‘language’ like C or JVM bytecodes, others
directly to some machine language.

And Guy Steele, one of the inventors of Scheme, wrote his PhD
dissertation on a Scheme compiler, “Rabbit”.

So it’s not impossible to write a compiler for a dynamic language, but
it takes more thought and techniques than most introductory compiler
texts/courses teach.

And Scheme was a seminal influence on those who have tackled the task,
particularly in the form of the “Lambda Papers”, a series of M.I.T.
A.I. lab memos by Guy Steele and Gerald Sussman in which they explored
usage and implementation issues around the lambda calculus,
continuations, and a few other things, leading up to and including
Steele’s dissertation on Rabbit.

http://library.readscheme.org/page1.html

And Gerald Sussman went on to write, with Julie Sussman and Harold
Abelson, “The Structure and Interpretation of Computer Programs” which
should be an eye-opener to anyone who has thought of C as the
prototypical programming language.

Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Github: http://github.com/rubyredrick
Twitter: @RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

my 3 cents:

  1. Static type-inferencing analyzers/compilers wouldn’t help much in a
    language where you can do something like monkey patching and modify
    the behaviour of classes at runtime: not many things remain static
    (for too long) in a Ruby program (see the sketch after this list).
  2. Static type-inferencing analyzers/compilers could work if they were
    whole-program compilers, but for a typical Ruby program this could
    mean a few hours or days of compilation time and memory swapping.
  3. Static type-inferencing analyzers/compilers could work best for
    Ruby if they were JITs, but by the nature of the language they still
    wouldn’t get you anywhere near the performance of a static language.
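
A throwaway sketch of point 1 (the shout method is invented, and the patch
is deliberately silly): any return type an analyzer inferred ahead of time
can be invalidated by a core-class monkey patch applied while the program
runs.

def shout(s)
  s.upcase              # an inferencer would conclude this returns a String...
end

p shout("hi")           # => "HI"

# ...until some library loaded at run time reopens the core class:
class String
  def upcase
    :no_longer_a_string
  end
end

p shout("hi")           # => :no_longer_a_string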

Rick DeNatale wrote:

And Gerald Sussman went on to write, with Julie Sussman and Harold
Abelson, “The Structure and Interpretation of Computer Programs” which
should be an eye-opener to anyone who has thought of C as the
prototypical programming language.

Thanks to MIT’s OpenCourseWare program, 21 hours of video of Sussman
and Abelson teaching a course of the same name are available here:

http://www.youtube.com/view_play_list?p=E18841CABEA24090

Links to the book itself and other information are here:

http://en.wikipedia.org/wiki/Structure_and_Interpretation_of_Computer_Programs

– Bill

On Tue, Oct 5, 2010 at 10:15 AM, Philip R. [email protected]
wrote:

What level of programmer is this stuff aimed at?

SICP is actually aimed at freshmen at MIT. So I might be inclined to say
“beginners”…

On 5 Oct, 13:54, Tony A. [email protected] wrote:

On Tue, Oct 5, 2010 at 10:15 AM, Philip R. [email protected] wrote:

What level of programmer is this stuff aimed at?

SICP is actually aimed at freshmen at MIT. So I might be inclined to say
“beginners”…

Freshmen at MIT are higher level than your average joe. Plus, they
may be freshmen, but after going through SICP they come out
wizards… ;)

Have you ever taken the time or curiosity to read some of it?

Bill,

On 2010-10-06 02:59, William R. wrote:

Links to the book itself and other information are here:
Structure and Interpretation of Computer Programs - Wikipedia

What level of programmer is this stuff aimed at? I want a Ruby compiler
(see previous message) but am interested generally and would like to
know more, but I am a biologist and not really a serious low-level
coder…

Thanks,

Phil.

Philip R.

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: [email protected]

Philip R. wrote:

What level of programmer is this stuff aimed at? I want a Ruby
compiler (see previous message) but am interested generally and would
like to know more, but I am a biologist and not really a serious
low-level coder…

Thanks,

Phil.

The book was the text for the introductory programming course at MIT.
The preface, describing the teaching of the material over several years
to 600-700 students, says: “Most of these students have had little or
no prior formal training in computation, although many have played with
computers a bit and a few have had extensive programming or
hardware-design experience.”

MIT, of course, is not a trade school and this book is not really about
the Scheme language. It’s about computation and programming itself.
Perhaps it will help to think of it as being like an introduction to
cell biology as opposed to a cook book for a sushi chef. (Ignore the
metaphor if it doesn’t resonate.)

You can download the book at the cost of a little bandwidth. I suggest
you do so and read the foreword, preface, and table of contents, then
skim the first chapter and flip through the rest of the book. Note that
it ends with a compiler and garbage collector. Then you can make an
informed decision about how much time to devote to it.

As for a Ruby compiler, a useful one is very unlikely for many reasons.
Now a compiler for a different language that has a distant family
resemblance to Ruby may be possible but likely of not much practical
use.

– Bill

On Tue, Oct 5, 2010 at 11:20 AM, namekuseijin
[email protected] wrote:

Have you ever taken the time or curiosity to read some of it?

I’ve never read the wizard book but I’ve watched the Abelson/Sussman
lectures, which are very interesting.

On 5 Oct 2010, at 22:07, William R. wrote:

As for a Ruby compiler, a useful one is very unlikely for many reasons. Now a compiler for a different language that has a distant family resemblance to Ruby may be possible but likely of not much practical use.

Lisp, Scheme, Forth and Smalltalk are all compilable, so in principle
Ruby should be as well. It’s just not clear that there’s any real gain
in doing so outside of very particular applications (i.e. those
dominated by maths or interpretation performance bottlenecks).
Moreover, multicore is teasing us towards a world where all processing
will be a negligible cost compared to I/O latencies and throughput,
much as was the case fifty years ago.

Ellie

Eleanor McHugh
Games With Brains
http://feyeleanor.tel

raise ArgumentError unless @reality.responds_to? :reason