What’s “awesome” is that not a soul on this thread has asked you a single
clarifying question…
WHY do you want one and WHAT do you want to do with it?
Perhaps the most straightforward answer for wanting a HipHop- or
Starkiller-like compiler is FOR SPEED! For example, you’re a company on
the scale of Facebook that has written a large web site in a dynamic
language and suddenly realize you could be saving a lot of money on
servers if your app ran a lot faster. Source translation to C++ is a
potential way to go.
I don’t think that applies to anyone who uses Ruby, though. Maybe
Twitter…
OK, I have been lurking but will respond now: I have an actual existing
application (C/C++) for population genetics simulations which I would dearly
love to convert to Ruby. It was originally all in C and, as a learning
exercise, I converted parts of it to C++, but it was such a pain… it
would have been so pleasant to rewrite in Ruby. However, I would have to
get the resulting code converted back to C, or compiled somehow, to get the
performance back to something usable.
Couldn’t you just profile and write the bottlenecks in C? Also, take a
look at OCaml for ruby-like expressiveness with
reasonably-close-to-C++ performance.
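Before reaching for C, it’s worth confirming where the time actually goes. Here is a minimal sketch of the “profile first” advice using Ruby’s stdlib Benchmark; the two methods are made-up stand-ins, one for a simulation’s hot inner loop and one for cheap bookkeeping:

```ruby
require 'benchmark'

# Hypothetical stand-ins: measure before deciding what to port to C.
def mutate_genomes(pop)
  pop.map { |g| g.chars.shuffle.join }   # deliberately allocation-heavy
end

def log_generation(pop)
  pop.size                               # trivially cheap
end

population = Array.new(500) { "ACGT" * 25 }

Benchmark.bm(16) do |x|
  x.report("mutate_genomes:") { 20.times { mutate_genomes(population) } }
  x.report("log_generation:") { 20.times { log_generation(population) } }
end
# Only whatever dominates this report is worth rewriting as a C extension.
```

If one method accounts for nearly all the time, porting just that method gets most of the win while the rest of the program stays in Ruby.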
So it appears that there is still no libre software ready for prime time in
terms of being able to write code nicely in Ruby and have some sort of
converted runtime with near-C performance…
I believe projects like ruby2c and zenobfuscate wouldn’t get you
near-C performance, since they’d still be doing a lot of what the ruby
runtime is doing under the hood.
… FOR SPEED! For example, you’re a company on the scale of Facebook
that has written a large web site in a dynamic language and suddenly
realize you could be saving a lot of money on servers if your app ran a
lot faster. Source translation to C++ is a potential way to go.
And in all the world, there are how many companies that have a
web site that large (in traffic)? Maybe not even one other.
And they could afford to rewrite the site in whatever language
they chose, instead of relying on a dodgy translator to give
them an unreadable result.
I can’t believe how many people use the old “but what if we
become as large as Facebook?” argument to drive their technology
choices. Like the number of places using NoSQL databases to produce
sites that are even less reliable than they used to be…
And they could afford to rewrite the site in whatever language
they chose, instead of relying on a dodgy translator to give
them an unreadable result.
I can’t believe how many people use the old “but what if we
become as large as Facebook?” argument to drive their technology
choices. Like the number of places using NoSQL databases to produce
sites that are even less reliable than they used to be…
The lack of reliability is not caused by using a less traditional
database but by programming errors, either in the site itself or in the
tools it uses.
Some databases not associated with NoSQL, such as MySQL, tend to be very
unreliable.
Using new and experimental technology is not always a good idea, but
using old and known-broken technology never is, and many people do
that anyway.
… FOR SPEED! For example, you’re a company on the scale of Facebook
that has written a large web site in a dynamic language and suddenly
realize you could be saving a lot of money on servers if your app ran a
lot faster. Source translation to C++ is a potential way to go.
I thought I’d throw in my two cents since I’ve actually done a little bit
of profiling.
There are two main reasons for “compilation”:
1. Faster execution for code that is repeated many times
2. Better foreign function interface
Other parts of modern compilers which are separate from compilation
include:
- Syntax correctness
- Type checking / program validity
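The second point is already partly available in plain Ruby: the stdlib’s Fiddle can call into an already-loaded C library without writing an extension. A small sketch, assuming a POSIX-ish system where the usual libc is linked into the running process:

```ruby
require 'fiddle'

# Call the C library's strlen(3) through Fiddle. Handle::DEFAULT looks up
# symbols already loaded into the current process, so this assumes a
# POSIX-ish system with the standard C library linked in.
strlen = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['strlen'],
  [Fiddle::TYPE_VOIDP],    # const char *
  Fiddle::TYPE_SIZE_T
)

puts strlen.call("population genetics")   # => 19
```

This doesn’t make Ruby itself faster, but it shows the FFI side of the argument: the boundary to compiled code is already crossable from the interpreter.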
From my experience, a fast interpreter is better than a compiler in many
cases, unless you are optimising an inner loop for code which is pushing
the CPU. I found this out from writing an interpreter and compiler and
doing the tests for myself. I was surprised by the result.
Many problems are algorithmic in nature, and a compiler will only
provide a marginal improvement in speed.
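That point deserves a concrete illustration (the methods below are hypothetical): switching to a better algorithm usually dwarfs the constant-factor gains a compiler offers. Both methods answer “does any pair in the array sum to the target?”:

```ruby
require 'benchmark'

# O(n^2): the kind of cost a compiler only shaves by a constant factor.
def pair_sum_quadratic(arr, target)
  arr.each_with_index do |a, i|
    arr.each_with_index do |b, j|
      return true if i != j && a + b == target
    end
  end
  false
end

# O(n): an algorithmic improvement no constant factor can match.
def pair_sum_linear(arr, target)
  seen = {}
  arr.each do |a|
    return true if seen[target - a]
    seen[a] = true
  end
  false
end

data = (1..1_500).to_a.shuffle
Benchmark.bm(10) do |x|
  x.report("O(n^2):") { pair_sum_quadratic(data, -1) }  # worst case: no pair
  x.report("O(n):")   { pair_sum_linear(data, -1) }
end
```

A compiler might make the quadratic version a few times faster; the linear version makes the comparison irrelevant as the data grows.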
If you have a truly dynamic language (like Ruby), it is almost
impossible to compile it adequately. This is because compilation is all
about making assumptions. A dynamic language makes it very hard to make
assumptions (there is quite a bit of research in this area, it is worth
reading about it).
From testing code that is either interpreted or compiled, I found that
you had to run the code at least 10 times before you saw any kind of
parity. For an inner loop on some complex function, this could be
beneficial.
Every specific situation is different, and this is my experience.
Thanks for your comments. In this discussion there are many opinions, so
please keep in mind this is simply my perspective based on my
experience.
On 5/10/2010, at 11:01 PM, Ryan D. wrote:
On Oct 5, 2010, at 02:49 , Samuel W. wrote:
If you have a truly dynamic language (like Ruby), it is almost impossible to compile it adequately. This is because compilation is all about making assumptions. A dynamic language makes it very hard to make assumptions (there is quite a bit of research in this area, it is worth reading about it).
Well this just isn’t true (or is overly vague and my tired brain is reading more into it than it should). Look at anything written by David Ungar, or the research done on self, smalltalk, the latest javascript engines, etc…
I’m aware of most of this work; however, I don’t consider many of these
languages to be completely dynamic. By dynamic, I mean that it is not
possible to make an assumption about the result of an expression unless
it is executed. If you can make an assumption about an expression, I
don’t consider it to be dynamic.
For example, in most of those languages, the name of a function is
specified explicitly and can’t change due to the environment or scope of
execution. We also know that all arguments to a function will be
evaluated in the current scope. We can do some analysis and determine
that an expression won’t change in a loop, and then optimise for this
case. Many of these languages provide some semantic models which allow
the interpreter to make assumptions.
A good indication of a non-dynamic programming language is the presence
of semantically meaningful keywords, especially those that have fixed
behaviour. Examples include “def”, “if”, “while”, “switch” and “try”.
These expressions can all be analysed with the knowledge of a given
semantic model. A truly dynamic language has no such luxury…
In the case of a dynamic language we are reduced to statistical analysis
at run time. Compilation becomes a method of speeding up the interpreter
execution rather than optimising based on assumptions in the code
itself. Few, if any, programming languages are completely dynamic.
Scheme would be one language that I would consider very dynamic, as an
example.
An interpreter is simply a high level processor (i.e. CPU). However,
there are intrinsic semantic structures which cannot be lowered.
“Sufficiently smart compilers”, and all that. Programming languages
range from completely dynamic to completely static, depending on the
semantic and execution model.
Well this just isn’t true (or is overly vague and my tired brain is reading more into it than it should). Look at anything written by David Ungar, or the research done on self, smalltalk, the latest javascript engines, etc…
I’m aware of most of this work, however I don’t consider many of these languages to be completely dynamic. By dynamic, I mean that it is not possible to make an assumption about the result of an expression unless it is executed. If you can make an assumption about an expression, I don’t consider it to be dynamic.
What assumptions can you make about the result of an expression in any
of these languages without running it?
The result of a method send in Smalltalk, Ruby, or Self depends on the
runtime state of the receiver of the message, and in the case of Ruby
and Self, where compile time is the same as run time, methods can
change as the program runs.
In Ruby this evolution happens as the program requires new code, mixes
in new modules, defines singleton methods, redefines methods, uses
meta-programming techniques, such as the alias_method_chain found in
Rails …
In Self static analysis of a method can’t even determine if the
reference to an ‘instance variable’ is really just a value reference
or a method call.
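The Ruby points above fit in a few lines (the class name here is invented for illustration): the method that answers a given send can change while the program runs, either for a whole class or for a single object.

```ruby
# Why the result of a send can't be pinned down statically in Ruby.
class Greeter
  def greet
    "hello"
  end
end

g = Greeter.new
before = g.greet            # => "hello"

# Reopen the class and redefine the method at run time...
class Greeter
  def greet
    "bonjour"
  end
end
after_redefine = g.greet    # same receiver, new behaviour => "bonjour"

# ...or give just this one object a singleton method.
def g.greet
  "g'day"
end
after_singleton = g.greet   # => "g'day"

# A static analysis of the original class body sees none of this.
```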
For example, in most of those languages, the name of a function is specified explicitly and can’t change due to the environment or scope of execution. We also know that all arguments to a function will be evaluated in the current scope. We can do some analysis and determine that an expression won’t change in a loop, and then optimise for this case.
No, I don’t think we can. Not for Ruby, nor Smalltalk, nor Self, nor,
as Shyouhei-san has pointed out, for JavaScript.
Smalltalk is a bit more static than Ruby, Self, or JavaScript, in
that, although run-time and development take place in the same
environment, code changes happen mostly when a programmer changes a
method/class definition in the IDE, which causes an incremental
compilation of the affected methods. Smalltalk classes are
‘statically’ defined in this sense.
Many of these languages provide some semantic models which allow the interpreter to make assumptions.
A good indication of a non-dynamic programming language is the presence of semantically meaningful keywords, especially those that have fixed behaviour. Examples include “def”, “if”, “while”, “switch” and “try”. These expressions can all be analysed with the knowledge of a given semantic model. A truly dynamic language has no such luxury…
Smalltalk at the language level has no such keywords, control flow is
defined in terms of method sends.
For example, if is implemented by the ifTrue:, ifFalse:, and
ifTrue:ifFalse: messages, and the boolean classes define these methods
to evaluate one (or none) of the block arguments.
One could model this in Ruby with something like:

class Object
  # define the if methods for truthy values (everything but nil and false)
  def if_true(eval_if_true)
    eval_if_true.call
  end

  def if_true_else(eval_if_true, eval_if_false)
    eval_if_true.call
  end

  def if_false(eval_if_false)
    nil
  end
end

module FalsyIfMethods
  # define the if methods for falsy values
  def if_true(eval_if_true)
    nil
  end

  def if_true_else(eval_if_true, eval_if_false)
    eval_if_false.call
  end

  def if_false(eval_if_false)
    eval_if_false.call
  end
end

# mix the falsy behaviour into the two falsy classes
class NilClass; include FalsyIfMethods; end
class FalseClass; include FalsyIfMethods; end

# usage: (2 > 1).if_true_else(-> { "yes" }, -> { "no" })  # => "yes"
Now most Smalltalk implementations do cheat on things like this and
treat methods like ifTrue: and its brethren as special cases, compiling
them to test-and-branch bytecodes, with or without an escape
if the receiver turns out not to be a boolean. But Self came about
primarily because Dave Ungar, whose PhD dissertation was on Smalltalk
performance, wanted to explore doing away with such cheats, with
static determination of whether something was an instance variable or a
method, and with relying on the ‘static’ class definitions in Smalltalk,
to see if dynamic runtime techniques could achieve equivalent if not
better performance.
And that work led to things like the JIT implementations in the Java
HotSpot VM, and Self appears to have been a strong influence on
JavaScript, which uses the same kind of prototype technique for
implementation sharing, rather than a class hierarchy.
In the case of a dynamic language we are reduced to statistical analysis at run time. Compilation becomes a method of speeding up the interpreter execution rather than optimising based on assumptions in the code itself. Few, if any, programming languages are completely dynamic. Scheme would be one language that I would consider very dynamic, as an example.
Which you later correct to say ‘Sorry, I meant to say “Scheme would be
one language that I would consider very close to being completely
dynamic”.’
Which I interpret to mean that you rank Scheme as even more
dynamic than Smalltalk, Ruby, Self or JavaScript.
An interpreter is simply a high level processor (i.e. CPU). However, there are intrinsic semantic structures which cannot be lowered. “Sufficiently smart compilers”, and all that. Programming languages range from completely dynamic to completely static, depending on the semantic and execution model.
I’m confused by your argument at this point, because there are quite a
few compilers for Scheme
(http://en.wikipedia.org/wiki/Category:Scheme_compilers); some of these
compile to an intermediate ‘language’ like C or JVM bytecodes, others
directly to some machine language.
And Guy Steele, one of the inventors of Scheme, wrote his PhD
dissertation on a Scheme compiler, “Rabbit”.
So it’s not impossible to write a compiler for a dynamic language, but
it takes more thought and techniques than most introductory compiler
texts/courses teach.
And Scheme was a seminal influence on those who have tackled the task,
particularly in the form of the “Lambda Papers”, a series of M.I.T.
A.I. Lab memos by Guy Steele and Gerald Sussman as they explored usage
and implementation around various issues such as the lambda calculus,
continuations and a few other things, leading up to and including
Steele’s dissertation on Rabbit.
And Gerald Sussman went on to write, with Julie Sussman and Harold
Abelson, “The Structure and Interpretation of Computer Programs” which
should be an eye-opener to anyone who has thought of C as the
prototypical programming language.
- Static type-inferencing analyzers/compilers wouldn’t help much in a
language where you can do something like monkey patching and modify
the behaviour of classes at runtime: not many things remain static
(for too long) in a Ruby program.
- Static type-inferencing analyzers/compilers could work if they were
whole-program compilers, but for a typical Ruby program this could
mean a few hours or days of compilation time and memory swapping.
- Static type-inferencing analyzers/compilers could work best for
Ruby if they were JITs, but by their nature still wouldn’t get you
anywhere near the performance of a static language.
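The monkey-patching point can be illustrated with a small, hypothetical sketch: a “known” method’s return type can be changed from anywhere, at any time, by reopening the class.

```ruby
# Why whole-program type inference struggles in Ruby: nothing stops a
# later piece of code from changing an already-inferred signature.

class Integer
  def double
    self * 2            # an inferencer might conclude: Integer -> Integer
  end
end

x = 21.double           # 42, an Integer... for now

# Somewhere else (another file, a gem, an eval), the class is reopened:
class Integer
  def double
    (self * 2).to_s     # now Integer -> String
  end
end

y = 21.double           # "42", a String
```

Any type the analyzer assigned to `double` before the second definition loaded is simply wrong afterwards, which is why such analysis ends up needing the whole program, or a JIT that can deoptimize.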
And Gerald Sussman went on to write, with Julie Sussman and Harold
Abelson, “The Structure and Interpretation of Computer Programs” which
should be an eye-opener to anyone who has thought of C as the
prototypical programming language.
Thanks to MIT’s open course ware program, 21 hours of video of Sussman
and Abelson teaching a course of the same name are available here:
What level of programmer is this stuff aimed at? I want a Ruby compiler
(see previous message) but am interested generally and would like to
know more, though I am a biologist and not really a serious low-level
coder…
What level of programmer is this stuff aimed at? I want a Ruby
compiler (see previous message) but am interested generally and would
like to know more, though I am a biologist and not really a serious
low-level coder…
Thanks,
Phil.
The book was the text for the introductory programming course at MIT.
The preface, describing the teaching of the material over several years
to 600-700 students, says:
Most of these students have had little or no prior formal training in
computation, although many have played with computers a bit and a few
have had extensive programming or hardware-design experience.
MIT, of course, is not a trade school and this book is not really about
the Scheme language. It’s about computation and programming itself.
Perhaps it will help to think of it as being like an introduction to
cell biology as opposed to a cookbook for a sushi chef. (Ignore the
metaphor if it doesn’t resonate.)
You can download the book at the cost of a little bandwidth. I suggest
you do so and read the foreword, preface, and table of contents, then
skim the first chapter and flip through the rest of the book. Note that
it ends with a compiler and a garbage collector. Then you can make an
informed decision about how much time to devote to it.
As for a Ruby compiler, a useful one is very unlikely for many reasons.
Now a compiler for a different language that has a distant family
resemblance to Ruby may be possible but likely of not much practical
use.
As for a Ruby compiler, a useful one is very unlikely for many reasons. Now a compiler for a different language that has a distant family resemblance to Ruby may be possible but likely of not much practical use.
Lisp, Scheme, Forth and Smalltalk are all compilable, so in principle
Ruby should be as well. It’s just not clear that there’s any real gain
in doing so outside of very particular applications (i.e. those
dominated by maths or by interpretation bottlenecks). Moreover,
multicore is teasing us towards a world where all processing will be a
negligible cost compared to I/O latencies and throughput, much as was
the case fifty years ago.