Fwd: Launching Ruby scripts and the future of MVM

I tossed this message off to the Ruby-core list about a month ago, and
sent a follow-up email today. The basic idea is that if there were a
Kernel#run_script method or similar, all Ruby apps that want to launch
external scripts could do so in a platform and
implementation-independent way. In future versions of Ruby and in
JRuby today, that could mean launching the additional script within
the current process, but even now it would provide a simpler way to
launch external scripts.

Does anyone have an opinion on this?

---------- Forwarded message ----------
From: Charles O Nutter [email protected]
Date: Jan 25, 2006 8:50 AM
Subject: Launching Ruby scripts and the future of MVM
To: [email protected]

Hello again from the JRuby project!

It has come to our attention that there may need to be a standard way
to tell Ruby to launch a given script in a new interpreter engine.
Currently, it appears that many different approaches are used, ranging
from launching a separate process to forking and eval’ing a given
script. With the possibility of Ruby running in a multi-vm scenario
not far off (already possible today with JRuby and perhaps possible
soon in YARV) I believe it would be beneficial to have a way of
telling Ruby to “run this script in a new interpreter” and allow the
underlying ruby implementation to decide whether to launch a new
process or not. A potential method might be Kernel#run_script.

The issue we have with JRuby is that certain applications, Rake for
one, tend to want to launch subscripts in new Ruby interpreters. While
this is straightforward and relatively low-cost in the C Ruby world,
it incurs a severe performance and memory penalty in the JRuby world.
Launching a new “JRuby process” incurs the added pain of starting up a
new JVM process, not a trivial bit of work. This currently works as
expected, but is very slow and resource-intensive.

Perhaps it would be ideal if applications could call something like
Kernel#run_script, allowing the underlying Ruby implementation to
decide how to run that script. In today’s 1.8 Ruby implementation,
that may simply mean running an external process, either by using
popen or system. In implementations like JRuby or YARV, the run_script
call could be handled by launching a new Ruby VM within the same
process, avoiding the process-startup penalty. It would allow us to
run some of the most complicated Rake scripts all in a single JVM
process with JRuby, utilizing our MVM capability very effectively.

What thoughts do you have? I know 1.8 is supposed to be pretty well
settled, but it sure would help us if this idea were implemented
sooner rather than later, so third-party apps could start using a
platform and implementation-independent mechanism for launching Ruby
scripts.

- Charlie

FYI, I also created an RCR for this, #328. Please post comments and
vote your mind when you have a chance. Thank you!

On Feb 23, 2006, at 5:05 PM, Charles O Nutter wrote:

> I tossed this message off to the Ruby-core list about a month ago, and
> sent a follow-up email today. The basic idea is that if there were a
> Kernel#run_script method or similar, all Ruby apps that want to launch
> external scripts could do so in a platform and
> implementation-independent way.

In what way is what you are proposing different from Kernel#system?

Gary W.

On Feb 23, 2006, at 5:42 PM, [email protected] wrote:

> In what way is what you are proposing different from Kernel#system?
>
> Gary W.

system(x)     # x is an arbitrary shell command
run_script(x) # x is guaranteed to be a script written in ruby

This means, for instance, that run_script could get away with not
forking a new process, but rather just starting a new ruby VM,
assuming that the ruby implementation had that capability.

In the OP's example, using system to run another ruby script is
going to cause a whole new JVM to be created, along with the overhead
of a new ruby interpreter. Apparently JRuby has the ability to have
multiple instances of ruby per process. By incorporating this method
into ruby, scripts that want to run other ruby scripts can run faster
than they do currently (and no slower). In the C implementation of
ruby, of course, run_script could easily be implemented in terms of
system, but the JRuby guys would be able to implement it in a more
performant manner for their situation. Likewise, YARV could
theoretically create a new instance of itself instead of a whole
new process.

On Feb 23, 2006, at 6:20 PM, Logan C. wrote:

>> In what way is what you are proposing different from Kernel#system?
>
> system(x) # x is arbitrary shell command
> run_script(x) # x is guaranteed to be a script written in ruby
>
> This means for instance, that run_script could get away with not
> forking a new process, but rather just a new ruby VM assuming that
> the ruby implementation had that capability

You already have coroutines, threads, fork/exec, system, and
load/require, all of which give you different ways to manage multiple
threads of control and/or interpret external ruby code.

If you had two (or more) Ruby ‘contexts’ in a single process you would
still have to get memory management and IO to work correctly, and you
would have two (or more) top-level objects, and invariably you are
going to want to communicate between the two contexts, which means
creating some sort of inter-context communication system, which would
have to play nice with the OS, other ruby threads and so on. Then you
would have the problem of which classes are defined in which context.
If it is the same, then why multiple contexts? If it is different,
then your inter-context communication/data sharing just became a lot
more complex.

It all sounds like a lot of work, much of which would end up still not
providing the features you already have with fork/exec (for example),
and I’m still not sure what problem is being addressed that can’t be
solved with the existing toolset, which already has quite a few
options.

I’m not a Windows guy, maybe there are limitations in that environment
(especially with fork/exec) that I’m just not aware of.

Gary W.

On 2/23/06, [email protected] [email protected] wrote:

>> run_script(x) # x is guaranteed to be a script written in ruby
>>
>> This means for instance, that run_script could get away with not
>> forking a new process, but rather just a new ruby VM assuming that
>> the ruby implementation had that capability
>
> You already have coroutines, threads, fork/exec, system, and
> load/require all of which give you different ways to manage multiple
> threads of control and/or interpret external ruby code.

Right. But this is meaningfully different than all of the above,
especially within the context of JRuby. JVMs are expensive to start,
but independent Java threads are pretty easy to start. I think that the
intent is that JRuby is going to introduce Kernel#run_script or
something similar to it because they want to give JRuby programmers a
way to start an external script in a lightweight manner. The suggestion
being made here is to reincorporate it into CRuby, something I support.

In CRuby (without multiple VM instances possible):
Kernel#run_script(name, *args)
would be no different than:
Kernel#system("ruby", name, *args)

I’m not fond of the name (maybe Kernel#ruby might be appropriate) but
the functionality is appropriate, IMO.

> If you had two (or more) Ruby ‘contexts’ in a single process you would
> still have to get memory management and IO to work correctly and you
> would have two (or more) top-level objects and invariably you are
> going to want to communicate between the two contexts which means
> creating some sort of inter-context communication system, which would
> have to play nice with the OS, other ruby threads and so on.

Actually, you may not. When you’re calling Kernel#system, you don’t
necessarily want all of the above. You’re often looking for a return
code that indicates that the command was run successfully … and you
don’t care if it’s just a second VM that’s started up to run the Ruby
script you just called. The VMs will have to do the synchronization to
OS resources, but you won’t, in either process.

This would certainly make Rake’s default mode of operation for tests
more portable and reliable.

> It all sounds like a lot of work, much of which would end up still not
> providing the features you already have with fork/exec (for example)
> and I’m still not sure what problem is being addressed that can’t be
> solved with the existing toolset, which already has quite a few
> options.

I think you’re trying to overthink this. This is about reducing the
startup cost for JRuby at a minimum and future Ruby interpreters that
have multiple VM support. This isn’t about pseudo-IPC within those VMs.

-austin

Just a simple note and no further info but…

Very recently ko1, the YARV developer, wrote in his diary / blog /
something such that a preliminary multiple-VM implementation had been
done.

He also wrote that it’s too hackish / ad hoc to commit, though.

MVM itself is considered useful, especially for shared-interpreter
environments such as mod_ruby, or for providing a more complete sandbox.

> The suggestion being made here is to reincorporate it into CRuby,
> something I support.
>
> In CRuby (without multiple VM instances possible):
> Kernel#run_script(name, *args)
> would be no different than:
> Kernel#system("ruby", name, *args)
>
> I’m not fond of the name (maybe Kernel#ruby might be appropriate) but
> the functionality is appropriate, IMO.

I’d only correct this by saying we’d really like to add something
like Kernel#run_script/ruby, but unless existing apps were ported to
use it (necessitating its existence in CRuby) it wouldn’t accomplish
much for us.

> This would certainly make Rake’s default mode of operation for tests
> more portable and reliable.

This is, in fact, the exact use case we’re having trouble with. As
JRuby gets closer and closer to full Ruby compatibility, we have been
starting to run more and more test cases against it. Many of these are
most easily run using rake, but two issues with JRuby make that
scenario less than ideal: 1. Launching new JVMs is very expensive, and
2. Java’s support for launching and controlling external processes is
far more clunky than it is in POSIX. We have managed to hack around
the launching by intercepting Kernel#system calls, searching for
“ruby” or “jruby” at the beginning of the command, and, if found,
launching the appropriate script within the same JVM. The two
instances of JRuby are treated independently, as though they were
separate processes. It works, but it makes me itch.
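
For illustration only, the detection step of a hack like this might look
roughly like the sketch below. It is not JRuby's actual code: the helper
name ruby_launch, the RUBY_LAUNCHERS list, and the choice to match on
the basename of the first word are all assumptions made for this example.

```ruby
require "shellwords"

# Hypothetical list of executable names treated as "a Ruby launch".
RUBY_LAUNCHERS = %w[ruby jruby].freeze

# Hypothetical helper: given a command line as passed to Kernel#system,
# return the arguments after the interpreter name if the command starts
# with ruby/jruby, or nil if it is some other command.
def ruby_launch(command)
  words = Shellwords.split(command)
  return nil if words.empty?
  return nil unless RUBY_LAUNCHERS.include?(File.basename(words.first))

  words.drop(1)
rescue ArgumentError
  # Shellwords raises on unbalanced quotes; treat that as "not ours"
  # and let the caller hand the string to the real shell.
  nil
end
```

With this helper, ruby_launch("jruby -w test/all.rb") returns
["-w", "test/all.rb"], while ruby_launch("ls -l") returns nil, so the
caller would fall back to launching a real process.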

> I think you’re trying to overthink this. This is about reducing the
> startup cost for JRuby at a minimum and future Ruby interpreters that
> have multiple VM support. This isn’t about pseudo-IPC within those VMs.

I have seen MVM mentioned in at least a couple YARV presentations, and
I think it’s a good idea in the long term to be able to launch a new
script in a new Ruby “VM” without launching a whole new process. I
also think the script-launching method would make firing off scripts
easier both now and in the future. Here’s hoping it happens!

And a friendly reminder about the RCR…I think this would be a really
nice feature to add. :)

http://www.rcrchive.net/rcr/show/328

Just to keep this discussion going, anyone want to take a stab at a
sample implementation of Kernel#ruby based on how Rake does it? Rake
defines two methods in its FileUtils module: ‘sh’ and ‘ruby’. ‘sh’
provides a somewhat neater interface to running a system command with
or without a shell, whereas ‘ruby’ provides a way to simply execute a
Ruby command line using the appropriate executable, passing args
directly. In all, it’s about 40 lines of code, and the ‘sh’
implementation may be unnecessary for our Kernel purposes. The key bit
of logic, though, is how it gets the ruby interpreter to run:

RUBY = File.join(Config::CONFIG['bindir'],
                 Config::CONFIG['ruby_install_name'])

If called within Ruby 1.8.x, Kernel#ruby(args, script) would most
likely just use the above definition of RUBY and the specified args
and/or script (or some other organization of command-line params) to
launch an external process. In a multi-VM-aware interpreter, however,
it could simply launch a second VM as appropriate, avoiding the second
process entirely.
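
A rough sketch of that dispatch follows, as a starting point rather than
a proposed implementation. The module name ScriptLauncher and the
in_process_launcher hook are hypothetical, invented for this example. It
also uses RbConfig, the name later Rubies give to the Config module in
the snippet above, and appends EXEEXT so the path also resolves on
Windows.

```ruby
require "rbconfig"

module ScriptLauncher
  # Full path to the currently running interpreter, built the same way
  # as the Rake snippet above (plus the platform's executable suffix).
  RUBY = File.join(
    RbConfig::CONFIG["bindir"],
    RbConfig::CONFIG["ruby_install_name"] + RbConfig::CONFIG["EXEEXT"].to_s
  )

  class << self
    # Hypothetical hook: a multi-VM-aware implementation would assign a
    # callable here that runs the script in a second VM in this process.
    attr_accessor :in_process_launcher
  end

  # Run a Ruby script with the given args. Returns whatever the launcher
  # returns; for the external-process case that is Kernel#system's
  # result (true on exit status 0, false otherwise, nil if exec failed).
  def self.ruby(script, *args)
    if in_process_launcher
      in_process_launcher.call(script, *args)
    else
      # No MVM support: start an external interpreter process.
      system(RUBY, script, *args)
    end
  end
end
```

A caller would write something like ScriptLauncher.ruby("test/all.rb",
"-v") and get the same true/false style result either way, which is the
whole point: the script being launched does not need to know whether a
new process was involved.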