Concurrent Ruby?

kylemathews · July 29, 2008, 6:17am

Apologies if this is a really stupid question, I am new to programming,
but after reading about Erlang and its speed increase on multi-core
devices I had to ask.

With Matz supposedly making Ruby 2.0 right now, is it possible to make
it concurrent like Erlang so as to take advantage of the future
multi-core devices? Thank you.

kylemathews · July 29, 2008, 1:02pm

On Jul 29, 2008, at 8:55 AM, David M. wrote:

a long mail

Nice writeup. You forgot one thing about Erlang, though: It is
(mostly) side-effect-free, while object-oriented languages always rely
on side effects. This makes concurrency harder for them.

Regards,
Skade

kylemathews · July 29, 2008, 5:04pm

2008/7/29 Florian G.:

On Jul 29, 2008, at 8:55 AM, David M. wrote:

a long mail

Nice writeup.

Absolutely agree. Thanks David!

You forgot one thing about Erlang, though: It is (mostly)
side-effect-free while

Well, he said that data does not change which is basically the same.

object-oriented languages always rely on side effects.

I’d rather say “usually” because immutable classes are quite common.

This makes it harder when it comes to concurrency.

Obviously.

Kind regards

robert

kylemathews · July 29, 2008, 6:01pm

On Tuesday 29 July 2008 06:01:10 Florian G. wrote:

On Jul 29, 2008, at 8:55 AM, David M. wrote:

a long mail

Nice writeup.

Thanks!

You forgot one thing about Erlang, though: It is
(mostly) side-effect-free while
object-oriented languages always rely on side effects.

If I understand it right, side effects in Erlang simply take a different
form. Nothing’s stopping me from sending random, spurious messages in
the middle of a supposedly-innocuous function.

I did talk about data not being mutable, which provides both a semantic
(lock-free) and a technical advantage (raw speed).

I’m trying to figure out how to at least partly duplicate the semantic
advantage in Ruby, but it’s not easy – I’m stuck either #freeze-ing
everything, or wrapping every message in an actor of its own, and both
approaches seem more obnoxious and error-prone than forcing the
developer to deal with it.
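
To make the "#freeze-ing everything" option concrete, here is a minimal sketch of what it might look like. The helper name deep_freeze is hypothetical (it is not part of Ruby or of any project mentioned in this thread), and it only walks Hashes and Arrays; objects that hold state in instance variables would need extra handling.

```ruby
# Hypothetical helper (not from any library): recursively freeze a message
# so that whoever receives it cannot mutate state the sender still holds.
def deep_freeze(obj)
  case obj
  when Hash
    obj.each { |key, value| deep_freeze(key); deep_freeze(value) }
  when Array
    obj.each { |element| deep_freeze(element) }
  end
  obj.freeze
end

message = deep_freeze({ :to => 'worker', :body => ['hello', 'world'] })

begin
  message[:body] << 'oops'        # any in-place change is now refused
rescue => e                       # the exact error class varies by Ruby version
  puts "rejected: #{e.class}"
end
```

The cost is visible even here: every message has to be walked before it is sent, and a single forgotten call leaves mutable data shared between threads.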

kylemathews · July 29, 2008, 8:56am

On Monday 28 July 2008 23:17:22 Kyle M. wrote:

With Matz supposedly making Ruby 2.0 right now, is it possible to make
it concurrent like Erlang

Not like Erlang, no.

Erlang does a couple of things differently. The most obvious one, which
makes it so scalable, is the message-passing – Erlang uses “processes”
and message-passing almost as a programming paradigm. We talk about
“Object-Oriented Programming”; Erlang people talk about
“Concurrency-Oriented Programming”.

These are much easier to write and scale than threads, and they perform
much better than single threads.

There are a few of us working to rectify this situation, at least
semantically – there’s Revactor, Dramatis, and my own unreleased project
which I’ve been wasting a few weekend hours on.

Another reason, which I’m running into while working on the above
project, is that Erlang has no mutable data. It even goes so far as to
make variables single-assignment, which is just annoying, but the data
structures themselves are never changed. Take a simple (contrived) Ruby
example:

def some_function(options={})
  options[:foo] ||= 'Foo'
  options[:bar] ||= 'Bar'
  options[:foobar] ||= options[:foo] + options[:bar]

  # some_file stands in for any open IO object
  some_file.each_line do |line|
    line.chomp!
    line.gsub!(/curses/i, '******')
    puts line
  end
end

See, we’re changing things. Arrays, strings, whatever – it’s actually
the characters inside the string that are changing.
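
A small sketch of why this matters: the changes are visible to anyone else holding a reference to the same object. The variable names below are made up for illustration, and the method is a trimmed copy of the contrived example above.

```ruby
# Trimmed version of the contrived method: it fills in defaults by
# mutating the hash it was given.
def some_function(options = {})
  options[:foo] ||= 'Foo'
  options[:bar] ||= 'Bar'
  options[:foobar] ||= options[:foo] + options[:bar]
  options
end

caller_options = { :foo => 'Baz' }
some_function(caller_options)

# The caller's own hash was changed in place.
puts caller_options[:foobar]   # prints BazBar

# The same applies to strings: bang methods edit the object itself,
# so every reference to it sees the change.
line = "hello\n".dup
same_line = line
line.chomp!
puts same_line.inspect         # prints "hello"
```

With one thread this is merely surprising. With several threads sharing caller_options or line, it is a race.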

In Erlang, (almost) no data ever changes, you just create new data.
Which means that when you send a message to another process, it’s as
simple as sending a pointer across – which means it’s not only a
constant-time operation, it’s an absurdly cheap constant-time operation.
So the data is shared, but because it never changes, you don’t have to
lock it.

Which means that in Erlang, message-passing is so cheap we don’t have to
worry about it. If we ported the message-passing to Ruby, it’s either
unreliable or it’s massively expensive and still somewhat unreliable.
I’m not sure there’s a good way around this, though if there is, I
intend to find it.
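
For comparison, the cheap-but-unreliable end of that trade-off in Ruby might look like the following sketch, which uses a plain Thread and Queue as a mailbox. The names mailbox, worker and payload are illustrative, and freezing the payload only protects that one String; it is not a general answer to the problem described above.

```ruby
require 'thread'   # provides Queue on old Rubies; harmless on newer ones

mailbox  = Queue.new
received = []

# A "process" in the loose sense: a thread that only reads its mailbox.
worker = Thread.new do
  while (msg = mailbox.pop) != :stop
    received << msg
  end
end

payload = 'x' * 1_000_000
payload.freeze                  # the receiver cannot change it now

mailbox.push(payload)           # pushes a reference; the megabyte is not copied
mailbox.push(:stop)
worker.join

puts received.first.equal?(payload)   # true: the very same object arrived
puts received.first.frozen?           # true
```

Sending is constant-time because only a reference crosses the queue. Without the freeze, though, sender and receiver would share a mutable String with no lock around it.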

so as to take advantage of the future
multi-core devices? Thank you.

This might happen – maybe, sort of. Keeping all of the above in mind,
threading in Ruby is modeled after the traditional C and Java model,
which means they’re probably more expensive to create, and certainly
more dangerous, which means there won’t be as many of them.

On top of all that…

Right now, Ruby shares a problem with Python called the GIL – the Global
(or Giant) Interpreter Lock. What this means is that only one Ruby
instruction may execute at a time. So even though they’re using separate
OS threads, and even though different Ruby threads might run on
different cores, the speed of your program (at least the Ruby part) is
limited to the speed of a single core.

The standard response, which you’ll probably already see (since I’m
taking the time to write a longer answer), is that you can do threading
in two ways: Either fork off a whole new Ruby process, so you probably
can’t have any shared-memory problems – and/or write the expensive parts
in C, and have your C extension release the Ruby GIL.

(See, you can have more than one bit of C code running in a Ruby program
at once, even alongside all the Ruby stuff – at least until they need to
do something with Ruby itself.)
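
The "fork off a whole new Ruby process" route can be sketched roughly as follows. This assumes a Unix-like system (Kernel#fork is not available on Windows); the method name parallel_sums and the sum-of-squares workload are made up for illustration.

```ruby
# Sketch: split CPU-bound work across child processes. Each child is a
# separate interpreter with its own lock, so they can run on separate cores.
def parallel_sums(ranges)
  children = ranges.map do |range|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      total = range.inject(0) { |sum, n| sum + n * n }
      Marshal.dump(total, writer)   # serialize the result into the pipe
      writer.close
      exit!(0)                      # leave without running at_exit handlers
    end
    writer.close                    # the parent only reads
    [pid, reader]
  end

  # Collect one result per child, in the order the work was handed out.
  children.map do |pid, reader|
    result = Marshal.load(reader)
    reader.close
    Process.wait(pid)
    result
  end
end

sums = parallel_sums([1..1_000, 1_001..2_000])
puts sums.inject(0) { |a, b| a + b }
```

Nothing is shared between the children, which is why there are no locking problems, but every result has to be serialized and copied back through the pipe.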

There’s also JRuby, which uses Java’s native threads, and has no GIL.
There have been some problems with them lately, but they should work –
but again, keep all of the above in mind. You’ll be threading as well as
Java does, not as well as Erlang does.

As you can probably tell, I’m not really happy about all of this.

Now, unlike Python, it looks as though the Ruby GIL might eventually be
removed. And there is JRuby. And there’s the various actor projects
(mine included). So it’s conceivable that we’d get Ruby scalable to
arbitrary numbers of processors.

But again, I suspect Erlang is still going to do it better, if all you
care about is multicore and efficiency. (Ruby is doing a better job of
Unicode, has much more library support, and I much prefer its syntax.)

kylemathews · July 29, 2008, 7:15pm

On Jul 29, 2008, at 10:00 AM, David M. wrote:

I’m trying to figure out how to at least partly duplicate the semantic
advantage in Ruby, but it’s not easy – I’m stuck either #freeze-ing
everything, or wrapping every message in an actor of its own, and both
approaches seem more obnoxious and error-prone than forcing the
developer to
deal with it.

fan out multiple processes with a message queue each - easy to do with
drb. naive impl:

cfp:~> ruby a.rb
b got "hello" (pid=94677)
a got "hello" (pid=94676)

cfp:~> cat a.rb

a =
  actor {
    recv_msg { |msg|
      puts "a got #{ msg.inspect } (pid=#{ Process.pid })"
    }
  }

b =
  actor {
    recv_msg { |msg|
      puts "b got #{ msg.inspect } (pid=#{ Process.pid })"
      a.send_msg msg
    }
  }

b.send_msg 'hello'

STDIN.gets

BEGIN {

  require 'rubygems'
  require 'thread'
  require 'drb'
  require 'slave'

  class Actor
    # pass a reference over DRb instead of a marshalled copy
    include ::DRb::DRbUndumped

    def initialize &block
      @q = Queue.new
      @block = block
      act!
    end

    # run the actor body in its own thread
    def act!
      @thread = Thread.new do
        Thread.current.abort_on_exception = true
        instance_eval(&@block)
      end
    end

    def send_msg message
      @q.push message
    end

    # block on the queue and yield each message as it arrives
    def recv_msg
      while(( message = @q.pop ))
        yield message
      end
    end

  end

  # each actor lives in its own process, started by the slave gem
  def actor(*a, &b)
    Slave.new{ Actor.new(*a, &b) }.object
  end

  STDOUT.sync = true

}

a @ http://codeforpeople.com/

kylemathews · July 29, 2008, 6:58pm

David M. wrote:

There’s also JRuby, which uses Java’s native threads, and has no GIL. There
have been some problems with them lately, but they should work – but again,
keep all of the above in mind. You’ll be threading as well as Java does, not
as well as Erlang does.

I’m not sure what you mean by problems…there have not been problems
with them lately; they work as you’d expect native threads to work. They
do require a bit more diligence on your part if you’re sharing data
across the threads, since for performance reasons we don’t do any extra
synchronization of e.g. Array, Hash, String. But native threads work
fine on JRuby.
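
As a rough illustration of the "diligence" meant here, this sketch guards a shared Array with a Mutex. The variable names are made up, and the sketch is not taken from JRuby's documentation; it only shows the general pattern of doing your own synchronization when the collection does none.

```ruby
require 'thread'   # Mutex on old Rubies; harmless on newer ones

shared = []
lock   = Mutex.new

threads = (1..4).map do |i|
  Thread.new do
    1_000.times do |n|
      # Array#<< is not guaranteed to be thread-safe under truly parallel
      # threads, so every mutation goes through the mutex.
      lock.synchronize { shared << [i, n] }
    end
  end
end
threads.each { |t| t.join }

puts shared.size   # 4000
```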

Charlie

kylemathews · July 30, 2008, 6:33am

On Tuesday 29 July 2008 12:13:53 ara.t.howard wrote:

fan out multiple processes with a message queue each - easy to do with
drb.

That implies a full copy (I think), which isn’t always what’s needed.

Without actually testing your implementation, what happens when I send,
say, a reference to an actor? (Kind of an essential feature.)

And without actually doing any benchmarks (how’s that for naive?), I
still find it hard to believe that DRb+Queue would scale better than
Thread+Queue, for large numbers of actors. (Keep in mind, it’s not
unusual for an Erlang program to have thousands of processes.)

Given that I still have a vague hope that YARV will eventually remove
the GIL, I’d rather stick to Threads, if I can make them safe.

kylemathews · July 30, 2008, 6:36am

On Tuesday 29 July 2008 11:56:43 Charles Oliver N. wrote:

David M. wrote:

There’s also JRuby, which uses Java’s native threads, and has no GIL.
There have been some problems with them lately, but they should work –
but again, keep all of the above in mind. You’ll be threading as well as
Java does, not as well as Erlang does.

I’m not sure what you mean by problems…there have not been problems
with them lately;

Maybe it wasn’t actually “lately”.

And there’s still the rest of it:

They
do require a bit more diligence on your part if you’re sharing data
across the threads,

That’s the whole problem that I’m attacking right now – while a pure
actor model wouldn’t share any data, I’m not even sure I can safely
clone everything properly, if I was going that route. And I’d rather
not, for obvious performance reasons.
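
To illustrate why cloning "properly" is awkward: Ruby's built-in dup and clone are shallow, and the usual workaround for a deep copy is a Marshal round trip. The sketch below uses made-up variable names. Note that Marshal cannot dump every kind of object (Procs, IO objects and Threads, for example), and it walks the whole structure, which is the performance cost mentioned above.

```ruby
original = [['a', 'b'], ['c', 'd']]

shallow = original.dup                          # new outer array, same inner arrays
deep    = Marshal.load(Marshal.dump(original))  # copies every level

original[0] << 'z'     # mutate one of the nested arrays in place

puts shallow[0].size   # 3: the shallow copy shares the inner array
puts deep[0].size      # 2: the deep copy is unaffected
```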

kylemathews · July 30, 2008, 7:35am

On Jul 29, 2008, at 10:33 PM, David M. wrote:

That implies a full copy (I think), which isn’t always what’s needed.

Without actually testing your implementation, what happens when I
send, say, a reference to an actor? (Kind of an essential feature.)

DRb handles references. DRbUndumped provides a means to pass
references to remote objects around.

And without actually doing any benchmarks (how’s that for naive?), I
still find it hard to believe that DRb+Queue would scale better than
Thread+Queue, for large numbers of actors. (Keep in mind, it’s not
unusual for an Erlang program to have thousands of processes.)

no doubt that’s true. processes can help you now though - especially
since threads don’t scale right now in ruby with multi processor
machines.

Given that I still have a vague hope that YARV will eventually
remove the GIL,
I’d rather stick to Threads, if I can make them safe.

sure, but if you want to burn up processors you simply have to use
processes attm.

you might find this interesting

http://groups.google.com/group/ruby-talk-google/browse_thread/thread/b4e346478eeeead4/0cbc4a86f2237476?lnk=gst&q=threadify+jruby#0cbc4a86f2237476

a @ http://codeforpeople.com/

kylemathews · July 30, 2008, 8:03am

On Wednesday 30 July 2008 00:33:53 ara.t.howard wrote:

references to remote objects around.

Alright. What if I send a complex data structure? Strings, I can live
with, but what about multidimensional arrays?

machines.

I believe work is going on to make Threads scale in 1.9 – current 1.9
still has a GIL, though.

They do scale in JRuby, and probably in IronRuby (haven’t tried).

Given that I still have a vague hope that YARV will eventually
remove the GIL,
I’d rather stick to Threads, if I can make them safe.

sure, but if you want to burn up processors you simply have to use
processes attm.

Or I could use JRuby. Or IronRuby.

I don’t want to burn up processors atm. I want to build an architecture
which will be able to burn up processors in the future. I want to solve
concurrency on a single machine once and be done with it – without
having to use Erlang.

http://groups.google.com/group/ruby-talk-google/browse_thread/thread/b4e346478eeeead4/0cbc4a86f2237476?lnk=gst&q=threadify+jruby#0cbc4a86f2237476

From that link:

“the sync overhead is prohibitive
for in memory stuff”

I am, specifically, interested in doing in-memory stuff. If I can solve
that problem, I’m not as worried about the network stuff, especially as
others have already solved that well enough (DRb and friends).

kylemathews · July 30, 2008, 9:45am

David M. wrote:

On Tuesday 29 July 2008 11:56:43 Charles Oliver N. wrote:

They
do require a bit more diligence on your part if you’re sharing data
across the threads,

That’s the whole problem that I’m attacking right now – while a pure actor
model wouldn’t share any data, I’m not even sure I can safely clone
everything properly, if I was going that route. And I’d rather not, for
obvious performance reasons.

Well if there are specific threading issues, we’d like to solve them.
And at this very moment we’re debating and working on ways to make the
core collection types (String, Array, Hash) at least not dump a stack
trace when they’re used unsafely. So I think there’s little reason why
you couldn’t implement a decent Actor framework on top of JRuby.

Also, we recently added Rubinius’s MVM API atop our existing MVM
support, so that’s another route you can go and really isolate
instances. But of course, they eat up more memory that way.

Charlie

kylemathews · July 30, 2008, 7:43pm

On Mon, Jul 28, 2008 at 10:17 PM, Kyle M. wrote:

Apologies if this is a really stupid question, I am new to programming,
but after reading about Erlang and its speed increase on multi-core
devices I had to ask.

With Matz supposedly making Ruby 2.0 right now, is it possible to make
it concurrent like Erlang so as to take advantage of the future
multi-core devices? Thank you.

Rubinius is able to spawn a VM per CPU core, and allow quasi-Erlang
style concurrency using Actor objects which can communicate across
inter-VM message buses.

It’s not as elegant as Erlang’s SMP scheduler (something like that
really isn’t possible without a shared-nothing process architecture),
but it more or less provides the same approach Erlang uses for
distributed systems (i.e. each CPU is a “node”)

kylemathews · July 30, 2008, 9:23pm

Tony A. wrote:

Rubinius is able to spawn a VM per CPU core, and allow quasi-Erlang style
concurrency using Actor objects which can communicate across inter-VM
message buses.

It’s not as elegant as Erlang’s SMP scheduler (something like that really
isn’t possible without a shared-nothing process architecture), but it more
or less provides the same approach Erlang uses for distributed systems (i.e.
each CPU is a “node”)

It’s worth mentioning JRuby also supports the MVM API, and sub-VMs share
nothing with their parents save the message queue. Sub-VMs are also
launched in their own native thread (though of course JRuby has native
threads within a given VM as well). It wouldn’t be much of a leap to
implement the Actor model as well.

Charlie