Celluloid 0.0.3: a concurrent object framework for Ruby

Celluloid is a concurrent object framework for Ruby inspired by Erlang
and the Actor Model:

Celluloid provides thread-backed objects that run concurrently,
allowing the familiarity of plain old Ruby objects for the
most common use cases, but also the ability to call methods
asynchronously. Asynchronous method calls allow the receiver
to do things in the background while the caller carries on with its
business.
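
As a rough illustration (using the 0.x-era API that comes up later in this
thread: include Celluloid::Actor, spawn, and bang-suffixed methods for
asynchronous calls; the class and URL below are made-up placeholders):

require 'celluloid'

class Downloader
  include Celluloid::Actor     # each instance runs inside its own thread

  def fetch(url)
    # stand-in for slow, blocking work
    "body of #{url}"
  end
end

d = Downloader.spawn               # spawn returns a proxy to the actor
d.fetch("http://example.com")      # synchronous: blocks until the result is ready
d.fetch!("http://example.com")     # asynchronous: returns immediately while the
                                   # receiver keeps working in the background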

If you’re looking for a longer introduction, please check out this
post on my blog:

http://www.unlimitednovelty.com/2011/05/introducing-celluloid-concurrent-object.html

Also view the screencast I did for EMRubyConf here:

Tony A. [email protected] wrote:

Celluloid provides thread-backed objects that run concurrently,

Cool!

I assume this interacts transparently with existing apps that already
use threads?

For instance, Rainbows! already offers ThreadPool/ThreadSpawn options
for hosting Rack applications, can the Rack applications themselves use
Celluloid without any changes to Rainbows!?

Or would we have to add explicit support for Celluloid in Rainbows!
to support folks that want to write Rack apps using Celluloid?

Rainbows! of course also supports Revactor, Cool.io, EventMachine,
NeverBlock, XEpollThread*, etc… Adding explicit support for Celluloid
isn’t out of the question, just a matter of developer time.

On Friday, June 17, 2011 03:21:48 PM Tony A. wrote:

to do things in the background while the caller carries on with its
business.

That’s pretty awesome. I was working on something like this, but ended up
abandoning it after I had it deadlocking for a while, and I couldn’t get the
semantics right.

Speaking of semantics, my biggest problems were:

  • Is there a way to handle exceptions in a Ruby-esque way?

It looks like I have to explicitly trap actor exceptions. But this is a place
I have to be aware that this is an actor and not just a Ruby object. Your
parallel map is a perfect example of what I’d actually want here: If an
exception is raised, re-raise that exception when I try to call methods on
the actor, rather than DeadActorException. Is there a reason not to do that?

  • How do you handle cycles?

An actor can only process one method at a time, which makes sense. One thing
I wanted to do was give two actors references to each other, so they can send
messages back and forth. Futures seem like a good solution to avoid a lot of
annoying asynchronous callbacks. Two problems:

class Foo
  include Celluloid::Actor
  attr_reader :bar, :value

  def initialize
    @bar = Bar.spawn self
    @value = 3
  end

  def first
    bar.second!
    bar.result * 2
  end
end

class Bar
  include Celluloid::Actor
  attr_reader :foo

  def initialize parent
    @foo = parent
  end

  def second
    @result = foo.value + 3
  end

  def result
    @result
  end
end

This doesn’t actually run – it seems to deadlock on ‘new’. But there are two
other problems: First, ‘self’ wouldn’t be an actor reference, it’d be the
object itself, right? But more importantly, what happens when I call ‘first’?
It looks like we deadlock again, but it seems reasonable that since Foo is
waiting for Bar, that maybe Bar can now call another method on Foo, acting as
though the two actors were just plain objects and we’re just building a
call-stack.

That’s the part I could never get working.

Couple other annoyances:

  • Why spawn instead of new? It seems like if I’ve decided to make something
    an actor, it’s going to expect to be an actor most of the time – it’s
    hard to imagine a case where I want the original ‘new’ instead.

  • I really don’t like the registry – one flat namespace of actors? Ew. But
    I’m not really sure how to solve this – some sort of super-reference,
    which points to the currently-alive actor from a given supervisor? But
    then I might send something which asynchronously kills the actor, and
    I’ll get a fresh actor for the next line, which seems like a bad thing.
    There needs to be some clean semantics for "Give me a reference to the
    currently-alive version of this actor" which doesn’t rely on a global,
    flat registry.

And a thought: I just had every method return a future. If people wanted
something to run asynchronously, all they had to do is ignore the future. The
downside is that this makes it hard to force things to be synchronous. I
actually thought of this as a good thing – if I make the call up at the top
of a method, and don’t use the result till the bottom, that’s some surprise
parallelism right there. The biggest problem is that if there’s an exception,
you don’t know about it until the future is resolved.
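
A self-contained toy of those semantics, built on plain threads (an
illustrative sketch only, not the library being described): every call starts
work immediately and hands back a future-like handle, and an exception only
surfaces once the value is actually asked for.

class LazyFuture
  def initialize(&work)
    @thread = Thread.new(&work)    # the work starts right away, in the background
  end

  def value
    @thread.value                  # joins the thread; re-raises whatever the block raised
  end
end

f = LazyFuture.new { sleep 0.1; 6 * 7 }   # "surprise parallelism"
g = LazyFuture.new { raise 'boom' }       # already failing, but silently for now

puts f.value                              # => 42; only blocks here
begin
  g.value
rescue RuntimeError => e
  puts "only learned about it now: #{e.message}"   # the downside noted above
end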

On Fri, Jun 17, 2011 at 4:20 PM, Eric W. [email protected] wrote:

Tony A. [email protected] wrote:

Celluloid provides thread-backed objects that run concurrently,

Cool!

I assume this interacts transparently with existing apps that already
use threads?

Yep. Check out the screencast for Celluloid being used in conjunction with
Sinatra.

For instance, Rainbows! already offers ThreadPool/ThreadSpawn options
for hosting Rack applications, can the Rack applications themselves use
Celluloid without any changes to Rainbows!?

Indeed, Celluloid should work just fine with ThreadPool/ThreadSpawn.

On Fri, Jun 17, 2011 at 4:25 PM, David M. [email protected] wrote:

Speaking of semantics, my biggest problems were:

  • Is there a way to handle exceptions in a Ruby-esque way?

It looks like I have to explicitly trap actor exceptions. But this is a place
I have to be aware that this is an actor and not just a Ruby object.

When making synchronous calls, exceptions which occur in the context of the
receiver are automatically reraised in the caller just like any other Ruby
object, regardless of whether you’re using any actor-specific features like
linking or trapping exits. It will also crash the receiver.
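
For instance (a sketch in the same 0.x-style API used elsewhere in this
thread; the Parser class is made up):

class Parser
  include Celluloid::Actor

  def parse(input)
    raise ArgumentError, 'empty input' if input.empty?
    input.split
  end
end

parser = Parser.spawn
begin
  parser.parse('')                     # synchronous call into the actor
rescue ArgumentError => e
  puts "caller sees: #{e.message}"     # the actor's exception, re-raised here
end
# ...and, as described above, the exception also crashes the receiving actor.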

Your parallel map is a perfect example of what I’d actually want here: If an
exception is raised, re-raise that exception when I try to call methods on
the actor, rather than DeadActorException. Is there a reason not to do that?

I think reraising the original exception in the caller context gives the
caller appropriate context to bail out of whatever they’re doing and avoid
making subsequent calls at all. Other threads may be trying to make calls,
and if an exception entirely unrelated to the calls they’re making is raised
because the actor is dead, I think that’d be rather confusing.

  • How do you handle cycles?

I don’t, but they can be detected if you don’t mind a bit of a performance
penalty. For that I need to track chains of synchronous calls and detect if
the receiver of a given method exists earlier in the call chain. If so,
Celluloid can raise an exception in the caller context indicating that a
deadlock would occur. This is a bit of a glaring deficiency right now.

First, ‘self’ wouldn’t be an actor reference, it’d be the object itself,
right?

Yes. I provide Celluloid.current_actor to use in lieu of self. This feels a
bit ugly, but I don’t know of any way to redefine self (nor do I think
that’d be a particularly good idea either)

Couple other annoyances:

  • Why spawn instead of new? It seems like if I’ve decided to make something
    an actor, it’s going to expect to be an actor most of the time – it’s
    hard to imagine a case where I want the original ‘new’ instead.

This is a good point. I could easily redefine new to have the same behavior
as spawn.

I don’t know of a better solution. This is the same approach Erlang uses.
The only evolution it’s seen in recent history is systems like Ulf Wiger’s
gproc.

That’s an interesting approach, but a bit different than the one I’m
shooting for in Celluloid, where I want concurrent objects to quack like
normal Ruby objects as much as possible.

On Fri, Jun 17, 2011 at 6:25 PM, David M. [email protected]
wrote:

Still, I shouldn’t have to create an entire new actor, link it to your actor,
and have it trap errors in order to find the actual exception I caused which
led to the actor’s death. Maybe it’s appropriate for bang methods to return
some object which can be used to retrieve an exception?

If you want that sort of behavior, you can use the built-in Celluloid::Future
functionality. It does exactly what you describe, calling a block
asynchronously, then letting you retrieve the exception (or value) later. If
an exception was raised in the block given to the future originally, it will
be re-raised when the value is requested every single time.
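
Something along these lines (assuming the block-taking constructor and value
accessor that later versions use; the exact 0.x spelling may differ
slightly):

require 'celluloid'

future = Celluloid::Future.new { 1 / 0 }   # the block runs asynchronously

begin
  future.value                             # blocks until the block has finished
rescue ZeroDivisionError => e
  puts "raised when the value was requested: #{e.class}"
end

future.value rescue puts 'and raised again on every subsequent request'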

If something goes wrong in an async call, you can either handle the error
within that method directly, or rely on the supervisor to restart the object
in a clean state. Really I think supervisors are going to be the de facto way
to handle errors in asynchronous calls. I don’t think there’s a lot of good
use cases for having callers handle errors in asynchronous calls that aren’t
already covered by Celluloid::Future.

That’s what I was trying to do, except I wasn’t planning to deadlock. I was
[…]
in yours, if I pass ‘self’ around, we get the same result. Why should it be
different if I call a method on another actor which then calls a method on
me?

Still, it’s tricky to come up with an efficient way to do this, and I never
managed to get anything to work, no matter how inefficient.

Hmmmmmmmmmmm!

I think the best approach would be to wrap the dispatching of incoming calls
in a fiber. Whenever that fiber makes an outgoing call to another actor, it
defers back to the central receive loop which processes the mailbox. This
would let an actor continue processing incoming calls while waiting for a
response to a call.
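
A single-threaded toy of that mechanism (this is not Celluloid's code, just
an illustration): each incoming "call" runs in its own Fiber, a call that has
to wait simply yields back to the loop, and the loop keeps draining the
mailbox until the matching response turns up.

mailbox = []
waiting = nil

# Two incoming calls; the first must wait for a response that only arrives
# later in the mailbox.
mailbox << [:call, -> { puts 'call 1 starts'
                        reply = Fiber.yield          # park; wait for a response
                        puts "call 1 resumed with #{reply}" }]
mailbox << [:call, -> { puts 'call 2 runs to completion' }]
mailbox << [:response, 42]

until mailbox.empty?
  kind, payload = mailbox.shift
  case kind
  when :call
    fiber = Fiber.new { payload.call; :done }
    waiting = fiber if fiber.resume != :done   # parked mid-call? remember it
  when :response
    waiting.resume(payload)                    # wake the parked call with its reply
  end
end
# prints: call 1 starts / call 2 runs to completion / call 1 resumed with 42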

You’re actually the second person I’ve talked to who’s proposed this in
regard to handling circular call chains, the other person was Steven Parkes
who created the Dramatis actor framework. At the time I had my head in
Reia/Erlang, where gen_server state is pure functional and immutable and
there would really be no way to implement this sort of approach. In a
language like Ruby, though, it’s possible, and would actually be quite
similar to what you could do with plain old Ruby objects.

So, there is a way, but you probably won’t like it…

You’re right, I don’t like that at all :)

Well, now you definitely have me thinking. If I do allow an actor to process
multiple calls using fibers, I’ve definitely left the realm of what could be
done in a language like Reia. That sort of approach relies directly on
concurrent objects having mutable hidden state.

While this approach couldn’t apply to Reia, I really like its semantics, and
I think it solves the long-standing problem of circular calls. My answer to
this question for the past two years has been "circular calls are an error",
when really there should be a way to make them work.

Looking again, maybe the supervisor already does this?

supervisor = Sheen.supervise "Charlie Sheen"
charlie = supervisor.actor

This would solve both problems, right? (Assuming the supervisor is itself
threadsafe.) It could use some sugar, but I’m not entirely sure how.

The easiest way to add some sugar would be to have the supervisor create a
thread safe proxy object that always refers to the latest version of a given
actor. That way you could just use that object directly rather than always
having to call supervisor.actor to get to it.
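
Something like this, perhaps (purely hypothetical, not an existing Celluloid
class): a proxy that re-fetches the current actor from the supervisor on
every call.

class SupervisedProxy
  def initialize(supervisor)
    @supervisor = supervisor
  end

  def method_missing(name, *args, &block)
    @supervisor.actor.send(name, *args, &block)   # always the currently-alive actor
  end

  def respond_to_missing?(name, include_private = false)
    @supervisor.actor.respond_to?(name, include_private) || super
  end
end

# charlie = SupervisedProxy.new(Sheen.supervise("Charlie Sheen"))
# charlie.some_method   # goes to whichever Sheen instance is alive right now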

On Friday, June 17, 2011 05:54:35 PM Tony A. wrote:

object, regardless of whether you’re using any actor-specific features like
linking or trapping exits. It will also crash the receiver.

That makes sense.

But when making asynchronous calls:

and if an exception entirely unrelated to the calls they’re making is
raised because the actor is dead, I think that’d be rather confusing.

That makes a lot of sense.

Still, I shouldn’t have to create an entire new actor, link it to your actor,
and have it trap errors in order to find the actual exception I caused which
led to the actor’s death. Maybe it’s appropriate for bang methods to return
some object which can be used to retrieve an exception?

  • How do you handle cycles?

I don’t, but they can be detected if you don’t mind a bit of a performance
penalty. For that I need to track chains of synchronous calls and detect if
the receiver of a given method exists earlier in the call chain. If so,
Celluloid can raise an exception in the caller context indicating that a
deadlock would occur. This is a bit of a glaring deficiency right now.

That’s what I was trying to do, except I wasn’t planning to deadlock. I was
planning to allow the call… somehow. Basically, if you had any sort of
pattern where two objects call methods on each other, it should work the way
it does synchronously.

I think this makes sense, semantically. After all, if an actor calls a method
on itself, we don’t get any sort of deadlock. If an actor calls a method on
another object running in the same thread, which then calls a method on the
actor, at least with my implementation, this also doesn’t deadlock – and in
yours, if I pass ‘self’ around, we get the same result. Why should it be
different if I call a method on another actor which then calls a method on
me?

Still, it’s tricky to come up with an efficient way to do this, and I never
managed to get anything to work, no matter how inefficient.

First, ‘self’ wouldn’t be an actor reference, it’d be the object itself,
right?

Yes. I provide Celluloid.current_actor to use in lieu of self. This feels a
bit ugly, but I don’t know of any way to redefine self (nor do I think
that’d be a particularly good idea either)

So, there is a way, but you probably won’t like it…

One experiment I did here was:

  • Grab all methods, stuff them in a hash, and undef them.
  • When a method is called, intercept it like a proxy, and do whatever I
    need to do to get it to the right thread.
  • To actually call the method, grab the method object, bind it to self,
    and apply.

It’s not really redefining self, but it accomplishes what’s needed here.
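
In plain Ruby, that experiment might look roughly like the following sketch
(the thread-hopping step is stubbed out, and all names here are
illustrative):

module Interceptable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def intercepted_methods
      @intercepted_methods ||= {}
    end

    # 1. Grab each method as it is defined, stuff it in a hash, and undef it.
    def method_added(name)
      return if name == :initialize
      intercepted_methods[name] = instance_method(name)
      undef_method(name)
    end
  end

  # 2. Calls now land here, where a real version would first route them to
  #    the actor's thread.
  def method_missing(name, *args, &block)
    unbound = self.class.intercepted_methods[name]
    return super unless unbound
    # 3. Grab the method object, bind it to self, and apply.
    unbound.bind(self).call(*args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    self.class.intercepted_methods.key?(name) || super
  end
end

class Widget
  include Interceptable

  def greet(who)
    "hello, #{who}"
  end
end

Widget.new.greet('world')   # => "hello, world", via the interception path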

However, I suspect it breaks all kinds of inheritance, unless I also absorb
that kind of functionality – that is, whenever something inherits from this
class, give it a clone of the hash to start with.

One advantage to this approach is that I could very easily allow some methods
to require the actor thread, and some methods to run in the calling thread –
by default, they run in the actor thread. The obvious application is when a
method really doesn’t need to involve the actor:

class Sheen
  include Suit

  # define a new threadsafe method
  threadsafe :status do
    :winning
  end
end

But maybe you want to anyway:

class Sheen
  include Suit

  attr_reader :status, :sober

  def initialize
    @status = :winning
    @sober = true
  end

  def fall_off_wagon!
    @status = :WINNING
    @sober = false
  end

  def is_off_wagon?
    !sober && status == :WINNING
  end

  threadsafe :hello do
    if is_off_wagon?
      puts 'WINNING!!!'
    else
      puts 'Hi.'
    end
  end
end

It makes sense that fall_off_wagon! and is_off_wagon? should run on the actor
thread. It makes sense that the ‘hello’ method doesn’t really need to run on
the actor thread, and maybe it’s a performance improvement that the Sheen
thread doesn’t actually have to talk, or ever wait for output, etc. I’m
really reaching here, because I don’t actually have a real application for
this, but I don’t think it’s entirely unreasonable – kind of like the Java
‘synchronized’ keyword, except message-passing behavior is the default.

But notice that the ‘threadsafe’ call doesn’t have to call ‘self’ at all. In
fact, that syntax is actually syntactic sugar for:

def hello

end
threadsafe :hello

I’m still just writing normal methods, but every method call, whether it’s to
‘self’ or not, is still going through the same logic to determine whether or
not it needs to run on the Sheen thread.
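
For what it’s worth, the macro half of that could be as small as the
following (a hypothetical sketch of the ‘Suit’ module used in the examples
above; the actor-dispatch half is left out):

module Suit
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def threadsafe_methods
      @threadsafe_methods ||= []
    end

    # Works both as `threadsafe :name do ... end` and as a bare
    # `threadsafe :name` after an ordinary def, as described above.
    def threadsafe(name, &block)
      define_method(name, &block) if block
      threadsafe_methods << name
    end
  end
end

# The dispatch layer (not shown) would run a call on the caller's own thread
# whenever threadsafe_methods includes the method's name, and forward it to
# the actor thread otherwise.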

I was much more interested in getting the semantics right, to show that it
can be done, rather than making it performant and immediately useful. Like
you, I wanted to use this to sort of prototype those semantics, with the hope
that they would get into something like Reia eventually. (I started this
before I heard of Reia, and probably before Reia was in any way practical, so
I wasn’t deliberately reinventing the wheel.)

I don’t know of a better solution. This is the same approach Erlang uses.
The only evolution it’s seen in recent history is systems like Ulf Wiger’s
gproc.

Looking again, maybe the supervisor already does this?

supervisor = Sheen.supervise "Charlie Sheen"
charlie = supervisor.actor

This would solve both problems, right? (Assuming the supervisor is itself
threadsafe.) It could use some sugar, but I’m not entirely sure how.

That’s an interesting approach, but a bit different than the one I’m
shooting for in Celluloid, where I want concurrent objects to quack like
normal Ruby objects as much as possible.

And this does quack like a normal Ruby object, unless something goes wrong
and an exception is raised. But I was never quite satisfied with how
exceptions were dealt with. For one thing, it’s not OK that someone might
ignore a future and never see the exception.

On Sat, Jun 18, 2011 at 4:39 PM, Tony A. [email protected] wrote:

Parkes who created the Dramatis actor framework. At the time I had my head
in Reia/Erlang, where gen_server state is pure functional and immutable and
there would really be no way to implement this sort of approach. In a
language like Ruby, though, it’s possible, and would actually be quite
similar to what you could do with plain old Ruby objects.

If you check HEAD on Github, Celluloid now supports circular call graphs
by using fibers to dispatch methods:

On Saturday, June 18, 2011 05:40:09 PM Tony A. wrote:

On Fri, Jun 17, 2011 at 6:25 PM, David M. [email protected] wrote:

Still, I shouldn’t have to create an entire new actor, link it to your actor,
and have it trap errors in order to find the actual exception I caused which
led to the actor’s death. Maybe it’s appropriate for bang methods to return
some object which can be used to retrieve an exception?
[…]
I don’t think there’s a lot of good use cases for having callers handle
errors in asynchronous calls that aren’t already covered by
Celluloid::Future.

Maybe not, other than that Future applies to a block, where I want the result
of a method call. Maybe it’s not a good use case, but this still seems cool:

actors.map(&:some_calculation).reduce{|a,b| …}

I guess the bigger annoyance, though I didn’t really have a good solution, is
that adopting bang to mean "asynchronous" means that these don’t quite quack
like Ruby objects anymore – they can’t have bang methods of their own that
mean something, and every method gets a bang whether it makes sense or not.

You’re actually the second person I’ve talked to who’s proposed this in
regard to handling circular call chains, the other person was Steven P.
who created the Dramatis actor framework. At the time I had my head in
Reia/Erlang, where gen_server state is pure functional and immutable and
there would really be no way to implement this sort of approach. In a
language like Ruby, though, it’s possible, and would actually be quite
similar to what you could do with plain old Ruby objects.

So, it’s been a while since I looked at Erlang, but I don’t actually see an
obstacle to this in Erlang itself or in the VM. Maybe in gen_server.

But there’s really nothing preventing me from creating the effect of mutable
state in a generic Erlang process, right?

So, there is a way, but you probably won’t like it…

You’re right, I don’t like that at all :)

I don’t like it either, and I avoided it as much as I could. One thing I
thought of was trying to filter the reference any way that it would get out
of the object, since I was already wrapping things in futures and the like
anyway. The problem is, there’s no guarantee that a bare ‘self’ will cross
any filter I set up. I mean, it’d be almost trivial to catch this:

def get_self
  self
end

But what if they stuff it deep in some data structure? What if it’s in a call
to some other object?

The other option was to make the blankslate-like proxy class a child class of
the original, so calling method ‘foo’ would look like:

original.instance_method(:foo).bind(self).call(*args, &block)

That’s a minor win in that it might be somewhat more tolerant of the parent
classes being redefined. But it’s not much of a win, because I have to watch
the parent classes anyway to remove methods from the child – in fact, the
only sane way I could find to do that was to watch every single method
created. So this doesn’t really buy me much.

A way to make that significantly better would be to bind those methods to a
BasicObject proxy instead, but you can’t do that, because binding methods is
one case where Ruby is not duck-typed at all – you can only bind a method to
an object which is actually an instance of that class, or something which
inherits from or includes it.
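
A quick demonstration of that restriction:

m = String.instance_method(:upcase)
m.bind('hello').call        # => "HELLO"

begin
  m.bind(BasicObject.new)   # not a String, so the bind is refused outright
rescue TypeError => e
  puts e.message            # e.g. "bind argument must be an instance of String"
end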

In the end, while the approach I went with is pretty ridiculous, I still like
it for the simple reason that if I forget to call Celluloid.current_actor
instead of self, I’ve completely broken the concurrency model by doing the
normal Ruby thing. With my approach, aside from the fact that my attempt at
cycles currently deadlocks, I can still more or less pretend that an actor is
a normal object.

supervisor = Sheen.supervise "Charlie Sheen"
charlie = supervisor.actor

This would solve both problems, right? (Assuming the supervisor is itself
threadsafe.) It could use some sugar, but I’m not entirely sure how.

The easiest way to add some sugar would be to have the supervisor create a
thread safe proxy object that always refers to the latest version of a
given actor. That way you could just use that object directly rather than
always having to call supervisor.actor to get to it.

Except in this case, I’m thinking of the call to ‘supervisor.actor’ as being
something like starting a transaction. That is, let’s say someone actually
convinces (or court-orders) Sheen to go to rehab before we let him back on
the road. So we might have a series of calls like:

charlie.rehab!
charlie.give license

But maybe the withdrawal kills him. If ‘charlie’ is a thread-safe proxy which
always refers to the latest version, we end up with a situation where rehab
kills him, we get a new version who hasn’t been to rehab, and we give the new
version a license. This is clearly an error, and worse, it’s almost silent.

By contrast, if we force people to call something like supervisor.actor to
start something like this, we end up with the best of both worlds – we’re
guaranteed he’s alive before we send him to rehab, and we either fail
(because we have a dead actor) or ensure that he’s actually recovered before
we give him the license.

Or, in other words, any time we’re sending more than one message to an actor
and depending on those messages being processed in order, we need to know, at
a minimum, that we’re talking to the same actor. On the other hand, in a
situation like this, we also have to think about what other calls might
happen in between – for example:

charlie.rehab! if charlie.out_of_control?

There’s potentially a race condition between receiving the out_of_control?
value and sending him to rehab. Still, if someone else kills charlie, he’s
just as dead and I still don’t want to give the new version a license until
he goes through rehab again.

Also: There has got to be a better metaphor.

On Sun, Jun 19, 2011 at 10:30 PM, Tony A. [email protected] wrote:

If you check HEAD on Github, Celluloid now supports circular call graphs
by using fibers to dispatch methods

And to clarify this a little bit: where before an A → B → A synchronous call
chain would deadlock your program, it now works!

This brings Celluloid actors one step closer to behaving as much like
sequential Ruby objects as possible.

2011/6/17 Tony A. [email protected]:

Celluloid provides thread-backed objects that run concurrently,
allowing the familiarity of plain old Ruby objects for the
most common use cases, but also the ability to call methods
asynchronously. Asynchronous method calls allow the receiver
to do things in the background while the caller carries on with its
business.

Hi, I’ve taken a look at the code. Does it create a new Thread for each new
call/request?

On Mon, Jun 27, 2011 at 4:23 PM, Iñaki Baz C. [email protected] wrote:

Hi, I’ve taken a look at the code. Does it create a new Thread for each new
call/request?

It creates a new fiber for each request, regardless of whether the request is
a traditional call (i.e. request/response) or an asynchronous call. The
latter was actually broken in the currently released version (0.1.0) but is
fixed on github.

JRuby implements fibers as native threads, so on JRuby it will create a new
thread per request. 1.9.2/YARV and Rubinius both implement Fibers in
userspace using various stack manipulation tricks (setjmp/longjmp on YARV,
ucontexts on Rubinius).

The performance isn’t great but it isn’t terrible. Some initial performance
numbers on my quad i7 Mac:

  • 1.9.2: 18000 calls/s (475x slower)
  • JRuby 1.6.2: 5700 calls/s (1400x slower)
  • rbx-2.0.0pre: 8100 calls/s (1200x slower)

These show how many synchronous calls can be made to one of Celluloid’s
concurrent actors in one second. The "Nx slower" figure indicates how much
slower a call to a Celluloid actor is compared to a normal method call.
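
For reference, a measurement in that spirit could be taken with something as
simple as the sketch below (not the actual benchmark script used for the
numbers above; results will vary by machine and Ruby version):

require 'benchmark'
require 'celluloid'

class Counter
  include Celluloid::Actor
  def tick; end
end

counter = Counter.spawn
n = 50_000

elapsed = Benchmark.realtime { n.times { counter.tick } }   # synchronous calls
puts '%d calls/s' % (n / elapsed)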