Continuations across fork?

I wrote some code that deliberately uses continuations that cross a
fork boundary. When I went to test it, I was surprised by the error:
“continuation called across trap”. I can think of a few reasons why
people might not want to use continuations across fork, but not
knowing Ruby internals particularly well, I couldn’t see a reason why
it should be absolutely prohibited.

Curious person that I am, I removed the test at the beginning of
rb_cont_call in eval.c, and my program ran like it was supposed to.
If I were more familiar with Ruby’s internals, I’d be asking my
question on ruby-core, but I figure that if there’s a super obvious
reason why continuations aren’t allowed across fork, someone here
would be able to point it out.

My reason for experimenting with this isn’t purely theoretical.
Ruby’s IO system (at least under Linux) uses fd_set, which limits it to
file descriptors numbered below 1024. Using send_io and recv_io, I
can trivially write a forking server that uses a pool of children to
collectively have more than 1024 connections open at once. The parent
does the accept, then sends the fd down to a child. However, I want
my connections to be DRb connections on top of certificate-verifying
SSL, and I want my certificate password-protected.
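For readers unfamiliar with send_io/recv_io, here is a minimal sketch of the "parent accepts, child serves" hand-off. It is not the poster's actual server. It assumes a Unix-like system with fork. A second UNIXSocket.pair stands in for a connection returned by TCPServer#accept, and the method name hand_off_demo is hypothetical.

```ruby
require 'socket'

# Hypothetical demo: the parent passes a connected socket to an already
# running child over a UNIX-domain control channel.
def hand_off_demo
  parent_chan, child_chan = UNIXSocket.pair

  pid = fork do
    parent_chan.close
    conn = child_chan.recv_io            # blocks until the parent sends a descriptor
    conn.write("hello from child #{Process.pid}\n")
    conn.close
    exit!(0)
  end
  child_chan.close

  # Created *after* fork, so the child can only obtain it via send_io.
  # Stand-in (assumption) for a socket returned by TCPServer#accept.
  server_side, client_side = UNIXSocket.pair
  parent_chan.send_io(server_side)
  server_side.close                      # the child now owns the connection
  reply = client_side.gets
  client_side.close
  parent_chan.close
  Process.wait(pid)
  reply
end
```

Because each child has its own descriptor table, a pool of such children can hold more connections in total than any single process could watch with an fd_set.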

When I’ve done this before in C, I’ve used
SSL_CTX_set_default_passwd_cb. That lets me read the password once
and then supply it as many times as needed before destroying it.
Ruby’s OpenSSL implementation doesn’t use or expose
SSL_CTX_set_default_passwd_cb, so I had the idea of “chopping” DRb’s
start_service so that the password would be read only once, in the
parent. By using continuations, I could then fork my children and have
each of them finish start_service, with the certificate already
unlocked.
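The same "unlock once in the parent" goal can be sketched without continuations. The sketch decrypts the key once with OpenSSL::PKey::RSA.new(pem, passphrase), then forks, so every child inherits the already-decrypted key object. This is only one possible alternative, under stated assumptions. The generated key, PASSPHRASE, ENCRYPTED_PEM and children_can_use_key? are hypothetical stand-ins for a real key file and prompt. Wiring the key into DRb's SSL configuration is not shown.

```ruby
require 'openssl'

# Hypothetical stand-in for a passphrase-protected key file on disk.
PASSPHRASE = 'example-passphrase'   # assumption: really read once from the terminal
ENCRYPTED_PEM = OpenSSL::PKey::RSA.new(2048)
                  .to_pem(OpenSSL::Cipher.new('aes-256-cbc'), PASSPHRASE)

def children_can_use_key?(encrypted_pem, passphrase, workers: 2)
  # Parent: decrypt exactly once, before forking.
  key = OpenSSL::PKey::RSA.new(encrypted_pem, passphrase)

  pids = Array.new(workers) do
    fork do
      # Child: fork(2) copied the decrypted key; no passphrase is needed here.
      sig = key.sign(OpenSSL::Digest.new('SHA256'), 'payload')
      ok  = key.verify(OpenSSL::Digest.new('SHA256'), sig, 'payload')
      exit!(ok ? 0 : 1)
    end
  end

  # Every child must have been able to sign with the inherited key.
  pids.all? { |pid| Process.wait2(pid).last.success? }
end
```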

I realize there are several other things I could do other than the
above. However, I coded up my solution without realizing that you
can’t call a continuation across fork. Now I’m curious why it’s
disallowed.


Cliff M.

P.S., if anyone wants to see my proof-of-concept server, I don’t mind
sharing the code. The trick is to redefine
DRb::DRbTCPSocket.open_server_inaddr_any to save its context before
jumping to the context that does the forks. However, since this is
specific to the DRb implementation and since continuations across
forks aren’t actually allowed, my code is more of a curiosity to be
poked with a ten-foot pole.

On Fri, Sep 29, 2006 at 02:10:52AM +0900, Clifford T. Matthews wrote:

> I wrote some code that deliberately uses continuations that cross a
> fork boundary. When I went to test it, I was surprised by the error:
> “continuation called across trap”. I can think of a few reasons why
> people might not want to use continuations across fork, but not
> knowing Ruby internals particularly well, I couldn’t see a reason why
> it should be absolutely prohibited.

Probably because you’re attempting to jump to part of a completely
different process. Program counters can’t jump between processes like
that.

You’ll have to do what’s usually done: set up an IPC channel of some
kind between the parent and child processes (be it a pipe, a socket,
an mmap’d file object, SysV shared memory and semaphores, whatever)
and get them to talk to one another.
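Here is a minimal sketch of the kind of channel Keith describes. It uses an anonymous pipe created with IO.pipe before forking. pipe_round_trip is a hypothetical name, and a real server would exchange something more useful than an upcased string.

```ruby
# Child computes a result and reports it to the parent over a pipe.
def pipe_round_trip(message)
  reader, writer = IO.pipe
  pid = fork do
    reader.close                 # child only writes
    writer.write(message.upcase)
    writer.close
    exit!(0)
  end
  writer.close                   # parent only reads; required so read sees EOF
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end

puts pipe_round_trip("hello")    # prints HELLO
```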

> I realize there are several other things I could do other than the
> above. However, I coded up my solution without realizing that you
> can’t call a continuation across fork. Now I’m curious why it’s
> disallowed.

K.

“Keith” == Keith G. writes:

Keith> On Fri, Sep 29, 2006 at 02:10:52AM +0900, Clifford
Keith> T. Matthews wrote:
>> I wrote some code that deliberately uses continuations that
>> cross a fork boundary.  When I went to test it, I was surprised
>> by the error: "continuation called across trap".  I can think
>> of a few reasons why people might not want to use continuations
>> across fork, but not knowing Ruby internals particularly well,
>> I couldn't see a reason why it should be absolutely prohibited.

Keith> Probably because you're attempting to jump to part of a
Keith> completely different process. Program counters can't jump
Keith> between processes like that.

Thanks for the reply.

I can’t speak for the other operating systems, but I’m fairly familiar
with UNIX and Linux:

    FORK(2)            Linux Programmer’s Manual            FORK(2)

    fork() creates a child process that differs from the parent process
    only in its PID and PPID, and in the fact that resource utilizations
    are set to 0. File locks and pending signals are not inherited.

Of course that’s the fork system call and not the Ruby fork
implementation. However, in my use of continuations across fork,
there was no problem with the PC. My code ran fine, did exactly what
I expected it to do with the results I wanted. All I had to do was
disable the check in eval.c:

    if (th->thgroup != cont_protect) {
        rb_raise(rb_eRuntimeError, "continuation called across trap");
    }

In general, I’m reasonably familiar with memory layout, assembler,
interpreters, etc. In a previous life I did some non-trivial work in
emulation and reverse-engineering. That doesn’t preclude me from
overlooking the obvious, so if you still think I’m mistaken and can
elaborate on “Program counters can’t jump between processes like
that,” I’ll listen.

“Gary” == gwtmp writes:

Gary> On Sep 28, 2006, at 4:49 PM, Keith G. wrote:

>> On Fri, Sep 29, 2006 at 02:10:52AM +0900, Clifford T. Matthews
>> wrote: Probably because you're attempting to jump to part of a
>> completely different process. Program counters can't jump
>> between processes like that.

Gary> I don't think he is trying to initiate a continuation in the
Gary> child from the parent or vice versa (right Clifford?). If
Gary> you create a continuation before the fork, and then fork,
Gary> both the parent and the child process will have duplicate
Gary> copies of the continuation.  You've cloned the process and
Gary> everything inside it including the continuations.

Right.

Gary> Having the child resume its continuation or the parent
Gary> resume its continuation should be ok. I'm surprised that it
Gary> didn't work.  It is either an oversight or something about
Gary> the way Ruby implements fork (exceptions or IO?), but I
Gary> don't think it is an inherent problem in the concept.

OK, since nobody has pointed out something obvious, I’ll ask on
ruby-core tomorrow. The portion of ruby that disallows the call is
the second if statement (from eval.c):

    static VALUE
    rb_cont_call(argc, argv, cont)
        int argc;
        VALUE *argv;
        VALUE cont;
    {
        rb_thread_t th = rb_thread_check(cont);

        if (th->thread != curr_thread->thread) {
            rb_raise(rb_eRuntimeError, "continuation called across threads");
        }
        if (th->thgroup != cont_protect) {
            rb_raise(rb_eRuntimeError, "continuation called across trap");
        }
        /* ... remainder of rb_cont_call ... */
    }

Presumably there’s a reason why that second check is there, but I’m
not sufficiently versed in Ruby internals to know why. It’s quite
possible that a different check could be used that would allow my use
of continuations but still disallow whatever bad thing the current
check is designed to catch.

On Sep 28, 2006, at 7:44 PM, Clifford T. Matthews wrote:

> Presumably there’s a reason why that second check is there, but I’m
> not sufficiently versed in Ruby internals to know why. It’s quite
> possible that a different check could be used that would allow my use
> of continuations but still disallow whatever bad thing the current
> check is designed to catch.

I also am not familiar with Ruby internals but now that I think about it
a little bit, it is probably there because of how signal handlers are
managed when transferring control between continuations and/or when
preparing to fork.

Gary W.

On Sep 28, 2006, at 4:49 PM, Keith G. wrote:

> On Fri, Sep 29, 2006 at 02:10:52AM +0900, Clifford T. Matthews wrote:
> Probably because you’re attempting to jump to part of a completely
> different process. Program counters can’t jump between processes like
> that.

I don’t think he is trying to initiate a continuation in the child
from the parent or vice versa (right Clifford?). If you create a
continuation before the fork, and then fork, both the parent and the
child process will have duplicate copies of the continuation. You’ve
cloned the process and everything inside it including the
continuations.
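Here is a small sketch of the copying described above. It uses a lambda rather than a continuation, because whether a pre-fork continuation may be called in the child is exactly what is in question. fork_copies_state is a hypothetical name. The child sees objects created before fork and mutates its own copy, while the parent's copy is untouched.

```ruby
def fork_copies_state
  counter = [0]
  bump = -> { counter[0] += 1 }  # created before fork, so both processes get a copy
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    3.times { bump.call }        # mutates the child's copy only
    writer.write(counter[0].to_s)
    writer.close
    exit!(0)
  end
  writer.close
  child_value = reader.read.to_i
  reader.close
  Process.wait(pid)
  [child_value, counter[0]]      # [value seen in child, value seen in parent]
end

p fork_copies_state              # prints [3, 0]
```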

Having the child resume its continuation or the parent resume its
continuation should be ok. I’m surprised that it didn’t work. It is
either an oversight or something about the way Ruby implements fork
(exceptions or IO?), but I don’t think it is an inherent problem in
the concept.

Gary W.

“K.” == Keith G. writes:

K.> Ok, after reading the rest of your reply, I think I may have
K.> misunderstood you. Could you give us a minimal fragment of
K.> code to trigger this? It could be an OS-specific problem.

This isn’t necessarily the minimum, and it’s obviously not anything
useful and can be rewritten, but this will cause the error:

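    #!/usr/bin/env ruby
    $VERBOSE = true

    # Ruby 1.9+ moved callcc into a separate library; on 1.8 it is built in.
    begin
      require 'continuation'
    rescue LoadError
    end

    i = 0

    class X
      def self.c
        @@c
      end

      def self.c=(val)
        @@c = val
      end
    end

    callcc do |z|
      d = z
      callcc do |cont|
        X.c = cont        # remember a point we can jump back to later
      end
      d.call if i < 2     # resume right after the outer callcc (at i += 1)
      exit
    end

    i += 1
    fork do
      puts "child"
      X.c.call            # raises "continuation called across trap" on 1.8
    end
    puts "parent"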

The entire toy server that actually uses my technique is about 100
lines, with comments. I don’t mind posting it.

On Fri, Sep 29, 2006 at 06:10:58AM +0900, Clifford T. Matthews wrote:

> Keith> Probably because you're attempting to jump to part of a
> Keith> completely different process. Program counters can't jump
> Keith> between processes like that.
>
> Thanks for the reply.

Ok, after reading the rest of your reply, I think I may have
misunderstood you. Could you give us a minimal fragment of code to
trigger this? It could be an OS-specific problem.

K.

unknown wrote:

> Curious person that I am, I removed the test at the beginning of
> rb_cont_call in eval.c, and my program ran like it was supposed to.
> If I were more familiar with Ruby’s internals, I’d be asking my
> question on ruby-core, but I figure that if there’s a super obvious
> reason why continuations aren’t allowed across fork, someone here
> would be able to point it out.

If you felt inclined enough to hack the Ruby source to test your
assumptions, then I’m sure you qualify to ask questions on
ruby-core! :-)

Nic

Clifford T. Matthews wrote:

> I wrote some code that deliberately uses continuations that cross a
> fork boundary. When I went to test it, I was surprised by the error:
> “continuation called across trap”. I can think of a few reasons why
> people might not want to use continuations across fork, but not
> knowing Ruby internals particularly well, I couldn’t see a reason why
> it should be absolutely prohibited.

I played with this some time ago (one or two years back), and I
also ran into the “continuation called across trap” error.
However, I also tried the 1.9 branch… and cross-fork continuations
worked there! Hopefully they still do; I suggest you give it a try.

Csaba

“Csaba” == csaba writes:

Csaba> Clifford T. Matthews wrote:
>> I wrote some code that deliberately uses continuations that
>> cross a fork boundary.  When I went to test it, I was surprised
>> by the error: "continuation called across trap".  I can think
>> of a few reasons why people might not want to use continuations
>> across fork, but not knowing Ruby internals particularly well,
>> I couldn't see a reason why it should be absolutely prohibited.

Csaba> I played with this some time ago (one or two years back),
Csaba> and I also ran into the "continuation called across trap"
Csaba> error.  However, I also tried the 1.9 branch... and
Csaba> cross-fork continuations worked there! Hopefully they still
Csaba> do; I suggest you give it a try.

ruby 1.9.0 (2006-10-02) from CVS HEAD gives the same “continuation
called across trap” RuntimeError.

Friday afternoon I did a little more research and then posted to
ruby-core, but, not surprisingly for such a low-priority esoteric
matter, so far nobody has replied over there.

I think the error is a side-effect of code that was put in place to
prevent truly illegal uses of continuations. E.g., before the checks
were in place, it was possible to define a continuation within a
signal handler and then call it from outside the signal handler.