Basic threading question: can ruby use real threads?

kyle_s · May 11, 2007, 11:54pm

On Sat, 12 May 2007 03:46:58 +0900, Joel VanderWerf
[email protected] wrote:

This is disturbing.

Is #timeout inherently unsafe, if it is implemented as a thread, even in
MRI ruby’s green threads?

Correct. #timeout as presently implemented is NEVER safe to use.

What if there were two kinds of ensure clauses, one which is
uninterruptible (to be used only for cleanup that is deterministic) and
one which is interruptible (and not guaranteed to finish)?

Nope, that’s still not sufficient. The fundamental problem is that
other threads can arbitrarily mess with control flow by injecting
exceptions via Thread#raise. There’s no way to guarantee invariants
will be preserved, no matter how clever your ensure implementation is.

-mental
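
To make the point about Thread#raise concrete, here is a minimal sketch (not code from the thread). The `log` array, the `in_cleanup` queue and the bare `sleep` standing in for slow cleanup are all illustrative assumptions. It shows an exception injected from another thread landing inside an ensure clause, so the cleanup never finishes:

```ruby
log = []
in_cleanup = Queue.new # hypothetical signal: "worker is now inside ensure"

worker = Thread.new do
  Thread.current.report_on_exception = false # keep stderr quiet for the demo
  begin
    log << :resource_acquired
  ensure
    in_cleanup << true
    sleep                    # stand-in for cleanup that blocks
    log << :resource_released # never reached in this run
  end
end

in_cleanup.pop                                    # wait until the worker is in its ensure clause
worker.raise(RuntimeError, "injected from outside")

begin
  worker.join # re-raises the exception that killed the worker
rescue RuntimeError => e
  log << "worker died: #{e.message}"
end

p log
```

The log ends up without `:resource_released`: the ensure clause was entered but interrupted partway through.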

kyle_s · May 11, 2007, 11:44pm

From: “Joel VanderWerf” [email protected]

Ruby gives you a lot of freedom to do anything you want inside of ensure
clauses, and I guess this means that ensure clauses can’t be given
special treatment–the ensure clause itself might be what needs to be
interrupted by the timeout. That seems to rule out treating ensure
clauses as a critical section, for example.

Yeah… I agree in general. However, . . .

What if there were two kinds of ensure clauses, one which is
uninterruptible (to be used only for cleanup that is deterministic) and
one which is interruptible (and not guaranteed to finish)?

I’d probably settle for a Thread#raise_asap (or #raise_safe, or
whatever.) Because in theory, even though one could write
unfriendly code in an ensure block that took a long time to execute; in
practice all the ensure blocks I can recall seeing were doing what
appeared to be very simple, deterministic cleanup.

With that, it would be trivial to implement a Timeout#timeout_safe; and
I suspect it would be rare indeed when someone would need or want
to use the unsafe version…

What’s the best practice in current MRI ruby? Use Timeout only in cases
where you know it is safe, and otherwise use #select timeouts or
whatever else is appropriate?

That’s pretty much my current practice. I use Timeout sparingly, and am
extremely careful with it.

Regards,

Bill
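
As a rough illustration of the "#select timeouts" practice mentioned above, the sketch below waits for readability with IO.select instead of wrapping the read in Timeout. The helper name `read_with_timeout` and the pipe used to exercise it are assumptions made for the example:

```ruby
# Hypothetical helper: wait up to `seconds` for `io` to become readable.
# Returns the data read, or nil if nothing arrived in time.
def read_with_timeout(io, seconds, maxlen = 4096)
  ready = IO.select([io], nil, nil, seconds) # nil means the wait timed out
  return nil unless ready
  io.readpartial(maxlen)
end

reader, writer = IO.pipe

first = read_with_timeout(reader, 0.05)  # nothing written yet, so this times out
writer.syswrite("hi")
second = read_with_timeout(reader, 0.05) # data is waiting, so this returns it

p first
p second
```

No exception is injected into anyone's control flow here; the timeout is just an ordinary return value.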

kyle_s · May 12, 2007, 12:08am

MenTaLguY wrote:
…

What if there were two kinds of ensure clauses, one which is
uninterruptible (to be used only for cleanup that is deterministic) and
one which is interruptible (and not guaranteed to finish)?

Nope, that’s still not sufficient. The fundamental problem is that other threads can arbitrarily mess with control flow by injecting exceptions via Thread#raise. There’s no way to guarantee invariants will be preserved, no matter how clever your ensure implementation is.

But what if we limit that arbitrary power by adding a new construct?

Suppose thread1 is executing this code:

begin
ensure_uninterruptible # not in ruby yet
  # quick and deterministic cleanup code
end

and suppose that thread2 does this:

thread1.raise

Can’t we implement ensure_uninterruptible in such a way that
thread1.raise waits until thread1 finishes the clause (or maybe raises
an error in thread2 instead of in thread1)?

I’m not sure it’s a good idea, but is there a reason it couldn’t be done
with green threads?
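
For readers on later Rubies: Thread.handle_interrupt (added in Ruby 2.0, well after this thread) behaves much like the "thread1.raise waits" variant described above. The sketch below is only an illustration; the queues and log entries are assumptions made so the ordering is deterministic:

```ruby
log = []
in_cleanup = Queue.new # hypothetical signal: "worker is inside the protected cleanup"
finish = Queue.new     # hypothetical signal: "worker may finish its cleanup now"

worker = Thread.new do
  Thread.current.report_on_exception = false
  begin
    log << :working
  ensure
    # Exceptions sent with Thread#raise are queued while this block runs.
    Thread.handle_interrupt(Exception => :never) do
      in_cleanup << true
      finish.pop          # blocks, but the pending raise stays deferred
      log << :cleanup_done
    end
    # The queued exception is delivered once the block above has finished.
  end
end

in_cleanup.pop
worker.raise(RuntimeError, "injected") # arrives while cleanup is in progress
finish << true

begin
  worker.join
rescue RuntimeError
  log << :raised_after_cleanup
end

p log
```

This protects only the cleanup itself. It does nothing about an exception that arrives just before the protected block is entered.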

kyle_s · May 11, 2007, 11:59pm

On Sat, 12 May 2007 06:43:39 +0900, “Bill K.” [email protected] wrote:

I’d probably settle for a Thread#raise_asap (or #raise_safe, or
whatever.) Because in theory, even though one could write
unfriendly code in an ensure block that took a long time to execute; in
practice all the ensure blocks I can recall seeing were doing what
appeared to be very simple, deterministic cleanup.

No, that’s still not sufficient. If an exception can be injected at an
arbitrary point by an external source, there’s simply no way to write
sane code.

The only safe model for inter-thread communication is one where both
participants agree to communicate. You’d need something like a
Thread#receive_exception that was explicitly called on the receiving
side – of course, that’s not too helpful for the purposes to which
Thread#raise is usually put.

The fundamental problem is really that most of Ruby’s blocking
operations don’t have allowance for timeout built into their API, which
is particularly sad since they’re all built on top of select (in MRI,
anyway) and that would be really easy to do. The whole #timeout thing
is an unsafe workaround which can never be made safe by its very nature.

-mental
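
A small sketch of the "both participants agree to communicate" model described above: the worker is never interrupted, it only checks for a cancellation request at points it chooses. `CancelToken`, `process` and the two queues are hypothetical names invented for this example:

```ruby
# Hypothetical cancellation flag shared between two threads.
class CancelToken
  def initialize
    @mutex = Mutex.new
    @cancelled = false
  end

  def cancel!
    @mutex.synchronize { @cancelled = true }
  end

  def cancelled?
    @mutex.synchronize { @cancelled }
  end
end

# Doubles each item, checking the token only between items.
def process(items, token)
  done = []
  items.each do |item|
    break if token.cancelled? # the only place cancellation is observed
    done << item * 2
    yield item if block_given?
  end
  done
end

token = CancelToken.new
progress = Queue.new # worker -> main: "finished this item"
resume = Queue.new   # main -> worker: "carry on"

worker = Thread.new do
  process([1, 2, 3, 4], token) do |item|
    progress << item
    resume.pop
  end
end

progress.pop  # the worker has finished item 1
token.cancel! # request cancellation; nothing is injected into the worker
resume << true

result = worker.value
p result
```

The worker stops after the first item, with its own data structures in a state it chose.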

kyle_s · May 12, 2007, 12:48am

From: “MenTaLguY” [email protected]

On Sat, 12 May 2007 06:43:39 +0900, “Bill K.” [email protected] wrote:

I’d probably settle for a Thread#raise_asap (or #raise_safe, or
whatever.) Because in theory, even though one could write
unfriendly code in an ensure block that took a long time to execute; in
practice all the ensure blocks I can recall seeing were doing what
appeared to be very simple, deterministic cleanup.

No, that’s still not sufficient. If an exception can be injected at an arbitrary
point by an external source, there’s simply no way to write sane code.

I thought what I was proposing was to limit the current arbitrariness.

When I write code, I’m used to considering that any method call I make
may raise an exception. (Including exceptions like NoMemoryError and
Interrupt.)

If we could prevent Thread#raise from happening within an ensure block,
and we could guarantee that an assignment to a variable would similarly
be non-interruptible (meaning, an exception can’t be raised between
the point where a method call completes, and its result is assigned to
a variable) … wouldn’t that be getting us pretty close to being able to
write “sane” code?

Or am I just missing out on something fundamental? (If so, I’m most
definitely interested to learn.)

Regards,

Bill

kyle_s · May 12, 2007, 1:18am

On Sat, 12 May 2007 07:47:57 +0900, “Bill K.” [email protected] wrote:

If we could prevent Thread#raise from happening within an ensure block,
and we could guarantee that an assignment to a variable would similarly
be non-interruptible (meaning, an exception can’t be raised between
the point where a method call completes, and its result is assigned to
a variable) … wouldn’t that be getting us pretty close to being able to
write “sane” code?

Closer, but in this context “safe” is an all-or-nothing proposition.
Simply making variable assignments atomic falls far, far short of what’s
needed. What you’re actually groping towards is atomic transactions –
the ability to take an arbitrary block of code and say “if this block
does not complete successfully, any of its effects should be rolled back
before propagating the exception”. Of course, not all effects (e.g. IO)
can be rolled back, so you’re still not entirely safe in that case.

Or am I just missing out on something fundamental? (If so, I’m most
definitely interested to learn.)

Yes. You have to be worried about all of the code involved (e.g. also
the arbitrarily complex implementations of any methods you call), not
just your own immediate code.

-mental
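
One cheap approximation of the "roll back on failure" idea above is to do all the work on a private copy and publish it with a single assignment at the end. This is only a sketch with a hypothetical `Inventory` class; it covers in-memory state, not IO, and it is not a full transaction system:

```ruby
# Hypothetical container whose updates are all-or-nothing.
class Inventory
  attr_reader :items

  def initialize(items)
    @items = items.dup.freeze
  end

  def update
    draft = @items.dup    # unfrozen working copy
    yield draft           # may raise partway through
    @items = draft.freeze # commit: one reference assignment
    self
  end
end

inventory = Inventory.new(%w[a b])

begin
  inventory.update do |draft|
    draft.clear
    raise "failed halfway" if draft.empty?
    draft << "c"
  end
rescue RuntimeError
  # the half-finished draft is simply discarded
end
after_failure = inventory.items

inventory.update { |draft| draft << "c" }
after_success = inventory.items

p after_failure
p after_success
```

The failed update leaves the original contents untouched because only the draft was modified.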

kyle_s · May 12, 2007, 1:26am

On Sat, 12 May 2007 07:08:05 +0900, Joel VanderWerf
[email protected] wrote:

Can’t we implement ensure_uninterruptible in such a way that
thread1.raise waits until thread1 finishes the clause (or maybe raises
an error in thread2 instead of in thread1)?

Sure. But it’s still not enough – you’ll notice that the examples I
gave earlier were concerned with race conditions around entry of the
protected section itself, not even the ensure clause.

What you actually need to do is make uninterruptibility the universal
default; interruptibility at a specific point must be specifically
allowed for.

-mental
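
Later Ruby versions (2.0 and up) let a thread opt in to roughly this arrangement with Thread.handle_interrupt: mask everything by default, then open a window where interruption is allowed. The sketch below is illustrative only; the log entries and the `started` queue are assumptions:

```ruby
log = []
started = Queue.new # hypothetical signal: "worker is in its interruptible window"

worker = Thread.new do
  Thread.current.report_on_exception = false
  # Default for this thread's work: Thread#raise is deferred.
  Thread.handle_interrupt(Exception => :never) do
    log << :acquired
    begin
      # The one place where interruption is explicitly allowed.
      Thread.handle_interrupt(Exception => :immediate) do
        started << true
        sleep # stand-in for long-running, interruptible work
      end
    ensure
      log << :released # runs with interrupts masked again
    end
  end
end

started.pop
worker.raise(RuntimeError, "stop")

begin
  worker.join
rescue RuntimeError
  log << :worker_stopped
end

p log
```

Acquisition and release both run masked, so the exception can only land inside the window the worker opened for it.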

kyle_s · May 12, 2007, 1:39am

On Sat, 12 May 2007 08:25:28 +0900, MenTaLguY [email protected] wrote:

What you actually need to do is make uninterruptibility the universal
default; interruptibility at a specific point must be specifically allowed
for.

Another way to do this is to work in terms of atomic transactions (e.g.
STM, if your STM implementation is itself safe in the face of
asynchronous exceptions).

-mental

kyle_s · May 12, 2007, 2:02am

On Sat, 12 May 2007 08:40:43 +0900, “Bill K.” [email protected] wrote:

If I happen to be calling other routines that aren’t written to handle
exceptions safely, well, yeah, that sucks for me.

That goes for most of core and stdlib, though. At least in the face of
asynchronous exceptions. Have a look at Set#replace, for instance.

But if one method can be written to be safe,

I’m not sure any non-trivial method can be. Additionally, if we’re
dealing with a non-green-threaded case where an exception can be
delivered at any time (not just at set scheduling points), it’s really
not possible at all.

Rather than my continuing to make bald assertions, though, if you’d like
to provide a code sample I can probably illustrate what I mean.

-mental
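
To show the kind of hazard meant by the Set#replace remark, here is a hypothetical `TinySet` whose `replace` clears and then refills. It is not the stdlib implementation, and the exception in the demo is raised by the source enumerator as a stand-in for one injected by another thread:

```ruby
# Hypothetical set with a clear-then-refill replace.
class TinySet
  def initialize(items = [])
    @hash = {}
    items.each { |item| add(item) }
  end

  def add(item)
    @hash[item] = true
    self
  end

  def to_a
    @hash.keys
  end

  def replace(enum)
    @hash.clear                    # old contents are gone from here on
    enum.each { |item| add(item) } # an exception here leaves a partial set
    self
  end
end

set = TinySet.new([1, 2, 3])

# Yields one element, then fails partway through.
failing = Enumerator.new do |yielder|
  yielder << 10
  raise "interrupted"
end

begin
  set.replace(failing)
rescue RuntimeError
  # the caller sees the exception, but the set is already changed
end

p set.to_a
```

The set now holds neither its old contents nor the complete new ones.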

kyle_s · May 12, 2007, 1:41am

From: “MenTaLguY” [email protected]

Or am I just missing out on something fundamental? (If so, I’m most
definitely interested to learn.)

Yes. You have to be worried about all of the code involved (e.g. also the
arbitrarily complex implementations of any methods you call), not just your
own immediate code.

Oh. That’s different.

I was just looking for whether we could arrive at constraints that would
allow a given routine to be written in a way that 100% safely handled
these new hypothetical Thread#raise semantics.

If I happen to be calling other routines that aren’t written to handle
exceptions safely, well, yeah, that sucks for me.

But if one method can be written to be safe, why can’t they all? Why,
then, would I have to worry about anything other than my own immediate
code, unless I am assuming the code I’m calling is broken WRT exception
handling, in which case… well, it’s broken. But that’s different.

?

Regards,

Bill

kyle_s · May 13, 2007, 7:36pm

From: “Bill K.” [email protected]

RUBY NEEDS SMART PTR SEMANTICS

Ugh, sorry for the noise.

I was having a weird day yesterday.

kyle_s · May 13, 2007, 10:50am

From: “MenTaLguY” [email protected]

Rather than my continuing to make bald assertions, though, if you’d like
to provide a code sample I can probably illustrate what I mean.

Thanks,

Mainly this has happened when I wanted to do some sort of
timed-wait on a condition variable.

I’ve seen this concept recur as recently as this week…
Obviously if Ruby had the concept of timing out on a
mutex or cond-var wait…

But I’m sorry, I’ve gotten ahead of myself. I think, if we
could just … sort of, <3

Er, e hehe

Well my conceptualization was something along the lines
of what boost::shared_ptr does in C++; that is: there are
certain language semantics that are guaranteed…

Smart-ptrs only work in C++ because of language-given
semantics that describe exactly what will happen on an
assignment following a function call, and dtors called
when a function exits, etc.

My feeling was, ensure blocks in ruby, could be similarly
described, such that, one could guarantee certain
semantics … similar to what’s guaranteed in C++ that
makes wrapped ptrs be 100% reliable as smart-ptrs.

RUBY NEEDS SMART PTR SEMANTICS

is a concept

worth-while?

Just a thought,

Regards,

Bill

kyle_s · June 4, 2007, 6:04pm

Kill and raise require the target thread to eventually reach a
checkpoint where it is willing to “listen” to the kill or raise
event. If it doesn’t, the calling thread will wait forever.
I think this is fair. I also think that the core developers may need to
think hard about what should be a checkpoint in the language itself
(for instance, end of a block, end of a method, whatever). For instance,
disallowing checkpoints inside an “ensure” context would fix the
ensure-related issues (but may not fix others that I don’t see).

My main use of Thread#raise is “returning” from a
ConditionVariable#wait. Can I assume this is seen as a checkpoint by
JRuby?

Sylvain J.
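
For the timed wait on a condition variable that comes up in this thread, later Ruby versions (1.9 and up) accept a timeout argument to ConditionVariable#wait, which can remove the need for Thread#raise in that case. The `Mailbox` class below is a hypothetical sketch of how that might be used:

```ruby
# Hypothetical mailbox with a timed take.
class Mailbox
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @items = []
  end

  def put(item)
    @mutex.synchronize do
      @items << item
      @cond.signal
    end
  end

  # Returns the next item, or nil if none arrives within `timeout` seconds.
  def take(timeout)
    deadline = now + timeout
    @mutex.synchronize do
      while @items.empty?
        remaining = deadline - now
        return nil if remaining <= 0
        @cond.wait(@mutex, remaining) # may wake early, hence the loop
      end
      @items.shift
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

box = Mailbox.new
nothing = box.take(0.05) # nobody has put anything, so this times out

producer = Thread.new { box.put(:job) }
something = box.take(1)
producer.join

p nothing
p something
```

The waiting thread decides for itself when to give up, so no other thread has to interrupt it.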

kyle_s · May 15, 2007, 3:04am

MenTaLguY wrote:

On Thu, 10 May 2007 18:45:47 +0900, Marcin R. [email protected] wrote:

As I mentioned earlier - easiest way to get REAL concurrency (java VM will
NOT use both processors - for a few reasons JavaVM ALWAYS uses one processor -

Have you got evidence for this? I do not believe it to be the case for a
non-green-threaded JVM.

The OP is incorrect. Java VMs always use all cores in the system, except
in a very few specialized VMs that are green threaded.

Even if we’re talking about only one thread of execution, there’s still
the GC thread which generally runs in parallel.

Charlie