Basic threading question: can ruby use real threads?


#1

I’ve read somewhere, and would love for it to be wrong, that ruby
doesn’t use real threads, that it handles its threads internally. Is
that true? If it is true, will that still be true when ruby 1.9, or
2.0 comes out?

For many systems this isn’t a big deal one way or the other, since
they only have one physical processor. Luckily(?) pretty much all my
systems have two procs. (Two real processors, not HT, but that’s a
debate for another day.) I’d like to write some threaded ruby code,
and have it spread across my cpus, share data structures etc.

I’m used to pthreads on UNIX systems :) so I’d really like it if I
could do the same type of things I’ve done before, just in a rubyish
sort of way. Setting up a shared memory area and all that jazz that
you had to do for forking really doesn’t sound like fun, especially
when the point of the code I wanna write is for fun.

Thanks,
Kyle


#2

On May 8, 2007, at 3:52 PM, Kyle S. wrote:

I’ve read somewhere, and would love for it to be wrong, that ruby
doesn’t use real threads, that it handles its threads internally. Is
that true? If it is true, will that still be true when ruby 1.9, or
2.0 comes out?

This was recently discussed in detail by the creators:

http://blog.grayproductions.net/articles/2007/04/27/the-ruby-vm-episode-iii

James Edward G. II


#3

Sweet, thanks for the link!


#4

OK, so I’m reading that article, and I’m getting three things from it:
YARV uses native threads.
YARV doesn’t run them simultaneously.
YARV will eventually run them simultaneously.

Good enough for me, I’ll just hope that writing threaded code doesn’t
change too much with ruby2.0/YARV.

–Kyle


#5

On Tuesday 08 May 2007 21:34, Kyle S. wrote:

OK, so I’m reading that article, and I’m getting three things from it:
YARV uses native threads.
YARV doesn’t run them simultaneously.
YARV will eventually run them simultaneously.

Good enough for me, I’ll just hope that writing threaded code doesn’t
change too much with ruby2.0/YARV.

–Kyle

Well, you can use the fastthread gem (part of Mongrel).
You can also fork your script ^^ AFAIK, threads usually execute on the
same processor, which is why you have to fork your script if you want to
use 2 processors. If you need communication between the processes,
consider using DRb.

A very good gem is slave. It makes creating new processes super easy and
provides an easy way to communicate, so you can create 4-6 new processes,
each of which gets data to compute from the mother process, and they’ll
use both processors.

Sorry for the randomness and strange grammar; too much caffeine.
To summarize, read the rdoc for these gems:

  • fastthread(s)
  • slave(s)
    (I never remember if they are plural or singular)

#6

Quoting Kyle S. removed_email_address@domain.invalid:

I’ve read somewhere, and would love for it to be wrong, that ruby
doesn’t use real threads, that it handles its threads internally. Is
that true?

You have heard correctly and yes it is a pain.


#7

On Wednesday 09 May 2007 18:27, MenTaLguY wrote:

On Thu, 10 May 2007 01:00:04 +0900, Marcin R.
removed_email_address@domain.invalid wrote:

well you can use fastthreads gem (part of mongrel)

fastthread just makes the locking primitives from thread.rb a little
faster; it doesn’t otherwise affect the operation of Ruby threads.
Additionally, it is applicable only to Ruby 1.8, not YARV/1.9.

-mental

I didn’t say it makes use of POSIX threads; I just recommended it
because they are, well … faster.

The only thing right now that’ll let you use both processors is fork.


#8

If you fork, is there even a way to create objects that are shared
between the two forks? Or do you have to rely on rpc/ipc stuff
instead?

If someone were to… write a C extension whose objects were threaded,
via pthreads, would it be a nightmare?

Even just typing that line almost scares me…but I can think of some
clean(ish?) ways of doing it. I’m just worried I’d lose the rubyness
of the thing if I did it that way.

Thanks,
Kyle


#9

On May 9, 2007, at 2:57 PM, Marcin R. wrote:

I didn’t say it makes use of POSIX threads; I just recommended it
because they are, well … faster.

The only thing right now that’ll let you use both processors is fork.

Just my opinion but my default choice would be fork when I need
concurrency rather than threads. The main reason is that it forces you
to be explicit in how you structure the communication between processes.
One process can’t inadvertently change the state of another.
On a multi-processor box you’ll get IO multiplexing and real CPU
concurrency automatically with fork.
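
As a rough illustration of the fork approach described above, here is a minimal sketch, assuming a Unix-like system where Kernel#fork is available. The helper name parallel_map is hypothetical (it does not come from any gem mentioned in this thread). Each child computes its own chunk and sends the result back over a pipe with Marshal, so all communication between processes stays explicit.

```ruby
# Hypothetical helper: run the block on each chunk in its own child process.
def parallel_map(chunks)
  workers = chunks.map do |chunk|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      # The child's only way to report back is this pipe.
      writer.write(Marshal.dump(yield(chunk)))
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # the parent keeps only the read end
    [pid, reader]
  end

  workers.map do |pid, reader|
    result = Marshal.load(reader.read) # read blocks until the child closes its end
    reader.close
    Process.wait(pid)
    result
  end
end

totals = parallel_map([1..1000, 1001..2000]) { |range| range.inject(:+) }
puts totals.inject(:+)
```

Because each child has its own address space, nothing the block does to its objects is visible in the parent; only the marshalled return value comes back.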

Some problems can’t be partitioned easily into separate address spaces,
in which case threads are a better choice. Even then I might consider
using shared memory among cooperating processes first.

I realize that the Unix fork/exec model of processes doesn’t quite apply
in the Windows environment. Anecdotal evidence makes me think that
Windows programmers tend to reach for threads as a multi-tasking
solution more often than Unix programmers.

One more observation. The desire for real concurrency using multiple
processors is great for problems that can be cleanly partitioned, but if
you have a problem that requires concurrent access to shared data then
you’ll have to keep in mind the memory/cache contention that will be
created when processing is distributed across multiple processors (via
processes or threads).

Gary W.


#10

On Thu, 10 May 2007 04:03:57 +0900, “Kyle S.”
removed_email_address@domain.invalid wrote:

If you fork, is there even a way to create objects that are shared
between the two forks? Or do you have to rely on rpc/ipc stuff
instead?

Totally RPC. You could use DRb to do this in a Rubyesque fashion.
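
A minimal sketch of what that could look like, assuming a Unix-like system with fork and that the drb library is installed (it ships with Ruby, as a bundled gem on recent versions). The Counter class and the with_counter_server helper are hypothetical names for illustration. The object lives in the child process; the parent only holds a DRbObject proxy and every call is a remote call.

```ruby
require 'drb/drb'

# Hypothetical shared object; it only ever exists in the server process.
class Counter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @count += 1 }
  end
end

# Hypothetical helper: fork a DRb server, yield a proxy to it, then shut it down.
def with_counter_server
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    trap('TERM') { exit!(0) }
    DRb.start_service('druby://127.0.0.1:0', Counter.new) # port 0: pick a free port
    writer.puts DRb.uri # tell the parent where the service ended up
    writer.close
    DRb.thread.join
  end
  writer.close
  uri = reader.gets.chomp
  reader.close
  yield DRbObject.new_with_uri(uri)
ensure
  if pid
    Process.kill('TERM', pid)
    Process.wait(pid)
  end
end

with_counter_server do |counter|
  3.times { counter.increment }
  puts counter.increment # the count is kept in the child process
end
```

Arguments and return values here are plain integers, which DRb copies by value; passing richer objects across the boundary needs more care.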

It’s worth noting that no matter what threading approach you use, it’s
absolutely best to minimize the number of objects shared between
threads.
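
One way to keep sharing to a minimum, sketched here under the assumption that the work can be expressed as independent jobs: the threads exchange values only through Queue objects and never touch each other's state. The method name squares_via_queues and the :done sentinel are made up for this example.

```ruby
require 'thread' # needed for Queue on Ruby 1.8; harmless on later versions

# Hypothetical example: workers square numbers, sharing only two queues.
def squares_via_queues(numbers, worker_count = 2)
  jobs = Queue.new
  results = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      # Each worker pulls jobs until it sees the :done sentinel.
      while (n = jobs.pop) != :done
        results << n * n
      end
    end
  end

  numbers.each { |n| jobs << n }
  worker_count.times { jobs << :done } # one sentinel per worker
  workers.each(&:join)

  # All workers have finished, so the queue size is stable here.
  Array.new(results.size) { results.pop }.sort
end

p squares_via_queues([3, 1, 2])
```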

If someone were to… write a C extension whose objects were threaded,
via pthreads, would it be a nightmare?

Yes, somewhere between nightmare and flesh-rending terror. At least if
you’re planning on manipulating Ruby objects from each thread.

You might want to consider using JRuby instead. It’s compatible enough
with MRI that it runs Rails, and it uses “real” threads for multi-CPU
goodness.

-mental


#11

On Thu, 10 May 2007 01:00:04 +0900, Marcin R.
removed_email_address@domain.invalid wrote:

well you can use fastthreads gem (part of mongrel)

fastthread just makes the locking primitives from thread.rb a little
faster; it doesn’t otherwise affect the operation of Ruby threads.
Additionally, it is applicable only to Ruby 1.8, not YARV/1.9.

-mental


#12

On Thu, 10 May 2007 06:59:33 +0900, “Kyle S.”
removed_email_address@domain.invalid wrote:

does anyone know if I code on MRI will it automatically use real threads on JRuby,

Yes.

The APIs are the same between MRI and JRuby, though JRuby deliberately
hedges on the implementation of certain unsafe features like
Thread#kill, Thread#raise, and Thread.critical=.

-mental


#13

The APIs are the same between MRI and JRuby, though JRuby deliberately
hedges on the implementation of certain unsafe features like
Thread#kill, Thread#raise, and Thread.critical=.

Thread#raise, “unsafe”? It is the most useful thread-related
functionality I’ve seen since I started using threads! It allows, for
instance, handling failing rendezvous the proper way (by using
exceptions).

Could you tell us why you think it is “unsafe”?


#14

Manipulating ruby objects from inside the threads would be the idea in
some cases I’m thinking of… so it looks like JRuby until YARV gets
concurrent threads… and ooh do I hope it does.

Will the threading interface be drastically different between
MRI/JRuby/YARV? I.e., does anyone know if I code on MRI, will it
automatically use real threads on JRuby, or will I have to re-code
some parts to get that?

Thanks again,
Kyle


#15

From: “Sylvain J.” removed_email_address@domain.invalid

The APIs are the same between MRI and JRuby, though JRuby deliberately
hedges on the implementation of certain unsafe features like
Thread#kill, Thread#raise, and Thread.critical=.

Thread#raise, “unsafe”? It is the most useful thread-related
functionality I’ve seen since I started using threads! It allows, for
instance, handling failing rendezvous the proper way (by using
exceptions).

Could you tell us why you think it is “unsafe”?

Hi,

I’m not sure if this is what MenTaLGuY meant, but one way that
Thread#raise is unsafe, is that it can raise an exception in the
specified thread while that thread is executing an ‘ensure’ block.

This can cause a failure of critical resources to be cleaned up
correctly, such as locks on mutexes, etc., as some or all of the
code in the ensure block is skipped.

I first ran into this when I tried to use timeout{} to implement
a ConditionVariable#timed_wait, like:

    require 'thread'
    require 'timeout'

    class ConditionVariable
      def timed_wait(mutex, timeout_secs)
        # THIS IS UNSAFE: the exception may arrive inside wait's ensure blocks
        Timeout.timeout(timeout_secs) { wait(mutex) }
      end
    end

Note that ‘timeout’ functions by creating a temporary new thread
which sleeps for the duration, then raises an exception in the
‘current’ thread that invoked timeout.

If the timeout raises its exception at an unlucky moment, the
various internals of ConditionVariable#wait and Mutex#synchronize
that depend on ensure blocks to restore their class invariants are
skipped, resulting in nasty things like a permanently locked mutex.
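
For comparison, here is a sketch of a timed wait that does not rely on Thread#raise at all. It assumes a Ruby whose ConditionVariable#wait accepts a timeout argument (true of current versions, though not of the 1.8 series this thread is about) and Process.clock_gettime for a monotonic clock. The Mailbox class is a hypothetical example, not part of any library.

```ruby
# Hypothetical one-slot-at-a-time mailbox with a timed take.
class Mailbox
  def initialize
    @lock = Mutex.new
    @ready = ConditionVariable.new
    @items = []
  end

  def put(item)
    @lock.synchronize do
      @items << item
      @ready.signal
    end
  end

  # Returns the next item, or nil if none arrives within timeout_secs.
  def take(timeout_secs)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_secs
    @lock.synchronize do
      while @items.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        return nil if remaining <= 0
        # The wait itself times out; no other thread raises into this one.
        @ready.wait(@lock, remaining)
      end
      @items.shift
    end
  end
end
```

The loop re-checks the condition and the deadline after every wakeup, so spurious wakeups and early signals are both handled without interrupting any ensure block.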

Not fun… :(

Regards,

Bill


#16

On Wednesday 09 May 2007 19:20, Gary W. wrote:

One process can’t inadvertently change the state of another.
Windows programmers tend to reach for threads as a multi-tasking
Gary W.

As I mentioned earlier, the easiest way to get REAL concurrency is to use
the Slave gem. (The Java VM will NOT use both processors; for a few
reasons the Java VM ALWAYS uses one processor, and scaling, for example,
Tomcat in a production environment requires running 2-4 Java VMs.) I’m
using it in my project for concurrent parsing of logs. The DRb overhead
is not big, and what’s more, you can use it across a few machines if you
want to scale it further.

http://www.codeforpeople.com/lib/ruby/slave/slave-1.2.1/

Creating new forks is really easy: you can create just one class for the
data processing that can run concurrently, and everything else can be
done in the main program.


#17

Bill K. wrote:

Could you tell us why you think it is “unsafe” ?

Hi,

I’m not sure if this is what MenTaLGuY meant, but one way that
Thread#raise is unsafe, is that it can raise an exception in the
specified thread while that thread is executing an ‘ensure’ block.

And to make it clear, we do implement kill, raise, and critical=, with
the following limitations:

  • There are no guarantees all other threads will have stopped before
    critical= allows the current thread to continue executing.
  • Kill and raise require the target thread to eventually reach a
    checkpoint where it is willing to “listen” to the kill or raise
    event. If it doesn’t, the calling thread will wait forever.

I even made these operations a bit cleaner and faster in 0.9.9, but
there’s no way to do them perfectly with real concurrent threads.

- Charlie

#18

On Thu, 10 May 2007 15:40:57 +0900, Sylvain J.
removed_email_address@domain.invalid wrote:

Could you tell us why you think [Thread#raise] is “unsafe” ?

Because you have no control over when the exception is delivered, which
may be at the worst possible moment. Even ensure does not provide
adequate protection.

Consider what happens with this code if an exception happens to arrive
just before the begin block is processed:

    @counter += 1
    begin
      # ... do stuff ...
    ensure
      @counter -= 1
    end

Lest you think there’s an easy fix, consider what happens with this
second example if an exception arrives after the begin block is entered,
but before the counter has been incremented:

    begin
      @counter += 1
      # ... do stuff ...
    ensure
      @counter -= 1
    end
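
Later Ruby versions added Thread.handle_interrupt, which lets a thread defer asynchronous exceptions around exactly this kind of bookkeeping. The sketch below assumes a Ruby that has it (2.0 or later, so not the interpreters discussed in this thread); the Tracker class is a hypothetical wrapper around the counter example above.

```ruby
# Hypothetical wrapper around the @counter example, using Thread.handle_interrupt.
class Tracker
  attr_reader :active

  def initialize
    @active = 0
  end

  def track
    # Defer Thread#raise / Thread#kill while the counter is being adjusted.
    Thread.handle_interrupt(Exception => :never) do
      @active += 1
      begin
        # Allow asynchronous exceptions again only while the caller's work runs.
        Thread.handle_interrupt(Exception => :immediate) { yield }
      ensure
        @active -= 1 # runs with interrupts still deferred
      end
    end
  end
end
```

An exception raised from another thread can now land only inside the inner block, after the increment and before the ensure clause, so the increment and decrement stay paired.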

-mental


#19

On Thu, 10 May 2007 18:45:47 +0900, Marcin R.
removed_email_address@domain.invalid wrote:

As I mentioned earlier - easiest way to get REAL concurrency (java VM will
NOT use both processors - for few reasons JavaVM ALWAYS use one processor -

Have you got evidence for this? I do not believe it to be the case for a
non-green-threaded JVM.

-mental


#20

Bill K. wrote:

Could you tell us why you think it is “unsafe” ?

Note that ‘timeout’ functions by creating a temporary new thread
which sleeps for the duration, then raises an exception in the
‘current’ thread that invoked timeout.

If the timeout raises its exception at an unlucky moment, the
various internals of ConditionVariable#wait and Mutex#synchronize
that depend on ensure blocks to restore their class invariants are
skipped, resulting in nasty things like a permanently locked mutex.

Not fun… :(

This is disturbing.

Is #timeout inherently unsafe, if it is implemented as a thread, even in
MRI ruby’s green threads?

Ruby gives you a lot of freedom to do anything you want inside of ensure
clauses, and I guess this means that ensure clauses can’t be given
special treatment–the ensure clause itself might be what needs to be
interrupted by the timeout. That seems to rule out treating ensure
clauses as a critical section, for example. And it seems to rule out a
method like Thread#raise_unless_in_ensure or
Thread#raise_after_ensure_finishes.

What if there were two kinds of ensure clauses, one which is
uninterruptible (to be used only for cleanup that is deterministic) and
one which is interruptible (and not guaranteed to finish)?

What’s the best practice in current MRI ruby? Use Timeout only in cases
where you know it is safe, and otherwise use #select timeouts or
whatever else is appropriate?
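
For the I/O case, a select-based timeout might look like the sketch below. The helper name read_with_timeout and the :timeout return value are assumptions made for this example; IO.select and IO#readpartial are standard. Nothing raises into the waiting thread, so no ensure block can be cut short.

```ruby
# Hypothetical helper: wait up to timeout_secs for io to become readable.
# Returns the data read, :timeout if nothing arrived, or nil at end of file.
def read_with_timeout(io, timeout_secs, max_bytes = 4096)
  ready, = IO.select([io], nil, nil, timeout_secs) # nil when the timeout expires
  return :timeout unless ready

  io.readpartial(max_bytes) # returns whatever is available without waiting for more
rescue EOFError
  nil
end

reader, writer = IO.pipe
p read_with_timeout(reader, 0.1) # nothing written yet
writer.write("hello")
p read_with_timeout(reader, 0.1)
writer.close
reader.close
```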