Threads and Ruby

barjunk · June 30, 2008, 6:48am

I’ve been hunting around for information regarding threads, and to me,
it seems confusing and conflicting.

What I’m trying to find out is…if I was going to start using threads
in Ruby, which version of Ruby should I be using.

I’ve seen folks say that I should use Ruby 1.9 and others say that it
is possible to use earlier versions. Nothing that I found seemed
definitive.

I’m new to all this, so this may be part of the problem.

What I’d like to accomplish is starting a main ruby instance, then
launch threads from that instance that run in their own sandbox.

At this point, I don’t believe the threads need to talk with each
other, but it seems I could use some form of message passing to
accomplish this.

Any ideas and direction would be helpful. Thanks.

Mike B.

barjunk · June 30, 2008, 7:03am

On Jun 29, 2008, at 10:44 PM, barjunk wrote:

I’m new to all this, so this may be part of the problem.
Mike B.
for any situation you want processes. use fork or systemu if you want
it portable. threads are not the way to get a sandbox.

cheers.

a @ http://codeforpeople.com/
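
A minimal sketch of the fork approach suggested above, assuming a Unix-like platform where `Kernel#fork` is available (it is not on Windows). The helper name `run_sandboxed` and the `counter` variable are made up for illustration; the point is only that changes made in the child never reach the parent.

```ruby
# Hypothetical helper: run the given block in a forked child process and
# return the child's exit status. The child works on a copy of the parent's
# memory, so nothing it changes is visible to the parent afterwards.
def run_sandboxed
  pid = fork do
    yield
    exit!(0) # leave the child immediately, skipping at_exit handlers
  end
  Process.wait(pid)
  $?.exitstatus
end

counter = [0]
status = run_sandboxed { counter[0] += 100 }

puts "child exit status: #{status}"
puts "counter in parent: #{counter[0]}"
```

The parent still sees `counter[0]` as 0 after the child has added 100 to its own copy, which is the kind of isolation a thread cannot give you.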

barjunk · June 30, 2008, 7:07am

barjunk wrote:

I’m new to all this, so this may be part of the problem.
Mike B.
I haven’t used 1.9 much, but the impression I get is:

- use 1.9 if you need native threads (e.g. to take advantage of
  multiple processors, or blocking system calls)
- use 1.8 if you want in-process threads, which are lighter and pretty
  good for multiplexing io calls (using select()).
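
For readers who have not seen select()-style multiplexing, here is a rough sketch using `IO.select` directly. The three pipes and the sleeping writer threads are stand-ins invented for the example; real code would more likely be watching sockets or subprocess output.

```ruby
# Three pipes stand in for three slow I/O sources.
pipes = 3.times.map { IO.pipe }

# Writers run in background threads and finish at different times.
writers = pipes.each_with_index.map do |(_reader, writer), i|
  Thread.new do
    sleep(0.01 * (3 - i))
    writer.puts "message #{i}"
    writer.close
  end
end

pending  = pipes.map(&:first)
received = []

# One loop waits on every reader at once instead of blocking on a single one.
until pending.empty?
  ready, = IO.select(pending)
  ready.each do |io|
    line = io.gets
    if line
      received << line.chomp
    else
      io.close           # EOF: this source is finished
      pending.delete(io)
    end
  end
end

writers.each(&:join)
puts received.inspect
```

The order of `received` depends on which writer finishes first, so it should not be relied on.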

If the threads don’t need shared state, why not use fork instead of
threads? You can use DRb for IPC.
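
A small sketch of what fork plus DRb can look like, assuming a Unix-like system and that the `drb` library is installed (it ships with Ruby, as a bundled gem in recent versions). `SquareService` and the pipe used to hand the URI back to the parent are illustrative choices, not anything prescribed by DRb.

```ruby
require 'drb/drb'

# Hypothetical service object; any plain Ruby object can be the front object.
class SquareService
  def square(n)
    n * n
  end
end

uri_reader, uri_writer = IO.pipe

server_pid = fork do
  uri_reader.close
  trap('TERM') { exit!(0) } # quit quietly when the parent is done with us
  # Port 0 asks the OS for a free port; DRb.uri then reports the real one.
  DRb.start_service('druby://127.0.0.1:0', SquareService.new)
  uri_writer.puts DRb.uri
  uri_writer.close
  DRb.thread.join # keep serving requests until terminated
end

uri_writer.close
uri = uri_reader.gets.chomp
uri_reader.close

# The parent talks to the child process through a proxy object.
service = DRbObject.new_with_uri(uri)
answer  = service.square(7)
puts "7 squared, computed in the child: #{answer}"

Process.kill('TERM', server_pid)
Process.wait(server_pid)
```

The call to `service.square` looks like a local method call, but the multiplication runs in the forked process.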

barjunk · June 30, 2008, 7:09am

On Jun 29, 2008, at 11:00 PM, ara.t.howard wrote:

for any situation you want processes. use fork or systemu if you
want it portable. threads are not the way to get a sandbox.

or checkout the slave lib - it may be quite appropriate.

a @ http://codeforpeople.com/

barjunk · June 30, 2008, 11:29am

On Mon, Jun 30, 2008 at 9:04 AM, Joel VanderWerf wrote:

definitive.
Any ideas and direction would be helpful. Thanks.

If the threads don’t need shared state, why not use fork instead of threads?
You can use DRb for IPC.

–
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

really an advantage on multiple processors? Ruby 1.9 doesn’t use a GIL???

barjunk · June 30, 2008, 11:17am

ara.t.howard wrote:

for any situation you want processes. use fork or systemu if you want
it portable. threads are not the way to get a sandbox.

I hope you mean “for this situation”. Processes are definitely not the
solution to all problems.

Charlie

barjunk · June 30, 2008, 11:32am

On Mon, Jun 30, 2008 at 1:14 PM, Charles Oliver N. wrote:

Ruby doesn’t have real threads AFAIK, so DRb+fork is the only way
to get true parallel work.

barjunk · June 30, 2008, 6:57pm

Zhukov P. wrote:

On Mon, Jun 30, 2008 at 9:04 AM, Joel VanderWerf

I haven’t used 1.9 much, but the impression I get is:

use 1.9 if you need native threads (e.g. to take advantage of multiple
processors, or blocking system calls)
really an advantage on multiple processors? Ruby 1.9 doesn’t use a GIL???

You’re quite right.

barjunk · June 30, 2008, 1:13pm

2008/6/30 barjunk:

I’m new to all this, so this may be part of the problem.

What I’d like to accomplish is starting a main ruby instance, then
launch threads from that instance that run in their own sandbox.

At this point, I don’t believe the threads need to talk with each
other, but it seems I could use some form of message passing to
accomplish this.

As Ara said, in this case processes are the better choice for several
reasons: they have separation out of the box and they can make use of
multiple cores (which Ruby threads can’t unless you use JRuby - this
may change with future 1.9 versions AFAIK).

Kind regards

robert
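
To make the "processes can use multiple cores" point concrete, here is a rough sketch of a fork-based map, assuming a Unix-like system. `parallel_map` is a name invented for this example, and results must be Marshal-able to travel back over the pipe.

```ruby
# Hypothetical helper: run the block for each item in its own child process
# and collect the results in the parent, in the original order.
def parallel_map(items)
  children = items.map do |item|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      writer.write(Marshal.dump(yield(item)))
      writer.close
      exit!(0) # skip at_exit handlers in the child
    end
    writer.close
    [pid, reader]
  end

  children.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

sums = parallel_map([10, 100, 1000]) { |n| (1..n).sum }
puts sums.inspect
```

Each child is a separate OS process, so the operating system is free to schedule them on different cores. Whether that pays off depends on how much work each item does compared with the cost of forking.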

barjunk · June 30, 2008, 7:23pm

On Jun 30, 2008, at 3:14 AM, Charles Oliver N. wrote:

I hope you mean “for this situation”. Processes are definitely not
the solution to all problems.

well, given that the difference between processes and threads is an
incredibly small one for any modern os, and given that threads are at
least 100 times harder to write deterministic code for (as your bug reports
regarding exception handling and ruby illustrate) i’d hazard a guess
that processes are almost always the correct solution when robust
code is desired. in other words i’d take the position that one should
always use processes unless the reason becomes clear to use threads
and, of course, there are indeed reasons. this is mostly a comment on
the limitations of programmers and not on platforms or languages,
nevertheless the incredible ease of IPC with ruby makes it even more
true imho.

cheers.

a @ http://codeforpeople.com/

barjunk · June 30, 2008, 8:11pm

On Monday, June 30, 2008, Joel VanderWerf wrote:

You’re quite right.
A good article about it:
The Futures of Ruby Threading

barjunk · July 1, 2008, 2:59am

ara.t.howard wrote:

that processes are almost always the correct solution when robust code
is desired. in other words i’d take the position that one should always
use processes unless the reason becomes clear to use threads and, of
course, there are indeed reasons. this is mostly a comment on the
limitations of programmers and not on platforms or languages,
nevertheless the incredible ease of IPC with ruby makes it even more
true imho.

The fact that Ruby’s threading has many breakages and pitfalls does not
mean threading in general is the wrong way to fix things. Java threading
works extremely well, with the only real requirement that you must
either synchronize or avoid access to shared resources.
Power…responsibility…etc. You can’t damn threading because the
standard implementation of Ruby doesn’t do it well.

Perhaps you’re right that when you only have access to green threads
that processes are the right way to go, since green threads don’t really
gain you anything other than simulated asynchrony. But native threads
done right are as good as separate processes, with the bonus that you
can share fast in-memory access to resources if you’re willing to accept
the synchronization cost and complexity.

Charlie

barjunk · July 1, 2008, 3:00am

Robert K. wrote:

As Ara said, in this case processes are the better choice for several
reasons: they have separation out of the box and they can make use of
multiple cores (which Ruby threads can’t unless you use JRuby - this
may change with future 1.9 versions AFAIK).

This is an eventual goal, but I asked ko1 about it and such work has not
started yet. It will be hard.

Processes are probably better under Ruby, but it’s most definitely worth
trying threads under JRuby first.

Charlie

barjunk · July 1, 2008, 3:00am

Zhukov P. wrote:

Ruby doesn’t have real threads AFAIK, so DRb+fork is the only way
to get true parallel work.

JRuby has native threads that are really parallel.

Charlie

barjunk · July 1, 2008, 4:03am

On Jun 30, 2008, at 6:55 PM, Charles Oliver N. wrote:

really gain you anything other than simulated asynchrony. But native
threads done right are as good as separate processes, with the bonus
that you can share fast in-memory access to resources if you’re
willing to accept the synchronization cost and complexity.

yeah i agree 100% in principle. however i was programming java when
stopping threads suddenly became deprecated, which i know you know
all about, but for others

JDK 20 Documentation - Home

so doing something as simple as stopping a thread can be complicated.
i can kill a process and all resources will be returned to the
system. the fact that sun took quite a few years to figure this out,
and that matz ruby had the bugs you recently found beg the question:
if matz cannot do exceptions + threads right, if sun cannot get
stopping a thread right for years, what chance do i have of writing
code for, say, a web server that’s supposed to run 24x7? i think
modern languages are caving to the reality that most (aka average)
programmers simply cannot program threads safely and are increasingly
moving towards the message passing paradigm ousterhout has been raving
about for years.

now having said that, i very often use ruby threads but often do so in
a message passing fashion and even more often use those threads to
spawn processes and achieve parallelism so i definitely am glad they
are there (Thread.new{ curl } is ultra powerful). still, i can’t help
but feel they are destined to become relics - at least in the direct
fashion we use them now.

kind regards.

a @ http://codeforpeople.com/
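
A sketch of the pattern described above: threads used only to supervise external processes, with results handed back through a `Queue` rather than shared variables. To stay self-contained it launches the current Ruby interpreter via `RbConfig.ruby` instead of `curl`; that substitution and the job list are assumptions made for the example.

```ruby
require 'rbconfig'
require 'open3'

results = Queue.new
jobs    = [1, 2, 3]

# Each thread blocks on its own child process; the real work happens in
# those processes, so the OS can run them in parallel.
threads = jobs.map do |n|
  Thread.new do
    output, status = Open3.capture2(RbConfig.ruby, '-e', "puts #{n} * 10")
    results << [n, output.strip.to_i, status.success?]
  end
end
threads.each(&:join)

# The queue is the only thing the threads share with the main thread.
collected = Array.new(jobs.size) { results.pop }.sort
puts collected.inspect
```

The threads never touch each other's data, so no explicit locking is needed; `Queue` does its own synchronization.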

barjunk · July 1, 2008, 2:55pm

ara.t.howard wrote:

Perhaps you’re right that when you only have access to green threads
that processes are the right way to go, since green threads don’t
really gain you anything other than simulated asynchrony. But native
threads done right are as good as separate processes, with the bonus
that you can share fast in-memory access to resources if you’re
willing to accept the synchronization cost and complexity.

yeah i agree 100% in principle. however i was programming java when
stopping threads suddenly became deprecated, which i know you know all
about, but for others

The deprecation of thread stop, suspend, and exception raising was
implemented precisely because of the shared resource requirements. If
you can stop a thread in an environment where it may have been using
resources other threads will use, it’s impossible to know if those
resources have been cleaned up or released safely. Sure, you can stop a
process. The Java deprecations were done because it’s provably
impossible to share in-process resources and safely terminate threads at
will.
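
The usual alternative to stopping a thread from outside is to ask it to stop and let it finish at a safe point. The following is only a Ruby sketch of that idea, not the Java API under discussion; the `balance` hash and the `stop_requested` flag are invented for illustration.

```ruby
# Shared state with an invariant: the two values must always sum to 100.
balance        = { a: 50, b: 50 }
lock           = Mutex.new
stop_requested = false

worker = Thread.new do
  # The worker checks the flag only between complete updates, so it can
  # never be stopped halfway through moving a unit from :a to :b.
  until lock.synchronize { stop_requested }
    lock.synchronize do
      balance[:a] -= 1
      balance[:b] += 1
    end
    sleep 0.001
  end
end

sleep 0.05
lock.synchronize { stop_requested = true } # request, rather than force, a stop
worker.join

puts "total after shutdown: #{balance[:a] + balance[:b]}"
```

Forcing the thread to die between the two updates could leave the hash in a state where the invariant no longer holds, and no other thread would have a way of knowing.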

The same goes for shared out-of-process resources, but since it’s harder
to share out-of-process resources it’s harder to do serious damage. You
can still corrupt files, orphan processes, or leave sibling processes
waiting for data that will never arrive. You can even introduce exactly
the same race conditions common to threading if you want multiple
processes to perform atomic mutations of shared files or memory. If you
have a large interconnected app with lots of processes communicating or
using shared resources, arbitrarily nuking one of them can cause exactly
the same headaches. It’s a factor of resource sharing and
interdependency, rather than anything specific to threading over
processes.
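
As an illustration of processes needing the same care as threads once they share something, here is a sketch of several forked processes incrementing a counter stored in a file, assuming a Unix-like system. `increment_counter` is a hypothetical helper; the `flock` call is what makes each read-modify-write atomic with respect to the other processes.

```ruby
require 'tempfile'

# Hypothetical helper: atomically add one to the integer stored in a file.
def increment_counter(path)
  File.open(path, File::RDWR) do |f|
    f.flock(File::LOCK_EX) # other processes block here until we are done
    value = f.read.to_i
    f.rewind
    f.truncate(0)
    f.write((value + 1).to_s)
    f.flush
  end # closing the file releases the lock
end

counter_file = Tempfile.new('counter')
counter_file.write('0')
counter_file.close

pids = 4.times.map do
  fork do
    25.times { increment_counter(counter_file.path) }
    exit!(0)
  end
end
pids.each { |pid| Process.wait(pid) }

final_count = File.read(counter_file.path).to_i
counter_file.unlink
puts "final count: #{final_count}"
```

Without the exclusive lock, two processes can read the same value and both write back value + 1, losing an update in exactly the way unsynchronized threads would.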

now having said that, i very often use ruby threads but often do so in a
message passing fashion and even more often use those threads to spawn
processes and achieve parallelism so i definitely am glad they are there
(Thread.new{ curl } is ultra powerful). still, i can’t help but feel
they are destined to become relics - at least in the direct fashion we
use them now.

Probably not, but hopefully neither will typical IPC mechanisms, which
are almost as painful to get right and make reliable. Threads are a
low-level API, perhaps lower-level than day-to-day programmers should
generally have to deal with. But it’s absurd to say that processes can
do everything threads can, otherwise we’d have a massive process bloat
for almost every nontrivial application we use. Threads have a place,
though the ease with which resources can be shared often makes it a
dangerous place to go. Let’s not throw the threading baby out with the
shared resource bathwater.

Charlie

barjunk · July 1, 2008, 6:17pm

ara.t.howard wrote:

in fairness we’re talking about ruby here where that is definitely not
true. it’s extremely painless to have reliable ipc with ruby using drb
or com with sqlite as a message store.

Since DRb operates over a network it’s not reliable by definition; you
have to deal with the other end going away, etc. With COM, you’re either
going over a network or using same-machine IPC mechanisms that are only
a bit more reliable (or loading things in-process, which is then back to
threads). And with sqlite, you need to synchronize writes and possibly
reads or you need to hope sqlite will do that for you (I don’t know if
it does). And then you’re into locking, atomicity, etc.
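
A sketch of what "dealing with the other end going away" can look like with DRb, assuming the `drb` library is available. `remote_call` and its `fallback:` parameter are hypothetical. The example points at a port that was just closed, so that nothing is expected to be listening there.

```ruby
require 'drb/drb'
require 'socket'

# Hypothetical wrapper: call a remote method, but return a fallback value
# when the peer cannot be reached instead of letting the error propagate.
def remote_call(uri, method_name, *args, fallback: nil)
  DRbObject.new_with_uri(uri).__send__(method_name, *args)
rescue DRb::DRbConnError
  fallback
end

# Find a local port and release it again, so the URI points at no server.
probe    = TCPServer.new('127.0.0.1', 0)
dead_uri = "druby://127.0.0.1:#{probe.addr[1]}"
probe.close

result = remote_call(dead_uri, :square, 7, fallback: :unavailable)
puts "result: #{result.inspect}"
```

A real application would have to decide what a sensible fallback is (retry, queue the request, give up), which is the part that takes the effort.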

So for IPC or cross-“process” data comm or sharing, I think processes:

- give you fewer ways to shoot yourself in the foot
- the remaining ways are somewhat less likely to be dangerous
- but they mostly leave the options that are by and large the most
  complicated and the most prone to complete failure (e.g. external
  process goes away completely).

Meanwhile, threads

- give you many, many ways to shoot yourself in the foot
- sometimes with catastrophic consequences
- but you can turn the complexity knob down much lower

Choose wisely.

Charlie

barjunk · July 1, 2008, 6:37pm

Charles Oliver N. wrote:

Since DRb operates over a network it’s not reliable by definition; you
have to deal with the other end going away, etc. With COM, you’re either
going over a network or using same-machine IPC mechanisms that are only
a bit more reliable (or loading things in-process, which is then back to
threads). And with sqlite, you need to synchronize writes and possibly
reads or you need to hope sqlite will do that for you (I don’t know if
it does). And then you’re into locking, atomicity, etc.

Yes, sqlite does synchronize, but a potential problem is granularity: a
writer gets an exclusive lock on the entire db.

barjunk · July 1, 2008, 4:41pm

On Jul 1, 2008, at 6:51 AM, Charles Oliver N. wrote:

The Java deprecations were done because it’s provably impossible to
share in-process resources and safely terminate threads at will.

i think java is correct to have done so, precisely because people have
found it too hard to write safe code using those mechanisms but as you
point out, you can do the same with processes while no OS has limited
us yet. why? i think it’s because sharing data between threads, or
processes, is both dangerous and powerful. when someone builds an OS
that supports message passing we’ll see those operations limited on
processes too i bet but, for now, they are just too useful despite the
danger and do work ‘much’ of the time which, for better or worse,
seems to be the MO for many programming tasks.

Probably not, but hopefully neither will typical IPC mechanisms,
which are almost as painful to get right and make reliable.

in fairness we’re talking about ruby here where that is definitely not
true. it’s extremely painless to have reliable ipc with ruby using
drb or com with sqlite as a message store.

But it’s absurd to say that processes can do everything threads
can, otherwise we’d have a massive process bloat for almost every
nontrivial application we use.

for the record i’m not saying that - i’m saying that processes are a
better starting point for most people wanting to gain parallelism in
ruby for most problems. threads are appropriate at times too but, in
ruby, the disadvantages like lack of cpu migration, blocking the
entire process (in win), etc limit their usefulness - jruby excepted
of course (no jab, it’s a huge advantage jruby has over the mri)

kind regards.

a @ http://codeforpeople.com/

barjunk · July 1, 2008, 6:54pm

On Jul 1, 2008, at 10:33 AM, Joel VanderWerf wrote:

Yes, sqlite does synchronize, but a potential problem is
granularity: a writer gets an exclusive lock on the entire db.

but only very briefly, in practice the throughput is close to what you
can achieve with mutexes combined with the mri thread scheduler

File Locking And Concurrency In SQLite Version 3

one of the reasons this is true is that for a heavily threaded ruby
program (green threads) you end up with the entire process sometimes
blocked on io and the threads end up getting into a pattern where all
of them need to write at once - a kind of rhythm - with processes the
ability for the OS to schedule access to resources ends up staggering
the phase of execution so access is generally faster than it ‘ought’
to be taking only TPS into account.

this is a wild generalization based only on the kinds of parallel
processing i’ve done, but i’ve seen the pattern where a heavily
threaded program ends up being effectively serial enough times to
mention it…

a @ http://codeforpeople.com/