Ruby Threads

ReggW · May 27, 2006, 8:37am

What is the reason why Ruby doesn’t use native threads…at least on
Windows?

Thanks

ReggW · May 27, 2006, 9:09am

ReggW wrote:

What is the reason why Ruby doesn’t use native threads…at least on
Windows?

Green threads are the most portable, as threads differ
from one OS to another, I would think. At least, I’m
sure the Windows model isn’t the same as pthreads.

Ruby just doesn’t use native threads anywhere. If it did,
it would support them first on Linux, its primary
platform. (No flames, please – I’m just saying that
Matz develops on Linux, and the Windows port is derived
from that.)

It’s probably possible to write some kind of extension
to support native threads, but I would think it’s quite
a bit of work.

Hal

ReggW · May 27, 2006, 11:34am

2006/5/27, Hal F. [email protected]:

ReggW wrote:

What is the reason why Ruby doesn’t use native threads…at least on
Windows?

Green threads are the most portable, as threads differ
from one OS to another, I would think. At least, I’m
sure the Windows model isn’t the same as pthreads.

Yuck. And I believe Solaris is even another beast.

Ruby just doesn’t use native threads anywhere. If it did,
it would support them first on Linux, its primary
platform. (No flames, please – I’m just saying that
Matz develops on Linux, and the Windows port is derived
from that.)

Fl… Just kidding.

It’s probably possible to write some kind of extension
to support native threads, but I would think it’s quite
a bit of work.

I don’t believe this can be done by an extension alone. Threading is
intertwined with IO operations, uses longjmp etc. IMHO this would
amount to a rewrite of a significant portion of the interpreter. And
that’s probably also the reason why it does not happen for Ruby 1.x.

Kind regards

robert

ReggW · May 27, 2006, 1:29pm

Francis C. wrote:

It seems to me that Ruby’s green-thread implementation is perfectly
adequate for most programmers’ requirements.

But the problem is that it doesn’t take advantage of the new
multi-core processors that are now becoming the standard machines
being sold (at least for my customers).

I’m a newbie to Ruby and I really, really love it, but I think this
will become a serious issue for Ruby in the near future.

How do Python, Perl, and PHP handle this (if at all)?

Thanks

ReggW · May 27, 2006, 1:15pm

Regarding Solaris: its implementation of threads was what supplied the
API model for Posix threads, so you could say it’s as close to the
original sin as anything. Most of the important Unix-like systems
support the Posix model more or less well, but (apart from major
defects in some of the implementations), the key nonportabilities
relate to the scheduling discipline. And the world seems to have
arrived at a consensus that the “typical” scheduling discipline for
threads is pre-emptive, so these differences are no longer that
important.

Ironically, the Linux implementation of threads is closest to the one in
Windows, although the APIs couldn’t be more different. (Win32 had
kernel-scheduled threads from the earliest beta releases in 1992, at
least three years before the Posix API was standardized.) In both Linux
and Windows, threads are “lightweight processes,” relatively heavyweight
entities which are scheduled by the kernel. Ruby’s threads (and the
threads in the early Java implementations) are pure userland threads,
scheduled by a library inside your process. (Solaris uses an extremely
complex hybrid model which in my opinion has proven to be far more
trouble than it’s worth.)

Ruby’s threads are tightly intertwined with the interpreter logic
because Ruby must prevent the possibility that one of your threads
makes a system call that blocks in the kernel (like reading a disk
file or a network socket, accessing the system time, etc.) and thus
blocks every thread in your program. Ruby uses the I/O multiplexer
(select) to keep this from happening.
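
As a rough illustration of the select-based multiplexing described above (this is not the interpreter’s own scheduler code), the same call is available from Ruby as IO.select. The two pipes below are stand-ins invented for the example; in a real program they would be sockets or other slow descriptors.

```ruby
# Two pipes stand in for slow sockets or files (an assumption for illustration).
r1, w1 = IO.pipe
r2, w2 = IO.pipe

# Only the second pipe has data, so a blocking read on r1 would hang.
w2.write("hello from pipe 2")
w2.close

# IO.select returns the descriptors that can be read without blocking,
# or nil if the timeout (here 1 second) expires first.
ready, = IO.select([r1, r2], nil, nil, 1)

# Read only from the descriptors that select reported as ready.
messages = ready.map { |io| io.read }
puts messages.inspect
```

A green-thread scheduler can use the same idea: park any thread whose descriptor is not ready and run another one instead of letting the whole process block in the kernel.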

Threads can be used for two basic purposes: to make your programs run
faster, or to make them easier to write. Ruby’s (and Java’s) threads
seem designed primarily to facilitate the latter. You can easily imagine
several kinds of problems that are easier to model if you have access to
relatively independent flows of control. Thus both languages have the
“synchronize” method, taking an arbitrary code block, which makes it
easy to lock relatively large chunks of code in “critical sections”
without having to really design proper synchronization sets.

But to effectively use threads for higher performance and concurrency
requires a large amount of experience and understanding, much of which
takes platform dependencies into account. For just one example, I would
want to use a spin lock in some situations, if I’m running on a
multi-processor machine on certain hardware platforms. Ruby doesn’t
have one.

It seems to me that Ruby’s green-thread implementation is perfectly
adequate for most programmers’ requirements. What I think might be
interesting is an extension that would provide access to native threads
and synchronization primitives in parallel with Ruby’s (an early version
of the EventMachine library did this). Then you could write extensions
that were far more thread-hot than is possible with Ruby threads. It may
be possible to do this without disturbing the existing implementation.
If you wanted to mix Ruby threads with native threads, you’d just have
to be careful to use the native mutex rather than Ruby’s in your Ruby
threads.

ReggW · May 27, 2006, 1:51pm

But the problem is that it doesn’t take advantage of the new
multi-core processors that are now becoming the standard machines
being sold (at least for my customers).

THAT is absolutely correct, and insightful. Many people have noticed
that raw processor speeds aren’t increasing at nearly the same rate they
once did, and all the chip designers are going to some form of multicore
hardware. The most interesting one (to me at least) is the Cell, which
essentially requires a different programming model if you’re going to
get the most out of it.

There’s a great deal of controversy over this issue, with a lot of
people contending that C compilers bear most of the responsibility for
effective multicore scheduling. Speaking as an application programmer
who has also written a lot of compilers, I’m partially but not fully
convinced by this. I believe that significant changes in programming
methodology will be required in the future to write programs with
acceptable performance.

Python is IMO quite a bit more sophisticated than Ruby in handling
threads. I don’t rate Perl or PHP as serious contenders for thread-hot
development for several reasons. (Besides, they often run inside of
Apache processes, and Apache will naturally take some advantage of the
newer hardware because of its multiprocess nature.)

I come in for a lot of criticism because I don’t care for the way many
programmers are trained to use threads. But I think the shortcomings of
the typical approach to threaded programming that is encouraged by
languages like Ruby, Java and even Python will be far more deleterious
on the coming hardware than they are today. Ironically, Java may have an
edge because it has some deployment systems that can partition programs
into independently-schedulable pieces. I’d like to see something similar
for Ruby (and have opened a project (“catamount”) to do so) but it’s
still early.

ReggW · May 27, 2006, 4:32pm

Francis, you should consider writing a book on advanced programming
concepts. You’re a great communicator.

Michael

ReggW · May 27, 2006, 4:29pm

On May 27, 2006, at 7:50 AM, Francis C. wrote:

independently-schedulable pieces. I’d like to see something similar
for Ruby (and have opened a project (“catamount”) to do so) but it’s
still early.

Have you investigated or played around with the concurrency model
that Bertrand Meyer has written about for Eiffel? Last time I checked
it wasn’t implemented but it seemed like an interesting abstraction.

I do agree with you that it takes a lot of discipline to use threads
effectively. Many times it seems like a standard multi-process model
would work just as well as trying to play with fire in a shared
address space. Unix used to be known for its ‘cheap’ processes and
now everyone seems to think that process creation is monumentally
expensive.

The Plan 9 approach to the process/thread dichotomy is pretty
interesting also.

Sometimes I think language design in the real world has been held
back by the limitations of the two generally available OS frameworks
(Unix and Windows).

Gary W.

ReggW · May 27, 2006, 7:59pm

Francis C. wrote:
…

Threads can be used for two basic purposes: to make your programs run
faster, or to make them easier to write. Ruby’s (and Java’s) threads seem
designed primarily to facilitate the latter. You can easily imagine several
kinds of problems that are easier to model if you have access to relatively
independent flows of control. Thus both languages have the “synchronize”
method, taking an arbitrary code block, which makes it easy to lock
relatively large chunks of code in “critical sections” without having to
really design proper synchronization sets.

You probably mean “Thread.critical” or “Thread.exclusive”, and not
“synchronize”, at least in the context of ruby. (There is a
Mutex#synchronize and of course that does require you to think about
synchronization sets and ordering.)
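
For readers who haven’t used it, here is a minimal sketch of Mutex#synchronize guarding a shared counter. The variable names and the thread and iteration counts are made up for illustration; the point is only that the block passed to synchronize is the critical section.

```ruby
require 'thread' # needed on the Ruby 1.8 of this thread; harmless on later versions

counter = 0
lock = Mutex.new

# Four threads each bump the shared counter 1000 times.
threads = Array.new(4) do
  Thread.new do
    1000.times do
      # Only one thread at a time may run this block.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each { |t| t.join }
puts counter
```

With a single lock there is no ordering to get wrong. The design questions about synchronization sets and lock ordering start once a program holds more than one mutex at a time.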

But to effectively use threads for higher performance and concurrency
requires a large amount of experience and understanding, much of which
takes platform dependencies into account. For just one example, I would
want to use a spin lock in some situations, if I’m running on a
multi-processor machine on certain hardware platforms. Ruby doesn’t
have one.

Doesn’t have one and doesn’t need one, as long as threads are green.
But, someday, when ruby has native threads, it will need spin locks.

ReggW · May 27, 2006, 8:05pm

[email protected] wrote:
…

I do agree with you that it takes a lot of discipline to use threads
effectively. Many times it seems like a standard multi-process model
would work just as well as trying to play with fire in a shared
address space. Unix used to be known for its ‘cheap’ processes and
now everyone seems to think that process creation is monumentally
expensive.

Agree in general, but in the case of ruby, note that forking a ruby
process is more costly because of GC. In a short-lived child, GC can be
disabled to improve performance. [ruby-talk:186561]
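
A minimal sketch of that idea, assuming a Unix-like platform where fork is available: the child calls GC.disable before doing its work, then hands a result back over a pipe. The workload (summing a range) is a placeholder chosen for the example.

```ruby
# Fork a short-lived child and turn the garbage collector off inside it.
# (fork is not available on native Windows builds of Ruby.)
reader, writer = IO.pipe

pid = fork do
  reader.close
  GC.disable # the child exits soon, so it skips collection entirely
  total = (1..10_000).inject(0) { |sum, n| sum + n }
  writer.puts total
  writer.close
end

# Parent: close the unused write end, then collect the child's answer.
writer.close
result = reader.read.to_i
reader.close
Process.wait(pid)
puts result
```

This is only reasonable when the child is known to be short-lived, since with GC disabled its memory use grows until it exits.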

ReggW · May 27, 2006, 5:15pm

On Sat, 27 May 2006, ReggW wrote:

Francis C. wrote:

It seems to me that Ruby’s green-thread implementation is perfectly
adequate for most programmers’ requirements.

But the problem is that it doesn’t take advantage of the new multi-core
processors that are now becoming the standard machines being sold
(at least for my customers).

it’s a small problem. here is some code which starts two processes,
three if you count the parent. both run in separate processes using drb
as the ipc layer to make the communication painless. because the code
uses drb the communication is simple. because it uses multiple processes
it allows the kernel to migrate them to different cpus. the cost is
about 100 lines of pure-ruby (the slave lib). notice how easy it is for
the parent to communicate with a child and for children to communicate
with each other:

 harp:~ > cat a.rb
 require 'slave'
 require 'yaml'

 class ProcessA
   def initialize(b) @b = b end
   def process(n) @b.process(n * n) end
   def pid() Process.pid end
 end

 class ProcessB
   def process(n) n + 6 end
   def pid() Process.pid end
 end

 b = Slave.new(ProcessB.new).object
 a = Slave.new(ProcessA.new(b)).object

 y 'a.pid' => a.pid
 y 'b.pid' => b.pid

 y 'answer' => a.process(6)


 harp:~ > ruby a.rb
 ---
 a.pid: 15142
 ---
 b.pid: 15141
 ---
 answer: 42

this is one of those things that allows one to consider designs that
would be untenable in other languages. obviously using this approach it
would be trivial to set up a job that spawned 16 intercommunicating
processes, something which would be absurd to code in c.
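
For anyone without the slave library installed, the multi-process idea can be sketched with nothing but fork, IO.pipe and Marshal. This is a stand-in for illustration, not how slave or DRb work internally; spawn_worker is a hypothetical helper name, and it assumes a platform with fork.

```ruby
# Hypothetical helper: run the block in a forked child and return
# [pid, reader] so the parent can collect the result later.
def spawn_worker(arg)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.binmode
    # Send back the child's pid and the block's result in one Marshal dump.
    writer.write(Marshal.dump([Process.pid, yield(arg)]))
    writer.close
  end
  writer.close
  reader.binmode
  [pid, reader]
end

# Start all children first so the kernel can schedule them on separate CPUs.
workers = [1, 2, 3, 4].map { |n| spawn_worker(n) { |x| x * x } }

# Then collect each child's [pid, value] pair and reap the process.
results = workers.map do |pid, reader|
  child_pid, value = Marshal.load(reader.read)
  reader.close
  Process.wait(pid)
  [child_pid, value]
end

values = results.map { |_, value| value }
puts values.inspect
```

Unlike the DRb version, these workers answer once and exit. Long-lived children that talk to each other need a real server layer such as DRb.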

regards.

-a

ReggW · May 27, 2006, 8:21pm

On May 27, 2006, at 10:14 AM, [email protected] wrote:

[snip cool example using ‘slave’]
cremes$ gem list -b |grep slave
slave (0.0.0)
slave
cremes$ gem install slave
Attempting local installation of ‘slave’
Local gem file not found: slave*.gem
Attempting remote installation of ‘slave’
ERROR: While executing gem … (Gem::GemNotFoundException)
Could not find slave (> 0) in the repository

Ruh roh!

cr

Chuck R.
[email protected]
www.familyvideovault.com (not yet live!)

ReggW · May 27, 2006, 8:14pm

On May 27, 2006, at 2:03 PM, Joel VanderWerf wrote:

process is more costly because of GC. In a short-lived child, GC can
be disabled to improve performance. [ruby-talk:186561]

Interesting, thanks for the pointer.

Gary W.

ReggW · May 27, 2006, 8:27pm

On May 27, 2006, at 11:19 AM, [email protected] wrote:

Attempting remote installation of ‘slave’
ERROR: While executing gem … (Gem::GemNotFoundException)
Could not find slave (> 0) in the repository

Ruh roh!

cr

Chuck R.

http://codeforpeople.com/lib/ruby/slave/

-Ezra

ReggW · May 27, 2006, 9:04pm

You probably mean “Thread.critical” or “Thread.exclusive”, and not
“synchronize”, at least in the context of ruby. (There is a
Mutex#synchronize and of course that does require you to think about
synchronization sets and ordering.)

No, I mean Mutex#synchronize and its equivalents in Java and Python.
Proper synchronization design is a fine art, and highly hardware and OS
dependent. The simplicity of #synchronize encourages people not to learn
it very deeply. As I said upthread, the thread-support constructs
provided by Ruby, Python, Java and similar languages seem designed to
facilitate the goal of making threaded programming easier to do. This is
of course a fine goal in itself. But using threads to make programs
faster and more concurrent is a very different goal, one which IMO is
NOT well supported by Java or any of the agile languages.

Doesn’t have one and doesn’t need one, as long as threads are green.
But, someday, when ruby has native threads, it will need spin locks.

Fair enough as far as it goes. But green threads mean you can’t take
advantage of multiprocessor hardware at all. (Python has the same
shortcoming, but for a different reason.) So as long as we’re clear on
Ruby’s goals (grace and ease of cross-platform development) and its
non-goals (performance and scalability), you don’t need the more
powerful thread-handling constructs, and for now there’s nothing wrong
with that. But all of this changes when serious multicore hardware like
the Cell processors become the norm. At that point, we’ll all need to
get a lot better at programming multithreaded, multiprocess or
event-driven, and our language systems will have to evolve accordingly.

ReggW · May 27, 2006, 9:13pm

You’re making a very interesting point, one I’ve made many times: you’re
saying to write cooperative multiprocess rather than multithreaded
programs. If you take aggregate costs into account (including
time-to-market and lifecycle maintenance and support), this approach can
be far better than multithreaded because it’s so much more robust and
easier to do. Whether it’s as fast, however, is a highly hardware and
OS-dependent question. If you can specify multiprocessor or multicore
hardware, multiprocess software design has a clear edge, IMO. And in a
few years nearly all processors for general computation will be
multicore.

(This is a side point (and as we know, the side points always generate
the hottest flames), but I happen to disagree with your choice of DRb.
Not because of the communications model, but because distributed objects
are fundamentally problematic. I’d encourage you to look at multiprocess
event-driven systems. Watch for the upcoming pure-Ruby version of the
eventmachine library on Rubyforge; it will have built-in constructs to
explicitly support multiprocess event-driven programming.)

ReggW · May 27, 2006, 8:58pm

On 5/27/06, Francis C. [email protected] wrote:

(Solaris uses an extremely complex hybrid model
which in my opinion has proven to be far more trouble than it’s worth.)

Almost true. In Solaris 8, you can link with liblwp to get lightweight
process threads. In Solaris 9 and 10 (especially 10), you can just
use pthreads and you’ll be getting LWP threads.

-austin

ReggW · May 27, 2006, 10:23pm

On 5/27/06, Francis C. [email protected] wrote:

Python is IMO quite a bit more sophisticated than Ruby in handling threads.

Is this because Python uses native threads?

I don’t rate Perl or PHP as serious contenders for thread-hot development
for several reasons. (Besides, they often run inside of Apache processes,
and Apache will naturally take some advantage of the newer hardware because
of its multiprocess nature.)

I come in for a lot of criticism because I don’t care for the way many
programmers are trained to use threads. But I think the shortcomings of the
typical approach to threaded programming that is encouraged by languages
like Ruby, Java and even Python will be far more deleterious on the coming
hardware than they are today.

Do you think that threads are just the wrong model or metaphor?
For example, Io has the concept of Actors.

Ironically, Java may have an edge because it
has some deployment systems that can partition programs into
independently-schedulable pieces. I’d like to see something similar for Ruby
(and have opened a project (“catamount”) to do so) but it’s still early.

Given that fork’ing a new process is pretty cheap (on Linux, at least),
is that perhaps a better way to achieve concurrency for us in the
short term? (Of course there are lots of other issues then, like
sharing data between processes.)

…looking forward to hearing more about catamount.

Phil

ReggW · May 27, 2006, 10:36pm

On 5/27/06, [email protected] [email protected] wrote:

untenable in other languages. obviously using this approach it would be
trivial to set up a job that spawned 16 intercommunicating processes, something
which would be absurd to code in c.

Is your Slave code available? (Perhaps someone asked later; I miss
the newsgroup, where I would be able to tell more easily if they did.)

BTW, just curious: why are you require’ing YAML? Are you marshalling
in DRb with YAML instead of the builtin marshalling? If so, why? Is
it faster? (I wouldn’t think so.)

Phil

I really miss the gateway…

ReggW · May 27, 2006, 10:39pm

On 5/27/06, Francis C. [email protected] wrote:

And in a few years nearly all processors for
general computation will be multicore.

Yes, it’s happening pretty quickly.

(This is a side point (and as we know, the side points always generate the
hottest flames), but I happen to disagree with your choice of DRb. Not
because of the communications model, but because distributed objects are
fundamentally problematic.

Can you elaborate?

I’d encourage you to look at multiprocess
event-driven systems. Watch for the upcoming pure-Ruby version of the
eventmachine library on Rubyforge- it will have built-in constructs to
explicitly support multiprocess event-driven programming.)

Sounds interesting.

Phil

…still missing the gateway to c.l.r…