Forum: Ruby — Ruby Threads

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
ReggW (Guest)
on 2006-05-27 08:37
What is the reason why Ruby doesn't use native threads...at least on
Windows?

Thanks
Hal Fulton (Guest)
on 2006-05-27 09:09
(Received via mailing list)
ReggW wrote:
> What is the reason why Ruby doesn't use native threads...at least on
> Windows?
>

Green threads are the most portable, as threads differ
from one OS to another, I would think. At least, I'm
sure the Windows model isn't the same as pthreads.

Ruby just doesn't use native threads anywhere. If it did,
it would support them first on Linux, its primary
platform. (No flames, please -- I'm just saying that
Matz develops on Linux, and the Windows port is derived
from that.)

It's probably possible to write some kind of extension
to support native threads, but I would think it's quite
a bit of work.


Hal
Robert Klemme (Guest)
on 2006-05-27 11:34
(Received via mailing list)
2006/5/27, Hal Fulton <hal9000@hypermetrics.com>:
> ReggW wrote:
> > What is the reason why Ruby doesn't use native threads...at least on
> > Windows?
> >
>
> Green threads are the most portable, as threads differ
> from one OS to another, I would think. At least, I'm
> sure the Windows model isn't the same as pthreads.

Yuck.  And I believe Solaris is even another beast.

> Ruby just doesn't use native threads anywhere. If it did,
> it would support them first on Linux, its primary
> platform. (No flames, please -- I'm just saying that
> Matz develops on Linux, and the Windows port is derived
> from that.)

Fl...  Just kidding. :-)

> It's probably possible to write some kind of extension
> to support native threads, but I would think it's quite
> a bit of work.

I don't believe this can be done by an extension alone. Threading is
intertwined with IO operations, uses longjmp etc. IMHO this would
amount to a rewrite of a significant portion of the interpreter. And
that's probably also the reason why it does not happen for Ruby 1.x.

Kind regards

robert
Francis Cianfrocca (Guest)
on 2006-05-27 13:15
(Received via mailing list)
Regarding Solaris: its implementation of threads was what supplied the
API model for Posix threads, so you could say it's as close to the
original sin as anything. Most of the important Unix-like systems support
the Posix model more or less well, but (apart from major defects in some
of the implementations) the key nonportabilities relate to the scheduling
discipline. And the world seems to have arrived at a consensus that the
"typical" scheduling discipline for threads is pre-emptive, so these
differences are no longer that important.

Ironically, the Linux implementation of threads is closest to the one in
Windows, although the APIs couldn't be more different. (Win32 had
kernel-scheduled threads from the earliest beta releases in 1992, at
least three years before the Posix API was standardized.) In both Linux
and Windows, threads are "lightweight processes," relatively heavyweight
entities which are scheduled by the kernel. Ruby's threads (and the
threads in the early Java implementations) are pure userland threads,
scheduled by a library inside your process. (Solaris uses an extremely
complex hybrid model which in my opinion has proven to be far more
trouble than it's worth.)

The reason that Ruby's threads are tightly intertwined with the
interpreter logic is that Ruby must prevent the possibility that one of
your threads makes a system call that blocks in the kernel (like reading
a disk file or a network socket, accessing the system time, etc.) and
thus blocks every thread in your program. Ruby uses the I/O multiplexer
(select) to keep this from happening.
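A toy illustration of that point (written for a modern Ruby, but the principle is the same): a thread blocked on a pipe read gets parked by the scheduler, and an unrelated thread keeps making progress instead of the whole process stalling.

```ruby
r, w = IO.pipe

# This read would block in the kernel on an empty pipe...
reader = Thread.new { r.read(5) }

# ...but the thread scheduler parks the blocked thread (green-thread Ruby
# used select() for exactly this), so another thread keeps running:
ticks  = 0
ticker = Thread.new { 5.times { ticks += 1; sleep 0.01 } }

sleep 0.1         # the ticker runs to completion while reader is blocked
w.write("hello")  # now satisfy the read
reader.value      # => "hello"
ticker.join
ticks             # => 5
```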

Threads can be used for two basic purposes: to make your programs run
faster, or to make them easier to write. Ruby's (and Java's) threads seem
designed primarily to facilitate the latter. You can easily imagine
several kinds of problems that are easier to model if you have access to
relatively independent flows of control. Thus both languages have the
"synchronize" method, taking an arbitrary code block, which makes it easy
to lock relatively large chunks of code in "critical sections" without
having to really design proper synchronization sets.
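In Ruby terms this is Mutex#synchronize; a minimal sketch of the coarse-grained locking style described above (Mutex is built into modern Ruby; on 1.8 it came from `require 'thread'`):

```ruby
counter = 0
lock    = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # One coarse "critical section" around the whole update; no
      # finer-grained synchronization design is needed.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

counter  # => 10000
```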

But to effectively use threads for higher performance and concurrency
requires a large amount of experience and understanding, much of which
takes platform dependencies into account. For just one example, I would
want to use a spin lock in some situations, if I'm running on a
multi-processor machine on certain hardware platforms. Ruby doesn't have
one.

It seems to me that Ruby's green-thread implementation is perfectly
adequate for most programmers' requirements. What I think might be
interesting is an extension that would provide access to native threads
and synchronization primitives in parallel with Ruby's (an early version
of the EventMachine library did this). Then you could write extensions
that were far more thread-hot than is possible with Ruby threads. It may
be possible to do this without disturbing the existing implementation. If
you wanted to mix Ruby threads with native threads, you'd just have to be
careful to use the native mutex rather than Ruby's in your Ruby threads.
ReggW (Guest)
on 2006-05-27 13:29
Francis Cianfrocca wrote:

> It seems to me that Ruby's green-thread implementation is perfectly
> adequate for most programmers' requirements.

But the problem is that it doesn't take advantage of these new
multi-core processors that are now starting to become the standard
machines being sold (at least for my customers).


I'm a newbie to Ruby and I really, really love it, but I think this
will become a serious issue for Ruby in the near future.

How do Python, Perl, and PHP handle this (if at all)?

Thanks
Francis Cianfrocca (Guest)
on 2006-05-27 13:51
(Received via mailing list)
> But the problem is that it doesn't take advantage of these new
> multi-core processors that are now starting to become the standard
> machines being sold (at least for my customers).


THAT is absolutely correct, and insightful. Many people have noticed
that raw processor speeds aren't increasing at nearly the same rate they
once did, and all the chip designers are going to some form of multicore
hardware. The most interesting one (to me at least) is the Cell, which
essentially requires a different programming model if you're going to get
the most out of it.

There's a great deal of controversy over this issue, with a lot of
people contending that C compilers bear most of the responsibility for
effective multicore scheduling. Speaking as an application programmer who
has also written a lot of compilers, I'm partially but not fully
convinced by this. I believe that significant changes in programming
methodology will be *required* in the future to write programs with
acceptable performance.

Python is IMO quite a bit more sophisticated than Ruby in handling
threads. I don't rate Perl or PHP as serious contenders for thread-hot
development for several reasons. (Besides, they often run inside of
Apache processes, and Apache will naturally take some advantage of the
newer hardware because of its multiprocess nature.)

I come in for a lot of criticism because I don't care for the way many
programmers are trained to use threads. But I think the shortcomings of
the typical approach to threaded programming that is encouraged by
languages like Ruby, Java and even Python will be far more deleterious on
the coming hardware than they are today. Ironically, Java may have an
edge because it has some deployment systems that can partition programs
into independently-schedulable pieces. I'd like to see something similar
for Ruby (and have opened a project ("catamount") to do so) but it's
still early.
unknown (Guest)
on 2006-05-27 16:29
(Received via mailing list)
On May 27, 2006, at 7:50 AM, Francis Cianfrocca wrote:
> independently-schedulable pieces. I'd like to see something similar for
> Ruby (and have opened a project ("catamount") to do so) but it's still
> early.

Have you investigated or played around with the concurrency model
that Bertrand Meyer
has written about for Eiffel?  Last time I checked it wasn't
implemented but it seemed
like an interesting abstraction.

I do agree with you that it takes a lot of discipline to use threads
effectively. Many times it seems like a standard multi-process model
would work just as well as trying to play with fire in a shared address
space. Unix used to be known for its 'cheap' processes and now everyone
seems to think that process creation is monumentally expensive.

The Plan 9 approach to the process/thread dichotomy is pretty
interesting also.

Sometimes I think language design in the real world has been held back
by the limitations of the two generally available OS frameworks (Unix
and Windows).


Gary Wright
Michael Trier (Guest)
on 2006-05-27 16:32
(Received via mailing list)
Francis, you should consider writing a book on advanced programming
concepts.  You're a great communicator.

Michael
unknown (Guest)
on 2006-05-27 17:15
(Received via mailing list)
On Sat, 27 May 2006, ReggW wrote:

> Francis Cianfrocca wrote:
>
>> It seems to me that Ruby's green-thread implementation is perfectly
>> adequate for most programmers' requirements.
>
> But the problem is that it doesn't take advantage of these new multi-core
> processor that are now starting to become the standard machines being sold
> (at least for my customers).

it's a small problem.  here is some code which starts two processes,
three if you count the parent.  both run in separate processes using drb
as the ipc layer to make the communication painless.  because the code
uses drb the com is simple.  because it uses multiple processes it allows
the kernel to migrate them to different cpus.  the cost is about 100
lines of pure-ruby (the slave lib).  notice how easy it is for the parent
to communicate with the children and for the children to communicate with
each other:

     harp:~ > cat a.rb
     require 'slave'
     require 'yaml'

     class ProcessA
       def initialize(b) @b = b end
       def process(n) @b.process(n * n) end
       def pid() Process.pid end
     end

     class ProcessB
       def process(n) n + 6 end
       def pid() Process.pid end
     end

     b = Slave.new(ProcessB.new).object
     a = Slave.new(ProcessA.new(b)).object

     y 'a.pid' => a.pid
     y 'b.pid' => b.pid

     y 'answer' => a.process(6)


     harp:~ > ruby a.rb
     ---
     a.pid: 15142
     ---
     b.pid: 15141
     ---
     answer: 42


this is one of those things that allows one to consider designs that
would be untenable in other languages.  obviously using this approach it
would be trivial to set up a job that spawned 16 intercommunicating
processes, something which would be absurd to code in c.

regards.


-a
Joel VanderWerf (Guest)
on 2006-05-27 19:59
(Received via mailing list)
Francis Cianfrocca wrote:
...
> Threads can be used for two basic purposes: to make your programs run
> faster, or to make them easier to write. Ruby's (and Java's) threads seem
> designed primarily to facilitate the latter. You can easily imagine several
> kinds of problems that are easier to model if you have access to relatively
> independent flows of control. Thus both languages have the "synchronize"
> method, taking an arbitrary code block, which makes it easy to lock
> relatively large chunks of code in "critical sections" without having to
> really design proper synchronization sets.

You probably mean "Thread.critical" or "Thread.exclusive", and not
"synchronize", at least in the context of ruby. (There is a
Mutex#synchronize and of course that does require you to think about
synchronization sets and ordering.)

> But to effectively use threads for higher performance and concurrency
> requires a large amount of experience and understanding, much of which
> takes platform dependencies into account. For just one example, I would
> want to use a spin lock in some situations, if I'm running on a
> multi-processor machine on certain hardware platforms. Ruby doesn't
> have one.

Doesn't have one and doesn't need one, as long as threads are green.
But, someday, when ruby has native threads, it will need spin locks.
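For what it's worth, a spin lock can be sketched in pure Ruby on top of Mutex#try_lock (the SpinLock class here is hypothetical, and spinning only pays off when the lock holder is running on another processor):

```ruby
# A hypothetical spin lock: busy-wait (yielding the scheduler) instead of
# sleeping on the lock, built on Mutex#try_lock.
class SpinLock
  def initialize
    @inner = Mutex.new
  end

  def lock
    Thread.pass until @inner.try_lock  # spin until the lock is free
  end

  def unlock
    @inner.unlock
  end

  def synchronize
    lock
    begin
      yield
    ensure
      unlock
    end
  end
end

lock    = SpinLock.new
counter = 0
threads = 4.times.map do
  Thread.new { 500.times { lock.synchronize { counter += 1 } } }
end
threads.each(&:join)
counter  # => 2000
```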
Joel VanderWerf (Guest)
on 2006-05-27 20:05
(Received via mailing list)
gwtmp01@mac.com wrote:
...
> I do agree with you that it takes a lot of discipline to use threads
> effectively. Many times it seems like a standard multi-process model
> would work just as well as trying to play with fire in a shared
> address space.  Unix used to be known for its 'cheap' processes and
> now everyone seems to think that process creation is monumentally
> expensive.

Agree in general, but in the case of ruby, note that forking a ruby
process is more costly because of GC. In a short-lived child, GC can be
disabled to improve performance. [ruby-talk:186561]
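The trick referenced in [ruby-talk:186561] looks roughly like this (a sketch, Unix-only since it uses fork; the worker and its exit codes are made up for illustration):

```ruby
# Hypothetical short-lived worker: disable GC in the child, do the work,
# and leave via exit! so the parent's at_exit handlers don't rerun.
pid = fork do
  GC.disable                      # short-lived child: skip GC entirely
  sum = (1..1_000).reduce(:+)     # ...the child's work...
  exit!(sum == 500_500 ? 0 : 1)
end
Process.wait(pid)
$?.exitstatus  # => 0
```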
unknown (Guest)
on 2006-05-27 20:14
(Received via mailing list)
On May 27, 2006, at 2:03 PM, Joel VanderWerf wrote:

> process is more costly because of GC. In a short-lived child, GC can be
> disabled to improve performance. [ruby-talk:186561]

Interesting, thanks for the pointer.


Gary Wright
unknown (Guest)
on 2006-05-27 20:21
(Received via mailing list)
On May 27, 2006, at 10:14 AM, ara.t.howard@noaa.gov wrote:

> [snip cool example using 'slave']
cremes$ gem list -b |grep slave
slave (0.0.0)
     slave
cremes$ gem install slave
Attempting local installation of 'slave'
Local gem file not found: slave*.gem
Attempting remote installation of 'slave'
ERROR:  While executing gem ... (Gem::GemNotFoundException)
     Could not find slave (> 0) in the repository

Ruh roh!

cr


Chuck Remes
cremes@mac.com
www.familyvideovault.com (not yet live!)
Ezra Zygmuntowicz (Guest)
on 2006-05-27 20:27
(Received via mailing list)
On May 27, 2006, at 11:19 AM, cremes.devlist@mac.com wrote:

> Attempting remote installation of 'slave'
> ERROR:  While executing gem ... (Gem::GemNotFoundException)
>     Could not find slave (> 0) in the repository
>
> Ruh roh!
>
> cr
>
>
> Chuck Remes


http://codeforpeople.com/lib/ruby/slave/

-Ezra
Austin Ziegler (Guest)
on 2006-05-27 20:58
(Received via mailing list)
On 5/27/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:
> (Solaris uses an extremely complex hybrid model
> which in my opinion has proven to be far more trouble than it's worth.)

Almost true. In Solaris 8, you can link with liblwp to get lightweight
process threads. In Solaris 9 and 10 (*especially* 10), you can just
use pthreads and you'll be getting LWP threads.

-austin
Francis Cianfrocca (Guest)
on 2006-05-27 21:04
(Received via mailing list)
> You probably mean "Thread.critical" or "Thread.exclusive", and not
> "synchronize", at least in the context of ruby. (There is a
> Mutex#synchronize and of course that does require you to think about
> synchronization sets and ordering.)

No, I mean Mutex#synchronize and its equivalents in Java and Python.
Proper synchronization design is a fine art, and highly hardware and OS
dependent. The simplicity of #synchronize encourages people not to learn
it very deeply. As I said upthread, the thread-support constructs
provided by Ruby, Python, Java and similar languages seem designed to
facilitate the goal of making threaded programming easier to do. This is
of course a fine goal in itself. But using threads to make programs
faster and more concurrent is a very different goal, one which IMO is NOT
well supported by Java or any of the agile languages.

> Doesn't have one and doesn't need one, as long as threads are green.
> But, someday, when ruby has native threads, it will need spin locks.

Fair enough as far as it goes. But green threads mean you can't take
advantage of multiprocessor hardware at all. (Python has the same
shortcoming, but for a different reason.) So as long as we're clear on
Ruby's goals (grace and ease of cross-platform development) and its
non-goals (performance and scalability), you don't need the more powerful
thread-handling constructs, and for now there's nothing wrong with that.
But all of this changes when serious multicore hardware like the Cell
processors becomes the norm. At that point, we'll all need to get a lot
better at programming multithreaded, multiprocess or event-driven, and
our language systems will have to evolve accordingly.
Francis Cianfrocca (Guest)
on 2006-05-27 21:13
(Received via mailing list)
You're making a very interesting point, one I've made many times: you're
saying to write cooperative multiprocess rather than multithreaded
programs. If you take aggregate costs into account (including
time-to-market and lifecycle maintenance and support), this approach can
be far better than multithreaded because it's so much more robust and
easier to do. Whether it's as fast, however, is a highly hardware and
OS-dependent question. If you can specify multiprocessor or multicore
hardware, multiprocess software design has a clear edge, IMO. And in a
few years nearly all processors for general computation will be
multicore.

(This is a side point (and as we know, the side points always generate
the hottest flames), but I happen to disagree with your choice of DRb.
Not because of the communications model, but because distributed objects
are fundamentally problematic. I'd encourage you to look at multiprocess
event-driven systems. Watch for the upcoming pure-Ruby version of the
eventmachine library on Rubyforge- it will have built-in constructs to
explicitly support multiprocess event-driven programming.)
Phil Tomson (Guest)
on 2006-05-27 22:23
(Received via mailing list)
On 5/27/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:
> Python is IMO quite a bit more sophisticated than Ruby in handling
> threads.
Is this because Python uses native threads?

> I don't rate Perl or PHP as serious contenders for thread-hot development
> for several reasons. (Besides, they often run inside of Apache processes,
> and Apache will naturally take some advantage of the newer hardware because
> of its multiprocess nature.)
>
> I come in for a lot of criticism because I don't care for the way many
> programmers are trained to use threads. But I think the shortcomings of the
> typical approach to threaded programming that is encouraged by languages
> like Ruby, Java and even Python will be far more deleterious on the coming
> hardware than they are today.

Do you think that threads are just the wrong model or metaphor?
For example, Io has the concept of Actors.

> Ironically, Java may have an edge because it
> has some deployment systems that can partition programs into
> indepedently-schedulable pieces. I'd like to see something similar for Ruby
> (and have opened a project ("catamount") to do so) but it's still early.
>
>

Given that fork'ing a new process is pretty cheap (on Linux, at least),
is that perhaps a better way to achieve concurrency for us in the short
term? (of course there are lots of other issues then, like sharing data
between processes).
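One common way to get data back across that process boundary is a pipe plus Marshal, e.g. (a minimal sketch, Unix-only because of fork):

```ruby
r, w = IO.pipe

pid = fork do
  r.close
  result = (1..100).reduce(:+)    # the child's share of the work
  w.write(Marshal.dump(result))   # serialize the result back to the parent
  w.close
  exit!(0)
end

w.close
data = Marshal.load(r.read)       # blocks until the child has written
Process.wait(pid)
data  # => 5050
```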

...looking forward to hearing more about catamount.

Phil
Phil Tomson (Guest)
on 2006-05-27 22:27
(Received via mailing list)
On 5/27/06, gwtmp01@mac.com <gwtmp01@mac.com> wrote:
> Sometimes I think language design in the real world has been held back
> by the limitations of the two generally available OS frameworks (Unix
> and Windows).

And perhaps also the fact that we've been stuck with the Von Neumann
architecture for so long... or maybe we've been stuck with the Von
Neumann architecture for so long because our languages haven't evolved
in order to effectively model a different architecture?

Phil
Phil Tomson (Guest)
on 2006-05-27 22:36
(Received via mailing list)
On 5/27/06, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:
> [snip slave example]
>
> obviously using this approach it would be trivial to set up a job that
> spawned 16 intercommunicating processes, something which would be
> absurd to code in c.

Is your Slave code available?  (perhaps someone asked later; I miss the
newsgroup where I would be able to more easily tell if they did)

BTW: Just curious: Why are you require'ing Yaml?  Are you marshalling
in Drb with Yaml instead of the builtin marshalling?  If so, why?  Is
it faster?  (I wouldn't think so)

Phil

I really miss the gateway...
Phil Tomson (Guest)
on 2006-05-27 22:39
(Received via mailing list)
On 5/27/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:
> And in a few years nearly all processors for
> general computation will be multicore.

Yes, it's happening pretty quickly.

>
> (This is a side point (and as we know, the side points always generate the
> hottest flames), but I happen to disagree with your choice of DRb. Not
> because of the communications model, but because distributed objects are
> fundamentally problematic.

Can you elaborate?

>  I'd encourage you to look at multiprocess
> event-driven systems. Watch for the upcoming pure-Ruby version of the
> eventmachine library on Rubyforge- it will have built-in constructs to
> explicitly support multiprocess event-driven programming.)

Sounds interesting.

Phil

...still missing the gateway to c.l.r...
unknown (Guest)
on 2006-05-27 23:25
(Received via mailing list)
On Sun, 28 May 2006 cremes.devlist@mac.com wrote:

>    Could not find slave (> 0) in the repository
>
> Ruh roh!
>
> cr

it's here

   http://rubyforge.org/frs/?group_id=1024&release_id=5048

you can grab it or wait for the index to update.

sorry for hassle.

-a
unknown (Guest)
on 2006-05-27 23:28
(Received via mailing list)
On Sun, 28 May 2006, Phil Tomson wrote:

> Is your Slave code available?  (perhaps someone asked later; I miss
> the newsgroup where I would be able to more easily tell if they did )

http://rubyforge.org/frs/?group_id=1024&release_id=5048
http://codeforpeople.com/lib/ruby/slave/

> BTW: Just curious: Why are you require'ing Yaml?  Are you marshalling
> in Drb with Yaml instead of the builtin marshalling?  If so, why?  Is
> it faster?  (I wouldn't think so)

slave doesn't use it, but

   y 'this output is so much nicer to read' => 'than Kernel.p'

> I really miss the gateway...

indeed!

-a
unknown (Guest)
on 2006-05-27 23:40
(Received via mailing list)
On Sun, 28 May 2006, Francis Cianfrocca wrote:

> You're making a very interesting point, one I've made many times: you're
> saying to write cooperative multiprocess rather than multithreaded programs.
> If you take aggregate costs into account (including time-to-market and
> lifecycle maintenance and support), this approach can be far better than
> multithreaded because it's so much more robust and easier to do.

i agree totally.

> Whether it's as fast, however, is a highly hardware and OS-dependent
> question. If you can specify multiprocessor or multicore hardware,
> multiprocess software design has a clear edge, IMO. And in a few years
> nearly all processors for general computation will be multicore.

true.  but, for me, it's totally moot.  'fast' for me requires 30-50
nodes.  right now we are doing some processing on 30 nodes that will last
5 days.  each node has 4 cpus.  so, compared to a single-cpu machine,
that's something like 600 days of processing in wall-clock time.  whether
or not the code takes 30 minutes or 34 is largely beside the point.  the
thing is to get the jobs out there, using rq (ruby queue), and then to
spread them across cpus.  my approach is to keep the code simple and,
when it's cpu bound, spread it out across the cluster.  saves brainpower.
also, we can add 10 nodes to our cluster in about 2 hours.  i can't fix
that many bugs in that time... so i prefer to use brute force and be
simple/stupid about such things.

> (This is a side point (and as we know, the side points always generate the
> hottest flames), but I happen to disagree with your choice of DRb. Not
> because of the communications model, but because distributed objects are
> fundamentally problematic. I'd encourage you to look at multiprocess
> event-driven systems. Watch for the upcoming pure-Ruby version of the
> eventmachine library on Rubyforge- it will have built-in constructs to
> explicitly support multiprocess event-driven programming.)

the problem here is the same as with clustering - it's easy to send
events/jobs around - it's the data that's hard.  i'd argue that an
average programmer working on a difficult multi-processing task could
accomplish it much faster using drb than events/signals, etc.  this is
because state and data become very, very important with logically
difficult tasks and drb makes this trivial to manage in an atomic way.  i
don't like handling events one way (signals, kpoll, whatever) and data
(and atomic access to it) in two ways.  my mind is feeble so doing it the
braindead way lets me get it done now, get it out on the cluster, let it
run for three days before noticing a mistake, and then to repeat that
about 3 or 4 times (seriously, our stuff will have 100s of config params
so we almost never get it right the first time).  but, i'll acknowledge
that event driven programming is good for many applications and that your
work on eventmachine is certainly appreciated.

cheers.

-a
Christian Neukirchen (Guest)
on 2006-05-27 23:56
(Received via mailing list)
"Phil Tomson" <rubyfan@gmail.com> writes:

> in order to effectively model a different architecture?
Shouldn't we have known that since 1977?  What is everyone doing?

http://www.stanford.edu/class/cs242/readings/backus.pdf
Logan Capaldo (Guest)
on 2006-05-28 00:17
(Received via mailing list)
On May 27, 2006, at 11:14 AM, ara.t.howard@noaa.gov wrote:

> [snip slave example]
Dammit! I was about to write this library!

(Mine was going to look a little different:

   require 'task'

   x = "Hello"
   Task.new(x) { |o| puts o.upcase }  # sets up Drb to have a connection
                                      # to x, forks, connects to x via
                                      # drb, and yields the DrbObject
                                      # to the block
)
Joel VanderWerf (Guest)
on 2006-05-28 00:23
(Received via mailing list)
Logan Capaldo wrote:
...
>                                     # and yields the DrbObject to the block
What about

http://raa.ruby-lang.org/project/detach

(I've never used it and the last update is >2 yrs ago, though.)
Logan Capaldo (Guest)
on 2006-05-28 00:32
(Received via mailing list)
On May 27, 2006, at 6:22 PM, Joel VanderWerf wrote:

> http://raa.ruby-lang.org/project/detach
>
> (I've never used it and the last update is >2 yrs ago, though.)

Well I see I'm redundant ;)
unknown (Guest)
on 2006-05-28 01:00
(Received via mailing list)
On Sun, 28 May 2006, Logan Capaldo wrote:

> [snip Task sketch]

hi logan-

build your Task on top of Slave and send me a patch.  i'll incorporate
it and
release pronto.

regards.

-a
Logan Capaldo (Guest)
on 2006-05-28 01:03
(Received via mailing list)
On May 27, 2006, at 6:58 PM, ara.t.howard@noaa.gov wrote:

> build your Task on top of Slave and send me a patch.  i'll incorporate
> it and release pronto.

Ok.
ReggW (Guest)
on 2006-05-28 05:59
Francis Cianfrocca wrote:
> You're making a very interesting point, one I've made many times: you're
> saying to write cooperative multiprocess rather than multithreaded
> programs.

This may work well on Linux, but processes are very heavyweight on
Windows compared to threads.

> If you take aggregate costs into account (including time-to-market and
> lifecycle maintenance and support), this approach can be far better than
> multithreaded because it's so much more robust and easier to do. Whether
> it's as fast, however, is a highly hardware and OS-dependent question.

But with multiprocess you would need to develop a shared memory scheme
(MapFiles on Windows) or something similar if your processes need to
communicate with each other.  I would just prefer to have native threads
and the synchronization/locks/etc. available to me to do what I need to
do.
Wilson Bilkovich (Guest)
on 2006-05-28 06:52
(Received via mailing list)
On 5/27/06, ReggW <me@yourhome.com> wrote:
> > multithreaded because it's so much more robust and easier to do. Whether
> > it's as fast, however, is a highly hardware and OS-dependent question.
>
> But with multiprocess you would need to develop a shared memory scheme
> (MapFiles on Windows) or something similar if your processes need to
> communicate with each other.  I would just prefer to have native
> threads and the synchronization/locks/etc. available to me to do what I
> need to do.
>

This is an excellent thread, full of good thoughts.
I'll chime in and say that my opinion lines in this direction:
http://rubyurl.com/DJB
I'm not sure a 'coordination language' is the right direction, but I
do think that threads are a deeply flawed model.
An earlier issue of Computer made some strong arguments in favor of
transactions, as well.
Joel VanderWerf (Guest)
on 2006-05-28 10:09
(Received via mailing list)
Wilson Bilkovich wrote:...
> http://rubyurl.com/DJB

In a perfect world, that url points to cr.yp.to, but Ed Lee is pretty
good too ;)
Francis Cianfrocca (Guest)
on 2006-05-28 15:44
(Received via mailing list)
There's a very great deal to the subject of proper multiprocess design,
and
Ara's system upthread exemplifies a lot of it, but I'll briefly answer
your
points.

Windows: one is sorely tempted to ask why you're considering Windows
for a seriously scalable application, but let's finesse it by noting
that aggregate cost does become a significant factor with scale. So if
you have to use Windows, you're probably trying to meet a political
requirement rather than a technical one ;-).

More to the point, you don't really want to be forking a lot of
processes to run a cooperative multiprocess application. Rather, you
want the processes to be long-running. This does amortize their startup
cost (which is large even on Unix), but far more importantly it gives
you an opportunity to avoid context-switch overhead, which can be
extremely expensive on modern hardware. If your workpile consists of
long-running tasks (which it probably doesn't), then you don't have to
work too hard to get long-running processes. Otherwise you need an
event-driven system to keep them busy (and pinned to their respective
processors).

Shared memory: no. Don't do that. Use IPC or network communications. No,
don't do that either. Use a proper event-passing library that wraps all
of that up for you, so your remote-operation activations look like
simple function calls. Remember, you'll want to run your multiprocesses
on multiple machines before you know it. (Avoid distributed objects if
possible, because for one thing they force you to couple client and
server processes, and for another you really don't want the management
hassles if your network is asynchronous.)
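(A minimal sketch of what such a wrapper might look like in Ruby, with small language-neutral JSON messages passed over any IO pair; the function names are illustrative, not a real library:)

```ruby
require 'json'

# Sketch of the event-passing idea: processes exchange small,
# language-neutral messages instead of sharing memory or coupling to
# remote object references.  send_event/receive_event are illustrative.
def send_event(io, type, payload)
  io.puts JSON.generate('type' => type, 'payload' => payload)
end

def receive_event(io)
  line = io.gets or return nil       # nil on EOF: peer went away
  msg = JSON.parse(line)
  [msg['type'], msg['payload']]
end

# Usage over any IO pair (a TCP socket in practice; a pipe here):
reader, writer = IO.pipe
send_event(writer, 'job_done', 'id' => 42)
type, payload = receive_event(reader)
# type is "job_done", payload is {"id" => 42}
```

Because the wire format is plain text, the client and server need not even be written in the same language, which is part of the decoupling being argued for here.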
ReggW (Guest)
on 2006-05-28 15:51
Francis Cianfrocca wrote:

> Windows: one is sorely tempted to ask why you're considering Windows
> for a seriously scalable application, but let's finesse it by noting
> that aggregate cost does become a significant factor with scale. So if
> you have to use Windows, you're probably trying to meet a political
> requirement rather than a technical one ;-).

I'm not sure what you are getting at here, but Windows scales fine for
me and my clients.



> Shared memory: no. Don't do that. Use IPC or network communications. No,
> don't do that either. Use a proper event-passing library that wraps all
> of that up for you, so your remote-operation activations look like
> simple function calls. Remember, you'll want to run your multiprocesses
> on multiple machines before you know it.

Where does this "event-passing" library reside?
Is it in Ruby, Perl, Java, C# or is this just a theory?
Francis Cianfrocca (Guest)
on 2006-05-28 16:08
(Received via mailing list)
Edward Lee makes many interesting points, none more than in the section
"Coordination Languages" near the bottom of the paper. He points out
that, essentially due to inertia, many new models have been proposed
but not adopted. (Side point: I *really* like Erlang, which Lee
mentions at several points. What a beautiful design.) To this point,
I'd add the following: necessity drives uptake. Some of the approaches
to the scalable-design problem will emerge into common use simply
because they have to, and the leaders who take the risks will be well
rewarded. The problem is becoming urgent because of the rise of
multicore hardware. What's really nice about all this is that we will
soon have the tools to build applications that haven't even been
imagined yet.

I'll add another hopefully provocative point (wrapped in an homage to
Fortran): I don't know what the coordination language will look like,
but I do know what it will be named: Ruby!
Francis Cianfrocca (Guest)
on 2006-05-28 16:18
(Received via mailing list)
Event libraries: I've been writing these in C++ for over ten years.
There is plenty of such stuff available for Java. I don't know much
about C# (and I wish I knew less). There is some work going on in Ruby
now, so stay tuned. (Always bearing in mind that a key goal of any such
framework is to be both platform- and language-neutral.)
unknown (Guest)
on 2006-05-28 17:56
(Received via mailing list)
On Sun, 28 May 2006, Francis Cianfrocca wrote:

> If your workpile consists of long-running tasks (which it probably doesn't),

in fact, in my particular situation, it really does.  a task may take 1
hour or 5 days, so you have to put my comments in that context.  still
i've found rq or tuplespace scales well till about 30s jobs and, imho,
if your jobs are faster than that it's easier to bunch them in groups
of 100 than to modify your job distribution system...

> then you don't have to work too hard to get long-running processes.
> Otherwise you need an event-driven sytem to keep them busy (and pinned to
> their respective processors).

a good point.  this is precisely why we find using rq for our cluster
to be so applicable - the cost of a pure ruby solution is nothing
compared to the actual work to be done.  if jobs start taking 0.5ms to
run then that wouldn't be the case at all, to be sure.

> Shared memory: no. Don't do that. Use IPC or network communications. No,
> don't do that either. Use a proper event-passing library that wraps all of
> that up for you, so your remote-operation activations look like simple
> function calls. Remember, you'll want to run your multiprocesses on multiple
> machines before you know it. (Avoid distributed objects if possible, because
> for one thing they force you to couple client and server processes, and for
> another you really don't want the management hassles if your network is
> asynchronous.)

i'm unclear on exactly what you're advocating here.  how does remote
event-driven programming couple your design any less than, say, a
tuplespace of jobs looked up via rinda/ring?  the same question applies
to 'management hassles', where a tuplespace makes it trivial to handle
'events' in an asynchronous fashion that's very similar to a
traditional event loop.

i'm just trying to learn more here about where event-driven programming
might fit into my bag of tricks.  the big question i still have, and
what seems like a show stopper to me for my applications, is:

   - data.  once you've received an event, where's the data?  where is
     your config, your input, and where does your output go?  with a
     tuplespace you can use the exact same logic for all.  with rq this
     is all encoded into the job object.  please don't say use marshal,
     because that's just too crazy to even think about debugging...

   - point-to-point communication.  with rq or tuplespace the logic is
     to simply put a job 'out there' and trust that some node will
     'take it'.  we don't care which node does, so long as one does.
     the lack of coupling between tasks and clients builds a very
     robust system, since no client relies on any other.  take the
     example of 'broadcasting' a job: with rq or tuplespace you simply
     put it in the queue; with event-driven programming you either hit
     every client with tcp or broadcast with udp and open yourself up
     for a flood of responses and the difficult programming task of
     coordinating atomic handshaking to grant access to one, and only
     one, client.  am i missing something obvious here, or is this a
     tough thing to handle with an event-driven paradigm?  how would
     you design a system where 30 nodes pulled jobs from a central list
     as fast as they could with event-driven programming?  (note that
     i'm spec'ing a pull vs. push model to avoid any scheduling issues
     - all nodes bail water as fast as they can, so scheduling is
     optimal for simple parallel tasks.)
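(For concreteness, the pull model described above can be sketched with the stdlib tuplespace; this runs the workers as in-process threads for brevity, whereas across machines the space would be served over DRb and discovered via Rinda::Ring. Job count and worker count are illustrative.)

```ruby
require 'rinda/tuplespace'

# Pull model: jobs are put 'out there'; whichever worker takes a tuple
# gets the job, with no coupling between workers.  In-process sketch.
ts = Rinda::TupleSpace.new

5.times { |i| ts.write([:job, i]) }      # producer: publish jobs

results = Queue.new
workers = 3.times.map do
  Thread.new do
    loop do
      begin
        _, id = ts.take([:job, nil], 0.1)  # atomically claim one job
      rescue Rinda::RequestExpiredError
        break                              # no job within timeout: done
      end
      results << id
    end
  end
end
workers.each(&:join)
# results now holds each job id exactly once: take() removes the tuple,
# so no handshaking is needed to grant a job to one, and only one, node.
```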

regards.

-a
M. Edward (Ed) Borasky (Guest)
on 2006-05-28 21:21
(Received via mailing list)
Francis Cianfrocca wrote:
> urgent because of the rise of multicore hardware. What's really nice
>> http://rubyurl.com/DJB
>> I'm not sure a 'coordination language' is the right direction, but I
>> do think that threads are a deeply flawed model.
>> An earlier issue of Computer made some strong arguments in favor of
>> transactions, as well.
I've been looking for a place to jump into this -- er -- thread -- and
this looks like as good a place as any. Let me take a meta position
here, as someone who's been in computing, mostly scientific, for over 40
years.

1. We want to solve *big* problems. Whether it's keeping track of
millions of peoples' accounts, emulating the big bang, designing cures
for genetic illnesses, beating some arrogant chess grandmaster, proving
theorems that have defied humans, in some cases for centuries,
predicting the path of hurricanes or maintaining a complete collection
of Western classical music on a piece of plastic the size of a human
thumb, our desire is to solve problems bigger and bigger.

2. There are *two* fundamental limits to our ability to solve big
problems. The hardware/technology limit is that we can only make
transistors so small before they start to function not as transistors
but as something totally useless for building a digital computer.

The second limit is more profound. The software/human limit is that
there are in fact problems which are impossible to solve in software,
and other problems that are not impossible but whose time to solve grows
in an unrealistic way with the size of the problem.

3. The evolutions and revolutions in scientific and commercial computing
in the four decades I've been in the business have been mixes of
"general-purpose" and "special-purpose" hardware, languages and
algorithms.

So what does all this mean for Ruby, threads, multi-core processors and
the users thereof?

1. Ruby is decidedly a general-purpose language and environment. I don't
think it's realistic to expect Ruby to solve large sets of equations,
either numerically or symbolically, act as a synthesizer, or run a hard
real-time process control application. Because it is general purpose,
you *could* do these things in Ruby on an Intel PC running Windows or
Linux, but there are better ways to do them.

2. Threads are here to stay. So are monitors, semaphores, shared memory,
distributed memory, message passing, massively parallel SIMD machines,
symmetric and asymmetric multiprocessing, DSP chips, 64-bit address
spaces, IEEE floating point arithmetic, disk drives sealed off from the
outside world, interrupts, Windows, Linux and probably BSD. So are Ruby,
Perl, PHP, Python, R, Lisp, Fortran, C, Java, Forth and .NET. So are
both proprietary and open source software. :)

3. My next computer will be a multi-core 64-bit machine with hardware
virtualization support running both Linux and Windows in some kind of
virtualized manner. Until I can afford that, I'll keep my current stable
of 32-bit machines and spend my money on food, clothing, shelter and
transportation. :) By then, I will have learned Ruby, and there will be
a Ruby virtual machine capable of doing all the tasks in point 1
efficiently on this hardware. Maybe there will even be world peace and
a commitment to deal with global warming. :)

Speaking of World Peace, for those in the USA, Happy Memorial Day.

<ducking>
--
M. Edward (Ed) Borasky

http://linuxcapacityplanning.com
Xiao-Feng Li (Guest)
on 2006-05-29 10:14
(Received via mailing list)
On 5/28/06, Francis Cianfrocca <garbagecat10@gmail.com> wrote:
> Windows: one is sorely tempted to ask why you're considering Windows for a
> seriously scalable application, but let's finesse it by noting that
> aggregate cost does become a significant factor with scale. So if you have
> to use Windows, you're probably trying to meet a political requirement
> rather than a technical one ;-).

Windows threads are pretty scalable in an SMP system. They were better
than LinuxThreads in my experience.

> More to the point, you don't really want to be forking a lot of processes to
> run a cooperative multiprocess application. Rather, you want the processes
> to be long-running. This does amortize their startup cost (which is large
> even on Unix), but far more importantly it gives you an opportunity to avoid
> context-switch overhead, which can be extremely expensive on modern
> hardware. If your workpile consists of long-running tasks (which it probably
> doesn't), then you don't have to work too hard to get long-running
> processes. Otherwise you need an event-driven sytem to keep them busy (and
> pinned to their respective processors).

When the tasks are generated at runtime, we found a thread pool is useful.
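(A minimal sketch of that pattern with stdlib primitives: a fixed set of threads drains a Queue of runtime-generated tasks, so each task pays no thread-creation cost. Pool size and task shapes are illustrative.)

```ruby
# Fixed-size thread pool: runtime-generated tasks go on a queue and a
# fixed set of worker threads drains it.
tasks   = Queue.new
results = Queue.new

pool = 4.times.map do
  Thread.new do
    while (task = tasks.pop)   # nil is the shutdown sentinel
      results << task.call
    end
  end
end

10.times { |i| tasks << -> { i * i } }  # tasks generated at runtime
pool.size.times { tasks << nil }        # one sentinel per worker
pool.each(&:join)
```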

> Shared memory: no. Don't do that. Use IPC or network communications. No,
> don't do that either. Use a proper event-passing library that wraps all of
> that up for you, so your remote-operation activations look like simple
> function calls. Remember, you'll want to run your multiprocesses on multiple
> machines before you know it. (Avoid distributed objects if possible, because
> for one thing they force you to couple client and server processes, and for
> another you really don't want the management hassles if your network is
> asynchronous.)
>

Shared memory is very convenient for certain workloads. I developed
applications with various programming models like dataflow, event
driven, threading, and message passing, and found that shared-memory
programming frequently outperformed the others in both programmability
and performance.

-xiaofeng