Ruby Threads

On Sun, 28 May 2006 [email protected] wrote:

Could not find slave (> 0) in the repository

Ruh roh!

cr

it’s here

http://rubyforge.org/frs/?group_id=1024&release_id=5048

you can grab it or wait for the index to update.

sorry for hassle.

-a

On 5/27/06, [email protected] [email protected] wrote:

Sometimes I think language design in the real world has been held back by the
limitations of the two generally available OS frameworks (Unix and Windows).

And perhaps also by the fact that we’ve been stuck with the Von Neumann
architecture for so long… or maybe we’ve been stuck with the Von Neumann
architecture for so long because our languages haven’t evolved in order to
effectively model a different architecture?

Phil

On Sun, 28 May 2006, Phil T. wrote:

Is your Slave code available? (perhaps someone asked later; I miss the
newsgroup, where I would be able to more easily tell if they did)

http://rubyforge.org/frs/?group_id=1024&release_id=5048
http://codeforpeople.com/lib/ruby/slave/

BTW: Just curious: Why are you require’ing Yaml? Are you marshalling
in Drb with Yaml instead of the builtin marshalling? If so, why? Is
it faster? (I wouldn’t think so)

slave doesn’t use it, but

y 'this output is so much nicer to read' => 'than Kernel.p'
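for reference, a tiny illustration of the difference (the sample data is
made up; Kernel#y is what you get when you require 'yaml' on 1.8):

  require 'yaml'

  data = { 'output' => ['so much nicer', 'to read'] }
  p data   # {"output"=>["so much nicer", "to read"]} crammed onto one line
  y data   # the same structure, pretty-printed as indented YAML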

I really miss the gateway…

indeed!

-a

“Phil T.” [email protected] writes:

in order to effectively model a different architecture?
Shouldn’t we have known that since 1977? What is everyone doing?

http://www.stanford.edu/class/cs242/readings/backus.pdf

On Sun, 28 May 2006, Francis C. wrote:

You’re making a very interesting point, one I’ve made many times: you’re
saying to write cooperative multiprocess rather than multithreaded programs.
If you take aggregate costs into account (including time-to-market and
lifecycle maintenance and support), this approach can be far better than
multithreaded because it’s so much more robust and easier to do.

i agree totally.

Whether it’s as fast, however, is a highly hardware and OS-dependent
question. If you can specify multiprocessor or multicore hardware,
multiprocess software design has a clear edge, IMO. And in a few years
nearly all processors for general computation will be multicore.

true. but, for me, it’s totally moot. ‘fast’ for me requires 30-50 nodes.
right now we are doing some processing on 30 nodes that will last 5 days.
each node has 4 cpus. so, compared to a single-cpu machine, that’s something
like 600 days of wall-clock processing time. whether the code takes 30
minutes or 34 is largely beside the point. the thing is to get the jobs out
there, using rq (ruby queue), and then to spread them across cpus. my
approach is to keep the code simple and, when it’s cpu bound, spread it out
across the cluster. saves brainpower. also, we can add 10 nodes to our
cluster in about 2 hours. i can’t fix that many bugs in that time… so i
prefer to use brute force and be simple/stupid about such things.

(This is a side point (and as we know, the side points always generate the
hottest flames), but I happen to disagree with your choice of DRb. Not
because of the communications model, but because distributed objects are
fundamentally problematic. I’d encourage you to look at multiprocess
event-driven systems. Watch for the upcoming pure-Ruby version of the
eventmachine library on Rubyforge- it will have built-in constructs to
explicitly support multiprocess event-driven programming.)

the problem here is the same as with clustering - it’s easy to send
events/jobs around - it’s the data that’s hard. i’d argue that an average
programmer working on a difficult multi-processing task could accomplish it
much faster using drb than events/signals, etc. this is because state and
data become very, very important with logically difficult tasks, and drb
makes this trivial to manage in an atomic way. i don’t like handling events
one way (signals, kpoll, whatever) and data (and atomic access to it) another
way. my mind is feeble, so doing it the braindead way lets me get it done
now, get it out on the cluster, let it run for three days before noticing a
mistake, and then repeat that about 3 or 4 times (seriously, our stuff will
have 100s of config params so we almost never get it right the first time).
but i’ll acknowledge that event driven programming is good for many
applications and that your work on eventmachine is certainly appreciated.
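to make that concrete, here is a rough sketch of the drb style i mean - the
JobBoard class, the port, and the host name are made up for illustration, but
DRb and Mutex come straight from the standard library:

  # one process owns the shared state; every node gets atomic access to it
  # through a DRbObject, so 'events' and data travel the same way
  require 'drb'
  require 'thread'

  class JobBoard
    def initialize
      @jobs  = []
      @mutex = Mutex.new
    end

    def push(job) ; @mutex.synchronize { @jobs << job } ; end
    def pop       ; @mutex.synchronize { @jobs.shift }  ; end   # nil when empty
  end

  DRb.start_service('druby://0.0.0.0:9000', JobBoard.new)
  DRb.thread.join

  # and from any other node on the cluster:
  #
  #   DRb.start_service
  #   board = DRbObject.new_with_uri('druby://server:9000')
  #   board.push('input' => 'data', 'config' => 'params')
  #   job = board.pop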

cheers.

-a

Logan C. wrote:

                                # and yields the DrbObject to the block

What about

http://raa.ruby-lang.org/project/detach

(I’ve never used it and the last update is >2 yrs ago, though.)

On May 27, 2006, at 11:14 AM, [email protected] wrote:

processor that are now starting to become the standard machines
migrate
class ProcessA
b = Slave.new(ProcessB.new).object
a.pid: 15142
trivial to setup a job that spawned 16 intercommunicating proccess,

Dammit! I was about to write this library!

(Mine was going to look a little different:

require 'task'

x = "Hello"
Task.new(x) { |o| puts o.upcase }   # sets up Drb to have a connection to x,
                                    # forks, connects to x via drb, and
                                    # yields the DrbObject to the block

)

On May 27, 2006, at 6:22 PM, Joel VanderWerf wrote:

connection
(I’ve never used it and the last update is >2 yrs ago, though.)


vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Well I see I’m redundant :wink:

On Sun, 28 May 2006, Logan C. wrote:

)

hi logan-

build your Task on top of Slave and send me a patch. i’ll incorporate it and
release pronto.
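something along these lines, perhaps - an untested sketch, not anyone’s
actual code; the only thing taken from the thread is that
Slave.new(obj).object hands back a DRbObject, and the shutdown call is a
guess at the Slave API:

  require 'slave'

  class Task
    def self.new(obj)
      slave = Slave.new(obj)         # fork a child process serving obj over DRb
      yield slave.object             # hand the DRbObject to the caller's block
    ensure
      slave.shutdown rescue nil      # guess: let Slave tear the child down
    end
  end

  Task.new('Hello') { |o| puts o.upcase }   # => HELLO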

regards.

-a

Francis C. wrote:

You’re making a very interesting point, one I’ve made many times: you’re
saying to write cooperative multiprocess rather than multithreaded
programs.

This may work well on Linux, but processes are much heavier-weight on
Windows than threads are.

If you take aggregate costs into account (including time-to-market and
lifecycle maintenance and support), this approach can be far better than
multithreaded because it’s so much more robust and easier to do. Whether
it’s as fast, however, is a highly hardware and OS-dependent question.

But with multiprocess you would need to now develop a shared memory scheme
(MapFiles on Windows) or something similar if your processes need to
communicate with each other. I would just prefer to have native threads and
the synchronization/locks/etc available to me to do what I need to do.
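In Ruby terms the style being asked for looks roughly like this (a toy
sketch; note that Ruby 1.8’s threads are green rather than native, which is
part of what this discussion is about):

  require 'thread'

  counter = 0
  lock    = Mutex.new

  threads = (1..10).map do
    Thread.new do
      1000.times { lock.synchronize { counter += 1 } }  # the lock guards the shared counter
    end
  end
  threads.each { |t| t.join }

  puts counter   # 10000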

On May 27, 2006, at 6:58 PM, [email protected] wrote:

                               # to x, forks, connects to x  

release pronto.

regards.

-a

be kind whenever possible… it is always possible.

  • h.h. the 14th dalai lama

Ok.

On 5/27/06, ReggW [email protected] wrote:

multithreaded because it’s so much more robust and easier to do. Whether
it’s as fast, however, is a highly hardware and OS-dependent question.

But with multiprocess you would need to now develop a shared memory
scheme (MapFiles on Windows) or something similar if your processes
need to communicate with each other.
I would just prefer to have native threads and the
synchronization/locks/etc available to me to do what I need to do.

This is an excellent thread, full of good thoughts.
I’ll chime in and say that my opinion lies in this direction:
http://rubyurl.com/DJB
I’m not sure a ‘coordination language’ is the right direction, but I
do think that threads are a deeply flawed model.
An earlier issue of Computer made some strong arguments in favor of
transactions, as well.

There’s a very great deal to the subject of proper multiprocess design, and
Ara’s system upthread exemplifies a lot of it, but I’ll briefly answer your
points.

Windows: one is sorely tempted to ask why you’re considering Windows for a
seriously scalable application, but let’s finesse it by noting that aggregate
cost does become a significant factor with scale. So if you have to use
Windows, you’re probably trying to meet a political requirement rather than a
technical one ;-).

More to the point, you don’t really want to be forking a lot of processes to
run a cooperative multiprocess application. Rather, you want the processes to
be long-running. This does amortize their startup cost (which is large even
on Unix), but far more importantly it gives you an opportunity to avoid
context-switch overhead, which can be extremely expensive on modern hardware.
If your workpile consists of long-running tasks (which it probably doesn’t),
then you don’t have to work too hard to get long-running processes. Otherwise
you need an event-driven system to keep them busy (and pinned to their
respective processors).

Shared memory: no. Don’t do that. Use IPC or network communications. No,
don’t do that either. Use a proper event-passing library that wraps all of
that up for you, so your remote-operation activations look like simple
function calls. Remember, you’ll want to run your multiprocesses on multiple
machines before you know it. (Avoid distributed objects if possible, because
for one thing they force you to couple client and server processes, and for
another you really don’t want the management hassles if your network is
asynchronous.)
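To make ‘remote-operation activations look like simple function calls’ a
little more concrete, here is a bare-bones sketch in the style of the
eventmachine library mentioned above; it uses the gem’s standard
receive_data/send_data callbacks, and the handler’s ‘work’ is purely
illustrative:

  require 'rubygems'
  require 'eventmachine'

  module JobHandler
    def receive_data(data)
      # each incoming message is handled as an event on a single reactor loop;
      # no shared state, no locks
      result = data.strip.reverse          # stand-in for the real work
      send_data(result + "\n")
    end
  end

  EventMachine.run do
    EventMachine.start_server('0.0.0.0', 8081, JobHandler)
  end

A long-running worker built this way just sits in the reactor loop waiting
for events instead of being forked per job, which is the point about keeping
processes long-running.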

Francis C. wrote:

Windows: one is sorely tempted to ask why you’re considering Windows for a
seriously scalable application, but let’s finesse it by noting that aggregate
cost does become a significant factor with scale. So if you have to use
Windows, you’re probably trying to meet a political requirement rather than a
technical one ;-).

I’m not sure what you are getting at here, but Windows scales fine for me
and my clients.

Shared memory: no. Don’t do that. Use IPC or network communications. No,
don’t do that either. Use a proper event-passing library that wraps all of
that up for you, so your remote-operation activations look like simple
function calls. Remember, you’ll want to run your multiprocesses on multiple
machines before you know it.

Where does this “event-passing” library reside?
Is it in Ruby, Perl, Java, C# or is this just a theory?

Wilson B. wrote:…

http://rubyurl.com/DJB

In a perfect world, that url points to cr.yp.to, but Ed Lee is pretty
good too :wink:

Event libraries: I’ve been writing these in C++ for over ten years. There is
plenty of such stuff available for Java. I don’t know much about C# (and I
wish I knew less). There is some work going on in Ruby now, so stay tuned.
(Always bearing in mind that a key goal of any such framework is to be both
platform and language neutral.)

Edward Lee makes many interesting points, none more than in the section
“Coordination Languages” near the bottom of the paper. He points out that,
essentially due to inertia, many new models have been proposed but not
adopted. (Side point: I really like Erlang, which Lee mentions at several
points. What a beautiful design.) To this point, I’d add the following:
necessity drives uptake. Some of the approaches to the scalable-design
problem will emerge into common use simply because they have to, and the
leaders who take the risks will be well rewarded. The problem is becoming
urgent because of the rise of multicore hardware. What’s really nice about
all this is that we will soon have the tools to build applications that
haven’t even been imagined yet.

I’ll add another hopefully provocative point (wrapped in an homage to
Fortran): I don’t know what the coordination language will look like, but I
do know what it will be named: Ruby!

On Sun, 28 May 2006, Francis C. wrote:

If your workpile consists of long-running tasks (which it probably doesn’t),

in fact, in my particular situation, it really does. a task may take 1 hour
or 5 days. so you have to put my comments in that context. still, i’ve found
rq or tuplespace scales well down to about 30s jobs and, imho, if your jobs
are faster than that it’s easier to bunch them in groups of 100 than to
modify your job distribution system…
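the bunching itself is a one-liner; enqueue_batch below is a made-up
stand-in for whatever your queue’s submit call happens to be:

  require 'enumerator'                     # for each_slice on 1.8

  jobs = (1..10_000).to_a                  # stand-in for 10,000 short jobs
  jobs.each_slice(100) do |batch|
    enqueue_batch(batch)                   # hypothetical: one queue entry per 100 jobs
  end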

then you don’t have to work too hard to get long-running processes.
Otherwise you need an event-driven sytem to keep them busy (and pinned to
their respective processors).

a good point. this is precisely why we find using rq for our cluster to be so
applicable - the cost of a pure ruby solution is nothing compared to the
actual work to be done. if jobs start taking 0.5ms to run then that wouldn’t
be the case at all, to be sure.

Shared memory: no. Don’t do that. Use IPC or network communications. No,
don’t do that either. Use a proper event-passing library that wraps all of
that up for you, so your remote-operation activations look like simple
function calls. Remember, you’ll want to run your multiprocesses on multiple
machines before you know it. (Avoid distributed objects if possible, because
for one thing they force you to couple client and server processes, and for
another you really don’t want the management hassles if your network is
asynchronous.)

i’m unclear on exactly what you’re advocating here. how does remote event
driven programming couple your design any less than, say, a tuplespace of
jobs looked up via rinda/ring? the same question applies to ‘management
hassles’, where a tuplespace makes it trivial to handle ‘events’ in an
asynchronous fashion that’s very similar to a traditional event loop.

i’m just trying to learn more here about where event driven programming might
fit into my bag of tricks. the big question i still have, and what seems like
a show stopper to me for my applications, is:

  • data. once you’ve received an event, where’s the data? where is your
    config, your input, and where does your output go? with a tuplespace you
    can use the exact same logic for all of them. with rq this is all encoded
    into the job object. please don’t say use marshal, because that’s just
    too crazy to even think about debugging…

  • point to point communication. with rq or tuplespace the logic is to
    simply put a job ‘out there’ and some node will ‘take it’. we don’t care
    which node does, so long as one does. the lack of coupling between tasks
    and clients builds a very robust system since no client relies on any
    other. take the example of ‘broadcasting’ a job: with rq or tuplespace
    you simply put it in the queue; with event driven programming you either
    hit every client with tcp or broadcast with udp and open yourself up for
    a flood of responses and the difficult programming task of coordinating
    atomic handshaking to grant access to one, and only one, client. am i
    missing something obvious here, or is this a tough thing to handle with
    an event driven paradigm? how would you design a system where 30 nodes
    pulled jobs from a central list as fast as they could with event driven
    programming (see the sketch after this list)? (note that i’m spec’ing a
    pull vs. push model to avoid any scheduling issues - all nodes bail water
    as fast as they can so scheduling is optimal for simple parallel tasks.)
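for what it’s worth, the pull model above maps almost directly onto
Rinda::TupleSpace from the standard library. the host name, port, and tuple
layout below are made up, and this is only a sketch of the idea, not our
actual rq setup:

  require 'drb'
  require 'rinda/tuplespace'

  URI = 'druby://coordinator:12345'              # hypothetical coordinator host

  if ARGV.first == 'coordinator'
    ts = Rinda::TupleSpace.new
    DRb.start_service(URI, ts)
    100.times { |i| ts.write(['job', i, { 'param' => i }]) }   # publish the jobs
    DRb.thread.join
  else                                           # run one of these per node
    DRb.start_service
    ts = DRbObject.new_with_uri(URI)
    loop do
      _, id, params = ts.take(['job', nil, nil]) # blocks; one node gets each job
      puts "node #{Process.pid} took job #{id}: #{params.inspect}"
    end
  end

every node just calls take as fast as it can, nobody has to coordinate any
handshaking, and the tuplespace guarantees each job is handed to one and only
one taker.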

regards.

-a

Francis C. wrote:

urgent because of the rise of multicore hardware. What’s really nice

http://rubyurl.com/DJB
I’m not sure a ‘coordination language’ is the right direction, but I
do think that threads are a deeply flawed model.
An earlier issue of Computer made some strong arguments in favor of
transactions, as well.
I’ve been looking for a place to jump into this – er – thread – and
this looks like as good a place as any. Let me take a meta position
here, as someone who’s been in computing, mostly scientific, for over 40
years.

  1. We want to solve big problems. Whether it’s keeping track of
    millions of people’s accounts, emulating the big bang, designing cures
    for genetic illnesses, beating some arrogant chess grandmaster, proving
    theorems that have defied humans, in some cases for centuries,
    predicting the path of hurricanes or maintaining a complete collection
    of Western classical music on a piece of plastic the size of a human
    thumb, our desire is to solve problems bigger and bigger.

  2. There are two fundamental limits to our ability to solve big
    problems. The hardware/technology limit is that we can only make
    transistors so small before they start to function not as transistors
    but as something totally useless for building a digital computer.

The second limit is more profound. The software/human limit is that
there are in fact problems which are impossible to solve in software,
and other problems that are not impossible but whose time to solve grows
in an unrealistic way with the size of the problem.

  3. The evolutions and revolutions in scientific and commercial computing
    in the four decades I’ve been in the business have been mixes of
    “general-purpose” and “special-purpose” hardware, languages and
    algorithms.

So what does all this mean for Ruby, threads, multi-core processors and
the users thereof?

  1. Ruby is decidedly a general-purpose language and environment. I don’t
    think it’s realistic to expect Ruby to solve large sets of equations,
    either numerically or symbolically, act as a synthesizer, or run a hard
    real-time process control application. Because it is general purpose,
    you could do these things in Ruby on an Intel PC running Windows or
    Linux, but there are better ways to do them.

  2. Threads are here to stay. So are monitors, semaphores, shared memory,
    distributed memory, message passing, massively parallel SIMD machines,
    symmetric and asymmetric multiprocessing, DSP chips, 64-bit address
    spaces, IEEE floating point arithmetic, disk drives sealed off from the
    outside world, interrupts, Windows, Linux and probably BSD. So are Ruby,
    Perl, PHP, Python, R, Lisp, Fortran, C, Java, Forth and .NET. So are
    both proprietary and open source software. :slight_smile:

  3. My next computer will be a multi-core 64-bit machine with hardware
    virtualization support running both Linux and Windows in some kind of
    virtualized manner. Until I can afford that, I’ll keep my current stable
    of 32-bit machines and spend my money on food, clothing, shelter and
    transportation. :slight_smile: By then, I will have learned Ruby, and there will be
    a Ruby virtual machine capable of doing all the tasks in 1 efficiently
    on this hardware. Maybe there will even be world peace and a commitment
    to deal with global warming. :slight_smile:

Speaking of World Peace, for those in the USA, Happy Memorial Day.

-- M. Edward (Ed) Borasky

On 5/28/06, Francis C. [email protected] wrote:

Windows: one is sorely tempted to ask why you’re considering Windows for a
seriously scalable application, but let’s finesse it by noting that aggregate
cost does become a significant factor with scale. So if you have to use
Windows, you’re probably trying to meet a political requirement rather than a
technical one ;-).

Windows threads are pretty scalable on an SMP system. They were better than
LinuxThreads in my experience.

More to the point, you don’t really want to be forking a lot of processes to
run a cooperative multiprocess application. Rather, you want the processes to
be long-running. This does amortize their startup cost (which is large even
on Unix), but far more importantly it gives you an opportunity to avoid
context-switch overhead, which can be extremely expensive on modern hardware.
If your workpile consists of long-running tasks (which it probably doesn’t),
then you don’t have to work too hard to get long-running processes. Otherwise
you need an event-driven system to keep them busy (and pinned to their
respective processors).

When the tasks are generated at runtime, we have found a thread pool useful.
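A minimal version of that pattern in Ruby might look like the following; the
pool size and the fake tasks are arbitrary, and Ruby 1.8’s green threads mean
this buys concurrency rather than SMP parallelism:

  require 'thread'

  queue = Queue.new
  pool  = (1..4).map do
    Thread.new do
      while (task = queue.pop)               # pop blocks until work arrives
        task.call
      end                                    # a nil task shuts the worker down
    end
  end

  20.times { |i| queue << lambda { puts "task #{i}" } }
  pool.size.times { queue << nil }           # one poison pill per worker
  pool.each { |t| t.join }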

Shared memory: no. Don’t do that. Use IPC or network communications. No,
don’t do that either. Use a proper event-passing library that wraps all of
that up for you, so your remote-operation activations look like simple
function calls. Remember, you’ll want to run your multiprocesses on multiple
machines before you know it. (Avoid distributed objects if possible, because
for one thing they force you to couple client and server processes, and for
another you really don’t want the management hassles if your network is
asynchronous.)

Shared memory is very convenient for certain workloads. I have developed
applications with various programming models - dataflow, event-driven,
threading, message passing - and found that shared memory programming
frequently outperformed the others in both programmability and performance.

-xiaofeng