Working on multiple machines in a LAN

I have a program, and now I want it to run on multiple computers
individually. Among them, one computer (say, the boss) will distribute
inputs to the other computers (say, the followers) and to itself over a
LAN. The boss will also collect outputs from the followers. Whenever a
follower (including the boss) finishes its task, it acknowledges the
boss, and the boss gives the next input to that machine.

How can it be done?
I know there may be many possibilities… I just need your suggestions.

This is just plain old distributed computing in the style of SETI@home,
but you could make it simpler.

You could just put files in a shared directory and the workers just
grab files from the directory and work on them. File names like
‘job_1234_worker_1.txt’ would be picked up by worker 1 (and no one
else). The worker would then write the results back to the shared
directory as ‘results_1234_worker_1.txt’ and pick up another job.

The boss would then just have to make sure that there are enough jobs
to keep all the workers busy, and do whatever it needs to with the
results.

ajay paswan wrote in post #1073756:

I have a program, and now I want it to run on multiple computers
individually. Among them, one computer (say, the boss) will distribute
inputs to the other computers (say, the followers) and to itself over a
LAN. The boss will also collect outputs from the followers. Whenever a
follower (including the boss) finishes its task, it acknowledges the
boss, and the boss gives the next input to that machine.

How can it be done?

If you want to do this entirely in ruby, have a look at resque. The boss
can queue up jobs (they actually go in a redis database) and the workers
can do work. They can post their results back into redis or into a
shared filesystem or whatever you choose.
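
To give a flavour of that, here is a minimal sketch, not taken from any
real project: CrunchJob, the :crunch queue, the "result:" keys and the
trivial reverse "computation" are all made-up names, and it assumes a
redis server reachable from every machine.

# jobs.rb - shared by the boss and the workers
require 'resque'

class CrunchJob
  @queue = :crunch

  # Workers call this for each job pulled off the queue.
  def self.perform(job_id, input)
    result = input.reverse                        # stand-in for the real work
    Resque.redis.set("result:#{job_id}", result)  # post the result back into redis
  end
end

# boss.rb - queue one job per input
require_relative 'jobs'
20.times { |i| Resque.enqueue(CrunchJob, i, "input-#{i}") }

Each follower then just runs a stock resque worker, e.g.
QUEUE=crunch rake resque:work, and the boss can poll the "result:" keys
to collect the outputs.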

If you want something a little less closely coupled to ruby, look at
condor. This allows you to submit arbitrary jobs (which could be shell
scripts or ruby scripts or binary executables) with arguments, and runs
them on available worker nodes. There’s a lot of functionality there,
including the ability to set up DAGs (Directed Acyclic Graphs) of jobs,
so that jobs don’t run until all their necessary precursors have run,
and fitting jobs into slots of available CPU and RAM.
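
As a taste of what submitting looks like, here is a minimal condor
submit description; crunch.rb, the input file naming and the count of
20 are placeholders, not anything from the original thread:

# crunch.sub - run 20 copies of the worker, one per input file
executable = crunch.rb
arguments  = input_$(Process).txt
output     = out.$(Process)
error      = err.$(Process)
log        = crunch.log
queue 20

You would submit this with condor_submit crunch.sub; the DAG features
are layered on top of plain submissions like this one.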

Links:

http://research.microsoft.com/en-us/events/scidata04/todd_tannebaum.ppt
https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=GettingStarted

For some other related options see:

On 3 September 2012 12:41, ajay paswan wrote:

How do these technologies compare for my purpose, in terms of ease of
implementation, scalability and efficiency?

Well, the real question is whether any of these technologies is
sufficient for your needs. It doesn’t matter that technology X can
scale to 10,000 workers while Y can only handle 1,000 if you will have
at most 20. How efficient it needs to be depends on the type of work
the workers are doing: if each job can take up to five hours, it hardly
matters whether fetching or returning work takes a minute or more.

When I last did this I used XMLRPC and Perl and it worked over the
internet. It worked well enough for me. Before that I used a shared
directory on a LAN just as I mentioned in my first reply. This too
worked well enough.

I would implement the simplest solution and see how it goes.

I would implement the simplest solution and see how it goes.

Do you think starfish (Lucas Carlson’s gem) will be simple? I feel that
what you described is very close to starfish.

Just a stupid question: how can I write to ‘job_1234_worker_1.txt’ at
runtime when it will also be accessed by a worker?

Brian C. wrote in post #1073771:

If you want to do this entirely in ruby, have a look at resque. The boss
can queue up jobs (they actually go in a redis database) and the workers
can do work. They can post their results back into redis or into a
shared filesystem or whatever you choose.
https://github.com/defunkt/resque

Can we guarantee that every machine is used whenever it becomes free,
and that the same task never runs more than once?

What about starfish (Lucas Carlson’s gem)?

What if I use simple socket programming? I mean, would a
chat-server-style program help distribute the work?

Watirgrid? It needs Watir, which is Windows-only, so I can’t use it, as
I need this to run on both Windows and Linux.

What if I use http://celeryproject.org/ and
https://github.com/leapfrogdevelopment/rcelery?

How do these technologies compare for my purpose, in terms of ease of
implementation, scalability and efficiency?

ajay paswan wrote in post #1074428:

Can we guarantee that every machine is used whenever it becomes free,
and that the same task never runs more than once?

What about starfish (Lucas Carlson’s gem)?

What if I use simple socket programming? I mean, would a
chat-server-style program help distribute the work?

Sure, there are lots of solutions. Choose one and try it out. If it doesn’t
meet your needs, try another. If your code is modular, it should be
straightforward to connect your actual worker code to any of these
frameworks.

At the simplest level, you can have a Queue object (from thread.rb) and
share it using DRb; the workers call q.pop and block until a job is
available. I’ve done that before. One I haven’t used is Rinda, and I
haven’t used Starfish either. Like you say, you can do something
directly with Socket, or gserver.rb. The lower level you go, the more
work you’ll probably have to do around handling error conditions and
retries.
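
A minimal sketch of that Queue-over-DRb approach follows; the port, the
hostname ‘boss’ and the string job payloads are assumptions, and as
noted you would want error handling and retries on top.

# boss.rb - share a Queue over DRb; workers block on pop until a job arrives
require 'thread'   # Queue lives in thread.rb on older rubies
require 'drb/drb'

queue = Queue.new
20.times { |i| queue << "input-#{i}" }   # load up the inputs

DRb.start_service('druby://0.0.0.0:9999', queue)
DRb.thread.join

# worker.rb - run one of these per machine
require 'drb/drb'

DRb.start_service
queue = DRbObject.new_with_uri('druby://boss:9999') # 'boss' = boss's hostname
loop do
  job = queue.pop            # blocks until the boss has a job for us
  puts "working on #{job}"   # stand-in for the real work
end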

I mention Resque because I’ve used it myself. It comes with a web
interface which lets you see the queued jobs, the running workers, what
each worker is doing, and failed jobs - you can trigger retries via the
web interface. If that’s something you need in your application, then
this will save you work implementing it yourself.

What resque is not so strong at is passing the results back from a
particular worker, or notifying a central point when a job has
completed.

On 3 September 2012 12:59, ajay paswan wrote:

Just a stupid question: how can I write to ‘job_1234_worker_1.txt’ at
runtime when it will also be accessed by a worker?

We are assuming a common directory that both the boss and the worker
can read from and write to; you said this was on a LAN. The boss would
do something like:

tmp_filename  = "#{COMMON_DIRECTORY}/job_#{job_number}_worker_#{worker_id}.tmp"
real_filename = "#{COMMON_DIRECTORY}/job_#{job_number}_worker_#{worker_id}.txt"

f = File.new(tmp_filename, "w")
f.puts ... # Write the contents of the file
f.close

# Rename only once the file is complete; rename is atomic on one filesystem
File.rename(tmp_filename, real_filename)

This should stop the worker from reading the file before the boss has
finished writing it. Then the worker just has to:

loop do
  Dir["#{COMMON_DIRECTORY}/job_*_worker_#{my_id}.txt"].each do |file|
    # Read the contents of the file and extract the job_id from either
    # the filename or the contents of the file itself.
    # Do the work.
    # Write the results to
    # "#{COMMON_DIRECTORY}/results_#{job_id}_worker_#{my_id}.txt"
    # in a similar manner as above: write to .tmp, rename to .txt.
    File.delete(file)
  end

  sleep 60 # We have processed all the available jobs; wait a minute before looking again
end

Basically the worker just monitors the common directory, picks up work
and writes results. The boss, however, has two tasks: 1) make sure
that there is enough work in the common directory that no worker sits
idle, and 2) read and process the results.

The worker is very simple; the boss can get quite complex if you need
more than one worker to perform the same job to cross-check the
results, but even then it is not a biggie.
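
To round that out, here is a minimal sketch of the boss side under the
same assumptions; COMMON_DIRECTORY, the pool of worker ids and the
placeholder job contents are all made up, and the tmp-then-rename trick
is the one above.

# boss.rb - keep every worker supplied with work and collect the results
COMMON_DIRECTORY = '/mnt/shared/jobs'   # assumed shared path on the LAN
WORKER_IDS = (1..4).to_a                # assumed pool of workers
job_number = 0

loop do
  WORKER_IDS.each do |worker_id|
    # Top up: if this worker has no pending job file, write it a new one.
    next unless Dir["#{COMMON_DIRECTORY}/job_*_worker_#{worker_id}.txt"].empty?
    job_number += 1
    tmp  = "#{COMMON_DIRECTORY}/job_#{job_number}_worker_#{worker_id}.tmp"
    real = "#{COMMON_DIRECTORY}/job_#{job_number}_worker_#{worker_id}.txt"
    File.open(tmp, "w") { |f| f.puts "input for job #{job_number}" } # placeholder contents
    File.rename(tmp, real)              # atomic: worker never sees a partial file
  end

  # Read and process any results the workers have written back.
  Dir["#{COMMON_DIRECTORY}/results_*.txt"].each do |file|
    puts File.read(file)                # stand-in for real result processing
    File.delete(file)
  end

  sleep 10
end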