ForkAndReturn 0.1.0

erikveen · July 13, 2008, 12:23pm

ForkAndReturn 0.1.0 is released.

RDoc: http://www.erikveen.dds.nl/forkandreturn/doc/index.html
Download: http://rubyforge.org/projects/forkandreturn/index.html

Greetings,
Erik V. - http://www.erikveen.dds.nl/

ForkAndReturn implements a couple of methods that simplify
running a block of code in a subprocess. The result (Ruby
object or exception) of the block will be available in the
parent process.

The intermediate return value (or exception) is serialized
with Marshal and written to disk. This means that it is
possible to (concurrently) run thousands of child processes
with a relatively low memory footprint. Just gather the results
once all child processes are done. ForkAndReturn handles the
writing, reading and deleting of the temporary file.
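
For readers who want to see the mechanism, here is a minimal sketch of the idea using only the standard library. It is not ForkAndReturn's actual code: the helper name run_in_child and the [:ok, value] / [:error, exception] payload layout are assumptions made for illustration.

```ruby
require "tempfile"

# Hypothetical helper, not part of ForkAndReturn: runs the block in a child
# process and returns its result (or re-raises its exception) in the parent.
def run_in_child
  file = Tempfile.new("fork_result")
  file.close                        # keep the path; the child reopens it

  pid = Process.fork do
    payload =
      begin
        [:ok, yield]                # block result
      rescue Exception => e
        [:error, e]                 # exception raised by the block
      end
    File.open(file.path, "wb") { |f| Marshal.dump(payload, f) }
    Process.exit!(0)                # leave the child without at_exit handlers
  end

  Process.wait(pid)
  status, value = File.open(file.path, "rb") { |f| Marshal.load(f) }
  raise value if status == :error
  value
ensure
  file.unlink if file               # delete the temporary file
end

answer = run_in_child { 2 * 21 }    # => 42
```

Because the result travels through a file rather than through memory shared with the parent, the parent only holds a path and a pid per child until it decides to load the value.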

The core of these methods is fork_and_return_core(). It returns
some nested lambdas, which are handled by the other methods and
by Enumerable#concurrent_collect(). These lambdas handle the
WAITing, LOADing and RESULTing (explained in
fork_and_return_core()).
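
The staged design described above could look roughly like the following sketch. This is an assumption about the general shape, not the library's source: staged_fork is a hypothetical name, and exception handling is left out to keep it short.

```ruby
require "tempfile"

# Hypothetical stand-in for the staged design; names are illustrative.
def staged_fork(&block)
  file = Tempfile.new("staged")
  file.close
  pid = Process.fork do
    File.open(file.path, "wb") { |f| Marshal.dump(block.call, f) }
    Process.exit!(0)
  end

  lambda do                         # stage 1: WAIT for the child
    Process.wait(pid)
    lambda do                       # stage 2: LOAD the marshalled value
      value = File.open(file.path, "rb") { |f| Marshal.load(f) }
      file.unlink
      lambda { value }              # stage 3: hand over the RESULT
    end
  end
end

waits   = [1, 2, 3, 4].map { |n| staged_fork { 2 * n } } # all children start
loads   = waits.map(&:call)         # wait for every child
results = loads.map(&:call)         # load every file
values  = results.map(&:call)       # => [2, 4, 6, 8]
```

Splitting the work into stages lets a caller start every child before waiting for any of them, which is what makes a concurrent collect possible.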

The child process exits with Process.exit!(), so at_exit()
blocks are skipped in the child process. However, both $stdout
and $stderr will be flushed.
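
The effect of Process.exit!() on at_exit() blocks can be checked with a small experiment. The helper child_output is hypothetical and uses a pipe only to capture what the child writes.

```ruby
# Hypothetical helper: returns everything a forked child writes to a pipe.
def child_output
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    at_exit { writer.write("at_exit ran") } # registered, but never executed
    writer.write("body;")
    writer.flush                            # flush explicitly before leaving
    Process.exit!(0)                        # skips at_exit handlers
  end
  writer.close
  Process.wait(pid)
  reader.read
ensure
  reader.close if reader
end

output = child_output                       # => "body;"
```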

Only Marshal’lable Ruby objects can be returned.

ForkAndReturn uses Process.fork(), so it only runs on platforms
where Process.fork() is implemented.
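
Code that has to run on platforms without fork can guard for it. The sketch below assumes a current Ruby, where Process.respond_to?(:fork) returns false when fork is unavailable; collect_with_fallback is a hypothetical helper, and it falls back to a plain collect when ForkAndReturn is not loaded.

```ruby
# Hypothetical helper: use concurrent_collect when fork and ForkAndReturn
# are both available, otherwise fall back to a serial collect.
def collect_with_fallback(items, &block)
  if Process.respond_to?(:fork) && items.respond_to?(:concurrent_collect)
    items.concurrent_collect(&block)
  else
    items.collect(&block)
  end
end

doubled = collect_with_fallback([1, 2, 3, 4]) { |object| 2 * object }
# => [2, 4, 6, 8]
```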

Example:

  [1, 2, 3, 4].collect do |object|
    Thread.fork do
      ForkAndReturn.fork_and_return do
        2*object
      end
    end
  end.collect do |thread|
    thread.value
  end # ===> [2, 4, 6, 8]

This runs each “2*object” in a separate process. Hopefully, the
processes are spread over all available CPUs. That’s a simple
way of parallel processing! Although
Enumerable#concurrent_collect() is even simpler:

  [1, 2, 3, 4].concurrent_collect do |object|
    2*object
  end # ===> [2, 4, 6, 8]

Note that the code in the block is run in a separate process,
so updating objects and variables in the block won’t affect the
parent process:

  count = 0
  […].concurrent_collect do
    count += 1
  end
  count # ==> 0

Enumerable#concurrent_collect() is suitable for handling a
couple of very CPU intensive jobs, like parsing large XML files.

Enumerable#clustered_concurrent_collect() is suitable for
handling a lot of not too CPU intensive jobs: situations
where the overhead of forking is too expensive, but where you
still want to use all available CPUs.
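
To make the trade-off concrete, here is a sketch of the clustering idea under the assumption that jobs are grouped into a few clusters and each cluster gets one child process. It is not the library's implementation, the helper name clustered_collect and its number_of_clusters parameter are hypothetical, and exceptions raised in a child are not handled.

```ruby
require "tempfile"

# Hypothetical illustration only: one fork per cluster instead of per item.
def clustered_collect(items, number_of_clusters)
  return [] if items.empty?
  slice_size = (items.size.to_f / number_of_clusters).ceil

  jobs = items.each_slice(slice_size).map do |slice|
    file = Tempfile.new("cluster")
    file.close
    pid = Process.fork do
      results = slice.map { |item| yield(item) }  # whole slice in one child
      File.open(file.path, "wb") { |f| Marshal.dump(results, f) }
      Process.exit!(0)
    end
    [pid, file]
  end

  # Gather per-cluster results in the original order.
  jobs.flat_map do |pid, file|
    Process.wait(pid)
    results = File.open(file.path, "rb") { |f| Marshal.load(f) }
    file.unlink
    results
  end
end

squares = clustered_collect((1..10).to_a, 3) { |n| n * n }
# => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

With ten items and three clusters, only three processes are forked, so the cost of forking is paid per cluster rather than per item.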

erikveen · July 13, 2008, 12:48pm

Erik V. wrote:

  [1, 2, 3, 4].concurrent_collect do |object|
    2*object
  end # ===> [2, 4, 6, 8]

This looks great! Is there something special I should consider under
Windows?

Kind regards
Andreas

erikveen · July 13, 2008, 12:59pm

This looks great! Is there something special I should consider under Windows?

ForkAndReturn uses Process.fork(), so it only runs on platforms where
Process.fork() is implemented. Ruby for Windows hasn’t implemented
Process.fork(), so ForkAndReturn won’t work on pure Windows. However,
Ruby for Cygwin has implemented Process.fork().

Greetings,
Erik V. - http://www.erikveen.dds.nl/

erikveen · July 14, 2008, 12:22am

Erik V. wrote:

This looks great! Is there something special I should consider under Windows?

ForkAndReturn uses Process.fork(), so it only runs on platforms where
Process.fork() is implemented. Ruby for Windows hasn’t implemented
Process.fork(), so ForkAndReturn won’t work on pure Windows. However,
Ruby for Cygwin has implemented Process.fork().

Damn, this means it won’t run on JRuby either.
I keep missing fork in all kinds of cases…
V.-