Forum: Ruby process-group gem - concurrent processes with fibers.

Samuel Williams (Guest)
on 2014-03-11 08:18
(Received via mailing list)
Hi Everyone,

`Process::Group` is a class for coordinating and managing multiple
processes which execute concurrently in fibers.

In some of my testing scripts, multiple processes need to run. In the
past, I've just done this sequentially. However, I've been modernising
some scripts and I've bundled up the code into this gem.

Previously:

Process.wait(Process.spawn("some-task"))
process_and_email_results

Process.wait(Process.spawn("some-other-task --foobar"))
process_and_email_results

Now I can run like this:

group = Process::Group.new

Fiber.new do
  group.spawn("some-task")
  process_and_email_results
end.resume

Fiber.new do
  group.spawn("some-other-task --foobar")
  process_and_email_results
end.resume

group.wait

Process::Group allows you to run the two tasks concurrently, and in these
cases it was an easy way to modernise existing scripts. You can call
spawn multiple times in a fiber and it will work as expected. You can
also kill the entire group of processes if you wish.
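For readers wondering how such a group might work underneath, here is a
minimal sketch using only Ruby's standard library. It is not the
process-group gem's code or API; the class name `FiberProcessGroup` is
hypothetical, and it assumes a POSIX system. `spawn` must be called
inside a fiber and suspends it until the child exits, `wait` reaps
children and resumes their fibers, and all children share one Unix
process group so `kill` can signal them together.

```ruby
require "fiber"

# Hypothetical sketch, NOT the process-group gem's implementation.
class FiberProcessGroup
  def initialize
    @pgid = nil   # Unix process group shared by all children
    @fibers = {}  # pid => fiber suspended until that child exits
  end

  # Call inside a Fiber. Spawns the command, suspends the calling fiber,
  # and returns the child's Process::Status once #wait resumes it.
  def spawn(*cmd)
    # The first child starts a new process group; later ones join it.
    pid = Process.spawn(*cmd, pgroup: @pgid || true)
    @pgid ||= pid
    @fibers[pid] = Fiber.current
    Fiber.yield
  end

  # Reap children in exit order and resume the fiber that spawned each.
  def wait
    until @fibers.empty?
      pid, status = Process.wait2(-@pgid) # any child in our process group
      fiber = @fibers.delete(pid)
      # Once every child is reaped the group is gone; a later spawn starts a new one.
      @pgid = nil if @fibers.empty?
      fiber&.resume(status)
    end
  end

  # Signal every process in the group (a negative pid addresses the group).
  def kill(signal = :TERM)
    Process.kill(signal, -@pgid) if @pgid
  end
end

# Hypothetical usage with placeholder commands.
group = FiberProcessGroup.new
statuses = []
Fiber.new { statuses << group.spawn("sleep", "0.1") }.resume
Fiber.new { statuses << group.spawn("sleep", "0.2") }.resume
group.wait
```

Because nothing preempts a fiber, this design only stays concurrent as
long as each fiber quickly reaches a spawn; a long computation in one
fiber still holds up the others.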

Examples, documentation and code:
https://github.com/ioquatix/process-group

Kind regards,
Samuel
Joel VanderWerf (Guest)
on 2014-03-12 02:57
(Received via mailing list)
On 03/11/2014 12:17 AM, Samuel Williams wrote:
>
> Fiber.new do
>
> Process::Group allows you to run the two tasks concurrently and in these
> cases it was an easy option to modernise existing scripts. You can call
> spawn multiple times in a fiber and it will work as expected. You can
> also kill the entire group of processes if you wish.
>
> Examples, documentation and code: https://github.com/ioquatix/process-group
>
> Kind regards,
> Samuel

What's the advantage over using threads, the old school way?

Thread.new do
   system("some-task")
   process_and_email_results
end
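Joel's snippet leaves out joining the threads. A runnable sketch of the
same thread-per-process approach might look like this; `run_concurrently`
is a hypothetical helper, the commands are placeholders, and
`process_and_email_results` is left out. `Kernel#system` blocks only the
calling thread, and `Thread#value` joins each thread and returns the
block's result.

```ruby
# Hypothetical helper: run each command in its own thread and wait for all.
def run_concurrently(commands)
  threads = commands.map do |cmd|
    # system blocks only this thread until the child exits;
    # it returns true, false, or nil (if the command could not be run).
    Thread.new { system(*cmd) }
  end
  threads.map(&:value) # joins each thread and collects system's result
end

# Placeholder commands standing in for "some-task" and friends.
results = run_concurrently([["sleep", "0.5"], ["sleep", "0.5"]])
```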
Tony Arcieri (Guest)
on 2014-03-12 03:01
(Received via mailing list)
On Tue, Mar 11, 2014 at 6:56 PM, Joel VanderWerf
<joelvanderwerf@gmail.com> wrote:

> What's the advantage over using threads, the old school way?
>

Or a system like http://celluloid.io, which provides both threads and
fibers and can integrate with things like I/O reactors...
Samuel Williams (Guest)
on 2014-03-12 14:23
(Received via mailing list)
Threads are good, but I felt like I wanted something more predictable.
Also, not all implementations of Ruby use green threads, and therefore
you might have synchronisation issues if you use shared global state
(either directly or indirectly through a gem/library).
Samuel Williams (Guest)
on 2014-03-12 14:31
(Received via mailing list)
Celluloid looks pretty interesting; I've seen it pop up quite a few
times. A Unix process group and a set of actors are two completely
different things (e.g. signal handling). I wanted something dead simple
and specific to what I was trying to do. I've also got some use-cases
for which Celluloid feels too heavy.
Joel VanderWerf (Guest)
on 2014-03-12 21:05
(Received via mailing list)
On 03/12/2014 06:22 AM, Samuel Williams wrote:
> Threads are good but I felt like I wanted something more predictable.
> Also, not all implementations of Ruby use green threads and therefore
> might have synchronisation issues if you use (either directly or
> indirectly through a gem/library) shared global state.

Even green threads have this danger, don't they?

Taking over manual scheduling seems a bit awkward compared to using some
kind of concurrency control (mutexes, queues, actors). What happens if
application code inside the fiber (process_and_email_results in the
example) makes a blocking IO call?
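For comparison, the mutex-based concurrency control Joel mentions looks
roughly like this minimal sketch (the `safe_increment` helper is
hypothetical): several threads update one shared counter, and
`Mutex#synchronize` makes each read-modify-write step exclusive.

```ruby
# Hypothetical example: several threads increment one shared counter.
def safe_increment(threads:, per_thread:)
  counter = 0
  lock = Mutex.new
  workers = Array.new(threads) do
    Thread.new do
      # counter += 1 is a separate read and write; without the lock,
      # updates could be lost if threads interleave between the two.
      per_thread.times { lock.synchronize { counter += 1 } }
    end
  end
  workers.each(&:join)
  counter
end

safe_increment(threads: 4, per_thread: 10_000)
```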

Manual scheduling with fibers is great for testing concurrent code which
would otherwise run in threads, because you can force a certain kind of
contention in a predictable way. I'm working on extracting a library for
doing this from a project where it's been a useful technique.
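The idea of forcing contention with fibers can be illustrated with a
small, hypothetical example (not Joel's library): two fibers each do a
non-atomic read-modify-write on a shared counter and yield in the
middle, so resuming them in a fixed order reproduces the lost update
every single time, and `lost_update_demo` always returns 1.

```ruby
# Hypothetical demo: a deterministic lost update driven by manual scheduling.
def lost_update_demo
  counter = 0
  make_worker = lambda do
    Fiber.new do
      value = counter      # read
      Fiber.yield          # simulated preemption point
      counter = value + 1  # write back a possibly stale value
    end
  end

  a = make_worker.call
  b = make_worker.call
  a.resume # a reads 0
  b.resume # b reads 0
  a.resume # a writes 1
  b.resume # b writes 1, so a's update is lost
  counter
end

lost_update_demo
```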
Samuel Williams (Guest)
on 2014-03-13 01:04
(Received via mailing list)
> Even green threads have this danger, don't they?

Yes, but in this context, I'm actually not sure I'd call the manual
scheduling a danger. While it could be referred to as explicit
scheduling, I prefer to look at it as providing a specific, well-defined,
non-blocking API with explicit synchronisation points.

(I think what I really like about fibers is that they make it very easy
to compose concurrent code in a predictable way. For all intents and
purposes, the code is still sequential, with very little overhead.)

> Taking over manual scheduling seems a bit awkward compared to using some
> kind of concurrency control (mutexes, queues, actors).

I would have said the opposite. Code using threads is typically very
hard to reason about compared to sequential code (like the API I've
proposed).

Except in specific situations (e.g. game engines, data processing/access,
algorithms/compression), I find threading causes more problems than it
solves (e.g. http://www.linuxprogrammingblog.com/threads-and-fo...).
Even debugging code with threads can be a nightmare: why is there a
deadlock, why is there memory corruption, etc. The only situation where
I've seen this working well in general is in languages/environments
designed from the ground up to support parallel processing (e.g.
Haskell, Clojure). Everything else seems like a hack that requires
careful analysis to verify correctness, and the path to the dark side is
always just one (poorly chosen) line of code away.

Anyway, basically, I really like fibers - if you want to run concurrent
unix processes, this gem is a good starting point.

Thanks for your thoughts and input.

Kind regards,
Samuel
Joel VanderWerf (Guest)
on 2014-03-13 01:56
(Received via mailing list)
Still wondering how you handle blocking IO in fibers.

If all of the code inside the fiber is under your control, you can use
non-blocking operations, and Fiber.yield if the operation would block.
(See example below.)

But I get the impression you are dealing with various third-party libs
which might just open a socket and start talking? Couldn't that block
the fiber and therefore the whole thread?

This has always seemed to me to be the compelling feature of ruby's
threads: you just let the thread scheduler manage blocking.
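A small experiment makes this concern concrete. The sketch assumes no
fiber scheduler is installed (the default; `Fiber.set_scheduler` only
exists from Ruby 3.0 onwards) and uses `sleep` as a stand-in for any
blocking call such as a socket read. The `elapsed` helper is
hypothetical. Two fibers that each block for 0.2 s take about 0.4 s in
total, while two threads take about 0.2 s.

```ruby
# Hypothetical helper: wall-clock seconds taken by the block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Fibers: sleep blocks the whole thread, so the two waits run back to back.
fiber_time = elapsed do
  fibers = Array.new(2) { Fiber.new { sleep 0.2 } }
  fibers.each(&:resume)
end

# Threads: while one thread sleeps, the scheduler runs the other.
thread_time = elapsed do
  Array.new(2) { Thread.new { sleep 0.2 } }.each(&:join)
end

puts format("fibers: %.2fs, threads: %.2fs", fiber_time, thread_time)
```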

For anyone else who's reading and hasn't played with fibers, here's what
you can do to avoid blocking the whole thread while one fiber waits for
input:

----

require 'socket'
require 'fiber'

s1, s2 = UNIXSocket.pair

f = Fiber.new do
   loop do
     begin
       puts "Fiber checking for available data"
       data = s1.read_nonblock(10)
       puts "Fiber received #{data.inspect}"
     rescue IO::WaitReadable
       puts "Fiber yielding"
       Fiber.yield
       puts "Fiber resuming"
       unless IO.select([s1], [], [], 0)
         puts "..even though no data is available"
       end
       retry
     rescue => ex
       puts ex
     end
   end
end

f.resume
f.resume

puts "writing to socket"
s2.write "123456"

f.resume
f.resume

puts "writing to socket"
s2.write "abcdef"

f.resume
f.resume
Samuel Williams (Guest)
on 2014-03-13 02:53
(Received via mailing list)
> Still wondering how you handle blocking IO in fibers.

That wasn't an important feature for the intended purpose of the gem,
so there is no explicit support for it at the moment. That might seem
like a cop-out, but it is exactly what I wanted (minimal features,
specific use-case).

> But I get the impression you are dealing with various third-party libs
> which might just open a socket and start talking? Couldn't that block
> the fiber and therefore the whole thread?

That is the same problem you'd have with any sequential code, whether it
is running in a fiber or in an actor: calling something that blocks
indefinitely. But I think as a user you'd be aware of this. I'm not
proposing a solution to this problem; I think that's probably impossible
anyway.

> This has always seemed to me to be the compelling feature of ruby's
threads: you just let the thread scheduler manage blocking.

The thread scheduler may seem like a good idea in theory, but in
practice event-driven code that works with OS primitives (select, epoll,
kevent) is generally more efficient. I think there are good arguments
either way (e.g. Sun UltraSPARC chips seemed to be designed for
thread-based workloads, running up to 64 threads in parallel, a bit like
HyperThreading on x86), but event-driven systems generally seem easier
to reason about, give more predictable behaviour, have better-defined
resource usage, etc. Also, as mentioned, while some implementations use
green threads, not all of them do. That means that if you use threads,
you need to deal with reentrancy and contention issues, which are at
least as complex as dealing with fibers, if not more so (e.g. calling
fork might break everything when using threads, as mentioned).

Thanks for the example code. I'm sure that can be done more efficiently
and cleanly by having one function call #select and resume the correct
fiber.
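The single-select design described here could be sketched as follows.
`TinyReactor` and its methods are hypothetical names, not part of
process-group or Celluloid: a fiber that would block calls `read`, which
parks the fiber on its IO, and one `run` loop calls `IO.select` and
resumes only the fibers whose IOs became readable.

```ruby
require "fiber"
require "socket"

# Hypothetical minimal reactor: one IO.select loop drives many fibers.
class TinyReactor
  def initialize
    @waiting = {} # io => fiber parked until io is readable
  end

  # Call inside a Fiber: read up to maxlen bytes without blocking the thread.
  def read(io, maxlen)
    io.read_nonblock(maxlen)
  rescue IO::WaitReadable
    @waiting[io] = Fiber.current
    Fiber.yield # parked; #run resumes this fiber when io is readable
    retry
  end

  # Resume parked fibers as their IOs become readable, until none are left.
  def run
    until @waiting.empty?
      readable, = IO.select(@waiting.keys)
      readable.each { |io| @waiting.delete(io).resume }
    end
  end
end

# Usage: two fibers wait on different sockets; data arrives out of order.
reactor = TinyReactor.new
r1, w1 = UNIXSocket.pair
r2, w2 = UNIXSocket.pair
results = []
Fiber.new { results << reactor.read(r1, 10) }.resume
Fiber.new { results << reactor.read(r2, 10) }.resume
w2.write("second")
w1.write("first")
reactor.run
```

Unlike polling each fiber in turn, this only resumes a fiber when
`select` reports its IO as ready, so no fiber wakes up just to find that
nothing is available.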

Thanks for your ideas and feedback.

Kind regards,
Samuel
Tony Arcieri (Guest)
on 2014-03-13 07:01
(Received via mailing list)
On Wed, Mar 12, 2014 at 5:56 PM, Joel VanderWerf
<joelvanderwerf@gmail.com> wrote:

> Still wondering how you handle blocking IO in fibers.


This is a genuine concern with this sort of library. For it to really be
useful, you need to be able to do things like I/O concurrently. In fact,
if it can't do I/O, it's not particularly helpful, because Fibers are
useless for CPU-bound tasks by default. I/O is one of the biggest use
cases of fibers.

If you're curious how Celluloid handles it, it provides a Celluloid::IO
companion library which has duck types of things like TCPSocket,
UDPSocket, and UNIXSocket which interact with Celluloid's scheduling and
can suspend/resume fibers when they make "blocking" calls. I/O
multiplexing is handled by a central reactor/event loop (provided by
nio4r).
Samuel Williams (Guest)
on 2014-03-14 08:35
(Received via mailing list)
> This is a genuine concern with this sort of library. For it to really be
> useful

This library is VERY useful for me in its current form. If you want
concurrent I/O, yes, don't use this library. If you just want to run
processes to completion concurrently, this library is perfect. I'm using
it to retrofit existing sequential scripts and also in another project,
similar to make, which doesn't care about I/O, just running compilers,
linkers, etc.
Eric Wong (Guest)
on 2014-03-14 08:59
(Received via mailing list)
You can avoid fibers/threads entirely, too.
Just a hash, lambdas, and waitpid2:

# tasks is a hash which maps pids to lambdas (callbacks):
tasks = {
  Process.spawn("some-task") => lambda do |status|
    process_and_email_results(status, "some task done!")
  end,
  Process.spawn("some-other-task --foobar") => lambda do |status|
    process_and_email_results(status, "some other task done!")
  end,
}

until tasks.empty?
  pid, status = Process.waitpid2(-1)
  if callback = tasks.delete(pid)
    callback.call(status)
  else
    warn "reaped unknown process: #{status.inspect}"
  end
end
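A runnable variant of this approach, wrapped in a hypothetical `run_all`
helper and using throwaway Ruby one-liners (via `RbConfig.ruby`) in
place of the real tasks. Each callback receives the `Process::Status`
that `Process.waitpid2` returns, so it can inspect the exit code. Note
that `waitpid2(-1)` reaps any child of the current process, which is why
unknown pids are reported rather than silently ignored.

```ruby
require "rbconfig"

# Hypothetical helper: spawn each command, map pid => callback, then reap
# children in whatever order they exit.
def run_all(commands_with_callbacks)
  tasks = commands_with_callbacks.to_h do |cmd, callback|
    [Process.spawn(*cmd), callback]
  end

  until tasks.empty?
    pid, status = Process.waitpid2(-1) # wait for any child
    if (callback = tasks.delete(pid))
      callback.call(status)
    else
      warn "reaped unknown process: #{status.inspect}"
    end
  end
end

# Placeholder tasks: child processes that exit with codes 3 and 0.
exit_codes = {}
run_all([
  [[RbConfig.ruby, "-e", "exit 3"], ->(status) { exit_codes[:a] = status.exitstatus }],
  [[RbConfig.ruby, "-e", "exit 0"], ->(status) { exit_codes[:b] = status.exitstatus }],
])
```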