Ruby lacks atfork : The evil that lives in fork

Consider this simple usage of Thread and Process…

I use a mutex to block access to the $state variable when it is in
an “inconsistent” state.

======================================================================
require 'thread'
require 'pp'
Thread.abort_on_exception = true
$state = "Uninited"

def state(m)
  print "#{caller(0)[1]}: #{Time.now} #{$state} "
  if m.locked?
    puts "Mutex locked"
  else
    puts "Mutex unlocked"
  end
end

m = Mutex.new
state(m)
$state = "Good"
t = Thread.new do
  begin
    state(m)
    m.synchronize do
      $state = "Inconsistent"
      state(m)
      sleep 10
      $state = "Good Again"
      state(m)
    end
  ensure
    state(m)
  end
end

state(m)
sleep 2

state(m)

pid = Process.fork do
  state(m)

  sleep 2

  state(m)

end

state(m)
pp Process.waitpid2(pid)

t.join

state(m)

This is what it outputs…

ruby -v;ruby -w evil_fork.rb
ruby 1.8.7 (2008-06-20 patchlevel 22) [i686-linux]
evil_fork.rb:16: Mon Oct 06 14:51:17 +1300 2008 Uninited Mutex unlocked
evil_fork.rb:20: Mon Oct 06 14:51:17 +1300 2008 Good Mutex unlocked
evil_fork.rb:23: Mon Oct 06 14:51:17 +1300 2008 Inconsistent Mutex locked
evil_fork.rb:33: Mon Oct 06 14:51:17 +1300 2008 Inconsistent Mutex locked
evil_fork.rb:36: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex locked
evil_fork.rb:39: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex unlocked
evil_fork.rb:46: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex locked
evil_fork.rb:43: Mon Oct 06 14:51:21 +1300 2008 Inconsistent Mutex unlocked
[5082, #<Process::Status: pid=5082,exited(0)>]
evil_fork.rb:26: Mon Oct 06 14:51:27 +1300 2008 Good Again Mutex locked
evil_fork.rb:29: Mon Oct 06 14:51:27 +1300 2008 Good Again Mutex unlocked
evil_fork.rb:51: Mon Oct 06 14:51:27 +1300 2008 Good Again Mutex unlocked

======================================================================

Oh dear!

When I Process.fork’ed I saw this…
evil_fork.rb:39: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex unlocked

ie. I could be accessing $state when it is in an inconsistent state
and the Mutex doesn’t protect me.

From the fork man page…

    * The child process is created with a single thread — the one
      that called fork().  The entire virtual address space of
      the parent is replicated in the child, including the states
      of mutexes, condition variables, and other pthreads objects;
      the use of pthread_atfork(3) may be helpful for dealing with
      problems that this can cause.

Unfortunately Ruby doesn’t provide an atfork facility.

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : [email protected]
New Zealand

Hi,

In message “Re: Ruby lacks atfork : The evil that lives in fork…”
on Mon, 6 Oct 2008 11:11:26 +0900, John C.
[email protected] writes:

|Consider this simple usage of Thread and Process…
|
|I use a mutex to block access to the $state variable when it is in
|an “inconsistent” state.

|When I Process.fork’ed I saw this…
|evil_fork.rb:39: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex unlocked
|
|ie. I could be accessing $state when it is in an inconsistent state
|and the Mutex doesn’t protect me.

I am not sure what you meant here. It worked as I expected. You
didn’t wrap state(m) by synchronize, so that they are not mutually
exclusive. What did you expect out of the script?

          matz.

John C. wrote:

From the fork man page…

    * The child process is created with a single thread — the one
      that called fork().

That manpage is talking about OS threads, not Ruby threads.

If you want to know how Ruby handles its green threads through fork, you
need to refer to “ri Process::fork” instead.

According to ri, when (Ruby’s) fork is called only the currently-running
(Ruby) thread continues to live in the child process.

      The entire virtual address space of
      the parent is replicated in the child, including the states
      of mutexes, condition variables, and other pthreads objects;

This is talking about OS mutexes etc. Again, the Ruby objects with
corresponding names are entirely different.
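
One way to check what ri describes is to ask the child how many Ruby threads it can still see. This is only a sketch: the pipe and the names `worker` and `child_report` are made up for illustration, and it assumes a platform where fork is available.

```ruby
require 'thread'   # harmless on current Rubies, needed on 1.8

# A background thread that is still running at fork time.
worker = Thread.new { sleep }

reader, writer = IO.pipe

pid = Process.fork do
  # In the child: report how many Ruby threads are alive and whether
  # the copied `worker` object still refers to a live thread.
  reader.close
  writer.puts "#{Thread.list.size} #{worker.alive?}"
  writer.close
end

writer.close
child_report = reader.read.strip
reader.close
Process.waitpid(pid)

parent_worker_alive = worker.alive?   # the parent's copy keeps running
worker.kill
worker.join

puts "child saw (thread count, worker alive?): #{child_report}"
puts "parent still had a live worker: #{parent_worker_alive}"
```

The Thread object for `worker` is copied into the child along with everything else, but there is no thread behind it any more. Whatever that thread was halfway through doing stays halfway done in the child.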

On Tue, 7 Oct 2008, Brian C. wrote:

John C. wrote:

From the fork man page…

    * The child process is created with a single thread — the one
      that called fork().

That manpage is talking about OS threads, not Ruby threads.

Correct. But I had just demonstrated that the problem pthread_atfork
was designed to solve exists within ruby threads.

According to ri, when (Ruby’s) fork is called only the currently-running
(Ruby) thread continues to live in the child process.

Exactly the same as with pthreads and linux fork.

      The entire virtual address space of
      the parent is replicated in the child, including the states
      of mutexes, condition variables, and other pthreads objects;

This is talking about OS mutexes etc. Again, the Ruby objects with
corresponding names are entirely different.

That is neither here nor there. The point is I have just shown that
the problem described exists within Ruby.

ie. If deep within a library routine there are threads and mutexes
and deep within another library routine there is a Process.fork the
potential for “the wrong thing” to happen exists.

Where the wrong thing is that :- if a thread is in the critical
section protected by the mutex, it may leave the protected state
inconsistent and unusable when the “fork” is executed by another thread.

If the child process ever invokes the library with the mutex, it may
find the Mutex unlocked, when it should be locked, and hence enter a
critical section, when it shouldn’t, and find an inconsistent state
which leads to an erroneous result.

The solution proposed by POSIX is to provide the facility for
libraries to chain handlers, to handle in some sensible fashion, any
fork event occurring in a different library.

If Ruby can come up with a better mechanism than atfork to handle this
problem, I would be very pleased.

But some solution is required to be able to have reusable multiple
libraries some of which use sub-processes, some which use threads.


Hi,

In message “Re: Ruby lacks atfork : The evil that lives in fork…”
on Mon, 6 Oct 2008 13:32:30 +0900, John C.
[email protected] writes:

|state(m) is merely reporting the value of $state and whether the
|mutex was locked or not.
|
|For the time $state is “Inconsistent”, the mutex should be in a locked
|state. Which it is, when viewed by any other thread in the same
|process.
|
|However, if you fork a process, the mutex in the child process is in
|the unlocked state whilst the resource is still in the inconsistent
|state.

When you fork off the process, the entire resources are (virtually)
copied, so that there’s no way to ensure (copied) mutex to share
locking status across processes. The basic rule is: don’t mix threads
(and thread related resources like mutex) with processes.

          matz.

On Mon, 6 Oct 2008, Yukihiro M. wrote:

|When I Process.fork’ed I saw this…
|evil_fork.rb:39: Mon Oct 06 14:51:19 +1300 2008 Inconsistent Mutex unlocked
|
|ie. I could be accessing $state when it is in an inconsistent state
|and the Mutex doesn’t protect me.

I am not sure what you meant here. It worked as I expected. You
didn’t wrap state(m) by synchronize, so that they are not mutually
exclusive. What did you expect out of the script?

state(m) is merely reporting the value of $state and whether the
mutex was locked or not.

For the time $state is “Inconsistent”, the mutex should be in a locked
state. Which it is, when viewed by any other thread in the same
process.

However, if you fork a process, the mutex in the child process is in
the unlocked state whilst the resource is still in the inconsistent
state.

The usual pattern for a lock/unlock pair is to be wrapped round some
access to a shared resource.

In this case the shared resource is $state.

Let us make that more explicit. Suppose we are transferring money from
one account to another…

require 'thread'
Thread.abort_on_exception = true
STDOUT.sync = true

$account_a = 100
$account_b = 100
$total = $account_a + $account_b
$mutex = Mutex.new

def log(msg,level=1)
  puts "\n#{caller(0)[level]}:#{Time.now} #{msg}"
end

def invariant_check
  if $total == ($account_a + $account_b)
    log( "We are in a consistent state", 2)
  else
    log( "We are in an inconsistent state", 2)
  end
end

def transfer( sum)
  log "At the start of transaction the invariant holds $account_a + " \
      "$account_b == 200"
  invariant_check
  $mutex.synchronize do
    log "Got lock"
    $account_a = $account_a - sum
    log " For the next 10 seconds we have lost money from our system. " \
        "We are inconsistent."
    sleep 10
    $account_b = $account_b + sum
    log "Ah! Their it is again. We're consistent again."
  end
  log "Invariant holds at end"
  invariant_check
end

t1 = Thread.new do
  log "Sleep 4 to ensure we wait for other"
  sleep 4
  log "Try get lock, can't since t2 has it. #{$mutex.locked?}"
  $mutex.synchronize do
    log "Only unblocks after 12 seconds into the program"
    invariant_check
    log "Release lock"
  end
  log "t1 exits"
end

sleep 1

t2 = Thread.new do
  log "t2 grabs lock immediately and holds for 10"
  transfer(50)
  log "t2 exits"
end

sleep 1

pid = Process.fork do
  log "Forked process wakes and sleeps 5"
  sleep 5
  log "By now t2 has the lock, but will try get it anyway"
  log( "Looky the lock is free") if !$mutex.locked?
  $mutex.synchronize do
    log "What! it Unblocks immediately!"
    log "Announces we're inconsistent!"
    invariant_check
    log "Relinquish lock"
  end
  log "exit process"
end

log "Wait for process"
p Process.waitpid2 pid

log "Wait for t1"
t1.join

log "Wait for t2"
t2.join

Then the output is…
ruby -w fork.rb

fork.rb:40:Mon Oct 06 17:23:25 +1300 2008 Sleep 4 to ensure we wait for other

fork.rb:54:Mon Oct 06 17:23:26 +1300 2008 t2 grabs lock immediately and holds for 10

fork.rb:24:in `transfer':Mon Oct 06 17:23:26 +1300 2008 At the start of transaction the invariant holds $account_a + $account_b == 200

fork.rb:25:in `transfer':Mon Oct 06 17:23:26 +1300 2008 We are in a consistent state

fork.rb:27:in `transfer':Mon Oct 06 17:23:26 +1300 2008 Got lock

fork.rb:29:in `transfer':Mon Oct 06 17:23:26 +1300 2008 For the next 10 seconds we have lost money from our system. We are inconsistent.

fork.rb:62:Mon Oct 06 17:23:27 +1300 2008 Forked process wakes and sleeps 5

fork.rb:75:Mon Oct 06 17:23:27 +1300 2008 Wait for process

fork.rb:42:Mon Oct 06 17:23:29 +1300 2008 Try get lock, can't since t2 has it. true

fork.rb:64:Mon Oct 06 17:23:32 +1300 2008 By now t2 has the lock, but will try get it anyway

fork.rb:65:Mon Oct 06 17:23:32 +1300 2008 Looky the lock is free

fork.rb:67:Mon Oct 06 17:23:32 +1300 2008 What! it Unblocks immediately!

fork.rb:68:Mon Oct 06 17:23:32 +1300 2008 Announces we're inconsistent!

fork.rb:69:Mon Oct 06 17:23:32 +1300 2008 We are in an inconsistent state

fork.rb:70:Mon Oct 06 17:23:32 +1300 2008 Relinquish lock

fork.rb:72:Mon Oct 06 17:23:32 +1300 2008 exit process
[15355, #<Process::Status: pid=15355,exited(0)>]

fork.rb:78:Mon Oct 06 17:23:32 +1300 2008 Wait for t1

fork.rb:32:in `transfer':Mon Oct 06 17:23:36 +1300 2008 Ah! Their it is again. We're consistent again.

fork.rb:44:Mon Oct 06 17:23:36 +1300 2008 Only unblocks after 12 seconds into the program
fork.rb:34:in `transfer':Mon Oct 06 17:23:36 +1300 2008 Invariant holds at end

fork.rb:45:Mon Oct 06 17:23:36 +1300 2008 We are in a consistent state
fork.rb:35:in `transfer':Mon Oct 06 17:23:36 +1300 2008 We are in a consistent state

fork.rb:46:Mon Oct 06 17:23:36 +1300 2008 Release lock
fork.rb:56:Mon Oct 06 17:23:36 +1300 2008 t2 exits

fork.rb:48:Mon Oct 06 17:23:36 +1300 2008 t1 exits

fork.rb:81:Mon Oct 06 17:23:36 +1300 2008 Wait for t2

======================================================================

Where the crucial lines are…
fork.rb:65:Mon Oct 06 17:23:32 +1300 2008 Looky the lock is free

fork.rb:67:Mon Oct 06 17:23:32 +1300 2008 What! it Unblocks immediately!

fork.rb:68:Mon Oct 06 17:23:32 +1300 2008 Announces we’re inconsistent!

fork.rb:69:Mon Oct 06 17:23:32 +1300 2008 We are in an inconsistent state

fork.rb:70:Mon Oct 06 17:23:32 +1300 2008 Relinquish lock

The solution provided by POSIX is pthread_atfork

    pthread_atfork - register handlers to be called at fork(2) time

SYNOPSIS
    #include <pthread.h>

    int pthread_atfork(void (*prepare)(void), void (*parent)(void),
                       void (*child)(void));

DESCRIPTION

    "pthread_atfork" registers handler functions to be called just
    before and just after a new process is created with
    "fork"(2). The 'prepare' handler will be called from the parent
    process, just before the new process is created. The 'parent'
    handler will be called from the parent process, just before
    "fork"(2) returns. The 'child' handler will be called from the
    child process, just before "fork"(2) returns.

    One or several of the three handlers 'prepare', 'parent' and
    'child' can be given as "NULL", meaning that no handler needs
    to be called at the corresponding point.

    "pthread_atfork" can be called several times to install several
    sets of handlers. At "fork"(2) time, the 'prepare' handlers are
    called in LIFO order (last added with "pthread_atfork", first
    called before "fork"), while the 'parent' and 'child' handlers
    are called in FIFO order (first added, first called).

    To understand the purpose of "pthread_atfork", recall that
    "fork"(2) duplicates the whole memory space, including mutexes
    in their current locking state, but only the calling thread:
    other threads are not running in the child process.  The
    mutexes are not usable after the "fork" and must be initialized
    with 'pthread_mutex_init' in the child process.  This is a
    limitation of the current implementation and might or might not
    be present in future versions.

In my example, the handlers could grab the Mutex in the parent process
for the lifetime of the child, leaving it unlocked in the child process.
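
To make the calling order in that man page concrete, here is a rough Ruby sketch of a handler registry. The `AtFork` module, `AtFork.register` and `AtFork.fork` are hypothetical names invented for this illustration; nothing like them ships with Ruby, and the hooks only run for forks that go through `AtFork.fork`.

```ruby
# Hypothetical registry mimicking the pthread_atfork calling order.
module AtFork
  Handlers = Struct.new(:prepare, :parent, :child)
  @handlers = []

  # Each handler is optional, like passing NULL to pthread_atfork.
  def self.register(prepare: nil, parent: nil, child: nil)
    @handlers << Handlers.new(prepare, parent, child)
  end

  def self.fork
    # prepare handlers: last registered, first called (LIFO)
    @handlers.reverse_each { |h| h.prepare.call if h.prepare }
    pid = Process.fork do
      # child handlers: first registered, first called (FIFO)
      @handlers.each { |h| h.child.call if h.child }
      yield
    end
    # parent handlers: first registered, first called (FIFO)
    @handlers.each { |h| h.parent.call if h.parent }
    pid
  end
end

order = []
AtFork.register(prepare: -> { order << :prepare_a },
                parent:  -> { order << :parent_a },
                child:   -> { order << :child_a })
AtFork.register(prepare: -> { order << :prepare_b },
                parent:  -> { order << :parent_b },
                child:   -> { order << :child_b })

pid = AtFork.fork do
  # Exit status 0 only if the child handlers ran in FIFO order.
  exit!(order.last(2) == [:child_a, :child_b] ? 0 : 1)
end
_, status = Process.waitpid2(pid)

puts "parent saw: #{order.inspect}"
puts "child handlers ran in FIFO order: #{status.success?}"
```

The weakness is obvious: a library that calls Process.fork directly bypasses the registry entirely, which is exactly why a facility built into Ruby would be preferable.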


John C. wrote:

Ah, but that’s the point of pthread_atfork… it gives you several
possible strategies to allow you to mix threads with processes…

You can atfork in ruby; fsdb uses this kind of construct:

module ForkSafely
  def fork
    # clean up before forking
    super do
      # clean up after forking, in child
      # clean up inconsistent mutexes, etc.
      yield
    end
    # clean up after forking, in parent
  end
end

include ForkSafely

On Tue, 7 Oct 2008, Yukihiro M. wrote:

When you fork off the process, the entire resources are (virtually)
copied, so that there’s no way to ensure (copied) mutex to share
locking status across processes. The basic rule is: don’t mix threads
(and thread related resources like mutex) with processes.

The problem with “don’t mix threads with processes” is unless you
inspect the source code of each version of each library in turn… it
is very hard to prove that nothing in your system is using a thread
and a process together.

Ah, but that’s the point of pthread_atfork… it gives you several
possible strategies to allow you to mix threads with processes…

  1. Give the resource to the child.

    Use the atfork handler to lock the mutex in the prepare handler
    (blocking if need be until you can obtain it) perform the fork,
    release the lock in the child handler. Thereafter the resource will
    be unobtainable in the parent until the child exits, and the child
    may continue to use the resource as need be.

  2. Give the resource to the parent.

    Use atfork to lock the resource in the prepare handler, ensure it is
    locked in the child handler, unlock in the parent
    handler. Thereafter the child process will find it is always
    locked, and the parent process will be able to access it.

  3. Lock both out for the duration.

  4. Mutate the mutex into a valid interprocess lock like flock or
    fcntl.
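
Option 4 might look something like the sketch below, which uses File#flock in place of the Mutex. The lock file path and the helper name `try_file_lock` are invented for the example. It also assumes a local filesystem where flock behaves as it does on Linux, namely that the lock belongs to the open file and not to the process, so each contender has to open the file for itself.

```ruby
require 'tmpdir'

# Hypothetical lock file; any path both processes can open would do.
LOCK_PATH = File.join(Dir.tmpdir, "fork_demo_#{Process.pid}.lock")

# Open our own handle and try to take an exclusive lock without blocking.
# Returns the open File if we got the lock, nil otherwise.
def try_file_lock(path)
  file = File.open(path, File::RDWR | File::CREAT, 0600)
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    file
  else
    file.close
    nil
  end
end

held = try_file_lock(LOCK_PATH)   # the parent takes the lock

pid = Process.fork do
  # The child opens the file itself. Reusing the handle inherited from
  # the parent would share the parent's lock instead of competing for it.
  contender = try_file_lock(LOCK_PATH)
  exit!(contender ? 1 : 0)        # 0 means the lock was respected
end
_, status = Process.waitpid2(pid)
child_was_locked_out = status.success?

held.flock(File::LOCK_UN)
held.close
File.delete(LOCK_PATH)

puts "child was locked out: #{child_was_locked_out}"
```

Unlike the copied Mutex, this lock lives in the kernel, so a fork cannot silently reset it in the child.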

Sun Solaris has a “forkall” variant that creates copies of all
active threads as well… but I’m not sure that “forkall” really
solves the problem instead of multiplying it.

What the open group has to say on the subject is informative…

http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

 There are at least two serious problems with the semantics of
 fork() in a multi-threaded program. One problem has to do with
 state (for example, memory) covered by mutexes. Consider the case
 where one thread has a mutex locked and the state covered by that
 mutex is inconsistent while another thread calls fork(). In the
 child, the mutex is in the locked state (locked by a nonexistent
 thread and thus can never be unlocked). Having the child simply
 reinitialize the mutex is unsatisfactory since this approach does
 not resolve the question about how to correct or otherwise deal
 with the inconsistent state in the child.

 It is suggested that programs that use fork() call an exec
 function very soon afterwards in the child process, thus resetting
 all states. In the meantime, only a short list of
 async-signal-safe library routines are promised to be available.

 Unfortunately, this solution does not address the needs of
 multi-threaded libraries. Application programs may not be aware
 that a multi-threaded library is in use, and they feel free to
 call any number of library routines between the fork() and exec
 calls, just as they always have. Indeed, they may be extant
 single-threaded programs and cannot, therefore, be expected to
 obey new restrictions imposed by the threads library.

 On the other hand, the multi-threaded library needs a way to
 protect its internal state during fork() in case it is re-entered
 later in the child process. The problem arises especially in
 multi-threaded I/O libraries, which are almost sure to be invoked
 between the fork() and exec calls to effect I/O redirection. The
 solution may require locking mutex variables during fork(), or it
 may entail simply resetting the state in the child after the
 fork() processing completes.

 The pthread_atfork() function provides multi-threaded libraries
 with a means to protect themselves from innocent application
 programs that call fork(), and it provides multi-threaded
 application programs with a standard mechanism for protecting
 themselves from fork() calls in a library routine or the
 application itself.

 The expected usage is that the prepare handler acquires all mutex
 locks and the other two fork handlers release them.

 For example, an application can supply a prepare routine that
 acquires the necessary mutexes the library maintains and supply
 child and parent routines that release those mutexes, thus
 ensuring that the child gets a consistent snapshot of the state of
 the library (and that no mutexes are left
 stranded). Alternatively, some libraries might be able to supply
 just a child routine that reinitializes the mutexes in the library
 and all associated states to some known value (for example, what
 it was when the image was originally executed).

 When fork() is called, only the calling thread is duplicated in
 the child process. Synchronization variables remain in the same
 state in the child as they were in the parent at the time fork()
 was called. Thus, for example, mutex locks may be held by threads
 that no longer exist in the child process, and any associated
 states may be inconsistent. The parent process may avoid this by
 explicit code that acquires and releases locks critical to the
 child via pthread_atfork(). In addition, any critical threads need
 to be recreated and reinitialized to the proper state in the child
 (also via pthread_atfork()).

 A higher-level package may acquire locks on its own data
 structures before invoking lower-level packages. Under this
 scenario, the order specified for fork handler calls allows a
 simple rule of initialization for avoiding package deadlock: a
 package initializes all packages on which it depends before it
 calls the pthread_atfork() function for itself.

Yes, I’m aware the author of that document was describing POSIX pthreads
not ruby threads. But clearly the same problems exist in both.
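
The “expected usage” quoted above (prepare acquires the locks, the other two handlers release them) can be approximated in plain Ruby by holding the mutex across the fork. This is a sketch, not a general fix: `fork_with_snapshot` is a hypothetical helper, it knows about only one mutex, and it relies on Mutex#owned?, which is not available on 1.8.

```ruby
require 'thread'   # harmless on current Rubies

# Hypothetical helper: fork while holding `mutex`, so the child starts
# from a state that no other thread is halfway through changing.
def fork_with_snapshot(mutex)
  mutex.lock                        # "prepare": wait out any critical section
  begin
    Process.fork do
      mutex.unlock                  # "child": release the inherited lock
      yield
    end
  ensure
    mutex.unlock if mutex.owned?    # "parent": release once fork has returned
  end
end

accounts = { :a => 100, :b => 100 }
mutex = Mutex.new
reader, writer = IO.pipe

mover = Thread.new do
  mutex.synchronize do
    accounts[:a] -= 50
    sleep 0.5                       # money is "lost" while we sleep
    accounts[:b] += 50
  end
end
sleep 0.1                           # give the mover time to take the lock

pid = fork_with_snapshot(mutex) do
  reader.close
  writer.puts(accounts[:a] + accounts[:b])
  writer.close
end
writer.close
child_total = reader.read.to_i
reader.close
Process.waitpid(pid)
mover.join

puts "total seen by child: #{child_total}"
```

The fork simply waits until the transfer has finished, so the child never sees the money missing. The cost is that the forking thread blocks for as long as any critical section lasts.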


On Oct 6, 2008, at 20:08 PM, John C. wrote:

Hmm. Almost, but I can’t get rid of this warning…
a.rb:52: warning: redefine fork

any suggestions on how to get rid of that pesky warning?

use alias to copy it to a “backup” name before overriding.

alias fork_orig fork
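
Spelled out a little more, the alias approach could look like the sketch below. The `$fork_events` log is invented for the illustration. As far as I know, aliasing the old method first is what keeps -w from complaining about the redefinition, but treat that as an assumption to check on your own Ruby version.

```ruby
$fork_events = []   # hypothetical log, just to show the hooks ran

module Process
  class << self
    # Keep the original reachable under a backup name before overriding.
    alias fork_orig fork

    def fork(&block)
      $fork_events << :before_fork
      pid = fork_orig(&block)
      # Without a block, fork_orig returns nil in the child.
      $fork_events << :after_fork_in_parent if pid
      pid
    end
  end
end

pid = Process.fork { exit!(0) }
_, status = Process.waitpid2(pid)

puts "hooks that ran in the parent: #{$fork_events.inspect}"
puts "child exited cleanly: #{status.success?}"
```

This only wraps Process.fork. A bare `fork` goes through Kernel#fork and would need the same treatment separately.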

Hi,

In message “Re: Ruby lacks atfork : The evil that lives in fork…”
on Tue, 7 Oct 2008 09:41:55 +0900, John C.
[email protected] writes:

|The problem with “don’t mix threads with processes” is unless you
|inspect the source code of each version of each library in turn… it
|is very hard to prove that nothing in your system is using a thread
|and a process together.

The point is not to touch thread related objects from the forked child
process. They won’t work as expected anyway. It’s not too hard to
do, I believe. You were touching them in your example.

          matz.

John C. wrote:

Can I chain the at fork handlers?

Chaining is possible. You can have multiple copies of the following code
(changing the Mutex-specific part of course):

class Mutex
  module ForkSafely
    def fork
      super do
        ObjectSpace.each_object(Mutex) { |m| m.remove_dead }
        yield
      end
    end
  end
end

module ForkSafely
  include Mutex::ForkSafely
end
include ForkSafely

On Tue, 7 Oct 2008, Joel VanderWerf wrote:

    # clean up after forking, in parent
  end
end

include ForkSafely

Hmm. Cute. Really cute.

Can I chain the at fork handlers?
Does it work for Process.fork as well?

Let me try…

module A

  def fork
    puts "Prereal"
    pid = super do
      puts "In real"
      yield
    end
    puts "post real"
    pid
  end

end
include A

pid = fork do
  puts "Did it work?"
end

Process.waitpid2 pid

module B
  def fork
    puts "Prereal1"
    pid = super do
      puts "In real1"
      yield
    end
    puts "post real1"
    pid
  end
end
include B

pid2 = fork do
  puts "Does chained work?"
end
Process.waitpid2 pid2

puts "Yes nested did work, does Process.fork work too?"

pid3 = Process.fork do
  puts "Does Process.fork work?"
end
Process.waitpid2 pid3

puts "No it didn't"

module C

  def Process.fork
    puts "b4 proc fork"
    super do
      puts "in proc fork"
      yield
    end
    puts "Post proc fork"
  end

end

include C

pid2 = Process.fork do
  puts "Bah"
end

ruby -w a.rb
Prereal
In real
Did it work?
post real
Prereal1
Prereal
In real
In real1
Does chained work?
post real
post real1
Yes nested did work, does Process.fork work too?
Does Process.fork work?
No it didn’t
a.rb:52: warning: redefine fork
b4 proc fork
Prereal1
Prereal
In real
In real1
in proc fork
Bah
post real
post real1
Post proc fork

Hmm. Almost, but I can’t get rid of this warning…
a.rb:52: warning: redefine fork

any suggestions on how to get rid of that pesky warning?
