Forum: Ruby-core [Closed] Ruby Process Deadlocks With Fork on Mac OS X Lion

kosaki (Motohiro KOSAKI) (Guest)
on 2013-02-12 03:21
(Received via mailing list)
Issue #5811 has been updated by kosaki (Motohiro KOSAKI).

Status changed from Assigned to Closed

OK, then I will close this ticket.

Please reopen it if anyone hits the same issue on 2.0 or trunk.

Bug #5811: Ruby Process Deadlocks With Fork on Mac OS X Lion

Author: netshade (Chris Zelenak)
Status: Closed
Priority: Normal
Assignee: akr (Akira Tanaka)
Target version:
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0]

Given a Ruby process that acts like the following:

* Spawn new thread that initializes a TCPSocket
* Execute script using backticks in main thread

there is a chance that it will deadlock on Lion.  The GDB traces for the
threads show:

* The TCP connecting thread stuck on
native_cond_wait/thread_pthread.c:321 by way of
* The main thread stuck on read() by way of rb_f_backquote/io.c:7266

Meanwhile, in the forked process from rb_f_backquote:

* The main thread is stuck at (longer trace):
 #0  0x00007fff9160c6b6 in semaphore_wait_trap ()
 #1  0x00007fff8fc03bc2 in _dispatch_thread_semaphore_wait ()
 #2  0x00007fff8fc04286 in dispatch_once_f ()
 #3  0x00007fff95e12f20 in si_module_static_search ()
 #4  0x00007fff95e16a3d in si_module_with_name ()
 #5  0x00007fff95e0eac8 in getpwuid ()
 #6  0x00007fff90daa842 in getgroups$DARWIN_EXTSN ()
 #7  0x000000010b82b020 in rb_group_member (gid=0) at file.c:1002
 #8  0x000000010b82b10f in eaccess (path=0x7fff6b3d3570 "/bin/hostname", mode=1) at file.c:1052

The documentation for getpwuid in Mac OS X Lion states that getpwuid is
now thread-safe, much like getpwuid_r; however, the values returned by
getpwuid are thread-local and disposed of automatically, whereas
getpwuid_r writes its results into storage supplied by the caller.  The
disassembly of semaphore_wait_trap and __psynch_cvwait both show
syscalls being made (I don't know how to go much further here), and GDB
shows no arguments for either function.  I believe that the POSIX
condition wait and the semaphore_wait are in fact making syscalls to
wait on a synchronization object of the same value; the value is the
same because the forked child starts from a copy of the parent's memory
state.

When an artificial delay ("sleep 1") is introduced after the creation of
the TCP connect thread, this deadlock no longer occurs.

Attached is a test script that uses the Instrumental Agent gem for the
TCP connect and can reliably cause the deadlock under 1.9.3.
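
The attachment is not reproduced here. As a rough stand-in, the two steps
listed at the top of the report can be sketched without the gem. Everything
named below is an assumption for illustration: fork_while_connecting is a
hypothetical helper, a local TCPServer replaces the Instrumental Agent
connection, and the 10 second timeout is arbitrary. It is not known whether
this simplified form triggers the deadlock as reliably as the attached
script does.

```ruby
require "socket"
require "timeout"

# Illustrative stand-in for the attached script, not the original attachment.
# fork_while_connecting is a hypothetical name; host, delay and command are
# parameters introduced only for this sketch.
def fork_while_connecting(host = "localhost", delay = 0, command = "hostname")
  server = TCPServer.new(host, 0)   # port 0: let the OS pick a free port
  port   = server.addr[1]

  # Step 1: spawn a new thread that initializes a TCPSocket.
  connector = Thread.new { TCPSocket.new(host, port) }

  # Optional delay; the report says "sleep 1" at this point hides the deadlock.
  sleep(delay) if delay > 0

  # Step 2: run a command with backticks in the main thread.
  # The timeout only unblocks the parent; a hung child may be left behind.
  output = Timeout.timeout(10) { `#{command}` }

  client = connector.value          # join the thread and fetch the socket
  output
ensure
  client.close if client
  server.close if server
end
```

Calling fork_while_connecting with no arguments runs both steps back to back;
fork_while_connecting("localhost", 1) adds the one second delay described
above. On an affected system the call would presumably end in a
Timeout::Error rather than returning the command's output.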