Forum: Ruby-core [Assigned] Ruby Process Deadlocks With Fork on Mac OS X Lion

Posted by kosaki (Motohiro KOSAKI) (Guest)
on 2013-02-08 05:44
(Received via mailing list)
Issue #5811 has been updated by kosaki (Motohiro KOSAKI).

Status changed from Rejected to Assigned
Assignee changed from mrkn (Kenta Murata) to akr (Akira Tanaka)

AFAIK, pipe and `command` are unsafe when multiple threads are in use.
This issue was fixed in trunk (aka 2.0).

akr-san, do you have any comments?
----------------------------------------
Bug #5811: Ruby Process Deadlocks With Fork on Mac OS X Lion
https://bugs.ruby-lang.org/issues/5811#change-36036

Author: netshade (Chris Zelenak)
Status: Assigned
Priority: Normal
Assignee: akr (Akira Tanaka)
Category:
Target version:
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0]


=begin
Given a Ruby process that does the following:

* Spawns a new thread that initializes a TCPSocket
* Executes a script using backticks in the main thread

there is a chance that it will deadlock on Lion.  The GDB traces for the 
threads show:

* The TCP-connecting thread is stuck in
native_cond_wait/thread_pthread.c:321 by way of
rsock_getaddrinfo/raddrinfo.c:359
* The main thread is stuck in read() by way of rb_f_backquote/io.c:7266

Meanwhile, in the process forked by rb_f_backquote:

* The main thread is stuck here (the full trace is longer):
 #0  0x00007fff9160c6b6 in semaphore_wait_trap ()
 #1  0x00007fff8fc03bc2 in _dispatch_thread_semaphore_wait ()
 #2  0x00007fff8fc04286 in dispatch_once_f ()
 #3  0x00007fff95e12f20 in si_module_static_search ()
 #4  0x00007fff95e16a3d in si_module_with_name ()
 #5  0x00007fff95e0eac8 in getpwuid ()
 #6  0x00007fff90daa842 in getgroups$DARWIN_EXTSN ()
 #7  0x000000010b82b020 in rb_group_member (gid=0) at file.c:1002
 #8  0x000000010b82b10f in eaccess (path=0x7fff6b3d3570 "/bin/hostname", mode=1) at file.c:1052
 ...

The documentation for getpwuid in Mac OS X Lion states that getpwuid is
now thread-safe, much like getpwuid_r; however, the values returned by
getpwuid are thread-local and disposed of automatically, whereas
getpwuid_r stores its results in buffers supplied by the caller. The
disassemblies of semaphore_wait_trap and __psynch_cvwait both show
system calls being made (I don't know how to dig much further here),
and when snooping in GDB the arguments to these functions all appear as
void. I believe that the POSIX wait and the semaphore_wait taking place
are in fact system calls waiting on a condition variable with the same
value, and that the value is the same because the forked child inherits
a copy of the parent's memory state.

When an artificial delay ("sleep 1") is introduced after the creation of 
the TCP connect thread, this deadlock no longer occurs.
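A delay only makes the race less likely. If the hypothesis above is right (the child inherits resolver state from a thread that was in the middle of a lookup when fork() ran), a more deterministic mitigation would be to never fork while another thread is resolving a host name. The sketch below is an assumption, not something taken from the report or from Ruby itself: `RESOLVE_FORK_LOCK`, `connect_serialized`, and `backquote_serialized` are hypothetical names, the approach has not been verified on Lion, and it only protects code paths that go through these helpers.

```ruby
require "socket"

# Hypothetical process-wide lock (not part of Ruby or of the report):
# while one thread is resolving a host name, no other thread may fork
# via these helpers, and vice versa.
RESOLVE_FORK_LOCK = Mutex.new

# Resolve under the lock, then connect outside it, trying each
# resolved address in order until one succeeds.
def connect_serialized(host, port)
  addrs = RESOLVE_FORK_LOCK.synchronize do
    Addrinfo.getaddrinfo(host, port, nil, :STREAM)
  end
  last_error = nil
  addrs.each do |ai|
    begin
      return ai.connect
    rescue SystemCallError => e
      last_error = e
    end
  end
  raise last_error || SocketError.new("no addresses for #{host}")
end

# Run a command with backticks (which forks) only while holding the lock.
def backquote_serialized(cmd)
  RESOLVE_FORK_LOCK.synchronize { `#{cmd}` }
end

# Usage mirroring the report's scenario, with a local listener standing
# in for the remote endpoint.
server = TCPServer.new("localhost", 0)
port = server.addr[1]

connector = Thread.new { connect_serialized("localhost", port).close }
out = backquote_serialized("echo hello")
connector.join
server.accept.close # drain the connection the thread made
server.close
puts out
```

Holding the lock only around resolution (not the connect itself) keeps the critical section short, since the connect happens after the getaddrinfo call has returned.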

Attached is a test script that uses the Instrumental Agent gem for the
TCP connection and reliably reproduces the deadlock under 1.9.3.
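For readers without the attachment, here is a minimal gem-free sketch of the same two-step pattern. It assumes a local TCPServer on "localhost" as a stand-in for the endpoint the Instrumental Agent gem connects to; the server, the `echo hello` command, and the 20 iterations are illustrative choices, not details from the report. Per the report, a script like this may hang on Ruby 1.9.3 under Lion; on a Ruby with the fix it should simply finish and print a single distinct output.

```ruby
require "socket"

# Assumption: a local listener stands in for the remote endpoint that
# the original script reached via the Instrumental Agent gem.
server = TCPServer.new("localhost", 0)
port = server.addr[1]

outputs = 20.times.map do
  # Step 1: a new thread initializes a TCPSocket. Resolving "localhost"
  # goes through getaddrinfo (rsock_getaddrinfo in the traces above).
  connector = Thread.new { TCPSocket.new("localhost", port).close }

  # Step 2: the main thread runs a command with backticks, which forks a
  # child and reads its output through a pipe (rb_f_backquote).
  out = `echo hello`

  connector.join
  server.accept.close # drain the connection the thread just made
  out
end

server.close
p outputs.uniq
```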
=end