Forum: Ruby-core Ruby 1.8 fastthread library can segfault in child process after forking

Posted by Brent Roman (brentr)
on 2012-12-02 04:06
Someone on github sent me this interesting one-liner that often causes
Ruby 1.8.7 with the MBARI patchset to either abort with an inappropriate
error message or segfault:

ruby  -rthread -e '
  q = Queue.new; Thread.new { q.pop }; pid = fork;
  if pid.nil?; q = nil; GC.start; else; Process.wait(pid); end'

I tracked down the problem to the fastthread library's policy of killing
any threads waiting on a queue when it is finalized.  In the test case
above, the Thread.new {q.pop} is destroyed immediately in the child
process.  The subsequent GC marks the thread for finalization, setting
its object type to T_DEFFERRED.  When the same GC pass finalizes the
Queue, its finalizer tries to kill those threads again, but, because
they have already been destroy and are marked for finalization
themselves, rb_thead_check fails on them (as it should).

In this simple case, you'll see:

-e:1:in `start': wrong argument type Object (expected Thread)
(TypeError)

However, I'm told that more complex cases will segfault.
(which I find quite believable)

When both a Queue and one or more threads waiting on it fail to be
marked on the same gc marking pass they are finalized and discarded in
no particular order.

Note that, if I compile my ruby with optimization turned off, the
failure does not occur.  It also does not occur on unpatched MRI.
In both these cases, the 'C' stack will retain references to either the
thread or the queue that prevent GC of both on the same pass.

My first patch for fixing this was to add, in kill_waiting_threads(), a
test to ensure each thread's object type == T_DATA before calling
rb_thread_kill().  However, on further reflection, I question whether
there is any point in trying to gracefully kill such threads.

My reasoning for this is:

Any thread waiting on an object must hold a reference to that object.
That reference should prevent the synchronization object (Queue, Mutex,
etc.) itself from being garbage collected.  So, if a synchronization
object is garbage collected, this implies that any threads on its
waiting lists must already be dead.  (i.e. their stack has already been
freed)

If no one sees any holes on this logic,
I'd propose eliminating the kill_waiting_threads() calls in thread.c

- brent
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.