Forum: Ruby-core [Ruby 1.9-Bug#4009][Open] Segfault with combination of threads and condition variables

Max Aller (Guest)
on 2010-10-31 18:44
(Received via mailing list)
Attachment: proof_of_segfault.rb (955 Bytes)
Attachment: proof_of_segfault_output (10 KB)
Bug #4009: Segfault with combination of threads and condition variables
http://redmine.ruby-lang.org/issues/show/4009

Author: Max Aller
Status: Open, Priority: Normal
ruby -v: ruby 1.9.3dev (2010-11-01 trunk 29655) [i686-linux]

When running the attached program, I get a segfault.  When changing some
of the values inside what I've designated as the "delayed adder" thread
(namely, the number of jobs that get added, or the duration that the
sleep occurs for), I get "fatal: deadlock detected" -- which is fine.
But with the given settings on my laptop, it segfaults routinely.

I've already trimmed down the example as much as I could, but I realize
it's a bit long, still.  It's apparently important that the "delayed
adder" thread exists at all, potentially "putting off" the deadlock
detection somehow.
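
The attachment is not reproduced in this thread. As a rough illustration only, the kind of program described (worker threads waiting on a condition variable while a "delayed adder" thread enqueues jobs after a sleep) might look like the sketch below. All names and counts are assumptions, not taken from proof_of_segfault.rb, and this version adds enough jobs that it finishes cleanly instead of deadlocking.

```ruby
require 'monitor'

# Hypothetical reconstruction: names and counts are illustrative only.
worker_count = 3
lock = Monitor.new
job_added = lock.new_cond   # condition variable tied to the monitor
jobs = []
processed = []

# Each worker waits until a job is available, then takes exactly one.
workers = Array.new(worker_count) do
  Thread.new do
    lock.synchronize do
      job_added.wait_until { !jobs.empty? }
      processed << jobs.shift
    end
  end
end

# The "delayed adder": sleeps first, then enqueues one job per worker.
delayed_adder = Thread.new do
  sleep 0.1
  worker_count.times do |n|
    lock.synchronize do
      jobs << n
      job_added.signal
    end
  end
end

(workers + [delayed_adder]).each(&:join)
puts processed.sort.inspect
```

If the adder enqueued fewer jobs than there are workers, the remaining workers would wait forever, which is the situation in which the interpreter is expected to report a deadlock rather than crash.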
Max Aller (Guest)
on 2010-10-31 18:48
(Received via mailing list)
Issue #4009 has been updated by Max Aller.


Note: this also happens with ruby-1.9.2-p0.  The provided program
doesn't terminate at all using ruby-1.8.7-p302.
Yui NARUSE (Guest)
on 2010-11-16 11:23
(Received via mailing list)
Issue #4009 has been updated by Yui NARUSE.


I can't reproduce this.
Can you show a gdb backtrace?

ruby 1.9.3dev (2010-11-09 trunk 29733) [i686-linux]
1277:12: warning: assigned but unused variable - job
/home/naruse/local/ruby/lib/ruby/1.9.1/thread.rb:71:in `sleep': deadlock detected (fatal)
        from /home/naruse/local/ruby/lib/ruby/1.9.1/thread.rb:71:in `wait'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:100:in `wait'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:121:in `wait_until'
        from 1277:49:in `block in <main>'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
        from 1277:48:in `<main>'
Tomoyuki Chikanaga (Guest)
on 2010-11-16 11:54
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi
I can reproduce a similar SEGV under gdb with "ruby 1.9.3dev (2010-11-16
trunk 29789) [i686-linux]"

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1222669392 (LWP 14762)]
0x0813df46 in vm_call0 (th=0x9384238, recv=153972280, id=2512, argc=1,
    argv=0xb71f8948, me=0x92e8a68) at vm_eval.c:75
75                      vm_push_frame(th, 0, VM_FRAME_MAGIC_CFUNC,
(gdb) where
#0  0x0813df46 in vm_call0 (th=0x9384238, recv=153972280, id=2512, argc=1,
    argv=0xb71f8948, me=0x92e8a68) at vm_eval.c:75
#1  0x0813e711 in check_funcall (recv=153972280, mid=2512, argc=1,
    argv=0xb71f8948) at vm_eval.c:290
#2  0x0813e73e in rb_check_funcall (recv=153972280, mid=2512, argc=1,
    argv=0xb71f8948) at vm_eval.c:296
#3  0x08059e05 in make_exception (argc=2, argv=0xb71f8944, isstr=1)
    at eval.c:552
#4  0x08059ebb in rb_make_exception (argc=2, argv=0xb71f8944) at eval.c:574
#5  0x081484e8 in rb_threadptr_raise (th=0x92a7568, argc=2, argv=0xb71f8944)
    at thread.c:1350
#6  0x0814c31a in rb_check_deadlock (vm=0x92a72e0) at thread.c:4465
#7  0x08147286 in thread_start_func_2 (th=0x9384238, stack_start=0xb71f8a78)
    at thread.c:516
#8  0x08146356 in thread_start_func_1 (th_ptr=0x9384238)
    at thread_pthread.c:361
#9  0x00666dd8 in start_thread () from /lib/tls/libpthread.so.0
#10 0x00f83d1a in clone () from /lib/tls/libc.so.6

th->cfp seems to point to an invalid address

(gdb) p reg_cfp->sp
Cannot access memory at address 0xb7278fe0
(gdb) p reg_cfp
$9 = (rb_control_frame_t *) 0xb7278fdc
(gdb) p *reg_cfp
Cannot access memory at address 0xb7278fdc
(gdb) p th->cfp
$10 = (rb_control_frame_t *) 0xb7278fdc
Tomoyuki Chikanaga (Guest)
on 2010-11-17 03:34
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


I think the following patch fixes this situation.
Max, could you try this patch?

BTW I'm not too confident in this patch. Please review it.

Index: thread.c
===================================================================
--- thread.c    (revision 29809)
+++ thread.c    (working copy)
@@ -507,13 +507,14 @@
            join_th = join_th->join_list_next;
        }

+       thread_unlock_all_locking_mutexes(th);
+       if (th != main_th) rb_check_deadlock(th->vm);
+
        if (!th->root_fiber) {
            rb_thread_recycle_stack_release(th->stack);
            th->stack = 0;
        }
     }
-    thread_unlock_all_locking_mutexes(th);
-    if (th != main_th) rb_check_deadlock(th->vm);
     if (th->vm->main_thread == th) {
        ruby_cleanup(state);
     }
Tomoyuki Chikanaga (Guest)
on 2010-12-24 01:14
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi,
I can still reproduce this segv on trunk(r30329) and also on
1.9.2-HEAD(r30326).
Please check the previous patch, thanks.
Max Aller (Guest)
on 2011-01-18 04:10
(Received via mailing list)
Issue #4009 has been updated by Max Aller.


Hit this on Ruby 1.9.3dev (2011-01-18 trunk 30590) [i686-linux] again.
I tried applying the above patch (had to improvise regarding line numbers
a little) but it didn't seem to have any effect.  The first two times I ran
my script again it segfaulted as described in the original ticket, but
the third time it hung with "*** glibc detected *** ruby: corrupted
double-linked list: 0x0973f110 ***" immediately after the "C level
backtrace information" header and I had to kill -9 it.
Max Aller (Guest)
on 2011-01-18 05:11
(Received via mailing list)
Attachment: proof_of_segfault_small.rb (427 Bytes)
Issue #4009 has been updated by Max Aller.

File proof_of_segfault_small.rb added

Good news, I have managed to greatly reduce the failing code, which will
hopefully make it easier to figure out.  It's attached.
Tomoyuki Chikanaga (Guest)
on 2011-01-19 14:38
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi,
The reduced sample code makes this much quicker to examine. Thank you :)
I can also reproduce the segv in my Linux environment with ruby 1.9.3dev
(2011-01-18 trunk 30590) [i686-linux], for both 'proof_of_segfault.rb'
and 'proof_of_segfault_small.rb'.
But my previous patch seems to be effective in my environment.
Hmm, there may be other potential problems. I'll check with valgrind
later.
Tomoyuki Chikanaga (Guest)
on 2011-01-20 16:05
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi,
I've checked again with valgrind and got no additional problem reports.
Sorry I can't help.

I have noticed that, according to 'proof_of_segfault_output', your ruby
seems to be built with the --enable-shared configuration.
If you retried with my patch as shown below, ./ruby may have been
dynamically linked against the installed
~/.rvm/rubies/ruby-head/lib/libruby.so.1.9 rather than the patched build.

(in building directory. after apply patch)
% make
% ./ruby proof_of_segfault.rb

If so, how about a command line like the one below?

% LD_LIBRARY_PATH=<building_directory> ./ruby proof_of_segfault.rb
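
One way to check whether a given interpreter was built with --enable-shared, and where its libruby would normally be installed, is to ask RbConfig. This is a minimal sketch. It assumes the ENABLE_SHARED, LIBRUBY_SO and libdir keys are present in RbConfig::CONFIG for the build in question, which is typical on Linux but was not verified against the builds discussed in this thread.

```ruby
require 'rbconfig'

# "yes" when the interpreter was configured with --enable-shared.
shared = RbConfig::CONFIG['ENABLE_SHARED']
puts "ENABLE_SHARED: #{shared.inspect}"

if shared == 'yes'
  # Installed location of the shared libruby that a dynamically linked
  # ./ruby could pick up unless LD_LIBRARY_PATH points elsewhere.
  libruby = File.join(RbConfig::CONFIG['libdir'].to_s,
                      RbConfig::CONFIG['LIBRUBY_SO'].to_s)
  puts "installed libruby: #{libruby}"
else
  puts 'libruby is linked statically (or the key is not set)'
end
```

Run with the freshly built ./ruby, this only reports the configured install path. It does not show which libruby the dynamic loader actually chose at run time.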
Max Aller (Guest)
on 2011-01-23 19:33
(Received via mailing list)
Issue #4009 has been updated by Max Aller.


Tomoyuki, I suspect you were right regarding the linking situation -- so
I applied your patch to my .rvm/repos/ruby-head path and ran rvm
--static install ruby-head (which, interestingly, does not perform a
`git pull`, so I did that manually; it just copies from repos/ to src/
and configures/builds/installs), and...presto, no more segfault!  I get
the deadlock detected error instead, which I think is the desired
behavior.  I even raised and lowered the worker count; it still deadlocks.

To be thorough, I also tried running my sample with stock ruby-head, and
it did still have the bug, so I think your patch is, at the very least,
a functional solution.

Nice work.
Yusuke ENDOH (Guest)
on 2011-01-26 17:26
(Received via mailing list)
Hi,

2010/11/17 Tomoyuki Chikanaga <redmine@ruby-lang.org>:
> +    thread_unlock_all_locking_mutexes(th);
>    ruby_cleanup(state);
>   }


Looks good.  Nice work!
Tomoyuki Chikanaga (Guest)
on 2011-01-27 03:20
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi,

Max san, thank you for checking my patch again. I'm relieved to hear
that it works fine.

Endoh san, thank you for reviewing.
I'll check in the patch later if there is no opposition.
Tomoyuki Chikanaga (Guest)
on 2011-01-31 13:47
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.

Status changed from Open to Closed
% Done changed from 0 to 100

This issue was solved with changeset r30743.
Max, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

----
* thread.c (thread_start_func_2): check deadlock condition before
    release thread stack. fix memory violation when deadlock detected.
    reported by Max Aller. [Bug #4009] [ruby-core:32982]
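
The ordering problem that this changeset addresses can be pictured with a toy model. The sketch below is not Ruby's C implementation: the hash standing in for a thread and every helper name are hypothetical. It only illustrates what the gdb backtrace earlier in the thread suggests, namely that raising the deadlock error pushes a frame on the exiting thread's own stack, so the check has to run before that stack is released.

```ruby
# Toy model only: plain Ruby objects standing in for VM structures.

def release_stack(thread)
  thread[:stack] = nil          # stands in for releasing the thread stack
end

def push_frame(thread, name)
  raise 'stack already released' if thread[:stack].nil?
  thread[:stack].push(name)
end

# Stand-in for the deadlock check: building the exception needs a frame
# on the calling (exiting) thread's stack.
def toy_check_deadlock(thread)
  push_frame(thread, :make_exception)
  :deadlock_reported
end

# Order before the patch: release first, then check.
def teardown_before_patch(thread)
  release_stack(thread)
  toy_check_deadlock(thread)
end

# Order after the patch: check first, then release.
def teardown_after_patch(thread)
  result = toy_check_deadlock(thread)
  release_stack(thread)
  result
end

before = begin
  teardown_before_patch({ stack: [] })
rescue RuntimeError => e
  e.message
end
after = teardown_after_patch({ stack: [] })

puts "before patch: #{before}"
puts "after patch:  #{after}"
```

In the toy model the wrong order fails with an ordinary exception. In the real interpreter the equivalent access touched memory that had already been released, which is consistent with the invalid th->cfp seen under gdb.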
Tomoyuki Chikanaga (Guest)
on 2011-01-31 14:04
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.

Status changed from Closed to Assigned
Assigned to set to Yuki Sonoda

Please backport r30743 to 1.9.2