Forum: Ruby-core [Ruby 1.9-Bug#4009][Open] Segfault with combination of threads and condition variables

Posted by Max Aller (Guest)
on 2010-10-31 18:44
Attachment: proof_of_segfault.rb (955 Bytes)
Attachment: proof_of_segfault_output (12,9 KB)
(Received via mailing list)
Bug #4009: Segfault with combination of threads and condition variables
http://redmine.ruby-lang.org/issues/show/4009

Author: Max Aller
Status: Open, Priority: Normal
ruby -v: ruby 1.9.3dev (2010-11-01 trunk 29655) [i686-linux]

When running the attached program, I get a segfault.  When changing some 
of the values inside what I've designated as the "delayed adder" thread 
(namely, the number of jobs that get added, or the duration that the 
sleep occurs for), I get "fatal: deadlock detected" -- which is fine. 
But with the given settings on my laptop, it segfaults routinely.

I've already trimmed down the example as much as I could, but I realize 
it's a bit long, still.  It's apparently important that the "delayed 
adder" thread exists at all, potentially "putting off" the deadlock 
detection somehow.
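
For readers without the attachment, here is a minimal sketch of the general pattern described above: worker threads waiting on a monitor condition variable while a "delayed adder" thread adds jobs after a short sleep. It is not the attached proof_of_segfault.rb. The names (lock, job_added, done), the worker count and the job values are all assumptions made for illustration. This variant deliberately includes a done flag so that it terminates cleanly. Without that flag and the final broadcast, the workers would wait forever, which is the situation where the interpreter is expected to report "deadlock detected".

```ruby
require 'monitor'

# Hypothetical illustration only; names and values are not from the attachment.
lock      = Monitor.new
job_added = lock.new_cond
jobs      = []
processed = []
done      = false

# Workers block on the condition variable until a job arrives or the
# producer signals that no more jobs are coming.
workers = Array.new(3) do
  Thread.new do
    loop do
      job = lock.synchronize do
        job_added.wait_until { !jobs.empty? || done }
        jobs.shift # nil once the queue is drained and done is set
      end
      break if job.nil?
      lock.synchronize { processed << job * 2 }
    end
  end
end

# The "delayed adder": sleeps briefly, then adds jobs and wakes all workers.
adder = Thread.new do
  sleep 0.05
  lock.synchronize do
    5.times { |i| jobs << i + 1 }
    done = true
    job_added.broadcast
  end
end

(workers + [adder]).each(&:join)
puts processed.sort.inspect
```

Setting done and calling broadcast while holding the monitor is what lets every waiting worker re-check its condition and exit instead of sleeping indefinitely.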
Posted by Max Aller (Guest)
on 2010-10-31 18:48
(Received via mailing list)
Issue #4009 has been updated by Max Aller.


Note: this also happens with ruby-1.9.2-p0.  The provided program 
doesn't terminate at all using ruby-1.8.7-p302.
Posted by Yui NARUSE (Guest)
on 2010-11-16 11:23
(Received via mailing list)
Issue #4009 has been updated by Yui NARUSE.


I can't reproduce this.
Can you show gdb backtrace?

ruby 1.9.3dev (2010-11-09 trunk 29733) [i686-linux]
1277:12: warning: assigned but unused variable - job
/home/naruse/local/ruby/lib/ruby/1.9.1/thread.rb:71:in `sleep': deadlock detected (fatal)
        from /home/naruse/local/ruby/lib/ruby/1.9.1/thread.rb:71:in `wait'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:100:in `wait'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:121:in `wait_until'
        from 1277:49:in `block in <main>'
        from /home/naruse/local/ruby/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
        from 1277:48:in `<main>'
Posted by Tomoyuki Chikanaga (Guest)
on 2010-11-16 11:54
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi
I can reproduce similar SEGV under gdb with "ruby 1.9.3dev (2010-11-16 
trunk 29789) [i686-linux]"

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1222669392 (LWP 14762)]
0x0813df46 in vm_call0 (th=0x9384238, recv=153972280, id=2512, argc=1,
    argv=0xb71f8948, me=0x92e8a68) at vm_eval.c:75
75                      vm_push_frame(th, 0, VM_FRAME_MAGIC_CFUNC,
(gdb) where
#0  0x0813df46 in vm_call0 (th=0x9384238, recv=153972280, id=2512, argc=1,
    argv=0xb71f8948, me=0x92e8a68) at vm_eval.c:75
#1  0x0813e711 in check_funcall (recv=153972280, mid=2512, argc=1,
    argv=0xb71f8948) at vm_eval.c:290
#2  0x0813e73e in rb_check_funcall (recv=153972280, mid=2512, argc=1,
    argv=0xb71f8948) at vm_eval.c:296
#3  0x08059e05 in make_exception (argc=2, argv=0xb71f8944, isstr=1)
    at eval.c:552
#4  0x08059ebb in rb_make_exception (argc=2, argv=0xb71f8944) at eval.c:574
#5  0x081484e8 in rb_threadptr_raise (th=0x92a7568, argc=2, argv=0xb71f8944)
    at thread.c:1350
#6  0x0814c31a in rb_check_deadlock (vm=0x92a72e0) at thread.c:4465
#7  0x08147286 in thread_start_func_2 (th=0x9384238, stack_start=0xb71f8a78)
    at thread.c:516
#8  0x08146356 in thread_start_func_1 (th_ptr=0x9384238)
    at thread_pthread.c:361
#9  0x00666dd8 in start_thread () from /lib/tls/libpthread.so.0
#10 0x00f83d1a in clone () from /lib/tls/libc.so.6

th->cfp seems to point to an invalid address:

(gdb) p reg_cfp->sp
Cannot access memory at address 0xb7278fe0
(gdb) p reg_cfp
$9 = (rb_control_frame_t *) 0xb7278fdc
(gdb) p *reg_cfp
Cannot access memory at address 0xb7278fdc
(gdb) p th->cfp
$10 = (rb_control_frame_t *) 0xb7278fdc
Posted by Tomoyuki Chikanaga (Guest)
on 2010-11-17 03:34
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


I think the following patch fixes this situation.
Max, could you try this patch?

BTW I'm not too confident in this patch. Please review it.

Index: thread.c
===================================================================
--- thread.c    (revision 29809)
+++ thread.c    (working copy)
@@ -507,13 +507,14 @@
            join_th = join_th->join_list_next;
        }

+       thread_unlock_all_locking_mutexes(th);
+       if (th != main_th) rb_check_deadlock(th->vm);
+
        if (!th->root_fiber) {
            rb_thread_recycle_stack_release(th->stack);
            th->stack = 0;
        }
     }
-    thread_unlock_all_locking_mutexes(th);
-    if (th != main_th) rb_check_deadlock(th->vm);
     if (th->vm->main_thread == th) {
        ruby_cleanup(state);
     }
Posted by Tomoyuki Chikanaga (Guest)
on 2010-12-24 01:14
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi,
I can still reproduce this segv on trunk(r30329) and also on 
1.9.2-HEAD(r30326).
Please check the previous patch, thanks.
Posted by Max Aller (Guest)
on 2011-01-18 04:10
(Received via mailing list)
Issue #4009 has been updated by Max Aller.


Hit this on Ruby 1.9.3dev (2011-01-18 trunk 30590) [i686-linux] again.
I tried applying the above patch (I had to improvise a little regarding
line numbers), but it didn't seem to have any effect.  The first two
times I ran my script again it segfaulted as described in the original
ticket, but the third time it hung with "*** glibc detected *** ruby:
corrupted double-linked list: 0x0973f110 ***" immediately after the
"C level backtrace information" header, and I had to kill -9 it.
Posted by Max Aller (Guest)
on 2011-01-18 05:11
Attachment: proof_of_segfault_small.rb (427 Bytes)
(Received via mailing list)
Issue #4009 has been updated by Max Aller.

File proof_of_segfault_small.rb added

Good news, I have managed to greatly reduce the failing code, which will 
hopefully make it easier to figure out.  It's attached.
Posted by Tomoyuki Chikanaga (Guest)
on 2011-01-19 14:38
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi,
The reduced sample code saves time when examining the problem. Thank you :)
I can also reproduce the segv in my Linux environment with ruby 1.9.3dev
(2011-01-18 trunk 30590) [i686-linux], for both 'proof_of_segfault.rb'
and 'proof_of_segfault_small.rb'.
But my previous patch seems to be effective in my environment.
Hmm, there may be other potential problems. I'll check with valgrind
later.
Posted by Tomoyuki Chikanaga (Guest)
on 2011-01-20 16:05
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi,
I've checked again with valgrind and got no additional problem reports.
Sorry I can't hel

I have noticed that, according to 'proof_of_segfault_output', your ruby
must have been built with the --enable-shared configuration.
If you retried with my patch as shown below, the binary could have been
dynamically linked against the installed
~/.rvm/rubies/ruby-head/lib/libruby.so.1.9 rather than the patched build.

(in the build directory, after applying the patch)
% make
% ./ruby proof_of_segfault.rb

If so, how about a command line like the one below?

% LD_LIBRARY_PATH=<building_directory> ./ruby proof_of_segfault.rb
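
One way to check whether a given ruby binary was configured with --enable-shared is to ask RbConfig. This is only a sketch: it reports how the interpreter was configured and where its libruby is expected to be installed, not which shared object the dynamic loader actually picked up at run time (on Linux, a tool such as ldd would show that).

```ruby
require 'rbconfig'

# "yes" when the interpreter was configured with --enable-shared, else "no".
shared = RbConfig::CONFIG['ENABLE_SHARED']
libdir = RbConfig::CONFIG['libdir']
soname = RbConfig::CONFIG['LIBRUBY_SO']

puts "ENABLE_SHARED: #{shared}"
if shared == 'yes' && libdir && soname
  # Expected install location of the shared libruby for this build.
  puts "libruby expected at: #{File.join(libdir, soname)}"
end
```

Running this with both the installed ruby and the freshly built ./ruby shows whether they were configured to share the same libruby location, which is the case where LD_LIBRARY_PATH matters.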
Posted by Max Aller (Guest)
on 2011-01-23 19:33
(Received via mailing list)
Issue #4009 has been updated by Max Aller.


Tomoyuki, I suspect you were right regarding the linking situation -- so 
I applied your patch to my .rvm/repos/ruby-head path and ran rvm 
--static install ruby-head (which, interestingly, does not perform a 
`git pull`, so I did that manually; it just copies from repos/ to src/ 
and configures/builds/installs), and...presto, no more segfault!  Get 
the deadlock detected error instead, which I think is the desired 
behavior.  I even raised/lowered the worker count, still deadlocks.

To be thorough, I also tried running my sample with stock ruby-head, and 
it did still have the bug, so I think your patch is, at the very least, 
a functional solution.

Nice work.
Posted by Yusuke ENDOH (Guest)
on 2011-01-26 17:26
(Received via mailing list)
Hi,

2010/11/17 Tomoyuki Chikanaga <redmine@ruby-lang.org>:
> +    thread_unlock_all_locking_mutexes(th);
>    ruby_cleanup(state);
>   }


Looks good.  Nice work!
Posted by Tomoyuki Chikanaga (Guest)
on 2011-01-27 03:20
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.


Hi,

Max san, thank you for checking my patch again. I'm relieved to hear 
that it works fine.

Endoh san, thank you for reviewing.
I'll check in the patch later if there is no opposition.
Posted by Tomoyuki Chikanaga (Guest)
on 2011-01-31 13:47
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.

Status changed from Open to Closed
% Done changed from 0 to 100

This issue was solved with changeset r30743.
Max, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

----
* thread.c (thread_start_func_2): check deadlock condition before
    release thread stack. fix memory violation when deadlock detected.
    reported by Max Aller. [Bug #4009] [ruby-core:32982]
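
The patch moves two calls, thread_unlock_all_locking_mutexes and rb_check_deadlock, so that they run before the terminating thread's stack is released. The C-level ordering cannot be reproduced from a script, but the user-visible effect of the first call can be observed from Ruby. The following is a minimal sketch, assuming Ruby 1.9 or later: a mutex still held by a thread when it terminates is released by the interpreter.

```ruby
m = Mutex.new

# The thread exits while still holding the lock.
t = Thread.new { m.lock }
t.join

# The interpreter released the mutex as part of thread termination,
# so another thread (here, the main thread) can take it.
still_locked = m.locked?
acquired     = m.try_lock

puts "locked after thread exit: #{still_locked}"
puts "main thread acquired it:  #{acquired}"
```

Because the unlock happens as part of thread termination, the deadlock check that follows it sees the mutexes as already released.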
Posted by Tomoyuki Chikanaga (Guest)
on 2011-01-31 14:04
(Received via mailing list)
Issue #4009 has been updated by Tomoyuki Chikanaga.

Status changed from Closed to Assigned
Assigned to set to Yuki Sonoda

Please backport r30743 to 1.9.2