Forum: Ruby-core [Bug #2739] ruby 1.8.7 built with pthreads hangs under some circumstances

Posted by Joel Ebel (Guest)
on 2010-02-11 22:54
(Received via mailing list)
Bug #2739: ruby 1.8.7 built with pthreads hangs under some circumstances
http://redmine.ruby-lang.org/issues/show/2739

Author: Joel Ebel
Status: Open, Priority: Normal
ruby -v: ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]

Ruby 1.8.7 built with pthreads is hanging for me.  I don't yet have a 
reproducible test case, and the problem is intermittent, but I have 
traced it back to the patch where it first appeared.  The hang 
happens on an exec, where the ruby process clones itself, and the clone 
hangs.  If I build ruby without pthreads it works fine.  Specifically 
this is happening on a run of puppet when it is loading facts.

Going back through versions of the 1.8.7 branch, it appears the problem 
began happening in patchlevel 183 (svn revision 24104).
If I try the 1.8 branch, problems begin happening with svn revision 
23268 and become more like the current behavior with revision 23305, 
both of which were merged into 1.8.7 in patchlevel 183 (r 24104)

If I try newer versions of the 1.8 branch, I find that the syntax has 
changed. However, it's possible that the specific problem I'm 
experiencing is fixed in r 24400 and/or 24402 (24400 doesn't build for 
me, so I can't be sure which revision is responsible for the improved 
behavior).

I will continue trying to create a reproducible test case for this bug, 
but I hoped that narrowing down where the regression begins would be a 
helpful place to start.
Posted by Lucas Nussbaum (Guest)
on 2010-03-01 23:32
(Received via mailing list)
Issue #2739 has been updated by Lucas Nussbaum.


After more investigation (see 
https://bugs.launchpad.net/ubuntu/+source/ruby1.8/+bug/520715 for the 
details), here are some conclusions.

Using this test case:
<------------------
#!/usr/bin/ruby1.8

%x{/usr/bin/touch /tmp/7777}
puts "executed without timeout ok"
puts "executing with timeout"
require 'timeout'
status = Timeout::timeout(5) {
%x{/usr/bin/touch /tmp/7777}
}
puts "executed with timeout ok"
--------------------------->

The above test case:
- runs fine on Debian unstable (using GLIBC 2.10)
- hangs on Debian unstable using the GLIBC packages from Debian 
experimental, version 2.11.0
- hangs on Ubuntu Lucid (which uses GLIBC 2.11.0)
Both Debian unstable and Ubuntu Lucid use Ruby 1.8.7 (2010-01-10 
patchlevel 249)

By "hangs", I mean:
$ while ruby1.8 te.rb ; do true; done
executed without timeout ok
executing with timeout
executed with timeout ok
executed without timeout ok
executing with timeout
/usr/lib/ruby/1.8/timeout.rb:60: execution expired (Timeout::Error)
  from te.rb:11

It is not clear whether this is a GLIBC or a Ruby issue. However, it 
would be fantastic if a Ruby developer with insight in the Ruby 
threading code could take a look.
Posted by Alex Legler (Guest)
on 2010-03-06 09:28
(Received via mailing list)
Issue #2739 has been updated by Alex Legler.


If it's any help: I can confirm this issue on Gentoo as well. After 
10-200 iterations, the timeout occurs.
glibc 2.11, ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]

On a machine with glibc 2.9 on the other hand, I can run the reproducer 
for minutes w/o any failure.
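
To compare machines by how many iterations pass before the first timeout, the shell loop from the test case could be wrapped in a small counter. This is only a minimal sketch: `runs_until_failure` is a hypothetical helper name, it is written for a current Ruby rather than 1.8, and it assumes the interpreter under test (for example `ruby1.8`) and the script `te.rb` are passed in as the command.

```ruby
# Hypothetical helper (not part of the original report): run `command`
# (an argv-style array) repeatedly and return how many runs succeeded
# before the first failure. A run counts as failed when the command exits
# non-zero or cannot be started at all.
def runs_until_failure(command, max_runs = 1000)
  max_runs.times do |i|
    # Discard the child's output; only its exit status matters here.
    ok = system(*command, out: File::NULL, err: File::NULL)
    return i unless ok
  end
  max_runs # no failure observed within max_runs
end

# Example call, assuming ruby1.8 and te.rb are available on this machine:
#   count = runs_until_failure(%w[ruby1.8 te.rb])
#   puts "#{count} successful runs before the first failure"
```

A return value equal to `max_runs` means no failure was seen, which would match the glibc 2.9 behaviour described above.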
Posted by Motohiro KOSAKI (Guest)
on 2010-03-06 12:12
(Received via mailing list)
Issue #2739 has been updated by Motohiro KOSAKI.


Hi

Thank you very much for your help. If you can reproduce this issue 
easily, could you please capture a stack trace and send it to us? I 
think the pstack command can be used for this.

% pstack [pid-of-ruby]

Thanks.
Posted by Lucas Nussbaum (Guest)
on 2010-03-06 18:26
(Received via mailing list)
Issue #2739 has been updated by Lucas Nussbaum.


Ruby is compiled with pthreads enabled on Ubuntu (and Debian), so there 
are several PIDs of interest here.
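
As a rough illustration of how the thread IDs belonging to one process can be found, here is a minimal sketch. It assumes a Linux system with `/proc` mounted, and `thread_ids` is a hypothetical helper name; the forked child is a separate process and has to be looked up by its own PID.

```ruby
# Hypothetical helper: list the kernel thread IDs (TIDs) of a process by
# reading /proc/<pid>/task. This directory exists on Linux only; on other
# systems, or if the process has already exited, an empty array is returned.
def thread_ids(pid)
  dir = "/proc/#{pid}/task"
  return [] unless File.directory?(dir)

  # Each thread appears as a directory named after its numeric TID.
  Dir.entries(dir).grep(/\A\d+\z/).map(&:to_i).sort
end

# Example call for the current process:
#   thread_ids(Process.pid).each { |tid| puts tid }
```

Each ID returned can then be handed to a debugger or to pstack individually.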

Backtraces for the parent PID:
#0  0x00007f508e929c73 in select () from /lib/libc.so.6
#1  0x00007f508f6f2893 in rb_thread_schedule ()
   from /usr/lib/libruby1.8.so.1.8
#2  0x00007f508f709a3c in ?? () from /usr/lib/libruby1.8.so.1.8
#3  0x00007f508f70e6d3 in ?? () from /usr/lib/libruby1.8.so.1.8
#4  0x00007f508f6ef6c1 in ?? () from /usr/lib/libruby1.8.so.1.8
#5  0x00007f508f6ef8b3 in ?? () from /usr/lib/libruby1.8.so.1.8
#6  0x00007f508f6f0578 in ?? () from /usr/lib/libruby1.8.so.1.8
#7  0x00007f508f6f0825 in rb_funcall () from /usr/lib/libruby1.8.so.1.8
#8  0x00007f508f6ebd7d in ?? () from /usr/lib/libruby1.8.so.1.8
#9  0x00007f508f6ed9f7 in ?? () from /usr/lib/libruby1.8.so.1.8
#10 0x00007f508f6e9dea in ?? () from /usr/lib/libruby1.8.so.1.8
#11 0x00007f508f6ecb81 in ?? () from /usr/lib/libruby1.8.so.1.8
#12 0x00007f508f6ecc5b in ?? () from /usr/lib/libruby1.8.so.1.8
#13 0x00007f508f6ef573 in ?? () from /usr/lib/libruby1.8.so.1.8
#14 0x00007f508f6ef8b3 in ?? () from /usr/lib/libruby1.8.so.1.8
#15 0x00007f508f6ec721 in ?? () from /usr/lib/libruby1.8.so.1.8
#16 0x00007f508f6ed066 in ?? () from /usr/lib/libruby1.8.so.1.8
#17 0x00007f508f6ea2f6 in ?? () from /usr/lib/libruby1.8.so.1.8
#18 0x00007f508f6fc85b in ?? () from /usr/lib/libruby1.8.so.1.8
#19 0x00007f508f6fc8a5 in ruby_exec () from /usr/lib/libruby1.8.so.1.8
#20 0x00007f508f6fc8d5 in ruby_run () from /usr/lib/libruby1.8.so.1.8
#21 0x0000000000400911 in main ()

Backtrace for the child PID:
#0  0x00007f508f4a2474 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f508f4a00c1 in pthread_cond_signal@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#2  0x00007f508f6e5e8e in rb_thread_stop_timer ()
   from /usr/lib/libruby1.8.so.1.8
#3  0x00007f508e8f5416 in fork () from /lib/libc.so.6
#4  0x00007f508f70cd20 in ?? () from /usr/lib/libruby1.8.so.1.8
#5  0x00007f508f70e691 in ?? () from /usr/lib/libruby1.8.so.1.8
#6  0x00007f508f6ef6c1 in ?? () from /usr/lib/libruby1.8.so.1.8
#7  0x00007f508f6ef8b3 in ?? () from /usr/lib/libruby1.8.so.1.8
#8  0x00007f508f6f0578 in ?? () from /usr/lib/libruby1.8.so.1.8
#9  0x00007f508f6f0825 in rb_funcall () from /usr/lib/libruby1.8.so.1.8
#10 0x00007f508f6ebd7d in ?? () from /usr/lib/libruby1.8.so.1.8
#11 0x00007f508f6ed9f7 in ?? () from /usr/lib/libruby1.8.so.1.8
#12 0x00007f508f6e9dea in ?? () from /usr/lib/libruby1.8.so.1.8
#13 0x00007f508f6ecb81 in ?? () from /usr/lib/libruby1.8.so.1.8
#14 0x00007f508f6ecc5b in ?? () from /usr/lib/libruby1.8.so.1.8
#15 0x00007f508f6ef573 in ?? () from /usr/lib/libruby1.8.so.1.8
#16 0x00007f508f6ef8b3 in ?? () from /usr/lib/libruby1.8.so.1.8
#17 0x00007f508f6ec721 in ?? () from /usr/lib/libruby1.8.so.1.8
#18 0x00007f508f6ed066 in ?? () from /usr/lib/libruby1.8.so.1.8
#19 0x00007f508f6ea2f6 in ?? () from /usr/lib/libruby1.8.so.1.8
#20 0x00007f508f6fc85b in ?? () from /usr/lib/libruby1.8.so.1.8
#21 0x00007f508f6fc8a5 in ruby_exec () from /usr/lib/libruby1.8.so.1.8
#22 0x00007f508f6fc8d5 in ruby_run () from /usr/lib/libruby1.8.so.1.8
#23 0x0000000000400911 in main ()

I could easily provide you with an Ubuntu Lucid chroot (as a tarball) so 
you can reproduce the issue. I'd just need to know which CPU 
architecture you use (i386 or amd64?)