Forum: Ruby Garbage collector segmentation fault

Henrik Zagerholm (cubiq)
on 2008-11-06 00:04
Hello list,

I get this every now and then when running some ruby / c++ methods from
within rails.
It looks like ruby's garbage collector once again makes bad decisions...

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47432141183712 (LWP 30617)]
garbage_collect () at gc.c:1184
1184        if (!(p->as.basic.flags & FL_MARK)) {


I've posted the bug to the ruby bug hunting system but haven't heard
anything yet. I think this is the biggest problem with ruby. The core
development is slow and badly maintained.

Anyone have an idea what to do next? My only choice right now is to
switch to python...

Or try Ruby Enterprise Edition, which has a rewritten garbage collector.

Cheers,
Henke
Roger Pack (rogerdpack)
on 2008-11-06 00:58
Was this in an embedded environment? What version of ruby? Is it
reproducible? What's the full gdb backtrace? Can you produce a sample that
causes it? [That is really the only way it can ever get fixed.]
JRuby is another option.
Thanks!
-=R


> Hello list,
>
> I get this every now and then when running some ruby / c++ methods from
> within rails.
> It looks like ruby's garbage collector once again makes bad decisions...
Henrik Zagerholm (cubiq)
on 2008-11-06 10:46
Roger Pack wrote:
> Was this in an embedded environment? What version of ruby? Is it
> reproducible? What's the full gdb backtrace? Can you produce a sample that
> causes it? [That is really the only way it can ever get fixed.]
> JRuby is another option.
> Thanks!
> -=R

Hello Roger and thanks for the quick reply,

It is always hard to reproduce a bug in the garbage collector, as it is
unpredictable at best. But calling the same function 5 to 10 times
always produces the segmentation fault.

It can only be reproduced in my web environment running rails with
WEBrick (or any other webserver). I've tried to reproduce the bug in
the console but haven't succeeded, maybe because it's hard to simulate
what the webserver does in the console. If you have any ideas on how to
do this please let me know :)

This is not an embedded environment.
uname -a
Linux debian-ipox-x64 2.6.18-5-amd64 #1 SMP Thu May 31 23:51:05 UTC 2007 x86_64 GNU/Linux

I tried with different versions of ruby and now I'm using
ruby 1.8.7 (2008-10-31 revision 0) [x86_64-linux]


Below is a GDB backtrace
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47286927956704 (LWP 23124)]
garbage_collect () at gc.c:1184
1184        if (!(p->as.basic.flags & FL_MARK)) {
(gdb) bt
#0  garbage_collect () at gc.c:1184
#1  0x000000000042e41b in rb_gc () at gc.c:1533
#2  0x000000000042e439 in rb_gc_start () at gc.c:1550
#3  0x000000000041b87a in rb_call0 (klass=47286918689040, recv=47286918689080, id=5313, oid=5313,
    argc=0, argv=0x0, body=0x2b01d7fdb0e8, flags=<value optimized out>) at eval.c:5906
#4  0x000000000041c63f in rb_call (klass=47286918689040, recv=47286918689080, mid=5313, argc=0,
    argv=0x0, scope=0, self=47287043546680) at eval.c:6153
#5  0x000000000041734c in rb_eval (self=47287043546680, n=<value optimized out>) at eval.c:3494
#6  0x000000000041c0f2 in rb_call0 (klass=47287043546760, recv=47287043546680, id=107929,
    oid=107929, argc=0, argv=0x7fffd2b0c1b0, body=0x2b01df71fa00, flags=<value optimized out>)
    at eval.c:6057
#7  0x000000000041c63f in rb_call (klass=47287043546760, recv=47287043546680, mid=107929, argc=4,
    argv=0x7fffd2b0c190, scope=0, self=47287042528040) at eval.c:6153
#8  0x000000000041734c in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3494
#9  0x00000000004176db in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3049
#10 0x000000000041c0f2 in rb_call0 (klass=47287043072000, recv=47287042528040, id=113353,
    oid=113353, argc=0, argv=0x7fffd2b0cea8, body=0x2b01df6a9918, flags=<value optimized out>)
    at eval.c:6057
#11 0x000000000041c63f in rb_call (klass=47287043072000, recv=47287042528040, mid=113353, argc=0,
    argv=0x7fffd2b0cea8, scope=1, self=6) at eval.c:6153
#12 0x000000000042421e in rb_f_send (argc=1, argv=0x7fffd2b0cea0, recv=47287042528040)
    at eval.c:6201
#13 0x000000000041b87a in rb_call0 (klass=47286918830400, recv=47287042528040, id=4225, oid=4225,
    argc=1, argv=0x7fffd2b0cea0, body=0x2b01d7ff93b8, flags=<value optimized out>) at eval.c:5906
#14 0x000000000041c63f in rb_call (klass=47286918830400, recv=47287042528040, mid=4225, argc=1,
    argv=0x7fffd2b0cea0, scope=1, self=47287042528040) at eval.c:6153
#15 0x0000000000417482 in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3509
#16 0x000000000041c0f2 in rb_call0 (klass=47286962656200, recv=47287042528040, id=63369,
    oid=56801, argc=0, argv=0x0, body=0x2b01dab76770, flags=<value optimized out>) at eval.c:6057
#17 0x000000000041c63f in rb_call (klass=47286962656200, recv=47287042528040, mid=63369, argc=0,
    argv=0x0, scope=2, self=47287042528040) at eval.c:6153
#18 0x00000000004174cc in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3515
#19 0x000000000041c0f2 in rb_call0 (klass=47286963258720, recv=47287042528040, id=63337,
    oid=63337, argc=0, argv=0x7fffd2b0db08, body=0x2b01da95f810, flags=<value optimized out>)
    at eval.c:6057
#20 0x000000000041c63f in rb_call (klass=47286963258720, recv=47287042528040, mid=63337, argc=3,
    argv=0x7fffd2b0daf0, scope=1, self=47287042528040) at eval.c:6153
#21 0x0000000000417482 in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3509
#22 0x000000000041c0f2 in rb_call0 (klass=47286963258720, recv=47287042528040, id=62609,
    oid=63329, argc=0, argv=0x0, body=0x2b01da95fc70, flags=<value optimized out>) at eval.c:6057
#23 0x000000000041c63f in rb_call (klass=47286963258720, recv=47287042528040, mid=62609, argc=0,
    argv=0x0, scope=2, self=47287042528040) at eval.c:6153
#24 0x00000000004174cc in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3515
#25 0x000000000041aa53 in rb_yield_0 (val=6, self=47287042528040, klass=0,
    flags=<value optimized out>, avalue=0) at eval.c:5079
#26 0x0000000000417e84 in rb_eval (self=47286940121880, n=<value optimized out>) at eval.c:3299
#27 0x000000000041c0f2 in rb_call0 (klass=47286943807600, recv=47286940121880, id=53321,
    oid=53321, argc=0, argv=0x0, body=0x2b01d8d125e0, flags=<value optimized out>) at eval.c:6057
#28 0x000000000041c63f in rb_call (klass=47286943807600, recv=47286940121880, mid=53321, argc=0,
    argv=0x0, scope=0, self=47287042528040) at eval.c:6153
#29 0x000000000041734c in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3494
#30 0x0000000000419480 in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3224
#31 0x0000000000417257 in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3488
#32 0x0000000000416c6d in rb_eval (self=47287042528040, n=<value optimized out>) at eval.c:3847


Cheers,
Henke
Roger Pack (rogerdpack)
on 2008-11-06 19:31
> Hello Roger and thanks for the quick reply,
>
> It is always hard to reproduce a bug in the garbage collector, as it is
> unpredictable at best. But calling the same function 5 to 10 times
> always produces the segmentation fault.
>
> It can only be reproduced in my web environment running rails with
> WEBrick (or any other webserver). I've tried to reproduce the bug in
> the console but haven't succeeded, maybe because it's hard to simulate
> what the webserver does in the console. If you have any ideas on how to
> do this please let me know :)

that's tough :)
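
One thing that may be worth trying from a plain script, as a rough sketch rather than a known fix: call the suspect method in a loop and force a collection after every call, so a GC-related crash has more chances to show up outside the web server. `suspect_call` and `hammer` are hypothetical names, and the body of `suspect_call` is only a placeholder for the real extension call.

```ruby
# Hypothetical stand-in for the wrapped C++ call that crashes under Rails.
# Replace the body with the real extension call.
def suspect_call
  Array.new(1_000) { |i| "object #{i}" }.size
end

# Run the given block `iterations` times, forcing a full GC pass after
# each call so a collector-related crash surfaces sooner.
# Returns the number of iterations that completed.
def hammer(iterations)
  completed = 0
  iterations.times do
    yield
    GC.start
    completed += 1
  end
  completed
end

completed = hammer(20) { suspect_call }
puts "completed #{completed} iterations without crashing"
```

If `GC.start` alone does not trigger it, setting `GC.stress = true` around the loop (on Ruby versions that provide it) makes the collector run far more often, at a heavy cost in speed.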

> I tried with different versions of ruby and now I'm using
> ruby 1.8.7 (2008-10-31 revision 0) [x86_64-linux]

You may want to try a newer revision. I assume they all fail? [SVN HEAD
of the 1.8.7 branch would be ideal.]

I assume that p is a null pointer in this example?

> Below is a GDB backtrace
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 47286927956704 (LWP 23124)]
> garbage_collect () at gc.c:1184
> 1184        if (!(p->as.basic.flags & FL_MARK)) {
> (gdb) bt
> #0  garbage_collect () at gc.c:1184
> #1  0x000000000042e41b in rb_gc () at gc.c:1533
> #2  0x000000000042e439 in rb_gc_start () at gc.c:1550
> #3  0x000000000041b87a in rb_call0 (klass=47286918689040,
> recv=47286918689080, id=5313, oid=5313,

you might be able to email me a core dump [rogerpack2005@gmail.com]
but...that's still not quite as nice as having a reproducible
environment, is it?
I'd say try with SVN HEAD and report back.
Thanks!
-=R
Brian Candler (candlerb)
on 2008-11-06 22:16
Henrik Zagerholm wrote:
> I get this every now and then when running some ruby / c++ methods from
> within rails.

What do you mean by "ruby / c++ methods"? Have you written your own
extension to Ruby, or used someone else's?

What does $LOADED_FEATURES.grep(/\.so/) show?
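
For reference, a small sketch of that check wrapped in a method so it can be run against any feature list. `native_extensions` is a hypothetical helper name, and the anchored `.so` suffix assumes Linux (other platforms use other suffixes, e.g. `.bundle` on macOS).

```ruby
# Hypothetical helper: return the entries of a feature list that look like
# compiled extensions (shared objects) rather than plain .rb files.
# Assumes the Linux ".so" suffix.
def native_extensions(features = $LOADED_FEATURES)
  features.grep(/\.so\z/)
end

# Print every compiled extension loaded into the current process so far.
puts native_extensions
```

Running it inside the Rails process after the app has loaded shows which compiled libraries share the interpreter with the GC; anything on the list that is not part of Ruby itself is a candidate.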

> It looks like ruby's garbage collector once again makes bad decisions...
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 47432141183712 (LWP 30617)]
> garbage_collect () at gc.c:1184
> 1184        if (!(p->as.basic.flags & FL_MARK)) {

Or it could be bad setup of data structures from some extension library,
or incorrect handling of mark/sweep by that library, causing objects
which are still live to be garbage collected.
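
If the extension itself cannot be changed right away, one way to test that theory from the Ruby side is to keep an explicit reference to every object handed to native code, so the collector cannot reclaim it while the extension may still use it. This is only a diagnostic sketch under that assumption: `NativePins` is a hypothetical helper, not part of Ruby or of any wrapper generator.

```ruby
# Hypothetical helper: holds strong references to Ruby objects for as long
# as native code may still be using them.
module NativePins
  @pins = {}

  # Keep `obj` reachable through this module's instance variable.
  def self.pin(obj)
    @pins[obj.object_id] = obj
    obj
  end

  # Drop the reference once the native side is done with the object.
  def self.unpin(obj)
    @pins.delete(obj.object_id)
    nil
  end

  # True while `obj` is still held by the registry.
  def self.pinned?(obj)
    @pins.key?(obj.object_id)
  end
end

callback = NativePins.pin(proc { |x| x * 2 })
# ... hand `callback` to the extension here ...
GC.start
puts NativePins.pinned?(callback)
NativePins.unpin(callback)
```

If the crash goes away while objects are pinned, that points at the extension's mark handling rather than at the collector.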
Roger Pack (rogerdpack)
on 2008-12-08 06:59
>> Below is a GDB backtrace
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 47286927956704 (LWP 23124)]
>> garbage_collect () at gc.c:1184
>> 1184        if (!(p->as.basic.flags & FL_MARK)) {
>> (gdb) bt
>> #0  garbage_collect () at gc.c:1184
>> #1  0x000000000042e41b in rb_gc () at gc.c:1533
>> #2  0x000000000042e439 in rb_gc_start () at gc.c:1550
>> #3  0x000000000041b87a in rb_call0 (klass=47286918689040,
>> recv=47286918689080, id=5313, oid=5313,
>

Yeah the backtrace doesn't look too suspicious.  A reproducible test
case would be ideal [even if it's sending me all your rails app--though
you'd have to trust I would delete it after :) ]
Valgrind would be another option.
Cheers!
-=R
Roger Pack (rogerdpack)
on 2008-12-22 09:39
> I get this every now and then when running some ruby / c++ methods from
> within rails.
> It looks like ruby's garbage collector once again makes bad decisions...
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 47432141183712 (LWP 30617)]
> garbage_collect () at gc.c:1184
> 1184        if (!(p->as.basic.flags & FL_MARK)) {
>
>
> I've posted the bug to the ruby bug hunting system but haven't heard
> anything yet. I think this is the biggest problem with ruby. The core
> development is slow and badly maintained.

Check http://www.ruby-forum.com/topic/170608#new; it has some recent GC
patches that might help. You could also submit a report to
redmine.ruby-lang.org [esp. if reproducible].
Thanks!
-=R
Henrik Zagerholm (cubiq)
on 2008-12-22 09:48
Roger Pack wrote:
>
>> I get this every now and then when running some ruby / c++ methods from
>> within rails.
>> It looks like ruby's garbage collector once again makes bad decisions...
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 47432141183712 (LWP 30617)]
>> garbage_collect () at gc.c:1184
>> 1184        if (!(p->as.basic.flags & FL_MARK)) {
>>
>>
>> I've posted the bug to the ruby bug hunting system but haven't heard
>> anything yet. I think this is the biggest problem with ruby. The core
>> development is slow and badly maintained.
>
> Check http://www.ruby-forum.com/topic/170608#new; it has some recent GC
> patches that might help. You could also submit a report to
> redmine.ruby-lang.org [esp. if reproducible].
> Thanks!
> -=R

Hi Roger and thanks for the info.

I'll see if I can try this out. The wrapper code is generated with SWIG, so
it could be a SWIG problem. This is what makes it so hard to debug.

Cheers,
Henke