Garbage collector segmentation fault


#1

Hello list,

I get this every now and then when running some ruby / c++ methods from
within rails.
It looks like ruby’s garbage collector once again makes bad decisions…

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47432141183712 (LWP 30617)]
garbage_collect () at gc.c:1184
1184 if (!(p->as.basic.flags & FL_MARK)) {

I’ve posted the bug to the ruby bug hunting system but haven’t heard
anything yet. I think this is the biggest problem with ruby. The core
development is slow and badly maintained.

Anyone have an idea what to do next? My only choice right now is to
switch to python…

Or try Ruby Enterprise Edition, which has a rewritten garbage collector.

Cheers,
Henke


#2

Was this in an embedded environment? What version of ruby? Is it
reproducible? What’s the full gdb backtrace? Can you produce a sample that
causes it [that is really the only way it can ever get fixed]?
JRuby is another option.
Thanks!
-=R


#3

Henrik Z. wrote:

> I get this every now and then when running some ruby / c++ methods from
> within rails.

What do you mean by “ruby / c++ methods”? Have you written your own
extension to Ruby, or used someone else’s?

What does $LOADED_FEATURES.grep(/\.so/) show?
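A slightly stricter version of that check, as a sketch. The helper name `native_extensions` and the `NATIVE_EXT` pattern are made up for illustration. The dot is escaped and the pattern is anchored at the end of the path, because an unescaped `.` matches any character and would also pick up pure-Ruby files such as `resolv.rb`.

```ruby
# Matches paths that end in a shared-library suffix
# (.so on Linux, .bundle on macOS, .dll on Windows).
NATIVE_EXT = /\.(so|bundle|dll)\z/

# Returns only the compiled extensions from a list of loaded features.
# Defaults to the features loaded into the current process.
def native_extensions(features = $LOADED_FEATURES)
  features.grep(NATIVE_EXT)
end

# Prints one path per line; the result depends on what the process has loaded.
puts native_extensions
```

Run inside the Rails console or a controller action, this should show whether a SWIG-style wrapper or any other compiled library is in the process at the time of the crash.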

> It looks like ruby’s garbage collector once again makes bad decisions…
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 47432141183712 (LWP 30617)]
> garbage_collect () at gc.c:1184
> 1184 if (!(p->as.basic.flags & FL_MARK)) {

Or it could be bad setup of data structures from some extension library,
or incorrect handling of mark/sweep by that library, causing objects
which are still live to be garbage collected.
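
To make the mark/sweep point concrete, here is a toy model in plain Ruby. It is not MRI's gc.c, and `ToyObject`, `Wrapper`, `ToyHeap`, and `hidden_survives?` are hypothetical names. It only shows the failure mode: a wrapper object keeps a reference inside its native struct, and if its mark function does not report that reference, the collector frees an object that is still in use.

```ruby
# Toy mark-and-sweep model; illustrative only, not MRI's implementation.
class ToyObject
  attr_reader :name, :children
  attr_accessor :marked, :freed

  def initialize(name)
    @name = name
    @children = []   # references the collector can see directly
    @marked = false
    @freed = false
  end

  # Objects reachable from this one, as reported to the collector.
  def mark_targets
    children
  end
end

# Stands in for a Ruby object wrapping a C++ struct. The struct holds a
# reference to another Ruby object in `hidden`.
class Wrapper < ToyObject
  attr_accessor :hidden

  def initialize(name, marks_hidden)
    super(name)
    @marks_hidden = marks_hidden
  end

  # A correct extension reports `hidden` here; a buggy one leaves it out.
  def mark_targets
    @marks_hidden && hidden ? children + [hidden] : children
  end
end

class ToyHeap
  def initialize
    @objects = []
    @roots = []
  end

  def alloc(obj, root = false)
    @objects << obj
    @roots << obj if root
    obj
  end

  def collect
    @objects.each { |o| o.marked = false }
    @roots.each { |r| mark(r) }                            # mark phase
    @objects.each { |o| o.freed = true unless o.marked }   # sweep phase
    @objects.reject! { |o| o.freed }
  end

  private

  def mark(obj)
    return if obj.marked
    obj.marked = true
    obj.mark_targets.each { |child| mark(child) }
  end
end

# Returns true if the object referenced only from the wrapper's struct
# is still alive after one collection.
def hidden_survives?(marks_hidden)
  heap = ToyHeap.new
  wrapper = heap.alloc(Wrapper.new("wrapper", marks_hidden), true)
  wrapper.hidden = heap.alloc(ToyObject.new("callback"))
  heap.collect
  !wrapper.hidden.freed
end

puts hidden_survives?(true)    # => true
puts hidden_survives?(false)   # => false
```

In the real interpreter the "freed" object's slot gets reused or its memory released, so a later access through the stale pointer can crash far away from the code that caused it, which would fit a fault that only shows up inside the collector.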


#4

> Hello Roger and thanks for quick reply,
>
> It is always hard to reproduce a bug in the garbage collector as it is
> unpredictable at best. But calling the same function 5-10 times
> always produces the segmentation fault.
>
> It can only be reproduced in my web environment running rails with
> WEBrick (or any other webserver). I’ve tried to reproduce the bug in
> the console but haven’t succeeded, maybe because it’s hard to simulate
> what the webserver does in the console. If you have any ideas on how to
> do this please let me know :)

That’s tough :)

> I tried with different versions of ruby and now I’m using
> ruby 1.8.7 (2008-10-31 revision 0) [x86_64-linux]

You may want to try a newer revision. I assume they all err? [SVN HEAD
of the 1.8.7 branch would be ideal].

I assume that p is a null pointer in this example?

> Below is a GDB backtrace
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 47286927956704 (LWP 23124)]
> garbage_collect () at gc.c:1184
> 1184 if (!(p->as.basic.flags & FL_MARK)) {
> (gdb) bt
> #0 garbage_collect () at gc.c:1184
> #1 0x000000000042e41b in rb_gc () at gc.c:1533
> #2 0x000000000042e439 in rb_gc_start () at gc.c:1550
> #3 0x000000000041b87a in rb_call0 (klass=47286918689040,
> recv=47286918689080, id=5313, oid=5313,

You might be able to email me a core dump [removed_email_address@domain.invalid],
but that’s still not quite as nice as having a reproducible
environment, is it?
I’d say try with SVN HEAD and report back.
Thanks!
-=R


#5

> Below is a GDB backtrace
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 47286927956704 (LWP 23124)]
> garbage_collect () at gc.c:1184
> 1184 if (!(p->as.basic.flags & FL_MARK)) {
> (gdb) bt
> #0 garbage_collect () at gc.c:1184
> #1 0x000000000042e41b in rb_gc () at gc.c:1533
> #2 0x000000000042e439 in rb_gc_start () at gc.c:1550
> #3 0x000000000041b87a in rb_call0 (klass=47286918689040,
> recv=47286918689080, id=5313, oid=5313,

Yeah, the backtrace doesn’t look too suspicious. A reproducible test
case would be ideal [even if it means sending me your whole rails app, though
you’d have to trust I would delete it after :) ]
Valgrind would be another option.
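
One possible starting point for a standalone test case, as a rough sketch. Frame #2 of the backtrace is rb_gc_start, which is the C function behind GC.start, so a collection appears to be requested explicitly; the harness below does the same after every call. `suspect_call` and `hammer` are placeholder names, and the body of `suspect_call` is only a stand-in to be replaced with the real call into the wrapped C++ code.

```ruby
# Placeholder for the method that crashes inside the Rails app.
# Replace the body with the real call into the wrapper library.
def suspect_call
  Array.new(1_000) { |i| "object #{i}" }   # stand-in allocation
end

# Calls the suspect method repeatedly, forcing a full collection after
# each call so that a dangling reference is hit as early as possible.
# Returns the number of iterations completed.
def hammer(iterations)
  completed = 0
  iterations.times do
    suspect_call
    GC.start
    completed += 1
  end
  completed
end

puts hammer(20)   # => 20 when nothing crashes
```

If the interpreter build supports it, setting `GC.stress = true` before the loop is another way to make the collector run far more often, at a large cost in speed.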
Cheers!
-=R


#6

Roger P. wrote:

> Was this in an embedded environment? What version of ruby? Is it
> reproducible? What’s the full gdb backtrace? Can you produce a sample that
> causes it [that is really the only way it can ever get fixed]?
> JRuby is another option.
> Thanks!
> -=R

Hello Roger and thanks for the quick reply,

It is always hard to reproduce a bug in the garbage collector as it is
unpredictable at best. But calling the same function 5-10 times
always produces the segmentation fault.

It can only be reproduced in my web environment running rails with
WEBrick (or any other webserver). I’ve tried to reproduce the bug in
the console but haven’t succeeded, maybe because it’s hard to simulate
what the webserver does in the console. If you have any ideas on how to
do this please let me know :)

This is not an embedded environment.
uname -a
Linux debian-ipox-x64 2.6.18-5-amd64 #1 SMP Thu May 31 23:51:05 UTC 2007
x86_64 GNU/Linux

I tried with different versions of ruby and now I’m using
ruby 1.8.7 (2008-10-31 revision 0) [x86_64-linux]

Below is a GDB backtrace
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47286927956704 (LWP 23124)]
garbage_collect () at gc.c:1184
1184 if (!(p->as.basic.flags & FL_MARK)) {
(gdb) bt
#0 garbage_collect () at gc.c:1184
#1 0x000000000042e41b in rb_gc () at gc.c:1533
#2 0x000000000042e439 in rb_gc_start () at gc.c:1550
#3 0x000000000041b87a in rb_call0 (klass=47286918689040,
recv=47286918689080, id=5313, oid=5313,
argc=0, argv=0x0, body=0x2b01d7fdb0e8, flags=)
at eval.c:5906
#4 0x000000000041c63f in rb_call (klass=47286918689040,
recv=47286918689080, mid=5313, argc=0,
argv=0x0, scope=0, self=47287043546680) at eval.c:6153
#5 0x000000000041734c in rb_eval (self=47287043546680, n=) at eval.c:3494
#6 0x000000000041c0f2 in rb_call0 (klass=47287043546760,
recv=47287043546680, id=107929,
oid=107929, argc=0, argv=0x7fffd2b0c1b0, body=0x2b01df71fa00,
flags=)
at eval.c:6057
#7 0x000000000041c63f in rb_call (klass=47287043546760,
recv=47287043546680, mid=107929, argc=4,
argv=0x7fffd2b0c190, scope=0, self=47287042528040) at eval.c:6153
#8 0x000000000041734c in rb_eval (self=47287042528040, n=) at eval.c:3494
#9 0x00000000004176db in rb_eval (self=47287042528040, n=) at eval.c:3049
#10 0x000000000041c0f2 in rb_call0 (klass=47287043072000,
recv=47287042528040, id=113353,
oid=113353, argc=0, argv=0x7fffd2b0cea8, body=0x2b01df6a9918,
flags=)
at eval.c:6057
#11 0x000000000041c63f in rb_call (klass=47287043072000,
recv=47287042528040, mid=113353, argc=0,
argv=0x7fffd2b0cea8, scope=1, self=6) at eval.c:6153
#12 0x000000000042421e in rb_f_send (argc=1, argv=0x7fffd2b0cea0,
recv=47287042528040)
at eval.c:6201
#13 0x000000000041b87a in rb_call0 (klass=47286918830400,
recv=47287042528040, id=4225, oid=4225,
argc=1, argv=0x7fffd2b0cea0, body=0x2b01d7ff93b8, flags=) at eval.c:5906
#14 0x000000000041c63f in rb_call (klass=47286918830400,
recv=47287042528040, mid=4225, argc=1,
argv=0x7fffd2b0cea0, scope=1, self=47287042528040) at eval.c:6153
#15 0x0000000000417482 in rb_eval (self=47287042528040, n=) at eval.c:3509
#16 0x000000000041c0f2 in rb_call0 (klass=47286962656200,
recv=47287042528040, id=63369,
oid=56801, argc=0, argv=0x0, body=0x2b01dab76770, flags=) at eval.c:6057
#17 0x000000000041c63f in rb_call (klass=47286962656200,
recv=47287042528040, mid=63369, argc=0,
argv=0x0, scope=2, self=47287042528040) at eval.c:6153
#18 0x00000000004174cc in rb_eval (self=47287042528040, n=) at eval.c:3515
#19 0x000000000041c0f2 in rb_call0 (klass=47286963258720,
recv=47287042528040, id=63337,
oid=63337, argc=0, argv=0x7fffd2b0db08, body=0x2b01da95f810,
flags=)
at eval.c:6057
#20 0x000000000041c63f in rb_call (klass=47286963258720,
recv=47287042528040, mid=63337, argc=3,
argv=0x7fffd2b0daf0, scope=1, self=47287042528040) at eval.c:6153
#21 0x0000000000417482 in rb_eval (self=47287042528040, n=) at eval.c:3509
#22 0x000000000041c0f2 in rb_call0 (klass=47286963258720,
recv=47287042528040, id=62609,
oid=63329, argc=0, argv=0x0, body=0x2b01da95fc70, flags=) at eval.c:6057
#23 0x000000000041c63f in rb_call (klass=47286963258720,
recv=47287042528040, mid=62609, argc=0,
argv=0x0, scope=2, self=47287042528040) at eval.c:6153
#24 0x00000000004174cc in rb_eval (self=47287042528040, n=) at eval.c:3515
#25 0x000000000041aa53 in rb_yield_0 (val=6, self=47287042528040,
klass=0,
flags=, avalue=0) at eval.c:5079
#26 0x0000000000417e84 in rb_eval (self=47286940121880, n=) at eval.c:3299
#27 0x000000000041c0f2 in rb_call0 (klass=47286943807600,
recv=47286940121880, id=53321,
oid=53321, argc=0, argv=0x0, body=0x2b01d8d125e0, flags=) at eval.c:6057
#28 0x000000000041c63f in rb_call (klass=47286943807600,
recv=47286940121880, mid=53321, argc=0,
argv=0x0, scope=0, self=47287042528040) at eval.c:6153
#29 0x000000000041734c in rb_eval (self=47287042528040, n=) at eval.c:3494
#30 0x0000000000419480 in rb_eval (self=47287042528040, n=) at eval.c:3224
#31 0x0000000000417257 in rb_eval (self=47287042528040, n=) at eval.c:3488
#32 0x0000000000416c6d in rb_eval (self=47287042528040, n=) at eval.c:3847

Cheers,
Henke


#7

> I get this every now and then when running some ruby / c++ methods from
> within rails.
> It looks like ruby’s garbage collector once again makes bad decisions…
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 47432141183712 (LWP 30617)]
> garbage_collect () at gc.c:1184
> 1184 if (!(p->as.basic.flags & FL_MARK)) {
>
> I’ve posted the bug to the ruby bug hunting system but haven’t heard
> anything yet. I think this is the biggest problem with ruby. The core
> development is slow and badly maintained.

Check http://www.ruby-forum.com/topic/170608#new; it has some recent GC
patches that might help. You could also submit a report to
redmine.ruby-lang.org [esp. if reproducible].
Thanks!
-=R


#8

Hi Roger and thanks for the info.

I’ll see if I can try this out. The wrapper code is generated with SWIG, so it
could be a SWIG problem. This is what makes it so hard to debug.

Cheers,
Henke