Forum: Ruby-core $2000 USD Reward for help fixing Segmentation Fault in GC

Posted by Brent Roman (Guest)
on 2007-05-31 02:17
(Received via mailing list)
Help!

Our Ruby controlled Robotic Marine Laboratory started failing
with Segmentation Faults just a few days before it was to be
deployed.  We had seen random, very occasional Segmentation
Faults for some months, as our application grew larger
and more complex.  Then, just days before the ship
was scheduled to sail, after we'd integrated a couple of new and
exciting features, we started getting segfaults regularly.

Please see [ruby-core:11218] & [ruby-core:11228] for more details
including URLs to a core dump and stack trace.

The 200+ level stack trace indicates that the application was deep
into a GC cycle while executing Marshal.dump when the segfault
occurred.
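One way to try to reproduce the crash on a desktop machine is a stress loop. It round-trips deeply nested data through Marshal while another thread keeps allocating and calling GC.start, so collections are frequent while dumps are running. This is a minimal sketch under stated assumptions. The nesting depth, thread layout and iteration counts are invented for illustration and say nothing about the actual ESP code. The code also sticks to syntax that Ruby 1.8 accepts. A clean run proves nothing; the harness is only a cheap thing to run under gdb or valgrind.

```ruby
# Hypothetical stress harness: the data shape and all counts are
# illustrative assumptions, not taken from the ESP application.

# Build a nested Array/Hash/String tree so Marshal.dump recurses deeply.
def build_nested(depth, width)
  return "leaf" * 4 if depth.zero?
  Array.new(width) { |i| { "k#{i}" => build_nested(depth - 1, width) } }
end

# Round-trip the tree through Marshal repeatedly while a second thread
# allocates garbage and calls GC.start. This only raises the odds that a
# collection happens while a dump is in progress; it guarantees nothing.
def stress_marshal(rounds = 50, depth = 6, width = 3)
  data = build_nested(depth, width)
  stop = false
  churner = Thread.new do
    until stop
      Array.new(1_000) { "garbage" * 10 }
      GC.start
    end
  end
  rounds.times do
    copy = Marshal.load(Marshal.dump(data))
    raise "Marshal round-trip mismatch" unless copy == data
  end
  rounds # number of completed round-trips
ensure
  stop = true
  churner.join if churner
end

puts "completed #{stress_marshal} round-trips without a crash"
```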


Here are the details on the reward:

  The Monterey Bay Aquarium Research Institute (http://www.mbari.org)
is offering $2000 USD to the first person to provide a software
fix to the bug causing the above described Segmentation Faults
in our application.  Our application is arguably one of the cooler,
more unusual uses of Ruby:

http://www.mbari.org/microbial/ESP/
http://www.zenspider.com/dl/rubyconf2005/EmbeddedRuby.pdf

I will provide support to individuals who offer plausible
suggestions.  E-mail suggested fixes or queries for specific
information to me directly if you do
not wish to share them with the list.  Whatever fix finally
is determined to work will be shared with the list, after
the individual providing it is paid, so that the community
may also benefit.

The fine print:

Funds will be paid by corporate cheque in U.S. Dollars after
the bug fix is verified to work.  Verification may take up to 45 days
from the submission of the prospective fix.  Only the first working
bug fix submitted by email to brent@mbari.org will be rewarded.

Individuals obligated to pay U.S. taxes on their income will
be sent an IRS form 1099 from MBARI at year's end.
If you are a United States citizen or have a "green card", you will
need to send MBARI your mailing address and
U.S. Social Security Number before receiving
payment and should expect to pay income tax on the reward.
Individuals not obligated to pay U.S. income tax will have the
option to receive payment via bank wire rather than cheque.

I will post to this list, roughly on a weekly basis, the number of
plausible, prospective fixes we have received thus far.
After we've received ten or so from different individuals, we will
cease accepting any more.
Posted by daz (c) (Guest)
on 2007-05-31 09:01
(Received via mailing list)
Brent Roman wrote:
> Help!

Maybe you've tried already but, if you can afford the reduced
performance, compiling ruby with less optimisation might help
until a more suitable solution can be found?

daz
Posted by Sur Max (sur)
on 2007-05-31 09:29
(Received via mailing list)
Hi Brent,

Although I can't anticipate how your code is structured, here's a
suggestion anyway; please ignore it if you have already tried it!

Using the WeakRef library, a lot of memory management problems can be
solved. I have used it in various cases and it worked out well.

For example, suppose you have a large object, say a 5 MB file, assigned
to a variable:

file = File.open("myfile", "r")
# ... all processing with file ...
GC.start  # within the same process

Running GC.start here will not sweep the File object out of memory,
because it is still referenced by the variable "file".

If you do it like this instead:

require 'weakref'
file = WeakRef.new(File.open("myfile", "r"))
# ... all processing with file ...
GC.start

then the File object becomes collectable even though the variable "file"
still exists, because the WeakRef is the only thing referring to it.
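The distinction between the variable and the object it refers to can be made concrete. This is a minimal sketch, assuming a current Ruby with the standard weakref library. A WeakRef whose referent is also held elsewhere always stays usable. A WeakRef whose referent is only weakly held may or may not be reclaimed by GC.start, because Ruby's collector is conservative and makes no promise either way. Calling through a dead WeakRef raises WeakRef::RefError. The helper name describe is hypothetical.

```ruby
require 'weakref'

# Report whether the object behind a WeakRef is still reachable.
# Calling through a WeakRef whose referent was collected raises
# WeakRef::RefError, so rescue it instead of checking first (the
# referent could be collected between a check and the call).
def describe(ref)
  "alive: #{ref.__getobj__.class}"
rescue WeakRef::RefError
  "collected"
end

held = Object.new                    # strong reference: cannot be collected
strong_case = WeakRef.new(held)

weak_case = WeakRef.new(Object.new)  # only the WeakRef points at this object
GC.start                             # may or may not reclaim it (conservative GC)

puts describe(strong_case)           # prints "alive: Object"
puts describe(weak_case)             # "collected", or "alive: Object" if GC kept it
```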


So what I mean is: wherever you know a variable is assigned a very large
value that will not be needed after the current function returns, you
can wrap that value in a WeakRef and call GC.start at the end of each
function, thereby keeping memory free.

For example, a string can be assigned as

str = WeakRef.new("my string")

and str will respond to all the methods of a String object. But remember
one thing: it will be an object of the WeakRef class, so you cannot rely
on anything like str.class in your code. With WeakRef you really need to
rely on duck typing.
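The duck-typing caveat is easy to see directly. This is a sketch on a current Ruby, and the helper shout is a hypothetical name. Ordinary method calls are forwarded to the wrapped String, but class and is_a? report on the wrapper itself. The test string is also held in a separate variable so it cannot be collected while the example runs.

```ruby
require 'weakref'

target = "my string"           # strong reference keeps the demo deterministic
str = WeakRef.new(target)

# Methods the wrapper does not define itself are forwarded to the String...
puts str.upcase                # prints MY STRING
puts str.length                # prints 9
puts str.respond_to?(:upcase)  # prints true

# ...but class checks see the wrapper, not the String inside it.
puts str.class                 # prints WeakRef
puts str.is_a?(String)         # prints false

# Duck-typed code asks what an object can do rather than what it is.
def shout(text)
  text.respond_to?(:upcase) ? text.upcase : text.to_s
end

puts shout(str)                # prints MY STRING
```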

thanks.

--
SurMax
http://expressica.com
Posted by Meinrad Recheis (Guest)
on 2007-05-31 13:42
(Received via mailing list)
On 5/31/07, sur max <sur.max@gmail.com> wrote:
> Hi Brent,
>
> Although i can not anticipate the code structure, still suggesting... please
> ignore if you have already tried that out!
>
> using WekRef library, a lots of memory management problems can be solved...
> i have implemented it in various cases and that worked out well.

Be warned, though, that using WeakRef extensively can result in a
severe performance loss. It is really very slow!

-- henon
Posted by Sur Max (sur)
on 2007-05-31 13:49
(Received via mailing list)
Hey Meinrad,

I hadn't come across the fact that WeakRef usage can lead to a
performance loss before.
It would be great if you could provide some links supporting this.

Thanks

--
Sur Max
http://expressica.com
Posted by Meinrad Recheis (Guest)
on 2007-05-31 14:03
(Received via mailing list)
On 5/31/07, sur max <sur.max@gmail.com> wrote:
> Hey Meinrad,
>
> I never came across this fact earlier that WeakRef usage can lead to the
> performance loss..
> It will be great if you can provide some links justifying the same.

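require "weakref"
require "benchmark"

# Baseline: a million to_s calls on a plain object
a = Object.new

puts Benchmark.measure {
  1_000_000.times { a.to_s }
}

# The same calls routed through a WeakRef; target keeps the referent
# strongly reachable so it cannot be collected mid-benchmark
target = Object.new
b = WeakRef.new(target)

puts Benchmark.measure {
  1_000_000.times { b.to_s }
}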

=>

  5.032000   0.031000   5.063000 (  5.297000)
 18.172000   0.047000  18.219000 ( 18.375000)

So as you can see, the WeakRef is quite slow compared to a direct
reference: about 3.6 times slower in total CPU time, almost a factor of 4.
Maybe it is even slower if the number of objects in ObjectSpace is high,
but I don't know if that is really true. This was measured with Ruby 1.8.5.
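The open question, whether WeakRef calls get slower when many objects are alive, can be checked directly. One way is to time the same WeakRef workload before and after filling the heap with extra live objects. This is a rough sketch with arbitrary sizes; n and extra_objects are illustrative assumptions. It uses only Time.now and 1.8-compatible syntax, so it can also run on old interpreters. The WeakRef implementation has changed since 1.8 (current versions are built on ObjectSpace::WeakMap), so numbers from a modern Ruby say little about 1.8.5.

```ruby
require 'weakref'

# Time n delegated calls through ref, in seconds (wall clock).
def time_calls(ref, n)
  t0 = Time.now
  n.times { ref.to_s }
  Time.now - t0
end

# Run the same WeakRef workload with a small heap, then again with many
# extra live objects. The default sizes are arbitrary assumptions.
def compare(n = 200_000, extra_objects = 500_000)
  target = Object.new                 # strong reference keeps the referent alive
  ref = WeakRef.new(target)
  baseline = time_calls(ref, n)
  ballast = Array.new(extra_objects) { Object.new }  # many live objects
  loaded = time_calls(ref, n)
  { :baseline => baseline, :loaded => loaded, :live_objects => ballast.size }
end

result = compare
printf("small heap: %.3fs  large heap: %.3fs\n", result[:baseline], result[:loaded])
```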
Posted by M. Edward (Ed) Borasky (Guest)
on 2007-05-31 15:31
(Received via mailing list)
daz (c) wrote:
Perhaps they are looking to pay someone $2000US to do that for them. ;)
Posted by Sylvain Joyeux (Guest)
on 2007-06-01 20:04
(Received via mailing list)
If you can run your application on x86 and still have the crash, run the
interpreter under valgrind. You'll have to generate a suppression file
beforehand though.
Posted by Sylvain Joyeux (Guest)
on 2007-06-03 20:33
(Received via mailing list)
On Friday 01 June 2007 20:04:09 Sylvain Joyeux wrote:
> If you can run your application on x86 and still have the crash, run the
> interpreter under valgrind. You'll have to generate a suppression file
> beforehand though.

Brent, you can try the attached file. The suppressions are for Ruby 1.8.5
under Debian, but maybe the backtraces will be similar enough to work for
you.

Sylvain
Posted by Brent Roman (Guest)
on 2007-09-25 23:04
(Received via mailing list)
Per the suggestion of ggarra and Sylvain, I've been running our robotic lab
control application under valgrind on an x86 laptop rather than on our
ARM-based CPU board.

I've posted valgrind's stderr output and the suppression file on our FTP site:

ftp://ftp.mbari.org/pub/brent

the suppression file is rubySuppressions.valgrind
the stderr output is valgrind.trace2

So far, I have not triggered a segfault with the simulated hardware
currently available to me.

The unhandled ioctl is expected.  We're locking
a serial port for exclusive access.

Many of the memcheck errors have been suppressed as they
are apparently artifacts of Ruby's conservative
garbage collector. However, those below *do* worry me.

Are they innocuous, or do they indicate a real problem?

....

==10837== Invalid write of size 1
==10837==    at 0x401DEAC: memcpy (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==10837==    by 0x8062E5E: rb_thread_restore_context (eval.c:7624)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==  Address 0xBEFE8908 is on thread 1's stack
==10837==

==10837== Invalid write of size 1
==10837==    at 0x401DEAC: memcpy (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==10837==    by 0x8062E5E: rb_thread_restore_context (eval.c:7624)
==10837==    by 0x8063D21: rb_thread_schedule (eval.c:7987)
==10837==    by 0x8063E9D: rb_thread_fd_writable (eval.c:8018)
==10837==    by 0x8070835: io_fflush (io.c:256)
==10837==    by 0x8070A70: rb_io_flush (io.c:339)
==10837==    by 0x805ACEE: call_cfunc (eval.c:4288)
==10837==    by 0x805B78F: rb_call0 (eval.c:4423)
==10837==    by 0x805C2CA: rb_call (eval.c:4654)
==10837==    by 0x8055F53: rb_eval (eval.c:2559)
==10837==    by 0x80594C9: rb_yield_0 (eval.c:3650)
==10837==    by 0x80554A1: rb_eval (eval.c:2391)
==10837==  Address 0xBEFE8908 is on thread 1's stack
==10837==

==10837== Invalid write of size 1
==10837==    at 0x401DEAC: memcpy (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==10837==    by 0x8062E5E: rb_thread_restore_context (eval.c:7624)
==10837==    by 0x8063D21: rb_thread_schedule (eval.c:7987)
==10837==    by 0x806466B: rb_thread_stop (eval.c:8309)
==10837==    by 0x805ACEE: call_cfunc (eval.c:4288)
==10837==    by 0x805B78F: rb_call0 (eval.c:4423)
==10837==    by 0x805C2CA: rb_call (eval.c:4654)
==10837==    by 0x8055F53: rb_eval (eval.c:2559)
==10837==    by 0x805BDE8: rb_call0 (eval.c:4560)
==10837==    by 0x805C2CA: rb_call (eval.c:4654)
==10837==    by 0x8055F53: rb_eval (eval.c:2559)
==10837==    by 0x805BDE8: rb_call0 (eval.c:4560)
==10837==  Address 0xBEFE8908 is on thread 1's stack
=
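On Ruby 1.8, threads are scheduled inside the interpreter. Every switch goes through rb_thread_schedule and rb_thread_restore_context, which copy saved thread stacks back into place with memcpy; that is the frame valgrind flags in all three reports. The second and third reports reach it from IO#flush (io_fflush) and Thread.stop (rb_thread_stop). A hypothetical harness that drives both paths from several threads, as sketched below, might help show whether the same warnings appear with any threaded 1.8 program or only with the application. The worker and round counts are illustrative assumptions, and the code avoids post-1.8 syntax. On 1.9 and later (native threads) these stack-copying functions no longer exist, so the harness only probes 1.8 behaviour.

```ruby
# Hypothetical harness: worker/round counts are illustrative assumptions.
# Each worker writes a line, calls IO#flush, then parks itself with
# Thread.stop; the main thread drains the pipe and wakes the workers.
def exercise_threads(workers = 3, rounds = 50)
  reader, writer = IO.pipe
  threads = Array.new(workers) do |id|
    Thread.new do
      rounds.times do |r|
        writer.puts("worker #{id} round #{r}")
        writer.flush   # IO#flush (io_fflush in the second report)
        Thread.stop    # Thread.stop (rb_thread_stop in the third report)
      end
    end
  end

  lines = 0
  rounds.times do
    # Every worker writes exactly one line per round before stopping.
    workers.times do
      reader.gets
      lines += 1
    end
    threads.each do |t|
      Thread.pass until t.stop?   # wait until the worker has parked itself
      t.run                       # wake it for the next round (or to finish)
    end
  end
  threads.each { |t| t.join }
  lines
ensure
  reader.close if reader && !reader.closed?
  writer.close if writer && !writer.closed?
end

puts "read #{exercise_threads} lines"
```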