Help! Our Ruby controlled Robotic Marine Laboratory started failing with Segmentation Faults just a few days before it was to be deployed.

We had seen random, very occasional Segmentation Faults for some months, as our application grew larger and more complex. Then, just days before the ship was scheduled to sail, after we'd integrated a couple new and exciting features, we started getting segfaults regularly.

Please see [ruby-core:11218] & [ruby-core:11228] for more details including URLs to a core dump and stack trace. The 200+ level stack trace indicates that the application was deep into a GC cycle while executing Marshal.dump when the segfault occurred.

Here are the details on the reward:

The Monterey Bay Aquarium Research Institute (http://www.mbari.org) is offering $2000 USD to the first person to provide a software fix to the bug causing the above described Segmentation Faults in our application.

Our application is arguably one of the cooler, more unusual uses of Ruby:
http://www.mbari.org/microbial/ESP/
http://www.zenspider.com/dl/rubyconf2005/EmbeddedRuby.pdf

I will provide support to individuals who offer plausible suggestions. E-mail suggested fixes or queries for specific information to me directly if you do not wish to share them with the list. Whatever fix finally is determined to work will be shared with the list, after the individual providing it is paid, so that the community may also benefit.

The fine print:

Funds will be paid by corporate cheque in U.S. Dollars after the bug fix is verified to work. Verification may take up to 45 days from the submission of the prospective fix. Only the first working bug fix submitted by email to brent@mbari.org will be rewarded. Individuals obligated to pay U.S. taxes on their income will be sent an IRS form 1099 from MBARI at year's end. If you are a United States citizen or have a "green card", you will need to send MBARI your mailing address and U.S. Social Security Number before receiving payment and should expect to pay income tax on the reward. Individuals not obligated to pay U.S. income tax will have the option to receive payment via bank wire rather than cheque.

I will post to this list, roughly on a weekly basis, the number of plausible, prospective fixes we have received thus far. After we've received ten or so from different individuals, we will cease accepting any more.
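The stack trace described above places the crash inside Marshal.dump while a GC cycle was running. As a rough starting point for anyone trying to reproduce that combination outside the real application, here is a minimal sketch of a stress harness. It is not MBARI's code: the helper names (nested, stress_marshal), the thread count, and the data shape are all hypothetical, and there is no guarantee it triggers the reported fault.

```ruby
# Hypothetical stress harness: all names and sizes here are illustrative.

# Build a deeply nested structure so Marshal.dump recurses many levels.
def nested(depth)
  if depth.zero?
    { "leaf" => "x" * 64 }
  else
    { "child" => nested(depth - 1), "pad" => Array.new(8) { |i| i.to_s } }
  end
end

# Dump and reload the structure from several threads, forcing a GC run
# before each dump. Returns true if every round trip preserved the data.
def stress_marshal(rounds, depth)
  data = nested(depth)
  workers = Array.new(4) do
    Thread.new do
      ok = true
      rounds.times do
        GC.start
        ok &&= (Marshal.load(Marshal.dump(data)) == data)
      end
      ok
    end
  end
  workers.all? { |t| t.value }
end

result = stress_marshal(20, 100)
puts "round trips consistent: #{result}"
```

Raising rounds and depth makes the run heavier; running it on the same interpreter build and platform as the failing system is what would make any result meaningful.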
on 2007-05-31 02:17
on 2007-05-31 09:01
Brent Roman wrote:
> Help!
Maybe you've tried already but, if you can afford the reduced
performance, compiling ruby with less optimisation might help
until a more suitable solution can be found?
daz
on 2007-05-31 09:29
Hi Brent,
Although I cannot anticipate the code structure, I am still suggesting this... please ignore it if you have already tried it!
Using the WeakRef library, a lot of memory management problems can be solved. I have used it in various cases and it worked out well.
For example, say you have a large object, such as a 5 MB file, assigned to a variable:
    file = File.open("myfile", "r")
    # ... all processing with file ...
    GC.start # within the same process
Here, running GC.start will not sweep the File object out of memory, because a reference to it still exists as "file".
If you do it like this instead:
    require 'weakref'
    file = WeakRef.new(File.open("myfile", "r"))
    # ... all processing with file ...
    GC.start
this time the File object can be swept away, even though the variable "file" still refers to it through the WeakRef.
So what I mean is: wherever you know a variable holds a very large value that will not be useful after the current function, you can make that variable a WeakRef object and call GC.start at the end of each function, thereby keeping memory free.
Likewise, a String object can be assigned as
    str = WeakRef.new("my string")
so str will respond to all the methods of a String object. But remember one thing: it will be an object of the WeakRef class, so you cannot rely on anything like str.class in your code. I mean, with WeakRef you will need to focus strongly on duck typing.
Thanks.
--
SurMax
http://expressica.com
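A minimal sketch of the WeakRef behaviour discussed above, assuming only the standard weakref library; the variable and helper names are illustrative. Note that a WeakRef does not force collection: it only stops the wrapper itself from keeping the target alive, and when the target is actually reclaimed is up to the garbage collector.

```ruby
require 'weakref'

strong = "my string"          # a strong reference keeps the String alive
weak   = WeakRef.new(strong)  # the wrapper alone does not keep it alive

puts weak.upcase              # method calls are delegated to the String
puts weak.class               # reports the wrapper's class, not String
puts weak.weakref_alive?      # stays true while `strong` is still held

# Once no strong reference remains, the target may be collected at some
# later GC run; calls on the wrapper then raise WeakRef::RefError.
# read_weak is a hypothetical helper that guards against that case.
def read_weak(ref)
  ref.weakref_alive? ? ref.to_s : nil
rescue WeakRef::RefError
  nil
end

puts read_weak(weak).inspect
```

The rescue clause matters because the target can disappear between the weakref_alive? check and the delegated call.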
on 2007-05-31 13:42
On 5/31/07, sur max <sur.max@gmail.com> wrote:
> Hi Brent,
>
> Although i can not anticipate the code structure, still suggesting... please
> ignore if you have already tried that out!
>
> using WekRef library, a lots of memory management problems can be solved...
> i have implemented it in various cases and that worked out well.
be warned, though, that using weakref extensively can result in extreme performance loss. it is really very slow!
-- henon
on 2007-05-31 13:49
Hey Meinrad,
I had never come across the fact that WeakRef usage can lead to a performance loss.
It would be great if you could provide some links to back that up.
Thanks
--
Sur Max
http://expressica.com
on 2007-05-31 14:03
On 5/31/07, sur max <sur.max@gmail.com> wrote:
> Hey Meinrad,
>
> I never came across this fact earlier that WeakRef usage can lead to the
> performance loss..
> It will be great if you can provide some links justifying the same.
require "weakref"
require "benchmark"
a = Object.new
puts Benchmark.measure { 1_000_000.times { a.to_s } }
obj = Object.new        # keep a strong reference so the target is not collected mid-benchmark
b = WeakRef.new(obj)
puts Benchmark.measure { 1_000_000.times { b.to_s } }
=>
5.032000 0.031000 5.063000 ( 5.297000)
18.172000 0.047000 18.219000 ( 18.375000)
so as you see, the weakref is quite slow (almost factor 4) compared to direct reference. maybe, if the number of objects in objectspace is high it is even slower, but i don't know if that is really true. this was measured with ruby 1.8.5.
on 2007-05-31 15:31
daz (c) wrote:
>
Perhaps they are looking to pay someone $2000US to do that for them. ;)
on 2007-06-01 20:04
If you can run your application on x86 and still have the crash, run the interpreter under valgrind. You'll have to generate a suppression file beforehand though.
on 2007-06-03 20:33
On Friday 01 June 2007 20:04:09 Sylvain Joyeux wrote:
> If you can run your application on x86 and still have the crash, run the
> interpreter under valgrind. You'll have to generate a suppression file
> beforehand though.
Brent, you can try the attached file. They are for 1.8.5 under debian, but maybe the backtraces will be similar enough to do the trick for you.
Sylvain
on 2007-09-25 23:04
Per the suggestion of ggarra and Sylvain, I've been running our robotic lab control application under valgrind on an x86 laptop rather than our ARM based CPU board.
I've posted valgrind's stderr output and the suppression file on our FTP site:
ftp://ftp.mbari.org/pub/brent
the suppression file is rubySuppressions.valgrind
the stderr output is valgrind.trace2
So far, I have not triggered a Segfault with the simulated hardware currently available to me.
The unhandled ioctl is expected. We're locking a serial port for exclusive access. Many of the memcheck errors have been suppressed as they are apparently artifacts of Ruby's conservative garbage collector. However, those below *do* worry me. Are they innocuous, or do they indicate a real problem?
....
==10837== Invalid write of size 1
==10837==    at 0x401DEAC: memcpy (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==10837==    by 0x8062E5E: rb_thread_restore_context (eval.c:7624)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837==    by 0x8062CA8: stack_extend (eval.c:7575)
==10837==    by 0x8062CF3: rb_thread_restore_context (eval.c:7592)
==10837== Address 0xBEFE8908 is on thread 1's stack
==10837==
==10837== Invalid write of size 1
==10837==    at 0x401DEAC: memcpy (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==10837==    by 0x8062E5E: rb_thread_restore_context (eval.c:7624)
==10837==    by 0x8063D21: rb_thread_schedule (eval.c:7987)
==10837==    by 0x8063E9D: rb_thread_fd_writable (eval.c:8018)
==10837==    by 0x8070835: io_fflush (io.c:256)
==10837==    by 0x8070A70: rb_io_flush (io.c:339)
==10837==    by 0x805ACEE: call_cfunc (eval.c:4288)
==10837==    by 0x805B78F: rb_call0 (eval.c:4423)
==10837==    by 0x805C2CA: rb_call (eval.c:4654)
==10837==    by 0x8055F53: rb_eval (eval.c:2559)
==10837==    by 0x80594C9: rb_yield_0 (eval.c:3650)
==10837==    by 0x80554A1: rb_eval (eval.c:2391)
==10837== Address 0xBEFE8908 is on thread 1's stack
==10837==
==10837== Invalid write of size 1
==10837==    at 0x401DEAC: memcpy (in /usr/lib/valgrind/x86-linux/vgpreload_memcheck.so)
==10837==    by 0x8062E5E: rb_thread_restore_context (eval.c:7624)
==10837==    by 0x8063D21: rb_thread_schedule (eval.c:7987)
==10837==    by 0x806466B: rb_thread_stop (eval.c:8309)
==10837==    by 0x805ACEE: call_cfunc (eval.c:4288)
==10837==    by 0x805B78F: rb_call0 (eval.c:4423)
==10837==    by 0x805C2CA: rb_call (eval.c:4654)
==10837==    by 0x8055F53: rb_eval (eval.c:2559)
==10837==    by 0x805BDE8: rb_call0 (eval.c:4560)
==10837==    by 0x805C2CA: rb_call (eval.c:4654)
==10837==    by 0x8055F53: rb_eval (eval.c:2559)
==10837==    by 0x805BDE8: rb_call0 (eval.c:4560)
==10837== Address 0xBEFE8908 is on thread 1's stack
=