Hi Mike,
On Tue, 2009-05-26 at 22:30 +0900, Mike C. wrote:
I vaguely remember your previous posting on this, but not
any of the details. Can you please give a description of what
is happening to cause the segfault?
I’d love to be able to do this, but I truly have no idea–and that’s
the core problem.
Can you isolate the
code? Is it a Ruby-Gnome problem or a ruby problem?
I don’t have trouble unless I’m using the UI integration code. It seems
to be related to the way ruby swaps stack frames to simulate threading,
which doesn’t play well with GTK+, but I’ve tried several times to
isolate the code that has issues, so far without any real success.
There’s still something strange going on at a level that I really don’t
seem to be able to control, mutexes or not, queuing everything into the
gtk main thread or not. While obviously something isn’t quite right,
I’ve been through the code several times ensuring that I’m following all
of the available guidelines.
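
To make "queuing everything into the gtk main thread" concrete, here is a
minimal sketch of the pattern, not the actual application code.
MainThreadQueue, post and drain are hypothetical names. The
GLib::Timeout.add hookup in the trailing comment is an assumption about how
it would be wired up with Ruby-GNOME2, and it is left commented out so the
sketch runs without GTK+.

```ruby
require 'thread'

# Hypothetical helper (not part of Ruby-GNOME2): worker threads hand blocks
# to the thread that owns the GTK+ main loop instead of touching widgets
# themselves.
class MainThreadQueue
  def initialize
    @owner = Thread.current   # the thread that will run the main loop
    @jobs  = Queue.new        # thread-safe FIFO from the standard library
  end

  # Safe to call from any thread: only enqueues the block, never runs it.
  def post(&job)
    @jobs << job
    nil
  end

  # Call only from the owning thread. Runs every job queued so far and
  # returns how many were run.
  def drain
    raise "drain called off the main thread" unless Thread.current == @owner
    count = 0
    until @jobs.empty?        # only the owner pops, so empty?/pop cannot race
      @jobs.pop.call
      count += 1
    end
    count
  end
end

# Assumed wiring inside a real application (requires the gtk2 bindings):
#   ui_queue = MainThreadQueue.new
#   GLib::Timeout.add(50) { ui_queue.drain; true }  # true keeps the timer alive
#   Thread.new { ui_queue.post { label.text = "done" } }
```

The point of the design is that widget calls only ever happen inside drain,
on the one thread that owns the main loop. Worker threads never call into
GTK+ directly.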
Anyway, a segfault means that there is a pointer wonky
somewhere. There’s not much more information that
ruby can give you. You can try with a debug version of
ruby, get a core dump and then figure out where it’s
crashing… but I’m guessing you don’t want to do that.
I’ve actually been down this route too. I installed the debug symbols
for ruby, but even with the backtrace in gdb, I wasn’t really getting
any useful information. I’d also given up on this approach, but I’ll
try it again after I’m finished with what I’m doing at the moment–if I
can get it to crash. Sometimes it would only crash outside of the
debugger too… 
Whether or not a different version of ruby would help would
really depend on what is wrong. If you are using multiple
different extension libraries then it could be in any of them as
well.
Fundamentally, it’s ruby’s threading model that’s “wrong” in this case
because it doesn’t play nicely with POSIX threads. I’m slowly migrating
some of the functionality to JRuby, but it’s going to take quite a bit
of time to get feature parity, and some things will require me to write
several JNI wrappers there too (which I’d rather not do, as part of the
point of the Java route was to better support cross-platform usage).
At this point, I think I’m close to hitting a “productivity plateau”
with Ruby. To go further with the GTK+ interface, I’m going to have to
either finish the WebKit port Dan and I worked on, or I’m going to have
to expose the gtkmozembed internal API so I can hook into it via Ruby.
That’s also just one piece of the puzzle, but it’s one of the bigger
ones. At the moment, even there I still end up having to use both
browser implementations because some of the plug-ins I need don’t work
equally well in both environments.
The first thing to do is to get as calm as you can and try
to pare down the code until you get the smallest thing
that crashes. Then you can work out whether the bug
is in ruby, a library or even your code (using the libraries
incorrectly could potentially cause a segfault in some
cases).
Thanks for the pointers. All good advice. However, in 15 years of
writing software professionally, I don’t think I’ve really worked with
tools that were this frustrating–except some forced 3rd-party
integration work that required me to coerce VBA/VB6 into real OOP design
patterns (I never want to do anything like trying to implement the
Visitor pattern in “classic” VB again… it did finally work, though).
Even with C++ and Java CORBA libraries (incl. some open source ones), it
was easier to track down strange issues like this than it seems to be
using the Ruby-GNOME2 combination.
Believe me, I do want this to work because I’ve been working on the
Ruby-GNOME2 part of this system for nearly a year now, and because of
what it is, the system is strategically important to our organization.
I’ve currently a good bit of instrumentation in the code, but the
problems are intermittent. Literally, it will crash like clockwork 6
times in a row, then you come back to it later - with the same data
set - and it will work just fine. Try it with an empty data-set, and it
works just fine. Try it with a bigger data-set and it works just fine.
I’ve faced these “random crasher” issues in the past with other systems
and other technologies, but you could always find something in gdb, a
stack trace, log file or something that would help you find it. The
fact that I need to leverage so many different components/libraries in
a single application is partially because this application is doing a
lot of complex things.
However, I’m beginning to think it’s gotta be like the car where all of
the parts are within individual tolerances, but when you put them all
together, they are enough out of whack in incompatible ways that the
whole car shakes when you drive it (true story, btw). Part of the point
of trying to leverage these things is so that I don’t have to learn the
internals of each and every one of them in order to build the whole
system. So much for “component based software” 
Unfortunately, at this point I can’t give you anything more than
moral support. Once you know more about your situation
I’ll try to help more.
Good luck!
I do appreciate your empathy, moral support and efforts to try and help
– even if it might not always come across that way.
I’ll have another go with gdb and debugging symbols and adding more
instrumentation. However, it’s now not crashing again (same steps; same
data… sigh), and doing any development/debugging on the system isn’t
really in the schedule for the next couple of months if I can avoid it
at all. I need to spend that time using it instead.
Any other suggestions are welcome too.
Thanks again,
ast
Andrew S. Townley
http://atownley.org