Segmentation fault issues

Andrew_SSTownley · May 26, 2009, 3:13pm

Hi Folks,

Hitting this again, but no good way to tell why now (was working 10
min ago and all yesterday just fine). Application is now fairly complex
and uses several other libraries. I do not believe that:

[BUG] Segmentation fault ruby 1.8.6 (2007-09-24) [x86_64-linux]

is an acceptable response given that there’s no way to figure out what
the hell’s really happening.

Does anyone know if this problem is better/worse with other ruby
versions? I would be willing to try anything at this stage, since I’ve
quite a bit invested in both the app and Ruby-GNOME2.
However, it’s beginning to make me regret choosing Ruby for this
project–even though I love Ruby and I’d be much further behind without
it.

Frustrated regards,

ast

Andrew S. Townley [email protected]
http://atownley.org

Andrew_SSTownley · May 26, 2009, 4:35pm

I vaguely remember your previous posting on this, but not
any of the details. Can you please give a description of what
is happening to cause the segfault? Can you isolate the
code? Is it a Ruby-Gnome problem or a ruby problem?

Anyway, a segfault means that there is a pointer wonky
somewhere. There’s not much more information that
ruby can give you. You can try with a debug version of
ruby, get a core dump and then figure out where it’s
crashing… but I’m guessing you don’t want to do that.

Whether or not a different version of ruby would help would
really depend on what is wrong. If you are using multiple
different extension libraries then it could be in any of them as
well.

The first thing to do is to get as calm as you can and try
to pare down the code until you get the smallest thing
that crashes. Then you can work out whether the bug
is in ruby, a library or even your code (using the libraries
incorrectly could potentially cause a segfault in some
cases).

Unfortunately, at this point I can’t give you anything more than
moral support. Once you know more about your situation
I’ll try to help more. Good luck!

      MikeC

Andrew_SSTownley · May 26, 2009, 5:22pm

Hi Mike,

On Tue, 2009-05-26 at 22:30 +0900, Mike C. wrote:

I vaguely remember your previous posting on this, but not
any of the details. Can you please give a description of what
is happening to cause the segfault?

I’d love to be able to do this, but I truly have no idea–and that’s
the core problem.

Can you isolate the
code? Is it a Ruby-Gnome problem or a ruby problem?

I don’t have trouble unless I’m using the UI integration code. It is
somehow related to the way ruby swaps stack frames to simulate threading
not playing well with GTK+, but I’ve tried several times to isolate the
code that has issues so far without any real success. There’s still
something strange going on at a level that I really don’t seem to be
able to control, mutexes or not, queuing everything into the gtk main
thread or not. While obviously something isn’t quite right, I’ve been
through the code several times ensuring that I’m following all of the
available guidelines.

Anyway, a segfault means that there is a pointer wonky
somewhere. There’s not much more information that
ruby can give you. You can try with a debug version of
ruby, get a core dump and then figure out where it’s
crashing… but I’m guessing you don’t want to do that.

I’ve actually been down this route too. I installed the debug symbols
for ruby, but even with the backtrace in gdb, I wasn’t really getting
any useful information. I’d also given up on this approach, but I’ll
try it again after I’m finished with what I’m doing at the moment–if I
can get it to crash. Sometimes it would only crash outside of the
debugger too…

Whether or not a different version of ruby would help would
really depend on what is wrong. If you are using multiple
different extension libraries then it could be in any of them as
well.

Fundamentally, it’s ruby’s threading model that’s “wrong” in this case
because it doesn’t play nicely with POSIX threads. I’m slowly migrating
some of the functionality to JRuby, but it’s going to take quite a bit
of time to get feature parity, and some things will require me to write
several JNI wrappers there too (which I’d rather not as part of the Java
route was to better support cross-platform usage).

At this point, I think I’m close to hitting a “productivity plateau”
with Ruby. To go further with the GTK+ interface, I’m going to have to
either finish the WebKit port Dan and I worked on, or I’m going to have
to expose the gtkmozembed internal API so I can hook into it via Ruby.
That’s also just one piece of the puzzle, but it’s one of the bigger
ones. At the moment, even there I still end up having to use both
browser implementations because some of the plug-ins I need don’t work
equally well in both environments.

The first thing to do is to get as calm as you can and try
to pare down the code until you get the smallest thing
that crashes. Then you can work out whether the bug
is in ruby, a library or even your code (using the libraries
incorrectly could potentially cause a segfault in some
cases).

Thanks for the pointers. All good advice. However, in 15 years of
writing software professionally, I don’t think I’ve really worked with
tools that were this frustrating–except some forced 3rd-party
integration work that required me to coerce VBA/VB6 into real OOP design
patterns (I never want to do anything like trying to implement the
Visitor pattern in “classic” VB again… it did finally work, though).

Even with C++ and Java CORBA libraries (incl. some open source ones), it
was easier to track down strange issues like this than it seems to be
using the Ruby-GNOME2 combination.

Believe me, I do want this to work because I’ve been working on the
Ruby-GNOME2 part of this system for nearly a year now, and because of
what it is, the system is strategically important to our organization.

I’ve currently a good bit of instrumentation in the code, but the
problems are intermittent. Literally, it will crash like clockwork 6
times in a row, then you come back to it later - with the same data
set - and it will work just fine. Try it with an empty data-set, and it
works just fine. Try it with a bigger data-set and it works just fine.

I’ve faced these “random crasher” issues in the past with other systems
and other technologies, but you could always find something in gdb, a
stack trace, log file or something that would help you find it. The
fact that I need to leverage so many different components/libraries in
a single application is partially because this application is doing a
lot of complex things.

However, I’m beginning to think it’s gotta be like the car where all of
the parts are within individual tolerances, but when you put them all
together, they are enough out of whack in incompatible ways that the
whole car shakes when you drive it (true story, btw). Part of the point
of trying to leverage these things is so that I don’t have to learn the
internals of each and every one of them in order to build the whole
system. So much for “component-based software”.

Unfortunately, at this point I can’t give you anything more than
moral support. Once you know more about your situation
I’ll try to help more. Good luck!

I do appreciate your empathy, moral support and efforts to try and help
– even if it might not always come across that way.

I’ll have another go with gdb and debugging symbols and adding more
instrumentation. However, it’s now not crashing again (same steps; same
data… sigh), and doing any development/debugging on the system isn’t
really in the schedule for the next couple of months if I can avoid it
at all. I need to spend that time using it instead.
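
One kind of extra instrumentation for a crash that leaves no Ruby-level trace is a "breadcrumb" log that records the last Ruby line reached before the process died. The sketch below is only an illustration: `with_breadcrumbs` is a hypothetical helper, and it uses `TracePoint`, which exists only on newer Rubies (on Ruby 1.8, `set_trace_func` plays a similar role). Tracing every line is slow, so it would normally be wrapped around a suspect code path rather than the whole application.

```ruby
require "tempfile"

# Hypothetical helper: write every Ruby-level line event to a log that is
# flushed on each write, so the last entry survives a hard crash.
def with_breadcrumbs(log)
  log.sync = true # no user-space buffering; each puts goes straight to the OS
  trace = TracePoint.new(:line) do |tp|
    log.puts("#{tp.path}:#{tp.lineno}")
  end
  trace.enable
  yield
ensure
  trace.disable if trace
end

# A temporary file keeps the sketch self-contained; a real run would use a
# fixed path that can be inspected after the crash.
log = Tempfile.new("breadcrumbs")
with_breadcrumbs(log) do
  total = 0
  3.times { |i| total += i }
end

log.rewind
entries = log.read.lines
log.close!
puts "last Ruby line reached: #{entries.last}"
```

After a segfault, the final line of such a log shows where the Ruby side was when the native code failed, which can narrow down which extension call to examine in gdb.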

Any other suggestions are welcome too.

Thanks again,

ast

Andrew S. Townley [email protected]
http://atownley.org

Andrew_SSTownley · May 26, 2009, 5:43pm

Hi,

Andrew S. Townley wrote:

I don’t have trouble unless I’m using the UI integration code. It is
somehow related to the way ruby swaps stack frames to simulate threading
not playing well with GTK+, but I’ve tried several times to isolate the
code that has issues so far without any real success. There’s still
something strange going on at a level that I really don’t seem to be
able to control, mutexes or not, queuing everything into the gtk main
thread or not. While obviously something isn’t quite right, I’ve been
through the code several times ensuring that I’m following all of the
available guidelines.

You have to encapsulate several parts of your code in

Gdk::Threads.enter
Gdk::Threads.leave

blocks, because Gdk and Gtk are not thread safe. You’ll find a little
documentation about this at

http://library.gnome.org/devel/gtk-faq/stable/x481.html

I don’t know your project, but in many cases it is simpler not to use
threads at all in Gtk+ programs, but to use an idle handler or timeouts.
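
As a minimal sketch of the enter/leave pairing, the helper below wraps a block so that the lock is always released, even when the block raises. `FakeGdkThreads` and `with_gdk_lock` are hypothetical names: the stand-in module only imitates the `Gdk::Threads.enter` / `Gdk::Threads.leave` pair with a plain `Mutex` so the example runs without GTK installed, and in a real program the real module would be passed in instead.

```ruby
require "thread"

# Stand-in for Gdk::Threads so the sketch runs without GTK.
# It mimics only the enter/leave pair, using an ordinary Mutex.
module FakeGdkThreads
  LOCK = Mutex.new

  def self.enter
    LOCK.lock
  end

  def self.leave
    LOCK.unlock
  end
end

# Hypothetical helper: hold the toolkit lock for the duration of the block
# and release it in all cases, including when the block raises.
def with_gdk_lock(threads = FakeGdkThreads)
  threads.enter
  begin
    yield
  ensure
    threads.leave
  end
end

# Simulated widget state that several worker threads want to change.
label_text = []
workers = 4.times.map do |i|
  Thread.new do
    with_gdk_lock { label_text << "update from worker #{i}" }
  end
end
workers.each(&:join)

puts label_text.size
```

The `ensure` clause matters: an exception between enter and leave would otherwise leave the lock held and stall every other thread that needs it.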

HTH
detlef

–
http://det.cable.nu

Andrew_SSTownley · May 26, 2009, 6:06pm

On Tue, 2009-05-26 at 17:42 +0200, Detlef R. wrote:

through the code several times ensuring that I’m following all of the
http://library.gnome.org/devel/gtk-faq/stable/x481.html

I don’t know your project, but in many cases it is simpler not to use
threads at all in Gtk+ programs, but to use an idle handler or timeouts.

I hadn’t seen that guideline. Thanks for the link.

Most of the code in the UI module gets called asynchronously via
callbacks, but not all of it. There’s a couple of places where I use
worker threads to perform some tasks.

How well does Threads.enter/leave interact with Ruby threads?

I know Gdk/GTK+ aren’t thread safe. Most UI toolkits aren’t, but I
never really used GTK+ from any other language than Ruby.

Again, thanks for the tip. I’ll go back through the code to see where
I’m not being invoked via callbacks and see if maybe this isn’t part of
the problem.

Cheers,

ast

Andrew S. Townley [email protected]
http://atownley.org

Andrew_SSTownley · May 26, 2009, 7:46pm

May I join you in your frustration? On exit, sometimes the application
will segfault. If you go up the directory tree in a FileChooserDialog
too far too quickly, it will segfault. Neither of these are killer,
but they are annoying.

For the integration test framework I wrote (Gutkumber), I tried to use
a separate thread to control the application, and I followed all the
official guidelines on threads, but segfaults still happened
semi-randomly (I knew about that threads_enter threads_leave thing,
but thought it didn’t apply if you were using the main_with_queue
technique - still don’t, actually). So I was forced to not use Threads
for that at all, and instead drive a custom main loop myself, and
override Dialog.run to make it all work.

The application I work on (Redcar) is now big enough that I too have
very little luck paring problems down to small examples.

One thing that you could help me with: I’ve tried to use debugging
information myself to sort this out, but I’m not very experienced with
C-land build tools. Could you write down exactly the steps you follow
to try to debug these problems? Let’s try and get that on the wiki
too.

Anyway, I still have hope that Ruby-GNOME2 can get through these
problems. Surely it is so close to being a fantastic Ruby GUI option.
I think if we get enough people who are motivated enough to make sure
these problems get fixed, we can sort it out.

good luck,
Dan

Daniel Benjamin Lucraft

www.daniellucraft.com/blog
twitter.com/danlucraft


Andrew_SSTownley · May 27, 2009, 12:05am

As a side note, are there any other examples of large, complex
applications with some or all of these characteristics written with
Ruby-GNOME2? Maybe comparisons will help–even if I can’t share the
code I have.

The Luz editor app[1] is Ruby-GNOME2.

It’s around 10k lines of code including libraries and plugins.
Single-threaded.

I very rarely see segfaults during use. I have seen some during startup
but they are inconsistent and transient.

I hope this is a useful data point.

Best,
-Ian

[1] Luz in Launchpad

Andrew_SSTownley · May 26, 2009, 11:00pm

On Tue, 2009-05-26 at 18:44 +0100, Daniel L. wrote:

May I join you in your frustration?

But of course…

On exit, sometimes the application
will segfault. If you go up the directory tree in a FileChooserDialog
too far too quickly, it will segfault. Neither of these are killer,
but they are annoying.

In my case, I’m currently ensuring that I’ve a recovery mechanism to
deal with crashes and frequently either automatically or manually
persist pending changes to my data store to deal with the event that
things go pear-shaped. However, occasionally, I still either lose time
(in the case of waiting for remote resources to load/cache again and/or
waiting for things to magically start working again) or occasionally
data.

While occasional crashes are tolerable when I use the software, they
aren’t acceptable for when others use the software. It needs to be
rock-solid and offer as little trouble as possible to the other types of
users I’m trying to support. One or two crashes and they won’t use it
again because it’s not stable–and rightly so. These would be much more
typical administration/office worker types of users who wouldn’t
understand/appreciate unstable apps. I haven’t gotten things stable
and/or finished enough for this type of user yet, but I also don’t want
it to crash in the middle of a demo for seemingly no apparent reason.

The application I work on (Redcar) is now big enough that I too have
very little luck paring problems down to small examples.

When I do manage to do this, in every case so far encountered, I haven’t
been able to get it to crash in this scenario. Of course, you can’t do
as much, and this is a heavily UI-driven application with lots of events
and nearly-simultaneous updates of multiple controls/views of the same
data.

One thing that you could help me with: I’ve tried to use debugging
information myself to sort this out, but I’m not very experienced with
C-land build tools. Could you write down exactly the steps you follow
to try to debug these problems? Let’s try and get that on the wiki
too.

I’ll do what I can to keep track of what I do. However, as I said,
it’ll probably be some time (e.g. weeks) before I can allocate enough
time to dive back into messing with the code. As I said, at the moment,
I really need it to actually work to do my job.

Anyway, I still have hope that Ruby-GNOME2 can get through these
problems. Surely it is so close to being a fantastic Ruby GUI option.
I think if we get enough people who are motivated enough to make sure
these problems get fixed, we can sort it out.

I think there are a lot of moving parts once you start layering the
core GTK+ widgets and more complex widgets like either Gecko and/or
Mozilla on top. Add in browser plug-ins and trying to off-load
labor-intensive work to thread pools with the appropriately protected
callbacks/queuing to update the GTK+ controls (representing multiple
views of the same data models), and there’s loads going on without the
collision of fundamental threading models encountered when using MRI and
controls/libraries leveraging POSIX threads.

Simple stuff probably doesn’t have this problem, so maybe we’re breaking
new ground here. However, simple stuff doesn’t help me do my job.

As a side note, are there any other examples of large, complex
applications with some or all of these characteristics written with
Ruby-GNOME2? Maybe comparisons will help–even if I can’t share the
code I have.

Cheers,

ast

Andrew S. Townley [email protected]
http://atownley.org

Andrew_SSTownley · May 27, 2009, 12:29am

On Wed, May 27, 2009 at 12:42 AM, Detlef R. [email protected]
wrote:

I don’t know your project, but in many cases it is simpler not to use
threads at all in Gtk+ programs, but to use an idle handler or timeouts.

OK I remember now. Yes, I always got random crashes when I used
threads with Gtk. I second the recommendation to avoid using threads.
If you have background processing to do, but want to keep the
UI active, threads are almost always the wrong solution anyway.

      MikeC

Andrew_SSTownley · May 27, 2009, 12:50am

On Wed, 2009-05-27 at 07:28 +0900, Mike C. wrote:

On Wed, May 27, 2009 at 12:42 AM, Detlef R. [email protected] wrote:

I don’t know your project, but in many cases it is simpler not to use
threads at all in Gtk+ programs, but to use an idle handler or timeouts.

OK I remember now. Yes, I always got random crashes when I used
threads with Gtk. I second the recommendation to avoid using threads.
If you have background processing to do, but want to keep the
UI active, threads are almost always the wrong solution anyway.

On what planet? I don’t want an idle handler, I want n concurrent
background activities that I can individually control. Threads are most
certainly the only coherent way to do this without resorting to rolling
your own lightweight activity manager.

Yes, I have threads because it’s the right thing to do for my
application. They aren’t right for all things, but I’m very curious
about your blanket “almost always the wrong solution” statement. Not
trying to be confrontational here, but can you elaborate please?

Cheers,

ast

Andrew S. Townley [email protected]
http://atownley.org

Andrew_SSTownley · May 27, 2009, 12:52am

On Tue, 2009-05-26 at 15:04 -0700, Ian McIntosh wrote:

I very rarely see segfaults during use. I have seen some during startup
but they are inconsistent and transient.

I hope this is a useful data point.

Hi Ian. Thanks for the reference. Probably not so relevant to this
case since it’s single threaded, but it certainly is a significant
app!

…and if it works for this situation, then perfect.

Thanks again for the pointer.

ast

Andrew S. Townley [email protected]
http://atownley.org

Andrew_SSTownley · May 27, 2009, 1:09pm

From: “Andrew S. Townley” [email protected]

[BUG] Segmentation fault ruby 1.8.6 (2007-09-24) [x86_64-linux]

Perhaps a small chance, but that looks like a rather
old version of 1.8.6. There have been some potential
segfaults fixed in ruby since then.

There was an official release on the 1.8.6 branch
recently, March 31:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/332578

“This time we’ve fixed dozens of bugs, including workarounds for
CVE-2007-1558 and CVE-2008-1447. Many segfaults are also fixed.”

Hope this helps,

Bill

Andrew_SSTownley · May 27, 2009, 9:34am

On Wed, May 27, 2009 at 7:49 AM, Andrew S. Townley [email protected]
wrote:

If you have background processing to do, but want to keep the
UI active, threads are almost always the wrong solution anyway.

On what planet?

Possibly you misunderstood what I intended to say. If you have
background processing to do and wish to use threads to keep the UI
concurrent, then that is almost always the wrong thing to do. A good
example is loading a file and keeping a cancel button or some such
thing. Or if you are processing incoming data and wish to manipulate it
using the UI as it is being processed.

If, as you say, you have a multi-threaded application whose threads are
not part of the UI, then of course you have no choice.

The reason it is almost always the wrong thing to do is because
synchronization with the user’s intent has to happen anyway. Often
people try to simplify this synchronization leading to strange and
subtle bugs. It is almost always better to explicitly design it in
using something like a reactor pattern. There are exceptions, though,
but in my experience they are rare (usually when the processing is so
complex that it really can’t be broken down into a state pattern).
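
The non-threaded approach to something like a cancellable file load can be sketched as follows. This is an illustration under assumptions, not anyone's actual code: `ChunkedLoader` is a hypothetical class, and the `while` loop stands in for the GTK main loop repeatedly calling an idle handler (in Ruby-GNOME2 that would typically be registered with something like `GLib::Idle.add`, where a true return value keeps the handler installed).

```ruby
# Does one slice of work per call, so a single-threaded main loop can
# process button clicks and redraws between slices.
class ChunkedLoader
  attr_reader :loaded

  def initialize(lines, chunk_size = 2)
    @lines = lines
    @chunk_size = chunk_size
    @position = 0
    @loaded = []
    @cancelled = false
  end

  # Would be called from the Cancel button's signal handler.
  def cancel
    @cancelled = true
  end

  def cancelled?
    @cancelled
  end

  # Process one chunk. Returns true while more work remains and false
  # once the work is finished or cancelled.
  def step
    return false if @cancelled || @position >= @lines.size
    chunk = @lines[@position, @chunk_size]
    @loaded.concat(chunk)
    @position += chunk.size
    @position < @lines.size
  end
end

loader = ChunkedLoader.new(%w[a b c d e f g])

# Stand-in for the toolkit main loop: call the handler until it says stop.
iterations = 0
while loader.step
  iterations += 1
  loader.cancel if iterations == 2 # simulate the user pressing Cancel
end

puts loader.loaded.inspect
```

Because the cancel flag and the loaded data are only ever touched from the main loop, no locking is needed; the cost is that the work has to be expressed as explicit state (`@position` here) instead of a straight-line loop.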

As to your current problem, my guess is that it is a GTK bug. I noticed
that crashes started happening when I upgraded GTK a year or so ago. I
wonder if you could solve your problem by creating an idle method that
observes your model objects rather than updating them directly. Not
ideal, but it might work around your current issue.
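
A rough sketch of that idea, with hypothetical names (`Model`, `FakeLabel`, `poll`) and a stand-in widget so it runs without GTK: background threads only write to the model, and a handler that would run from the main loop's idle or timeout callback polls the model and is the only code that touches the widget.

```ruby
require "thread"

# Model written by background threads; it never touches the UI.
class Model
  def initialize
    @lock = Mutex.new
    @value = nil
    @dirty = false
  end

  def update(value)
    @lock.synchronize do
      @value = value
      @dirty = true
    end
  end

  # Returns [changed, value] and clears the dirty flag in one step.
  def take_change
    @lock.synchronize do
      changed = @dirty
      @dirty = false
      [changed, @value]
    end
  end
end

# Stand-in for a widget; counts how often it is redrawn.
class FakeLabel
  attr_reader :text, :redraws

  def initialize
    @text = ""
    @redraws = 0
  end

  def text=(value)
    @text = value
    @redraws += 1
  end
end

model = Model.new
label = FakeLabel.new

# Body of the idle/timeout handler: runs on the main thread only.
poll = lambda do
  changed, value = model.take_change
  label.text = value.to_s if changed
  true # keep the handler installed
end

writer = Thread.new { 3.times { |i| model.update(i) } }
writer.join

poll.call
poll.call # nothing changed since the last poll, so no second redraw
puts "label shows #{label.text.inspect} after #{label.redraws} redraw(s)"
```

A side effect of polling is that several rapid model changes collapse into a single redraw, which is usually acceptable for status displays but not for anything that must show every intermediate value.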

      MikeC

Andrew_SSTownley · May 28, 2009, 12:59pm

On Wed, 2009-05-27 at 00:48 -0700, Bill K. wrote:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/332578

“This time we’ve fixed dozens of bugs, including workarounds for
CVE-2007-1558 and CVE-2008-1447. Many segfaults are also fixed.”

Hope this helps,

Thanks for the info. Ruby updates haven’t been part of the standard
Hardy update stream, and I’m trying to keep from changing everything on
a “standard” distro. I have a few things that I need to build/install
manually at the moment anyway (GtkHTML3, WebKit, etc.), so I might try a
newer ruby too.

Cheers,

ast

Andrew S. Townley [email protected]
http://atownley.org

Andrew_SSTownley · May 28, 2009, 1:30pm

Hi Mike,

Thanks for the clarifications.

On Wed, 2009-05-27 at 16:34 +0900, Mike C. wrote:

incoming data and wish to manipulate it using the UI as it is being
processed.

If, as you say, you have a multi-threaded application whose threads are
not part of the UI, then of course you have no choice.

The kind of thing I’m talking about here is more akin to loading web
resources like WebKit, Firefox, etc. You have a pool of worker threads
that help you to more efficiently load/gather information to be
displayed to the user, yet still be aborted–gracefully. Same would go
for long operations like searching through files,
copying/deleting/moving files & directories, etc.

To me, these things are what threaded UIs were designed to address
efficiently, and that’s the way that’s been proven to be a good way to
solve these problems for nearly 20 years, from OS/2 through to today’s
desktops.

The reason it is almost always the wrong thing to do is because synchronization
with the user’s intent has to happen anyway. Often people try to simplify
this synchronization leading to strange and subtle bugs. It is almost always
better to explicitly design it in using something like a reactor pattern.
There are exceptions, though, but in my experience they are rare (usually when
the processing is so complex that it really can’t be broken down into
a state pattern).

Essentially, what I have is an appserver-like implementation of
basically the Reactor pattern which uses a fixed-size thread pool to
actually manage the handlers (implemented by either a block or an object
with a run() method, e.g. Java’s Runnable). This is glossing over a few
details, but all UI updates are queued using the Gtk.queue to attempt to
ensure that the updates happen in the main GTK+ thread. However, I’ve
observed that this doesn’t always work as advertised.
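
For readers following along, the general shape of such a design might look like the sketch below. It is not the poster's code: `WorkerPool` and `FetchTask` are hypothetical names, and a plain `Queue` stands in for the Gtk.queue mechanism so the example runs without GTK. Workers never touch the UI; they only push results onto a queue that the main thread drains.

```ruby
require "thread"

# Fixed-size pool whose workers hand results to a queue owned by the
# main (UI) thread instead of updating widgets themselves.
class WorkerPool
  def initialize(size, ui_queue)
    @jobs = Queue.new
    @ui_queue = ui_queue
    @threads = size.times.map do
      Thread.new do
        # A nil job is the shutdown signal for one worker.
        while (job = @jobs.pop)
          begin
            result = job.respond_to?(:run) ? job.run : job.call
            @ui_queue << result
          rescue StandardError => e
            @ui_queue << "failed: #{e.message}"
          end
        end
      end
    end
  end

  # Accepts either an object with a run method or a block.
  def submit(handler = nil, &block)
    @jobs << (handler || block)
  end

  # Lets queued jobs finish, then stops every worker.
  def shutdown
    @threads.size.times { @jobs << nil }
    @threads.each(&:join)
  end
end

# Handler in the style of Java's Runnable.
class FetchTask
  def initialize(name)
    @name = name
  end

  def run
    "fetched #{@name}"
  end
end

ui_queue = Queue.new
pool = WorkerPool.new(2, ui_queue)
pool.submit(FetchTask.new("page-1"))
pool.submit { "searched files" }
pool.shutdown

# Main-thread side: drain pending updates without blocking.
applied = []
applied << ui_queue.pop until ui_queue.empty?
puts applied.sort.inspect
```

In a GTK program the drain step would run inside the main loop (for example from an idle or timeout handler), so every widget update happens on one thread regardless of which worker produced it.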

I’ve done this sort of thing before many times, most recently with
JFC/Swing, so I’m reasonably confident of the approach. However, I may
be missing some aspects of how to actually make it really work given
Ruby’s threading implementation.

So, the threading is “explicitly designed” along the lines of Reactor,
it just allows “concurrent” execution of these as well as Ruby will
allow.

A single-threaded Reactor wouldn’t be an acceptable solution in this
case because the whole reason for performing these operations using
threads is that they may be long-running and/or interrupted, and the
user may want/need to do other things while these tasks are being
performed.

I don’t take this approach where the user shouldn’t do anything while a
long operation is being performed, or where it doesn’t make sense to
cancel an operation, but the above scenarios would be at least 70-80% of
the user interaction with this application.

As to your current problem, my guess is that it is a GTK bug. I noticed that
crashes started happening when I upgraded GTK a year or so ago. I wonder
if you could solve your problem by creating an idle method that observes your
model objects rather than updating them directly. Not ideal, but it might
work around your current issue.

In a way, I hope you’re wrong about the GTK+ bug. If it was in my code,
I could at least fix it. Otherwise, I’m kinda screwed–unless I fix the
bug and it magically appears on all the systems where the app could be
used along with not getting whacked with an auto-updater.

I’ll have to think about whether this would work or not for some of the
cases. I already delay many updates until the control is
activated/exposed or otherwise “poked”.

Again, thanks for the clarification and your suggestions.

Cheers,

ast

Andrew S. Townley [email protected]
http://atownley.org