Self-pipe for Ruby threads pushing blocks to main loop

Drake_W · November 16, 2007, 6:00pm

It seems like a common problem when using Ruby-GNOME2 (particularly
GTK+) is having other Ruby threads be able to signal back to the main
loop to run some GTK+ code or other stuff that needs to be done in the
main thread. Even Gtk.idle_add and the like seem to not be completely
safe with Ruby threads, because GTK+ doesn’t know about the Ruby
threads at all.

An obvious interface to have would be a way of pushing blocks to the
main loop; I see people in the mailing list archives doing this, but
either they do it with timeouts (which results in continual polling,
which is poor design) or… what? I don’t see people obviously doing
it any other way.

The first obvious way to wake up the main thread would be some kind of
plain interthread wakeup call, but there doesn’t seem to be a way to
do rb_thread_select and atomically release a mutex, so there’s no way
to reliably get the wakeup signal to the main thread that I can see.

So instead, it looks like a slightly better/worse functional
solution—which I think someone proposed earlier, but I’m not sure
when—is to use a pipe that automatically gets added into the select,
and write a byte to it whenever we push a new function for the main
loop to call.

I have attached an experimental diff against Ruby-GNOME2 SVN that adds
the function GLib.main_thread_do_async { … } that pushes Proc
objects to the main thread in what looks to be the proper thread-safe
way, and uses a pipe to wake it up.
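
For readers who have not seen the self-pipe trick, here is a minimal sketch of the idea in plain Ruby. It is not the attached patch: the patch works at the C level inside the Ruby-GNOME2 poll function, whereas this uses IO.select directly. The class name MainThreadQueue and its methods are hypothetical, and a Mutex stands in for rb_thread_critical.

```ruby
# Minimal self-pipe sketch in plain Ruby (no GLib involved).
# MainThreadQueue and its method names are hypothetical, for illustration only.
class MainThreadQueue
  # The read end; the main loop adds this to the set it selects on.
  attr_reader :reader

  def initialize
    @reader, @writer = IO.pipe
    @mutex = Mutex.new
    @procs = []
  end

  # Called from any thread: queue the block, then wake the main loop.
  def push(&block)
    @mutex.synchronize { @procs << block }
    @writer.write("x") # one byte per push; can block if the pipe fills up
    nil
  end

  # Called from the main thread once the read end is readable.
  def dispatch
    begin
      @reader.read_nonblock(4096) # drain whatever wakeup bytes are waiting
    rescue IO::WaitReadable
      # spurious wakeup: the bytes were already drained by an earlier call
    end
    pending = @mutex.synchronize { @procs.slice!(0, @procs.length) }
    pending.each(&:call)
    pending.length
  end
end

queue = MainThreadQueue.new
results = []

worker = Thread.new { 3.times { |i| queue.push { results << i } } }

# Stand-in for the main loop: block in select until the pipe wakes us up.
while results.length < 3
  ready, = IO.select([queue.reader], nil, nil, 5)
  break unless ready # timed out; give up rather than hang
  queue.dispatch
end
worker.join
```

The main thread sleeps inside select until a worker writes a byte, so there is no polling timer, and the queued blocks only ever run on the main thread.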

Does this seem like a basically reasonable thing to add to
Ruby-GNOME2? (Not necessarily in the exact form attached, of course.)

Problems and questions:

It seems like it should be possible to do this in Ruby with
a GLib::IOChannel with a watch or something, but I tried this, and
it didn’t seem to work; it sometimes hung or crashed. I’m not sure
why.
Maybe we can’t create the pipe on startup. Is it reasonable to
expect to be able to do this?
This patch also removes the constant empty timeouts, to make sure
that the functionality still works without them. Really these
should go away too, if it is possible, but does this break
backwards compatibility? It seems that with the absence of a way
to get messages to the main thread, many extant Ruby-GNOME2 programs
will rely on the timeout kludge to keep the main loop iterating
constantly and just ignore the unsafe thread usage.
The pipe write can block. This isn’t so bad, but it should probably
be avoided in an async function. This can be avoided by using a bit
more rb_thread_critical and only writing a new byte when the previous
one has already been acknowledged, but I haven’t written this yet.
Sometimes we may actually want the async pushes to block; if the GLib
main loop gets hung somehow, the pushes will just fill the queue up
with procs forever. Maybe that’s a sign of an incorrect program
anyway, though, so it doesn’t matter.
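
The point about the pipe write blocking can be illustrated with a small sketch in plain Ruby. The assumption here is that at most one unacknowledged wakeup byte is ever in the pipe, tracked by a flag, so the write can never fill the pipe. The class name CoalescingWakeup is hypothetical, and a Mutex is used in place of rb_thread_critical.

```ruby
# Sketch of coalescing wakeups: write a byte only when no earlier byte
# is still waiting to be read. CoalescingWakeup is a hypothetical name.
class CoalescingWakeup
  attr_reader :reader, :bytes_written

  def initialize
    @reader, @writer = IO.pipe
    @mutex = Mutex.new
    @procs = []
    @signalled = false # true while one wakeup byte sits unread in the pipe
    @bytes_written = 0 # kept only so the example can show the effect
  end

  # Called from any thread.
  def push(&block)
    @mutex.synchronize do
      @procs << block
      unless @signalled
        @signalled = true
        @writer.write("x") # the pipe is empty here, so this cannot block
        @bytes_written += 1
      end
    end
    nil
  end

  # Called from the main thread when the read end is readable.
  def dispatch
    pending = @mutex.synchronize do
      if @signalled
        @reader.read(1) # exactly one byte is waiting, so this returns at once
        @signalled = false
      end
      @procs.slice!(0, @procs.length)
    end
    pending.each(&:call)
    pending.length
  end
end

q = CoalescingWakeup.new
count = 0
1000.times { q.push { count += 1 } } # no dispatch in between
ran = q.dispatch
```

However many blocks are pushed before the main loop gets around to dispatching, only one byte goes through the pipe. This does nothing about the queue itself growing without bound if the main loop hangs.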

A sample program which seems to basically work in the proper
thread-safe manner after this patch is applied is also attached.

What do people think of this?

—> Drake W.

Drake_W · November 16, 2007, 8:40pm

Drake W. wrote:

An obvious interface to have would be a way of pushing blocks to the
main loop; I see people in the mailing list archives doing this, but
either they do it with timeouts (which results in continual polling,
which is poor design) or… what? I don’t see people obviously doing
it any other way.
I have to agree that having a timeout running all the time in the
background is really not ideal…

I came up with the attached solution. It starts the timeout when
necessary and stops it when it is no longer necessary, automatically.
I’ve quickly tested this solution in my app and it seems to work.

Some remarks and questions:

It may be best to use module GLib instead of module Gtk
Since I’m checking the number of running threads in the timeout, is it
subject to race conditions?
I’m protecting Gtk::Thread.number with a mutex but I’m not sure
whether it’s necessary or not

What do you think Drake? And you Guillaume? Anyone?

Thanks,
Mathieu

Drake_W · November 16, 2007, 9:17pm

Quoth Mathieu B. [email protected], on 2007-11-16 20:45:19
+0100:

I have to agree that having a timeout running all the time in the
background is really not ideal…

I came up with the attached solution. It starts the timeout when necessary
and stops it when it is no longer necessary, automatically. I’ve quickly
tested this solution in my app and it seems to work.

This is still not good, in my opinion, especially because it doesn’t
just “start the timeout when necessary”. When it’s “necessary” should
be when there’s stuff that needs to be done, not just when arbitrary
other threads are running. If I use this, and I have six threads
blocked on network connections, plus the main thread which is blocked
on user activity, I’m still waking up every hundred milliseconds.
This isn’t acceptable. Moreover, if a network event comes in that
triggers a UI update, I’m still waiting up to a hundred milliseconds
before the block that I pushed to the main loop runs, for no reason at
all.

This only provides any benefit for an application that doesn’t use
Ruby threads, really. Once you’re using threads, more likely than not
you’re going to have a few of them sitting idle at any given time, so
the timeout is running anyway and it’s not any better than the
original kludgey solution.

This also requires the user of threads to initialize them with
Gtk::Thread rather than Thread, so it doesn’t do any better for
backwards compatibility either.

Some remarks and questions:

It may be best to use module GLib instead of module Gtk

Since I’m checking the number of running threads in the timeout, is it
subject to race conditions?

I’m protecting Gtk::Thread.number with a mutex but I’m not sure whether
it’s necessary or not

What do you think Drake? And you Guillaume? Anyone?

The whole idea of using delays and polling for triggers like this is
broken. We need a way to push events of some type into the main
loop. It doesn’t even have to be pushing blocks, though that’s the
most convenient way. Right now I don’t even see that the primitive
even exists. The closest thing I can see is a one-time Gtk.idle_add,
but this is not thread-safe because the main thread might have been
suspended in the middle of a GTK+ call.

This is why I proposed adding the primitive in C; there’s no good way
to get at the necessary main loop hackage from the Ruby side, as far
as I can tell.

If the primitive does exist, and I just didn’t know about it, please
tell me about it; I’d be happy to be proven wrong here.

Thanks,
Mathieu

—> Drake W.

Drake_W · November 16, 2007, 9:21pm

so this approach [the ruby only method] still uses up a lot of cpu as
it has that timeout that keeps being called…

I added a ‘sleep 0.001’ to simulate something happening more async
like…

I have created my own, much simpler approach, it spawns a lot of
timeout events but it doesn’t peg the cpu.

Comments?

-Alex

Drake_W · November 16, 2007, 9:31pm

i guess this is no good if Gtk.timeout_add is not thread safe…
I personally like Drake’s proposal, I’m just curious about
alternatives… especially since I’d like to make software that, while
it could utilize future enhancements like Drake proposed, could
hopefully run using the packages that most linux distributions have
now.
-Alex

Drake_W · November 16, 2007, 9:55pm

Quoth A. [email protected], on 2007-11-16 12:31:07 -0800:

i guess this is no good if Gtk.timeout_add is not thread safe…
I personally like Drake’s proposal, I’m just curious about
alternatives… especially since I’d like to make software that, while
it could utilize future enhancements like Drake proposed, could
hopefully run using the packages that most linux distributions have
now.

Programs that want to use a timer-based workaround if real
asynchronous operation is not available (and don’t mind the increased
CPU usage and general semantic breakage) can test for the presence of
the real stuff and start the timer-and-queue song-and-dance if the
method doesn’t already exist:

module GLib
  unless respond_to? :main_thread_async_do
    @async_do_mutex = Mutex.new
    @async_do_queue = []

    # Queue a block to be run later by main_thread_async_do_poll.
    def self.main_thread_async_do(&block)
      @async_do_mutex.synchronize {
        @async_do_queue.push block
      }
    end

    # Take everything queued so far and run it outside the mutex.
    def self.main_thread_async_do_poll()
      @async_do_mutex.synchronize {
        @async_do_queue.slice!(0, @async_do_queue.length)
      }.each { |block| block.call() }
    end
  end
end

Then later, if you know a GTK+ main loop is in use:

if GLib.respond_to? :main_thread_async_do_poll then
  Gtk.timeout_add(100) { GLib.main_thread_async_do_poll }
end

Or, if you’re using a GLib main loop for something else, you can add
whatever other timer you need that calls that. (The above is largely
untested, and is mostly to show the flavor of how the compatibility
shim would work.)

Incidentally, I’m still curious as to the purpose of the presently
existing 100-ms empty timeout that is automatically started in the
current Ruby-GNOME2 C code anyway. It seems to be a shim of some sort
to keep the main loop iterating, but for what? So that unsafe UI
updates from other threads still sort-of-work (i.e., redraw) even
though they’ll sometimes crash anyway?

-Alex

—> Drake W.

Drake_W · November 17, 2007, 4:50am

Drake W. wrote:

Programs that want to use a timer-based workaround if real
asynchronous operation is not available (and don’t mind the increased
CPU usage and general semantic breakage) can test for the presence of
the real stuff and start the timer-and-queue song-and-dance
Since you’re using a pipe and select() for polling it, your solution
won’t work on Windows. Did you try your solution without using a pipe
and select? You can call all procs in the array and clear the array on
all iterations of the main loop… (It would be interesting to compare
in terms of performance.)

As I understand your patch, this is what happens:

on each g_main_context_iteration of the main loop, g_main_context_poll
is called
g_main_context_poll calls the custom poll function set by Ruby/GLib
with g_main_context_set_poll_func
it polls the pipe with select and calls all the procs as needed

In the end, it doesn’t seem so far from what idle_add does, except maybe
with regards to the priority.

module GLib
  def self.thread_protect(&proc)
    GLib::Idle::add { proc.call; false }
  end
end

or even

module GLib
  def self.thread_protect(&proc)
    GLib::Idle::add(GLib::PRIORITY_HIGH_IDLE) { proc.call; false }
  end
end

(I found out that Gtk.timeout and Gtk.idle_add are deprecated.)

Using it as follows:
Thread.new do
  GLib.thread_protect { do_gui_stuff }
end

The possible priorities are HIGH, DEFAULT, HIGH_IDLE, DEFAULT_IDLE and
LOW.

I’ve experimented with the GLib main loop API. Here are a few comments
that may be useful to others. A custom GLib::Source object can be
created and added to the main loop in order to handle one’s own events.
idle and timeout are actually both GLib::Source objects with their own
behavior. I had also tried to do something like in the patch:

mc = GLib::MainContext.default
rd, wr = IO.pipe
fd = GLib::PollFD.new(rd.fileno,
                      GLib::IOChannel::IN | GLib::IOChannel::HUP | GLib::IOChannel::ERR,
                      0)
src = GLib::Idle::source_new
src.priority = GLib::PRIORITY_DEFAULT
src.attach(mc)
src.add_poll(fd)
src.set_callback do
  p "callback"
end

(Ruby/GLib doesn’t implement initialize for GLib::Source yet. Ideally,
initialize should take an array of procs (prepare, query, check,
dispatch) as parameter).

Mathieu B.

Drake_W · November 17, 2007, 12:25pm

Drake W. wrote:

Now, I don’t know how the main loop works on Windows. Presumably
Windows itself has appropriate synchronization and I/O mechanisms, but
I don’t know how to use them, and I wouldn’t be able to test them,
since I don’t have a Windows installation easily available. The
current C implementation there seems to spin every ten milliseconds,
and I don’t really understand how that’s supposed to work…

Looking quickly at glib’s source code, it uses Windows semaphores, which
seem to work like pipes

HANDLE wake_up_semaphore = CreateSemaphore (NULL, 0, 100, NULL);

So hopefully Windows does seem to support halting and waking up the main
loop.

What exactly are you suggesting with this? The exact purpose of the
self-pipe is to avoid having the main loop iterate when it does not
need to iterate. It should block at the OS level (well, Ruby thread
level, really); in this case, on a file descriptor. This means that
when the main loop has nothing to do, it halts until it has something
to do. The question is how to wake it up. How are you suggesting to
wake it up? Or am I missing your point entirely?

Yes you’re right, forget about what I said. BTW, shouldn’t using a
custom poll function be avoided? Using the add_poll function either in a
GSource or GMainContext would allow it to remain portable thanks to the
underlying glib implementation. Basically here, it seems like Ruby/GLib
and your patch are reimplementing bits of glib. And it’s hardcoded.

And because idle_add doesn’t seem to be guaranteed to be safe with
Ruby threads—that’s the main reason idle_add doesn’t seem to work
here. Or is it safe after all? It didn’t seem to be; when I tried
stress-testing it, it crashed, but maybe that’s from a bug somewhere
else. I have difficulty seeing how it could be safe though, given
the nature of Ruby threads.

Can you explain why your solution is more thread-safe?

Going further with the idle_add solution, I think it is necessary to
protect with a mutex both the GLib.idle_add call and the proc call
issued by GLib.idle_add call. As I understand it, it may be dangerous to
issue several GLib.idle_add or call the passed procs from separate
threads without any protection. Something in this fashion:

module GLib
  MUTEX = Mutex.new
  def self.thread_protect(&proc)
    MUTEX.synchronize do
      GLib::Idle.add(GLib::PRIORITY_HIGH_IDLE) {
        MUTEX.synchronize { proc.call; false }
      }
    end
  end
end

I think this is another thing that your patch is missing: you should
protect the procs array with a mutex.

Ruby/Gtk is a binding so everything deprecated in GTK is deprecated in
Ruby/Gtk as well.

This looks similar to what I tried before in Ruby, but fancier. Does
this actually work for you? Does it actually block properly? If so,
then that sounds better than patching at the C level.

I’ve not tested it. I’ve spent quite some time reading the API, glib’s
gmain.c and Ruby/GLib’s source, so I wanted to keep track of how I
understand all the API glues together. But theoretically, the above
snippet could work with a bit more work.

Interestingly, it seems like pygtk has similar problems. I’ve just found
this ticket:
http://dev.laptop.org/ticket/4680

Mathieu B.

Drake_W · November 17, 2007, 3:24pm

Quoth Mathieu B. [email protected], on 2007-11-17 12:30:47
+0100:

Looking quickly at glib’s source code, it uses Windows semaphores, which
seem to work like pipes

HANDLE wake_up_semaphore = CreateSemaphore (NULL, 0, 100, NULL);

So hopefully Windows does seem to support halting and waking up the main
loop.

Then that would be the thing to do if G_OS_WIN32 is defined.
Integrating that with Ruby is the hard part, I imagine. You’d have to
attach the semaphore to the Ruby thread scheduler (which I see no
obvious way to do), or else play other tricks somehow.

Yes you’re right, forget about what I said. BTW, shouldn’t using a
custom poll function be avoided?

(The custom poll function was there before; I just added code to it.
But you probably knew that.)

Using the add_poll function either in a
GSource or GMainContext would allow it to remain portable thanks to the
underlying glib implementation.

The custom poll function is so that Ruby threads can be dispatched
properly using rb_thread_select, as I understand it. Otherwise you’d
have to do fancy tricks to make sure that the Ruby scheduler can still
run when other threads are active even if the main loop is idle, and
probably hook the creation of IO objects to make sure GLib knows about
them, and… wrgh.

And because idle_add doesn’t seem to be guaranteed to be safe with
Ruby threads—that’s the main reason idle_add doesn’t seem to work
here. Or is it safe after all? It didn’t seem to be; when I tried
stress-testing it, it crashed, but maybe that’s from a bug somewhere
else. I have difficulty seeing how it could be safe though, given
the nature of Ruby threads.

Can you explain why your solution is more thread-safe?

I was thinking of a situation where, say, the main thread enters
idle_add (or something similar), then the Ruby scheduler switches to a
worker thread, which then also enters idle_add at the same time, thus
potentially trampling on the GLib data structures. (GLib wouldn’t be
able to do locking in this case because it has no idea about the Ruby
scheduler.)

Now that I look more closely, that may not actually occur, if Ruby
never switches contexts in the middle of a C call. Though I’m still
not really convinced that the chain of invariants guarantees that
calling idle_add from more than one Ruby thread is safe.

Going further with the idle_add solution, I think it is necessary to
protect with a mutex both the GLib.idle_add call and the proc call
issued by GLib.idle_add call. As I understand it, it may be dangerous to
issue several GLib.idle_add or call the passed procs from separate
threads without any protection.

What purpose does the mutex serve, unless you’re protecting every
call to idle_add with it? Of course, you can do that (by changing the
definition of idle_add). You’d also have to mutex anything else that
accesses the same data structures (which I don’t know). Or it may be
that Ruby treats these C calls as atomic, in which case you may not
need a mutex at all.

Now that I think about it, that still doesn’t help the problem of
waking up the main loop without ultimately using a polling timer
somewhere, since the current behavior of pushing blocks using idle_add
seems to rely on the extant internal constant empty timer function.

I think this is another thing that your patch is missing: you should
protect the procs array with a mutex.

The procs array is already protected by judicious use of
rb_thread_critical, since the critical sections are so small.

Ruby/Gtk is a binding so everything deprecated in GTK is deprecated in
Ruby/Gtk as well.

Then we need to update the docs, I suppose.

I’ve not tested it. I’ve spent quite some time reading the API, glib’s
gmain.c and Ruby/GLib’s source, so I wanted to keep track of how I
understand all the API glues together. But theoretically, the above
snippet could work with a bit more work.

Okay.

Interestingly, it seems like pygtk has similar problems. I’ve just found
this ticket:
http://dev.laptop.org/ticket/4680

That would be because this is a general problem integrating different
pieces of code that want to catch I/O and signals and the like in the
same process.

Mathieu B.

—> Drake W.

Drake_W · November 17, 2007, 4:58pm

Quoth Mathieu B. [email protected], on 2007-11-17 16:18:24
+0100:

Drake W. wrote:

I was thinking of a situation where, say, the main thread enters
idle_add (or something similar), then the Ruby scheduler switches to a
worker thread, which then also enters idle_add at the same time, thus
potentially trampling on the GLib data structures. (GLib wouldn’t be
able to do locking in this case because it has no idea about the Ruby
scheduler.)

And that’s about the same problem when you do GTK stuff that triggers a
signal from within a thread.

I thought that was why you weren’t allowed to do any GTK+ stuff inside
worker threads directly, no? You have to push it all to the main
thread, including emitting signals on GTK+ objects and such.

Now that I think about it, that still doesn’t help the problem of
waking up the main loop without ultimately using a polling timer
somewhere, since the current behavior of pushing blocks using idle_add
seems to rely on the extant internal constant empty timer function.

I’ve thought about this too but then what would be the point of idle_add
if it does nothing when the main loop is… idle? It is probably
possible to write a small test-case in C to see what happens without the
constant empty timer function.

I thought the original point of idle sources was more to do
multiplexing of background tasks with events without using
threads—make the background task restartable and then have each
iteration of it be carried out using an idle source, then remove the
idle source when the background task is done.
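
As an illustration of that use of idle sources, here is a toy sketch in plain Ruby. ToyLoop is a hypothetical stand-in for a main loop and is not the GLib API; the only convention borrowed from GLib is that an idle callback returning true stays installed and one returning false is removed.

```ruby
# Toy stand-in for a main loop with idle callbacks. ToyLoop is hypothetical;
# a real loop would also dispatch events between the idle slices.
class ToyLoop
  def initialize
    @idles = []
  end

  def idle_add(&callback)
    @idles << callback
  end

  # Run until no idle callbacks remain installed.
  def run
    until @idles.empty?
      current, @idles = @idles, []
      # Keep a callback only if it asks to be called again.
      current.each { |cb| @idles << cb if cb.call }
    end
  end
end

main_loop = ToyLoop.new
numbers = (1..10).to_a
total = 0
slices = 0

# A restartable background task: do a small slice of work per call.
main_loop.idle_add do
  slices += 1
  numbers.shift(3).each { |n| total += n }
  !numbers.empty? # true keeps the idle callback, false removes it
end

main_loop.run
```

Each call does a bounded amount of work and then returns to the loop, which is what lets a single thread interleave the task with event handling.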

A small test case in C says that GLib does know how to wake up a
native thread in a main loop when another native thread adds an idle
source to the loop. I wouldn’t expect this to carry over to Ruby
threads, though; the Ruby thread is going to be blocked in
rb_thread_select, and the only way to wake it up is to go through
Ruby.

(Actually, an strace shows that on this system (GNU/Linux amd64), GLib
is waking up the main thread through a pipe anyway! Unfortunately, we
can’t easily get access to it. If it’s already there it’ll show up in
the select call, and that’ll integrate into the Ruby scheduler, but
the mechanism is sort of buried in the GLib internals; we’d probably
have to trick GLib into thinking we were in a different thread somehow.
Maybe we could actually create a different native thread for just
that purpose, but that seems a bit ugly.)

Anyway, it seems like there isn’t one clear answer to the problem which
makes me think that it’s best to implement workarounds in pure Ruby and
ship them with applications.

That’s necessary anyway, of course, to be compatible with
already-released versions of Ruby-GNOME2, but that doesn’t take the
place of doing it right.

Mathieu B.

—> Drake W.

Drake_W · November 17, 2007, 5:54am

Quoth Mathieu B. [email protected], on 2007-11-17 04:55:08
+0100:

Drake W. wrote:

Programs that want to use a timer-based workaround if real
asynchronous operation is not available (and don’t mind the increased
CPU usage and general semantic breakage) can test for the presence of
the real stuff and start the timer-and-queue song-and-dance

Since you’re using a pipe and select() for polling it, your solution
won’t work on Windows.

This sounds true. I thought I’d conditionalized the init on
G_OS_WIN32, and noted that in my original message, but I see now that
I forgot to do both of those. Good catch.

Now, I don’t know how the main loop works on Windows. Presumably
Windows itself has appropriate synchronization and I/O mechanisms, but
I don’t know how to use them, and I wouldn’t be able to test them,
since I don’t have a Windows installation easily available. The
current C implementation there seems to spin every ten milliseconds,
and I don’t really understand how that’s supposed to work…

Did you try your solution without using a pipe and select? You can
call all procs in the array and clear the array on all iterations of
the main loop… (It would be interesting to compare in terms of
performance.)

What exactly are you suggesting with this? The exact purpose of the
self-pipe is to avoid having the main loop iterate when it does not
need to iterate. It should block at the OS level (well, Ruby thread
level, really); in this case, on a file descriptor. This means that
when the main loop has nothing to do, it halts until it has something
to do. The question is how to wake it up. How are you suggesting to
wake it up? Or am I missing your point entirely?

As I understand your patch, this is what happens:

on each g_main_context_iteration of the main loop, g_main_context_poll
is called

g_main_context_poll calls the custom poll function set by Ruby/GLib
with g_main_context_set_poll_func

it polls the pipe with select and calls all the procs as needed

In the end, it doesn’t seem so far from what idle_add does, except maybe
with regards to the priority.

And because idle_add doesn’t seem to be guaranteed to be safe with
Ruby threads—that’s the main reason idle_add doesn’t seem to work
here. Or is it safe after all? It didn’t seem to be; when I tried
stress-testing it, it crashed, but maybe that’s from a bug somewhere
else. I have difficulty seeing how it could be safe though, given
the nature of Ruby threads.

(I found out that Gtk.timeout and Gtk.idle_add are deprecated.)

Hmm, that’s not in the Ruby-GNOME2 API docs.

Okay, I see it’s in the GTK+ C API docs. Is this known to apply to
Ruby-GNOME2 also, or does the API translation in the middle
un-deprecate it? Probably the former, I suppose, which means the
Ruby-side docs should be updated.

src = GLib::Idle::source_new
src.priority = GLib::PRIORITY_DEFAULT
src.attach(mc)
src.add_poll(fd)
src.set_callback do
  p "callback"
end

This looks similar to what I tried before in Ruby, but fancier. Does
this actually work for you? Does it actually block properly? If so,
then that sounds better than patching at the C level.

(Ruby/GLib doesn’t implement initialize for GLib::Source yet. Ideally,
initialize should take an array of procs (prepare, query, check,
dispatch) as parameter).

Mathieu B.

—> Drake W.

Drake_W · November 20, 2007, 2:36pm

Since I’m checking the number of running threads in the timeout, is it
subject to race conditions?

I’m protecting Gtk::Thread.number with a mutex but I’m not sure
whether it’s necessary or not

What do you think Drake? And you Guillaume? Anyone?

As stated by Drake, I think this approach is not much different from
the timeout always running. It improves a bit, granted, and I’m glad
to see you improved

I have not explored all the consequences of your solution, but I think
you may have a race condition - it depends on how Ruby’s Thread class
implements the #list and friends, but I’m wondering if you would not
create two timeouts when a created Gtk::Thread immediately creates
another Gtk::Thread.

–
Guillaume C. - Guillaume Cottenceau

Drake_W · November 17, 2007, 4:13pm

Drake W. wrote:

I was thinking of a situation where, say, the main thread enters
idle_add (or something similar), then the Ruby scheduler switches to a
worker thread, which then also enters idle_add at the same time, thus
potentially trampling on the GLib data structures. (GLib wouldn’t be
able to do locking in this case because it has no idea about the Ruby
scheduler.)

And that’s about the same problem when you do GTK stuff that triggers a
signal from within a thread. As Ruby switches to the main thread
(containing the main loop), it may run the block associated with the
triggered signal too early (before the GUI work has actually
finished), and the data structures would be missing or something.

Now that I think about it, that still doesn’t help the problem of
waking up the main loop without ultimately using a polling timer
somewhere, since the current behavior of pushing blocks using idle_add
seems to rely on the extant internal constant empty timer function.

I’ve thought about this too but then what would be the point of idle_add
if it does nothing when the main loop is… idle? It is probably
possible to write a small test-case in C to see what happens without the
constant empty timer function.

Anyway, it seems like there isn’t one clear answer to the problem which
makes me think that it’s best to implement workarounds in pure Ruby and
ship them with applications. In my case, I need threads to update the
user interface while doing blocking IO. I don’t perform any CPU
intensive task. Without protection, when signals are triggered from
within a thread, my application crashes very often. With both the timer
based solution or the idle_add solution, it seems to work just fine. It
doesn’t mean the problem is really fixed but at least, I don’t see my
application crashing anymore…

Mathieu B.

Drake_W · November 20, 2007, 5:06pm

Hi Drake,

So instead, it looks like a slightly better/worse functional
solution—which I think someone proposed earlier, but I’m not sure
when—is to use a pipe that automatically gets added into the select,
and write a byte to it whenever we push a new function for the main
loop to call.

I have attached an experimental diff against Ruby-GNOME2 SVN that adds
the function GLib.main_thread_do_async { … } that pushes Proc
objects to the main thread in what looks to be the proper thread-safe
way, and uses a pipe to wake it up.

IMHO, your approach seems very promising. I think it’s a much more
elegant way of solving the problem - except that there might be some
adjustments to make it work with Windows or other OS backends, as
Mathieu pointed out, but that’s peripheral.

While rapidly looking at your patch, I noticed one peculiar thing:
you’re adding your pipe file descriptor to the polled file
descriptors; while it’s of course needed to make your solution work, I
think it makes the final return of the custom poll func slightly
“broken” because such file descriptor is unknown to GTK which is
calling the custom poll func. Your pipe file descriptor should
probably be removed from the changed file descriptors list after
rb_thread_select is finished, and another rb_thread_select invoked
instead of returning to GTK if only your pipe file descriptor was
inside the changed file descriptors lists.
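
A rough sketch of that suggestion, in plain Ruby rather than in the C poll function: select on the caller's descriptors plus the wakeup pipe, hide the pipe from the result, and go back to waiting if the pipe was the only thing that fired. The method name poll_hiding_wakeup and its parameters are hypothetical, and for simplicity the timeout is not reduced between rounds.

```ruby
# Sketch only: wait on the caller's IO objects plus a private wakeup pipe,
# and never report the wakeup pipe back to the caller.
# poll_hiding_wakeup is a hypothetical name. The given block stands in for
# "run the queued procs".
def poll_hiding_wakeup(caller_ios, wakeup_reader, timeout)
  loop do
    ready, = IO.select(caller_ios + [wakeup_reader], nil, nil, timeout)
    return [] if ready.nil? # timed out with nothing ready

    if ready.delete(wakeup_reader)
      begin
        wakeup_reader.read_nonblock(4096) # acknowledge the wakeup bytes
      rescue IO::WaitReadable
        # nothing left to drain
      end
      yield
    end

    # Return only descriptors the caller asked about; if the wakeup pipe
    # was the only one ready, select again (timeout not adjusted here).
    return ready unless ready.empty?
  end
end

data_r, data_w = IO.pipe
wake_r, wake_w = IO.pipe
wakeups = 0

wake_w.write("x") # a worker has pushed a proc
writer = Thread.new { sleep 0.1; data_w.write("d") } # a real event, later

ready = poll_hiding_wakeup([data_r], wake_r, 5) { wakeups += 1 }
writer.join
```

The caller sees only data_r in the result, even though the wakeup pipe was what first ended the wait.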

Also, is using rb_thread_critical really needed? It forbids other Ruby
threads, totally unrelated to rg2, to be scheduled, whereas what you
probably want to avoid here is only race conditions in your list of
async procs.

–
Guillaume C. - Guillaume Cottenceau

Drake_W · November 20, 2007, 4:03pm

On Nov 16, 2007 9:54 PM, Drake W. [email protected] wrote:

Incidentally, I’m still curious as to the purpose of the presently
existing 100-ms empty timeout that is automatically started in the
current Ruby-GNOME2 C code anyway. It seems to be a shim of some sort
to keep the main loop iterating, but for what? So that unsafe UI
updates from other threads still sort-of-work (i.e., redraw) even
though they’ll sometimes crash anyway?

As far as I understand the code, it seems to be the core of the
solution found to the impossible cooperation between ruby’s mainloop
(the thing which schedules ruby threads) and glib’s mainloop (the
thing which triggers timeouts, idles, emit signals). As far as I know,
there is no native collaboration mechanism between the two, and
that’s the core of our problem; I guess other bindings may have
similar problems… indeed Mathieu pointed out the pygtk ticket[1],
which would indicate they’re using a timeout, but that seems to be for
a slightly different problem: it is a problem with kernel signals (e.g
SIGINT, SIGSTOP, SIGCONT etc) and native threads - with rg2 we don’t
use native threads, because Ruby doesn’t use native threads (AFAIK).
Actually, collaboration between gtk and application threads may end up
being very different if involved threads are native (py) or at
interpreter level (rb).

Unfortunately, this problem is going to get worse and worse, because
processor occupancy during idle times is getting tracked more closely
now that environment friendliness is a hype.

Now, to answer more precisely your question, we have a problem: most
of the main developers of rg2 seem to have too little time to get
involved in the discussions, at least in this ML (I know that many
Japanese developers also say that they have trouble reading/writing
English); and when it goes so deeply to tough problems, it is
sometimes hard to understand fully what’s going on only from the
source code. When I looked in the source code for solving my threading
problem, rg2 was setting a custom poll function, and installed an
empty timeout, with this source code comment:

/* This forces the custom g_poll function to be called
 * with a minimum timeout of 100ms so that the GMainLoop
 * iterates from time to time even if there is no event.
 * Another way could be to add a wakeup pipe to the selectable
 * fds and than wake up the select only when needed.
 */
g_timeout_add(100, empty_timeout_func, NULL);

Here’s how I understood the problem back then:

- So that Ruby threads are scheduled, a custom poll function was used
  (i.e. GTK calls a custom poll function instead of the normal OS poll
  call when its mainloop blocks on a list of file descriptors to wait
  for events); this poll function used rb_thread_select instead of the
  native poll (or select) call, and rb_thread_select actually schedules
  other Ruby threads when they have work to do, instead of just waiting
  on the specified list of file descriptors (which would prevent Ruby
  threads from being scheduled).
- Incidentally, this created problems when some GTK code is called from
  a thread other than the main Ruby thread (this is a complicated matter
  which is already explained in one of my previous mails), which is why
  GTK code must be called only from the main Ruby thread.
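
The first point can be illustrated with a rough sketch in present-day Ruby rather than the 1.8 C API. IO.select stands in for rb_thread_select here, and the method names are made up for this example; the only claim is that a main thread waiting in a thread-aware select does not stop other Ruby threads from running.

```ruby
# IO.select stands in for rb_thread_select: the main thread waits on file
# descriptors, yet other Ruby threads keep being scheduled meanwhile.
# Both method names are hypothetical, chosen for this illustration only.
def wait_like_custom_poll(fds, timeout)
  ready = IO.select(fds, nil, nil, timeout)
  ready ? ready[0] : []          # readable descriptors, or none on timeout
end

def demo_worker_runs_while_main_waits
  reader, writer = IO.pipe       # nothing is ever written to this pipe
  ticks = 0
  worker = Thread.new { 3.times { sleep 0.01; ticks += 1 } }
  ready = wait_like_custom_poll([reader], 0.5)   # main thread "polls" here
  ticks_during_wait = ticks      # progress the worker made while we waited
  worker.join
  [reader, writer].each(&:close)
  [ticks_during_wait, ready]
end
```

Had the wait been a plain blocking call that the thread scheduler knew nothing about, the worker would have made no progress until the wait returned, which is the situation the custom poll function was meant to avoid.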

Now, that doesn’t explain the need for this empty timeout. As often
with Ruby, I have problems finding the API documentation, this time
for rb_thread_select :/. Last time I asked for assistance here about
Ruby API documentation, Masao indicated that README.EXT from the Ruby
tarball is often useful to extension developers, but it does not seem
to document rb_thread_select. One assumption could be that
rb_thread_select will not necessarily schedule all the threads, or
that non-main threads will somehow “lock” the main thread, hence the
need to pump an event into the file descriptor list to force
rb_thread_select to return - but that doesn’t really make sense
anyway…

Now, according to rg2’s svn, Kouhei recently switched from the custom
poll function approach, to installing a GSource (which calls
rb_thread_schedule from the “prepare” callback).

Working file: rbglib_mainloop.c
r2707 | ktou | 2007-11-17 04:50:39 +0100 (Sat, 17 Nov 2007) | 2 lines

src/rbglib_mainloop.c: used GSource not poll func overriding.

I see that the “prepare” callback currently sets the returned timeout
value to 1 millisecond, which would indicate that the source tells GTK
that it needs to be rechecked after 1 millisecond only. If I’m
understanding this correctly (trunk untested), it means rg2 is almost
doing an active wait now.

Actually,

seems to be a worthwhile read to understand the big picture of
GSource. There’s even some explanation geared at integrating the GTK
mainloop to an external mainloop, that may be of some help for rg2?

Kouhei, would you be so kind as to participate in the discussion, at
least explaining what your modification is, what problem you are trying
to solve, and its impact on multi-threading in rg2 applications if you
know it? In the current situation, where I guess some discussion
occurs in the Japanese ML only, it is very frustrating and
disappointing, because on this ML, we are left in the dark… I think
that more and more non-Japanese people interested in ruby/gtk will be
more and more frustrated, if we are only left as powerless spectators
in the actual rg2 development. I can even see that this subject was
discussed between you and Masao in Japanese:

http://sourceforge.net/mailarchive/forum.php?thread_name=20071112.235142.1141224146.kou%40cozmixng.org&forum_name=ruby-gnome2-devel-ja

I think that the status of non-Japanese developers/contributors on rg2
should be clarified. It is not fair that we are considered second-class
citizens; otherwise let’s just consider rg2 a project for Japanese
developers only, and we’ll be able to make decisions according to that
fact. You see that this thread discussed that matter here; if you
make some related source code modifications, we should not be left
outside.

Thanks!

[1] http://dev.laptop.org/ticket/4680

–
Guillaume C. - Guillaume Cottenceau

Drake_W · November 20, 2007, 9:16pm

Guillaume C. wrote:

use native threads, because Ruby doesn’t use native threads (AFAIK).
Actually, collaboration between gtk and application threads may end up
being very different if involved threads are native (py) or at
interpreter level (rb).

Ruby 1.8 uses green threads (non-native threads), but Ruby 1.9 is
going to use native threads, and indeed it may bring up new kinds of
bugs in our applications…

Mathieu

Drake_W · November 20, 2007, 11:22pm

Quoth Guillaume C., on 2007-11-20 16:58:55 +0100:

While rapidly looking at your patch, I noticed one peculiar thing:
you’re adding your pipe file descriptor to the polled file
descriptors; while it’s of course needed to make your solution work, I
think it makes the final return of the custom poll func slightly
“broken” because such file descriptor is unknown to GTK which is
calling the custom poll func.

I think you are right. I will make a note of that. (Not that it will
matter very much if this approach ultimately isn’t taken in the first
place.)
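
One possible shape for a fix, sketched in Ruby rather than in the C poll function: select on the caller's descriptors plus the private wake-up pipe, but report back only the descriptors the caller asked about. The method name and its return shape are hypothetical, and a real poll function would compute its return count from the first element only.

```ruby
# Sketch: the wake-up pipe takes part in the select, but it is filtered out
# of the result so the caller never sees a descriptor it did not ask about.
# select_hiding_wakeup is a hypothetical helper, not part of any patch.
def select_hiding_wakeup(caller_fds, wakeup_reader, timeout)
  ready = IO.select(caller_fds + [wakeup_reader], nil, nil, timeout)
  return [[], false] unless ready            # timed out, nothing readable

  woken = ready[0].include?(wakeup_reader)   # was the private pipe signalled?
  [ready[0] - [wakeup_reader], woken]        # caller's fds only, plus a flag
end
```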

Also, is using rb_thread_critical really needed? It forbids other Ruby
threads, totally unrelated to rg2, to be scheduled, whereas what you
probably want to avoid here is only race conditions in your list of
async procs.

Well, note that the critical sections in use are very small:

In push: rb_ary_push
In retrieve: test, rb_ary_dup, rb_ary_clear

Note that the pushed functions themselves are not run inside the
critical section. The overhead of a separate mutex would make things
slower and slightly more complicated for no real reason, I think,
especially since in many cases a Ruby-GNOME2 program that uses a main
loop at all will tend to be centered around it out of necessity.
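
The shape of those two critical sections can be sketched in plain Ruby. This is an illustration only: a Mutex stands in for rb_thread_critical (Thread.critical no longer exists in current Ruby), and the constant and method names are invented for the example.

```ruby
# Sketch: the locked regions cover only the array operations; the pushed
# blocks themselves run after the lock has been released.
ASYNC_LOCK = Mutex.new     # stand-in for the rb_thread_critical sections
ASYNC_PROCS = []           # hypothetical list of pending blocks

def push_async(&block)
  ASYNC_LOCK.synchronize { ASYNC_PROCS.push(block) }   # push: rb_ary_push
end

def run_pending_async
  batch = ASYNC_LOCK.synchronize do
    next [] if ASYNC_PROCS.empty?    # retrieve: test
    copy = ASYNC_PROCS.dup           # retrieve: rb_ary_dup
    ASYNC_PROCS.clear                # retrieve: rb_ary_clear
    copy
  end
  batch.each(&:call)                 # outside the locked region
  batch.size
end
```

Because the blocks run outside the locked region, a pushed block may itself call push_async without deadlocking; the newly pushed block is simply picked up on the next pass.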

Now, if Ruby threads are going to become native threads in later
versions of ruby, as I understand it, then this would not be the way
to do it. But then, that would also change the integration with
GLib’s event loop drastically in any case, requiring one to look at
all the thread interactions again. The custom poll function would
probably go away in that case anyway, for instance, and since GLib
would be able to differentiate between the native threads, you could
just use its native locking.

Guillaume C. - Guillaume Cottenceau

—> Drake W.

Drake_W · November 21, 2007, 12:08am

Quoth Guillaume C., on 2007-11-20 16:03:02 +0100:

As far as I understand the code, it seems to be the core of the
solution found to the impossible cooperation between Ruby’s mainloop
(the thing which schedules Ruby threads) and GLib’s mainloop (the
thing which triggers timeouts and idles and emits signals). As far as I
know, there is no native collaboration mechanism between the two, and
that’s the core of our problem;

The cooperation mechanism is (or was) the custom polling function,
which does at least bind the scheduling functions together. The GLib
docs even refer to doing things this way: the doc for
g_main_context_set_poll_func says that “this function could possibly
be used to integrate the GLib event loop with an external event loop.”

Unfortunately, Unixy platforms do not make integrating such things
with each other easy at all; the custom poll function sort of thing is
about the best you can do. (Again, I haven’t developed enough on Win32
to be able to say anything about it there; perhaps it’s easier there.)

Unfortunately, this problem is going to get worse and worse, because
processor occupancy during idle times is being tracked more closely
now that environmental friendliness is fashionable.

I consider it an issue of clarity. When the semantics that you mean
are a combination of “wait until something happens” and “a new thing
to do counts as something happening”, then that is what should be
implemented. If you implement “check every N milliseconds for
something happening” instead, this is suboptimal, because that’s not
really the semantics that were meant. The processor having to wake up
incessantly is an unfortunate side effect.
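
To make those semantics concrete, here is a minimal self-pipe sketch in plain Ruby. It is not the attached patch: MainLoopQueue, push and run_once are names invented for this example, and Ruby's Queue is assumed to be an acceptable thread-safe container for the pending blocks.

```ruby
require "thread"   # Queue; a no-op on current Ruby versions

# Sketch of "wait until something happens": the loop sleeps in IO.select
# with no timeout, and pushing a block writes one byte to wake it up.
class MainLoopQueue            # hypothetical name, not part of Ruby-GNOME2
  def initialize
    @reader, @writer = IO.pipe
    @pending = Queue.new
  end

  # Called from any thread: queue the block first, then wake the loop.
  def push(&block)
    @pending << block
    # If the pipe is already full, a wake-up is pending anyway.
    @writer.write_nonblock("x", exception: false)
    self
  end

  # Called from the main thread only: sleep until woken, run what is queued.
  def run_once
    IO.select([@reader])               # no timeout, so no periodic wake-ups
    @reader.read_nonblock(4096)        # drain the wake-up bytes
    batch = []
    batch << @pending.pop(true) until @pending.empty?
    batch.each(&:call)
    batch.size
  end
end
```

Since a block is queued before its byte is written, a wake-up never arrives ahead of its block. The worst case is a spurious wake-up that finds nothing left to run, which is harmless.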

Here’s how I understood the problem back then:
[light snip]
Now, that doesn’t explain the need for this empty timeout. As often
with Ruby, I have problems finding the API documentation, this time
for rb_thread_select :/. Last time I asked for assistance here about
Ruby API documentation, Masao indicated that README.EXT from ruby
tarball is often useful to extension developers, but it seems to not
document rb_thread_select. An assumption can be that rb_thread_select
will not necessarily schedule all the threads, or that non main
threads will somehow “lock” the main thread, hence the need to pump an
event into the file descriptors lists to force rb_thread_select to
return - but that doesn’t really make sense anyway…

The reason I used a pipe rather than some other form of communication
between Ruby threads is that I didn’t see another way to prevent
desynchronization conditions where the wakeup signal would get lost,
but that may have been based on my previous conservative
interpretation of the ruby atomicity guarantees. Instead, it may be
possible to just check for the presence of main loop blocks before
doing the rb_thread_select and set the timeout to zero if there are
any there.
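
A rough sketch of that alternative, in Ruby instead of C and with invented method names: choose the select timeout from the state of the pending list, so that already-queued blocks turn the wait into a non-blocking check. Whether the check and the select are atomic enough with respect to other threads is exactly the open question, and this sketch does not settle it.

```ruby
# Sketch: no pipe; the timeout passed to select is forced to zero whenever
# blocks are already waiting for the main loop. Names are hypothetical.
def poll_timeout(pending, requested)
  pending.empty? ? requested : 0     # requested may be nil (wait forever)
end

def wait_for_events(fds, pending, requested)
  ready = IO.select(fds, nil, nil, poll_timeout(pending, requested))
  ready ? ready[0] : []              # readable descriptors, possibly none
end
```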

Now, according to rg2’s svn, Kouhei recently switched from the custom
poll function approach, to installing a GSource (which calls
rb_thread_schedule from the “prepare” callback).

Working file: rbglib_mainloop.c
r2707 | ktou | 2007-11-17 04:50:39 +0100 (Sat, 17 Nov 2007) | 2 lines

src/rbglib_mainloop.c: used GSource not poll func overriding.

That sounds like a disaster. (Custom GSources are designed rather
poorly, in my opinion, though that’s probably the fault of the
platform(s). It basically forces the use of unwanted timer polling.)

I see that the “prepare” callback currently sets the returned timeout
value to 1 millisecond, which would indicate that the source tells GTK
that it needs to be rechecked after 1 millisecond only. If I’m
understanding this correctly (trunk untested), it means rg2 is almost
doing active-wait now

strace says that’s exactly what’s happening when I run a simple test
program that uses GTK+ with r2712.

poll([{fd=5, events=POLLIN}], 1, 1) = 0
select(4, [3], [], [], {0, 0}) = 0 (Timeout)
ioctl(5, FIONREAD, [0]) = 0
{infinitely repeating}
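
For a sense of scale, a back-of-the-envelope calculation (an upper bound that assumes each iteration does no work besides waiting out its timeout; the method name is made up):

```ruby
# Upper bound on idle wake-ups per second for a given poll timeout.
def wakeups_per_second(timeout_ms)
  1000.0 / timeout_ms
end

puts wakeups_per_second(1)     # 1 ms prepare timeout:  prints 1000.0
puts wakeups_per_second(100)   # old 100 ms timeout:    prints 10.0
```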

The interrupt signal doesn’t work properly anymore either, though that
may be an artifact of something else. I rather hope this will be
backed out, because I think a GSource is a much more broken way to
integrate with Ruby threads than the custom poll function, but perhaps
they have another idea that I haven’t anticipated.

I think that the status of non-Japanese developers/contributors on rg2
should be clarified. It is not fair that we are considered second-class
citizens; otherwise let’s just consider rg2 a project for Japanese
developers only, and we’ll be able to make decisions according to that
fact. You see that this thread discussed that matter here; if you
make some related source code modifications, we should not be left
outside.

Alas, a language barrier is not really anyone’s fault, and there’s no
clear solution. x.x Maybe we should get translators to act as
gateways between the lists? c.c

Guillaume C. - Guillaume Cottenceau

—> Drake W.

Drake_W · November 21, 2007, 12:32pm

Hi,

In “Re: [ruby-gnome2-devel-en] Self-pipe for Ruby threads pushing blocks
to main loop” on Tue, 20 Nov 2007 16:03:02 +0100,
“Guillaume C.” wrote:

Kouhei, would you be so kind as to participate in the discussion, at
least explaining what your modification is, what problem you are trying
to solve, and its impact on multi-threading in rg2 applications if you
know it?

If you find any problems, please make a sample script that
shows the problem, and the script should not be pseudo code.
It’s too hard for me to participate in your discussion without
trying the current implementation in English. I’m not good
at English, as you know.

In the current situation, where I guess some discussion
occurs in the Japanese ML only, it is very frustrating and
disappointing, because on this ML, we are left in the dark… I think
that more and more non-Japanese people interested in ruby/gtk will be
more and more frustrated, if we are only left as powerless spectators
in the actual rg2 development. I can even see that this subject was
discussed between you and Masao in Japanese:

Thread: [ruby-gnome2-devel-ja] rglib_poll | Ruby-GNOME 2

We are talking with code (patch). You can find that my first
post has just a short description with a patch even if you
can’t understand Japanese. And in the next mail, Masao just
said “It seems that works on Win32 too with a small
patch.”. And in the next mail, I just said “GSource is used
in GDK too” and “Thanks”. And in the last Masao mail, he
just said “It’s a good idea.”.

We discussed this implementation in C, not
Japanese. It’s very frustrating to discuss without any
implementation in English. If you want a response
from me, please use Ruby or C with very short English (or
Japanese).

I prefer that you show a small sample script that shows what you
want to do and works, and just say “This script should output
‘OK’ but outputs ‘NO’ now”. If you have a solution, you can
just attach a patch and say “The attached patch shows my
idea.”. That’s enough for me because I’m better at Ruby/C
than at English.

You may remember that Masao and I often say “please show us
a sample script that reproduces the problem”.

If you write a looooong description in English without any
code, please don’t expect a response from me.

Thanks,

kou

Drake_W · November 22, 2007, 1:23am

Please correct me if I’m saying anything wrong.

We have two related yet distinct problems:

-> If we don’t consider the last revision, RG2 forces the mainloop to
wake up every 100 ms due to the empty_timeout_func. This is a
long-standing issue.

Can you comment on why the empty_timeout_func was needed? My
understanding is that with Ruby using non-native threads, the poll
function could block and thus Ruby would not be able to switch between
threads.

-> We need a safe and clean way to pass procs to the mainloop.

Drake, does the following work with your patch (I don’t have time to
test it myself right now, sorry):

ruby -r gtk2 -e "Thread.new { while true; sleep 1; puts 'yes'; end };
Gtk.main"

As I understand what’s happening, only calls to your
GLib.main_thread_do_async write to the pipe, and thus only they can
wake up the main loop. So threads doing non-GTK stuff would not wake up
the mainloop, and Ruby would not be able to switch to them. Am I
missing something?

Mathieu