Ruby 1.9.1p376 hangs when launched with Python Pexpect

davidjrice · April 22, 2010, 2:57am

Hi

I have a Ruby program whose UI uses the Readline library. In the
background it launches some Ruby threads that act as a TCP server. It
works fine so far. But when it is launched in an automated way using
Python Pexpect, the TCP server threads do not dequeue their packets. If
I send a message to the UI from Pexpect, though, the Ruby threads start
moving and dequeue their packets. The traces of the Readline thread and
of the threads that have packets to dequeue are attached. I would
appreciate it if somebody could take a look at why this is happening.

Thanks,

David
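For readers unfamiliar with this kind of setup, here is a minimal sketch of the architecture described in the first post: a background thread acting as a TCP server and a worker thread dequeuing packets. It is not David's code. The loopback address, the use of a `Queue` between the threads, the one-line-per-packet framing and the `:done` sentinel are all illustrative assumptions.

```ruby
require 'socket'
require 'thread'   # needed for Queue on 1.9; harmless on modern Ruby

# Hypothetical sketch: the acceptor thread reads lines from one TCP client
# and pushes each line (one "packet", by assumption) onto a thread-safe
# Queue; the worker thread dequeues them. Port 0 lets the OS pick a port.
packets = Queue.new
server  = TCPServer.new('127.0.0.1', 0)
port    = server.addr[1]

acceptor = Thread.new do
  client = server.accept          # blocks until a peer connects
  while (line = client.gets)      # nil once the peer closes the socket
    packets << line.chomp
  end
  client.close
  packets << :done                # sentinel so the worker can stop
end

received = []
worker = Thread.new do
  while (pkt = packets.pop) != :done   # Queue#pop blocks until data arrives
    received << pkt
  end
end

# Simulate a peer sending two packets, then closing the connection.
TCPSocket.open('127.0.0.1', port) do |sock|
  sock.puts 'hello'
  sock.puts 'world'
end

[acceptor, worker].each(&:join)
server.close
p received   # → ["hello", "world"]
```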

davidjrice · April 22, 2010, 10:02pm

David R. wrote:

Hi

I have a Ruby program whose UI uses the Readline library. In the
background it launches some Ruby threads that act as a TCP server. It
works fine so far. But when it is launched in an automated way using
Python Pexpect, the TCP server threads do not dequeue their packets.

Maybe Readline is getting confused somehow. You could try using a
pure-Ruby Readline.
-rp

davidjrice · April 22, 2010, 11:40pm

Hi Roger

I am using Ruby's Readline like this:
require 'readline'

while @logout == false
  cmd = Readline.readline('>', true)
  break if cmd.nil?   # nil means EOF (e.g. stdin was closed)
  process_command(cmd)
end

and it works fine when run on its own but not when run from
Pexpect. So I changed it to use just $stdin.gets:

while @logout == false
  print '>'
  $stdout.flush
  line = $stdin.gets
  break if line.nil?            # nil means EOF
  process_command(line.chomp)   # drop the trailing newline, as Readline does
end

and it seems to be working now. But what I noticed is that my program
has between 10 and 40 threads, and the ones that have packets to dequeue
are waiting in __lll_lock_wait. So those threads look like they have
returned from select but are somehow stuck in this lock wait.

Thanks

David

#0 0x00d32410 in __kernel_vsyscall ()
#1 0x00b45509 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0x00b40bbf in _L_lock_885 () from /lib/libpthread.so.0
#3 0x00b40a86 in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x0811daae in native_mutex_lock (lock=0xfffffffc) at thread_pthread.c:36
#5 0x081225d2 in do_select (n=7, read=0x9856938, write=0x0, except=0x0, timeout=0x0) at thread.c:984
#6 0x0812279c in rb_thread_wait_fd_rw (fd=6, read=1) at thread.c:2475
#7 0x00861827 in s_recvfrom (sock=, argc=, argv=, from=RECV_IP)
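The $stdin.gets workaround in the post above can also be written so that it reads from any IO-like object, which makes the prompt loop easy to drive from a terminal, a pipe, or a test. This is a minimal sketch, not David's code: `prompt_loop` is a hypothetical helper, the block stands in for his `process_command`, and a `logout` command stands in for setting `@logout`.

```ruby
require 'stringio'

# Hypothetical helper: the same prompt loop as the $stdin.gets version,
# but parameterized on the input and output streams. Stops on EOF or on a
# "logout" command (an assumption standing in for @logout).
def prompt_loop(input, output, prompt = '>')
  loop do
    output.print prompt
    output.flush
    line = input.gets
    break if line.nil?            # EOF: stop instead of spinning forever
    cmd = line.chomp
    break if cmd == 'logout'
    yield cmd                     # caller's stand-in for process_command
  end
end

commands = []
out = StringIO.new
prompt_loop(StringIO.new("status\nlist\nlogout\nignored\n"), out) { |c| commands << c }
p commands     # → ["status", "list"]
p out.string   # → ">>>"
```

Handling `nil` explicitly matters under automation: when the controlling process closes the input, `gets` returns `nil`, and a loop that ignores it keeps printing prompts indefinitely.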

davidjrice · April 23, 2010, 12:41am

Hi Roger

Hmm. One thing to remember is Ruby has a GIL, which means that if any
one thread is not within an rb_thread_blocking_region, it can block
your process. With 1.9.x, anyway.

Could you explain what a GIL is, and what it means for a thread to be
“in” or “not in” an rb_thread_blocking_region? And why could any thread
that is “not in” one block my process? Actually, among my threads I see
many that are “in” an rb_thread_blocking_region doing waitpid, which I
assume are for the other threads I launch.

Thanks,

David
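The waitpid threads mentioned in this question are the benign case: a thread blocked in Ruby's `Process.wait` family releases the global lock while it waits, so other threads keep running. A minimal sketch, assuming the child is simply a second Ruby interpreter that sleeps (a stand-in for whatever the poster's program actually launches):

```ruby
require 'rbconfig'

# Sketch: spawn a short-lived child process (assumption: a Ruby interpreter
# that sleeps 0.3 s) and wait for it in a background thread.
pid = Process.spawn(RbConfig.ruby, '-e', 'sleep 0.3')

waiter = Thread.new do
  _pid, status = Process.wait2(pid)   # blocks until the child exits
  status
end

# The main thread keeps making progress while the waiter is blocked.
ticks = 0
while waiter.alive?
  ticks += 1
  sleep 0.01
end

status = waiter.value
puts "child succeeded: #{status.success?}, main-thread ticks: #{ticks}"
```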

davidjrice · April 22, 2010, 11:50pm

and it seems to be working now. But what I noticed is that my program
has between 10 and 40 threads, and the ones that have packets to dequeue
are waiting in __lll_lock_wait. So those threads look like they have
returned from select but are somehow stuck in this lock wait.

Hmm. One thing to remember is Ruby has a GIL, which means that if any
one thread is not within an rb_thread_blocking_region, it can block
your process. With 1.9.x, anyway.
So make sure this isn’t the case. If you’re on Linux it shouldn’t be.
-rp
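To make the GIL point concrete: in MRI (where it is called the GVL, the Global VM Lock) only one thread executes Ruby code at a time, but a thread that is blocked waiting releases the lock, so waits in several threads overlap. CPU-bound Ruby code gets no such overlap. A minimal sketch using `sleep` as the blocking call (an illustrative choice; select, accept and waitpid behave similarly):

```ruby
# Measure wall-clock time for a block using a monotonic clock.
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

# Four threads each block for 0.2 s. Because sleep releases the lock,
# the waits run concurrently and the total stays near 0.2 s, not 0.8 s.
threads_time = elapsed do
  4.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end

puts format('4 threads sleeping 0.2 s each took %.2f s', threads_time)
```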

davidjrice · April 25, 2010, 7:43am

Could you explain what a GIL is, and what it means for a thread to be
“in” or “not in” an rb_thread_blocking_region? And why could any thread
that is “not in” one block my process? Actually, among my threads I see
many that are “in” an rb_thread_blocking_region doing waitpid, which I
assume are for the other threads I launch.

http://www.igvita.com/2008/11/13/concurrency-is-a-myth-in-ruby

might help.

In general, if you have one thread that’s not within a select, it “might”
be blocking all threads. I don’t know if that’s actually your problem
here, though.
-rp
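For contrast with a thread stuck outside a blocking call, here is the healthy case: a thread waiting in `IO.select` (like the TCP threads in the trace, which wait in do_select) does not stall the rest of the program, and it wakes as soon as data arrives. A minimal sketch, assuming a local pipe in place of the poster's sockets:

```ruby
# A pipe stands in for the TCP socket (assumption for illustration).
reader, writer = IO.pipe

waiter = Thread.new do
  ready, = IO.select([reader], nil, nil, 5)   # ready is nil on a 5-second timeout
  ready ? reader.read_nonblock(100) : :timeout
end

# The main thread keeps running while the waiter is parked in select.
ticks = 0
10.times do
  ticks += 1
  sleep 0.01
end

writer.write('packet')
writer.flush
result = waiter.value
puts "ticks=#{ticks} result=#{result}"   # → ticks=10 result=packet
```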

davidjrice · April 25, 2010, 11:12pm

Thanks Roger

Very helpful!