Ephemeral error in threading

Hi,

We have been having a bit of a problem with GnuRadio
hanging when we call run() on a created flow graph and
wanted to see if anyone had any ideas or suggestions
for resolving the issue.

We have installed the latest version of the GnuRadio
libraries, 3.0.3, but we also saw the problem on
the previous version, 3.0.2. Our application creates
and runs many small flow graphs to capture a signal
sample from a physical usrp board for 0.001 of a
second. The problem happens inconsistently.
Sometimes it will happen on the 4th or 5th flow graph
we create and run, sometimes it takes a very long
time, if ever, for the problem to occur. Basically we
create a flow graph g, and call g.run() and it never
returns.

The creation of our particular flow graph looks
something like the following (I have only included the
essential code segments, so some references and code
are missing):


# Some code has been removed to simplify presentation...
class UsrpWrapperServerCaptureFlowGraph(gr.flow_graph):
    def __init__(self, options, freq=88.9e6, duration=0.001):
        gr.flow_graph.__init__(self)
        usrp_wrapper_decimation = 16
        adc_rate = 64e6  # constant for USRP
        nsamples = adc_rate / float(usrp_wrapper_decimation) * duration
        self.src = usrp.source_s(decim_rate=usrp_wrapper_decimation)
        self.dst = gr.vector_sink_s()
        self.head = gr.head(gr.sizeof_short, int(nsamples))
        self.connect(self.src, self.head, self.dst)
        r = self.u.tune(0, self.subdev, freq)

# (method of our wrapper class; the rest of that class is omitted)
def captureSample(self, freq, duration, subdev=None, store=False):
    print "<UsrpWrapper.capturSample> Capturing sample at freq:", freq, "dur:", duration
    options.usrp_wrapper_subdev = subdev
    g = UsrpWrapperServerCaptureFlowGraph(self.options, freq, duration)

    print "<UsrpWraper.capturSample> ... about to capture sample by calling g.run() ..."
    g.run()
    print "<UsrpWraper.capturSample> ... returned from g.run() Done ... "

    return g.dst.data()
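
As a sanity check on the sizes involved, here is the arithmetic from the constructor worked through in plain Python (no GNU Radio needed). The names usb_rate and decimation are only for this illustration; the values are the ones from the code above.

```python
# Worked arithmetic for the item count passed to gr.head, using the
# values from the constructor above (64 MS/s ADC, decimation 16, 1 ms).
adc_rate = 64e6      # USRP ADC rate, samples per second
decimation = 16      # usrp_wrapper_decimation in the post
duration = 0.001     # capture length in seconds

usb_rate = adc_rate / float(decimation)   # sample rate after decimation
nsamples = int(usb_rate * duration)       # items gr.head lets through

print("rate after decimation: %.0f samples/s" % usb_rate)
print("items per capture: %d" % nsamples)
```

So each flow graph should only need to pass 4000 shorts before gr.head stops it, and the 4M rate agrees with the "USB sample rate 4M" line the driver prints.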

As I described above, the captureSample() function can
be called many times and it will successfully
communicate with the usrp board, capture a sample by
running the flow graph, and return the sample as a
vector of shorts. However, eventually after calling
captureSample() many times, the function will be
entered and we will get the debug message before the
call to g.run():

<UsrpWrapper.capturSample> Capturing sample at freq: 360000000.0 dur: 0.001
Using RX d'board B: TV Rx Rev 3
USB sample rate 4M
<UsrpWraper.capturSample> ... about to capture sample by calling g.run() ...

but g.run() will never return.
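
One diagnostic we could wrap around the blocking call is a timer that dumps the Python stack of every thread if the call takes too long. The following is only a sketch, written for a current Python rather than the interpreter we are running GnuRadio on. format_thread_stacks() and dump_stacks_after() are hypothetical helper names, not part of GnuRadio. It only shows Python frames, so a thread stuck inside the C++ scheduler would show just the Python line that called into it.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return the current Python stack of every live thread as text."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (names.get(ident, "?"), ident))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


def dump_stacks_after(seconds):
    """Hypothetical helper: write all thread stacks to stderr after
    `seconds`, unless the returned timer is cancelled first."""
    def _dump():
        sys.stderr.write(format_thread_stacks())
    timer = threading.Timer(seconds, _dump)
    timer.daemon = True   # do not keep the process alive for the timer
    timer.start()
    return timer


# Usage sketch around the blocking call from the post:
#   timer = dump_stacks_after(5.0)
#   g.run()
#   timer.cancel()
```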


We have added some print/debug statements into the
gnuradio library code to get some further information.
After examining things for a bit we have found the
following. The gnuradio code appears to be entering an
infinite loop in the wait() method for the scheduler
class in the gnuradio/gr/scheduler.py file. I have
made the following modifications to the wait() method
of the scheduler:

----------- our version of wait() function for scheduler class
----------- in gnuradio/gr/scheduler.py

def wait(self):
    print "<scheduler.wait> entered, before for loop"
    for (sts, thread) in self.state:
        print "<scheduler.wait> in for loop sts=", sts, "thread=", thread
        timeout = 0.100
        print "<scheduler.wait> after setting timeout, before while True loop"
        numJoinAttempts = 0
        while True:
            print "<scheduler.wait> before attempting thread join"
            thread.join(timeout)
            print "<scheduler.wait> now check if thread is alive or not, if not we break out"
            if not thread.isAlive():
                print "<scheduler.wait> dead thread, should break out of this loop, this was a normal halting condition"
                break


If we run our code and look at the messages from the
wait() function as I have shown above, we will see:

<flow_grapy.wait> before we call scheduler.wait()
<scheduler.wait> entered, before for loop
<scheduler.wait> in for loop sts= <gnuradio.gr.gnuradio_swig_python.gr_single_threaded_scheduler_sptr; proxy of <Swig Object of type 'gr_single_threaded_scheduler_sptr *' at 0xa16d280> > thread= <scheduler_thread(Thread-9, started daemon)>
<scheduler.wait> after setting timeout, before while True loop
<scheduler.wait> before attempting thread join
<scheduler.wait> now check if thread is alive or not, if not we break out
<scheduler.wait> before attempting thread join
<scheduler.wait> now check if thread is alive or not, if not we break out
<scheduler.wait> before attempting thread join
<scheduler.wait> now check if thread is alive or not, if not we break out
<scheduler.wait> before attempting thread join
... infinite loop ...

Each thread.join(timeout) call returns once the timeout
expires, but the scheduler thread is still alive, so we
never break out of the while loop.
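
To make the failure visible to the caller instead of spinning forever, the polling loop could be given an overall deadline. This is a minimal sketch using a plain Python thread as a stand-in for the scheduler thread. join_with_deadline() is a hypothetical helper, not something in gnuradio/gr/scheduler.py, and it does nothing about the underlying cause: a thread that is still alive at the deadline is simply reported, not stopped. The 0.1 second poll interval matches the timeout in the wait() code above.

```python
import threading
import time


def join_with_deadline(thread, deadline_s, poll_s=0.1):
    """Poll-join `thread` for at most about `deadline_s` seconds.

    Returns True if the thread finished, False if it was still alive
    when the deadline passed.
    """
    give_up_at = time.time() + deadline_s
    while True:
        remaining = give_up_at - time.time()
        if remaining <= 0:
            return not thread.is_alive()
        thread.join(min(poll_s, remaining))
        if not thread.is_alive():
            return True


if __name__ == "__main__":
    release = threading.Event()
    worker = threading.Thread(target=release.wait)  # blocks until released
    worker.daemon = True
    worker.start()
    print(join_with_deadline(worker, 0.5))  # worker is still blocked
    release.set()
    print(join_with_deadline(worker, 0.5))  # worker can now exit
```

With something like this the caller would at least get control back and could log which capture got stuck.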


So that is where we are at the moment. I haven't been
able to determine whether we have uncovered a legitimate
race condition or heisenbug in the gnuradio flow graph
threading and scheduling code, or whether we are doing
something silly in how we create and use our flow graphs.
Any help, suggestions or ideas would be greatly welcomed.
We will try to pin the problem down more specifically if
we have to, but wanted to check with the community at
large before getting deep into the threading/scheduling
guts, in case someone has already fixed it.

Thanks in advance,
Derek Harter / John B.
Texas A&M-Commerce

