Tx path segfaults on ucla zigbee_phy

dubstep · June 6, 2011, 1:33pm

Hi!

I checked out various branches of the UCLA Zigbee stack, which is on
CGRAN:
https://www.cgran.org/wiki/UCLAZigBee

Actually I have had some success using it as a sniffer with a USRP1. But
the TX path of this implementation has issues: I’m running into
segfaults on all branches.

Is there a newer/working version of any TX implementation on an
802.15.4 stack? I’m not sure how to patch or debug this issue, because
running through the Swig layer interface stuff with gdb is kind of
complex. So I was wondering whether I’m the first person ever facing
these bugs.

Best,
Marius

casper_the_ghost · July 14, 2011, 8:54pm

I am having the same problem with the segfault in this zigbee code. I
updated it to replace Numeric with numpy and to replace the old flow_graph
calls with top_block.

Via Google I found one other person who tried this code and had the same
problem about a year ago. I am using Ubuntu 11.04 (64 bit).

Have you made any progress on this issue, Marius?

casper_the_ghost · July 20, 2011, 8:29pm

I am trying to use GDB to debug this code, but it is difficult since I
am using GDB for the first time, know very little about Python, do not
know advanced C++ coding (haven’t really used virtual functions before)
and don’t know much about the physical behavior of radios… I was
hoping to study the working program to learn.

So far I have:

Downloaded the UCLA ZigBee project from CGRAN.
Updated the old flow_graph code to top_block in the Python files. I
also had to update the Numeric code to NumPy/OldNumeric (done
automatically with a provided script).
Built and installed the project.
Ran src/examples/cc2420_rxtest.py (this seems to run fine, but I
cannot fully test it without the matching transmitter).
Ran src/examples/cc2420_txtest.py (this segfaults).

So I run the txtest with a blocked wait and attach GDB. Then I press
enter to continue and use “continue” in GDB. It hits the segfault and
gives this error info. I’m not sure how to read it:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f0c43a3d700 (LWP 17349)]
0x00007f0c48a60ab0 in ucla_delay_cc::work (this=,
noutput_items=3584, input_items=,
output_items=) at ucla_delay_cc.cc:60
60 out[j] = gr_complex (real(in[j]), imag(in[j-d_delay]));
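
The "blocked wait" used to attach GDB can be sketched roughly as below. This is a minimal illustration, not the code from cc2420_txtest.py: wait_for_debugger is a hypothetical helper name, and it uses Python 3's input() where a 2011-era script would have used raw_input().

```python
import os

def wait_for_debugger(prompt=input):
    """Print this process's PID and block until the user presses Enter.

    `prompt` is injectable only so the function can be exercised
    without a terminal; by default it is the built-in input().
    """
    pid = os.getpid()
    print("Blocked waiting for GDB attach (pid = %d)" % pid)
    prompt("Press Enter to continue: ")
    return pid

# Typical use, placed before the flow graph is started:
#     wait_for_debugger()
```

While the script is blocked, running gdb -p with the printed PID in a second terminal attaches to the Python process; typing "continue" in GDB and then pressing Enter in the script lets it run until the signal is raised. The output above then names the faulting thread, the function (ucla_delay_cc::work) and the source line (ucla_delay_cc.cc:60).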

The test packet sent is the same each time:

tb.send_pkt(struct.pack('9B', 0x1, 0x80, 0x80, 0xff, 0xff, 0x10, 0x0,
0x20, 0x0))
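
For reference, the payload that this struct.pack call produces can be inspected on its own, without a flow graph or a USRP. The snippet only shows the raw bytes; it makes no claim about what the individual fields mean in the 802.15.4 frame.

```python
import struct

# '9B' means nine unsigned bytes, packed in the order given.
payload = struct.pack('9B', 0x1, 0x80, 0x80, 0xff, 0xff, 0x10, 0x0,
                      0x20, 0x0)

print(len(payload))    # number of bytes handed to send_pkt
print(payload.hex())   # the same bytes as a hex string

# Unpacking with the same format string recovers the nine values.
fields = struct.unpack('9B', payload)
```

Since the packet is identical on every run, the varying number of packets sent before the crash is unlikely to come from the packet contents.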

The program normally waits 1 second between attempts to send
packets and loops to send 10 packets. If I remove this wait and loop to
send 100 packets, the program segfaults after a random number of
packets (usually 1 or 2, but sometimes 5 and rarely as many as 45).
With the wait, it always segfaults after the 1st packet. I guess the
program puts a packet on the queue and then continues (not surprising). So
there is a separate thread reading the queue, and somehow that is causing
the segfault when it gets around to reading the packet or trying to send
it. My understanding of gnuradio is very limited though, so I’m not
sure what to make of this.
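
The queue behaviour guessed at above can be modelled with plain Python threads. This is only a sketch of the pattern, not GNU Radio's scheduler or its message queue API; run_model and the thread name "scheduler" are made up for the illustration.

```python
import queue
import threading

def run_model(packets):
    """Enqueue packets in the caller's thread; process them in a worker."""
    q = queue.Queue()
    log = []

    def worker():
        while True:
            pkt = q.get()
            if pkt is None:          # sentinel: no more packets
                break
            # In the real program the signal processing would run here,
            # so a fault would be reported in this thread, not the caller's.
            log.append(("processed", threading.current_thread().name, len(pkt)))

    t = threading.Thread(target=worker, name="scheduler")
    t.start()
    for pkt in packets:
        q.put(pkt)                   # returns at once, like queueing a packet
        log.append(("queued", threading.current_thread().name, len(pkt)))
    q.put(None)
    t.join()
    return log

log = run_model([b"\x01\x80\x80"] * 3)
```

In this model the sender never waits for processing, so how many packets are queued before the worker reaches the failing code depends on thread timing. That would be consistent with the random packet count seen when the 1 second wait is removed.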

If anyone has an idea please let me know and I will do my best to
continue hunting this down.

Note: I am using a USRP1, the latest (as of a week or two ago) install
script by Marcus to install gnuradio with the UHD (on Ubuntu 11.04
x86_64). The segfault occurs whether I use the RFX1800 daughterboard or
RFX2400 daughterboard.

casper_the_ghost · July 20, 2011, 8:52pm

The ucla_delay_cc.cc file is as follows. According to GDB, the line
that segfaults is the body of the last for loop (in the work
function):

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <ucla_delay_cc.h>

// public constructor
ucla_delay_cc_sptr
ucla_make_delay_cc (const int delay)
{
  return ucla_delay_cc_sptr (new ucla_delay_cc (delay));
}

ucla_delay_cc::ucla_delay_cc (const int delay)
  : gr_sync_block ("delay_cc",
                   gr_make_io_signature (1, 1, sizeof (gr_complex)),
                   gr_make_io_signature (1, 1, sizeof (gr_complex)))
{
  d_delay = delay;
  set_history (delay);
}

ucla_delay_cc::~ucla_delay_cc ()
{
  return;
}

int
ucla_delay_cc::work (int noutput_items,
                     gr_vector_const_void_star &input_items,
                     gr_vector_void_star &output_items)
{
  gr_complex *in = (gr_complex *) input_items[0];
  gr_complex *out = (gr_complex *) output_items[0];

  //fprintf(stderr, "."), fflush(stderr);
  for (int j = 0; j < noutput_items; j++)
    out[j] = gr_complex (real(in[j]), imag(in[j-d_delay]));

  return noutput_items;
}

casper_the_ghost · July 22, 2011, 5:18am

Well, even though I don’t really understand DSSS yet, I started looking
more closely at the C++ code:

out[j] = gr_complex (real(in[j]), imag(in[j-d_delay]));

…I noticed that the line that segfaults indexes in[] with j-d_delay,
which is negative whenever j is smaller than d_delay. So I temporarily
removed d_delay and it does not segfault anymore (of course).
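
The index arithmetic of that loop can be checked outside GNU Radio. In the sketch below, noutput_items=3584 comes from the GDB output, but the delay of 2 is an assumption (the value passed by cc2420_txtest.py is not stated in this thread), and indices_read is a hypothetical helper, not part of the UCLA code. It only computes the indices, because Python would silently wrap a negative list index instead of reading out of bounds as C++ does.

```python
def indices_read(noutput_items, delay):
    """Indices of in[] used by imag(in[j - d_delay]) in one work() call."""
    return [j - delay for j in range(noutput_items)]

idx = indices_read(noutput_items=3584, delay=2)

# Every index below zero is a read in front of the pointer
# that work() was given.
negative = [i for i in idx if i < 0]
print(negative)
```

With a raw C++ pointer, such a read touches whatever memory happens to sit in front of the buffer. Whether that faults depends on where the buffer currently is, which may be why the crash does not show up on every call.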

Now my question is why did these guys dump this code on CGRAN like this?
The cc2420_txtest.py contains two sample packets (the second being
commented out by default) and they both cause this error.

Since I don’t really understand DSSS or the physical level of 802.15.4
I’m not sure what the d_delay is for or how I should change it to
correct the problem. If anyone has any ideas I’d appreciate it.
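
Some background that may help: the 2.4 GHz 802.15.4 PHY uses O-QPSK, in which the Q-phase chips are delayed by one chip period relative to the I-phase chips, so d_delay presumably implements that offset. Below is a minimal sketch of one way to get the same delay without negative indices. It is not the authors' fix and is untested against the real block. It assumes the usual gr_sync_block history semantics, where set_history(N) keeps N-1 older samples in front of the new ones and in[0] is the oldest. delay_q and its history argument are hypothetical names.

```python
def delay_q(new_samples, delay, history=None):
    """Delay the imaginary part by `delay` samples relative to the real part.

    `history` holds the last `delay` samples of the previous call
    (zeros on the first call), mimicking a block history of delay + 1.
    """
    if history is None:
        history = [0j] * delay
    buf = list(history) + list(new_samples)
    # The current sample sits at buf[j + delay]; the delayed one at buf[j].
    out = [complex(buf[j + delay].real, buf[j].imag)
           for j in range(len(new_samples))]
    return out, buf[len(buf) - delay:]

out, hist = delay_q([1 + 1j, 2 + 2j, 3 + 3j], delay=2)
out2, hist2 = delay_q([4 + 4j], delay=2, history=hist)
```

Under the same assumption, the C++ equivalent would be set_history (delay + 1) in the constructor and out[j] = gr_complex (real(in[j + d_delay]), imag(in[j])) in the loop, so every index stays at zero or above.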

Whew, I’m just glad I finally know what to look at. Sorry for spamming
the mailing list.

casper_the_ghost · July 22, 2011, 4:46am

Here is the GDB backtrace for this problem (for those viewing this in
email form only, the rest of the thread is here:
Tx path segfaults on ucla zigbee_phy - GNU Radio - Ruby-Forum ).

Single stepping until exit from function PyObject_GetAttr,
which has no line number information.
0x0000000000496499 in PyEval_EvalFrameEx ()
(gdb)
Single stepping until exit from function PyEval_EvalFrameEx,
which has no line number information.
[New Thread 0x7f4c33f5d700 (LWP 22930)]
[New Thread 0x7f4c3375c700 (LWP 22931)]
[New Thread 0x7f4c32f5b700 (LWP 22932)]
[New Thread 0x7f4c3275a700 (LWP 22933)]
[New Thread 0x7f4c31f59700 (LWP 22934)]
[New Thread 0x7f4c31758700 (LWP 22935)]
[New Thread 0x7f4c30f57700 (LWP 22936)]
[New Thread 0x7f4c2bfff700 (LWP 22937)]
0x00000000004c3e00 in PyFrame_BlockSetup ()
(gdb)
Single stepping until exit from function PyFrame_BlockSetup,
which has no line number information.
0x0000000000497111 in PyEval_EvalFrameEx ()
(gdb)
Single stepping until exit from function PyEval_EvalFrameEx,
which has no line number information.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f4c31758700 (LWP 22935)]
0x00007f4c3678ca00 in ucla_delay_cc::work (this=0x2c1c010,
noutput_items=3584, input_items=…,
output_items=…) at ucla_delay_cc.cc:60
60 out[j] = gr_complex (real(in[j]), imag(in[j-d_delay]));
(gdb) bt
#0 0x00007f4c3678ca00 in ucla_delay_cc::work (this=0x2c1c010,
noutput_items=3584, input_items=…,
output_items=…) at ucla_delay_cc.cc:60
#1 0x00007f4c3a6ac274 in gr_sync_block::general_work (this=0x2c1c010,
noutput_items=, ninput_items=,
input_items=, output_items=) at gr_sync_block.cc:64
#2 0x00007f4c3a68f8ed in gr_block_executor::run_one_iteration
(this=0x7f4c31757d70)
at gr_block_executor.cc:378
#3 0x00007f4c3a6aef80 in gr_tpb_thread_body::gr_tpb_thread_body
(this=0x7f4c31757d70, block=…)
at gr_tpb_thread_body.cc:49
#4 0x00007f4c3a6a8b74 in operator() (function_obj_ptr=…) at
gr_scheduler_tpb.cc:42
#5 operator() (function_obj_ptr=…)
at
/home/david/gnuradio/gruel/src/include/gruel/thread_body_wrapper.h:49
#6
boost::detail::function::void_function_obj_invoker0<gruel::thread_body_wrapper<tpb_container>,
void>::invoke (function_obj_ptr=…) at
/usr/include/boost/function/function_template.hpp:153
#7 0x00007f4c399ac8ee in operator() (this=)
at /usr/include/boost/function/function_template.hpp:1013
#8 boost::detail::thread_data<boost::function0 >::run
(this=)
at /usr/include/boost/thread/detail/thread.hpp:56
#9 0x00007f4c38a2916e in thread_proxy () from
/usr/lib/libboost_thread.so.1.42.0
#10 0x00007f4c3c43fd8c in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f4c3b30504d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()
(gdb)