Unhandled exception after upgrading gnuradio

Hi,
I am using benchmark_tx.py and benchmark_rx.py in my code to run a USRP2 with a WBX daughterboard as a transceiver. This ran without errors until I upgraded GNU Radio from 3.5.1 to the current version, v3.5.2-1-g57ad294b. When I run the same code now, it gives a segmentation fault (core dumped). Below are the /var/log/messages entries from when the segfault occurs:

Mar 8 15:29:10 MRAFIQ-NGN kernel: [75426.998302] python[12650] general protection ip:b2d4a0 sp:b51fee80 error:0 in libvolk.so.0.0.0[ae2000+be000]
Mar 8 15:29:11 MRAFIQ-NGN abrt[12658]: saved core dump of pid 12635 (/usr/bin/python) to /var/spool/abrt/ccpp-2012-03-08-15:29:10-12635.new/coredump (127561728 bytes)
Mar 8 15:29:11 MRAFIQ-NGN abrtd: Directory 'ccpp-2012-03-08-15:29:10-12635' creation detected
Mar 8 15:29:11 MRAFIQ-NGN abrtd: Interpreter crashed, but no packaged script detected: 'python algo-sink.py'
Mar 8 15:29:11 MRAFIQ-NGN abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2012-03-08-15:29:10-12635 (res:2), deleting

In my code this occurs at tb.start() inside benchmark_rx.py.

It does run without an error after 6-8 attempts in a row, but I am not able to figure out where the exception that is not handled properly comes from. I appreciate your comments.

fedora 15, gnuradio v3.5.2-1-g57ad294b,
usrp2, WBX daughter board

On Thu, Mar 8, 2012 at 5:40 AM, MOHD RAFIQ [email protected]
wrote:

It does run without an error after 6-8 attempts in a row, but I am not able to figure out where the exception that is not handled properly comes from. I appreciate your comments.

fedora 15, gnuradio v3.5.2-1-g57ad294b,
usrp2, WBX daughter board

Can you follow the instructions here:
http://gnuradio.org/redmine/projects/gnuradio/wiki/FAQ#How-do-I-debug-GNU-Radio-in-Python

And get us a backtrace (bt in gdb) when the seg fault happens? It’ll help us to debug it.
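
A minimal sketch of the attach-and-wait approach for a Python-driven flowgraph, assuming you can edit the script that calls tb.start(): print the interpreter's PID and pause so gdb can attach from another terminal. The helper name wait_for_debugger is hypothetical, and it is written for Python 3; the Python 2 interpreters GNU Radio 3.5 ran on would use raw_input instead of input.

```python
import os


def wait_for_debugger(prompt=input):
    """Print this process's PID and block until Enter is pressed.

    While the script is paused, attach from another terminal with
    `gdb -p <pid>`, type `continue` in gdb, then press Enter here.
    `prompt` is injectable so the pause can be skipped (e.g. in tests).
    """
    pid = os.getpid()
    print("Blocked waiting for GDB attach (pid = %d)" % pid)
    prompt("Press Enter to continue: ")
    return pid
```

Call wait_for_debugger() just before tb.start(), attach gdb and type continue, then press Enter in the script's terminal; when the crash happens, bt in gdb prints the backtrace.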

Thanks,
Tom

I apologise for not being clear. Here is the backtrace:

[New Thread 0xb21feb70 (LWP 16921)]
[Thread 0xb19fdb70 (LWP 16920) exited]
[New Thread 0xb19fdb70 (LWP 16922)]
[New Thread 0xb29ffb70 (LWP 16923)]
[New Thread 0xb33ffb70 (LWP 16924)]
[New Thread 0xb3dffb70 (LWP 16925)]
[New Thread 0xb51ffb70 (LWP 16926)]
[New Thread 0xb47ffb70 (LWP 16927)]
[New Thread 0xb11fcb70 (LWP 16928)]
[New Thread 0xb09fbb70 (LWP 16929)]
[New Thread 0xb01fab70 (LWP 16930)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb29ffb70 (LWP 16923)]
0x002fc4a0 in volk_32fc_x2_multiply_32fc_a_sse3 ()
from /usr/local/lib/libvolk.so.0.0.0
(gdb) bt
#0 0x002fc4a0 in volk_32fc_x2_multiply_32fc_a_sse3 ()
from /usr/local/lib/libvolk.so.0.0.0
#1 0x002c25d2 in get_volk_32fc_x2_multiply_32fc_a ()
from /usr/local/lib/libvolk.so.0.0.0
#2 0x005125bf in gri_fft_filter_ccc_generic::filter(int, std::complex<float> const*, std::complex<float>*) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#3 0x005196ff in gr_fft_filter_ccc::work(int, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#4 0x004e537a in gr_sync_decimator::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#5 0x004c983c in gr_block_executor::run_one_iteration() ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#6 0x004e7a7f in gr_tpb_thread_body::gr_tpb_thread_body(boost::shared_ptr<gr_block>, int) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#7 0x004e2374 in boost::detail::function::void_function_obj_invoker0<gruel::thread_body_wrapper<tpb_container>, void>::invoke(boost::detail::function::function_buffer&) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#8 0x006c8156 in boost::detail::thread_data<boost::function0<void> >::run() ()
from /usr/local/lib/libgruel-3.5.3git.so.0.0.0
#9 0x0017444d in thread_proxy () from /usr/lib/libboost_thread-mt.so.1.46.0
#10 0x4f94ca2e in start_thread () from /lib/libpthread.so.0
#11 0x4f86689e in clone () from /lib/libc.so.6

Thanks,
rafiq


From: Tom R. [email protected]
To: MOHD RAFIQ [email protected]
Cc: “[email protected][email protected]
Sent: Thursday, 8 March 2012 8:51 PM
Subject: Re: [Discuss-gnuradio] Unhandled exception after upgrading
gnuradio

This might be related to the problems Martin B. is having with sse3
volk in the other thread today. Was anything done to volk git recently
that would affect sse3?

Rafiq,

What CPU are you using for this test? Specifically, please send the output of “cat /proc/cpuinfo”.

–n

On 03/08/2012 12:28 PM, Andrew D. wrote:

This might be related to the problems Martin B. is having with sse3
volk in the other thread today. Was anything done to volk git recently
that would affect sse3?

For what it’s worth, the attached FFT filter tester flow-graph works just fine on my AMD 64-bit machine (Phenom II X4 955) with the latest GNU Radio.

On 08/03/12 01:11 PM, Nick F. wrote:

Rafiq,

What CPU are you using for this test? Specifically, please send the
output of “cat /proc/cpuinfo”.

–n

To add some more data, I tested the attached flow-graph on the only
remaining 32-bit machine in my herd.
It provoked:

#0 0x0090d6b3 in volk_32fc_x2_multiply_32fc_a_sse3 ()
from /usr/local/lib/libvolk.so.0.0.0
#1 0x008de0d5 in get_volk_32fc_x2_multiply_32fc_a ()
from /usr/local/lib/libvolk.so.0.0.0
#2 0x00a8f517 in gri_fft_filter_ccc_generic::filter(int, std::complex<float> const*, std::complex<float>*) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#3 0x00a967fb in gr_fft_filter_ccc::work(int, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#4 0x00a65eb7 in gr_sync_decimator::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#5 0x00a4d185 in gr_block_executor::run_one_iteration() ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#6 0x00a688e3 in gr_tpb_thread_body::gr_tpb_thread_body(boost::shared_ptr<gr_block>, int) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#7 0x00a62bfc in boost::detail::function::void_function_obj_invoker0<gruel::thread_body_wrapper<tpb_container>, void>::invoke(boost::detail::function::function_buffer&) ()
from /usr/local/lib/libgnuradio-core-3.5.3git.so.0.0.0
#8 0x001b881c in boost::function0<void>::operator()() const ()
from /usr/local/lib/libgruel-3.5.3git.so.0.0.0

This is on a Fedora-12 machine, here’s /proc/cpuinfo

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU T2400 @ 1.83GHz
stepping : 8
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc
arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm
bogomips : 3657.49
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 32 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Genuine Intel(R) CPU T2400 @ 1.83GHz
stepping : 8
cpu MHz : 1000.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc
arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm
bogomips : 3657.54
clflush size : 64
cache_alignment : 64
address sizes : 32 bits physical, 32 bits virtual
power management:

On Thu, Mar 8, 2012 at 1:22 PM, Marcus D. Leech [email protected]
wrote:

This is on a Fedora-12 machine, here’s /proc/cpuinfo

Well, there’s your problem right there!

flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc
arch_perfmon bts aperfmperf pni monitor vmx est tm2 xtpr pdcm

Your CPU doesn’t seem to support SSE3, which begs the question, why is it allowed to call the SSE3 proto-kernel?

Check your ~/.volk/volk_config to see what it says on that kernel (that is, if you have run volk_profile; if you haven’t, run it first and see what happens).
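
If volk_profile has been run, the config file records which proto-kernel was picked for each kernel. A small sketch for inspecting it, assuming (not confirmed anywhere in this thread) that each non-blank line is a kernel name followed by one or more architecture names separated by whitespace, and that lines starting with # are comments; the column layout may differ between VOLK versions. parse_volk_config and read_volk_config are hypothetical helpers, not part of VOLK.

```python
import os


def parse_volk_config(text):
    """Map kernel name -> list of arch names that follow it on its line.

    Assumed (hypothetical) format: '<kernel_name> <arch> [<arch> ...]',
    blank lines ignored, '#' starting a comment line.
    """
    prefs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        prefs[fields[0]] = fields[1:]
    return prefs


def read_volk_config(path="~/.volk/volk_config"):
    """Parse the config file, or return {} if it does not exist
    (for example, if volk_profile has never been run)."""
    path = os.path.expanduser(path)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return parse_volk_config(f.read())
```

For example, read_volk_config().get("volk_32fc_x2_multiply_32fc_a") would show whether sse3 was recorded for the kernel at the top of the backtrace, provided the file follows the assumed layout.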

Tom

On Thu, Mar 8, 2012 at 11:16 AM, Tom R. [email protected] wrote:

Well, there’s your problem right there!

Your CPU doesn’t seem to support SSE3, which begs the question, why is it allowed to call the SSE3 proto-kernel?

I’m not wholly convinced this is the problem. I’d expect to see an illegal instruction error, not a memory access error, if an SSE3 instruction were called on a machine which doesn’t support it. That said, if SSE3 isn’t set in cpuflags it shouldn’t be allowed in Volk. I’ll check the flag in Volk…

–n

On 08/03/12 02:16 PM, Tom R. wrote:

Your CPU doesn’t seem to support SSE3, which begs the question, why is
it allowed to call the SSE3 proto-kernel?

Check your ~/.volk/volk_config to see what it says on that kernel
(that is, if you have run volk_profile; if you haven’t, run it first
and see what happens).

Tom

The information in /proc/cpuinfo appears to be incorrect. That particular CPU, the T2400, actually does support SSE3, according to the Intel datasheet, and other googled stuff out there.

On Thu, Mar 8, 2012 at 11:21 AM, Marcus D. Leech [email protected]
wrote:

The information in /proc/cpuinfo appears to be incorrect. That particular CPU, the T2400, actually does support SSE3, according to the Intel datasheet, and other googled stuff out there.

The relevant line in the Volk archs.xml is correct, and it’s looking for the correct bit in ECX to determine PNI/SSE3 capability. /proc/cpuinfo calls this “pni”, or Prescott New Instructions, not SSE3.
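
Both checks described here can be sketched in a few lines of Python. SSE3 is reported by CPUID leaf 1 in bit 0 of ECX, and Linux lists it in /proc/cpuinfo under the flag name pni, so searching for the string sse3 will miss it. The functions below only parse text and an integer you pass in (they do not execute CPUID), their names are hypothetical, and the cpuinfo parser expects the flags on a single line, as the kernel writes them (the listings in this thread were wrapped by email).

```python
SSE3_ECX_BIT = 0  # CPUID leaf 1: ECX bit 0 reports SSE3 (a.k.a. PNI)


def cpuinfo_flags(cpuinfo_text):
    """Return the set of tokens on the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_sse3_from_cpuinfo(cpuinfo_text):
    """Linux names SSE3 'pni' (Prescott New Instructions) in cpuinfo."""
    return "pni" in cpuinfo_flags(cpuinfo_text)


def has_sse3_from_cpuid_ecx(ecx):
    """Check the SSE3 bit in an ECX value obtained from CPUID leaf 1."""
    return bool((ecx >> SSE3_ECX_BIT) & 1)
```

On a Linux box, has_sse3_from_cpuinfo(open("/proc/cpuinfo").read()) applies the pni rule to the local CPU; for the T2400 flags quoted in this thread it returns True even though no sse3 token appears.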

–n

On Thu, Mar 8, 2012 at 2:24 PM, Nick F. [email protected] wrote:

The relevant line in the Volk archs.xml is correct, and it’s looking for
the correct bit in ECX to determine PNI/SSE3 capability. /proc/cpuinfo
calls this “pni”, or Prescott New Instructions, not SSE3.

–n

Damn… I should have remembered that…

Tom

On 08/03/12 02:24 PM, Nick F. wrote:

The relevant line in the Volk archs.xml is correct, and it’s looking
for the correct bit in ECX to determine PNI/SSE3 capability.
/proc/cpuinfo calls this “pni”, or Prescott New Instructions, not SSE3.

Consistency being the hobgoblin of small minds or something. Sheesh.


Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Thu, Mar 8, 2012 at 2:21 PM, Marcus D. Leech [email protected]
wrote:

The information in /proc/cpuinfo appears to be incorrect. That particular CPU, the T2400, actually does support SSE3, according to the Intel datasheet, and other googled stuff out there.

That still might be indicative of the problem, though, that the flags are being read incorrectly by the proc system. Very strange, that.

Marcus, if you would, change the volk kernel from the aligned (_a) to the unaligned one (_u) and see what that does. In the FFT, since it’s an FFTW buffer, it should be aligned, but this will let us know.
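
Some hedged background on the _a/_u distinction: the aligned (_a) kernels assume every buffer starts on a 16-byte boundary, because aligned 128-bit SSE loads and stores such as MOVAPS raise a general-protection fault on a misaligned address, which Linux delivers as SIGSEGV. That would be consistent with the “general protection” entry in the kernel log at the top of this thread and with Nick’s point that the crash was a memory-access fault rather than an illegal instruction. The sketch below shows only the selection rule; choose_kernel is a hypothetical helper, not how VOLK itself dispatches.

```python
SSE_ALIGNMENT = 16  # bytes: 128-bit aligned SSE loads/stores require this


def is_aligned(address, alignment=SSE_ALIGNMENT):
    """True if `address` is a multiple of `alignment`."""
    return address % alignment == 0


def choose_kernel(base_name, *addresses):
    """Return base_name + '_a' only when every buffer is aligned,
    otherwise fall back to the unaligned '_u' variant."""
    if all(is_aligned(a) for a in addresses):
        return base_name + "_a"
    return base_name + "_u"
```

The _u variants drop the alignment requirement, so forcing them is a quick way to test whether alignment is the cause.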

Tom

On 08/03/12 02:25 PM, Tom R. wrote:

That still might be indicative of the problem, though, that the flags
are being read incorrectly by the proc system. Very strange, that.

Marcus, if you would, change the volk kernel from the aligned (_a) to
the unaligned one (_u) and see what that does. In the FFT, since it’s
an FFTW buffer, it should be aligned, but this will let us know.

Tom

Ok, changed from the “_a” to the “_u” variant in gri_fft_filter_{ccc,fff}, and voila! It now works!



On Thu, Mar 8, 2012 at 2:35 PM, Marcus D. Leech [email protected]
wrote:

Ok, changed from the “_a” to the “_u” variant in gri_fft_filter_{ccc,fff}, and voila! It now works!

Thanks to Marcus’ help, I figured out the error here was due to a very silly assumption that I had made regarding our own array for the Fourier taps. Apparently, this is always handled “correctly” (for Volk’s definition of correct) on 64-bit machines but not 32-bit ones.

Attached is a patch that uses fftw_malloc to create an array to store the taps in. I created some helper functions in gri_fft to abstract the specific use of fftw in the filter code itself, like we’ve done with the rest of fftw’s invocation.
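
What fftw_malloc (fftwf_malloc for single precision) provides here is memory aligned for FFTW’s SIMD code; FFTW documents it as malloc with proper alignment when SSE or AltiVec is available. On 32-bit x86, the glibc malloc of that era typically guaranteed only 8-byte alignment, versus 16 on x86-64, which would fit a taps array that was always aligned on 64-bit machines and only sometimes on 32-bit ones. The Python sketch below reproduces the idea with ctypes by over-allocating and skipping to the next 16-byte boundary; aligned_complex_buffer is a hypothetical name, and this is not the helper added to gri_fft.

```python
import ctypes


def aligned_complex_buffer(n, alignment=16):
    """Allocate room for n complex<float> values starting on an
    `alignment`-byte boundary.

    Returns (raw, view): `raw` is the over-allocated bytearray that owns
    the memory, and `view` is a c_float array of 2*n entries
    (interleaved real/imag) placed at the first aligned offset inside it.
    """
    itemsize = 2 * ctypes.sizeof(ctypes.c_float)  # one complex<float>
    # Over-allocate so an aligned start always fits inside the buffer.
    raw = bytearray(n * itemsize + alignment - 1)
    base = ctypes.addressof(ctypes.c_char.from_buffer(raw))
    offset = (-base) % alignment  # bytes to skip to reach the boundary
    view = (ctypes.c_float * (2 * n)).from_buffer(raw, offset)
    return raw, view
```

The cost is at most alignment - 1 wasted bytes per buffer, in exchange for a start address that the aligned (_a) kernels can safely use.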

Please try this out and let me know. I’ve only verified it on a 32-bit VM, so I just want to make sure there’s nothing I’m missing (although it was failing previously as expected, so I think this works).

There are still a couple of other 32-bit issues that I’m seeing in other
blocks. I’ll look into those next.

Thanks,
Tom

Ok, I have a branch ‘volk_32bit_fixes’ published at:
git://github.com/trondeau/gnuradio.git

This should fix the fft_filter issue and another issue in one of the volk convert functions (that is not used in GNU Radio but was failing anyway).

I’m still seeing the 32fc_x2_dot_product_32fc failure. It looks like this is in the SSE_32 proto-kernel. I’m seeing if we can get a quick fix for it. This is the same problem Martin was seeing. If we can’t get a fix for it, I’m going to temporarily disable it until we get it fixed.

Tom

Hi,
Please suggest a fix for this. I have attached the output of cat /proc/cpuinfo.
Thanks


From: MOHD RAFIQ [email protected]
To: “[email protected][email protected]
Sent: Thursday, 8 March 2012 9:45 PM
Subject: Re: [Discuss-gnuradio] Unhandled exception after upgrading
gnuradio- backtrace included


On Sat, Mar 10, 2012 at 5:20 AM, MOHD RAFIQ [email protected]
wrote:

Hi,
Please suggest a fix for this. I have attached the output of cat /proc/cpuinfo.
Thanks

I already did. To quote:

"Ok, I have a branch ‘volk_32bit_fixes’ published at:
git://github.com/trondeau/gnuradio.git

This should fix the fft_filter issue and another issue in one of the volk convert functions (that is not used in GNU Radio but was failing anyway).

I’m still seeing the 32fc_x2_dot_product_32fc failure. It looks like this is in the SSE_32 proto-kernel. I’m seeing if we can get a quick fix for it. This is the same problem Martin was seeing. If we can’t get a fix for it, I’m going to temporarily disable it until we get it fixed.

Tom"

I have been waiting for people with 32-bit machines to try it out and
report back before merging it.

Tom

On 10/03/12 07:38 AM, Tom R. wrote:

I have been waiting for people with 32-bit machines to try it out and report back before merging it.

Just tried it on my T2400-based machine (Centrino M), and it works just fine.