Volk machine: ssse3_32 segmentation faults (was RE: How to diagnose make test failures)

StamperS_Brian · April 23, 2013, 4:38pm

Hi all,

Previously I posted that I was getting the following make test failures.
I’m building GNU Radio in Fuduntu (a fork of Fedora) on a 32-bit Intel
Atom N270 based netbook (an Asus Eee 1000).

22/192 Test #22: qa_fft_filter …***Failed 1.05 sec
85/192 Test #85: test_gr_filter …***Failed 0.27 sec
91/192 Test #91: qa_fft_filter …***Failed 1.11 sec
92/192 Test #92: qa_hilbert …***Failed 1.09 sec
93/192 Test #93: qa_filter_delay_fc …***Failed 1.10 sec
94/192 Test #94: qa_pfb_arb_resampler …***Failed 1.13 sec
95/192 Test #95: qa_rational_resampler …***Failed 1.23 sec
97/192 Test #97: qa_fir_filter …***Failed 1.23 sec

I started running the test shell scripts individually and found that
they all crashed with the same message:

Using Volk machine: ssse3_32
Segmentation fault

I believe all of my code and dependencies are up to date, but maybe
there is some unlisted dependency I’m missing?

Thanks,

Brian S.

StamperS_Brian · April 24, 2013, 4:16am

On Tue, Apr 23, 2013 at 10:37 AM, Stamper, Brian [email protected]
wrote:

94/192 Test #94: qa_pfb_arb_resampler …***Failed 1.13 sec

Thanks,

Brian S.

Hi Brian,

No, this isn’t a dependency issue. Might be an issue with the
machine/processor you have. But still, the segmentation fault isn’t
necessarily related to Volk. That’s just a message that’s always
printed when using Volk.

The best way to get info from here is to use gdb. In one of the QA
test files (the Python file that’s in the source tree), add the lines:

import os
print os.getpid()
raw_input()

Do this before any other Python code. When you run it now, it will
print the PID of the process. In another terminal, run gdb (probably
as root), and then run ‘attach <PID>’, where <PID> is the PID printed
by the program. Then press ‘c’ to continue in gdb. On the other
terminal running the QA code, press enter (to proceed past the
‘raw_input’ line). Wait until it crashes, then, in gdb, type ‘bt’ to
get a back trace. Send that along and it will help us figure out where
the seg fault is happening.
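
The three lines above are Python 2 syntax. As a minimal sketch of the same pause on a Python 3 interpreter (where raw_input became input), something like the following could be placed at the top of the QA file. The helper name wait_for_debugger and its input_fn parameter are made up for illustration and are not part of GNU Radio.

```python
import os


def wait_for_debugger(input_fn=input):
    """Print this process's PID and block until Enter is pressed.

    Hypothetical helper, not part of GNU Radio. Call it before any other
    code in the QA script so gdb can attach before the flowgraph runs.
    """
    pid = os.getpid()
    print("PID %d: attach gdb, type 'c' there, then press Enter here" % pid)
    input_fn()  # blocks until the user presses Enter
    return pid
```

Calling wait_for_debugger() as the first statement of the script gives the same attach window as the original three lines; the input_fn argument only exists so the pause can be skipped when the helper is exercised non-interactively.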

Tom

StamperS_Brian · April 24, 2013, 11:49am

Hi all,

I just want to point out that you could use gdb python for debugging.

Just run ‘gdb python’ and then “run path/file.py”. At least it works for
me and doesn’t need the ‘import os’ lines or looking up PIDs.

Johannes

On 24.04.2013 04:14, Tom R. wrote:

StamperS_Brian · April 24, 2013, 6:04pm

On Wed, Apr 24, 2013 at 2:48 AM, Johannes D.
[email protected] wrote:

I just want to point out that you could use gdb python for debugging.

just run gdb python and then “run path/file.py”. At least it works for
me and doesn’t need this import os stuff and looking up PIDs.

This works in most cases (and is how I usually debug GNU Radio apps
written in Python.)

However, in the case of the QA code, the ‘make test’ command executes
the Python script with a very specific set of environment variables to
allow the QA code to find GNU Radio as part of the build tree, not the
system installation, so the technique Tom outlined is needed to attach
gdb after all that takes place.

StamperS_Brian · April 24, 2013, 7:09pm

On Wed, Apr 24, 2013 at 9:46 AM, Stamper, Brian [email protected]
wrote:

generic 99% tests passed, 2 tests failed out of 192

I’m curious which QA tests failed for you when everything was set to
‘generic’.

StamperS_Brian · April 24, 2013, 6:48pm

Hi Tom,

From: [email protected] [mailto:[email protected]] On Behalf Of Tom
Rondeau
Sent: Tuesday, April 23, 2013 10:15 PM
On Tue, Apr 23, 2013 at 10:37 AM, Stamper, Brian [email protected] wrote:
…

Previously I posted that I was getting the following make test failures. I’m
building GNU Radio in Fuduntu (a fork of Fedora) on a 32-bit Intel Atom N270 based
netbook (an Asus Eee 1000).
…
I started running the test shell scripts individually and found that they all
crashed with the same message:

Using Volk machine: ssse3_32
Segmentation fault
…
The best way to get info from here is to use gdb. In one of the QA test files
(the Python file that’s in the source tree), add the lines:

import os
print os.getpid()
raw_input()

Do this before any other Python code. When you run it now, it will print the PID
of the process. In another terminal, run gdb (probably as root), and then run
‘attach <PID>’, where <PID> is the PID printed by the program. Then press ‘c’ to
continue in gdb. On the other terminal running the QA code, press enter (to
proceed past the ‘raw_input’ line). Wait until it crashes, then, in gdb, type ‘bt’
to get a back trace. Send that along and it will help us figure out where the seg
fault is happening.

Tom

So I picked on one of the failing qa_fft_filter tests:
$ vi /home/brian/SDR/gnuradio/gnuradio-core/src/python/gnuradio/gr/qa_fft_filter.py
$ /bin/sh "/home/brian/SDR/gnuradio/build/gnuradio-core/src/python/gnuradio/gr/qa_fft_filter_test.sh"
[then switch to gdb, attach, continue, await crash.]

Here’s the bt:

(gdb) bt
#0 0xb6a6f7a7 in volk_32fc_x2_multiply_32fc_a_sse3 () from
/home/brian/SDR/gnuradio/build/volk/lib/libvolk.so.0.0.0
#1 0xb6a3cc92 in get_volk_32fc_x2_multiply_32fc_a () from
/home/brian/SDR/gnuradio/build/volk/lib/libvolk.so.0.0.0
#2 0xb6e8e539 in gri_fft_filter_ccc_generic::filter(int,
std::complex const*, std::complex) ()
from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0
#3 0xb6e961a5 in gr_fft_filter_ccc::work(int, std::vector<void const*,
std::allocator<void const*> >&, std::vector<void*, std::allocator<void*>
>&) () from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0
#4 0xb6e5fd8a in gr_sync_decimator::general_work(int, std::vector<int,
std::allocator >&, std::vector<void const*, std::allocator<void
const*> >&, std::vector<void*, std::allocator<void*> >&) ()
from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0
#5 0xb6e3fdf1 in gr_block_executor::run_one_iteration() ()
from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0
#6 0xb6e62ad6 in
gr_tpb_thread_body::gr_tpb_thread_body(boost::shared_ptr<gr_block>, int)
()
from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0
#7 0xb6e5ce6b in
boost::detail::function::void_function_obj_invoker0<gruel::thread_body_wrapper<tpb_container>,
void>::invoke(boost::detail::function::function_buffer&) ()
from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0
#8 0xb6d91b86 in boost::detail::thread_data<boost::function0
::run() ()
from
/home/brian/SDR/gnuradio/build/gruel/src/lib/libgruel-3.6.4.1.so.0.0.0
#9 0xb6cf924d in ?? () from /usr/lib/libboost_thread-mt.so.1.48.0
#10 0xb7608adf in start_thread () from /lib/libpthread.so.0
#11 0xb751042e in clone () from /lib/libc.so.6

That does appear to point to Volk, so I kept digging there. I
learned about “volk_profile” and ran it, and then tried make test again.
I actually ended up with more failures than before, but it was
interesting that the outcome changed.

When run, volk_profile builds ~/.volk/volk_config, with entries like
this:
#this file is generated by volk_profile.
#the function name is followed by the preferred architecture.
volk_16ic_s32f_deinterleave_real_32f_a generic
volk_16ic_deinterleave_real_8i_a ssse3
volk_16ic_deinterleave_16i_x2_a sse2
volk_16ic_s32f_deinterleave_32f_x2_a sse
…

Where “generic”, “ssse3”, “sse2”, etc. are all chosen by volk_profile
based on how well your machine runs each. So I decided to try different
volk_config versions where I set all functions to the same architecture,
e.g. one version of volk_config with all “sse” like this:
volk_16ic_s32f_deinterleave_real_32f_a sse
volk_16ic_deinterleave_real_8i_a sse
volk_16ic_deinterleave_16i_x2_a sse
volk_16ic_s32f_deinterleave_32f_x2_a sse
…
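
As a rough sketch of how such single-architecture variants could be generated instead of edited by hand, the function below rewrites every kernel entry to one architecture. It assumes only the two-column "name arch" layout shown above; the function name force_arch is made up for illustration, and whether Volk accepts an architecture name that a given kernel does not implement is not something this thread establishes.

```python
def force_arch(config_text, arch):
    """Return volk_config text with every kernel entry set to `arch`.

    Assumes each entry is "<kernel_name> <arch>" as in the excerpt above.
    Comment lines (starting with '#') and blank lines are kept unchanged.
    """
    out = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)
            continue
        kernel = stripped.split()[0]  # first column is the kernel name
        out.append("%s %s" % (kernel, arch))
    return "\n".join(out) + "\n"


# Two entries copied from the excerpt above, rewritten to sse2.
sample = """#this file is generated by volk_profile.
volk_16ic_s32f_deinterleave_real_32f_a generic
volk_16ic_deinterleave_real_8i_a ssse3
"""
print(force_arch(sample, "sse2"))
```

To apply it for real, one would read ~/.volk/volk_config, keep a backup copy, and write the returned text back to the same path before running make test.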

Then for each version of volk_config I ran make test again. The results:

type          results
volk_profile  92% tests passed, 16 tests failed out of 192
generic       99% tests passed, 2 tests failed out of 192
sse           92% tests passed, 15 tests failed out of 192
sse2          99% tests passed, 2 tests failed out of 192
sse3          94% tests passed, 12 tests failed out of 192
ssse3         99% tests passed, 2 tests failed out of 192

There is generally overlap in the failures, but clearly which arch I
chose affected the outcome. The best run time was on sse2, where only
these two tests fail:
114 - qa_ctcss_squelch
151 - qa_constellation_receiver

These two failures actually appeared with each version of volk_config.
I had seen #114 before; it is not a seg fault but a set of assertion
errors, so I can continue to debug that on my own a bit. #151 is a seg
fault, but I found that it is intermittent: the same test sometimes
passes and sometimes fails, even though I haven’t changed anything.
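
For an intermittent failure like #151, one way to catch it is to rerun the test command in a loop until it exits non-zero. The following is only a sketch: the function name run_until_failure and the max_tries default are made up, and the demonstration command at the end stands in for a real test script.

```python
import subprocess
import sys


def run_until_failure(cmd, max_tries=20):
    """Run `cmd` repeatedly and report the first failing attempt.

    Returns (attempt_number, returncode) for the first non-zero exit, or
    None if all attempts succeed. On POSIX a child killed by a signal is
    reported with a negative return code; a shell wrapper usually reports
    128 + signal number instead (139 for a segmentation fault).
    """
    for attempt in range(1, max_tries + 1):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            return attempt, result.returncode
    return None


# Stand-in command that always succeeds; replace with the real test script.
always_ok = [sys.executable, "-c", "raise SystemExit(0)"]
print(run_until_failure(always_ok, max_tries=3))
```

Pointing cmd at something like ["/bin/sh", "path/to/qa_constellation_receiver_test.sh"] (path left for the reader to fill in) would show how many runs it takes before the crash appears.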

tl;dr
It does appear to be an issue with configuring Volk for my system.
Setting all functions to use “sse2” in volk_config clears up most of the
problems on my system, but not all.

Thanks again,
Brian

StamperS_Brian · April 24, 2013, 10:57pm

On Wed, Apr 24, 2013 at 10:50 AM, Brian S. [email protected]
wrote:

114: AssertionError: 39 != 31
114: AssertionError: 0.8 != 0.0 within 4 places

Now that’s just being unreasonable

Actually, I recall seeing these same failures sometime in the past,
and am looking around to see if I can find the reference.

–
Johnathan C.
Corgan Labs - SDR Training and Development Services
http://corganlabs.com

StamperS_Brian · April 24, 2013, 7:52pm

On Wed, Apr 24, 2013 at 1:07 PM, Johnathan C.
[email protected] wrote:

On Wed, Apr 24, 2013 at 9:46 AM, Stamper, Brian [email protected] wrote:

generic 99% tests passed, 2 tests failed out of 192

I’m curious which QA tests failed for you when everything was set to ‘generic’.

All tests had these two failures, including generic:
114 - qa_ctcss_squelch (Failed)
151 - qa_constellation_receiver (Failed)
Again, #114 has assertion errors and #151 is a seg fault.

Here I just used ctest -V -R qa_ctcss_squelch:

Start 114: qa_ctcss_squelch
114: Test command: /bin/sh
“/home/brian/SDR/gnuradio/build/gr-analog/python/qa_ctcss_squelch_test.sh”
114: Test timeout computed to be: 9.99988e+06
114: .FF
114:

114: FAIL: test_ctcss_squelch_002 (main.test_ctcss_squelch)
114:

114: Traceback (most recent call last):
114: File
“/home/brian/SDR/gnuradio/gr-analog/python/qa_ctcss_squelch.py”,
line 81, in test_ctcss_squelch_002
114: self.assertFloatTuplesAlmostEqual(expected_result, result_data, 4)
114: File
“/home/brian/SDR/gnuradio/gnuradio-core/src/python/gnuradio/gr_unittest.py”,
line 85, in assertFloatTuplesAlmostEqual
114: self.assertEqual (len(a), len(b))
114: AssertionError: 39 != 31
114:
114:

114: FAIL: test_ctcss_squelch_003 (main.test_ctcss_squelch)
114:

114: Traceback (most recent call last):
114: File
“/home/brian/SDR/gnuradio/gr-analog/python/qa_ctcss_squelch.py”,
line 106, in test_ctcss_squelch_003
114: self.assertFloatTuplesAlmostEqual(expected_result, result_data, 4)
114: File
“/home/brian/SDR/gnuradio/gnuradio-core/src/python/gnuradio/gr_unittest.py”,
line 87, in assertFloatTuplesAlmostEqual
114: self.assertAlmostEqual (a[i], b[i], places, msg)
114: AssertionError: 0.8 != 0.0 within 4 places
114:
114:

114: Ran 3 tests in 0.012s
114:
114: FAILED (failures=2)
1/1 Test #114: qa_ctcss_squelch …***Failed 1.26 sec

And here I used gdb in conjunction with
qa_constellation_receiver_test.sh, as before. It took 5 tries before
it failed again:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xad7f8b40 (LWP 26744)]
0xb6ac1e1a in fcomplex_dotprod_sse ()
from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0

(gdb) bt
#0 0xb6ac1e1a in fcomplex_dotprod_sse ()
from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0
#1 0xb6ac0880 in gr_fir_ccf_simd::filter(std::complex const*) ()
from
/home/brian/SDR/gnuradio/build/gnuradio-core/src/lib/libgnuradio-core-3.6.4.1.so.0.0.0
#2 0xb3836f70 in digital_pfb_clock_sync_ccf::general_work(int,
std::vector<int, std::allocator >&, std::vector<void const*,
std::allocator<void const*> >&, std::vector<void*,
std::allocator<void*> >&) ()
from
/home/brian/SDR/gnuradio/build/gr-digital/lib/libgnuradio-digital-3.6.4.1.so.0.0.0
#3 0x08be43f0 in ?? ()
#4 0xb10007fc in ?? ()
#5 0xb1000040 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)

StamperS_Brian · April 25, 2013, 10:45am

On 24/04/13 22:09, Johnathan C. wrote:

and am looking around to see if I can find the reference.

This was reported last January in the course of a thread related to
failures with Boost 1.52:

Re: [Discuss-gnuradio] Tests fail building with boost1.52

It’s not clear what there is in common with your system and the one
having the issue there, but you might review that whole thread and see
if anything jumps out at you.

Hi folks - all my tests in that thread were on x86_64 and we have since
moved to boost-1.53 which has resolved most of those issues.

I am following this thread with interest as we now have errors in i586
builds (Brian - I added details to your other thread that got no
response)
https://lists.gnu.org/archive/html/discuss-gnuradio/2013-04/msg00455.html

StamperS_Brian · April 25, 2013, 4:27pm

On Thu, Apr 25, 2013 at 4:43 AM, Barry J. [email protected]
wrote:

On 24/04/13 22:09, Johnathan C. wrote:
[snip]

This was reported last January in the course of a thread related to
failures with Boost 1.52:
[snip]
Hi folks - all my tests in that thread were on x86_64 and we have since
moved to boost-1.53 which has resolved most of those issues.

I am following this thread with interest as we now have errors in i586
builds (Brian - I added details to your other thread that got no response)
Re: [Discuss-gnuradio] How to diagnose make test failures

Thank you Johnathan, and that’s great news, Barry. My system reports
boost-1.48, which I thought would be okay based on this dependency
list, as referenced in the build instructions:
GNU Radio Manual and C++ API Reference: Build Instructions and Information, which says boost >= 1.35.

I’ll have to try a manual upgrade of boost and run some of my
tests/volk_configs again and report back.

Brian

StamperS_Brian · April 24, 2013, 11:10pm

On Wed, Apr 24, 2013 at 1:56 PM, Johnathan C.
[email protected] wrote:

On Wed, Apr 24, 2013 at 10:50 AM, Brian S. [email protected] wrote:

114: AssertionError: 39 != 31
114: AssertionError: 0.8 != 0.0 within 4 places

Now that’s just being unreasonable

Actually, I recall seeing these same failures sometime in the past,
and am looking around to see if I can find the reference.

This was reported last January in the course of a thread related to
failures with Boost 1.52:

https://lists.gnu.org/archive/html/discuss-gnuradio/2013-01/msg00360.html

It’s not clear what there is in common with your system and the one
having the issue there, but you might review that whole thread and see
if anything jumps out at you.

–
Johnathan C.
Corgan Labs - SDR Training and Development Services
http://corganlabs.com

StamperS_Brian · April 26, 2013, 10:36pm

On 25/04/13 15:26, Brian S. wrote:

I am following this thread with interest as we now have errors in i586
I’ll have to try a manual upgrade of boost and run some of my
tests/volk_configs again and report back.

Brian

Brian
I think you misunderstood.
We were using boost-1.52 which is blacklisted for gnuradio, hence the
need for us to update to 1.53.

I don’t think these current test failure problems are boost related and
AFAIK there is no reason for you to update boost.

The test failures I see (the ones that report segfault) are only when
building for i586, which is why I am interested in following this
thread.

Barry

StamperS_Brian · April 27, 2013, 8:05pm

On Fri, Apr 26, 2013 at 4:35 PM, Barry J. [email protected]
wrote:

building for i586, which is why I am interested in following this thread.

Barry

Yeah, things didn’t really change for me by upgrading boost. (I did
make uninstall and rebuilt the whole deal, just to be sure.)

I think I may stick with a volk_config using all ‘sse2’, which gives
me just the 114 - qa_ctcss_squelch and 151 - qa_constellation_receiver
failures, and really it turns out that #151 only fails sometimes. I
looked at the other thread about the qa_ctcss_squelch and didn’t
really find anything new, so I still need to dig into that.

On an OT note, I recently learned my distro of choice (Fuduntu) is
reaching end-of-life, so I’ll be starting this process over in a
different context at some point anyway, though I do believe some of
these issues are more related to hardware (like instruction set
capabilities) rather than software.

Brian
