Segmentation fault in qa_constellation_receiver_test

After the make test failed for this module, I decided to poke around to
see
if there is an easy fix. I made a script that simply executes the test
over
and over until it seg faults and exits after the core file is created.

xxxxx@xxxx:~/src/gnuradio/build/gr-digital/python/digital$ ./runtests.sh
Using Volk machine: avx_64_mmx
Segmentation fault (core dumped)

xxxxx@xxxx:~/src/gnuradio/build/gr-digital/python/digital$ gdb
/usr/bin/python2.7 core
(gdb) bt
(gdb) bt
#0 0x00007fe8f627fb17 in volk_32fc_32f_dot_prod_32fc_a_avx ()
from /home/kelly/src/gnuradio/build/volk/lib/libvolk.so.0.0.0
#1 0x00007fe8f52dd25f in
gr::filter::kernel::fir_filter_ccf::filter(std::complex const*)
()
from
/home/kelly/src/gnuradio/build/gr-filter/lib/libgnuradio-filter-3.8git.so.0.0.0
#2 0x00007fe8f143c45b in
gr::digital::pfb_clock_sync_ccf_impl::general_work(int, std::vector<int,
std::allocator >&, std::vector<void const*, std::allocator<void
const*> >&, std::vector<void*, std::allocator<void*> >&) ()
from
/home/kelly/src/gnuradio/build/gr-digital/lib/libgnuradio-digital-3.8git.so.0.0.0
#3 0x00007fe8f653809e in gr::block_executor::run_one_iteration() ()
from
/home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
#4 0x00007fe8f6573622 in
gr::tpb_thread_body::tpb_thread_body(boost::shared_ptrgr::block, int)
()
from
/home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
#5 0x00007fe8f6565ea1 in
boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrappergr::tpb_container,
void>::invoke(boost::detail::function::function_buffer&) ()
from
/home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
—Type to continue, or q to quit—
#6 0x00007fe8f6526610 in
boost::detail::thread_data<boost::function0

::run() ()
from
/home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0
#7 0x00007fe8f9adc94a in ?? ()
from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.53.0
#8 0x00007fe8fc8a3f6e in start_thread (arg=0x7fe8e2ffd700)
at pthread_create.c:311
#9 0x00007fe8fc5ce9cd in clone ()
at …/sysdeps/unix/sysv/linux/x86_64/clone.S:113

Of course, I had to recompile it with debugging info to glean anything
useful from the stack trace. So, I did that and I traced the bug to
this
line:

c0Val = _mm256_mul_ps(a0Val, b0Val);

I can’t dump the values in a0Val or b0Val, though, because they’re
intermediate values that are optimized away by the optimized kernel
code.
I tried stepping through the assembler instructions but I’m not familiar
with the various sse and avx extensions. Heck, I’m not even familiar
with
the x86_64 instruction set. So I have a huge learning curve ahead of
me,
there. Is it possible to just dump the values in these __m256 data
types
to a file so I can debug it that way? If that’s not easy to do, then
I’m
willing to learn what I have to about the instruction set so I can debug
this thing. But I would sure appreciate some help if anyone has some
advice to offer.

Software version:
I rebased to the latest version of the next branch last night before I
went
to bed at around 1:30 am CDT.

Operating System:
kelly@octs2:~/src/gnuradio/volk/kernels/volk$ uname -a
Linux octs2 3.11.0-17-generic #31-Ubuntu SMP Mon Feb 3 21:52:43 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux
It’s Ubuntu 13.10

Hardware: ASUS X750J
Intel Quad Core i7 4700HQ 2.4GHz

cpuinfo:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel® Core™ i7-4700HQ CPU @ 2.40GHz
stepping : 3
microcode : 0x8
cpu MHz : 2401.000
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall
nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic
movbe
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat
epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
fsgsbase
tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
bogomips : 4789.27
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

On Thu, Feb 20, 2014 at 11:25 PM, Kelly B. [email protected]
wrote:

(gdb) bt

&, std::vector<void*, std::allocator<void*> >&) ()

boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrappergr::tpb_container,

#8 0x00007fe8fc8a3f6e in start_thread (arg=0x7fe8e2ffd700)
I can’t dump the values in a0Val or b0Val, though, because they’re
I rebased to the latest version of the next branch last night before I went

physical id : 0
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
power management:

Hi Kelly,

First, this is great debugging, thanks for getting so much info and
trying to go for a fix on your own.

On to the good stuff. I was able to reproduce this on my i7-4700MQ.
Here’s some additional info for the logs:

  • constellation_receiver is a hier block with a fir_filter_ccf inside
    that is calling the volk avx dot product.
  • The avx dot product proto-kernel passes VOLK QA
  • The qa_fir_filter.py is testing a fir_filter_ccf that passes its QA.
  • Just for kicks, I forced VOLK to use the generic kernel and I still
    see the segfault.

A couple of things I’d like to try (and please feel free to give these a
try):

  • Go back to a commit just before fir_filter.cc started using
    volk_malloc and volk_free. (or for bonus points go back to some point
    in time when this test always passes and do a git bisect)
  • fiddle with parameters of the test, data length, number of taps in
    filter, etc.
  • Doubtful this would change, but test on different processors. It
    would be pretty wild if there was something off in the 4700 line, but
    the fact that the generic proto-kernel had the same result and nobody
    else has reported this yet is suspicious. My guess is GCC is actually
    emitting very similar code for the generic and avx dot product
    proto-kernels.

Nathan

On Fri, Feb 21, 2014 at 2:39 AM, West, Nathan
[email protected] wrote:

/usr/bin/python2.7 core
std::allocator >&, std::vector<void const*, std::allocator<void const*>
#5 0x00007fe8f6565ea1 in
from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.53.0

Software version:
Intel Quad Core i7 4700HQ 2.4GHz
cache size : 6144 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
address sizes : 39 bits physical, 48 bits virtual
Here’s some additional info for the logs:
volk_malloc and volk_free. (or for bonus points go back to some point
Nathan
I was having similar issues this week with some AVX boxes. It looks
like it’s a problem using posix_memalign (which is called by
volk_malloc if posix_memalign is available). Removing the use of
posix_memalign solves my problem. I’ll work with Nathan off-list to
see about fixing this, possibly by removing the use of that version of
malloc.

Tom

Thank you, Tom. I’ll try that after I’m off of work tonight. And thank
you
for the great ideas, Nathan.
On Fri, Feb 21, 2014 at 2:39 AM, West, Nathan
[email protected] wrote:

On Thu, Feb 20, 2014 at 11:25 PM, Kelly B. [email protected]
wrote:

After the make test failed for this module, I decided to poke around to
see
if there is an easy fix. I made a script that simply executes the test
over
#0 0x00007fe8f627fb17 in volk_32fc_32f_dot_prod_32fc_a_avx ()
from /home/kelly/src/gnuradio/build/volk/lib/libvolk.so.0.0.0
#1 0x00007fe8f52dd25f in
gr::filter::kernel::fir_filter_ccf::filter(std::complex const*) ()
from

/home/kelly/src/gnuradio/build/gr-filter/lib/libgnuradio-filter-3.8git.so.0.0.0

#2 0x00007fe8f143c45b in
gr::digital::pfb_clock_sync_ccf_impl::general_work(int, std::vector<int,
std::allocator >&, std::vector<void const*, std::allocator<void
const*>

&, std::vector<void*, std::allocator<void*> >&) ()
from

/home/kelly/src/gnuradio/build/gr-digital/lib/libgnuradio-digital-3.8git.so.0.0.0

#3 0x00007fe8f653809e in gr::block_executor::run_one_iteration() ()
from

/home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0

#4 0x00007fe8f6573622 in
gr::tpb_thread_body::tpb_thread_body(boost::shared_ptrgr::block, int)
()
from

/home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0

#5 0x00007fe8f6565ea1 in

boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrappergr::tpb_container,

void>::invoke(boost::detail::function::function_buffer&) ()
from

/home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0

—Type to continue, or q to quit—
#6 0x00007fe8f6526610 in
boost::detail::thread_data<boost::function0

::run() ()
from

/home/kelly/src/gnuradio/build/gnuradio-runtime/lib/libgnuradio-runtime-3.8git.so.0.0.0

c0Val = _mm256_mul_ps(a0Val, b0Val);

I can’t dump the values in a0Val or b0Val, though, because they’re
intermediate values that are optimized away by the optimized kernel
code. I
tried stepping through the assembler instructions but I’m not familiar
with
the various sse and avx extensions. Heck, I’m not even familiar with the
x86_64 instruction set. So I have a huge learning curve ahead of me,
there.
Is it possible to just dump the values in these __m256 data types to a
file
so I can debug it that way? If that’s not easy to do, then I’m willing
to
learn what I have to about the instruction set so I can debug this thing.
But I would sure appreciate some help if anyone has some advice to offer.

Software version:
I rebased to the latest version of the next branch last night before I
went

physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx
est

that is calling the volk avx dot product.

  • The avx dot product proto-kernel passes VOLK QA
  • The qa_fir_filter.py is testing a fir_filter_ccf that passes its QA.
  • Just for kicks, I forced VOLK to use the generic kernel and I still
    see the segfault.

A couple of things I’d like to try (and please feel free to give these a
try):
proto-kernels.

Nathan

I was having similar issues this week with some AVX boxes. It looks
like it’s a problem using posix_memalign (which is called by
volk_malloc if posix_memalign is available). Removing the use of
posix_memalign solves my problem. I’ll work with Nathan off-list to
see about fixing this, possibly by removing the use of that version of
malloc.

Tom

If you just want to get back to a system that passes QA you should
just be able to build off of maint.

I removed the implementation of volk_malloc that uses posix_menacing by
commenting everything from the #if to #else and the final #endif but the
segmentation fault remains. I noticed it’s being called in a few other
files as well. Do I need to remove those, too? Thanks in advance.

On Fri, Feb 21, 2014 at 10:12 PM, Kelly B. [email protected]
wrote:

I’m encountering the same problem on maint. And I did remember to rebuild.
I removed the build directory, recreated it, and started over with cmake
just to be sure. It’s the same stack trace.

Yeah, false alarm on volk_malloc. Turns out my issue was coming from
qtgui::freq_sink_c because we’ve introduced a new AVX proto-kernel and
that block was using the aligned kernel always (not the dispatcher).
There’s one input that is a std::vector which only guarantees
alignment to 16-bytes, so non-AVX calls were safe. I’m going to push a
patch for this soon.

So this will not address the problem you’re having, which I cannot
reproduce (but hopefully Nathan can figure out since he can reproduce
the issue).

Tom

I’m encountering the same problem on maint. And I did remember to
rebuild.
I removed the build directory, recreated it, and started over with cmake
just to be sure. It’s the same stack trace.

On Fri, Feb 21, 2014 at 7:54 PM, West, Nathan