Issues with running volk_profile in Gentoo


Discuss-gnuradio mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio

On Mon, Feb 11, 2013 at 1:19 PM, Tommy T. II [email protected]
wrote:

I tried running volk_profile in Gentoo and got the following:

volk_profile

Using Volk machine: sse4_2_32_orc
RUN_VOLK_TESTS: volk_32fc_s32fc_rotatorpuppet_32fc_a
generic completed in 361.04s
sse4_1 completed in 0.49s

RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a

sse completed in 0.37s
Segmentation fault

I’m not sure what to make of the first one, but the second is likely
caused by trying to execute a SIMD instruction on a CPU that doesn’t
actually have it. This has happened before when the VOLK detection
routines had a bug, or when a VM “lies” in its cpuid about being able to
virtualize the SIMD instruction set.

If you could run CMake again, save the output to a file, then grep the
two lines below:

-- Available architectures:
generic;64;3dnow;abm;popcount;mmx;sse;sse2;orc;norc;sse3;ssse3;sse4_a;sse4_1;sse4_2;avx
-- Available machines:
generic_orc;sse2_64_mmx_orc;sse3_64_orc;ssse3_64_orc;sse4_a_64_orc;sse4_1_64_orc;sse4_2_64_orc;avx_64_mmx_orc

…you’ll see what VOLK came up with (the above is from the machine I am
typing on now.) You can compare this to the capabilities reported by
/proc/cpuinfo to see if there is a difference.

Johnathan
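
The comparison described above can be sketched in a few lines of Python.
This is only an illustration: the function name and the alias table
(CMake names mapped to their /proc/cpuinfo spellings) are assumptions,
not part of VOLK, and the flags string is trimmed to the SIMD-related
entries reported in this thread.

```python
# Illustrative sketch: list the architectures CMake reported that the CPU
# does not advertise in /proc/cpuinfo. Nothing here is VOLK code.

# CMake names that /proc/cpuinfo spells differently (assumed mapping).
CPUINFO_ALIASES = {"sse3": "pni", "popcount": "popcnt", "sse4_a": "sse4a"}

# Entries that describe the build rather than a CPU feature (assumption).
NOT_CPU_FLAGS = {"generic", "32", "64", "orc", "norc"}


def unsupported_archs(available_archs, cpu_flags):
    """Return the architectures in the CMake list missing from the CPU flags."""
    flags = set(cpu_flags.split())
    missing = []
    for arch in available_archs.split(";"):
        if arch in NOT_CPU_FLAGS:
            continue
        if CPUINFO_ALIASES.get(arch, arch) not in flags:
            missing.append(arch)
    return missing


archs = ("generic;32;3dnow;abm;popcount;mmx;sse;sse2;orc;norc;"
         "sse3;ssse3;sse4_a;sse4_1;sse4_2;avx")
# Trimmed to the SIMD-related flags from the /proc/cpuinfo output in this thread.
flags = "mmx fxsr sse sse2 pni ssse3 sse4_1 sse4_2 popcnt aes"

print(unsupported_archs(archs, flags))  # ['3dnow', 'abm', 'sse4_a', 'avx']
```

On a real machine the flags string would come from the "flags" line of
/proc/cpuinfo rather than a literal.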

Thank you. I think this may be the problem:

grep "Available architectures" cmake.out

-- Available architectures:
generic;32;3dnow;abm;popcount;mmx;sse;sse2;orc;norc;sse3;ssse3;sse4_a;sse4_1;sse4_2;avx

grep "Available machines" cmake.out

-- Available machines:
generic_orc;sse2_32_mmx_orc;sse3_32_orc;ssse3_32_orc;sse4_a_32_orc;sse4_1_32_orc;sse4_2_32_orc;avx_32_mmx_orc

/proc/cpuinfo flags:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt aes lahf_lm arat dts
tpr_shadow vnmi flexpriority ept vpid

It looks like my processor does not support avx, but Gnuradio assumes it
does. Is there a way to disable avx?

               Sincerely,
      Tommy James Tracy II
        PhD Student

High Performance Low Power Lab
University of Virginia

On Wed, Feb 13, 2013 at 8:25 AM, Tommy T. II [email protected]
wrote:

It looks like my processor does not support avx, but Gnuradio assumes it
does. Is there a way to disable avx?

It would be best to find out why libvolk is detecting avx during cmake.
Can you post the rest of the lines from your cmake.out related to volk
(should be near the beginning)?

Johnathan

The problem is that during the build, AVX support was enabled, even
though my processor doesn’t support it.

-- Python checking for Cheetah >= 2.0.0
-- Python checking for Cheetah >= 2.0.0 - found
-- Compiler name: GNU
-- x86* CPU detected
-- CPU width is 32 bits, Overruled arch 64
-- Available architectures:
generic;32;3dnow;abm;popcount;mmx;sse;sse2;orc;norc;sse3;ssse3;sse4_a;sse4_1;sse4_2;avx
-- Available machines:
generic_orc;sse2_32_mmx_orc;sse3_32_orc;ssse3_32_orc;sse4_a_32_orc;sse4_1_32_orc;sse4_2_32_orc;avx_32_mmx_orc

               Sincerely,
      Tommy James Tracy II
        PhD Student

High Performance Low Power Lab
University of Virginia

On 02/13/2013 01:44 PM, Johnathan C. wrote:

It would be best to find out why libvolk is detecting avx during cmake.

It determines that AVX is supported by the compiler via flags. So,
support for AVX will be built into the library.

From the output of volk_profile, it doesn’t seem that AVX was
detected. So, all seems well so far…

-josh

On 02/13/2013 01:59 PM, Tommy T. II wrote:

The problem is that during the build, AVX support was enabled, even though my
processor doesn’t support it.

That’s intended, because AVX is supported by the compiler. You should
notice, however, that VOLK detected at runtime that AVX was not actually
available on your CPU.

Back to the original issue, I thought there was a segfault in one of the
profile tests. I think a gdb backtrace would be helpful to see which one
is failing.

-josh
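
The split between build-time and run-time detection can be illustrated
with a small Python sketch. It is not VOLK's dispatch code: the function
names, the alias table, and the assumption that the machine list is
ordered from least to most capable are all hypothetical. It only shows
how a library built with an AVX machine can still settle on
sse4_2_32_orc, as in the volk_profile output earlier in the thread.

```python
# Illustrative sketch of run-time machine selection; NOT VOLK's actual code.

ARCHS = ("generic;32;3dnow;abm;popcount;mmx;sse;sse2;orc;norc;"
         "sse3;ssse3;sse4_a;sse4_1;sse4_2;avx").split(";")
MACHINES = ("generic_orc;sse2_32_mmx_orc;sse3_32_orc;ssse3_32_orc;"
            "sse4_a_32_orc;sse4_1_32_orc;sse4_2_32_orc;avx_32_mmx_orc").split(";")

# CMake names that /proc/cpuinfo spells differently (assumed mapping).
CPUINFO_ALIASES = {"sse3": "pni", "popcount": "popcnt", "sse4_a": "sse4a"}
# Name parts that do not correspond to a CPU flag (assumption).
ALWAYS_OK = {"generic", "32", "64", "orc", "norc"}


def machine_archs(machine, archs):
    """Split a machine name into architecture names, longest names first."""
    by_length = sorted(archs, key=len, reverse=True)
    parts, rest = [], machine
    while rest:
        for arch in by_length:
            if rest == arch or rest.startswith(arch + "_"):
                parts.append(arch)
                rest = rest[len(arch) + 1:]
                break
        else:
            raise ValueError("unknown architecture in " + machine)
    return parts


def pick_machine(machines, archs, cpu_flags):
    """Return the last machine whose named architectures the CPU reports."""
    flags = set(cpu_flags.split())
    best = None
    for machine in machines:  # assumed ordered least to most capable
        needed = [a for a in machine_archs(machine, archs) if a not in ALWAYS_OK]
        if all(CPUINFO_ALIASES.get(a, a) in flags for a in needed):
            best = machine
    return best


# Trimmed to the SIMD-related flags from the /proc/cpuinfo output in this thread.
CPU_FLAGS = "mmx fxsr sse sse2 pni ssse3 sse4_1 sse4_2 popcnt aes"
print(pick_machine(MACHINES, ARCHS, CPU_FLAGS))  # sse4_2_32_orc
```

The AVX machine is compiled in because the compiler accepts the flags,
but it is skipped here because the CPU flags do not include avx.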

What I got from GDB:
Is there any way to get more information using backtrace?

Program received signal SIGSEGV, Segmentation fault.
0xf7f77098 in volk_32fc_32f_multiply_32fc_a_generic () from
/usr/lib/libvolk.so.0.0.0

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

               Sincerely,
      Tommy James Tracy II
        PhD Student

High Performance Low Power Lab
University of Virginia

Nick, thank you. I was wondering why AVX showed up in the list if the
processor didn’t support it.
Does anyone have any idea why rotatorpuppet would take so long?

               Sincerely,
      Tommy James Tracy II
        PhD Student

High Performance Low Power Lab
University of Virginia

On 02/13/2013 12:17 PM, Tommy T. II wrote:

Nick, thank you. I was wondering why AVX showed up in the list if the
processor didn’t support it.
Does anyone have any idea why rotatorpuppet would take so long?

I don’t, but I’m not that worried about it, as the generic implementation
is only there as a fallback when the hardware doesn’t support the more
efficient SIMD version. Generic implementation times can vary hugely
depending on which version of GCC you’re using, what optimization flags
were enabled, etc. And sometimes GCC just optimizes really, really
terribly.

The segfault is a different story. As Josh suggests, a backtrace would
be helpful to see exactly what went wrong.

--n

On Wed, Feb 13, 2013 at 12:49 PM, Nick F. [email protected] wrote:

The segfault is a different story. As Josh suggests, a backtrace would be
helpful to see exactly what went wrong.

The generic implementation of the rotator function normally takes 5-6
seconds on a typical machine, so the 361 seconds is quite the outlier,
and I’d suspect there is something else going on to cause that besides
poor compiler optimization.

Regarding the segfault, yes, compiling GNU Radio with debug symbols
turned on and doing a backtrace under gdb would provide us more info.

This can be done by re-running CMake with -DCMAKE_BUILD_TYPE=DEBUG and
recompiling.

Johnathan
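
As a rough sketch of the steps above, the Python snippet below only
assembles and prints the commands; it does not run them. The source path
is a placeholder, the use of plain "make" is an assumption about the
build setup, and the helper name is hypothetical. The gdb options used
(-batch, -ex, --args) run the program non-interactively and print a
backtrace once it stops on the segfault.

```python
# Sketch: print the commands for a debug rebuild and a gdb backtrace.
# Paths and the helper name are placeholders, not taken from the thread.
import shlex


def backtrace_commands(source_dir, program="volk_profile"):
    """Return the argv lists for configure, build, and a batch-mode backtrace."""
    return [
        # Re-run CMake from the build directory with debug symbols enabled.
        ["cmake", "-DCMAKE_BUILD_TYPE=DEBUG", source_dir],
        # Recompile (assumes a Makefile generator).
        ["make"],
        # Run under gdb, then print the backtrace after the crash.
        ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", program],
    ]


for argv in backtrace_commands("/path/to/gnuradio"):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

With debug symbols present, the backtrace should show source files and
line numbers rather than only the symbol name and library path.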