Volk library invalid opcode exception

Hello, I’m getting an invalid opcode exception whenever the volk
library is used from gr/grc. It is also easy to reproduce by running
volk_profile:

[user@rflab gnuradio]$ volk_profile
Using Volk machine: avx_64
RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_real_32f_a
Illegal instruction
[user@rflab gnuradio]$ dmesg
[ 6920.211094] volk_profile[25627] trap invalid opcode ip:7f8145b74d40
sp:7fff41dfac78 error:0 in libvolk.so.0.0.0[7f8145ad7000+cf000]
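
For readers following along: the dmesg line gives the faulting instruction pointer and the mapping it fell into, as `[base+size]`. Subtracting the two gives the offset of the bad instruction within that mapping. Below is a minimal sketch of the arithmetic; the helper name `fault_offset` is made up for illustration and is not part of any tool.

```c
#include <stdint.h>

/*
 * Hypothetical helper: offset of a faulting instruction pointer inside a
 * mapping reported by the kernel as [base+size].
 * Returns UINT64_MAX when ip lies outside [base, base + size).
 */
static uint64_t fault_offset(uint64_t ip, uint64_t base, uint64_t size)
{
    if (ip < base || ip - base >= size)
        return UINT64_MAX;
    return ip - base;
}
```

With the values from the trap above (ip 0x7f8145b74d40, mapping 0x7f8145ad7000+0xcf000) this gives 0x9dd40. Assuming the reported mapping starts at the library's load base, that offset is what one would look up in a disassembly of libvolk.so.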

I tried v3.5.2 and v3.5.2 built directly from git, using the build
script from here:

http://gnuradio.org/redmine/repositories/changes/gnuradio/README

Here’s my cpuinfo:

[user@rflab gnuradio]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core™ i5-2540M CPU @ 2.60GHz
stepping : 7
cpu MHz : 2591.660
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 1
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2
ss ht syscall nx lm constant_tsc nopl aperfmperf pni pclmulqdq ssse3
cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm ida arat epb pln
pts dts
bogomips : 5183.32
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

(and repeated 3 times for the other cores).

And, FWIW, this is the autoconfig snippet:

-- Configuring volk support...
-- Enabling volk support.
-- Override with -DENABLE_VOLK=ON/OFF
-- Boost version: 1.46.0
-- Found the following Boost libraries:
-- unit_test_framework
-- checking for module 'orc-0.4'
-- found orc-0.4, version 0.4.16
-- Found ORC: /usr/lib64/liborc-0.4.so
-- Check size of void*
-- Check size of void* - done
-- Performing Test have_maltivec
-- Performing Test have_maltivec - Failed
-- Performing Test have_mfpu=neon
-- Performing Test have_mfpu=neon - Failed
-- Performing Test have_mfloat-abi=softfp
-- Performing Test have_mfloat-abi=softfp - Failed
-- Performing Test have_funsafe-math-optimizations
-- Performing Test have_funsafe-math-optimizations - Success
-- 32 overruled
-- Performing Test have_m64
-- Performing Test have_m64 - Success
-- Performing Test have_m3dnow
-- Performing Test have_m3dnow - Success
-- Performing Test have_msse4.2
-- Performing Test have_msse4.2 - Success
-- Performing Test have_mpopcnt
-- Performing Test have_mpopcnt - Success
-- Performing Test have_mmmx
-- Performing Test have_mmmx - Success
-- Performing Test have_msse
-- Performing Test have_msse - Success
-- Performing Test have_msse2
-- Performing Test have_msse2 - Success
-- Performing Test have_lorc-0.4
-- Performing Test have_lorc-0.4 - Success
-- Performing Test have_msse3
-- Performing Test have_msse3 - Success
-- Performing Test have_mssse3
-- Performing Test have_mssse3 - Success
-- Performing Test have_msse4a
-- Performing Test have_msse4a - Success
-- Performing Test have_msse4.1
-- Performing Test have_msse4.1 - Success
-- Performing Test have_mavx
-- Performing Test have_mavx - Success
-- Available arches:
generic;64;3dnow;abm;popcount;mmx;sse;sse2;orc;sse3;ssse3;sse4_a;sse4_1;sse4_2;avx
-- Available machines:
generic;sse2_only;sse2_64;sse3_64;ssse3_64;sse4_a_64;sse4_1_64;sse4_2_64;avx_64;avx_only
-- Using install prefix: /usr/local
-- Found Doxygen: /usr/bin/doxygen

One more thing to note is that I’m running in a Xen PV VM, although this
should not matter, as usermode instructions execute directly on the
CPU in this mode.

Thanks,
joanna.

On 04/15/12 13:07, Joanna Rutkowska wrote:

I tried v3.5.2 and v3.5.2 build directly from git, using the building
script from here:

I meant: v3.5.2 and v3.5.3, of course.

On Sun, Apr 15, 2012 at 7:07 AM, Joanna Rutkowska
[email protected] wrote:

Can you try to build using cmake? We’ve had some issues with the
autotools scripts setting up the right Volk machines and being on a VM
might be confusing it.

Tom

On Sun, Apr 15, 2012 at 10:11 AM, Joanna Rutkowska
[email protected] wrote:

Perhaps you meant to not use cmake? Can you provide the specific build
instructions I should try?

Thanks,
joanna.

No, I definitely meant that you should use cmake. Above you had
mentioned the “autoconfig snippet,” so I thought you were using the
autotools build.

Does ‘make test’ pass? If not, can you run:

ctest -V -R volk

And provide the output.

Tom

On 04/15/12 15:28, Tom R. wrote:

Can you try to build using cmake? We’ve had some issues with the
autotools scripts setting up the right Volk machines and being on a VM
might be confusing it.

Hm… actually I’ve been using cmake already… Anyway, I tried to run
cmake only for the volk component manually:

[user@rflab gnuradio]$ cd volk/
[user@rflab volk]$ mkdir build
[user@rflab volk]$ cd build/
[user@rflab build]$ cmake -D GR_RUNTIME_DIR=bin …
/…/
[user@rflab build]$ make
/…/
[user@rflab build]$ apps/volk_profile
Using Volk machine: avx_64
RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_real_32f_a
Illegal instruction

Perhaps you meant to not use cmake? Can you provide the specific build
instructions I should try?

Thanks,
joanna.

On 04/15/12 16:29, Tom R. wrote:

No, I definitely meant that you should use cmake. Above you had
mentioned the “autoconfig snippet,” so I thought you were using the
autotools build.

Does ‘make test’ pass? If not, can you run:

ctest -V -R volk

And provide the output.

It fails:

[user@rflab volk]$ cd build/
[user@rflab build]$ make test
Running tests…
Test project /rw/home/user/gnuradio/gnuradio/volk/build
Start 1: qa_volk_test_all
1/1 Test #1: qa_volk_test_all …***Failed 0.01 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 0.03 sec

The following tests FAILED:
1 - qa_volk_test_all (Failed)
Errors while running CTest
make: *** [test] Error 8
[user@rflab build]$ ctest -V -R volk
UpdateCTestConfiguration from
:/rw/home/user/gnuradio/gnuradio/volk/build/DartConfiguration.tcl
UpdateCTestConfiguration from
:/rw/home/user/gnuradio/gnuradio/volk/build/DartConfiguration.tcl
Test project /rw/home/user/gnuradio/gnuradio/volk/build
Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph…
Checking test dependency graph end
test 1
Start 1: qa_volk_test_all

1: Test command: /rw/home/user/gnuradio/gnuradio/volk/build/lib/test_all
1: Test timeout computed to be: 9.99988e+06
1: Running 88 test cases…
1: Using Volk machine: avx_64
1: RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_real_32f_a
1: unknown location(0): fatal error in
“volk_16ic_s32f_deinterleave_real_32f_a_test”: signal: illegal operand;
address of failing instruction: 0x7fed0f681d40
1: /rw/home/user/gnuradio/gnuradio/volk/lib/testqa.cc(7): last
checkpoint
1:
1: *** 1 failure detected in test suite “Master Test Suite”
1/1 Test #1: qa_volk_test_all …***Failed 0.01 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 0.02 sec

The following tests FAILED:
1 - qa_volk_test_all (Failed)
Errors while running CTest

joanna.

On Sun, Apr 15, 2012 at 10:32 AM, Joanna Rutkowska
[email protected] wrote:

Unfortunately, that doesn’t narrow things down. How about running
volk_profile under gdb? Let’s see if we can find the instruction it’s
puking on.

I find it odd that in your original post, it looks like volk_profile
is running avx_64, but your /proc/cpuinfo doesn’t show that the
processor has AVX. This shouldn’t matter in this case since the
deinterleave kernel doesn’t have an AVX implementation, but I think
something is still getting confused, probably through Xen.

Tom

On Sun, Apr 15, 2012 at 10:50 AM, Joanna Rutkowska
[email protected] wrote:

What’s the recommended way to build volk with debug symbols using cmake?
Sorry, I don’t have much experience with cmake…

joanna.

You pass -DCMAKE_BUILD_TYPE="Debug" to cmake when configuring.
That sets the -g flag when building to get us the symbols out.

Tom

On 04/15/12 16:46, Tom R. wrote:

Unfortunately, that doesn’t narrow things down. How about running
volk_profile under gdb? Let’s see if we can find the instruction it’s
puking on.

I find it odd that in your original post, it looks like volk_profile
is running avx_64, but your /proc/cpuinfo doesn’t show that the
processor has AVX. This shouldn’t matter in this case since the
deinterleave kernel doesn’t have an AVX implementation, but I think
something is still getting confused, probably through Xen.

This is what gdb spits out (without debug symbols):

Program received signal SIGILL, Illegal instruction.
0x00007ffff7b4cd40 in volk_16ic_s32f_deinterleave_real_32f_a_sse4_1 ()
from /rw/home/user/gnuradio/gnuradio/volk/build/lib/libvolk.so.0.0.0
Missing separate debuginfos, use: debuginfo-install
boost-test-1.46.0-3.fc15.x86_64 glibc-2.14.1-6.x86_64
libgcc-4.6.3-2.fc15.x86_64 libstdc++-4.6.3-2.fc15.x86_64
orc-0.4.16-5.fc15.x86_64

What’s the recommended way to build volk with debug symbols using cmake?
Sorry, I don’t have much experience with cmake…

joanna.

On 04/15/12 16:52, Tom R. wrote:

What’s the recommended way to build volk with debug symbols using cmake?
Sorry, I don’t have much experience with cmake…

joanna.

You pass -DCMAKE_BUILD_TYPE="Debug" to cmake when configuring.
That sets the -g flag when building to get us the symbols out.

With dbg symbols:

Starting program:
/rw/home/user/gnuradio/gnuradio/volk/build/apps/volk_profile
[Thread debugging using libthread_db enabled]
Using Volk machine: avx_64
RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_real_32f_a

Program received signal SIGILL, Illegal instruction.
0x00007ffff7b289bd in volk_16ic_s32f_deinterleave_real_32f_a_sse4_1
(iBuffer=0x7053a0, complexVector=0x7ffff7e47020, scalar=4.59163468e-41,
num_points=4160742024) at
/rw/home/user/gnuradio/gnuradio/volk/include/volk/volk_16ic_s32f_deinterleave_real_32f_a.h:17
17 static inline void
volk_16ic_s32f_deinterleave_real_32f_a_sse4_1(float* iBuffer, const
lv_16sc_t* complexVector, const float scalar, unsigned int num_points){

j.

On 04/15/12 16:59, Joanna Rutkowska wrote:

(iBuffer=0x7053a0, complexVector=0x7ffff7e47020, scalar=4.59163468e-41,
num_points=4160742024) at

/rw/home/user/gnuradio/gnuradio/volk/include/volk/volk_16ic_s32f_deinterleave_real_32f_a.h:17

17 static inline void
volk_16ic_s32f_deinterleave_real_32f_a_sse4_1(float* iBuffer, const
lv_16sc_t* complexVector, const float scalar, unsigned int num_points){

Sorry, this didn’t paste into the last message:

Dump of assembler code for function
volk_16ic_s32f_deinterleave_real_32f_a_sse4_1:
0x00007ffff7b289a4 <+0>: push %rbp
0x00007ffff7b289a5 <+1>: mov %rsp,%rbp
0x00007ffff7b289a8 <+4>: sub $0x108,%rsp
0x00007ffff7b289af <+11>: mov %rdi,-0x158(%rbp)
0x00007ffff7b289b6 <+18>: mov %rsi,-0x160(%rbp)
=> 0x00007ffff7b289bd <+25>: vmovss %xmm0,-0x164(%rbp)

j.

On Sun, Apr 15, 2012 at 11:04 AM, Joanna Rutkowska
[email protected] wrote:

0x00007ffff7b289bd in volk_16ic_s32f_deinterleave_real_32f_a_sse4_1
Dump of assembler code for function
volk_16ic_s32f_deinterleave_real_32f_a_sse4_1:
0x00007ffff7b289a4 <+0>: push %rbp
0x00007ffff7b289a5 <+1>: mov %rsp,%rbp
0x00007ffff7b289a8 <+4>: sub $0x108,%rsp
0x00007ffff7b289af <+11>: mov %rdi,-0x158(%rbp)
0x00007ffff7b289b6 <+18>: mov %rsi,-0x160(%rbp)
=> 0x00007ffff7b289bd <+25>: vmovss %xmm0,-0x164(%rbp)

j.

Yes, so the vmovss is an AVX instruction (the AVX version of movss),
but your processor doesn’t have AVX according to your flags above.
Except that it does. According to Intel, the i5-2540M processor
supports AVX, but your OS isn’t recognizing the avx flag in
/proc/cpuinfo. The Volk build process asks the processor directly for
the flags that it can use.
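
To make the comparison concrete: the flags line posted at the top of the thread can be checked for a whole-word `avx` token. This is only a sketch; `has_cpu_flag` is a made-up helper, not something Volk or the kernel provides, and it assumes the flags are separated by whitespace as in /proc/cpuinfo.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if `name` appears as a whole
 * whitespace-delimited token in a /proc/cpuinfo style flags string.
 */
static bool has_cpu_flag(const char *flags, const char *name)
{
    size_t len = strlen(name);
    const char *p = flags;

    if (len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning or after whitespace... */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == '\n';
        /* ...and end at the end of the string or before whitespace. */
        char end = p[len];
        bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (starts && ends)
            return true;
        p += len;
    }
    return false;
}
```

Run against the flags quoted in the original post, this finds `sse4_2` but not `avx`, which is the mismatch being discussed here.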

I really think this is a problem with Xen (or at least something in the
setup).

Tom

On Sun, Apr 15, 2012 at 11:51 AM, Marcus D. Leech [email protected]
wrote:


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium
http://www.sbrac.org

No, you didn’t misunderstand, I misspoke. Josh and the other Volk
developers figured this out already. The system builds libraries for
all intrinsics that the compiler supports. It’s at run-time that the
correct machine is chosen from the list of what’s available and what
your system can support.

I’m not sure if I’m explaining that exactly or well, but the problem
you brought up was considered in the design to allow for us to do
exactly that with Volk. I think in this case, the OS and the actual
processor are at odds with what they can do, causing the problem.
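
A rough illustration of the run-time selection described above: every machine is compiled in, and the most capable one whose requirements are met is chosen when the program runs. This is not Volk's actual code. The feature bits, the `machines` table, and `pick_machine` are invented for the sketch; only the machine names are taken from the cmake output earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
enum { FEAT_SSE2 = 1 << 0, FEAT_SSE4_2 = 1 << 1, FEAT_AVX = 1 << 2 };

struct machine {
    const char *name;   /* machine name as printed by volk_profile */
    uint32_t required;  /* features this machine's kernels rely on */
};

/* All machines are built in; ordered from most to least capable. */
static const struct machine machines[] = {
    { "avx_64",    FEAT_SSE2 | FEAT_SSE4_2 | FEAT_AVX },
    { "sse4_2_64", FEAT_SSE2 | FEAT_SSE4_2 },
    { "sse2_64",   FEAT_SSE2 },
    { "generic",   0 },
};

/* Pick the first machine whose required features are all usable. */
static const struct machine *pick_machine(uint32_t usable)
{
    size_t n = sizeof machines / sizeof machines[0];

    for (size_t i = 0; i < n; i++)
        if ((machines[i].required & ~usable) == 0)
            return &machines[i];
    return &machines[n - 1]; /* "generic" requires nothing */
}
```

The failure in this thread corresponds to `usable` containing the AVX bit even though the OS never enabled AVX state, so the first entry is picked and its kernels fault.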

Tom

On 04/15/2012 11:45 AM, Tom R. wrote:

Tom

So, how is this going to play out with packaged binaries? If the
decisions about which instruction sets to use are made at compile time,
you could end up with packaged binaries that aren’t portable, and
will blow the heck up. Or am I misunderstanding what you mean
by “at build time”?


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Sun, Apr 15, 2012 at 2:26 PM, Joanna Rutkowska <
[email protected]> wrote:

Other potential explanations of why this doesn’t work that I could come
up with:

  1. Perhaps volk somehow erroneously interprets cpuid info and assumes
    that AVX is present, while it is not…? Tom, can you point out the
    specific code in volk that is responsible for deciding whether to use
    AVX or not?

Your CPU has AVX capability, no doubt about it. I agree with Tom that it’s
likely that Xen is disabling AVX support with XSETBV – I’m not sure why it
does that. Normal people do not disable extended instruction sets on new
processors. It’s just turning off silicon you paid for, after all. =)

Attached is a patch for Volk which performs the additional step of
verifying AVX with XGETBV to determine that the OS is not turning off
useful things. This doesn’t fix the fact that Xen is busted, it just won’t
run AVX instructions when the instructions are disabled.
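
For reference, the usual AVX check has three parts: CPUID leaf 1 must report AVX (ECX bit 28), it must report OSXSAVE (ECX bit 27), and XCR0 as read by XGETBV must have both the SSE and YMM state bits set. The sketch below is not the attached patch. `avx_usable` and the macro names are made up, and the function takes the register values as plain arguments so the logic can be read and tested without executing CPUID or XGETBV.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */
#define XCR0_SSE           (1ull << 1) /* OS saves XMM state */
#define XCR0_YMM           (1ull << 2) /* OS saves YMM state */

/*
 * Hypothetical helper. cpuid1_ecx is ECX from CPUID leaf 1; xcr0 is the
 * value XGETBV returns for XCR0. XGETBV may only be executed when OSXSAVE
 * is set, so a caller should pass 0 for xcr0 when that bit is clear.
 */
static bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))
        return false;  /* CPU has no AVX at all */
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;  /* XCR0 cannot be read, so assume disabled */
    /* Both XMM and YMM state must be enabled by the OS. */
    return (xcr0 & (XCR0_SSE | XCR0_YMM)) == (XCR0_SSE | XCR0_YMM);
}
```

On the VM discussed in this thread the first test would pass while the XCR0 test would fail, so a machine such as sse4_2_64 would be chosen instead of avx_64.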

Joanna, please test this patch for me and verify that your Volk machine
enumerates as sse4_2_64. Thanks!

Tom, the patch is available (based on latest master) at
github/bistromath:gnuradio.git on the xgetbv branch.

–n

On 04/15/12 17:45, Tom R. wrote:

Yes, so the vmovss is an AVX instruction (the AVX version of movss),
but your processor doesn’t have AVX according to your flags above.
Except that it does. According to Intel, the i5-2540M processor
supports AVX, but your OS isn’t recognizing the avx flag in
/proc/cpuinfo. The Volk build process asks the processor directly for
the flags that it can use.

I really think this is a problem with Xen (or at least something in the setup).

Assuming that the VM kernel is messing up the info that is exposed to
apps via /proc/cpuinfo (this might be likely, sure), and that volk uses
cpuid instruction to actually figure out whether AVX is supported, it
should still work fine – volk would just use AVX instruction and it
SHOULD work, because this is a ring3 instruction and Xen has no way to
intercept it or prevent its execution (this is true for both PV and HVM
guests – in case of HVMs there is no VMX intercept that would trigger
on AVX execution, at least I couldn’t find one in the SDM)…

So, why doesn’t it work? Is there any way one can configure a processor
(via MSR perhaps?) to disable AVX? Xen could be doing that, and
forgetting to remove the AVX flag from the cpuid info exposed to
guests…

Other potential explanations of why this doesn’t work that I could come
up with:

  1. Perhaps volk somehow erroneously interprets cpuid info and assumes
    that AVX is present, while it is not…? Tom, can you point out the
    specific code in volk that is responsible for deciding whether to use
    AVX or not?

  2. There is a compiler error with generating this opcode correctly
    (which would be, however, very strange, as the gdb displays this
    instruction fine…),

  3. My processor is buggy :o

Any other idea?

This is getting interesting :)

Thanks,
joanna.

On Sun, Apr 15, 2012 at 8:24 PM, Nick F. [email protected] wrote:

Attached is a patch with one further check – to make sure the check that
AVX is enabled by the OS, is enabled by the OS.

No kidding.

–n

Wonderful, Nick, thanks.

Joanna, let us know how this works for you. I’m going to be on the
road for the next few days, so I’ll be sparsely available.

Tom

Attached is a patch with one further check – to make sure the check that
AVX is enabled by the OS, is enabled by the OS.

No kidding.

–n

On 04/16/12 13:47, Tom R. wrote:

road for the next few days, so I’ll be sparsely available.

Yes, it works! I can now use the gr_add block to add two sin signals, how
cool! ;)

BTW, as Nick’s patch didn’t apply cleanly on v3.5.3, I pulled from git
and applied it on top of HEAD – please let me know if you think that
working with GR/GRC built from HEAD instead of from some v3.5.x tag
could get me into trouble.

Anyway, I looked into Xen sources and it seems like Xen does allow the
guest PV kernel to set bits 1 and 2 in the XCR0 register – here’s the
relevant code (I think):

case 0xd1: /* XSETBV */
{
    u64 new_xfeature = (u32)regs->eax | ((u64)regs->edx << 32);

    if ( lock || rep_prefix || opsize_prefix
         || !(v->arch.guest_context.ctrlreg[4] & X86_CR4_OSXSAVE) )
    {
        do_guest_trap(TRAP_invalid_op, regs, 0);
        goto skip;
    }

    if ( !guest_kernel_mode(v, regs) )
        goto fail;

    switch ( (u32)regs->ecx )
    {
        case XCR_XFEATURE_ENABLED_MASK:
            /* bit 0 of XCR0 must be set and reserved bit must not be set */
            if ( !(new_xfeature & XSTATE_FP) ||
                 (new_xfeature & ~xfeature_mask) )
                goto fail;

            v->arch.xcr0 = new_xfeature;
            v->arch.xcr0_accum |= new_xfeature;
            set_xcr0(new_xfeature);
            break;
        default:
            goto fail;
    }
    break;
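
As a reading aid, the XCR0 check in the excerpt above reduces to two conditions: bit 0 (x87 state) must be set, and no bit outside the host's `xfeature_mask` may be set. The stand-alone model below is only a sketch. `xsetbv_xcr0_allowed` is a made-up name, and the mask value 0x7 used in the example afterwards is an assumption rather than something read from the Xen sources.

```c
#include <stdbool.h>
#include <stdint.h>

#define XSTATE_FP 0x1 /* bit 0 of XCR0: x87 state, must always be set */

/*
 * Hypothetical model of the XCR0 validity check quoted from Xen:
 * the new value must keep bit 0 set and may not set any bit that is
 * missing from xfeature_mask.
 */
static bool xsetbv_xcr0_allowed(uint64_t new_xfeature, uint64_t xfeature_mask)
{
    if (!(new_xfeature & XSTATE_FP))
        return false;
    if (new_xfeature & ~xfeature_mask)
        return false;
    return true;
}
```

Under an assumed `xfeature_mask` of 0x7, a guest write of 0x7 (FP, SSE and YMM) passes this check, which matches the reading that Xen lets the guest kernel enable bits 1 and 2.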

So, it seems like it is not a Xen issue, but instead that the kernel I’m
using in the VM (essentially a vanilla 3.0.4) is not enabling AVX in
XCR0. It would be interesting if anybody could try this on a non-Xen
system with a similarly old kernel as I have (and on the AVX-capable
processor, of course).

Thanks,
joanna.

On 04/16/12 14:20, Joanna Rutkowska wrote:
/…/

So, it seems like it is not a Xen issue, but instead that the kernel
I’m using in the VM (essentially a vanilla 3.0.4) is not enabling AVX
in XCR0. It would be interesting if anybody could try this on a
non-Xen system with a similarly old kernel as I have (and on the
AVX-capable processor, of course).

Interestingly, I tried another kernel in the VM (essentially a vanilla
3.2.7), and again the AVX seemed to be disabled in the VM (volk used
sse4 again).

So, I quickly looked through the kernel sources and I think that the
kernel enables AVX in this code:

static void __init xstate_enable_boot_cpu(void)
{
    /…/
    cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
    pcntxt_mask = eax + ((u64)edx << 32);

    if ((pcntxt_mask & XSTATE_FPSSE) != XSTATE_FPSSE) {
        printk(KERN_ERR "FP/SSE not shown under xsave features 0x%llx\n",
               pcntxt_mask);
        BUG();
    }

    /*
     * Support only the state known to OS.
     */
    pcntxt_mask = pcntxt_mask & XCNTXT_MASK;

    xstate_enable();

The XSTATE_FPSSE and XCNTXT_MASK are defined as follows:

#define XSTATE_FP 0x1
#define XSTATE_SSE 0x2
#define XSTATE_YMM 0x4

#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
#define XCNTXT_MASK (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)

The YMM corresponds to the AVX enable flag (not sure why they call it
differently?).

Anyway, this shows that, assuming CPUID returns correct values (that
indicate AVX to be enabled), then the (guest) kernel should enable AVX
(and Xen should emulate this and allow for this, as indicated in the
previous message). And if CPUID was not returning correct values (i.e.
omitting the AVX flag), then we would not have this whole discussion.
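
The masking step in that kernel excerpt can be modelled in a few lines, which makes it easier to see what the guest kernel would do for a given CPUID result. This is a sketch only. `supported_xstate` and `would_enable_ymm` are made-up names, and the EAX/EDX inputs stand for whatever the `cpuid_count(XSTATE_CPUID, 0, ...)` call returns inside the VM, which this thread has not yet established.

```c
#include <stdbool.h>
#include <stdint.h>

#define XSTATE_FP    0x1
#define XSTATE_SSE   0x2
#define XSTATE_YMM   0x4

#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
#define XCNTXT_MASK  (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)

/* Combine EAX/EDX as the kernel does and keep only states it knows. */
static uint64_t supported_xstate(uint32_t eax, uint32_t edx)
{
    uint64_t pcntxt_mask = (uint64_t)eax + ((uint64_t)edx << 32);

    return pcntxt_mask & XCNTXT_MASK;
}

/* True if the resulting mask would ask for YMM (AVX) state. */
static bool would_enable_ymm(uint32_t eax, uint32_t edx)
{
    return (supported_xstate(eax, edx) & XSTATE_YMM) != 0;
}
```

If the guest sees EAX = 0x7 the mask keeps the YMM bit. If it sees EAX = 0x3 the YMM bit is never requested, and AVX stays disabled regardless of what the AVX bit in CPUID leaf 1 says.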

So, what am I missing?

Can you point out which kernels you use that have AVX working
fine?

joanna.