Using volk on Mac: test report

Hi all,

I would like to use the volk library in a C++ program that uses
gnuradio-core and currently builds under Linux and Mac OS X. On Mac OS X
10.6.8 (Snow Leopard, updated), I used MacPorts to install
gnuradio-core (version 3.3, which is enough for my app). Since, in
my understanding (please correct me if I’m wrong), volk is a library
that can live independently of the gnuradio version, I did the
following:

$ git clone git://gnuradio.org/gnuradio
$ cd gnuradio/volk
$ cmake .
$ make

[100%] Built target volk_profile
$ sudo make install

Then I ran the tests:

$ lib/test_all

All tests but one passed, and I see that for some functions the generic
architecture is the fastest one, which is beyond my understanding. The
test that failed is:


volk_32fc_32f_multiply_32fc_a: fail on arch sse
Best arch: sse
/Users/carlesfernandez/Documents/workspace/gnuradio/volk/lib/testqa.cc:25:
error in "volk_32fc_32f_multiply_32fc_a_test": check
run_volk_tests(volk_32fc_32f_multiply_32fc_a_get_func_desc(), (void
(*)())volk_32fc_32f_multiply_32fc_a_manual,
std::string("volk_32fc_32f_multiply_32fc_a"), 1e-4, 0, 20460, 1, 0) ==
0 failed [true != 0]
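
For readers unfamiliar with the kernel name: judging by its name, the
failing function multiplies a complex float vector by a real float vector,
element by element, and the 1e-4 in the check above looks like a comparison
tolerance. The sketch below is only an illustration of that idea and is not
volk's code: multiply_generic and count_mismatches are hypothetical names,
and the absolute-difference comparison is an assumption about how such a
check might work.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference kernel: out[i] = in[i] * taps[i]
// (complex sample times real tap). Not volk's implementation.
std::vector<std::complex<float> > multiply_generic(
    const std::vector<std::complex<float> >& in,
    const std::vector<float>& taps)
{
    assert(in.size() == taps.size());
    std::vector<std::complex<float> > out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * taps[i];
    }
    return out;
}

// Hypothetical checker: counts samples whose real or imaginary part
// differs by more than tol (absolute difference, an assumption).
std::size_t count_mismatches(const std::vector<std::complex<float> >& a,
                             const std::vector<std::complex<float> >& b,
                             float tol)
{
    std::size_t bad = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        if (std::fabs(a[i].real() - b[i].real()) > tol ||
            std::fabs(a[i].imag() - b[i].imag()) > tol) {
            ++bad;
        }
    }
    return bad;
}
```

Comparing the output of an optimized implementation against such a
reference with count_mismatches(reference, optimized, 1e-4f) would flag the
kind of per-offset disagreement that the full log lists for this kernel.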

I’m quite happy because I see dramatic improvements in some functions
of interest to me (basically I want to implement correlators and mixers,
so I’m sensitive to precisely this function, bad luck), but this
“generic” superiority in some cases intrigues me. I would appreciate
it if anyone could shed some light on the internals of volk, or tell me
whether I have to configure or install something else. Anyway, thanks to
the developers for releasing such interesting stuff :)

This is the complete output, for the record:

volk carlesfernandez$ cmake .
-- The C compiler identification is GNU
-- The CXX compiler identification is GNU
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc
-- Check for working C compiler: /usr/local/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found PythonInterp: /opt/local/bin/python (found version "2.6.7")
-- Boost version: 1.48.0
-- Found the following Boost libraries:
-- unit_test_framework
-- checking for module 'orc-0.4'
-- package 'orc-0.4' not found
-- orc files (missing: ORC_LIBRARY ORC_INCLUDE_DIR ORCC_EXECUTABLE)
-- Check size of void*
-- Check size of void* - done
-- Performing Test have_maltivec
-- Performing Test have_maltivec - Failed
-- Performing Test have_mfpu=neon
-- Performing Test have_mfpu=neon - Failed
-- Performing Test have_mfloat-abi=softfp
-- Performing Test have_mfloat-abi=softfp - Failed
-- Performing Test have_funsafe-math-optimizations
-- Performing Test have_funsafe-math-optimizations - Success
-- 32 overruled
-- Performing Test have_m64
-- Performing Test have_m64 - Success
-- Performing Test have_m3dnow
-- Performing Test have_m3dnow - Success
-- Performing Test have_msse4.2
-- Performing Test have_msse4.2 - Success
-- Performing Test have_mpopcnt
-- Performing Test have_mpopcnt - Failed
-- Performing Test have_mmmx
-- Performing Test have_mmmx - Success
-- Performing Test have_msse
-- Performing Test have_msse - Success
-- Performing Test have_msse2
-- Performing Test have_msse2 - Success
-- orc overruled
-- Performing Test have_msse3
-- Performing Test have_msse3 - Success
-- Performing Test have_mssse3
-- Performing Test have_mssse3 - Success
-- Performing Test have_msse4a
-- Performing Test have_msse4a - Success
-- Performing Test have_msse4.1
-- Performing Test have_msse4.1 - Success
-- Performing Test have_mavx
-- Performing Test have_mavx - Failed
-- Available arches:
generic;64;3dnow;abm;mmx;sse;sse2;sse3;ssse3;sse4_a;sse4_1;sse4_2
-- Available machines:
generic;sse2_only;sse2_64;sse3_64;ssse3_64;sse4_1_64
-- Did not find liborc and orcc, disabling orc support...
-- Using install prefix: /usr/local
-- Configuring done
-- Generating done

Tests output:

Running 77 test cases...
Using Volk machine: sse4_1_64
RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_real_32f_a
sse4_1 completed in 1.5e-05s
sse completed in 5.5e-05s
generic completed in 1.4e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16ic_deinterleave_real_8i_a
ssse3 completed in 7e-06s
generic completed in 8e-06s
Best arch: ssse3
RUN_VOLK_TESTS: volk_16ic_deinterleave_16i_x2_a
ssse3 completed in 1.7e-05s
sse2 completed in 1.1e-05s
generic completed in 2.1e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_32f_x2_a
sse completed in 7.4e-05s
generic completed in 2.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16ic_deinterleave_real_16i_a
ssse3 completed in 6e-06s
sse2 completed in 8e-06s
generic completed in 9e-06s
Best arch: ssse3
RUN_VOLK_TESTS: volk_16ic_magnitude_16i_a
sse3 completed in 0.000132s
sse completed in 0.00015s
generic completed in 0.000218s
Best arch: sse3
RUN_VOLK_TESTS: volk_16ic_s32f_magnitude_32f_a
sse3 completed in 0.000113s
sse completed in 0.000107s
generic completed in 2.7e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16i_s32f_convert_32f_a
sse4_1 completed in 1.2e-05s
sse completed in 2e-05s
generic completed in 1.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16i_s32f_convert_32f_u
sse4_1 completed in 1.2e-05s
sse completed in 2.1e-05s
generic completed in 1.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_16i_convert_8i_a
sse2 completed in 4e-06s
generic completed in 6e-06s
Best arch: sse2
RUN_VOLK_TESTS: volk_16i_convert_8i_u
sse2 completed in 6e-06s
generic completed in 6e-06s
Best arch: sse2
RUN_VOLK_TESTS: volk_16u_byteswap_a
sse2 completed in 6e-06s
generic completed in 1.5e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_accumulator_s32f_a
sse completed in 2.5e-05s
generic completed in 2.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_x2_add_32f_a
sse completed in 1.9e-05s
generic completed in 2.4e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a
sse completed in 5.5e-05s
generic completed in 7.2e-05s
offset 4 in1: 0.387495 in2: 0.103868
offset 6 in1: 0.201248 in2: -0.203787
offset 8 in1: 0.549574 in2: 0.499452
offset 12 in1: 0.00829957 in2: 0.00535752
offset 14 in1: 0.139478 in2: 0.0225341
offset 23 in1: 0.440276 in2: 0.620457
offset 24 in1: 0.103921 in2: 0.238003
offset 25 in1: 0.126775 in2: 0.290342
offset 29 in1: 0.135211 in2: -0.115313
offset 30 in1: 0.375913 in2: 0.478058
volk_32fc_32f_multiply_32fc_a: fail on arch sse
Best arch: sse
/Users/carlesfernandez/Documents/workspace/gnuradio/volk/lib/testqa.cc:25:
error in "volk_32fc_32f_multiply_32fc_a_test": check
run_volk_tests(volk_32fc_32f_multiply_32fc_a_get_func_desc(), (void
(*)())volk_32fc_32f_multiply_32fc_a_manual,
std::string("volk_32fc_32f_multiply_32fc_a"), 1e-4, 0, 20460, 1, 0) ==
0 failed [true != 0]
RUN_VOLK_TESTS: volk_32fc_s32f_power_32fc_a
sse completed in 0.000989s
generic completed in 0.000985s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_calc_spectral_noise_floor_32f_a
sse completed in 1.8e-05s
generic completed in 4.2e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_s32f_atan2_32f_a
sse4_1 completed in 0.000503s
sse completed in 0.000503s
generic completed in 0.000503s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_32fc_x2_conjugate_dot_prod_32fc_u
generic completed in 1.6e-05s
sse3 completed in 1.5e-05s
Best arch: sse3
RUN_VOLK_TESTS: volk_32fc_deinterleave_32f_x2_a
sse completed in 1.8e-05s
generic completed in 2.3e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_deinterleave_64f_x2_a
sse2 completed in 4.4e-05s
generic completed in 3.8e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32fc_s32f_deinterleave_real_16i_a
sse completed in 2.7e-05s
generic completed in 2e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32fc_deinterleave_real_32f_a
sse completed in 1.1e-05s
generic completed in 1.5e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_deinterleave_real_64f_a
sse2 completed in 1.5e-05s
generic completed in 1.9e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32fc_x2_dot_prod_32fc_a
generic completed in 8.8e-05s
sse_64 completed in 2e-05s
sse3 completed in 2.5e-05s
sse4_1 completed in 2.6e-05s
Best arch: sse_64
RUN_VOLK_TESTS: volk_32fc_index_max_16u_a
sse3 completed in 5e-06s
generic completed in 1e-05s
Best arch: sse3
RUN_VOLK_TESTS: volk_32fc_s32f_magnitude_16i_a
sse3 completed in 3.3e-05s
sse completed in 3.1e-05s
generic completed in 8.1e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_magnitude_32f_a
sse3 completed in 2.2e-05s
sse completed in 2.1e-05s
generic completed in 2.2e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32fc_x2_multiply_32fc_a
sse3 completed in 2.4e-05s
generic completed in 0.000201s
Best arch: sse3
RUN_VOLK_TESTS: volk_32f_s32f_convert_16i_a
sse2 completed in 7e-06s
sse completed in 2.3e-05s
generic completed in 1.9e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_s32f_convert_16i_u
sse2 completed in 1e-05s
sse completed in 2.3e-05s
generic completed in 1.8e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_s32f_convert_32i_a
sse2 completed in 8e-06s
sse completed in 2e-05s
generic completed in 1.4e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_s32f_convert_32i_u
sse2 completed in 1.5e-05s
sse completed in 2.3e-05s
generic completed in 1.5e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_convert_64f_a
sse2 completed in 1.4e-05s
generic completed in 1.6e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_convert_64f_u
sse2 completed in 2.1e-05s
generic completed in 1.6e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_convert_8i_a
sse2 completed in 7e-06s
sse completed in 2.1e-05s
generic completed in 2e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_s32f_convert_8i_u
sse2 completed in 9e-06s
sse completed in 2.5e-05s
generic completed in 2e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32fc_s32f_power_spectrum_32f_a
sse3 completed in 1.8e-05s
generic completed in 1.5e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32fc_x2_square_dist_32f_a
sse3 completed in 3e-06s
generic completed in 4e-06s
Best arch: sse3
RUN_VOLK_TESTS: volk_32fc_x2_s32f_square_dist_scalar_mult_32f_a
sse3 completed in 6e-06s
generic completed in 6e-06s
Best arch: sse3
RUN_VOLK_TESTS: volk_32f_x2_divide_32f_a
sse completed in 2.3e-05s
generic completed in 2.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_x2_dot_prod_32f_a
generic completed in 0.000351s
sse completed in 0.000112s
sse3 completed in 0.000121s
sse4_1 completed in 7.5e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_32f_x2_dot_prod_32f_u
generic completed in 0.000942s
sse completed in 0.000477s
sse3 completed in 0.000267s
sse4_1 completed in 0.000395s
Best arch: sse3
RUN_VOLK_TESTS: volk_32f_index_max_16u_a
sse4_1 completed in 1.6e-05s
sse completed in 2e-05s
generic completed in 7e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_32f_x2_s32f_interleave_16ic_a
sse2 completed in 1.2e-05s
sse completed in 3.6e-05s
generic completed in 2.7e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32f_x2_interleave_32fc_a
sse completed in 1.4e-05s
generic completed in 1.9e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x2_max_32f_a
sse completed in 1.1e-05s
generic completed in 1.8e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x2_min_32f_a
sse completed in 1.8e-05s
generic completed in 2e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x2_multiply_32f_a
sse completed in 1.4e-05s
generic completed in 1.3e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_normalize_a
sse completed in 6e-06s
generic completed in 5e-06s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_power_32f_a
sse4_1 completed in 0.000523s
sse completed in 0.000521s
generic completed in 0.000521s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_sqrt_32f_a
sse completed in 2.5e-05s
generic completed in 2.1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32f_s32f_stddev_32f_a
sse4_1 completed in 8e-06s
sse completed in 6e-06s
generic completed in 2.2e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_stddev_and_mean_32f_x2_a
sse4_1 completed in 9e-06s
sse completed in 6e-06s
generic completed in 2.1e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x2_subtract_32f_a
sse completed in 1.2e-05s
generic completed in 1.3e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32f_x3_sum_of_poly_32f_a
sse3 completed in 6e-06s
generic completed in 1.7e-05s
Best arch: sse3
RUN_VOLK_TESTS: volk_32i_x2_and_32i_a
sse completed in 1.2e-05s
generic completed in 1.4e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32i_s32f_convert_32f_a
sse2 completed in 7e-06s
generic completed in 1e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_32i_s32f_convert_32f_u
sse2 completed in 1.1e-05s
generic completed in 1e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_32i_x2_or_32i_a
sse completed in 1.2e-05s
generic completed in 1.4e-05s
Best arch: sse
RUN_VOLK_TESTS: volk_32u_byteswap_a
sse2 completed in 1.3e-05s
generic completed in 2.2e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_64f_convert_32f_a
sse2 completed in 1.1e-05s
generic completed in 1.5e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_64f_convert_32f_u
sse2 completed in 1.9e-05s
generic completed in 1.6e-05s
Best arch: generic
RUN_VOLK_TESTS: volk_64f_x2_max_64f_a
sse2 completed in 2.4e-05s
generic completed in 2.7e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_64f_x2_min_64f_a
sse2 completed in 2.2e-05s
generic completed in 2.5e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_64u_byteswap_a
sse2 completed in 2.7e-05s
generic completed in 2.9e-05s
Best arch: sse2
RUN_VOLK_TESTS: volk_8ic_deinterleave_16i_x2_a
sse4_1 completed in 9e-06s
generic completed in 0.000114s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_s32f_deinterleave_32f_x2_a
sse4_1 completed in 1.4e-05s
sse completed in 7.2e-05s
generic completed in 9.5e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_deinterleave_real_16i_a
sse4_1 completed in 5e-06s
generic completed in 3e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_s32f_deinterleave_real_32f_a
sse4_1 completed in 8e-06s
sse completed in 5.3e-05s
generic completed in 4.8e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_deinterleave_real_8i_a
ssse3 completed in 5e-06s
generic completed in 5e-06s
Best arch: ssse3
RUN_VOLK_TESTS: volk_8ic_x2_multiply_conjugate_16ic_a
sse4_1 completed in 1.9e-05s
generic completed in 0.000318s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8ic_x2_s32f_multiply_conjugate_32fc_a
sse4_1 completed in 2.2e-05s
generic completed in 0.000356s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8i_convert_16i_a
sse4_1 completed in 5e-06s
generic completed in 3.3e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8i_convert_16i_u
sse4_1 completed in 6e-06s
generic completed in 3.3e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8i_s32f_convert_32f_a
sse4_1 completed in 7e-06s
generic completed in 4.8e-05s
Best arch: sse4_1
RUN_VOLK_TESTS: volk_8i_s32f_convert_32f_u
sse4_1 completed in 1.3e-05s
generic completed in 4.9e-05s
Best arch: sse4_1

*** 1 failure detected in test suite "Master Test Suite"

Best regards,
Carles

On Fri, Feb 17, 2012 at 8:14 AM, Tom R. [email protected] wrote:

the Volk kernel. But more likely, it’s simply because the operation being
performed is so trivial that it doesn’t really matter.

Another reason could be that the tests aren’t long enough to avoid
OS-level variances while completing a test. The tests use the clock()
function for calculating the time difference, which is only the approximate
time of the process. It might mean that we need to run the tests for a bit
longer to see if that makes any difference. I have noticed that some of the
tests where generic wins, it only wins by a very, very small amount of time.

Please ignore the “best arch” reports during the QA code execution; they
are very often wrong. The “best arch” report is intended for
volk_profile, which reuses the same test code with much larger datasets
for better execution time resolution, as Tom suggested. The QA code is
only intended to show that Volk is working and to find functions which are
executing incorrectly. Use volk_profile to benchmark Volk functions; it
will create a custom profile for your machine.

One caveat: the dataset size on E100/NEON is large enough that the
profiler might run for several hours, so either recompile with smaller
datasets or avoid the profiler... eventually I guess I’ll make the
benchmark program benchmark itself to set appropriate dataset sizes.

–n

Thanks for the inputs!

We are interested in determining the best architecture at instantiation
time. What would be the best strategy? We thought about running the
same operations several times for each architecture, measuring the
results, and using the fastest one for the processing blocks. Would this
be the right approach?

Best regards,
Carles.

Carles,

Run volk_profile. It does exactly what you said, and writes the results to
~/.volk/volk_config. Volk reads this file when it is involked (sorry) to
determine which particular function to execute. So all you do is run
volk_profile once on any given machine, and it’s optimized.

–n
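
For readers curious about the strategy Carles describes (time each implementation, keep the fastest), here is a minimal C++ sketch of the idea. It is not how volk_profile is implemented: the `candidate` struct, `pick_fastest`, and the two toy kernels are hypothetical names made up for illustration, and the result is returned as a string rather than written to any config file.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// Signature shared by all candidate implementations in this sketch.
using kernel_fn = void (*)(float* out, const float* in, std::size_t n);

struct candidate {
    std::string name;  // hypothetical label, e.g. "generic"
    kernel_fn fn;
};

// Two hypothetical implementations of the same operation (out = 2 * in).
void double_generic(float* out, const float* in, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = 2.0f * in[i];
}

void double_unrolled(float* out, const float* in, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i] = 2.0f * in[i];
        out[i + 1] = 2.0f * in[i + 1];
        out[i + 2] = 2.0f * in[i + 2];
        out[i + 3] = 2.0f * in[i + 3];
    }
    for (; i < n; ++i) out[i] = 2.0f * in[i];  // remaining tail samples
}

// Run each candidate `iters` times on the same input and return the
// name of the one that used the least CPU time according to clock().
std::string pick_fastest(const std::vector<candidate>& cands,
                         std::size_t n, unsigned iters) {
    assert(!cands.empty() && n > 0 && iters > 0);
    std::vector<float> in(n, 1.0f), out(n, 0.0f);
    std::string best = cands.front().name;
    std::clock_t best_ticks = 0;
    bool first = true;
    for (const candidate& c : cands) {
        const std::clock_t start = std::clock();
        for (unsigned i = 0; i < iters; ++i) c.fn(out.data(), in.data(), n);
        const std::clock_t ticks = std::clock() - start;
        if (first || ticks < best_ticks) {
            best = c.name;
            best_ticks = ticks;
            first = false;
        }
    }
    return best;
}
```

With small `n` or few iterations the winner can change from run to run, which matches the advice in this thread to rely on the profiler's larger datasets rather than the QA timings.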

Carles,

Thanks for the report! We’ll look into those failures. Hopefully just some
minor misunderstanding.

As for the generic sometimes being the best arch, I’m not sure I can help
too much on it. I can certainly speculate. Having seen this in my own
machines and looked at some of the kernels where generic wins out (which
have some overlap with yours), I think it’s something about the operation
being performed. First, we might be able to do something a bit smarter in
the Volk kernel. But more likely, it’s simply because the operation being
performed is so trivial that it doesn’t really matter.

Another reason could be that the tests aren’t long enough to avoid
OS-level variances while completing a test. The tests use the clock()
function for calculating the time difference, which is only the approximate
time of the process. It might mean that we need to run the tests for a bit
longer to see if that makes any difference. I have noticed that in some of
the tests where generic wins, it only wins by a very, very small amount of
time.

Tom

On Tue, Jan 17, 2012 at 3:26 PM, Carles Fernandez <

Great!

You guys are making all this stuff pretty easy to use, even for
non-experts. Thanks for letting us squeeze our processors :slight_smile:

Carles

Carles,
This is discussed on the webpage:
http://gnuradio.org/redmine/projects/gnuradio/wiki/volk

We’ll be updating this as things progress with Volk, but the profiler
info is there already.

Tom

I built Tom’s safe_align branch on E100 and ran volk_profile. It
segfaulted on RUN_VOLK_TESTS:volk_32fc_s32fc_multiply_32fc_a. I’ll get
a stack trace for you.

Sean

Really interesting that it’s the same block. Hopefully, it’s a single,
simple fix. I’ll look into it when you can get me the stack trace.

Thanks for reporting!
Tom

Don’t know how helpful these are, but here you go.

Sean

Sean,
It looks like a couple of functions are failing from the stdout:

volk_32fc_s32f_magnitude_16i_a: fail on arch orc
volk_32fc_x2_multiply_32fc_a: fail on arch orc

These are both the Orc implementations of the functions, which seem to
work fine on my Intel processors. I don’t have access to an OSX box or
an E100, so I can’t really test this out. The files you sent me don’t
(appear to) tell me what the real problem is.

We’ll need some other brave soul out there who can dig into these issues
on the platforms for us.

Thanks,
Tom

Those are functions I wrote, so they’re my problem. =) I’ll hack on it
this week. What’s strange is I absolutely validated them on Orc before
committing them… Sean, what version of Orc are you running on your
E100?

–n

I believe I’m using 0.4.16. It’s the version packaged in the e1xx-002
official image.

Sean

Tom, Sean,

There’s a couple of things here. First, the Orc
volk_32fc_s32f_magnitude_16i_a function is rounding differently than the
generic versions on E100 for some reason. Not fatal, totally usable, but
it makes the QA code fail. Second, the volk_32fc_x2_multiply_32fc_a
looks like it’s working fine but the thresholds are too close in the
comparison function, which is strange because it uses the same threshold
I use everywhere else. I’ll keep looking into that. In any case, they’re
fine for use in Volk as-is.

I think the segfault in volk_32fc_s32fc_multiply_32fc_a is being caused
by a bug in the profiler code as well. It’s not correctly handling
complex scalars. The function itself doesn’t actually work either, which
doesn’t help, but it wasn’t caught because the profiler code was
buggy…

Tom, I pushed a fix to my github under “volk_fix”. For now I’ve disabled
volk_32fc_s32fc_multiply_32fc_a since I can’t figure out a clean way to
get it to work under Orc; I had a misunderstanding of how float
parameters are handled inside array operations. I also added complex
scalar handling. I’ll keep looking into solving this one for real but
this will get things working for now.

–n
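
As a point of reference for the complex-scalar case, here is a minimal C++ sketch of the operation that the kernel name volk_32fc_s32fc_multiply_32fc_a appears to describe: a vector of 32-bit float complex samples multiplied by one complex scalar. Reading the name that way is an assumption, and `multiply_by_scalar` is a hypothetical helper, not a Volk function. A plain loop like this is the kind of reference an optimized version would be compared against.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Multiply every complex sample in `in` by the complex scalar `s`.
// Hypothetical reference helper for illustration only.
std::vector<std::complex<float>> multiply_by_scalar(
    const std::vector<std::complex<float>>& in, std::complex<float> s) {
    std::vector<std::complex<float>> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * s;  // full complex product, real and imaginary parts
    }
    return out;
}
```

Note that the scalar carries two floats (real and imaginary), so test code that passes it as a single float would hand the kernel the wrong data.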

Nick,
Thanks a ton for working on this. I’ll merge your branch asap.

Tom

Hi Nick -

Sorry, just did ‘opkg list-installed | grep orc’ and got:

liborc-0.4-0 - 0.4.11-r0.9
liborc-test-0.4-0 - 0.4.11-r0.9

Is this the version you expect for e1xx-002?

Thanks,
Sean

I confirmed this works on E100 insofar as I no longer get a segfault on
volk_32fc_s32fc_multiply_32fc_a. But volk_32fc_s32f_magnitude_16i_a and
volk_32fc_x2_multiply_32fc_a still fail as expected.

Sean

Sean,
I just merged Nick’s branch into my safe_align branch. Can you check
that one out and test when you get a chance? I just want to make sure
we’re all on the same branch here. And please post the output of ‘ctest
-V -R volk’.

Thanks!
Tom

Hi Tom,

I tested with your merged branch. No segfault and same tests fail as
expected.

I noticed several weird numbers in the orc results. Some of them
correspond to the failed cases. Do you know what is causing these?
Printf formatting issue? Hitting bounds of float type and wrapping
around? Relevant output:

RUN_VOLK_TESTS: volk_16ic_deinterleave_16i_x2_a
generic completed in 42.47s
orc completed in 3.10883e-39s
Best arch: orc

RUN_VOLK_TESTS: volk_16ic_s32f_deinterleave_32f_x2_a
generic completed in 17.28s
orc completed in -3.90577e+11s
Best arch: orc

RUN_VOLK_TESTS: volk_32fc_s32f_magnitude_16i_a
generic completed in 4.37s
orc completed in 1.35136e+09s
offset 1107 in1: 29281 in2: 29282
offset 1187 in1: -27601 in2: -27600
offset 1522 in1: -31248 in2: -31249
offset 2396 in1: 26146 in2: 26145
offset 2486 in1: 25394 in2: 25393
offset 4084 in1: 16452 in2: 16451
offset 5052 in1: 28692 in2: 28691
offset 5296 in1: 30869 in2: 30868
offset 5467 in1: -32706 in2: -32705
offset 6388 in1: 19556 in2: 19557
volk_32fc_s32f_magnitude_16i_a: fail on arch orc
Best arch: generic

RUN_VOLK_TESTS: volk_32fc_magnitude_32f_a
generic completed in 35.24s
orc completed in 6.93125e+10s
Best arch: generic

RUN_VOLK_TESTS: volk_32fc_x2_multiply_32fc_a
generic completed in 52.66s
orc completed in -3.4978e+12s
offset 3 in1: 0.382086 in2: 0.382086
offset 4 in1: 0.496706 in2: 0.496706
offset 8 in1: 0.170967 in2: 0.170967
offset 10 in1: 0.165878 in2: 0.165878
offset 14 in1: 0.398192 in2: 0.398192
offset 15 in1: 0.492358 in2: 0.492358
offset 17 in1: 0.568251 in2: 0.568251
offset 19 in1: 0.0630723 in2: 0.0630723
offset 20 in1: 0.251459 in2: 0.251459
offset 22 in1: 0.348539 in2: 0.348539
volk_32fc_x2_multiply_32fc_a: fail on arch orc
offset 0 in1: 0.140486 in2: 0.140486
offset 1 in1: 0.691375 in2: 0.691375
offset 5 in1: 0.63745 in2: 0.63745
offset 11 in1: 0.644697 in2: 0.644697
offset 14 in1: 0.858205 in2: 0.858205
offset 15 in1: 0.94011 in2: 0.94011
offset 16 in1: 0.490713 in2: 0.490713
offset 18 in1: 0.190573 in2: 0.190573
offset 19 in1: 0.0226408 in2: 0.0226408
offset 20 in1: 0.895774 in2: 0.895774
volk_32fc_x2_multiply_32fc_a: fail on arch orc
offset 1 in1: 0.524585 in2: 0.524585
offset 2 in1: 0.236218 in2: 0.236218
offset 6 in1: 0.733853 in2: 0.733853
offset 9 in1: 0.290247 in2: 0.290247
offset 11 in1: 0.529422 in2: 0.529422
offset 12 in1: 0.180218 in2: 0.180218
offset 14 in1: 0.496568 in2: 0.496568
offset 15 in1: 0.0297472 in2: 0.0297472
offset 19 in1: 0.351138 in2: 0.351138
offset 20 in1: 0.300737 in2: 0.300737
volk_32fc_x2_multiply_32fc_a: fail on arch orc
Best arch: generic

Sean
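
The off-by-one pairs listed for volk_32fc_s32f_magnitude_16i_a are consistent with the rounding difference Nick described earlier in the thread. The following C++ sketch shows how a comparison with an explicit tolerance treats such results. It is not Volk's comparison function: `count_int_mismatches` and `count_float_mismatches` are hypothetical helpers, and the mixed absolute/relative float criterion is one possible choice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Count positions where two 16-bit results differ by more than `max_lsb`.
std::size_t count_int_mismatches(const std::vector<std::int16_t>& a,
                                 const std::vector<std::int16_t>& b,
                                 int max_lsb) {
    assert(a.size() == b.size());
    std::size_t bad = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int diff = std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
        if (diff > max_lsb) ++bad;
    }
    return bad;
}

// Count positions where |ref - test| exceeds tol * (1 + |ref|), a mixed
// absolute/relative criterion that stays meaningful for values near zero.
std::size_t count_float_mismatches(const std::vector<float>& ref,
                                   const std::vector<float>& test,
                                   float tol) {
    assert(ref.size() == test.size());
    std::size_t bad = 0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const float diff = std::fabs(ref[i] - test[i]);
        if (diff > tol * (1.0f + std::fabs(ref[i]))) ++bad;
    }
    return bad;
}
```

Using the first three in1/in2 pairs from the magnitude output (29281/29282, -27601/-27600, -31248/-31249), an exact comparison (`max_lsb` of 0) counts three mismatches, while allowing one least-significant bit counts none.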
