# Using volk

Hello all,

I wondered about VOLK. I want to use it to compute the mean-to-peak value
of a complex array. How could I do this?
Besides, is there any example of using VOLK? The code itself doesn’t
reflect input and output parameters explicitly.

Best,
Mostafa

Hi Mostafa,

VOLK is but an accelerated Library of Vector Optimized Kernels.
What you want is basically three operations:
a) finding maximum absolute
b) finding average absolute
c) dividing these two values

Now, looking closer at a) and b), one notices that both require the
samples to be converted to their magnitudes, first. And because we’re in
the business of optimizing things, let’s just use the squared magnitude,
because that’s faster to compute by one sqrt, usually. So this boils
down to
a) take mag_squared of input (length N)
b1) find maximum of a)
b2) find sum of a)
c) sqrt((b2/N) / b1), i.e. the square root of mean power over peak power
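
The steps above can be sketched in plain C as a scalar reference, without VOLK. This is only an illustration under stated assumptions: `mean_to_peak_power` is a hypothetical helper name, the input is assumed to be interleaved I/Q floats (the memory layout of a complex float array), and "mean to peak" is taken to be mean power divided by peak power. Taking the square root of the result gives the corresponding amplitude ratio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a VOLK or GNU Radio function).
 * iq holds n complex samples as interleaved floats: re0, im0, re1, im1, ...
 * Returns (sum of |x|^2 / n) / max |x|^2, or 0 for an all-zero input. */
static float mean_to_peak_power(const float *iq, size_t n)
{
    assert(iq != NULL && n > 0);

    float peak = 0.0f; /* b1: largest squared magnitude seen so far */
    float sum = 0.0f;  /* b2: running sum of squared magnitudes */

    for (size_t i = 0; i < n; ++i) {
        float re = iq[2 * i];
        float im = iq[2 * i + 1];
        float mag_sq = re * re + im * im; /* a: squared magnitude */
        if (mag_sq > peak)
            peak = mag_sq;
        sum += mag_sq;
    }

    if (peak == 0.0f)
        return 0.0f; /* ratio is undefined for silence; report 0 */

    return sum / ((float)n * peak); /* c: scalar step */
}
```

A single pass like this is a useful baseline to check any vectorized version against.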

As you can see, c) is not a vector operation, and thus not a case for
volk.
For a) (“Complex to Mag ^ 2”) there is a GNU Radio block that uses VOLK.
That’s the example for using VOLK that I would have recommended to read,
anyway.

In other terms: If you don’t have to write your own highly optimized
block, don’t use VOLK directly, use the standard GNU Radio blockset.
It’s rather optimized.

Now, for the maximum search b1, things are a bit more complicated.
Searching for a maximum is not easily vectorizable, because it is an
inherently sequential operation (think of it as the first step of a
bubble sort).
Now, you can achieve awesome performance by basically turning your
linear search into an N-ary tree, with N being the order of parallelism
you can achieve by using a maximum-finding SIMD instruction. But that
requires the size of the problem to be a power of N. That just doesn’t
fly well with the usually more “multiple of 64 bit”-typey alignment
restrictions.
You’re, however, highly encouraged to try just that: use the existing
volk_32f_x2_max_32f, which compares two vectors, and stores the
element-wise maximum in a third one, to compare the first with the
second half of your mag_squared vector, and repeat the same with the
first and second half of the result (and so on) until you have a single
maximum value. That’s the comparison tree from above for the N=2 case.
You can employ clever overlapping to use as many values twice in the
input to virtually extend your input’s length to a power of two, and
then just waltz on.
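
A minimal sketch of that halving scheme, in plain C so it runs without VOLK. `elementwise_max` is a scalar stand-in meant to mirror the argument order of volk_32f_x2_max_32f (output, two inputs, count), and `tree_max` is a hypothetical helper name. For odd lengths the two halves overlap by one element, which is one simple way to do the "overlapping" trick; whether the real kernel tolerates the in-place, overlapping buffers used here is an assumption that would need checking.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar stand-in for an element-wise maximum kernel:
 * out[i] = max(a[i], b[i]) for i in [0, n). */
static void elementwise_max(float *out, const float *a, const float *b,
                            size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (a[i] > b[i]) ? a[i] : b[i];
}

/* Hypothetical helper: reduces buf in place (its contents are overwritten)
 * and returns the maximum of the original n values. */
static float tree_max(float *buf, size_t n)
{
    assert(buf != NULL && n > 0);

    while (n > 1) {
        /* Round up so that an odd length compares the middle element
         * with itself instead of dropping it. */
        size_t half = (n + 1) / 2;

        /* Compare the first `half` values with the last `half` values
         * and keep the winners in the first `half` slots. */
        elementwise_max(buf, buf, buf + (n - half), half);
        n = half;
    }
    return buf[0];
}
```

Each pass halves the problem, so about log2(n) kernel calls are needed, and every call works on a contiguous vector.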

For b2) you can simply use the “integrate” block, which is not VOLK
optimized (possibly because it’s a gengen template and these are so
much fun to specialize). But seeing as it is simply an accumulating for
loop, I kind of expect your compiler to make the best of the situation.
However, you can also use the volk_32f_accumulator_s32f VOLK kernel. I
kind of want to use that in integrate, because for my machine, the SSE
VOLK kernel is 4 times as fast as the generic implementation, which
nicely matches the 4-operand SSE SIMD instruction behind it.
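
To illustrate why a 4-wide SIMD accumulator can approach a 4x speedup, here is a plain C sketch that keeps four independent partial sums, the way a 4-lane SSE register would. It is not the VOLK kernel's source; `sum_four_lanes` is a hypothetical name, and a real SIMD version would use intrinsics rather than an array of lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n floats using four independent partial sums. */
static float sum_four_lanes(const float *in, size_t n)
{
    assert(in != NULL || n == 0);

    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: four additions per iteration with no dependency
     * between the lanes. */
    for (; i + 4 <= n; i += 4) {
        lane[0] += in[i];
        lane[1] += in[i + 1];
        lane[2] += in[i + 2];
        lane[3] += in[i + 3];
    }

    /* Horizontal add of the four lanes. */
    float total = (lane[0] + lane[1]) + (lane[2] + lane[3]);

    /* Tail: the 0 to 3 leftover samples. */
    for (; i < n; ++i)
        total += in[i];

    return total;
}
```

Because the additions happen in a different order than in a simple sequential loop, the floating-point result can differ in the last bits for long inputs.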

Greetings,
Marcus

On Tue, Oct 7, 2014 at 3:49 PM, Mostafa A. wrote:

Hello all,

I wondered about volk. I want it to compute mean to peak value of a
complex array. How could I do this?
Besides, I really need to know is there any example of using volk? The
code itself, doesn’t reflect input and output parameters explicitly.

Best,
Mostafa

Marcus gave you a far more complete answer. Just a few words here.
First, VOLK is used all over the place in GNU Radio. So look for blocks
that use VOLK. Second, the name scheme of VOLK is incredibly explicit as
to the input and output parameters. See the manual page and wiki page
describing VOLK:

There’s also my talk on VOLK, recorded for the IEEE Signal Processing
Society:

http://www.trondeau.com/blog/2013/6/12/nearly-50-minutes-of-volk.html

Tom

Thank you so much Marcus,

I’ve learnt so much from you here.
The algorithm of finding the max of a vector by comparing one half to the
other half is a good idea! I can’t use GNU Radio blocks for this
calculation because I must do it within my own block.

Unfortunately, I’m not familiar with optimization and parallelization
algorithms, so I just want to compute this fast, not necessarily as fast
as possible.

Best,
Mostafa

On Wed, Oct 8, 2014 at 5:56 AM, Marcus M. wrote:

[...]


Compilers are good, just use a linear comparison:

float *current = input;
float *end = input + length_of_array;
float max = *current;
while(current < end){
/* advance on every iteration, not only when a new maximum is found */
max = (*current > max) ? *current : max;
current++;