I have a complex phase rotation function that uses a pre-generated sin/cos LUT and some basic multiply/adds.

As it turns out, the rotation calc, which uses “straight” C/C++ math, is still the bottleneck in a demod.

I was wondering, is there some uber-efficient rotation block/class I should be using? I notice there is a volk kernel for the job and gr_rotator.

But I should also mention that the phase rotation operation must happen one sample at a time. This is due to the sequential nature of the algorithm, i.e. I can’t align and call a kernel with hundreds of nicely-aligned samples.

Any advice?

-John

Traditionally this was a job for CORDIC. I don’t know what the tradeoffs look like on a modern processor, though. If a significant part of your algorithm operates in phase/magnitude, might you consider a rect->polar conversion?
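For readers who have not met CORDIC, here is a minimal rotation-mode sketch in C++. Everything in it is an assumption made for illustration: the Q16 fixed-point format, the 16 iterations, and the names `CordicTables`, `make_cordic_tables`, `cordic_rotate_q` and `cordic_rotate` are hypothetical and do not come from GNU Radio or volk. It only shows the shift-and-add structure; whether it beats a LUT plus a complex multiply on a modern CPU would have to be measured.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdint>

constexpr int kFracBits = 16;  // hypothetical fixed-point format: 1.0 == 1 << 16
constexpr int kIters = 16;     // one shift-and-add stage per iteration

struct CordicTables {
    std::array<int32_t, kIters> atan_q{};  // atan(2^-i) in fixed-point radians
    int32_t inv_gain_q = 0;                // 1 / (accumulated CORDIC gain), fixed point
};

inline CordicTables make_cordic_tables() {
    CordicTables t;
    double gain = 1.0;
    for (int i = 0; i < kIters; ++i) {
        const double p = std::ldexp(1.0, -i);  // 2^-i
        t.atan_q[i] = static_cast<int32_t>(std::lround(std::atan(p) * (1 << kFracBits)));
        gain *= std::sqrt(1.0 + p * p);        // each stage stretches the vector slightly
    }
    t.inv_gain_q = static_cast<int32_t>(std::lround((1 << kFracBits) / gain));
    return t;
}

// Rotation-mode CORDIC: rotates (x, y) by z using shifts and adds, then one
// multiply per component to undo the constant gain. Relies on arithmetic right
// shift of negative values (guaranteed from C++20, usual behaviour before that).
inline void cordic_rotate_q(const CordicTables& t, int32_t& x, int32_t& y, int32_t z) {
    for (int i = 0; i < kIters; ++i) {
        const int32_t xs = x >> i;
        const int32_t ys = y >> i;
        if (z >= 0) { x -= ys; y += xs; z -= t.atan_q[i]; }  // rotate by +atan(2^-i)
        else        { x += ys; y -= xs; z += t.atan_q[i]; }  // rotate by -atan(2^-i)
    }
    x = static_cast<int32_t>((static_cast<int64_t>(x) * t.inv_gain_q) >> kFracBits);
    y = static_cast<int32_t>((static_cast<int64_t>(y) * t.inv_gain_q) >> kFracBits);
}

// Convenience wrapper for experimenting; assumes samples of roughly unit
// magnitude so the Q16 values stay far away from int32 overflow.
inline std::complex<double> cordic_rotate(std::complex<double> s, double angle_rad) {
    static const CordicTables tables = make_cordic_tables();
    assert(std::fabs(angle_rad) <= 1.5708);  // wider angles need a quadrant pre-rotation
    const double scale = static_cast<double>(1 << kFracBits);
    int32_t x = static_cast<int32_t>(std::lround(s.real() * scale));
    int32_t y = static_cast<int32_t>(std::lround(s.imag() * scale));
    const int32_t z = static_cast<int32_t>(std::lround(angle_rad * scale));
    cordic_rotate_q(tables, x, y, z);
    return {x / scale, y / scale};
}
```

The loop has a serial dependency across its 16 stages, which is one reason the tradeoff on a processor with a fast multiplier is not obvious.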


Hi John,

> I have a complex phase rotation function that uses a pre-generated sin/cos LUT and some basic multiply/adds.
>
> As it turns out, the rotation calc, which uses “straight” C/C++ math, is still the bottleneck in a demod.

I don’t quite get what you need …

Rotating a single sample by a given angle is one sin/cos (for which you use a LUT) and a single complex multiply (and gcc is going to have a pretty optimized version of that).
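As a concrete picture of that "one LUT lookup plus one complex multiply" operation, here is a minimal C++ sketch. It is not John's code: the 1024-entry table, the 32-bit phase accumulator and the names `phasor_lut` and `rotate_lut` are assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdint>

constexpr unsigned kLutBits = 10;              // hypothetical table size: 1024 entries
constexpr unsigned kLutSize = 1u << kLutBits;

// Table of unit phasors exp(j*2*pi*k/kLutSize), built once on first use.
inline const std::array<std::complex<float>, kLutSize>& phasor_lut() {
    static const auto lut = [] {
        std::array<std::complex<float>, kLutSize> t{};
        const double two_pi = 6.283185307179586;
        for (unsigned k = 0; k < kLutSize; ++k) {
            const double a = two_pi * k / kLutSize;
            t[k] = std::complex<float>(static_cast<float>(std::cos(a)),
                                       static_cast<float>(std::sin(a)));
        }
        return t;
    }();
    return lut;
}

// Phase is held as a 32-bit fraction of a turn (2^32 == 2*pi), so wrapping
// comes for free from unsigned overflow; the top kLutBits bits index the table.
inline std::complex<float> rotate_lut(std::complex<float> in, uint32_t phase) {
    const std::complex<float>& w = phasor_lut()[phase >> (32 - kLutBits)];
    // Complex multiply written out by hand: 4 multiplies, 2 adds.
    return {in.real() * w.real() - in.imag() * w.imag(),
            in.real() * w.imag() + in.imag() * w.real()};
}
```

With this representation a per-sample phase update is a single integer add, e.g. `phase += step;`. Writing the multiply out by hand also avoids the NaN/infinity fix-up that gcc attaches to complex multiplication unless an option such as `-fcx-limited-range` (implied by `-ffast-math`) is given; whether that matters in a particular build is worth checking in the generated assembly.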

When doing stuff one sample at a time, the bottleneck isn’t going to be the computation, it’s going to be the load/store from/to memory.

Also, are we talking about changing the phase by a fixed amount, or by an amount that changes after each sample (like the rotator kernel does)?

Need more info … In optimization the devil is in the details.

Also, lack of alignment doesn’t mean SIMD is out if you have several of them to do at once …

Cheers,

Sylvain

On Tue, May 26, 2015 at 7:37 PM, John M. <[email protected]> wrote:

> the algorithm - ie. I can’t align and call a kernel with hundreds of nicely-aligned samples.
>
> Any advice?
>
> -John

To follow up on Sylvain’s questions: is the restriction really on doing single-sample rotation (because of some intermediate calculation to generate the phase advance for the next sample), or on the alignment? I’ll note that *in general* Intel doesn’t take much of a hit (often a non-measurable one) on non-aligned SIMD operations. Also, even if you aren’t operating in the hundreds-of-samples range, the SIMD kernel can save you if you’re operating on e.g. several samples at a time (for SSE it’s only doing two at a time anyway, with a clean-up loop for an odd number of samples, and AVX does four at a time, with a clean-up loop for non-multiples of four).

Besides, who can resist looking at something called the rotatorpuppet for inspiration on how to call the main kernel?

I’ll also point out that the gr::blocks::rotator (which is not a block, i.e. separate from the rotator_cc, which is a block) has both a rotate() method that operates on a single sample, and a rotateN() method that operates on n samples. It is the latter that calls down into the volk rotator kernel.
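To make the rotate()/rotateN() distinction concrete, here is a standalone C++ sketch of a phasor-recurrence rotator. It is not the GNU Radio class: `SketchRotator` is a hypothetical name, the method names merely mirror the ones mentioned above, the 512-sample renormalisation interval is an arbitrary choice, and `rotateN` here is a plain loop rather than a call into volk.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>

// Keeps a unit phasor and advances it by a complex increment after every
// sample, so no sin/cos evaluation or table lookup is needed per sample.
class SketchRotator {
public:
    void set_phase(std::complex<float> p) { phase_ = p; }
    void set_phase_incr(std::complex<float> incr) { incr_ = incr; }

    // Single-sample path: the caller may change the increment between calls,
    // which is what a sample-by-sample feedback algorithm needs.
    std::complex<float> rotate(std::complex<float> in) {
        const std::complex<float> out = in * phase_;
        phase_ *= incr_;
        // Rounding makes |phase_| drift away from 1, so pull it back
        // onto the unit circle every so often.
        if (++count_ % 512 == 0) {
            phase_ /= std::abs(phase_);
        }
        return out;
    }

    // Batch path: valid only while the increment stays fixed for all n samples.
    // In this sketch it simply loops over rotate(); a SIMD kernel would go here.
    void rotateN(std::complex<float>* out, const std::complex<float>* in, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = rotate(in[i]);
        }
    }

private:
    std::complex<float> phase_{1.0f, 0.0f};
    std::complex<float> incr_{1.0f, 0.0f};
    unsigned count_ = 0;
};
```

If the phase advance only changes every few samples, the samples in between can still go through the batch path in short runs.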

It really does need to happen on one sample at a time, at least assuming I use the same algorithm I’m using now. I am pretty much using the method Sylvain suggests. The rotation is one operation of many inside a block, i.e. many rotations happen per call to work(), but one rotation per input sample, as the rotation is dependent on what happened with previous samples.

Still processing what Tom/Doug are suggesting otherwise. The mod is generally produced by something that isn’t on GNU Radio. When we recreate the mod in software we definitely use a form that is easily done in vector form. Haven’t quite wrapped my head around how to do the same on the receiving end while achieving optimum detection… Got any good papers?

Converting everything to phase might be a half-way reasonable approach.
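A rough C++ sketch of what "converting everything to phase" could look like: convert each sample to magnitude/phase once, after which a rotation is just an addition with wrap-around. The `PolarSample` type and the helper names are hypothetical, and this only pays off if enough of the remaining algorithm can stay in the phase domain to amortise the atan2 hidden inside `std::arg`.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

struct PolarSample {
    float mag;
    float phase;  // radians, kept in [-pi, pi]
};

constexpr float kTwoPi = 6.28318530718f;

// One rect->polar conversion per input sample (this is the expensive step).
inline PolarSample to_polar(std::complex<float> s) {
    return {std::abs(s), std::arg(s)};
}

// In the phase domain a rotation is an add; std::remainder wraps the result
// back into [-pi, pi].
inline PolarSample rotate_polar(PolarSample s, float angle_rad) {
    s.phase = std::remainder(s.phase + angle_rad, kTwoPi);
    return s;
}

// Only needed if a later stage wants I/Q again.
inline std::complex<float> to_rect(PolarSample s) {
    return std::polar(s.mag, s.phase);
}
```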

For the moment, I only need to make this ~22% faster. It’s possible this might work on a faster processor.

-John

On Thu, May 28, 2015 at 1:52 PM, Douglas G. [email protected] wrote:

> To follow-up on Sylvain’s questions: is the restriction really on doing single-sample rotation (because of some intermediate calculation to generate the phase advance for the next sample), or on the alignment?

Based on available information, I’m guessing the former scenario, with something like a filtered sequence feeding a phase modulator to generate GMSK or a similar non-linear CPM signal. In cases like these, is the phase modulator really necessary?

Many non-linear digital modulations in use these days have some form of linear representation that is more efficient to implement on vector-capable hardware. One solution to the rotator inefficiency might be to get rid of it entirely.

-TT