Fast, Single-Sample Phase Rotation

Detlef_R · May 27, 2015, 1:38am

I have a complex phase rotation function that uses a pre-generated
sin/cos LUT and some basic multiply/adds.

As it turns out, the rotation calc, which uses “straight” C/C++ math, is
still the bottleneck in a demod.

I was wondering, is there some uber-efficient rotation block/class I
should be using? I notice there is a volk kernel for the job and
gr_rotator. But I should also mention that the phase rotation operation
must happen one sample at a time. This is due to the sequential nature
of the algorithm, i.e. I can’t align and call a kernel with hundreds of
nicely-aligned samples.

Any advice?

-John

jmaenpaa · May 27, 2015, 1:55am

Traditionally this was a job for CORDIC. I don’t know what the tradeoffs
look like on a modern processor, though. If a significant part of your
algorithm operates in phase/magnitude, might you consider a rect->polar
conversion?
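For readers who have not met CORDIC, here is a minimal floating-point sketch of the rotation mode. The function name `cordic_rotate` and the iteration count are assumptions made for illustration; nothing here comes from GNU Radio or VOLK. A real fixed-point CORDIC would replace the multiplies by `scale` with bit shifts and read the arctangents and the gain from small tables, which is where the historical advantage came from. Whether any of that beats a LUT plus a complex multiply on a modern CPU is exactly the open question above.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper for illustration only.
// Rotates `in` by `angle` radians using CORDIC rotation-mode iterations.
std::complex<float> cordic_rotate(std::complex<float> in, float angle)
{
    constexpr int kIters = 24;
    constexpr float kPi = 3.14159265358979323846f;

    // Bring the angle into [-pi, pi].
    angle = std::remainder(angle, 2.0f * kPi);

    float x = in.real();
    float y = in.imag();

    // The iterations only converge for |angle| below about 1.74 rad,
    // so pre-rotate by +/- 90 degrees (a swap and a sign flip) if needed.
    if (angle > kPi / 2) {
        const float t = x; x = -y; y = t;   // multiply by +j
        angle -= kPi / 2;
    } else if (angle < -kPi / 2) {
        const float t = x; x = y; y = -t;   // multiply by -j
        angle += kPi / 2;
    }

    float gain = 1.0f;   // accumulated CORDIC gain (a constant in practice)
    float scale = 1.0f;  // 2^-i
    for (int i = 0; i < kIters; ++i) {
        // Steer the residual angle toward zero.
        const float d = (angle >= 0.0f) ? 1.0f : -1.0f;
        const float xn = x - d * y * scale;
        const float yn = y + d * x * scale;
        angle -= d * std::atan(scale);           // would be a table lookup
        gain *= std::sqrt(1.0f + scale * scale); // would be precomputed
        x = xn;
        y = yn;
        scale *= 0.5f;
    }
    return { x / gain, y / gain };
}
```

Calling `cordic_rotate(sample, theta)` should agree with `sample * std::polar(1.0f, theta)` to within float rounding.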

jmaenpaa · May 28, 2015, 3:54pm

Hi John,

I have a complex phase rotation function that uses a pre-generated sin/cos
LUT and some basic multiply/adds.

As it turns out, the rotation calc, which uses “straight” C/C++ math is
still the bottleneck in a demod.

I don’t quite get what you need …

Rotating a single sample by a given angle is one sin/cos (for which
you use a LUT) and a single complex multiply (and gcc is going to have a
pretty optimized version of that).
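To make that concrete, here is a rough sketch of a LUT-based single-sample rotation. It is not the original poster's code: the class name `LutRotator`, the 4096-entry table, and the 32-bit fixed-point phase are all assumptions chosen for illustration. Keeping the phase as an unsigned fraction of a full turn means wrap-around is free and the table index is a single shift.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch; names and table size are assumptions.
class LutRotator
{
public:
    explicit LutRotator(unsigned log2_size = 12)
        : shift_(32u - log2_size), table_(std::size_t{1} << log2_size)
    {
        const double two_pi = 6.283185307179586;
        for (std::size_t k = 0; k < table_.size(); ++k) {
            const double a = two_pi * static_cast<double>(k) /
                             static_cast<double>(table_.size());
            table_[k] = std::complex<float>(static_cast<float>(std::cos(a)),
                                            static_cast<float>(std::sin(a)));
        }
    }

    // Convert radians to a 32-bit fraction of a full turn.
    static std::uint32_t to_fixed(double radians)
    {
        const double two_pi = 6.283185307179586;
        double turns = radians / two_pi;
        turns -= std::floor(turns);      // now in [0, 1]
        if (turns >= 1.0) turns = 0.0;   // guard against rounding up to 1
        return static_cast<std::uint32_t>(turns * 4294967296.0);
    }

    // One table lookup and one complex multiply per sample.
    std::complex<float> rotate(std::complex<float> in, std::uint32_t phase) const
    {
        const std::complex<float> w = table_[phase >> shift_];
        // Written out by hand: four multiplies, two adds, no special-case
        // handling of NaN/Inf.
        return { in.real() * w.real() - in.imag() * w.imag(),
                 in.real() * w.imag() + in.imag() * w.real() };
    }

private:
    unsigned shift_;
    std::vector<std::complex<float>> table_;
};
```

With a table this small the lookup will usually stay in L1 cache, so the cost per sample is dominated by whatever surrounds the call. The table truncates the phase to 12 bits; if that is too coarse, a larger table or interpolation would be needed.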

Doing stuff for single samples at a time, the bottleneck isn’t going
to be the computation, it’s going to be the load/store from/to memory.

Also, are we talking about changing the phase by a fixed amount, or by
an amount that changes after each sample (like the rotator kernel does)?

Need more info … In optimization the devil is in the details.
Also, lack of alignment doesn’t mean SIMD is out if you have several
of them to do at once …

Cheers,

Sylvain

jmaenpaa · May 28, 2015, 10:53pm

To follow up on Sylvain’s questions: is the restriction really on doing
single-sample rotation (because of some intermediate calculation to
generate the phase advance for the next sample), or on the alignment?
I’ll note that in general Intel doesn’t take much of a hit (often a
non-measurable one) on non-aligned SIMD operations. Also, even if you
aren’t operating in the hundreds-of-samples range, the SIMD kernel can
still save you if you’re operating on e.g. several samples at a time
(for SSE it’s only doing two at a time anyway, with a clean-up loop for
an odd number of samples, and AVX does four at a time, with a clean-up
loop for non-multiples of four).
Besides, who can resist looking at something called the rotatorpuppet
for inspiration on how to call the main kernel?

I’ll also point out that the gr::blocks::rotator (which is not a block,
i.e. separate from the rotator_cc, which is a block) has both a rotate()
method that operates on a single sample, and a rotateN() method that
operates on n samples. It is the latter that calls down into the volk
rotator kernel.
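As a rough illustration of that single-sample/N-sample split, here is a stand-alone sketch of a phasor-recursion rotator. It is deliberately not the GNU Radio class: the name `PhasorRotator`, the method `rotate_n`, and the renormalisation interval of 512 samples are assumptions, and the batch method here is a plain loop rather than a call into a SIMD kernel. The point is that no sin/cos or table lookup is needed per sample while the increment stays the same, and that the increment can still be changed between any two samples.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>

// Hypothetical stand-alone sketch, not the GNU Radio implementation.
class PhasorRotator
{
public:
    void set_phase(float radians) { phase_ = std::polar(1.0f, radians); }
    void set_phase_incr(float radians) { incr_ = std::polar(1.0f, radians); }

    // One sample: multiply by the current phasor, then advance the phasor.
    std::complex<float> rotate(std::complex<float> in)
    {
        const std::complex<float> out = in * phase_;
        phase_ *= incr_;
        // Rounding error slowly changes |phase_|; pull it back to 1
        // every so often (the interval is an arbitrary choice).
        if (++count_ % 512 == 0)
            phase_ /= std::abs(phase_);
        return out;
    }

    // A short batch with a fixed increment. A SIMD kernel would replace
    // this loop; here it simply reuses the single-sample path.
    void rotate_n(std::complex<float>* out,
                  const std::complex<float>* in,
                  std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = rotate(in[i]);
    }

private:
    std::complex<float> phase_{1.0f, 0.0f};
    std::complex<float> incr_{1.0f, 0.0f};
    unsigned count_ = 0;
};
```

In a feedback loop, the caller would run `rotate()` on one sample, compute the next phase step from the result, and call `set_phase_incr()` before the next sample. That `std::polar` call is then the remaining per-sample trig cost, which is where a LUT would come back in.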

jmaenpaa · May 29, 2015, 12:58am

It really does need to happen one sample at a time, at least assuming I
use the same algorithm I’m using now. I am pretty much using the method
Sylvain suggests. The rotation is one operation of many inside a block,
i.e. many rotations happen per call to work(), but one rotation per
input sample, as the rotation is dependent on what happened with
previous samples.

Still processing what Tom/Doug are suggesting otherwise. The mod is
generally produced by something that isn’t in GNU Radio. When we
recreate the mod in software we definitely use a form that is easily
done in vector form. Haven’t quite wrapped my head around how to do the
same on the receiving end while achieving optimum detection… Got any
good papers?

Converting everything to phase might be a half-way reasonable approach.
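A small sketch of what working in phase would look like, under the assumption that each sample is converted to magnitude/phase once and stays there for the rest of the per-sample work. The type `PolarSample` and the helper names are made up for illustration. In this form a rotation is a single add plus a wrap, but the conversion itself costs an atan2 and a square root per sample, so it only pays off if most of the remaining algorithm can also run on phase.

```cpp
#include <cmath>
#include <complex>

// Hypothetical types and helpers for illustration only.
struct PolarSample {
    float mag;
    float phase; // radians, kept in [-pi, pi]
};

// One-time conversion per input sample (atan2 + sqrt under the hood).
inline PolarSample to_polar(std::complex<float> z)
{
    return { std::abs(z), std::arg(z) };
}

// Wrap a phase back into [-pi, pi].
inline float wrap_phase(float p)
{
    constexpr float kTwoPi = 6.28318530717958647692f;
    return std::remainder(p, kTwoPi);
}

// In the polar domain, rotating is just adding to the phase.
inline PolarSample rotate_polar(PolarSample s, float angle)
{
    return { s.mag, wrap_phase(s.phase + angle) };
}

// Conversion back to I/Q, only needed if a later stage requires it.
inline std::complex<float> to_rect(PolarSample s)
{
    return std::polar(s.mag, s.phase);
}
```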

In the immediate term, I only need to make this ~22% faster. It’s
possible this might work on a faster processor.

-John

jmaenpaa · May 28, 2015, 11:51pm

On Thu, May 28, 2015 at 1:52 PM, Douglas G. wrote:

To follow-up on Sylvain’s questions: is the restriction really on doing
single-sample rotation (because of some intermediate calculation to generate
the phase advance for the next sample), or on the alignment?

Based on available information, I’m guessing the former scenario, with
something like a filtered sequence feeding a phase modulator to
generate GMSK or a similar non-linear CPM signal. In cases like these,
is the phase modulator really necessary?

Many non-linear digital modulations in use these days have some form
of linear representation that is more efficient to implement on
vector-capable hardware. One solution to the rotator inefficiency
might be to get rid of it entirely.

-TT
