Fast dot product?

Hello,

I’d like to calculate the complex dot product with GNU Radio.

Is there a way to do it faster than this? This correlation takes
“ages”…

— snipp —

```
#include <cassert>
#include <cstdlib>

// Calculates the inner product <a, b> between complex vectors (dot
// product), optionally shifting vector b first.
gr_complex ofdm_rx_cc::dot_prod(gr_complex* a, gr_complex* b,
                                unsigned int len, int shift)
{
    assert((unsigned int)abs(shift) < len);

    // Negative shift: swap the roles of a and b and conjugate the result.
    if (shift < 0) { return conj(dot_prod(b, a, len, -shift)); }

    gr_complex cor(0, 0);

    b = b + shift;
    for (unsigned int i = 0; i < (len - shift); i++) {
        cor += a[i] * conj(b[i]);
    }

    return cor;
}
```

— snipp –

Jens

On Thu, Jul 27, 2006 at 12:47:46AM +0200, Jens E. wrote:

> Hello,
>
> I’d like to calculate the complex dot product with GNU Radio.
>
> Is there a way to do it faster than this? This correlation takes
> “ages”…

You can probably repurpose the existing gr_fir_ccc.h code.
It doesn’t apply the conjugate, so you’ll need to work that into your
b values. If you use gr_fir_ccc.h, you’ll get hand-coded SIMD
assembler on x86 and x86-64, and generic C++ code on the rest.
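
To illustrate "working the conjugate into your b values", here is a minimal sketch in plain C++. It does not use gr_fir_ccc.h itself; `cfloat`, `conjugate_copy` and `mac` are hypothetical names chosen for this example, and `cfloat` only stands in for gr_complex. The idea is to conjugate the reference vector once, so that the hot loop is a plain complex multiply-accumulate, which is the kind of loop a FIR kernel is built to run.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> cfloat;  // assumption: stand-in for gr_complex

// Hypothetical helper: conjugate the reference vector once, up front,
// so the inner loop no longer has to call conj().
std::vector<cfloat> conjugate_copy(const cfloat* b, std::size_t len)
{
    std::vector<cfloat> out(len);
    for (std::size_t i = 0; i < len; i++)
        out[i] = std::conj(b[i]);
    return out;
}

// Plain multiply-accumulate: returns sum_i a[i] * b_conj[i].
// With b_conj = conjugate_copy(b, len) this equals the inner product <a, b>.
cfloat mac(const cfloat* a, const cfloat* b_conj, std::size_t len)
{
    float re = 0.0f, im = 0.0f;  // separate accumulators for real and imaginary parts
    for (std::size_t i = 0; i < len; i++) {
        const float ar = a[i].real(), ai = a[i].imag();
        const float br = b_conj[i].real(), bi = b_conj[i].imag();
        re += ar * br - ai * bi;
        im += ar * bi + ai * br;
    }
    return cfloat(re, im);
}
```

This only pays off when the same b is reused across many calls, so that the cost of the conjugated copy is amortized; whether it is actually faster than the original loop would need to be measured.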

The set_taps operation is relatively expensive. For our purposes, we
expected it to be done infrequently. This allowed us to amortize the
cost of building multiple copies of the taps at different alignments.
This was required to take advantage of the 128-bit loads with SSE.

Eric

On Wed, Jul 26, 2006 at 11:08:25PM -0700, Eric B. wrote:

> It doesn’t apply the conjugate, so you’ll need to work that into your
> b values. If you use gr_fir_ccc.h, you’ll get hand-coded SIMD
> assembler on x86 and x86-64, and generic C++ code on the rest.

Thanks, I’ll have a look at it.

> The set_taps operation is relatively expensive. For our purposes, we
> expected it to be done infrequently. This allowed us to amortize the
> cost of building multiple copies of the taps at different alignments.
> This was required to take advantage of the 128-bit loads with SSE.

I’m using guard interval cross correlation to synchronize the OFDM
frame and to estimate the frequency offset. The correlation code is
the bottleneck, and I guess anything will be faster than what I have
now.
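
For guard interval correlation in particular, one common way to cut the cost is to update the sum incrementally rather than recompute the full dot product for every candidate offset. The sketch below is not taken from the code in this thread: `cfloat`, `sliding_guard_corr`, `freq_offset`, `fft_len` and `guard_len` are hypothetical names, and it assumes a cyclic prefix that repeats the last `guard_len` samples of each symbol and a received signal of the form r[n] = s[n]·exp(j·2π·eps·n/fft_len).

```cpp
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> cfloat;  // assumption: stand-in for gr_complex

// For every start offset d, computes
//   c[d] = sum_{i=0}^{guard_len-1} r[d+i] * conj(r[d+i+fft_len])
// After the first window, each step removes the product that left the
// window and adds the one that entered it, so the cost per offset is constant.
std::vector<cfloat> sliding_guard_corr(const std::vector<cfloat>& r,
                                       std::size_t fft_len,
                                       std::size_t guard_len)
{
    std::vector<cfloat> c;
    if (guard_len == 0 || r.size() < fft_len + guard_len)
        return c;  // not enough samples for a single window

    const std::size_t n_out = r.size() - fft_len - guard_len + 1;
    c.resize(n_out);

    // Full sum for the first window only.
    cfloat acc(0.0f, 0.0f);
    for (std::size_t i = 0; i < guard_len; i++)
        acc += r[i] * std::conj(r[i + fft_len]);
    c[0] = acc;

    // Incremental update for all later windows.
    for (std::size_t d = 1; d < n_out; d++) {
        acc -= r[d - 1] * std::conj(r[d - 1 + fft_len]);
        acc += r[d + guard_len - 1] * std::conj(r[d + guard_len - 1 + fft_len]);
        c[d] = acc;
    }
    return c;
}

// Fractional frequency offset in units of the subcarrier spacing, taken from
// the phase of the correlation at the detected frame start. Under the signal
// model assumed above, this is unambiguous only for |eps| < 0.5.
float freq_offset(cfloat corr)
{
    const float pi = 3.14159265f;
    return -std::arg(corr) / (2.0f * pi);
}
```

The frame start would then be taken where |c[d]| peaks. Because the running sum is kept in single-precision floats, rounding error can build up over long runs, so recomputing the full sum from time to time may be worthwhile.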

Jens