On Wed, Mar 14, 2007 at 05:26:02PM +0800, hanwen wrote:
> Recently I have been implementing a synchronization algorithm in a block
> that requires high efficiency. I tried to use the function
> ccomplex_dotprod_sse() to speed up the block, but I always get a
> segmentation fault.
>
> I'm using a PC with a Pentium D CPU, and I'm sure ccomplex_dotprod_sse()
> works well in the FIR filter blocks.
>
> I simply include ccomplex_dotprod_x86.h and call the
> ccomplex_dotprod_sse() function. Maybe I missed something, but I have no
> idea what it is. Please give me some hints.

You’re probably not honoring its alignment requirements.

/*
 * Preconditions:
 *
 *   input and taps are guaranteed to be 16-byte aligned.
 *   n_2_ccomplex_blocks is != 0
 */
void
ccomplex_dotprod_generic (const float *input,
                          const float *taps, unsigned n_2_ccomplex_blocks,
                          float *result)
{
  float sum0 = 0;
  float sum1 = 0;
  float sum2 = 0;
  float sum3 = 0;

  /* Each pass multiplies two complex values (4 interleaved re/im floats). */
  do {
    sum0 += input[0] * taps[0] - input[1] * taps[1];
    sum1 += input[0] * taps[1] + input[1] * taps[0];
    sum2 += input[2] * taps[2] - input[3] * taps[3];
    sum3 += input[2] * taps[3] + input[3] * taps[2];

    input += 4;
    taps += 4;
  } while (--n_2_ccomplex_blocks != 0);

  result[0] = sum0 + sum2;   /* real part */
  result[1] = sum1 + sum3;   /* imaginary part */
}

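For cross-checking, the argument layout implied by the loop above can be sketched as follows. The buffers hold interleaved (re, im) floats and each loop pass consumes two complex values, so n_2_ccomplex_blocks is half the number of complex samples. The names blocks_for and ref_cdotprod are hypothetical helpers for illustration, not part of GNU Radio, and the reference has no alignment requirement of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a complex-sample count into the
 * n_2_ccomplex_blocks argument (one block = 2 complex values = 4 floats).
 * The count must be nonzero and even to satisfy the preconditions. */
static unsigned blocks_for(size_t n_complex)
{
    assert(n_complex != 0 && n_complex % 2 == 0);
    return (unsigned)(n_complex / 2);
}

/* Hypothetical reference: plain complex dot product (no conjugation)
 * over interleaved re/im floats, one complex value per iteration. */
static void ref_cdotprod(const float *input, const float *taps,
                         size_t n_complex, float *result)
{
    float re = 0.0f;
    float im = 0.0f;

    for (size_t i = 0; i < n_complex; i++) {
        float a = input[2 * i], b = input[2 * i + 1];   /* a + bi */
        float c = taps[2 * i],  d = taps[2 * i + 1];    /* c + di */
        re += a * c - b * d;
        im += a * d + b * c;
    }
    result[0] = re;
    result[1] = im;
}
```

Running the same buffers through this reference and through the unrolled routine (with blocks_for(n) as the count) should give matching results up to floating-point rounding, which helps separate an indexing mistake from an alignment fault.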
There’s a reason for all that other code that you are ignoring…
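
If alignment is indeed the problem, a quick way to confirm it is sketched below. This assumes a C11 toolchain (for aligned_alloc) and that the SSE routine uses aligned loads, as the 16-byte precondition suggests. The names is_aligned16 and alloc_cfloats are hypothetical helpers, not GNU Radio API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: allocate room for n_complex interleaved complex
 * floats on a 16-byte boundary. C11 aligned_alloc expects the size to be
 * a multiple of the alignment, so the byte count is rounded up.
 * Release the buffer with free(). */
static float *alloc_cfloats(size_t n_complex)
{
    size_t bytes;

    if (n_complex == 0)
        return NULL;

    bytes = n_complex * 2 * sizeof(float);
    bytes = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes);
}
```

Note that even with an aligned buffer, a pointer one complex sample into it (buf + 2 floats, i.e. 8 bytes) is no longer 16-byte aligned. Checking is_aligned16() on both input and taps just before the call shows whether the preconditions hold at that call site.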