Recently I have been implementing a synchronization algorithm in a block that needs to be more efficient. I tried to use the function ccomplex_dotprod_sse() to speed up the block, but I always get a “segmentation fault”.
I’m using a PC with a Pentium D CPU, and I’m sure ccomplex_dotprod_sse() works well in the FIR filter blocks.
I simply include ccomplex_dotprod_x86.h and call the ccomplex_dotprod_sse() function. Maybe I missed something, but I have no idea what it is. Please give me some hints.
On Wed, Mar 14, 2007 at 05:26:02PM +0800, hanwen wrote:
> Hi, everyone,
> Recently I have been implementing a synchronization algorithm in a block
> that needs to be more efficient. I tried to use the function
> ccomplex_dotprod_sse() to speed up the block, but I always get a
> “segmentation fault”.
> I’m using a PC with a Pentium D CPU, and I’m sure ccomplex_dotprod_sse()
> works well in the FIR filter blocks.
> I simply include ccomplex_dotprod_x86.h and call the
> ccomplex_dotprod_sse() function. Maybe I missed something, but I have no
> idea what it is. Please give me some hints.
You’re probably not honoring its alignment requirements.
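    /*
     * Preconditions:
     *   input and taps are guaranteed to be 16 byte aligned.
     *   n_2_ccomplex_blocks is != 0
     */
    void
    ccomplex_dotprod_generic (const float *input,
                              const float *taps, unsigned n_2_ccomplex_blocks,
                              float *result)
    {
      float sum0 = 0;   /* real part, first complex pair of each block  */
      float sum1 = 0;   /* imag part, first complex pair of each block  */
      float sum2 = 0;   /* real part, second complex pair of each block */
      float sum3 = 0;   /* imag part, second complex pair of each block */

      /* Each iteration consumes one block of 2 complex values (4 floats). */
      do {
        sum0 += input[0] * taps[0] - input[1] * taps[1];
        sum1 += input[0] * taps[1] + input[1] * taps[0];
        sum2 += input[2] * taps[2] - input[3] * taps[3];
        sum3 += input[2] * taps[3] + input[3] * taps[2];

        input += 4;
        taps += 4;
      } while (--n_2_ccomplex_blocks != 0);

      result[0] = sum0 + sum2;   /* real part of the dot product */
      result[1] = sum1 + sum3;   /* imag part of the dot product */
    }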
There’s a reason for all that other code that you are ignoring…
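For illustration only, here is a minimal sketch of what honoring those preconditions could look like in a caller. It is not the code from the FIR filter blocks. The helper names (is_aligned16, alloc_ccomplex16, ccomplex_dotprod_checked) are hypothetical, C11 aligned_alloc() is assumed to be available, and a portable loop stands in for ccomplex_dotprod_sse() so the example builds on its own. The point is that SSE routines which use aligned loads such as movaps fault when handed a pointer that is not 16-byte aligned, so both buffers have to be allocated aligned and padded to a whole number of 2-complex blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/*
 * Hypothetical helper: allocate a zero-filled, 16-byte aligned buffer big
 * enough for n_complex complex floats, rounded up to a whole number of
 * 2-complex blocks (4 floats = 16 bytes per block). The block count is
 * written to *n_2_ccomplex_blocks. Release the buffer with free().
 */
static float *alloc_ccomplex16(size_t n_complex, unsigned *n_2_ccomplex_blocks)
{
    size_t blocks = (n_complex + 1) / 2;
    if (blocks == 0)
        blocks = 1;                         /* the routine requires != 0 */
    size_t bytes = blocks * 4 * sizeof(float);
    float *p = aligned_alloc(16, bytes);    /* bytes is a multiple of 16 */
    if (p != NULL)
        memset(p, 0, bytes);                /* zero padding adds nothing */
    *n_2_ccomplex_blocks = (unsigned)blocks;
    return p;
}

/*
 * Hypothetical wrapper: check the documented preconditions, then compute
 * the complex dot product. The loop is a portable stand-in; a real caller
 * would invoke the SSE routine here instead.
 */
static void ccomplex_dotprod_checked(const float *input, const float *taps,
                                     unsigned n_2_ccomplex_blocks,
                                     float *result)
{
    assert(is_aligned16(input));
    assert(is_aligned16(taps));
    assert(n_2_ccomplex_blocks != 0);

    float re = 0, im = 0;
    for (unsigned i = 0; i < 2 * n_2_ccomplex_blocks; i++) {
        const float a = input[2 * i], b = input[2 * i + 1];
        const float c = taps[2 * i],  d = taps[2 * i + 1];
        re += a * c - b * d;
        im += a * d + b * c;
    }
    result[0] = re;
    result[1] = im;
}
```

A caller would copy its samples and taps into buffers obtained from alloc_ccomplex16() and pass the returned block count on. Note that a pointer into the middle of such a buffer (for example, a sliding window that advances one complex sample at a time) is only 8-byte aligned on every other step, so an aligned allocation alone does not cover that case.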
Eric