I’ve added the framework for Altivec and implemented
gr_fir_fff_altivec using it. This gives a speedup of between 1.8x and
3.0x, depending on the type of machine it’s running on. There’s probably
another factor of 1.25 to 1.5 that can be obtained by recoding
dotprod_fff_altivec.c directly in assembler (I coded it using
C intrinsics): the compiler is generating about 20% more instructions
in the inner loop than necessary. It may also be possible to
improve dispatch by hand-scheduling the assembly.
This code lives in the features/mp-sched branch.
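For readers who haven't looked at the inner loop, here is a rough portable sketch of the computation a dotprod_fff routine performs: a float dot product of input samples against filter taps, accumulated in four independent lanes the way a 128-bit vector register holding four floats would be used. The name dotprod_fff_ref is hypothetical and this is not the code in dotprod_fff_altivec.c. In an AltiVec version, each group of four lanes would presumably become one vector float, and the four scalar multiply-adds per iteration would collapse into a single vec_madd.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable reference (not the AltiVec code): computes
 * sum(input[i] * taps[i]) for i in [0, n), using four partial sums
 * that mirror the four float lanes of a 128-bit vector register. */
static float dotprod_fff_ref(const float *input, const float *taps, size_t n)
{
    assert(n == 0 || (input != NULL && taps != NULL));

    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    /* Main loop: four multiply-adds per iteration, one per lane. */
    for (; i + 4 <= n; i += 4) {
        acc[0] += input[i + 0] * taps[i + 0];
        acc[1] += input[i + 1] * taps[i + 1];
        acc[2] += input[i + 2] * taps[i + 2];
        acc[3] += input[i + 3] * taps[i + 3];
    }

    /* Horizontal reduction of the four lanes. */
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    /* Tail: leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        sum += input[i] * taps[i];

    return sum;
}
```

With input {1, 2, 3, 4, 5, 6} and taps {1, 1, 1, 1, 2, 2}, the main loop covers the first four elements (sum 10) and the tail adds 5*2 + 6*2, giving 32. Keeping separate partial sums lets consecutive multiply-adds proceed without waiting on each other, which is roughly the property that hand-scheduling the assembly would try to exploit further.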
I’m not planning on doing anything more on Altivec at this time.
If you’re interested in adding the rest of the support, the framework
is complete. Take a look in filter/-powerpc and filter/altivec.