On Mon, Sep 29, 2008 at 03:13:09PM -0700, Inderaj B. wrote:
> Yes, I want to use SIMD. Since I want to spend most time improving
> performance, it would be nice if I can start off from something functioning
> or put together something quickly.
> How much effort would it be to get a GSM (other?) all-software system
> together (except A/D I guess)? Maybe I could use pre-generated streams on
> both ends in software.

This is great. What we’ve been thinking about is building a library
of SIMD accelerated primitives, along the lines of Intel’s Integrated
Performance Primitives. The crucial differences would be: free
software (GPLv3); support for SSE, SSE2, SSE3, Altivec and Cell SPE.
Our working title for this is the “Generic Performance Primitives” (GPP).
One unresolved issue is what code to start with. We need a framework
that provides for reference implementations, QA, testing all argument
alignments, correctness, performance, etc; runtime dispatch based on
the equivalent of cpuid; can be built as both shared and static
libraries (need static on the SPE).
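
To make the dispatch and QA requirements concrete, here is a rough sketch
in C. It is only an illustration under assumptions, not proposed GPP code:
the names gpp_add_32f, select_add_32f and gpp_selftest_add_32f are made up
for this example, and the cpuid check relies on the GCC/Clang builtin
__builtin_cpu_supports, so other compilers fall back to the reference
version.

```c
#include <stddef.h>

/* All names here are hypothetical; nothing below is an existing GPP API. */
typedef void (*gpp_add_32f_fn)(const float *a, const float *b,
                               float *dst, size_t n);

/* Portable reference version: the baseline every SIMD version is
 * checked against. */
static void add_32f_ref(const float *a, const float *b, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if defined(__SSE__)
#include <xmmintrin.h>
/* SSE version. Unaligned loads/stores keep it correct for any pointer
 * alignment; a tuned version would special-case aligned arguments. */
static void add_32f_sse(const float *a, const float *b, float *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; i++)          /* scalar tail for the last 0..3 elements */
        dst[i] = a[i] + b[i];
}
#endif

/* Pick an implementation from what the CPU reports at run time. */
static gpp_add_32f_fn select_add_32f(void)
{
#if defined(__SSE__) && (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse"))
        return add_32f_sse;
#endif
    return add_32f_ref;
}

/* User-visible entry point. The lazy initialization is not thread-safe
 * as written; a real library would resolve this once at load time. */
void gpp_add_32f(const float *a, const float *b, float *dst, size_t n)
{
    static gpp_add_32f_fn impl = NULL;
    if (impl == NULL)
        impl = select_add_32f();
    impl(a, b, dst, n);
}

/* QA sketch: compare the dispatched routine against the reference for
 * every combination of source/destination offsets of 0..3 floats.
 * Returns the number of offset combinations that mismatched. */
int gpp_selftest_add_32f(void)
{
    enum { N = 37, PAD = 4 };
    float a[N + PAD], b[N + PAD], got[N + PAD], want[N + PAD];
    int failures = 0;

    for (size_t i = 0; i < (size_t)(N + PAD); i++) {
        a[i] = (float)i;
        b[i] = 0.5f * (float)i;
    }
    for (size_t oa = 0; oa < (size_t)PAD; oa++)
        for (size_t ob = 0; ob < (size_t)PAD; ob++)
            for (size_t od = 0; od < (size_t)PAD; od++) {
                add_32f_ref(a + oa, b + ob, want + od, N);
                gpp_add_32f(a + oa, b + ob, got + od, N);
                for (size_t i = 0; i < (size_t)N; i++)
                    if (got[od + i] != want[od + i]) {
                        failures++;
                        break;
                    }
            }
    return failures;
}
```

The same pattern (one reference routine, N accelerated routines, one
selector, one self-test) would repeat per primitive, which is why the
choice of framework matters.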
The basic idea (for the user visible routines) would be to start with
the well thought out API described in Volume 1 (Signal Processing) of
the IPP docs, performing a s/ipp/gpp/g.
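
As an illustration of what the rename might yield, here is a sketch of a
reference version of a complex multiply in the IPP naming style. The
identifiers (Gpp32fc, GppStatus, gppsMul_32fc) are assumptions produced by
applying the substitution to IPP-style names, the status values are
placeholders, and the exact signature should be checked against Volume 1
rather than taken from this example.

```c
#include <stddef.h>

/* Hypothetical types obtained by renaming IPP-style names. */
typedef float Gpp32f;
typedef struct { Gpp32f re; Gpp32f im; } Gpp32fc;

/* Placeholder status codes; the numeric values are not taken from IPP. */
typedef enum {
    gppStsNoErr      =  0,
    gppStsNullPtrErr = -1,
    gppStsSizeErr    = -2
} GppStatus;

/* Reference version: element-wise complex multiply,
 * pDst[i] = pSrc1[i] * pSrc2[i] for i in [0, len). */
GppStatus gppsMul_32fc(const Gpp32fc *pSrc1, const Gpp32fc *pSrc2,
                       Gpp32fc *pDst, int len)
{
    if (pSrc1 == NULL || pSrc2 == NULL || pDst == NULL)
        return gppStsNullPtrErr;
    if (len <= 0)
        return gppStsSizeErr;

    for (int i = 0; i < len; i++) {
        /* Compute into temporaries so pDst may alias either source. */
        Gpp32f re = pSrc1[i].re * pSrc2[i].re - pSrc1[i].im * pSrc2[i].im;
        Gpp32f im = pSrc1[i].re * pSrc2[i].im + pSrc1[i].im * pSrc2[i].re;
        pDst[i].re = re;
        pDst[i].im = im;
    }
    return gppStsNoErr;
}
```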
Two possible starting points are:
liboil http://liboil.sourceforge.net (currently x86, x86-64
Framewave http://framewave.sourceforge.net (x86 and x86-64 only)
(Framewave is built on top of SSEPlus, a thin wrapper on top of the SSE
C/C++ intrinsics. http://sseplus.sourceforge.net
Mostly it appears that they provide emulations for instructions that
are missing at a particular level. E.g., your code could target SSE3,
and they’d emulate the missing addsub instruction in terms of SSE.)
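
For the curious, emulating addsub with plain SSE can be sketched as
follows. This is my own illustration of the idea, not SSEPlus code, and
emu_addsub_ps and addsub4 are made-up names. The SSE3 instruction
subtracts in the even lanes and adds in the odd lanes, so flipping the
sign bit of the even lanes of the second operand and then doing an
ordinary add gives the same result.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
/* SSE-only stand-in for the SSE3 addsub: lanes 0 and 2 get a - b,
 * lanes 1 and 3 get a + b. */
static __m128 emu_addsub_ps(__m128 a, __m128 b)
{
    /* _mm_set_ps lists lanes from 3 down to 0, so -0.0f lands in
     * lanes 2 and 0. XOR with -0.0f flips only the sign bit. */
    const __m128 sign_even = _mm_set_ps(0.0f, -0.0f, 0.0f, -0.0f);
    return _mm_add_ps(a, _mm_xor_ps(b, sign_even));
}
#endif

/* Small wrapper so the result can be inspected from plain C; falls back
 * to scalar code when SSE is not available at compile time. */
void addsub4(const float a[4], const float b[4], float out[4])
{
#if defined(__SSE__)
    _mm_storeu_ps(out, emu_addsub_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    for (size_t i = 0; i < 4; i++)
        out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
#endif
}
```

This is also the building block for interleaved complex multiplies, which
is where addsub earns its keep.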
For starters, it would be great if you could look at these two options
(and any others that you come across) and let us know how you think
these would work out as starting points, given the requirements above.
If this seems like more than you want to bite off, I can provide a
list of high-priority functions and you could start implementing the
reference version and any of the SSE*, Altivec or SPE versions that
grab your attention. We’re big on complex arithmetic.
Please let me know how this sounds and if you’ve got any comments or