On Wed, Dec 12, 2007 at 11:51:20PM +0530, Rohit G. wrote:
I was following the separate discussion on this list about writing
various trig functions using vector intrinsics. I googled for it. The
top few results I got were for “old” processors when SIMD intrinsics
were new. The gcc documentation (my version is 4.1.2) has a list of
intrinsics but no description, not even one line per intrinsic.
I believe those are 1-to-1 with the actual machine instructions.
See the Intel or AMD instruction set docs.
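As a rough illustration of that mapping, here is a minimal sketch. The function name add4 is made up for this example, and the instruction named in each comment is what a compiler would typically emit, not a guarantee:

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* Hypothetical helper: add two arrays of 4 floats element-wise.
 * Each intrinsic usually compiles to the single instruction noted. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);       /* movups: unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);       /* movups */
    __m128 vs = _mm_add_ps(va, vb);    /* addps: 4 additions at once */
    _mm_storeu_ps(out, vs);            /* movups: unaligned store */
}
```

Looking up addps or movups in the vendor manuals then tells you what the corresponding intrinsic does.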
As there is a need to optimize the codebase for new processors (Conroe,
Barcelona etc.) anyway, can you please point me to some real
documentation on the subject? I would really appreciate any help.
I’m not sure exactly what you’re looking for. Both Intel and AMD
have manuals about optimizing code for their microarchitectures.
You’ll find them somewhere on their developer sites.
Probably the biggest place that needs improvement is trig functions.
I suggest starting with sin(x), cos(x) and sincos(x) for x a scalar
float, and a related version that computes 4 in parallel for x a
vector of 4 floats. I’d do two versions of each: SSE2 for x86 and
SSE2 for x86_64 (on the 64 you’ve got twice as many registers to work
with).
We need them with something close to single-precision floating point
accuracy. You’ll need to figure out what input domain you’re willing to
accept; I’d say at a minimum +/- 4*pi.
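To give an idea of the shape of such a routine, here is a minimal sketch of the 4-wide sin, not a tuned implementation. Everything in it is an assumption made for illustration: the names sin4_sketch and sinf_sketch, the two-constant split of pi, and the use of a plain Taylor polynomial (a minimax fit would likely do better for the same degree). It assumes the default round-to-nearest mode and inputs of modest size, such as the +/- 4*pi mentioned above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */

/* Hypothetical sketch: sin() of 4 floats at once using only SSE/SSE2.
 * Idea: write x = k*pi + r with |r| <= pi/2, so sin(x) = (-1)^k * sin(r). */
__m128 sin4_sketch(__m128 x)
{
    const __m128 inv_pi = _mm_set1_ps(0.318309886f);    /* 1/pi */
    const __m128 pi_hi  = _mm_set1_ps(3.140625f);       /* leading bits of pi */
    const __m128 pi_lo  = _mm_set1_ps(9.67653590e-4f);  /* pi - pi_hi */

    /* k = round(x / pi); cvtps2dq rounds to nearest by default. */
    __m128i k  = _mm_cvtps_epi32(_mm_mul_ps(x, inv_pi));
    __m128  kf = _mm_cvtepi32_ps(k);

    /* r = x - k*pi, subtracted in two steps to limit cancellation error. */
    __m128 r = _mm_sub_ps(x, _mm_mul_ps(kf, pi_hi));
    r = _mm_sub_ps(r, _mm_mul_ps(kf, pi_lo));

    /* Taylor series of sin(r) through r^11, evaluated by Horner's rule
     * in r^2. Truncation error on |r| <= pi/2 is roughly 6e-8. */
    __m128 r2 = _mm_mul_ps(r, r);
    __m128 p  = _mm_set1_ps(-2.50521084e-8f);                          /* -1/11! */
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(2.75573192e-6f));    /*  1/9!  */
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.98412698e-4f));   /* -1/7!  */
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(8.33333333e-3f));    /*  1/5!  */
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.66666667e-1f));   /* -1/3!  */
    p = _mm_add_ps(_mm_mul_ps(_mm_mul_ps(p, r2), r), r);               /* r + r^3 * poly */

    /* Flip the sign where k is odd: move bit 0 of k into the sign bit. */
    __m128 sign = _mm_castsi128_ps(_mm_slli_epi32(k, 31));
    return _mm_xor_ps(p, sign);
}

/* Scalar version built on the vector one (wasteful, but keeps the sketch short). */
float sinf_sketch(float x)
{
    return _mm_cvtss_f32(sin4_sketch(_mm_set1_ps(x)));
}
```

A caller would load 4 floats with _mm_loadu_ps, pass the result to sin4_sketch, and store with _mm_storeu_ps. The accuracy should be checked against sinf over whatever input domain is chosen before relying on it; cos and sincos can reuse the same reduction with a shifted argument or a second polynomial.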
As a related question, possibly a digression, given that these
extensions are the key to unlocking the full power of new processors and
yet are rather low level (we are still writing trig funcs), is there any
FLOSS library for SIMD math?
Not sure. Please check it out and let us know what you find.
There is of course the ATLAS stuff (optimized BLAS).