Please see answers in-line.
Thanks!
Eric B. [email protected]
12/11/2007 02:31 PM
To
Eugene Grayver [email protected]
cc
[email protected]
Subject
Re: [Discuss-gnuradio] Re-writing blocks using intel libraries
On Tue, Dec 11, 2007 at 10:13:32AM -0800, Eugene Grayver wrote:
Hello,
We are working on some systems that require high sampling rates. I am
already using the Intel C++ compiler at the highest optimization level,
but a lot of the blocks are still very slow. It appears that Intel C++
does not properly vectorize data type.
General curiosity questions:
Are you using oprofile to measure performance?
I am a bit of a maverick, and for various reasons am using a pure C++
environment. I hacked my own ‘connect_block’ function (can’t wait for
v3.2, where these will be part of native gr). I am measuring the
performance using a custom block (gr_throughput) that simply reports the
average number of samples processed per second.
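A minimal sketch of how such a throughput counter could work, assuming it only accumulates item counts and divides by elapsed wall-clock time. The class name ThroughputMeter and its methods are hypothetical; this is not the gr_throughput block mentioned above, and it is not tied to the GNU Radio block API.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: accumulates sample counts and reports the
// average number of samples processed per second since construction.
class ThroughputMeter {
public:
    using clock = std::chrono::steady_clock;

    ThroughputMeter() : start_(clock::now()), total_(0) {}

    // Call once per processed buffer with the number of items handled.
    void add(std::uint64_t nsamples) { total_ += nsamples; }

    std::uint64_t total() const { return total_; }

    // Pure helper: samples divided by seconds, guarding against zero time.
    static double rate(std::uint64_t nsamples, double seconds) {
        return seconds > 0.0 ? static_cast<double>(nsamples) / seconds : 0.0;
    }

    // Average rate over the wall-clock time elapsed since construction.
    double average_rate() const {
        std::chrono::duration<double> elapsed = clock::now() - start_;
        return rate(total_, elapsed.count());
    }

private:
    clock::time_point start_;
    std::uint64_t total_;
};
```

In a processing loop one would call add(noutput_items) after each buffer and print average_rate() periodically. A steady clock is used so that system time adjustments do not distort the figure.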
What h/w platform are you running on / tuning for?
The platform is currently Intel Xeon or Core2 Duo.
You’re not trying to run your app on a cache-crippled machine like a
Celeron, are you?
No, very high end.
Which blocks are causing you the biggest problem?
I got a 2x improvement on all the filtering blocks. About a 40%
improvement for sine/cosine generation blocks. This includes gr_expj,
gr_rotate.
Are your problems caused primarily by lack of CPU cycles, cache
misses or mis-predicted branches?
I am not sure, since I am not at all a software expert (mostly
dsp/comm).
My guess is that the SSE instructions are not being used (or not used to
the full extent). Even the ‘multiply’ block is VERY slow compared to a
vector x vector multiplication in the Intel library. Some of the gr_blocks
process each sample using a separate function call, e.g.
    for (n=0; n<noutput_samples; n++)
        scale(in[n]);
Replacing this with a single vectorized function call is much faster.
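To illustrate the two styles being compared, here is a small sketch in plain C++. The function names (scale_one, scale_per_sample, scale_buffer) are hypothetical and are not taken from GNU Radio or IPP. The buffer version is a simple contiguous loop that an optimizing compiler can often turn into SSE code; whether it actually does depends on the compiler and flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-sample helper, standing in for the scale() call above.
// When such a helper lives in a different translation unit (or is virtual),
// the compiler usually cannot inline it, so every sample pays for a call.
float scale_one(float x, float k) { return x * k; }

// Per-sample style: one function call per input item.
void scale_per_sample(const float* in, float* out, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = scale_one(in[i], k);
}

// Buffer style: a single call handles the whole buffer. The loop body is
// a plain multiply over contiguous memory, which is a good candidate for
// auto-vectorization.
void scale_buffer(const float* in, float* out, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}
```

Both functions produce identical output. In this single-file sketch the compiler may inline scale_one and generate the same code for both, so any speed difference only shows up when the per-sample call cannot be inlined.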
I have been replacing almost every low level block with a functionally
equivalent one built on the Intel performance libraries (IPP). These
libraries are not GPL, but are free for noncommercial use under Linux ($200
otherwise). At some point, I would like to contribute our work back to
gnuradio. Would this fit with the gr philosophy? How should we structure
the code (i.e. have a separate set of files, use #defines, or …)?
Eugene
We would not accept the changes. Part of what we’re up to is building
an ever expanding universe of free code. Instead of using the
non-free IPP code, please consider using a free library such as ATLAS,
or help us find and fix performance challenges in a way that doesn’t
require non-free code. Also, are you sure that your performance
issues can’t be better addressed with an algorithmic change? If
you’re using a lot of very low-level blocks (e.g., add, multiply,
etc.) you’re probably better off writing a block that aggregates some
of the operations into a single block.
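As a rough illustration of the aggregation idea, the sketch below compares a multiply stage followed by an add stage against a single fused pass. The function names are hypothetical and this is plain C++ rather than actual GNU Radio block code; the point is only that the fused version makes one pass over memory and needs no intermediate buffer.

```cpp
#include <cstddef>
#include <vector>

// Two separate stages, as with chained low-level blocks: the data is
// written to an intermediate buffer and then read back again.
std::vector<float> multiply_then_add(const std::vector<float>& in,
                                     float gain, float offset) {
    std::vector<float> tmp(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        tmp[i] = in[i] * gain;

    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = tmp[i] + offset;
    return out;
}

// Aggregated stage: both operations happen in one pass, so each sample
// is loaded and stored once and no intermediate buffer is needed.
std::vector<float> multiply_add_fused(const std::vector<float>& in,
                                      float gain, float offset) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] * gain + offset;
    return out;
}
```

In a real flow graph the saving also includes the per-block scheduling and buffer management overhead, which this sketch does not model.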
That’s what I expected. We’ll try to contribute the more dsp-centric
blocks such as demodulators.
Eric