Tesla C2000 series and CUDA and Gnu Radio

Is anyone out there taking another look at CUDA + Gnu Radio?

Some of the couple-of-years-old charts I’ve looked at suggest that
speedups for some of the most important transforms we use vary between
modest and disappointing.

Cross-over points for things like FFTs are usually up in the atmospheric
levels of FFT sizes before a CUDA-based transform would win even slightly
against a multi-threaded CPU-based FFTW, for example. But that was a
couple of years ago. Anything new along those lines?

It seems like the kinds of things that do well on a GPU are ones that
take a small amount of input data, compute ferociously, and produce
modest amounts of output data. Or schemes that might consume deluges of
input data, but produce output data only occasionally: a flow that did a
bunch of FFTs and produced averaged mag-squared outputs only “once in a
while” might fare well on a GPU.
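
To make the "deluge in, trickle out" flow concrete, here is a minimal sketch of an averaged mag-squared spectrum. It is pure standard-library Python for illustration only, not GNU Radio or CUDA code; a real flow would call FFTW or cuFFT instead of the toy fft() below. The names averaged_power_spectrum and io_ratio, and the sample sizes (8-byte complex input, 4-byte float output), are assumptions made for this example.

```python
import cmath


def fft(x):
    """Toy recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def averaged_power_spectrum(samples, fft_size, n_avg):
    """Average |FFT|^2 over n_avg consecutive frames of fft_size samples."""
    if len(samples) < fft_size * n_avg:
        raise ValueError("not enough samples for the requested average")
    acc = [0.0] * fft_size
    for i in range(n_avg):
        frame = samples[i * fft_size:(i + 1) * fft_size]
        for k, v in enumerate(fft(frame)):
            acc[k] += v.real * v.real + v.imag * v.imag
    # Only this one vector would have to leave the device per n_avg frames.
    return [a / n_avg for a in acc]


def io_ratio(fft_size, n_avg, in_bytes=8, out_bytes=4):
    """Bytes consumed per byte produced (assumed sample sizes)."""
    return (fft_size * n_avg * in_bytes) / (fft_size * out_bytes)
```

With fft_size = 1024 and n_avg = 1000, io_ratio gives 2000 bytes in for every byte out, which is the kind of ratio where the copy back to the host stops mattering. The input copy still has to be paid for.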

On a related note, has anyone looked at enabling the multi-threaded FFTW
stuff? The cross-over points there (between FFTW in a single-thread and
FFTW in multiple-threads) seem to be lower-down on the FFT-size curve.
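
For anyone who wants to measure a cross-over point rather than guess at it, a small timing harness along these lines might help. It is only a sketch: best_time, measure and find_crossover are hypothetical names, and the two FFT implementations being compared (say, single-threaded and multi-threaded FFTW wrappers) have to be supplied by the caller as plain Python callables. No measured numbers are implied here.

```python
import time


def best_time(func, arg, repeats=5):
    """Smallest wall-clock time (seconds) over `repeats` calls to func(arg)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best


def measure(func, make_input, sizes, repeats=5):
    """Map each FFT size to the best observed time for func."""
    return {n: best_time(func, make_input(n), repeats) for n in sizes}


def find_crossover(times_a, times_b):
    """Smallest size from which B beats A at every larger measured size.

    times_a and times_b map FFT size -> seconds. Returns None if B is
    not faster at the largest size measured by both.
    """
    crossover = None
    for n in sorted(set(times_a) & set(times_b), reverse=True):
        if times_b[n] < times_a[n]:
            crossover = n
        else:
            break
    return crossover
```

Usage would be something like find_crossover(measure(fft_single, make_input, sizes), measure(fft_multi, make_input, sizes)), where fft_single, fft_multi and make_input are whatever wrappers you have on hand. Taking the best of several runs is a deliberate choice to reduce noise from scheduling and cache warm-up.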


Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Wed, Dec 01, 2010 at 01:40:03AM -0500, Marcus D. Leech wrote:

On a related note, has anyone looked at enabling the multi-threaded FFTW
stuff? The cross-over points there (between FFTW in a single-thread and
FFTW in multiple-threads) seem to be lower-down on the FFT-size curve.

Marcus,

I haven’t tried it, but I’d guess that the crossover point is >= 16K points.

If you try it, let us know what you find.

Eric

Hi Marcus,

Actually, we are doing all our processing on the GPU. We use GNU Radio
to bring in the data, then run everything on the GPU and only copy back
the results.

As you wrote in your mail, CPU-to-GPU and GPU-to-CPU copies can be
bottlenecks and should be avoided or reduced whenever possible. With the
current blocks it only makes sense to utilize the GPU if ALL blocks are
also available for the GPU. E.g. if you do a cross-correlation (as we
do), it does not make sense to do the FFTs on the GPU but the complex
multiplication on the CPU, as that requires a GPU-CPU-GPU data transfer
before you can run the IFFTs.

For us the GPU is the device which enables our application to run in
real time, which we could hardly achieve with the CPU. But getting the
GPU into GNU Radio is another story, I guess…
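
The three stages of an FFT-based cross-correlation can be sketched as follows. This is a pure standard-library Python illustration of the structure only, not the code running on the GPU in the setup described above; the function names are hypothetical and the toy fft() stands in for cuFFT or FFTW. The comments mark where a host round trip would appear if only the FFTs ran on the device.

```python
import cmath


def fft(x, inverse=False):
    """Toy recursive radix-2 (I)FFT, unnormalized; len(x) a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT with the 1/N normalization applied."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def cross_correlate(a, b):
    """Circular cross-correlation r[m] = sum_n a[(n + m) % N] * conj(b[n])."""
    # Stage 1: forward FFTs (the part that is easy to hand to a GPU).
    spec_a = fft(a)
    spec_b = fft(b)
    # Stage 2: element-wise complex multiply. Doing this on the CPU would
    # mean copying both spectra to the host and the product back again.
    product = [x * y.conjugate() for x, y in zip(spec_a, spec_b)]
    # Stage 3: inverse FFT of the product.
    return ifft(product)
```

If a is a copy of b delayed by d samples (circularly), the magnitude of the result peaks at index d. Keeping stage 2 on the same device as stages 1 and 3 leaves one copy in and one copy out for the whole chain.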

Regards,
Thomas

On 12/01/2010 03:40 PM, Marcus D. Leech wrote:

On a related note, has anyone looked at enabling the multi-threaded FFTW
stuff? The cross-over points there (between FFTW in a single-thread and
FFTW in multiple-threads) seem to be lower-down on the FFT-size curve.


Dr. Thomas H.
Space-Time Measurement Project
Space-Time Standards Group
New Generation Network Research Center
National Institute of Information and Communications Technology

4-2-1 Nukui-Kitamachi, Koganei
184-8795 Tokyo
Japan