On Wed, Jan 12, 2011 at 11:03 AM, Tom R. [email protected] wrote:
I wanted to throw out another idea that no one seems to be bringing
up, related to an earlier comment about CUDA being limited by bus
transfers. It's not CUDA doing that, but the architecture of the
machine, with the host (CPU) and device (GPU) separated by a bus.
That has nothing to do with CUDA as a language.
I think the notion that the language is not the barrier (the hardware
architecture is) is precisely why I personally am more excited about
OpenCL as a language than CUDA per se. CUDA is inherently tied to
nVidia hardware, and while it is conceivable that CUDA will end up
being supported on a wider variety of CPU/GPU architectures (e.g. the
recently announced 'Project Denver'), I don't imagine it will ever
find support on non-nVidia hardware. OpenCL, on the other hand,
enjoys support from a wide variety of hardware vendors (AMD/ATI,
nVidia, IBM, Intel, Apple, etc.), and was designed to run on a wide
variety of architectures (including a mix of CPUs, GPUs,
accelerator/DSP boards, etc.). In the long run it seems to me to be a
much better environment for heterogeneous computing, and one that
raises no serious concern about being tied to a single vendor.
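
As a concrete illustration of that breadth, below is a minimal sketch
(plain C against the standard OpenCL API) that enumerates every
platform and device visible on a host. On a heterogeneous machine the
same few calls will report CPUs, GPUs, and accelerator boards from
different vendors side by side:

#include <CL/cl.h>
#include <stdio.h>

/* Minimal sketch: list every OpenCL platform and device on this host. */
int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;

    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS)
        return 1;
    if (nplat > 8)
        nplat = 8;

    for (cl_uint p = 0; p < nplat; ++p) {
        char pname[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(pname), pname, NULL);
        printf("Platform: %s\n", pname);

        cl_device_id devices[8];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                           8, devices, &ndev) != CL_SUCCESS)
            continue;
        if (ndev > 8)
            ndev = 8;

        for (cl_uint d = 0; d < ndev; ++d) {
            char dname[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(dname), dname, NULL);
            printf("  Device: %s\n", dname);
        }
    }
    return 0;
}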
Currently, though, GPUs still have a place for certain applications,
even in signal processing and radio. They are not a panacea for
improving the performance of all signal processing applications, but
if you understand the limitations and where they benefit you, you can
get some really good gains out of them. I'm excited to see anyone
researching and experimenting in this area, and very hopeful about
future uses of the knowledge and expertise we can generate now.
Tom
Agreed. Having spent some time working with OpenCL on GPUs to solve a
different sort of problem, I completely agree that they are both
powerful and not a silver bullet.
I would like to echo some of the previous comments: replacing a
single processing block in a flowgraph with a drop-in CUDA/OpenCL
replacement is not likely to lead to any significant gains. It may
relieve some of the work the CPU has to do (and thus be a net gain in
terms of total samples that can be processed without dropping any on
the floor), but I suspect Steve is correct: the big gains will be
made either in applications requiring large filters/channelizers/etc.
or with complete RX and/or TX chains written in OpenCL, with GNURadio
merely acting as a shuttle between the USRPx/UHD-enabled source/sink
and the smaller trickle of bits coming back out (or going in).
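
To make the transfer-overhead point concrete, here is a minimal
sketch of the per-call pattern a drop-in GPU block would have to
repeat inside every work() invocation. The helper name and float
buffers are purely illustrative (not any existing GNURadio API); the
point is that both blocking transfers cross the bus on every call, so
for a lone block the copies can easily outweigh the compute:

#include <CL/cl.h>

/* One work() call's worth of offload: copy a chunk in, run the kernel,
 * copy the result back. Both blocking transfers cross the PCIe bus on
 * every call, which is why a lone drop-in block rarely wins overall.
 * (Hypothetical helper; names and float buffers are illustrative.) */
cl_int process_chunk(cl_command_queue queue, cl_kernel kernel,
                     cl_mem d_in, cl_mem d_out,
                     const float *h_in, float *h_out, size_t nsamples)
{
    cl_int err = clEnqueueWriteBuffer(queue, d_in, CL_TRUE, 0,
                                      nsamples * sizeof(float), h_in,
                                      0, NULL, NULL);
    if (err != CL_SUCCESS)
        return err;

    err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_in);
    if (err == CL_SUCCESS)
        err = clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_out);
    if (err != CL_SUCCESS)
        return err;

    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                 &nsamples, NULL, 0, NULL, NULL);
    if (err != CL_SUCCESS)
        return err;

    return clEnqueueReadBuffer(queue, d_out, CL_TRUE, 0,
                               nsamples * sizeof(float), h_out,
                               0, NULL, NULL);
}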
If that is the case, I think the follow-on question becomes: does
GNURadio need to do anything to support OpenCL/CUDA/etc.-enabled
applications? Or is everyone doing that sort of work simply writing
their own custom block to interface with their own OpenCL/CUDA/etc.
kernel, since they will likely have to resort to all sorts of nasty
optimization tricks to get the best performance for their particular
application anyway? Or can a common block serve as a generic
interface, loading whatever custom kernel needs to be written, and
work well enough in 90% of cases? I'd like to think the last is true,
but I don't yet have any evidence that it is. Perhaps at a later date
I'll have something to share that points in one direction or the
other.
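
For what it's worth, here is the sort of thing I have in mind for
that generic interface: a hypothetical helper that takes
user-supplied kernel source and a kernel name at runtime, so the
common block stays fixed and only the kernel file is
application-specific. The function name and file handling are my own
illustration, not an existing GNURadio or OpenCL facility, and error
handling is trimmed for brevity:

#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical "generic interface" helper: build whatever kernel the
 * user supplies at runtime. Only the .cl file and kernel name are
 * application-specific; the hosting block never changes. */
cl_kernel load_user_kernel(cl_context ctx, cl_device_id dev,
                           const char *path, const char *kernel_name)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    rewind(f);

    char *src = malloc((size_t)len + 1);
    size_t nread = fread(src, 1, (size_t)len, f);
    fclose(f);
    src[nread] = '\0';

    cl_int err;
    cl_program prog = clCreateProgramWithSource(ctx, 1,
                                                (const char **)&src,
                                                NULL, &err);
    clBuildProgram(prog, 1, &dev, "", NULL, NULL);
    cl_kernel k = clCreateKernel(prog, kernel_name, &err);

    free(src);
    return k;
}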
Doug
--
Doug G.
[email protected]