PS3/Cell BE platform

All,

Has anyone been looking at the Cell BE processor as a gnuradio backend
platform?

For US$600 you get a Playstation 3 that boots linux and has seven 128-bit
RISC processors, each with 256k of local memory, all running at ~3.2GHz.
Each RISC processor is optimized for vector math / stream processing… :)

There is already an SDK (a modified GNU toolchain) from IBM. It appears
to be a standard toolchain, with apps that run on the 64-bit ppc core,
modified to compile special threads (a different exec type) for the RISC
processors (called SPEs). Threads are written in C against a provided
library.

My thought for GnuRadio was to make near-realtime mod/demod of ATSC a
possibility, as well as speeding up most other stream processing.

The SDK and info are here [1]. The .iso seems to have everything
needed, and the provided code examples aren’t too much different from
writing pthreads. The magic is in the added instruction set.

Good luck getting a hold of one before Christmas :)

Jason.

[1] http://www-128.ibm.com/developerworks/power/cell/

Jason wrote:

[snip]

Good luck getting a hold of one before Christmas :)

Quite apart from the Cell processors, it also boasts dual GPUs, which
are also good at doing multiply-accumulate.
Hmmmm.

Yes. For a work project my group is purchasing two Mercury 1U Cell
servers. This will make ATSC and HDTV a real possibility. These
Mercury servers have a high-bandwidth PCI Express connection for getting
data in and out. Another fellow and I will be porting GnuRadio to the
Cell BE on the Mercury. It is not for consumers in this form: these
things are over $10,000 apiece with PPC Linux installed. I cannot wait
until we get the PS3s hacked to be Linux desktops.

Bob



AMSAT Director and VP Engineering. Member: ARRL, AMSAT-DL,
TAPR, Packrats, NJQRP, QRP ARCI, QCWA, FRC. ARRL SDR WG Chair
“You see, wire telegraph is a kind of a very, very long cat.
You pull his tail in New York and his head is meowing in Los
Angeles. Do you understand this? And radio operates exactly
the same way: you send signals here, they receive them there.
The only difference is that there is no cat.” - Einstein

The Nvidia GPUs have fft and blas running on them. They are doing
teraflops and the tools/SDK are available under NDA. They do indeed
have multiply, accumulate, etc.

We are going to have GnuRadio in good enough shape to “take over the
world” when these high-speed SDR engines (gaming products that need DSP
hardware) become widely and openly available. Since they are based on
consumer products, they should be inexpensive. In my personal opinion,
the SDR part is now relegated to the “mundane”, and the most interesting
work is now in AI/cognitive work, classification, serious mesh
networking, etc. That work becomes more easily attainable with a
high-bandwidth channel back to the “front end”, which is the completed
SDR toolbox operating on these fancy DSP-enabled processors with their
huge-bandwidth IO pipes.

The needs of first-person-shooter games to render video on the fly
rather than play through rasterized and stored video have certainly been
our friend. The “good old days” ARE NOW.

Bob


On Thursday 16 November 2006 16:22, Robert McGwier wrote:

The Nvidia GPUs have fft and blas running on them. They are doing
teraflops and the tools/SDK are available under NDA. They do indeed
have multiply, accumulate, etc.

ATI/AMD have something similar…
http://ati.amd.com/companyinfo/researcher/resources.html

(Not that I have used either)

Thomas S. wrote:

And somebody even did the implementation of the fft on a GPU for
gnuradio already:
www.ecsl.cs.sunysb.edu/fir/fir.ps

Looks kind of old, but very interesting…

If you get a copy of the CellSDK iso [1] and extract the cell-sdk-1.1
rpm, all the source in there (including ‘src/lib/fft/*’) is under the
Common Public License 1.0, which is approved by OSI [2]. The fft code
will do 1d and 2d…

The only thing that concerns me is that none of the source files in the
SDK refer to the license file; they simply say “All rights reserved”,
which doesn’t exactly give me a warm fuzzy. However, the CPL-1.0 file
is in the ‘license/’ directory just under the root of the SDK tree. It
should be reasonable to assume that all files in the SDK fall under that
license, but IANAL.

Legal nitpicks aside, the SPE will apparently take in an array of four
single-precision floats (128 bits total in one register), and multiply or
add it to another array of four floats, all in one instruction. This
would massively reduce the number of instructions for, say, multiplying
two 4x4 matrices together. Then toss in the fact that there are 7
SPEs, each running at 3.2GHz, each with 256K of DMA-accessible local
memory. Sorry, I’m drooling. :)
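
As a rough illustration of that saving, here is a minimal sketch in plain Python (not SPE code). It models a 128-bit register as a list of four floats and counts arithmetic instructions for a 4x4 matrix product. The helper names (`splat`, `vec_madd`, `simd_matmul`) are made up for this example, the model assumes a four-lane fused multiply-add, and it ignores loads, stores and the shuffles needed to broadcast a value across lanes.

```python
# Counting sketch only: a "register" is a Python list of four floats.
LANES = 4

def scalar_matmul(a, b):
    """4x4 product one float at a time; returns (result, arithmetic op count)."""
    ops = 0
    c = [[0.0] * LANES for _ in range(LANES)]
    for i in range(LANES):
        for j in range(LANES):
            acc = 0.0
            for k in range(LANES):
                acc += a[i][k] * b[k][j]
                ops += 2  # one multiply and one add per term
            c[i][j] = acc
    return c, ops

def splat(x):
    """Broadcast one float into all four lanes (not counted as arithmetic)."""
    return [x] * LANES

def vec_madd(x, y, acc):
    """Four-lane multiply-add, modelled as a single instruction."""
    return [xi * yi + ai for xi, yi, ai in zip(x, y, acc)]

def simd_matmul(a, b):
    """4x4 product built from four-lane multiply-adds on whole rows of b."""
    ops = 0
    c = []
    for i in range(LANES):
        row = [0.0] * LANES
        for k in range(LANES):
            row = vec_madd(splat(a[i][k]), b[k], row)
            ops += 1
        c.append(row)
    return c, ops

if __name__ == "__main__":
    a = [[float(i + j) for j in range(LANES)] for i in range(LANES)]
    b = [[float(i * j + 1) for j in range(LANES)] for i in range(LANES)]
    _, scalar_ops = scalar_matmul(a, b)
    _, simd_ops = simd_matmul(a, b)
    print("scalar ops:", scalar_ops, "vector ops:", simd_ops)
```

Under these assumptions the count drops from 128 scalar multiplies and adds to 16 vector multiply-adds, before even considering the 7 SPEs running in parallel.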

Jason.

[1] http://www-128.ibm.com/developerworks/power/cell/downloads.html
[2] http://www.opensource.org/licenses/cpl1.0.php

Robert McGwier wrote:

Jason wrote:
[snip]

toolchain is modified to compile special threads (different exec type)
for the RISC processors (called SPEs). threads are written in C
against a provided library.

My thought for GnuRadio was to make near realtime mod/demod of ATSC a
possibility. As well as speeding up most other stream processing.

[snip]

Yes. For a work project my group is purchasing two Mercury 1U Cell
servers. This will make ATSC and HDTV a real possibility. These
Mercury servers have a high-bandwidth PCI Express connection for getting
data in and out. Another fellow and I will be porting GnuRadio to the
Cell BE on the Mercury. It is not for consumers in this form: these
things are over $10,000 apiece with PPC Linux installed. I cannot wait
until we get the PS3s hacked to be Linux desktops.

Already been done [1], [2]. Hell, it’s not even a hack. The PS3
supports installing Yellow Dog out of the box. So your CellBE mods to
GnuRadio will be of interest as soon as the Christmas rush is over. :)

Jason.

[1] Power Developer - Platform Support - Linux support for Sony PlayStation® 3
[2] http://ps3.ign.com/articles/739/739688p1.html

On 11/16/06, Jason wrote:

Legal nitpicks aside, the SPE will apparently take in an array of four
single-precision floats (128 bits total in one register), and multiply or
add it to another array of four floats, all in one instruction. This
would massively reduce the number of instructions for, say, multiplying
two 4x4 matrices together. Then toss in the fact that there are 7
SPEs, each running at 3.2GHz, each with 256K of DMA-accessible local
memory. Sorry, I’m drooling. :)

From my understanding, each SPE has 128 registers, each 128 bits wide.
The program and the data you are currently working on all need to fit
within the 256k local store, so you need to be very aware of where your
data is and where it is going. You can then have one SPE dedicated to
streaming FFTs to another SPE that is doing some kind of
correlation/filtering, followed by whatever else you want the other
SPEs to do. It should also be noted that DMA from one SPE to another is
not all equal: some combinations are faster than others. If I remember
correctly, though, you can’t choose which SPE a program is assigned to;
the IBM interface just picks one.
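
The local-store constraint can be budgeted on paper before writing any SPE code. The sketch below is plain Python with made-up example figures: the code and stack sizes are hypothetical, and the default of four data buffers assumes double-buffered input plus double-buffered output so that DMA can overlap compute, which is a common technique rather than anything stated in this thread. Rounding each buffer down to 16 bytes (one 128-bit register) is likewise an assumption.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local store, as quoted in the thread
BYTES_PER_COMPLEX = 8           # two single-precision floats (I and Q)

def max_chunk_samples(code_bytes, stack_bytes, buffers=4):
    """Largest chunk, in complex samples, such that the program, its stack
    and `buffers` equally sized data buffers all fit in one local store.

    Returns 0 if code and stack alone already fill the local store.
    """
    free = LOCAL_STORE_BYTES - code_bytes - stack_bytes
    if free <= 0:
        return 0
    per_buffer = free // buffers
    per_buffer -= per_buffer % 16  # keep buffers a multiple of 16 bytes
    return per_buffer // BYTES_PER_COMPLEX

if __name__ == "__main__":
    # Hypothetical program: 64 KiB of code, 16 KiB of stack.
    print(max_chunk_samples(64 * 1024, 16 * 1024))
```

With those example figures each of the four buffers holds a few thousand complex samples, which gives a feel for how small the working set of each pipeline stage has to be.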

The IBM dev kit has a complete simulator in it that should be able to
do everything you might want to do with a Cell, just without the great
performance.

Brian


Newell J. wrote:

As I am new to all this, please correct me if I am wrong, as I am trying
to get a better understanding of everything. I am assuming that even
though you are able to run linux on the PS3, you will not be able to
use gnuradio as it stands because of the different processor
architecture, right? That is, the code in gnuradio wouldn’t take
advantage of the 7 SPEs etc.? From what I have read, it seems
like using the Cell BE would be a great thing and the future looks
bright for this. Thanks

Yes and no. From casual observation (I don’t have a ppc platform
currently), all of gnuradio’s dependencies already run on most linux ppc
and ppc64 distributions. So getting it up and running shouldn’t be too
difficult.

The fun part will be modding GnuRadio and its dependencies to take
advantage of the SPEs. So, even though there is a libfft (for example)
running on ppc/ppc64, it doesn’t take advantage of the SPEs. To do so
is a bit more involved than just swapping out some math
instructions/functions. From a preliminary review, it involves wrapping
the chunk of code you want on the SPE in an spu_thread, shipping data in
and out via DMA, minimizing branchy code (the SPEs execute in order and
have no hardware branch prediction), then reworking the algorithm to take
advantage of the SIMD (single instruction, multiple data) instruction
set. Once that’s done, recompile it with IBM’s modified GNU toolchain,
and watch it crash. :)
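
The shape of such a port can be mocked up off-target. The sketch below is plain Python, not SPE code: `dma_get` and `dma_put` are hypothetical stand-ins that just copy list slices, and the chunk size is an arbitrary assumption. It shows two of the habits described in the paragraph above: working on local-store-sized chunks, and replacing a per-sample if/else with arithmetic (here a clip written with min/max, which on a SIMD unit would typically become a per-lane compare and select).

```python
CHUNK = 1024  # samples per transfer; arbitrary, chosen to fit a local store

def dma_get(main_memory, start, count):
    """Stand-in for a DMA read into local store (here: a slice copy)."""
    return main_memory[start:start + count]

def dma_put(main_memory, start, local):
    """Stand-in for a DMA write back to main memory."""
    main_memory[start:start + len(local)] = local

def clip_kernel(local, limit):
    """Clip every sample to [-limit, limit] without an explicit if/else."""
    return [max(-limit, min(limit, x)) for x in local]

def process(samples, limit):
    """Stream `samples` through the kernel one chunk at a time."""
    out = [0.0] * len(samples)
    for start in range(0, len(samples), CHUNK):
        local = dma_get(samples, start, CHUNK)          # main memory -> local
        dma_put(out, start, clip_kernel(local, limit))  # local -> main memory
    return out

if __name__ == "__main__":
    ramp = [float(i - 1500) for i in range(3000)]
    clipped = process(ramp, 100.0)
    print(min(clipped), max(clipped))
```

Getting the chunking right in a mock like this first leaves only the DMA calls and the SIMD rewrite to debug on the real hardware.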

hth,

Jason.


On Friday 17 November 2006 09:42, Jason wrote:

instructions/functions. It involves, from preliminary review, wrapping
the chunk of code you want on the SPE in an spu_thread, shipping data in
and out via DMA, minimizing ooo (out of order, ie branchy) code, then
reworking the algorithm to take advantage of the SIMD (single
instruction, multiple data) instruction set. Once that’s done,
recompile it with IBM’s modified GNU toolchain, and watch it crash. :)

I wouldn’t be surprised if a patch for fftw turns up pretty quickly… I
imagine the code in there is already fairly branch-free and organised
for SIMD operation because of the existing MMX/SSE optimisations.

On Thu, Nov 16, 2006 at 10:10:23PM +0000, Newell J. wrote:

As I am new to all this, please correct me if I am wrong, as I am trying
to get a better understanding of everything. I am assuming that even
though you are able to run linux on the PS3, you will not be able to
use gnuradio as it stands because of the different processor
architecture, right? That is, the code in gnuradio wouldn’t take
advantage of the 7 SPEs etc.? From what I have read, it seems
like using the Cell BE would be a great thing and the future looks
bright for this. Thanks

Newell

We’d have to compile the blocks for the SPEs.
Hopefully the compiler does a decent job of extracting parallelism
from the code without too much help. Of course, we could hand-tune the
filter kernels like we do today.

The scheduler and behind-the-scenes block interconnect/buffering would
need to be modified, but that’s pretty well abstracted away from the
user’s view of the world.

We’d need to build some tools that would allow us to measure
performance on the SPE and allow us to do feedback based assignment of
subsets of blocks in a graph to the SPEs. I think this would be an
iterative process. That is, partition the graph across the SPEs. Run
your test case. Measure. Repeat.
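
One way to picture the partition step of that loop is as a load-balancing problem. The following is only a sketch: the block names and costs are invented, `partition` uses a simple greedy heaviest-first heuristic (an assumption for illustration, not anything the GNU Radio scheduler does), and it ignores the cost of moving data between SPEs, which real measurements would have to include.

```python
import heapq

def partition(block_costs, n_spes=7):
    """Greedy assignment: heaviest block first, onto the least-loaded SPE.

    block_costs maps a block name to its measured cost (any consistent unit).
    Returns one (total load, [block names]) pair per SPE, in SPE order.
    """
    # Heap entries are (load, spe index, blocks); the unique index breaks ties.
    heap = [(0.0, spe, []) for spe in range(n_spes)]
    heapq.heapify(heap)
    for name, cost in sorted(block_costs.items(), key=lambda kv: -kv[1]):
        load, spe, blocks = heapq.heappop(heap)
        blocks.append(name)
        heapq.heappush(heap, (load + cost, spe, blocks))
    return [(load, blocks) for load, _, blocks in sorted(heap, key=lambda t: t[1])]

if __name__ == "__main__":
    # Invented per-block costs standing in for one round of measurements.
    measured = {"fir": 8.0, "viterbi": 7.0, "fft": 6.0, "rs_decode": 5.0,
                "agc": 3.0, "sync": 3.0, "eq": 2.0, "deinterleave": 1.0}
    for load, blocks in partition(measured, n_spes=3):
        print(load, blocks)
```

In the iterative process described above, each test run would produce a fresh set of measured costs, and the partition would be recomputed from them until the loads stop changing much.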

Sounds like fun.

And yes, I think we could get the HDTV receiver running in real time ;)

Eric


I am looking to build (or buy) a new computer for graduate school, and I
would like my platform to have real-time image-processing capabilities.
Do you know of any other platforms where that is possible right now?
Also, what do you think of the possibility of developing another
peripheral that would act like a Cell BE board and be used in
conjunction with the USRP?

Newell



Newell,

You might take a look at http://www.gpgpu.org/

From their web site:

GPGPU stands for General-Purpose computation on GPUs. With the
increasing programmability of commodity graphics processing units
(GPUs), these chips are capable of performing more than the specific
graphics computations for which they were designed. They are now
capable coprocessors, and their high speed makes them useful for a
variety of applications. The goal of this page is to catalog the
current and historical use of GPUs for general-purpose computation.

There’s a fairly large amount of activity going on in this area, and
there’s some open source code to look at, e.g.,

At a minimum, you’d think the GPU on a PC graphics card could serve as
a “smart sink” capable of doing its own histograms, FFT’s, waterfalls,
etc. for display purposes. Maybe more…

Steve