How to set up the MP-Benchmark test on PowerPC processors (using Altivec)

matty · April 30, 2010, 9:12am

I want to set up several benchmark tests for PowerPC processors using
Altivec. The MP-Scheduler benchmark works fine on a Core 2 Duo and on the
PS3 (without Altivec).

But how can I set up the benchmark for PowerPC processors (e.g. Cell BE)
using the Altivec extensions? I see no option for enabling Altivec. Is there
special benchmark code for it?

Best regards
Matthias

matty · April 30, 2010, 7:32pm

On Fri, Apr 30, 2010 at 09:08:59AM +0200, matty wrote:

I want to set up several benchmark tests for PowerPC processors using
Altivec. The MP-Scheduler benchmark works fine on a Core 2 Duo and on the
PS3 (without Altivec).

But how can I set up the benchmark for PowerPC processors (e.g. Cell BE)
using the Altivec extensions? I see no option for enabling Altivec. Is there
special benchmark code for it?

I’m pretty sure that it uses Altivec by default on PPC. (The only
hand-coded Altivec code is in the gr_fir_fff filter.)

Do your results look like this:

http://gnuradio.org/images/perf-data-images/ps3-altivec.png

or like this:

http://gnuradio.org/images/perf-data-images/ps3.png

The PPC on the Cell is seriously lame, as you have no doubt found out
by now…

Eric

The relevant wiki page (needs some formatting cleanup)
http://gnuradio.org/redmine/wiki/gnuradio/MPSchedulerPerformance

matty · May 3, 2010, 2:22pm

Hi,

Here is my MP benchmark of the PS3! It looks a little distorted at pipe 6.
I’m running another test at the moment to check it.
OK, I think this is with Altivec. But how can I run this test without
Altivec, like here:
http://gnuradio.org/images/perf-data-images/ps3.png

Thanks in advance for help!

Best Regards
Matty


matty · May 1, 2010, 1:06pm

Thanks,

I will test it on the PS3 on Monday and report my results!

Regards
Matty


matty · May 3, 2010, 8:39pm

Many thanks,

I’m running Fedora 11 with XFCE4! I will run another test in text mode.
My second benchmark was considerably better.

Is there any usable benchmark code for gcell that exercises the SPEs?

Matty


matty · May 3, 2010, 8:46pm

On Mon, May 03, 2010 at 07:16:18PM +0200, matty wrote:

Many thanks,

I’m running Fedora 11 with XFCE4! I will run another test in text mode.
My second benchmark was considerably better.

Is there any usable benchmark code for gcell that exercises the SPEs?

There are benchmarks that test the infrastructure and overhead.
Those are benchmark_nop and benchmark_dma.

You’re on your own for the runtime of a given offloaded portion,
but it’s easy to measure directly on the SPE using the decrementer.
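
The arithmetic behind a decrementer measurement can be sketched as follows.
This is only an illustration: it assumes the decrementer is a 32-bit counter
that counts down, that it is read once before and once after the offloaded
work (on the SPE itself that would be done in C, for example with
spu_read_decrementer() from spu_mfcio.h), and that the timebase frequency is
known. The 79.8 MHz value and the tick counts below are assumed example
numbers, so check the timebase on your own machine (e.g. the timebase line in
/proc/cpuinfo).

```python
def decrementer_elapsed_seconds(start_ticks, end_ticks, timebase_hz):
    """Convert two readings of a 32-bit down-counting timer to seconds."""
    # The counter runs downward, so elapsed ticks = start - end.
    # Masking to 32 bits handles a single wrap past zero.
    elapsed_ticks = (start_ticks - end_ticks) & 0xFFFFFFFF
    return elapsed_ticks / timebase_hz


# Assumed example values, not measurements.
TIMEBASE_HZ = 79_800_000          # assumed timebase frequency
start = 0xFFFFFFFF                # reading taken before the work
end = start - 7_980_000           # reading taken after the work
print(decrementer_elapsed_seconds(start, end, TIMEBASE_HZ))
```

The mask only covers one wrap of the counter, so this works for intervals
shorter than 2**32 ticks.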

Eric

matty · May 5, 2010, 11:07pm

Is it generally possible to use qa_fft.py or the CGRAN gcellized FFT as the
offloaded portion? I want to analyse the benefit of the SPEs on some GNU
Radio code. A benchmark result like this one
(http://gnuradio.org/redmine/attachments/104/R-10231-ps3-20090115-0226.png)
would be convincing, because it shows the linear dependence between
speedup and the number of SPEs.
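
To make "linear dependence" concrete, here is a minimal sketch of how speedup
and parallel efficiency could be computed from measured runtimes. The function
name and the runtimes are made up for illustration; they are not results from
any benchmark in this thread.

```python
def speedup_table(runtimes_by_spe_count):
    """Return (n_spes, speedup, efficiency) rows relative to the 1-SPE run."""
    baseline = runtimes_by_spe_count[1]
    rows = []
    for n_spes in sorted(runtimes_by_spe_count):
        speedup = baseline / runtimes_by_spe_count[n_spes]
        # Efficiency of 1.0 means perfectly linear scaling.
        rows.append((n_spes, speedup, speedup / n_spes))
    return rows


# Made-up runtimes in seconds, not measurements.
example_runtimes = {1: 12.0, 2: 6.0, 4: 3.0, 6: 2.0}
for n, s, e in speedup_table(example_runtimes):
    print(f"{n} SPEs: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency that stays close to 1.0 as SPEs are added is what a straight
line in the speedup plot corresponds to.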

Best Regards
Matty


matty · May 3, 2010, 4:34pm

On Mon, May 03, 2010 at 02:10:07PM +0200, matty wrote:

Hi,

Here is my MP benchmark of the PS3! It looks a little distorted at pipe 6.
I’m running another test at the moment to check it.

Do you have anything else running? Window manager?
I’ve got mine configured to runlevel 3, no X, then ssh into it.
Saves a bunch of resources.

OK, I think this is with Altivec.

Yep, that’s with Altivec.

But how can I run this test without Altivec, like here:
http://gnuradio.org/images/perf-data-images/ps3.png

I think this will kill the Altivec code:

./configure --with-md-cpu=generic
(cd gnuradio-core/src/lib/filter; make clean)
make && make install

Eric

matty · May 7, 2010, 5:14am

On 05/06/2010 11:01 PM, Eric B. wrote:

The cgran version shows the speedup pretty well, but you need to be
using big FFTs to see the win, where big >= 4096 points.

I laugh heartily at your puny FFTs of only 4096 points

I regularly do FFTs with 1 Hz resolution over bandwidths of several MHz
with GNU Radio. They run in “real time” on a reasonably-snappy “normal”
CPU. But I guess I could be doing them on a GPU at some point.
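
For a sense of scale, here is a rough sketch of the FFT length that such a
resolution implies. It assumes complex sampling, so the sample rate equals the
bandwidth, and it rounds up to a power of two. The 4 MHz figure is just an
assumed stand-in for "several MHz".

```python
import math


def fft_size_for_resolution(bandwidth_hz, resolution_hz):
    """Smallest power-of-two FFT length whose bin width is <= resolution_hz."""
    min_bins = math.ceil(bandwidth_hz / resolution_hz)
    # Round up to the next power of two (exact powers are left unchanged).
    return 1 << (min_bins - 1).bit_length()


# Assumed example: 4 MHz of bandwidth at 1 Hz resolution.
n = fft_size_for_resolution(4_000_000, 1.0)
print(n, 4_000_000 / n)  # FFT length and the resulting bin width in Hz
```

Under these assumptions the transform is about a thousand times longer than
the 4096 points mentioned above.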

Something that requires big crunchies in my application space is coherent
de-dispersion, which requires the construction of a (usually largish)
complex FFT filter. Seems that might benefit from GPU speedup. Right now,
I have to decide whether I want an RFI filter (an FFT notch filter
basically, and it doesn’t have to be all that long), or a de-dispersion
filter (generally much longer). But oh boy, can I have both, please?
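
As a toy illustration of the FFT notch idea (transform, zero the offending
bins, transform back), here is a minimal sketch. It uses a naive DFT so that
it needs nothing beyond the standard library, and the tone frequencies and
block length are made up. It is not how GNU Radio implements its FFT filters;
a real streaming filter would process overlapping blocks with a fast FFT.

```python
import cmath


def dft(samples, inverse=False):
    """Naive O(N^2) DFT; fine for a tiny demo, far too slow for real use."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x * cmath.exp(sign * 2j * cmath.pi * k * i / n)
                  for i, x in enumerate(samples))
        out.append(acc / n if inverse else acc)
    return out


def notch(samples, bins_to_remove):
    """Zero the chosen frequency bins and transform back to the time domain."""
    spectrum = dft(samples)
    for k in bins_to_remove:
        spectrum[k % len(spectrum)] = 0
    return dft(spectrum, inverse=True)


# Made-up test signal: a wanted tone in bin 3 plus "RFI" in bin 10.
N = 32
wanted = [cmath.exp(2j * cmath.pi * 3 * i / N) for i in range(N)]
rfi = [0.5 * cmath.exp(2j * cmath.pi * 10 * i / N) for i in range(N)]
cleaned = notch([w + r for w, r in zip(wanted, rfi)], [10])
# Largest deviation from the wanted tone after notching.
print(max(abs(c - w) for c, w in zip(cleaned, wanted)))
```

The printed residual should be at the level of floating-point rounding error,
since both tones sit exactly on bin centres in this example.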

Actually, I’m probably going to start playing with a Phenom II X6 1090T
some time this summer, and I may get back enough crunchies to do both
coherent de-dispersion and RFI notch filtering in real time. Yay!

–
Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

matty · May 7, 2010, 5:03am

On Wed, May 05, 2010 at 06:50:36PM +0200, matty wrote:

Is it generally possible to use qa_fft.py or the CGRAN gcellized FFT as the
offloaded portion? I want to analyse the benefit of the SPEs on some GNU
Radio code. A benchmark result like this one
(http://gnuradio.org/redmine/attachments/104/R-10231-ps3-20090115-0226.png)
would be convincing, because it shows the linear dependence between
speedup and the number of SPEs.

Best Regards
Matty

The cgran version shows the speedup pretty well, but you need to be
using big FFTs to see the win, where big >= 4096 points.

Eric