Beagle board update

I have some NEON code in the fff dot-product routine, and the QA code passes:

root@beagleboard:/home/balister/oe/tmp/work/armv7a-angstrom-linux-gnueabi/gnuradio-3.1.3+svnr9809-r4.1/trunk/gnuradio-core/src/tests# ./test_filter
. [generic] [cortex_a8]
. [generic] [cortex_a8]
. [generic]
. [generic]
. [generic]
. [generic]
.>>> gr_fir_fff: using cortex_a8

OK (9 tests)

root@beagleboard:/home/balister/oe/tmp/work/armv7a-angstrom-linux-gnueabi/gnuradio-3.1.3+svnr9809-r4.1/trunk/gnuradio-core/src/tests# ./benchmark_dotprod_fff
generic: taps: 256 input: 4e+07 cpu: 968.586 taps/sec: 1.057e+07
cortex_a8: taps: 256 input: 4e+07 cpu: 45.703 taps/sec: 2.241e+08

Philip

On Thu, Oct 23, 2008 at 09:38:26PM -0700, Philip B. wrote:

> .>>> gr_fir_fff: using cortex_a8
>
> OK (9 tests)
>
> root@beagleboard:/home/balister/oe/tmp/work/armv7a-angstrom-linux-gnueabi/gnuradio-3.1.3+svnr9809-r4.1/trunk/gnuradio-core/src/tests# ./benchmark_dotprod_fff
> generic: taps: 256 input: 4e+07 cpu: 968.586 taps/sec: 1.057e+07
> cortex_a8: taps: 256 input: 4e+07 cpu: 45.703 taps/sec: 2.241e+08
>
> Philip

Cool!

The good news / bad news is that the spread is worse than on the P4!

Is there a way to get the compiler to use the NEON instruction set in
scalar mode? E.g., something like -mfpmath=sse on x86? Maybe -mfp=vfp?
Are you providing the -mcpu=cortex-a8 gcc option?

Eric

On Fri, Oct 24, 2008 at 6:27 AM, Eric B. wrote:

> > . [generic]
> > Philip
>
> Cool!
>
> The good news / bad news is that the spread is worse than on the P4!
>
> Is there a way to get the compiler to use the NEON instruction set in
> scalar mode? E.g., something like -mfpmath=sse on x86? Maybe -mfp=vfp?
> Are you providing the -mcpu=cortex-a8 gcc option?

The Cortex-A8 numbers use assembler to unroll the inner loop 8 times.
I think this code can get better. I’ll have to double check the flags,
but I do not think gcc does a good job generating code for the
vfp/NEON unit. (We are happy gcc can generate anything supporting NEON
and not crash …)
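
For readers who have not seen the technique, unrolling a dot-product inner loop 8 times looks roughly like the following in portable C. This is only a sketch of the loop structure, not the NEON assembler used for the benchmark above; the function names dotprod_generic and dotprod_unrolled8 are made up for illustration.

```c
#include <stddef.h>

/* Reference version: one multiply-accumulate per loop iteration. */
float dotprod_generic(const float *taps, const float *input, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += taps[i] * input[i];
    return sum;
}

/* Hypothetical unrolled version: 8 products per iteration, each feeding
 * its own accumulator so the multiply-adds do not depend on each other. */
float dotprod_unrolled8(const float *taps, const float *input, size_t n)
{
    float acc[8] = { 0.0f };
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        acc[0] += taps[i + 0] * input[i + 0];
        acc[1] += taps[i + 1] * input[i + 1];
        acc[2] += taps[i + 2] * input[i + 2];
        acc[3] += taps[i + 3] * input[i + 3];
        acc[4] += taps[i + 4] * input[i + 4];
        acc[5] += taps[i + 5] * input[i + 5];
        acc[6] += taps[i + 6] * input[i + 6];
        acc[7] += taps[i + 7] * input[i + 7];
    }

    /* Combine the partial sums pairwise. */
    float sum = ((acc[0] + acc[1]) + (acc[2] + acc[3]))
              + ((acc[4] + acc[5]) + (acc[6] + acc[7]));

    /* Handle any leftover elements when n is not a multiple of 8. */
    for (; i < n; i++)
        sum += taps[i] * input[i];

    return sum;
}
```

Because the unrolled version adds the products in a different order, its result can differ from the generic one in the last few bits for general floating-point data.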

Remember, this is clocked at 600 MHz and consumes about 1 Watt.

Philip

On Fri, Oct 24, 2008 at 08:19:49AM -0700, Philip B. wrote:

> On Fri, Oct 24, 2008 at 6:27 AM, Eric B. wrote:
>
> Remember, this is clocked at 600 MHz and consumes about 1 Watt.

Understood. I’m trying to keep you out of the assembly business. The
fact that your assembly code is about 20 times faster is scary. That’s why I
was asking about compiler flags. I suspect that you’re not telling
gcc enough about the machine.

Eric
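
The "20 times" figure can be checked against the benchmark output posted at the top of the thread. The sketch below assumes the cpu column is in seconds and that taps/sec is computed as taps * input / cpu; the posted numbers are consistent with that, but the benchmark source is not shown here, so treat the formula as an assumption. The helper names are hypothetical.

```c
/* Assumed formula: throughput = taps * input samples / CPU seconds. */
double taps_per_sec(double ntaps, double ninput, double cpu_sec)
{
    return ntaps * ninput / cpu_sec;
}

/* Ratio of CPU times: how many times faster the second run is. */
double speedup(double slow_cpu_sec, double fast_cpu_sec)
{
    return slow_cpu_sec / fast_cpu_sec;
}
```

With the posted values, speedup(968.586, 45.703) comes out a little over 21, and taps_per_sec(256, 4e7, 45.703) is roughly 2.24e8, matching the cortex_a8 line.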