Oprofile from a flow graph running on an OMAP3


#1

From gnuradio running on the Beagle. The flow graph is the one I
posted earlier, except fed real data.

Anyone know what the
std::vector<float, std::allocator<float> >::operator[](unsigned int)
call does? I guess it is time to hack some NEON into the generic FIR
filter code.

Philip

root@beagleboard:~# opreport -l --threshold=1
CPU: ARM V7 PMNC, speed 0 MHz (estimated)
Counted CPU_CYCLES events (Number of CPU cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        app name                   symbol name
244      32.2751  libgnuradio-core.so.0.0.0  gr_fir_fff_generic::filter(float const*)
167      22.0899  libgnuradio-core.so.0.0.0  std::vector<float, std::allocator<float> >::operator[](unsigned int)
80       10.5820  vmlinux-2.6.27-rc7-omap1   generic_interrupt
46        6.0847  libgnuradio-core.so.0.0.0  gr_fast_atan2f(float, float)
39        5.1587  libgnuradio-core.so.0.0.0  .plt
36        4.7619  vmlinux-2.6.27-rc7-omap1   schedule
13        1.7196  vmlinux-2.6.27-rc7-omap1   handle_IRQ_event
11        1.4550  vmlinux-2.6.27-rc7-omap1   vfp_notifier
10        1.3228  libgcc_s.so.1              __mulsc3
10        1.3228  vmlinux-2.6.27-rc7-omap1   thumbee_notifier
8         1.0582  vmlinux-2.6.27-rc7-omap1   mmc_omap_start_command


#2

On Sat, 2008-10-11 at 14:38 -0400, Philip B. wrote:

Anyone know what the
std::vector<float, std::allocator<float> >::operator[](unsigned int)
call does?

That is the array subscript (dereference) operator for an STL vector of floats:

std::vector<float> v(…);
int pos = …;
float f = v[pos];

The last line results in a call to the function you quoted.
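As a minimal sketch of what that operator does (the helper name element_at is hypothetical and not part of GNU Radio): operator[] returns a reference to the element at the given index without any bounds check, which the standard defines as equivalent to *(v.begin() + pos). In an optimized build the compiler normally inlines it down to a single load; in an unoptimized build it can remain a real function call on every access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not a GNU Radio function.
// Reads one element through std::vector<float>::operator[].
float element_at(const std::vector<float>& v, std::size_t pos)
{
    // operator[] performs no bounds check, so check explicitly here.
    assert(pos < v.size());

    // Equivalent to *(v.begin() + pos).
    return v[pos];
}
```

Calling element_at(v, 1) on a vector holding {0.0, 2.5, 0.0} returns 2.5.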

I’m not an ARM architecture expert, but I would suspect unaligned memory
access is resulting in the enormous amount of time spent here.


#3

On Sat, Oct 11, 2008 at 02:38:28PM -0400, Philip B. wrote:

From gnuradio running on the Beagle. The flow graph is the one I
posted earlier, except fed real data.

Anyone know what the
std::vector<float, std::allocator<float> >::operator[](unsigned int)
call does? I guess it is time to hack some NEON into the generic FIR
filter code.

I think that’s just a sampling artifact of the inlined foo[x].
I’m pretty sure it’s spending about 55% of the time in
gr_fir_fff_generic::filter.
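To illustrate why the operator[] samples can be attributed to the filter itself, here is a rough sketch of a FIR inner loop written two ways. This is not the actual gr_fir_fff_generic::filter source; the function names and the use of std::vector for the input are assumptions made for the example. The first version indexes the vectors on every tap, so each access goes through operator[]; the second hoists raw pointers out of the loop, which is what an optimizing compiler usually does on its own once operator[] is inlined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generic FIR inner loop.
// Every taps[i] and input[i] goes through std::vector::operator[].
float fir_indexed(const std::vector<float>& taps,
                  const std::vector<float>& input)
{
    assert(input.size() >= taps.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < taps.size(); ++i)
        acc += taps[i] * input[i];
    return acc;
}

// Same computation with the element pointers hoisted out of the loop,
// so the loop body contains only plain loads and a multiply-add.
float fir_pointer(const std::vector<float>& taps,
                  const std::vector<float>& input)
{
    assert(input.size() >= taps.size());
    const std::size_t n = taps.size();
    if (n == 0)
        return 0.0f;

    const float* t = &taps[0];
    const float* x = &input[0];
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += t[i] * x[i];
    return acc;
}
```

Both functions accumulate in the same order and return the same value; with taps {1, 2, 3} and input {4, 5, 6} each returns 32. If the profile still shows operator[] as a separate symbol, that suggests either a sampling attribution quirk or a build where the call was not inlined.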

You may want to try gr.fft_filter_fff instead.

Eric


#4

On Sat, Oct 11, 2008 at 11:22 PM, Eric B. wrote:

gr_fir_fff_generic::filter.

You may want to try gr.fft_filter_fff instead.

Sort of related: I have GNU Radio detecting that it is being built for a
NEON-capable ARM and building both the generic and the NEON filters.
(Oddly enough, the cheesy no-unrolling C I use for NEON beats the generic
code by a factor of 3 or so.) Is there a clever way to get the gcc .s
files built?
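For readers unfamiliar with what a simple, non-unrolled NEON filter kernel might look like, here is a minimal sketch using the standard arm_neon.h intrinsics. It is not the code referred to in this post; the function name dot_neon_or_scalar is hypothetical, and the scalar fallback is included only so the example also builds on machines without NEON.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical dot product: four taps per iteration, no unrolling.
// Falls back to a plain scalar loop when NEON is not available.
float dot_neon_or_scalar(const float* taps, const float* input,
                         std::size_t n)
{
    assert(n == 0 || (taps && input));

    float acc = 0.0f;
    std::size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Four running partial sums, one per vector lane.
    float32x4_t vacc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4)
        vacc = vmlaq_f32(vacc, vld1q_f32(taps + i), vld1q_f32(input + i));

    // Horizontal add: fold the four lanes down to a single float.
    float32x2_t half = vadd_f32(vget_low_f32(vacc), vget_high_f32(vacc));
    half = vpadd_f32(half, half);
    acc = vget_lane_f32(half, 0);
#endif

    // Remaining taps (all of them on non-NEON builds).
    for (; i < n; ++i)
        acc += taps[i] * input[i];
    return acc;
}
```

On GCC toolchains of this era the NEON path typically needs -mfpu=neon (with a matching float ABI such as -mfloat-abi=softfp) to be enabled. Because the NEON path sums in a different order than the scalar loop, results can differ in the last bits for general floating-point data.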

Philip