Profiling GNU Radio flow graphs

Hi all,

I am currently trying to optimize the performance of my DRM transmitter,
and for this purpose I want to profile my flow graphs.

After some googling and searching the archives I stumbled upon oprofile,
which looks quite nice to me. However, a first try did not provide very
meaningful results. That could also be due to misconfiguration, since I
did not read the manual very carefully.

I just wanted to know if there are other or better profiling solutions
you would recommend. Any comments are appreciated!

Best regards,
Felix

Hi Felix,

I also need to profile and optimize my system at the moment. For now I have
just added some statements that print the processing time of each block, but
this is definitely not a good method. Could you describe your profiling method
in more detail? Perhaps your results can be a reference for me.
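
For reference, the manual timing approach mentioned above could look roughly like the sketch below. This is only an illustration: timed(), report() and fake_work() are hypothetical names, not part of GNU Radio, and the helper only measures Python code that you wrap yourself. It does not see inside compiled blocks, which is one reason a real profiler is the better tool.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical helper (not a GNU Radio API): accumulate wall-clock time per label.
totals = defaultdict(float)
calls = defaultdict(int)

@contextmanager
def timed(label):
    """Time the enclosed code and add the elapsed time to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start
        calls[label] += 1

def report():
    """Return one summary line per label, slowest label first."""
    ordered = sorted(totals, key=totals.get, reverse=True)
    return ["%-20s %6d calls %10.6f s" % (name, calls[name], totals[name])
            for name in ordered]

def fake_work(n):
    """Stand-in for the processing done by one block."""
    return sum(i * i for i in range(n))

for _ in range(3):
    with timed("fake_work"):
        fake_work(10000)

print("\n".join(report()))
```

The printed times depend on the machine, so treat them as rough indicators only.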

Hi Tim,

Have you tried KCachegrind on GNU Radio flow graphs? Does it work
well? I want to see a profile of each GNU Radio module (in Python).


Yang, Qing
Information Engineering, CUHK

2012/8/24 [email protected]

Hi,

Using oprofile is quite easy. Basically, you configure the profiler,
start it, start your application, kill it after some time, stop the
profiler, and look at the results. You don’t have to set any special
compiler flags. However, if you want annotated source, you need
to compile with debug symbols. All of this can be done in a simple shell
script, which makes it very convenient.
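
The sequence described above can be sketched as a small driver script. This is an assumption-laden illustration, not Felix's actual script: oprofile_steps() and run_session() are hypothetical names, and the opcontrol and opreport invocations are simply the ones that appear elsewhere in this thread. By default the sketch only prints the commands (a dry run); pass a runner such as subprocess.call to execute them. Note that newer OProfile versions discourage opcontrol in favour of operf.

```python
import subprocess  # only needed if you pass runner=subprocess.call

def oprofile_steps(app_cmd):
    """Return the command sequence for one legacy opcontrol session."""
    return [
        ["sudo", "opcontrol", "--init"],                    # load the profiler
        ["sudo", "opcontrol", "--setup", "--no-vmlinux"],   # configure it
        ["sudo", "opcontrol", "--start"],                   # start profiling
        ["sudo", "opcontrol", "--reset"],                   # drop old samples
        list(app_cmd),                                      # application under test
        ["sudo", "opcontrol", "--dump"],                    # flush samples
        ["sudo", "opcontrol", "--shutdown"],                # stop the profiler
        ["opreport", "-l"],                                 # look at the results
    ]

def run_session(app_cmd, runner=None):
    """Print each step; execute it too if a runner callable is given."""
    for cmd in oprofile_steps(app_cmd):
        print("+ " + " ".join(cmd))
        if runner is not None:
            runner(cmd)

# Dry run: prints the commands without executing anything.
run_session(["./dial_tone.py"])
```

To actually run the session, call run_session(["./dial_tone.py"], runner=subprocess.call) and stop the application with Ctrl-C once enough samples have been collected.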

For more details I’d like to refer you to the oprofile manual [1] as
it’s good to read and not too extensive.

In my case, most CPU time is used by memmove (about 40%!).
Unfortunately, I wasn’t able to figure out where it gets called from.

Best regards,
Felix

[1] OProfile manual

On 24.08.2012 08:28, Qing Y. wrote:

kcachegrind is also a great tool to look at
http://kcachegrind.sourceforge.net/html/Shot3Large.html

Hi there,

Could anyone give me a concrete example of how to profile GNU Radio code
in Python?

My PC is: Linux yangqing-825-PC03 2.6.35-32-generic-pae #67-Ubuntu SMP
Mon Mar 5 21:23:19 UTC 2012 i686 GNU/Linux. I use Ubuntu 10.10 on a Xeon W3530.

I can use KCachegrind to profile code written in C++. But when I profile
Python code (e.g., dial_tone.py), no profile data is written.

yangqing@yangqing-825-PC03:~/Public$ valgrind --tool=callgrind ./dial_tone.py
==30385== Callgrind, a call-graph generating cache profiler
==30385== Copyright (C) 2002-2010, and GNU GPL'd, by Josef Weidendorfer et al.
==30385== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info
==30385== Command: ./dial_tone.py
==30385==
==30385== For interactive control, run 'callgrind_control -h'.
yangqing@yangqing-825-PC03:~/Public$ ls -l
total 12
-rw-------  1 yangqing yangqing    0 2012-08-27 16:29 callgrind.out.30385
** the size of profile data is 0? **
-rwxr-xr-x  1 yangqing yangqing 2006 2012-07-05 16:37 dial_tone.py
-rwxr-xr-x  1 yangqing yangqing  249 2012-08-27 16:22 mainloop.py
drwxr-xr-x 25 yangqing yangqing 4096 2012-08-27 00:23 oprofile
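
One possible cause of the empty output file, not confirmed anywhere in this thread: if dial_tone.py starts the interpreter through a launcher line such as "#!/usr/bin/env python" (an assumption about the script), Valgrind begins tracing the launcher and, by default, does not follow it into the Python process it executes. A hedged sketch of building an invocation that names the interpreter explicitly and also asks Valgrind to follow child processes; callgrind_command() is a hypothetical helper, while --tool=callgrind and --trace-children=yes are existing Valgrind options:

```python
import sys

def callgrind_command(script, interpreter=sys.executable, trace_children=True):
    """Build a valgrind command line that profiles the interpreter directly."""
    cmd = ["valgrind", "--tool=callgrind"]
    if trace_children:
        # Follow processes started via exec, e.g. by a launcher such as env.
        cmd.append("--trace-children=yes")
    cmd += [interpreter, script]
    return cmd

print(" ".join(callgrind_command("./dial_tone.py", interpreter="python")))
# prints: valgrind --tool=callgrind --trace-children=yes python ./dial_tone.py
```

Even when this produces data, the profile describes the Python interpreter and the compiled libraries it loads, not individual Python source lines.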

Then I tried OProfile, but that also failed :(

yangqing@yangqing-825-PC03:~/Public$ sudo opcontrol --init
yangqing@yangqing-825-PC03:~/Public$ sudo opcontrol --setup --no-vmlinux
yangqing@yangqing-825-PC03:~/Public$ sudo opcontrol --start
ATTENTION: Use of opcontrol is discouraged. Please see the man page for operf.
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
Using 2.6+ OProfile kernel interface.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
yangqing@yangqing-825-PC03:~/Public$ sudo opcontrol --reset
Signalling daemon... done
yangqing@yangqing-825-PC03:~/Public$ ./dial_tone.py
^Cyangqing@yangqing-825-PC03:~/Public$ sudo opcontrol --dump
yangqing@yangqing-825-PC03:~/Public$ sudo opcontrol --shutdown
Stopping profiling.
Killing daemon.
yangqing@yangqing-825-PC03:~/Public$ opreport -l dial_tone.py
Using /var/lib/oprofile/samples/ for samples directory.
error: no sample files found: profile specification too strict ?
** can't find the profile data? **

Then I tried the full report:
yangqing@yangqing-825-PC03:~/Public$ opreport -l|less
CPU: Intel Core/i7, speed 2.794e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name                      app name                        symbol name
97293    44.4076  no-vmlinux                      no-vmlinux                      /no-vmlinux
43324    19.7744  nvidia_drv.so                   nvidia_drv.so                   /usr/lib/nvidia-173/xorg/nvidia_drv.so
23189    10.5842  chromium-browser                chromium-browser                /usr/lib/chromium-browser/chromium-browser
8264     3.7719   libpixman-1.so.0.18.4           libpixman-1.so.0.18.4           /usr/lib/libpixman-1.so.0.18.4
5975     2.7272   libglib-2.0.so.0.2600.1         libglib-2.0.so.0.2600.1         /lib/libglib-2.0.so.0.2600.1
4409     2.0124   libgobject-2.0.so.0.2600.1      libgobject-2.0.so.0.2600.1      /usr/lib/libgobject-2.0.so.0.2600.1
3811     1.7395   libcairo.so.2.11000.0           libcairo.so.2.11000.0           /usr/lib/libcairo.so.2.11000.0
3001     1.3698   libpangoft2-1.0.so.0.2800.2     libpangoft2-1.0.so.0.2800.2     /usr/lib/libpangoft2-1.0.so.0.2800.2
2714     1.2388   python2.6                       python2.6                       /usr/bin/python2.6
2083     0.9507   libdbus-1.so.3.5.2              libdbus-1.so.3.5.2              /lib/libdbus-1.so.3.5.2
2003     0.9142   libwfb.so                       libwfb.so                       /usr/lib/xorg/modules/libwfb.so
1969     0.8987   Xorg                            Xorg                            /usr/bin/Xorg
1894     0.8645   libgtk-x11-2.0.so.0.2200.0      libgtk-x11-2.0.so.0.2200.0      /usr/lib/libgtk-x11-2.0.so.0.2200.0
1732     0.7905   libgnuradio-core-3.5.2git.so.0.0.0  libgnuradio-core-3.5.2git.so.0.0.0  gr_sig_source_f::work(int, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&)
1578     0.7202   libgdk-x11-2.0.so.0.2200.0      libgdk-x11-2.0.so.0.2200.0      /usr/lib/libgdk-x11-2.0.so.0.2200.0
1378     0.6290   libpango-1.0.so.0.2800.2        libpango-1.0.so.0.2800.2        /usr/lib/libpango-1.0.so.0.2800.2
962      0.4391   [vdso] (tgid:1237 range:0xb77ec000-0xb77ed000)  Xorg  [vdso] (tgid:1237 range:0xb77ec000-0xb77ed000)
802      0.3661   libpthread-2.12.1.so            libpthread-2.12.1.so            pthread_mutex_lock
690      0.3149   libc-2.12.1.so                  libc-2.12.1.so                  __memcpy_ssse3_rep
640      0.2921   libpthread-2.12.1.so            libpthread-2.12.1.so            __pthread_mutex_unlock_usercnt
459      0.2095   libX11.so.6.3.0                 libX11.so.6.3.0                 /usr/lib/libX11.so.6.3.0
444      0.2027   libQtGui.so.4.7.0               libQtGui.so.4.7.0               /usr/lib/libQtGui.so.4.7.0
409      0.1867   libc-2.12.1.so                  libc-2.12.1.so                  _int_malloc
354      0.1616   librt-2.12.1.so                 librt-2.12.1.so                 clock_gettime
341      0.1556   libc-2.12.1.so                  libc-2.12.1.so                  __memset_sse2_rep
337      0.1538   libgnuradio-audio-3.5.2git.so.0.0.0  libgnuradio-audio-3.5.2git.so.0.0.0  audio_alsa_sink::work_s32(int, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&)
332      0.1515   anon (tgid:25090 range:0x4fc0a000-0x4fcff000)  chromium-browser  anon (tgid:25090 range:0x4fc0a000-0x4fcff000)
322      0.1470   libc-2.12.1.so                  libc-2.12.1.so                  __strcmp_sse4_2
237      0.1082   oprofiled                       oprofiled                       odb_update_node_with_offset
230      0.1050   libpangocairo-1.0.so.0.2800.2   libpangocairo-1.0.so.0.2800.2   /usr/lib/libpangocairo-1.0.so.0.2800.2
224      0.1022   libpthread-2.12.1.so            libpthread-2.12.1.so            pthread_getspecific
223      0.1018   libstdc++.so.6.0.14             libstdc++.so.6.0.14             /usr/lib/libstdc++.so.6.0.14
214      0.0977   libxcb.so.1.1.0                 libxcb.so.1.1.0                 /usr/lib/libxcb.so.1.1.0
210      0.0959   metacity                        metacity                        /usr/bin/metacity
204      0.0931   libc-2.12.1.so                  libc-2.12.1.so                  fgetc
180      0.0822   ld-2.12.1.so                    ld-2.12.1.so                    do_lookup_x
168      0.0767   libc-2.12.1.so                  libc-2.12.1.so                  __i686.get_pc_thunk.bx
165      0.0753   libc-2.12.1.so                  libc-2.12.1.so                  _IO_vfscanf
163      0.0744   ibus-daemon                     ibus-daemon                     /usr/bin/ibus-daemon
… …

I can’t find an entry for my dial_tone.py process in this report. I guess I
am using OProfile the wrong way. Could you give me some tips?
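
A note on reading the report above: it is system-wide, and its rows are named after binaries and shared libraries rather than scripts. The flow graph's samples therefore appear to sit under python2.6 and the libgnuradio-* images, not under dial_tone.py. A minimal sketch for narrowing such a report down, assuming the text of opreport -l has been captured into a string; filter_report() is a hypothetical helper, and the sample rows are copied from the report above with one symbol name abbreviated:

```python
def filter_report(report_text, keywords=("python", "gnuradio")):
    """Keep the data rows of an `opreport -l` listing that mention a keyword."""
    kept = []
    for line in report_text.splitlines():
        fields = line.split()
        # Data rows start with an integer sample count; skip headers and blanks.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        if any(word in line for word in keywords):
            kept.append(line)
    return kept

# Rows copied from the report in this thread (last symbol name abbreviated).
sample = """\
samples  %        image name  app name  symbol name
43324    19.7744  nvidia_drv.so  nvidia_drv.so  /usr/lib/nvidia-173/xorg/nvidia_drv.so
2714     1.2388   python2.6  python2.6  /usr/bin/python2.6
1732     0.7905   libgnuradio-core-3.5.2git.so.0.0.0  libgnuradio-core-3.5.2git.so.0.0.0  gr_sig_source_f::work(...)
"""

for row in filter_report(sample):
    print(row)
```

With the sample text this keeps the python2.6 and libgnuradio-core rows and drops the rest.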

Sincerely,

Yang, Qing
Information Engineering, CUHK

2012/8/24 Felix W. [email protected]

Hi Felix,

we have some notes on code profiling here:
http://gnss-sdr.org/documentation/how-profile-code

We use the tools described there in a C++-only flow graph, but I hope
some of them will also work for you.

Best regards,
Carles