Profile gr python code using Oprofile and Kcachegrind

Hi everyone,

Where can I find a concrete example on how to profile the gnuradio code
in
Python?

My PC is Linux 2.6.35-32-generic-pae #67-Ubuntu SMP Mon Mar 5 21:23:19
UTC
2012 i686 GNU/Linux. I use Ubuntu 10.10 and Xeon W3530.

I can use Kcachegrind to profile code written in C++. But when I profile
python code(e.g., dial_tone.py), there is no profile data output.

address@hidden:~/Public$ valgrind --tool=callgrind ./dial_tone.py
==30385== Callgrind, a call-graph generating cache profiler
==30385== Copyright © 2002-2010, and GNU GPL’d, by Josef Weidendorfer
et
al.
==30385== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for
copyright info
==30385== Command: ./dial_tone.py
==30385==
==30385== For interactive control, run ‘callgrind_control -h’.
address@hidden:~/Public$ ls -l
total 12
-rw------- 1 yangqing yangqing 0 2012-08-27 16:29
callgrind.out.30385
** the size of profile data is 0? **
-rwxr-xr-x 1 yangqing yangqing 2006 2012-07-05 16:37 dial_tone.py
-rwxr-xr-x 1 yangqing yangqing 249 2012-08-27 16:22 mainloop.py
drwxr-xr-x 25 yangqing yangqing 4096 2012-08-27 00:23 oprofile

Then I try Oprofile, but also failed :frowning:

address@hidden:~/Public$ sudo opcontrol --init
address@hidden:~/Public$ sudo opcontrol --setup --no-vmlinux
address@hidden:~/Public$ sudo opcontrol --start
ATTENTION: Use of opcontrol is discouraged. Please see the man page for
operf.
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
Using 2.6+ OProfile kernel interface.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
address@hidden:~/Public$ sudo opcontrol --reset
Signalling daemon… done
address@hidden:~/Public$ ./dial_tone.py
address@hidden:~/Public$ sudo opcontrol --dump
address@hidden:~/Public$ sudo opcontrol --shutdown
Stopping profiling.
Killing daemon.
address@hidden:~/Public$ opreport -l dial_tone.py
Using /var/lib/oprofile/samples/ for samples directory.
error: no sample files found: profile specification too strict ? **
can’t
find the profile data? **

and I try
address@hidden:~/Public$ opreport -l|less
CPU: Intel Core/i7, speed 2.794e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit
mask of 0x00 (No unit mask) count 100000
samples % image name app name
symbol
name
97293 44.4076 no-vmlinux no-vmlinux
/no-vmlinux
43324 19.7744 nvidia_drv.so nvidia_drv.so
/usr/lib/nvidia-173/xorg/nvidia_drv.so
23189 10.5842 chromium-browser chromium-browser
/usr/lib/chromium-browser/chromium-browser
8264 3.7719 libpixman-1.so.0.18.4 libpixman-1.so.0.18.4
/usr/lib/libpixman-1.so.0.18.4
5975 2.7272 libglib-2.0.so.0.2600.1 libglib-2.0.so.0.2600.1
/lib/libglib-2.0.so.0.2600.1
4409 2.0124 libgobject-2.0.so.0.2600.1 libgobject-2.0.so.0.2600.1
/usr/lib/libgobject-2.0.so.0.2600.1
3811 1.7395 libcairo.so.2.11000.0 libcairo.so.2.11000.0
/usr/lib/libcairo.so.2.11000.0
3001 1.3698 libpangoft2-1.0.so.0.2800.2
libpangoft2-1.0.so.0.2800.2
/usr/lib/libpangoft2-1.0.so.0.2800.2
2714 1.2388 python2.6 python2.6
/usr/bin/python2.6
2083 0.9507 libdbus-1.so.3.5.2 libdbus-1.so.3.5.2
/lib/libdbus-1.so.3.5.2
2003 0.9142 libwfb.so libwfb.so
/usr/lib/xorg/modules/libwfb.so
1969 0.8987 Xorg Xorg
/usr/bin/Xorg
1894 0.8645 libgtk-x11-2.0.so.0.2200.0 libgtk-x11-2.0.so.0.2200.0
/usr/lib/libgtk-x11-2.0.so.0.2200.0
1732 0.7905 libgnuradio-core-3.5.2git.so.0.0.0
libgnuradio-core-3.5.2git.so.0.0.0 gr_sig_source_f::work(int,
std::vector<void const*, std::allocat
or<void const*> >&, std::vector<void*, std::allocator<void*> >&)
1578 0.7202 libgdk-x11-2.0.so.0.2200.0 libgdk-x11-2.0.so.0.2200.0
/usr/lib/libgdk-x11-2.0.so.0.2200.0
1378 0.6290 libpango-1.0.so.0.2800.2 libpango-1.0.so.0.2800.2
/usr/lib/libpango-1.0.so.0.2800.2
962 0.4391 [vdso] (tgid:1237 range:0xb77ec000-0xb77ed000) Xorg
[vdso] (tgid:1237 range:0xb77ec000-0xb77ed000)
802 0.3661 libpthread-2.12.1.so libpthread-2.12.1.so
pthread_mutex_lock
690 0.3149 libc-2.12.1.so libc-2.12.1.so
__memcpy_ssse3_rep
640 0.2921 libpthread-2.12.1.so libpthread-2.12.1.so
__pthread_mutex_unlock_usercnt
459 0.2095 libX11.so.6.3.0 libX11.so.6.3.0
/usr/lib/libX11.so.6.3.0
444 0.2027 libQtGui.so.4.7.0 libQtGui.so.4.7.0
/usr/lib/libQtGui.so.4.7.0
409 0.1867 libc-2.12.1.so libc-2.12.1.so
_int_malloc
354 0.1616 librt-2.12.1.so librt-2.12.1.so
clock_gettime
341 0.1556 libc-2.12.1.so libc-2.12.1.so
__memset_sse2_rep
337 0.1538 libgnuradio-audio-3.5.2git.so.0.0.0
libgnuradio-audio-3.5.2git.so.0.0.0 audio_alsa_sink::work_s32(int,
std::vector<void const*, std::allocator<void const*> >&,
std::vector<void*,
std::allocator<void*> >&)
332 0.1515 anon (tgid:25090 range:0x4fc0a000-0x4fcff000)
chromium-browser anon (tgid:25090 range:0x4fc0a000-0x4fcff000)
322 0.1470 libc-2.12.1.so libc-2.12.1.so
__strcmp_sse4_2
237 0.1082 oprofiled oprofiled
odb_update_node_with_offset
230 0.1050 libpangocairo-1.0.so.0.2800.2
libpangocairo-1.0.so.0.2800.2 /usr/lib/libpangocairo-1.0.so.0.2800.2
224 0.1022 libpthread-2.12.1.so libpthread-2.12.1.so
pthread_getspecific
223 0.1018 libstdc++.so.6.0.14 libstdc++.so.6.0.14
/usr/lib/libstdc++.so.6.0.14
214 0.0977 libxcb.so.1.1.0 libxcb.so.1.1.0
/usr/lib/libxcb.so.1.1.0
210 0.0959 metacity metacity
/usr/bin/metacity
204 0.0931 libc-2.12.1.so libc-2.12.1.so
fgetc
180 0.0822 ld-2.12.1.so ld-2.12.1.so
do_lookup_x
168 0.0767 libc-2.12.1.so libc-2.12.1.so
__i686.get_pc_thunk.bx
165 0.0753 libc-2.12.1.so libc-2.12.1.so
_IO_vfscanf
163 0.0744 ibus-daemon ibus-daemon
/usr/bin/ibus-daemon
… …

I can’t find my thread of dial_tone.py. I guess I use Oprofile in the
wrong
way, could you give me some tips?

Sincerely,

On Tue, Aug 28, 2012 at 4:21 AM, Qing Y. [email protected]
wrote:

address@hidden:~/Public$ valgrind --tool=callgrind ./dial_tone.py

Qing,

Try this:

address@hidden:~/Public$ valgrind --tool=callgrind python ./dial_tone.py

I’ve used callgrind with Python before, but you need to explicitly
call it through the interpreter for callgrind to catch on. I think
this works, but it’s just from memory. If it doesn’t, Google callgrind
and Python; I know the answer is out there.

Tom

Hi Tom,

Thanks for your suggestion, my Kcachegrind works well now.

But this method (explicitly call through python interpreter) doesn’t
work
for Oprofile, I still can’t find the session name “dial_tone” in the
profiled data. BTW, if Kcachegrind works well with python application,
why
do you guys recommend Oprofile?

Sincerely,

Yang, Qing
Information Engineering, CUHK

2012/8/28 Tom R. [email protected]

On 28/08/12 09:21, Qing Y. wrote:

Hi everyone,

Where can I find a concrete example on how to profile the gnuradio code
in Python?

Hi,

I’ve just read about timeit in an article, not sure if it fits your
neeeds, but just in case:

Chris

et al.
-rwxr-xr-x 1 yangqing yangqing 249 2012-08-27 16:22 mainloop.py
Using 2.6+ OProfile kernel interface.
address@hidden:~/Public$ opreport -l dial_tone.py
symbol name
4409 2.0124 libgobject-2.0.so.0.2600.1 libgobject-2.0.so.0.2600.1
/usr/lib/xorg/modules/libwfb.so
1378 0.6290 libpango-1.0.so.0.2800.2 libpango-1.0.so.0.2800.2
459 0.2095 libX11.so.6.3.0 libX11.so.6.3.0
libgnuradio-audio-3.5.2git.so.0.0.0 audio_alsa_sink::work_s32(int,
224 0.1022 libpthread-2.12.1.so http://libpthread-2.12.1.so/
http://ld-2.12.1.so/ do_lookup_x

Sincerely,

Yang, Qing
Information Engineering, CUHK

This body part will be downloaded on demand.


Christian G.,
Embedded systems engineer.
Techworks Marine
1 Harbour road
Dun Laoghaire
Co. Dublin
Ireland
Tel: + 353 (0) 1 236 5990
Web: http://www.techworks.ie/

Hi Tom,

Kcachegrind can only show the cost of C++ functions, can we get the cost
of
python functions or python modules? Because it is more interesting to
look
at the cost of each gr modules in the python code level.

Sincerely,

Yang, Qing
Information Engineering, CUHK

2012/8/28 Tom R. [email protected]

On Thu, Aug 30, 2012 at 2:24 AM, Qing Y. [email protected]
wrote:


Yang, Qing
Information Engineering, CUHK

We recommended Oprofile from a longer history of using it, and that’s
it. I’ve actually switched over to using valgrind for all of my
analysis and profiling work. In many ways, I still prefer Oprofile
because it works within the Linux kernel, but recent updates to Ubuntu
have made it much more difficult to install and use Oprofile, and I
can’t be bothered any more to worry about stuff like that. So valgrind
with callgrind works great and it’s now my tool of choice.

But we should probably update the wiki to include information about
using callgrind wit Kcachegrind.

Tom

On Thu, Aug 30, 2012 at 6:17 AM, Qing Y. [email protected]
wrote:

Hi Tom,

Kcachegrind can only show the cost of C++ functions, can we get the cost of
python functions or python modules? Because it is more interesting to look
at the cost of each gr modules in the python code level.

Sincerely,

Yang, Qing
Information Engineering, CUHK

If you’re doing performance critical stuff in Python, stop it :slight_smile:

I’ve never worried about profiling Python code since any performance
critical stuff I’ve wanted to do, I do it in C++. Python is just for
interfacing and interacting with the GNU Radio flowgraph and blocks,
and that shouldn’t need profiling.

Tom

Information Engineering, CUHK
If you’re doing performance critical stuff in Python, stop it :slight_smile:

I’ve never worried about profiling Python code since any performance
critical stuff I’ve wanted to do, I do it in C++. Python is just for
interfacing and interacting with the GNU Radio flowgraph and blocks,
and that shouldn’t need profiling.

To recast things in telecom/networking terms, I like to think of the
Python layer as the control plane, and the C++ layer as the
data plane.

There are still people out there who think that Python was a horrible
choice for Gnu Radio, because they think we’re pushing significant
data through the Python layer. The hier2 blocks that are written in
Python add to this confusion, since at casual glance, it looks like
we’re pushing data through Python with these, and of course, we’re
not. The hier2 blocks just “encapsulate” a bunch of C++ blocks
to make a composite block, but again, Python just plays the role of
“control/constructor” plane in these scenarios.

I was on the RTLSDR chat group yesterday when I pointed out that all the
fundamental blocks in Gnu Radio are written in C++, and I got a
“well, yeah, except for all those fancy demodulator/modulator
blocks”. So the misunderstanding is persistent.


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Fri, Aug 31, 2012 at 7:10 AM, Christian G.
[email protected] wrote:

I’ve just read about timeit in an article, not sure if it fits your neeeds,
but just in case:
timeit — Measure execution time of small code snippets — Python 3.11.4 documentation

Chris

Chris,
That’s a good tip. I’ve used it before and found it nice to time the
runtime of a flowgraph.

So Qing, if that’s what you were talking about before, yes, something
like this is probably what you want to use to time the execution of a
block from the Python world. I might have been confused what you meant
by wanting to profile Python stuff.

Tom

Hi Chris,

Thanks for your suggestion, I would like to try it out.

Hi Tom,

Actually we are building a broadband OFDM system targeting at 20MHz. Now
the transmit part can support 20MHz transmitting (set argument -W 20M),
but
the receive part will block and show overrun message once we set
–sample-rate larger than 1M. We use Xeon w3530 and N210.

I think the reason is: the code of receiver is inefficient thus unable
to
process the samples from USRP in time. Before optimizing the code, I
should
first find the bottleneck–which module chokes my system–from the whole
system’s perspective. That’s why I want to profile python code rather
than
C++ code.

Sincerely,

Yang, Qing
Information Engineering, CUHK

2012/9/1 Tom R. [email protected]