Forum: GNU Radio Inefficiency of message passing in inband code

Stefan Brüns (Guest)
on 2008-10-09 23:48
(Received via mailing list)
Hi,

I ran some performance analysis of the inband code and found the
pmt-based message passing to be highly inefficient.

As a first step, I changed the pmt_list[1-6] implementations from an
imbalanced tree of refcounted pairs to a much simpler pmt_vector-based
one, resulting in a huge speedup.
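
A minimal sketch of why the vector-backed variant is cheaper, assuming
the pair-based list is a cons-style chain. std::shared_ptr stands in for
the refcounted handle, and every name here (Value, Long, Pair, Vector,
make_list_pairs, make_list_vector, count_pair_nodes) is hypothetical,
not the PMT API:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for PMT objects; not the real PMT API.
struct Value {
    virtual ~Value() = default;
};

struct Long : Value {
    long v;
    explicit Long(long x) : v(x) {}
};

using ValuePtr = std::shared_ptr<Value>;

// Lisp-style cons cell: each cell is its own refcounted heap object.
struct Pair : Value {
    ValuePtr car;
    ValuePtr cdr;
    Pair(ValuePtr a, ValuePtr d) : car(std::move(a)), cdr(std::move(d)) {}
};

// Vector-backed container: one refcounted object holds all element handles.
struct Vector : Value {
    std::vector<ValuePtr> items;
};

// Builds (a b c ...) as nested pairs: one Pair node per element.
ValuePtr make_list_pairs(const std::vector<ValuePtr>& elems) {
    ValuePtr tail;  // the empty list is a null handle in this sketch
    for (auto it = elems.rbegin(); it != elems.rend(); ++it)
        tail = std::make_shared<Pair>(*it, tail);
    return tail;
}

// Builds the same logical list as a single refcounted container.
ValuePtr make_list_vector(const std::vector<ValuePtr>& elems) {
    auto v = std::make_shared<Vector>();
    v->items = elems;
    return v;
}

// Counts the container nodes (not the elements) in a pair-based list.
std::size_t count_pair_nodes(const ValuePtr& list) {
    std::size_t n = 0;
    for (auto p = std::dynamic_pointer_cast<Pair>(list); p;
         p = std::dynamic_pointer_cast<Pair>(p->cdr))
        ++n;
    return n;
}
```

For a six-element list the pair variant creates and later releases six
refcounted Pair nodes, while the vector variant needs one refcounted
container.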

Unfortunately, this only brings the cost down from ~770 instructions per
sample [1] to ~300 instructions per sample, which is still far too much.

The high cost is due to the following:
On each received sample block, the data is passed between the mblocks.
For this, messages are created and every field's refcount is
incremented. On message reception the fields' refcounts are adjusted
again, and when the receiving block finishes, the fields' refcounts and
the message's refcount are decremented, leading to the release of the
fields and the message (as the receiving block is normally the only
block holding a reference to the message).
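
The refcount traffic described above can be sketched as follows. This
assumes the handles behave like std::shared_ptr. Field, Message,
build_message and consume are hypothetical names, not the mblock or PMT
API:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical message layout; std::shared_ptr stands in for the
// refcounted PMT handle.
struct Field {
    long value;
};
using FieldPtr = std::shared_ptr<Field>;

struct Message {
    std::vector<FieldPtr> fields;
};
using MessagePtr = std::shared_ptr<Message>;

// Sender side: storing each field handle in the message increments
// that field's refcount once.
MessagePtr build_message(const std::vector<FieldPtr>& fields) {
    auto msg = std::make_shared<Message>();
    msg->fields = fields;
    return msg;
}

// Receiver side: each handle is copied out (increment), used, and
// dropped again at the end of the loop iteration (decrement).
long consume(const MessagePtr& msg) {
    long sum = 0;
    for (FieldPtr f : msg->fields) {  // deliberate by-value copy
        sum += f->value;
    }
    return sum;
}
```

When the last MessagePtr goes out of scope, the message is released and
every field's refcount is decremented once more, so a single pass of one
message touches each field's counter several times.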

Much of this work is just not necessary - I think the messages should be
much more statically typed, using refcounted objects only for large
objects (e.g., pass the sample data as a refcounted object and the
package length as a POD).
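
One possible reading of this proposal, as a sketch only: a fixed-layout
message where the sample data is the only refcounted member.
RxSampleMessage, SampleBuffer, make_rx_message and the field names are
assumptions made for illustration, not existing inband code:

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using SampleBuffer = std::vector<std::complex<short>>;

// Hypothetical fixed-layout message; field names are illustrative only.
struct RxSampleMessage {
    std::shared_ptr<const SampleBuffer> samples;  // large payload, refcounted
    std::uint32_t length;                         // POD, copied by value
};

// Wraps a sample buffer once; the length travels as a plain integer.
RxSampleMessage make_rx_message(SampleBuffer data) {
    const auto n = static_cast<std::uint32_t>(data.size());
    return RxSampleMessage{
        std::make_shared<SampleBuffer>(std::move(data)), n};
}
```

Copying such a message touches exactly one refcount (the sample buffer),
and the length is copied like any other integer.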

Any opinions on this topic?

Stefan

--
Stefan Brüns  /  Bergstraße 21  /  52062 Aachen
mailto:lurch at gmx.li  http://www.kawo1.rwth-aachen.de/~lurchi/
   phone: +49 241 53809034     mobile: +49 151 50412019
Eric B. (Guest)
on 2008-10-10 00:11
(Received via mailing list)
On Thu, Oct 09, 2008 at 09:26:47PM +0200, Stefan Brüns wrote:
> instructions per sample, which is still just too much.
> more statically typed, using refcounted objects only for large objects (e.g,
> pass the sample data as refcounted object, the package length as a POD).
>
> Any opinions on this topic?
>
> Stefan

I think changing the args from list to vector would be a simple win.

The other problem is that mblocks currently use a thread per block,
instead of using a small pool of threads to execute all the blocks.

The pmt type probably needs to be reimplemented.  The current version
was coded in about 1 day.  It was the simplest thing that could have
possibly worked.  I'm disinclined to go with statically typed messages
for a number of reasons that I don't have time to go into now.  Think
about lisp, scheme, python, ruby, erlang...  marshalling
self-describing data across a network connection...

It is possible to have types like these with very good performance.
See SBCL or any other high-performance lisp system for a concrete
example.  There's over 40 years of literature and corresponding
high-performance systems in this area.

Eric
George N. (Guest)
on 2008-10-10 00:43
(Received via mailing list)
Eric B. wrote:
> The pmt type probably needs to be reimplemented.  The current version
> was coded in about 1 day.  It was the simplest thing that could have
> possibly worked.  I'm disinclined to go with statically typed messages
> for a number of reasons that I don't have time to go into now.  Think
> about lisp, scheme, python, ruby, erlang...  marshalling
> self-describing data across a network connection...

Just a little bit from just about the only person who has *actually*
used the PMT type other than Eric ;)  I definitely agree with Eric
against statically typed messages.  It will severely impact a major goal
of the PMT, which is to support dynamic messages between blocks for MAC
implementations. :)

- George
Johnathan C. (Guest)
on 2008-10-10 03:31
(Received via mailing list)
On Thu, Oct 9, 2008 at 1:41 PM, George N. <removed_email_address@domain.invalid> wrote:

> Just a little bit from just about the only person who has *actually* used
> the PMT type other than Eric ;)  I definitely agree with Eric against
> statically typed messages.  It will severely impact a major goal of the PMT,
> which is to support dynamic messages between blocks for MAC implementations.

I have used PMT/mblock in some commercial contract work.  I agree that
there are performance improvements to be made in a number of places,
but the "late binding" nature of the dynamic typing is sort of what
PMT is all about, and makes possible the simplicity of the mblock
design.

--
Johnathan C.
Corgan Enterprises LLC
http://corganenterprises.com/
Stefan Brüns (Guest)
on 2008-10-10 22:28
(Received via mailing list)
On Friday 10 October 2008 01:15:56 Johnathan C. wrote:
>
> I have used PMT/mblock in some commercial contract work.  I agree that
> there are performance improvements to be made in a number of places,
> but the "late binding" nature of the dynamic typing is sort of what
> PMT is all about, and makes possible the simplicity of the mblock
> design.

I found at least one point which gives some improvement - there are a
few functions which take an argument of type "pmt_t" but can be changed
to "const pmt_t&" without any negative side effect.

(Passing pmt_t by value creates a pmt_t instance; using a reference does
not. For e.g. pmt_eq, this saves the creation of two pmt_t instances,
including the cost of malloc'ing and so on.)

Functions which should be changed are, for example, pmt_eq (and similar)
and pmt_to_long (and similar). This breaks the ABI but not the API, and
the gain is quite substantial - it saves ~30 instructions _per sample_
for the test_usrp_inband_rx example.
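
A minimal sketch of the difference, assuming pmt_t behaves like a
reference-counted smart pointer. std::shared_ptr stands in for it here,
and Base, Handle and the two functions are hypothetical names, not PMT
functions:

```cpp
#include <cassert>
#include <memory>

struct Base {  // hypothetical stand-in for the PMT base object
    long v;
};
using Handle = std::shared_ptr<Base>;  // stand-in for pmt_t

// By value: the parameter is a new handle, so the refcount is
// incremented on entry and decremented again on return.
long count_seen_by_value(Handle h) { return h.use_count(); }

// By const reference: no new handle is created and the refcount is
// left untouched.
long count_seen_by_ref(const Handle& h) { return h.use_count(); }
```

With a single owner, the by-value function observes a use count of 2
while it runs, and the by-reference function observes 1. A two-argument
function such as an equality test pays the by-value cost twice per call.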


At the moment, I am experimenting with memory pools for some of the
passed objects - this should give some improvement (most time is spent
in malloc/free), without breaking ref counting, polymorphic types ...
Malloc/free is taking most of the time atm, so let's see what this
brings.
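
For illustration, a fixed-size free-list pool could look roughly like
this. It is a single-threaded sketch, not the code from these
experiments. Pool, PooledLong and heap_allocations are hypothetical
names:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Minimal fixed-size free-list pool. Single-threaded, default alignment
// only. Objects still outstanding when the pool dies are not reclaimed.
template <typename T>
class Pool {
public:
    Pool() = default;
    Pool(const Pool&) = delete;
    Pool& operator=(const Pool&) = delete;

    ~Pool() {
        for (void* p : free_) ::operator delete(p);
    }

    // Reuses a released slot when one is available, otherwise falls
    // back to the heap.
    template <typename... Args>
    T* create(Args&&... args) {
        void* mem = nullptr;
        if (!free_.empty()) {
            mem = free_.back();
            free_.pop_back();
        } else {
            mem = ::operator new(sizeof(T));
            ++heap_allocations_;
        }
        try {
            return new (mem) T(std::forward<Args>(args)...);
        } catch (...) {
            ::operator delete(mem);  // do not leak the slot
            throw;
        }
    }

    // Runs the destructor but keeps the storage for the next create().
    void destroy(T* obj) {
        obj->~T();
        free_.push_back(obj);
    }

    // Number of times the pool had to go to the heap.
    std::size_t heap_allocations() const { return heap_allocations_; }

private:
    std::vector<void*> free_;
    std::size_t heap_allocations_ = 0;
};

// Hypothetical small object of the kind a message might carry.
struct PooledLong {
    long value;
    explicit PooledLong(long v) : value(v) {}
};
```

Creating, destroying and creating again reuses the same slot, so a
steady stream of short-lived objects settles at a fixed number of heap
allocations instead of one malloc/free pair per object.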

Stefan

PS: If you are wondering what I am using for analysis, it's valgrind's
callgrind tool, with kcachegrind for visualization.

--
Stefan Brüns  /  Bergstraße 21  /  52062 Aachen
mailto:lurch at gmx.li  http://www.kawo1.rwth-aachen.de/~lurchi/
   phone: +49 241 53809034     mobile: +49 151 50412019
Johnathan C. (Guest)
on 2008-10-11 00:17
(Received via mailing list)
On Fri, Oct 10, 2008 at 11:24 AM, Stefan Brüns <removed_email_address@domain.invalid> wrote:

> I found at least one point which gives some improvement - there are a few
> functions, which take an argument of type "pmt_t", but can be changed
> to "const pmt_t&" without any negative side effect.

Good.

> At the moment, I am experimenting with memory pools for some of the passed
> objects - this should give some improvement (most time is spent in
> malloc/free), without breaking ref counting, polymorphic types ...
> Malloc/free is taking most of the time atm, so let's see what this brings.

Keep us up to date with your experiments; you're on the right track.

--
Johnathan C.
Corgan Enterprises LLC
http://corganenterprises.com/