Inefficiency of message passing in inband code

Stefan_BrSSS_ns · October 9, 2008, 9:48pm

Hi,

I ran some performance analysis of the inband code and found the
pmt-based message passing to be highly inefficient.

As a first step, I replaced the pmt_list[1-6] implementations, which used
an imbalanced tree of refcounted pairs, with a much simpler
pmt_vector-based one, resulting in a huge speedup.
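To illustrate why the pair-based lists are costly, here is a minimal C++ sketch that counts refcounted heap nodes for a 3-element list built both ways. The types `Obj`, `Int`, `Pair`, and `Vec` are hypothetical stand-ins, not the real pmt classes, and `std::shared_ptr` stands in for pmt's refcounting.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Counts refcounted nodes created (hypothetical instrumentation).
static std::size_t g_heap_objects = 0;

struct Obj {                       // base of all "pmt-like" values
  Obj() { ++g_heap_objects; }
  virtual ~Obj() = default;
};
using obj_t = std::shared_ptr<Obj>;

struct Int : Obj {
  long v;
  explicit Int(long x) : v(x) {}
};

// Lisp-style list: every element costs one refcounted pair node.
struct Pair : Obj {
  obj_t car, cdr;
  Pair(obj_t a, obj_t d) : car(std::move(a)), cdr(std::move(d)) {}
};

// Vector-based list: one refcounted node (plus one element buffer).
struct Vec : Obj {
  std::vector<obj_t> items;
  explicit Vec(std::vector<obj_t> xs) : items(std::move(xs)) {}
};

obj_t list3_pairs(obj_t a, obj_t b, obj_t c) {
  // Three nested pair nodes: (a . (b . (c . nil)))
  return std::make_shared<Pair>(
      std::move(a),
      std::make_shared<Pair>(std::move(b),
                             std::make_shared<Pair>(std::move(c), nullptr)));
}

obj_t list3_vector(obj_t a, obj_t b, obj_t c) {
  std::vector<obj_t> xs;
  xs.reserve(3);
  xs.push_back(std::move(a));
  xs.push_back(std::move(b));
  xs.push_back(std::move(c));
  return std::make_shared<Vec>(std::move(xs));
}
```

In this sketch the pair version creates three refcounted nodes where the vector version creates one, and walking it means chasing three pointers instead of indexing one array.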

Unfortunately, this only brings it from ~770 instructions per sample [1]
down to ~300 instructions per sample, which is still just too much.

The high cost is due to the following: on each received sample block,
the data is passed between the mblocks. For this, messages are created
and every field's refcount is incremented; on message reception, the
fields' refcounts are adjusted again; and when the receiving block
finishes, the fields' refcounts and the message's refcount are
decremented, leading to the release of the fields and the message (as
the receiving block is normally the only block holding a reference to
the message).

Much of this work is just not necessary. I think the messages should be
much more statically typed, using refcounted objects only for large
objects (e.g., pass the sample data as a refcounted object and the
packet length as a POD).
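A minimal C++ sketch of the statically typed message proposed here, assuming a hypothetical `RxSamplesMsg` layout and an interleaved 16-bit I/Q `Sample` format (neither is from the inband code): only the sample buffer is refcounted, so delivering a copy costs one refcount increment while the POD fields are plain copies.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Assumed sample format: interleaved 16-bit I/Q.
struct Sample {
  std::int16_t i, q;
};

// Hypothetical statically typed message: one refcounted field, rest PODs.
struct RxSamplesMsg {
  std::shared_ptr<const std::vector<Sample>> samples;  // large: shared, refcounted
  std::uint32_t packet_length;                         // POD: copied by value
  std::uint64_t timestamp;                             // POD (hypothetical field)
  bool overrun;                                        // POD (hypothetical field)
};

RxSamplesMsg make_msg(std::vector<Sample> buf, std::uint64_t ts) {
  const auto len = static_cast<std::uint32_t>(buf.size());
  // The buffer is moved into a single shared allocation; no copy of the data.
  std::shared_ptr<const std::vector<Sample>> data =
      std::make_shared<std::vector<Sample>>(std::move(buf));
  return RxSamplesMsg{std::move(data), len, ts, false};
}
```

The trade-off, raised in the replies, is that receivers must know the exact type at compile time, which gives up the self-describing messages that pmt provides.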

Any opinions on this topic?

Stefan

–
Stefan Brüns / Bergstraße 21 / 52062 Aachen
mailto:lurch at gmx.li http://www.kawo1.rwth-aachen.de/~lurchi/
phone: +49 241 53809034 mobile: +49 151 50412019

Stefan_BrSSS_ns · October 9, 2008, 10:11pm

On Thu, Oct 09, 2008 at 09:26:47PM +0200, Stefan Brüns wrote:

instructions per sample, which is still just too much.
more statically typed, using refcounted objects only for large objects (e.g.,
pass the sample data as a refcounted object, the packet length as a POD).

Any opinions on this topic?

Stefan

I think changing the args from list to vector would be a simple win.

The other problem is that mblocks currently use a thread per block,
instead of using a small pool of threads to execute all the blocks.
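The alternative Eric mentions, a small pool of threads executing all the blocks, can be sketched in C++ as a fixed-size pool draining a shared task queue. This `ThreadPool` is hypothetical, not the mblock runtime; the assumption is that a block with pending messages would be posted as a task rather than owning a dedicated thread.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size thread pool (not the mblock scheduler).
class ThreadPool {
 public:
  explicit ThreadPool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) workers_.emplace_back([this] { run(); });
  }
  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();  // workers drain the queue first
  }
  ThreadPool(const ThreadPool&) = delete;
  ThreadPool& operator=(const ThreadPool&) = delete;

  // Enqueue work, e.g. "run block X's message handler".
  void post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return stopping_ || !tasks_.empty(); });
        if (stopping_ && tasks_.empty()) return;  // exit only once drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  bool stopping_ = false;
};
```

With a pool sized to the number of cores, the number of threads no longer grows with the number of blocks.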

The pmt type probably needs to be reimplemented. The current version
was coded in about 1 day. It was the simplest thing that could have
possibly worked. I’m disinclined to go with statically typed messages
for a number of reasons that I don’t have time to go into now. Think
about lisp, scheme, python, ruby, erlang… marshalling
self-describing data across a network connection…

It is possible to have types like these with very good performance.
See SBCL or any other high-performance lisp system for a concrete
example. There are over 40 years of literature and corresponding
high-performance systems in this area.
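One classic technique from that literature is pointer tagging: small integers ("fixnums") are stored directly in the value word, so creating or copying them needs no malloc and no refcount. Below is a minimal C++ sketch, assuming heap objects are at least 2-byte aligned so real pointers have a clear low bit. `Value` and `HeapObj` are hypothetical; tag conventions differ between implementations (SBCL's differ in detail), and using low bit 1 for fixnums is an arbitrary choice here.

```cpp
#include <cstdint>

// Hypothetical boxed heap type; alignment >= 2 keeps the low pointer bit 0.
struct HeapObj {
  virtual ~HeapObj() = default;
};

class Value {
 public:
  // Fixnum: shift left one bit and set the tag bit (loses one bit of range).
  static Value from_long(std::intptr_t v) {
    return Value((static_cast<std::uintptr_t>(v) << 1) | 1u);
  }
  // Boxed object: store the pointer as-is (tag bit 0).
  static Value from_ptr(const HeapObj* p) {
    return Value(reinterpret_cast<std::uintptr_t>(p));
  }

  bool is_fixnum() const { return (bits_ & 1u) != 0; }

  std::intptr_t to_long() const {
    // Arithmetic right shift restores the sign: guaranteed since C++20,
    // and what mainstream compilers do in earlier modes.
    return static_cast<std::intptr_t>(bits_) >> 1;
  }
  const HeapObj* to_ptr() const { return reinterpret_cast<const HeapObj*>(bits_); }

  // Identity comparison (like eq): one word compare, no dereference.
  bool eq(Value other) const { return bits_ == other.bits_; }

 private:
  explicit Value(std::uintptr_t b) : bits_(b) {}
  std::uintptr_t bits_;
};
```

Only values that do not fit in a word (strings, vectors, sample buffers) would need heap boxes and refcounting. Integers and similar small fields stay dynamically typed but cost about as much as a POD.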

Eric

Stefan_BrSSS_ns · October 9, 2008, 10:43pm

Eric B. wrote:

The pmt type probably needs to be reimplemented. The current version
was coded in about 1 day. It was the simplest thing that could have
possibly worked. I’m disinclined to go with statically typed messages
for a number of reasons that I don’t have time to go into now. Think
about lisp, scheme, python, ruby, erlang… marshalling
self-describing data across a network connection…

Just a little bit from just about the only person other than Eric who
has actually used the PMT type: I definitely agree with Eric against
statically typed messages. They would severely impact a major goal of
the PMT, which is to support dynamic messages between blocks for MAC
implementations.

George

Stefan_BrSSS_ns · October 10, 2008, 1:31am

On Thu, Oct 9, 2008 at 1:41 PM, George N. wrote:

Just a little bit from just about the only person other than Eric who has
actually used the PMT type: I definitely agree with Eric against statically
typed messages. They would severely impact a major goal of the PMT, which
is to support dynamic messages between blocks for MAC implementations.

I have used PMT/mblock in some commercial contract work. I agree that
there are performance improvements to be made in a number of places,
but the “late binding” nature of the dynamic typing is sort of what
PMT is all about, and makes possible the simplicity of the mblock
design.

–
Johnathan C.
Corgan Enterprises LLC
http://corganenterprises.com/

Stefan_BrSSS_ns · October 10, 2008, 10:17pm

On Fri, Oct 10, 2008 at 11:24 AM, Stefan Brüns wrote:

I found at least one point that gives some improvement: there are a few
functions that take an argument of type “pmt_t” but can be changed
to “const pmt_t&” without any negative side effect.

Good.

At the moment, I am experimenting with memory pools for some of the passed
objects - this should give some improvement (most time is spent in
malloc/free), without breaking ref counting, polymorphic types …
Malloc/free is taking most of the time atm, so let's see what this brings.

Keep us up to date with your experiments, you’re on the right track.


Stefan_BrSSS_ns · October 10, 2008, 8:28pm

On Friday 10 October 2008 01:15:56 Johnathan C. wrote:

I have used PMT/mblock in some commercial contract work. I agree that
there are performance improvements to be made in a number of places,
but the “late binding” nature of the dynamic typing is sort of what
PMT is all about, and makes possible the simplicity of the mblock
design.

I found at least one point that gives some improvement: there are a few
functions that take an argument of type “pmt_t” but can be changed
to “const pmt_t&” without any negative side effect.

(Passing a pmt_t by value creates a new pmt_t instance; passing a
reference does not. For pmt_eq, for example, this saves the creation of
two pmt_t instances, including the cost of malloc’ing and so on.)
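A minimal C++ sketch of the difference, assuming pmt_t behaves like a refcounted smart pointer; `std::shared_ptr` and the placeholder `pmt_base` stand in for the real types, and `eq_by_value`/`eq_by_ref` are hypothetical. The sketch shows the refcount traffic that by-value passing causes, observed through `use_count()`.

```cpp
#include <memory>

// Assumption: pmt_t is modeled as a shared_ptr to a polymorphic base.
struct pmt_base {
  virtual ~pmt_base() = default;
};
using pmt_t = std::shared_ptr<pmt_base>;

// By value: each argument is copy-constructed on entry (refcount up)
// and destroyed on return (refcount down).
bool eq_by_value(pmt_t x, pmt_t y, long* observed_count) {
  *observed_count = x.use_count();
  return x == y;  // pointer identity, like pmt_eq
}

// By const reference: no copies, no refcount updates.
bool eq_by_ref(const pmt_t& x, const pmt_t& y, long* observed_count) {
  *observed_count = x.use_count();
  return x == y;
}
```

Call sites look identical for both signatures (`eq_by_value(a, b, &n)` vs `eq_by_ref(a, b, &n)`), so callers recompile unchanged; only the function's mangled symbol changes.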

Functions that should be changed include pmt_eq (and similar) and
pmt_to_long (and similar). This breaks ABI but not API, and the gain is
quite substantial: it saves ~30 instructions per sample for the
test_usrp_inband_rx example.

At the moment, I am experimenting with memory pools for some of the
passed objects. This should give some improvement (most time is spent in
malloc/free) without breaking refcounting, polymorphic types …
Malloc/free is taking most of the time atm, so let's see what this
brings.
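A memory pool of the kind described can be sketched in C++ as a fixed-size free list: freed blocks are kept and reused, so steady-state allocate/release pairs never reach malloc/free, while refcounting and polymorphism are untouched. `FixedPool` and `Msg` are hypothetical names; this version is not thread-safe, so with mblocks running on several threads a real one would need a lock or per-thread pools.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size free-list pool (not GNU Radio code).
class FixedPool {
 public:
  explicit FixedPool(std::size_t block_size)
      : block_size_(round_up(block_size < sizeof(Node) ? sizeof(Node) : block_size)) {}
  ~FixedPool() {
    for (void* chunk : chunks_) ::operator delete(chunk);
  }
  FixedPool(const FixedPool&) = delete;
  FixedPool& operator=(const FixedPool&) = delete;

  void* allocate() {
    if (free_ == nullptr) refill();  // only here do we touch the system allocator
    Node* n = free_;
    free_ = n->next;
    return n;
  }
  void release(void* p) {
    free_ = ::new (p) Node{free_};  // push the block onto the free list
  }
  std::size_t chunks_allocated() const { return chunks_.size(); }

 private:
  struct Node {
    Node* next;
  };
  static constexpr std::size_t kBlocksPerChunk = 64;

  // Keep every block suitably aligned for any object type.
  static std::size_t round_up(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    return (n + a - 1) / a * a;
  }

  void refill() {
    char* chunk = static_cast<char*>(::operator new(block_size_ * kBlocksPerChunk));
    chunks_.push_back(chunk);
    for (std::size_t i = 0; i < kBlocksPerChunk; ++i) release(chunk + i * block_size_);
  }

  std::size_t block_size_;
  Node* free_ = nullptr;
  std::vector<void*> chunks_;
};

// Hypothetical message type routed through the pool via class-specific new/delete.
struct Msg {
  long payload[4];
  static FixedPool& pool() {
    static FixedPool p(sizeof(Msg));
    return p;
  }
  static void* operator new(std::size_t) { return pool().allocate(); }
  static void operator delete(void* p) { pool().release(p); }
};
```

Because the pool is hooked in through `operator new`/`operator delete`, code that does `new Msg` (or a smart pointer built from it) needs no changes.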

Stefan

PS: If you are wondering what I am using for analysis, it’s Valgrind’s
Callgrind tool, with KCachegrind for visualization.
