take your time; this is not a simple problem to solve, and hence it
does not have a simple solution. I like the idea of starting with
changing the scheduler behaviour first, to experiment with. Indeed it
will take a lot of measurements, talks and experiments (not exactly in
this order) to understand what is to be optimized.
But this talk is touching the GNU Radio scheduler area, which is an
area of its own, and it brings me to one question: what happened with
GRAS?
On Sunday, May 25, 2014 10:14 PM, Marcus M. [email protected] wrote:
Thanks for your comment!
Such an optimizer would be really, really fancy.
In a way, though, GNU Radio already does this when running a flow
graph: it just asks blocks to compute a reasonable amount of items to
fill up the downstream buffer. This (conceptually simple) approach is
actually one of its great strengths, because it just keeps the
computer “busy” as much as possible.
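As a toy model of that demand-driven behaviour (all names here are hypothetical, made up for illustration; the real logic lives in the C++ block_executor):

```python
# Toy model of GNU Radio's demand-driven scheduling. Hypothetical
# names; a sketch of the idea, not the actual runtime code.

def schedulable_items(output_buffer_size, items_in_buffer, items_available_in):
    """Items the scheduler may ask a 1:1 (sync) block to produce:
    limited by free downstream buffer space and by available input."""
    free_downstream = output_buffer_size - items_in_buffer
    return min(free_downstream, items_available_in)

# A full downstream buffer stalls the block entirely:
assert schedulable_items(8192, 8192, 4096) == 0
# Otherwise the block is asked to produce as much as it can:
assert schedulable_items(8192, 1024, 4096) == 4096
```

The point of the model: a block is never asked for a fixed chunk size; it is asked for whatever currently fits, which is what keeps the machine busy.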
There might be space for optimization, though, I agree: maybe it would
be better for some blocks just to wait longer (and thus not utilize
the CPU) if it was computationally beneficial to work on larger
chunks, as long as there are enough other blocks competing for CPU
time. However, this leads to the problem of balancing average
throughput against latency.
What the GNU Radio infrastructure does to approach this is actually
the following:
- Although it might be “fastest” to process all data at once, buffer
lengths set a natural limit to the chunk size, and thus latency. So we
have an upper boundary.
- It is best to be as close as possible to that upper boundary. To
achieve that, block developers are always encouraged to consume as
many input items and produce as much output as possible, even if the
overhead of having multiple (general_)work calls is minute. This
ensures that adjacent blocks don’t get asked to produce / consume
small item chunks (which would happen if they were in a waiting state
and only a small number of items was produced or only a small amount
of output buffer was marked as read).
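To put a rough number on that upper boundary (the figures below are made up for illustration, not GNU Radio defaults): the worst case is that a block is handed a whole buffer of items at once, so buffer length divided by sample rate bounds the latency one buffer can add.

```python
def buffer_latency(buffer_items, sample_rate_hz):
    """Worst-case latency (seconds) one buffer adds to the chain:
    a block may be handed up to a whole buffer of items at once."""
    return buffer_items / sample_rate_hz

# Illustrative figures: an 8192-item buffer at 1 Msps bounds that
# hop's latency contribution at ~8.2 ms.
latency = buffer_latency(8192, 1e6)
assert abs(latency - 0.008192) < 1e-12
```

Shrinking the buffer tightens the latency bound, at the cost of more, smaller work calls; that is exactly the throughput/latency balance described above.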
Optimizing this will be hard. Maybe one could profile the same
flowgraph with a lot of different settings of per-block maximum output
chunk sizes, but I do believe this will only give a little more
information than what the block developer already knew when he
optimized the block in the first place. If he didn’t optimize, his
main interest will be whether his block poses a problem at all; for
that, standard settings should be employed.
To give developers an API to inform the runtime of item amount
preference, different methods exist, however. I’ll give a short
rundown of them.
- Most notable are the fixed_rate properties of sync_block and the
interpolator block types.
- If your block will only produce multiples of a certain number of
items, set_output_multiple is a method that will potentially decrease
overhead introduced by pointless calls to (general_)work.
- In hardware optimization, alignment is often the performance-critical
factor. To account for that, set_alignment was introduced. It works
very similarly to set_output_multiple, but does not enforce the
multiples; instead, it sets an unaligned flag if non-multiple
consumption occurred. The runtime will always try to achieve that the
start of your current item chunk is memory-aligned to a certain item
multiple. If, however, less was produced, your block might still be
called, to keep the data flowing.
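As a sketch of how these two constraints shape the chunk handed to work() (a simplified pure-Python model with made-up names, not the actual scheduler code):

```python
def constrain_noutput(requested, output_multiple=1, alignment=1):
    """Simplified model of set_output_multiple vs. set_alignment.

    - output_multiple is enforced: round down to the multiple, and
      skip the work() call entirely if not even one multiple fits.
    - alignment is only preferred: if an aligned size fits, use it;
      otherwise hand over what is there, flagged as unaligned.
    Returns (noutput_items, unaligned_flag).
    """
    n = (requested // output_multiple) * output_multiple
    if n == 0:
        return 0, False          # scheduler waits for more space
    aligned = (n // alignment) * alignment
    if aligned > 0:
        return aligned, False    # aligned call
    return n, True               # unaligned call, to keep data flowing

# Enforced multiple: 1000 items requested, multiple of 512 -> 512.
assert constrain_noutput(1000, output_multiple=512) == (512, False)
# Not enough space for one multiple -> no work() call yet.
assert constrain_noutput(300, output_multiple=512) == (0, False)
# Alignment is soft: 3 items at 4-item alignment still get through,
# but flagged as unaligned.
assert constrain_noutput(3, alignment=4) == (3, True)
```

The key difference the model shows: a violated output multiple stalls the call, a violated alignment merely flags it.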
To properly apply these flags, you’ll basically need a human
understanding of what the block does. It may, nevertheless, be very
helpful to understand how well your block performs with different item
chunk sizes. To realize that, some mechanism to change scheduling
behaviour at runtime would be needed.
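One can already get a feeling for that outside the scheduler entirely, with a hand-rolled micro-benchmark (nothing GNU Radio specific here; the kernel below is a stand-in for whatever your block's work() actually does):

```python
import time

def process(chunk):
    # Stand-in DSP kernel: scale each item. Assumption: replace this
    # with your real block's per-item work to get meaningful numbers.
    return [x * 0.5 for x in chunk]

def ns_per_item(chunk_size, total_items=1_000_000):
    """Average processing time per item for a given chunk size."""
    data = [1.0] * chunk_size
    calls = total_items // chunk_size
    start = time.perf_counter()
    for _ in range(calls):
        process(data)
    elapsed = time.perf_counter() - start
    return 1e9 * elapsed / (calls * chunk_size)

for size in (64, 512, 4096):
    print(f"{size:5d} items/call: {ns_per_item(size):6.1f} ns/item")
```

If per-item cost drops noticeably with larger chunks, the block would benefit from being scheduled with bigger buffers or an output multiple; if the curve is flat, there is little to gain.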
I will look into that; I think it should be possible to manipulate the
block_executors into changing their forecasting/work-calling behaviour
at runtime, but I’m quite sure that this will bring new code into the
main tree.
All in all, right now I’m really stuck with what I actually want to
improve with the performance analysis of GNU Radio flowgraphs offered
by performance counters/gr-perf-monitorx, because they address many of
these issues already. Your execution-time-per-item over chunk size
idea is interesting; I’ll really have to take a deep look at
block_executor and the tpb scheduler to tell. If I decide to add
functionality that introduces significant runtime overhead or changes
too much of the internal behaviour, no one will be pleased, so I might
take this slow and will have to discuss it with experienced core
developers. I’m not very hesitant when it comes to fiddling with
in-tree source code, but my workings almost never make it to the
public, because I always figure they don’t address a problem properly
or break too much in comparison to what they can possibly improve.
On 25.05.2014 19:34, Bogdan D. wrote:
> Building on block computing performance measurements, there is one
> perspective: the optimizer would fall in the category of [...]