Different CPU loads with 2 similar GRC graphs

I am comparing the following 2 systems on GRC:

[source] --> [block A] --> [sink]

and

[source] --> [block A] --> [sink]
    |________________________^

where [block A] is a very CPU-intensive SP block and the source and sink
are very simple SP blocks. Neither the source nor the sink sets a sample
rate, nor do I have any throttle in the graph (intentionally).
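
For concreteness, here is a rough sketch of the two graphs in GNU Radio
Python (the blocks below are only stand-ins, not my actual blocks: a
null_source/null_sink for the simple SP blocks and a long FIR filter as
a CPU-heavy substitute for block A; old gnuradio-core "gr" namespace):

#!/usr/bin/env python
# Sketch of the two configurations under discussion (stand-in blocks only).
import time
from gnuradio import gr

def build(second_path):
    tb = gr.top_block()
    src  = gr.null_source(gr.sizeof_float)
    a    = gr.fir_filter_fff(1, [0.001] * 2000)   # CPU-heavy stand-in for block A
    sink = gr.null_sink(gr.sizeof_float)          # null_sink takes multiple input streams

    tb.connect(src, a, (sink, 0))      # [source] --> [block A] --> [sink]
    if second_path:
        tb.connect(src, (sink, 1))     # the extra direct source-to-sink edge
    return tb

if __name__ == '__main__':
    tb = build(second_path=True)       # set to False for the first configuration
    tb.start()                         # no throttle: runs as fast as the CPU allows
    time.sleep(10)                     # watch the per-core load (e.g. in top) here
    tb.stop()
    tb.wait()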

When I run the first configuration on a 2-core machine,
I get 2 CPU traces that alternate between 100% and 20%
for large chunks of time, which means that the thread running
A overwhelms the system.

So far so good.

When I run the second configuration I get 2 CPU traces that hover around
a load of 65%, without either of them touching 100% (except
occasionally), which seems like a more “normal” situation.

So I have 2 questions:

  1. why is it that the system does not “crash” even in the absence of a
    throttle?
  2. why is it that the 2nd configuration results in a lower overall load?

thanks
Achilleas

On Sep 22, 2011, at 1:41 AM, Achilleas A. wrote:

[source] --> [block A] --> [sink]
    |________________________^

Just to clarify … is your graph:

connect (source, A, sink)
connect (source, sink)

I don’t know for certain why the loads are different, but I can make
some educated guesses. But, I’m also happy to defer to others who might
know better. - MLD

Yes.

Achilleas

Well, in answer to question 1, you don’t need a throttle to avoid system
instability. You really use the throttle to keep your computer from
becoming
entirely unresponsive when you have multiple threads with high priority
running simultaneously as fast as they can.
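
For illustration, the throttle is just another block dropped into the
stream. A minimal sketch (arbitrary stand-in blocks and an arbitrary
1 Msps rate, not tied to your actual graph):

from gnuradio import gr

tb   = gr.top_block()
src  = gr.null_source(gr.sizeof_float)
thr  = gr.throttle(gr.sizeof_float, 1e6)   # pace the stream at roughly 1 Msps wall-clock
sink = gr.null_sink(gr.sizeof_float)

# Without the throttle the graph runs as fast as the CPU allows; with it,
# the flow graph is paced and the rest of the system stays responsive.
tb.connect(src, thr, sink)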

I don’t know the answer to question 2 – I suspect to find the answer
you’re
going to have to do some deep hunting in the scheduler.

–n

On Thu, Sep 22, 2011 at 8:06 AM, Achilleas A. <[email protected]> wrote:

On Thu, Sep 22, 2011 at 1:41 AM, Achilleas A. <[email protected]> wrote:

When I run the second configuration I get 2 CPU traces that hover around
a load of 65%, without either of them touching 100% (except
occasionally), which seems like a more “normal” situation.

So I have 2 questions:

  1. why is it that the system does not “crash” even in the absence of a throttle?
  2. why is it that the 2nd configuration results in a lower overall load?

Then I guess that your sink is getting twice the amount of data in the
second graph. As the sink is now slower than the source, it drives the
pipeline, and your block A gets samples at half rate. The first graph is
100% + 20% = 120%. If block A uses all of the CPU power, then the second
graph should use around 120% / 2 = 60%.

Does that make sense?

Pascal

On Sep 22, 2011, at 1:00 PM, Nick F. wrote:

Well, in answer to question 1, you don’t need a throttle to avoid system
instability. You really use the throttle to keep your computer from becoming
entirely unresponsive when you have multiple threads with high priority running
simultaneously as fast as they can.

That would make an interesting test for Achilleas: Do test 1 using
real-time priority & see if that locks up the system or not.
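
Something along these lines (a sketch only; gr.enable_realtime_scheduling()
needs the right OS privileges, and build_test1_flowgraph() below is just a
placeholder for the test-1 graph):

from gnuradio import gr

# Sketch of the suggested experiment: request real-time scheduling before
# starting the unthrottled test-1 flow graph, then see whether the machine
# stays usable.
if gr.enable_realtime_scheduling() != gr.RT_OK:
    print("Warning: failed to enable real-time scheduling")

tb = build_test1_flowgraph()   # placeholder for [source] -> [block A] -> [sink]
tb.start()
tb.wait()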

I don’t know the answer to question 2 – I suspect to find the answer you’re
going to have to do some deep hunting in the scheduler.

Let me see if I can work anything out:

Roughly: In the TPB model when “progress is made” for a given block
(e.g., data is generated or consumed), the appropriate thread(s)
containing adjacent blocks are signaled (IIRC, from waiting on a
condition) (e.g., if data is consumed, then the prior block(s) [those
generating the data] are notified). When those threads wake up, they
check to see whether or not they have enough input data and output space
to do processing. This check is done by the TPB “scheduler” – it’s an
algorithm that you can read about in
“gnuradio-core/src/lib/runtime/gr_block_executor.cc” if you really want
to. A given block cannot do processing until enough input data and
output space are simultaneously available, and the TPB scheduler tries
to maximize data processing.
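
In very rough Python-style pseudocode (not the real code; the helper
names below are made up, and gr_block_executor.cc does a lot more, e.g.
forecasting and output-multiple handling):

def block_thread(block):
    # One such loop runs per block in the thread-per-block scheduler.
    while not block.done():
        ninput  = block.items_available_on_inputs()    # hypothetical helper
        noutput = block.space_available_on_outputs()   # hypothetical helper
        if block.enough_to_work(ninput, noutput):
            block.do_work(ninput, noutput)       # consume input, produce output
            block.notify_neighbors()             # wake adjacent blocks' threads
        else:
            block.wait_for_notification()        # sleep until a neighbor signals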

For case #1, the signaling and scheduler’s job is simple because each
block is connected only to adjacent blocks. If you were to plot out the
block execution pattern, I’d guess it was something like:

(source == 1; sink == 2; E == “executing”; W == “waiting”):

Thread   Time ->
  1      E E E E E …
  A      W E E E E …
  2      W W E E E …

For this case data is well-pipelined – A can process as soon as 1 is
finished, and 2 can process as soon as A is finished. Assuming A is CPU
intensive & the graph isn’t otherwise throttled, then each block could
easily saturate a single CPU (if that’s how much processing each
requires).

For case #2, the signaling and scheduler’s job is complicated by the
dual paths from source to sink. If you were to plot out the block
execution pattern, I’d guess it was something like:

(X == “waiting, unsuccessful execution, and then more waiting”; I switch
from “W” to “X” because of the signaling between threads):

Thread   Time ->
  1      E E X E E X E E …
  A      W E E X E E X E …
  2      W X E X X E X X …

So, in this case, because of the odd data shuffling, you'll see somewhat
less than full CPU usage (on the average). The above “diagram” assumes
that 1 is generating a sizable chunk of data (relative to the total
buffer size), such that it can do one or two writes before its output
buffer is full (that's roughly how the TPB scheduler works, btw). The
actual wait duration, I think, depends primarily on how long it takes
A to do its processing. Either way: the basic idea is that the data is not
well-pipelined, and hence there are waits that reduce the average CPU
usage [another possible factor is how much overhead time is spent during
X’s (emerging from wait, checking for processing, then going back into
wait)].

I hope this is reasonably accurate, makes sense, and helps! - MLD

Michael,

Good description of the TPB running mechanism!

Here is a more detailed explanation with some graph illustrations (see
page 2 of the linked paper):

http://gnuradio.org/redmine/attachments/download/264

Andrew