Flowgraph running in "fits and starts"

I’ve got a flow-graph with a throttled random byte source, which is a
test input for a modulator:

http://www.sbrac.org/files/fm4_test_modulator.grc

http://www.sbrac.org/files/fm4_test_modulator.py

The source is throttled to the byte rate required to produce the correct
number of symbols/second (4800).

What I’ve noticed is that this graph only runs in “fits and starts”,
rather than continuously. I assume this has something to do with the
GNU Radio buffering and internal scheduler.

In the case of a “real” flow-graph, taking real data in at
4800 symbols/second and going to a real USRP transmitter, will it still
run in “fits and starts”, or will it “do the right thing”?

I realize that buffering is an important part of GNU Radio, but how do
you actually send low-rate data in something approaching correct
real-time?

I at first thought this was due to the throttle block, so I replaced it
with an external source (via a FIFO) that produced random bytes at
1200 bytes/second (2 bits/symbol), and it behaves exactly the same as
a throttled random source: the graph still seems to run in “fits and
starts”.
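
For reference, a producer along these lines is all that’s needed; this
is a sketch, not my exact script, and the FIFO path is a placeholder:

#!/usr/bin/env python
# Sketch of a rate-limited FIFO producer: writes random bytes to a
# named pipe at roughly 1200 bytes/second (120 bytes every 100 ms).
# Create the pipe first with: mkfifo /tmp/test_fifo
import os
import time

FIFO_PATH = "/tmp/test_fifo"   # placeholder path
CHUNK = 120                    # bytes per write
INTERVAL = 0.1                 # seconds between writes -> 1200 B/s

fd = os.open(FIFO_PATH, os.O_WRONLY)   # blocks until a reader opens it
try:
    while True:
        os.write(fd, os.urandom(CHUNK))
        time.sleep(INTERVAL)
finally:
    os.close(fd)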


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Sat, Sep 4, 2010 at 12:19 AM, Marcus D. Leech [email protected]
wrote:

On 09/03/2010 11:52 PM, Eric B. wrote:
Thought about that, as well. So I replaced the graphical FFT sink with
a file sink and set the “unbuffered” flag. That file fills up in “fits
and starts”: it spends quite a while with zero bytes in it, then
really a lot of bytes, then no more bytes for quite some time, then
another lump of bytes, etc. I confirmed that the “producer” end of the
FIFO was producing bytes at the correct rate.

So when I’m sending “real” data to an actual USRP (f’rexample), the
symbols will get clocked out at the right rate, provided that I issue
those bits in sufficiently large lumps to keep the USRP from
underrunning on transmit.

But what about situations where you might have a source of bits that’s
running in “real time” (like my little test case with the
external FIFO), and you’d like the resulting symbols to be “clocked
out” at something resembling “real time”? My test case
was just a test case, but I can certainly imagine situations where it
actually matters.

Remember that GNU Radio runs stuff through each signal processing
block in “chunks.” These chunk sizes can be around 100 to 32000 items,
larger when there is time to spare and smaller when the system is
trying to operate quickly. When you’re running with a sample rate of
4800, the graph only passes 4800 samples each second, so the GNU Radio
scheduler is likely using very large block sizes (you could print out
the value of noutput_items in one of the work functions to see for
sure). Let’s say that, generally, each block is given
noutput_items = 8192. That’s almost 2 seconds worth of data.

I just created a simple flowgraph in GRC with a noise source, a
throttle, and a scope sink. With varying rates on the throttle, you
can see this happening. With such a simple flowgraph, noutput_items is
always either 4095 or 4096, so it’s pretty regular. With a rate of
4800, you get a scope update about every second.
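
If you want to watch the chunk sizes yourself, a pass-through probe
like this sketch prints noutput_items on every call. It assumes the
Python block API (gr.sync_block); on older trees you’d add a printf to
a C++ work function instead:

from gnuradio import gr, blocks, analog
import numpy

class chunk_probe(gr.sync_block):
    # Pass-through block that prints the chunk size of each work() call.
    def __init__(self):
        gr.sync_block.__init__(self, name="chunk_probe",
                               in_sig=[numpy.float32],
                               out_sig=[numpy.float32])

    def work(self, input_items, output_items):
        n = len(output_items[0])
        print("noutput_items = %d" % n)
        output_items[0][:] = input_items[0]   # sync block: equal lengths
        return n

# Noise source -> throttle -> probe -> null sink, throttled to 4800 sps.
# Ctrl-C to stop.
tb = gr.top_block()
src = analog.noise_source_f(analog.GR_GAUSSIAN, 1.0, 0)
thr = blocks.throttle(gr.sizeof_float, 4800)
tb.connect(src, thr, chunk_probe(), blocks.null_sink(gr.sizeof_float))
tb.run()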

I’ve seen something like what you were observing with more complicated
flowgraphs at very small sample rates: when the scheduler doesn’t
produce the same number of items each time through, it runs in “fits
and starts”, as you said. Conversely, when running with a source like
the USRP, the source is producing at a minimum of 250 ksps, so the
flow graph has to work to keep up with it. Data then runs through the
whole graph “faster”, but only because the sinks are being updated
with new data more quickly.

Like Eric said, remove the throttle or at least change the rate and
that should clean things up.

Tom

On Fri, Sep 03, 2010 at 10:09:01PM -0400, Marcus D. Leech wrote:

I’ve got a flow-graph with a throttled random byte source, which is a
test input for a modulator:

http://www.sbrac.org/files/fm4_test_modulator.grc

http://www.sbrac.org/files/fm4_test_modulator.py

The source is throttled to the byte rate required to produce the correct
number of symbols/second (4800).

The throttle block was written so that the GUI elements could be
tested without an inherently rate-limiting source being in the graph.
It is not designed to precisely rate-limit anything. For any use other
than that, you’re asking for trouble. Think about it: what definition
of time do you use? Over what period of time do you average? Etc.,
etc.

/*!
 * \brief throttle flow of samples such that the average rate does not
 * exceed samples_per_sec.
 * \ingroup misc_blk
 *
 * input: one stream of itemsize; output: one stream of itemsize
 *
 * N.B. this should only be used in GUI apps where there is no other
 * rate limiting block. It is not intended nor effective at precisely
 * controlling the rate of samples. That should be controlled by a
 * source or sink tied to a sample clock, e.g., a USRP or audio card.
 */

What I’ve noticed is that this graph only runs in “fits and starts”,
rather than continuously. I assume this has something to do with the
GNU Radio buffering and internal scheduler.

In the case of a “real” flow-graph, taking real data in at
4800 symbols/second and going to a real USRP transmitter, will it still
run in “fits and starts”, or will it “do the right thing”?

It will do the right thing, assuming that all blocks “do the right
thing” and compute as much output as they are asked to.

I realize that buffering is an important part of GNU Radio, but how do
you actually send low-rate data in something approaching correct
real-time?

You don’t send it at the right rate, you let the sink (or source)
handle the timing issues.

Note that NONE of GNU Radio has any idea of the actual sample rate.

There are some places where sample rates are used (e.g.,
gr.sig_source), but they are there as a convenience so that people
don’t have to continually puzzle over normalized frequencies.
However, this may give the impression that “sample_rate” actually
means something in the real world, and it doesn’t — with the exception
of i/o devices connected to a sample clock.

I at first thought this was due to the throttle block, so I replaced it
with an external source (via a FIFO) that produced random bytes at
1200 bytes/second (2 bits/symbol), and it behaves exactly the same as
a throttled random source: the graph still seems to run in “fits and
starts”.

The display may appear to run in “fits and starts” because the
internal decimation rate of the sink may be too high for the throttled
data rate that you’re sending. It may take a long time to get enough
data for the FFT sink to display anything. Or there could be bugs in
the sink…

E.g., the GL fft sink has at least a bug or two related to the
mis-specification of Python’s ‘/’ operator. If you use integers,
1/3 == 0, but 1.0/3 == 0.3333. The bug I’m thinking of shows up as a
divide-by-zero in the persistence code when the fft sink is invoked
with its default parameters (sample_rate = 1, fft_rate = 15). There
may also be problems with mapping the user-provided fft_rate into the
decimation factor of the keep_one_in_n block. Not sure about that
one, but this is a place where it’s possible to ask for ill-specified
behavior. E.g., if I say that the fft_rate is 15 and my sample rate
is 1, do I expect interpolation by 15?

See Python PEP-238 for background on the divide issue and the use of

from __future__ import division

to debork the behavior of ‘/’, and possibly help fix the sinks.
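
A quick interactive session shows the difference (in a source file,
the __future__ import has to be the first statement of the module):

>>> 1 / 3                # Python 2 default: integer floor division
0
>>> 1.0 / 3
0.3333333333333333
>>> from __future__ import division
>>> 1 / 3                # '/' is now true division
0.3333333333333333
>>> 1 // 3               # floor division, when that's what you want
0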

If you want to see the details of what the scheduler is doing,
change

#define ENABLE_LOGGING 0

to

#define ENABLE_LOGGING 1

at the top of gr_block_executor.cc. It will then create a separate
ASCII log file for each block. They’re named “sst-NNN.log”. The first
line of each log identifies the block.

Hope this helps!

Eric

On 09/04/2010 08:08 PM, Tom R. wrote:

On Sat, Sep 4, 2010 at 12:19 AM, Marcus D. Leech [email protected] wrote:

Like Eric said, remove the throttle or at least change the rate and
that should clean things up.

Tom

I also noted in the reply to Eric that I observe the same behaviour
with an external source that is producing 4800 symbols/second, so it’s
not the throttle per se, but rather the way that work “chunks” get
scheduled in GNU Radio. With a “fast” source, you don’t find yourself
in a situation where there aren’t enough “chunks” to keep things busy.

But a very reasonable example would be something like a cross-band
digital repeater application, where bits/symbols would be arriving at
the “channel rate” and need to leave the Tx in something at least
approaching real time. You certainly need a bit of elastic buffering
to compensate for clock skew between the two sides, but
several-tens-of-seconds of latency isn’t likely to be very useful in
the real world.

Note that I’m not criticizing anybody or anything. I’m making
observations, and I do understand why it is the way it is.
My little test flow-graph failed the “least astonishment test”, which
is why I felt I needed to comment.

Would it be reasonable to open a discussion about this class of
flow-graph? I think they can be characterized as flow-graphs with a
low symbol rate and high interpolation (which I think is where the
buffer-multiplier effect may be coming into play). In such
flow-graphs, would it be reasonable to be able to “tweak” the
scheduler to deal with this type of situation? I have little insight
into how the scheduler works in detail, but I think I understand the
“fits and starts” that I was observing.

So, is this a reasonable discussion topic? Are other folks working on
“stuff” that will run into part of the performance diagram I ran
into yesterday? Or is everyone else working on high-event-rate type
signal chains?

Cheers


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Sat, Sep 04, 2010 at 08:22:38PM -0400, Marcus D. Leech wrote:

On 09/04/2010 08:08 PM, Tom R. wrote:

On Sat, Sep 4, 2010 at 12:19 AM, Marcus D. Leech [email protected] wrote:

Like Eric said, remove the throttle or at least change the rate and
that should clean things up.

Tom

I also noted in the reply to Eric that I observe the same behaviour
with an external source that is producing 4800 symbols/second, so it’s
not the throttle per se, but rather the way that work “chunks” get
scheduled in GNU Radio. With a “fast” source, you don’t find yourself
in a situation where there aren’t enough “chunks” to keep things busy.

But a very reasonable example would be something like a cross-band
digital repeater application, where bits/symbols would be arriving at
the “channel rate” and need to leave the Tx in something at least
approaching real time. You certainly need a bit of elastic buffering
to compensate for clock skew between the two sides, but
several-tens-of-seconds of latency isn’t likely to be very useful in
the real world.

Note that I’m not criticizing anybody or anything. I’m making
observations, and I do understand why it is the way it is.
My little test flow-graph failed the “least astonishment test”, which
is why I felt I needed to comment.

Would it be reasonable to open a discussion about this class of
flow-graph? I think they can be characterized as flow-graphs with a
low symbol rate and high interpolation (which I think is where the
buffer-multiplier effect may be coming into play). In such
flow-graphs, would it be reasonable to be able to “tweak” the
scheduler to deal with this type of situation? I have little insight
into how the scheduler works in detail, but I think I understand the
“fits and starts” that I was observing.

So, is this a reasonable discussion topic? Are other folks working on
“stuff” that will run into part of the performance diagram I ran
into yesterday? Or is everyone else working on high-event-rate type
signal chains?

Marcus,

This is certainly a reasonable discussion topic.
I suggest before wading in that you first enable the scheduler logging
that I mentioned in a prior post and take a look at that.

Feel free to send me the logs too.

What we’re looking for is which block is forcing the large chunk size.
If you were reading from a file using, for example, gr.file_source, it
won’t return until it has completely filled up the downstream buffer
given to it. That’s just how it’s written.

A trivial change would be to have it loop until it has read
min(N_USER_SPECIFIED_ITEMS, noutput_items) items.
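
In pseudo-Python, the idea is just this (max_items_per_call and
read_items are made-up names, and the real gr.file_source is C++, so
treat it purely as a sketch):

def work(self, input_items, output_items):
    out = output_items[0]
    # Cap each call at the user-specified count instead of filling
    # the whole downstream buffer the scheduler handed us.
    n = min(self.max_items_per_call, len(out))   # hypothetical member
    data = self.read_items(n)                    # hypothetical file read
    out[:len(data)] = data
    return len(data)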

Eric

On 09/03/2010 11:52 PM, Eric B. wrote:

The throttle block was written so that the GUI elements could be
tested without an inherently rate-limiting source being in the graph.
It is not designed to precisely rate-limit anything. For any use other
than that, you’re asking for trouble. Think about it: what definition
of time do you use? Over what period of time do you average? Etc.,
etc.

I understand that. See below.

It will do the right thing, assuming that all blocks “do the right
thing” and compute as much output as they are asked to.

You don’t send it at the right rate, you let the sink (or source)
handle the timing issues.

Note that NONE of GNU Radio has any idea of the actual sample rate.

There are some places where sample rates are used (e.g.,
gr.sig_source), but they are there as a convenience so that people
don’t have to continually puzzle over normalized frequencies.
However, this may give the impression that “sample_rate” actually
means something in the real world, and it doesn’t — with the exception
of i/o devices connected to a sample clock.

Yes, I “get” that.

The display may appear to run in “fits and starts” because the
internal decimation rate of the sink may be too high for the throttled
data rate that you’re sending. It may take a long time to get enough
data for the FFT sink to display anything. Or there could be bugs in
the sink…

E.g., the GL fft sink has at least a bug or two related to the
mis-specification of Python’s ‘/’ operator. If you use integers,
1/3 == 0, but 1.0/3 == 0.3333. The bug I’m thinking of shows up as a
divide-by-zero in the persistence code when the fft sink is invoked
with its default parameters (sample_rate = 1, fft_rate = 15). There
may also be problems with mapping the user-provided fft_rate into the
decimation factor of the keep_one_in_n block. Not sure about that
one, but this is a place where it’s possible to ask for ill-specified
behavior. E.g., if I say that the fft_rate is 15 and my sample rate
is 1, do I expect interpolation by 15?

See Python PEP-238 for background on the divide issue and the use of

from __future__ import division

to debork the behavior of ‘/’, and possibly help fix the sinks.

Thought about that, as well. So I replaced the graphical FFT sink with
a file sink and set the “unbuffered” flag. That file fills up in “fits
and starts”: it spends quite a while with zero bytes in it, then
really a lot of bytes, then no more bytes for quite some time, then
another lump of bytes, etc. I confirmed that the “producer” end of the
FIFO was producing bytes at the correct rate.

So when I’m sending “real” data to an actual USRP (f’rexample), the
symbols will get clocked out at the right rate, provided that I issue
those bits in sufficiently large lumps to keep the USRP from
underrunning on transmit.

But what about situations where you might have a source of bits that’s
running in “real time” (like my little test case with the
external FIFO), and you’d like the resulting symbols to be “clocked
out” at something resembling “real time”? My test case
was just a test case, but I can certainly imagine situations where it
actually matters.

If you want to see the details of what the scheduler is doing,
change

#define ENABLE_LOGGING 0

to

#define ENABLE_LOGGING 1

at the top of gr_block_executor.cc. It will then create a separate
ASCII log file for each block. They’re named “sst-NNN.log”. The first
line of each log identifies the block.

Hope this helps!

Eric


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On Sat, Sep 4, 2010 at 8:47 PM, Eric B. [email protected] wrote:

On Sat, Sep 04, 2010 at 08:22:38PM -0400, Marcus D. Leech wrote:

On 09/04/2010 08:08 PM, Tom R. wrote:

On Sat, Sep 4, 2010 at 12:19 AM, Marcus D. Leech [email protected] wrote:

Like Eric said, remove the throttle or at least change the rate and
that should clean things up.

Tom

I also noted in the reply to Eric that I observe the same behaviour
with an external source that is producing 4800 symbols/second, so it’s
not the throttle per se, but rather the way that work “chunks” get
scheduled in GNU Radio. With a “fast” source, you don’t find yourself
in a situation where there aren’t enough “chunks” to keep things busy.

But a very reasonable example would be something like a cross-band
digital repeater application, where bits/symbols would be arriving at
the “channel rate” and need to leave the Tx in something at least
approaching real time. You certainly need a bit of elastic buffering
to compensate for clock skew between the two sides, but
several-tens-of-seconds of latency isn’t likely to be very useful in
the real world.

Note that I’m not criticizing anybody or anything. I’m making
observations, and I do understand why it is the way it is.
My little test flow-graph failed the “least astonishment test”, which
is why I felt I needed to comment.

Would it be reasonable to open a discussion about this class of
flow-graph? I think they can be characterized as flow-graphs with a
low symbol rate and high interpolation (which I think is where the
buffer-multiplier effect may be coming into play). In such
flow-graphs, would it be reasonable to be able to “tweak” the
scheduler to deal with this type of situation? I have little insight
into how the scheduler works in detail, but I think I understand the
“fits and starts” that I was observing.

So, is this a reasonable discussion topic? Are other folks working on
“stuff” that will run into part of the performance diagram I ran
into yesterday? Or is everyone else working on high-event-rate type
signal chains?

Marcus,

This is certainly a reasonable discussion topic.
I suggest before wading in that you first enable the scheduler logging
that I mentioned in a prior post and take a look at that.

Feel free to send me the logs too.

What we’re looking for is which block is forcing the large chunk size.
If you were reading from a file using, for example, gr.file_source, it
won’t return until it has completely filled up the downstream buffer
given to it. That’s just how it’s written.

A trivial change would be to have it loop until it has read
min(N_USER_SPECIFIED_ITEMS, noutput_items) items.

Eric

Marcus,
Indeed, this could be something we want to talk more about. Kind of on
the periphery of my vision, I can see a handful of applications where
the large-chunking issue could be a problem. If we can define enough
need, then we can talk more about finding the right way to go about
it.

Eric’s suggestion is a good start. Tell it how many items you want and
then run the loop based on that number or noutput_items, whichever is
smaller. If this works well for you, we might want to find a way of
integrating that concept as part of the scheduler/basic_block.

Well, like I said, we can think this through more clearly if you come
up with positive results with that hack.

Tom

On 09/06/2010 05:03 PM, Tom R. wrote:

find a way of integrating that concept as part of the
scheduler/basic_block.

Well, like I said, we can think this through more clearly if you come
up with positive results with that hack.

Tom

I hacked in a hard-coded value as a temporary test that amounts to
100 msec worth of “super symbols” (the actual symbols are di-bits, at
a nominal 4800 symbols/sec rate, but I send 1200 packed bytes/second
over the FIFO) from my external source.

Looking at the debug logging set up by the scheduler, the scheduler
has asked for 32767 noutput_items on the gr.file_source() in my
flow-graph, and I’m returning 100 msec worth (which is 120 items in my
case).

The result is a flow-graph that runs with much less apparent latency,
depending on which blocks I pick. If I put in an interpolator block as
the last item in the graph before the FFT display, it becomes “chunky”
again, so I put in a rational resampler to resample up to the final
channel bandwidth (minimum 500 kHz for a USRP2, if I’ve done my math
correctly :-) ).
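
For what it’s worth, picking the resampler ratio is just a matter of
reducing the fraction of output rate over input rate. A sketch, using
the gr-filter Python API; the 48 kHz intermediate rate here is an
assumption, not my actual graph:

from fractions import Fraction
from gnuradio import filter as gr_filter

in_rate = 48000        # assumed intermediate rate out of the modulator
out_rate = 500000      # target USRP2 channel rate
ratio = Fraction(out_rate, in_rate)    # reduces to 125/12

resamp = gr_filter.rational_resampler_ccf(
    interpolation=ratio.numerator,     # 125
    decimation=ratio.denominator)      # 12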

But the result is quite a bit more CPU hungry than the previous “fits
and starts” version of the flow-graph. So this little hack is
instructive, but not in and of itself any kind of path forward.

It seems like some kind of global approach to the latency issue for
narrow-bandwidth/low-event-rate applications is definitely worth
discussing. Likely much careful treading required :-)


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium