I’ve seen three different intermittent misbehaviors in my application
which can be described as parallel signal paths getting out of phase
with each other. (For example, in one of them, one stereo audio output
channel is delayed relative to the other.)
They all seem to occur when the flowgraph is reconfigured (lock, change
connections, unlock), even though the subgraph that contains the phase
error was not itself modified.
Is GNU Radio expected or known to handle reconfigurations poorly in
this way? Should I treat this as a bug or a feature request?
As a further related question, what is intended to happen to samples
which are buffered between two blocks (A->B) when that connection is
removed? Are they discarded, delivered to B, or delivered to whatever
A’s output is now?
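For concreteness, this is the reconfiguration pattern I mean, as a
minimal sketch (GNU Radio 3.7-style Python; the specific blocks here are
placeholders, not my actual application):

    from gnuradio import gr, blocks, analog

    tb = gr.top_block()
    src = analog.sig_source_f(48000, analog.GR_SIN_WAVE, 440, 0.5)
    sink_a = blocks.null_sink(gr.sizeof_float)
    sink_b = blocks.null_sink(gr.sizeof_float)
    tb.connect(src, sink_a)
    tb.start()

    # Reconfigure the running flowgraph: lock, change connections, unlock.
    tb.lock()
    tb.disconnect(src, sink_a)
    tb.connect(src, sink_b)
    tb.unlock()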
On which operating system do you experience this? I have noticed such
behavior on OS X using the new audio-osx backend, but not with the
portaudio backend or on Linux.
As for what happens to samples buffered between two blocks: during
reconfiguration, any connect/disconnect removes the old buffers and adds
new ones, so any data you had between those blocks is lost. This was the
specification of the reconfiguration process when it was built.
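Concretely, a minimal sketch of those semantics (the blocks here are
just stand-ins for A, B, and a new downstream block C):

    from gnuradio import gr, blocks

    tb = gr.top_block()
    a = blocks.vector_source_f([0.0] * 10000, True)  # stand-in for A
    b = blocks.null_sink(gr.sizeof_float)            # stand-in for B
    c = blocks.null_sink(gr.sizeof_float)            # stand-in for C
    tb.connect(a, b)
    tb.start()

    tb.lock()
    tb.disconnect(a, b)  # the A->B buffer is destroyed; in-flight samples are dropped
    tb.connect(a, c)     # A->C starts with a fresh, empty buffer
    tb.unlock()          # B never sees the buffered samples; C only sees what
                         # A produces after the reconfiguration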
That makes sense (though I could imagine uses for other behaviors).
Thanks.
What about blocks other than “those blocks”, that is, buffers in
connections that are not modified by a given reconfiguration? Are
those intended to be kept or discarded (I would hope kept)?
I ask because the behavior I am seeing could be explained by them being
inconsistently kept or discarded.
Agreed. The other (obvious) option is to keep the data and reinsert it
into the buffers, but that’s difficult because of potential rate changes
and other things affecting both buffer sizes and what you’re expecting
to do with the data after the reconfiguration.
Correct: if the connections between blocks were not broken and
reconnected, the data will be preserved before and after the
reconfiguration. (I believe; Johnathan wrote that code, but this is my
recollection of a) how it’s supposed to work and b) how it looks to
behave when I’ve looked at that portion of the code. It also agrees with
your observations.)
Seems like you could disconnect the entire flowgraph and reconnect to
make sure all data is getting flushed, and you should maintain sync that
way. It might sound like a big hammer, but it’d be good to know if that
works.
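Something like this rough sketch, where rebuild_connections is a
hypothetical application-supplied helper rather than a GNU Radio call:

    def reconfigure(tb, rebuild_connections):
        tb.lock()
        tb.disconnect_all()      # tear down every top-level connection (and buffer)
        rebuild_connections(tb)  # hypothetical: re-issue every tb.connect(...)
        tb.unlock()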
To answer the operating-system question, my audio chain does not use a
GR audio_sink of any description; it exits the flowgraph through a
single message_sink after vectorizing the channels.
The other two misbehaviors don’t involve audio, or any sources or sinks.
Furthermore, they can both be described as a subgraph (in one case, a
concrete hier block) which has a single input and a single output, but
two paths within that subgraph which split and rejoin, and those two
paths become out of sync with each other.
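For concreteness, the shape of those subgraphs, as a minimal sketch (the
per-path blocks are illustrative placeholders, not my actual
processing):

    from gnuradio import gr, blocks

    class split_rejoin(gr.hier_block2):
        # One input, one output, two internal paths that split and rejoin.
        def __init__(self):
            gr.hier_block2.__init__(self, "split_rejoin",
                gr.io_signature(1, 1, gr.sizeof_float),
                gr.io_signature(1, 1, gr.sizeof_float))
            path1 = blocks.multiply_const_ff(0.5)      # placeholder processing
            path2 = blocks.delay(gr.sizeof_float, 16)  # placeholder processing
            adder = blocks.add_ff()
            self.connect(self, path1, (adder, 0))  # first path
            self.connect(self, path2, (adder, 1))  # second path
            self.connect(adder, self)              # rejoin

If the buffers on the two internal paths are flushed inconsistently
during an external reconfiguration, the paths fall out of phase with
each other, which is exactly the symptom I’m seeing.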
The audio sinks (all of them, to the best of my knowledge/understanding)
do nothing special when it comes to data channel sync. That’s just not
their job (timing issues such as overflow or underrun might be, but
that’s a different issue from the one being discussed here). They should
assume that incoming data is already correctly synchronized across all
channels; how could they know otherwise? If you’re finding that the
phase is changing, then that’s coming from upstream somewhere. - MLD
From my non-systematic observations so far, you’re probably right that
disconnecting and reconnecting the entire flowgraph would restore sync.
However, doing so would require a nontrivial modification to many hier
blocks in my application (of which there are a lot), wouldn’t actually
fix the underlying problem, and would be undesirable because it would
mean that things which are (logically) unaffected by the UI action would
drop data.
My current plan is to produce a reduced test case and bug report
(assuming this does in fact turn out to be a GNU Radio bug), perhaps
with some spot reconnect-based kludges in the meantime.
Thank you for confirming my understanding of the expected behavior.
It is my understanding that top_block.disconnect_all() only disconnects
blocks that were connected at the top-block level and will not destroy
your hier_blocks.
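That is, roughly (reusing the split_rejoin hier block sketched earlier
as an example):

    from gnuradio import gr, blocks

    tb = gr.top_block()
    hier = split_rejoin()  # the hier block sketched earlier
    src = blocks.null_source(gr.sizeof_float)
    snk = blocks.null_sink(gr.sizeof_float)
    tb.connect(src, hier, snk)

    tb.disconnect_all()  # removes only src->hier and hier->snk; the
                         # connections inside hier stay intact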
That is correct. I had interpreted Tom R.'s suggestion as being to
reconnect every connection in the flowgraph, i.e. including all
internal connections of hier blocks.
As it happens, my top block already uses disconnect_all every time it
reconfigures, so I’m already doing that much.
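For completeness, the heavier interpretation would look something like
this sketch; the rebuild helpers are hypothetical application code,
though hier_block2 does provide its own disconnect_all():

    def full_rewire(tb, hier_blocks, rebuild_top):
        tb.lock()
        tb.disconnect_all()         # top-level connections only
        for hb in hier_blocks:
            hb.disconnect_all()     # each hier block's internal connections
            hb.rebuild_internals()  # hypothetical: re-issue its internal connects
        rebuild_top(tb)             # hypothetical: re-issue the top-level connects
        tb.unlock()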