Comments on stream tags and metadata storage

addis_a · July 18, 2014, 12:13am

Some comments after playing with stream tags and metadata this
afternoon.

(1) Although the discussion of stream tag insertion hints that this
should be done within the scheduler’s call to work() it could be more
clear that doing it in any other context can result in race conditions.
(I did think I saw it stated more clearly somewhere, but can’t find
that now, so maybe this point has been addressed.)

(2) In the current implementation it’s further necessary that tags be
added to an output in monotonic non-decreasing offset order.
file_meta_sink does not sort the return value from get_tags_in_range(),
and emits all data up to the timestamp of the next tag, so a subsequent
tag with an earlier offset is dropped from the archive.

(I note that tagged_file_sink() does sort the tags it receives in one
case, but not in others.)
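
To make the failure mode in (2) concrete, here is a toy model of the behavior as described in the paragraph above. It is not the file_meta_sink code; the function name and the "write position" bookkeeping are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: tags are visited in the order received, and data is
// written up to each tag's offset. A tag whose offset lies behind the
// current write position can no longer be recorded.
std::vector<std::uint64_t> recorded_offsets(const std::vector<std::uint64_t>& tag_offsets)
{
    std::uint64_t write_position = 0;
    std::vector<std::uint64_t> recorded;
    for (std::uint64_t offset : tag_offsets) {
        if (offset < write_position)
            continue;             // samples past this offset are already written: tag lost
        write_position = offset;  // emit data up to this tag
        recorded.push_back(offset);
    }
    // By construction the recorded offsets are non-decreasing.
    assert(std::is_sorted(recorded.begin(), recorded.end()));
    return recorded;
}
```

In this model, offsets received as 100, 40, 200 yield only 100 and 200, while the same offsets received in sorted order are all kept.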

I don’t see this requirement on ordered generation documented. In some
cases, it may be inconvenient to do this, e.g. when a block’s analysis
discovers after-the-fact that something interesting can be associated
with a past sample. Similarly, a user might want a block to associate
a tag with a sample that has not yet arrived, to notify a downstream block
that will need to process the event.

A simple solution for the infrastructure is to require that tags only be
generated from within work(), with offsets corresponding to samples
generated in that call to work(), and in non-decreasing offset order
(though this last requirement could be handled by add_item_tag()). The
developer must then handle the too-late/too-early tag associations
through some other mechanism, such as carrying the effective offset as
part of the tag value.

(3) Qt GUI Range with widget Counter + Slider invokes callbacks twice,
even if the value itself was set exactly once through the counter text
entry. If the callback records the change by queuing a stream tag for
addition to the output, multiple tags with the same offset/key/value
will be generated.

There are ugly solutions to this but it’s probably sufficient to note
somewhere that it can happen. It’s really not specific to tags, but is
clearly visible in that case.

(4) The in-memory stream of tags can produce multiple settings of the
same key at the same offset. However, when stored to a file only the
last setting of the key is recorded.

I believe this last behavior is incorrect and that it’s a mistake to use
a map instead of a multimap or simple list for the metadata record of
stream tags associated with a sample.
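
A minimal sketch of the difference, using only the standard library. The (key, value) pair type and both function names are hypothetical and do not come from the GNU Radio sources; the point is only that a map keeps one entry per key while a multimap keeps every setting.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical (key, value) settings that all share one sample offset.
using Setting = std::pair<std::string, std::string>;

// std::map: a later setting of a key overwrites the earlier one.
std::size_t count_with_map(const std::vector<Setting>& settings)
{
    std::map<std::string, std::string> record;
    for (const auto& s : settings)
        record[s.first] = s.second;
    return record.size();
}

// std::multimap: every setting is kept, including repeats of a key.
std::size_t count_with_multimap(const std::vector<Setting>& settings)
{
    std::multimap<std::string, std::string> record;
    for (const auto& s : settings)
        record.emplace(s.first, s.second);
    assert(record.size() == settings.size());  // a multimap never drops an entry
    return record.size();
}
```

Three settings of the same key at one offset leave one entry in the map and three in the multimap.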

One argument is that it’s critical that a stream archive of a processing
session faithfully record the contents of the stream so that re-running
the application using playback reproduces that stream and thus the
original behavior (absent non-determinism due to asynchrony). This
faithful reproduction is what would allow a maintainer to diagnose an
operational failure caused by a block with a runtime failure when the
same tag is processed twice at the same offset. This is true even if
the same key is set to the same value at the same sample offset multiple
times, which some might otherwise want to argue is redundant.

A corollary argument is that the sample number at which an event like a
tuner configuration change occurs usually can’t be exactly associated
with a sample; the best estimate is likely to be the index of the first
sample generated by the next call to work. But depending on processing
speed an application might change an attribute of a data source multiple
times before work was invoked. The effect of those intermediate changes
may be visible in the signal, and to lose the fact they occurred by
discarding all but the last change affects both reproducibility and
interpretation of the signal itself.

(5) All stream tags are placed in the extras block, and when a segment
is completed file_meta_sink will generate a new header. The new header
contains copies of the unique tags, but updates their offsets to be the
start of the new segment.

This is incorrect as the original stream did not have those tags
associated with those samples, so re-playing will introduce a behavioral
difference. For example, a tag that is meant to be associated with the
start of a packet will be duplicated at an offset that is probably not
the start of a packet.

Solutions include (a) leave the original offset setting for tags in the
extras section when they’re reproduced in a new segment, even though
that offset is not present in the segment; (b) treat stream tags as
ephemeral and do not persist them in the extras section when generating
a new segment; (c) extend the add_item_tag API to record whether the
tag is ephemeral or persistent. Offhand I can see no argument
supporting persisting a tag and updating its offset, and only rare cases
where it’s appropriate to replicate outdated information in a new
segment, so (b) seems to be the right move.

All the above is based on my understanding and expectations of how
stream tags are/should be used. If my understanding is mistaken,
please let me know.

Peter

Peter_ASBigot · July 18, 2014, 5:05am

From: discuss-gnuradio-bounces+sean.nowlan=removed_email_address@domain.invalid
on behalf of Peter A. Bigot [email protected]
Sent: Thursday, July 17, 2014 6:11 PM
To: [email protected]
Subject: [Discuss-gnuradio] comments on stream tags and metadata storage

Some comments after playing with stream tags and metadata this
afternoon.

I can’t speak to all of these issues due to not having played around
much with the file_meta_sink and tagged_file_sink blocks but I have some
responses to some of your comments/questions.

tag with an earlier offset is dropped from the archive.

I don’t think that ordered generation is required per se, but certain
blocks sort and others don’t. For instance, the tag_work function in
usrp_sink_impl.cc “does” sort precisely because get_tags_in_range
doesn’t.

A simple solution for the infrastructure is to require that tags only be
generated from within work(), with offsets corresponding to samples
generated in that call to work(), and in non-decreasing offset order
(though this last requirement could be handled by add_item_tag()). The
developer must then handle the too-late/too-early tag associations
through some other mechanism, such as carrying the effective offset as
part of the tag value.

As far as I’m aware, adding tags from within work is the only safe way
to add tags to a stream. Also, it is required that offsets correspond to
the valid range spanning the buffer of input items passed to work. The
scheduler prunes others outside this range. It’s also worth noting that
although the history mechanism allows viewing past samples (filters use
this, for example), attempting to add tags to samples in history will
not work; those tags will be pruned.

If tags need to be stored for future processing in subsequent calls to
work, it’s up to the programmer to push them onto a stack/queue/whatever
inside the block. The scheduler won’t handle this.
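
One way such a holding area could look, as a sketch under stated assumptions: the Tag struct and the PendingTags class are hypothetical stand-ins rather than GNU Radio types, and the window is the half-open range of absolute offsets covered by the current call to work(). In a real block, the released tags would then be handed to add_item_tag() from inside work().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a stream tag; not the GNU Radio tag_t type.
struct Tag {
    std::uint64_t offset;  // absolute sample index
    std::string key;
    std::string value;
};

// Holds tags whose samples have not been produced yet.
class PendingTags
{
public:
    void defer(const Tag& tag) { d_pending.push_back(tag); }

    // Return the tags that fall in [window_start, window_end) and keep the
    // ones that are still in the future. Tags behind the window are too
    // late to attach and are dropped here.
    std::vector<Tag> release(std::uint64_t window_start, std::uint64_t window_end)
    {
        assert(window_start <= window_end);
        std::vector<Tag> ready;
        std::vector<Tag> keep;
        for (const auto& t : d_pending) {
            if (t.offset >= window_end)
                keep.push_back(t);
            else if (t.offset >= window_start)
                ready.push_back(t);
        }
        d_pending.swap(keep);
        return ready;
    }

    std::size_t pending() const { return d_pending.size(); }

private:
    std::vector<Tag> d_pending;
};
```

Silently dropping late tags mirrors the pruning described above; a block that cares could log or count them instead.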

(4) The in-memory stream of tags can produce multiple settings of the
original behavior (absent non-determinism due to asynchrony). This
speed an application might change an attribute of a data source multiple
times before work was invoked. The effect of those intermediate changes
may be visible in the signal, and to lose the fact they occurred by
discarding all but the last change affects both reproducibility and
interpretation of the signal itself.

I agree this is a problem, but I don’t see a workaround as the data
plane (work, streams, etc.) is asynchronous to the control logic. On the
bright side, I believe the USRP source block does associate tuner,
sample rate, etc. changes with an absolute sample in the stream, but
this set of features doesn’t necessarily extend to other hardware data
sources. As for other asynchronous events generating stream tags, I
think the user is stuck dealing with the inevitable latency unless the
data source can produce metadata that is tightly coupled in time and
pass that information along to GNU Radio.

All the above is based on my understanding and expectations of how

Sean

Peter_ASBigot · July 25, 2014, 1:02pm

I’d hoped my comments below would start a more extensive dialog on GNU
Radio’s metadata infrastructure. My several years of experience
with this capability in a non-commercial C++ DSP framework suggest many
enhancements in flow, representation, and utilities.

I have a slight itch to contribute to a solution, but without community
involvement can’t hope to provide anything mergeable. Is this simply not
something anybody feels needs to be addressed, or did I ask in the wrong
forum?

Peter

Peter_ASBigot · July 18, 2014, 1:17pm

On 07/17/2014 10:04 PM, Nowlan, Sean wrote:

I don’t see this requirement on ordered generation documented. In some
cases, it may be inconvenient to do this, e.g. when a block’s analysis
discovers after-the-fact that something interesting can be associated
with a past sample. Similarly, a user might want a block to associate
a tag with a sample that has not yet arrived, to notify a downstream block
that will need to process the event.
I don’t think that ordered generation is required per se, but certain blocks
sort and others don’t. For instance, the tag_work function in usrp_sink_impl.cc
“does” sort precisely because get_tags_in_range doesn’t.

My point is really that, because the infrastructure doesn’t sort, only
blocks that are aware of the problem have compensated for it. Other
blocks are dropping data. This could be solved in the infrastructure
with a stable sort in get_tags_in_range or add_item_tags. (If the
latter, then the infrastructure could also diagnose violations of the
offset-must-be-in-valid-range expectation, which might be helpful.)
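
A rough sketch of both ideas, assuming a simplified Tag struct in place of the real tag type. The function names are illustrative only. std::stable_sort matters here because it keeps tags that share an offset in the order they were inserted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a stream tag; not the GNU Radio tag_t type.
struct Tag {
    std::uint64_t offset;  // absolute sample index
    std::string key;
    std::string value;
};

// Order tags by offset only; equal offsets keep their insertion order.
void sort_tags_by_offset(std::vector<Tag>& tags)
{
    std::stable_sort(tags.begin(), tags.end(),
                     [](const Tag& a, const Tag& b) { return a.offset < b.offset; });
}

// True when the tag lies in the half-open window [window_start, window_end)
// of samples covered by the current call; a check like this could be used
// to diagnose out-of-range insertions.
bool offset_in_window(const Tag& tag, std::uint64_t window_start, std::uint64_t window_end)
{
    assert(window_start <= window_end);
    return tag.offset >= window_start && tag.offset < window_end;
}
```

Sorting on offset alone, rather than on offset and key, is deliberate: it leaves repeated settings of one key at one offset in their original sequence.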

If tags need to be stored for future processing in subsequent calls to work,
it’s up to the programmer to push them onto a stack/queue/whatever inside the
block. The scheduler won’t handle this.

Thanks; that confirms and is consistent with my expectations.

session faithfully record the contents of the stream so that re-running
with a sample; the best estimate is likely to be the index of the first
sample generated by the next call to work. But depending on processing
speed an application might change an attribute of a data source multiple
times before work was invoked. The effect of those intermediate changes
may be visible in the signal, and to lose the fact they occurred by
discarding all but the last change affects both reproducibility and
interpretation of the signal itself.

I agree this is a problem, but I don’t see a workaround as the data plane (work,
streams, etc.) is asynchronous to the control logic. On the bright side, I believe
the USRP source block does associate tuner, sample rate, etc. changes with an
absolute sample in the stream, but this set of features doesn’t necessarily extend
to other hardware data sources. As for other asynchronous events generating stream
tags, I think the user is stuck dealing with the inevitable latency unless the
data source can produce metadata that is tightly coupled in time and pass that
information along to GNU Radio.

Inaccuracy in identifying the associated sample is something we have to
live with, yes. My argument is that GNU Radio’s stream tag
infrastructure (including storage as metadata) needs to accommodate this
by not dropping tags based solely on offset and key (and value), because
the “duplication” may actually carry information. So an offset-specific
map from key alone is the wrong data structure for tag storage.

With fork and join flows the tag propagation policy might introduce
replications. A candidate workaround is a unique identifier, added
internally by gr::block::add_item_tag, which can be used to identify and
drop redundant tag instances as they’re propagated. That identifier
must be unique across all blocks in the system, not just a
block-specific ordinal, since the tag srcid is optional. It need not be
preserved in archived metadata, though, since at that point we “know”
the tags are complete and unique; new identifiers would be added when
archived tags are replayed as a live stream.
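
The identifier idea could be sketched roughly as follows. Everything here is hypothetical: the Tag struct, the id field, and the helper names are not part of GNU Radio, and a process-wide atomic counter is just one possible source of system-unique ids.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical tag carrying an internal identifier; id 0 means "unassigned".
struct Tag {
    std::uint64_t id;
    std::uint64_t offset;
    std::string key;
    std::string value;
};

// Process-wide counter, so ids are unique across all blocks.
inline std::uint64_t next_tag_id()
{
    static std::atomic<std::uint64_t> counter{0};
    return counter.fetch_add(1) + 1;  // ids start at 1
}

Tag make_tag(std::uint64_t offset, std::string key, std::string value)
{
    return Tag{next_tag_id(), offset, std::move(key), std::move(value)};
}

// Keep the first copy of each id; later copies are replicas of the same
// insertion that arrived over another path.
std::vector<Tag> drop_replicas(const std::vector<Tag>& merged)
{
    std::unordered_set<std::uint64_t> seen;
    std::vector<Tag> unique;
    for (const auto& t : merged) {
        assert(t.id != 0);  // every live tag is expected to carry an id
        if (seen.insert(t.id).second)
            unique.push_back(t);
    }
    return unique;
}
```

Two separate insertions with identical offset, key, and value get different ids and both survive, while copies of one insertion that rejoin after a fork collapse to a single tag.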

As background: I’m digging into this because I plan to update
gr-osmosdr’s rtlsdr_source so I know the sample rate, frequency, gain,
and collection time of the signal, and (roughly) where they changed.
Mostly because I keep collecting files with captured and processed data
for analysis, and have no idea what parameters I used to generate them.
Preserving metadata with signal data in a single archive package is
really important to me.

Peter

Peter_ASBigot · July 25, 2014, 4:04pm

Hey Peter,

you did come to the right place! And no-one here claims that stream tags
are “finished”. On the other hand, you should realize that you’ve
brought up something pretty specific, and the few people who are
actually qualified to answer all of your questions might be pretty busy.

I would like to invite you to our monthly developer’s calls, where
things like these can be addressed in a more interactive way. Also, if
you believe you’ve found a bug, submit it to our tracker, so we can
track it. If you post your intent to fix it, we’ll be very happy, and
certainly won’t stop you from fixing something! This would also split up
your lengthy email into sizable chunks.

Now, I realize you did not only post what you thought were bugs, but
also suggestions on concepts. Again, a good place to ask those are the
dev calls, and depending on time, IRC is actually a decent place to
discuss this.

That said, a few short comments (probably not what you wanted):

Tag ordering: We simply don’t have a rule that tags need to be
ordered. Blocks that require ordered tags run a quick std::sort()
usually, but most blocks don’t care.

Since we use absolute offsets, that’s usually not a problem.

If file_meta_sink doesn’t sort(), that might actually be a bug, and
we’d appreciate an issue (and a fix :)

If a tag position is ‘inaccurate’, it is still unambiguously mapped to
an item, and it’s up to the dev to handle an ‘inaccuracy’.

– M

Peter_ASBigot · July 25, 2014, 3:48pm

Hi Peter,

I agree that this is a very relevant topic, and especially the
performance of tag handling might prove to be problematic soon…
However, it’s a bit hard to start a discussion like that; a lot of
things in GNU Radio are like they are because someone wrote them like
that, and they proved to just work, or if they didn’t, they got
remodeled.
That being said, your mail was very long, and it took me multiple
sessions to read it. I’ve now decided to share my reply as partial as it
is.

sooo let me just whip up a few comments:
(1) That’s a documentation issue, isn’t it? Anyway, I’m not quite sure
you’re right; the insertion of tags is mutexed IIRC, and the
get_tags_in_range() functionality, too, so once the user got his vector
of tags, that won’t change anymore. There’s the possibility that he
misses some of the tags for the range that get inserted after he
got_tags_in_range(), but that’s only fair – it’s quite intuitive not to
insert tags after you’ve handed off the samples to which they would
belong to downstream tags.

(2) that’s an interesting point.

In the current implementation it’s further necessary that tags be
added to an output in monotonic non-decreasing offset order.
Uh, that’s news to me, can you point me to the reason? If a block
assumes things to be ordered, but they aren’t… again, this is not
well-documented, so you’re right for raising this issue!

I’m a bit worried that you always suffer at one end: If tags are always
stored ordered, then inserting tags gains computational complexity, even
if the getter doesn’t need them sorted.

In my opinion we shouldn’t go for the “generate tags only in work()”
because that would increase the complexity of the insertion (inserting
would have to devise a check that it’s being called from work, or this
will only be a contract…) and is kind of unnecessary. A block always
(even outside of work()) has access to nitems_written() so it’s always
able to avoid generating tags for samples that might already have been
read downstream.

(3) I don’t see the relation to the discussion, as you said but that
sounds like a bug, so if you opened up a new thread or filed a bug at
gnuradio.org, that would be awesome

(4) I’m fairly certain the buffers use a deque to store tags, not a map
of any kind. So maybe I’m misunderstanding you, or you misread code
somewhere?
I think what you’re describing might be a bug in metadata_filesink, so
that might need some attention! see (3).

(5)>(5) All stream tags are placed in the extras block,
sorry, can’t follow you there. Extras block?

Greetings,
Marcus

Peter_ASBigot · July 25, 2014, 4:05pm

On 07/25/2014 03:46 PM, Marcus M. wrote:

(1) That’s a documentation issue, isn’t it? Anyway, I’m not quite sure
you’re right; the insertion of tags is mutexed IIRC, and the
get_tags_in_range() functionality, too, so once the user got his vector
of tags, that won’t change anymore. There’s the possibility that he
misses some of the tags for the range that get inserted after he
got_tags_in_range(), but that’s only fair – it’s quite intuitive not to
insert tags after you’ve handed off the samples to which they would
belong to downstream tags.

Also, if there’s a race condition bug, we should rather fix this than
force people to only add items in work(). If you (Peter) have a way to
reproduce this, it would help.

M

Peter_ASBigot · July 25, 2014, 4:38pm

On Fri, Jul 25, 2014 at 10:02 AM, Martin B. [email protected]
wrote:

track it. If you post your intent to fix it, we’ll be very happy, and

Tag ordering: We simply don’t have a rule that tags need to be
ordered. Blocks that require ordered tags run a quick std::sort()
usually, but most blocks don’t care.

Since we use absolute offsets, that’s usually not a problem

If file_meta_sink doesn’t sort(), that might actually be a bug, and
we’d appreciate an issue (and a fix :)

If a tag position is ‘inaccurate’, it is still unambiguously mapped to
an item, and it’s up to the dev to handle an ‘inaccuracy’

– M

Peter,

Basically everything that Martin said. I haven’t even read your email in
its entirety, yet. It was just too much at one time to start any decent
conversation about any of the topics. It’s much easier to address issues
one at a time in an email thread. And yes, use our bug/feature tracker.

And as for tags, we’ve had lots of people contribute complaints, but no
one’s contributed solutions or patches

Tom

Peter_ASBigot · July 25, 2014, 8:24pm

On 07/25/2014 09:36 AM, Tom R. wrote:

busy.
also suggestions on concepts. Again, a good place to ask those are the
we'd appreciate an issue (and a fix :)
in its entirety, yet. It was just too much at one time to start any
decent conversation about any of the topics. It’s much easier to
address issues one at a time in an email thread. And yes, use our
bug/feature tracker.

And as for tags, we’ve had lots of people contribute complaints, but
no one’s contributed solutions or patches

Tom

Thanks for the responses.

I perceive there’s some variation in expectations not only for how tags
should behave, but how they do behave as currently implemented. I’ll
reproduce whatever I see as problematic behavior and create bug reports
so there’s context for targeted discussion among a smaller audience.
Much of the rest of my essay relates to capabilities/behavior that GNU Radio
doesn’t support now, but that need to be considered at the
architecture level if they’re ever to be supported. Those points might
be harder to work through, but I’ll do that via issues as well.

I am coming in new here, but my tentative conclusion is that it’ll be
difficult to achieve my goals with the existing tag infrastructure. I
found few obvious uses of it in the baseline repository. Part of my
goal in this discussion was to get a feel for how much it’s being
actively used for real applications (i.e., has the ship sailed/horse left
the barn/pick your metaphor). I still don’t have a sense for that.

My work flow doesn’t accommodate IRC very well but I’ve added #gnuradio
to my chat list (handle: pabigot) and will try to log in whenever I’m
working on topic. How does one find information on the next developers
call? I see some information on
http://gnuradio.org/redmine/projects/gnuradio/wiki/DevelopersCalls but
no schedule for future calls.

Peter

Peter_ASBigot · July 25, 2014, 11:38pm

I’ve added five issues to cover the topics from my original email and
followups.

http://gnuradio.org/redmine/issues/698 proposes that the key of a stream
tag be a namespaced identifier to avoid conflicts between
individually-developed components.

The other four are listed inline below with additional responses to
Marcus:

On 07/25/2014 08:46 AM, Marcus M. wrote:

sooo let me just whip up a few comments:
(1) That’s a documentation issue, isn’t it?

I don’t believe so.

http://gnuradio.org/redmine/issues/699 expands on point (1), that
add_item_tag() can only be safely called from within gr::block::work().

Anyway, I’m not quite sure
you’re right; the insertion of tags is mutexed IIRC, and the
get_tags_in_range() functionality, too, so once the user got his vector
of tags, that won’t change anymore. There’s the possibility that he
misses some of the tags for the range that get inserted after he
got_tags_in_range(), but that’s only fair – it’s quite intuitive not to
insert tags after you’ve handed off the samples to which they would
belong to downstream tags.

I wasn’t concerned about tags across multiple calls to
get_tags_in_range(), but for tags that are added to a stream after one
call to work() and before the end of the next call to work().

In my opinion we shouldn’t go for the “generate tags only in work()”
because that would increase the complexity of the insertion (inserting
would have to devise a check that it’s being called from work, or this
will only be a contract…) and is kind of unnecessary. A block always
(even outside of work()) has access to nitems_written() so it’s always
able to avoid generating tags for samples that might already have been
read downstream.

See below at point (4).

(3) I don’t see the relation to the discussion, as you said but that
sounds like a bug, so if you opened up a new thread or filed a bug at
gnuradio.org, that would be awesome

http://gnuradio.org/redmine/issues/700 records point (3), that GRC
parameter callbacks can be invoked multiple times as a result of a
single user action due to the architecture of GUI and other components.

(4) I’m fairly certain the buffers use a deque to store tags, not a map
of any kind. So maybe I’m misunderstanding you, or you misread code
somewhere?
I think what you’re describing might be a bug in metadata_filesink, so
that might need some attention! see (3).

Yes, the lead-in example was specific to the behavior of
file_metadata_sink, but the fundamental requirement goes across the
entire infrastructure. I’m hoping to see support for making that
requirement stick.

http://gnuradio.org/redmine/issues/701 expands on points (2) and (4),
that the infrastructure should promise to maintain all tags inserted by
blocks in their original order (for any sample offset) and should
document the situations where tags may be discarded by the
infrastructure.

(5)>(5) All stream tags are placed in the extras block,
sorry, can’t follow you there. Extras block?

The extras block is described in the section “Extras Information” at:
http://gnuradio.org/doc/doxygen/page_metadata.html

http://gnuradio.org/redmine/issues/702 records points (2) and (5), that
the existing file_meta_source/sink corrupt (IMO) the metadata.

Thanks for your time. I look forward to seeing any feedback on the
issues.

Again I may not have made it clear that I’m not intending to raise these
simply as complaints. I have the necessary software skills to fix them,
and they generally include an outline of the approach I’d propose. At
this time I won’t make any promises, but I’d consider contributing the
necessary effort if I can get enough feedback/sanity-checks/interface
validation/similar assistance from you folks to be confident that it’d
be seen as a worthwhile enhancement.

Peter

Peter_ASBigot · July 26, 2014, 12:37pm

Hi Peter,
I’m on very limited time and internet, so let me just answer things
in-text, shortening where contributing to readability.
On 25.07.2014 23:37, Peter A. Bigot wrote:

I’ve added five issues to cover the topics from my original email and
followups.

http://gnuradio.org/redmine/issues/698 proposes that the key of a
stream tag be a namespaced identifier to avoid conflicts between
individually-developed components.
That’s a good idea, and actually, standardizing the way stream tags and
messages should be named and structured was one of the points of the
last dev call, and is thus work to be done.

I don’t believe so.

http://gnuradio.org/redmine/issues/699 expands on point (1), that
add_item_tag() can only be safely called from within gr::block::work().
And I have to contradict Sean and you
Inserting tags is safe at any time. You get logical errors when you
don’t take care that you don’t attach tags to your stream that belong in
ranges that might already have been processed. This is logical; I do
think it would be a good idea to explicitly state it in the
documentation.
However, there’s no reason not to insert tags anywhere else, just make
sure that nitems_written() is bigger than your inserted tags’ offsets.

I wasn’t concerned about tags across multiple calls to
get_tags_in_range(), but for tags that are added to a stream after one
call to work() and before the end of the next call to work().
I’m not sure I understand you correctly.
If (general_)work gets called, the samples that can be processed within
that call to work have left the upstream block’s work function already.
If that upstream block decides that he wants to insert tags to samples
he produced in a prior call already, my comment to (1) and basic “how
the hell is this going to look causal to the block downstream” applies!

(3) I don’t see the relation to the discussion, as you said but that
sounds like a bug, so if you opened up a new thread or filed a bug at
gnuradio.org, that would be awesome

http://gnuradio.org/redmine/issues/700 records point (3), that GRC
parameter callbacks can be invoked multiple times as a result of a
single user action due to the architecture of GUI and other components.
Thanks! Sounds hard to fix, though, if you say it’s due to architecture.
requirement stick.
It is NOT a requirement (so far). So far it is the rule that everyone
that expects multiple tags will do the sorting. And this makes sense –
computationally. I don’t think we should “break” correctness of existing
blocks by taking away that requirement from the tag readers just to
shift it to the tag producers. So I strongly disagree with you here,
sorry.

http://gnuradio.org/redmine/issues/701 expands on points (2) and (4),
that the infrastructure should promise to maintain all tags inserted
by blocks in their original order (for any sample offset) and should
document the situations where tags may be discarded by the
infrastructure.

Also, I don’t understand the bullet points at the end of your ticket
#701.

preserve order of insertion: GR does this, so how is this an issue?
preserve incoming tag order: I’m not quite sure I get your point, but
if I do: GR does this, so how is this an issue?
never discard a tag at the block that inserted it: Why would someone
discard a tag he inserted? GR doesn’t discard tags, so how is this an
issue?
clearly document duplicate elimination process: duplicates are not
detected, so how is this an issue?

Seriously, tag insertion is this 2 lines of code
(gnuradio-runtime/lib/buffer_detail.cc):
    void
    buffer::add_item_tag(const tag_t &tag)
    {
        gr::scoped_lock guard(*mutex());
        d_item_tags.push_back(tag);
    }

So how did you come to the conclusion that there might be race
conditions, necessity to call it only from work, elimination and
reordering problems?
All of the problems you raise are logical problems that can – in my
private opinion – best be solved by what should be called a contract,
ie. “do not shoot yourself in the knee by inserting tags to item numbers
that you already processed, and do not read tags from samples you don’t
have yet. If you need your tags sorted, sort 'em”, which basically boils
down to “um, you know, don’t try to circumvent causality by using tags,
and by the way, don’t assume they’re sorted”, which, for the signal
processing guys can further be reduced to “tags aren’t per se sorted.
Usual considerations apply.”

As a serious attempt to get something like a private conclusion to this
discussion, I’d like to point out that so far tag propagation works ok;
the performance of get_tags_in_range leaves a lot of room for concepts
that perform better, but that should be applied under the hood; if there
are two tags at the same item: someone put them there, let’s keep it
like that. If tags are in a certain order: we should keep it like that
(which metadata filesink seems to break, so thanks for pointing that
out!). If you insert tags to items that have been processed in the past:
don’t assume someone else will ever look at them. If you’re expecting
tags to be in a specific order and don’t know who inserted them: sort
em. In most cases, this is a non-issue because tags will usually be
inserted to mark something in a stream, and therefore get the
item-chronological order automatically.

(5)>(5) All stream tags are placed in the extras block,
sorry, can’t follow you there. Extras block?

The extras block is described in the section “Extras Information” at:
http://gnuradio.org/doc/doxygen/page_metadata.html

This is all about the on-disk storage format of the metadata file sink,
so it’s not really related to the way tags are handled. Maybe we’re
misunderstanding each other, but there is a section in the file format
for extras.

http://gnuradio.org/redmine/issues/702 records points (2) and (5),
that the existing file_meta_source/sink corrupt (IMO) the metadata.

Thanks for that bug report!

Again I may not have made it clear that I’m not intending to raise
these simply as complaints. I have the necessary software skills to
fix them, and they generally include an outline of the approach I’d
propose. At this time I won’t make any promises, but I’d consider
contributing the necessary effort if I can get enough
feedback/sanity-checks/interface validation/similar assistance from
you folks to be confident that it’d be seen as a worthwhile enhancement.

Greetings,
Marcus

Peter_ASBigot · July 26, 2014, 1:27pm

I’m confused: are we to discuss these issues on the mailing list, or as
comments on the wiki issues I created? I thought the latter was the
right location. Putting detailed discussion in two places that do not
link to each other is not a good approach.

My comments below address interpretation and philosophy, not technical
issues.

On 07/26/2014 05:36 AM, Marcus M. wrote:

That’s a good idea, and actually, standardizing the way stream tags and
messages should be named and structured was one of the points of the
last dev call, and is thus work to be done.

I’ve updated the issue with my proposed approach. If this is already
being worked and people have an alternative architecture in mind, then
please comment on that issue so I stop thinking about it.

I don’t believe so.

http://gnuradio.org/redmine/issues/699 expands on point (1), that
add_item_tag() can only be safely called from within gr::block::work().
And I have to contradict Sean and you
Inserting tags is safe at any time. You get logical errors when you
don’t take care that you don’t attach tags to your stream that belong in
ranges that might already have been processed. This is logical; I do
think it would be a good idea to explicitly state it in the documentation.
However, there’s no reason not to insert tags anywhere else, just make
sure that nitems_written() is bigger than your inserted tags’ offsets.

I do not see how this allows me to guarantee association of a tag with a
specific absolute sample. I can guess why you want to insert from
outside work(), and could propose compromises, but I’d prefer to do that
at http://gnuradio.org/redmine/issues/699

I wasn’t concerned about tags across multiple calls to
get_tags_in_range(), but for tags that are added to a stream after one
call to work() and before the end of the next call to work().
I’m not sure I understand you correctly.
If (general_)work gets called, the samples that can be processed within
that call to work have left the upstream block’s work function already.
If that upstream block decides that he wants to insert tags to samples
he produced in a prior call already, my comment to (1) and basic “how
the hell is this going to look causal to the block downstream” applies!

I don’t think we disagree on this behavior, only on how to ensure that
the original insertion doesn’t get lost or misordered (and whether
that’s a problem for which the infrastructure should take some
responsibility).

somewhere?
shift it to the tag producers. So I strongly disagree with you here, sorry.
If there is information that is conveyed by receiving tags in a
non-monotonic offset order, I’d like to have more details.

If the concern is that having the infrastructure maintain the tag order
I propose at http://gnuradio.org/redmine/issues/701 will introduce a
performance bottleneck, I’d like to see an argument. Preferably as a
comment on that issue.

Otherwise, since I feel that a deterministic non-decreasing order of
arrival of tags simplifies implementing blocks that process those tags,
if you feel that every block that might take advantage of ordered
arrival should instead enforce it in their work() functions then we do
indeed disagree.

http://gnuradio.org/redmine/issues/701 expands on points (2) and (4),
that the infrastructure should promise to maintain all tags inserted
by blocks in their original order (for any sample offset) and should
document the situations where tags may be discarded by the
infrastructure.

Also, I don’t understand the bullet points at the end of your ticket #701.

preserve order of insertion: GR does this, so how is this an issue?
You missed the “within the same offset”. Yes, I really do want
downstream blocks to be guaranteed that the tags they receive are sorted
by non-decreasing offset. No, I don’t want to make every block that
inserts or receives tags have to do the sorting.

preserve incoming tag order: I’m not quite sure I get your point, but
if I do: GR does this, so how is this an issue?

never discard a tag at the block that inserted it: Why would someone
discard a tag he inserted? GR doesn’t discard tags, so how is this an
issue?

The implementation of file_metadata_sink discards tags, as described.
This suggests that GNU Radio as a holistic system does not have an
architectural requirement that metadata be preserved. I’d like to see
that resolution adopted.

From the rest of your mail I get the impression that our expectations
of what the framework should provide versus what individual component
developers are responsible for handling are at odds. If the consensus
of the GNU Radio development team is that the solution to all this is to
clarify and document the existing behavior with no framework changes at
all, then most of my issues can be closed once that’s done. Ten years
experience with systems named X-Midas and Midas 2k, which do much of
what GNU Radio does, leads me to believe that would have significant
long term costs in development effort and system reliability.

Peter

Peter_ASBigot · July 26, 2014, 2:24pm

Hi Peter,

On 26.07.2014 13:25, Peter A. Bigot wrote:

I’m confused: are we to discuss these issues on the mailing list, or as
comments on the wiki issues I created? I thought the latter was the
right location. Putting detailed discussion in two places that do not
link to each other is not a good approach.
While I agree, I think the bug tracker might in the first place not have
been the right choice for “architectural ideas”, which fit better here.

That’s a good idea, and actually, standardizing the way stream tags and
messages should be named and structured was one of the points of the
last dev call, and is thus work to be done.

I’ve updated the issue with my proposed approach. If this is already
being worked and people have an alternative architecture in mind, then
please comment on that issue so I stop thinking about it.
Um, that’s being worked on, but please don’t take it personally when I say
that I think the guys that have been using tags in the GNU Radio project
should be the ones to agree on a way to standardize things, and do that
at their own pace; I’m pretty sure your input will be valued, though!
There have been things in discussion, and I guess there will be some
communication regarding this soon, so I think it’s safe to say it
doesn’t hurt if you don’t spend too much time on this.

I do not see how this allows me to guarantee association of a tag with a
specific absolute sample.
That’s always guaranteed - a Tag is a (offset, PMT) tuple.
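To make the "(offset, PMT) tuple" point concrete, here is a minimal sketch in plain Python, not GNU Radio code. The `Tag` class, the `relative_index` helper and the numbers are hypothetical stand-ins: the offset is absolute, so a block can always compute which sample a tag refers to, but a tag whose offset lies before the current chunk refers to a sample that has already gone by.

```python
from typing import Any, NamedTuple


class Tag(NamedTuple):
    """Simplified stand-in for a stream tag; not GNU Radio's own type."""
    offset: int  # absolute item index, counted from the start of the stream
    key: str
    value: Any


def relative_index(tag: Tag, items_before_call: int) -> int:
    """Position of the tagged item inside the current chunk of samples."""
    return tag.offset - items_before_call


# Assumed scenario: this call handles absolute items 4096..5119.
items_before_call, chunk_len = 4096, 1024
on_time = Tag(4100, "burst_start", True)
too_late = Tag(4000, "burst_start", True)

for tag in (on_time, too_late):
    idx = relative_index(tag, items_before_call)
    in_chunk = 0 <= idx < chunk_len
    print(tag.offset, idx, in_chunk)
```

The second tag still names an exact sample, but its relative index is negative, which is the "too late" case both sides of this discussion are describing.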

I don’t think we disagree on this behavior, only on how to ensure that
the original insertion doesn’t get lost or misordered (and whether
that’s a problem for which the infrastructure should take some
responsibility).
I agree on disagreeing on whether this needs to be dealt with by the
infrastructure enforcing this. Especially on the “introducing a complex
restrictive functionality just to stop people from doing logically
wrong things”.

Basically, we’re giving people vectors of void* to write samples to – I
think our users do pretty well at not writing to arbitrary memory
locations, so we should trust ourselves not to produce tags for samples
that happened in the past. If there is reason to do so, let them do so.
I’m pretty confident that programmers that use tags are aware of the
fact that the samples flow through their flow graph and that downstream
blocks don’t care about items that they already processed, and that they
can’t magically take into account tags you add while they are processing
the samples that these tags are related to.

If there is information that is conveyed by receiving tags in a
non-monotonic offset order, I’d like to have more details.
I don’t think there is. But I can’t rule it out, either. So it’s
undecided, but applications might be relying on current behaviour.

If the concern is that having the infrastructure maintain the tag order
I propose at http://gnuradio.org/redmine/issues/701 will introduce a
performance bottleneck, I’d like to see an argument. Preferably as a
comment on that issue.
Sorry to break your preference, but sorted insertion means that instead
of constant-time insertion you get a time that depends on the number of
tags. There’s no arguing that, unless you find a better insertion
algorithm for a sorted list.
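The cost argument can be sketched in plain Python. This is an illustration under assumptions, not the GNU Radio implementation: `AppendOnlyTags` and `SortedTags` are hypothetical containers, and tags are bare `(offset, key, value)` tuples.

```python
from bisect import bisect_right
from collections import deque


class AppendOnlyTags:
    """Hypothetical storage: tags kept in arrival order, as with a deque."""

    def __init__(self):
        self.tags = deque()

    def add(self, offset, key, value):
        self.tags.append((offset, key, value))  # constant time


class SortedTags:
    """Hypothetical storage: tags kept in non-decreasing offset order."""

    def __init__(self):
        self.offsets = []  # parallel list, used only for the binary search
        self.tags = []

    def add(self, offset, key, value):
        # bisect_right puts a new tag after existing tags with the same
        # offset, so insertion order within one offset is preserved.
        i = bisect_right(self.offsets, offset)  # O(log n) search
        self.offsets.insert(i, offset)          # O(n) shift in the worst case
        self.tags.insert(i, (offset, key, value))


arrivals = [(10, "a", 1), (30, "b", 2), (20, "c", 3), (20, "d", 4), (5, "e", 5)]
plain, ordered = AppendOnlyTags(), SortedTags()
for tag in arrivals:
    plain.add(*tag)
    ordered.add(*tag)

print([t[0] for t in plain.tags])    # offsets in arrival order
print([t[1] for t in ordered.tags])  # keys in offset order
```

If tags already arrive in offset order, every sorted insert lands at the end of the list, so the extra work in this sketch is mostly the binary search. The shifting cost only appears for out-of-order tags.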

Otherwise, since I feel that a deterministic non-decreasing order of
arrival of tags simplifies implementing blocks that process those tags,
agree on the simplifying aspect.
if you feel that every block that might take advantage of ordered
arrival should instead enforce it in their work() functions then we do
indeed disagree.
So, indeed, we disagree.

Also, I don’t understand the bullet points at the end of your ticket
#701.

preserve order of insertion: GR does this, so how is this an issue?
You missed the “within the same offset”. Yes, I really do want
downstream blocks to be guaranteed that the tags they receive are sorted
by non-decreasing offset. No, I don’t want to make every block that
inserts or receives tags have to do the sorting.
GR does this, it’s just appending to a deque. How would that mix up the
order later on? If you do a std::stable_sort, the original order will
also be kept. One more reason to let the user do this himself: He can
decide whether stable_sorted, sorted or unsorted tags suffice.
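As a rough illustration of the stable-sort remark, in plain Python with hypothetical `(offset, key)` tags: Python's `sorted()` is stable, as `std::stable_sort` is, so sorting by offset alone keeps tags with equal offsets in their arrival order, while sorting on the whole tuple does not.

```python
from collections import deque

# Hypothetical tags as (offset, key) pairs, appended in arrival order.
received = deque([
    (100, "rx_time"),
    (40, "late_marker"),
    (100, "rx_freq"),
    (40, "burst"),
])

# Stable sort on the offset only: ties keep their arrival order.
by_offset = sorted(received, key=lambda tag: tag[0])

# Sort on the full tuple: ties are broken by key, so arrival order is lost.
fully_sorted = sorted(received)

print(by_offset)
print(fully_sorted)
```

Here `by_offset` lists `late_marker` before `burst` at offset 40, matching arrival order, while `fully_sorted` swaps them.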

never discard a tag at the block that inserted it: Why would someone
discard a tag he inserted? GR doesn’t discard tags, so how is this an
issue?

The implementation of file_metadata_sink discards tags, as described.
file_metadata_sink is not infrastructure, so I think we’re mixing up
subjects again. I was hoping you’d pack all metadata sink-related issues
into one bug report and keep the discussion purely on the GNU Radio
framework as is.

This suggests that GNU Radio as a holistic system does not have an
architectural requirement that metadata be preserved. I’d like to see
that resolution adopted.
Metadata is preserved within the running framework. Ok, storage seems to
break this in some minor aspects, but I fail to see how that’s affecting
the way GNU Radio applications need to be designed.

From the rest of your mail I get the impression that our expectations of
what the framework should provide versus what individual component
developers are responsible for handling are at odds.
Me too
If the consensus
of the GNU Radio development team is that the solution to all this is to
clarify and document the existing behavior with no framework changes at
all, then most of my issues can be closed once that’s done.
Wow, no, this is not a consensus. You got replies from four people so far,
and I think Martin, Tom and I made it clear how we feel, as persons,
about this, and that we all hope this discussion yields approaches that
can be discussed.
Ten years
experience with systems named X-Midas and Midas 2k, which do much of
what GNU Radio does, leads me to believe that would have significant
long term costs in development effort and system reliability.

Peter, please don’t feel like your input isn’t valued! This discussion
has shown that at least you and I disagree on what should be enforced by
a framework. GNU Radio is constantly changing, and thus, we should
definitely look at the way tags are handled today for the GNU Radio of
tomorrow.

I think much of our time went into analysis of each other’s mails;
that’s a bad thing, and I’d love to blame you for mixing in the metadata
filesink so casually that half of the infrastructure points are mixed up
with filesink issues, but that wouldn’t honor the fact that I just went
along with it and didn’t ask you to consider the filesink and the tags
infrastructure as separate things.

I got the feeling that this is exactly the discussion you were hoping
for: Less on a “this is a bug, needs fixin’” level, more on a “what are
the pros and cons of (not) doing A and B” level. Now you have that
discussion, and I disagree with you, shouldn’t that encourage you?

I really do think the “unordered tag retrieval” thing is something that
deserves discussion. For how things are now, my opinion is “let the user
deal with that”. For how things can be with 3.8 or even later versions,
this is something we should pay attention to, and that’s why everyone
is encouraging you to partake in the hangout.

From my side, I really strongly dislike the way get_tags_in_range works:
iterating through all tags, comparing each offset, filling the result
vector with the matching tags, each time the function is called.
Consequently, I’d like to use internally ordered tag storage. Assuming
that insertion happens less often than extraction, this should yield an
enormous speed gain, but it would break GNU Radio as it is now.
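The difference between the two lookup strategies can be sketched as follows. This is plain Python rather than GNU Radio code: both function names are hypothetical, tags are `(offset, key)` tuples, and a half-open range `[start, end)` is assumed.

```python
from bisect import bisect_left


def tags_in_range_linear(tags, start, end):
    """Scan every stored tag and keep those with start <= offset < end."""
    return [t for t in tags if start <= t[0] < end]


def tags_in_range_sorted(offsets, tags, start, end):
    """Same query on offset-sorted storage: two binary searches and a slice."""
    lo = bisect_left(offsets, start)  # first tag with offset >= start
    hi = bisect_left(offsets, end)    # first tag with offset >= end
    return tags[lo:hi]


# Hypothetical storage that is already sorted by offset.
tags = [(5, "e"), (10, "a"), (20, "c"), (20, "d"), (30, "b")]
offsets = [t[0] for t in tags]

print(tags_in_range_linear(tags, 10, 30))
print(tags_in_range_sorted(offsets, tags, 10, 30))
```

Both calls return the same tags. The sorted variant only pays off if the storage is kept ordered on insertion, which is the trade-off discussed earlier in this message.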

However, many times in this discussion, I’ve got the feeling that you
liked to mix up things that are wrong now (which is the filesink
stuff, mainly) with things that you architecturally disagree with (if
tags should be ordered on retrieval) with things that would need
attention if things were different than they are, with assumptions you
made based on Sean’s email (which wasn’t wrong, by the way, but all the
“there’s potential for a race condition” discussion really lacked all
background, if you read the code; I don’t see the pruning Sean mentions,
though, so maybe I should ask Sean where that happens – most likely
during tag propagation over buffers). Maybe that’s a reason for a slight
frustration on my side when you mention your 10 years of experience – I
don’t have that, but I feel like we were trying to engage in a fruitful
discussion, and yet you pull the “I’ve seen things”-Joker. That doesn’t
work well when you’re actively discussing architecture with people who have
years of experience with that exact framework you are looking at, and
these people nevertheless encourage you to specify your concerns, file
bugs, take part in community meetings etc.

Greetings,
Marcus

Peter_ASBigot · July 26, 2014, 4:00pm

Alright, guys, I think we’re officially done with this thread!

Seriously, while there’s been good conversation here, there’s too much
going on to make any sane judgement on technical arguments. Let’s move
any discussion over to the issues tracker where Peter set them up and
discuss there.

Thanks!
Tom

On Sat, Jul 26, 2014 at 8:22 AM, Marcus Müller wrote:

Peter_ASBigot · July 26, 2014, 4:54pm

On 07/26/2014 08:59 AM, Tom R. wrote:

Alright, guys, I think we’re officially done with this thread!

Seriously, while there’s been good conversation here, there’s too much
going on to make any sane judgement on technical arguments. Let’s move
any discussion over to the issues tracker where Peter set them up and
discuss there.

Thanks!

Thank you. To close the loop: I’ve added comments to
http://gnuradio.org/redmine/issues/699 and
http://gnuradio.org/redmine/issues/701 that include links to Marcus’
email with my response in excerpted context. Follow-ups there.

Peter

Peter_ASBigot · July 26, 2014, 5:52pm

On 07/25/2014 08:22 PM, Peter A. Bigot wrote:

My work flow doesn’t accommodate IRC very well but I’ve added #gnuradio
to my chat list (handle: pabigot) and will try to log in whenever I’m
working on topic. How does one find information on the next developers
call? I see some information on
http://gnuradio.org/redmine/projects/gnuradio/wiki/DevelopersCalls but
no schedule for future calls.

Calls are every third Thursday of the month. If there’s no agenda up,
that usually means we haven’t decided what we want to talk about.

M