Proposed Offline Tools: Header Generation, File Manipulation and Format Conversion

To use GNU Radio as a data analysis platform, I’ve been developing a few
tools which I think will be useful for the general community. These are
primarily Python/command-line tools intended to manipulate prerecorded
files. I’m airing them out here for feedback, both to ensure I’m not
duplicating work and to get helpful suggestions. I’ll make this a
feature request on GR.org if there is sufficient interest.

Header Generation:
A user may record raw RF samples to disk but keep the metadata
(frequency, sample rate, time of recording, etc.) in a notebook, text
file, or similar. There should be a way to generate headers that add
this metadata after the fact. To this end, I’ve developed gr_mkheader,
which generates detached headers from command-line inputs. This could
be extended to manipulating existing headers.
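
A minimal sketch of what such a command-line front end could look like.
This is not the actual gr_mkheader: the option names, the
build_header_fields helper and the dictionary keys are assumptions made
up for illustration, and a real tool would serialize the record into
GNU Radio's header format rather than return a plain dict.

```python
import argparse


def build_header_fields(samp_rate, freq, start_time, itemsize, is_complex, dtype):
    """Collect after-the-fact metadata into one record (hypothetical layout)."""
    if samp_rate <= 0:
        raise ValueError("sample rate must be positive")
    if itemsize <= 0:
        raise ValueError("item size must be positive")
    return {
        "rx_rate": float(samp_rate),   # samples per second
        "rx_freq": float(freq),        # center frequency in Hz
        "rx_time": float(start_time),  # seconds since the epoch
        "size": int(itemsize),         # bytes per item
        "cplx": bool(is_complex),
        "type": dtype,
    }


def parse_args(argv=None):
    """Parse the metadata the user wrote down at recording time."""
    parser = argparse.ArgumentParser(
        description="Sketch: describe a raw recording after the fact")
    parser.add_argument("--samp-rate", type=float, required=True)
    parser.add_argument("--freq", type=float, required=True)
    parser.add_argument("--time", type=float, default=0.0)
    parser.add_argument("--itemsize", type=int, required=True)
    parser.add_argument("--complex", action="store_true")
    parser.add_argument("--type", default="float",
                        choices=["byte", "short", "int", "float", "double"])
    return parser.parse_args(argv)


def main(argv=None):
    """Usable from the shell or directly from another Python script."""
    args = parse_args(argv)
    return build_header_fields(args.samp_rate, args.freq, args.time,
                               args.itemsize, args.complex, args.type)
```

Keeping the logic in build_header_fields and the argument parsing in
main means other Python scripts can call the former directly without
going through a shell.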

Data File Types:
Currently, one must either know the file type of the raw data a priori
or, if headers are available, interpret three variables (complex, item
size, and type). I propose rolling these into a more succinct format,
such as the one used in UHD [1]. gr_read_file_metadata should also
interpret these file types. For example,
(Complex=true, itemsize=4, type=short) should be displayed as interleaved
shorts (sc16). The documentation should include a table describing all
(or the common) combinations for easy reference.
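
The roll-up could be as simple as the lookup sketched below. The
function name short_format_name and the string type names are
assumptions for illustration. The complex spellings (sc8, sc16, fc32,
fc64) follow UHD's naming; the real-valued spellings (s16, f32, and so
on) are my guess at a consistent extension.

```python
# Letter code and size in bytes of one scalar for each base type.
# The type names used as keys are hypothetical, not GNU Radio's enum.
_BASE_TYPES = {
    "byte": ("s", 1),
    "short": ("s", 2),
    "int": ("s", 4),
    "float": ("f", 4),
    "double": ("f", 8),
}


def short_format_name(is_complex, itemsize, type_name):
    """Map (complex, item size, type) to a compact name such as 'sc16'."""
    try:
        letter, base_size = _BASE_TYPES[type_name]
    except KeyError:
        raise ValueError("unknown type: {!r}".format(type_name))
    # A complex item holds two scalars (I and Q), so it is twice as large.
    expected = base_size * (2 if is_complex else 1)
    if itemsize != expected:
        raise ValueError("item size {} does not match {}{} (expected {})".format(
            itemsize, "complex " if is_complex else "", type_name, expected))
    prefix = letter + ("c" if is_complex else "")
    return "{}{}".format(prefix, 8 * base_size)
```

With this mapping, the example above (Complex=true, itemsize=4,
type=short) comes out as sc16, and complex floats with an item size of
8 come out as fc32.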

File Truncation with Headers:
For long recordings, it is often desirable to look at a subset of the
data. While there are certainly ways to do this in GRC/Python, to my
knowledge there is no header support for it. Additionally, a different
GRC flowgraph is required for every data type. The desired program would
read a file of recorded samples with metadata and generate the subset
with proper headers. It would also take an output type and perform file
conversion. As a special case, the program could convert between file
types by requesting the entire data set instead of a strict subset. I’m
in the process of developing a program to do this.
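
The sample-copying half of this can be done independently of the data
type as long as the item size is known. The sketch below shows only that
part, assuming a raw file with no inline headers. The name
extract_subset and its parameters are hypothetical, and header
regeneration and type conversion are left out.

```python
import os


def extract_subset(src_path, dst_path, itemsize, skip_items, num_items,
                   chunk_items=65536):
    """Copy up to num_items items, starting skip_items into src, to dst.

    Returns the number of whole items actually written, which is smaller
    than num_items when the source file ends early.
    """
    if itemsize <= 0 or skip_items < 0 or num_items < 0 or chunk_items <= 0:
        raise ValueError("sizes and counts must be non-negative, itemsize > 0")

    total_items = os.path.getsize(src_path) // itemsize
    remaining = min(num_items, max(0, total_items - skip_items))
    written = 0

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(skip_items * itemsize)
        while remaining > 0:
            data = src.read(min(remaining, chunk_items) * itemsize)
            whole = len(data) // itemsize
            if whole == 0:
                break  # file shrank underneath us; stop cleanly
            dst.write(data[:whole * itemsize])  # never write a partial item
            remaining -= whole
            written += whole
    return written
```

Because only the item size matters here, one routine covers every data
type. Asking for all items with skip_items=0 is the "entire data set"
special case.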

Do the file conversion routines currently support metadata? For example,
do they update the data type/item size tag? My understanding is that
detached headers must start immediately in the file. If we use a
head/skip N block on a stream with metadata, will the new headers be
interpolated from the header closest to the desired sample subset?
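
To make the last question concrete, here is one possible reading of
"interpolated", sketched under assumptions: each header segment records
the absolute index of its first sample, a start time and a sample rate,
and the time of any later sample is extrapolated from the nearest header
at or before it. Segment and time_at_sample are hypothetical names, not
part of GNU Radio; whether the existing blocks behave this way is
exactly what I am asking.

```python
from bisect import bisect_right
from collections import namedtuple

# One entry per header: where the segment starts and how it was sampled.
Segment = namedtuple("Segment", ["first_sample", "start_time", "samp_rate"])


def time_at_sample(segments, sample_index):
    """Estimate the time of sample_index from the closest preceding header.

    segments must be sorted by first_sample.
    """
    starts = [seg.first_sample for seg in segments]
    pos = bisect_right(starts, sample_index) - 1
    if pos < 0:
        raise ValueError("sample precedes the first header")
    seg = segments[pos]
    offset = sample_index - seg.first_sample
    return seg.start_time + offset / float(seg.samp_rate)
```

A truncation tool would call this once for the first sample of the
subset and write the result as the start time of the new header.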

[1]

On 01/12/2015 02:39 PM, Garver, Paul W wrote:

To use GNU Radio as a data analysis platform, I’ve been developing a
few tools which I think will be useful for the general community.
These are primarily Python/command-line tools intended to manipulate
prerecorded files. I’m airing them out here for feedback, both to
ensure I’m not duplicating work and to get helpful suggestions. I’ll
make this a feature request on GR.org if there is sufficient
interest.

Paul,

This sounds very useful. I suggest doing all the development work in an
out-of-tree (OOT) module; that will give it immediate exposure. In the
long run, that might even be a better way to disseminate your code than
merging it into GR, or maybe not; time will tell. Once you have
something running, post your OOT details to PyBOMBS, and the tools will
be part of the GNU Radio ecosystem. We can still merge them into the
mainline once the code has had some testing.

I like the idea of this being a Python library rather than just
command-line executables, so Python scripts can make use of the tools
instead of having to invoke shells.

Some more comments:

Header Generation: A user may record raw RF samples to disk but keep
the metadata (frequency, sample rate, time of recording, etc.) in a
notebook, text file, or similar. There should be a way to generate
headers that add this metadata after the fact. To this end, I’ve
developed gr_mkheader, which generates detached headers from
command-line inputs. This could be extended to manipulating existing
headers.

Indeed sounds very useful. Manipulating headers also sounds very useful.

Data File Types: Currently, one must either know the file type of the
raw data a priori or, if headers are available, interpret three
variables (complex, item size, and type). I propose rolling these
into a more succinct format, such as the one used in UHD [1].
gr_read_file_metadata should also interpret these file types.
For example, (Complex=true, itemsize=4, type=short) should be displayed
as interleaved shorts (sc16). The documentation should include a
table describing all (or the common) combinations for easy
reference.

We kind of have that, since you specify a single type in block IO
signatures. I’m not entirely sure when this would be necessary, but I’m
happy to be convinced otherwise.

File Truncation with Headers: For long recordings, it is often
desirable to look at a subset of the data. While there are certainly
ways to do this in GRC/Python, to my knowledge there is no header
support for it. Additionally, a different GRC flowgraph is required
for every data type. The desired program would read a file of
recorded samples with metadata and generate the subset with proper
headers. It would also take an output type and perform file
conversion. As a special case, the program could convert between
file types by requesting the entire data set instead of a strict
subset. I’m in the process of developing a program to do this.

Kind of like SoX for GR files; sounds useful.

Cheers,
M