UHD USRP Source for B2x0 overflows File Sink

How would I best set up a UHD Source block for USRP B2x0 devices to
output to a flowgraph that uses a File Sink block to write to disk
without overflows (and how would I best set up the File Sink block)?

This is the attached system hardware, dedicated to GR…
Gigabyte GB-BXi7-4770R Brix Pro PC
Crucial Ballistix Sport 2x8=16GB RAM DDR3L-1600 (PC3L-12800) CL9 Timing
9-9-9-24
Samsung 840 EVO 250GB SATA-III internal 2.5" SSD
…so a lot of overkill on hardware resources, but no knowledge on how
to set up the UHD Source block or File Sink block to take advantage of
this.

I am attempting to record data at 16Ms/s on the USRP and 8Ms/s to the
disk but I would like to take this even higher (perhaps as high as 28
Ms/s on the USRP or as fast as I can get it) if possible with a fast
enough system.

Fedora 20 with LXDE (uname -r is 3.18.9-100.fc20.x86_64)
UHD is 3.8.1
GR is 3.7.6.1

My flowgraph has:
a. UHD USRP Source with master clock 32M, sample rate 16M, wire format
auto, output type complex float32, and a device argument of
num_recv_frames=512.
b. Polyphase Decimator with decim 2, 175 taps from
firdes.low_pass_2(), FFT rotator and filters.
c. Constant multiply by 32768 with type complex.
d. Complex to IShort with vector output yes.
e. File Sink with input type short, vector length of 2, unbuffered on
(a confusingly named parameter, which I am not sure actually does
anything), and append file overwrite. (A rough Python equivalent of
this graph is sketched below.)
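
For reference, a rough GR 3.7 Python equivalent of the graph above; the
low_pass_2() cutoff, transition width, and attenuation values, and the
output file name, are illustrative assumptions, since only the tap count
is given:

    from gnuradio import gr, uhd, filter, blocks
    from gnuradio.filter import firdes

    class rx_to_disk(gr.top_block):
        def __init__(self):
            gr.top_block.__init__(self, "rx_to_disk")
            samp_rate = 16e6
            # a. UHD USRP Source, complex float32, num_recv_frames in device args
            self.src = uhd.usrp_source("num_recv_frames=512",
                uhd.stream_args(cpu_format="fc32", channels=range(1)))
            self.src.set_samp_rate(samp_rate)
            # b. Polyphase decimator, decim 2; these filter specs are assumed
            taps = firdes.low_pass_2(1.0, samp_rate, 3.5e6, 1.0e6, 60)
            self.pfb = filter.pfb_decimator_ccf(2, taps, 0)
            # c. Scale up to fill the int16 range
            self.scale = blocks.multiply_const_vcc((32768, ))
            # d. Complex to interleaved shorts, vector output on
            self.c2s = blocks.complex_to_interleaved_short(True)
            # e. File Sink, short items, vlen 2, append off
            self.sink = blocks.file_sink(2 * gr.sizeof_short, "capture.sc16", False)
            self.sink.set_unbuffered(False)
            self.connect(self.src, self.pfb, self.scale, self.c2s, self.sink)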

Thanks,

  • John

If you want high file-write performance, leave it in buffered mode.

Also, a 175-tap filter running at 16 Msps is going to chew up a lot of
CPU.

How about a simple low-pass filter, decim=2? Make the transition
bandwidth fairly sloppy.
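
Something like this, perhaps (a minimal sketch; the 3 MHz cutoff and
2 MHz transition width are assumptions, just to illustrate how a sloppy
transition keeps the tap count down):

    from gnuradio import filter
    from gnuradio.filter import firdes

    samp_rate = 16e6
    # A wide transition band means far fewer taps, and far less CPU per sample.
    taps = firdes.low_pass(1.0, samp_rate, 3e6, 2e6)
    lpf = filter.fir_filter_ccf(2, taps)   # decimating FIR, decim=2
    print("tap count:", len(taps))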

Or alternatively, just run the USRP at your desired sample-rate into the
file-sink.

On Thu, May 7, 2015 at 1:01 PM, [email protected] wrote:

If you want high file-write performance, leave it in buffered mode.

Also, a 175-tap filter, running at 16Msps is going to chew up a lot of CPU.
How about a simple low-pass filter, decim=2? Make the transition bandwidth
fairly sloppy.

Or alternatively, just run the USRP at your desired sample-rate into the
file-sink.

I presume you mean buffered mode on the File Sink? With the wording of
the parameter, I cannot tell whether buffered mode is “Unbuffered =
off” or “Unbuffered = on”. Do you know which it is? Also, the doc page
indicates this parameter is not passed to the make function, so I am
not even sure it is really used.

Transition bandwidth is sloppy: double the (sample rate minus
two-sided passband width), or in this case something on the order of
1/4 the input sample rate.
CPU usage is 7% (total for all processes on the machine) with this
whole flowgraph running. The polyphase filter is the more efficient
choice, although with a decimation of only 2 my guess is there are
only two polyphase arms; a plain FIR decimator used only about 9% CPU
for the same set of taps, and showed the same buffer overrun problem
as the PFB. But still.

My co-workers and I have hit these same issues every time we have
tried this over the past year, even with flowgraphs of just the two
blocks… USRP Source and File Sink. So I am not sure the filter or the
CPU usage is an issue at all here.
To find out, I just now tried it again, disabled everything but the
USRP Source and the File Sink, changing the USRP output format to
complex int16, and within 90 seconds I got an overrun. The CPU usage
in this mode is about 3% or 4% total.

Also note I am talking about the flowgraph running fine for a minute
or so, then getting a handful (1, maybe 3 or 4) of 'O's, then another
minute or so with another burst of O's, etc. It is not a huge thing,
but when you are doing stuff like testing synchronizers or signal
recovery it is enough to be a problem. On something like GPS playback
for a test bench the hiccups make it completely unworkable.

  • John

Leave “unbuffered” = OFF. This flag was added for “slow” file sinks,
when, for example, you’re writing slow data to an external process via
something like a FIFO, and you don’t want the default stdio buffering
to get in the way. The default, if you leave it off, is to use stdio
buffering.
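
In the generated Python that corresponds to something like the
following sketch (the item size and file name are just examples):

    from gnuradio import gr, blocks

    # "Unbuffered = Off" in GRC maps to set_unbuffered(False): writes go
    # through stdio buffering, which is what you want for high-rate capture.
    sink = blocks.file_sink(2 * gr.sizeof_short, "capture.sc16", False)  # append=False
    sink.set_unbuffered(False)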

The “bursts of O” behaviour is to be expected, depending on how much RAM
the kernel allocates to write-behind buffers and how fast your disk
subsystem is. Your writes get posted to the kernel. The kernel throws
them into a usually-immense write-behind cache. When its I/O scheduler
decides it’s time to commit those writes, things slow down.

You might try also using num_recv_frames on the device arguments (if you
aren’t already), to try and smooth out the flow from the USB controller.
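
For example (a sketch; 512 is just the value already being used in this
thread, not a recommendation):

    from gnuradio import uhd

    # num_recv_frames goes in the device-address string; it enlarges the
    # receive ring the UHD transport fills before samples reach the flowgraph.
    src = uhd.usrp_source("num_recv_frames=512",
                          uhd.stream_args(cpu_format="sc16", channels=range(1)))
    src.set_samp_rate(16e6)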

You might also compare what you’re doing with something that doesn’t use
GNU Radio at all, such as UHD’s rx_samples_to_file example.



On Thu, May 7, 2015 at 2:01 PM, Murphy, John [email protected]
wrote:

Transition bandwidth is sloppy, double the (sample rate minus
two-sided passband width), or in this case something on the order of
1/4 the input sample rate.

Okay, actually I do have a tighter width, because with the decimation
by 2 the spectrum wraps around 1/4 of the rate instead of 1/2. So with
a 6 dB point at fs/4 x 31/32, the transition width is around 1/16 of
the input sample rate.
I could move the 6 dB point, but I get the overruns at about the same
rate with or without the filter's CPU load.
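
Restating those numbers concretely (assuming the 16 Msps input rate):

    samp_rate = 16e6
    f_6dB = samp_rate / 4 * 31.0 / 32   # 3.875 MHz, the stated 6 dB point
    trans_width = samp_rate / 16        # 1 MHz, roughly the stated transition width
    print(f_6dB, trans_width)           # 3875000.0 1000000.0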

For the record and completeness, I tried again with just the complex
int16 USRP Source and File Sink, setting “Unbuffered = off”, and it
still behaved the same. It may matter, but it is not enough to make a
noticeable difference in this case.

  • John

The sequential rates I gave are the published rates for the SSD. Maybe
(probably?) specsmanship, sure.
But since it does mostly keep up, isn’t this a case of just needing
the correct buffer set-up to allow it to ride through the worst of the
hiccups?

I am going to have to find and figure out how to run
rx_samples_to_file before I can let you know if it makes any
difference.

  • John

On Thu, May 7, 2015 at 3:07 PM, [email protected] wrote:

How did you test your sequential-write rate?

Writing files that are less than the current write-behind buffer size in the
kernel will give you a very false sense of how fast your disk subsystem is.

Hi Marcus,

I am using num_recv_frames=512 but I have no idea why 512 or what the
ideal value should be for a system that has a lion’s share of 16 GB of
RAM to burn.
In terms of the disk hardware, sequential writes are rated at up to
520 MBytes/sec. While there may still be some moments where things
fall behind, I expect some buffer somewhere, if properly set up (which
I believe is the issue, but I do not know where or how), to be able to
take up the slack and prevent overruns.
Surely there must be people that do this all the time without O's,
given a proper setup?

  • John

So /dev/null works; I do not know what that really says about this,
though. Is there a difference between using /dev/null and just running
any non-disk-write flowgraph? Because I know I can run a flowgraph at
16 MS/s decimated to 8 MS/s with never a single O, even for hours of
operation.
With 16 GBytes of RAM, can't GR somehow buffer up the 64 MBytes/sec
data flow during one of those hiccups?
What do all those “min output buffer” and “max output buffer” params
on the advanced tabs of the blocks do?

  • John


rx_samples_to_file will be under /usr/local/lib{64}/uhd/examples

I looked at their blurb on that drive, and its sustained rate comes
out to about 69Mbyte/second. Sure, it’ll take bursts at screaming-fast
rates, because, like the Linux kernel, it has a whacking great
write-behind buffer.

Try specifying a filename of “/dev/null”, which will bypass your disk
subsystem entirely, and give you some idea of what you can sustain in
the absence of actual disk-subsystem writes.
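
In the generated Python that is just a matter of pointing the sink at
/dev/null (sketch; item size is just an example):

    from gnuradio import gr, blocks

    # Exercises the kernel write path while bypassing the disk subsystem.
    sink = blocks.file_sink(2 * gr.sizeof_short, "/dev/null", False)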


Marcus et al,

Had to drop this to do some work on another project yesterday, but I
still want to pursue this just a little further, if you don't mind,
because the numbers you are giving all look to me like this can be
made to work.

You found my SSD sequential sustained write speed of 69 MBytes/sec.
If I attempt to save data to disk at 14 MSamp/sec as complex 16-bit
integers, I believe that comes to an average long-term rate of
56 MBytes/sec going to the disk.
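
The arithmetic behind that 56 MBytes/sec figure:

    rate = 14e6                # complex samples per second going to disk
    bytes_per_sample = 2 * 2   # interleaved int16 I and Q
    print(rate * bytes_per_sample / 1e6)   # 56.0 MBytes/sec long-term average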

So I am not understanding - it seems to me like I have plenty of
sustained throughput overhead to make this work, with the right
buffering to take up the temporary slack.

With 16 GBytes of RAM (the system is using some, but still) I would
expect that I can buffer up something like 4 minutes of data at the
required 56 MBytes/sec rate - seems like with the proper setup there
should be plenty of capability to ride through whatever other kernel
operations etc are momentarily stalling the disk writes.

Thanks, I appreciate you taking the time and list bandwidth to help me
understand this,

  • John


On 05/07/2015 04:12 PM, Murphy, John wrote:

So /dev/null works; I do not know what that really says about this,
though. Is there a difference between using /dev/null and just running
any non-disk-write flowgraph? Because I know I can run a flowgraph at
16 MS/s decimated to 8 MS/s with never a single O, even for hours of
operation.
Writing to /dev/null actually exercises your kernel write code.

Just sticking a “null sink” in your flow-graph doesn’t, so you don’t get
as much information out of the experiment.

With 16 GBytes of RAM one can’t somehow in GR buffer up the 64
MBytes/sec data flow during one of those hiccups?
What do all those “min output buffer” and “max output buffer” params
on the advanced tabs of the blocks do?

  • John
The kernel is already doing that: Linux kernels have always used RAM
for write-behind buffering, unless there's other pressure on it, like
lots of disjoint processes with large working sets.

The buffering parameters that you refer to are to do with the GNU Radio
scheduler, and have little application to the problem at hand.

The basic problem is that if the long-term-average offered load on your
write medium (your SSD in this case) is higher than it can sustain, it
doesn't matter how much buffering you add in front of it; eventually,
the piper has to be paid. Buffering is useful for meeting short-term
shortfalls in throughput capacity. It cannot help if the offered load,
on average, exceeds the long-term capacity of the resource. Now, having
said that, if you only need to record for a short time, consider adding
more RAM and creating a ramdisk, then stage the ramdisk out to your
hard disk.
But this ONLY WORKS if you don't need to record continuously;
otherwise, you're back to the buffering issue…

But at 8 Msps and 4 bytes per sample, that's 32 Mbyte/second, so you
have about 30 seconds per gigabyte.
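
The same arithmetic, as a check:

    rate = 8e6             # complex samples per second
    bytes_per_sample = 4   # complex int16: 2 bytes I + 2 bytes Q
    print(rate * bytes_per_sample / 1e6)     # 32.0 MBytes/second
    print(1e9 / (rate * bytes_per_sample))   # ~31 seconds per gigabyte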


You could try a simple experiment that tests your disk subsystem write
performance without SDR stuff at all. Something like:

time dd if=/dev/zero of=some-file-on-your-disk bs=2000000 count=15000

This should write a 30GB file. The ‘time’ command will report how long
it took. Then it’s just a simple matter of math to figure out what your
achievable long-term write rate is.
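
And the math, for reference (using the timing from the run that
follows):

    bytes_written = 15000 * 2000000   # dd count * bs = 30 GB
    elapsed = 54.9445                 # seconds, as reported below
    print(bytes_written / elapsed / 1e6)   # ~546 MBytes/sec sustained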


[jmurphy@localhost Documents]$ /usr/bin/time dd if=/dev/zero
of=./thrutest bs=2000000 count=15000
15000+0 records in
15000+0 records out
30000000000 bytes (30 GB) copied, 54.9445 s, 546 MB/s
0.00user 15.17system 0:54.95elapsed 27%CPU (0avgtext+0avgdata
3964maxresident)k
96inputs+58593760outputs (0major+564minor)pagefaults 0swaps

That is 30 GBytes written in about 55 seconds, which works out to
546 MBytes per second, as dd itself reports in the output.

So… I still do not see any long-term or average throughput problem
here. Although even if it had said 69 MBytes/sec, I would still think
that is enough to handle a 56 MBytes/sec average rate under the right
circumstances.
So… do I now need to do some sort of buffer setup to handle the data
flow during periods when the system is too busy to write to the disk,
or the disk is too busy to be written to?
I already have the UHD USRP Source with num_recv_frames=1024 (have
also tried 512). In this version of UHD (3.8.1) you have to enter this
in the Device Arguments param space without quotes.
I also see the advanced tab of all the blocks that has the minimum and
maximum output buffer sizes.
But… I know nothing about how any of these work, certainly not in a
quantitative sense anyhow.

Or am I still missing some reason why this will never work at all?

  • John

John M.
[email protected]

I have 5 blocks in my flowgraph:
USRP Src -> PFB Decim by 2 -> Mult 32768 -> Cplx-iShort -> FileSink
I am running 14 MS/s x 2 x 32 bits from the USRP, 7 MS/s x 2 x 16 bits
to the disk.
Previously, even setting the USRP num_recv_frames=512 or 1024, it
would run fine for about a minute, then print 1-4 "O"s, then run for
another 45 seconds (all estimated) and print another 1-4 "O"s, etc.
Changing the USRP to output iShorts at half the byte rate and
connecting it directly to the FileSink made no difference in this
behavior. Even halving the sample rate made no real difference in the
few attempts I made at that. Doubling the rate, on the other hand, did
cause the expected breakage.

I just tried this change… on 4/5 blocks (all except the FileSink) I
set the advanced tab "Min Buffer Size" to 2097152, instead of leaving
it at zero, which accepts the defaults assigned by GNU Radio
(presumably the scheduler does this). Gut feel was that it would be
best to distribute this rather than concentrate it all at one end or
the other of the chain. No idea if that is correct, or if it makes a
difference one way or the other at all; toss a coin. Likewise, no idea
whether there is a better setting than num_recv_frames=1024, which is
what I stuck with.
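
A minimal Python sketch of the same thing, on the assumption of a
two-block graph (GRC's "Min Buffer Size" maps to
set_min_output_buffer(), which has to be called before the flowgraph
starts, since buffers are allocated at start):

    from gnuradio import gr, uhd, blocks

    tb = gr.top_block()
    src = uhd.usrp_source("num_recv_frames=1024",
                          uhd.stream_args(cpu_format="sc16", channels=range(1)))
    src.set_samp_rate(14e6)
    sink = blocks.file_sink(2 * gr.sizeof_short, "capture.sc16", False)
    # 2^21-byte minimum output buffer, as tried above; applied per block.
    src.set_min_output_buffer(2097152)
    tb.connect(src, sink)
    # tb.run() would then capture until interrupted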

Actually, I first tried to set all the Min Buffer Sizes to 2^31
(2 Gig) instead of 2^21 (2 Meg), but that generated a runtime error
when it tried to allocate those on starting the flowgraph (even though
that is only about 10 GB, on a machine with 16 GB that hardly uses it
for anything else).

So… when I tried this with 2097152's, it ran for at least 4-5 minutes
without a single "O" and generated a file of 7,337,295,872 bytes on
disk (or 7,337,291,776 total, whatever). My belief, not yet tested, is
that I could have left it like that and it would have continued along
those lines until it ran out of disk space. Which is the desired
behavior (not running out of disk space, just not generating overrun
errors in the recorded data).

So I believe adjusting buffering does work for flawless recording of
USRP output to disk, within the rate the disk can handle, and that
"O"s are not just a fact of life one must live with. But I still have
no idea what all this is doing quantitatively, or whether there is a
better or more efficient way to allocate this buffer usage.

And that is something that would be nice to know: if you have more
blocks in your graph, how would one go about figuring out what to set
these to, etc.? There must be something far better than trial and
error to apply here.

John M.
[email protected]