USRP1 problem, UHD problem, and one suggestion

dubstep · May 27, 2011, 2:30am

USRP1:

With a very simple flowgraph (a signal source connected to a USRP1 sink,
and a USRP1 source connected to a WX scope), shutting down the app using
the close box causes USB on the host system to freeze, requiring a
reboot. Yanking USRP power or ctrl+c’ing avoids the problem. It occurs
with many flowgraphs, both GRC-generated and not; as far as I know it is
limited to flowgraphs with both a USRP1 source and sink. This is a
serious problem that has hit us on multiple platforms and machines and
causes unnecessary reboots. It is honestly an unacceptable bug.

USRP2 / UHD:

With a flowgraph similar to the one above, changing the secs/div on the
scope causes the traces to change phase relative to one another. This
doesn’t happen when a USRP1 source is substituted, and I believe it is
indicative of a deeper problem. With the same flowgraph and a slider that
controls both the TX and RX frequencies simultaneously, the flowgraph
gets into a state where it seems to be receiving data, but the data no
longer represents what’s actually coming in, and we also see the phase
slippage.

Long story short: create a flowgraph with a UHD (USRP2) sink and source,
a siggen at a fixed frequency/amplitude, a WX scope, and a slider that
sets the TX+RX frequencies to the slider value. Direct connect the TX to
the RX with an SMA cable. Run the flowgraph and move the slider. At least
on LFTX/RX this seems to give varying results. Do the same thing with a
USRP1 for comparison. To me it seems like UHD is losing data or the
various paths in the flowgraph get out of whack with each other. There
were no O’s or U’s printed.

We were trying to build a simple VNA with UHD and it just doesn’t work
as expected, but after switching it all over to a USRP1 it’s fine and
dandy.

On a general note, I think two new sets of blocks should be added:

A simple source block that produces samples in the appropriate format
(float, complex, etc., depending on the _f / _c suffix), generates them
as fast as possible, and counts how many it generates per second,
reporting that count on a float output.

The same thing, but as a consumer.

The idea being it would help diagnose blocks that end up putting out
more or less data than they take in and whose decimation/interpolation
rates aren’t apparent. For instance, I have a decimating filter block
that appears to actually be producing more samples than it takes in,
causing the data to show up almost 30 seconds later on the scope which
is set at the source’s data rate. I’d love to put the timed consumer and
timed provider blocks on either side and see how the in/out amounts
compare.
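
To make the idea concrete, here is a minimal sketch of just the counting part. It is plain Python, not a GNU Radio block: the `RateCounter` name and its `add()` method are made up for illustration, and a real block would presumably call something like `add()` from its work function with the number of items it handled.

```python
import time


class RateCounter:
    """Counts items and reports items per second over fixed windows.

    Hypothetical helper, not a GNU Radio block: the caller reports how
    many items it produced or consumed on each call to add().
    """

    def __init__(self, window=1.0, clock=time.monotonic):
        self.window = window      # measurement window in seconds
        self.clock = clock        # injectable clock, handy for testing
        self.count = 0            # items seen in the current window
        self.start = clock()      # start time of the current window
        self.last_rate = 0.0      # rate of the last completed window

    def add(self, n_items):
        """Record n_items; return the most recent completed-window rate."""
        self.count += n_items
        elapsed = self.clock() - self.start
        if elapsed >= self.window:
            self.last_rate = self.count / elapsed
            self.count = 0
            self.start = self.clock()
        return self.last_rate
```

With one counter on each side of a suspect block, the ratio of the two reported rates could be compared against the interpolation/decimation the block is supposed to have.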

Brett_LSTrotter · May 27, 2011, 3:07am

On Thu, 2011-05-26 at 19:29 -0500, Brett L. Trotter wrote:

USRP1:

When we have a very simple flowgraph with a USRP1 sink connected to a
signal source and a USRP1 source connected to a WX scope- trying to shut
down the app using the close box causes the USB on the host system to
freeze up requiring a reboot. Yanking USRP power or ctrl+c’ing avoids
this problem. This problem exists on many flowgraphs, both GRC generated
and not- as far as I know it is limited to flowgraphs with both USRP1
source and sink. This is a serious problem that has hit us on multiple
platforms and machines and causes unnecessary reboots. It is honestly an
unacceptable bug.

UHD or gr-usrp? What OS? What version of libusb? Do all USB devices
“freeze” or just the USRP? Does power cycling the USRP un-freeze it?
This is definitely not something I’ve seen before.

with a UHD (USRP2) sink and source with a siggen at a fixed
frequency/amplitude, a wx scope, and a slider that sets the TX+RX
frequencies to the slider value. Direct connect the TX to the RX with an
SMA cable. Run the flowgraph and move the slider. At least on LFTX/RX
this seems to give various results. Do the same thing with a USRP1 for
comparison. To me it seems like UHD is losing data or the various paths
in the flowgraph get out of whack with each other. There were no O’s or
U’s printed.

If you lose samples somewhere in the chain, which can happen, the TX and
RX paths will change their relative alignment. Have you tried reducing
your sample rate, or the refresh rate on the graphical sink? The various
graphical sinks can be very CPU-intensive.

It is generally not a great idea to rely on the TX and RX paths staying
aligned with respect to each other all the time. The fact that the USRP1
seems to in your test is a bonus, but I wouldn’t rely on that going
forward either.

If you require the TX and RX paths to maintain a fixed relationship, the
USRP2 with UHD will let you use timed samples to achieve this down to
10ns. You could also align your TX and RX paths in your application,
using known TX waveforms to correlate the RX against. This approach
probably fits best into your design flow (don’t have to code C++ with
UHD).
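
As a rough illustration of the correlation approach, the sketch below estimates how many samples the RX stream lags a known TX sequence. It is pure Python on real-valued samples; the `estimate_delay` name and the `max_lag` parameter are assumptions for this example, and complex baseband samples would need the conjugate of one input in the product.

```python
def estimate_delay(tx, rx, max_lag):
    """Return the lag (in samples) at which rx best matches tx.

    Real-valued sketch: tries every lag from 0 to max_lag and keeps the
    one with the largest correlation sum.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Only correlate over the part of rx that overlaps tx at this lag.
        n = min(len(tx), len(rx) - lag)
        score = sum(tx[i] * rx[lag + i] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The known waveform should have a sharp autocorrelation peak (a pseudo-random sequence rather than a plain sine), otherwise several lags will score almost equally.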

–n

fast as possible and counts how many it generates in a second which gets
timed provider blocks on either side and see how the in/out amounts compare.

This is a cool idea, but I’m not sure how it could be implemented in
GNU Radio’s scheduler. I’ll have to think about that one.

–n

Brett_LSTrotter · May 27, 2011, 3:38am

Replying to self- here’s another case on the USRP2/UHD-

TX Path: Sig Source -> UHD (USRP2) Sink

RX Path: UHD (USRP2) Source -> Band Pass Filter -> Scope Sink

It seems that any kind of filter, even with appropriate calculation of
the rate coming out of the filter taking into account decimation will
yield a very delayed signal on the scope sink. The same problem does not
happen on USRP1 when the sinks are swapped out and the sample rates
adjusted appropriately.
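
For reference, the rate bookkeeping being described can be sketched as below. The function names are made up for this example, and the numbers in any real check would come from counting items on either side of the filter.

```python
from fractions import Fraction


def output_rate(input_rate, decimation=1, interpolation=1):
    """Sample rate after a block that interpolates, then decimates."""
    return Fraction(input_rate) * interpolation / decimation


def rate_error(items_in, items_out, decimation=1, interpolation=1):
    """Ratio of observed output items to the number the rates predict.

    1 means the block behaves as configured; above 1 means it produces
    more items than expected.
    """
    expected = Fraction(items_in) * interpolation / decimation
    return Fraction(items_out) / expected
```

For example, a 1 MS/s stream through a decimate-by-8 filter should reach the scope sink at 125 kS/s, so the scope's sample rate setting would be `output_rate(1000000, decimation=8)`.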

Brett_LSTrotter · May 27, 2011, 4:15am

On 05/26/2011 08:06 PM, Nick F. wrote:

unacceptable bug.
UHD or gr-usrp? What OS? What version of libusb? Do all USB devices
“freeze” or just the USRP? Does power cycling the USRP un-freeze it?
This is definitely not something I’ve seen before.

gr-usrp
Ubuntu x86_64 with libusrp /usr/lib/libusb-0.1.so.4.4.4
RHEL-6 x86-64 with libusrp /usr/lib64/libusb-0.1.so.4.4.4
I believe I’ve also seen it on FC14 and RHEL5

USB devices continue to function, but it will not recognize new
connection events of any device, the USRP will not reinitialize even if
power cycled and reconnected, even to a different port.

frequency/amplitude, a wx scope, and a slider that sets the TX+RX

UHD).

–n

The alignment I’m talking about wasn’t even between RX and TX; it was
between branches of the RX path, such as the real and imaginary
components of that path when viewed on the scope.

Brett_LSTrotter · May 27, 2011, 4:26am

On 05/26/2011 08:06 PM, Nick F. wrote:

The alignment I’m talking about wasn’t even relative between RX and TX-
it was between branches of the RX path such as the real and imaginary
components of that path when viewed on the scope.

So, you’re talking about splitting I/Q for the same signal, joining them
later, and seeing phase slip on the scope?

By default GNU Radio now schedules each block in its own thread, so
there could be differences in instantaneous latencies down each path.

–
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

Brett_LSTrotter · May 27, 2011, 4:22am

unacceptable bug.

My intuition here is that the “close” box doesn’t cause the flow-graph
to do the usual “finish the flow-graph” thing, which means that the
USRP1 is still streaming and nobody is listening. In the ‘power off’
case, the USRP1 resets itself and stops streaming data, and for ctrl-C
there’s built-in logic that causes the flow-graph to shut down
“politely” and send a “please stop streaming” command to the USRP1.

My suspicion about the USB freeze-up is that the kernel USB drivers
aren’t doing the right thing with a deluge of data still arriving when
nobody is actually listening. That makes it not strictly a GNU Radio
problem, and more of a USB driver problem. Also, USB is inherently
half-duplex, which may (somehow) play into scenarios like this: some
kind of weird deadlock problem in the kernel USB drivers?
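
If that guess is right, one thing to try is having the window-close handler shut the flow-graph down explicitly before the window goes away. The sketch below only shows the ordering. `FakeFlowgraph` is a stand-in so the example runs without GNU Radio or WX, `on_close` and `destroy_window` are hypothetical names, and whether this actually avoids the USB freeze is untested.

```python
class FakeFlowgraph:
    """Stand-in for a flow-graph object; records the calls made on it."""

    def __init__(self):
        self.calls = []

    def stop(self):
        self.calls.append("stop")

    def wait(self):
        self.calls.append("wait")


def on_close(flowgraph, destroy_window):
    """Stop the flow-graph and wait for it to finish, then close the GUI."""
    try:
        flowgraph.stop()   # ask the blocks to stop
        flowgraph.wait()   # block until the flow-graph has finished
    finally:
        destroy_window()   # close the window even if shutdown raised
```

In a real application the flow-graph object would be the top block, which provides `stop()` and `wait()`, and `on_close` would be bound to the GUI toolkit's close event.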

frequency/amplitude, a wx scope, and a slider that sets the TX+RX

There’s a tremendous amount of buffering inside a GNU Radio flow-graph,
which can easily cause seconds of latency. The buffer-sizing algorithm
is complicated, and the buffering at any point in the graph is
calculated by whatever is downstream, including decimators.

I’ve long opined that the buffer-sizing (with its inherent latency)
isn’t actually correct all the time, but I admit to not having offered
any meaningful solutions. I don’t know whether UHD exacerbates this
problem or not.
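
A back-of-the-envelope way to see how buffering turns into seconds of delay: each buffer adds (items waiting) / (sample rate at that point). The sketch below just does that arithmetic. The function names and the numbers in the usage note are illustrative assumptions, not GNU Radio's actual buffer sizes.

```python
def buffer_latency(buffered_items, sample_rate):
    """Seconds of delay contributed by items waiting in one buffer."""
    return buffered_items / sample_rate


def total_latency(stages):
    """Sum latency over (buffered_items, sample_rate) pairs, one per buffer."""
    return sum(buffer_latency(n, rate) for n, rate in stages)
```

For instance, `total_latency([(500000, 1000000.0), (125000, 250000.0)])` gives 1.0 second. The second buffer holds a quarter as many items but sits after a decimate-by-4, so it contributes the same half second as the first.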

–
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

Brett_LSTrotter · May 27, 2011, 10:48pm

the various things- so say in the simplest case you have a vector source
shortest available.

I’m not sure how the scope sink deals with the “minimum amount of data
available” issue.

But the broader question becomes something like:

given two (or more) subgraphs where the signals in the subgraphs had
relative phase “foo” at the head of those sub-graphs, and the two (or
more) subgraphs do different numbers and types of “things” to those
signals, is relative phase preserved as seen by a sink object?

It seems, on reflection, that unless those things explicitly modify
phase, then phase should naturally be preserved. This is in stark
contrast to the analog world, where there will be minor (or major!)
phase-distortions as a result of following different paths.

–
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

Brett_LSTrotter · May 27, 2011, 9:51am

On 05/26/2011 09:14 PM, Brett L. Trotter wrote:

platforms and machines and causes unnecessary reboots. It is honestly an
connection events of any device, the USRP will not reinitialize even if
power cycled and reconnected, even to a different port.

Correction- apparently there’s libusb1 installed here. Silly me looking
for /usr/lib/libusb-*.

RHEL-6:
libusb1-static-1.0.3-1.el6.x86_64
libusb-devel-0.1.12-23.el6.x86_64
libusb1-1.0.3-1.el6.x86_64
libusb-static-0.1.12-23.el6.x86_64
libusb1-devel-1.0.3-1.el6.x86_64

Ubuntu:
un ia32-libusb-0.1-4
ii libusb-0.1-4      2:0.1.12-15ubuntu2
ii libusb-1.0-0      2:1.0.8-2
ii libusb-1.0-0-dev  2:1.0.8-2
ii libusb-dev        2:0.1.12-15ubuntu2
un libusb0
ii libusbmuxd1       1.0.4-1

Brett_LSTrotter · May 28, 2011, 3:34pm

On Fri, May 27, 2011 at 3:21 AM, Marcus D. Leech wrote:

unacceptable bug.
“politely”, and send a “please stop streaming” command to the USRP1.
My suspicion about
USB freeze-up is that the problem is due to the USB drivers in the
kernel not doing the
right thing with a deluge of data still arriving when nobody is
actually listening. Which makes it a
not-strictly-GnuRadio thing, and more of a USB drivers thing. Also,
USB is inherently half-duplex,
which may (somehow) play into scenarios like this–some kind of weird
deadlock problem in the
kernel USB drivers?

From the sounds of things, I’d say Marcus is correct. At least, it’s
what I’m thinking is the problem, as well.

with a UHD (USRP2) sink and source with a siggen at a fixed

There’s a tremendous amount of buffering inside a Gnu Radio flow-graph,
which can
easily cause seconds of latency. The buffer-sizing algorithm is
complicated, and the
buffering at any point in the graph is calculated by whatever is
downstream, including
decimators.

The GNU Radio scheduler optimizes for throughput, not latency, so yes,
large latencies can build up in the buffers as a consequence of that
optimization goal.

I’ve long opined that the buffer-sizing (with its inherent latency)
isn’t actually correct all the time,
but I admit to not having offered any meaningful solutions. I don’t
know whether UHD exacerbates
this problem or not.

Yes, it would be nice to have this ability. Unfortunately, it’s not on
my list of things to get to immediately. Obviously, it’s going to
require some delicate surgery inside the scheduler code. And when I say
delicate, I’m not saying that it’s necessarily difficult, but that it
must be thought about carefully to make sure we’re not harming ourselves
in any other way.

Tom