Understanding flow control

Charles_I · January 14, 2010, 8:18pm

Hello,
I’m trying to understand the flow control between the USRP2 and host
machine. I assume that it needs to be worked out where the USRP2 will
always have a constant stream of uninterrupted radio data when sending
and receiving (unless a more complex radio is in place which allows
the signal to drop).

I have read that overruns are not really an issue, is this due to the
processing power/throughput of the spartan 3 vs the host processor?

I see that pause frames are used to help with flow control but I can
only see this being used if an overrun or full fifo has occurred on the
USRP2, what happens if the fifo becomes empty?

I’m trying to catch the pause frames with tcpdump and I’m either
doing it wrong or they are not happening. I have tried usrp2_siggen
and usrp2_fft. I’m using the dev code from git on ubuntu 9.04 with
gnuradio 3.2.2

We are either overflowing, underflowing, or are perfectly in sync.
Only overflow with pause frames to control makes sense to me. If this
is not the case an explanation would be very much appreciated.

Charles

Charles_I · January 15, 2010, 1:26am

On Thu, Jan 14, 2010 at 02:13:01PM -0500, Charles I. wrote:

I see that pause frames are used to help with flow control but I can
[...]
is not the case an explanation would be very much appreciated.

Charles

The USRP2 only applies backpressure to data transmitted from the host.
It does this by transmitting PAUSE frames, which timeout after a
certain period of time (see the Gig E spec for details).
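
For anyone trying to recognise these frames in a capture: a PAUSE frame
is an Ethernet MAC control frame (EtherType 0x8808, opcode 0x0001) that
carries a 16-bit pause time counted in quanta of 512 bit times. The
sketch below decodes that field from raw frame bytes. It is only an
illustration, not code from the USRP2 firmware or GNU Radio; the
function name and the `link_bps` parameter are made up for this example.
Also note that a NIC may consume PAUSE frames in hardware, in which case
they would not show up in a host-side capture at all.

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808  # EtherType used by MAC control frames
PAUSE_OPCODE = 0x0001           # MAC control opcode for PAUSE
QUANTUM_BITS = 512              # one pause quantum is 512 bit times


def parse_pause_frame(frame: bytes, link_bps: float = 1e9):
    """Return the requested pause in seconds, or None if not a PAUSE frame.

    `frame` is a raw Ethernet frame starting at the destination MAC.
    `link_bps` is an assumed link speed (gigabit by default).
    """
    if len(frame) < 18:
        return None
    # dst MAC, src MAC, EtherType, opcode, pause time (all big-endian)
    _dst, _src, ethertype, opcode, quanta = struct.unpack("!6s6sHHH", frame[:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return quanta * QUANTUM_BITS / link_bps


# Hand-built example frame asking for the maximum pause time (0xFFFF quanta).
example = (bytes.fromhex("0180c2000001") + bytes(6)
           + bytes.fromhex("88080001ffff") + bytes(42))
print(parse_pause_frame(example))
```

At gigabit speed one quantum is 512 ns, so the largest pause a single
frame can request is roughly 33.6 ms, after which the pause expires
unless it is renewed.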

In the FPGA -> host direction, there is no flow control since it is
rate limited by the baseband rate that the host has requested. If the
host asks for a higher rate than the host can handle, the host is in
error. There’s no amount of flow control and buffering that can fix
that symptom.

If you want to force the USRP2 to generate pause frames, just use
usrp_siggen.py

Also, with most NICs you can confirm that asymmetric flow control has
been negotiated using $ ethtool -a .
It should report Rx: On, Tx: Off
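
If you want to check this from a script, something like the following
sketch could work. It parses text in the layout that `ethtool -a`
usually prints; the sample string and the interface name eth0 are
assumptions for illustration, and the helper names are hypothetical.

```python
def parse_pause_params(ethtool_output: str) -> dict:
    """Turn 'Key: on/off' lines into a dict of lowercase key -> bool."""
    params = {}
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value in ("on", "off"):
            params[key.strip().lower()] = (value == "on")
    return params


def is_rx_only_flow_control(params: dict) -> bool:
    """True when the host honors PAUSE frames (RX on) but sends none (TX off)."""
    return params.get("rx") is True and params.get("tx") is False


# Assumed sample of what `ethtool -a eth0` prints on a typical Linux box.
SAMPLE = "Pause parameters for eth0:\nAutonegotiate:\ton\nRX:\t\ton\nTX:\t\toff\n"

params = parse_pause_params(SAMPLE)
print(params, is_rx_only_flow_control(params))
```

In practice the input would come from running the real command, for
example via `subprocess.run(["ethtool", "-a", "eth0"], capture_output=True, text=True)`.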

Eric

Charles_I · January 15, 2010, 2:02am

That’s very interesting… just this morning we were playing with
turning rx OFF with ethtool and convincing ourselves that it seemed to
stabilize our system.

If we turn off RX pause (ethtool -A eth0 rx off), does the USRP2 stop
sending pause frames?

We are running two USRP2s connected with a MIMO cable. The master
USRP2 is receiving and transmitting data while the slave USRP2 provides
a second source to the host (on eth1).

(without going into details of our application which I don’t
understand - I’m just the programmer), we are munging the data
received on the two channels and sending it right back to eth0.
sometimes the system is stable but very often it seems to get into
this state where it is stable for 30 seconds and then “blows” up in
the host code.

suspecting it had something to do with ethernet traffic I was fooling
about with ethtool and found that things were much more stable with RX
OFF on eth0. or it was when I last saw it running.

so I guess my question is: is there anything in host code or usrp2
firmware that behaves differently if RX pause is off?

Charles_I · January 15, 2010, 4:15am

Following up on my previous email, thinking about this some more:

I’m guessing that we are sending the USRP2 more data than it can
handle, it is sending pause packets back, which when RX is ON, the
ethernet card recognizes and slows down its output (not knowing
anything about gig-e control flow but this sounds like x-on x-off),
which causes our system to become unstable, BUT when we turn off RX on
the ethernet device, it ignores the pause packets coming back from the
USRP2, and keeps bombarding the USRP2 with transmitter data.

so what happens if we ignore the pause packets? does the USRP2 drop
packets on the floor and just output stuff as fast as it can?

Charles_I · January 15, 2010, 7:18pm

On Thu, Jan 14, 2010 at 10:11:38PM -0500, Tom G. wrote:

so what happens if we ignore the pause packets? does the USRP2 drop
packets on the floor and just output stuff as fast as it can?

Tom, are there any switches between the host and the USRP2?
If so, try removing them. PAUSE frames and switches don’t interact
well.

I’m not sure I’m following the physical interconnection of the pieces.
Is this the set up?

Does the host have two gig E ports?
Is there a dedicated cable between eth and USRP2 <1>?
Is there a dedicated cable between eth<N+1> and USRP2 <2>?
The two USRP2’s are connected with a MIMO cable?

Eric

Charles_I · January 15, 2010, 7:57pm

We’ve tried it with and without a switch - definitely better without the
switch.

Thinking about our setup the behavior actually makes sense to me,
although I’m waiting to discuss it with my signal processing guru.

two usrp2s, connected with a mimo cable. the slave is just getting
its clock from the master.

the master is sampling, and sending data back to the host, which is
also getting data from the slave, which is also sampling.

usrp2 master is on eth0 (receive and transmit). usrp2 slave is on eth1.

basically my flow graph in grc is pretty simple:

two usrp2 sources feeding my custom block, which is doing some magic
stuff and outputs a single channel to a single usrp2 sink (the master
usrp2).

I suspect that when RX is ON for eth0, the pause packets are causing
the data going out on eth0 to back up, and the delay is worse (in
terms of the algorithm running on the host) than the consequences of
some data just being dropped on the floor, and never being
transmitted.

I guess I could study the firmware source (if it’s in the C code where
this happens) to figure out what happens if RX is OFF. My assumption
is that somewhere in the USRP2 code there is some recognition that it
can’t keep up with transmit data, thus causing it to send pause
signals back to the ethernet controller (is that correct?). Maybe
it’s not in the firmware but built into some ethernet port controller
chip.

Or maybe my understanding of what RX ON/OFF does is completely wrong.
So, I guess I’m asking: as I understand it, the USRP2 sends pause
packets (or something) to the ethernet controller when it can’t keep
up with data being sent to it. RX ON means that the controller will
acknowledge these pause commands and stop sending data. Or have I got
that completely backwards?

Charles_I · January 15, 2010, 8:54pm

Thanks Eric, that’s exactly what I thought. I think in our
application it’s probably better to drop data, or at least that’s why
things are more stable for us when RX is OFF. Or maybe we should just
slow down a bit. Something to think about.

Charles_I · January 15, 2010, 8:10pm

On Fri, Jan 15, 2010 at 01:54:24PM -0500, Tom G. wrote:

I guess I could study the firmware source (if it’s in the C code where
this happens) to figure out what happens if RX is OFF. My assumption
is that somewhere in the USRP2 code there is some recognition that it
can’t keep up with transmit data, thus causing it to send pause
signals back to the ethernet controller (is that correct?). Maybe
it’s not in the firmware but built into some ethernet port controller
chip.

Actually, PAUSE handling is all handled in the FPGA. When the FIFO is
getting full, a PAUSE frame is sent on the wire telling the host to
stop sending for a while.

Or maybe my understanding of what RX ON/OFF does is completely wrong.
So, I guess I’m asking: as I understand it, the USRP2 sends pause
packets (or something) to the ethernet controller when it can’t keep
up with data being sent to it. RX ON means that the controller will
acknowledge these pause commands and stop sending data. Or have I got
that completely backwards?

In ethtool lingo, “Rx ON” means that the host will listen to the PAUSE
frames. This is what we want. Otherwise the host will continue
blasting away, and they’ll get dropped somewhere along the way. “Tx
OFF” means that the host does not send PAUSE frames. This is what we
want. The USRP2 never listens to PAUSE frames, since it doesn’t have
enough buffer to avoid an overrun.

We’re using “Asymmetric flow control”. See also:

http://grouper.ieee.org/groups/802/3/z/public/presentations/nov1996/asym.pdf

Eric

Charles_I · January 16, 2010, 12:08am

Incidentally my System Engineer/Project Lead points out that if the
USRP2 is actually telling the host to stop sending (which certainly
appears to be the case) then we are only able to get overall
throughput with two USRP2s over two gig-e connections comparable to
what we were getting with a single USRP over a single USB 2.0 line.
Something of a disappointment to us.

That is not correct. If you have 2 USRP2s both connected by gig-e, then
you need 2 separate gig-e cards. You should be able to get the full
throughput to each one, but your computer may have a hard time keeping
up.

Matt

Charles_I · January 15, 2010, 11:55pm

On Fri, Jan 15, 2010 at 2:07 PM, Eric B. wrote:

Actually, PAUSE handling is all handled in the FPGA. When the FIFO is
getting full, a PAUSE frame is sent on the wire telling the host to
stop sending for a while.

Incidentally my System Engineer/Project Lead points out that if the
USRP2 is actually telling the host to stop sending (which certainly
appears to be the case) then we are only able to get overall
throughput with two USRP2s over two gig-e connections comparable to
what we were getting with a single USRP over a single USB 2.0 line.
Something of a disappointment to us.

Charles_I · January 16, 2010, 12:13am

yes of course we have two separate gig-e cards. if the usrp2 is
sending us “pause” commands then it seems evident the usrp2 is having
trouble keeping up, not the computer.

Charles_I · January 16, 2010, 12:23am

On Fri, Jan 15, 2010 at 15:08, Tom G. wrote:

yes of course we have two separate gig-e cards. if the usrp2 is
sending us “pause” commands then it seems evident the usrp2 is having
trouble keeping up, not the computer.

The host software, when creating a data stream to be sent to the USRP2
for TX, will create the data as fast as the processor allows, and TX
on the GbE at full wire rate. The USRP2, however, is “consuming” data
at a fixed rate proportional to the configured TX RF baseband sample
rate. Even at the fastest sample rate, 25 Msps (interpolation rate
4), this is only 800 Mbps + framing overhead. So the USRP2 doesn’t
have problems “keeping up”, it’s just that the host can create the
digital sample stream faster than real time, so the USRP2 pauses it
periodically to keep the average data rate down to what is needed.
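
The arithmetic behind this can be sketched in a few lines of Python.
The 100 MHz clock and 32 bits per complex sample are the figures implied
by "25 Msps at interpolation 4 is 800 Mbps"; framing overhead is ignored,
and the function names are just for illustration.

```python
CLOCK_HZ = 100e6        # sample clock implied by 25 Msps at interpolation 4
BITS_PER_SAMPLE = 32    # one complex sample on the wire (assumed 16-bit I + 16-bit Q)
WIRE_RATE_BPS = 1e9     # gigabit ethernet, ignoring framing overhead


def tx_payload_bps(interp: int) -> float:
    """Average rate at which the USRP2 consumes TX sample data."""
    return CLOCK_HZ / interp * BITS_PER_SAMPLE


def host_idle_fraction(interp: int) -> float:
    """Fraction of wire time the host must stay quiet on average."""
    return 1.0 - tx_payload_bps(interp) / WIRE_RATE_BPS


for interp in (4, 8, 32):
    print(interp, tx_payload_bps(interp) / 1e6, round(host_idle_fraction(interp), 3))
```

Under these assumptions the host has to be held off about 20% of the
time at interpolation 4 and about 90% of the time at interpolation 32,
so a lower sample rate means more pausing, not less.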

Johnathan

Charles_I · January 16, 2010, 12:16am

Matt,
What is the maximum data rate that the USRP2 transmitter can accept
from the host without firing pause signals back to the host?
-Tom

Charles_I · January 16, 2010, 12:28am

On 01/15/2010 03:14 PM, Tom G. wrote:

Matt,
What is the maximum data rate that the USRP2 transmitter can accept
from the host without firing pause signals back to the host?

See my other message. The USRP2 will ALWAYS put out pause frames. In
fact, when the data rate is low it will actually put out MORE pause
frames. This is normal and is not something you should want to avoid.

Matt

Charles_I · January 16, 2010, 12:28am

On 01/15/2010 03:08 PM, Tom G. wrote:

yes of course we have two separate gig-e cards. if the usrp2 is
sending us “pause” commands then it seems evident the usrp2 is having
trouble keeping up, not the computer.

First off, the USRP2 will only send pause frames when you are
transmitting, not receiving. Pause frames are not generated due to the
USRP2 being unable to keep up. Pause frames are not generated due to
the computer not being able to keep up. Pause frames are generated as a
normal part of the transmission process. This is fundamental, and you
need to understand exactly why.

When you are transmitting, the USRP2 can only consume samples at a fixed
rate, determined by the clock rate (100 MHz) and the interpolation rate
(4 or higher). No matter what that rate is, your computer should be
able to generate samples faster than that. Since your computer does not
have a 100 MHz clock to pace the generation of those samples, it will
generate them too fast. When they are sent to the USRP2, which can only
consume them at a certain rate, they will pile up in the buffers of the
USRP2. Once the buffers get full enough, the USRP2 will send pause
frames back to the computer to tell it to wait until the samples it has
can work their way through the pipeline.

Again, this is completely normal and not because your computer is too
slow, and not because the USRP2 is too slow. It is a normal consequence
of the practicalities of generating samples asynchronously to their
consumption.

So in normal transmit operation, you will see large numbers of pause
frames going from the USRP2 to the computer. Do not be alarmed.
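
To make the feedback loop concrete, here is a toy discrete-time model of
it in Python. It is NOT the FPGA logic: the FIFO size, high-water mark,
burst size and pause length are all invented numbers, and the model only
shows the general behavior described above (a fast producer, a
fixed-rate consumer, and a "hold off" signal when the buffer fills).

```python
def simulate_tx(ticks, host_burst, drain, fifo_size=100, high_water=80,
                pause_ticks=2, honor_pause=True):
    """Toy model; returns (pause_frames_sent, samples_dropped)."""
    level = 0          # samples currently sitting in the device FIFO
    paused_until = 0   # first tick at which the host may send again
    pause_frames = 0
    dropped = 0
    for t in range(ticks):
        # Device: FIFO is nearly full, so ask the host to hold off.
        if level >= high_water:
            pause_frames += 1
            if honor_pause:
                paused_until = t + pause_ticks
        # Host: sends a burst unless it is honoring a pause.
        if t >= paused_until:
            level += host_burst
            if level > fifo_size:
                dropped += level - fifo_size  # no room left, samples are lost
                level = fifo_size
        # Device: consumes samples at its fixed rate.
        level = max(0, level - drain)
    return pause_frames, dropped


fast = simulate_tx(1000, host_burst=10, drain=8)      # high sample rate
slow = simulate_tx(1000, host_burst=10, drain=1)      # low sample rate
ignored = simulate_tx(1000, host_burst=10, drain=8, honor_pause=False)
print("fast:", fast, "slow:", slow, "pause ignored:", ignored)
```

In this model nothing is dropped as long as the host honors the pause
requests, the slower consumer ends up sending more pause requests than
the faster one, and a host that ignores them overflows the FIFO.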

When receiving, the USRP only generates data as fast as samples are
created by the ADC clock, divided by the decimation rate. If the
decimation rate is 4 then the sample rate is 25 MS/s at 32 bits per
sample, or 800 Mbit/s. This fits just fine into gigabit ethernet, and so
it all just goes out almost immediately over the ethernet, and nothing
ever backs up and stalls. If your computer couldn’t keep up, then it
MIGHT WANT TO send pause frames, but we have disabled that. Instead, if
your computer can’t keep up it will drop frames and you’ll see “S” in
your terminal window. Get a faster computer or do less processing if
you see this.

If you were to try a decimation of 3 or lower, then you would be
generating more than 1 gigabit per second, and the ethernet wouldn’t
keep up, and the buffers in the USRP2 would overflow and cause an
overrun (“O”) error. You shouldn’t be doing this because it won’t work.
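
A quick way to see which decimation rates fit on the link, as a Python
sketch. It assumes the ADC sample clock is the same 100 MHz quoted
above, uses the 32 bits per sample from this message, and ignores
framing overhead, so the real limit is slightly tighter than shown.

```python
ADC_CLOCK_HZ = 100e6    # assumed ADC sample clock
BITS_PER_SAMPLE = 32    # per complex sample, as stated above
GIGE_BPS = 1e9          # raw gigabit ethernet rate, framing ignored


def rx_payload_bps(decim: int) -> float:
    """Data rate the USRP2 produces on receive for a given decimation."""
    return ADC_CLOCK_HZ / decim * BITS_PER_SAMPLE


def fits_on_gige(decim: int) -> bool:
    """True if the receive stream fits within the raw link rate."""
    return rx_payload_bps(decim) <= GIGE_BPS


for decim in (3, 4, 8, 32):
    print(decim, round(rx_payload_bps(decim) / 1e6, 1), fits_on_gige(decim))
```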

I hope this has cleared it up. To summarize – each USRP2 needs its own
Gigabit ethernet card to talk to EVEN if it is using only a tiny
fraction of the total bandwidth. And those cards need to be configured
to honor flow control.

Matt

Charles_I · January 16, 2010, 12:57am

Thanks Matt, Eric and Jonathan (hope I didn’t forget anyone. ).

We greatly appreciate the information and need to think about stuff on
our end. I’ve been deliberately vague about our application (not that
I could really explain it even if I felt authorized to discuss it). The
thing to remember is that we believe (maybe we are fooling ourselves)
that we see easily reproducible problems when RX is ON but not when RX
is OFF.

One more question was just sent to me to pass on, while I was
composing this email:

crazy idea: is there any way to “clock” the host, i.e. keep track of a
time stamp in the host and gate/trigger the host outputs at a constant
sample rate that is consistent with the sample rate of the USRP2?

just thought I would throw that out there…

have a good weekend!
-Tom

Charles_I · January 16, 2010, 3:30am

On 01/15/2010 03:53 PM, Tom G. wrote:

Thanks Matt, Eric and Jonathan (hope I didn’t forget anyone. ).

We greatly appreciate the information and need to think about stuff on
our end. I’ve been deliberately vague about our application (not that
I could really explain it even if I felt authorized to discuss it). The
thing to remember is that we believe (maybe we are fooling ourselves)
that we see easily reproducible problems when RX is ON but not when RX
is OFF.

It is very hard to help when we don’t have information about what you
are trying to do. The important piece of information is that you are
transmitting and receiving at the same time when you see this problem.
This indicates that there may be flow control tuning issues.

Is the RX stream ok and the TX has a problem? Or is it that the TX is
ok and the RX has a problem? Or is it both?

Do you have a TTL serial port hooked up to J305? Do you see characters
there? Do you see “S” characters on the receive application window?

Are you trying to use 2 separate programs (1 tx, 1 rx) to talk to the
USRP2, or are they in the same app?

One more question was just sent to me to pass on, while I was
composing this email:

crazy idea: is there any way to “clock” the host, i.e. keep track of a
time stamp in the host and gate/trigger the host outputs at a constant
sample rate that is consistent with the sample rate of the USRP2?

No, for 2 reasons:

1. Even if the host had a clock, clocks drift relative to each other.
2. The USRP2 might need to hold off on sending for some reason.

This is a system that requires feedback and there is no way around it.
On the USRP1 the feedback is done by the flow control built into the USB
protocol. On the USRP2 the feedback is done by the flow control built
into ethernet. You could imagine doing a different feedback mechanism
using your own protocol, but it would still involve the device telling
the computer when to go and when to stop.

Actually there is one way around it – have infinitely large buffers in
the USRP2, but that would add to the cost

Matt

Charles_I · January 16, 2010, 4:07am

Tom-

composing this email:

crazy idea: is there any way to “clock” the host, i.e. keep track of a
time stamp in the host and gate/trigger the host outputs at a constant
sample rate that is consistent with the sample rate of the USRP2?

just thought I would throw that out there…

“Clock the host” at multi-MHz rate? One way would be to connect the A/D
converter directly to the PC and omit external radio hardware. Then you
would not need FPGA de-modulation, GbE, etc. That would be a multi-year
hardware and software effort, but sounds like something you and your
mystery questioner might be willing to take on.
-Jeff

Charles_I · January 16, 2010, 6:13pm

Hi Matt,

I’ll try hooking up my ttl line on Monday (I have done that but not
with our current configuration, so far only when I was playing around
with building custom firmware).

Here’s the official description of what we are doing (I hope this
helps!):

We have two USRP2s sharing common clocks via the MIMO cable. The first
USRP2 receives a 20MHz input signal and outputs an amplitude and phase
shifted copy of this same signal. Analog subtraction is performed
between the input and output signals of the first USRP2. The result
of this subtraction is an error signal that is used as an input to the
second USRP2. The host implements a signal processing algorithm that
adaptively estimates the amplitude and phase shift to apply to the
20MHz signal such that the resulting error signal at the input to the
second USRP2 is driven to zero. This processing forms the foundation
of a real-time adaptive closed-loop control system that has a myriad
of diverse applications. The system described above appears to work
well for relatively low throughput bandwidths (i.e. for USRP2 sample
rates below about 3.1MHz, decim = 32). In addition, because of
apparent latencies between the time at which the host updates its
calculations and the time at which these calculations are reflected in
the output of the first USRP2, we must use additional decimation in
the host by a significant number of samples (something like 512 is
typical). So the effective closed-loop throughput bandwidth is
approximately 100MHz divided by decim * 512 or only about 6kHz. We
would, of course, like to run this system with as much throughput
bandwidth as possible and we are not sure at this point where the
biggest bottleneck is coming from.
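
The numbers quoted above can be reproduced with a short Python sketch
(the function names are only for illustration):

```python
def usrp_sample_rate_hz(clock_hz: float, usrp_decim: int) -> float:
    """Sample rate delivered to the host after decimation in the USRP2."""
    return clock_hz / usrp_decim


def loop_update_rate_hz(clock_hz: float, usrp_decim: int, host_decim: int) -> float:
    """Effective closed-loop rate after the extra decimation in the host."""
    return clock_hz / (usrp_decim * host_decim)


print(usrp_sample_rate_hz(100e6, 32))        # about 3.1 MHz
print(loop_update_rate_hz(100e6, 32, 512))   # about 6 kHz
```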

I would only add that what is referred to as the “first” USRP2 above
does input and output through eth0 (the controller on the motherboard -
this is on a quad-core Lenovo running the latest 64-bit Ubuntu).
The second USRP2 (the slave, set up to get its clock through the MIMO
cable from the first USRP2) is receiving data through eth1. My code
(the signal processing algorithm that drives the error signal to zero)
seems to “blow up” after about 30 seconds unless I set RX to OFF on
eth0.

-Tom

Charles_I · January 16, 2010, 8:16pm

On 01/16/2010 09:03 AM, Tom G. wrote:

shifted copy of this same signal. Analog subtraction is performed
between the input and output signals of the first USRP2. The result
of this subtraction is an error signal that is used as an input to the
second USRP2. The host implements a signal processing algorithm that
adaptively estimates the amplitude and phase shift to apply to the
20MHz signal such that the resulting error signal at the input to the
second USRP2 is driven to zero. This processing forms the foundation
of a real-time adaptive closed-loop control system that has a myriad
of diverse applications.

What daughterboards are you using? If it is the BasicRX or LFRX and
BasicTX or LFTX, then you are probably better off doing this all in the
same USRP2. It would take some relatively small modifications to the
FPGA.

The system described above appears to work
[...]
biggest bottleneck is coming from.

You need to be careful here – increasing decimation will increase your
latency. This is fundamental to all signal processing systems.

What you really want is to reduce buffer sizes.
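
One way to see the buffer part of this, as a rough Python sketch: the
time a sample waits in a buffer is the number of samples ahead of it
divided by the sample rate, so the same buffer holds more time at a
lower sample rate. The 1024-sample buffer is a made-up figure, the
100 MHz clock is taken from earlier in the thread, and other sources of
latency (filter delay, the network stack, the scheduler) are not
modelled.

```python
def buffer_latency_s(buffered_samples: int, clock_hz: float, decim: int) -> float:
    """Time needed to fill (or drain) a buffer at the decimated sample rate."""
    sample_rate_hz = clock_hz / decim
    return buffered_samples / sample_rate_hz


# Same hypothetical 1024-sample buffer at two decimation rates.
print(buffer_latency_s(1024, 100e6, 4))    # about 41 microseconds
print(buffer_latency_s(1024, 100e6, 32))   # about 328 microseconds
# Halving the buffer halves its contribution to the latency.
print(buffer_latency_s(512, 100e6, 32))
```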

I would only add that what is referred to as the “first” USRP2 above
does input and output through eth0 (the controller on the motherboard -
this is on a quad-core Lenovo running the latest 64-bit Ubuntu).
The second USRP2 (the slave, set up to get its clock through the MIMO
cable from the first USRP2) is receiving data through eth1. My code
(the signal processing algorithm that drives the error signal to zero)
seems to “blow up” after about 30 seconds unless I set RX to OFF on
eth0.

Can you be more specific about what “blow up” means?

Matt