USRP Delay Measurements

Hi all,

I am currently trying to measure delays on the USRP and found some
interesting things. I wanted to see if others have similar results:

• First I measured the round trip time of my IEEE 802.15.4
implementation. I have two computers: one runs a ping application, the
other one a pong. The average delay in this case is 26.8ms.

• Next, I measured the round trip time of my FSK implementation (also
based on packets). Here, the mean RTT is 14.9ms.

• Next, I tried to find out the one way delay, i.e., how much time
passes between generating a sample at the computer and that sample
being sent out at the USRP output. For this, I changed the code
of the sig_gen class such that it generates a square wave instead of a
cosine and I added code that toggles the parallel port pins according
to the state of the square wave. Then I used an oscilloscope to
measure the delay between the two signals, i.e., the one generated at
the parallel port and the one from the USRP (LFTX). The signal is at 2
Hz.
The delays in this case depend a lot on what I choose for the
decimation (while keeping a constant square wave frequency). If I use
decimation=500, then I find an average delay of 65.8ms. When I change
the decimation to something else, I get values ranging from 30ms up
to 150/200ms (I didn’t do exact measurements for these yet).

What I am wondering is, why is there such a huge difference? Is it
because of the GNU Radio scheduler? Or is there some other problem?

Thomas

On Wed, Nov 08, 2006 at 02:06:11PM -0800, Thomas S. wrote:

based on packets). Here, the mean RTT is 14.9ms.
The delays in this case depend a lot on what I choose for the
decimation (while keeping a constant square wave frequency). If I use
decimation=500, then I find an average delay of 65.8ms. When I change
the decimation to something else, I get values ranging from 30ms up
to 150/200ms (I didn’t do exact measurements for these yet).

What I am wondering is, why is there such a huge difference? Is it
because of the GNU Radio scheduler? Or is there some other problem?

Thomas

Have you tried explicitly setting the fast usb buffering?
Take a look at the examples in gnuradio-examples/python/digital/*.py

You want to pass the fusb_* args to the usrp constructor.

Eric

On 11/8/06, Eric B. wrote:

• Next, I measured the round trip time of my FSK implementation (also
Hz.

Have you tried explicitly setting the fast usb buffering?
Take a look at the examples in gnuradio-examples/python/digital/*.py

I set the fusb buffering for the square wave test, in addition to
setting real time scheduling. The result is similar: I get different
delays for different decimation factors. What I didn’t notice before is
the following: the delay increases with the decimation factor, and
it looks almost linear, i.e., if I double the decimation factor, the
delay increases by a factor of two. Does that make sense? I don’t
really understand that behavior, since I assumed that the delay should
be constant.

Thomas

On 11/8/06, Eric B. wrote:

In your test case the ultimate flow control is the speed of transfer
8 bytes

Also, the usrp.sink_c sets its output_multiple to 128 samples, so it’s
always going to wait until there’s at least 128 complex samples
available before it runs (that’s 1 full USB packet).

I am actually using the short sink. But I assume this doesn’t change
the concept and just changes the amount of samples in the buffers.

I don’t really understand that behavior, since I assumed that the
delay should be constant.

It is, in samples. Hope this helps.

Note that if you’re not flow controlled (the typical
case with discontinuous transmission), you’ll get a better measure.
You could test this with a source that returned 0 for noutput items,
except that every N seconds, it toggled the parallel port lines and
then produced its burst of output.

Ah, this explains the differences in the delays from the packet source
tests I did earlier. It does make sense now.

You’ll want to be sure to calibrate out the time to wiggle the parallel
port lines. I’m assuming you’re doing it with a direct iob to the
parallel port control or data registers.

Yes, I am using iob. The parallel port delay is supposedly 1us, i.e.,
I can toggle the parallel ports at the speed of the bus it is
connected to, which is apparently around 1MHz. This delay is so small,
compared to the USRP delays, that I didn’t factor it in (yet).

So, just to make sure that I understood your explanation: the USB bus
speed is set by the interpolation factor (sorry, I said decimation
before, but that applies when we receive, not when we send. My
fault…). The buffer in front of the USB driver is kept full at all times
by our signal source, and it is emptied at the set USB speed.

Thomas

On Wed, Nov 08, 2006 at 03:14:01PM -0800, Thomas S. wrote:

delay increases by a factor of two. Does that make sense?
Absolutely

In your test case the ultimate flow control is the speed of transfer
across the USB. This is determined by the decimation factor. The
siggen block is going to run as fast as it can and WILL fill up all
available buffering downstream from it. If your test case is a
gr.sig_source followed by the USRP, I would expect that the runtime
system would have allocated 32kB of buffering between the sig_source
and the usrp sink.

           gr_complex
    32kB * ---------- = 4096 complex samples
            8 bytes

That’s the buffering between the sig_source output and the usrp input.

That amount is constant (and known). Note that if the sig_source
wasn’t flow controlled by the usrp consumption rate, the buffer would
not be full.

So, you’ve got a 4096 sample fixed delay (probably actually 4095)
between the output of the sig_source and the input of the usrp sink.
You know your data rate across the USB, and thus can subtract off the
constant delay.

Also, the usrp.sink_c sets its output_multiple to 128 samples, so it’s
always going to wait until there’s at least 128 complex samples
available before it runs (that’s 1 full USB packet).
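
To put numbers on this, here is a rough sketch of the arithmetic in Python. It only covers the one buffer between the source and the usrp sink, so it will not account for the whole measured delay (USB and on-board buffering come on top). The 128 MS/s DAC rate and the 4 bytes per complex sample on the USB (16-bit I and Q) are my assumptions for a USRP1; they are not stated anywhere above.

```python
# Back-of-the-envelope delay of a full source->sink buffer on transmit.
# ASSUMPTIONS (not from the thread): 128 MS/s DAC rate, 4 bytes per
# complex sample on the USB wire (16-bit I + 16-bit Q).
DAC_RATE_HZ = 128e6
WIRE_BYTES_PER_COMPLEX = 4

# Values taken from the explanation above.
BUFFER_BYTES = 32 * 1024      # buffer between sig_source and usrp sink
ITEM_BYTES = 8                # sizeof(gr_complex)
OUTPUT_MULTIPLE = 128         # samples the sink waits for before it runs


def buffer_samples():
    """Number of complex samples that fit in the buffer."""
    return BUFFER_BYTES // ITEM_BYTES


def sample_rate_hz(interpolation):
    """Rate at which the USRP consumes samples over the USB."""
    return DAC_RATE_HZ / interpolation


def buffer_delay_s(interpolation):
    """Time a sample spends in the (full) buffer before being consumed."""
    return buffer_samples() / sample_rate_hz(interpolation)


def packet_wait_s(interpolation):
    """Time needed to accumulate one output_multiple worth of samples."""
    return OUTPUT_MULTIPLE / sample_rate_hz(interpolation)


if __name__ == "__main__":
    print("buffer holds %d complex samples" % buffer_samples())
    print("one USB packet = %d bytes" % (OUTPUT_MULTIPLE * WIRE_BYTES_PER_COMPLEX))
    for interp in (128, 256, 512):
        print("interp=%3d  buffer delay=%.3f ms  packet wait=%.3f ms"
              % (interp, 1e3 * buffer_delay_s(interp), 1e3 * packet_wait_s(interp)))
```

The delay is a fixed number of samples, so in seconds it doubles every time the interpolation factor doubles, which matches the roughly linear behavior reported.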

I don’t really understand that behavior, since I assumed that the
delay should be constant.

It is, in samples. Hope this helps.

Note that if you’re not flow controlled (the typical
case with discontinuous transmission), you’ll get a better measure.
You could test this with a source that returned 0 for noutput items,
except that every N seconds, it toggled the parallel port lines and
then produced its burst of output.
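
A minimal sketch of the logic such a source could use, written as plain Python so it can be tried without GNU Radio. The class name BurstSource, the work() signature and the toggle callback are all made up for illustration; this is not the real gr_block API, and the real thing would do the port write in place of the callback.

```python
import time


class BurstSource:
    """Hypothetical stand-in for a source block (NOT the GNU Radio API).

    work() returns 0 items most of the time. Once per period it flips a
    level, calls toggle(level) (e.g. a parallel port write) and then
    emits one burst of samples carrying that level.
    """

    def __init__(self, period_s, burst_len, toggle, clock=time.monotonic):
        self.period_s = period_s
        self.burst_len = burst_len
        self.toggle = toggle          # called right before the burst is produced
        self.clock = clock            # injectable so the logic can be tested
        self.next_burst = clock() + period_s
        self.level = 0

    def work(self, output, noutput_items):
        # Not due yet, or not enough room for a whole burst: produce nothing.
        if self.clock() < self.next_burst or noutput_items < self.burst_len:
            return 0
        self.level ^= 1
        self.toggle(self.level)       # timestamp reference for the scope
        output[:self.burst_len] = [self.level] * self.burst_len
        self.next_burst += self.period_s
        return self.burst_len
```

Because the buffers downstream are empty between bursts, the time from the toggle to the edge on the USRP output should then reflect the path latency rather than the depth of a full buffer.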

You’ll want to be sure to calibrate out the time to wiggle the parallel
port lines. I’m assuming you’re doing it with a direct iob to the
parallel port control or data registers.

Eric

Hi Eric,

I did some calculations and something doesn’t add up. As I mentioned
in my last email, I am using usrp.sink_s. Thus, my samples are real
shorts (16 bit). I verified the buffer size by printing out the buffer