Usrp_siggen.py underruns

Dominik_A · February 11, 2009, 7:36pm

Hi!

I am currently observing an odd behavior of usrp_siggen.py.

When I start the program as follows

./usrp_siggen.py -f 2.40G -i 16 --gaussian

there are a lot of underruns (uU). However, for all other signal
generation options except gaussian, it works fine (i.e. const, sine,
uniform). Just to see, I have modified usrp_siggen to enable realtime
scheduling. It didn’t make the underruns go away.
My /etc/security/limits.conf contains the line
@usrp - rtprio 90
as suggested by a recent post to mailing list (though I increased the
maximum priority). The libgruel realtime functions set the priority to -30
(checked with top). CPU usage is ~103%.

I have observed similar behavior with my transmitter system when I set
the bandwidth to 8 MHz, which should be the maximum supported. It seems
to me that the GnuRadio USRP driver can hardly keep up with this rate.
Measurements without the USRP sink showed that my transmitter actually
sustains rates over 30 MS/s, though I didn’t test what rate the gaussian
noise source in usrp_siggen achieves when connected to a nullsink.
Further, with the USRP2, my transmitter sends continuously at a
bandwidth of 12.5 MHz with no problem. However, I need the USRP1 too.

My gnuradio version is from the svn trunk, but it’s not the latest one.
Some revision above 10000. If necessary, I could do a test with the
latest revision.

The program test_usrp_standard_tx reports
xfered 1.34e+08 bytes in 4.19 seconds. 3.2e+07 bytes/sec. cpu time = 0
0 underruns

My machine is a Core i7 940 with 3 GB of RAM. I am using a fresh install
of Ubuntu 8.10. The USRP has its own USB root hub, i.e. no other
devices are connected to it. Hence I think it is unlikely to be caused
by the machine.

Best regards
Dominik

Dominik_A · February 11, 2009, 7:56pm

On Wed, Feb 11, 2009 at 07:35:10PM +0100, Dominik A. wrote:

uniform). Just to see, I have modified usrp_siggen to enable realtime
scheduling. It didn’t make the underruns go away.
My /etc/security/limits.conf contains the line

@usrp - rtprio 90

as suggested by a recent post to mailing list (though I increased the
maximum priority). The libgruel realtime functions set the priority to -30
(checked with top). CPU usage is ~103%.

That won’t help. The problem is that the gaussian RNG is really slow.
You’ll need to figure out how to make it faster.

Eric

Dominik_A · February 11, 2009, 8:42pm

Hi!

That won’t help. The problem is that the gaussian RNG is really slow.
You’ll need to figure out how to make it faster.
I am sorry. This was an example, and I had hoped that the RNG would be
fast enough. Actually, I have observed this behavior with my transmitter.
As I described, it can’t sustain 8 MHz bandwidth on the USRP1. Before we
received the new USRP2s, I thought this was a problem in my
application, even though I had measured a throughput of more than
30 MSamples per second in front of a nullsink.

Now with the USRP2, my transmitter is streaming continuously, sending at
12.5 MHz bandwidth. It keeps up with this rate. There was no change in
the transmitter, except for using the USRP2.

Conclusion: my code can send with at least 12.5 complex MSamples per
second (equal to 12.5 MHz bandwidth), but using USRP1, I can’t send with
8 MHz?

Best regards
Dominik

Dominik_A · February 11, 2009, 8:57pm

On Wed, Feb 11, 2009 at 08:35:56PM +0100, Dominik A. wrote:

in front of the nullsink of more than 30 MSamples per second.

Now with the USRP2, my transmitter is streaming continuously, sending at
12.5 MHz bandwidth. It keeps up with this rate. There was no change in
the transmitter, except for using the USRP2.

Conclusion: my code can send with at least 12.5 complex MSamples per
second (equal to 12.5 MHz bandwidth), but using USRP1, I can’t send with
8 MHz?

What kind of an EHCI controller do you have?
We’ve come across some that won’t support 32MB/s on transmit.

Eric

Dominik_A · February 11, 2009, 9:20pm

Hi!

What kind of an EHCI controller do you have?
We’ve come across some that won’t support 32MB/s on transmit.
http://www.asus.com/products.aspx?modelmenu=1&model=2593&l1=3&l2=179&l3=815&l4=0

Intel X58 chipset on an Asus P6 Deluxe. We are using the onboard
controller.

test_usrp_standard_tx reports
xfered 1.34e+08 bytes in 4.19 seconds. 3.2e+07 bytes/sec. cpu time = 0
0 underruns

Identical behavior on another machine, which has Athlon 64 X2 and hence
different mainboard and chipset, definitely no Intel chipset. But I am
not at the institute and can’t tell you the controller name at the
moment (but tomorrow, if you need it).

Dominik

Dominik_A · February 11, 2009, 10:27pm

Hi!

An additional note: using usrp_siggen.py with sine, const and uniform at
8 MHz bandwidth actually works. It is unlikely that my EHCI controller
does not support 32 MB/s on transmit.

Could this be a timing problem? I mean, that the data is generated very
fast, but then the generator waits, e.g. because the buffer is full.
Does the double buffering of the TPB scheduler work as intended? Using
the STS scheduler with usrp_siggen didn’t change anything.

Summary:
The application supports 12.5 complex MS/s (100 MB/s) if using USRP2,
but can’t sustain 8 complex MS/s with USRP1, even though usrp_siggen.py
does support 8 MS/s with the generators sine, const and uniform on the
USRP1 (and test_usrp_standard_tx estimates an achievable rate of 32
MB/s). Furthermore, this behavior shows up on 2 different machines.

Do you have an idea how I could benchmark the application, e.g. to
characterize the stream timing in front of the USRP?

Best regards
Dominik

Dominik_A · February 12, 2009, 12:29am

On Wed, Feb 11, 2009 at 6:03 PM, Eric B. wrote:

Are you really trying to use the Gaussian PRNG? If so you’ll have to
fix it. If you look at the code for it, you’ll see that it samples a
distribution until it gets something it likes.

A classical reference for fast generation of random numbers under
various distributions is

Lorrain, D. 1980. “A Panoply of Stochastic ‘Cannons’.” Computer Music
Journal 4(1)

The common shortcut to Gaussian random sequences is to sum some number
of uniform variates, usually 12, for each Gaussian output.
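
The sum-of-uniforms shortcut can be sketched in a few lines of Python. This is only an illustration of the idea, not code from GNU Radio; the function name approx_gaussian is made up here. Each uniform(0, 1) variate has mean 1/2 and variance 1/12, so the sum of 12 of them has mean 6 and variance 1.

```python
import random
import statistics


def approx_gaussian(rng=random):
    """Approximate one N(0, 1) variate from 12 uniform(0, 1) variates.

    The sum of 12 uniforms has mean 6 and variance 1, so subtracting 6
    centers the result on zero. The output is limited to [-6, 6].
    """
    return sum(rng.random() for _ in range(12)) - 6.0


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed only to make runs repeatable
    samples = [approx_gaussian(rng) for _ in range(100000)]
    print("mean :", statistics.mean(samples))
    print("stdev:", statistics.pstdev(samples))
```

The point for a real-time transmit path is that every output costs exactly 12 uniform draws, so the average and worst-case run times are the same. The price is that the tails are cut off at 6 standard deviations.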

Frank

Dominik_A · February 12, 2009, 1:54am

On Wed, Feb 11, 2009 at 06:28:16PM -0500, Frank B. wrote:

Frank
Thanks, Frank!

Eric

Dominik_A · February 12, 2009, 12:04am

On Wed, Feb 11, 2009 at 10:26:25PM +0100, Dominik A. wrote:

Hi!

An additional note: using usrp_siggen.py with sine, const and uniform at
8 MHz bandwidth actually works. It is unlikely that my EHCI controller
does not support 32 MB/s on transmit.

OK.

Could this be a timing problem? I mean, that the data is generated very
fast, but then the generator waits, e.g. because the buffer is full.
Does the double buffering of the TPB scheduler work as intended? Using
the STS scheduler with usrp_siggen didn’t change anything.

Yes. Double buffering works.

Summary:
The application supports 12.5 complex MS/s (100 MB/s) if using USRP2,

Uhh, 12.5 MS/s is 50MB/s (16-bit I&Q across the wire).

but can’t sustain 8 complex MS/s with USRP1, even though usrp_siggen.py
does support 8 MS/s with the generators sine, const and uniform on the
USRP1 (and test_usrp_standard_tx estimates an achievable rate of 32
MB/s). Furthermore, this behavior shows up on 2 different machines.
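
The byte rates discussed here follow from 16-bit I plus 16-bit Q, i.e. 4 bytes per complex sample on the wire. A small check, assuming MB means 10^6 bytes (the helper name is made up for this sketch):

```python
BYTES_PER_COMPLEX_SAMPLE = 4  # 16-bit I + 16-bit Q across the wire


def wire_rate_mb_per_s(complex_msps):
    """Convert a rate in complex MS/s to MB/s (1 MB = 10**6 bytes)."""
    return complex_msps * BYTES_PER_COMPLEX_SAMPLE


print(wire_rate_mb_per_s(12.5))  # USRP2 case from the thread
print(wire_rate_mb_per_s(8))     # USRP1 case from the thread
```

So 12.5 MS/s corresponds to 50 MB/s, and 8 MS/s corresponds to 32 MB/s, which matches the 3.2e+07 bytes/sec reported by test_usrp_standard_tx.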

Do you have an idea how I could benchmark the application, e.g. to
characterize the stream timing in front of the USRP?

Yes, there are lots of ways to do this. In this particular case,
you’re going to want to keep track of the worst case and average run
times.

Are you really trying to use the Gaussian PRNG? If so you’ll have to
fix it. If you look at the code for it, you’ll see that it samples a
distribution until it gets something it likes. The worst case time
can be huge. The difference between the USRP1 and the USRP2 is the
amount of buffering available in the Tx path in the kernel and on the
board.
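
To illustrate what "samples a distribution until it gets something it likes" means for run time, here is a sketch of a rejection-style Gaussian generator. It uses the Marsaglia polar method as a stand-in; it is an assumption that this resembles the generator in question, and it is not copied from the GNU Radio source. The function returns the number of loop iterations so the spread can be observed.

```python
import math
import random


def polar_gaussian(rng):
    """Return (value, tries) using the Marsaglia polar method.

    A point is drawn in the square [-1, 1) x [-1, 1) and rejected
    unless it falls inside the unit circle, so the number of loop
    iterations per output is random and has no fixed upper bound.
    """
    tries = 0
    while True:
        tries += 1
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:
            return u * math.sqrt(-2.0 * math.log(s) / s), tries


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make runs repeatable
    tries = [polar_gaussian(rng)[1] for _ in range(100000)]
    print("average tries per output:", sum(tries) / len(tries))
    print("worst case seen         :", max(tries))
```

The average number of tries is small, but individual outputs can take several times longer, and that variation is what matters when little buffering is available downstream.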

If you’re not using the Gaussian PRNG, I suggest that you stop worrying
about it. Virtually all of the signal processing algs we run are
designed so that there’s not much difference between the average and
worst cases.

Eric

Dominik_A · February 12, 2009, 11:09am

Hi!

Thanks for your answer.
And thanks Frank B., too!

Uhh, 12.5 MS/s is 50MB/s (16-bit I&Q across the wire).
Sorry, my fault.

Yes, there are lots of ways to do this. In this particular case,
you’re going to want to keep track of the worst case and average run
times.
Hm, run times may not be the appropriate performance measure in my case.
The transmitter is of course designed to run continuously (until I
interrupt it). What about interarrival times? I once had the idea to
record every buffer update with a timestamp, the difference in the
number of samples, and the processor the task is currently running on. Do
you think that these records could help reveal the reason for the
underruns in my transmitter code?
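
A rough sketch of such an interarrival log, in plain Python. The class and method names are hypothetical and nothing here is a GNU Radio API; recording the current processor is left out. The clock is injectable so the logic can be checked without real timing.

```python
import time


class ArrivalLog:
    """Collect (timestamp, sample_count) records and summarize the gaps."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.records = []

    def mark(self, nsamples):
        """Record one buffer update of nsamples samples at the current time."""
        self.records.append((self._clock(), nsamples))

    def gaps(self):
        """Return the time differences between consecutive records."""
        times = [t for t, _ in self.records]
        return [b - a for a, b in zip(times, times[1:])]

    def summary(self):
        """Return mean and worst-case gap, or None with fewer than 2 records."""
        g = self.gaps()
        if not g:
            return None
        return {"mean": sum(g) / len(g), "worst": max(g)}
```

Calling mark() once per buffer update and comparing the worst gap against the time the downstream buffering can cover would show whether occasional long stalls, rather than average throughput, explain the underruns.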

Are you really trying to use the Gaussian PRNG? If so you’ll have to
fix it. If you look at the code for it, you’ll see that it samples a
distribution until it gets something it likes. The worst case time
can be huge. The difference between the USRP1 and the USRP2 is the
amount of buffering available in the Tx path in the kernel and on the
board.
No. I used usrp_siggen as the example because it is widely available, in
case someone wants to reproduce the odd behavior. You may understand that
I can’t hand out my transmitter code. I could strip it down to the parts
that reproduce the behavior, but since I don’t know what the cause could
be, this would become an undirected search. I simply hoped that the
underruns with the Gaussian PRNG were caused by the same problem.

Would a bigger buffer for the USRP1 change the behavior? Am I right
that setting fusb_nblocks etc. changes the buffer size?

I have just confirmed that the Gaussian PRNG can’t send at a bandwidth
of near 8 MHz with the USRP2. That was definitely a bad example.

I will try to perform some measurements next week. Are there any
gnuradio blocks or utilities available to find the average and worst
cases? Oprofile will sample the whole application, not only the link
between my last block and the USRP1 sink.
For your information, I was measuring the throughput with a modified
gr.throttle block. Instead of delaying the stream, I compute the
instantaneous rate/throughput and average it with a simple IIR filter
(the rate estimate).
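
The rate estimate described above could look roughly like the following single-pole IIR average. This is a plain-Python sketch under assumptions, not the modified gr.throttle block itself; the class name and the alpha parameter are invented for illustration.

```python
class RateEstimator:
    """Smooth the instantaneous sample rate with a single-pole IIR filter."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha        # smoothing factor, 0 < alpha <= 1
        self.rate = None          # current estimate in samples per second
        self._last_time = None

    def update(self, now, nsamples):
        """Feed one buffer of nsamples that arrived at time now (seconds)."""
        if self._last_time is not None and now > self._last_time:
            inst = nsamples / (now - self._last_time)
            if self.rate is None:
                self.rate = inst
            else:
                self.rate += self.alpha * (inst - self.rate)
        self._last_time = now
        return self.rate
```

Note that an averaged rate hides short stalls by design, so it says little about the worst case.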

Thank you for your help.

Best regards
Dominik

Dominik_A · February 12, 2009, 5:34pm

On Thu, Feb 12, 2009 at 11:07:43AM +0100, Dominik A. wrote:

number of samples and the current processor the task is running on. Do
you think that these samples may help to reveal the reason for the
underruns in my transmitter code?

Would a bigger buffer for the USRP1 change the behavior? Am I right
that setting fusb_nblocks etc. changes the buffer size?

You can try that, though on Linux the defaults are pretty big already.

I have just confirmed that the Gaussian PRNG can’t send at a bandwidth
of near 8 MHz with the USRP2. That was definitely a bad example.

…

I will try to perform some measurements next week. Are there any
gnuradio blocks or utilities available to find the average and worst
cases? Oprofile will sample the whole application, not only the link
between my last block and the USRP1 sink.
For your information, I was measuring the throughput with a modified
gr.throttle block. Instead of delaying the stream, I compute the
instantaneous rate/throughput and average it with a simple IIR filter
(the rate estimate).

I wouldn’t try to use gr.throttle for this. I suggest running your
flow graph with a known amount of known input and throwing the output
into a null sink. Then measure the wall and cpu times.

$ time <my_application>

or you could insert a gr.head(…) immediately before the null sink
which will stop the graph after it’s copied N samples into the null
sink. In either case, you’ve got a graph that will process a known
amount of input and then exit.

You can get time measurements that avoid most of the setup overhead by
just measuring fg.run(). Check the python docs for functions that
measure wall and cpu time.
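
A minimal sketch of such a measurement. It uses time.perf_counter() and time.process_time() from current Python 3, which is an assumption about the environment (they did not exist in the Python versions of the time). The helper time_call and the stand-in workload busy_work are made up for this example; in practice the callable would be the flow graph's run method, e.g. fg.run.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn and return (result, wall_seconds, cpu_seconds).

    process_time() counts CPU time of the whole process, including
    all of its threads, so it can exceed the wall time.
    """
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return result, wall, cpu


def busy_work(n):
    """Stand-in workload; replace with the flow graph's run method."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    result, wall, cpu = time_call(busy_work, 200000)
    print("wall: %.4f s, cpu: %.4f s" % (wall, cpu))
```

If the graph stops after N samples, N divided by the wall time gives the average rate the graph sustains into the null sink.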

If your code can’t, on average, generate the required amount of
output in the required time, then you’ve got some work to do. If you
think you’ve got a case where the average and the worst cases vary
widely, I suspect that the easiest way to go after that is to think
about it! You wrote the code, right? You know the expected and worst
case complexity for each block, right? If not, spend some time with
Knuth, then think about it some more…

Eric