UHD Performance: Reaching 8Msps TX with USB 2.0 (?)

Hi fellows,

I was wondering whether anybody has tried to reach 8 complex Msps over
USB 2.0 on the TX path.
While this has always been fine with the old libusrp (and USRP1), it no
longer appears to be possible with UHD, either on USRP1 (a few
underruns) or on B100 (lots of underruns).

Everything appears fine on the RX path, however.

Is there any workaround to this?

…or did I miss something?

thanks everybody

PS
USB 3.0 seems to be capable enough for the 8 Msps.
Is USB3.0 a requirement for 8 Msps on the B100?

____________________________________________________________B100

./benchmark_rate --tx_rate 8e6
linux; GNU C++ version 4.6.1 20110908 (Red Hat 4.6.1-9); Boost_104600;
UHD_003.004.000-325-g7e296167

Creating the usrp device with: …
– USRP-B100 clock control: 10
– r_counter: 2
– a_counter: 0
– b_counter: 20
– prescaler: 8
– vco_divider: 5
– chan_divider: 5
– vco_rate: 1600.000000MHz
– chan_rate: 320.000000MHz
– out_rate: 64.000000MHz

Using Device: Single USRP:
Device: B-Series Device
Mboard 0: B100 (B-Hundo)
RX Channel: 0
RX DSP: 0
RX Dboard: A
RX Subdev: WBX RX v3 + Simple GDB
TX Channel: 0
TX DSP: 0
TX Dboard: A
TX Subdev: WBX TX v3 + Simple GDB

Testing transmit rate 8.000000 Msps
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
Benchmark rate summary:
Num received samples: 0
Num dropped samples: 0
Num overflows detected: 0
Num transmitted samples: 79931260
Num sequence errors: 0
Num underflows detected: 406

Done!

./benchmark_rate --tx_rate 8e6 --tx_otw sc16
linux; GNU C++ version 4.6.1 20110908 (Red Hat 4.6.1-9); Boost_104600;
UHD_003.004.000-325-g7e296167

Creating the usrp device with: …
– USRP-B100 clock control: 10
– r_counter: 2
– a_counter: 0
– b_counter: 20
– prescaler: 8
– vco_divider: 5
– chan_divider: 5
– vco_rate: 1600.000000MHz
– chan_rate: 320.000000MHz
– out_rate: 64.000000MHz

Using Device: Single USRP:
Device: B-Series Device
Mboard 0: B100 (B-Hundo)
RX Channel: 0
RX DSP: 0
RX Dboard: A
RX Subdev: WBX RX v3 + Simple GDB
TX Channel: 0
TX DSP: 0
TX Dboard: A
TX Subdev: WBX TX v3 + Simple GDB

Testing transmit rate 8.000000 Msps
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
Benchmark rate summary:
Num received samples: 0
Num dropped samples: 0
Num overflows detected: 0
Num transmitted samples: 79890620
Num sequence errors: 0
Num underflows detected: 696

Done!

____________________________________________________________USRP 1

./benchmark_rate --tx_rate 8e6 --tx_otw sc16
linux; GNU C++ version 4.6.1 20110908 (Red Hat 4.6.1-9); Boost_104600;
UHD_003.004.000-325-g7e296167

*** Warning! ***
Benchmark results will be inaccurate on USRP1 due to insufficient features.

Creating the usrp device with: …
– Opening a USRP1 device…
– Using FPGA clock rate of 64.000000MHz…
Using Device: Single USRP:
Device: USRP1 Device
Mboard 0: USRP1 (Classic)
RX Channel: 0
RX DSP: 0
RX Dboard: B
RX Subdev: WBX RX v2 + Simple GDB
TX Channel: 0
TX DSP: 0
TX Dboard: B
TX Subdev: WBX TX v2 + Simple GDB

Testing transmit rate 8.000000 Msps
UUUUU
Benchmark rate summary:
Num received samples: 0
Num dropped samples: 0
Num overflows detected: 0
Num transmitted samples: 80022656
Num sequence errors: 0
Num underflows detected: 5

Done!
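
To put the three summaries above on a common scale, here is a minimal Python sketch that converts each underflow count into an approximate underflows-per-second figure. It assumes the run duration can be estimated as transmitted samples divided by the nominal 8 Msps rate, which is only roughly true when underflows occur, and the run labels are made up for this sketch. The USRP1 figure also carries the accuracy warning that benchmark_rate printed.

```python
# Underflow counts and transmitted-sample totals copied from the three
# benchmark_rate summaries above; the labels are only for this sketch.
TX_RATE_SPS = 8e6

runs = {
    "B100 (no --tx_otw given)": (79931260, 406),
    "B100 sc16": (79890620, 696),
    "USRP1 sc16": (80022656, 5),
}


def underflows_per_second(num_samples, num_underflows, rate_sps=TX_RATE_SPS):
    """Estimate underflows per second, taking duration as samples / rate."""
    duration_s = num_samples / rate_sps
    return num_underflows / duration_s


for label, (samples, underflows) in runs.items():
    rate = underflows_per_second(samples, underflows)
    print(f"{label}: {rate:.1f} underflows/s")
```

By this estimate the B100 runs underflow tens of times per second, while the USRP1 run underflows roughly once every two seconds.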

____________________________________________________________everything fine with 8bit samples

./benchmark_rate --tx_rate 8e6 --tx_otw sc8
linux; GNU C++ version 4.6.1 20110908 (Red Hat 4.6.1-9); Boost_104600;
UHD_003.004.000-325-g7e296167

– Loading firmware image: /usr/share/uhd/images/usrp_b100_fw.ihx… done
Creating the usrp device with: …
– USRP-B100 clock control: 10
– r_counter: 2
– a_counter: 0
– b_counter: 20
– prescaler: 8
– vco_divider: 5
– chan_divider: 5
– vco_rate: 1600.000000MHz
– chan_rate: 320.000000MHz
– out_rate: 64.000000MHz

– Loading FPGA image: /usr/share/uhd/images/usrp_b100_fpga.bin… done
Using Device: Single USRP:
Device: B-Series Device
Mboard 0: B100 (B-Hundo)
RX Channel: 0
RX DSP: 0
RX Dboard: A
RX Subdev: WBX RX v3 + Simple GDB
TX Channel: 0
TX DSP: 0
TX Dboard: A
TX Subdev: WBX TX v3 + Simple GDB

Testing transmit rate 8.000000 Msps

Benchmark rate summary:
Num received samples: 0
Num dropped samples: 0
Num overflows detected: 0
Num transmitted samples: 80053688
Num sequence errors: 0
Num underflows detected: 0

Done!
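
One possible reading of the sc16 versus sc8 results above is plain bus load. The Python sketch below estimates the payload rate for each wire format, assuming sc16 means 16-bit I plus 16-bit Q (4 bytes per complex sample) and sc8 means 8-bit I plus 8-bit Q (2 bytes). The 480 Mbit/s figure is the USB 2.0 high-speed signalling rate; usable bulk throughput is lower and depends on the host controller, so the percentages are an optimistic bound rather than a measurement.

```python
# Assumed bytes per complex sample for each over-the-wire format.
BYTES_PER_SAMPLE = {"sc16": 4, "sc8": 2}

# USB 2.0 high-speed signalling rate in bytes per second (raw, not payload).
USB2_RAW_BYTES_PER_SEC = 480e6 / 8


def payload_rate(sample_rate_sps, otw_format):
    """Return the sample payload rate in bytes per second."""
    return sample_rate_sps * BYTES_PER_SAMPLE[otw_format]


def bus_fraction(sample_rate_sps, otw_format):
    """Return the payload rate as a fraction of the raw USB 2.0 rate."""
    return payload_rate(sample_rate_sps, otw_format) / USB2_RAW_BYTES_PER_SEC


for fmt in ("sc16", "sc8"):
    rate = payload_rate(8e6, fmt)
    share = 100 * bus_fraction(8e6, fmt)
    print(f"{fmt}: {rate / 1e6:.1f} MB/s, about {share:.0f}% of the raw bus rate")
```

Under these assumptions sc16 at 8 Msps needs 32 MB/s and sc8 needs 16 MB/s, so halving the sample width halves the load on the bus and on the host-side USB processing.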

On Wed, Mar 21, 2012 at 11:42 AM, Vincenzo P. wrote:

Everything appears instead fine on the Rx path
Is USB3.0 a requirement for 8 Msps on the B100?

Look for other devices on that USB bus using lsusb. Avoid sharing the bus
with other peripherals (Bluetooth, WLAN, etc.). You can also modify the
transport parameters using
--args=recv_frame_size=xxxxx,send_frame_size=xxxxx. This will give you the
same control over receive and send frame size that the old USRP1 drivers
had. The default receive/send frame sizes are 16K, which seems to work OK
on most machines.
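
As a rough illustration of trying several frame sizes, the Python sketch below only builds and prints the benchmark_rate command lines for a sweep; it does not run them. The helper names and the list of sizes are hypothetical, and reading "16K" as 16384 bytes is an assumption. Which sizes a given device and host actually accept is not stated in this thread.

```python
import shlex

BENCHMARK = "./benchmark_rate"  # path as used in the runs earlier in the thread


def device_args(frame_size):
    """Build a device-args string with equal receive and send frame sizes."""
    return f"recv_frame_size={frame_size},send_frame_size={frame_size}"


def benchmark_command(frame_size, tx_rate="8e6", otw="sc16"):
    """Return the argv list for one benchmark_rate run."""
    return [
        BENCHMARK,
        "--tx_rate", tx_rate,
        "--tx_otw", otw,
        "--args=" + device_args(frame_size),
    ]


# Hypothetical sweep values; 16384 is assumed to match the 16K default.
for size in (4096, 8192, 16384, 32768):
    print(" ".join(shlex.quote(arg) for arg in benchmark_command(size)))
```

Each printed line can be pasted into a shell, and the underflow counts in the resulting summaries compared across sizes.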

For comparison, the USB host controller I’m using is the Intel 6
Series/C200, and on B100 I can use a TX send rate of 10.67 Msps without
underflow, although underflows occasionally occur at the very beginning
of the transmission (likely due to interrupt coalescing on the USB
controller).

I also have a USB 3.0 controller (an NEC Corporation uPD720200) on this
laptop, which fares more poorly but still easily achieves 8 Msps. I don’t
have a good explanation as to why some USB controllers do better than
others. USB 3.0 is certainly not required on B100/USRP1, as neither
device uses USB 3.0.

–n

Hi Nick, thanks for the suggestions.
I will test the args. What is the best (maximum?) possible value for
send_frame_size in order to minimize the overhead introduced by UHD?

Would it be correct to assume that the over-the-wire overhead introduced
by UHD is larger than what the classic libusrp used to impose? If so, by
how much?

The USB peripheral configuration is the same whether I use classic
libusrp or UHD. Also, the difference in the number of underflows between
tx_samples_from_file (UHD) and an equivalent classic libusrp-based
utility is huge, with the same USB controller, the same hard drive, and
the same OS (Fedora 16) in both cases. I’m actually using the very same
machine for both tests.

A friend of mine here in Pisa (Mario di Dio, he’s also on the list) has
obtained the same results on both Ubuntu 11.10 and Fedora 14. He had
almost no underruns, apart from some at the very beginning, when he used
his USB 3.0 port, and lots of underruns when using the USB 2.0 ports of
the same laptop.

I think I’m seeing something macroscopic, maybe a macroscopic mistake of
mine. May I know which version of UHD and which OS you are using?

Sorry for the many questions; I’m just trying to figure out what I might
be missing in order to use UHD properly for my purposes.

thanks

On 21 March 2012 at 19:59, Nick F. wrote:

My sense is that a couple of things are “in play” in these scenarios:

o UHD seems a little better at reporting under/overflow than “classic”

o UHD consumes a slightly larger amount of CPU in some critical parts of
the USB processing than “classic”, which means that situations that may
have been marginal before are now over the edge.

I’m also not sure what makes a “good” USB controller and a “not so good”
USB controller.

-Marcus


On 03/21/2012 05:45 PM, Vincenzo P. wrote:

Hi Nick, thanks for the suggestions.
I will test the args. What is the best (maximum?) possible value for
the send_frame_size in order to minimize the overhead yielded by UHD?

Would it be correct to assume that the over-the-wire overhead yielded
by UHD is larger than what the classic libusrp used to impose? If yes,
by what scale?

No, the over-the-wire format for USRP1 hasn’t changed in years and
years. UHD simply makes what’s always been there look more “UHD like”.
Neither the USRP1 firmware nor the FPGA images have changed in a long
time.


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium