High packet loss problem (samples dropped?) and fusb paramet

Hi,

I’m running some packet transmission experiments using GNU Radio and
USRP. However, the packet loss ratio is very high (20%) even when I
directly connect two USRP boards, which are the transmitter and the
receiver, with a direct antenna cable (I’ve also tried using regular
antennas, but the packet loss ratio is about the same). I suspect that
the samples are dropped somewhere from usrp_sink -> usrp board ->
physical channel -> usrp board -> usrp_source, but I couldn’t pinpoint
the source of the problem. I was hoping that someone on the mailing list
might have an idea about what’s causing the problem.

I’m positive that the signal processing blocks are working because if
I replace usrp_sink with file_sink on the transmitter and replace
usrp_source with file_source on the receiver, the packet loss ratio
immediately drops to zero.

I also tried using usrp_rx_cfile.py in gnuradio-examples to log the
signal in a file, then feed the file into the receiver side. It still
shows about 20% of packet losses. If I lower the data rate (hence the
sampling rate), the packet loss ratio gradually drops. But even when
I’m using the smallest data rate setting, it still shows some packet
losses.

One interesting thing I noticed is that when I increased fusb_nblock
and fusb_block_size, it shows a lot of buffer overruns (‘uO’). When I’m
using the default settings, these uO disappear. Why is this happening?
However, the packet loss ratio is high in both cases. I also
tried enabling real-time scheduling, but it doesn’t have much effect.

I’m using GNU Radio 3.0.3 release with Fedora Core 6. I’ve also tried
Debian 4.0r0, but the results are similar. I’ve already tried running
the same code on 3 different machines, all with similar results.

I feel that I have exhausted everything I can think of that could go
wrong. I would greatly appreciate any suggestions/comments from
the mailing list. Many thanks.

-Michael


Hsin-Mu (Michael) Tsai
Ph.D. Student
Electrical and Computer Engineering Department
Carnegie Mellon University
E-Mail: [email protected]

One interesting thing I noticed is that when I increased fusb_nblock
and fusb_block_size, it shows a lot of buffer overruns (‘uO’). When I’m
using the default settings, these uO disappear. Why is this happening?
However, the packet loss ratio is high in both cases. I also
tried enabling real-time scheduling, but it doesn’t have much
effect.

uO means “USRP Overrun”. That is, the USRP is sending data to the
computer faster than the computer can handle it. This probably means
that you are sending at too high a data rate – the online decoder is
taking too much CPU. When you start logging with the file_sink, then
this CPU (and memory, etc.) overhead disappears and all samples are
logged. Then when the logged samples are replayed, your receiver has
ample CPU time to work.

Fixes to decrease CPU load are:

  1. Decrease the data rate (you tried this, and it worked!)
  2. Insert a pwr_squelch_cc block with gating on to decode only packets
    and not all random noise. Note this requires some parameter tuning to
    get it to work properly.
  3. Optimize the receiver.
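
The gating idea in item 2 can be sketched in plain Python. This is not the pwr_squelch_cc block itself, only a minimal illustration of the same principle, assuming a single-pole average of the sample power and a threshold in dB. The function name, the alpha value, and the test signal are made up for the example.

```python
def power_squelch(samples, threshold_db, alpha=0.1, gate=True):
    """Pass samples only while the smoothed power exceeds threshold_db.

    Illustrative only: the parameter names mirror the idea of a power
    squelch with gating, not the exact GNU Radio implementation.
    """
    threshold = 10.0 ** (threshold_db / 10.0)  # dB -> linear power
    avg_power = 0.0
    out = []
    for s in samples:
        # Single-pole IIR average of the instantaneous power |s|^2.
        avg_power = alpha * abs(s) ** 2 + (1.0 - alpha) * avg_power
        if avg_power >= threshold:
            out.append(s)      # squelch open: pass the sample through
        elif not gate:
            out.append(0j)     # squelch closed, gating off: emit zeros
        # squelch closed, gating on: emit nothing at all
    return out


# Hypothetical test signal: weak noise floor, one strong burst, noise again.
noise = [0.01 + 0.01j] * 50
burst = [1.0 + 0.0j] * 20
kept = power_squelch(noise + burst + noise, threshold_db=-10.0)
print(len(kept), "of", 120, "samples passed the squelch")
```

With gating on, the downstream decoder only sees samples around the burst, which is where the CPU saving would come from. The threshold and alpha still need tuning against the real noise floor.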

Once you’ve eliminated the uO’s, if there are still packet losses, note
that the default GNU Radio receivers are not perfect or even necessarily
that great… there might just be some problems recovering from the
default channel distortions. Finally, you can also play with the gain on
each end to attempt to improve the SNR. Which modulation scheme are you
using with which boards at which frequency?

Also, just to make sure, when you connect two USRPs directly make sure
to use sufficient attenuation (~40dB I think, check the archives for
emails from Matt E. for the actual number) with any of the RFX boards
to avoid damage to the receiver!
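
As a rough sanity check on the attenuation advice, the level reaching the receiver is just the transmit power minus the attenuation in dB. The sketch below assumes a placeholder transmit power of 20 dBm. That is not a datasheet figure for any RFX board, and the safe input level has to come from the archives or the board documentation.

```python
def received_power_dbm(tx_power_dbm, attenuation_db, cable_loss_db=0.0):
    """Power at the receiver input for a cabled link, in dBm."""
    return tx_power_dbm - attenuation_db - cable_loss_db


def dbm_to_mw(power_dbm):
    """Convert dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


# Placeholder numbers for illustration only.
tx_dbm = 20.0
for atten_db in (0.0, 20.0, 40.0):
    rx_dbm = received_power_dbm(tx_dbm, atten_db)
    print("attenuation %4.0f dB -> %6.1f dBm (%.4f mW) at the receiver"
          % (atten_db, rx_dbm, dbm_to_mw(rx_dbm)))
```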

-Dan

Thanks for the quick response. I really appreciate it.

uO means “USRP Overrun”. That is, the USRP is sending data to the
computer faster than the computer can handle it. This probably means
that you are sending at too high a data rate – the online decoder is
taking too much CPU. When you start logging with the file_sink, then
this CPU (and memory, etc.) overhead disappears and all samples are
logged. Then when the logged samples are replayed, your receiver has
ample CPU time to work.

I don’t think my problem is related to the CPU. Here are my reasons:

  1. I tried overclocking my CPU from 2.6 GHz to 3.2 GHz to see whether
    an increase in CPU performance would decrease the packet loss ratio.
    However, no significant change was observed. I’m using an Intel Core
    2 Extreme QX6700 at 2.6 GHz, which is one of the most powerful CPUs we
    can get currently.

  2. I guess I wasn’t very clear in my first e-mail. I tried two
    different kinds of setting with file_sink and file_source:

a) TX_blocks -> usrp_sink -> usrp board 1 --physical_channel-->
usrp board 2 -> usrp_source -> file_sink
then decode the signal with
file_source -> RX_blocks

b) TX_blocks -> file_sink
then decode the signal with
file_source -> RX_blocks

a) still has similar results while b) has 0 packet loss. a) doesn’t
involve real-time decoding (the CPU should have ample time to work
with the data), but it still has a lot of packet losses.

  3. I’ve tried the power squelch filter too. The results with and
    without it are similar.

Once you’ve eliminated the uO’s, if there are still packet losses, note
that the default GNU radio receivers are not perfect or even necessarily
that great… there might just be some problems recovering from the
default channel distortions. Finally, you can also play with the gain on
each end to attempt to improve the SNR. Which modulation scheme are you
using with which boards at which frequency?

One of the weirdest things is that even when I’m using only usrp_source
-> file_sink, I still get a lot of uO with certain
fusb_nblock and fusb_block_size settings (and I believe my machine is
fast enough; I even tried saving the data to a file on a ramdisk).
How do these settings affect the probability of a USRP overrun? If I
want to avoid samples being dropped, what settings should I use?
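
One way to reason about these two parameters is to work out how much signal time the host-side USB buffering holds. The sketch below assumes that fusb_block_size is a size in bytes, that the total buffering is fusb_block_size times fusb_nblock, and that each complex sample takes 4 bytes over USB (16-bit I plus 16-bit Q). The settings in the loop are illustrative, not recommended values, and the arithmetic does not by itself explain why larger buffers produced more uO here.

```python
BYTES_PER_COMPLEX_SAMPLE = 4  # assumed: 16-bit I + 16-bit Q over USB


def fusb_buffer_ms(fusb_block_size, fusb_nblock, sample_rate):
    """Milliseconds of signal held by the fast-USB buffers (rough estimate)."""
    total_bytes = fusb_block_size * fusb_nblock
    bytes_per_second = sample_rate * BYTES_PER_COMPLEX_SAMPLE
    return 1000.0 * total_bytes / bytes_per_second


# Illustrative settings at 4 Msamples/s complex.
for block_size, nblock in [(512, 16), (4096, 16), (16384, 32)]:
    ms = fusb_buffer_ms(block_size, nblock, sample_rate=4e6)
    print("block_size=%5d nblock=%2d -> %.3f ms of buffering"
          % (block_size, nblock, ms))
```

The result is the longest the reading process can stall before the buffers fill, which gives a number to compare against the scheduling latency of the machine.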

I’m using Thomas S.'s 802.15.4 blocks (OQPSK), which were posted on this
mailing list a while ago. I’m using an RFX2400 operating at 2.4 GHz.

Also, just to make sure, when you connect two USRPs directly make sure
to use sufficient attenuation (~40dB I think, check the archives for
emails from Matt E. for the actual number) with any of the RFX boards
to avoid damage to the receiver!

That’s good to know! (I didn’t know this.) Is there any way to make
sure that my board is not damaged?

Thanks again for your time.

-Michael


Hsin-mu Tsai wrote:

file_source → RX blocks

a) still has similar results while b) has 0 packet loss. a) doesn’t
involve real-time decoding (the CPU should have ample time to work
with the data), but it still has a lot of packet losses.

One of the weirdest things is that even when I’m using only usrp_source
→ file_sink, I still get a lot of uO with certain
fusb_nblock and fusb_block_size settings (and I believe my machine is
fast enough; I even tried saving the data to a file on a ramdisk).
How do these settings affect the probability of a USRP overrun? If I
want to avoid samples being dropped, what settings should I use?

Yeah, it sounds like it’s not CPU or disk overhead then. How about
USB? I don’t know much about the fast USB settings (search the archive
for comments from wiser minds than I), but you should definitely pick
settings that are free of ‘uO’s.

IIRC 802.15.4 is a 2 MHz (if you think of it as QPSK at twice the symbol
rate, that is) modulation scheme so this is going to be testing the
limits of USB if you sample at 2, 4, or 8 MHz complex (8, 16, 32 MB/s
respectively). Have you benchmarked your USB speed (I think
gnuradio-examples/src/python/usrp/benchmark_usb is the accepted way to
do so these days)? Are there other USB devices attached to the computer?
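
The figures above follow from the sample rate and the sample size. A minimal sketch, assuming the USRP's 64 MS/s ADC rate, an integer decimation, and 4 bytes per complex sample (16-bit I and Q); the function name is made up for the example:

```python
ADC_RATE = 64e6  # USRP ADC sample rate in samples per second


def usb_rate_mb_per_s(decimation, bytes_per_sample=4):
    """USB throughput needed for one complex RX stream, in MB/s."""
    sample_rate = ADC_RATE / decimation
    return sample_rate * bytes_per_sample / 1e6


for decim in (32, 16, 8):
    print("decimation %2d -> %4.1f Msamples/s -> %4.1f MB/s"
          % (decim, ADC_RATE / decim / 1e6, usb_rate_mb_per_s(decim)))
```

Comparing these numbers with what the USB benchmark reports shows how much headroom a given decimation leaves.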

Also, just to make sure, when you connect two USRPs directly make sure
to use sufficient attenuation (~40dB I think, check the archives for
emails from Matt E. for the actual number) with any of the RFX boards
to avoid damage to the receiver!

That’s good to know! (I didn’t know this.) Is there any way to make
sure that my board is not damaged?

Not my area.

-Dan

On 7/25/07, Dan H. [email protected] wrote:

IIRC 802.15.4 is a 2 MHz (if you think of it as QPSK at twice the symbol
rate, that is) modulation scheme so this is going to be testing the
limits of USB if you sample at 2, 4, or 8 MHz complex (8, 16, 32 MB/s
respectively). Have you benchmarked your USB speed (I think
gnuradio-examples/src/python/usrp/benchmark_usb is the accepted way to
do so these days)? Are there other USB devices attached to the computer?

I suspect the problem is somehow related to USB too. But I couldn’t
figure out a way to isolate the problem. Could the interrupts
generated by the USB controller be lost in the OS? (I thought interrupt
delivery was reliable, but I’m not sure.)

I removed all the USB devices from the computer when I was doing the
experiment. The benchmark script said 32 MB/s throughput can be
achieved.

Thanks,
-Michael


Hsin-Mu (Michael) Tsai
Ph.D. Student
Electrical and Computer Engineering Department
Carnegie Mellon University
E-Mail: [email protected]
Web: http://www.ece.cmu.edu/~hsinmut/
Office: +1-412-268-4639

Hsin-mu Tsai wrote:

I don’t think my problem is related to the CPU. Here are my reasons:

  1. I tried overclocking my CPU from 2.6 GHz to 3.2 GHz to see whether
    an increase in CPU performance would decrease the packet loss ratio.
    However, no significant change was observed. I’m using an Intel Core
    2 Extreme QX6700 at 2.6 GHz, which is one of the most powerful CPUs we
    can get currently.

Have you tried running the latest low-latency kernel, with
modified settings in /etc/security/limits.conf?

Frank


Managing developers is like herding cats.
Managing volunteer developers is like herding bats, in the dark.

Have you tried running the latest low-latency kernel, with
modified settings in /etc/security/limits.conf?

Frank

This is my limits.conf

@usrp - rtprio 90
@usrp - memlock 2048000
@usrp - nice -19

The user running the code is in usrp group.
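
A quick way to confirm that limits like these actually reached the process is to read them back from inside Python. This is a sketch using the standard `resource` module on Linux; the helper name is made up. Note two unit differences: memlock is given in KB in limits.conf but reported in bytes here, and RLIMIT_NICE is reported as 20 minus the nice ceiling (so a nice limit of -19 shows up as 39).

```python
import resource


def show_limit(name):
    """Return (soft, hard) for a resource limit, or None if unsupported."""
    const = getattr(resource, name, None)
    if const is None:
        return None  # this platform does not expose the limit
    return resource.getrlimit(const)


# Limits corresponding to the rtprio, memlock and nice lines.
for name in ("RLIMIT_RTPRIO", "RLIMIT_MEMLOCK", "RLIMIT_NICE"):
    print(name, show_limit(name))
```

If the values printed from the shell that launches the flow graph do not match limits.conf, the group membership or the PAM limits module is the first thing to check.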

I’m running kernel 2.6.20-0119.rt8 from
http://people.redhat.com/mingo/realtime-preempt/.
(Ingo Molnar’s realtime preempt kernel)

The scheduling policy and priority should have been in place when I was
running the experiments.

Many thanks,
-Michael


Hsin-mu Tsai wrote:

(Ingo Molnar’s realtime preempt kernel)

That all looks well-formed. Are you then running the gnuradio
process with an explicit nice value?

Frank


Hsin-mu Tsai wrote:

I removed all the USB devices from the computer when I was doing the
experiment. The benchmark script said 32 MB/s throughput can be
achieved.

Sorry, I reread your original email … you’re now using settings where
there are no more (or very few) USRP overruns? Is that correct?

In that case it’s not the overruns, it’s the detector. I suspect there
is too much noise or the gain is set wrong. What reason do you have to
believe that 20% packet loss is not expected? What’s the SNR?

Especially at large symbol rates (like 2MHz), there is no reason to
expect that the error performance would be 0% packet loss without
actually knowing the SNR.

-Dan

I didn’t change the nice value (with top or nice, etc.). I thought
enabling the real-time setting (gr.enable_realtime_scheduling) in the
script would automatically change the priority to (max + min) / 2.

Are ‘priority’ and ‘nice’ the same thing?

Thanks,
-Michael

That all looks well-formed. Are you then running the gnuradio
process with an explicit nice value?

Frank


Sorry, I reread your original email … you’re now using settings where
there are no more (or very few) USRP overruns? Is that correct?

Yes. When using smaller fusb_nblock and fusb_block_size parameters, I
don’t see any ‘uO’ when running the script.

In that case it’s not the overruns, it’s the detector. I suspect there
is too much noise or the gain is set wrong. What reason do you have to
believe that 20% packet loss is not expected? What’s the SNR?

I was under the impression that when using a cable to connect TX and
RX boards, the SNR should be very high and the BER or PER should be
close to 0. Could it be that the SNR is too high and the signal is
saturated?
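
One way to test the saturation idea on a logged capture is to count how many samples sit at or near full scale. This is a sketch over a list of complex samples; the full-scale value depends on how the capture was scaled, so it is left as a parameter and the numbers below are hypothetical.

```python
def clipping_fraction(samples, full_scale, margin=0.98):
    """Fraction of samples whose I or Q part is within margin of full scale."""
    if not samples:
        return 0.0
    limit = margin * full_scale
    clipped = sum(1 for s in samples
                  if abs(s.real) >= limit or abs(s.imag) >= limit)
    return clipped / float(len(samples))


# Hypothetical capture normalized to a full scale of 1.0.
capture = [1.0 + 0.0j, 0.5 + 0.5j, 0.1 - 1.0j, 0.0 + 0.0j]
print("fraction of clipped samples:", clipping_fraction(capture, 1.0))
```

A fraction well above zero during packets would suggest lowering the TX amplitude or RX gain, or adding attenuation.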

I tried using antennas instead. The packet loss ratio is about the
same as when I’m using a direct antenna cable. That’s why I
believed the loss is not due to the physical channel. (I can’t be 100%
sure with this kind of logic, though.)

I don’t know how to measure the SNR with existing GNU Radio blocks. Is
there any fast way to do this?
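
A rough offline estimate is possible from a logged capture without any special block: compare the mean power while a packet is on the air with the mean power of a noise-only stretch. This is a sketch, assuming the two segments have already been cut out of the capture by hand and that signal and noise are uncorrelated; the function names are made up.

```python
import math


def mean_power(samples):
    """Average of |s|^2 over a list of complex samples."""
    return sum(abs(s) ** 2 for s in samples) / float(len(samples))


def estimate_snr_db(signal_plus_noise, noise_only):
    """SNR in dB from a packet segment and a noise-only segment."""
    p_total = mean_power(signal_plus_noise)
    p_noise = mean_power(noise_only)
    p_signal = p_total - p_noise  # valid if signal and noise are uncorrelated
    if p_signal <= 0.0 or p_noise <= 0.0:
        raise ValueError("segments do not give a usable power estimate")
    return 10.0 * math.log10(p_signal / p_noise)


# Synthetic segments: noise power 0.01, total power 1.01 during the packet.
noise_seg = [0.1 + 0.0j] * 100
packet_seg = [complex(math.sqrt(1.01), 0.0)] * 100
print("estimated SNR: %.1f dB" % estimate_snr_db(packet_seg, noise_seg))
```

This ignores distortion and clipping, so a high number here only rules out thermal noise as the cause of the losses.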

Especially at large symbol rates (like 2MHz), there is no reason to
expect that the error performance would be 0% packet loss without
actually knowing the SNR.

That’s usually due to insufficient channel coherence bandwidth. That
shouldn’t be the case for the antenna cable.

Many thanks,
-Michael


Maybe we can bring this thread up one more time. Michael found me on
campus and I’ve been trying to help with this.

We’re attempting to use the 802.15.4 (OQPSK) blocks that a UCLA wireless
group wrote, available here:
http://acert.ir.bbn.com/projects/gr-ucla/

Using file source/sinks, we can successfully encode/decode packets with
absolutely no packet loss. As soon as we connect the blocks with the
USRP, we experience relatively high packet loss (~2%) using coax (very
high SNR, low noise). I would expect 0% packet loss, as I see using
every single other modulation scheme in GNU Radio with benchmark_tx/rx.

We are generating absolutely no overrun/underrun between the transmitter
and receiver, so I’m not exactly sure what could be the problem here.

Any other ideas?

-George

Hsin-mu Tsai wrote:

I didn’t change the nice value (with top or nice, etc.). I thought
enabling the real time setting (gr.enable_realtime_scheduling) in the
script will automatically change priority to (max + min) /2.

I don’t actually know what the gr.enable_realtime_scheduling
function does; you’re probably right. On the other hand I’m not
sure that effect is sufficient to get what you want.

As I understand it, what low-latency/realtime scheduling
provides is more frequent opportunities for high-priority
processes to run. It’s sheer speculation, but I wonder whether your
gnuradio process isn’t competing with something else at the same
priority, so that giving your process a slight edge in priority
might take care of the problem. I spent a great deal of time a
couple of years ago trying to balance the JACK daemon against the
X server, without much success and with a lot of frozen mice. It’s
only the most recent patches that really give solid low-latency
performance with the current audio subsystems. The performance is
very good now, though.

Are ‘priority’ and ‘nice’ the same thing?

Niceness is an increment to the priority value. Nice-ing in the
negative direction improves the priority.
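
On Linux the two notions can be inspected separately: niceness applies to the normal time-sharing policy, while real-time policies such as SCHED_FIFO have their own priority range. The sketch below only reads values and changes nothing. The midpoint calculation mirrors the "(max + min) / 2" behaviour described earlier in the thread, which is taken from that description rather than checked against the GNU Radio source.

```python
import os


def realtime_midpoint(policy):
    """Return (min, max, midpoint) of the priority range for a policy."""
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    return lo, hi, (lo + hi) // 2


# Real-time priority range (only available on platforms with SCHED_FIFO).
if hasattr(os, "SCHED_FIFO"):
    lo, hi, mid = realtime_midpoint(os.SCHED_FIFO)
    print("SCHED_FIFO priority range %d..%d, midpoint %d" % (lo, hi, mid))

# os.nice(0) adds 0 to the niceness, i.e. it just reports the current value.
print("current niceness:", os.nice(0))
```

A process running under a real-time policy is scheduled ahead of all normal processes regardless of their nice values, so adjusting nice should matter mainly when real-time scheduling did not actually take effect.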


George N. wrote:

(very high SNR, low noise). I would expect 0% packet loss, as I see
using every single other modulation scheme in GNU Radio with
benchmark_tx/rx.

We are generating absolutely no overrun/underrun between the
transmitter and receiver, so I’m not exactly sure what could be the
problem here.

Any other ideas?

Does the code do frequency offset correction? What about timing?
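
To see why the frequency offset question matters at 2.4 GHz, it helps to put numbers on it. The sketch below assumes a hypothetical oscillator tolerance of 10 ppm on each board (not a measured or datasheet value) and uses the 2 Mchip/s chip rate of 2.4 GHz 802.15.4.

```python
def carrier_offset_hz(carrier_hz, tx_ppm, rx_ppm):
    """Worst-case carrier offset when the two oscillators err oppositely."""
    return carrier_hz * (abs(tx_ppm) + abs(rx_ppm)) * 1e-6


def phase_drift_deg(offset_hz, rate_hz):
    """Phase rotation accumulated over one chip (or symbol) period."""
    return 360.0 * offset_hz / rate_hz


# Hypothetical tolerances: 10 ppm on each board.
offset = carrier_offset_hz(2.4e9, tx_ppm=10.0, rx_ppm=10.0)
print("worst-case offset: %.0f Hz" % offset)
print("phase drift per chip at 2 Mchip/s: %.2f degrees"
      % phase_drift_deg(offset, 2e6))
```

A drift of several degrees per chip adds up quickly over a packet, so a receiver without offset correction could lose packets even at very high SNR. A file-to-file test has no offset at all, which would be consistent with the zero loss seen there.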

Matt