Large number of overflows

Hi All

I am trying a modified version of the digital-bert example for communication between two USRP2s, and I am getting a very large number of overflows (SSSS…) even with a receiver decimation rate of 20 and 4 samples per symbol (sometimes even with 20 samples/symbol). When I don't get overflows (as happened in one instance with decimation 20 and 20 samples/symbol), I capture the demodulated bits as 111111111111111111111111…, as expected for the example. With overruns, however, which seem to occur more often at lower samples per symbol and/or lower decimation values, I get a large number of bit errors.

My receiver flowgraph is of the form:

USRP2 Source --> RRC Filter --> Costas Loop --> Mueller and Muller Sync --> Complex to Real --> Binary Slicer --> Descrambler --> File Sink.
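
In case it helps, here is a rough sketch of that chain in GNU Radio Python. Block names use the gnuradio.digital / gnuradio.filter namespaces with a UHD source standing in for the usrp2 source, so constructor arguments may differ from the blocks I actually used, and every numeric value is a placeholder rather than what is in my real script:

    # Rough sketch only: placeholder parameters, UHD source in place of the
    # raw-Ethernet usrp2 source used in the original flowgraph.
    from gnuradio import gr, blocks, digital, uhd
    from gnuradio import filter as gr_filter
    from gnuradio.filter import firdes

    class bert_rx(gr.top_block):
        def __init__(self, samp_rate=4e6, sps=4):
            gr.top_block.__init__(self, "bert_rx")

            src = uhd.usrp_source("", uhd.stream_args(cpu_format="fc32"))
            src.set_samp_rate(samp_rate)

            # Root-raised-cosine matched filter, 11*sps taps as in my GRC graph
            rrc_taps = firdes.root_raised_cosine(1.0, samp_rate,
                                                 samp_rate / sps, 0.35, 11 * sps)
            rrc = gr_filter.fir_filter_ccf(1, rrc_taps)

            costas = digital.costas_loop_cc(0.05, 2)   # placeholder loop bw, BPSK
            mm = digital.clock_recovery_mm_cc(sps,      # omega = samples/symbol
                                              0.25 * 0.175 ** 2,
                                              0.5, 0.175, 0.005)
            c2r = blocks.complex_to_real()
            slicer = digital.binary_slicer_fb()
            descram = digital.descrambler_bb(0x8A, 0x7F, 7)  # placeholder polynomial
            sink = blocks.file_sink(gr.sizeof_char, "rx_bits.dat")

            self.connect(src, rrc, costas, mm, c2r, slicer, descram, sink)

    if __name__ == '__main__':
        tb = bert_rx()
        tb.run()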

The transmitter flowgraph uses the same blocks as per
digital-bert/transmit_path.py, but with a USRP2 sink.

I am transmitting over-the-air, and clocks are not synchronised between
Tx and Rx.

I have a gigabit Ethernet link, and 2 x 2 GHz CPUs in my PC, which is
running Ubuntu 9.10.

Can anyone suggest why I am getting so many overruns, and how I could get around this problem?

Thanks

Ian.

On 04/11/2010 09:22 PM, Ian H. wrote:

I am transmitting over-the-air, and clocks are not synchronised between
Tx and Rx.

I have a gigabit Ethernet link, and 2 x 2 GHz CPUs in my PC, which is
running Ubuntu 9.10.

Can anyone suggest why I am getting so many overruns, and how I could get around this problem?

These overflows indicate one of two things:

  • your flowgraph is too slow to execute in real time on your
    computer, or

  • you haven't enabled realtime scheduling.

Matt

Hi Matt

A colleague and I have created a C++ equivalent of the same flowgraph, with realtime scheduling enabled. We still get overruns at data rates above 2 Mbps, even on a Core i7 machine. We will try to make a multi-threaded version, which will hopefully resolve this, since our version is only single-threaded at this stage.

When using GRC to create the flowgraph, how can I check whether realtime scheduling is enabled, and/or enable it?

Thanks

Ian.

When using GRC to create the flowgraph, how can I check whether realtime scheduling is enabled, and/or enable it?

Select realtime scheduling in the options block. If your flow graph
fails to enable it at runtime, an error message is printed. -Josh
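
If memory serves, the generated code simply calls gr.enable_realtime_scheduling() before starting the graph and checks the return value, roughly like this hand-written sketch (not the exact GRC output):

    from gnuradio import gr

    tb = gr.top_block()
    # ... blocks get connected here ...

    # Ask for realtime priority before starting the graph.  On Linux this
    # usually also needs permission, e.g. an rtprio entry in
    # /etc/security/limits.conf for your user.
    if gr.enable_realtime_scheduling() != gr.RT_OK:
        print("Warning: failed to enable realtime scheduling")

    tb.run()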

On 04/22/2010 07:56 PM, Matt E. wrote:

Assuming you are using at least 2 samples per bit, you only have 800 cycles per sample to process them. This is certainly possible, but you will need to optimize your code.

How long are your filters? Are you using FFT-based filters instead of convolution-based ones? Is too much memory getting copied around?

For some perspective, here are some numbers based on USRP1 data.

My radio astronomy application runs fairly well at 10.6 Msps on a Core 2 Quad 9XXX (9770?) machine with 8 GB of memory, clocked at about 3.2 GHz.

My application does a 1 Hz-resolution FFT over the data (that's a 10.6M-point FFT!), computes the total power, and also does interference notch filtering using an FFT filter, plus SETI analysis, pulsar folding, and transient detection. It can keep up, but all 4 cores are pretty busy!

I think Matt's analysis is pretty close to the mark. One of the mistakes people make (and that I've also made) is to specify FIR filters with very narrow transition widths, which causes a very long filter to be created. Relaxing the "skirts" on the filter can dramatically reduce CPU consumption. I typically use filter "skirts" that are roughly 20-25% of the total bandwidth of the filter. In many applications, very tight filtering isn't a requirement for decent performance of the downstream demodulation, particularly when link margins are reasonably good anyway.
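
If you want to see this directly, ask firdes for the taps and compare lengths. A sketch using the gnuradio.filter.firdes namespace (in older trees the same call is gr.firdes.low_pass); the numbers are arbitrary, only the ratio of tap counts matters:

    from gnuradio.filter import firdes

    samp_rate = 2e6
    cutoff = 500e3

    tight = firdes.low_pass(1.0, samp_rate, cutoff, 0.01 * cutoff)    # 1% skirt
    relaxed = firdes.low_pass(1.0, samp_rate, cutoff, 0.25 * cutoff)  # 25% skirt

    print("tight skirt:   %d taps" % len(tight))
    print("relaxed skirt: %d taps" % len(relaxed))

The tap count goes roughly as the inverse of the transition width, and a time-domain FIR costs one multiply-accumulate per tap per output sample, so relaxing the skirt buys back CPU almost linearly.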


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On 04/22/2010 04:38 PM, Ian H. wrote:

Hi Matt

A colleague and I have created a C++ equivalent of the same flowgraph, with realtime scheduling enabled. We still get overruns at data rates above 2 Mbps, even on a Core i7 machine. We will try to make a multi-threaded version, which will hopefully resolve this, since our version is only single-threaded at this stage.

I am pretty sure that what you are seeing is that your application is
not keeping up. The USRP2 keeps sending data to the computer as fast as
it generates it. The Ethernet card DMAs it into a buffer in memory; your app consumes it and the driver then frees the buffer. If at some point the driver receives a frame and there is no free buffer for it, the packet is dropped and you'll see an "S". S stands for sequence number error, which is how the system can tell that there is a dropped packet. It is an overrun occurring in the computer, not in the hardware; the hardware will not overrun.
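
If it helps to see the mechanism in isolation, here is a toy model, plain Python and nothing to do with the real driver: a fixed-rate producer, a bounded buffer pool, and a too-slow consumer. The gaps in the sequence numbers are exactly what gets reported as "S":

    # Toy model of the overrun mechanism: the "hardware" produces numbered
    # frames at a fixed rate into a bounded buffer pool; a slow "application"
    # drains it; gaps in the sequence numbers are the computer-side drops.
    import queue, threading, time

    POOL_SIZE = 32
    frames = queue.Queue(maxsize=POOL_SIZE)

    def hardware(n_frames=2000, frame_interval=0.0005):
        for seq in range(n_frames):
            try:
                frames.put_nowait(seq)      # DMA into a free buffer
            except queue.Full:
                pass                        # no free buffer -> frame dropped
            time.sleep(frame_interval)      # the radio never stalls
        frames.put(None)

    def application(work_time=0.001):       # too slow to keep up
        expected, drops = 0, 0
        while True:
            seq = frames.get()
            if seq is None:
                break
            drops += seq - expected          # sequence gap == dropped frames
            expected = seq + 1
            time.sleep(work_time)            # "processing"
        print("dropped %d frames (the 'S' reports)" % drops)

    t = threading.Thread(target=hardware)
    t.start()
    application()
    t.join()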

The best way to test what is happening is to run usrp2_fft.py. If you can run that at the same or a higher sample rate than you are using in your application, then the driver is not the issue. My guess is that your computer will run without problem at a decimation of 6 at worst, and more likely all the way down to 4. Your app is running at a decimation of around 12 or 16, so it is your app that can't keep up.

Think of it this way: the fastest Core i7 machines run at 3.2 GHz. For a 2 Mbps signal, you only have 1600 cycles per data bit. Assuming you are using at least 2 samples per bit, you only have 800 cycles per sample to process them. This is certainly possible, but you will need to optimize your code.
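
Or, as a quick back-of-the-envelope check with the same numbers:

    cpu_hz = 3.2e9            # fastest Core i7 of the day
    bit_rate = 2e6            # 2 Mbps
    samples_per_bit = 2

    print(cpu_hz / bit_rate)                      # 1600 cycles per bit
    print(cpu_hz / (bit_rate * samples_per_bit))  # 800 cycles per sample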

How long are your filters? Are you using FFT-based filters instead of convolution-based ones? Is too much memory getting copied around?
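
To be concrete about the FFT-filter question: in GNU Radio the swap is usually just exchanging one block for another. A sketch with placeholder taps, using the gnuradio.filter namespace (in older trees these blocks live under gr):

    from gnuradio import filter as gr_filter
    from gnuradio.filter import firdes

    taps = firdes.low_pass(1.0, 2e6, 500e3, 100e3)     # placeholder taps

    # Direct-form (time-domain) convolution: cost ~ one MAC per tap per sample.
    fir = gr_filter.fir_filter_ccf(1, taps)

    # FFT-based fast convolution: far cheaper once the filter gets long.
    fft = gr_filter.fft_filter_ccc(1, [complex(t) for t in taps])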

When using GRC to create the flowgraph, how can I check whether realtime scheduling is enabled, and/or enable it?

It is the last option in the “Options” top block.

Matt

On Thu, Apr 22, 2010 at 21:18, Marcus D. Leech [email protected]
wrote:

My application does a 1Hz-resolution FFT over the data (that’s a 10.6M
point FFT!)

Who would have thought ten years ago that we'd be doing 10 million point FFTs in real time on computers you can buy at the local store :-)

Johnathan

Thanks Marcus

Actually, the only filtering I did in the C++ version is for the M&M clock recovery, i.e. the interpolation used to get symbols from imperfectly timed samples. In the GRC example I also had an RRC filter with 11*samples_per_symbol taps, but that didn't appear to be the bottleneck. In both applications, the Costas loop and the M&M timing recovery tend to be the problem. I think multithreading the C++ application will help, but I am not sure it can be split into more than perhaps three threads, since both the Costas loop and the M&M loop are recursive in nature.

By the way, FFTs don't seem to be such a problem; I can even use lower decimation rates for those. It's the Costas/M&M processing that seems to be the big killer.

Cheers

Ian.