Diagnosing the cause of D's

Hi all,

I need some help determining the cause of D’s being displayed from my
N210 device. I know it means GNU Radio is not consuming the samples from
my laptop’s Ethernet socket buffer fast enough, causing the USRP to
overflow it.

What I would like to do now, is learn how to monitor performance such that
I can figure out which part of my receiver is the bottleneck, so I can
focus on optimizations there. I can’t lower the sample rate anymore,
because I’m already at the minimum rate the USRP requires (320k).

Would someone recommend a next course of action and tools I should
download to proceed?

Appreciated,
Rich

On Fri, May 29, 2015 at 9:07 AM, Richard B. [email protected]
wrote:

What I would like to do now, is learn how to monitor performance such that
I can figure out which part of my receiver is the bottleneck, so I can
focus on optimizations there. I can’t lower the sample rate anymore,
because I’m already at the minimum rate the USRP requires (320k).

Would someone recommend a next course of action and tools I should
download to proceed?

There are many potential sources of problems that can result in overflow
from the USRP. Some common ones:

  • Insufficient network stack buffering in the OS. Increasing this,
    however, may just be masking the real problem, and at the data rate
    you are describing, it is unlikely to be the issue.

  • One of the blocks in the flowgraph is exceeding the CPU resources
    available in a single core. While all GNU Radio blocks run in their
    own threads and are scheduled by the OS onto as many cores as are
    available, due to their sequential data dependencies, a single “slow”
    block can become the rate-setting portion of the flowgraph. Normally,
    you always want the hardware to have this role.

  • OS operations like periodic file system flushing can often consume CPU
    and I/O bandwidth that competes with the flowgraph. This is made much
    worse if you are writing to disk as a result of the flowgraph
    processing.

A coarse view of the flowgraph resource usage (CPU, etc.) can be had
with a thread and core aware tool like “htop”. This will quickly show if
a single block/thread has become a CPU bottleneck.
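
To put rough numbers on the per-thread view, here is a minimal Python
sketch, assuming a Linux system with /proc mounted. It reads the
cumulative CPU ticks of every thread of a process from
/proc/<pid>/task/<tid>/stat. The function name thread_cpu_ticks is
hypothetical and this is not part of GNU Radio or UHD; it only
illustrates the idea of looking for one thread that dominates.

```python
import os


def thread_cpu_ticks(pid):
    """Return {tid: (thread_name, utime + stime in clock ticks)} for a process.

    Assumes Linux with /proc mounted; returns an empty dict otherwise.
    """
    task_dir = "/proc/%d/task" % pid
    result = {}
    if not os.path.isdir(task_dir):
        return result
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # thread exited while we were scanning
        # The name is wrapped in parentheses and may contain spaces,
        # so split around the first "(" and the last ")".
        name = stat[stat.index("(") + 1:stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2:].split()
        # fields[0] is the state (field 3 in proc(5)); utime and stime
        # are fields 14 and 15, i.e. indices 11 and 12 here.
        ticks = int(fields[11]) + int(fields[12])
        result[int(tid)] = (name, ticks)
    return result


if __name__ == "__main__":
    for tid, (name, ticks) in sorted(thread_cpu_ticks(os.getpid()).items()):
        print(tid, name, ticks)
```

Calling it twice, a second or so apart, on the PID of the running
flowgraph and comparing the tick counts shows which thread is
accumulating CPU time fastest.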

A more granular view can be seen with the “perf” tool available in the
linux-tools package. This allows profiling CPU usage by function and can
identify hot spots for further optimization.

Finally, it might help to understand if the overflow events are “a few
every so often”, or continuous.

Hi Richard,

320 kS/s is not the minimum rate; if I’m not mistaken, it’s 100 MHz/512.
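
Working that figure out, as a quick sketch: the 100 MHz clock and the
maximum decimation of 512 are the values from the sentence above, taken
as given rather than checked against the N210 documentation.

```python
# Values as stated above; treat them as assumptions, not verified specs.
MASTER_CLOCK_HZ = 100e6   # N210 master clock rate
MAX_DECIMATION = 512      # largest decimation factor

min_rate_sps = MASTER_CLOCK_HZ / MAX_DECIMATION
print(min_rate_sps)  # 195312.5 samples per second
```

So, if those two numbers are right, there is still some room below
320 kS/s.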

Ds are relatively serious, and I’ve rarely seen them: The typical “your
system is too slow” results in "O"verflows; typically, you see “D” if
UHD starts wondering where the sample packet n disappeared to, after
receiving n-1 followed by n+1 (or so).
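
The “n-1 followed by n+1” case can be illustrated with a small sketch.
This is not UHD’s code; find_sequence_gaps is a hypothetical helper, and
it ignores the wraparound a real packet counter has.

```python
def find_sequence_gaps(seq_numbers):
    """Return (previous, received) pairs where the sequence number jumped.

    Hypothetical illustration only: assumes the counter increases by one
    per packet and never wraps around.
    """
    gaps = []
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        if cur != prev + 1:
            gaps.append((prev, cur))
    return gaps


# Packet 4 never arrives: the receiver sees 3 followed directly by 5.
print(find_sequence_gaps([1, 2, 3, 5, 6]))  # [(3, 5)]
```

Note that reordered packets trip the same check as lost ones, which is
why a reordering network adapter can look like packet loss.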

What’s your Ethernet hardware? (If on Linux, “lspci | grep -i ether”)
We’ve had some grief caused by USB3-to-ethernet-adapters which seemed to
take delight in confusing at least UHD, its users and a significant part
of its support team by randomly reordering packets on a direct link.
Also, there’s a single Intel Gigabit Ethernet controller that comes
directly from hell, but it’s becoming rarer in the wild every day.

Best regards,
Marcus

On 05/29/2015 02:13 PM, Marcus M. wrote:

We’ve had some grief caused by USB3-to-ethernet-adapters which seemed
to take delight in confusing at least UHD, its users and a significant
part of its support team by randomly reordering packets on a direct
link. Also, there’s a single Intel Gigabit Ethernet controller that
comes directly from hell, but it’s becoming rarer in the wild every day.

Best regards,
Marcus

The notorious Intel NIC is the 82579LM. It drops packets, even at low
load. It’s a FIFO control bug that they couldn’t ever fix…

I’ve been trying out the (bleeding edge) corr_est() code and the
test_corr_est.grc sometimes segfaults. Not repeatable, won’t crash under
gdb :-) From a core file, it’s crashing loading the first element of
aVector in volk_32fc_x2_multiply_32fc_a_avx; the instruction is
vmovaps (%eax),%ymm1 where eax is 0x8b8b590.

My CPU supports the 256-bit AVX instructions, and I believe such data
needs to be 256-bit (32-byte) aligned. EAX here isn’t. Looking at the
backtrace, it’s crashing in fft_filter_ccc::filter, where the d_fwdfft
is generated from fft::fft_complex.

In turn this creates d_inbuf by calling fftwf_malloc. I believe this
only guarantees 16 byte alignment. (I’m on fftw3.3.4, on a 32 bit Linux
3.18.1, CPU Intel i5-3470)
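
A quick check of the address from the core file, as a sketch. Only
0x8b8b590 comes from the crash; the helper name is made up for
illustration.

```python
def is_aligned(address, alignment):
    """True if address is a multiple of alignment (in bytes)."""
    return address % alignment == 0


eax = 0x8B8B590  # address in eax at the faulting vmovaps

print(is_aligned(eax, 16))  # True: fine for 128-bit SSE loads
print(is_aligned(eax, 32))  # False: not enough for an aligned 256-bit AVX load
print(eax % 32)             # 16
```

That is consistent with an allocator that only guarantees 16-byte
alignment: the buffer sits on a 16-byte boundary that happens not to be
a 32-byte one.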

As we move to use volk and the fast SIMD, should the code in fft.cc (and
perhaps other places) move to using a volk memory allocator to get the
right alignment, rather than fftwf_malloc?

I’m on Gentoo, so I could fix the source to fftwf_malloc to return 32
byte aligned, but that’s not a general solution.

I’m not sure how I should report this issue, please pass on my report if
there’s a better place to discuss such things.

G8SQH
[email protected]

On Fri, May 29, 2015 at 4:33 PM, [email protected] wrote:

I’m not sure how I should report this issue, please pass on my report if
there’s a better place to discuss such things

G8SQH
[email protected]

Yes, this is absolutely a problem with using the fftw_malloc functions.
Unless told to, FFTW doesn’t build AVX support, which means that it can
return vectors that are not AVX-aligned, and the code is calling the _a
kernel and so requires them to be appropriately aligned.

Can you change any fftw_malloc’d arrays used in volk calls to volk_malloc
and a) make sure that works and b) submit a patch with where you’re
running into this? I thought we had fixed these already (also, what
version are you running?).

Tom