Cannot Achieve 50MSPS Sampling Rate

luislavena · March 5, 2012, 5:11pm

I’m using an Ettus N210. I have a simple block diagram in GRC with a
cosine signal source feeding a throttle feeding the usrp sink. All share
the samp_rate variable as their sampling rate, which I am trying to get
to 50M.
So far, I’ve been able to achieve up to 25MSPS using Complex int16 as
the wire format with minimal underflow (a few ‘U’s are printed to the
terminal at first, but it eventually stabilizes and stops printing Us).
I’ve been able to get 33.333333MSPS with minimal underflow using complex
int8. I’m yet to achieve 50MSPS. I have an older computer that was
unable to achieve these sample rates, so I thought it might be my PC.
However, this laptop has a Core i5 M450 @2.4GHz, 4GB DDR3 and a gigabit
Ethernet card (integrated, though). I’m not sure how much power gnuradio
requires, but I feel like this is a fair amount for most applications.
I’ve set sudo sysctl -w net.core.rmem_max=50000000 and sudo sysctl -w
net.core.rmem_max=50000000.
Not sure what else I can do. I thought maybe I could change the
decimation rate, but that’s not going to help if I can’t change the
format over the wire. Any help is appreciated!

View this message in context:
http://old.nabble.com/Cannot-Achieve-50MSPS-Sampling-Rate-tp33444517p33444517.html
Sent from the GnuRadio mailing list archive at Nabble.com.

labarowski · March 5, 2012, 5:47pm

The single-core performance of an i5 at 2.4GHz is roughly 8GFlops.

That sounds like a lot, but keep in mind that every sample is
touched a lot in the average case. Every sample has to percolate
through layers of kernel, network/USB-stack, and C library, and Gnu
Radio. So at 50Msps, it doesn’t take long until you’ve burned up your
CPU’s capability. And 2.4GHz isn’t a particularly fast CPU these days,
and there are asymmetries in the integer vs floating-point performance on
CPUs. Your samples are being handled by a mixture of floating-point and
integer pipelines, so even though you may have enough floating-point
“headroom”, you might be running out of steam on the integer side. It’s
simply not practically possible to squeeze every last possible milligram
of performance out of a modern CPU, using a general-purpose operating
system, C libraries, etc, etc. They each optimize for different things,
and the composite result isn’t necessarily ideal for real-time signal
processing.
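
To put the 8 GFLOPS figure in perspective, here is a back-of-the-envelope sketch of the per-sample budget at 50 Msps. It assumes a single core, the rough 8 GFLOPS and 2.4 GHz figures quoted above, and a machine doing nothing else, so the real budget is smaller.

```python
from __future__ import annotations

# Back-of-the-envelope per-sample budget for one CPU core.
# Assumed inputs: ~8 GFLOPS and a 2.4 GHz clock for a single core.


def per_sample_budget(rate_sps: float, flops: float, clock_hz: float) -> tuple[float, float]:
    """Return (floating-point ops per sample, clock cycles per sample)."""
    return flops / rate_sps, clock_hz / rate_sps


if __name__ == "__main__":
    flops_per_sample, cycles_per_sample = per_sample_budget(50e6, 8e9, 2.4e9)
    print(f"{flops_per_sample:.0f} flops/sample, {cycles_per_sample:.0f} cycles/sample")
    # prints "160 flops/sample, 48 cycles/sample"
```

Those roughly 48 cycles per sample have to cover the kernel, the network stack, the C library, UHD and the signal source itself, which is why the CPU runs out of headroom so quickly.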

Gnu Radio has made progress lately on improving
floating-point performance by using vectorized processing “kernels” for
a number of key core functions within Gnu Radio. That helps. But since
those blocks (and Gnu Radio) are only part of the whole “story” when
it comes to processing your samples, the net effect isn’t going to be
spectacular.

On Mon, 05 Mar 2012 08:28:17 -0800, Josh B. wrote:

50 Msps is a serious amount of data to push through your computer. More
than a feeling, even the most serious computers can easily have a
bottleneck. I suspect that the gnuradio core signal source cannot
sustain 50Msps in 1 thread/1 work function.

labarowski · March 5, 2012, 5:29pm

On 03/05/2012 08:10 AM, labarowski wrote:

I’m using an Ettus N210. I have a simple block diagram in GRC with a cosine
signal source feeding a throttle feeding the usrp sink. All share the

Don’t use a throttle; the usrp sink is back-pressuring the stream.
Use this new signal source on my new_blocks or next branch; it’s much,
much faster:

git://gnuradio.org/jblum.git

http://gnuradio.org/cgit/jblum.git/tree/gr-blocks/lib/signal_source.cc?h=new_blocks
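
The back-pressure idea can be modeled loosely with a bounded queue: the producer blocks whenever the buffer is full, so it ends up running at the consumer's pace and no throttle is needed. This is a toy Python model, not GNU Radio's scheduler; the function name, the buffer size and the sleep standing in for the USRP consuming samples are all hypothetical.

```python
import queue
import threading
import time


def run_pipeline(n_items: int, buffer_size: int, sink_delay_s: float) -> dict:
    """Toy model: a fast source feeding a slow sink through a bounded buffer."""
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)  # bounded, like a flowgraph buffer
    max_depth = 0
    received = []

    def source():
        nonlocal max_depth
        for i in range(n_items):
            buf.put(i)  # blocks while the buffer is full: this is the back-pressure
            max_depth = max(max_depth, buf.qsize())
        buf.put(None)  # end-of-stream marker

    def sink():
        while True:
            item = buf.get()
            if item is None:
                break
            time.sleep(sink_delay_s)  # stand-in for the device consuming at a fixed rate
            received.append(item)

    start = time.monotonic()
    threads = [threading.Thread(target=source), threading.Thread(target=sink)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {"received": received, "max_depth": max_depth, "elapsed_s": time.monotonic() - start}
```

The source never sleeps, yet the total time is set by the sink (roughly n_items × sink_delay_s) and the buffer never grows past buffer_size. A throttle in front of a hardware sink would only add a second, competing clock.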

samp_rate variable as their sampling rate, which I am trying to get to 50M.
So far, I’ve been able to achieve up to 25MSPS using Complex int16 as the
wire format with minimal under sampling (A few 'U’s are printed to the
terminal at first, but it eventually stabilizes and stops printing Us). I’ve
been able to get 33.333333MSPS with minimal undersampling using complex
int8. I’m yet to achieve 50MSPS. I have an older computer that was unable to
achieve these sample rates, so I thought it might be my pc. However, this
laptop has a Core i5 M450 @2.4GHz, 4Gb DDR3 and a gigabit Ethernet card
(integrated though). I’m not sure how much power gnuradio requires, but I
feel like this is a fair amount for most applications.

50 Msps is a serious amount of data to push through your computer. More
than a feeling, even the most serious computers can easily have a
bottleneck. I suspect that the gnuradio core signal source cannot
sustain 50Msps in 1 thread/1 work function.

I’ve set sudo sysctl -w net.core.rmem_max=50000000 and sudo sysctl -w net.core.rmem_max=50000000.

rmem_max cannot help; it applies to receive.

wmem_max can’t make a difference past 1 megabyte. All the buffering is on
the USRP in the transmit direction. wmem_max is simply large enough
so that any packets going out will have the same amount of space on the
host to guarantee that send() won’t block.
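
On Linux, net.core.wmem_max caps how large a send buffer an ordinary socket can obtain through SO_SNDBUF, which is the host-side space described above. The sketch below requests a buffer and reads back what the kernel granted (Linux reports about double the requested value, after clamping it to wmem_max). UHD sets up its own sockets, so this only illustrates the kernel mechanism; the helper names and the 1 MB request are illustrative assumptions.

```python
import socket
from pathlib import Path


def read_wmem_max():
    """Return net.core.wmem_max in bytes, or None when /proc is not available (non-Linux)."""
    path = Path("/proc/sys/net/core/wmem_max")
    return int(path.read_text().strip()) if path.exists() else None


def granted_send_buffer(requested_bytes: int) -> int:
    """Request a UDP send buffer and return the size the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    print("wmem_max:", read_wmem_max())
    print("granted for a 1 MB request:", granted_send_buffer(1_000_000))
```

If the granted size stops growing no matter how much you request, wmem_max is the limit; on transmit, raising it further only matters if outgoing bursts do not already fit.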

Not sure what else I can do. I thought maybe I could change the decimation
rate, but that’s not going to help if I can’t change the format over the
wire. Any help is appreciated!

Why do you think you can’t change the wire format? It sounds like you
changed it to sc8 format (above).

I’d suggest prototyping whatever it is you are trying to do at a lower rate, then
optimizing it with SIMD routines to get up to 50 Msps on the host.

Also, try examples/tx_waveforms with --otw=sc8 --rate=50e6 and
experiment with that so you can see if your PC can sustain the rate with
minimal load.

-Josh

labarowski · March 5, 2012, 11:47pm

On 03/05/2012 01:58 PM, labarowski wrote:

git://gnuradio.org/jblum.git

http://gnuradio.org/cgit/jblum.git/tree/gr-blocks/lib/signal_source.cc?h=new_blocks

Didn’t realize that a throttle was unnecessary with the usrp. Thanks for
pointing that out. I see that you have a lot of other blocks in your
repository as well. Do these have any advantage over the ones included with
gnuradio?

They all add performance and utility, and also support fixed-point IO.

50 Msps is a serious amount of data to push through your computer. More
than a feeling, even the most serious computers can easily have a
bottleneck. I suspect that the gnuradio core signal source cannot
sustain 50Msps in 1 thread/1 work function.

Makes sense. I would assume that, as long as I’m using grc at least, I’m
limited to one thread?

Well, it’s one thread per block, but GRC just generates the flow graph
code; it’s technically out of the loop.

Why do you think you can’t change the wire format? It sounds like you
changed it to sc8 format (above).

Should have worded that more carefully. I can choose 8 or 16 bits over the
wire. My thinking was that I might be able to increase the rate of
decimation (on second thought, decimation has more to do with sampling
frequency than a data bottleneck) and also decrease the precision and use
an integer less than 8 bits. Basically, I thought Ethernet may have been
the bottleneck. Decreasing precision shouldn’t be necessary, I’m just
trying to build an understanding here.

ok cool

minimal load.

I actually couldn’t find that in my /usr/local/share/gnuradio/examples
folder. Do I need to download it elsewhere?

/share/uhd/examples/tx_waveforms

or in your build directory of uhd

/examples/tx_waveforms

Thanks for the informative post Josh! 50MSPS will probably prove unnecessary
for the project that I’m working on. I was worried that perhaps I was doing
something fundamentally wrong and that’s why I wasn’t getting the full
sampling rate out of this device.

Thanks!
-Josh

labarowski · March 5, 2012, 11:00pm

Josh B.-3 wrote:

Also, try examples/tx_waveforms with --otw=sc8 --rate=50e6 and
experiment with that so you can see if your PC can sustain the rate with
minimal load.

I actually couldn’t find that in my /usr/local/share/gnuradio/examples
folder. Do I need to download it elsewhere?

Thanks for the informative post Josh! 50MSPS will probably prove unnecessary
for the project that I’m working on. I was worried that perhaps I was doing
something fundamentally wrong and that’s why I wasn’t getting the full
sampling rate out of this device.

labarowski · March 6, 2012, 12:15am

I was totally unaware of SIMD before you mentioned it. That’s an
interesting subject. It is my understanding that SIMD is integrated into
gnuradio through VOLK as of Dec. 2010. Doesn’t look like I can use VOLK
directly from grc, or even from Python. Looks like it needs to be
implemented at the block level in C++. Is this the case?

The latest master GIT repo for Gnu Radio has integrated work to make
many of the core functions SIMD-aware using Volk-based processing
“kernels”. If you run the latest, you should already be benefitting from
SIMD implementations of many core functions, including filtering, and
basic math like complex add, multiply, etc.
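
VOLK kernels are C/C++ with SIMD intrinsics and are not called from Python here. As a loose analogy only, NumPy (an assumption, not part of GNU Radio and not VOLK) shows the same idea: one call applies a complex multiply across a whole buffer in compiled (and, depending on the build, SIMD) code, instead of a per-sample interpreter loop. complex64 is used because it matches the 32-bit float complex samples GNU Radio blocks typically carry.

```python
import timeit

import numpy as np


def scalar_multiply(a, b):
    """Per-sample loop: one complex multiply per Python iteration."""
    return [x * y for x, y in zip(a, b)]


def vector_multiply(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Whole-buffer multiply in compiled code, similar in spirit to a VOLK kernel."""
    return np.multiply(a, b)


if __name__ == "__main__":
    n = 100_000
    rng = np.random.default_rng(0)
    a = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex64)
    b = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex64)
    assert np.allclose(scalar_multiply(a, b), vector_multiply(a, b))
    t_loop = timeit.timeit(lambda: scalar_multiply(a, b), number=5)
    t_vec = timeit.timeit(lambda: vector_multiply(a, b), number=5)
    print(f"loop: {t_loop:.3f} s, vectorized: {t_vec:.3f} s")
```

The exact speedup depends on the machine and NumPy build; the point is only that whole-buffer kernels avoid per-sample overhead, which is what the Volk-based blocks do in C++.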

But I wonder, given the apparent confusion about decimation, whether you
actually need very-high sample rates, or perhaps you have a misconception
about what you really need out of a flow-graph. For example, the
sample-rate has very little to do with the final RF frequency you’re
“looking at” or generating. Unless you’re doing very-wideband spectral
analysis, or have very-wideband signals to deal with (or any of a number
of other interesting wideband applications like Radio Astronomy), you
don’t need any more bandwidth than will comfortably “fit” your signal(s)
of interest.
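
That advice can be turned into a small calculation. The N210 derives its host sample rates from a 100 MHz master clock divided by an integer (the 25, 33.333333 and 50 Msps rates above are 100/4, 100/3 and 100/2). This sketch assumes any divisor from 1 to 512 is allowed, which is a simplification of the real hardware constraints, and uses an arbitrary 1.25 margin over the signal bandwidth; the function name is hypothetical.

```python
from __future__ import annotations

import math

MASTER_CLOCK_HZ = 100e6  # N210 master clock
MAX_DIVISOR = 512        # assumed largest divisor (lowest supported rate)


def pick_rate(signal_bw_hz: float, margin: float = 1.25) -> tuple[int, float]:
    """Return (divisor, rate) for the lowest rate 100e6/N that still covers the signal."""
    needed = signal_bw_hz * margin
    if needed > MASTER_CLOCK_HZ:
        raise ValueError("signal is wider than the master clock rate")
    divisor = math.floor(MASTER_CLOCK_HZ / needed)  # largest N with 100e6/N >= needed
    divisor = max(1, min(divisor, MAX_DIVISOR))
    return divisor, MASTER_CLOCK_HZ / divisor


if __name__ == "__main__":
    for bw in (10e3, 200e3, 1e6, 20e6):
        n, rate = pick_rate(bw)
        print(f"{bw / 1e3:8.0f} kHz signal -> divide by {n:3d} -> {rate / 1e6:.4f} Msps")
```

For example, a 1 MHz-wide signal needs only 1.25 Msps (divide by 80), far below the rates that strain the host.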

–
Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

labarowski · March 6, 2012, 12:44am

Well, it’s one thread per block, but GRC just generates the flow graph
code; it’s technically out of the loop.

The unspoken text here is that Gnu Radio has two different scheduling
“policies” for a flow-graph; the default for several years has been TPB
(Thread-Per-Block), which improves your odds of “keeping up” with complex
flow-graphs across many blocks.

But individual blocks don’t, as a rule, perform operations across
multiple threads. There are exceptions: the FFT blocks now provide the
option of doing their work across multiple threads. But that’s only
worthwhile for very large FFT operations, 20K bins or so or larger.
Computer science hasn’t yet come up with any general-purpose way of
converting serial algorithms into parallel ones that are unconditionally
better than their serial counterparts. So most of the blocks in Gnu Radio
are essentially serial in nature, with occasional optimizations like the
SIMD optimizations described earlier.
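
A rough Python model of the thread-per-block structure: each "block" runs its own serial loop in its own thread, and blocks hand items to each other through bounded queues, so different blocks can work on different items at the same time while each block stays serial internally. The function and the lambda "blocks" are hypothetical, and Python's GIL means this shows the structure, not the speedup.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_thread_per_block(source: Iterable, blocks: List[Callable], depth: int = 8) -> list:
    """Run each block in its own thread, connected by bounded queues (toy TPB model)."""
    stop = object()  # end-of-stream marker
    queues = [queue.Queue(maxsize=depth) for _ in range(len(blocks) + 1)]

    def feed():
        for item in source:
            queues[0].put(item)
        queues[0].put(stop)

    def worker(fn, inbox, outbox):
        while True:  # each block handles its input strictly in order, one item at a time
            item = inbox.get()
            if item is stop:
                outbox.put(stop)
                return
            outbox.put(fn(item))

    threads = [threading.Thread(target=feed)]
    threads += [threading.Thread(target=worker, args=(fn, queues[i], queues[i + 1]))
                for i, fn in enumerate(blocks)]
    for t in threads:
        t.start()

    results = []
    while (item := queues[-1].get()) is not stop:  # main thread acts as the final sink
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_thread_per_block(range(5), [lambda x: x * 2, lambda x: x + 1]))
```

Adding blocks adds threads, but no single block gets faster, which is the limitation described above.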

Should have worded that more carefully. I can choose 8 or 16 bits over the
wire. My thinking was that I might be able to increase the rate of
decimation (on second thought, decimation has more to do with sampling
frequency than a data bottleneck) and also decrease the precision and use
an integer less than 8bits. Basically, I thought ethernet may have been
the bottleneck. Decreasing precision shouldn’t be necessary, I’m just
trying to build an understanding here.

There’s no support for anything other than 16-bit or 8-bit (I and Q)
samples on the wire. In some wide-band applications, like
radio astronomy, using fewer bits to gain bandwidth is pretty
commonplace. But in general-purpose SDR platforms, not so
much. But more importantly, handling 50Msps and doing anything
useful present significant challenges if you want to “do those things”
on a general-purpose compute platform.
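
A quick calculation shows why the wire format matters at these rates. It assumes sc16 carries 16-bit I and 16-bit Q (4 bytes per complex sample) and sc8 carries 8-bit I and Q (2 bytes), and it ignores Ethernet, IP, UDP and packet header overhead, so real usable throughput is somewhat lower.

```python
BYTES_PER_SAMPLE = {"sc16": 4, "sc8": 2}  # complex I+Q bytes per wire format
GIGE_BPS = 1e9                             # nominal gigabit Ethernet line rate


def payload_mbps(rate_sps: float, otw: str) -> float:
    """Sample payload in Mbit/s, ignoring all packet and header overhead."""
    return rate_sps * BYTES_PER_SAMPLE[otw] * 8 / 1e6


if __name__ == "__main__":
    for rate, otw in [(25e6, "sc16"), (50e6, "sc16"), (100e6 / 3, "sc8"), (50e6, "sc8")]:
        mbps = payload_mbps(rate, otw)
        verdict = "fits within" if mbps * 1e6 < GIGE_BPS else "exceeds"
        print(f"{rate / 1e6:7.3f} Msps {otw}: {mbps:6.0f} Mbit/s ({verdict} 1 Gbit/s before overhead)")
```

At sc16, 50 Msps would need 1600 Mbit/s, so it cannot fit on gigabit Ethernet regardless of the host; 25 Msps at sc16 and 50 Msps at sc8 both come to 800 Mbit/s, which fits on paper and leaves the host CPU as the remaining bottleneck.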

–
Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium
