High CPU usage

Hello,

I am using a C++ interface with the USRP2 board. I found that the CPU usage
is about 30% on a 3.2 GHz dual-core CPU at a 5 MHz sample rate, and about
60% at a 20 MHz sample rate. I am just calling the rx_sample() function in
a loop, with no other processing. Is this CPU usage normal? It seems too
high to me. Is there any way to optimize it?
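For scale, here is a rough back-of-the-envelope sketch of how much data the host has to pull off the wire at each rate. It assumes each complex sample arrives as two 16-bit values (I and Q, 4 bytes total) and ignores packet headers; the helper names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper. Assumption: each complex sample is 16-bit I + 16-bit Q
// = 4 bytes on the wire; packet headers and inter-frame gaps are ignored.
double payload_bytes_per_sec(double sample_rate_hz) {
    assert(sample_rate_hz >= 0.0);
    const double bytes_per_sample = 4.0;
    return sample_rate_hz * bytes_per_sample;
}

// Same payload rate expressed in megabits per second (1 Mbit = 1e6 bits).
double payload_mbit_per_sec(double sample_rate_hz) {
    return payload_bytes_per_sec(sample_rate_hz) * 8.0 / 1e6;
}
```

Under these assumptions, 5 MS/s works out to 20 MB/s (160 Mbit/s) and 20 MS/s to 80 MB/s (640 Mbit/s), so the host moves four times as much data at the higher rate, a large fraction of a gigabit Ethernet link.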

On Thu, Feb 24, 2011 at 12:20 AM, peng senl [email protected] wrote:


The legacy driver or UHD? Are you using 32-bit complex floats or
16-bit complex shorts for your data?

I’d be very interested to hear your benchmarking of the different
types here. That is, UHD/32fc vs. USRP2/32fc and UHD/16sc vs.
USRP2/16sc. Also, UHD/32fc vs. UHD/16sc.
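One simple way to make such comparisons repeatable is to wrap each receive or conversion loop in a helper that reports process CPU time relative to wall-clock time. The sketch below is a generic harness, not part of UHD or the USRP2 driver; `cpu_fraction` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Hypothetical benchmarking helper: runs `work` once and returns
// (process CPU time) / (wall-clock time). A value of 1.0 means one core was
// fully busy on average. On POSIX systems std::clock() counts CPU time for
// the whole process, so several busy threads can push the ratio above 1.0.
template <typename Work>
double cpu_fraction(Work&& work) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    const double cpu_s =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    const double wall_s =
        std::chrono::duration<double>(wall_end - wall_start).count();
    assert(wall_s >= 0.0);
    return wall_s > 0.0 ? cpu_s / wall_s : 0.0;
}
```

Running the same fixed number of samples through a 32fc path and a 16sc path with this helper gives numbers that can be compared directly. It only measures the calling process, so network-stack work accounted elsewhere may not show up.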

One of the issues is that the samples come over the wire as 16-bit
shorts and must be converted from big to little endian and then from
short to float. There is some vectorization (SIMD) we can do to speed up
both parts of this conversion. When I played with it in the USRP2 driver
(pre-UHD), I remember seeing a 20% improvement from vectorizing the
endian and short-to-float conversions. You could potentially squeeze even
more out of it.
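A plain scalar version of the two conversions described above (big-endian to host order, then short to float) might look like the sketch below. The sample layout (I then Q, each a big-endian signed 16-bit value), the 1/32768 scale factor and the function names are assumptions for illustration, not the USRP2 or UHD code. A SIMD implementation does the same byte assembly and int-to-float conversion for several samples per instruction.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>

// Assumption: each complex sample is two big-endian signed 16-bit values,
// I first, then Q. Assembling from bytes works on any host byte order.
inline std::int16_t be16_to_host(const unsigned char* p) {
    return static_cast<std::int16_t>(
        static_cast<std::uint16_t>((p[0] << 8) | p[1]));
}

// Hypothetical converter: raw wire bytes -> complex floats scaled to
// roughly [-1.0, 1.0). The 1/32768 scale is an assumed convention.
void wire_16sc_to_32fc(const unsigned char* wire, std::complex<float>* out,
                       std::size_t nsamples) {
    assert(wire != nullptr || nsamples == 0);
    const float scale = 1.0f / 32768.0f;
    for (std::size_t i = 0; i < nsamples; ++i) {
        const unsigned char* s = wire + 4 * i;  // 4 bytes per complex sample
        const float re = static_cast<float>(be16_to_host(s)) * scale;
        const float im = static_cast<float>(be16_to_host(s + 2)) * scale;
        out[i] = std::complex<float>(re, im);
    }
}
```

The loop has no branches or cross-iteration dependencies, so an optimizing compiler may auto-vectorize it; hand-written SIMD intrinsics are the other route.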

I’m not sure, but Josh might have already vectorized this process in
UHD. If not, I’m sure he will soon.

Tom

> The legacy driver or UHD? Are you using 32-bit complex floats or
> 16-bit complex shorts for your data?

In my case, I am using GNU Radio with the USRP2 in C++.
The CPU usage is 30% at a 5 MHz sample rate on a 3.2 GHz dual-core CPU,
and around 70% at a 20 MHz sample rate.

> I’d be very interested to hear your benchmarking of the different
> types here. That is UHD/32fc vs. USRP2/32fc and UHD/16sc vs.
> USRP2/16sc. Also, UHD/32fc vs. UHD/16sc.

I just read the data coming over the Ethernet. I do not even convert from
big to little endian or convert the data to another format, so I am
already keeping the work to a minimum, yet the CPU usage is still this
high. I am wondering whether it is possible to simplify the receive path.

On Fri, Feb 25, 2011 at 3:47 PM, peng senl [email protected] wrote:


Ok, that didn’t answer my question at all. HOW are you reading them from
the Ethernet port? Which function are you calling in the USRP2 library to
do this? Or which GNU Radio usrp2_source_XXX are you using
(usrp2_source_32fc or usrp2_source_16sc)?

Tom


Hello Tom,

Here is how I collect the data:

I am using the example rx_streaming_samples.cc to collect data, with
the call to copy_u2_16sc_to_host_16sc() disabled.

I think the program repeatedly calls
bool ok = rx_nop_handler::operator()(items, nitems, metadata) to return
a pointer to the received metadata array.

There is a background thread, usrp2::impl::bg_loop(), running in real
time. This function calls “d_eth_buf->rx_frames(this, 100);” to get the
data out of the Ethernet buffer. That is basically all the program does.
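The arrangement described above, a background thread pulling frames out of the Ethernet buffer while the application consumes them, is a producer/consumer pattern. A minimal generic sketch with standard C++ follows; `frame_queue` and its members are hypothetical and do not mirror the actual usrp2::impl internals. The point it illustrates is that a consumer blocked on a condition variable sleeps instead of spinning, so waiting for data should cost little CPU.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue of raw frames between a background
// receive thread (producer) and the application (consumer).
class frame_queue {
public:
    using frame = std::vector<std::uint8_t>;

    void push(frame f) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frames_.push_back(std::move(f));
        }
        cond_.notify_one();  // wake one waiting consumer
    }

    // Blocks for up to `timeout` waiting for a frame; returns std::nullopt
    // on timeout. While waiting, the thread sleeps rather than busy-polling.
    std::optional<frame> pop(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!cond_.wait_for(lock, timeout, [this] { return !frames_.empty(); }))
            return std::nullopt;
        assert(!frames_.empty());
        frame f = std::move(frames_.front());
        frames_.pop_front();
        return f;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<frame> frames_;
};
```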

Do you think this CPU usage is normal?
I also noticed that the blocking timeout is 100 ms. Is there a reason
for this value?
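On the timeout question: one common reason for a finite timeout in a receive loop is so the background thread wakes periodically to check whether it has been asked to stop, even when no packets arrive. Waiting in poll() sleeps in the kernel, so the timeout itself should cost very little CPU. The sketch below is generic POSIX code under that assumption, not the USRP2 driver's implementation; `wait_readable` and `rx_loop` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <poll.h>
#include <unistd.h>  // read(), pipe(), close() for handlers and testing

// Waits up to `timeout_ms` for `fd` to become readable.
// Returns true if data is ready, false on timeout or error.
bool wait_readable(int fd, int timeout_ms) {
    assert(fd >= 0);
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    const int rc = ::poll(&pfd, 1, timeout_ms);  // sleeps in the kernel
    return rc > 0 && (pfd.revents & POLLIN);
}

// Hypothetical background loop: handle data when it arrives, otherwise
// wake every `timeout_ms` to re-check the stop flag.
template <typename Handler>
void rx_loop(int fd, const std::atomic<bool>& stop, Handler&& handle,
             int timeout_ms = 100) {
    while (!stop.load()) {
        if (wait_readable(fd, timeout_ms))
            handle(fd);
    }
}
```

A longer timeout means fewer idle wake-ups but a slower shutdown; a shorter one means the reverse. Neither should matter much for CPU use while data is streaming.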

On Sun, 2/27/11, Tom R. [email protected] wrote:

From: Tom R. [email protected]
Subject: Re: High CPU usage
To: “peng senl” [email protected]
Cc: [email protected]
Date: Sunday, February 27, 2011, 4:37 PM
