MAC layer development and USRP2

I’ve been reading some papers related to MAC layer development on the
USRP, but they seem to have tapered off with the USRP2. Does anyone
have any information about MAC layer and protocol development for the
USRP2? Has this been satisfied with things like timestamps and GigE?
Any current papers or web links related to USRP2 protocol-level
development? Thanks.

Charles

Think of it this way…

MAC development is severely limited by GNU Radio… it lacks the
much-needed functionality to make information passing between the
blocks rich, simple, and bi-directional. Some of the building blocks
are in place (e.g., PMT), and the m-block was implemented to solve the
rest of the problems, but it was deprecated (and may have been removed
by now).

MAC performance is limited by several things: (1) delay between GNU
Radio and the USRP/USRP2, (2) signal processing delay in GNU Radio, and
(3) jitter (i.e., unpredictable delay). (1) is reduced a little by the
USRP2’s use of GigE, but it’s still not down to traditional MAC
turnaround times (tens of us). (2) benefits from Moore’s law. (3)
depends mainly on whether you use realtime scheduling and on how much
delay your data hits in queues.
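A crude way to see (3) on a given host is to time repeated short sleeps;
the spread of the wakeup overshoots is a rough proxy for scheduler
jitter. This is plain Python, nothing GNU Radio specific:

```python
import time

def sleep_overshoots_us(period_s=0.001, n=200):
    """Request n sleeps of period_s and record, in microseconds, how far
    each wakeup overshoots the request; the spread of these numbers is a
    crude proxy for host scheduling jitter."""
    overshoots = []
    for _ in range(n):
        t0 = time.perf_counter()
        time.sleep(period_s)
        overshoots.append((time.perf_counter() - t0 - period_s) * 1e6)
    return overshoots

jitter = sleep_overshoots_us(n=50)
print("median overshoot: %.1f us, worst: %.1f us"
      % (sorted(jitter)[len(jitter) // 2], max(jitter)))
```

On a loaded non-realtime kernel the worst case is often orders of
magnitude above the median, which is exactly the unpredictability a MAC
cannot tolerate.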

All in all… still an open problem IMO.

  • George

Thanks for the reply George. I’m still looking for a little more
information on this topic.

  • What is PMT?
  • Why was m-block removed?
  • Has anyone measured latency with the USRP2 and GigE?
  • Is GigE alone not capable of handling MAC turnaround times, or is
    software to blame for this?
  • Is the scheduler the main issue in the way it handles I/O between
    blocks?

Charles

On Tue, Apr 6, 2010 at 10:07 AM, Charles I. [email protected] wrote:

Thanks for the reply George. I’m still looking for a little more
information on this topic.

  • What is PMT

http://gnuradio.org/redmine/wiki/1/TypePMT

  • Why was m-block removed

http://osdir.com/ml/discuss-gnuradio-gnu/2010-01/msg00066.html

  • Has anyone measured latency with the USRP2 and GigE

I’m not sure.

  • Is GigE alone not capable of handling MAC turnaround times, or is
    software to blame for this?

I think the latency is on the order of hundreds of microseconds, which
is greater than, say, an 802.11 ACK turnaround time (24us).

  • Is the scheduler the main issue in the way it handles i/o between blocks

There are some details of this in the second link I gave.

  • George

George-

  • Has anyone measured latency with the USRP2 and GigE

I’m not sure.

  • Is GigE alone not capable of handling MAC turnaround times or is
    software to blame for this

I think the latency is on the order of hundreds of microseconds, which is
greater than, say, an 802.11 ACK turnaround time (24us).

I would tend to blame Linux and buffering more than GbE itself (MAC +
PHY). Here is an interesting doc where the
researchers were asking similar questions:

http://www.hep.man.ac.uk/u/rich/atlas/docs/atlas_net_note_draft5.pdf

I’m not sure yet how much buffering is done in the USRP2 firmware but we
hope to know shortly as a couple of our guys
are in the process of taking apart the logic, pulling out non-GbE
related sections, and rebuilding.

-Jeff

I tried a stop-and-wait ARQ with two USRP2s with XCVR2450s, but the
delay was too long and inconsistent. I can’t remember the exact
figures, but it was definitely up to milliseconds.
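The control loop in question is essentially the textbook one below; this
is a minimal sketch with a hypothetical `radio` object (`send`,
`recv_ack`) standing in for the actual flowgraph glue, not the
experiment code itself:

```python
import time

ACK_TIMEOUT = 0.05   # seconds; tune to the measured turnaround
MAX_RETRIES = 5

def send_stop_and_wait(radio, frames):
    """Send each frame and block until its ACK returns, retrying on
    timeout; returns the per-frame ACK round-trip times in seconds.

    `radio` is a hypothetical object with send(frame) and
    recv_ack(timeout) -> seq-or-None; it stands in for whatever
    flowgraph/socket glue actually moves the frames.
    """
    rtts = []
    for seq, payload in enumerate(frames):
        for _ in range(MAX_RETRIES):
            t0 = time.perf_counter()
            radio.send((seq, payload))
            if radio.recv_ack(timeout=ACK_TIMEOUT) == seq:
                rtts.append(time.perf_counter() - t0)
                break
        else:
            raise IOError("frame %d: no ACK after %d tries"
                          % (seq, MAX_RETRIES))
    return rtts
```

With a loopback stub for `radio` this measures pure software turnaround;
over the air every round trip also pays both hosts’ GNU Radio and bus
delays, which is where the milliseconds come from.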

Veljko

2010/4/6 George N. [email protected]:

Veljko-

I tried with a stop-and-wait ARQ and two USRP2s with XCVR2450s, but
the delay was too long and inconsistent. I can’t remember the exact
figures, but definitely up to milliseconds.

Do you mean two USRP2s back-to-back? Or both connected to motherboard
ports?

-Jeff

I would tend to blame Linux and buffering more than GbE itself (MAC + PHY). Here is an interesting doc where the
researchers were asking similar questions:

http://www.hep.man.ac.uk/u/rich/atlas/docs/atlas_net_note_draft5.pdf

I’m not sure yet how much buffering is done in the USRP2 firmware but we hope to know shortly as a couple of our guys
are in the process of taking apart the logic, pulling out non-GbE related sections, and rebuilding.

-Jeff

I glanced over the document briefly and was wondering whether your
analysis of the Linux issue came from this document or from a separate
source. I’m only asking because the document is 10 years old and uses
Red Hat 5 and Pentium IIs. I would assume Linux kernel support for
GigE has improved since then.

Charles

Two independent PC+USRP nodes. All the ACK-related logic was
implemented at the Python layer.

Another thing that I tried was to connect the two nodes via Ethernet
(I have two Ethernet NICs in each of the PCs) and then use USRPs for
data and Ethernet for ACKs. I still couldn’t get good results,
although I had some issues with the OFDM decoding latency, so I can’t
really say where the bottleneck was.

Veljko

2010/4/6 Jeff B. [email protected]:

On 04/06/2010 04:19 PM, George N. wrote:

Jeff, I definitely agree that buffering also adds significant latency. How
much of the MAC can you get around? I just think that, there are a number
of people who want the flexibility of the SDR, but want to do MAC research,
and current common SDR architecture is just not good enough. We need lower
latency between the hardware and the host.

Microsoft Research recently built up a new SDR which uses PCI-E to address
the latency issue:
http://research.microsoft.com/en-us/projects/sora/

Is Sora active? The forum seems really quiet. Also, they say there is a
strict non-commercial-use license. Also, it seems like they are
using the RF front ends from WARP; a look at the WARP site suggests the
radio board is $2K. Also, they estimate the board price at “several K$”,
so it is not quite WARP prices, but it looks to be closing in on them
rapidly. [1]

Philip

[1]
http://social.microsoft.com/Forums/en-US/sora/thread/2701a49b-2ea1-4df6-a85c-d5d01b4ea77c

Jeff, I definitely agree that buffering also adds significant latency.
How much of the MAC can you get around? I just think that there are a
number of people who want the flexibility of the SDR but want to do MAC
research, and the current common SDR architecture is just not good
enough. We need lower latency between the hardware and the host.

Microsoft Research recently built a new SDR which uses PCI-E to address
the latency issue:
http://research.microsoft.com/en-us/projects/sora/

Their whitepaper is here:
http://research.microsoft.com/apps/pubs/default.aspx?id=79927

I had a paper in the same conference which used several techniques to
split common MAC functionality between the USRP and the host to reduce
the latency of time-critical functions (e.g., carrier sense):
http://www.andrew.cmu.edu/user/gnychis/nychis_nsdi09.pdf

I of course believe in my own work, but I also believe that it is not
sufficient to cover all MAC implementations and future novel MACs :wink:
PS. it also has architectural latency measurements (e.g., host → kernel,
kernel → USRP, USRP → kernel, etc.)… and I posted the code for these
measurements on CGRAN, for those interested. This is why you have the
problems you have, Veljko; the turnaround time is extremely high. We
came up with the approach of “fast-ACKs”, which are generated from the
USRP itself.

This all said… I really think we need a better interface to reduce
latency. Some platforms take the run-on-the-board approach, such as
WARP, which puts the MAC on a core on the board. Good luck conjuring up
$10-15k just for a single WARP board plus frontends though
:stuck_out_tongue:

  • George

Hi George,

2010/4/6 George N. [email protected]:

USRP, USRP → kernel, etc.)… and I posted the code for these measurements
on CGRAN, for those interested. This is why you have the problems you have
Veljko, the turnaround time is extremely high. We came up with the approach
of “fast-ACKs” which are generated from the USRP itself.

What I got from your paper is that the matched filter approach for
fast packet detection would not work in an OFDM setting. What about
fast ACK generation? Would it require an IFFT implementation on the
USRP? Would it help much?

This all said… I really think we need a better interface to reduce
latency. Some platforms take the: run on the board approach, such as WARP
which puts the MAC on a core on the board. Good luck conjuring up $10-15k
just for a single WARP board plus frontends though :stuck_out_tongue:

Is there anything that would prevent GNU Radio developers from pushing
MAC-layer functionality onto the USRP?


cheers,

Veljko

Philip-

Is Sora active? The forum seems really quiet. Also, they say there is a
strict non-commercial-use license. Also, it seems like they are
using the RF front ends from WARP; a look at the WARP site suggests the
radio board is $2K. Also, they estimate the board price at “several K$”,
so it is not quite WARP prices, but it looks to be closing in on them
rapidly. [1]

I think you’re touching on an underlying, basic point: Matt et al. have
spent years developing an RF + server architecture that both works and
is inexpensive. Those two things are very difficult to integrate. Many
tradeoffs and compromises must be made carefully, with a lot of
painstaking trial and error. Matt’s followers recognized this some time
ago; more recently NI has recognized it. The Sora team may find it
difficult (and likely expensive) to reliably move very high-rate ADC
data over some distance, external to the PC. PCIe-over-cable is one
way, but again, not cheap.

-Jeff

Charles-

I glanced over the document briefly and was wondering if your analysis
of the linux issue was because of this document, or a separate source.
I’m only asking because the document is 10 years old and is using
RedHat 5 and Pentium 2s. I would assume the linux kernel support for
GigE has improved since then.

Which part of the Linux issue… sustained throughput or latency? I
wouldn’t be surprised to find that latency hasn’t improved
substantially, because it’s not a priority for server software. Even
VoIP applications are not concerned about a 1 msec improvement… whereas
that makes or breaks a wireless MAC.

What I found interesting in that particular document is that the
authors were careful not to speculate and to use a logic analyzer to
make exact measurements. For me the key figures are the GbE (MAC + PHY)
and PCI latencies, which are likely not very reducible.

-Jeff

George-

Jeff, I definitely agree that buffering also adds significant latency. How
much of the MAC can you get around? I just think that, there are a number
of people who want the flexibility of the SDR, but want to do MAC research,
and current common SDR architecture is just not good enough. We need lower
latency between the hardware and the host.

Microsoft Research recently built up a new SDR which uses PCI-E to address
the latency issue:
http://research.microsoft.com/en-us/projects/sora/

Did you see my previous post about the accelerator PCIe card? To some
extent the Microsoft approach is what we’re doing. But we want to stay
compatible with USRP2 hardware, so we connect GbE to the accelerator
card; non-MAC-related dataflow is PCIe from there. Buffering required
to stay compatible with USRP2 software and high sustained transfer
rates moves “right”, to the accelerator card (which has a lot of
memory).

The real trick is software. Our approach is that MAC-related code still
appears in GNU Radio source but is marked with pragmas (first something
specific to our project, then OpenCL, then OpenMP) so that the code
actually runs on the accelerator card (the TI multicore CPUs on the
accelerator card run arbitrary C/C++ code, so they’re not limited like
an Nvidia or other GPU). We plan to use our GNU Radio interface to test
results of a server acceleration project we’re doing for DoE.

That’s the long story… right now our short-term objective is the
GbE-to-GbE USRP2 connection.

BTW, that’s a Virtex 5 on the Sora board, that’s not going to be cheap.

-Jeff

Did you see my previous post about the accelerator PCIe card? To some
extent the Microsoft approach is what we’re
doing. But we want to stay compatible with USRP2 hardware so we connect
GbE to the accelerator card; non MAC-related
dataflow is PCIe from there. Buffering required to stay compatible with
USRP2 software and high, sustained transfer
rates moves “right”, to the accelerator card (which has a lot of memory).

Interesting, I didn’t see this post. I tried doing some googling for
it, but I couldn’t turn it up. What was the subject of the post?

The real trick is software. Our approach is that MAC-related code still
appears in GNU radio source, but is marked
with pragmas (first something specific to our project, then OpenCL, then
OpenMP) so that code actually runs on the
accelerator card (the TI multicore CPUs on the accelerator card run
arbitrary C/C++ code so they’re not limited like an
Nvidia or other GPU). We plan to use our GNU radio interface to test
results of a server acceleration project we’re
doing for DoE.

That’s the long story… right now our short-term objective is the
GbE-to-GbE USRP2 connection.

So right now you’re trying to get low latency but high throughput
between two USRP2’s connected directly via GbE? So you’re not using the
frontend?

Maybe this is explained in your previous post; if so, just point me to
it :wink:

PS. if you haven’t seen, SORA is able to interoperate with 802.11g,
which is impressive. It meets all of the timing requirements. However,
it does not come with the exact ease of programming that we’re familiar
with. They do have to push the use of SSE and trade off a lot of
computation for memory to do lookups. This isn’t a major drawback, but
it is different. For those not necessarily concerned so much with the
PHY but looking for MAC development, it would come with a “black box
PHY” for the standard 802.11 waveforms that can pull the processing off
in time for MAC turnaround.

On Tue, Apr 6, 2010 at 5:35 PM, Jeff B. [email protected]
wrote:

The Sora team may find it difficult (and likely expensive) to reliably
move very high rate ADC data over some distance, external to the PC.
PCIe-over-cable is one way, but again, not cheap.

SORA is quiet right now because the boards are not public. To my
understanding, they are providing dozens to research institutions for
research purposes, and then after this phase pushing them public. But
I’m not sure; that’s just my impression.

Their original proposed price range for the SORA board was $2k. I’m not
sure it will hit that price, and you’re right, they’re using a WARP
daughterboard, which is pricey. Luckily, in the academic world we can
get our hands on some of these. CMU was awarded 6 of the SORA boards
(which I’m assuming will come with daughterboards?) for research. Our
plan is to connect them to our wireless emulator, which is accessible
to anyone. That would allow both us, and anyone who wanted to, to use
the SORA boards. But we need to change some of the infrastructure to
support the PCI-E boards.

I definitely agree with you on the tradeoffs there. There is a pure
tradeoff between cost and performance, and Matt and the USRP hit a
great point for flexibility at the PHY and low-cost radios. This, to
me, is sufficient for a lot of PHY-level research. As we go up the
protocol stack, it’s just not sufficient. I’m not saying it’s a bad SDR
solution; it’s just insufficient to work our way up the protocol stack
and have an effective, high-throughput radio. I’m not sure what the
answer is to this… but I’m hoping there is one in the future that
facilitates MAC development at a low cost :slight_smile:

Hi Veljko,

What I got from your paper is that the matched filter approach for
fast packet detection would not work in an OFDM setting. What about
fast ACK generation? Would it require an IFFT implementation on the
USRP? Would it help much?

It’s a good question, something I haven’t explored, and I’m not much of
a DSP guy. So I’ll punt the question to everyone else who has more DSP
experience than me. Both the fast packet detection and the fast ACK
generation are all about signal detection. So what you want to do first
is detect the preamble in the USRP without decoding (because decoding
is complex and takes long). So we propose using a matched filter on the
USRP to detect the packet preamble. In 802.11a/b, the preamble is sent
with BPSK (even if the data is sent using OFDM in 802.11a). With
802.11g, it can be a full OFDM preamble. With a full OFDM preamble, you
can still detect it with a matched filter, but I’m a little unclear
about how to generate the coefficients. You are essentially detecting
in the time domain with the matched filter. It might require an IFFT on
the USRP… anyone? Dan? :slight_smile:
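To sketch the idea in numpy (with made-up subcarrier values, not the
real 802.11 training sequence): the time-domain preamble is just the
IFFT of the known training subcarriers, and its conjugated,
time-reversed copy gives the matched-filter taps. So the coefficients
can be generated offline; only the correlation itself would need to run
on the USRP:

```python
import numpy as np

N_FFT = 64

# Hypothetical training symbol: a few occupied subcarriers with a fixed
# value (made up for illustration, NOT the real 802.11 sequence).
training_freq = np.zeros(N_FFT, dtype=complex)
training_freq[[4, 8, 12, 52, 56, 60]] = 1 + 1j

# Time-domain preamble is the IFFT of the known subcarriers...
preamble = np.fft.ifft(training_freq)

# ...and the matched filter is its conjugated, time-reversed copy.
mf_taps = np.conj(preamble[::-1])

def detect(samples, threshold):
    """Indices (into the filtered stream) where the matched-filter
    output magnitude crosses the threshold."""
    out = np.convolve(samples, mf_taps, mode="valid")
    return np.nonzero(np.abs(out) > threshold)[0]

# Bury the preamble in silence; the filter peaks where it aligns.
stream = np.concatenate(
    [np.zeros(100, complex), preamble, np.zeros(100, complex)])
energy = np.abs(np.vdot(preamble, preamble))  # peak value at alignment
peaks = detect(stream, 0.9 * energy)
```

The open question from the thread remains: whether the FPGA has room to
do this correlation (or the IFFT, for ACK generation) at full sample
rate alongside everything else.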

Is there anything that would prevent GNU Radio developers from pushing
MAC-layer functionality onto the USRP?

The USRP and USRP2, from what I understand, are both tight for space in
the FPGA. I’m pretty confident you can’t fit an OFDM implementation on
the USRP. There are free multipliers and space on the USRP2, but I
think it would also be tight, leaving you with not much room for the
MAC. Then you’d be building the MAC in Verilog, which sucks. Most
people who want to do MAC development have CS backgrounds, not EE
backgrounds, for whom Verilog is black magic :wink:

  • George

On 04/06/2010 09:44 PM, John G. wrote:

Which part of the Linux issue… sustained throughput or latency? I wouldn’t be surprised to find that latency hasn’t
improved substantially because it’s not a priority for server software. Even VoIP applications are not concerned
about a 1 msec improvement… whereas that makes or breaks a wireless MAC.

Simple test: Core 2 Duo system, 2.33 GHz, Fedora 11.

A 1500-byte ping test to localhost yields an average RTT of about
33 usec. That tests most of the network stack except for hardware
interfaces, and gives you some notion of a “best case” for
latency/turnaround time.
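The same kind of number can be reproduced without ping by bouncing an
MTU-sized UDP datagram off a local echo socket. A hedged sketch
follows; absolute figures vary with machine and kernel, and a userspace
echo adds a little overhead compared to the kernel’s ICMP echo:

```python
import socket
import time

PAYLOAD = b"\x00" * 1500  # roughly MTU-sized, like the ping test

def localhost_udp_rtt_us(n=1000):
    """Median round trip, in microseconds, of an MTU-sized UDP datagram
    bounced off a userspace echo socket on 127.0.0.1."""
    a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    b.bind(("127.0.0.1", 0))
    rtts = []
    for _ in range(n):
        t0 = time.perf_counter()
        a.sendto(PAYLOAD, b.getsockname())
        data, src = b.recvfrom(2048)  # "server" side picks it up...
        b.sendto(data, src)           # ...and echoes it straight back
        a.recvfrom(2048)
        rtts.append((time.perf_counter() - t0) * 1e6)
    a.close()
    b.close()
    return sorted(rtts)[len(rtts) // 2]

print("median loopback RTT: %.1f us" % localhost_udp_rtt_us())
```

The median hides the tail, which for a MAC is the number that actually
matters; printing max() alongside is instructive.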

If MACs have requirements more aggressive than a 20-50 usec turnaround
time, then relying purely on software in a running general-purpose
operating system, even on relatively good hardware, may be optimistic.


Principal Investigator
Shirleys Bay Radio Astronomy Consortium