MAC layer development and USRP2

Which part of the Linux issue… sustained throughput or latency? I wouldn’t be surprised to find that latency hasn’t
improved substantially because it’s not a priority for server software. Even VoIP applications are not concerned
about a 1 msec improvement… whereas that makes or breaks a wireless MAC.

I know that in the early days of Linux development, David M. spent
a lot of time making sure that the Ethernet layer could reliably send
and receive more than 1 MByte/sec via TCP over 10 megabit Ethernet,
and more than 10 MBytes/sec over TCP on 100 megabit Ethernet. I watched
his measurements and his kernel evolve to make it happen (learning from
and improving on Van Jacobson’s early work making 68000-based Sun-2’s
move 1 MByte/sec over TCP on original Ethernet).

You might say, “That’s only 90%, surely he can do better,” but
that’s 90% of the raw bit rate, delivered flow controlled and error-free
at the TCP socket layer (all the overhead, from interframe spacing to
preambles to CRCs to packet headers, goes in the 10%).

As you might expect, pumping the data through required keeping all
parts of both systems working in overlap. “One packet being assembled
to transmit, one received packet being picked apart, and one packet
flying on the medium”, at all times. If these two software jobs can
both run in one packet time, you win (and don’t need much if any
buffering, keeping latency very low). These code paths were heavily
scrutinized and optimized for the common cases.
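To put rough numbers on that overlap condition, here is a back-of-the-envelope
sketch (the per-packet costs are made-up figures for illustration, not
measurements from that work):

    #include <cstdio>

    int main() {
        // Illustrative numbers only.
        const double link_bps    = 10e6;                    // 10 megabit Ethernet
        const double frame_bits  = 1500 * 8.0;              // full-size frame
        const double packet_time = frame_bits / link_bps;   // ~1.2 msec on the wire

        const double tx_assemble = 400e-6;  // assumed cost to build the next TX packet
        const double rx_pick     = 500e-6;  // assumed cost to pick apart the last RX packet

        // The pipeline keeps up (with little or no buffering) only if both
        // software jobs finish within one packet time.
        if (tx_assemble + rx_pick <= packet_time)
            std::printf("keeps up: %.0f usec of work per %.0f usec packet\n",
                        (tx_assemble + rx_pick) * 1e6, packet_time * 1e6);
        else
            std::printf("falls behind: buffering (and latency) grow\n");
        return 0;
    }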

I haven’t kept track of who’s measuring Linux kernel GigE throughput
recently. Here’s a pointer to a 2001 study:

http://www.csm.ornl.gov/~dunigan/netperf/bulk.html

Most people care about TCP speed, but making fast paths for TCP
usually makes even faster paths for the UDP packets that USRP2 will be
using soon.

John

George-

and takes long). So, we propose using a matched filter on the USRP to

WARP would also be tight, leaving you with not much room for the MAC.
Then you’d be building the MAC in Verilog, which sucks. Most people who
want to do MAC development have CS backgrounds, not EE backgrounds, and
to them Verilog is black magic ;)

To cover a wide range of MAC layer standards at fast RF data rates,
some type of processing element is required in the front-end data flow,
i.e. before the x86 server. One way is an embedded processor core in
the FPGA that runs C code and has a library of useful stuff (matched
filtering, IFFT, etc.) that looks like basic function calls but is
implemented as parallel structures in FPGA logic, outside the processor
core. The C/C++ code calls the function, waits some number of clock
ticks or gets a callback, and it’s done (well, more or less). This
approach both abstracts the FPGA logic away from the C/C++ programmer
and gives the FPGA more flexibility (i.e. reduces the number of
applications where people need to reprogram the FPGA).
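As a rough sketch of what that might look like from the software side
(every name here is hypothetical, not an existing USRP2 or GNU Radio
API, and the FPGA block is stubbed in software so the sketch runs):

    #include <complex>
    #include <cstddef>
    #include <vector>

    // Hypothetical handle to a function block instantiated in the FPGA
    // fabric.  Stubbed in software here so the sketch actually runs.
    struct fpga_block {
        std::vector<std::complex<float>> taps;    // coefficients loaded into the block
        std::vector<std::complex<float>> result;  // output captured when it finishes
        bool busy;
    };

    fpga_block* fpga_open(std::vector<std::complex<float>> taps) {
        return new fpga_block{std::move(taps), {}, false};
    }

    // On real hardware this would poke the block's registers and return at
    // once, with the filtering running as parallel logic in the fabric.
    void fpga_start(fpga_block* b, const std::vector<std::complex<float>>& in) {
        b->busy = true;
        b->result.assign(in.size(), {0.0f, 0.0f});
        // Plain FIR here; a matched filter uses the expected waveform,
        // time-reversed and conjugated, as the taps.
        for (std::size_t i = 0; i < in.size(); ++i)
            for (std::size_t j = 0; j < b->taps.size() && j <= i; ++j)
                b->result[i] += in[i - j] * b->taps[j];
        b->busy = false;  // hardware would clear this some clock ticks later
    }

    bool fpga_done(const fpga_block* b) { return !b->busy; }

    // What the C/C++ programmer sees: an ordinary function call that waits
    // (or could take a callback) until the block signals completion.
    std::vector<std::complex<float>> matched_filter(
            fpga_block* b, const std::vector<std::complex<float>>& in) {
        fpga_start(b, in);
        while (!fpga_done(b)) { /* wait some clock ticks */ }
        return b->result;
    }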

I would guess that between Matt and the NI guys they’re planning (if
they haven’t already started) to develop a more powerful version of the
USRP2, with a larger FPGA. My understanding is that Matt originally
chose the Spartan-3 because it was Xilinx’s highest-performance FPGA
(with reasonable chip cost) that would still allow developers to use
WebPACK. Evidently he had to move to the S-3 2000 for more capacity,
although WebPACK only supports up to the S-3 1500. That means that GNU
Radio users who want to modify the FPGA already need the “paid for”
Xilinx ISE tools… and I can tell you from experience that Xilinx holds
its tools in high regard and charges accordingly.

For these reasons – not to mention competition from the likes of
Lyrtech and Sora, maybe something the NI guys pay more attention to
than Matt does – a USRP2 with a Virtex-5 or -6 starts to make sense.

-Jeff

Marcus-

33usecs. That tests most of
the network stack except for hardware interfaces, and gives you some
notion of “best case”
for latency/turn-around time.
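For anyone who wants to reproduce that kind of number on their own
machine, a minimal loopback round-trip test looks roughly like this (my
sketch, plain POSIX sockets; it exercises the stack on every leg without
touching a NIC):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>
    #include <ctime>

    // Bounce a small UDP packet between two loopback sockets and report
    // the average round trip.  One process, blocking I/O: the datagram is
    // already queued by the kernel before each recv() is called.
    int main() {
        int a = socket(AF_INET, SOCK_DGRAM, 0);
        int b = socket(AF_INET, SOCK_DGRAM, 0);

        sockaddr_in addr_a{}, addr_b{};
        addr_a.sin_family = addr_b.sin_family = AF_INET;
        addr_a.sin_addr.s_addr = addr_b.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr_a.sin_port = htons(9001);   // arbitrary test ports
        addr_b.sin_port = htons(9002);
        bind(a, (sockaddr*)&addr_a, sizeof addr_a);
        bind(b, (sockaddr*)&addr_b, sizeof addr_b);

        char buf[64] = "ping";
        const int iters = 10000;

        timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < iters; ++i) {
            sendto(a, buf, sizeof buf, 0, (sockaddr*)&addr_b, sizeof addr_b);
            recv(b, buf, sizeof buf, 0);     // "server" side picks it up
            sendto(b, buf, sizeof buf, 0, (sockaddr*)&addr_a, sizeof addr_a);
            recv(a, buf, sizeof buf, 0);     // reply comes back
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double usec = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
        std::printf("avg UDP loopback RTT: %.1f usec\n", usec / iters);
        close(a);
        close(b);
        return 0;
    }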

If a MAC has requirements more aggressive than a 20-50 usec turnaround
time, then relying purely on software in a running general-purpose
operating system, even on relatively good hardware, may be optimistic.

I think there’s no way around it: MAC-related processing has to be done
before the data reaches the server motherboard.

-Jeff

George-

I couldn’t turn it up. What was the subject of the post?
Here is the archive copy:

[Discuss-gnuradio] interfacing a DSP array card to USRP2

That’s the long story… right now our short-term objective is the
GbE-to-GbE USRP2 connection.

So right now you’re trying to get low latency, but high throughput, between
two USRP2’s connected directly via GbE? So you’re not using the frontend?

No, one USRP2 connected to the accelerator card (which is PCI or PCIe).
We want to stay as compatible as possible
with all USRP2 hardware.

-Jeff

John-


I can believe Linux Ethernet handling is fast and gets faster all the
time… but with most of the emphasis on throughput. I’m still looking
for studies that focus on latency, i.e. turn-around time (or RTT), and
that use recent Linux and motherboards. Such data is probably harder to
find because in most applications (over long routes), improving RTT at
the motherboard + kernel level isn’t worth the effort.

-Jeff

Dear All,

Regarding MAC layer development, I would like to emphasize the
importance of time-stamps. With time-stamps we can at least do slotted
schemes. Maybe non-slotted schemes can be approximated by slotted ones?

BR/
Per

On Wed, Apr 7, 2010 at 4:54 PM, Per Z. [email protected] wrote:

Dear All,

Regarding MAC layer development, I would like to emphasize the importance
of time-stamps. With time-stamps we can at least do slotted schemes. Maybe
non-slotted schemes can be approximated by slotted ones?

Hi Per,

I’m not sure what you’re attempting to do, or whether you’ve tried the
USRP1 inband timestamps, but I do have a slotted scheme I’m looking for
someone to test:

http://www.mail-archive.com/[email protected]/msg23432.html

While I say “anyone interested in build?” in that post, I now have one
ready for someone to help test:
https://www.cgran.org/browser/projects/cmu_macs/trunk/src/lib/tmac.cc#L109

Imagine a basestation and a client in the network; you therefore have 2
slots per round. With the MAC, you specify the slot time and the guard
time, and then when you run it, the client uses the timestamp at which
the basestation’s beacon was received to determine how to align its own
slots. It uses a TX timestamp to align its transmissions very tightly.
I was able to achieve microsecond-level guard bands.
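To sketch the arithmetic of that alignment (all names and constants
below are mine, including the assumption that timestamps count ticks of
the USRP2’s 100 MHz sample clock; see tmac.cc above for the real code):

    #include <cstdint>
    #include <cstdio>

    // All values in ticks of the USRP2 timestamp clock (assumed here to
    // be the 100 MHz sample clock).
    struct slot_config {
        uint64_t slot_ticks;    // duration of one slot
        uint64_t guard_ticks;   // guard time at the start of each slot
        int      num_slots;     // e.g. 2: basestation + one client
    };

    // Given the RX timestamp at which the basestation's beacon was seen
    // (beacon marks the start of slot 0), compute the TX timestamp for
    // the client's next transmission in slot my_slot.  Handing this tick
    // to the radio, which holds the packet until exactly that time, is
    // what makes microsecond-level guard bands achievable.
    uint64_t next_tx_time(const slot_config& cfg, uint64_t beacon_rx_time,
                          uint64_t now, int my_slot) {
        const uint64_t round = cfg.slot_ticks * cfg.num_slots;
        const uint64_t elapsed = now - beacon_rx_time;
        const uint64_t round_start = beacon_rx_time + (elapsed / round + 1) * round;
        return round_start + my_slot * cfg.slot_ticks + cfg.guard_ticks;
    }

    int main() {
        slot_config cfg{100000 /* 1 ms slots */, 100 /* 1 us guard */, 2};
        uint64_t beacon = 5000000, now = 5123456;
        std::printf("client transmits at tick %llu\n",
                    (unsigned long long)next_tx_time(cfg, beacon, now, 1));
        return 0;
    }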

- George