In-band signaling & dependent packets (i.e., ACK generation)

george_n · November 27, 2007, 11:17pm

Hi all,

I was looking to kick off some discussion on the board to get some ideas
here. The in-band signaling code is likely to be used by many to
research MAC protocols, and one thing crucial to CSMA-style protocols
is dependent packets, which have very strict timing requirements. For
example, an ACK is a dependent packet: it can only be generated in
response to a received DATA packet.

Generating these packets at the host, using the in-band code and the
framing of some physical layer in GNU Radio, will be no problem. However,
the latency across the bus will almost guarantee that these packets will
never be transmitted and received in time to meet the strict timing
requirements of most protocols such as 802.11. I’m not saying we’re
trying to meet 802.11 specs; I’m just giving an example.

So what I would like to discuss are techniques for generating dependent
packets in the USRP without placing the full physical layer in the FPGA,
which would inhibit flexibility, a major goal of a development
platform such as GNU Radio. If we wanted to meet 802.11 specs, sure…
we could try this. But it’s not our goal. We want to keep the
flexibility of using any physical layer.

What I had previously discussed with Jeff (CC’ed), when asked how we
would handle these situations, was the possibility of a technique using
some sort of sample pattern matching. I don’t know how feasible this is
in reality; this is at a level I’m still not very familiar with, but I’m
trying to learn. There could be space in the FPGA to store patterns
that incoming streams of samples could be matched against.

For example, after decoding one frame successfully in GNU Radio, could
the host have some general idea of what the start-of-frame bits and its
own address look like (given its framing format) in sample space? If so,
it could pass some number of samples back to the FPGA to use in pattern
matching.

Of course, it is difficult to tell for sure whether the packet can be
successfully decoded without going to the host and using the physical
layer. A checksum, for instance, might show that some of the bits are
incorrect, in which case an ACK should not be generated.
Generating a false ACK could have negative impacts, but it is not as bad
as generating an ACK for data not truly destined to you. Higher
layers such as TCP could be used to recover from this error. In my
opinion, these are all trade-offs that make great research material for
SDRs and packet radios that can’t meet timing specs, and I’d love to
explore them.

While we cannot know for sure whether the full packet can be decoded
properly, SNR and other values could be used to make a ballpark guess.
And again, if we guess wrong, that’s OK: it’s a performance trade-off,
and the cost is paid only when the guess is wrong.

Penny for your thoughts. I don’t even know how realistic sample pattern
matching is in the FPGA… I’m open to “you’re crazy” responses, since I
don’t typically work at this level. But maybe there’s some other technique
that could be used to generate these packets without incurring the bus
latency.

George

george_n · November 28, 2007, 12:40am

A few questions:

What is the current round trip latency for the in-band code?
Have you tried synchronizing two different USRPs to each other over
the air?
What is the minimum amount of turnaround time you’re looking to
achieve?

Brian

george_n · November 28, 2007, 4:42am

Brian P. wrote:

> A few questions:
>
> What is the current round trip latency for the in-band code?

To measure the round trip latency we used three USRPs… two in
contention and a third monitoring. The two in contention would pass
the channel back and forth by reading the RSSI value from the incoming
packets. To spare the details and cut to the chase, we measured the gap
between the time the channel went idle and the time the contending node
detected it was idle and began transmitting. This is essentially measuring
dependent-packet and host-level carrier sense performance.

The average was 1.96ms and sdev 0.62ms.
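For anyone repeating this measurement, the summary statistics can be computed from paired timestamps taken by the monitoring node. This is a minimal sketch, assuming the monitor's capture yields, for each exchange, the time the channel went idle and the time the contending node started transmitting; the function name and the toy numbers are hypothetical, and the thread does not say whether a sample or population standard deviation was used (the sketch uses the sample version).

```python
import statistics


def gap_stats(idle_times_ms, tx_start_times_ms):
    """Mean and sample standard deviation of idle-to-transmit gaps (ms).

    idle_times_ms[i]: when the monitor saw the channel go idle.
    tx_start_times_ms[i]: when it saw the contending node start sending.
    """
    if len(idle_times_ms) != len(tx_start_times_ms):
        raise ValueError("timestamp lists must be the same length")
    if len(idle_times_ms) < 2:
        raise ValueError("need at least two gaps for a standard deviation")
    gaps = [tx - idle for idle, tx in zip(idle_times_ms, tx_start_times_ms)]
    if any(g < 0 for g in gaps):
        raise ValueError("a transmission started before the channel went idle")
    return statistics.mean(gaps), statistics.stdev(gaps)


# Toy timestamps only, to exercise the function; these are not measurements.
mean_ms, sdev_ms = gap_stats([0.0, 10.0, 20.0], [1.5, 12.0, 22.5])
print(f"mean {mean_ms:.2f} ms, sdev {sdev_ms:.2f} ms")  # mean 2.00 ms, sdev 0.50 ms
```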

> Have you tried synchronizing two different USRPs to each other over the air?

What do you mean by synchronizing?

> What is the minimum amount of turnaround time you’re looking to achieve?

Tens of microseconds would be great… but I’m not sure that’s
achievable. Hundreds of microseconds would be decent.

George

george_n · November 28, 2007, 4:44am

> The average was 1.96ms and sdev 0.62ms.

Sorry, I omitted the fact that this was calculated from 200 values.

George

george_n · November 28, 2007, 5:49am

On Nov 27, 2007 10:43 PM, George N. wrote:

> To measure the round trip latency we used three USRPs… two in
> contention and a third monitoring. The two in contention would pass
> the channel back and forth by reading the RSSI value from the incoming
> packets. To spare the details and cut to the chase, we measured the gap
> between the time the channel went idle and the time the contending node
> detected it was idle and began transmitting. This is essentially measuring
> dependent-packet and host-level carrier sense performance.
>
> The average was 1.96ms and sdev 0.62ms.

That doesn’t seem too bad to me.

> What do you mean by synchronizing?

Let’s say for a TDMA MAC, there is a 1 ms beacon every 50ms, followed by
a 200us guard time, for a specified number of radios.

Can you set up a USRP to transmit some data every 50ms, and have a
second USRP lock on to that periodic 50ms transmission and be sure to
be on the air at 51.2ms +/- 10us? Is this a reasonable expectation?
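The 51.2ms target is plain arithmetic on the beacon parameters: a beacon starting at 50ms, lasting 1 ms, plus the 200us guard puts the second radio on the air at 51.2ms. The sketch below, with hypothetical constant and function names and integer microseconds to avoid rounding, computes that instant and checks a start time against the +/- 10us window.

```python
BEACON_PERIOD_US = 50_000  # a beacon starts every 50 ms
BEACON_LEN_US = 1_000      # each beacon lasts 1 ms
GUARD_US = 200             # guard time after the beacon
TOLERANCE_US = 10          # allowed timing error: +/- 10 us


def slot_start_us(beacon_start_us):
    """When the second radio should key up after a beacon starting at beacon_start_us."""
    return beacon_start_us + BEACON_LEN_US + GUARD_US


def on_time(actual_start_us, beacon_start_us):
    """True if a transmission start falls inside the +/- tolerance window."""
    return abs(actual_start_us - slot_start_us(beacon_start_us)) <= TOLERANCE_US


# The beacon at t = 50 ms gives the 51.2 ms target from the question.
print(slot_start_us(BEACON_PERIOD_US))  # 51200
```

For scale, the 10us tolerance is roughly two hundred times smaller than the 1.96 ms average latency George measured, and the 0.62 ms standard deviation alone is far wider than the window, so reacting to each beacon from the host would not meet it.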

> Tens of microseconds would be great… but I’m not sure that’s
> achievable. Hundreds of microseconds would be decent.

Is there any driving reason for a requirement of tens of microseconds?
Is it mainly to be compatible with 802.11 style systems?

To be honest, I feel that trying to achieve a turnaround of tens of
microseconds is too lofty a goal without creating a special FPGA load
for that specific waveform. I am not saying a custom FPGA load is a
good or bad thing; I just think you can’t go over USB, have the
host do processing, and then go back over USB for a response. There’s
just too much to do for it to behave as a real-time system.

I feel that if you budget for a minimum latency of 2ms, there won’t be
issues creating latency-tolerant MAC layers.

On the other hand, being compatible with current waveforms that may be
completely implemented in custom ASICs might be a bit of a problem.

Brian

george_n · November 28, 2007, 5:59am

> Let’s say for a TDMA MAC, there is a 1 ms beacon every 50ms, followed by
> a 200us guard time, for a specified number of radios.
>
> Can you set up a USRP to transmit some data every 50ms, and have a
> second USRP lock on to that periodic 50ms transmission and be sure to
> be on the air at 51.2ms +/- 10us? Is this a reasonable expectation?

Let me find out.

> Is there any driving reason for a requirement of tens of microseconds?
> Is it mainly to be compatible with 802.11 style systems?

Mainly, yes.

> On the other hand, being compatible with current waveforms that may be
> completely implemented in custom ASICs might be a bit of a problem.

I totally agree that 2ms is not that bad and is something that MAC
layers can certainly deal with. The goal is to be compatible
with current waveforms using software radios. Are there techniques that
we can use to try to achieve these low latencies without implementing
the PHY fully in the hardware? This is what I’m interested in. Is any
sort of sample pattern matching possible in the slightest bit?

George

george_n · December 4, 2007, 5:12pm

> Is any sort of sample pattern matching possible in the slightest bit?

As a first test, I used one USRP to continuously transmit frames, and
another to dump the raw samples that were used to decode the frame sync
bits of each frame. Across 100 different sample dumps, calculating the
correlation coefficient shows absolutely no correlation between the raw
samples. So raw sample pattern matching does not seem feasible.
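One plausible factor behind the missing correlation, offered here as an assumption since the thread does not diagnose it, is that each capture sees a different carrier phase (plus some frequency and timing offset), which rotates the complex samples. A correlation coefficient on raw real-valued samples is very sensitive to that rotation, while the magnitude of a complex correlation is not. The sketch below uses a hypothetical phase path that steps ±90 degrees per bit as a stand-in for a real frame, and shows the same burst giving a real-part Pearson coefficient of -1 under a 180-degree offset but a complex correlation magnitude of 1.

```python
import cmath
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length real sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def complex_corr_mag(a, b):
    """|sum(a * conj(b))| / (||a|| ||b||); unchanged by a constant phase rotation."""
    num = sum(x * y.conjugate() for x, y in zip(a, b))
    den = math.sqrt(sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b))
    return abs(num) / den


# Hypothetical burst: a unit-amplitude phase path stepping +/-90 degrees per bit.
bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
phase = 0.0
burst = []
for bit in bits:
    phase += math.pi / 2 if bit else -math.pi / 2
    burst.append(cmath.exp(1j * phase))

# The "same" burst captured again with a 180-degree carrier phase difference.
capture = [s * cmath.exp(1j * math.pi) for s in burst]

r_real = pearson([s.real for s in burst], [s.real for s in capture])
r_cplx = complex_corr_mag(burst, capture)
print(f"real-part Pearson: {r_real:+.3f}, complex magnitude: {r_cplx:.3f}")
# real-part Pearson: -1.000, complex magnitude: 1.000
```

With a phase offset that changes randomly from capture to capture, the real-part coefficient tends toward zero on average, while the complex magnitude is degraded only by noise, residual frequency offset, and timing error, none of which this noiseless sketch models.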

George

george_n · December 4, 2007, 6:50pm

> I don’t think I understand what you’re trying to do here. What frames
> were you transmitting? What pattern are you looking for? Do you hope
> to perform this operation in the FPGA or on the host?
>
> Sorry for my confusion.

No problem, I don’t mind trying to be clearer.

Here’s an example of what I want to overcome: latency between the USRP
and host when ACK’ing a DATA frame in any CSMA-type protocol.

The frames I was transmitting are home-made frames that happen
to use the same sync bits as the GNU Radio GMSK frames. That’s not really
important; the pattern I am looking for is the start of frame in
sample space.

The “in sample space” is what’s important, because I can obviously find
the framing bit sequence by decoding the samples using the PHY layer…
but this incurs the latency over the bus. What I’m trying to do is
detect the start of frame sequence without using the PHY layer to avoid
this latency.

So as a first experiment I marked which samples were needed to
actually decode the frame bits, and tried to see if I could pattern
match these raw samples. If there were a high correlation between the
samples, I could simply implement some functionality in the FPGA to look
for this pattern. This would hopefully be independent of the PHY layer
used.

George

george_n · December 4, 2007, 7:08pm

From some discussion on comp.dsp, it seems as though I’m looking for a
matched filter:
http://groups.google.com/group/comp.dsp/browse_thread/thread/f93d7867f74dbe95#0dc48f2a8ed09e07

If you see what I’m getting at, if I implement a matched filter in the
FPGA (given that it does what I think it does ;)), I can detect incoming
frames without using the PHY layer.

Let’s say that a simple requirement is this: the frame format must have
the destination address directly after the framing sequence. Then
I could use the matched filter in the FPGA to detect incoming frames
addressed to me using a single sequence, without the turnaround time to
the host.

By doing this, I could generate ACKs much faster by storing
pre-modulated ACK data in the FPGA and transmitting it when the filter
triggers.
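The detect-then-trigger idea can be modeled on the host before committing anything to RTL. This is a minimal sketch, assuming the sync word and destination address are known as complex baseband samples at one sample per symbol and concatenated into a single reference: slide a normalized complex correlation over the stream and, when it crosses a threshold, hand back a stored ACK. All names, the 8-symbol reference, and the 0.9 threshold are hypothetical. A hardware version would likely compare squared magnitudes against a scaled threshold rather than divide and take a square root, and would still have to account for noise, frequency offset, and false alarms.

```python
import math


def normalized_corr(window, ref):
    """|sum(w * conj(r))| / (||w|| ||r||), a score between 0 and 1."""
    num = sum(w * r.conjugate() for w, r in zip(window, ref))
    energy = sum(abs(w) ** 2 for w in window) * sum(abs(r) ** 2 for r in ref)
    return abs(num) / math.sqrt(energy) if energy > 0 else 0.0


def detect_and_ack(samples, ref, ack_samples, threshold=0.9):
    """Slide ref (sync word followed by our address) over samples.

    Returns (start_index, ack_samples) at the first window whose score meets
    the threshold, else (None, None). ack_samples stands in for the
    pre-modulated ACK that would sit in FPGA memory.
    """
    n = len(ref)
    for i in range(len(samples) - n + 1):
        if normalized_corr(samples[i:i + n], ref) >= threshold:
            return i, ack_samples
    return None, None


# Hypothetical 8-symbol sync+address reference and a stored ACK waveform.
ref = [complex(v) for v in (1, -1, 1, 1, -1, -1, 1, -1)]
stored_ack = [complex(v) for v in (1, 1, -1, 1)]

# Incoming stream: silence, the reference rotated 90 degrees, silence.
stream = [0j] * 5 + [1j * r for r in ref] + [0j] * 5
print(detect_and_ack(stream, ref, stored_ack)[0])  # 5
```

Because the score uses the magnitude of the complex correlation, the 90-degree rotation applied to the stream does not prevent detection, and the partially overlapping windows before the true start score well below the threshold.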

George

george_n · December 4, 2007, 7:30pm

On Dec 4, 2007 1:07 PM, George N. wrote:

> From some discussion on comp.dsp, it seems as though I’m looking for a
> matched filter:
> http://groups.google.com/group/comp.dsp/browse_thread/thread/f93d7867f74dbe95#0dc48f2a8ed09e07

Yes, you are describing a matched filter.

> If you see what I’m getting at, if I implement a matched filter in the
> FPGA (given that it does what I think it does ;)), I can detect incoming
> frames without using the PHY layer.

You’re not getting rid of the PHY layer. You’re incorporating this
mechanism into the PHY layer as opposed to having it within the MAC
layer.

> Let’s say that a simple requirement is this: the frame format must have
> the destination address directly after the framing sequence. Then
> I could use the matched filter in the FPGA to detect incoming frames
> addressed to me using a single sequence, without the turnaround time to
> the host.
>
> By doing this, I could generate ACKs much faster by storing
> pre-modulated ACK data in the FPGA and transmitting it when the filter
> triggers.

You can accomplish this by using some simple sign manipulation/zero
insertion and treating the GMSK as a simple PSK that only shifts 90
degrees for each transition. No new multipliers should be required,
but memory will be needed for coefficients and samples.
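Why no new multipliers are needed becomes clear from how a complex sample is multiplied by a unit tap on the axes: multiplying by 1, j, -1, or -j only swaps and/or negates the I and Q components. The sketch below is a minimal illustration, assuming the reference is approximated, as suggested above, by a PSK-like phase path that moves one quarter turn per bit; it covers only the sign-manipulation part and does not attempt the zero insertion, and the function names are hypothetical. Real GMSK has a smooth phase trajectory, so this is an approximation.

```python
def rotate(i, q, k):
    """Multiply the complex sample i + jq by j**k using only swaps and negations.

      k = 0 (x  1): ( i,  q)
      k = 1 (x  j): (-q,  i)
      k = 2 (x -1): (-i, -q)
      k = 3 (x -j): ( q, -i)
    """
    k %= 4
    if k == 0:
        return i, q
    if k == 1:
        return -q, i
    if k == 2:
        return -i, -q
    return q, -i


def correlate_no_mult(samples, ref_rot):
    """Correlate integer I/Q samples against taps j**ref_rot[n].

    Correlation multiplies each sample by the conjugate tap, j**(-ref_rot[n]),
    so every term is a rotate() followed by two additions.
    """
    acc_i = acc_q = 0
    for (i, q), k in zip(samples, ref_rot):
        ri, rq = rotate(i, q, -k)
        acc_i += ri
        acc_q += rq
    return acc_i, acc_q


def phase_path(bits):
    """Quarter-turn index per symbol: +90 degrees for a 1, -90 degrees for a 0."""
    k, path = 0, []
    for b in bits:
        k = (k + (1 if b else -1)) % 4
        path.append(k)
    return path


ref_rot = phase_path([1, 0, 1, 1, 0, 1])
rx = [rotate(100, 0, k) for k in ref_rot]  # noiseless, amplitude-100 copy
print(correlate_no_mult(rx, ref_rot))      # (600, 0)
```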

How many bits are in your address, and how many in your frame sync?

Brian

george_n · December 4, 2007, 5:50pm

On Dec 4, 2007 11:11 AM, George N. wrote:

> As a first test, I used one USRP to continuously transmit frames, and
> another to dump the raw samples that were used to decode the frame sync
> bits of each frame. Across 100 different sample dumps, calculating the
> correlation coefficient shows absolutely no correlation between the raw
> samples. So raw sample pattern matching does not seem feasible.

I don’t think I understand what you’re trying to do here. What frames
were you transmitting? What pattern are you looking for? Do you hope
to perform this operation in the FPGA or on the host?

Sorry for my confusion.

Brian

george_n · December 4, 2007, 8:01pm

> You’re not getting rid of the PHY layer. You’re incorporating this
> mechanism into the PHY layer as opposed to having it within the MAC
> layer.

I see, I want to go lower than the PHY layer really…

> You can accomplish this by using some simple sign manipulation/zero
> insertion and treating the GMSK as a simple PSK that only shifts 90
> degrees for each transition. No new multipliers should be required,
> but memory will be needed for coefficients and samples.
>
> How many bits is your address and how many bits is your frame sync?

Here’s the thing: I don’t want the solution to be dependent on the
physical layer. The goal is a development platform, and by choosing a
solution dependent on GMSK, I essentially lock all MAC development into
using GMSK. Is there a solution independent of the PHY layer? I
suppose I’m still not sure.

If some mechanism could be built in the FPGA that was highly
reconfigurable, that would be fine… but it seems to be boiling down to
different techniques per-modulation.

George

george_n · December 4, 2007, 8:37pm

On Dec 4, 2007 2:00 PM, George N. wrote:

> I see, I want to go lower than the PHY layer really…

You can’t go lower than the PHY layer. There’s a reason it’s the
lowest on the stack.

> Here’s the thing: I don’t want the solution to be dependent on the
> physical layer. The goal is a development platform, and by choosing a
> solution dependent on GMSK, I essentially lock all MAC development into
> using GMSK. Is there a solution independent of the PHY layer? I
> suppose I’m still not sure.

You can use a matched filter for this, but a generalized matched
filter uses multipliers. If you limited yourself to GMSK, PSK, or QAM
you can get away with sign manipulations. OFDM would require
different processing in general.

> If some mechanism could be built in the FPGA that was highly
> reconfigurable, that would be fine… but it seems to be boiling down to
> different techniques per-modulation.

Not really an option for the USRP as it currently is due to the lack
of hardware multipliers. Maybe more of an option for the USRP2.

On a side note, you should not feel bad about making something that
considers a trade-off in the realm of software defined radio. You are
giving up (slight) PHY layer flexibility for much improved latency.
We’re still using an antenna. We’re still using a superheterodyne
receiver. Some things just can’t be done in software and are limitations;
working around them is fine.

Brian

george_n · December 5, 2007, 2:44am

> You can’t go lower than the PHY layer. There’s a reason it’s the
> lowest on the stack.

Yeah, stupid comment by me.

> You can use a matched filter for this, but a generalized matched
> filter uses multipliers. If you limited yourself to GMSK, PSK, or QAM
> you can get away with sign manipulations. OFDM would require
> different processing in general.
>
> > If some mechanism could be built in the FPGA that was highly
> > reconfigurable, that would be fine… but it seems to be boiling down to
> > different techniques per-modulation.
>
> Not really an option for the USRP as it currently is due to the lack
> of hardware multipliers. Maybe more of an option for the USRP2.

So there is no way of getting a generalized matched filter on the USRP?
Is there anything that can be done to get around the lack of hardware
multipliers? If there is absolutely no way, limiting to GMSK, PSK, and
QAM is not that bad. What makes OFDM need different processing? I’m
trying to read up on matched filters now.

> On a side note, you should not feel bad about making something that
> considers a trade-off in the realm of software defined radio. You are
> giving up (slight) PHY layer flexibility for much improved latency.
> We’re still using an antenna. We’re still using a superheterodyne
> receiver. Some things just can’t be done in software and are limitations;
> working around them is fine.

I agree, it’s not even that this functionality is truly needed. The
host can still be used to decode and generate ACKs. It’s merely an
optional mechanism and certainly better than nothing.

George

george_n · December 5, 2007, 3:10am

On Dec 4, 2007 8:43 PM, George N. wrote:

> So there is no way of getting a generalized matched filter on the USRP?
> Is there anything that can be done to get around the lack of hardware
> multipliers? If there is absolutely no way, limiting to GMSK, PSK, and
> QAM is not that bad. What makes OFDM need different processing? I’m
> trying to read up on matched filters now.

There’s always a way to do something, but you are resource limited.
You can write something that will use 1 complex multiplier and can do
a 64-tap matched filter at a symbol rate of 1Msps. Or you can use 8
complex multipliers to do the same thing at the maximum sample rate,
since the minimum decimation of the USRP is 8. You can slice it any way
you want and do TDM on the multipliers you infer or directly instantiate.
You could possibly even parameterize the RTL so anyone can rebuild it and
reprogram the FPGA with ease using their own matched filter.
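Both figures follow from a simple budget: an N-tap matched filter needs N complex multiply-accumulates per output sample, and one time-shared multiplier can do one per clock. The sketch below assumes a 64 MHz FPGA clock (matching the USRP's 64 MS/s ADC rate, so the minimum decimation of 8 gives 8 MS/s) and one multiply-accumulate per multiplier per clock; the function name is hypothetical.

```python
import math


def multipliers_needed(taps, sample_rate_hz, clock_hz):
    """Complex multipliers needed when each one is time-shared (TDM) across taps.

    Every output sample needs `taps` complex multiply-accumulates, and each
    multiplier is assumed to complete one per clock cycle.
    """
    macs_per_second = taps * sample_rate_hz
    return math.ceil(macs_per_second / clock_hz)


CLOCK_HZ = 64_000_000  # assumption: 64 MHz FPGA clock
print(multipliers_needed(64, 1_000_000, CLOCK_HZ))      # 1
print(multipliers_needed(64, CLOCK_HZ // 8, CLOCK_HZ))  # 8 (decimation of 8)
```

This reproduces one multiplier at 1Msps and eight at 8 MS/s. It ignores that each complex multiply is itself built from three or four real multiplies, which matters when counting FPGA resources.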

A matched filter is just an FIR filter that performs correlation because
its coefficients are a specific sequence instead of a frequency
response. This is how CDMA works. You “spread” your one bit out over
this PN sequence and then use a matched filter to correlate against
the expected PN sequence. You get either a 1 or a -1 depending on
whether you sent a 0 or a 1.
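A minimal CDMA-style illustration of spreading and matched-filter despreading, assuming one bit per PN period, perfect chip alignment, and the convention that a 0 maps to +1 (the thread does not fix which bit gives which sign). The PN sequence and function names are hypothetical.

```python
# Hypothetical 8-chip PN sequence of +/-1 chips (not a standard code).
PN = [1, -1, 1, 1, -1, 1, -1, -1]


def spread(bit):
    """Map bit 0 -> +1 and bit 1 -> -1 (assumed convention), then spread over PN."""
    symbol = 1 if bit == 0 else -1
    return [symbol * chip for chip in PN]


def despread(chips):
    """Matched-filter output at the aligned instant, normalized by the PN length."""
    return sum(x * c for x, c in zip(chips, PN)) / len(PN)


def decide(score):
    """Hard decision: positive correlation means a 0 was sent."""
    return 0 if score > 0 else 1


print(despread(spread(0)), despread(spread(1)))  # 1.0 -1.0
```

Flipping one of the eight chips moves the output only from -1 to -0.75, so the decision survives; that margin is what the spreading buys.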

Wikipedia has good entries for both a matched filter and cross
correlation.

http://en.wikipedia.org/wiki/Cross-correlation
http://en.wikipedia.org/wiki/Matched_filter

The reason OFDM is different is that the symbols are actually
carried in the frequency domain, so you have to perform an FFT on a
window of samples and read which tones are present. Because of that
extra FFT step, the time-domain samples are pretty much useless by
themselves.
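A small numerical illustration of this point, using a plain DFT in place of an FFT (same result, just slower), a hypothetical 16-point symbol, no cyclic prefix, and perfect symbol alignment: data placed on three tones is spread across every time-domain sample, and only the DFT of the window shows which tones are present.

```python
import cmath

N = 16  # hypothetical FFT size (one OFDM symbol, no cyclic prefix)


def idft(bins):
    """Inverse DFT: what the transmitter does to put data on the tones."""
    return [sum(bins[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def dft(samples):
    """Forward DFT: what the receiver must do before it can read the tones."""
    return [sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def active_tones(samples, threshold=0.5):
    """Indices of tones whose DFT magnitude exceeds the threshold."""
    return [k for k, v in enumerate(dft(samples)) if abs(v) > threshold]


tones = [0j] * N
tones[2], tones[5], tones[11] = 1, -1, 1j  # data on three subcarriers
time_domain = idft(tones)                  # every sample mixes all three tones
print(active_tones(time_domain))           # [2, 5, 11]
```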

Brian

george_n · January 13, 2008, 6:11pm

On Jan 12, 2008 6:01 PM, George N. wrote:

> Would any of the FIR filters that already exist in GR be appropriate? I
> could then perform the cross-correlation on the output of the block.

If you just give your known coefficients to the filter, then it really
just performs the cross-correlation for you. The magnitude of the
complex output would be the correlation, and the phase would be a good
indicator of how off your symbol timing is from optimal sampling.
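One detail worth making explicit when handing known coefficients to an FIR block: an FIR filter computes a convolution, so its output equals the cross-correlation with the known sequence when the taps are that sequence time-reversed and conjugated (for a real, symmetric sequence the two coincide). The plain-Python sketch below, with a hypothetical six-symbol sequence and a 0.7 rad carrier phase offset applied to the test signal, shows the output magnitude peaking where the sequence ends.

```python
import cmath


def fir(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], starting from zero state."""
    out = []
    for n in range(len(samples)):
        acc = 0j
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * samples[n - k]
        out.append(acc)
    return out


def matched_filter_taps(known):
    """Taps that make the FIR output the cross-correlation with `known`."""
    return [s.conjugate() for s in reversed(known)]


known = [1, 1j, -1, 1j, 1, -1j]  # hypothetical known symbol sequence
offset = cmath.exp(1j * 0.7)     # carrier phase offset applied to the test signal
rx = [0j] * 4 + [s * offset for s in known] + [0j] * 4

y = fir(rx, matched_filter_taps(known))
peak = max(range(len(y)), key=lambda n: abs(y[n]))
print(peak, round(abs(y[peak]), 6), round(cmath.phase(y[peak]), 6))  # 9 6.0 0.7
```

In this noiseless sketch the phase at the peak is just the applied 0.7 rad carrier offset. The same taps list could be given to a complex-tap FIR block in GNU Radio.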

Does that answer your question?

Brian

george_n · January 12, 2008, 7:03pm

Thanks Brian,

> A matched filter is just an FIR filter that performs correlation because
> its coefficients are a specific sequence instead of a frequency
> response. This is how CDMA works. You “spread” your one bit out over
> this PN sequence and then use a matched filter to correlate against
> the expected PN sequence. You get either a 1 or a -1 depending on
> whether you sent a 0 or a 1.

Would any of the FIR filters that already exist in GR be appropriate? I
could then perform the cross-correlation on the output of the block.

George