USRP packet parsing

Hi,

I have created a wiki page describing what USB packet parsing could look
like for the transmit chain:

http://gnuradio.org/trac/wiki/UsrpTxModifications

I am wondering what you think of it: whether it is completely wrong or
whether it could work with some modifications. I made a wiki page so you
can modify it, add comments, and point out what’s not right.

Thibaud

P.S. I have received and filled in the FSF letter; it is on its way back
to the FSF.

A comment on your description:

It’s easier to push the separated data into your FIFOs rather than pull
them in.

Brian

Ok, I will modify it for pushing instead of pulling packets.

Can I assume that the data will arrive ordered by timestamp? If not,
then I have to use one data_queue and one samples_fifo per channel.
Isn’t that too much?

Thibaud

About the pulling/pushing thing: I thought that I would have one process
in the usb block writing the usb_fifo data on the bus and another
process in the channel block that would read them from the bus and store
them. This is neither pulling nor pushing, right? Can I do this with a
single process?

Thibaud

On 3/19/07, Thibaud H. wrote:

Ok, I will modify it for pushing instead of pulling packets.

Can I assume that the data will arrive ordered by timestamp? If not,
then I have to use one data_queue and one samples_fifo per channel.
Isn’t that too much?

I am pretty positive you can assume that the packets will arrive
ordered by timestamp. The host can easily pre-order the packets and
send them sequentially. If any packet is out of order, I would assume
it should be evicted and dropped immediately.

Now, since the 32-bit timestamp can only count a finite amount of time
before it rolls over, there should be a well-defined way of marking
the beginning of an epoch, or of invalidating the “previous timestamp”
register that may be used to ensure time continuity.
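
To make the rollover concern concrete, here is a minimal Python sketch of one
common way to compare wrapping counters (serial-number style arithmetic). This
is a technique swapped in for illustration; the thread itself only asks for a
defined epoch or a way to invalidate the previous timestamp. The names
`ts_diff` and `is_in_order` are hypothetical, and the check is only meaningful
when the two timestamps are less than half the counter range apart.

```python
TS_BITS = 32                      # timestamp width mentioned in the thread
TS_MASK = (1 << TS_BITS) - 1
HALF_RANGE = 1 << (TS_BITS - 1)


def ts_diff(a, b):
    """Signed distance from b to a, computed modulo 2**32."""
    d = (a - b) & TS_MASK
    return d - (1 << TS_BITS) if d >= HALF_RANGE else d


def is_in_order(prev_ts, new_ts):
    """True if new_ts is at or after prev_ts, tolerating one rollover."""
    return ts_diff(new_ts, prev_ts) >= 0
```

With this comparison, a packet stamped 0x00000010 that follows one stamped
0xFFFFFFF0 is still treated as in order, while the reverse sequence would be
flagged for dropping.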

Thibaud

Brian

On 3/20/07, Thibaud H. wrote:

About the pulling/pushing thing: I thought that I would have one process
in the usb block writing the usb_fifo data on the bus and another
process in the channel block that would read them from the bus and store
them. This is neither pulling nor pushing, right? Can I do this with a
single process?

I am not really sure I understand what your process does, but the way
I had thought about it was like this.

process( Read and Distribute Packets ) {
    if the incoming FIFO has data in it loop
        if channel == command then
            write command information to command sequencer
        else /* must be data */
            write data information to channel sequencer
        end if
    end loop
}

In this process, anything coming into the USB FIFO that targets either
the command channel or a data channel is then “pushed” into the
corresponding sequencer.
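
The process above can be modelled in software roughly as follows. This is only
a behavioural sketch in Python, not FPGA code: the queues stand in for the
hardware FIFOs and sequencers, packets are assumed to be already split into
`(channel, payload)` pairs, and the `COMMAND_CHANNEL` value is an assumption
made for the example.

```python
from collections import deque

COMMAND_CHANNEL = 0x1F  # assumed channel number reserved for commands


def distribute(usb_fifo, command_seq, channel_seqs):
    """Drain usb_fifo, pushing each packet to the sequencer it targets."""
    while usb_fifo:
        channel, payload = usb_fifo.popleft()
        if channel == COMMAND_CHANNEL:
            command_seq.append(payload)            # command information
        else:
            channel_seqs[channel].append(payload)  # must be data
```

The reader never asks the sequencers whether they want data; it pushes as soon
as something is available, which is the point of the "push" design.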

Does that make sense?

Thibaud

Brian

On Mon, Mar 19, 2007 at 06:24:16PM -0400, Thibaud H. wrote:

Ok, I will modify it for pushing instead of pulling packets.

Can I assume that the data will arrive ordered by timestamps?

Yes, I think that’s a reasonable assumption.

If not then I have to use one data_queue and one samples_fifo per
channel, isn’t that too much?

Eric

Brian P. wrote:

process( Read and Distribute Packets ) {
channel or the data endpoint as a target will then “push” that data
into those sequences.

Does that make sense?

Yes! Thanks

Thibaud

The length of the packet is built into the packet, isn’t it? Do we
need to keep a count of the packet length, or just take it from the
packet itself?

Brian

I have updated the wiki page
(http://gnuradio.org/trac/wiki/UsrpTxModifications) to add the process
descriptions in pseudo-code. If it still looks good to you, I will do the
same for the receiving side so we can have a global view of what’s
going on. It will be done tomorrow.

Thibaud

Something else I noticed is that the channel definition states that
the IQ data is to be interleaved. This shouldn’t necessarily happen:
there shouldn’t be a problem with having the block RAMs in a x32
configuration, with each location holding one IQ pair. This would
remove the complexity of deinterleaving the data coming out and
improve the readability of the code.

Comments?

Brian

On 3/21/07, Thibaud H. wrote:

How does the FPGA know if the data is interleaved or not?

I believe all samples sent down are interleaved over USB as a 16-bit I
sample followed by a 16-bit Q sample. These can be concatenated (since
they belong to the same sample time) into one 32-bit number to store
in a block RAM within the FPGA.
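
As a small illustration of that concatenation, the Python sketch below packs a
signed 16-bit I and Q pair into one 32-bit word and unpacks it again. Placing
I in the upper half of the word is an assumption made for the example; the
thread does not say which half holds which component.

```python
def pack_iq(i, q):
    """Concatenate signed 16-bit I and Q into one 32-bit word (I in bits 31:16)."""
    return ((i & 0xFFFF) << 16) | (q & 0xFFFF)


def _sign16(v):
    """Interpret a 16-bit field as a two's-complement signed value."""
    return v - 0x10000 if v & 0x8000 else v


def unpack_iq(word):
    """Split a 32-bit word back into its signed (I, Q) pair."""
    return _sign16((word >> 16) & 0xFFFF), _sign16(word & 0xFFFF)
```

Each block RAM location then holds one complete complex sample, so nothing has
to be deinterleaved when the samples are read back out.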

Doing real-only transmissions could possibly be a status bit to say
what the data format is?

I am still worried about the number of FIFOs that will be used and their
size. The FPGA looks pretty full. Is there a way to have a memory
separate from the FPGA that I could access through a bus?

We’ll be removing one of the RX channels I believe, which frees up a
multiplier and a whole boatload of memory. Moreover, the Cyclone has
special hard memory blocks that are 4096 bits of dual-port memory.
That gives us 128 locations for complex sample storage in a single
block. If more blocks are used (which are available), you just double
the number every time.
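
The storage figure quoted above follows from simple arithmetic, sketched here
in Python. It assumes one 32-bit word per complex sample (16-bit I plus 16-bit
Q, as discussed earlier in the thread); the function name is hypothetical.

```python
BLOCK_BITS = 4096    # size of one hard memory block, as stated above
SAMPLE_BITS = 32     # assumed: 16-bit I + 16-bit Q per location


def sample_capacity(num_blocks):
    """Number of complex samples that fit in num_blocks memory blocks."""
    return num_blocks * BLOCK_BITS // SAMPLE_BITS
```

One block gives 128 locations, two give 256, and four give 512.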

It’s a lot of FIFOs, but if they are necessary then you have to use
them. It’s a trade off that you should be aware of, but is also
easily checked. Download Quartus II from altera.com and compile the
design for the target FPGA. Disable the other RX channel and
re-compile to see the change in resources.

What do you think?

Thibaud

Brian

How does the FPGA know if the data is interleaved or not?

I am still worried about the number of FIFOs that will be used and their
size. The FPGA looks pretty full. Is there a way to have a memory
separate from the FPGA that I could access through a bus?

Thibaud

On Wed, Mar 21, 2007 at 12:45:00PM -0400, Thibaud H. wrote:

How does the FPGA know if the data is interleaved or not?

I am still worried about the number of FIFOs that will be used and their
size. The FPGA looks pretty full. Is there a way to have a memory
separate from the FPGA that I could access through a bus?

Thibaud

I’m not sure you need all those fifos. It seems that after the first
fifo (which gathers the packet and spans the two clock domains), the
rest of the movements are effectively a “change of ownership”, with no
need to recopy all the data, perhaps only the identifier of the
particular packet and its length.

That is, perhaps think about splitting the RAM into max-packet-sized
blocks (512 bytes), then pass the ownership of the packets around as
needed. Another way to think about this is to assume that you’ve got,
say, 4 packet buffers, numbered 0 through 3. Then you could have a
very small fifo between blocks whose entries contain only the buffer
ID [0,3] and the active length of the buffer.
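
A software model of this "change of ownership" idea might look like the Python
sketch below. It is only an illustration under the numbers given above (4
buffers of 512 bytes); the class and method names are hypothetical, and a real
implementation would live in FPGA logic rather than in Python.

```python
from collections import deque

PACKET_SIZE = 512   # max packet size in bytes, from the text above
NUM_BUFFERS = 4     # buffers numbered 0 through 3


class PacketPool:
    """Fixed-size packet buffers; only (buffer_id, length) descriptors move."""

    def __init__(self):
        self.ram = bytearray(PACKET_SIZE * NUM_BUFFERS)
        self.free = deque(range(NUM_BUFFERS))

    def store(self, data):
        """Copy a packet into a free buffer once; return its descriptor."""
        if not self.free or len(data) > PACKET_SIZE:
            return None  # no free buffer, or packet too large
        buf_id = self.free.popleft()
        base = buf_id * PACKET_SIZE
        self.ram[base:base + len(data)] = data
        return buf_id, len(data)

    def view(self, desc):
        """Read the active bytes of the buffer named by a descriptor."""
        buf_id, length = desc
        base = buf_id * PACKET_SIZE
        return bytes(self.ram[base:base + length])

    def release(self, desc):
        """Hand the buffer back to the free list when its owner is done."""
        self.free.append(desc[0])
```

The small fifo between blocks then carries only descriptors such as `(0, 3)`:
two bits of buffer ID plus a length, instead of up to 512 bytes of payload.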

Brian

I think we have to assume that people are going to be dealing with all
kinds of data formats, probably mostly I & Q, but definitely with
different number of bits assigned to the I,Q pairs. We already handle
8 and 16, and there are efforts underway to handle 4, 2 and 1 bit
samples.

It’s possible to have dual ported ram with the two ports different
widths. The interface to the FX2 is over the 16-bit GPIF, so that’s
the natural size for one port. 32-bits may make good sense for the
other end.

Eric

On Tue, Mar 20, 2007 at 08:07:04PM -0400, Brian P. wrote:

The length of the packet is built into the packet, isn’t it? Do we
need to keep a count of the packet length, or just take it from the
packet itself?

Brian

The payload length is in the packet. The total length including the
fixed length header is Payload Length + 8 (per
usrp/doc/inband-signaling-usb)
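
Put as a short Python sketch, the length relationship reads as follows. The 8
byte header comes from the message above and the 512 byte packet size from
earlier in the thread; the function names and the idea of computing the
padding as "whatever is left of the 512 bytes" are assumptions for
illustration.

```python
HEADER_BYTES = 8        # fixed-length header, per the message above
USB_PACKET_BYTES = 512  # max packet size mentioned earlier in the thread
MAX_PAYLOAD = USB_PACKET_BYTES - HEADER_BYTES


def total_length(payload_len):
    """Header plus payload, in bytes."""
    if not 0 <= payload_len <= MAX_PAYLOAD:
        raise ValueError("payload length out of range")
    return payload_len + HEADER_BYTES


def padding_length(payload_len):
    """Bytes left unused when the packet is carried in a 512-byte block."""
    return USB_PACKET_BYTES - total_length(payload_len)
```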

Eric

On Wed, Mar 21, 2007 at 12:52:30PM -0400, Brian P. wrote:

On 3/21/07, Thibaud H. wrote:

How does the FPGA know if the data is interleaved or not?

I believe all samples sent down are interleaved over USB as a 16-bit I
sample followed by a 16-bit Q sample.

See below for more about 1, 2, 4, 8 and 16-bit I/Q.

These can be concatenated (since they are of the same sample time)
to 1 32-bit number to store within a block ram within the FPGA.

Yes.

Doing real-only transmissions could possibly be a status bit to say
what the data format is?

I think that the format should be an attribute of the data channel.
Below is the current format register def. Assume we (eventually)
support 1, 2, 4, 8 and 16 bit components. The half-band bit (B)
probably ought to get moved to a different register.

/*!
 * \brief Specify Rx data format.
 * \param format format specifier
 *
 * Rx data format control register
 *
 *     3                   2                   1
 *   1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
 *  +-----------------------------------------+-+-+---------+-------+
 *  |         Reserved (Must be zero)         |B|Q|  WIDTH  | SHIFT |
 *  +-----------------------------------------+-+-+---------+-------+
 *
 *  SHIFT specifies arithmetic right shift [0, 15]
 *  WIDTH specifies bit-width of I & Q samples across the USB [1, 16]
 *        (not all valid)
 *  Q     if set deliver both I & Q, else just I
 *  B     if set bypass half-band filter.
 *
 * Right now the acceptable values are:
 *
 *   B  Q  WIDTH  SHIFT
 *   0  1    16      0
 *   0  1     8      8
 *
 * More valid combos to come.
 *
 * Default value is 0x00000300  16-bits, 0 shift, deliver both I & Q.
 */
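
To show how such a register word is put together, here is a minimal Python
sketch. The bit positions (SHIFT in bits 3:0, WIDTH in bits 8:4, Q in bit 9,
B in bit 10) are read off the register diagram and are consistent with the
stated default of 0x00000300; the function and constant names are hypothetical
and do not come from the USRP sources.

```python
SHIFT_MASK = 0xF         # SHIFT field, bits 3:0
WIDTH_SHIFT = 4          # WIDTH field starts at bit 4
WIDTH_MASK = 0x1F        # WIDTH field is 5 bits wide (bits 8:4)
WANT_Q_BIT = 1 << 9      # Q: deliver both I & Q
BYPASS_HB_BIT = 1 << 10  # B: bypass half-band filter


def make_rx_format(width, shift, want_q=True, bypass_hb=False):
    """Build the Rx format register word from its fields."""
    if not 1 <= width <= 16:
        raise ValueError("width must be in [1, 16]")
    if not 0 <= shift <= 15:
        raise ValueError("shift must be in [0, 15]")
    word = ((width & WIDTH_MASK) << WIDTH_SHIFT) | (shift & SHIFT_MASK)
    if want_q:
        word |= WANT_Q_BIT
    if bypass_hb:
        word |= BYPASS_HB_BIT
    return word


def decode_rx_format(word):
    """Split a register word back into its fields."""
    return {
        "bypass_hb": bool(word & BYPASS_HB_BIT),
        "want_q": bool(word & WANT_Q_BIT),
        "width": (word >> WIDTH_SHIFT) & WIDTH_MASK,
        "shift": word & SHIFT_MASK,
    }
```

With these assumed positions, the two currently accepted combinations encode
as 0x300 (16-bit, shift 0) and 0x288 (8-bit, shift 8). The sketch only checks
field ranges; it does not reject combinations the FPGA does not yet support.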

It’s a lot of FIFOs, but if they are necessary then you have to use
them. It’s a trade off that you should be aware of, but is also
easily checked. Download Quartus II from altera.com and compile the
design for the target FPGA. Disable the other RX channel and
re-compile to see the change in resources.

What do you think?

Sounds good to me.

Thibaud

Brian

Eric

On 3/21/07, Thibaud H. wrote:

So, if I have understood correctly, I would use a dual-clock RAM
component (altsyncram for instance) and only pass the packet address
(and maybe its length) to the next block. If the whole packet (including
padding) is stored in the RAM then it’s easy, because all my memory
blocks are the same size. But if I have different packet lengths then it
becomes hairy to deal with, because there are RAM fragmentation issues.
Is it ok to store the padding to simplify the processing?

You really shouldn’t need the padding in there since the length of the
packet is within the packet itself. The FPGA can use that information
to process the packet.

I suppose we really need to figure out how the timer is going to work
with the scheduling of the entire system. Do we really need a 1/64e6
resolution of commands? How are we going to control the epoch for
TDMA systems? I would think those sort of implementations would
really drive how this connects up to that system.
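
For a sense of scale, the Python sketch below works out what a 1/64e6
resolution implies. It assumes the 32-bit timestamp mentioned earlier in the
thread increments once per tick of a 64 MHz clock; that coupling is an
assumption, not something the thread states.

```python
CLOCK_HZ = 64_000_000   # 1/64e6 resolution, from the question above
TIMESTAMP_BITS = 32     # timestamp width mentioned earlier in the thread


def tick_ns(clock_hz=CLOCK_HZ):
    """Duration of one timestamp tick in nanoseconds."""
    return 1e9 / clock_hz


def rollover_seconds(bits=TIMESTAMP_BITS, clock_hz=CLOCK_HZ):
    """Time until a free-running counter of the given width wraps."""
    return (1 << bits) / clock_hz
```

Under these assumptions one tick lasts 15.625 ns and the counter wraps roughly
every 67.1 seconds, which is why the epoch question matters for long-running
TDMA schedules.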

Brian

So, if I have understood correctly, I would use a dual-clock RAM
component (altsyncram for instance) and only pass the packet address
(and maybe its length) to the next block. If the whole packet (including
padding) is stored in the RAM then it’s easy, because all my memory
blocks are the same size. But if I have different packet lengths then it
becomes hairy to deal with, because there are RAM fragmentation issues.
Is it ok to store the padding to simplify the processing?

Brian P. wrote:

packet is within the packet itself. The FPGA can use that information
to process the packet.

Yes, I forgot that the packets are ordered by timestamp, which solves
the fragmentation issues. However, I cannot find an Altera RAM
megafunction that provides more than two independent ports. This is not
enough and will prevent the FPGA from concurrently processing packets
that have the same timestamp but use different channels.

I suppose we really need to figure out how the timer is going to work
with the scheduling of the entire system. Do we really need a 1/64e6
resolution of commands? How are we going to control the epoch for
TDMA systems? I would think those sort of implementations would
really drive how this connects up to that system.

I think sending one packet per transmit window is the most common case.
I remember reading somewhere on the wiki what the maximum transmit is,
but I cannot find it again.

Brian P. wrote:

channels is as easy as a for loop.
I can copy the samples to a fifo, but I still have 3 processes that want
to use the RAM at the same time: one to progressively store the packets
coming from the usb bus, one to copy the samples into the corresponding
channel fifo, and one to copy the subcommands to be executed now. So, if
I am not mistaken, I will have to find a way to synchronize the last two
processes, right?

I think sending one packet per transmit window is the most common case.
I remember reading somewhere on the wiki what the maximum transmit is,
but I cannot find it again.

I am not really thinking about the rates, but more about the mechanism
by which timestamps will be compared. What will the state machine look
like? What is the length of 1 tick? How will it operate to make sure it
can send everything properly? If a FIFO gets half empty, how quickly can
we get more samples of modulated data to send out?

I can write to and read from the fifo at the same time, so one process
would be in charge of filling the channel fifo. It has two states: either
wait for the timestamp to match the time, or copy the samples from the
ram to the fifo. The problem is that if there is more than one channel,
this process can be busy filling the channel 1 fifo while the channel 2
fifo is empty. I don’t know how to solve that.

For instance, the packets for one channel can stack up in ram until it’s
full, preventing any other channel from receiving data.
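
One possible way to picture the starvation problem, and a simple mitigation,
is the Python sketch below. It is not something proposed in the thread: it
swaps in a round-robin scheme with a bounded burst per channel, so that no
single channel can monopolise the copy process. All names and the `burst`
parameter are hypothetical, and the timestamp comparison ignores counter
rollover for brevity.

```python
from collections import deque


def fill_round_robin(pending, fifos, now, burst=4):
    """Visit every channel once, moving at most `burst` samples per channel.

    pending: {channel: deque of (timestamp, deque_of_samples)} held in RAM
    fifos:   {channel: deque} acting as the per-channel sample fifos
    Returns the number of samples moved in this pass.
    """
    moved = 0
    for ch, packets in pending.items():
        if not packets:
            continue
        ts, samples = packets[0]
        if ts > now:
            continue  # WAIT state: timestamp not reached yet
        # COPY state, bounded so other channels also get a turn
        for _ in range(min(burst, len(samples))):
            fifos[ch].append(samples.popleft())
            moved += 1
        if not samples:
            packets.popleft()  # packet fully copied; its RAM can be reused
    return moved
```

Calling this once per scheduling pass gives each ready channel a small slice
of the copy bandwidth instead of letting channel 1 finish before channel 2
starts. Whether the extra control logic is affordable in the FPGA is an open
question the sketch does not answer.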