FPGA "headroom" in USRP2

Marcus_DSLeech · February 10, 2009, 3:26am

Is there room in the USRP2 for building a 2048 lag autocorrelator with 1
or 2-bit sampling?

–
Marcus L.
Principal Investigator, Shirleys Bay Radio Astronomy Consortium

Marcus_DSLeech · February 10, 2009, 4:36am

On Mon, Feb 9, 2009 at 6:25 PM, Marcus D. Leech [email protected]
wrote:

Is there room in the USRP2 for building a 2048 lag autocorrelator with 1
or 2-bit sampling?

Most likely. I did the investigation to implement a fully pipelined
2048 point FFT with 16-bit I/Q samples using the Xilinx core generator
software. It looks like it would take up just under half the area in
terms of slices. We’re using about half the slices already, so this
would fill the rest of the chip. I don’t recall what the block ram
usage was, but we’re already using most of those for the CPU/firmware.
I don’t know the effect of having 1- or 2-bit sampling instead of
16. The Xilinx coregen docs are pretty good for this though.

You’d do an FFT on the input array, calculate bin power, then an IFFT
for the ACF output. The FFT core would need to be shared between the
two operations, so that would cut your data rate at least in half, and
you’d lose the ability to pipeline.
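
For reference, the FFT, bin power, IFFT sequence described above can be sketched in software. This is only an illustration in plain Python, not FPGA code: the naive `dft` helper stands in for the FFT core, the function names are made up for this example, and the samples are assumed to be real-valued. Note that this route gives the circular autocorrelation; getting linear lags would need zero-padding first.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for the FFT core, for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # The inverse transform carries the 1/N scaling.
        out.append(acc / n if inverse else acc)
    return out

def acf_via_fft(x):
    """Circular autocorrelation of real samples: FFT -> bin power -> IFFT."""
    spectrum = dft(x)
    power = [abs(s) ** 2 for s in spectrum]
    return [v.real for v in dft(power, inverse=True)]

def acf_direct(x):
    """Same circular autocorrelation, computed lag by lag in the time domain."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) for lag in range(n)]
```

Both functions should agree to within floating-point rounding, which is a quick way to sanity-check a fixed-point hardware version against a software model.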

There could be an efficient time domain way of doing what you want,
depending on how often you wanted the results.

Johnathan

Marcus_DSLeech · February 10, 2009, 5:23am

On Mon, Feb 9, 2009 at 10:36 PM, Johnathan C.
[email protected] wrote:

usage was, but we’re already using most of those for the CPU/firmware.
I don’t know the effect of having 1- or 2-bit sampling instead of
16. The Xilinx coregen docs are pretty good for this though.

I think it might be better if you actually tried to implement the FFT
based on systolic arrays. They’re very efficient for the streaming
data coming in to the FPGA.

I don’t think the Xilinx core has an option for systolic or streaming
FFT data.

You’d do an FFT on the input array, calculate bin power, then an IFFT
for the ACF output. The FFT core would need to be shared between the
two operations, so that would cut your data rate at least in half, and
you’d lose the ability to pipeline.

There could be an efficient time domain way of doing what you want,
depending on how often you wanted the results.

Speaking of streaming in FPGAs - I find it helpful to think about what
operations have to be done going from sample n to n+1, and how the
calculations differ for each function. Sometimes there are
nice streaming shortcuts you can take.
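
As a rough illustration of the "sample n to n+1" view, here is a software model of a time-domain lag correlator for 1-bit samples. It is a sketch under assumptions, not a description of any existing USRP2 design: the class name is hypothetical, bits are mapped to +1/-1, and each new sample updates every lag accumulator once (in hardware these updates would run in parallel, one XNOR and one counter per lag).

```python
from collections import deque

class OneBitLagCorrelator:
    """Software model of a streaming lag correlator for 1-bit samples."""

    def __init__(self, num_lags):
        # history[0] is the newest sample, history[k] is the sample k steps back.
        self.history = deque(maxlen=num_lags)
        self.counts = [0] * num_lags

    def push(self, bit):
        """Work done going from sample n to n+1: one update per lag."""
        self.history.appendleft(bit)
        for lag, old in enumerate(self.history):
            # With bits mapped to +1/-1, the product is +1 when the bits
            # match (XNOR) and -1 when they differ.
            self.counts[lag] += 1 if bit == old else -1
```

Usage would look like creating `OneBitLagCorrelator(2048)`, calling `push()` once per sample, and reading `counts` whenever an ACF estimate is wanted.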

Lastly, since the USRP2 uses a Spartan-3 FPGA, you should definitely
look at using the SRL16 data-storage unit within the FPGA for sample
storage when you need access to many parts of your sample stream, or
the BRAM for lower sample rates or when you only need access to 1 or 2
samples at a time from a large data set.

Using SRL16s, you could store 2048 2-bit samples in, I believe, 128
slices. Each slice has 2 SRL16s, which together can hold 32 1-bit
samples and address two samples at a time (one from each SRL16). 2048
samples at 2 bits/sample gives 4096 storage bits required. Since each
slice can hold 32 bits, 4096 / 32 = 128.
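
The slice estimate above can be checked with a few lines of Python. This takes the figures from the post at face value (16 bits per SRL16, 2 SRL16s per slice); the constant and function names are made up for this example, and it counts storage only, not any addressing or correlation logic.

```python
SRL16_DEPTH_BITS = 16   # one SRL16 is a 16-deep, 1-bit-wide shift register
SRL16_PER_SLICE = 2     # figure quoted in the post, taken as an assumption

def slices_needed(samples, bits_per_sample):
    """Slices required just to store the sample history in SRL16s."""
    total_bits = samples * bits_per_sample
    bits_per_slice = SRL16_DEPTH_BITS * SRL16_PER_SLICE
    return -(-total_bits // bits_per_slice)  # ceiling division

print(slices_needed(2048, 2))  # 2-bit sampling
print(slices_needed(2048, 1))  # 1-bit sampling
```

With these assumptions the 2-bit case comes out at 128 slices, matching the estimate, and the 1-bit case at half that.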

I think I did that math right.

Good luck!

Brian