Interfacing a DSP array card to USRP2

Matt-

We’re working on a project at Signalogic to interface one of our DSP
array PCIe cards to the USRP2. This would provide a way for one or more
TI DSPs to “insert” into the data flow and run C/C++ code for
low-latency and/or other high-performance applications. The idea is
that we would modify the current USRP2 driver (or create an alternative)
so that it reads/writes to/from the PCIe card instead of the Linux
(motherboard) GbE.

A few general questions at this point:

  1. We would connect the USRP2 to the GbE on our DSP array card. We
    would want to shift latency/delay “downstream” to the PCIe card Linux
    driver interface, and make the GbE-to-GbE interface as low-latency as
    possible. Could you give us some guidance on which FPGA modules
    handle buffering for host transmit/receive? Is it reasonable to
    expect that we can reduce buffer sizes if the array card GbE has a
    fast response time?

  2. We want to use the GNU Radio GMAC as opposed to a Xilinx or other
    off-the-shelf core, our thinking being that we can contribute to
    data rate and latency-reduction discussions, as well as to USRP2 tech
    support, if we become familiar with your core. Can you give us some
    guidance on a process to remove non-GMAC-related modules from the
    firmware? Go to the top level and start pulling? Obviously the
    SRAM-related modules, CORDIC, and the ADC/DAC interfaces are not
    needed.

  3. Do you have an FPGA internal architecture block diagram of any type?
    Is there another group you’re aware of doing such “major
    modification” FPGA work that we might talk to?

Thanks.

-Jeff

On 03/30/2010 11:48 AM, Jeff B. wrote:

A few general questions at this point:

  1. We would connect the USRP2 to the GbE on our DSP array card. We
    would want to shift latency/delay “downstream” to the PCIe card Linux
    driver interface, and make the GbE-to-GbE interface as low latency as
    possible. Could you give us some guidance on which FPGA modules
    handle buffering for host transmit/receive?

The MAC is all contained in simple_gemac, and above that in
simple_gemac_wrapper:

http://code.ettus.com/redmine/ettus/projects/fpga/repository/revisions/master/show/usrp2/simple_gemac

which is instantiated in u2_core. Most of the buffering happens in
simple_gemac_wrapper in the fifo_2clock_cascade files.

Is it reasonable we can
reduce buffer sizes if the array card GbE has a fast response time?

You could drastically reduce this buffering if you could guarantee fast
response.
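To illustrate why a guaranteed fast response lets the buffering shrink,
here is a back-of-the-envelope sizing sketch in Python. It is not taken
from the USRP2 code: it assumes a FIFO only has to absorb the bytes that
keep arriving at gigabit line rate while the far end is not yet
responding, plus one maximum-size frame so a whole packet can be held.
The 1518-byte frame limit, the 32-bit FIFO word, and the helper names
are illustrative assumptions.

```python
# Back-of-the-envelope FIFO sizing (hypothetical helper, not USRP2 code).
# Assumption: the FIFO must hold whatever arrives at gigabit line rate while
# the far end is stalled, plus one maximum-size frame so a packet fits whole.

NS_PER_BYTE_AT_1GBPS = 8  # 1 Gb/s moves one byte every 8 ns


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative ints (avoids float rounding)."""
    return -(-a // b)


def fifo_depth_words(response_latency_ns: int,
                     max_frame_bytes: int = 1518,
                     word_bytes: int = 4) -> int:
    """FIFO words needed for a given worst-case far-end response latency."""
    stall_bytes = ceil_div(response_latency_ns, NS_PER_BYTE_AT_1GBPS)
    return ceil_div(stall_bytes + max_frame_bytes, word_bytes)


if __name__ == "__main__":
    for latency_ns in (0, 1_000, 10_000, 100_000):
        print(f"{latency_ns:>7} ns -> {fifo_depth_words(latency_ns)} words")
```

Under these assumptions a 10 us worst-case stall needs roughly 692
32-bit words, while a 1 us stall needs about 411, so most of the depth
beyond one frame exists only to cover slow responses. Preamble and
inter-frame gap are ignored, which makes the estimate slightly
conservative.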

  2. We want to use the GNU Radio GMAC as opposed to a Xilinx or other
    off-the-shelf core, our thinking being that we can contribute to
    data rate and latency-reduction discussions, as well as to USRP2 tech
    support, if we become familiar with your core. Can you give us some
    guidance on a process to remove non-GMAC-related modules from the
    firmware? Go to the top level and start pulling? Obviously the
    SRAM-related modules, CORDIC, and the ADC/DAC interfaces are not
    needed.

I would just start with the u2_core and simple_gemac_wrapper. If you’re
not using the SERDES, that is a good place to start ripping out.

  3. Do you have an FPGA internal architecture block diagram of any
    type? Is there another group you’re aware of doing such “major
    modification” FPGA work that we might talk to?

There were some on the wiki at one time. If they’re not still there
I’ll post a talk I did which covers the architecture.

If you really just want high-speed, super-low-latency connections to
another board, I would suggest using the SERDES (MIMO) interface
instead. It will have less latency than GbE. You just need an FPGA
with gigabit transceivers, or a TI TLK2701 chip to talk to.

Matt

Matt,

Thank you for your email.

The MAC is all contained in simple_gemac, and above that in
simple_gemac_wrapper:
http://code.ettus.com/redmine/ettus/projects/fpga/repository/revisions/master/show/usrp2/simple_gemac
which is instantiated in u2_core. Most of the buffering happens in
simple_gemac_wrapper in the fifo_2clock_cascade files.

I would just start with the u2_core and simple_gemac_wrapper. If you’re
not using the SERDES, that is a good place to start ripping out.

Does this imply that we can pull out the aeMB core, the 32K RAM and the
buffer pool under module u2_core?

To carry out preliminary testing, we need to be able to pass data to the
gemac and configure the appropriate control registers. Could you please
suggest which existing modules we could reuse to send data to the gemac?

  3. Do you have an FPGA internal architecture block diagram of any
    type? Is there another group you’re aware of doing such “major
    modification” FPGA work that we might talk to?

There were some on the wiki at one time. If they’re not still there
I’ll post a talk I did which covers the architecture.

I have looked at the wiki (http://gnuradio.org/redmine/wiki/gnuradio),
but I was not able to find any block diagrams of the internal
architecture of the USRP2 FPGA. I might not be looking in the right
place. Could you please point me in the right direction?

From forum discussions over the past couple of months, it appears that
the USRP2 does not support 10/100 mode. Could you please help us
understand the work involved in getting 10/100 mode working?

Thanks and Regards,
Vikram.

Matt-

About Vikram’s 10/100 mode question, we were wondering if it’s a design
flaw, i.e., something wrong from the start in the original opencores.org
source, or if it’s fixable but hasn’t been a high-priority item given
USRP2’s high data rate requirements. But then I found this post:

USRP2 gigabit ethernet code - GNU Radio - Ruby-Forum

So I guess the former. If there are any hints you can give on what’s
wrong, we can take a look. Maybe our guys can get it working.

-Jeff

On 04/07/2010 05:58 PM, Vikram Ragukumar wrote:

not using the SERDES, that is a good place to start ripping out.

Does this imply that we can pull out the aeMB core, the 32K RAM and the
buffer pool under module u2_core ?

You can pull out whatever you want. Start from scratch if you like.
But if you take out the processor, you’ll need to find some other way to
get the peripherals (DAC, clock gen, lsdac, etc.) programmed.

To carry out preliminary testing we need to be able to pass data to the
gemac and configure appropriate control registers. Could you please
suggest what existing modules we could reuse to send data to the gemac ?

You’re going to need to look at the code. The processor does all that
now.

Matt

On 04/07/2010 09:10 PM, Jeff B. wrote:

Matt-

About Vikram’s 10/100 mode question, we were wondering if it’s a design
flaw, i.e., something wrong from the start in the original opencores.org
source, or if it’s fixable but hasn’t been a high-priority item given
USRP2’s high data rate requirements. But then I found this post:

USRP2 gigabit ethernet code - GNU Radio - Ruby-Forum

So I guess the former. If there are any hints you can give on what’s
wrong, we can take a look. Maybe our guys can get it working.

Eventually I got completely fed up with the opencores gige core and
wrote a completely new one from scratch (the simple_gemac). It only
does gigabit, though.

One of the main problems with 10/100/1000 in a Spartan is that you need
a large number of clocks and the S3 is constrained. If you just want
10/100 and can live without gigabit, I would suggest doing that, as it
will save 1 or 2 clocks.

Matt

My understanding is that it takes 3 BUFGs and one DCM for tri-mode
(maybe one more of each for RGMII support, but I don’t see that) and
that, between this and other USRP2 needs, you ran into the limit of 8.
Is that accurate? Or would 10/100/1000 support take more than 3…

I can’t say how many clocks a good 10/100/1G system would need, but
the Opencore required 4. One thing to keep in mind is that while there
are theoretically 8 global clocks in the S3, other limitations mean that
it can be difficult to use all 8.

Matt

Matt-

Eventually I got completely fed up with the opencores gige core and
wrote a completely new one from scratch (the simple_gemac). It only
does gigabit, though.

Ok.

One of the main problems with 10/100/1000 in a Spartan is that you need
a large number of clocks and the S3 is constrained.

My understanding is that it takes 3 BUFGs and one DCM for tri-mode
(maybe one more of each for RGMII support, but I don’t see that) and
that, between this and other USRP2 needs, you ran into the limit of 8.
Is that accurate? Or would 10/100/1000 support take more than 3…

If you just want
10/100 and can live without gigabit, I would suggest doing that, as it
will save 1 or 2 clocks.

Ok… maybe we can implement both modes if we’re careful.

-Jeff

Matt,

  3. Do you have an FPGA internal architecture block diagram of any
    type? Is there another group you’re aware of doing such “major
    modification” FPGA work that we might talk to?

There were some on the wiki at one time. If they’re not still there
I’ll post a talk I did which covers the architecture.

I have looked at the wiki (http://gnuradio.org/redmine/wiki/gnuradio),
but I was not able to find any block diagrams of the internal
architecture of the USRP2 FPGA. I might not be looking in the correct
place. Could you please point me in the right direction?

It would be great if you could post the presentation you gave that
covered the architecture.

Thanks and Regards,
Vikram.

Matt,

In our effort to distill the gemac core and related logic, we have
pulled out the following modules under u2_core: SERDES, DSP core,
UART, external RAM interface, and the buffer pool.

The MAC is all contained in simple_gemac, and above that in
simple_gemac_wrapper:
which is instantiated in u2_core. Most of the buffering happens in
simple_gemac_wrapper in the fifo_2clock_cascade files.

(a) Is any buffering for the gemac done using buffers in the buffer pool,
or is it OK to eliminate that module altogether?

(b) The synthesis report currently shows that 24 BRAMs are being used
by the design. Does this sound about right? Are there modules unrelated
to the gemac or aeMB that we can pull out to reduce BRAM usage?

Thanks and Regards,
Vikram.

Matt-

My understanding is that it takes 3 BUFGs and one DCM for tri-mode
(maybe one more of each for RGMII support, but I don’t see that) and
that, between this and other USRP2 needs, you ran into the limit of 8.
Is that accurate? Or would 10/100/1000 support take more than 3…

I can’t say how many clocks a good 10/100/1G system would need, but
the Opencore required 4. One thing to keep in mind is that while there
are theoretically 8 global clocks in the S3, other limitations mean that
it can be difficult to use all 8.

Ok thanks Matt.

-Jeff

On 04/12/2010 05:22 PM, Vikram Ragukumar wrote:

(a) Is any buffering for the gemac done using buffers in the buffer pool,
or is it OK to eliminate that module altogether?

Yes, it does buffering. You can get rid of it, but then you’ll need to
create a module which holds the FIFO contents until there is a complete
packet. Otherwise, the Ethernet will start sending the packet before
you have a complete packet there.
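The replacement module described above is a store-and-forward (packet)
FIFO. The Python model below sketches only that behavior; it is not the
buffer pool’s implementation, and the class name, the per-word
end-of-packet flag, and the depth parameter are illustrative
assumptions. Words are written with an end-of-packet marker, and the
reader (the MAC side) only gets a packet once its last word has arrived.

```python
from collections import deque


class PacketGateFifo:
    """Behavioral model of a store-and-forward FIFO (hypothetical name).

    Words are written with an end-of-packet (eop) flag. A packet becomes
    visible to the reader only after its eop word has been written, so the
    transmitter never starts on a partially buffered packet.
    """

    def __init__(self, depth_words: int):
        self.depth = depth_words
        self.words = deque()       # (data, eop) pairs in arrival order
        self.complete_packets = 0  # packets whose eop word has arrived

    def write(self, data, eop: bool) -> bool:
        """Store one word; return False (writer must stall) if full."""
        if len(self.words) >= self.depth:
            return False
        self.words.append((data, eop))
        if eop:
            self.complete_packets += 1
        return True

    def read_packet(self):
        """Return one whole packet as a list, or None if none is complete."""
        if self.complete_packets == 0:
            return None
        packet = []
        while True:
            data, eop = self.words.popleft()
            packet.append(data)
            if eop:
                break
        self.complete_packets -= 1
        return packet


if __name__ == "__main__":
    fifo = PacketGateFifo(depth_words=8)
    fifo.write(0x11, eop=False)
    fifo.write(0x22, eop=False)
    print(fifo.read_packet())  # packet incomplete, nothing released yet
    fifo.write(0x33, eop=True)
    print(fifo.read_packet())  # whole packet released at once
```

One consequence of this design is that the depth must be at least one
maximum-size frame: a packet longer than the FIFO can never complete,
so the writer stalls forever.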

(b) The synthesis report currently shows that 24 BRAMs are being used
by the design. Does this sound about right? Are there modules unrelated
to the gemac or aeMB that we can pull out to reduce BRAM usage?

You’re going to need to do your own exploration. ISE has a feature to
tell you which modules are using block RAMs.

Matt
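When deciding which FIFOs to shrink, it can help to estimate how many
block RAMs a given depth and width should cost. The sketch below uses
the Spartan-3 block RAM geometry (18 Kbit per block, configurable as
16K x 1, 8K x 2, 4K x 4, 2K x 9, 1K x 18 or 512 x 36, where the 9/18/36
widths include parity bits) and tiles a single aspect ratio. It is only
an estimate under those assumptions: synthesis may map small FIFOs to
distributed (LUT) RAM or pack memories differently, so the ISE
per-module report remains the authoritative count. The function names
are hypothetical.

```python
# Rough block-RAM estimate for one FIFO/memory on Spartan-3 (illustrative).
# Each block RAM is 18 Kbit; these are its (depth, width) configurations,
# with widths 9/18/36 counting the parity bits as data.
SPARTAN3_BRAM_CONFIGS = [
    (16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36),
]


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative ints."""
    return -(-a // b)


def brams_for_fifo(depth: int, width: int) -> int:
    """Fewest BRAMs for a depth x width memory, tiling one aspect ratio."""
    return min(ceil_div(depth, d) * ceil_div(width, w)
               for d, w in SPARTAN3_BRAM_CONFIGS)


if __name__ == "__main__":
    for depth, width in ((512, 36), (2048, 32), (4096, 36)):
        print(f"{depth} x {width}: {brams_for_fifo(depth, width)} BRAM(s)")
```

For example, halving a 4096 x 36 FIFO to 2048 x 36 would, under this
model, drop it from 8 block RAMs to 4.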