Forum: GNU Radio FX2 firmware

Dominik A. (Guest)
on 2009-04-28 23:29
(Received via mailing list)
Hi all!

I am studying the FX2 firmware provided by the USRP package, just to
"get a feeling" for this.

There are a few very old mails in the mailing list archive stating that
the USB bandwidth could be improved if the FX2 timing were tuned. Does
anyone know where the current bottleneck is? Is it the main loop, or the
GPIF state machine?

There was an idea about hoisting the loop-invariant check out of the
main loop, e.g. running a dedicated loop for TX when only the TX chain
is enabled. However, my first quick-and-dirty trial didn't change
anything (tested with test_usrp_standard_tx). At least it still works :)
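The loop-invariant idea can be illustrated with a small host-side C model. This is a sketch, not the USRP firmware: `loop_model`, `loop_generic` and `loop_unswitched` are hypothetical names, and the packet counters stand in for the FX2 endpoint and GPIF checks the real main loop performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side stand-in for the firmware main-loop state. */
typedef struct {
    bool tx_enabled;
    bool rx_enabled;
    int tx_pending;   /* packets waiting to be clocked out to the FPGA */
    int rx_pending;   /* packets waiting to be sent to the host */
    int flag_tests;   /* how many times an enable flag was examined */
} loop_model;

/* Original shape: both enable flags are re-tested on every pass. */
static void loop_generic(loop_model *m, int passes) {
    assert(passes >= 0);
    for (int i = 0; i < passes; i++) {
        m->flag_tests += 2;
        if (m->tx_enabled && m->tx_pending > 0) m->tx_pending--;
        if (m->rx_enabled && m->rx_pending > 0) m->rx_pending--;
    }
}

/* Unswitched shape: the loop-invariant test is hoisted out of the loop,
   so the TX-only case runs a loop with no RX branch at all. */
static void loop_unswitched(loop_model *m, int passes) {
    assert(passes >= 0);
    m->flag_tests += 2;
    if (m->tx_enabled && !m->rx_enabled) {
        for (int i = 0; i < passes; i++)
            if (m->tx_pending > 0) m->tx_pending--;
    } else {
        loop_generic(m, passes);
    }
}
```

For a TX-only run of 20 passes, the generic loop examines the flags 40 times and the unswitched one only twice, while both drain the same 10 pending packets. That the trial made no measurable difference suggests these per-pass tests were not the limiting factor.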

A pointer in the right direction would be enough; I'll find out the
rest. However, it would be faster if someone could direct me.

Best regards
Dominik
Dominik A. (Guest)
on 2009-04-30 00:17
(Received via mailing list)
Hi!

A more specific question on the FX2:

do {          \
      FLOWSTATE = 0x81;      \
      FLOWLOGIC = 0x2d;      \
     FLOWEQ0CTL = 0x26;      \
     FLOWEQ1CTL = 0x00;      \
    FLOWHOLDOFF = 0x04;      \
        FLOWSTB = 0x04;      \
    FLOWSTBEDGE = 0x03;      \
FLOWSTBHPERIOD = 0x02;      \
GPIFHOLDAMOUNT = 0x00;      \
} while (0)
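Two of these values can be decoded by hand. The bit positions used here are an assumption, taken from our reading of the FX2 Technical Reference Manual (FLOWSTATE: bit 7 = FSE, bits 2..0 = flow state index; FLOWSTBEDGE: bit 0 = RISING, bit 1 = FALLING), and should be verified against the manual; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED layout of FLOWSTATE: bit 7 = FSE (flow state enable),
   bits 2..0 = FS (index of the waveform state used as flow state). */
static bool flow_state_enabled(uint8_t flowstate) { return (flowstate & 0x80u) != 0; }
static unsigned flow_state_index(uint8_t flowstate) { return flowstate & 0x07u; }

/* ASSUMED layout of FLOWSTBEDGE: bit 0 = RISING, bit 1 = FALLING. */
static bool strobe_on_rising(uint8_t edge)  { return (edge & 0x01u) != 0; }
static bool strobe_on_falling(uint8_t edge) { return (edge & 0x02u) != 0; }

/* Decode the values written by the macro: FLOWSTATE = 0x81 and
   FLOWSTBEDGE = 0x03 (hypothetical self-check helper). */
static void check_usrp_flow_values(void) {
    assert(flow_state_enabled(0x81));
    assert(flow_state_index(0x81) == 1);
    assert(strobe_on_rising(0x03) && strobe_on_falling(0x03));
}
```

Under these assumptions, 0x81 enables flow-state mode with state 1 as the flow state, and 0x03 strobes data on both edges of the master strobe.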

If I have reverse-engineered this correctly (gpif.gpf crashes the current
GPIF Designer, and importing gpif.c skips the flow states), you set it up
to transfer data on both the rising AND the falling edge while in the
flow state. Is this correct?

What I have found is:
  State 1 is the flow state (for both waveforms).
  Flow state settings:
    fifowr:
      if TCXpire and TCXpire then - else WEN, BOGUS
      Master strobe pin "unused", half period 2 (= 1 clock)
      Holdoff pin "unused", but holdoff not asserted
    fiford:
      if TCXpire and TCXpire then - else REN, OE, BOGUS
      everything else unchanged from fifowr
  Decision point (DP):
    fiford:
      if TCXpire and TCXpire then S2 else S1
    etc.


Btw, there is an application note on flow states (I saw someone state
that these are barely documented):
http://www.cypress.com/?rID=12951


Best regards
Dominik
Dominik A. (Guest)
on 2009-04-30 17:53
(Received via mailing list)
Hello,

I was able to increase the USB bandwidth of the RX chain to 40 MB/s when
TX is completely turned off (test_usrp_standard_rx -D 4). However, with
test_usrp_standard_tx -i 8, it won't get beyond 32.7 MB/s. I am ignoring
under/overruns for now.

Is there a way to test whether this is a limitation of my mainboard, the
program, or the USRP?

Best regards
Dominik
Dominik A. (Guest)
on 2009-04-30 20:29
(Received via mailing list)
> If I have reverse-engineered this correctly (gpif.gpf crashes the current
> GPIF Designer, and importing gpif.c skips the flow states), you set it up
> to transfer data on both the rising AND the falling edge while in the
> flow state. Is this correct?

I can answer this myself ;-) It took a while...

So, data is transferred on both the falling and the rising edge of the
master strobe (MSTB, which is not connected to the FPGA). The half
period of MSTB is 1 IFCLK cycle (the minimum), so MSTB toggles at 24 MHz
and data is transferred once per IFCLK cycle (twice per MSTB cycle).
With 16 bits per IFCLK cycle and IFCLK running at 48 MHz, this gives a
data rate of 96 MB/s.

IFCLK is generated internally and output, inverted, to the FPGA.
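The rate arithmetic can be written out as a small C sketch. The function names are hypothetical; the 480 Mbit/s figure is the raw USB 2.0 high-speed signalling rate, before protocol overhead and bit stuffing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: master strobe frequency. One full MSTB period is
   two half periods, each `half_period_ifclk` IFCLK cycles long. */
static uint64_t mstb_hz(uint64_t ifclk_hz, unsigned half_period_ifclk) {
    assert(half_period_ifclk > 0);
    return ifclk_hz / (2u * half_period_ifclk);
}

/* Hypothetical helper: GPIF throughput when one bus word moves per IFCLK
   cycle (both MSTB edges, 1-IFCLK half period). */
static uint64_t gpif_bytes_per_sec(uint64_t ifclk_hz, unsigned bus_bits) {
    assert(bus_bits % 8 == 0);
    return ifclk_hz * (bus_bits / 8u);
}

/* Raw USB 2.0 high-speed rate in bytes/s (480 Mbit/s, no overhead). */
static uint64_t usb_hs_raw_bytes_per_sec(void) {
    return UINT64_C(480000000) / 8u;
}
```

With IFCLK = 48 MHz and a 16-bit bus, `mstb_hz(48000000, 1)` gives 24 MHz and `gpif_bytes_per_sec(48000000, 16)` gives 96 MB/s, above the 60 MB/s raw USB rate, which suggests the GPIF transfer rate itself is not what caps the throughput.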

Dominik
Dominik A. (Guest)
on 2009-05-01 11:41
(Received via mailing list)
Hi Philip,

 > http://gnuradio.org/trac/wiki/UsrpFAQ/Gen#USB:480M...
http://gnuradio.org/trac/wiki/UsrpFAQ/FX2


We can get beyond that. See
http://lists.gnu.org/archive/html/discuss-gnuradio...
Larry achieved 35 MB/s, and I got 40 MB/s when receiving. The SSRP
sustains more than 40 MB/s on the receive side:
http://oscar.dcarr.org/ssrp/software/firmware/firmware.php

Also:
http://lists.gnu.org/archive/html/discuss-gnuradio...

So there are demo firmwares for the FX2 that sustain 50 MB/s (though I
haven't found them yet).

Best regards
Dominik
Eric B. (Guest)
on 2009-05-05 22:19
(Received via mailing list)
On Thu, Apr 30, 2009 at 03:51:46PM +0200, Dominik A. wrote:
> Hello,
>
> I was able to increase the USB bandwidth of the RX chain to 40 MB/s when
> TX is completely turned off (test_usrp_standard_rx -D 4). However, with
> test_usrp_standard_tx -i 8, it won't get beyond 32.7 MB/s. I am ignoring
> under/overruns for now.
>
> Is there a way to test whether this is a limitation of my mainboard, the
> program, or the USRP?

It's hard to say.  If you've got a logic analyzer you can instrument
the inner loop of the firmware and see if that's the bottleneck or not.
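A minimal sketch of that instrumentation, assuming a spare FX2 port pin can be driven as an output. `DEBUG_PIN` and `loop_pass_ns` are hypothetical names: the firmware toggles the pin once per main-loop pass, and the host-side helper turns the frequency of the square wave seen on the analyzer into time per pass.

```c
#include <assert.h>

/*
 * Firmware side (illustrative only, not compiled here): inside the main loop
 *
 *     DEBUG_PIN ^= 1;    // hypothetical name for a free output port bit
 *
 * One toggle per pass means one full square-wave period spans two passes.
 */

/* Time spent per main-loop pass, in ns, given the measured pin frequency. */
static double loop_pass_ns(double pin_freq_hz) {
    assert(pin_freq_hz > 0.0);
    return 1e9 / (2.0 * pin_freq_hz);
}
```

Comparing that pass time with the time the GPIF needs per packet shows whether the loop can keep up.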

Eric
Dominik A. (Guest)
on 2009-05-06 20:36
(Received via mailing list)
Hi Eric,

Thanks for the answer.

> It's hard to say.  If you've got a logic analyzer you can instrument
> the inner loop of the firmware and see if that's the bottleneck or not.
Unfortunately, I don't have access to a logic analyzer :(

However, I have made progress, which I am going to share once it is
tested and cleaned up.

Short summary:
When doing RX only, I am at 45 MB/s (yes! decim=6 works without
overruns). On the TX side, I can't get above 32.7 MB/s, so I now suspect
a host-side bottleneck. On the FX2, when only one direction is in use, I
set the GPIF to loop infinitely and write GPIFABORT = 0xFF to leave the
loop when the TX/RX configuration changes. Hence there is no main loop
left that could be a bottleneck. The TX state machine now consists of
two states: state 1 is the idle state, and state 2 transfers data (one
word per clock, as before). The 8051 core is completely out of the data
path (auto-commit etc.).
Same for RX, except that a few more states were needed.
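The mode handling described above can be sketched as a hypothetical host-side model, not the actual firmware: the enum and function names are invented, and in the firmware the abort would be the GPIFABORT = 0xFF write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mode names; the firmware's real state lives in FX2 registers. */
typedef enum { MODE_OFF, MODE_TX_ONLY, MODE_RX_ONLY, MODE_BOTH } xfer_mode;

static xfer_mode mode_for(bool tx_on, bool rx_on) {
    if (tx_on && rx_on) return MODE_BOTH;
    if (tx_on) return MODE_TX_ONLY;
    if (rx_on) return MODE_RX_ONLY;
    return MODE_OFF;
}

/* Single-direction modes hand the data path to a free-running GPIF
   waveform; MODE_BOTH keeps the 8051 main loop arbitrating. */
static bool gpif_free_running(xfer_mode m) {
    return m == MODE_TX_ONLY || m == MODE_RX_ONLY;
}

/* A free-running waveform never finishes by itself, so in this model it
   must be aborted (GPIFABORT = 0xFF) before any change of mode. */
static bool needs_abort(xfer_mode cur, xfer_mode next) {
    return gpif_free_running(cur) && next != cur;
}
```

In this model, going from TX-only to TX+RX requires an explicit abort first, while going from TX+RX to TX-only does not, since no free-running waveform is active in the combined mode.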

When both RX and TX are needed, the firmware is still faster, though the
same TX bottleneck appears (which is, of course, not a big problem,
because both directions already share the USB bandwidth).

Do you maybe have an idea why the TX bandwidth is limited? Interestingly
enough, 32.7 MB/s is the limit on both my computer and my notebook. Of
course, I made the TX loop on the host as short as possible, set
SCHED_FIFO with rtprio 49, and played with fusb_nblock/size etc.

Dominik
Eric B. (Guest)
on 2009-05-06 21:14
(Received via mailing list)
On Wed, May 06, 2009 at 06:35:36PM +0200, Dominik A. wrote:
>
> Short summary:
> When doing RX only, I am at 45 MB/s (yes! decim=6 works without
> overruns).

That's great!

> When both RX and TX are needed, the firmware is still faster, though the
> same TX bottleneck appears (which is, of course, not a big problem,
> because both directions already share the USB bandwidth).
>
> Do you maybe have an idea why the TX bandwidth is limited? Interestingly
> enough, 32.7 MB/s is the limit on both my computer and my notebook.

Not sure.  Could be the EHCI controller, or the host driver, etc.

> Of course, I made the TX loop on the host as short as possible, set
> SCHED_FIFO with rtprio 49, and played with fusb_nblock/size etc.

Let us know what else you figure out!

Eric
Stefan Bruens (Guest)
on 2009-05-06 21:42
(Received via mailing list)
On Wednesday 06 May 2009 18:35:36 Dominik A. wrote:
> Do you maybe have an idea why the TX bandwidth is limited? Interestingly
> enough, 32.7 MB/s is the limit on both my computer and my notebook. Of
> course, I made the TX loop on the host as short as possible, set
> SCHED_FIFO with rtprio 49, and played with fusb_nblock/size etc.

Are you transmitting random data, or long runs of ones? In the latter
case (IIRC), the transmitter stuffs a 0 after every six consecutive 1s
to aid clock recovery, limiting net bandwidth to 6/7 (which is about
42 MByte/s). Try transmitting random data; a stream of zeros should be
fine, too.

For details, have a look at the USB spec.
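Per the USB 2.0 specification, the transmitter inserts a 0 after every six consecutive 1 bits before NRZI encoding, so the payload pattern affects the number of bits on the wire. A simplified sketch that counts stuffed bits in a payload buffer; `stuffed_bits` is a hypothetical helper, and SYNC, PID, CRC and EOP fields are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the bits a USB transmitter would stuff into
   `len` payload bytes. Simplified model: bytes go out LSB first, a 0 is
   inserted after every run of six consecutive 1s, and the run counter
   restarts after the stuffed bit. Packet framing fields are ignored. */
static size_t stuffed_bits(const uint8_t *buf, size_t len) {
    size_t stuffed = 0;
    int run = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        for (int b = 0; b < 8; b++) {
            if ((buf[i] >> b) & 1u) {
                if (++run == 6) {
                    stuffed++;
                    run = 0;
                }
            } else {
                run = 0;
            }
        }
    }
    return stuffed;
}
```

A buffer of all-zero bytes needs no stuffing, while six bytes of 0xFF (48 ones) get 8 extra bits; runs can also span a byte boundary, as with 0xE0 followed by 0x07.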

Stefan

Marcus D. Leech (Guest)
on 2009-05-06 23:47
(Received via mailing list)
Dominik A. wrote:
> Same for RX, except that a few more states were needed.
Hmmm. My application is RX-only. Using 8-bit samples, that 45 MB/s
gives about 20 Msps. I have a QX9770 system running at 3.7 GHz, but
*still* get overruns with two channels at 8 Msps per (complex) channel.
I also get overruns at 16 Msps, single-channel.

At 8 Msps dual-channel, my application (an all-mode radio astronomy
receiver system) burns up about 2.75 CPUs on the above-mentioned
QX9770 @ 3.7 GHz (with slower memory that will get upgraded soon!). I
get overruns a couple of times per minute with this setup.

What type of system are you getting reliable 45 MB/s receive throughput
on, and how complicated is your signal processing flowgraph?

--

Marcus L.
Principal Investigator, Shirleys Bay Radio Astronomy Consortium
http://www.sbrac.org
Dominik A. (Guest)
on 2009-05-07 10:35
(Received via mailing list)
Hi!

> Hmmm. My application is RX-only. Using 8-bit samples, that 45 MB/s
> gives about 20 Msps. I have a QX9770 system running at 3.7 GHz, but
> *still* get overruns with two channels at 8 Msps per (complex) channel.
> I also get overruns at 16 Msps, single-channel.
You mean your system doesn't even sustain 32 MB/s?
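The numbers in this exchange follow from each complex sample carrying an I and a Q value. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: USB payload rate needed for `channels` complex
   streams at `sps` samples/s each, with I and Q each `bits` wide. */
static uint64_t usb_bytes_per_sec(unsigned channels, uint64_t sps, unsigned bits) {
    assert(bits % 8 == 0);
    return (uint64_t)channels * sps * 2u * (bits / 8u);
}

/* Hypothetical helper: complex samples/s that a given byte rate can
   carry for a single channel. */
static uint64_t max_sps(uint64_t bytes_per_sec, unsigned bits) {
    assert(bits % 8 == 0 && bits > 0);
    return bytes_per_sec / (2u * (bits / 8u));
}
```

Two channels at 8 Msps with 8-bit I/Q need `usb_bytes_per_sec(2, 8000000, 8)` = 32 MB/s, as does one channel at 16 Msps; conversely, 45 MB/s carries at most 22.5 Msps of 8-bit complex samples on one channel, consistent with the "about 20 Msps" estimate.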

> At 8 Msps dual-channel, my application (an all-mode radio astronomy
> receiver system) burns up about 2.75 CPUs on the above-mentioned
> QX9770 @ 3.7 GHz (with slower memory that will get upgraded soon!). I
> get overruns a couple of times per minute with this setup.
That could be a CPU problem, too. In our lab, our eight-core machine has
overruns, while my notebook with a Core 2 Duo does not. I have figured
out that this is because of multiprocessor communication: the eight
cores are composed of two quad-core processors, each of which is itself
two dual-cores on one die. After restricting the scheduler (taskset 0x11
app) to two cores residing in the same dual-core, it was fine, with no
overruns. Adding one more core, wherever it was located, brought the
overruns back. However, before noticing this, I had already cut the CPU
usage of that specific app (the transmitter) down to two cores by
aggressive optimization.
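The `taskset 0x11 app` restriction can also be applied from inside a program with Linux's `sched_setaffinity`. A sketch for Linux/glibc only; `mask_to_cpuset` and `pin_to_mask` are hypothetical helper names. Mask 0x11 selects CPUs 0 and 4; whether those two reside in the same dual-core depends on the machine's CPU numbering, which can be checked via the `physical id` and `core id` fields in /proc/cpuinfo.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical helper: build the CPU set described by a taskset-style
   hex mask (bit n selects CPU n). */
static void mask_to_cpuset(uint64_t mask, cpu_set_t *set) {
    assert(set != NULL);
    CPU_ZERO(set);
    for (int cpu = 0; cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            CPU_SET(cpu, set);
}

/* Hypothetical helper: pin the calling thread (pid 0) to the CPUs in
   `mask`. Returns 0 on success, -1 with errno set on failure. */
static int pin_to_mask(uint64_t mask) {
    cpu_set_t set;
    mask_to_cpuset(mask, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

Calling `pin_to_mask(0x11)` early in `main`, before worker threads are created, has much the same effect as launching the program under `taskset 0x11`, since threads created afterwards inherit the mask.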

> What type of system are you getting reliable 45 MB/s receive throughput
> on, and how complicated is your signal processing flowgraph?
Core 2 Duo E6750, 4 GB RAM, ICH9 USB controller.
I am using test_usrp_standard_rx, no signal processing.

Dominik
Dominik A. (Guest)
on 2009-05-07 10:38
(Received via mailing list)
It is a saw wave (0-255 per packet; the upper 8 bits of each short are
zero). Thanks for the info! I will try sending different data this
evening.

Dominik