FPGA / new rx_buffer_inband

soderic · September 7, 2008, 10:30am

For those interested,

I’ve committed my changes to the inband RX buffering subsystem. It is
available at:

http://gnuradio.org/svn/gnuradio/branches/developers/ets/inband/usrp/fpga

Source only at the moment, I’m not sure of the policy for RBFs in dev
branches.

Caveat emptor. The module has been completely rewritten and has not
been
extensively tested.

I do get occasional overruns at decim 64. I’m pretty sure they are due
to
my host.

The preliminary results look promising however:

918303 delta: 8064

926367 delta: 8064

934431 delta: 8064

942495 delta: 8064

950559 delta: 8064

958623 delta: 8064

966687 delta: 8064

.

If anyone tests it, please let me know your results.

–ets

soderic · September 7, 2008, 3:21pm

On Sun, Sep 7, 2008 at 4:28 AM, Eric S.
[email protected] wrote:

918303 delta: 8064

966687 delta: 8064

Were these results for the 2 channel mode?

What is the size difference within the USRP when using 2 channels? Is
it significant or pretty insignificant?

Brian

soderic · September 7, 2008, 10:48pm

-----Original Message-----
From: Brian P. [mailto:[email protected]]
Sent: Sunday, September 07, 2008 7:21 AM

Were these results for the 2 channel mode?

What is the size difference within the USRP when using 2 channels? Is
it significant or pretty insignificant?

They were from the 1 channel build.

Here is a snippet from inband_2rxhb_2tx.rbf:

ch: 0 s: 965745 delta: 8064
ch: 1 s: 965745 delta: 8064
ch: 0 s: 973809 delta: 8064
ch: 1 s: 973809 delta: 8064
ch: 0 s: 981873 delta: 8064
ch: 1 s: 981873 delta: 8064
ch: 0 s: 989937 delta: 8064
ch: 1 s: 989937 delta: 8064
ch: 0 s: 998001 delta: 8064
ch: 1 s: 998001 delta: 8064

The channels seem well aligned at least.

Unfortunately, I am getting regular segfaults on 2rx… :-/

Resource Comparison:

New 2 chan:
; Compilation Hierarchy Node ; Logic
Cells ;
LC Registers ; Memory Bits ; M4Ks
; |rx_buffer_inband:rx_buffer| ; 983 (59)
;
697 ; 72576 ; 18
; |packet_builder:pb| ; 162 (162)
;
19 ; 0 ; 0
; |rx_channel_buffer:cb[0].chan_buf[0]| ; 249 (68)
;
209 ; 24192 ; 6

Orig 2 chan:
; Compilation Hierarchy Node ; Logic
Cells ;
LC Registers ; Memory Bits ; M4Ks
; |rx_buffer_inband:rx_buffer| ; 546 (119)
;
318 ; 114688 ; 28
; |packet_builder:rx_pkt_builer| ; 137 (137)
;
41 ; 0 ; 0
; |fifo_1kx16:generate_channel_fifos[0].rx_chan_fifo| ; 43 (0)
;
33 ; 16384 ; 4
; |fifo_4kx16_dc:rx_usb_fifo| ; 158 (0)
;
134 ; 65536 ; 16

I haven’t done any optimization yet, so I’m not sure how much the logic
cell
utilization could be reduced. I imagine a fair amount of it comes from
my
use of 64 bit wide data paths for header data. I may try a hybrid
version
where the header data is muxed into a 16b header fifo. There shouldn’t
be
any performance penalty except for very short packets (< 64 bytes, e.g.
command channel). The header fifo capacity would be better utilized
then as
well.

The new code has 4+2 M4Ks (vs just 4) per channel, but with no usb_fifo
(+16
M4K), so memory comparison is not really apples to apples. We could
split
the usb_fifo memory between the channels and give each 2-4k per channel.

–ets

soderic · September 8, 2008, 12:44am

-----Original Message-----
From: Brian P. [mailto:[email protected]]
Sent: Sunday, September 07, 2008 3:03 PM

I am disappointed by the prioritization scheme for the channels and
would have rather seen a more fair round-robin approach instead of a
priority encoder.

Man, tough crowd!

But you are correct. RR should be a pretty easy modification.

I hope you still plan on changing the interfaces of the modules to be
more generic.

Yes, but I wanted to alter as little as possible on my first cut.
rx_buffer_inband seemed like a good boundary. Once I gain some more
confidence with the system I’ll be more brave. Internally there is
definitely some room for improvement too…

On that note, I’d like to get refactoring suggestions.

–ets

soderic · September 7, 2008, 11:23pm

On Sun, Sep 7, 2008 at 4:47 PM, Eric S.
[email protected] wrote:

ch: 0 s: 989937 delta: 8064
ch: 1 s: 989937 delta: 8064
ch: 0 s: 998001 delta: 8064
ch: 1 s: 998001 delta: 8064

The channels seem well aligned at least.

That looks good.

Unfortunately, I am getting regular segfaults on 2rx… :-/

That sounds bad.

209 ; 24192 ; 6
33 ; 16384 ; 4
well.
There are a lot of different optimization tools built into Quartus
that will let you know where the largest parts of the design are so
you can focus on the points that really matter.

The new code has 4+2 M4Ks (vs just 4) per channel, but with no usb_fifo (+16
M4K), so memory comparison is not really apples to apples. We could split
the usb_fifo memory between the channels and give each 2-4k per channel.

I am disappointed by the prioritization scheme for the channels and
would have rather seen a more fair round-robin approach instead of a
priority encoder.

I hope you still plan on changing the interfaces of the modules to be
more generic.

Brian

soderic · September 8, 2008, 3:59pm

On Sun, Sep 7, 2008 at 6:43 PM, Eric S.
[email protected] wrote:

On that note, I’d like to get refactoring suggestions.

I don’t like all the different memory megacells. Wrap the
altsyncram/fifo with the generics you require and pass those in using
the same module underneath. It reduces clutter for having more of the
same thing.

You shouldn’t negate the clock going into the asynchronous FIFOs at
the rx_channel_buffer level - do it at the input of the module as it
is more clear what you intend.

I’d separate out the FIFO muxing to its own module. I’d separate out
the banking of the FIFOs (command and data) to be their own individual
modules. I’d try to accommodate {1,2,4,8,16} bits per sample. I’d
try to accommodate both complex and real sampling for each of the
different bit widths used. I’d make buffer lengths generic at the top
level. I’d accommodate for channel loop-back, bypassing the HB/CIC
filters and simply strobing the samples back to the RX side.

Those are just some ideas. Make the interfaces as simple as possible
and most everything should fall right into place.

Brian

soderic · September 8, 2008, 8:27pm

Thanks for the input Brian. A few comments inline.

I don’t like all the different memory megacells. Wrap the
altsyncram/fifo with the generics you require and pass those in using
the same module underneath. It reduces clutter for having more of the
same thing.

Are you talking about not using the megacells at all? I see your point,
but
I don’t see it as a high priority.

However, there are areas that I think fifos are restrictive/inefficient
A
lot of the logic ultimately boils down to maintaining counters to gate
the
fifo counters; it seems redundant. Why not just use the dp-ram
directly?
Not to mention that non-sequential access could be handy in some places.

You shouldn’t negate the clock going into the asynchronous FIFOs at
the rx_channel_buffer level - do it at the input of the module as it
is more clear what you intend.

I had thought about just inverting usbclock so all the modules would be
posedge. The negedge logic was a last minute change, I had overlooked
that
aspect of the FX2 interface initially.

I’d separate out the FIFO muxing to its own module. I’d separate out

Yes.

the banking of the FIFOs (command and data) to be their own individual

I’d like to see the command channel I/O more integrated and away from
the
other channels.

modules. I’d try to accommodate {1,2,4,8,16} bits per sample. I’d
try to accommodate both complex and real sampling for each of the

The issue of I/Q or not is kind of tricky, since it means the difference
between 1 and 2 channels, and the logic would span multiple channel
buffers.
I think a new module in front of the buffers is in order. A sample
switcher
of sorts.

different bit widths used. I’d make buffer lengths generic at the top
level. I’d accommodate for channel loop-back, bypassing the HB/CIC
filters and simply strobing the samples back to the RX side.

I’m pretty sure that logic would be outside of the channel buffer.
Maybe a
feature of the “sample switcher”.

Those are just some ideas. Make the interfaces as simple as possible
and most everything should fall right into place.

All good ideas. FYI, I am focused on getting functionality in the near
term. I certainly want the architecture to be as well designed as
possible,
but getting the functionality with the least changes/time is my first
priority.

–ets

soderic · September 8, 2008, 10:27pm

-----Original Message-----
From: Brian P. [mailto:[email protected]]

No. It just seems silly to have a hard wrapper (fifo_1kx16_dc_la) for
a generic wrapper (dcfifo). It would be nice if there was an
identification of the different FIFOs and made those aspects generics
to a top level which would then just instantiate the generic wrapper
with most of the same generics already instantiated.

OK, I get you now.

How many of the models in the megacells directory are we really using?
Maybe it’s time it gets cleaned up.

I haven’t inventoried them, but I think a lot of them are used,
depending on
the particular configuration being built.

Working with fully synchronous FIFOs, I’d say go and roll your own as
a dpram from the template. If you’re working with dual clocks, I’d
suggest you stick with their implementations unless you want to worry
about the clock boundary. It’s generally not fun and pretty
cumbersome.

I have no direct knowledge regarding this, but from reading the Cyclone
datasheets it seems like the MK4 blocks natively support two fully
independent asynchronous r/w ports. Where would the problems arise? A
clock boundary buffer would be needed to pass size info but that sounds
like
a good generic construct to have handy anyway. Maybe I’ll play around
with
it when I don’t have anything better to do. E.g not anytime soon.

The only thing missing from the current setup is getting the RX and TX
timestamp clocks from the same source. Change that and I think the
level of functionality promised by the previous inband code is pretty
much completed. Don’t you agree?

I’ll agree when rx and tx timestamps are accurate and the tx scheduling
works well (there is at least one quick fix needed there). I think the
new
rx code still has some issues to be worked out too. As soon as it
“works”,
I’m good.

I’m not even too concerned about the dual timestamp counters at the
moment.
Is there any suspicion that they are ever unaligned?

There does seem to be a real concern for not meeting timing requirements
too. I haven’t done any in-depth testing but I get worrisome timing
warnings during synthesis. It seems that the master_clock rates are
pushing
the boundaries of the current design.

There are also some inband protocol ambiguities that I think should be
addressed before endeavoring forward on major refactoring. I’ll post
more
about that some other time.

–ets

soderic · September 8, 2008, 10:50pm

On Mon, Sep 8, 2008 at 4:26 PM, Eric S.
[email protected] wrote:

I have no direct knowledge regarding this, but from reading the Cyclone
datasheets it seems like the MK4 blocks natively support two fully
independent asynchronous r/w ports. Where would the problems arise? A
clock boundary buffer would be needed to pass size info but that sounds like
a good generic construct to have handy anyway. Maybe I’ll play around with
it when I don’t have anything better to do. E.g not anytime soon.

Problems arise when you want to ensure your FIFO is not full and not
empty. As a message passing mechanism, it works pretty well as just a
DPRAM. As the FIFO, there are counters involved which can cause
timing issues when dealing with asynchronous clocks.

I’m not even too concerned about the dual timestamp counters at the moment.
Is there any suspicion that they are ever unaligned?

If it’s coming from two different parts of the chip with two
independent reset signals, then yes there is always suspicion that
they are unaligned.

There does seem to be a real concern for not meeting timing requirements
too. I haven’t done any in-depth testing but I get worrisome timing
warnings during synthesis. It seems that the master_clock rates are pushing
the boundaries of the current design.

Such as? Is the design fully constrained? Yes, 64MHz can be pushing
the limits for that part as it gets full - especially with deep logic.

Are you sure you’re currently meeting timing?

There are also some inband protocol ambiguities that I think should be
addressed before endeavoring forward on major refactoring. I’ll post more
about that some other time.

OK.

Brian

soderic · September 8, 2008, 8:59pm

On Mon, Sep 8, 2008 at 2:27 PM, Eric S.
[email protected] wrote:

Thanks for the input Brian. A few comments inline.

Are you talking about not using the megacells at all? I see your point, but
I don’t see it as a high priority.

No. It just seems silly to have a hard wrapper (fifo_1kx16_dc_la) for
a generic wrapper (dcfifo). It would be nice if there was an
identification of the different FIFOs and made those aspects generics
to a top level which would then just instantiate the generic wrapper
with most of the same generics already instantiated.

How many of the models in the megacells directory are we really using?
Maybe it’s time it gets cleaned up.

However, there are areas that I think fifos are restrictive/inefficient A
lot of the logic ultimately boils down to maintaining counters to gate the
fifo counters; it seems redundant. Why not just use the dp-ram directly?
Not to mention that non-sequential access could be handy in some places.

Working with fully synchronous FIFOs, I’d say go and roll your own as
a dpram from the template. If you’re working with dual clocks, I’d
suggest you stick with their implementations unless you want to worry
about the clock boundary. It’s generally not fun and pretty
cumbersome.

I’d like to see the command channel I/O more integrated and away from the

different bit widths used. I’d make buffer lengths generic at the top
term. I certainly want the architecture to be as well designed as possible,
but getting the functionality with the least changes/time is my first
priority.

OK. I think the interfaces are all broken since they are all
hardcoded and can’t be extensible easily at the top level. Maybe you
could replace those first, and then move on more functionality?

The only thing missing from the current setup is getting the RX and TX
timestamp clocks from the same source. Change that and I think the
level of functionality promised by the previous inband code is pretty
much completed. Don’t you agree?

Brian

soderic · September 8, 2008, 10:55pm

Just going to chime in here with nothing really too important, because
this isn’t really my area. But, I am trying to follow discussion here,
and I appreciate you working on this Eric.

As for your segmentation faults, if you e-mail me your RBF where you’re
getting these, I can poke at why you’re getting a fault. It’s a bit
odd, as I do many sanity checks on the incoming packets from the USRP…
but you may have introduced something I am not checking for.

Thanks!
George

soderic · September 8, 2008, 11:16pm

On Mon, Sep 08, 2008 at 04:49:23PM -0400, Brian P. wrote:

independent reset signals, then yes there is always suspicion that
they are unaligned.

That there ever were two counters was a bug.

Eric

soderic · September 8, 2008, 11:20pm

Eric B. wrote:

they are unaligned.

That there ever were two counters was a bug.

I’m not sure what code you started from, Eric S. … I had mentioned I
had some fixes in my own developer branch, one of which was making sure
there was only 1 counter:
http://gnuradio.org/trac/changeset/8987
http://gnuradio.org/trac/changeset/8988
http://gnuradio.org/trac/changeset/8990

I think that we’re talking about the same thing here.

George

soderic · September 8, 2008, 11:21pm

USRP…
but you may have introduced something I am not checking for.

I’m starting my journey through the inband/m-block host code, but I’m
hoping
someone might be able to give me a head start.

I have enabled packet logging in usrp_rx.cc, but I think the crash
happens
prior to logging the guilty packet; at least it doesn’t look like
anything
is out of place in the logged packets.

Setting verbose in usrp_server doesn’t seem to provide any useful data
either.

I can only assume that I am sending something bad; but crashing the host
side code isn’t good… Can you think of any field that I could screw
up
that might cause this? Payload length seems the obvious suspect. What
about channel number? What if I sent a packet to a non-existent channel
for
instance?

From the core:
(gdb) backtrace
#0 usrp_server::handle_response_usrp_read (this=0x805f450,
data=@0xb7362d94)
at
/opt/boost_1_36/include/boost-1_36/boost/detail/sp_counted_base_gcc_x86.hpp:
66
#1 0xb7ed8e97 in usrp_server::handle_message (this=0x805f450,
msg=@0xb73632dc) at usrp_server.cc:274
#2 0xb7bdd8ba in mb_mblock::main_loop (this=0x805f450) at
mb_mblock.cc:82
#3 0xb7bf69fe in mb_worker::worker_thread_top_level (this=0x805f2f8) at
mb_worker.cc:168
#4 0xb7bf6d0f in mb_worker::run_undetached (this=0x805f2f8,
ignored=0x0) at
mb_worker.cc:109
#5 0xb7bc3607 in omni_thread_wrapper (ptr=0x805f2f8) at posix.cc:450
#6 0xb7d5246b in start_thread () from
/lib/tls/i686/cmov/libpthread.so.0
#7 0xb7cd66de in clone () from /lib/tls/i686/cmov/libc.so.6

RBFs here:
http://www.schneider-group.com/gnuradio/

–ets

soderic · September 9, 2008, 12:07am

-----Original Message-----
From: Brian P. [mailto:[email protected]]
If it’s coming from two different parts of the chip with two
independent reset signals, then yes there is always suspicion that
they are unaligned.

I didn’t realize there were different resets. It should be changed
anyway.

Are you sure you’re currently meeting timing?

No

But the previous inband build reported the same master_clock warnings
during
synthesis.

Most of my logic is on usbclock, which reported no timing problems.

–ets

soderic · September 8, 2008, 11:42pm

-----Original Message-----
From: George N. [mailto:[email protected]]
I’m not sure what code you started from, Eric S. … I had mentioned I
had some fixes in my own developer branch, one of which was making sure
there was only 1 counter:
http://gnuradio.org/trac/changeset/8987
http://gnuradio.org/trac/changeset/8988
http://gnuradio.org/trac/changeset/8990

I started from the trunk. I didn’t want to touch anything outside of
rx_buffer_inband. Now that I will be making some minor changes to the
tx_buffer anyway, I will merge the appropriate changes from your branch.

–ets

soderic · September 8, 2008, 11:46pm

Eric S. wrote:

I started from the trunk. I didn’t want to touch anything outside of
rx_buffer_inband. Now that I will be making some minor changes to the
tx_buffer anyway, I will merge the appropriate changes from your branch.

ehhhhh, just be careful with merging anything. Things are probably
going to be extremely different. This is pretty much the only major fix
in my branch that I did not yet merge to trunk. Not only that, it’s
extremely simple… so it’s probably best to just implement it and
forget about merging.

George