[GSoC] Co-Processors Update #10

addis_a · August 18, 2014, 10:44pm

Hello all,

Logistical:

Pencils down is upon us, so the rest of the week will be spent getting
all my documentation in order. This includes cleaning up the code on
GitHub, the Keystone2 wiki page, and the runtime wiki page:

Keystone2 wiki page: http://gnuradio.org/redmine/projects/gnuradio/wiki/Keystone2
Runtime wiki page: http://gnuradio.org/redmine/projects/gnuradio/wiki/runtime

Final evaluations are due this Friday.

Progress:

I have pushed the tcp3d test: the ARM passes the LLRs to the DSP, the
DSP processes them on the tcp3d, and the ARM then checks whether the
output matches the expected result.
GitHub - muniza/tcp3d_dsp_test: Testing the Keystone2 Turbo Decoder

I have some code that modifies the GNU Radio runtime to isolate blocks
based on flags we set in the block’s constructor. This will enable us
to treat certain blocks differently, such as putting the buffer in a
different location in memory. There is also code that exposes the
buffer object to the work function so that we can get the start address
and size of the buffer in bytes.
GitHub - muniza/gnuradio: GSoC Buffer Management

I have an OOT module that passes a struct with the GNU Radio buffer
start address and size to a kernel module using ioctl.
GitHub - muniza/gr-buffertest: GNU Radio Test for Deep Buffer Manipulation

Lastly, there is the kernel module that runs get_user_pages on the
GNU Radio buffer struct.
GitHub - muniza/gsoc_2014
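
For readers who have not used ioctl this way, here is a minimal
userspace sketch of handing a buffer's start address and size to a
kernel module. It is not the code from the repositories above: the
struct layout, the names gr_buf_desc, gr_buf_describe and gr_buf_pin,
and the request number (magic 'g', command 1) are all assumptions made
up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical descriptor: where the buffer starts and how long it is. */
struct gr_buf_desc {
    uint64_t start; /* userspace virtual address of the first byte */
    uint64_t size;  /* buffer length in bytes */
};

/* Hypothetical request number; the magic and command values are arbitrary. */
#define GRBUF_IOC_MAGIC 'g'
#define GRBUF_IOC_PIN   _IOW(GRBUF_IOC_MAGIC, 1, struct gr_buf_desc)

/* Fill a descriptor from a pointer and a length in bytes. */
static struct gr_buf_desc gr_buf_describe(const void *start, size_t size)
{
    struct gr_buf_desc d;

    assert(start != NULL && size > 0);
    d.start = (uint64_t)(uintptr_t)start;
    d.size = (uint64_t)size;
    return d;
}

/* Pass the descriptor to the driver behind fd; returns ioctl()'s result. */
static int gr_buf_pin(int fd, const void *start, size_t size)
{
    struct gr_buf_desc d = gr_buf_describe(start, size);

    return ioctl(fd, GRBUF_IOC_PIN, &d);
}
```

In a real OOT block, fd would come from opening the kernel module's
device node, and the pointer and length would come from the buffer
object exposed to the work function.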

Plan:

I did more research on the contiguous memory allocation method, and I
now see that it is not a good zero-copy solution for ALL the devices we
want to support. A good discussion is available on the Linux kernel
news site covering the reasons for NOT integrating ION, another
contiguous memory allocator, into the Linux kernel:
Integrating the ION memory allocator [LWN.net].
I’m still going to get a minimal CMEM GPLv3 integrated into GNU Radio
as a stepping stone for modifying runtime and using zero copy with the
keystone. This shouldn’t take long at all since I just need ioctl. For
part of my talk at the conference, I am going to discuss this method of
contiguous memory allocation, along with its positives and negatives as
they relate to GNU Radio and a couple of devices. I’m also hoping to
make more progress so I can show the integration of the get_user_pages
DMA method, but we’ll have to see what happens when I’m back at Penn. I
think a discussion of the various methods will bring up good
conversation for the coproc working group.

Expect one more update to mark the finishing of documentation on Friday.
I’ll also cry a little and give thanks to those that helped, but that’s
in a couple of days.

alfredmart · August 20, 2014, 9:46am

On Mon, Aug 18, 2014 at 1:42 PM, Alfredo M. wrote:

Expect one more update to mark the finishing of documentation on Friday.

Philip requested that I do a summary of my findings before then, so
here goes.

If we want GNU Radio to support a multitude of coprocessors, we need to
be able to offer both contiguous memory allocation support and
scatter-gather list support (the get_user_pages method). I think we all
agree that whatever goes into runtime must be general and not device
specific, since we don’t want to keep changing things when a new device
comes out.

Contiguous memory allocation support is needed for devices such as the
ARM<==>DSP in the keystone2 because the DSP doesn’t have an MMU or
IOMMU. There is support for scatter-gather lists, but it is so highly
abstracted by the multicore navigator that it seems like a different
thing altogether. TI uses CMEM, Nvidia uses NVMAP, Qualcomm uses PMEM,
and Android uses ION; these all do essentially the same thing, but for
different devices. I went down the CMEM road because that’s what people
were recommending on the e2e forums for zero-copy work on the
keystone2. I think we should be able to support things like this, and I
think we can with minor additions to runtime. Integrating a memory
allocator into GNU Radio will require that most of the device-specific
and allocator-specific work be done in the OOT module, which I think is
doable. All we need to change in GNU Radio is the buffer location in
memory.
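
To make the constraint concrete: a device with no MMU or IOMMU can only
be given one physical base address and a length, so each page backing
the buffer has to follow the previous one in physical memory. The
sketch below is a plain userspace illustration of that check over an
array of page frame numbers. The function name and the idea of having
the frame numbers in an array are assumptions for illustration only;
this is not part of CMEM or GNU Radio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the n page frame numbers form one physically contiguous
 * run (each frame is exactly one past the previous), otherwise 0.
 */
static int pfns_contiguous(const uint64_t *pfn, size_t n)
{
    size_t i;

    assert(pfn != NULL && n > 0);
    for (i = 1; i < n; i++) {
        if (pfn[i] != pfn[i - 1] + 1)
            return 0; /* gap found: unusable without scatter-gather */
    }
    return 1;
}
```

A buffer from an ordinary userspace allocation will usually fail this
check once it spans more than a page, which is why allocators like CMEM
reserve contiguous regions up front.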

Scatter-gather list support is needed for devices such as the Zynq,
where the AXI DMA supports scatter-gather lists. First we pass our
buffer from userspace to the kernel module using ioctl. The module runs
get_user_pages on it, which essentially gives us the translation from
userspace virtual to kernelspace physical without a need to copy to
kernel (kernelspace virtual). We can then use the scatter-gather API in
the kernel to send to the bus address. The reason for scatter-gather is
that the buffer is large, so it spans many pages that are not
contiguous in physical memory; devices that support scatter-gather make
it appear contiguous. This is an overall better solution, as it’s
pretty much the standard to include scatter-gather DMA support (it’s
just very abstracted in the keystone2). Integrating this into GNU Radio
is possible with the changes I have already made to runtime on my
GNU Radio branch. I can probably write a module and test this myself
before the conference, since I have a zedboard, experience with
developing on the zynq, and friends with experience developing on the
zynq (aka you guys). Plus I’ve probably read LDD3 too many times.
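
As a rough illustration of why one buffer turns into a list, the sketch
below splits a virtual address range into per-page segments, which is
the shape of information a scatter-gather list carries. It runs in
userspace and does no pinning or DMA. The 4096-byte page size, the
struct sg_seg layout, and the function names are assumptions for
illustration, not the kernel's scatterlist API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* One piece of the buffer that lies entirely within a single page. */
struct sg_seg {
    uint64_t page_index; /* index into the array of pinned pages */
    uint32_t offset;     /* byte offset of the piece inside that page */
    uint32_t length;     /* number of bytes in the piece */
};

/* Number of pages touched by the byte range [start, start + size). */
static size_t pages_spanned(uint64_t start, uint64_t size)
{
    uint64_t first, last;

    assert(size > 0);
    first = start / SKETCH_PAGE_SIZE;
    last = (start + size - 1) / SKETCH_PAGE_SIZE;
    return (size_t)(last - first + 1);
}

/*
 * Split the range into per-page segments. Writes at most max entries
 * to out and returns the number of segments the range needs.
 */
static size_t build_segments(uint64_t start, uint64_t size,
                             struct sg_seg *out, size_t max)
{
    size_t n = 0;

    assert(size > 0);
    while (size > 0) {
        uint32_t off = (uint32_t)(start % SKETCH_PAGE_SIZE);
        uint64_t room = SKETCH_PAGE_SIZE - off;
        uint32_t len = (uint32_t)(size < room ? size : room);

        if (n < max) {
            out[n].page_index = n;
            out[n].offset = off;
            out[n].length = len;
        }
        n++;
        start += len;
        size -= len;
    }
    return n;
}
```

In the kernel module, the page count is what would be requested when
pinning the buffer, and each segment would correspond to one entry
handed to the DMA engine.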

So that’s my recommendation for GNU Radio after three full months of
working on this. I guess after my last documentation update on Friday
and, of course, the conference, I’ll be updating during the coproc
calls.