[GSoC] Co-Processors Update #9

addis_a · August 9, 2014, 5:56am

Hello all,

Logistical:

The coproc dev call is scheduled for August 12th at 2pm US Central time.
Next week is the last week of the program, and I’ll treat it like a
usual week. After pencils down there is one final week to clean things
up and work on documentation, so that’s what I’ll spend it on. After
that school will start and I’ll get situated to continue working until
the real deadline: the GNU Radio Conference.

Progress:

I have finally pushed the code for the turbo decoder DSP-only test. It
is in TI’s Code Composer project format, so I need to release
instructions on how to compile it without Code Composer:
GitHub - muniza/tcp3d_dsp_test: Testing the Keystone2 Turbo Decoder
I successfully ran a turbo decoder test that moves data from the ARM to
the DSP and back using the physical-pointer contiguous-memory method I
described in the last email. I’ll clean this up and push the code
sometime this weekend, hopefully.

Plan for GNU Radio:

From my talks with Pendlum, I think this approach will work for both
Zynq and Keystone, and for any device that has shared memory with the
coprocessors. I’m going to create a kernel module that is able to set
aside contiguous memory. I will write some functions in the GNU Radio
runtime to communicate with the module through ioctl. We set a flag in
the GNU Radio block to tell the runtime that it should create the buffer
differently, using the memory functions. Basically it will still go
through the same process as any other buffer; the only change is that we
put the buffer somewhere in the physical memory space that we allocated.
We are able to get the virtual pointers to the input and output buffers
through the work function, so I can have a function for translating
virtual to physical addresses, which can be done through get_user_pages
in the kernel module. This way we don’t need to change much in GNU
Radio, and the ARM and any coprocessor with shared memory have direct
access to the data, so there is no need for memcpy or userspace to
kernelspace transfers. I’ll bring this up at the coproc dev call.
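To make the virtual-to-physical step concrete, here is a minimal sketch
of the address arithmetic only. It is a plain Python simulation, not the
kernel module: the page size, the page-table dictionaries and both
function names are assumptions made up for illustration, standing in for
whatever get_user_pages would report for a pinned buffer.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


def virt_to_phys(vaddr, page_table):
    """Translate a virtual address using a {virtual page: physical frame} map."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


def is_physically_contiguous(vaddr, length, page_table):
    """True if the pages covering [vaddr, vaddr + length) map to consecutive frames."""
    first_vpn = vaddr // PAGE_SIZE
    last_vpn = (vaddr + length - 1) // PAGE_SIZE
    first_pfn = page_table[first_vpn]
    return all(
        page_table[vpn] == first_pfn + (vpn - first_vpn)
        for vpn in range(first_vpn, last_vpn + 1)
    )


# Hypothetical mappings: one buffer carved out of a reserved contiguous
# region, and one ordinary buffer whose pages are scattered.
reserved = {0x10: 0x800, 0x11: 0x801, 0x12: 0x802}
scattered = {0x20: 0x300, 0x21: 0x9A2, 0x22: 0x051}

buf_vaddr = 0x10 * PAGE_SIZE + 0x40
print(hex(virt_to_phys(buf_vaddr, reserved)))
print(is_physically_contiguous(buf_vaddr, 2 * PAGE_SIZE, reserved))
print(is_physically_contiguous(0x20 * PAGE_SIZE, 2 * PAGE_SIZE, scattered))
```

Only a buffer that passes the contiguity check could be handed to a
coprocessor as a single physical pointer plus a length.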

Thanks everyone for the support!

alfredmart · August 11, 2014, 7:08pm

On 08/08/2014 11:54 PM, Alfredo M. wrote:

Progress:

From my talks with Pendlum, I think this approach will work for both Zynq
and Keystone and any device that has shared memory with the coprocessors.

I doubt depending on contiguous memory will ever work for GNU Radio.
I’ve heard a lot of talk about changing the guts of GNU Radio, but no
real action. Especially given GNU Radio’s dependence on double mapped
buffering to handle wrap around. For things with hard IP blocks like
Keystone, this may be a difficult problem. Unless the IP blocks can be
configured to operate on non-contiguous blocks. FPGA code should be
written to avoid dependencies on specific buffer layouts. (Yes, I know I
have made this mistake, but I have seen the error of my ways)
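The wrap-around concern can be illustrated with a small simulation. This
is only a sketch of the idea, not GNU Radio’s buffer code: the class and
function names are invented for the example. A double-mapped view lets
software read across the end of the buffer as if it were one run, while a
device that sees only the physical buffer needs two separate segments
for the same read.

```python
def physical_segments(read_index, n_items, buf_size):
    """Return the (start, length) pieces of the physical buffer touched by a
    read of n_items starting at read_index in a circular buffer."""
    if not 0 < n_items <= buf_size:
        raise ValueError("read must fit in the buffer")
    start = read_index % buf_size
    first_len = min(n_items, buf_size - start)
    segments = [(start, first_len)]
    if first_len < n_items:
        # The read crosses the end of the buffer and continues at item 0.
        segments.append((0, n_items - first_len))
    return segments


class DoubleMappedView:
    """Simulates a second mapping: index i and i + size alias the same item."""

    def __init__(self, storage):
        self.storage = storage
        self.size = len(storage)

    def read(self, index, n_items):
        return [self.storage[(index + k) % self.size] for k in range(n_items)]


view = DoubleMappedView(list(range(8)))
print(view.read(6, 4))             # one logical run for software
print(physical_segments(6, 4, 8))  # two physical pieces for a device
print(physical_segments(2, 4, 8))  # no wrap, one piece
```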

I understand why TI drives you towards the CMEM driver, but that is a
lousy long term plan. They are just reusing code from prior generations
of drivers. And I do want to see something work so we can evaluate the
hard IP based GNU Radio block. My concern with your wording is that
people might think depending on contiguous memory buffers is a good idea.

Philip

alfredmart · August 11, 2014, 7:43pm

On Mon, Aug 11, 2014 at 10:06 AM, Philip B. wrote:

configured to operate on non-contiguous blocks. FPGA code should be
written to avoid dependencies on specific buffer layouts. (Yes, I know I
have made this mistake, but I have seen the error of my ways)

The typical use case for the TCP is variable length packets up to a
fixed maximum (6144 bits for LTE). Message passing is inherently a
better fit and the double-mapped buffer probably shouldn’t apply. Each
block of (soft) bits going in/out of the TCP would be contiguous, but
subsequent chunks of memory carrying different block segments need not
be.

I understand why TI drives you towards the CMEM driver, but that is a
lousy long term plan. They are just reusing code from prior generations
of drivers. And I do want to see something work so we can evaluate the
hard IP based GNU Radio block. My concern with your wording is that
people might think depending on contiguous memory buffers is a good idea.

At least from a high level, a message queue with a rotating set of
buffer pointers seems OK to me. Though, not being familiar with the
current Keystone transport options, what are the other preferred
approaches?
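As a rough sketch of the rotating-buffer idea: each buffer is contiguous
and sized for the 6144-bit maximum, and only a small (index, length)
descriptor travels through the queue. All names here are hypothetical,
storing one soft bit per byte is an assumption, and the consume function
merely stands in for the coprocessor side.

```python
from collections import deque

MAX_BLOCK_BITS = 6144  # LTE maximum block size mentioned above


class BufferPool:
    """Fixed set of equally sized buffers handed out in rotation."""

    def __init__(self, n_buffers, capacity=MAX_BLOCK_BITS):
        self.buffers = [bytearray(capacity) for _ in range(n_buffers)]
        self.free = deque(range(n_buffers))

    def acquire(self):
        """Return the index of a free buffer, or None if all are in flight."""
        return self.free.popleft() if self.free else None

    def release(self, index):
        self.free.append(index)


def submit(pool, queue, soft_bits):
    """Copy one variable-length block into a free buffer and queue its descriptor."""
    if len(soft_bits) > MAX_BLOCK_BITS:
        raise ValueError("block exceeds maximum size")
    index = pool.acquire()
    if index is None:
        return False  # back-pressure: retry after a buffer is released
    pool.buffers[index][:len(soft_bits)] = soft_bits
    queue.append((index, len(soft_bits)))
    return True


def consume(pool, queue):
    """Stand-in for the coprocessor: read one block and recycle its buffer."""
    index, length = queue.popleft()
    data = bytes(pool.buffers[index][:length])
    pool.release(index)
    return data


pool = BufferPool(n_buffers=2)
queue = deque()
accepted = [submit(pool, queue, bytes([1, 2, 3])),
            submit(pool, queue, bytes(MAX_BLOCK_BITS)),
            submit(pool, queue, b"\x07")]  # pool exhausted at this point
first = consume(pool, queue)
print(accepted, first)
```

Consecutive blocks may land in unrelated buffers, which matches the point
that each block must be contiguous but successive blocks need not be.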

-TT

alfredmart · August 11, 2014, 8:25pm

Hey Philip, nice to have you back.

On Mon, Aug 11, 2014 at 10:06 AM, Philip B. wrote:

I doubt depending on contiguous memory will ever work for GNU Radio.
I’ve heard a lot of talk about changing the guts of GNU Radio, but no
real action. Especially given GNU Radio’s dependence on double mapped
buffering to handle wrap around.

What led me to believe this would work was a little test that I ran. I
basically had two blocks print their virtual addresses: the block on the
left printed the write_pointer and the block on the right printed the
read_pointer. From looking at the buffer QA tests, the write_pointer and
read_pointer point to the buffers. What I observed was a giant buffer
that contained three buffers inside it. GNU Radio would write to one of
the buffers during one cycle, then move on to the next buffer the next
cycle, until it wraps around after the third cycle. When it wraps around
it starts at a different starting address for whatever reason, but it is
predictable. I suppose changing this behavior to actually use the same
start addresses for each buffer would help.

On Mon, Aug 11, 2014 at 10:06 AM, Philip B. wrote:

I understand why TI drives you towards the CMEM driver, but that is a
lousy long term plan. They are just reusing code from prior generations
of drivers. And I do want to see something work so we can evaluate the
hard IP based GNU Radio block. My concern with your wording is that
people might think depending on contiguous memory buffers is a good idea.

I can’t use the CMEM driver since it’s GPLv2. I planned to write a much
simpler driver that just allocates memory at the desired physical
address, which would be in some sort of shared memory. I think having
the GNU Radio buffer described above in a known physical shared memory
location can make the coprocessor interaction much easier, since the
coprocessor, which has no MMU, can interact with the buffer directly.

alfredmart · August 11, 2014, 8:30pm

On Mon, Aug 11, 2014 at 11:06 AM, Philip B. wrote:

On 08/11/2014 01:42 PM, Tom T. wrote:

At least from a high level, a message queue with a rotating set of
buffer pointers seems OK to me. Though, not being familiar with the
current Keystone transport options, what are the other preferred
approaches?

These are good points. To date, we think of coprocessors accelerating
traditional signal processing operations, not operations on functions
that are better treated as message queues.

On a more general note, I know the objective for this project wasn’t
specifically cellular, but most of the other Keystone accelerators
will have similar use patterns. That’s because the cellular framing
and time multiplexing forces PDU construction (and possibly sample
discarding) early in the signal processing chain.

-TT

alfredmart · August 11, 2014, 8:08pm

On 08/11/2014 01:42 PM, Tom T. wrote:

Keystone, this may be a difficult problem. Unless the IP blocks can be

I understand why TI drives you towards the CMEM driver, but that is a
lousy long term plan. They are just reusing code from prior generations
of drivers. And I do want to see something work so we can evaluate the
hard IP based GNU Radio block. My concern with your wording is that
people might think depending on contiguous memory buffers is a good idea.

At least from a high level, a message queue with a rotating set of
buffer pointers seems OK to me. Though, not being familiar with the
current Keystone transport options, what are the other preferred
approaches?

These are good points. To date, we think of coprocessors accelerating
traditional signal processing operations, not operations on functions
that are better treated as message queues.

If there were more time left in GSoC, I’d ask Alfredo to look into this
line of thought.

Philip

alfredmart · August 11, 2014, 9:03pm

On Mon, Aug 11, 2014 at 10:42 AM, Tom T. wrote:

The typical use case for the TCP is variable length packets up to a
fixed maximum (6144 bits for LTE). Message passing is inherently a
better fit and the double-mapped buffer probably shouldn’t apply.

This is a good point. How would this system deal with variable length
packets? The problem I have with message passing is that it becomes too
device specific and has additional latencies and limits on how much data
can be passed through the message queue.

On Mon, Aug 11, 2014 at 10:42 AM, Tom T. wrote:

Though, not being familiar with the
current Keystone transport options, what are the other preferred
approaches?

There are a lot, but they all boil down to different forms of message
passing. MessageQ is the quickest and fairly easy to use, but it has
limits on the amount of data one can pass.

I think it will be good at the meeting tomorrow to talk about a couple
of different approaches. We can lay out a couple of options and poke
holes in them to figure out which direction we want to take this.