This is an evolution of a concept that was mentioned at GRCon; I really
liked it and have thought a bit more about it.
The general idea is that each block and each port within a block would
have a “domain” associated with it.
For blocks, this would essentially represent where that block is
running: CPU, DSP, FPGA, GPU, …
For ports, it would represent where the data are stored: main
memory, GPU memory, page-locked memory for a coprocessor, a shared zone
with a DSP, …
Once you have those, you’d need ingress/egress blocks to cross between
data domains. They could be a memcpy on the host, a read/write buffer
queued in an OpenCL command queue, or whatever else is required. Those
wouldn’t even need to be exposed to the user: GRC / GR-core could be
smart enough to find an appropriate path to move the data from one data
domain to another, it just needs a list of the available crossing
blocks.
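To make that path-finding idea concrete, here is a rough sketch in plain
C++ (all the names are made up for illustration, none of this is actual
GNU Radio API): data domains are just identifiers, crossing blocks are
registered edges between them, and the core does a breadth-first search
to pick which blocks to insert.

// Sketch only: hypothetical names, not the real GNU Radio API.
#include <iostream>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A data domain is just an identifier: "host", "cl_device", "cl_pinned", ...
using data_domain = std::string;

// A registered "gress" block that can move samples from one domain to
// another (e.g. a host memcpy, or an OpenCL read/write buffer command).
struct crossing {
    data_domain from;
    data_domain to;
    std::string block_name;   // which block GRC / GR-core would instantiate
};

// Breadth-first search over the registered crossings: returns the block
// names to insert between two ports whose data domains differ
// (empty if src == dst or if the destination is unreachable).
std::vector<std::string> find_path(const std::vector<crossing>& crossings,
                                   const data_domain& src, const data_domain& dst)
{
    std::map<data_domain, std::pair<data_domain, std::string>> prev; // to -> (from, block)
    std::set<data_domain> seen{src};
    std::queue<data_domain> todo;
    todo.push(src);

    while (!todo.empty()) {
        data_domain cur = todo.front();
        todo.pop();
        if (cur == dst) {
            std::vector<std::string> path;
            for (data_domain d = dst; d != src; d = prev[d].first)
                path.insert(path.begin(), prev[d].second);
            return path;
        }
        for (const auto& c : crossings) {
            if (c.from == cur && !seen.count(c.to)) {
                seen.insert(c.to);
                prev[c.to] = {cur, c.block_name};
                todo.push(c.to);
            }
        }
    }
    return {}; // no way to cross between these domains
}

int main()
{
    std::vector<crossing> crossings = {
        {"host", "cl_pinned", "host_to_pinned_memcpy"},
        {"cl_pinned", "cl_device", "cl_enqueue_write_buffer"},
        {"cl_device", "cl_pinned", "cl_enqueue_read_buffer"},
        {"cl_pinned", "host", "pinned_to_host_memcpy"},
    };
    for (const auto& b : find_path(crossings, "host", "cl_device"))
        std::cout << b << "\n";   // two hops: host -> pinned -> device
}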
The advantage of introducing this concept at the block level is that,
for all the various types of coprocessors you can think of, the GR core
doesn’t have to know anything special; it can delegate to the
appropriate domain handlers/plugins. Even the CPU domain and the
main-memory domain would just be plugins, with no special casing; they
would be treated like any other. So coprocessors aren’t “second class
citizens”, they’re treated just like the main CPU is.
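As a rough illustration of what such a handler could look like (a
hypothetical interface, not something that exists in GR today), the
host-memory domain would be implemented through exactly the same
interface as any coproc domain:

// Sketch only: a hypothetical domain-handler interface; host memory is
// just one implementation among others, not a special case in the core.
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

struct buffer_handle {
    void*       ptr;      // meaningful on the host; may be opaque for a device
    std::size_t nbytes;
};

class domain_handler {
public:
    virtual ~domain_handler() = default;
    virtual std::string name() const = 0;
    virtual buffer_handle allocate(std::size_t nbytes) = 0;
    virtual void release(buffer_handle& b) = 0;
};

// The "main memory" domain: nothing special, just malloc/free.
class host_domain : public domain_handler {
public:
    std::string name() const override { return "host"; }
    buffer_handle allocate(std::size_t nbytes) override {
        return {std::malloc(nbytes), nbytes};
    }
    void release(buffer_handle& b) override {
        std::free(b.ptr);
        b = {};
    }
};

// A GPU or DSP domain would be another subclass, shipped as a plugin,
// allocating device memory / shared zones instead, with the same interface.

int main() {
    host_domain host;
    buffer_handle b = host.allocate(4096);
    std::memset(b.ptr, 0, b.nbytes);
    host.release(b);
}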
I’ve been mulling over this, and I like this design a lot. I think it
provides a lot of flexibility while also preventing any particular
scenario from becoming a “corner case”. I’m still thinking about it and
trying to find somewhere to poke a hole, but at a high level I think it
is really straightforward.
I suggest that we discuss this further in some future [CoProc] call, and
that you add the idea to the wiki =)
Awesome write-up, Johnathan. I really enjoyed reading it.
A few topological problems arise that aren’t solved yet by this, such as
having adjacent accelerator blocks that both want to own the shared
memory buffer. The suggestion here is to use the above mechanism to
create a domain-crossing “sink” block and a domain-crossing “source”
block as endpoints in a hierarchical block that also instantiates
whatever logic is needed to chain the accelerators inside.
This goes back to the “ingress” and “egress” blocks that Justin’s team
used
in their original design.
I think having these transitions represented with such blocks makes
sense from a graphical perspective, but under the hood I think we should
architect these “gresses” as zero-copy operations wherever possible. How
that relationship / responsibility gets controlled and delegated is
something we still need to figure out, I think.
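One way to picture that (just a sketch with invented names, not a
committed design): the crossing block still appears in the graph, but
its implementation can degenerate to handing over the existing buffer
when the two domains can actually share memory, and only falls back to a
real copy when they can’t.

// Sketch only: a crossing may be a zero-copy handover when the two data
// domains can address the same memory (e.g. pinned host memory visible
// to both sides). Hypothetical names throughout.
#include <cstddef>
#include <cstring>
#include <iostream>
#include <vector>

struct buffer_ref {
    void*       ptr;
    std::size_t nbytes;
};

class crossing_impl {
public:
    virtual ~crossing_impl() = default;
    // Move (or alias) 'src' into the destination domain, returning the
    // buffer the downstream block should read from.
    virtual buffer_ref cross(const buffer_ref& src) = 0;
};

// Fallback: an actual copy into a destination-domain buffer.
class memcpy_crossing : public crossing_impl {
    std::vector<unsigned char> m_dst;
public:
    buffer_ref cross(const buffer_ref& src) override {
        m_dst.resize(src.nbytes);
        std::memcpy(m_dst.data(), src.ptr, src.nbytes);
        return {m_dst.data(), src.nbytes};
    }
};

// Zero-copy: both domains can address the same memory, so the "gress"
// only passes the reference along (the graphical block is still there).
class zero_copy_crossing : public crossing_impl {
public:
    buffer_ref cross(const buffer_ref& src) override { return src; }
};

int main() {
    std::vector<unsigned char> samples(1024, 0x5a);
    buffer_ref src{samples.data(), samples.size()};

    zero_copy_crossing zc;
    memcpy_crossing mc;
    std::cout << (zc.cross(src).ptr == src.ptr) << "\n";  // 1: same buffer
    std::cout << (mc.cross(src).ptr == src.ptr) << "\n";  // 0: copied
}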
Thus, again with minimally invasive changes to the GNU Radio internals,
this mechanism supports both single accelerator blocks and the
domain-crossing sources and sinks.
Yeah, this is a big selling point of the design, I think.
Finally, this solution is orthogonal to the desired capability of having
in-place processing blocks. It can be implemented fairly rapidly, even
in a 3.7-API-compatible way, and gives the hooks for additional work to
let blocks request in-place semantics vs. the existing streaming
semantics.
Right, and this is the key, I think. This problem has to be solved in a
“coproc / accelerator”-independent way; otherwise the design would be
fundamentally flawed.
In this design, a block indicates to GNU Radio that it needs to allocate
the memory for either its input buffer (actually, the upstream block’s
output buffer) or its own output buffer, by adding a flags field to the
io_signature and having one of the possible flags be WE_OWN.
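If I read that right, it would be roughly along these lines (a toy model
of the idea in plain C++, not the real gr::io_signature API):

// Sketch only: a toy model of an io_signature carrying per-port flags,
// with one flag meaning "this block owns / allocates that buffer itself".
// Not the real gr::io_signature.
#include <cstddef>
#include <cstdint>
#include <vector>

enum port_flags : std::uint32_t {
    FLAG_NONE   = 0,
    FLAG_WE_OWN = 1 << 0,   // the block allocates this buffer, not the runtime
};

struct port_signature {
    std::size_t   item_size;
    std::uint32_t flags;
};

struct io_signature_model {
    std::vector<port_signature> ports;
};

// An accelerator block could declare that it owns its output buffer
// (so it can hand the scheduler device or pinned memory instead):
io_signature_model accel_output_signature()
{
    return {{
        {sizeof(float) * 2 /* complex float */, FLAG_WE_OWN},
    }};
}

int main()
{
    io_signature_model sig = accel_output_signature();
    bool block_allocates = sig.ports[0].flags & FLAG_WE_OWN;
    return block_allocates ? 0 : 1;
}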
Well, what I was thinking about is a bit different.
The block wouldn’t really do that itself; it would be delegated to a
“domain” object.
The advantages are:
- No special casing or flags. The “normal” host buffers as they exist
now can be handled in the same way.
- Since the block doesn’t do it itself, several blocks can use the same
“domain” object, and if two blocks are back to back, that object would
know how to handle it (see the sketch below).
If each block does it all itself, two blocks using a GPU, for example,
couldn’t really realize that they’re both on the GPU and that there is
no need to copy data back to the host at all between them. Delegating
this to standardized domain plugins would allow handling those cases
more gracefully, IMHO.
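For example, the back-to-back case could look something like this
(sketch, made-up names): each port just carries a reference to its
domain object, and the scheduler only inserts a crossing on an edge
whose two ports are in different domains.

// Sketch only: because both blocks delegate to the same domain object,
// that object can notice they are back to back in the same data domain
// and skip the round trip through host memory. Hypothetical names.
#include <iostream>
#include <memory>
#include <string>
#include <utility>

class domain_object {
    std::string m_name;
public:
    explicit domain_object(std::string name) : m_name(std::move(name)) {}
    const std::string& name() const { return m_name; }
};

struct port {
    std::shared_ptr<domain_object> domain;   // per-port, set by the block author
};

// Decide whether a crossing (e.g. device -> host -> device copies) is
// needed on the edge between an upstream output and a downstream input.
bool needs_crossing(const port& upstream_out, const port& downstream_in)
{
    return upstream_out.domain != downstream_in.domain &&
           upstream_out.domain->name() != downstream_in.domain->name();
}

int main()
{
    auto gpu  = std::make_shared<domain_object>("gpu_device");
    auto host = std::make_shared<domain_object>("host");

    port fir_out{gpu}, fft_in{gpu}, fft_out{gpu}, sink_in{host};

    // Two GPU blocks back to back: no copy back to the host in between.
    std::cout << needs_crossing(fir_out, fft_in) << "\n";   // 0
    // GPU block feeding a host block: a crossing has to be inserted.
    std::cout << needs_crossing(fft_out, sink_in) << "\n";  // 1
}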
Of course this is a bit further from the current architecture, but I
think the externally visible API wouldn’t have to change that much when
implementing this scheme either.
Unfortunately, this moves the knowledge of how a domain works into GNU
Radio, and away from the code/coders that know about it. It would mean
that any time a different co-processor or hardware offload design comes
up, GNU Radio itself would have to change, and designers would have to
have knowledge of GNU Radio internals in order to develop their code.
Not necessarily. I would see those “domain objects” being handled like
blocks: pluggable. GR would ship some for very standardized stuff
(typically the current default behavior / host memory), and you could
have others in external trees / projects.
If someone has an interface that’s completely custom, they could ship
their domain plugin right alongside the custom block that uses it.
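Registration could work much like it already does for blocks; a sketch
of the idea (invented names, not an existing GR facility): the host
domain ships with GR core, and an out-of-tree module registers its own
domain at load time.

// Sketch only: domain objects registered by name, like blocks; the host
// domain ships with GR, custom ones ship with the out-of-tree module
// that needs them. Hypothetical names.
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

class domain_handler {
public:
    virtual ~domain_handler() = default;
    virtual std::string name() const = 0;
};

class domain_registry {
    std::map<std::string, std::function<std::shared_ptr<domain_handler>()>> m_factories;
public:
    static domain_registry& instance() {
        static domain_registry reg;
        return reg;
    }
    void register_domain(const std::string& name,
                         std::function<std::shared_ptr<domain_handler>()> factory) {
        m_factories[name] = std::move(factory);
    }
    std::shared_ptr<domain_handler> make(const std::string& name) const {
        auto it = m_factories.find(name);
        if (it == m_factories.end())
            throw std::runtime_error("unknown domain: " + name);
        return it->second();
    }
};

// Shipped with GR core: the default host-memory domain.
class host_domain : public domain_handler {
public:
    std::string name() const override { return "host"; }
};

// Shipped by a third-party OOT module alongside its custom blocks.
class my_fpga_domain : public domain_handler {
public:
    std::string name() const override { return "my_fpga_shm"; }
};

int main() {
    auto& reg = domain_registry::instance();
    reg.register_domain("host", [] { return std::make_shared<host_domain>(); });
    reg.register_domain("my_fpga_shm", [] { return std::make_shared<my_fpga_domain>(); });

    auto d = reg.make("my_fpga_shm");
    return d->name() == "my_fpga_shm" ? 0 : 1;
}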
I suggest that a way to implement domain-specific knowledge across
multiple blocks, allowing the kind of optimization you describe above,
would be to make a parent class for each domain that the blocks of that
type all derive from.
Well, it’s not that far off from what I was thinking. But doing it via
inheritance at the block level, rather than by delegation to a “domain
object”, has one downside in my mind: it’s global to the block. OTOH,
delegation could be part of the io signature and per-port. I can see
some blocks having input / output ports in different domains with
different requirements.
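What I mean by per-port is roughly this (same caveat: a toy model, not
the real io_signature): the signature carries a domain per port, so a
block can e.g. read its input from host memory while producing its
output directly in GPU memory.

// Sketch only: a per-port domain carried in the io signature, instead
// of deriving the whole block from a per-domain parent class. Made-up names.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct port_signature {
    std::size_t item_size;
    std::string domain;   // data domain this port lives in
};

struct io_signature_model {
    std::vector<port_signature> in;
    std::vector<port_signature> out;
};

int main()
{
    // e.g. a block that reads samples from host memory and writes its
    // output straight into GPU device memory.
    io_signature_model sig{
        {{sizeof(float) * 2, "host"}},
        {{sizeof(float) * 2, "gpu_device"}},
    };

    std::cout << "in:  " << sig.in[0].domain << "\n"
              << "out: " << sig.out[0].domain << "\n";
}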
Cheers,
Sylvain