Yesterday during the coproc WG, I wrote this small snippet to
illustrate what I meant. It uses C idioms rather than C++ ones, and I
also didn't consider multiport, but hopefully it gets my points across.
And when I look at it, it might actually be more straightforward to do
something like that in GR than I originally thought. (But then again,
I'm not at all familiar with the internals.)
Split gr::block into two classes: a 'base' one that contains really
just the base stuff and none of the stuff that's needed for host
execution (so ... almost nothing), and then a subclass of it that adds
everything that's needed to run all the current host blocks.
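To make the split concrete, here is a minimal sketch of what I mean. All of the names (`block_base`, `host_block`, `domain()`) are assumptions for illustration, not existing GNU Radio classes:

```cpp
#include <string>

// Hypothetical 'base' block: only identity and topology-level concerns,
// nothing that presumes the block executes on the host CPU.
class block_base {
public:
    explicit block_base(std::string name) : d_name(std::move(name)) {}
    virtual ~block_base() = default;

    const std::string& name() const { return d_name; }

    // The execution domain this block lives in ("host", "gpu", "fpga", ...).
    virtual std::string domain() const = 0;

private:
    std::string d_name;
};

// Hypothetical host subclass: this is where everything the current host
// scheduler needs would live (work function, forecast, history, ... --
// elided here).
class host_block : public block_base {
public:
    using block_base::block_base;
    std::string domain() const override { return "host"; }
};
```

A coprocessor block would then derive from `block_base` directly (or from a per-domain subclass) without dragging in the host-execution machinery.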
Replace the current 'connect' method with a domain-aware one.
Add a few bookkeeping methods to the flowgraph to register domains.
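The flowgraph-side bookkeeping could look something like the following sketch. Again, `flowgraph::register_domain` and `domain_info` are hypothetical names, not an existing API:

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical per-domain registration record; in a real design this would
// also carry factory hooks for the domain's ingress/egress bridge blocks.
struct domain_info {
    std::string name;
};

class flowgraph {
public:
    // Register a domain once, up front, so connect() can later consult it.
    void register_domain(const domain_info& d) {
        if (d_domains.count(d.name))
            throw std::runtime_error("domain already registered: " + d.name);
        d_domains[d.name] = d;
    }

    bool has_domain(const std::string& name) const {
        return d_domains.count(name) != 0;
    }

private:
    std::map<std::string, domain_info> d_domains;
};
```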
This also looks pretty orthogonal to the buffer management
improvements that have been raised during the sessions since in the
first incarnation of this concept, buffer management would purely be a
‘host’ domain thing.
Anyway, just my 2ct
Thanks - I hadn’t had a chance to read through this until today.
Just to make sure I understand the intention here: this is aimed at
having the scheduler avoid moving memory back and forth between host <->
coprocessor, correct? And doing so without adding a 'new' mechanism to
connect sequential blocks that exist on the co-processor side, yes?
Right now the solutions available (I'm thinking primarily of gr-gpu and
the RFNoC demo) have ports similar to regular blocks, but they are
presented differently somehow (e.g. in the case of RFNoC, they
essentially have a 'host-side' port and an 'FPGA-side' port, or in the
case of gr-gpu, there are explicit ingress/egress blocks, and all GPU
blocks only communicate within the GPU space). Your objection here is
that they require the user (where user here is the flowgraph developer)
to explicitly call out where this move back and forth happens.
There is a certain elegance to this, in terms of maintaining the
consistency of how blocks are connected from the flowgraph developer's
perspective, i.e. the runtime manages the need for ingress/egress;
however, the complexity requirement on the runtime is obviously higher
(e.g., when there are multiple downstream blocks and they exist in
different domains, etc.). Certainly not insurmountable, but it requires
some development effort to create, and more importantly to debug, mainly
making sure all corner cases are handled correctly/well.
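To make the runtime's job concrete, here is a toy sketch of the kind of pass it would need: walk the flattened edge list and, wherever an edge crosses domains, splice in an egress/ingress pair. All names here are illustrative assumptions:

```cpp
#include <string>
#include <vector>

struct node { std::string name; std::string domain; };
struct edge { node src; node dst; };

// Return the edge list with each cross-domain edge replaced by
// src -> egress(src.domain) -> ingress(dst.domain) -> dst.
std::vector<edge> insert_bridges(const std::vector<edge>& edges) {
    std::vector<edge> out;
    for (const auto& e : edges) {
        if (e.src.domain == e.dst.domain) {
            out.push_back(e); // same domain: leave the edge untouched
            continue;
        }
        node egress{"egress_" + e.src.domain, e.src.domain};
        node ingress{"ingress_" + e.dst.domain, e.dst.domain};
        out.push_back({e.src, egress});
        out.push_back({egress, ingress}); // the actual host<->device move
        out.push_back({ingress, e.dst});
    }
    return out;
}
```

This naive version inserts a fresh bridge pair per edge; a real implementation would have to share one egress per source port when several downstream blocks live in the same foreign domain, which is exactly the multiple-downstream corner case above.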
I do agree that this is (relatively) orthogonal to the buffer
business that occupied most of the WG discussion (I think I would say
more complementary), and could be developed somewhat separately. I do
think it would be easier to develop with the support for custom
allocators complete (since you could then easily test with a 'dummy'
co-processor that is simply a custom allocator), but I imagine it could
be done without it.
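For illustration, such a 'dummy' co-processor could be as small as an allocator that hands out ordinary host memory tagged with a fake domain, so the domain-resolution logic can be exercised without real hardware. Names here are hypothetical:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Buffers tagged with the domain that 'owns' them; the runtime's
// domain-resolution logic only needs to look at the tag.
struct tagged_buffer {
    void* ptr;
    std::size_t size;
    const char* domain;
};

// Dummy device allocator: plain heap memory behind the scenes, but it
// reports a non-host domain, standing in for a real co-processor.
class dummy_device_allocator {
public:
    tagged_buffer allocate(std::size_t size) {
        return {std::malloc(size), size, "dummy_device"};
    }
    void deallocate(tagged_buffer& b) {
        std::free(b.ptr);
        b.ptr = nullptr;
    }
};
```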
If you are offering to help develop this idea, I would certainly like to
continue this conversation: I think I have some ideas on what 'things'
need to be handled, but without delving into all the guts of
gnuradio-runtime, I'm not confident I know which dragons might be
awoken with this.