Re: THESIS with CUDA

Thanks for your response.

FYI, there have been quite a few projects using GPUs in the past. I
recommend going through the slides from all of the GRCons; you'll find
something there (at the very least from the first GRCon).
There's also gr-theano, which uses CUDA, and for inspiration, having a
look at gr-fosphor might be a good idea.

For high-performance GR flow graphs, the overhead for ingress/egress has
always been the problem. If you have ideas for modifying GR itself, it
might be worth contacting the “Accelerators” working group, which
discusses these matters.
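
To make the point concrete, here is a rough sketch of where that
overhead shows up (this is not the real GNU Radio block API; the work()
signature below is simplified and made up for illustration): every call
pays a host-to-device copy, a kernel launch, and a device-to-host copy,
even for small item counts.

// Rough sketch, not the real GNU Radio block API: the pattern that makes
// ingress/egress the bottleneck -- every work() call pays a PCIe copy in,
// a kernel launch, and a PCIe copy out.
#include <cuda_runtime.h>

__global__ void multiply_const(const float *in, float *out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * k;
}

// Hypothetical work() of a GPU-accelerated block; d_in/d_out are device
// buffers allocated once when the block is created.
int work(int noutput_items, const float *in, float *out,
         float *d_in, float *d_out, float k)
{
    size_t bytes = noutput_items * sizeof(float);
    int threads = 256;
    int blocks  = (noutput_items + threads - 1) / threads;

    cudaMemcpy(d_in, in, bytes, cudaMemcpyHostToDevice);      // ingress
    multiply_const<<<blocks, threads>>>(d_in, d_out, k, noutput_items);
    cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost);    // egress
    return noutput_items;
}
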
Yes, I can imagine. Also because there are noticeable differences
between different CUDA devices, and with OpenCL it can get even worse,
because there are devices of different categories too.
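
Just to sketch what I mean, those differences can at least be queried at
runtime with the CUDA runtime API, so the kernels can be tuned per card:

// Sketch: querying per-device limits so kernels can be tuned to each card
// (SM count, shared memory size and compute capability differ a lot
// between generations).
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s, cc %d.%d, %d SMs, %zu bytes shared/block\n",
               dev, prop.name, prop.major, prop.minor,
               prop.multiProcessorCount, prop.sharedMemPerBlock);
    }
    return 0;
}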

One of my ideas for reducing it is to merge different blocks into the
same kernel: either serially (“concatenating” the code) or in
“parallel”, where each warp processes a different block. That way it is
possible to use shared memory (which is very small).
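
A minimal sketch of the serial variant, with two made-up example blocks
(a multiply followed by an add), just to show the shape of the idea:

// Sketch of the "serial" fusion: two example blocks concatenated into
// one kernel; the intermediate value stays in shared memory instead of
// going back out to global memory.
#include <cuda_runtime.h>

#define BLOCK_SIZE 256

__global__ void fused_mult_add(const float *in, float *out,
                               float k, float c, int n)
{
    __shared__ float tmp[BLOCK_SIZE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // stage 1: what the first block would compute
    if (i < n)
        tmp[threadIdx.x] = in[i] * k;
    __syncthreads();   // stage boundary (trivial here, needed for real blocks)

    // stage 2: what the second block would compute, reading its input
    // from shared memory
    if (i < n)
        out[i] = tmp[threadIdx.x] + c;
}
// launch: fused_mult_add<<<(n + BLOCK_SIZE - 1) / BLOCK_SIZE, BLOCK_SIZE>>>(...)
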
But it is complicated (PTX code, while OpenCL can compile more easily;
different scheduling policies), so I will try it at the end, after
spending time on a more conventional version, with multiple kernels
communicating via global memory on the card. Latency will be my hell.
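
For comparison, a sketch of that more conventional version (again with
made-up example blocks): each block keeps its own kernel, and a
global-memory buffer on the card connects them, so every block in the
chain adds a kernel launch and its latency.

// Sketch of the conventional version: each block stays a separate kernel,
// and the intermediate buffer d_tmp lives in global memory on the card,
// so only the first input and the last output ever cross the PCIe bus.
#include <cuda_runtime.h>

__global__ void multiply_const(const float *in, float *out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * k;
}

__global__ void add_const(const float *in, float *out, float c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + c;
}

// d_in, d_tmp, d_out are device buffers; one extra launch per block in
// the chain, which is exactly where the latency piles up.
void run_chain(const float *d_in, float *d_tmp, float *d_out, int n)
{
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    multiply_const<<<blocks, threads>>>(d_in, d_tmp, 2.0f, n);  // block 1
    add_const<<<blocks, threads>>>(d_tmp, d_out, 1.0f, n);      // block 2
    cudaDeviceSynchronize();
}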