On Wed, 27 May 2015 at 16:29, Marcus Müller <
[email protected]> wrote:
> Yes. The Thread-per-Block-scheduler gets its name from the fact that
> every block gets its own thread.
> You should really use it – using STS will probably kill the performance
> you can gain by accelerating stuff on a GPU, because nearly no one uses
> single-core CPUs any more, and to my knowledge, only the FFT blocks are
I hope to port the more time-consuming blocks to CUDA… for the moment
an IIR filter, and FFT…
My mistake, you are right: since CUDA 4.0 the same GPU can be shared
between different threads/processes… I got my information from older books.
Now I need to change my code a little… I was not so concerned about this
capability because my blocks just launch kernels/memcpy asynchronously.
It seems that with multithreading I am able to reduce the “pause time”
between the execution of kernels of different blocks from 1-3 microseconds
to 0.3 microseconds… not bad!! (Each block waits for the previous one by
means of streams attached to events.)
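
To make the "each block waits for the previous one" part concrete, here is a rough CPU-only analogy in plain C++. It is only a sketch, not my actual code: there are no CUDA or GNU Radio calls in it, the function name run_chained_blocks is made up for illustration, and a std::promise/std::shared_future pair stands in for a CUDA event that one stream records and the next stream waits on. Every "block" gets its own thread, as with the thread-per-block scheduler.

```cpp
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not a GNU Radio or CUDA API): runs n "blocks", one
// thread each, where block i does its work only after block i-1 has signalled.
std::vector<int> run_chained_blocks(int n) {
    std::vector<std::promise<void>> done(n);      // one "event" per block
    std::vector<std::shared_future<void>> ready;  // waitable side of each event
    for (auto& p : done) ready.push_back(p.get_future().share());

    std::vector<int> order;   // order in which the blocks did their work
    std::mutex order_mutex;   // guards `order` across the block threads

    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&, i] {
            if (i > 0) ready[i - 1].wait();  // wait on the previous block's event
            {
                std::lock_guard<std::mutex> lock(order_mutex);
                order.push_back(i);          // stand-in for launching a kernel
            }
            done[i].set_value();             // signal this block's own event
        });
    }
    for (auto& t : threads) t.join();
    return order;
}
```

Calling run_chained_blocks(4) should give back the block indices in chain order, whichever thread the OS happens to start first. In the real thing the waiting would happen on the GPU side between streams instead of blocking a host thread, so take this only as a picture of the dependency chain.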
The drawback is that each kernel has to do more work, because
otherwise its execution would not hide the larger overhead due to the
“reassignment of the device” to a different thread.
Thanks for your replies,