Choose thread

musicdenotation · May 22, 2015, 7:23pm

Hi,
I have a question about thread management. I like the fact that the scheduler
can launch blocks in different threads, but I’d like to execute some blocks
inside the same thread (CUDA requires all operations to be performed from a
single thread). For the moment I’m using GR_SCHEDULER=STS, but it disables
multithreading altogether. Is it possible to set a thread affinity? (I’ve looked at
https://gnuradio.org/doc/doxygen/page_affinity.html, but it doesn’t help.)

Thanks,
marco

marco_Ribero · May 24, 2015, 7:53pm

On Fri, May 22, 2015 at 1:22 PM, marco Ribero [email protected] wrote:

Don’t use the STS scheduler. It is, after all, the Single-Threaded
Scheduler, and setting the thread affinity under that condition is a no-op.
Thread affinity only works with the default TPB (thread-per-block) scheduler.
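
As a rough illustration of the advice above, here is a minimal sketch of selecting the TPB scheduler and pinning blocks to a core. It assumes the blocks expose GNU Radio's `set_processor_affinity(mask)` method; `FakeBlock` and `pin_blocks` are hypothetical names used only so the snippet runs without GNU Radio installed.

```python
import os


def pin_blocks(blocks, cores):
    """Pin every block's scheduler thread to the given CPU cores."""
    mask = [int(c) for c in cores]
    for blk in blocks:
        # With real GNU Radio blocks this is the block's affinity setter.
        blk.set_processor_affinity(mask)
    return mask


class FakeBlock:
    """Hypothetical stand-in for a GNU Radio block (not part of GNU Radio)."""

    def __init__(self):
        self.affinity = []

    def set_processor_affinity(self, mask):
        self.affinity = list(mask)


# Ask for the default thread-per-block scheduler before the flowgraph starts.
os.environ["GR_SCHEDULER"] = "TPB"

cuda_blocks = [FakeBlock(), FakeBlock()]
pin_blocks(cuda_blocks, [0])
```

Note that pinning two blocks to the same core does not make them share a thread: under TPB they remain two separate threads that the operating system schedules on that core.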

Tom

marco_Ribero · May 27, 2015, 3:31pm

On Sun, May 24, 2015 at 19:52, Tom R. [email protected] wrote:

Don’t use the STS scheduler. It is, after all, the Single-Threaded
Scheduler. And setting the thread affinity under that condition is a nop.
That will only work with the default TPB scheduler.

Tom

I need different blocks to run in the same thread (because CUDA
requires everything to be done inside a single thread; each thread is
associated with a different GPGPU). So, without the STS scheduler, is it
not possible to run different blocks in the same thread?

I’d like the other blocks (not related to CUDA) to be able to run in
parallel threads.

Thanks,
marco

marco_Ribero · May 27, 2015, 4:30pm

Dear Marco,

I need different blocks to run in the same thread (because CUDA
requires everything to be done inside a single thread; each thread is
associated with a different GPGPU). So, without the STS scheduler, is it
not possible to run different blocks in the same thread?

Yes, that is not possible. The thread-per-block scheduler gets its name
from the fact that every block gets its own thread.
You should really use it: using STS will probably kill the performance
you can gain by accelerating stuff on a GPU, because nearly no one uses
single-core CPUs any more and, to my knowledge, only the FFT blocks are
internally multithreaded.
I’m a bit surprised that CUDA requires you to run everything in one
thread. Doesn’t calling cudaSetDevice in every thread (i.e. in every
block’s work() method on the first call) suffice?
NVIDIA claims CUDA is thread-safe, i.e. in the worst case your multithreaded
performance is as bad as doing everything in a single thread.

I’d like the other blocks (not related to CUDA) to be able to run in
parallel threads.

That’s really awesome because it scales so well.

Best regards,
Marcus

marco_Ribero · May 27, 2015, 7:27pm

On Wed, May 27, 2015 at 16:29, Marcus Müller <[email protected]> wrote:

Yes. The Thread-per-Block-scheduler gets its name from the fact that
every block gets its own thread.
You should really use it – using STS will probably kill the performance
you can gain by accelerating stuff on a GPU, because nearly no one uses
single-core CPUs any more, and to my knowledge, only the FFT blocks are
internally multithreaded.

I hope to port the most time-consuming blocks to CUDA: for the moment
the FIR and IIR filters, and the FFT.

Best regards,
Marcus

My mistake, you’re right: since CUDA 4.0 the same GPU can be shared
between different threads/processes. I learned this from older books.
Thank you!!

Now I need to change my code a little. I was not so concerned about this
capability because my blocks just launch kernels/memcpys asynchronously
and exit.

It seems that with multithreading I’m able to reduce the “pause time” between
the execution of kernels of different blocks from 1-3 microseconds to
0.3 microseconds. Not bad!! (Each block waits for the previous one using
streams attached to events.)
The drawback is that each kernel execution must be longer, because
otherwise it wouldn’t hide the larger overhead due to the “reassignment of
the device” to a different thread.
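
A rough way to reason about this trade-off is to treat any fixed per-kernel cost (the pause between kernels, or the device reassignment overhead) as dead time and compare it with the kernel duration. The sketch below is plain arithmetic, not a measurement: the 1-3 and 0.3 microsecond gaps come from the figures above, while `KERNEL_US` and the function names are hypothetical.

```python
def overhead_fraction(kernel_us, gap_us):
    """Fraction of wall time lost to the gap between back-to-back kernels."""
    return gap_us / (kernel_us + gap_us)


def min_kernel_us(gap_us, max_fraction):
    """Shortest kernel (in microseconds) that keeps the lost fraction
    at or below max_fraction for a fixed per-kernel gap."""
    return gap_us * (1.0 - max_fraction) / max_fraction


KERNEL_US = 10.0  # hypothetical kernel duration, chosen only for illustration

single_thread = overhead_fraction(KERNEL_US, 3.0)  # upper end of the 1-3 us gap
multi_thread = overhead_fraction(KERNEL_US, 0.3)   # gap seen with multithreading
```

With a smaller gap, shorter kernels stay efficient; any additional fixed cost per kernel can be added to `gap_us` to see how much longer each kernel then has to run.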

Thanks for your replies,
marco