Re: Which blocks do you like?

musicdenotation · May 17, 2015, 10:59pm

@Ron E.: Thanks for your reply. I’ve looked at the code. I think I may
implement it in the future, but not now: it is too specific to a single
application.

@Marcus Müller: I’ve looked at the correlate_and_sync method (that’s the
one you mean, isn’t it?). Yes, it could take good advantage of CUDA…
Yes, I have a FIR implementation and I’m also looking at an FFT (in
device code, so unfortunately without the cuFFT library).
Here the problem is not only the data leaving the block, but also the
related tags, in particular because for the moment I’m not supporting
tags in the flow that goes through CUDA. If you have a good idea about
how to manage them, feel free to speak.

While I was reading the code, a question arose about this snippet (the
work method in the cc implementation):

    while(i < noutput_items) {
      if((corr_mag[i] - corr_mag[i-d_sps]) > d_thresh) {
        while(corr_mag[i] < corr_mag[i+1])
          i++;

Isn’t it possible for the variable i to run past the end of the array
corr_mag? I know it would take a strange pattern to have the last item
greater than the previous one…

Greetings,
marco ribero

marco_Ribero · May 18, 2015, 9:53am

Hi Marco,

> I’ve looked at the correlate_and_sync method (that’s the one you
> mean, isn’t it?)

Pretty much, yeah.

> Here the problem is not only the data leaving the block, but also the
> related tags, in particular because for the moment I’m not supporting
> tags in the flow that goes through CUDA. If you have a good idea about
> how to manage them, feel free to speak.

I don’t know exactly what your blocks look like, but assuming you take a
gr::sync_block and let it handle the whole CUDA thing, GNU Radio would
automatically make this work.
If you instead have a block that “sends” data to the CUDA GPU (a sink
from GNU Radio’s perspective), and another one that “receives” data from
CUDA (a source from the perspective of GNU Radio), you could simply
call get_tags_in_range in the sink block, take these tags, and send them
as a message to your source block (before you start the CUDA
processing). You would then write a message handler in your source block
which just takes these tags and calls add_item_tag for each of them.
If you want to add tags by detecting things in CUDA, that’ll be a bit
more complicated, and sounds like your CUDA threads would need to
coordinate with a lot of barriers to write the index of a tag you want
to add into a specific memory position… I don’t know how effective
that would be compared to doing the same on a GHz CPU.
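
To make the sink-to-source tag forwarding described above a bit more
concrete, here is a minimal, self-contained sketch of the bookkeeping
involved. It does not use the GNU Radio API: the Tag struct and the
functions collect_tags and reattach_tags are hypothetical stand-ins for
the roles played by get_tags_in_range on the sink side and add_item_tag
in the source's message handler. Storing offsets relative to the start
of the chunk is an assumption of this sketch, and it only makes sense if
the CUDA path produces one output item per input item.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a stream tag (not GNU Radio's own tag type).
struct Tag {
    std::uint64_t offset;  // absolute item index in the stream
    std::string key;
    std::string value;
};

// Sink side: pick the tags that fall in [start, end) and rebase their
// offsets to the start of the chunk, so they can travel in a message.
std::vector<Tag> collect_tags(const std::vector<Tag>& stream_tags,
                              std::uint64_t start, std::uint64_t end)
{
    std::vector<Tag> msg;
    for (const Tag& t : stream_tags) {
        if (t.offset >= start && t.offset < end)
            msg.push_back({t.offset - start, t.key, t.value});
    }
    return msg;
}

// Source side: what a message handler would do with the received tags,
// i.e. place each one at the matching position of the output chunk.
// Assumes a 1:1 item rate between sink input and source output.
std::vector<Tag> reattach_tags(const std::vector<Tag>& msg,
                               std::uint64_t out_start)
{
    std::vector<Tag> out;
    for (const Tag& t : msg)
        out.push_back({t.offset + out_start, t.key, t.value});
    return out;
}
```

If the item rate through the GPU is not 1:1, the relative offsets would
have to be scaled before re-attaching, which this sketch does not cover.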

> have the last item greater than the previous…

I agree, relying on the signal to limit execution of this loop feels a
bit “dangerous”, but I also agree that as long as your “known symbol”
has a sane shape, this won’t be a problem. The problem is that this is
indeed a rather CPU-hungry piece of code, so every additional condition
might slow things down; I think it should be possible to come up with a
nearly-as-effective implementation that is safer, but I can’t think of
one right now. Any hints? [1]
At any rate, the outer while shouldn’t run till noutput_items, but
till noutput_items - 1, because of the inner
while(corr_mag[i] < corr_mag[i+1]). Thanks for spotting that! If you
don’t mind, I’d like to ask you to fix that line
(while(i<noutput_items) → while(i<noutput_items-1)) and submit a pull
request on github [2] (if possible, base it off the “maint” branch).
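
For illustration, here is a hedged sketch of what a fully bounded
variant of that loop could look like. Changing only the outer bound
still leaves the inner while free to run off the end if the data keeps
rising, so this version also checks the index while climbing, at the
cost of one extra comparison per step. The function find_peaks, its
parameters, and the choice to simply record the peak index are
assumptions of this sketch (the quoted snippet does not show what
happens after the climb); it is not the GNU Radio implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices where the climb stops.
std::vector<std::size_t> find_peaks(const std::vector<float>& corr_mag,
                                    std::size_t sps, float thresh)
{
    std::vector<std::size_t> peaks;
    const std::size_t n = corr_mag.size();
    std::size_t i = sps;   // start where corr_mag[i - sps] is valid
    while (i + 1 < n) {    // corr_mag[i + 1] must be readable
        if (corr_mag[i] - corr_mag[i - sps] > thresh) {
            // Climb to the local maximum, but never past the last item.
            while (i + 1 < n && corr_mag[i] < corr_mag[i + 1])
                ++i;
            peaks.push_back(i);
        }
        ++i;
    }
    return peaks;
}
```

With an input that keeps rising up to its last sample, the climb stops
at the last valid index instead of reading beyond the array.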

Best regards,
Marcus

[1] Hard to properly optimize; intuitively, it’s not clear whether a
highly optimized “precompute the complete
corr_mag[0:end-d_sps]-corr_mag[d_sps:end] and
sign(corr_mag[:end-1]-corr_mag[1:end])” would be faster than selectively
only computing the relevant parts (like this branching-intense algorithm
does) on a CPU. Pretty sure that on CUDA, you’d just precompute the
whole arrays.
[2] https://gnuradio.org/redmine/projects/gnuradio/wiki/Development#How-can-I-use-github-to-submit-patchesfeatureschanges
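
The “precompute the complete arrays” idea from footnote [1] could be
sketched roughly as follows. This is only an illustration on the CPU,
using the sign convention of the loop quoted earlier in the thread
(corr_mag[i] - corr_mag[i-d_sps] and corr_mag[i] < corr_mag[i+1]); the
Masks struct and the function precompute_masks are hypothetical names.
Both passes are branch-free per element and independent across indices,
which is why they would map naturally onto one CUDA thread per item.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the two precomputed decision arrays.
struct Masks {
    std::vector<bool> above_thresh;  // corr_mag[i] - corr_mag[i-sps] > thresh
    std::vector<bool> rising;        // corr_mag[i] < corr_mag[i+1]
};

Masks precompute_masks(const std::vector<float>& corr_mag,
                       std::size_t sps, float thresh)
{
    const std::size_t n = corr_mag.size();
    Masks m;
    m.above_thresh.assign(n, false);
    m.rising.assign(n, false);

    // Entries below sps have no sample sps items earlier; leave them false.
    for (std::size_t i = sps; i < n; ++i)
        m.above_thresh[i] = (corr_mag[i] - corr_mag[i - sps]) > thresh;

    // The last entry has no successor; leave it false.
    for (std::size_t i = 0; i + 1 < n; ++i)
        m.rising[i] = corr_mag[i] < corr_mag[i + 1];

    return m;
}
```

A peak search would then only walk the two masks; whether that beats the
selective, branching version on a CPU is exactly the open question the
footnote raises.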