On Fri, Nov 1, 2013 at 9:44 PM, Marcus D. Leech wrote:
For the “generic” LP/BP filter blocks, would it make sense for them to
automatically select either a conventional FIR or FFT-fast-convolution
depending on the number of taps and other parameters?
Sorry, which classes are you talking about? Are you referring to what
we’ve referred to internally as the FIR/FFT filter “kernels”? These
live in gr-filter/lib/fir_filter.cc and gr-filter/lib/fft_filter.cc.
Or are you talking about the blocks themselves? Like
Also, I’ll just refer to our fast-convolution filters as the FFT
filters (since the block name is filter.fft_filter_XXX).
The user still has access to the low-level implementations, but the choice
between conventional FIR and fast convolution based on FFT is something that
could be done whenever the filter parameters change, yes?
Possibly… One of the problems is that the crossover between which is
the more efficient can change based on the processor. On Intel/AMD,
it’s probably roughly the same. I’ve done a number of tests of this on
my own machines. The crossover seems to be between 10 and 30 taps. So
on my AVX-enabled processor, the crossover is around 10 taps, but the
difference is so tiny between the FFT and FIR filters that whenever I
can, I use an FFT filter.
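
To make the two approaches concrete, here is a minimal pure-Python sketch of a conventional dot-product FIR next to a single-block FFT fast convolution. It is not the GNU Radio code: the names fir_direct, fft, and fir_fft are made up for illustration, and a streaming filter would process the input in blocks (overlap-save or overlap-add) rather than transforming everything at once. The point is only that both produce the same output, while the direct form costs one multiply per tap per output sample.

```python
import cmath

def fir_direct(x, taps):
    """Conventional FIR: one dot product per output sample (full convolution)."""
    n_out = len(x) + len(taps) - 1
    y = [0j] * n_out
    for n in range(n_out):
        acc = 0j
        for k, h in enumerate(taps):
            if 0 <= n - k < len(x):
                acc += h * x[n - k]
        y[n] = acc
    return y

def fft(a, inverse=False):
    """Recursive radix-2 FFT, unscaled; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fir_fft(x, taps):
    """Fast convolution: zero-pad, multiply the spectra, inverse transform."""
    n_out = len(x) + len(taps) - 1
    size = 1
    while size < n_out:
        size *= 2
    X = fft(list(x) + [0j] * (size - len(x)))
    H = fft(list(taps) + [0j] * (size - len(taps)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    # The inverse transform above is unscaled, so divide by the FFT size.
    return [v / size for v in y[:n_out]]
```

The two functions agree to within floating-point rounding, so which one to use is purely a question of speed for a given number of taps.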
One big issue that I haven’t properly tested is the difference
between the fft_filter_ccc and fir_filter_ccf. If you have real taps
(like an LPF), we use fir_filter_ccf because we can get away with the
float-complex multiply. We don’t want to have to promote the taps to
complex to run the computations. But we only have the fft_filter_ccc
(since we take the FFT of the filter taps, we make them complex,
anyways). I’m not sure how those two compare against each other.
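
As a rough illustration of why promoting real taps to complex costs something in the dot-product case, here is a small sketch. The function names dot_ccf and dot_ccc are made up for this example and only borrow the suffix convention; the real kernels are vectorized and look nothing like this.

```python
def dot_ccf(samples, taps):
    """Complex samples, float taps: two real multiplies per tap."""
    re = 0.0
    im = 0.0
    for s, h in zip(samples, taps):
        re += s.real * h
        im += s.imag * h
    return complex(re, im)

def dot_ccc(samples, taps):
    """Complex samples, complex taps: four real multiplies per tap,
    plus the extra add and subtract needed to combine the products."""
    re = 0.0
    im = 0.0
    for s, h in zip(samples, taps):
        re += s.real * h.real - s.imag * h.imag
        im += s.real * h.imag + s.imag * h.real
    return complex(re, im)

samples = [1 + 2j, 3 - 1j, -2 + 0.5j]
real_taps = [0.25, 0.5, 0.25]
# Promoting the taps gives the same answer but does twice the multiplies.
promoted = [complex(h, 0.0) for h in real_taps]
```

Both calls return the same value; the ccf form simply does half the multiplications per tap.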
… a few seconds later.
I just compared them on an Intel i7 870 at 2.93 GHz.
fir_filter_ccc vs fft_filter_ccc: ~10 taps
fir_filter_ccf vs fft_filter_ccc: ~24 taps.
Bottom line, though: it’s still probably possible and could make
sense. We could make a generalized rule that is more or less right for
any platform, unless it’s one where the FFT was implemented by a
monkey. Even if we say 50 taps, the amount of difference won’t be that
extreme. And if it’s still too much, you should always have the option
to select which one you want.
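
A rule like that could be sketched roughly as follows. Everything here is hypothetical: choose_filter, its arguments, and the CROSSOVER_TAPS table are illustrative names, not an existing GNU Radio API, and the thresholds are just the rough crossover points measured above on one machine.

```python
# Assumed crossover points, in taps, taken from the i7 870 numbers above.
# A real rule would probably use one conservative value for all platforms.
CROSSOVER_TAPS = {"ccc": 10, "ccf": 24}

def choose_filter(ntaps, kind="ccf", force=None):
    """Return "fir" or "fft" for a filter with `ntaps` taps.

    kind  -- which FIR variant the FFT filter is competing against
    force -- "fir" or "fft" to override the automatic choice
    """
    if force is not None:
        if force not in ("fir", "fft"):
            raise ValueError("force must be 'fir', 'fft', or None")
        return force
    return "fft" if ntaps >= CROSSOVER_TAPS[kind] else "fir"
```

The choice would be re-evaluated whenever the taps change, for example choose_filter(len(new_taps), "ccf"), and the force argument keeps the manual selection available.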
But I warn you; just because I agree that this could be and maybe
should be done, I’m probably not going to do it for lack of motivation
on my part. But if it gets done and submitted, well, that’s another
Furthermore when running a standard dot-product FIR filter, do we take
advantage of the identities shown in this paper:
Which allows you to avoid computations in a FIR that will never be used in
the output? Looking at the code, it looks like we do, but I’m
Yes, we definitely do. We have the pfb_decimator block, which is
basically the same thing as the fir_filter as long as the channel you
select is 0. The pfb_decimator filter allows us to select which
channel we actually want to extract. If the channel is 0, probably
better just to use the fir_filter since there’s a bit less logic here
(though my tests show they are /roughly/ equivalent in speed).
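
For anyone following along, the general idea of not computing FIR outputs that decimation would throw away can be sketched like this. It is a plain-Python illustration with made-up function names, not the fir_filter or pfb_decimator code, and it is not meant to reproduce the exact formulation in the paper.

```python
def filter_then_decimate(x, taps, decim):
    """Wasteful form: compute every FIR output, then keep one in `decim`."""
    full = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        full.append(acc)
    return full[::decim]

def decimating_fir(x, taps, decim):
    """Efficient form: only run the dot product at the output instants
    that survive decimation, so the work drops by a factor of `decim`."""
    out = []
    for n in range(0, len(x), decim):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out
```

Both functions return identical results; the second one just skips the dot products whose outputs would be discarded.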