DPSK Demodulator implementation (FIR)

I am interested in the implementation of the DPSK demodulator block
in GNU Radio. I have been profiling and debugging the code to better
understand the implementation of this block. One thing I noticed is
that all of the baseband filtering happens in the time domain. I find
this peculiar because the convolution for the FIR filter is consuming
~30% of the call time (in the inner product).
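
For reference, here is a minimal sketch (illustrative Python/NumPy,
not the actual GNU Radio source, which uses optimized dot-product
kernels) of the direct-form FIR structure that shows up in the
profile: every output sample is one inner product between the taps
and a sliding window of input, so the cost grows with
num_samples * num_taps.

import numpy as np

def fir_time_domain(x, taps):  # illustrative name, not a real API
    n_taps = len(taps)
    flipped = taps[::-1]  # convolution reverses the tap order
    y = np.empty(len(x) - n_taps + 1, dtype=np.complex64)
    for i in range(len(y)):
        # This per-sample dot product is the inner-product hot spot.
        y[i] = np.dot(flipped, x[i:i + n_taps])
    return y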

Specifically, the raised cosine filter (in fll_band_edge) appears to
be consuming much of the CPU time during demodulation. Curious, I
plotted the time required to execute time-domain and frequency-domain
FIR filters and found that even for a low number of taps (55 in the
DPSK demod example), the FFT version performs much better than the
convolution approach.
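
A comparison along these lines can be sketched with SciPy stand-ins
(scipy.signal.lfilter for the time-domain path, scipy.signal.fftconvolve
for the FFT path); the buffer length and cutoff below are arbitrary
placeholders, and only the 55-tap count comes from the DPSK demod
example.

import numpy as np
from scipy import signal
import timeit

rng = np.random.default_rng(0)
n = 1_000_000  # placeholder buffer length
x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)).astype(np.complex64)
taps = signal.firwin(55, 0.25)  # 55 taps as in the DPSK demod; cutoff is a placeholder

t_fir = timeit.timeit(lambda: signal.lfilter(taps, 1.0, x), number=5)
t_fft = timeit.timeit(lambda: signal.fftconvolve(x, taps, mode="full"), number=5)
print(f"time domain: {t_fir:.3f} s   FFT: {t_fft:.3f} s")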

My question is: what am I missing about the data path flow that makes
time-domain filtering more attractive than frequency-domain filtering?
It seems clear that an FFT approach to filtering would perform better,
yet the entire demodulator block keeps the signal in the time domain.
I realize there are benefits to performing the matched filter at the
same time as the timing correction, but as far as I can tell those
filter steps exist at a different level than the baseband filtering.
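
For what it's worth, the A/B swap I have in mind would look something
like the sketch below (against the GNU Radio Python API; the
root_raised_cosine parameters here are illustrative placeholders, not
the DPSK demod defaults).

from gnuradio import filter
from gnuradio.filter import firdes

# gain, sampling rate, symbol rate, excess bandwidth, number of taps
# (placeholder values, not the demod defaults)
taps = firdes.root_raised_cosine(1.0, 4.0, 1.0, 0.35, 55)

time_domain = filter.fir_filter_ccf(1, taps)  # direct convolution
freq_domain = filter.fft_filter_ccf(1, taps)  # overlap-save FFT convolution

Thanks!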