Basic theory of IIR filters

I’ve switched back to working on filters, and I’d like to implement an
IIR filter, because I’ve seen several papers about IIR filters in CUDA,
but I haven’t seen any implementation.

I’ve seen that gnuradio currently uses a direct form I implementation,
which cannot be easily ported.

So I’ve looked into cascade and parallel IIR filter structures. My
question is: which is best in terms of stability? Is it acceptable to
decompose the transfer function into sub-functions with only one pole
each, or could that cause instability (e.g. with conjugate poles)?
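
To make the question concrete, here is a minimal sketch in Python of
what a one-pole decomposition looks like for a single conjugate pole
pair. Everything in it is an assumption made for illustration: the
function names are hypothetical, the pole (radius 0.9, angle pi/5) is
arbitrary, and the numerator is fixed to 1. It only shows that the two
complex one-pole sections keep the same poles as the real two-pole
section and that their outputs sum to a real signal; it does not
settle the finite-precision stability question.

```python
import cmath

def one_pole(x, pole, gain):
    """Complex one-pole section: y[n] = pole * y[n-1] + gain * x[n]."""
    y, prev = [], 0j
    for sample in x:
        prev = pole * prev + gain * sample
        y.append(prev)
    return y

def two_pole_direct(x, pole):
    """Real section H(z) = 1 / ((1 - p z^-1)(1 - conj(p) z^-1))."""
    a1 = 2.0 * pole.real      # p + conj(p)
    a2 = abs(pole) ** 2       # p * conj(p)
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 - a2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def two_pole_parallel(x, pole):
    """Same H(z), written as a sum of two one-pole sections."""
    # Partial-fraction residue for the pole p; its conjugate goes with conj(p).
    gain = pole / (pole - pole.conjugate())
    upper = one_pole(x, pole, gain)
    lower = one_pole(x, pole.conjugate(), gain.conjugate())
    # The imaginary parts cancel, so only the real part is kept.
    return [(u + l).real for u, l in zip(upper, lower)]

pole = cmath.rect(0.9, cmath.pi / 5)   # arbitrary pole inside the unit circle
impulse = [1.0] + [0.0] * 31
direct = two_pole_direct(impulse, pole)
parallel = two_pole_parallel(impulse, pole)
max_diff = max(abs(d - p) for d, p in zip(direct, parallel))
print(max_diff)
```

In this sketch the two forms agree up to rounding error, and each
one-pole section has a pole with the same magnitude as the original
pair. The cost is that the sections for conjugate poles need complex
arithmetic.
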
Otherwise, since both approaches (parallel and cascade) require only
the previous output, I think the implementation would not be
difficult: once the recurrence expression is expanded, something like
a parallel prefix sum is “sufficient”.
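
The prefix-sum idea can be sketched as follows, in Python rather than
CUDA so that it is easy to check. This is only an illustration under
my own assumptions: a single one-pole section y[n] = a*y[n-1] + x[n]
with zero initial state, and hypothetical function names. Each sample
is treated as the affine map y -> a*y + x[n]; composing such maps is
associative, so an inclusive scan over them yields every output. The
scan below is written in the Hillis-Steele style, where every element
of a pass can be computed independently.

```python
def combine(first, second):
    """Compose two affine maps y -> a*y + b, applying `first` before `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)

def scan_recurrence(x, a):
    """Compute y[n] = a*y[n-1] + x[n] (y[-1] = 0) with an inclusive scan."""
    elems = [(a, sample) for sample in x]
    step = 1
    while step < len(elems):
        # On a GPU, each index of this pass could be handled by its own thread.
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # With zero initial state, the output is the offset of each composed map.
    return [b for _, b in elems]

def sequential_recurrence(x, a):
    """Reference: the same recurrence evaluated one sample at a time."""
    y, prev = [], 0.0
    for sample in x:
        prev = a * prev + sample
        y.append(prev)
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0]
a = 0.5
scanned = scan_recurrence(x, a)
reference = sequential_recurrence(x, a)
print(scanned)
print(reference)
```

This variant takes about log2(N) passes but does O(N log N) work in
total; a work-efficient scan would use the same combine step.
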

Marco Ribero