Mike,
there are roughly 3 ways you can parallelize these algorithms:
- packet-level: run a lot of codewords at the same time
- subblock level: divide each codeword into pieces (overlapping) and
  run SISOs on each one of them in parallel
- trellis level: do ACS operations in parallel
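
To make the subblock and trellis levels a bit more concrete, here is a rough Python sketch. It is only an illustration and is not taken from the paper linked below. The function names, the window layout, and the min-based (Viterbi-style) path metric are all assumptions; a max-log-MAP SISO would use the same add-compare-select pattern with max over log-domain metrics.

```python
def overlapping_windows(n, num_blocks, overlap):
    """Split trellis steps 0..n-1 into num_blocks pieces plus extra overlap.

    Hypothetical helper. Returns (win_lo, win_hi, core_lo, core_hi) tuples
    with half-open ranges. Each worker would process [win_lo, win_hi) but
    keep only the results for [core_lo, core_hi); the overlap acts as
    warm-up for the recursions at the piece boundaries.
    """
    size = -(-n // num_blocks)  # ceiling division
    windows = []
    for b in range(num_blocks):
        core_lo = b * size
        core_hi = min(n, core_lo + size)
        if core_lo >= core_hi:
            break  # more blocks requested than trellis steps available
        win_lo = max(0, core_lo - overlap)
        win_hi = min(n, core_hi + overlap)
        windows.append((win_lo, win_hi, core_lo, core_hi))
    return windows


def acs_step(metrics, predecessors, branch):
    """One add-compare-select step over all trellis states.

    metrics[p]      : path metric of state p at the previous step
    predecessors[s] : list of states with a branch into state s
    branch[(p, s)]  : branch metric of the transition p -> s
    Each iteration of the loop over s is independent of the others,
    which is what makes this step parallel across states.
    """
    new_metrics, survivors = [], []
    for s, preds in enumerate(predecessors):
        # add: extend every incoming path; compare/select: keep the smallest
        best_metric, best_pred = min((metrics[p] + branch[(p, s)], p) for p in preds)
        new_metrics.append(best_metric)
        survivors.append(best_pred)
    return new_metrics, survivors


# Toy usage: 12 trellis steps, 3 pieces, overlap of 2 steps on each side
windows = overlapping_windows(12, 3, 2)

# Toy usage: fully connected 2-state trellis with made-up metrics
toy_branch = {(0, 0): 1, (1, 0): 0, (0, 1): 2, (1, 1): 1}
toy_metrics, toy_survivors = acs_step([0, 3], [[0, 1], [0, 1]], toy_branch)
```

On a GPU, one plausible mapping is one thread per state for the ACS step and one thread block per window, with packet-level parallelism layered on top by decoding many codewords at once.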
take a look at the following link and its references to get some ideas
(not claiming it is a seminal paper though :-))
http://web.eecs.umich.edu/~anastas/docs/turbogpu.pdf
best,
Achilleas