The primary issue with the Lazy Viterbi Decoder

I’m working on programming the “Lazy” Viterbi (convolutional, maximum
likelihood) decoder right now. See
< http://citeseer.ist.psu.edu/599573.html >
< http://vanu.com/resources/publications/2002fast_maximum_likelihood_decoder.pdf >

I have a few issues which I’m trying to resolve, and maybe someone in
GR-land knows the answer(s) or knows someone …

Their basic idea is to maintain 2 structures: a “trellis” for state-
transition decisions which have already been made, and a “priority
queue” (PQ) for decisions which are pending. The “Lazy” comes from
the fact that PQ entries might already have been rendered moot by
trellis decisions, and thus the constraint is on trellis membership
while allowing (almost) arbitrary (“lazy”) PQ membership. The general
loop is to “pop” the minimum PQ entry and, if it’s not in the trellis,
put it there and add all of its next-time transitions to the PQ; if it
is already in the trellis, then some other path reached that state
with a metric at least as good, so the popped entry is dropped. This
is repeated until a transition reaches the end of the trellis (for
block decoding) or a given number of bits have been decoded (for
stream decoding). The math and logic here are sound.
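
Just to pin down the control flow, here is roughly how I picture that
loop in C++ (my own sketch, not code from the paper; the PQ is left
abstract here, and next_state() / branch_metric() are placeholders
for whatever the encoder description provides):

    // Sketch only: pop the minimum pending transition, commit it to the
    // trellis if that (time, state) is still undecided, else drop it.
    #include <vector>

    struct Node {
        int   time;     // trellis stage
        int   state;    // encoder state, 0 .. (2^m)-1
        int   metric;   // accumulated integer metric
        Node* prev;     // predecessor, for traceback
    };

    class BucketPQ;                                   // the integer-metric PQ (see below)
    Node* pq_pop_min(BucketPQ& pq);                   // assumed helpers, not defined here
    void  pq_push(BucketPQ& pq, Node* n);
    bool  pq_empty(const BucketPQ& pq);
    int   next_state(int state, int bit);             // from the encoder description
    int   branch_metric(int time, int state, int bit);

    Node* lazy_decode(BucketPQ& pq,
                      std::vector<std::vector<Node*> >& trellis,   // [time][state]
                      int n_stages)
    {
        while (!pq_empty(pq)) {
            Node* cur = pq_pop_min(pq);
            if (trellis[cur->time][cur->state] != 0)
                continue;                             // already decided: drop it
            trellis[cur->time][cur->state] = cur;     // commit the decision
            if (cur->time == n_stages)
                return cur;                           // end of block: trace back from here
            for (int bit = 0; bit < 2; ++bit) {       // expand both input-bit branches
                Node* nxt   = new Node;
                nxt->time   = cur->time + 1;
                nxt->state  = next_state(cur->state, bit);
                nxt->metric = cur->metric + branch_metric(cur->time, cur->state, bit);
                nxt->prev   = cur;
                pq_push(pq, nxt);
            }
        }
        return 0;    // PQ exhausted without reaching the end (shouldn't happen)
    }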

The trellis can be done in many ways (e.g. full or sparse matrix,
hash table), but the simplest (which I will use up front) is just a
full “matrix” (a std::vector<std::vector<void*> > of pointers to the
previous state), in order to provide the traceback. This structure
is simple to work with and understand; there are no issues here. All
of the issues are with the PQ structure.
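
For what it's worth, the shape I have in mind is roughly this (again
my own sketch; input_bit() is a placeholder for recovering the
decoded bit from a state transition):

    // Dense trellis: an (n_stages+1) x 2^m table of committed-decision pointers.
    // A null entry means no decision has been made yet for that (time, state).
    #include <algorithm>
    #include <vector>

    struct Node { int time, state, metric; Node* prev; };   // as in the loop sketch

    std::vector<std::vector<Node*> >
    make_trellis(int n_stages, int m /* number of delays */)
    {
        return std::vector<std::vector<Node*> >(
            n_stages + 1, std::vector<Node*>(1 << m, static_cast<Node*>(0)));
    }

    int input_bit(int from_state, int to_state);   // placeholder, from the encoder

    // Traceback: walk the prev pointers back from the winning end-of-block node.
    std::vector<int> traceback(Node* end)
    {
        std::vector<int> bits;
        for (Node* p = end; p != 0 && p->prev != 0; p = p->prev)
            bits.push_back(input_bit(p->prev->state, p->state));
        std::reverse(bits.begin(), bits.end());
        return bits;
    }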

The PQ holds the possible transitions, ordered by minimum metric -
where the absolute minimum (combined) metric for each set of time
transitions is always 0. For full-precision (e.g. floating-point)
metrics, the PQ (I think) has to be a (linked) list with new possible
transitions inserted via a search for the correct insertion point …
not constant time. In order to maintain (expected) constant time in
the PQ, they use integer metrics (M bits of precision) - and limit
the number of slots needed for PQ storage to (2^M)+1, then store each
possible transition in the slot which matches its accumulated
(possibly relative) metric.
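
To be concrete about how I read that (my interpretation, not their
code): the PQ is a circular array of (2^M)+1 slots indexed by the
metric modulo the array size, one pending transition per slot, e.g.:

    // Integer-metric "bucket" PQ: slot index = metric mod ((2^M)+1).
    // With M-bit branch metrics, every pending entry lies within 2^M of
    // the current minimum, so the modular slot assignment is unambiguous.
    #include <cstddef>
    #include <vector>

    struct Node { int time, state, metric; Node* prev; };   // as before

    class BucketPQ {
    public:
        explicit BucketPQ(int M)
          : slots_((std::size_t(1) << M) + 1, static_cast<Node*>(0)) {}

        void push(Node* n)
        {
            std::size_t i = std::size_t(n->metric) % slots_.size();
            // If slots_[i] is already occupied this silently overwrites it,
            // which is exactly the conflict I ask about below.
            slots_[i] = n;
        }

        // pop_min(): scan for the first occupied slot; see further down.

    private:
        std::vector<Node*> slots_;   // one pending transition per slot
    };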

The primary issue which they don’t address is: what happens when a PQ
storage slot is already occupied but another metric is computed which
would need that slot (overwrite? but which one?)? No mechanism is
described for handling this conflict, and neither of the metrics can
necessarily be dropped. (I think) it’s trivial to come up with an
example where this happens and neither metric could yet be dropped,
so clearly they had to resolve this issue somehow. But it’s not in
the text of the report, so I’m trying, a priori, to think of a
(reasonably simple) way around it.

The first way I’ve thought of is to make each PQ slot the head of a
linked list. But this requires either a possibly large number of
linked nodes allocated up front, or calling ‘new’ a lot. An upper
bound on “large” is n*(2^m)*(2^#o), where ‘n’ is the block length or
stream decode length (in bits), ‘m’ is the total number of delays (==
constraint length - 1), and ‘#o’ is the number of output bit-streams
from the encoder; for (n,m,#o) = (100,8,2) this is ~100k nodes (at a
likely 8 bytes / node, == 800 kB), which is significantly more memory
than the rest of the decoder requires (~100 kB) - so this doesn’t
seem like the “best” way to go.
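
In code terms, what I mean is something like this (sketch; an
intrusive 'next' link turns each slot into a bucket of arbitrary
depth, at the memory cost estimated above):

    // Each PQ slot holds the head of a singly linked list of pending
    // transitions; every entry in a given slot has the same (relative)
    // metric, so the order within the list doesn't matter.
    struct Node {
        int   time, state, metric;
        Node* prev;   // traceback pointer, as before
        Node* next;   // chain within a PQ slot
    };

    void slot_push(Node** slot, Node* n)   // slot = &slots[metric % n_slots]
    {
        n->next = *slot;
        *slot   = n;
    }

    Node* slot_pop(Node** slot)
    {
        Node* n = *slot;
        if (n != 0)
            *slot = n->next;
        return n;
    }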

Another issue which they do not address is how the minimum-metric PQ
slot is found. The obvious way is to start at the “head” of the PQ
and work through slots until the first occupied one is found … this
is less “expensive” than inserting new metrics in sorted order, but
how much less (maybe a factor of sqrt(M) on average?)? And this
search has to be done each time a new minimum metric is to be
extracted … not a large “cost” under high SNR, but it would likely be
a significant “cost” under low SNR.
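
In other words, something like the following (sketch, using the
linked-list slots from above; keeping a running 'min_idx' means each
search starts where the last one left off instead of at slot 0):

    // Pop the minimum-metric entry: scan circularly from the last-known
    // minimum slot until a non-empty slot is found.
    #include <cstddef>
    #include <vector>

    struct Node { int time, state, metric; Node* prev; Node* next; };   // as before

    Node* pop_min(std::vector<Node*>& slots, std::size_t& min_idx)
    {
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = (min_idx + step) % slots.size();
            if (slots[i] != 0) {
                min_idx  = i;              // the next search starts here
                Node* n  = slots[i];
                slots[i] = n->next;        // unlink the head of this slot's list
                return n;
            }
        }
        return 0;                          // PQ is empty
    }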

Thanks in advance for your thoughts. - MLD

On Tue, Aug 01, 2006 at 11:06:11AM -0400, Michael D. wrote:

would need that slot (overwrite? but which one?)? No mechanism is
described for handling this conflict, and neither of the metrics can
necessarily be dropped. (I think) it’s trivial to come up with an
example where this happens and neither metric could yet be dropped,
so clearly they had to resolve this issue somehow. But it’s not in
the text of the report, so I’m trying, a priori, to think of a
(reasonably simple) way around it.

How about just dropping a note to the authors?
Their email addresses are at the bottom of column one of the report.

Eric

How about just dropping a note to the authors?
Their email addresses are at the bottom of column one of the report.

Done that. Waiting. Thought I’d ping the list too. I certainly am
surprised by the breadth / width / depth of experience available on
the list. ;-) MLD

Michael D. wrote:

I’m working on programming the “Lazy” Viterbi (convolutional, maximum
likelihood) decoder right now. See
< http://citeseer.ist.psu.edu/599573.html >
< http://vanu.com/resources/publications/2002fast_maximum_likelihood_decoder.pdf >

Mike,

Can’t help you with the algorithm, but could you say a few words on the
advantages of Lazy Viterbi over normal Viterbi decoding?

Matt

That is straightforward.

In high SNR, the lazy algorithm is O(1) per decoded bit!!!

In low SNR, both the lazy algorithm and the traditional algorithm are
O(2^k) per bit, where k+1 is the constraint length.

It is not a simple relationship as the SNR falls off but in many of the
cases we would consider, the lazy algorithm should be a win.

Bob

Robert W. McGwier, Ph.D.
Center for Communications Research
805 Bunn Drive
Princeton, NJ 08540
(609)-924-4600
(This sig required by my employer)

That’s the “quickie” overview. ‘k-1’, BTW, is the memory order
required to realize the actual encoder (taking ‘k’ as the constraint
length); the “constraint length” is effectively how long a single
(“impulse”) input bit can possibly affect output bits. Codes are
(sometimes) referred to as (n,m,k), where ‘n’ is the number of output
bit-streams, ‘m’ is the number of input bit-streams, and ‘k’ is the
total memory order. Some places use (n,m,k+1), with the constraint
length in place of the memory order.

For “high” SNR (e.g. Es/N0 ~ 6 dB for a (2,1,8) CDMA code), the Lazy
method basically traverses the convolutional code’s trellis directly;
it just has to find the best path metric and follow that set of
nodes. The Lazy method does have some bookkeeping, but the reduction
by a factor of approximately 2^k is very large for the standard NASA
codes (k ~ 7-9).

For “low” SNR (e.g. Es/N0 ~ 0 dB for a (2,1,8) CDMA code), the Lazy
method spends a lot of time doing bookkeeping, and thus is probably a
little slower than the standard Viterbi decoder which systematically
goes through all possible nodes and node outputs to determine the
best path.

At “middle” SNRs, the Lazy method is generally faster than other
algorithms, but the benefits really show at higher SNRs.

While I don’t know much about the “A*” algorithm (which traverses the
trellis lengthwise [in time] instead of breadth-wise as Viterbi
does), it seems that at high SNR it would be just as effective as the
“Lazy” method, since there are very few low-metric paths through the
trellis.