[VOLK] 16i_max_star_horizontal_16i non-saturating subtraction

Detlef_R · February 5, 2014, 5:05pm

I was doing some work with this kernel and came across an odd result
that I think is caused by a non-saturating subtraction in the generic
proto-kernel, which should also be relevant to 16i_max_star_16i.

I haven’t looked too closely at the SSE versions, but the generic
versions do the comparison by subtracting two values and
comparing the result to 0. At least in the QA on armhf this is causing
wrap-around so that the smaller of the two numbers is returned as the
max.
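
A minimal sketch of the effect described above, assuming a two's-complement target where narrowing to int16_t wraps (as GCC and Clang do). The helper names are hypothetical and are not VOLK functions; they only contrast the two ways of comparing.

```c
#include <stdint.h>

/* Hypothetical helper: pick the "max" by testing the sign of a
 * 16-bit difference, as the generic kernel's comparison does. */
static int16_t max_by_difference(int16_t x, int16_t y)
{
    /* x - y is computed as int, then narrowed back to 16 bits.
     * Assumption: the narrowing wraps modulo 2^16. */
    int16_t diff = (int16_t)(x - y);
    return (diff > 0) ? x : y;
}

/* Hypothetical helper: ordinary signed comparison. */
static int16_t max_by_compare(int16_t x, int16_t y)
{
    return (x > y) ? x : y;
}
```

With x = 30000 and y = -30000 the true difference is 60000, which wraps to -5536, so max_by_difference returns -30000 while max_by_compare returns 30000. For inputs whose difference fits in 16 bits, such as 100 and 50, both return 100.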

Since I’m pretty sure this is part of the max* operator defined by
Viterbi in his “justification and implementation of a MAP decoder” I
think this result is incorrect unless I’m misunderstanding part of the
operator.

Can somebody with a little more insight into this kernel ping back if
this is intended behavior? If not, is there some benefit to doing
(x-y)>0 vs x>y as the comparison?

Thanks,
Nathan

WestS_Nathan · February 6, 2014, 12:20pm

On Wed, Feb 5, 2014 at 4:04 PM, West, Nathan wrote:

> Since I’m pretty sure this is part of the max* operator defined by
> Viterbi in his “justification and implementation of a MAP decoder” I
> think this result is incorrect unless I’m misunderstanding part of the
> operator.
>
> Can somebody with a little more insight into this kernel ping back if
> this is intended behavior? If not, is there some benefit to doing
> (x-y)>0 vs x>y as the comparison?
>
> Thanks,
> Nathan

Just from what you’ve said here, I would agree that (x-y)>0 is a bit
dangerous, and I don’t see why we can’t use x>y. Does it work for you
if you make that change?

Tom

WestS_Nathan · February 6, 2014, 5:59pm

On Thu, Feb 6, 2014 at 6:19 AM, Tom R. wrote:

> Just from what you’ve said here, I would agree that (x-y)>0 is a bit
> dangerous, and I don’t see why we can’t use x>y. Does it work for you
> if you make that change?
>
> Tom

I’ll suggest that the generic kernel should be trusted to do the correct
thing, or at least have the correct set of intentions, and that if you want
to change the behavior of the generic kernel, that you are introducing a
change that will break applications depending on this behavior.

I believe the use of casting the subtraction before the comparison (where:
((int16_t) (src0[i] - src0[i + 1]) > 0) ? src0[i] : src0[i+1]; is the full
comparison) was intentional to ensure overflow subtraction vs. saturation
subtraction.
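
To make the role of the cast concrete, here is a sketch of a pairwise max loop built around that expression. It is not the VOLK source: the function name and signature are made up for illustration, and it assumes int is wider than 16 bits and that narrowing to int16_t wraps. Under those assumptions, dropping the cast would leave the subtraction as a plain int and the test would behave like src0[i] > src0[i + 1].

```c
#include <stdint.h>

/* Hypothetical stand-in for the generic proto-kernel (not the VOLK source).
 * Writes one result per adjacent input pair: num_points / 2 outputs. */
static void max_star_horizontal_sketch(int16_t *target, const int16_t *src0,
                                       unsigned int num_points)
{
    unsigned int i;
    for (i = 0; i + 1 < num_points; i += 2) {
        /* The operands are promoted to int, so the subtraction itself
         * does not overflow (assuming int is wider than 16 bits);
         * the cast is what folds the result back to 16 bits. */
        int16_t diff = (int16_t)(src0[i] - src0[i + 1]);
        target[i / 2] = (diff > 0) ? src0[i] : src0[i + 1];
    }
}
```

For the input {100, 50, 30000, -30000} this writes {100, -30000}: the first pair is compared as expected, while the second pair's difference wraps and the smaller value is selected.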

WestS_Nathan · February 6, 2014, 6:33pm

On Thu, Feb 6, 2014 at 11:58 AM, Douglas G. wrote:

> I’ll suggest that the generic kernel should be trusted to do the correct
> thing, or at least have the correct set of intentions, and that if you want
> to change the behavior of the generic kernel, that you are introducing a
> change that will break applications depending on this behavior.
>
> I believe the use of casting the subtraction before the comparison (where:
> ((int16_t) (src0[i] - src0[i + 1]) > 0) ? src0[i] : src0[i+1]; is the full
> comparison) was intentional to ensure overflow subtraction vs. saturation
> subtraction.

Assuming this kernel is used for some type of trellis decoding, the
overflow makes sense as it avoids the need for normalization of
accumulated metrics. It does, however, require that the spread of
input values be bounded for it to work. Otherwise, you have
comparisons between overflow values and non-overflow values, which
makes the comparison ambiguous and the output not-sane.
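
A sketch of that point, under assumptions that are not from the thread: two path metrics are accumulated in int16_t and allowed to wrap, narrowing to int16_t wraps modulo 2^16, and the helper names are hypothetical. As long as the true metrics differ by less than 2^15, the sign of the wrapped difference still identifies the larger one, even when only one of them has wrapped.

```c
#include <stdint.h>

/* Hypothetical helper: store a growing metric in 16 bits, letting it wrap.
 * Assumption: narrowing to int16_t wraps modulo 2^16. */
static int16_t wrap16(int32_t true_metric)
{
    return (int16_t)(uint16_t)true_metric;
}

/* Hypothetical helper: returns 1 if a is ahead of b in modular order,
 * i.e. the wrapped 16-bit difference a - b is positive. */
static int ahead_mod16(int16_t a, int16_t b)
{
    return (int16_t)(a - b) > 0;
}
```

Take true metrics 32760 and 32770. The second is stored as -32766, so a plain comparison ranks it below 32760, but ahead_mod16(wrap16(32770), wrap16(32760)) returns 1 because the wrapped difference is 10. With a spread that is too large, such as true metrics 0 and 40000, the wrapped difference is -25536 and the modular comparison gives the wrong answer.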

-TT

WestS_Nathan · February 7, 2014, 11:02am

On Thu, Feb 6, 2014 at 5:31 PM, Tom T. wrote:

> Assuming this kernel is used for some type of trellis decoding, the
> overflow makes sense as it avoids the need for normalization of
> accumulated metrics. It does, however, require that the spread of
> input values be bounded for it to work. Otherwise, you have
> comparisons between overflow values and non-overflow values, which
> makes the comparison ambiguous and the output not-sane.
>
> -TT

Yeah, Nathan emailed me back off-list and explained that to me. Makes
sense now.

Thanks,

Tom