GSoC14 gr-trellis update

jason_sam · June 25, 2014, 9:41am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey everyone,

right in time for Midtermevalutaions I have a bigger update for you.
Viterbi Decoder is now fully Volkified, but results are a bit meh

You can read more on my blog

-Jan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTqnzQAAoJEJmPbVa7qwgjiawIAK+DYl+x5E3Benqu1kQUYBry
gXqbylVpwLTgUaFDTcYP9yVL/IE4qJj4yZ/QptUxH8Zb/Q5BVs/9j9gJcLvj+EHT
epouim9SpkNzfI6P/yWcCKkMPpkctQoji63IevTq/kio3XXDXZlvOHGuAWJ5JB0e
p7M3Ias7AL+m20kaLxkUNc3oTfyeBnEcpP0f4ZtyOZPFdQcNbGuSR+4juvmX5rnn
Y7CrNhjOTPZAPoSjHqJhRfxEM3BiRlwqi88Wc96vSezWSKNe/vGN38Y9B74uRYMu
WdPR5G2xisk0KSqb0/IqgQPB7YoS/5ePChEqwMqh3RkRXreovPOl5I4xjnvWU3U=
=nbFB
-----END PGP SIGNATURE-----

Jan_KrS_mer · June 25, 2014, 4:03pm

On Wed, Jun 25, 2014 at 3:40 AM, Jan Krämer [email protected]
wrote:

-Jan

Great stuff, Jan.

For the Viterbi, take a look at the volk_8u_x4_conv_k7_r2_8u.h kernel
that
was recently added to volk/gnuradio with my merger of our FEC API work.
It
implements the K=7, Rate 1/2 Voyager code based on the polynomials you
give
it. I’ve seen good speedup using this versus the generic kernel on my
systems. The VOLK kernel was mainly derived from SPIRAL and manipulated
to
fit inside VOLK.

Tom

Jan_KrS_mer · June 25, 2014, 10:42pm

On Wed, Jun 25, 2014 at 3:40 AM, Jan Krämer [email protected]
wrote:

right in time for Midtermevalutaions I have a bigger update for you.
Viterbi Decoder is now fully Volkified, but results are a bit meh

You can read more on my blog
http://spectrejan.blogspot.de/

Hi Jan,

I only recently read your blog, so I may not be familiar with the
overall objectives. But I have some questions and comments.

How essential is it for you to compute full Euclidean metrics?

The general case for convolutional decoding does call for Euclidean
metrics, but most practical FEC use cases will use binary codes and
equal power symbols to which the distance calculation reduces to a
basic dot product. Furthermore, using sign changes, the branch metric
unit simplifies to a max-sum operation and the metrics can be computed
without the use of multipliers. If you look at the SPIRAL code that
Tom mentioned, you’ll see that that multiplication is not used. But,
those are specific cases, and if the objective necessitates full
Euclidean metrics, than one needs to compute Euclidean metrics.

for (int k = 0; k < _PACKETLENGTH; ++k)
{
obsValLSB = _mm_load1_ps(obsPtr++);
distLB = _mm_sub_ps(obsValLSB,tValLSB);
distLB = _mm_mul_ps(distLB, distLB);

    obsValMSB = _mm_load1_ps(obsPtr++);
    distHB = _mm_sub_ps(obsValMSB,tValMSB);
    distHB = _mm_mul_ps(distHB, distHB);

    metricVal = _mm_add_ps(distHB, distLB);
    _mm_store_ps(metricPtr, metricVal);
    metricPtr += 4;

}

Regarding the code, the main performance limitation you’ll run into
with this segment is pipeline utilization. If you look at the argument
dependencies, and therefore register dependencies at the instruction
level, each math intrinsic is dependent on the results of the prior
instruction. This creates read-after-write (RAW) hazards, and will
inevitably lead to a pipeline stall at some point. Both the compiler
and the CPU will aggressively attempt to reorder instructions to avoid
this scenario, but there’s very little to reorder in this case.

What may help, then, is to give the compiler and cpu some help by
unrolling the loop such that there is more code between branch points.
That will allow instructions to be rearranged to alleviate the
argument dependencies. Looking again at the SPIRAL example, the entire
vertical pass through the 64-state trellis is fully unrolled leaving
only a single conditional for normalization purposes. SPIRAL is an
extreme example, but the above code will likely benefit from a longer
loop section.

Hope that helps.

-TT

Jan_KrS_mer · June 26, 2014, 9:51am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hey Tom,

thanks again for all the input.
The general goal is to see how to speed up the whole gr-trellis block,
which is pretty generic and not solely used for FEC. Think of SISO-MAP
Equalization, turbo-eq etc. But you are right, I could add a new
Branch Metrics Function that uses simpler metrics just for the viterbi
decoder.
As for Loop unrolling, yeah I saw the “over the top”-Loop unrolling in
SPIRAL
Unfortunately I don’t think i can do this (at least for viterbi state
metrics), due to the generic nature of gr-trellis without sacrificing
some of its flexibility.

I will review my code today with the help of my mentor, keeping all
your tipps and suggestions in mind.

-Jan

On 25.06.2014 21:41, Tom T. wrote:

I only recently read your blog, so I may not be familiar with the
at the SPIRAL code that Tom mentioned, you’ll see that that
distHB);
point. Both the compiler and the CPU will aggressively attempt to
benefit from a longer loop section.

Hope that helps.

-TT

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTq9CQAAoJEJmPbVa7qwgjQp0IAI5uDqgZRdA0ffCltWUSQe7w
B6piRAYEDv/D/aaHooOlLozYzjtfksTZlnUJZ7pOsz7RvMvvyxTUlWSRSHm0Tp8D
ZSdOosBFUXSQb0/7xI0zo3mrOqdZgJWnFyUWostQLW1s9QgVmRkoo4fhrz/R7uZV
+fdJlJN3ugOEDXaZrqro4jCIfIAF12IioFVq4Vfinw3S+u2tz4IWdFFeV1zr1PuT
IQQ63vC2RrvPaBYxhnfz5mKIAqZ3/hCtoDBgUi+jktUzHRTaOn+yvM6MwpDLjmr1
uW4HGEgRg3V6DojKlt9qk4rUv1iE2B0hGgHTPcBQyPGN8cbg7rMove7AhvXqRjg=
=SMup
-----END PGP SIGNATURE-----