Here we go again. Another project update.
I have been working with VOLK and SIMD for two weeks now. I fixed some
hiccups in last week’s pack and unpack kernels. They run just fine
in the tests now.
Also, I added a ‘volk_8u_x3_encodepolar_8u_x2’ kernel. It operates on
the assumption that there is one active bit per byte and that it is
located in the LSB. A quick performance test with a 2^32-sample head
block after the encoder shows that the generic version crunches
~160 MSps. Until now I had an encoder which operated on packed bytes
and did ~300 MSps. With the ‘extended_encoder’ in use, an unpack block
was added to the flowgraph. The vector-optimized version does
~570 MSps, so it is ~3.5x as fast as the generic version. Some more
optimization might yield even better results.
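To make the "one active bit per byte, located in the LSB" format a bit
more concrete, here is a minimal sketch of what an unpack and a pack
step could look like. The function names and the MSB-first bit order
are assumptions for illustration only. This is not the code of the VOLK
kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a VOLK kernel: expand every packed byte into
 * 8 bytes that each carry one bit in the LSB. MSB-first order is an
 * assumption. 'unpacked' must hold 8 * num_bytes bytes. */
static void unpack_bits(uint8_t *unpacked, const uint8_t *packed, size_t num_bytes)
{
    assert(unpacked != NULL && packed != NULL);
    for (size_t i = 0; i < num_bytes; i++) {
        for (int b = 0; b < 8; b++) {
            *unpacked++ = (uint8_t)((packed[i] >> (7 - b)) & 0x01);
        }
    }
}

/* Hypothetical inverse: collect 8 one-bit bytes into one packed byte. */
static void pack_bits(uint8_t *packed, const uint8_t *unpacked, size_t num_bytes)
{
    assert(unpacked != NULL && packed != NULL);
    for (size_t i = 0; i < num_bytes; i++) {
        uint8_t byte = 0;
        for (int b = 0; b < 8; b++) {
            byte = (uint8_t)((byte << 1) | (*unpacked++ & 0x01));
        }
        packed[i] = byte;
    }
}
```

With this layout the unpacked stream is eight times as long as the
packed one, which is worth keeping in mind when comparing throughput
numbers of packed and unpacked encoders.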
At first glance it seems odd that the output signature of the encoder is
‘8u_x2’. The reason is that the kernel internally needs a temporary
buffer which has the same size as the output buffer. Instead of
malloc’ing and free’ing it on every call, it can be created once and
reused for every call.
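The idea of a caller-owned temporary buffer can be sketched as follows.
This is a simplified scalar illustration under my own assumptions, not
the actual kernel: the name ‘encode_polar_sketch’ is hypothetical, the
input is taken as one ready-made array of unpacked bits, and the polar
transform is computed in natural order without any bit reversal. Each
stage reads from one buffer and writes to the other, which is why a
second buffer of the same size is handy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the VOLK kernel. Encodes 'frame_size' unpacked
 * bits (one bit per byte, in the LSB). 'temp' is scratch memory of
 * 'frame_size' bytes that the caller allocates once and reuses. */
static void encode_polar_sketch(uint8_t *frame, uint8_t *temp,
                                const uint8_t *bits, size_t frame_size)
{
    /* The frame size must be a power of two. */
    assert(frame_size != 0 && (frame_size & (frame_size - 1)) == 0);

    uint8_t *src = frame;
    uint8_t *dst = temp;
    memcpy(src, bits, frame_size);

    /* One pass per stage; buffers swap roles after every stage. */
    for (size_t dist = 1; dist < frame_size; dist <<= 1) {
        for (size_t i = 0; i < frame_size; i++) {
            /* Upper butterfly branch XORs with its partner,
             * lower branch passes its bit through unchanged. */
            dst[i] = (i & dist) ? src[i] : (uint8_t)(src[i] ^ src[i + dist]);
        }
        uint8_t *swap = src;
        src = dst;
        dst = swap;
    }

    /* After an odd number of stages the result sits in 'temp'. */
    if (src != frame) {
        memcpy(frame, src, frame_size);
    }
}
```

A caller would allocate ‘frame’ and ‘temp’ once, for example when the
block is set up, and pass the same two pointers on every call. This
particular scalar version could also work in place. The two-buffer
layout is shown only because it mirrors the "second output" idea
described above.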
During the week I struggled with the VOLK tests. I finally solved those
issues, but I’d still like to refer to the mail I sent out the other day.
SIMD kernels tend to have quite a few lines of code. In order to make
them easier to read and understand, it would be great if it were
possible to implement multiple functions within one
‘#ifdef LV_HAVE_ARCH … #endif’ block. But so far the compiler refuses
to compile when I do this. It is possible to add functions in the
general section, but that is only appropriate for a generic kernel or
common functions.
All the intrinsics I have used so far are available with SSSE3. Although
I created aligned and unaligned versions of those kernels, only the
store[u] and load[u] intrinsics might make a difference here. My
benchmarks don’t show any significant difference. All benchmarks were
done on a Sandy Bridge i7.
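To show how little the two variants differ, here is a small sketch of
an aligned and an unaligned version of the same loop. It is not taken
from my kernels: the function names are made up, and a plain byte-wise
XOR stands in for the real work. The load and store intrinsics used
here come from SSE2, so they are also available on any SSSE3 machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical example: unaligned variant, buffers may start anywhere. */
static void xor_bytes_u(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_xor_si128(va, vb));
    }
#endif
    /* Scalar tail, also the fallback when SSE2 is not available. */
    for (; i < n; i++) {
        out[i] = (uint8_t)(a[i] ^ b[i]);
    }
}

/* Hypothetical example: aligned variant, all buffers 16-byte aligned. */
static void xor_bytes_a(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert((uintptr_t)out % 16 == 0);
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0);
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_load_si128((const __m128i *)(a + i));
        __m128i vb = _mm_load_si128((const __m128i *)(b + i));
        _mm_store_si128((__m128i *)(out + i), _mm_xor_si128(va, vb));
    }
#endif
    for (; i < n; i++) {
        out[i] = (uint8_t)(a[i] ^ b[i]);
    }
}
```

Everything between the loads and the stores is identical in both
versions, so any speed difference has to come from the memory accesses
themselves.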
I suspect the encoder was easier to optimize than the decoder will be.
So for the upcoming week and beyond I will focus on creating kernels for
the decoder.
More info and current project progress can be found in ,  and .