Hi Tom,
thanks for your answer. The point I was making was that, at the time I
wrote the Viterbi code, I tried to use the available VOLK functions
(multiplications, subtractions, etc.) and the code was slower than using
intrinsics directly. Implementing a new kernel for the Viterbi decoder
(with intrinsics, of course, like the others) was just the next step in
the process.
So, I totally agree that it is worth creating a kernel that completely
solves a problem like a convolutional decoder, as that will make it
faster. The downside, though, is that the next time you want to do
something slightly different, you'll need to create another kernel. But
that is the tradeoff between flexibility and speed.
I see your code uses the Spiral implementation; I will look at what
speed it gives, as for me this is one of the biggest challenges. I still
believe someone will create a convolutional decoder implementation that
is both readable and fast :). I know, I am speaking from the perspective
of an open-source software guy who inherently needs to understand all
the code.
Bogdan
BTW, from my experience, to speed up depuncturing it is worth making the
depuncturer part of the decoder, or at least making the decoder aware of
the puncturing.
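To make the depuncturing step concrete, here is a minimal sketch in C. It assumes 8-bit soft symbols (positive leans to bit 0, negative to bit 1, 0 carries no information) and the DVB-T rate-7/8 pattern X: 1000101, Y: 1111010, taken here from ETSI EN 300 744 as an assumption to verify before use. Punctured positions are refilled with the neutral value 0 so that an unmodified rate-1/2 decoder can consume the stream. The function name depuncture_7_8 is hypothetical, not taken from gr-dvbt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed DVB-T rate-7/8 puncturing pattern: one column per information bit,
 * 1 = the coded bit is transmitted, 0 = it is punctured (dropped).
 * Transmitted order per period: X1 Y1 Y2 Y3 Y4 X5 Y6 X7. */
static const uint8_t PUNC_X[7] = {1, 0, 0, 0, 1, 0, 1};
static const uint8_t PUNC_Y[7] = {1, 1, 1, 1, 0, 1, 0};

/* Expand punctured soft symbols back into rate-1/2 (X, Y) pairs.
 * Soft values: > 0 leans to bit 0, < 0 leans to bit 1, 0 = erasure.
 * in_len must be a multiple of 8 (8 transmitted symbols per 7 info bits);
 * out must have room for 14 * (in_len / 8) values.
 * Returns the number of (X, Y) pairs written. */
size_t depuncture_7_8(const int8_t *in, size_t in_len, int8_t *out)
{
    assert(in_len % 8 == 0);
    size_t k = 0, pairs = 0;
    for (size_t block = 0; block < in_len / 8; block++) {
        for (int i = 0; i < 7; i++) {
            out[2 * pairs]     = PUNC_X[i] ? in[k++] : 0; /* X output of the mother code */
            out[2 * pairs + 1] = PUNC_Y[i] ? in[k++] : 0; /* Y output of the mother code */
            pairs++;
        }
    }
    return pairs;
}
```

A decoder that is aware of the pattern could instead read the punctured stream directly and simply leave the erased terms out of its branch metrics, which avoids writing and re-reading the intermediate buffer; that is one reading of the suggestion above, not necessarily how gr-dvbt does it.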
On Tue, 2/25/14, Tom R. wrote:
Subject: Re: [Discuss-gnuradio] Google Summer of Code 2014 applicant :
Optimization with VOLK
To: “Bogdan D.”
Cc: “GNURadio Discussion List”, “Abhishek Bhowmick”
Date: Tuesday, February 25, 2014, 4:09 PM
On Tue, Feb 25, 2014 at 8:21 AM, Bogdan D. wrote:
Hi Abhishek,
When I implemented gr-dvbt (BogdanDIA/gr-dvbt on GitHub, a DVB-T
implementation in GNU Radio), I used VOLK in many places to speed up the
processing. However, a great deal of speed-up still needs to be achieved
on both Tx and Rx in order to lower CPU cycle consumption, so from this
point of view there are a lot of challenges in the project.
For example, the Viterbi implementation is done using intrinsics instead
of VOLK simply because, when I used VOLK, it was quite slow, achieving
only 16 Mbps of processing per thread (7-8 Mbps with a plain C
implementation). Using intrinsics raised the speed to 32-37 Mbps per
thread, which is quite good, but the code is not directly portable. So a
good Viterbi decoder that easily achieves over 60 Mbps at its input is
still needed, probably not only in the DVB-T implementation but in other
applications as well. To add to the challenge, one may want readable
code besides the necessary speed (the Spiral Viterbi implementation is
at the opposite end).
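To make the "readable" end of that tradeoff concrete, here is a minimal scalar soft-decision Viterbi decoder in C for a K=7, rate-1/2 code with generator polynomials 171/133 (octal), the pair DVB-T uses. It is a sketch, not gr-dvbt's or Spiral's code: the tap bit ordering, the soft-symbol sign convention and the assumption that every frame is terminated with K-1 zero bits are all assumptions made here, and the names conv_encode and viterbi_decode are hypothetical. The inner loop is the plain add-compare-select (ACS) over 64 states, which is the part that intrinsics or a VOLK kernel would vectorize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NSTATE 64            /* 2^(K-1) states for constraint length K = 7 */
#define POLY_A 0171          /* generator polynomials (octal); tap bit order is an assumption */
#define POLY_B 0133

/* Parity (XOR) of the low 7 bits of x. */
static int parity7(unsigned x)
{
    x &= 0x7Fu;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (int)(x & 1u);
}

/* Rate-1/2 encoder: two output bits (0/1) per input bit, starting in state 0. */
void conv_encode(const uint8_t *bits, size_t n, uint8_t *out)
{
    unsigned state = 0;
    for (size_t t = 0; t < n; t++) {
        unsigned reg = ((state << 1) | (bits[t] & 1u)) & 0x7Fu; /* newest bit in LSB */
        out[2 * t]     = (uint8_t)parity7(reg & POLY_A);
        out[2 * t + 1] = (uint8_t)parity7(reg & POLY_B);
        state = reg & 0x3Fu;
    }
}

/* Soft-decision Viterbi decoder. soft[2t] and soft[2t+1] belong to step t:
 * > 0 leans to bit 0, < 0 leans to bit 1, 0 carries no information.
 * Assumes the frame starts and ends in state 0 (K-1 zero tail bits).
 * Writes n decoded bits (tail included) to out; returns 0, or -1 if out of memory. */
int viterbi_decode(const int8_t *soft, size_t n, uint8_t *out)
{
    int32_t pm[NSTATE], next[NSTATE];
    uint64_t *dec = malloc((n ? n : 1) * sizeof *dec); /* one survivor bit per state per step */
    if (dec == NULL)
        return -1;

    for (int s = 0; s < NSTATE; s++)
        pm[s] = INT32_MIN / 2;                /* "unreachable" */
    pm[0] = 0;                                /* known start state */

    for (size_t t = 0; t < n; t++) {
        dec[t] = 0;
        for (unsigned ns = 0; ns < NSTATE; ns++) {
            int32_t best = INT32_MIN;
            unsigned pick = 0;
            for (unsigned hi = 0; hi < 2; hi++) {      /* the two predecessor states */
                unsigned ps  = (ns >> 1) | (hi << 5);
                unsigned reg = ((ps << 1) | (ns & 1u)) & 0x7Fu;
                /* correlation branch metric: larger means better agreement */
                int32_t bm = (parity7(reg & POLY_A) ? -soft[2 * t] : soft[2 * t])
                           + (parity7(reg & POLY_B) ? -soft[2 * t + 1] : soft[2 * t + 1]);
                int32_t m = pm[ps] + bm;               /* add */
                if (m > best) {                        /* compare */
                    best = m;                          /* select */
                    pick = hi;
                }
            }
            next[ns] = best;
            dec[t] |= (uint64_t)pick << ns;
        }
        memcpy(pm, next, sizeof pm);
    }

    unsigned state = 0;                       /* trace back from the terminating state */
    for (size_t t = n; t-- > 0;) {
        out[t] = (uint8_t)(state & 1u);       /* input bit that led into this state */
        state = (state >> 1) | ((unsigned)((dec[t] >> state) & 1u) << 5);
    }
    free(dec);
    return 0;
}
```

Keeping one 64-bit survivor word per step makes the traceback short and easy to follow. The 32-bit path metrics are never renormalized, which is fine for frames of a few million steps but would need periodic normalization for a continuous stream.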
Bogdan,
Good advice, generally. Just a few issues to point out.
First, I think there’s a misconception between “VOLK” and “using
intrinsics.” VOLK uses intrinsics, so whatever code you wrote with the
intrinsics could be done in VOLK. For instance, the fecapi that we are
working to bring into GNU Radio has a convolutional decoder defined as a
single VOLK kernel:
fecapi/volk_fecapi/kernels/volk_fecapi/volk_fecapi_8u_x4_conv_k7_r2_f2048_8u.h (namccart/fecapi on GitHub, master branch)
This is actually Spiral code that was wrapped up into a kernel to make
it portable and usable.
Basically, I’m trying to convey that there is no limit to what we can
define as a kernel in VOLK. In fact, the more complex the kernel, the
better the speedup, because you can keep the data inside the registers
and more tightly control the algorithm. We just want a kernel to
represent some operations that would be usable in other situations, like
a convolutional decoder.
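To illustrate the point that a VOLK kernel is intrinsics plus a portable fallback, here is a sketch in that shape: a generic C version and an SSE2 version of one hypothetical 8-bit add-compare-select operation (the core of a Viterbi step), with a compile-time choice between them. Real VOLK kernels guard each variant with macros such as LV_HAVE_GENERIC and LV_HAVE_SSE2 and pick the variant at run time; the acs_8u_* names and the __SSE2__-based selection here are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Hypothetical kernel: element-wise add-compare-select on 8-bit path metrics
 * (smaller is better). For each i:
 *   c0 = sat(m0[i] + b0[i]), c1 = sat(m1[i] + b1[i])
 *   out[i] = min(c0, c1), dec[i] = 1 if c1 < c0 else 0 (ties keep candidate 0). */
static inline void acs_8u_generic(uint8_t *out, uint8_t *dec,
                                  const uint8_t *m0, const uint8_t *b0,
                                  const uint8_t *m1, const uint8_t *b1, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned c0 = (unsigned)m0[i] + b0[i];
        unsigned c1 = (unsigned)m1[i] + b1[i];
        if (c0 > 255u) c0 = 255u;    /* saturate, as _mm_adds_epu8 does */
        if (c1 > 255u) c1 = 255u;
        dec[i] = (uint8_t)(c1 < c0);
        out[i] = (uint8_t)(c1 < c0 ? c1 : c0);
    }
}

#ifdef __SSE2__
/* Same operation, 16 lanes at a time with SSE2 intrinsics (unaligned loads). */
static inline void acs_8u_u_sse2(uint8_t *out, uint8_t *dec,
                                 const uint8_t *m0, const uint8_t *b0,
                                 const uint8_t *m1, const uint8_t *b1, size_t n)
{
    const __m128i one = _mm_set1_epi8(1);
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i c0 = _mm_adds_epu8(_mm_loadu_si128((const __m128i *)(m0 + i)),
                                   _mm_loadu_si128((const __m128i *)(b0 + i)));
        __m128i c1 = _mm_adds_epu8(_mm_loadu_si128((const __m128i *)(m1 + i)),
                                   _mm_loadu_si128((const __m128i *)(b1 + i)));
        __m128i mn  = _mm_min_epu8(c0, c1);
        __m128i le  = _mm_cmpeq_epi8(mn, c1);      /* 0xFF where c1 <= c0 */
        __m128i eq  = _mm_cmpeq_epi8(c0, c1);      /* 0xFF where c1 == c0 */
        __m128i won = _mm_andnot_si128(eq, le);    /* 0xFF where c1 <  c0 */
        _mm_storeu_si128((__m128i *)(out + i), mn);
        _mm_storeu_si128((__m128i *)(dec + i), _mm_and_si128(won, one));
    }
    /* leftover elements go through the portable version */
    acs_8u_generic(out + i, dec + i, m0 + i, b0 + i, m1 + i, b1 + i, n - i);
}
#endif

/* Compile-time selection; VOLK makes this choice at run time instead. */
static inline void acs_8u(uint8_t *out, uint8_t *dec,
                          const uint8_t *m0, const uint8_t *b0,
                          const uint8_t *m1, const uint8_t *b1, size_t n)
{
    assert(n == 0 || (out != NULL && dec != NULL));
#ifdef __SSE2__
    acs_8u_u_sse2(out, dec, m0, b0, m1, b1, n);
#else
    acs_8u_generic(out, dec, m0, b0, m1, b1, n);
#endif
}
```

Because both variants must produce identical results, the generic version doubles as the reference the SIMD version can be tested against.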
The OFDM synchronization code is also very time-consuming, and although
it already uses VOLK, it could benefit greatly from the new AVX2
instructions. Actually, many of the blocks could use new instructions to
speed up the data processing.
Yes, certainly. The synchronization part is a good place for
optimization.
Tom
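The thread does not say which synchronization algorithm gr-dvbt uses. As an assumed example, a common OFDM timing step is cyclic-prefix correlation, P(d) = sum over m of r[d+m]·conj(r[d+m+N]) for FFT size N and prefix length L, whose magnitude peaks at the symbol start. The sketch below computes it with a sliding window in plain C; the per-sample complex multiply-accumulate is the kind of loop that wider SIMD such as AVX2 speeds up. The names cp_timing_estimate and cf32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float re, im; } cf32;   /* one complex baseband sample */

/* a * conj(b) */
static cf32 mul_conj(cf32 a, cf32 b)
{
    cf32 p = { a.re * b.re + a.im * b.im, a.im * b.re - a.re * b.im };
    return p;
}

/* Cyclic-prefix timing estimate for OFDM symbols with FFT size N and CP length L.
 * For each candidate start d in [0, len - N - L]:
 *   P(d) = sum_{m=0}^{L-1} r[d+m] * conj(r[d+m+N])
 * Returns the d that maximises |P(d)| (the first one on ties). */
size_t cp_timing_estimate(const cf32 *r, size_t len, size_t N, size_t L)
{
    assert(L > 0 && len >= N + L);
    cf32 p = { 0.0f, 0.0f };
    for (size_t m = 0; m < L; m++) {          /* P(0), computed directly */
        cf32 t = mul_conj(r[m], r[m + N]);
        p.re += t.re;
        p.im += t.im;
    }
    size_t best_d = 0;
    float best = p.re * p.re + p.im * p.im;   /* compare |P|^2, no sqrt needed */

    for (size_t d = 1; d + N + L <= len; d++) {
        /* slide the window by one sample: add the newest product, drop the oldest */
        cf32 add = mul_conj(r[d + L - 1], r[d + L - 1 + N]);
        cf32 sub = mul_conj(r[d - 1], r[d - 1 + N]);
        p.re += add.re - sub.re;
        p.im += add.im - sub.im;
        float mag2 = p.re * p.re + p.im * p.im;
        if (mag2 > best) {
            best = mag2;
            best_d = d;
        }
    }
    return best_d;
}
```

The running sum costs two complex products per candidate offset instead of L. In floating point it slowly accumulates rounding error, so a real implementation would likely recompute it from scratch now and then.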
Basically, for DVB-T at its maximum rate (8k FFT, 64-QAM and code rate
7/8), the video output is 32 Mbps, which means more than 60 Mbps of
processing speed after depuncturing. A bigger challenge would be
implementing a real-life DVB-S receiver, where the data rate at the
video output is over 50 Mbps.
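The 60 Mbps figure follows from the coding chain. Assuming the DVB-T outer RS(204,188) code and a rate-1/2 mother convolutional code punctured to 7/8, the sketch below walks a 32 Mbps transport-stream rate back to the Viterbi decoder input; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Rate bookkeeping for the DVB-T outer/inner coding chain (assumptions:
 * RS(204,188) outer code, rate-1/2 mother convolutional code punctured to
 * code rate num/den). All rates in Mbit/s. */
typedef struct {
    double ts_out;       /* MPEG-TS bits after RS decoding                     */
    double viterbi_out;  /* bits out of the Viterbi decoder (RS-coded)        */
    double punctured_in; /* coded bits actually transmitted                   */
    double viterbi_in;   /* soft symbols into the decoder after depuncturing  */
} dvbt_rates;

dvbt_rates dvbt_rates_from_ts(double ts_out, int num, int den)
{
    assert(num > 0 && den > num);
    dvbt_rates r;
    r.ts_out       = ts_out;
    r.viterbi_out  = ts_out * 204.0 / 188.0;    /* RS adds 16 parity bytes per 188 */
    r.punctured_in = r.viterbi_out * den / num; /* e.g. 8 coded bits per 7 info bits */
    r.viterbi_in   = r.viterbi_out * 2.0;       /* mother code is rate 1/2 */
    return r;
}
```

For 32 Mbps and rate 7/8 this gives about 34.7 Mbps out of the Viterbi decoder, about 39.7 Mbps of punctured coded bits, and about 69.4 Mbps of soft symbols into the decoder after depuncturing.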
This is just my short insight into the challenges one may face when
dealing with speed optimizations in a modern communication project.
Bogdan
On Sun, 2/23/14, Abhishek B. wrote:
Subject: [Discuss-gnuradio] Google Summer of Code
2014 applicant : Optimization with VOLK
Date: Sunday, February 23, 2014, 8:52 AM
Hello,
I have completed a Bachelor’s degree in Electrical Engineering from IIT
Bombay, India, and will be joining a master’s program in Computer
Science in August. For the summer, I am interested in participating in
GSoC 2014, and GNU Radio is an organization where my background fits
nicely.
I went through the ideas page and was particularly interested in doing
performance optimization with VOLK. After going through some online
documentation about the library and the SDR’12 paper, I realised that
the following areas need work:
- Profiling GNU Radio code to identify new kernels and implementing them
  for existing Intel SIMD extensions, as well as porting kernels to other
  ISA extensions.
- Better testing of the effects of more complex scheduler logic in larger
  environments (beyond simple kernels).
- Exploring an extension of VOLK to GPU ISAs, to leverage chips such as
  AMD Fusion (however, this seems to be more research than software
  development).
According to the GSoC proposal, point (1) seems to be the expectation.
Given this, I would like some advice on how to go about looking for
potential ideas (and some feedback on the feasibility of the other ideas
as well).
My background: C++, Python, Signal Processing, Computer Architecture
Thanks,
Abhishek B.
Discuss-gnuradio mailing list
[email protected]
Discuss-gnuradio Info Page