Forum: GNU Radio Google Summer of Code 2014 applicant : Optimization with VOLK

Abhishek Bhowmick (Guest)
on 2014-02-23 07:53
(Received via mailing list)
Hello,
I have completed a Bachelor's degree in Electrical Engineering from IIT
Bombay, India and will be joining a master's program in Computer Science
in August. For the summer, I am interested in participating in GSoC 2014,
and GNU Radio is an organization where my background fits nicely.

I went through the ideas page and was particularly interested in doing
performance optimization with VOLK. After going through some online
documentation about the library and the SDR'12 paper, I realised that the
following areas need work:
1. Profiling GNU Radio code to identify new kernels and implementing them
for existing Intel SIMD extensions, as well as porting kernels to other
ISA extensions.
2. Better testing of the effects of more complex scheduler logic on larger
environments (beyond simple kernels).
3. Exploring extension of VOLK to GPU ISAs, to leverage chips such as AMD
Fusion (however, this seems to be more research than software development).

According to the GSoC proposal, point (1) seems to be the expectation.
Given this, I would like some advice on how to go about looking for
potential ideas (and some feedback on the feasibility of the other ideas
as well).

My background: C++, Python, Signal Processing, Computer Architecture

Thanks,
Abhishek Bhowmick
West, Nathan (Guest)
on 2014-02-24 06:17
(Received via mailing list)
On Sun, Feb 23, 2014 at 12:52 AM, Abhishek Bhowmick
<abhowmick22@gmail.com> wrote:
> Hello,
> I have completed a Bachelor's degree in Electrical Engineering from IIT
> Bombay, India and will be joining a masters program in Computer Science in
> August. For the summer, I am interested in participating GSoC 2014 and GNU
> Radio is an organization where my background fits nicely.
>
> I went through the ideas page and was particularly interested in doing
> performance optimization with VOLK.

Great to hear. Just keep in mind we have another ~13 hours before we as
an organization know whether we were accepted or not.

>
> According to the GSoC proposal, point (1) seems to be the expectation. Given
> this, I would like some advice on how to go ahead looking for potential
> ideas (and some feedback on feasibility of the other ideas as well)
>
> My background : C++, Python, Signal Processing, Computer Architecture
>
> Thanks,
> Abhishek Bhowmick
>

Abhishek,

Right, so points 1 and 2 are what I had in mind when I wrote the idea on
our list. Point 3 is technically possible in VOLK, but it's probably not
worth using GPUs in this way, since the transport costs would dwarf any
acceleration from the current VOLK kernels. That said, there's nothing
wrong with a proposal in a research area, but I do think we would want
code and something contributed back to the community at the end of the
project. Also, don't let that prevent you from submitting a proposal about
GPU programming if that's what you're interested in; it's probably just
not best targeted at VOLK. My understanding of GSoC proposals is that you
can submit any number, so you can submit one for GPU acceleration and
another for something more related to VOLK.

So for points 1 and 2 it would be good to see a specific algorithm or
module that you think would benefit from moving to VOLK, which would take
some research on your part. I think gr-atsc is a good place to look for
acceleration gains, and it would be good to see that application run in
real time. One of the things I had in mind is accelerating OFDM frame
sync. Martin's ofdm_{rx,tx} and gr-80211 are good examples: they use
blocks that have VOLK kernels in them to do the sync, but that's somewhat
inefficient because we move data in and out of SIMD registers. Of course
there's the old trade-off of modularization and code re-use vs. speed.
I'd be glad to discuss this and similar ideas more once we know we are
accepted as an org. There's one more possibility that recently came up,
but I'd like to wait until things are official before recommending it
(and I'll need to talk with other interested parties).

At the moment the only mainstream ISA not being targeted is probably
AVX2, which has some nice features for the type of kernels we're doing.
If you went that route, it would likely require adding protokernels to a
pretty large number of kernels.

Nathan
Tom Rondeau (Guest)
on 2014-02-24 16:31
(Received via mailing list)
On Mon, Feb 24, 2014 at 12:15 AM, West, Nathan
<natw@ostatemail.okstate.edu> wrote:
I agree with Nathan that VOLK is probably not the right abstraction
for GPUs. The Fusion concept with the GPU and GPP on the same die is
compelling, but maybe too specific. There is another project called
gr-gpu that's focusing on the GPU problem more generally.

> sync, but that's somewhat inefficient because we move data in and out
> of SIMD registers. Of course
> there's the old trade-off of modularization and code re-use vs. speed.
> I'd be glad to discuss this and
> similar ideas more once we know we are accepted as an org. There's one
> more possibility that recently
> came up, but I'd like to wait until things are official before
> recommending it (and I'll need to talk with
> other interested parties).

We're working with Andrew Davis on updating gr-atsc
(https://github.com/glneo/gnuradio/tree/atscfixup). If you decide to
focus on ATSC speedups with VOLK, look into that project instead of
the one inside gnuradio (which will be deprecated).

Tom
Abhishek Bhowmick (Guest)
on 2014-02-24 21:34
(Received via mailing list)

Firstly, congratulations on being accepted as a mentor organization.

Thanks for the pointers to gr-atsc and gr-80211. I have started looking
there. Are there similar modules which are undergoing VOLK speedup fixes?
I am also trying to meet up with other people who have been using GNU
Radio to identify potential modules for acceleration. As you are now a
mentor organization, I feel it's a good time for us to get into detailed
discussions.

>
>> At the moment the only mainstream ISA not being targeted is probably
>> AVX2, which has
>> some nice features for the type of kernels we're doing.  If you went
>> that route it would likely need add
>> protokernels to a pretty large number of kernels.
>>
>> Nathan

This also seems promising, though I guess it would require me to come up
to speed with AVX2 (which I would love to do). Could you please elaborate
a little on the kind of beneficial features you have in mind? I am
concerned that the job of adding proto-kernels might turn out to be
mundane/tedious. Is that a valid concern?

Abhishek
Martin Braun (Guest)
on 2014-02-25 09:48
(Received via mailing list)
On 02/24/2014 09:33 PM, Abhishek Bhowmick wrote:
> Firstly, congratulations on being accepted as a mentor organization.
>
> Thanks for the pointers to gr-atsc and gr-80211. I have started
> looking there as a
> starting point. Are there similar modules which are undergoing volk
> speedup fixes?

I had started working on accelerating the OFDM sync blocks in-tree...
I'd need to dig around a bit to get that up to date. Contact me off-list
if you want to see about this.

> I am also trying to meet up with other people who have been using GNU radio
> to identify potential modules for acceleration. As you are now a
> mentor organization, I feel it's a good time for us to get into
> detailed discussions.

Absolutely, that goes for you and all other applicants. The ideas list
is just to get you started; for successful participation, you will need
to write a proposal that details what you plan to do for 3 months of
full-time coding.

> please elaborate
> a little on the kind of beneficial features you have in mind ? I am
> concerned that the
> job of adding proto-kernels might turn out to be mundane/tedious ? Is
> that a valid
> concern ?

That's highly subjective :) If you don't know much about those specific
SIMD instruction sets, it's probably neither. On the other hand, if
you're already an expert, it's not that much work, and you can quickly
benefit the project. I don't think anyone expects you to do 3 months of
proto-kernel development, so you can balance it in your project
proposal.

M
Bogdan Diaconescu (Guest)
on 2014-02-25 14:23
(Received via mailing list)
Hi  Abhishek,

When I implemented gr-dvbt (https://github.com/BogdanDIA/gr-dvbt) I used
VOLK in many places to speed up the processing. However, a great deal of
speed-up still needs to be achieved on both Tx and Rx in order to lower
CPU cycle consumption, so there are a lot of challenges in the project
from this point of view.

For example, the Viterbi implementation uses intrinsics directly instead
of VOLK, simply because when I used VOLK it was quite slow, achieving only
16 Mbps of processing per thread (7-8 Mbps with a plain C
implementation). Using intrinsics raised the speed to 32-37 Mbps per
thread, which is quite good, but the code is not directly portable. So a
good Viterbi decoder that easily achieves over 60 Mbps at its input is
still needed, probably not only in the DVB-T implementation but in other
applications as well. Just to add to the challenge, one may want
readable code besides the necessary speed (the Spiral Viterbi
implementation is at the opposite extreme).

The OFDM synchronization code is also very time consuming, and although
it already uses VOLK, it could benefit greatly from the new AVX2
instructions. Actually, many of the blocks could use the new instructions
to speed up the data processing.

Basically, for DVB-T at its maximum speed with OFDM FFT 8k, QAM-64 and
puncturing rate 7/8, the video output is 32 Mbps, which means more than
60 Mbps of processing speed after de-puncturing. A bigger challenge would
be implementing a real-life DVB-S receiver, where the data rate is over
50 Mbps at video output :)
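
As a rough check of the figures above, the arithmetic can be sketched as
follows. This is only a sketch: it assumes the usual DVB-T chain (an
RS(204,188) outer code and a rate-1/2 mother convolutional code punctured
to 7/8), which is not spelled out in the post, and the function name is
made up for illustration.

```python
from fractions import Fraction


def dvbt_decoder_rates(video_mbps, punct_rate=Fraction(7, 8)):
    """Return (Viterbi output, punctured input, depunctured input) rates.

    Assumptions (not from the post): RS(204,188) outer code and a
    rate-1/2 mother convolutional code punctured to `punct_rate`.
    """
    rs_overhead = Fraction(204, 188)   # RS parity bytes added per packet
    mother_rate = Fraction(1, 2)       # two coded bits per data bit
    viterbi_out = Fraction(video_mbps) * rs_overhead
    punctured_in = viterbi_out / punct_rate      # coded bits actually received
    depunctured_in = viterbi_out / mother_rate   # soft values the decoder consumes
    return viterbi_out, punctured_in, depunctured_in


names = ("Viterbi output", "punctured input", "depunctured input")
for name, rate in zip(names, dvbt_decoder_rates(32)):
    print(f"{name}: {float(rate):.1f} M/s")
```

Under these assumptions the 32 Mbps video rate corresponds to a bit under
70 million soft values per second at the decoder input, which is
consistent with the "more than 60 Mbps" quoted above.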

This is just my short insight into the challenges one may face when
dealing with speed optimizations in a modern communication project.

Bogdan


Tom Rondeau (Guest)
on 2014-02-25 15:11
(Received via mailing list)
On Tue, Feb 25, 2014 at 8:21 AM, Bogdan Diaconescu
<b_diaconescu@yahoo.com> wrote:
> Hi  Abhishek,
>
> When implemented gr-dvbt (https://github.com/BogdanDIA/gr-dvbt) I used VOLK in
> many places to speed-up the processing. However, there is a great deal of speed-up
> that still need to be achieved on both Tx/Rx in order to lower cpu cycles
> consumption so there are a lot of challenges in the project from this point of
> view.
>
> For example the Viterbi implementation is done using intrinsics instead of using
> VOLK just because when I used VOLK it was quite slow, achieving only 16mbps of
> processing per single thread (7-8mbps on just C implementation).
> Using intrinsics it raised the spead to 32-37mbps per thread which is quite good
> but the code is not directly portable. So, a good Viterbi decoder that achieves
> easily over 60mbps speed at input is still necessary probably not only in dvb-t
> implementation but perhaps in other applications. Just to add more to the
> challenge one may want to have a readable code beside the necessary speed (Spiral
> viterbi implementation is on the opposite side).


Bogdan,

Good advice, generally. Just a few issues to point out. First, I think
there's some confusion between "VOLK" and "using intrinsics." VOLK
uses intrinsics, so whatever code you wrote with intrinsics
could be done in VOLK. For instance, the fecapi that we are working to
bring into GNU Radio has a convolutional decoder defined as a single
VOLK kernel:

https://github.com/namccart/fecapi/blob/master/vol...

This is actually Spiral code that was wrapped up into a kernel to make
it portable and usable.

Basically, I'm trying to convey that there is no limit to what we can
define as a kernel in VOLK. In fact, the more complex the kernel, the
better the speedup, because you can keep the data inside the registers
and more tightly control the algorithm. We just want a kernel to
represent some operation that would be usable in other situations,
like a convolutional decoder.
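
A minimal sketch of the difference between chaining small kernels and
writing one larger kernel, using a correlation sum as the example. Plain
Python cannot show SIMD registers, so this only illustrates the structure:
the chained version writes two intermediate buffers that the fused version
never creates. All function names here are hypothetical and are not VOLK
kernels.

```python
def conjugate(v):
    """Small kernel: element-wise complex conjugate into a new buffer."""
    return [z.conjugate() for z in v]


def multiply(a, b):
    """Small kernel: element-wise product into a new buffer."""
    return [x * y for x, y in zip(a, b)]


def accumulate(v):
    """Small kernel: sum of all elements."""
    total = 0j
    for z in v:
        total += z
    return total


def correlate_chained(a, b):
    """Three passes over memory, two temporary buffers."""
    return accumulate(multiply(a, conjugate(b)))


def correlate_fused(a, b):
    """One pass; each intermediate value lives only inside the loop body."""
    acc = 0j
    for x, y in zip(a, b):
        acc += x * y.conjugate()
    return acc


a = [1 + 2j, 3 - 1j]
b = [2 - 1j, 1 + 1j]
print(correlate_chained(a, b), correlate_fused(a, b))
```

Both functions compute the same value; in a SIMD implementation the fused
form is the one where data can stay in registers between the conjugate,
multiply and accumulate steps.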


> The OFDM synchronization code is also very time consuming and although uses VOLK
> already it can be using with great benefit new AVX2 instructions. Actually many of
> the blocks can use new instructions to speed-up the data processing.

Yes, certainly. The synchronization part is a good place for
optimization.

Tom
Bogdan Diaconescu (Guest)
on 2014-02-25 16:31
(Received via mailing list)
Hi Tom,

thanks for your answer. The point I was making was that when I wrote
the Viterbi code, I tried to use the available VOLK functions
(multiplications, subtractions, etc.) and the code was slower than using
intrinsics directly. Implementing a new kernel for the Viterbi decoder
(with intrinsics, of course, like the others) was just the next step in
the process.

So, I totally agree that it is worth creating a kernel that completely
solves a problem like a convolutional decoder, as it will make it faster.
The downside, though, is that the next time you want to do something
slightly different you'll need to create another kernel. But that is the
trade-off between flexibility and speed.

I see your code uses the Spiral implementation; I will look to see what
speed it gives, as for me this is one of the biggest challenges. I still
believe someone will create a convolutional decoder implementation that
is both readable and fast :). I know, I am speaking from the perspective
of an open-source software guy :) who inherently needs to understand all
the code.

Bogdan

BTW, from my experience, to speed up depuncturing it is worth making the
depuncturer part of the decoder, or at least making the decoder aware of
it.
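
For readers unfamiliar with the step, here is a minimal sketch of a
standalone depuncturer. It assumes a rate-3/4 puncturing pattern written
in serialized (X1, Y1, X2, Y2, X3, Y3) order purely for illustration; the
function name, the pattern constant and the use of 0.0 as the erasure
value are assumptions, not code from gr-dvbt.

```python
# 1 = transmitted, 0 = punctured, in (X1, Y1, X2, Y2, X3, Y3) order.
# Illustrative rate-3/4 pattern: four of every six coded bits are sent.
PATTERN_3_4 = (1, 1, 0, 1, 1, 0)


def depuncture(soft_bits, pattern=PATTERN_3_4, erasure=0.0):
    """Re-insert erasures where the transmitter dropped coded bits."""
    kept = sum(pattern)
    if len(soft_bits) % kept != 0:
        raise ValueError("input length must be a multiple of the kept bits")
    received = iter(soft_bits)
    out = []
    for _ in range(len(soft_bits) // kept):
        for keep in pattern:
            # An erasure carries no information for the branch metric.
            out.append(next(received) if keep else erasure)
    return out


print(depuncture([1.0, -1.0, 0.5, -0.5]))
```

A separate block like this writes every erasure to memory only for the
decoder to read it back; a decoder that knows the pattern could skip
those positions in its branch metric instead, which is one way to read
the suggestion above.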


West, Nathan (Guest)
on 2014-02-25 23:38
(Received via mailing list)
This is a great conversation, and I'll take the opportunity to plug
the upcoming VOLK working group call
(https://plus.google.com/u/1/events/ch3jrjcvp7mdiqe...).
Bogdan, your results aren't particularly surprising, but the feedback is
really good to hear.

Back to GSoC:

Abhishek,

>Thanks for the pointers to gr-atsc and gr-80211. I have started
>looking there as a
>starting point. Are there similar modules which are undergoing volk
>speedup fixes?
>I am also trying to meet up with other people who have been using GNU radio
>to identify potential modules for acceleration. As you are now a
>mentor organization, I feel it's a good time for us to get into
>detailed discussions.

From the previous discussion it should be apparent that how algorithms
are implemented will make the biggest difference, and that the new
acceleration is primarily going to come from larger, more complex
kernels. At the end of the day it's going to be your proposal. So far
on the list of places to look we have

* in-tree OFDM (contact Martin)
* gr-atsc (use Andrew Davis' fork)
* gr-dvbt
* gr-fecapi

For your proposal I would recommend looking at their code, then
getting in contact with the author(s) of those modules to ask about
their thoughts on accelerating blocks they have written. The reality
of this project is that we are accelerating some signal processing
algorithm, and knowledge of that algorithm is useful for acceleration.
Whatever application you have interest in and/or knowledge of (fresh
out of a BS it's more likely to be interest) should guide your
proposal. If you know anything about error correcting codes then the
latter 2 would be good fits. OFDM frame detection probably has a
gentler learning curve since at the basic level you're looking at
convolution, and there are papers you can look for on more involved
algorithms. Other algorithms to look at might include AGC or
equalizers.

If you're interested in GPU programming, don't forget to check out gr-gpu.

>This also seems to be promising, though I guess it would require me to
>come up to speed with AVX2 (which I would love to do). Could you
>please elaborate
>a little on the kind of beneficial features you have in mind ? I am
>concerned that the
>job of adding proto-kernels might turn out to be mundane/tedious ? Is
>that a valid concern ?

Right, so as Martin mentioned the answer is sort of relative. I
wouldn't go so far as to say it's mundane, especially if you have
little experience with using intrinsics and SIMD instructions. One
reason AVX isn't so prominently featured (I suspect) is that the
instructions are almost the same as SSE instructions, but the vectors
are twice as long, so that part actually is mundane. The AVX2/FMA
extensions introduce some new features to the amd64 instruction set. The
most obvious is that it looks like Intel and AMD finally settled on
the same fused multiply-add implementation (there's also a
multiply-subtract that's good for complex numbers). That will likely
speed things up a bit, but I'm also looking forward to seeing gains
from the various gather loads that have been introduced. They allow
you to do a single load operation that gathers vector elements that
span pretty large ranges. VOLK won't be so interested in the large
ranges (except maybe for decimators), but it could be useful for loading
complex vectors. There are some other math functions we may be able to
leverage, but those are two features that I think would be widely
applicable.
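
To make the two features concrete, here is a scalar sketch of how a
complex multiply maps onto fused multiply-add/subtract, and how a strided
gather splits an interleaved complex buffer into real and imaginary
parts. The helpers `fmadd`, `fmsub` and `gather` are hypothetical
stand-ins written in plain Python; they are not intrinsics, and unlike a
hardware FMA they round twice.

```python
def fmadd(a, b, c):
    """Stand-in for a fused multiply-add: a * b + c."""
    return a * b + c


def fmsub(a, b, c):
    """Stand-in for a fused multiply-subtract: a * b - c."""
    return a * b - c


def complex_multiply(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) using one plain multiply per output
    plus one fused operation per output."""
    real = fmsub(ar, br, ai * bi)   # ar*br - ai*bi
    imag = fmadd(ar, bi, ai * br)   # ar*bi + ai*br
    return real, imag


def gather(buffer, base, stride, count):
    """Stand-in for a gather load: pick `count` elements `stride` apart."""
    return [buffer[base + i * stride] for i in range(count)]


# Interleaved complex samples: (1 + 2j), (3 - 1j)
interleaved = [1.0, 2.0, 3.0, -1.0]
re = gather(interleaved, 0, 2, 2)   # real parts
im = gather(interleaved, 1, 2, 2)   # imaginary parts

print(complex_multiply(re[0], im[0], re[1], im[1]))
```

In a real protokernel each helper would operate on a whole vector
register at once; the point of the sketch is only the data flow.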

In your proposal you should definitely include what ISAs you intend to
use, and if there are features specific to that instruction set then
point out why it's a good choice. This is mostly important for
choosing between SSE and friends, AVX, AVX2/FMA. It would be good to
see plans that include NEON support for anything you'd add to amd64
platforms, but that's not a requirement.


Nathan
West, Nathan (Guest)
on 2014-02-26 00:34
(Received via mailing list)
I also see that GNSS-SDR made it to GSoC and they have a VOLK related
project.
http://gnss-sdr.org/documentation/google-summer-co...
Abhishek Bhowmick (Guest)
on 2014-02-26 07:50
(Received via mailing list)
Thanks everyone. These are quite a few pointers; I will spend some time
digesting them all.

So there are really two approaches, large complex kernels on
one hand and AVX2/AVX/FMA on the other, or a combination of the two.

I guess I should propose identifying and implementing larger complex
kernels and then further accelerating them using AVX2/FMA etc. Doing both
will of course limit the number of applications/algorithms I can feasibly
target. What's your take on this?

Abhishek

On Wed, Feb 26, 2014 at 5:03 AM, West, Nathan
<natw@ostatemail.okstate.edu> wrote:
> I also see that GNSS-SDR made it to GSoC and they have a VOLK related project.
> http://gnss-sdr.org/documentation/google-summer-co...

Yeah, I also noticed that. I might submit a proposal to them also.

Abhishek
Abhishek Bhowmick (Guest)
on 2014-03-10 15:34
(Received via mailing list)
Hello,
I would like to clarify some things:

1. I feel it is tough to beat Spiral implementations through manual
vectorization, performance-wise. If so, is readability the prime and
only reason for using intrinsics manually, and hence what is of value to
the community?

2. What is the current state of SSE4 and NEON support in stock VOLK
kernels (the project ideas page mentions some work is under way)? It
would be great if someone who is already working on this could share
their branch, so that I know how much work, if any, is needed here
before moving on to AVX. Of course, new kernels will need support for
all of them.

3. How feasible/useful does it sound to incorporate the newly added
idea of a 'turbo equalizer' within the OFDM system? Are the
requirements of the proposed equalizer overkill for the OFDM blocks?

Abhishek

--
Regards;
Abhishek Bhowmick,
Senior Undergraduate,
Department of Electrical Engineering,
IIT Bombay.

Martin Braun (Guest)
on 2014-03-10 16:46
(Received via mailing list)
On 03/10/2014 03:33 PM, Abhishek Bhowmick wrote:
> way) ? Would be great if someone who is working on this already shares
> his branch, so that I may know how much/if any work is needed in this
> before moving on to avx. Of course, new kernels will need support for
> all.
>
> 3. How feasible/useful does it sound to incorporate the newly added
> idea of 'turbo equalizer' within the ofdm system ? Are the
> requirements of the proposed equalizer overkill for the ofdm blocks?

Turbo equalizers are generally not a good choice for OFDM, because of
the way the OFDM parameters are chosen w.r.t. the channel properties. In
OFDM, you usually have 1-tap equalizers; the only difficulty is the
channel estimation.

MB
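
To illustrate the 1-tap equalizer mentioned above: once a channel estimate is available for each subcarrier, equalization reduces to one complex division per subcarrier. The following is a minimal pure-Python sketch under that assumption. The function name `one_tap_equalize` and the example channel values are made up for illustration and are not taken from GNU Radio.

```python
def one_tap_equalize(received, channel):
    """Zero-forcing one-tap equalizer: divide each subcarrier by its channel gain."""
    if len(received) != len(channel):
        raise ValueError("need exactly one channel coefficient per subcarrier")
    # A gain of exactly zero would raise ZeroDivisionError; a real receiver
    # would have to handle deep fades explicitly.
    return [y / h for y, h in zip(received, channel)]


# Hypothetical example: 4 subcarriers carrying QPSK symbols.
sent = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
channel = [0.5 + 0j, 1j, 2 - 2j, -0.25 + 0.5j]      # assumed known, one gain per subcarrier
received = [h * x for h, x in zip(channel, sent)]   # noiseless channel for simplicity

equalized = one_tap_equalize(received, channel)
```

With a perfect channel estimate and no noise, `equalized` matches `sent` up to floating-point rounding, which is why the estimation step carries the real difficulty.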
West, Nathan (Guest)
on 2014-03-11 16:26
(Received via mailing list)
On Mon, Mar 10, 2014 at 2:32 PM, Abhishek Bhowmick
<abhowmick22@gmail.com> wrote:
>> >
>> > stock volk kernels (project ideas page mentions some work is under
>> some existing kernels thrown in if time permits.
> I am afraid my question didn't come out correctly. I was referring to the
>>
> Senior Undergraduate,
> Department of Electrical Engineering,
> IIT Bombay.

Ah! So there was a slight miscommunication. Yes, porting the
OpenAirInterface SIMD code to VOLK is a good option as well. The turbo
channel coder/decoder is part of that. I've *briefly* looked at the code
to see what is currently there, and it's my understanding that the work
involved will be to write generic C implementations of vectorized code
where the generic version does not exist. Beyond that, there is porting
to newer/different ISAs (AVX or NEON, depending on your preference and
hardware availability). I think Florian is on the gr-discuss mailing
list, but I've CCed him to hopefully provide more details as he's more
familiar with the original code base.
Florian Kaltenberger (Guest)
on 2014-03-11 23:09
(Received via mailing list)
Attachment: Florian_Kaltenberger.vcf (365 Bytes)
Hi Nathan and Abhishek,

On 10/03/2014 23:22, West, Nathan wrote:
> I've CCed him to
> hopefully provide more details as he's more familiar with the original
> code base.
I only joined this mailing list recently, so I probably missed a part of
the discussion. Let me summarize briefly what OpenAirInterface can
provide. We have optimized SIMD (SSE4) implementations of the LTE turbo
encoder and decoder as well as the LTE tail-biting Viterbi encoder and
decoder. We also have the 802.11 Viterbi encoder and decoder. The only
function for which we have a generic non-vectorized functional
equivalent is the LTE turbo decoder.
I am not sure I understand why it is necessary to write generic versions
for the already optimized SIMD code. My idea was to port the optimized
SIMD code from OpenAirInterface to VOLK, such that it can be used by GR
applications. I am not familiar with VOLK (yet) but this might just be
as easy as writing a wrapper function.
As Nathan suggested, the more interesting part is probably to upgrade
the code to AVX2 or similar.

Cheers,
Florian.
Johannes Demel (Guest)
on 2014-03-12 00:40
(Received via mailing list)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Florian,

the generic implementation serves 2 purposes, at least in my opinion.
Firstly, in case the necessary hardware extension is not available on
the target hardware, there is a backup default.
Secondly, the generic implementation is usually easier to read and
thus preferable as a reference. I know that this is a 'the code is the
documentation' argument. But even though documentation might be really
good, sometimes looking at the code just makes things clearer.

happy hacking
Johannes

On 11.03.2014 23:08, Florian Kaltenberger wrote:
>> NEON depending on your preference and hardware availability). I
> decoder. I am not sure I understand why it is necessary to write
> generic versions for the already optimized SIMD code. My idea was
> to port the optimized SIMD code from OpenAirInterface to VOLK, such
> that it can be used by GR applications. I am not familiar with VOLK
> (yet) but this might just be as easy as writing a wrapper
> function. As Nathan suggested, the more interesting part is
> probably to upgrade the code to AVX2 or similar.
>
> Cheers, Florian.
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTH56MAAoJEO7fmkDsqywMb+AP/jNXrJoV7Cs6wY7Cx9AHkllM
NEo1mxxBhaALsxWv9xwTImaGpA83guiBZ8o0CufYj65oN/i1mN8dUHgK9D/SlLSn
GhWTZSBlBiVvIUtxFskDaAA0sqg/2Ae+iYoDKm0yxJerU49K5YGrTBFhzgl7i/r5
fZz+BIGPm29rP1kHyRfw/ROmonXOlz1z+jIR7PGK7DEQbw/Uy9eITchVVKMNsdjm
X73+vJHF9UXftzbpwEF/CsgwWVnTvWEVy3YjvxaKRMET/2zQWtEJst51l+aVWFLp
M4ejRtf4zmuSBx5JUMf0/eY1lnNWUkqdlcEaLkddalDwl5chkkfxtS+Gwd6YEqJH
pdqIa7BfHMaPwrKJ/bX3Wp9u9czWcwI9c8A9GjxnFrASIy2g+QLzU21XDdmImFWm
iqaOB0p+/y6bK/V91a4ZjL9gtTBRahlmlmB2EIcPsxlnW+PjJZKNPA833BkuqEE8
gU7w9diq5nbEQYhsvxeqz0WX16yZNwJlz98ane8+oZaVNt9JRI0cjuj0JX24EEPT
9wUfPkmnr1325NJISFJx8X7w2mAQV3zrf5Md1wOfI6Ls9Byfp8+WjeRYrTfzC95b
kXvcTVID0XZdoTItSjQUEbbJAbl8IkfwWaQCNgHCcJzfZLdJ2hHy7RqpiBAkMmAv
QzLU1GZSSHXzEPKUI1fM
=INm4
-----END PGP SIGNATURE-----
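
The two purposes given for a generic implementation (a fallback when the hardware extension is missing, and a readable reference) can be sketched as a small dispatch table. This is only an illustration of the idea in plain Python, not VOLK's actual dispatcher or API. The feature name "simd4" and all function names are hypothetical, and the unrolled variant merely stands in for a SIMD kernel.

```python
def multiply_generic(a, b):
    """Reference implementation: a plain loop that runs anywhere and is easy to read."""
    out = []
    for x, y in zip(a, b):
        out.append(x * y)
    return out


def multiply_unrolled4(a, b):
    """Stand-in for a SIMD variant: handles four items per step, then the tail."""
    n = min(len(a), len(b))
    out = [0] * n
    body = n - n % 4
    for i in range(0, body, 4):
        out[i] = a[i] * b[i]
        out[i + 1] = a[i + 1] * b[i + 1]
        out[i + 2] = a[i + 2] * b[i + 2]
        out[i + 3] = a[i + 3] * b[i + 3]
    for i in range(body, n):  # leftover items that do not fill a group of four
        out[i] = a[i] * b[i]
    return out


# Ordered from most to least specialised; None means "no hardware requirement".
IMPLEMENTATIONS = [
    ("simd4", multiply_unrolled4),  # hypothetical feature name
    (None, multiply_generic),
]


def select_implementation(available_features):
    """Return the first implementation whose required feature is available."""
    for feature, impl in IMPLEMENTATIONS:
        if feature is None or feature in available_features:
            return impl
    raise RuntimeError("no implementation available")


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
b = [0.5, 0.5, 2.0, 2.0, 1.0, -1.0, 3.0]
fallback = select_implementation(set())        # no extension present
fast = select_implementation({"simd4"})        # extension present
```

Because the generic version is always registered last, the selection never fails, and its output doubles as the expected result when testing the specialised variant.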
Abhishek Bhowmick (Guest)
on 2014-03-14 19:28
(Received via mailing list)
Hi,
So, following some suggestions, I looked into how I could potentially
use better signal processing for the OFDM receiver. I was thinking of an
LS estimator with higher-order interpolation or an MMSE estimator for
the channel estimation part, and also an MMSE-DFE or Viterbi equalizer.
These will need matrix operations and other computations, which could
potentially be developed into new VOLK kernels.
1. Are the computational complexities involved feasible in the current
framework?
2. Though they can give a better BER in adverse channel conditions, can
they also deliver more in terms of throughput/performance?
3. Is it a good idea to include such implementations alongside new VOLK
kernels in the same proposal?

Abhishek
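
As a baseline for the estimators mentioned above, a least-squares (LS) estimate divides the received pilot values by the known pilot symbols, and the remaining subcarriers are filled in by interpolation. The sketch below uses linear interpolation, the simplest case; a higher-order scheme would replace `interpolate_linear`. All names, the pilot positions, and the example channel are assumptions chosen for illustration, not part of any GNU Radio block.

```python
def ls_estimate_at_pilots(received, pilots):
    """LS estimate H[k] = Y[k] / X[k] at each pilot subcarrier k.

    received: list of received values, one per subcarrier
    pilots:   dict mapping pilot subcarrier index -> known transmitted symbol
    """
    return {k: received[k] / x for k, x in pilots.items()}


def interpolate_linear(pilot_estimates, num_subcarriers):
    """Fill in every subcarrier by linear interpolation between pilot estimates."""
    if not pilot_estimates:
        raise ValueError("at least one pilot is required")
    idx = sorted(pilot_estimates)
    estimate = []
    for k in range(num_subcarriers):
        if k <= idx[0]:            # before the first pilot: hold its value
            estimate.append(pilot_estimates[idx[0]])
        elif k >= idx[-1]:         # after the last pilot: hold its value
            estimate.append(pilot_estimates[idx[-1]])
        else:
            for lo, hi in zip(idx, idx[1:]):
                if lo <= k <= hi:
                    t = (k - lo) / (hi - lo)
                    estimate.append((1 - t) * pilot_estimates[lo]
                                    + t * pilot_estimates[hi])
                    break
    return estimate


# Hypothetical example: 9 subcarriers, pilots on subcarriers 0, 4 and 8.
num_subcarriers = 9
true_channel = [1 + 0.1j * k for k in range(num_subcarriers)]  # varies linearly with k
pilots = {0: 1 + 1j, 4: 1 + 1j, 8: 1 + 1j}
transmitted = [pilots.get(k, 1 - 1j) for k in range(num_subcarriers)]
received = [h * x for h, x in zip(true_channel, transmitted)]   # noiseless

channel_estimate = interpolate_linear(
    ls_estimate_at_pilots(received, pilots), num_subcarriers)
```

In this noiseless example the channel varies linearly across subcarriers, so linear interpolation recovers it up to rounding. With noise or a more frequency-selective channel the estimate degrades, which is where higher-order interpolation or MMSE estimation would come in, at a higher computational cost.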


On Wed, Mar 12, 2014 at 3:38 AM, Florian Kaltenberger <
Martin Braun (Guest)
on 2014-03-14 23:09
(Received via mailing list)
On 14.03.2014 19:27, Abhishek Bhowmick wrote:
> they do deliver more in terms of throughput/performance?
> 3. Is it a good idea to include such implementations alongside doing new
> volk kernels in the same proposal ?

Abhishek,

at this point, please just put together a proposal and upload it so we
can make sure it gets into Melange in time.

M
Abhishek Bhowmick (Guest)
on 2014-03-15 10:38
(Received via mailing list)
Here is the link to my first proposal draft:
https://github.com/abhowmick22/GSoc14-Proposal

I will keep revising it. Seeking feedback in the meantime. Thanks all.

Abhishek
West, Nathan (Guest)
on 2014-03-19 01:10
(Received via mailing list)
Can you enter this through Melange? It should be sufficient to link to
your PDF/repo on Melange.

It's good to see you were able to get control port and oprofile results.

On Sat, Mar 15, 2014 at 4:37 AM, Abhishek Bhowmick
Abhishek Bhowmick (Guest)
on 2014-03-19 16:56
(Received via mailing list)
My current hardware doesn't support AVX2. How practical is it to develop
software for AVX2 intrinsics using Intel's SW Development Emulator (and
possible performance testing on a remote machine) ?

Abhishek


On Wed, Mar 19, 2014 at 5:39 AM, West, Nathan
Moritz Fischer (Guest)
on 2014-03-19 17:06
(Received via mailing list)
Abhishek,

On Wed, Mar 19, 2014 at 4:53 PM, Abhishek Bhowmick
<abhowmick22@gmail.com> wrote:
> My current hardware doesn't support AVX2. How practical is it to develop
> software for AVX2 intrinsics using Intel's SW Development Emulator (and
> possible performance testing on a remote machine) ?

I'm fairly confident we can get you a login to an AVX2 machine if
there's no other option.

Cheers,

Moritz
Michael Dickens (Guest)
on 2014-03-19 19:20
(Received via mailing list)
Hi Abhishek - Your proposal is coming along nicely!  I'll 2nd (or, maybe
3rd by now) encouraging you to get your proposal into Melange so that we
can comment on it there more.  My up-front comment is that you probably
want to state that OpenAirInterface is wholly licensed under the GPLv2
and hence the relevant portions can be ported to GNU Radio (assuming
this statement is true; licensing is important when discussing porting
code). I think it would be useful for you to include a link to
the specific files in their repo:
https://svn.eurecom.fr/openair4G/trunk/   ... I can't find those files
just by perusing.  Looking forward to more discussion on Melange. - MLD
Abhishek Bhowmick (Guest)
on 2014-03-19 19:33
(Received via mailing list)
Thanks. Will address these points. The project proposal is already on
Melange; the actual proposal is uploaded at:
https://github.com/abhowmick22/GSoc14-Proposal

Abhishek


On Wed, Mar 19, 2014 at 11:49 PM, Michael Dickens
<michael.dickens@ettus.com