Thread scheduling for high-sensitivity, high-resolution FFTs

Marcus_DSLeech · January 7, 2009, 12:59am

Let’s say, for the sake of interesting discussion, that I have access to
a beastly multi-CPU 'puter, perhaps based on the
latest Xeon 7460 series CPUs, and a quad-socket server, giving me 24
CPUs, and 48 threads.

I’m interested in wideband, high-sensitivity, high-resolution
spectroscopy.

I envisage a scheme where one thread gathers an FFT-frame worth of data
(let’s say from an USRP2), and doles it out to one
of a number of other threads that are doing FFTs on these frames of
data. All the outputs of the FFT threads are averaged
together to produce a result that is then time averaged.

The existing FFT display schemes in Gnu Radio use a “keep-one-in-n”
scheme for the FFTs, which means that sensitivity
is necessarily lost, due to the spectrum being under-sampled in the
time domain.

Is there existing plumbing to be able to try this sort of thing? [On a
smaller scale–I don’t actually have such a monster server
at my disposal at the moment!!].

–
Marcus L.
Principal Investigator, Shirleys Bay Radio Astronomy Consortium

Marcus_DSLeech · January 7, 2009, 3:47am

On Tue, Jan 06, 2009 at 06:58:47PM -0500, Marcus D. Leech wrote:

data. All the outputs of the FFT threads are averaged

Is this an SMP machine, or a cluster?

If SMP, it’ll “just work”. If it’s a cluster, you’ll need to figure
out how to partition separate instantiations of GR on each node.

Eric

Marcus_DSLeech · January 7, 2009, 5:21am

Eric B. wrote:

Is this an SMP machine, or a cluster?

If SMP, it’ll “just work”. If it’s a cluster, you’ll need to figure
out how to partition separate instantiations of GR on each node.

Eric

SMP.

Describe how it’ll “just work”??

Maybe I haven’t looked at all the blocks carefully enough, but I can’t
immediately see how to do this.

I basically want a “distributor” block that distributes FFT input frames
to multiple instances of an FFT computation,
across multiple threads, and have the results of the separate FFTs
synchronized.

–
Marcus L.
Principal Investigator, Shirleys Bay Radio Astronomy Consortium

Marcus_DSLeech · January 7, 2009, 7:05am

Eric B. wrote:

synchronized.

fft1 = …
connect(fft0, (fanin, 0))

Eric

Yes, I came to the same conclusion about gr.interleave trickery myself,
and will have to noodle over it when I’m awake.

–
Marcus L.
Principal Investigator, Shirleys Bay Radio Astronomy Consortium

Marcus_DSLeech · January 7, 2009, 4:23pm

From what Eric has described (which should work), you have choices. The
choices would be to increase sensitivity by adding the results of the
fft’s
which is the same as producing spectra from long term autocorrelation
which
will increase sensitivity O(n) because it is coherent averaging or if
you
sist on noncoherent averaging sensitivity will be increase O(sqrt(n))
but
only by decreasing the variance of the noise. And/or you can build
larger
fft’s from the intermediate ones via several techniques to increase
frequency resolution. The latter requires more careful organization
because
you do not wish to window the smaller ones if you wish to combine them
later. This would require windowing in the frequency domain and
necessarily
the expense of convolution rather than multiplication.

This is all a fruitful area for experimentation now that we have the new
TPB
scheduler.

Bob

ARRL SDR Working Group Chair
Member: ARRL, AMSAT, AMSAT-DL, TAPR, Packrats,
NJQRP, QRP ARCI, QCWA, FRC.
“And yes I said, yes I will Yes”, Molly Bloom

Marcus_DSLeech · January 7, 2009, 6:05pm

Bob McGwier wrote:

the expense of convolution rather than multiplication.

In the existing FFT sink, many frames are dropped, and the FFT output
bins are averaged using a single-pole
IIR. It takes long integration times for this to be as sensitive as
an FFT in which no input frames are ever dropped.

In my application (SETI), having long integration times hurts, because
of chirp (although one could also de-chirp
the baseband prior to computing spectra).

I’ll conduct some experiments as Eric has suggested, once I get my
quad-core system back in operation.
[Don’t muck with hardware when you’re too sick to cope. That’s how
LGA775 sockets get bent
pins ].

This is all a fruitful area for experimentation now that we have the new TPB
scheduler.

Bob

Indeed

–
Marcus L.
Principal Investigator, Shirleys Bay Radio Astronomy Consortium

Marcus_DSLeech · January 7, 2009, 8:20pm

Marcus:

What is the time constant on your IIR? If the time constant is longer
than
two FFT’s you are probably losing sensitivity doing noncoherent
averaging.
All I am suggesting is that a computation is in order to determine the
“noise figure” of the two approaches.

Bob

ARRL SDR Working Group Chair
Member: ARRL, AMSAT, AMSAT-DL, TAPR, Packrats,
NJQRP, QRP ARCI, QCWA, FRC.
“And yes I said, yes I will Yes”, Molly Bloom

Marcus_DSLeech · January 7, 2009, 6:39am

On Tue, Jan 06, 2009 at 11:20:48PM -0500, Marcus D. Leech wrote:

SMP.

Describe how it’ll “just work”??

Maybe I haven’t looked at all the blocks carefully enough, but I can’t
immediately see how to do this.

I basically want a “distributor” block that distributes FFT input frames
to multiple instances of an FFT computation,
across multiple threads, and have the results of the separate FFTs
synchronized.

Sounds good. Sorry, I misunderstood your goal.
(I’m working on the guts of FFTW now and have FFT on the brain…)

I think we have the blocks you need to do the fanout and fanin
already. They’re disguised as

gr.deinterleave(itemsize)
gr.interleave(itemsize)

I think you could use them like this:

fft_size = 512

usrp = …

fanout = gr.deinterleave(fft_size * gr.sizeof_complex)
fanin = gr.interleave(fft_size * gr.sizeof_complex)

fft0 = …
fft1 = …
fft2 = …
fft3 = …

connect(usrp, fanout)

connect((fanout, 0), fft0)
connect((fanout, 1), fft1)
connect((fanout, 2), fft2)
connect((fanout, 3), fft3)

connect(fft0, (fanin, 0))
connect(fft1, (fanin, 1))
connect(fft2, (fanin, 2))
connect(fft3, (fanin, 3))

Each block runs in its own thread, so this should run faster on an SMP
machine. Please let us know if this works for you!

[FYI, there are also two extra copies happening in the gr.fft_* block
that I
now know how to remove…]

Eric

Marcus_DSLeech · February 11, 2009, 8:53pm

Here’s a screen shot of FFT output:

The image is typical, in that there are multiple narrow local maxima,
but
the amplitude of these particular peaks is quite a bit higher than I
expect
‘in the field’. I’m trying to work out an efficient way to compute the
locations and relative amplitudes of these maxima. Can someone please
suggest either an algorithm and/or the most similar gnuradio application
that I can examine as candidates for this analyis? Thanks.
Paul M.

Marcus_DSLeech · January 8, 2009, 10:56am

Eric B. wrote:

In your ideal world, how many bins would you like in your FFT?

Eric

The application is SETI, so 10Hz or better resolution over several Mhz
of bandwidth would be ideal.

–
Marcus L.
Principal Investigator, Shirleys Bay Radio Astronomy Consortium

Marcus_DSLeech · January 14, 2009, 8:13pm

I have my ‘900 MHz AM Receiver with AFC’ running satisfactorily, but I’m
not
happy with some of the approaches I ended up using to accomplish the
AFC. I
spliced together most of fftsink2.py and usrp_am_rcvr.py and modified
both
as follows:

A. fftsink2.py
Added code to input_watcher class run method (thread):
1)find highest peak in FFT data arriving via msgq
2)calculate the frequency difference between nominal tuning freq and the
peak, i.e., tuning error or drift.
3)calculate new tune freq and retune the usrp

B. usrp_am_rcvr.py:
Adjusted filter parameters to my requirements.
Added staticmethod access to its topblock class ‘set_freq’ method.
Added global variables to contain information about the single instance
of
this class that is invoked by main.

What I don’t like about the above is using staticmethod to get access to
the
set_freq method of the top block. However, I have not figured out how to
address the single instance of the topblock that exists. Considering the
code snippets below, how can I address the instance of wm_rx_block
(which I
referred to as ‘top block’ earlier) that gets passed to stdgui2?

A little more description of the app:
3 stdgui2 FFTs are in view: FFT of USRP data, FFT of channel filter
output,
FFT of audio
Channel filter output is supposed to have a single peak centered on 0
frequency. Any difference can be used to adjust usrp tuning freq. In
practice Thread-2 handles channel filter data and knows the freq offset.
As
usual, the usrp object is bound to a topblock, an instance of class
wam_rx_block. What I don’t understand is how to address this instance.
You
might be able to tell me by simply looking at the main code below.

I’m sure that you longtime Python and gnuradio folks will consider this
trivial, but I have a new bald spot. Any hints will be appreciated.
Paul M.

Here’s main

if name == ‘main’:
app = stdgui2.stdapp (wam_rx_block, “USRP 915MHz AM ISM RX”)
app.MainLoop ()

And here are the most important bits of wam_rx_block:

class wam_rx_block (stdgui2.std_top_block):

my_usrp = None
my_usrp_subdev = None

def __init__(self,frame,panel,vbox,argv):
    global target_freq, tune_freq, my_usrp, my_usrp_subdev

< option parsing and other standard setup stuff snipped >

    tune_freq = self.freq
    target_freq = tune_freq
    my_IF_freq = self.IF_freq    # allow class method access

    ################# Build main flowgraph ##################

    #TODO: add an AGC after the channel filter and before the

AM_demod

    self.u = usrp.source_c()                    # usrp is data

source
my_usrp = self.u # allow global
access

< more standard setup stuff snipped >

    self.u.set_mux(usrp.determine_rx_mux_value(self.u,
                                               options.rx_subdev_spec))
    self.subdev = usrp.selected_subdev(self.u,

options.rx_subdev_spec)
my_usrp_subdev = self.subdev

< wx GUI setup stuff snipped >

def set_freq(self, usrp_freq):
    """
    Set the center frequency we're interested in.

    @param target_freq: frequency in Hz
    @rypte: bool

    Tuning is a two step process.  First we ask the front-end to
    tune as close to the desired frequency as it can.  Then we use
    the result of that operation and our target_frequency to
    determine the value for the digital down converter.
    """
    r = usrp.tune(self.u, 0, self.subdev, usrp_freq + self.IF_freq)
    #TODO: check if db is inverting the spectrum or not to decide
    # if we should do + self.IF_freq  or - self.IF_freq

    return r

def set_freq_static(usrp_freq):
    global target_freq, tune_freq, my_usrp, my_IF_freq,

my_usrp_subdev

    r = usrp.tune(my_usrp, 0, my_usrp_subdev, usrp_freq +

my_IF_freq)

set_freq_static = staticmethod(set_freq_static)

And, here’s the modified run method for input_watcher in fftsink2, with
the
call to set_freq_static near the end:

def run (self):
    global target_freq, tune_freq

    while (self.keep_running):
        msg = self.msgq.delete_head()  # blocking read of message

queue
# FFT results are prefixed by vector size and number of
frames.
itemsize = int(msg.arg1())
nitems = int(msg.arg2())

        s = msg.to_string()            # get the body of the msg as

a
string

        # There may be more than one FFT frame in the message.
        # If so, we take only the last one
        if nitems > 1:
            start = itemsize * (nitems - 1)
            s = s[start:start + itemsize]

        # String actually contains f.p. numbers, so convert.
        complex_data = fromstring (s, float32)

        # find location and amplitude of highest peak
        self.maxval = max(complex_data)
        self.pk_bin = argmax(complex_data)
        self.avgval = mean(complex_data)

        self.cycles_per_bin = self.sample_rate / self.fft_size

        # Show size of packet, max value, index of max, and mean

value.
# decide if peak is significant
# FIXME: make threshold a parameter and/or dynamic
if ((self.maxval - self.avgval) >= 8):
# 1st vlen/2 bins get peak when center_freq is low,
# with offset maximum = 1/2 of BW at bin vlen/2
# 2nd vlen/2 bins get peak when center_freq is high,
# with offset maximum = 1/2 of BW at bin vlen/2
# Both halves repeat for freq diffs > FFT width / 2.
# For usrp_rate = 64 MHz / 256 = 250 kHz, do bin
(vlen/2)
# represents f = -125 kHz, +125 kHz,…
# If we restrict the range of center_freq to within BW/2
# of carrier, there can be no ambiguity.
# Given a peak located at bin N, solve for frequency:
# cycles_per_bin = usrp_rate/vlen,
# This can be refined by calculating a mean pk_freq
value
# based on several bins,
# starting with the values on either side of the peak:
if self.pk_bin > 0:
left_bin = self.pk_bin - 1
else:
left_bin = (self.fft_size - 1)
if self.pk_bin < (self.fft_size - 2):
right_bin = self.pk_bin + 1
else:
right_bin = 0
left_val = complex_data[left_bin]
right_val = complex_data[right_bin]
# Calculate a weighted average bin location starting
with
# the fraction of a bin to offset the center.
wtd_sum = (right_val - left_val) / (left_val +
right_val)
self.pk_bin += wtd_sum
# Adjust for bottom wraparound
if (self.pk_bin < 0): self.pk_bin += (nitems - 1)
# Adjust for top wraparound
if (self.pk_bin > (self.fft_size - 1)):
self.pk_bin += (1 - self.fft_size)
# Calculate equivalent freq of bin, accounting for
spectral
# folding around center.
if self.pk_bin < self.fft_size / 2:
self.pk_freq = int(self.center_freq +
(self.pk_bin *
self.cycles_per_bin))
else:
self.pk_freq = int(self.center_freq +
((self.pk_bin - (self.fft_size -
1))
*
self.cycles_per_bin))

            ################### RETUNING OCCURS HERE:

####################
# NOTE: For the passband data, pk_freq represents freq
drift.
if self.afc:
if abs(self.pk_freq) > 200:
target_freq = tune_freq + self.pk_freq/3 #
correct for drift
print “retune:”, self.pk_freq, “to”, target_freq
wam_rx_block.set_freq_static(target_freq)

################################################################

        # package the data as a wx event object
        de = DataEvent (complex_data)
        # package this packet as a data event within wx and
        # push this single fft plot data packet into fftsink msgq
        wx.PostEvent (self.event_receiver, de)
        del de # no further need for this copy of data

Marcus_DSLeech · February 11, 2009, 11:36pm

Paul M. schrieb:

Paul M.
I thought Analog TV was going of the air in the US… The difficulty is
that when the FFT is taken
from the real world, and the signal tends to the noise level, most
‘peak’ detection algorithms fail,
well, have lots of noise in the ‘accept’ condition…

So, long term averaging, and ‘knowing’ that a peak would appear at
offset x from the right or
left side of the spectrum will be more likely to succeed.

John C…

Marcus_DSLeech · February 12, 2009, 2:40am

Signal has nothing to do with TV.

Paul M. schrieb:

analyis? Thanks. Paul M.
I thought Analog TV was going of the air in the US… The difficulty is
that when the FFT is taken
from the real world, and the signal tends to the noise level, most
‘peak’ detection algorithms fail,
well, have lots of noise in the ‘accept’ condition…

So, long term averaging, and ‘knowing’ that a peak would appear at
offset x from the right or
left side of the spectrum will be more likely to succeed.

John C…

Marcus_DSLeech · May 19, 2009, 12:47am

About a month ago, our USRP1 Rev. 4.5 (Ser. #2728) became erratic in it
USB
connection and communications. A short while after, USB failed
altogether. I
put scope probes on the USB + and - data lines and observed that one of
the
signals had a much different sort of signal than the other. Following
advice
from Matt E. given to others who had experienced USB failure, I
ordered a
replacement USB chip along with a Chipquik SMD removal kit. Replacement
of
the chip had no effect, so I was forced to look more carefully for a
cause.
It turns out that one of the 0603 capacitors installed near the USB
receptacle (C73, C74) was shorted, and removal of these 2 capacitors
restored USB function. I’m posting this description of my experiences
for
the benefit of others on the list.

I suspect that C73 and C74 (which I can’t find in various schematics and
BOMs for USRP1) were added in the hope that they would provide a path
for
electrostatic discharges, which is ironic. In my experience,
low-capacitance
avalanche breakdown devices are more suitable for this purpose.

I can highly recommend ChipQuik.

Paul M.

Marcus_DSLeech · May 19, 2009, 1:01am

Paul M. wrote:

the benefit of others on the list.

I suspect that C73 and C74 (which I can’t find in various schematics and
BOMs for USRP1) were added in the hope that they would provide a path for
electrostatic discharges, which is ironic. In my experience, low-capacitance
avalanche breakdown devices are more suitable for this purpose.

Paul,

Thanks for your analysis of the situation. C73 and C74 are not actually
capacitors, they are, as you say, electrostatic discharge protectors,
part number PESD0402-060. I have not seen any fail in the manor you
mention, but I suppose enough static could cause that.

Has anyone else seen this failure mode?

Matt

Marcus_DSLeech · May 19, 2009, 1:24am

Thanks for the clarification about C73, C74. The Raychem datasheet
doesn’t
say so directly, I’m guessing that these parts are some form of
varistor. In
my experience, varistors are less reliable than silicon clamping
devices,
both in terms of their dynamic clamping characteristics and their
ultimate
failure characteristics.

http://www.conformity.com/artman/publish/printer_211.shtml

http://www.nxp.com/acrobat_download/applicationnotes/AN10753_1.pdf

I work in a high humidity environment, and I’d be surprised if ESD was
the
cause of failure of these parts, anyway. I suspect that flexing of the
circuit board during USB plugging/unplugging caused cracking. The large
solder fillets on the parts may have contributed to the problem.

Paul M.

Marcus_DSLeech · May 19, 2009, 1:28am

Matt E. wrote:

Matt

Aren’t ESD protectors usually MOVs, and don’t MOVs usually fail by
shorting? They can absorb so many joules of ESD, and
after that they short out.

In live electrical power lines, this usually causes the breaker to
trip.

The other type are GDTs, which fail open. But they don’t fail very
often.

–
Marcus L.
Principal Investigator, Shirleys Bay Radio Astronomy Consortium

Thread scheduling for high-sensitivity, high-resolution FFTs

I’m sure that you longtime Python and gnuradio folks will consider this trivial, but I have a new bald spot. Any hints will be appreciated. Paul M.

I’m sure that you longtime Python and gnuradio folks will consider this
trivial, but I have a new bald spot. Any hints will be appreciated.
Paul M.