Losing samples in the flowgraph

Hi all

I think my program loses samples. If I use a file source followed by a
throttle (rate = 2 MSamples/s) as the input of my flowgraph and a vector
sink as the output, I see in the output vector that samples are lost
(every run, different data is lost). The CPU usage is only about 40 percent.

If I reduce the throttle rate to 30 percent, fewer or no samples are
lost.
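
For reference, a minimal sketch of the topology I am describing (only stock
GNU Radio blocks; the real flowgraph also contains my own blocks, and the
file name is just a placeholder):

from gnuradio import gr
import time

class test_flow(gr.top_block):
    def __init__(self):
        gr.top_block.__init__(self)
        src = gr.file_source(gr.sizeof_gr_complex, "Record1.dat", False)
        thr = gr.throttle(gr.sizeof_gr_complex, 2e6)   # 2 MSamples/s
        self.sink = gr.vector_sink_c()
        self.connect(src, thr, self.sink)

if __name__ == '__main__':
    tb = test_flow()
    tb.start()
    time.sleep(13)      # let the graph run for a while
    tb.stop()
    tb.wait()
    print(len(tb.sink.data()))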

Every test of my blocks was successful, and when I start the flowgraph no
errors occur.

I used this command:
sudo sysctl -w net.core.rmem_max=50000000

I modified the limits.conf file and added this line to the flowgraph script:
gr.enable_realtime_scheduling()
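
Roughly how I call it (a minimal sketch; the return-value check follows the
stock GNU Radio examples, and my_top_block is a placeholder for my actual
flowgraph class):

from gnuradio import gr

tb = my_top_block()
if gr.enable_realtime_scheduling() != gr.RT_OK:
    print("Warning: failed to enable realtime scheduling")
tb.run()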

I tested the graph on different computers (Ubuntu 10.10) and with the
latest git code. The problem always occurs.

I have run out of ideas as to what the problem could be. Has anybody had
the same problem?

Any help will be appreciated.

Thanks
Michael

On 07/24/2011 04:07 AM, Michael H. wrote:

Every test of my blocks was successful, and when I start the flowgraph no
errors occur.

This seems very unexpected.
Can you attach a python script that demonstrates the problem?

I used this command:
sudo sysctl -w net.core.rmem_max=50000000

Are you resizing the UDP socket buffer?

-josh

On 07/24/2011 02:03 PM, Josh B. wrote:

Given that the flow-graph in question uses a file source and a vector
sink, it seems unlikely that net.core.rmem_max has any impact on it.


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

On 07/25/2011 01:20 AM, Michael Höin wrote:

Thanks for all the replies.

This seems very unexpected.
Can you attach a python script that demonstrates the problem?

This is the flowgraph with the problem:
www.zhaw.ch/~hoim/Flow.py

You realize that you never mentioned in the previous email that you were
using the OFDM blocks. I thought something was fundamentally broken in
GNU Radio…

In any case, I wouldn't expect the throttle block and CPU usage to have
any effect on this flow graph. It's just a pure-simulation flow graph.
Perhaps there is a problem in the OFDM blocks. I will have to let the
author comment.

Are you resizing the UDP socket buffer?
Is this important? How can I do that?

Are you using UDP? That's only for UDP.

-josh

Thanks for all the replies.

This seems very unexpected.
Can you attach a python script that demonstrates the problem?

This is the flowgraph with the problem:
www.zhaw.ch/~hoim/Flow.py

With the command:
diff -u test1.txt test.txt | diffstat
after two runs I looked for the differences between the vector sinks.
Typically, blocks of a few thousand samples were lost at the output of
(self.ofdm_symbole, 0).
For completeness, here is the source code of my two blocks:
www.zhaw.ch/~hoim/howto_ofdm_symbol_cutter_cc.cc
www.zhaw.ch/~hoim/howto_framestart_detecter_cc.cc
This is the file for the file-source:
http://www.zhaw.ch/~hoim/Record1.dat

Are you resizing the UDP socket buffer?
Is this important? How can I do that?

I also tried “sudo nice -n -20 ./Flow.py”, but the problem was still the
same.
With the command “top” I see that memory use does not increase or anything
like that.

Thanks for your help.
Michael

Hi all

You realize that you never mentioned in the previous email that you were
using the OFDM blocks. I thought something was fundamentally broken in
GNU Radio…

I labeled my block “ofdm_symbol_cutter”, but it is not one of the standard
OFDM blocks. I wrote this block because I wanted to isolate the sample
loss problem.
I tested the flow with a UHD source and with the file source, so it’s not
only a simulation. The behaviour was the same every time. I think there is
a problem with the multi-threading, or I am misunderstanding something
fundamental.

In any case, I wouldn't expect the throttle block and CPU usage to have
any effect on this flow graph. It's just a pure-simulation flow graph.
Perhaps there is a problem in the OFDM blocks. I will have to let the
author comment.
Are you using UDP? That's only for UDP.

No, I don’t use UDP.

Thanks for your help.
Michael

Hi Marcus

Thanks for the reply.

In the example you gave earlier, using a file-source, you run the
flow-graph for 13 seconds, then call tb.stop(), then harvest the vector
sink. You then make the observation that there are “missing samples”.

Are you actually comparing samples, or simply observing that the number of
samples harvested from the vector sink is different on a run-to-run basis?

I absolutely agree with what you wrote, but I do compare samples.
After 13 seconds I write the samples (out of the vector sink) into a file
(test.txt). Then I restart the flow and save the samples to a different
file (test1.txt).
In the flow graph, these lines do this:

filename = "test.txt"
file = open(filename, 'w')
for i in result_data0:
    file.write(str(i.imag) + '\t' + str(i.real) + '\n')
file.close()

After these two steps I compare the files with the command:
diff -u test1.txt test.txt | diffstat

This shows me all the differences between the two files. I see that blocks
of a few thousand samples are missing from time to time inside the files.
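
(Purely illustrative, using the same file names as above: a minimal Python
sketch of the same comparison. The two captures may legitimately have
different lengths from run to run, but the common prefix should be
identical if no samples are lost in between.)

a = open("test.txt").readlines()
b = open("test1.txt").readlines()
n = min(len(a), len(b))
mismatch = next((i for i in range(n) if a[i] != b[i]), None)
print("lengths: %d vs %d, first mismatch in common prefix: %s"
      % (len(a), len(b), mismatch))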

I have no more ideas for how to solve this problem :(

Thanks for your help.
Michael

On Sun, Jul 24, 2011 at 01:07:36PM +0200, Michael Höin wrote:

I tested the graph on different computers (Ubuntu 10.10) and with the
latest git code. The problem always occurs.

I have run out of ideas as to what the problem could be. Has anybody had
the same problem?

Hi all,

I think there might be a fundamental problem with GNU Radio. I’ve seen
something like this myself:
When using a file_source w/o throttle as an input, the output can be
inconsistent. I saw this a while ago when demodulating pre-recorded
signals. I knew a file contained N bursts, but the number of bursts
found was random (and they were not always the same!).

My explanation was, simply put, that in the no-throttle mode, when every
block tries to do everything as quickly as possible, there are some
hiccups with the memory allocation, some race condition or something
occurs and samples are lost between blocks. If throttling is active,
the scheduler is forced to distribute processor time more evenly between
blocks, and this problem vanishes.

From one of Michael’s previous posts, I might be able to cook up a flow
graph which demonstrates this error.

MB


Karlsruhe Institute of Technology (KIT)
Communications Engineering Lab (CEL)

Dipl.-Ing. Martin B.
Research Associate

Kaiserstraße 12
Building 05.01
76131 Karlsruhe

Phone: +49 721 608-43790
Fax: +49 721 608-46071
www.cel.kit.edu

KIT – University of the State of Baden-Württemberg and
National Laboratory of the Helmholtz Association

On 07/25/2011 02:30 PM, Höin Michael (hoim) wrote:

I labeled my block “ofdm_symbol_cutter”, but it is not one of the standard
OFDM blocks. I wrote this block because I wanted to isolate the sample
loss problem.
I tested the flow with a UHD source and with the file source, so it’s not
only a simulation. The behaviour was the same every time. I think there is
a problem with the multi-threading, or I am misunderstanding something
fundamental.
In the example you gave earlier, using a file-source, you run the
flow-graph for 13 seconds, then call tb.stop(), then harvest the vector
sink. You then make the observation that there are “missing
samples”.

Are you actually comparing samples, or simply observing that the number of
samples harvested from the vector sink is different on a run-to-run basis?

If so, then this is entirely expected behaviour. Linux (and by
implication, GNU Radio on top of it) is not, in any way, shape or form, a
“hard real-time” operating system with guaranteed scheduling and
latency. The number of computations completed per unit time is not
entirely deterministic, and furthermore, when you go to sleep for
13 seconds, you aren’t necessarily guaranteed 13 seconds; it could be off
by a significant factor (at least, significant at the level of granularity
implied by 2 Msps). What that means is that the number of samples
delivered to your vector sink, when you call tb.stop() after 13 seconds of
sleeping, is going to be variable.


Marcus L.
Principal Investigator
Shirleys Bay Radio Astronomy Consortium

I solved the problem! (after 10 days :-))

I had not used the “noutput_items” variable in my “general_work” methods,
so my blocks produced an uncontrolled number of output samples :(
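
In case it helps anyone else, here is the rule my blocks violated, sketched
as a minimal pass-through block in Python (a hypothetical example; my real
blocks are C++, and this assumes a GNU Radio build with Python block
support, but the same rule applies to general_work() there):

import numpy
from gnuradio import gr

class passthrough(gr.basic_block):
    def __init__(self):
        gr.basic_block.__init__(self, name="passthrough",
                                in_sig=[numpy.complex64],
                                out_sig=[numpy.complex64])

    def general_work(self, input_items, output_items):
        # len(output_items[0]) plays the role of noutput_items: the block
        # must never produce more output items than this.
        n = min(len(input_items[0]), len(output_items[0]))
        output_items[0][:n] = input_items[0][:n]
        self.consume_each(n)   # how many input items were consumed
        return n               # how many output items were produced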

Sorry about the confusion I caused.
Thanks to all who helped!