Make test failure at qa_constellation_receiver

Gurdipe_D · April 13, 2012, 8:53am

Hi all,

I recently upgraded my gnuradio build to v3.5.3 on several computers.
On two machines running Ubuntu 11.10, make test fails at
qa_constellation_receiver, while on the other two, running Fedora 16,
all tests pass.

To investigate the problem, I added one line to the file
gr-digital/python/qa_constellation_receiver.py which just prints the
values of constellation, differential, and correct before the assert.
Then I ran the script
build/gr-digital/python/qa_constellation_receiver_test.sh. The output
was recorded and is attached below. Among the logs:

test.t41.log is from a Thinkpad T41 with Ubuntu 11.10 32bit installed,
test.t60.log is from a Thinkpad T60 with Ubuntu 11.10 32bit installed,
test.f16.log is from a HP 6531s with Fedora 16 x86_64 installed.

Hope these logs are helpful to diagnose the problem.

PS: As a side note, the output of my debugging line appears after the
test result in the logs, which is not the case when the output goes to
the screen. I think this is probably related to stdout buffering.
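
The ordering described in this PS is what one would typically expect when stdout is redirected to a file or pipe: Python then usually block-buffers stdout, whereas unittest's default runner reports results on stderr. The following is a minimal sketch of a debug helper that flushes after every line; `debug_print` is a hypothetical name and is not part of the QA script.

```python
import sys


def debug_print(message, stream=None):
    """Write one debug line and flush at once so it is not held in a buffer."""
    # Default to stderr, the stream unittest's default runner reports on.
    stream = sys.stderr if stream is None else stream
    stream.write(message + "\n")
    stream.flush()


if __name__ == "__main__":
    # Example line in the same shape as the debug output quoted in this thread.
    debug_print("constellation: <constellation psk (m=2)> differential: True correct: 1.0")
```

Running the test with `python -u` (unbuffered streams) is another way to keep the two outputs in order without changing the script.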

PS2: I once changed REQ_CORRECT to 0.7 on one Ubuntu machine, and the
test then passed. I wonder whether that is a valid fix.

alick

Alick_Z · April 13, 2012, 5:43pm

2012/4/12 Alick Z. [email protected]:

Weird. This test is probabilistic (not ideal really) but the chance
of it failing due to chance should be very small. If it’s
consistently failing on one machine but not another then there’s
something fishy going on, and lowering REQ_CORRECT, while making the
test pass, wouldn’t help finding what’s causing the problem.

First thing I’d do would be to repeat the test a bunch of times and
confirm you’re getting a consistently higher BER on one computer than
another. Then I’d do a complete uninstall and reinstall of gnuradio
on a computer having the issue. If the problem is still there then
it’s probably going to be an unpleasant bug to find.

Has anybody else seen this?

Cheers,
Ben

Alick_Z · April 13, 2012, 8:37pm

On Fri, Apr 13, 2012 at 11:41 AM, Ben R. [email protected] wrote:

Has anybody else seen this?

Cheers,
Ben

Alick,

I had seen that before, but it was a virtual machine installation of
11.10, 32-bit. I pinged Ben about it, but he didn’t see the same results
with a native install of 32-bit 11.10. I couldn’t track down where the
problem was occurring or why, so I chalked it up to a problem with the
VM system.

Again, is anyone else having an issue like this? It appears to be
something really sneaky since most of the qa_constellation tests pass
with exactly the same numbers and results as they are expected to but
1 or 2 strangely deviate.

Thanks,
Tom

Alick_Z · April 13, 2012, 8:37pm

On Fri, Apr 13, 2012 at 08:41:41AM -0700, Ben R. wrote:

it’s probably going to be an unpleasant bug to find.

Has anybody else seen this?

Yep, I’ve seen this: it works on my desktop but not on my Atom
netbook. Both run Ubuntu. However, I thought this bug was squashed a
while ago. I’ll dig out my netbook this weekend and see if it still
occurs.

MB

Karlsruhe Institute of Technology (KIT)
Communications Engineering Lab (CEL)

Dipl.-Ing. Martin B.
Research Associate

Kaiserstraße 12
Building 05.01
76131 Karlsruhe

Phone: +49 721 608-43790
Fax: +49 721 608-46071
www.cel.kit.edu

KIT – University of the State of Baden-Württemberg and
National Laboratory of the Helmholtz Association

Alick_Z · April 14, 2012, 11:21am

On Fri, 13 Apr 2012 13:20:01 -0700, Ben R. wrote:

It sounds like this is definitely a bug, but it’s hard for me to track
it down because I can’t replicate it. Does the bug still occur if you
set FREQUENCY_OFFSET = 0 in the test case?

Yes the bug still occurs.

I just wrote a simple script to run the test 50 times on the T60
(Ubuntu, GNU Radio 3.5.3), both before and after setting
FREQUENCY_OFFSET to 0. Every run failed (n_pass is 0/50 in both sets).
Within each set, the output of every run is almost identical except for
the reported test duration. The runs before the FREQUENCY_OFFSET change
are basically the same as test.t60.log, already attached. One of the
runs with FREQUENCY_OFFSET set to 0 is attached below. I guess the
output is nearly identical because the random seed is fixed.

I also ran the script on a Dell desktop with Fedora, and all 50 of 50
runs passed. I then noticed a line in the QA Python file saying that
seed 1234 fails. However, all 50 runs pass with seed 1234 on the Dell
desktop.
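
The script itself was not posted, so the following is only a sketch of the kind of repeat-run harness described above, written for a current Python 3. `count_passes` is a hypothetical name, and the command to run is supplied by the caller.

```python
import subprocess
import sys


def count_passes(command, runs):
    """Run `command` `runs` times; return how many runs exited with status 0."""
    passes = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            passes += 1
    return passes


if __name__ == "__main__":
    # Usage: python repeat_test.py <runs> <command> [args...]
    if len(sys.argv) >= 3:
        runs = int(sys.argv[1])
        n_pass = count_passes(sys.argv[2:], runs)
        print("n_pass is %d/%d" % (n_pass, runs))
```

For the case in this thread, the command would be `/bin/sh build/gr-digital/python/qa_constellation_receiver_test.sh` with 50 runs.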

alick

Alick_Z · April 14, 2012, 1:41pm

On Sat, Apr 14, 2012 at 05:20:13PM +0800, Alick Z. wrote:

I also ran the script on a Dell desktop with Fedora, and the result is
50/50 pass the test. Then I notice one line in the qa python file says
that seed 1234 fails. However, 50/50 are OK with seed 1234 on the Dell
desktop.

So,

I tried this on a native 32-bit Ubuntu 11.10, and it fails (it tells me
it’s using “Volk machine: sse3_32”).

I also noticed the line that says seed=1234 fails. Also, the seed is set
multiple times in the script. Whether or not this is the source of the
bug, it should be fixed, because the seed is reset somewhere in the guts
of the test. I’ll post a patch on patch-gnuradio. It eliminates the
problem on my machine, for some reason. If it does the same for you,
this might actually be the solution.
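
The patch is not shown in the thread, so the following only illustrates the general hazard of seeding more than once: a second `seed()` call on Python's module-level generator rewinds it, and the data drawn afterwards replays the data drawn before. The helper name `draw` is hypothetical, and the QA script's actual use of the generator may differ.

```python
import random


def draw(n):
    """Return n pseudo-random bits from the module-level generator."""
    return [random.randint(0, 1) for _ in range(n)]


random.seed(1234)
first = draw(8)

# A second seed() call deeper in the test rewinds the generator, so this
# "new" data is a replay of the data generated above.
random.seed(1234)
second = draw(8)

print(first == second)  # True: both draws start from the same seeded state
```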

MB

Alick_Z · April 14, 2012, 6:25pm

On Sat, Apr 14, 2012 at 04:40, Martin B. [email protected]
wrote:

I’ll post a patch on patch-gnuradio–this eliminates the
problem on my machine, for some reason. If it does the same for you,
this might actually be the solution.

This was applied on 3.5.3 maint and 3.6.0git master.

Johnathan

Alick_Z · April 14, 2012, 6:40pm

On Sat, 14 Apr 2012 13:40:35 +0200, Martin B. wrote:

I also noticed the line that says seed=1234 fails. Also, the seed is set
multiple times in the script. If this is the source of the bug or not,
it should be fixed, because the seed is reset somewhere in the guts of
the test. I’ll post a patch on patch-gnuradio–this eliminates the
problem on my machine, for some reason. If it does the same for you,
this might actually be the solution.

MB

I applied the patch and tested it. No failure, yeah!

However, with my debugging line, I can see this:

[…]
constellation: <constellation psk (m=32)> differential: True correct:
0.942307692308
constellation: <constellation psk (m=64)> differential: True correct:
0.772435897436
constellation: <constellation psk (m=2)> differential: True correct:
1.0
[…]

So why doesn’t the 0.77XXX on the second line cause the assert to fail?

alick

Alick_Z · April 15, 2012, 7:14am

It appears that each test case is being run twice: I think once to
generate the XML output, and once to generate output for stdout and
tell you whether it failed. The random number generator isn’t
reseeded at the start of each test, so the two runs can produce
different results. As a result, you can sometimes see debug statements
indicating that a test should fail while the output to stdout claims
that it passes. I’ve attached a patch that makes each test use its own
random number generator object, which seems a tidier way to do this and
makes the tests repeatable.
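
The attached patch is not reproduced here. As a rough sketch of the idea only, the toy code below compares a test body that draws from the shared module-level generator with one that builds its own `random.Random` object; the function names and the 10% error probability are assumptions made for illustration, not values from the QA script.

```python
import random


def simulate_correct_fraction(rng, n_symbols=1000, error_probability=0.1):
    """Toy stand-in for a test body: fraction of symbols 'received' correctly."""
    correct = sum(1 for _ in range(n_symbols) if rng.random() >= error_probability)
    return correct / float(n_symbols)


def run_with_shared_rng():
    # Module-level generator: its state carries over from whatever ran before,
    # so a second run of the same test continues the stream and may differ.
    return simulate_correct_fraction(random)


def run_with_own_rng(seed=0):
    # Fresh generator per test: the result depends only on the seed.
    return simulate_correct_fraction(random.Random(seed))


random.seed(0)
shared_runs = (run_with_shared_rng(), run_with_shared_rng())
own_runs = (run_with_own_rng(), run_with_own_rng())
print(own_runs[0] == own_runs[1])  # True: each run starts from the same state
```

With a private generator, running a test twice (for example once for the XML report and once for stdout) gives the same answer both times.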

Since there seem to be a fair number of choices of seed that produce
an error fraction of over 0.2, I’ve also increased the acceptable
error rate in the test to 0.3, as Alick initially suggested.
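
To make the threshold change concrete, here is a small worked check using the three `correct` values quoted earlier in the thread. It assumes the original limit was REQ_CORRECT = 0.8 (inferred from the "error fraction of over 0.2" remark) and that the assert compares `correct` against REQ_CORRECT with `>=`; the exact comparison in the QA script is not shown in this thread.

```python
# (label, fraction correct) pairs copied from the debug output quoted above.
OBSERVED = [
    ("psk (m=32)", 0.942307692308),
    ("psk (m=64)", 0.772435897436),
    ("psk (m=2)", 1.0),
]


def failing_at(req_correct):
    """Return the labels whose correct fraction falls below the threshold."""
    return [label for label, correct in OBSERVED if correct < req_correct]


print(failing_at(0.8))  # ['psk (m=64)']
print(failing_at(0.7))  # []
```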

Alick_Z · April 15, 2012, 7:33am

On Sat, Apr 14, 2012 at 22:13, Ben R. [email protected] wrote:

I’ve
attached a patch that make each test use it’s own random number
generating object which seems a tidier way to do this, and makes the
tests repeatable.

This will have to wait until morning, but I’ll get this on maint and
master.

Johnathan

Alick_Z · April 13, 2012, 10:20pm

2012/4/12 Alick Z. [email protected]:

It sounds like this is definitely a bug, but it’s hard for me to track
it down because I can’t replicate it. Does the bug still occur if you
set FREQUENCY_OFFSET = 0 in the test case?

Alick_Z · April 15, 2012, 11:48pm

On Sun, Apr 15, 2012 at 1:13 AM, Ben R. [email protected] wrote:

Since there seem to be a fair number of choices of seed that produce
an error fraction of over 0.2, I’ve also increased the acceptable
error rate in the test to 0.3, as Alick initially suggested.

Hey Ben and Martin,

Both patches have been applied. It fixed the issues on my 32-bit VM,
too, so it looks like we got it.

Thanks for the help!

Tom

Alick_Z · April 16, 2012, 5:15am

On Sat, 14 Apr 2012 22:13:50 -0700, Ben R. wrote:

Since there seem to be a fair number of choices of seed that produce
an error fraction of over 0.2, I’ve also increased the acceptable
error rate in the test to 0.3, as Alick initially suggested.

Issue solved too. Thanks all for your help!

alick

Alick_Z · August 22, 2013, 10:32pm

On Thu, Aug 22, 2013 at 7:27 AM, Curt K. [email protected] wrote:

I’m building gnuradio (version 3.7.0) for the first time, and the
qa_constellation_receiver test failed. I am building in a CentOS 6.4
64-bit virtual machine, running on a Lenovo Y500. What can I do to help
debug?

Try:

$ ctest -V -R qa_constellation_receiver

…to output the failure information.

Alick_Z · August 22, 2013, 11:11pm

[curt@localhost build]$ ctest -V -R qa_constellation_receiver
UpdateCTestConfiguration from
:/home/curt/Downloads/gnuradio-3.7.0/build/DartConfiguration.tcl
Start processing tests
UpdateCTestConfiguration from
:/home/curt/Downloads/gnuradio-3.7.0/build/DartConfiguration.tcl
Test project /home/curt/Downloads/gnuradio-3.7.0/build
Constructing a list of tests
Done constructing a list of tests
Changing directory into
/home/curt/Downloads/gnuradio-3.7.0/build/volk/lib
…
Changing directory into
/home/curt/Downloads/gnuradio-3.7.0/build/gr-digital/python/digital
143/172 Testing qa_constellation_receiver
Test command: /bin/sh
/home/curt/Downloads/gnuradio-3.7.0/build/gr-digital/python/digital/qa_constellation_receiver_test.sh
Test timeout computed to be: 9.99988e+06
Traceback (most recent call last):
  File "/home/curt/Downloads/gnuradio-3.7.0/gr-digital/python/digital/qa_constellation_receiver.py", line 174, in <module>
    gr_unittest.run(test_constellation_receiver, "test_constellation_receiver.xml")
  File "/home/curt/Downloads/gnuradio-3.7.0/gnuradio-runtime/python/gnuradio/gr_unittest.py", line 135, in run
    os.makedirs(path, 0750)
  File "/usr/lib64/python2.6/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: './.unittests/python'
– Process completed
***Failed
Changing directory into
/home/curt/Downloads/gnuradio-3.7.0/build/gr-atsc/lib
…
Changing directory into
/home/curt/Downloads/gnuradio-3.7.0/build/gr-wavelet/python/wavelet

0% tests passed, 1 tests failed out of 1

The following tests FAILED:
143 - qa_constellation_receiver (Failed)
Errors while running CTest

Alick_Z · August 22, 2013, 4:29pm

I’m building gnuradio (version 3.7.0) for the first time, and the
qa_constellation_receiver test failed. I am building in a CentOS 6.4
64-bit virtual machine, running on a Lenovo Y500. What can I do to help
debug?

171/172 Test #171: qa_fcd … Passed 0.16 sec
Start 172: qa_classify
172/172 Test #172: qa_classify … Passed 0.32 sec

99% tests passed, 1 tests failed out of 172

Total Test time (real) = 82.56 sec

The following tests FAILED:
143 - qa_constellation_receiver (Failed)
Errors while running CTest
make: *** [test] Error 8

Alick_Z · August 23, 2013, 1:10am

Here are the results. I get a segfault, same as the last time after I
fixed the ownership.

[curt@localhost build]$ ctest -V -R qa_constellation_receiver
UpdateCTestConfiguration from
:/home/curt/Downloads/gnuradio-3.7.0/build/DartConfiguration.tcl
Start processing tests
UpdateCTestConfiguration from
:/home/curt/Downloads/gnuradio-3.7.0/build/DartConfiguration.tcl
Test project /home/curt/Downloads/gnuradio-3.7.0/build
Constructing a list of tests
Done constructing a list of tests
Changing directory into
/home/curt/Downloads/gnuradio-3.7.0/build/volk/lib
…
Changing directory into
/home/curt/Downloads/gnuradio-3.7.0/build/gr-digital/python/digital
143/172 Testing qa_constellation_receiver
Test command: /bin/sh
/home/curt/Downloads/gnuradio-3.7.0/build/gr-digital/python/digital/qa_constellation_receiver_test.sh
Test timeout computed to be: 9.99988e+06
/home/curt/Downloads/gnuradio-3.7.0/build/gr-digital/python/digital/qa_constellation_receiver_test.sh:
line 7: 33977 Segmentation fault (core dumped) /usr/bin/python -B
/home/curt/Downloads/gnuradio-3.7.0/gr-digital/python/digital/qa_constellation_receiver.py
– Process completed
***Failed
Changing directory into
/home/curt/Downloads/gnuradio-3.7.0/build/gr-atsc/lib
…
Changing directory into
/home/curt/Downloads/gnuradio-3.7.0/build/gr-wavelet/python/wavelet

0% tests passed, 1 tests failed out of 1

The following tests FAILED:
143 - qa_constellation_receiver (Failed)
Errors while running CTest

Alick_Z · August 22, 2013, 11:36pm

On Thu, Aug 22, 2013 at 04:10:47PM -0500, Curt K. wrote:

gr_unittest.py", line 135, in run
os.makedirs(path, 0750)
File “/usr/lib64/python2.6/os.py”, line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 13] Permission denied: ‘./.unittests/python’
– Process completed
***Failed

I haven’t seen this specific error before, but did you do a ‘sudo make’
by accident anywhere? Perhaps chown-ing your build dir might help.
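
Before changing ownership, it can help to see which files are affected. The following is a minimal sketch (POSIX only) that lists entries under a directory that are not owned by the current user; `find_foreign_files` is a hypothetical helper, not a GNU Radio tool.

```python
import os
import sys


def find_foreign_files(root, uid=None):
    """Return paths below `root` whose owner differs from `uid`.

    `uid` defaults to the current user. The `root` directory itself is not
    checked, only the directories and files beneath it.
    """
    uid = os.getuid() if uid is None else uid
    foreign = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat() inspects symlinks themselves rather than their targets.
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return foreign


if __name__ == "__main__":
    # Usage: python find_foreign.py <directory>
    if len(sys.argv) == 2:
        for path in find_foreign_files(sys.argv[1]):
            print(path)
```

Pointing it at both the source tree and the build directory would show any files left behind by a stray `sudo make`.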

MB

Alick_Z · August 23, 2013, 12:23am

Yes, thanks. Somehow some of the files in the source had root ownership.
I changed ownership, re-built, and then got a different error message.
So I am starting the build from scratch again. Will let you know.

Alick_Z · August 23, 2013, 4:27pm

On Thu, Aug 22, 2013 at 7:09 PM, Curt K. [email protected]
wrote:

Constructing a list of tests
line 7: 33977 Segmentation fault (core dumped) /usr/bin/python -B
0% tests passed, 1 tests failed out of 1

The following tests FAILED:
143 - qa_constellation_receiver (Failed)
Errors while running CTest

Granted, the constellation_receiver QA code is one of the more complex
tests we have, but I’ve never heard of it segfaulting before.

Can you just run ‘ctest -V -R constellation_receiver’ again, maybe a
couple of times, to see if it always segfaults and if it provides more
detailed information about what’s going on?
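
One way to check whether the crash happens on every run is to repeat the command and classify each exit status. This is a sketch assuming a POSIX system and Python 3; `describe_exit` and `run_repeatedly` are hypothetical helpers, and the `128 + N` rule is a shell convention rather than a guarantee.

```python
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode == 0:
        return "passed"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        return "killed by signal %d" % -returncode
    if returncode > 128:
        # Shells conventionally report a child killed by signal N as 128 + N.
        return "shell reported signal %d" % (returncode - 128)
    return "failed with exit status %d" % returncode


def run_repeatedly(command, runs):
    """Run `command` `runs` times and return one description per run."""
    outcomes = []
    for _ in range(runs):
        result = subprocess.run(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        outcomes.append(describe_exit(result.returncode))
    return outcomes


if __name__ == "__main__":
    # Usage: python rerun.py <runs> <command> [args...]
    if len(sys.argv) >= 3:
        for outcome in run_repeatedly(sys.argv[2:], int(sys.argv[1])):
            print(outcome)
```

A segmentation fault is signal 11 on Linux, so it would show up as "killed by signal 11" when the Python test is run directly, or typically as "shell reported signal 11" when it is run through the wrapper shell script.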

–
Tom