Stable on Linux, many crashes on Windows and OS X

Hello,

I want my program to be platform independent (it should work on Linux,
OS X and Windows).
Although it runs stably on Linux (Xubuntu 14.04 amd64), on both Windows
(Win 7 x64) and OS X (10.9) it crashes after about a minute.
For OS X I followed the build guide provided (installed using ports,
then uninstalled and compiled from source).

This is the console output:


Mac OS; Clang version 5.1 (clang-503.0.38); Boost_105500;
UHD_003.007.001-0-unknown

gr-osmosdr v0.1.1-10-g9cb023b0 (0.1.2git) gnuradio
3.7.5git-127-g41d08443
built-in source types: file fcd rtl rtl_tcp uhd rfspace
Using device #0 Realtek RTL2838UHIDIR SN: 00000093
Found Elonics E4000 tuner
Using Volk machine: avx_64_mmx_orc
/Users/stefan/.gnuradio/prefs/vmcircbuf_default_factory: No such file or
directory
vmcircbuf_createfilemapping: createfilemapping is not available
gr::vmcircbuf_sysv_shm: shmat(1): Too many open files
gr::vmcircbuf_sysv_shm: shmat(1): Too many open files
gr::vmcircbuf_sysv_shm: shmat(1): Too many open files
gr::vmcircbuf_sysv_shm: shmget(1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget(1): Invalid argument
gr::vmcircbuf_sysv_shm: shmget(1): Invalid argument
gr::vmcircbuf_mmap_shm_open: mmap or shm_open is not available
Segmentation fault: 11

----- Other example -----

Mac OS; Clang version 5.1 (clang-503.0.38); Boost_105500;
UHD_003.007.001-0-unknown

gr-osmosdr v0.1.1-10-g9cb023b0 (0.1.2git) gnuradio
3.7.5git-127-g41d08443
built-in source types: file fcd rtl rtl_tcp uhd rfspace
Using device #0 Realtek RTL2838UHIDIR SN: 00000093
Found Elonics E4000 tuner
Using Volk machine: avx_64_mmx_orc
VOLK: tried to free a non-VOLK pointer
Segmentation fault: 11


Sometimes “VOLK: tried to free a non-VOLK pointer” does not result in a
crash and the program keeps running normally.

For Windows I compiled GNU Radio using Visual Studio 2012 (x64 build).
It's the same here: it works fine for about a minute, then it crashes.
When I debug with Visual Studio, it shows that the crash occurred in
the free function (I cannot see the call stack).

Interestingly, on Linux it uses the Volk machine “avx_64_mmx”, but on Mac OS
X “avx_64_mmx_orc” (on Windows I don't know), although it's the same
computer. Is there a difference between these two, and could
“avx_64_mmx_orc” be the reason for the crashes? Can I force GNU Radio to
use another Volk machine? Any other ideas?

Best regards
Stefan

Hi Stefan - Can you put your program somewhere and post a link to it
(or, if it is an example, tell us which one) – and the command line you
use – so that others can try it out? We might not have your exact
hardware (computer, SDR device), but we can try with what we have and
see if we get similar results. - MLD

ps> When doing the “compile from source” on OSX, did you do “make test”
after doing “make install” to verify that all was well? If not, give
that a try. You still need to install first on OSX because of the way
testing finds libraries – it’s on my queue to fix, but I have other
higher priority items right now.

On Aug 2, 2014, at 8:34 PM, Stefan O. wrote:

I want my program to be platform independent (it should work on Linux,
OS X and Windows).
Although it runs stably on Linux (Xubuntu 14.04 amd64), on both Windows
(Win 7 x64) and OS X (10.9) it crashes after about a minute.
For OS X I followed the build guide provided (installed using ports,
then uninstalled and compiled from source).

Michael D., OSX Programmer
Ettus R. Technical Support
Email: [email protected]
Web: http://www.ettus.com

Hi Michael,
thank you very much for your help. I've attached the program in its
current state (messy pre-alpha code) to this mail. It's a GUI
application (wxWidgets). On my system it compiles with build_osx (it's
just one line and probably won't work on other systems without editing).
After the program has started, choose File->Open Transmitter Configuration
and select example.csv. This starts GNU Radio, and a timer loop then keeps
changing the frequency of the source and of the Frequency Xlating filter.
On Linux I tested it for about two hours without a crash.
Of course there could also be something wrong with my program that just
doesn't cause any problem on Linux.

It uses the OsmoSDR source, so other devices should work as well. Note that
the sample rate is currently hardcoded in the source (2.4 MHz).

I did run make test on OS X and it completed with 100% pass.

Best regards
Stefan

On 03.08.2014 03:37, Michael D. wrote:

Hi Stefan - OK; well if “make test” worked on OSX for you then the GR
install is probably good to go. No guarantees. If you uninstall your
OO-MacPorts GR install and use gnuradio-devel, then you should get
the same results (if not better). I’m not sure why you would want an
OO-MacPorts GR install unless you wanted to change GR itself.

After tweaking “build_osx” for my setup, everything builds nicely.
Running “./batmon” I see an empty window and basic menus; nothing else.
I can leave it running (in gdb) for a long time without it crashing. I
don’t know what to do with it otherwise. I have an Ettus B210 hooked
up.

If I select “batmon” -> “About” I see the about box, which when I close
crashes the executable. Looking through the code in main.cpp,
BatMonFrame::OnAbout calls BatMonFrame::StopReceiver, which in turn
calls “gr_tb->stop” even though gr_tb has not been initialized (I set a
breakpoint where it is set, and the code does not get there before this
call). Probably best to check gr_tb before using it, just in case, in
all the methods that call it. Also, I’m not sure why you would stop
processing -after- the about box has been closed; maybe beforehand makes
more sense?
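
A minimal sketch of the kind of guard suggested here, assuming the flowgraph handle is a shared pointer that stays empty until a configuration has been loaded. FakeTopBlock and ReceiverController are hypothetical stand-ins, not names from batmon or GNU Radio; in the real program the member would be gr_tb.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the flowgraph; the real program would hold a
// GNU Radio top block here instead.
struct FakeTopBlock {
    bool running = false;
    void start() { running = true; }
    void stop() { running = false; }
};

class ReceiverController {
public:
    // Create the flowgraph on first use and start it.
    void StartReceiver() {
        if (!tb_) {
            tb_ = std::make_shared<FakeTopBlock>();
        }
        assert(tb_ && "flowgraph must exist before start()");
        tb_->start();
    }

    // Safe to call at any time. Returns false when there was nothing to
    // stop, instead of dereferencing an empty pointer.
    bool StopReceiver() {
        if (!tb_) {
            return false;
        }
        tb_->stop();
        return true;
    }

    bool IsRunning() const { return tb_ && tb_->running; }

private:
    std::shared_ptr<FakeTopBlock> tb_;  // empty until StartReceiver() runs
};
```

With this shape, a menu handler can call StopReceiver() before any configuration has been opened and nothing happens.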

If you let me know what else to test, I’ll try that too. Otherwise,
without actual functionality the basic GUI seems to “work”. - MLD


Hi Michael,
thank you very much for testing! The code is in pre-alpha state, and
that the About box stops GnuRadio was just a quick-and-dirty hack for
testing. Normally About should not do anything else than showing the
About Box.

Please try again and open the included example.csv with the program.
That creates and starts the GnuRadio flowgraph and continuously changes
frequency of the hardware and the frequency Xlating filter. You should
see four symbolized batteries, with RF level (and counter since last
update when there is a signal) written above, frequency and
name/description below. In case there is no signal they will be dark
grey, otherwise light grey (and in case the program detects a battery
signal it can decode they will be green/yellow/red depending on the
battery state).

I think I was just able to isolate the problem:

When I only call set_center_freq(frequency_offset) on one Frequency
Xlating filter, there are no crashes. You can test that by removing the
first or second line of example.csv (so that two frequencies never fit
within the 2.4 MHz bandwidth). The other Xlating filters still exist and
are active in the flowgraph, but set_center_freq is never called on them.
The program is stable then.

Adding just this line:

if ((freq_len + 1) < bat_decoders.size())
    bat_decoders[freq_len + 1]->SetFreqOffset((int)(bat_blocks[freq_pos]->GetFreq()) - center_freq);

to the for-loop in the very last function (BatMonFrame::OnTimer) makes
the program unstable again.
It does nothing other than set the frequency of the next Frequency
Xlating filter to the same value (if there is one).

I don't see how this can be a bug in my program; I think it has to be a
bug in GNU Radio (it is not predictable, and it is triggered by doing the
same thing on two different instances of the same class within a very
short time, so my guess would be a threading problem). Can you confirm that?

I tested this on two different systems (Quad-Core Haswell and Dual-Core
Ivy Bridge mobile), same result for both.

Thank you very much,
Stefan

On 04.08.2014 15:16, Michael D. wrote:

Hi Stefan - OK; cool. Loading up that example, it runs for quite a long
time as you describe, before crashing. Here’s the backtrace from gdb
when it crashes. So, yes, it looks like either freq_xlating_fir_filter
has a bug or you’re not using it correctly; I haven’t had time to look
into it further yet. I’d recommend you poke around some more in the
freq_xlating_fir_filter and make sure your usage is correct. If you get
nowhere, ping the list with the specifics of what you’ve tried including
a code-snippet. - MLD
{{{
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000301000009
[Switching to process 75636 thread 0x13603]
0x0000000101dc4c5b in volk_free ()
(gdb) bt
#0 0x0000000101dc4c5b in volk_free ()
#1 0x0000000101208591 in gr::filter::kernel::fir_filter_ccc::set_taps
()
#2 0x00000001012185bf in
gr::filter::freq_xlating_fir_filter_ccf_impl::build_composite_fir ()
#3 0x0000000101218764 in
gr::filter::freq_xlating_fir_filter_ccf_impl::work ()
#4 0x0000000100e5af79 in gr::sync_decimator::general_work ()
#5 0x0000000100e2db00 in gr::block_executor::run_one_iteration ()
#6 0x0000000100e6901d in gr::tpb_thread_body::tpb_thread_body ()
#7 0x0000000100e584be in gr::tpb_container::operator() ()
#8 0x0000000100e582d4 in
gr::thread::thread_body_wrapper<gr::tpb_container>::operator() ()
#9 0x0000000100df9ae6 in boost::function0<void>::operator() ()
#10 0x0000000101f8f625 in boost::(anonymous namespace)::thread_proxy ()
#11 0x00007fff9940c772 in _pthread_start ()
#12 0x00007fff993f91a1 in thread_start ()
}}}


Hi Stefan - You’re welcome; glad you found a work-around that fixes the
issue!

I’ll let others speak to whether Volk is thread-safe.

I can say that the scheduler is the same no matter the OS, but the
actual thread scheduling can be worlds different on different OSs. Thus,
a program might run stably on Linux while the same program is unstable
on OSX and Windows; it can happen.

Good luck with your GR work! - MLD


On Tue, Aug 5, 2014 at 3:02 PM, Stefan O. wrote:

I think it's definitely a bug in Volk, or is Volk not supposed to be
thread-safe?
Can you confirm that bug?

Why it worked on Linux before, I'm not sure; either other code in
GNU Radio is used or the scheduler of the OS is somehow different, so
that the problem never occurred.

Thank you very much
Stefan

Stefan,

Great, that explains it. On Linux, you’re using posix_memalign, but on the
other OSes, you must be falling through that into the other allocation
engine, which does not have any thread safety but really should.

I’ve created ticket #710 and will get to it soon.

Tom

Hi Michael,

I was able to fix the problem with an extraordinarily ugly hack:

I added a global mutex to GNU Radio and use it to protect
gr::filter::freq_xlating_fir_filter_ccf_impl::build_composite_fir, so
only one instance at a time can call
gr::filter::kernel::fir_filter_ccc::set_taps.
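
For readers following along, this kind of work-around amounts to serializing a routine that is not safe to run from several block threads at once. A minimal sketch, assuming a process-wide std::mutex; rebuild_taps_unsafe, rebuild_taps_serialized, and run_demo are hypothetical names, and the counter merely stands in for the shared state touched on the real set_taps path:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state that is not protected by itself.
static int g_unsafe_counter = 0;

// Stand-in for a routine that must not run concurrently.
static void rebuild_taps_unsafe() { ++g_unsafe_counter; }

// One mutex for the whole process: crude, but it serializes all callers.
static std::mutex g_set_taps_mutex;

void rebuild_taps_serialized() {
    std::lock_guard<std::mutex> lock(g_set_taps_mutex);
    rebuild_taps_unsafe();
}

// Hammer the serialized routine from several threads and return the
// final counter value.
int run_demo(int n_threads, int iterations) {
    assert(n_threads > 0 && iterations >= 0);
    g_unsafe_counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i) {
                rebuild_taps_serialized();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return g_unsafe_counter;
}
```

Because every increment happens under the lock, run_demo(4, 10000) ends at exactly 40000. The cost is that all filter instances wait on each other, which is why this is a hack and not a fix.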

It is now stable on both OS X and Windows. I think it's definitely a bug in
Volk, or is Volk not supposed to be thread-safe?
Can you confirm that bug?

Why it worked on Linux before, I'm not sure; either other code in
GNU Radio is used or the scheduler of the OS is somehow different, so
that the problem never occurred.

Thank you very much
Stefan

On 05.08.2014 15:58, Michael D. wrote:

Yes, it's OS X 10.9, but I think I compiled with 10.7 compatibility. I
will check what happens when I recompile with the other allocation
engine removed.

On 05.08.2014 21:27, Michael D. wrote:

Confirmed, it's the allocator:
Recompiling with the other allocation engine removed did not work.
Removing my hack and compiling with posix_memalign: stable.
Thank you very much
Stefan

On 05.08.2014 21:54, Stefan O. wrote:

On Aug 5, 2014, at 3:17 PM, Tom R. wrote:

Great, that explains it. On Linux, you’re using posix_memalign, but on the
other OSes, you must be falling through that into the other allocation
engine, which does not have any thread safety but really should.

I’ve created ticket #710 and will get to it soon.

If Stefan is using OSX 10.8 or 10.9 then he actually is using
posix_memalign – Apple added that function into 10.8+, and I’ve
confirmed that GR uses it, too. - MLD


Hi Stefan - I’m guessing that the issue Tom created yesterday <
http://gnuradio.org/redmine/issues/710 > applies, yes? Was the change
made in your (local; not GR) code, to use posix_memalign instead of some
other method of memory allocation? I guess I’m not entirely clear on where
the issue actually was … but I’m glad you got the bug figured out
and/or can work around it without a hack. - MLD

On Aug 5, 2014, at 11:19 PM, Stefan O. wrote:

Confirmed, it's the allocator:
Recompiling with the other allocation engine removed did not work.
Removing my hack and compiling with posix_memalign: stable.
Thank you very much


Hi Michael,
yes, I have not modified my program at all:
I added an #error “Broken allocation engine” to volk_malloc.c at the
beginning of the other allocation engine. When I tried to recompile, the
compiler stopped with my error message. As I did not re-run cmake, I'm
sure this engine was also used before.
As the next step I removed my mutex hack from freq_xlating_fir_filter and
changed the #if for the posix_memalign test in volk_malloc.c to #if 1; in
other words, posix_memalign is always used. It compiled without an error
and runs stably.
So I'm pretty sure bug #710 is the reason.

The only thing I'm not sure about is this: you said you verified that
posix_memalign is used by GNU Radio on OS X. I checked my build process;
I did not add any OS X 10.7 compatibility option, I used exactly the
cmake line from the guide:
http://gnuradio.org/redmine/projects/gnuradio/wiki/MacInstall and
posix_memalign was not used.
I think either the #if clause in volk_malloc.c needs an addition for OS
X >= 10.8, or the cmake file needs to be modified.

Best regards
Stefan

On 06.08.2014 20:26, Michael D. wrote:

On Aug 6, 2014, at 7:08 PM, Michael D. wrote:

I will look into the volk_malloc you mention.

I just issued GR pull request 256 “volk: add check for posix_memalign” <
https://github.com/gnuradio/gnuradio/pull/256 > that fixes the use of
“posix_memalign” by checking for it in CMake and then using it if found
in volk_malloc.c – and, thus, is OS agnostic. This will fix things for
OSX 10.8+, maybe 10.7 too (haven’t checked). But, the issue still
remains for OSX 10.6- … which is what GR bug 710 is about <
http://gnuradio.org/redmine/issues/710 >. I’ll leave that to Tom unless
he wants me to fix it. - MLD
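
A rough sketch of the idea behind such a check, not the code from the pull request: use posix_memalign when the build system reports it, otherwise fall back to an over-allocating scheme that keeps no shared bookkeeping. HAVE_POSIX_MEMALIGN, aligned_alloc_sketch, and aligned_free_sketch are hypothetical names; the macro defaults to 1 here only so the sketch compiles on a POSIX system.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>  // posix_memalign (POSIX), malloc, free

// Assumption: a configure step would normally define this macro.
#ifndef HAVE_POSIX_MEMALIGN
#define HAVE_POSIX_MEMALIGN 1
#endif

// alignment must be a power of two and a multiple of sizeof(void*).
void* aligned_alloc_sketch(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
#if HAVE_POSIX_MEMALIGN
    void* ptr = nullptr;
    // posix_memalign returns 0 on success and an error number otherwise.
    return posix_memalign(&ptr, alignment, size) == 0 ? ptr : nullptr;
#else
    // Over-allocate, round up to the alignment, and remember the raw
    // pointer in the slot just before the aligned address.
    void* raw = malloc(size + alignment + sizeof(void*));
    if (raw == nullptr) {
        return nullptr;
    }
    uintptr_t start = reinterpret_cast<uintptr_t>(raw) + sizeof(void*);
    uintptr_t aligned =
        (start + alignment - 1) & ~static_cast<uintptr_t>(alignment - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
#endif
}

void aligned_free_sketch(void* ptr) {
#if HAVE_POSIX_MEMALIGN
    free(ptr);  // memory from posix_memalign is released with free()
#else
    if (ptr != nullptr) {
        free(reinterpret_cast<void**>(ptr)[-1]);  // recover the raw pointer
    }
#endif
}
```

Neither branch touches global state, so both are as thread-safe as the underlying malloc and free. Building with -DHAVE_POSIX_MEMALIGN=0 exercises the fallback path.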


On Aug 6, 2014, at 6:50 PM, Stefan O. wrote:

yes, I have not modified my program at all:
I added an #error “Broken allocation engine” to volk_malloc.c at the
beginning of the other allocation engine. When I tried to recompile, the
compiler stopped with my error message. As I did not re-run cmake, I'm
sure this engine was also used before.
As the next step I removed my mutex hack from freq_xlating_fir_filter and
changed the #if for the posix_memalign test in volk_malloc.c to #if 1; in
other words, posix_memalign is always used. It compiled without an error
and runs stably.
So I'm pretty sure bug #710 is the reason.

Very good. Thanks for that thorough testing.

The only thing I'm not sure about is this: you said you verified that
posix_memalign is used by GNU Radio on OS X. I checked my build process;
I did not add any OS X 10.7 compatibility option, I used exactly the
cmake line from the guide:
http://gnuradio.org/redmine/projects/gnuradio/wiki/MacInstall and
posix_memalign was not used.
I think either the #if clause in volk_malloc.c needs an addition for OS
X >= 10.8, or the cmake file needs to be modified.

Ah; different beast. I was talking about the “posix_memalign” used by
gnuradio-runtime. GR provides a coarse emulation of this function if
not provided by the OS. You can verify whether the GR install uses this
function or not on OSX via “nm -a
/opt/local/lib/libgnuradio-runtime.dylib | grep memalign” and it should
return nothing for OSs that provide the function; it should return
“posix_memalign” with some other stuff for OSs that do not provide the
function.

I will look into the volk_malloc you mention. Volk is designed to be
separable from GNU Radio, and so it provides its own OS interface,
different from GR’s. Thanks for the pointer! - MLD