On Jan 12, 2007, at 3:32 PM, MenTaLguY wrote:
On Fri, 2007-01-12 at 05:37 +0900, Young H. wrote:
On Jan 10, 2007, at 1:39 PM, MenTaLguY wrote:
I’ve given up trying to build Ruby with dmalloc support now that I’ve
learned that MacOS X has built-in support for dmalloc-like memory
debugging.
Have you gotten any useful reports from the memory debugging facility?
Yes, I have. It showed that huge amounts of memory (500MB in a matter
of minutes) were being used by the realloc() call in
rb_thread_save_context. The call stack looks something like this:
rb_ary_collect (or rb_ary_each in half the cases)
rb_yield
...
rb_callcc
rb_thread_save_context
realloc
(Incidentally, the call sequence rb_thread_schedule →
rb_thread_save_context wasn’t eating up memory.)
I got the same behavior with and without fastthread.
I finally tracked down the memory leak, and it’s in
SyncEnumerator#each rather than in any thread synchronization class.
If I refrain from using SyncEnumerator, then my program’s memory
usage holds steady at around 33MB. Sorry for the wild goose chase,
but Mutex & company definitely have a bad reputation, and they seemed
the most likely candidates. I still need to investigate why exactly
I’m getting such poor behavior from SyncEnumerator#each. I did
notice that at least one other person has had this problem and
reported it to ruby-talk:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/105936
Assuming it’s a fastthread issue, I’m a little suspicious of Queues.
Where and how does your code use them? If we can narrow it down to a
specific class, then it is easier to derive a simple test case.
Even though my memory leak problem seems to be resolved, I still want
to help you to diagnose the crash with fastthread. My app makes
heavy use of threading, so having faster thread synchronization would
be useful for me.
I can now say that the crash appears to happen only when I use
fastthread. As I mentioned, there are two types of crashes: a
segfault at exit and an “rb_gc_mark(): unknown data type” error. I
can give you some more information about the former kind of crash. I
can easily reproduce it (and I’ll try to create a small program to
reproduce it later), and I’ve run my app with memory corruption
detection turned on in MacOS X’s malloc. As far as malloc is
concerned, there are NO heap corruptions, overruns, or underruns. I
even tried with MacOS X’s very aggressive libgmalloc (which puts
unwritable virtual memory pages before or after an allocated block),
and also found no heap overruns or underruns.
The segfault at exit happens when mutex objects are finalized by the
GC. Here’s what I get in GDB when I start up my app, wait for it to
do a small amount of work (just enough to exercise fastthread a bit),
and then halt it with ^C, forcing the finalizers to run (note that
the app crashes on normal exit() as well, not just when forced to
quit with SIGINT):
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000005
0x001bb2f8 in free_entries (first=0x1) at fastthread.c:74
74 next = first->next;
(gdb) bt
#0 0x001bb2f8 in free_entries (first=0x1) at fastthread.c:74
#1 0x001bb368 in finalize_list (list=0x603a74) at fastthread.c:85
#2 0x001bb870 in finalize_mutex (mutex=0x603a70) at fastthread.c:227
#3 0x001bc550 in finalize_queue (queue=0x603a70) at fastthread.c:562
#4 0x001bc5b4 in free_queue (queue=0x603a70) at fastthread.c:572
#5 0x0002c8bc in rb_gc_call_finalizer_at_exit () at gc.c:1884
#6 0x00005e5c in ruby_finalize_1 () at eval.c:1549
#7 0x00006048 in ruby_cleanup (ex=1) at eval.c:1584
#8 0x00006274 in ruby_stop (ex=6) at eval.c:1615
#9 0x00006348 in ruby_run () at eval.c:1636
#10 0x00002bdc in main (argc=2, argv=0xbffff874, envp=0xbffff880) at main.c:46
(gdb) info locals
next = (Entry *) 0x0
(gdb) up
#1 0x001bb368 in finalize_list (list=0x603a74) at fastthread.c:85
85 free_entries(list->entry_pool);
(gdb) p *list
$1 = {
  entries = 0x6040b0,
  last_entry = 0x0,
  entry_pool = 0x1,
  size = 0
}
(gdb) up
#2 0x001bb870 in finalize_mutex (mutex=0x603a70) at fastthread.c:227
227 finalize_list(&mutex->waiting);
(gdb) p *mutex
$2 = {
  owner = 6308016,
  waiting = {
    entries = 0x6040b0,
    last_entry = 0x0,
    entry_pool = 0x1,
    size = 0
  }
}
(gdb) p/x mutex->owner
$3 = 0x6040b0
(gdb) up
#3 0x001bc550 in finalize_queue (queue=0x603a70) at fastthread.c:562
562 finalize_mutex(&queue->mutex);
(gdb) p *queue
$4 = {
  mutex = {
    owner = 6308016,
    waiting = {
      entries = 0x6040b0,
      last_entry = 0x0,
      entry_pool = 0x1,
      size = 0
    }
  },
  value_available = {
    waiting = {
      entries = 0x0,
      last_entry = 0x0,
      entry_pool = 0x0,
      size = 0
    }
  },
  space_available = {
    waiting = {
      entries = 0x0,
      last_entry = 0x0,
      entry_pool = 0x0,
      size = 0
    }
  },
  values = {
    entries = 0x0,
    last_entry = 0x0,
    entry_pool = 0x0,
    size = 0
  },
  capacity = 0
}
(gdb)
The invalid values in fastthread’s mutex object are similar to what
we’ve seen in the 2nd type of crash (“rb_gc_mark(): unknown data
type”). I’ll try to create a test program to reproduce this crash at
exit; since the corruption appears similar, that program will
hopefully also be useful for diagnosing the 2nd type of crash.
–Young