Mem leak without add_heap()?

unknown · August 28, 2006, 12:37am

Okay. I did a little simple hacking on a 1.8.4 instance of Ruby so that
it prints some debugging info to stdout for various GC/memory-related
actions.

Using the info I gathered from that, I tuned a real application of mine
that consistently leaks a small amount of RAM despite having no object
leaks. I have tested that by setting up a signal handler with which I
can have it dump complete object counts at any time. RAM usage grows
slowly, but fairly deterministically: given a certain number of units of
work that I ask of the code, it increases by a predictable amount and
never seems to go back down, even across very long runtimes (I recently
killed and restarted some processes that had been running since sometime
in 2005), yet the object counts stay the same. So that memory use is
coming from somewhere else.
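
A minimal sketch of that kind of signal handler, for anyone who wants to
run the same check. This is not the code from the application itself:
the method names object_counts and dump_object_counts and the choice of
SIGUSR1 are assumptions made for illustration, and it assumes a
Unix-like system.

```ruby
# Tally live objects by class. ObjectSpace.each_object(Object) walks
# every object the interpreter currently considers alive.
def object_counts
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Print the tally, largest count first, to the given IO (stderr by default).
def dump_object_counts(io = $stderr)
  object_counts.sort_by { |_klass, n| -n }.each do |klass, n|
    io.puts "#{n}\t#{klass}"
  end
end

# Hypothetical choice of signal: dump the counts whenever SIGUSR1 arrives.
Signal.trap("USR1") { dump_object_counts }
```

With this in place, `kill -USR1 <pid>` prints the counts without stopping
the process, so two dumps taken many units of work apart can be compared
directly.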

Anyway, by manually calling GC.start() at a modest interval within the
application, I can prevent rb_newobj() from ever encountering an empty
freelist and having to call garbage_collect() itself. By doing so, the
freed count stays very consistent and well above FREE_MIN, so add_heap()
is never invoked there. I also put debugging output on each of the other
two locations where add_heap() can be called. It never is.
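
Roughly, the periodic collection looks like the sketch below. The method
name run_units, the constant GC_INTERVAL, and its value of 100 are
assumptions for illustration; the real interval has to be tuned against
how quickly the application allocates.

```ruby
# Hypothetical interval: force a collection after this many units of work.
GC_INTERVAL = 100

# Process each unit with the given block and call GC.start every
# gc_interval units, so the interpreter is less likely to run out of
# free slots between collections. Returns the number of forced collections.
def run_units(units, gc_interval = GC_INTERVAL)
  collections = 0
  units.each_with_index do |unit, i|
    yield unit
    if (i + 1) % gc_interval == 0
      GC.start
      collections += 1
    end
  end
  collections
end
```

For example, `run_units(jobs) { |job| job.call }` would process an array
of callables named jobs (also hypothetical) and collect after every 100th
one.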

However, despite this, RAM usage of the process continues to creep
upward. So, why? A leak somewhere else, in some usage of
ruby_xmalloc/ruby_xcalloc/ruby_xrealloc?

Thanks,

Kirk H.

unknown · August 28, 2006, 2:05am

On Mon, 28 Aug 2006 [email protected] wrote:

Anyway, by manually calling GC.start() at a modest interval within the
application, I can prevent rb_newobj() from ever encountering an empty
freelist and having to call garbage_collect() itself. And by doing so the
freed count stays very consistent and well above FREE_MIN, so add_heap is
never invoked there. I also put debugging output on each of the other two
locations add_heap() can be called. It never is.

However, despite this, RAM usage of the process continues to creep upward.
So, why? A leak somewhere else, in some usage of
ruby_xmalloc/ruby_xcalloc/ruby_xrealloc?

I’ve been using valgrind on my leaking code, and so far:

==31968== 21,364 bytes in 871 blocks are possibly lost in loss record 23 of 38
==31968== at 0x401A6C2: malloc (vg_replace_malloc.c:149)
==31968== by 0x806A2B5: ruby_xmalloc (gc.c:122)
==31968== by 0x805F41D: scope_dup (eval.c:7971)
==31968== by 0x805FB10: proc_alloc (eval.c:8254)
==31968== by 0x805FBF4: proc_s_new (eval.c:8289)
==31968== by 0x8065F86: call_cfunc (eval.c:5550)
==31968== by 0x805B65E: rb_call0 (eval.c:5692)
==31968== by 0x805BECC: rb_call (eval.c:5920)
==31968== by 0x80575FD: rb_eval (eval.c:3383)
==31968== by 0x8056B21: rb_eval (eval.c:3109)
==31968== by 0x8056FC9: rb_eval (eval.c:3551)
==31968== by 0x805B987: rb_call0 (eval.c:5826)
==31968== by 0x805BECC: rb_call (eval.c:5920)
==31968== by 0x80575FD: rb_eval (eval.c:3383)
==31968== by 0x805B987: rb_call0 (eval.c:5826)
==31968== by 0x805BECC: rb_call (eval.c:5920)
==31968== by 0x805C068: rb_f_send (ruby.h:638)
==31968== by 0x8065F86: call_cfunc (eval.c:5550)
==31968== by 0x805B65E: rb_call0 (eval.c:5692)
==31968== by 0x805BECC: rb_call (eval.c:5920)
==31968== by 0x80575FD: rb_eval (eval.c:3383)
==31968== by 0x8059EA4: rb_yield_0 (eval.c:4897)
==31968== by 0x805A421: rb_yield (eval.c:4979)
==31968== by 0x80B3686: rb_ary_each (array.c:1128)
==31968==
==31968==
==31968== 53,832 (14,496 direct, 39,336 indirect) bytes in 906 blocks are definitely lost in loss record 26 of 38
==31968== at 0x401A6C2: malloc (vg_replace_malloc.c:149)
==31968== by 0x80A4179: st_init_table_with_size (st.c:154)
==31968== by 0x80A41B3: st_init_table (st.c:167)
==31968== by 0x806C96F: hash_alloc (hash.c:235)
==31968== by 0x806CABF: rb_hash_s_create (hash.c:328)
==31968== by 0x8065F86: call_cfunc (eval.c:5550)
==31968== by 0x805B65E: rb_call0 (eval.c:5692)
==31968== by 0x805BECC: rb_call (eval.c:5920)
==31968== by 0x80575FD: rb_eval (eval.c:3383)
==31968== by 0x8056FC9: rb_eval (eval.c:3551)
==31968== by 0x8055FAC: rb_eval (eval.c:2851)
==31968== by 0x805B987: rb_call0 (eval.c:5826)
==31968== by 0x805BECC: rb_call (eval.c:5920)
==31968== by 0x80575FD: rb_eval (eval.c:3383)
==31968== by 0x805769E: rb_eval (ruby.h:643)
==31968== by 0x805B987: rb_call0 (eval.c:5826)
==31968== by 0x805BECC: rb_call (eval.c:5920)
==31968== by 0x805C068: rb_f_send (ruby.h:638)
==31968== by 0x8065F86: call_cfunc (eval.c:5550)
==31968== by 0x805B65E: rb_call0 (eval.c:5692)
==31968== by 0x805BECC: rb_call (eval.c:5920)
==31968== by 0x80575FD: rb_eval (eval.c:3383)
==31968== by 0x8059EA4: rb_yield_0 (eval.c:4897)
==31968== by 0x805A421: rb_yield (eval.c:4979)

So memory really is leaking.

So far I haven’t had any success making a small test program that
reproduces these results, but they are reliably reproduced with my
complex piece of code. If I extrapolate those numbers of lost bytes
(which accumulated over about 300 units of work) out to the RAM usage
growth that I see after 100000 units, it fits.
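
The extrapolation can be written out as a quick back-of-the-envelope
calculation. The sketch assumes that only the two loss records shown
above are summed, that "about 300" is exactly 300, and that the loss
grows linearly with units of work; the method name projected_leak_bytes
is made up for illustration.

```ruby
# Linear extrapolation: bytes lost per unit of work, scaled to a target.
def projected_leak_bytes(lost_bytes, units_observed, units_target)
  lost_bytes.to_f / units_observed * units_target
end

possibly_lost   = 21_364   # loss record 23 of 38
definitely_lost = 53_832   # loss record 26 of 38 (direct + indirect)
lost            = possibly_lost + definitely_lost

per_unit  = lost.to_f / 300
projected = projected_leak_bytes(lost, 300, 100_000)

printf("%.1f bytes per unit, about %.1f MB after 100000 units\n",
       per_unit, projected / 1_000_000.0)
# prints "250.7 bytes per unit, about 25.1 MB after 100000 units"
```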

Kirk H.