Mysterious memory corruption, very confused

ruby 1.8.7-p22, OS X 10.4.mumble, PostgreSQL 8.3.1, ruby-pg 2008-03-18.

I get random data corruption when trying to execute queries.

The data corruption comes and goes VERY unpredictably. I’ve narrowed it
down to a small chunk of the pg.c module, except I don’t understand the
ruby interpreter well enough to say much more.

Here’s what happens:

I pass a whole bunch of arguments in to a prepared statement. In pg.c,
that leads to a loop through nParams, reproduced here in case it means
anything to someone.

    for(i = 0; i < nParams; i++) {
        param = rb_ary_entry(params, i);
        if (TYPE(param) == T_HASH) {
            param_value_tmp = rb_hash_aref(param, sym_value);
            if(param_value_tmp == Qnil)
                param_value = param_value_tmp;
            else
                param_value = rb_obj_as_string(param_value_tmp);
            param_format = rb_hash_aref(param, sym_format);
        }
        else {
            if(param == Qnil)
                param_value = param;
            else
                param_value = rb_obj_as_string(param);
            param_format = INT2NUM(0);
        }
        if(param_value == Qnil) {
            paramValues[i] = NULL;
            paramLengths[i] = 0;
        }
        else {
            Check_Type(param_value, T_STRING);
            paramValues[i] = StringValuePtr(param_value);
            paramLengths[i] = RSTRING_LEN(param_value);
            fprintf(stderr, "%d: %p -> %s\n",
                    i, paramValues[i], paramValues[i]);
        }
        if(param_format == Qnil)
            paramFormats[i] = 0;
        else
            paramFormats[i] = NUM2INT(param_format);
    }

    for(i = 0; i < nParams; i++) {
        if (paramValues[i] && !strcmp(paramValues[i], "+rG")) {
            fprintf(stderr, "got a +rG %p in slot %d\n",
                    paramValues[i], i);
            abort();
        }
    }

Obviously, I added the printfs.

Running this, I get an abort:

0: 0x425af0 -> 102
2: 0x4265c0 -> true
3: 0x4265d0 -> 40.48324
4: 0x4265e0 -> -88.09905
5: 0x432820 -> 102
6: 0x422530 -> 65.2579241765071
7: 0x474ab0 -> 2008-06-17
8: 0x4258f0 -> 14:42:36
got a +rG 0x4265c0 in slot 2

So! Somewhere between the 2nd pass (out of 13 or so) through the first
loop, and the next loop, 0x4265c0 has gotten overwritten with garbage.

This is not specific to boolean data; I have also had it happen on
strings, but the boolean data was a bit easier to track down. This is a
pure heisenbug, which moves to new data depending on things like “the
contents of ARGV”.

Can anyone give me a hint as to what I should be looking at? I tried
turning down compiler optimizations, to no noticeable effect. (It moved,
but it moves any time anything changes.) The “T_HASH” case is probably
irrelevant, as all 12 arguments are strings. About all I can think of is
that, perhaps, rb_obj_as_string is allocating strings which are getting
garbage collected before the end of the routine?

I’m afraid I can’t make this bug report much more useful, I don’t really
understand the code. I don’t know how the garbage collector works, either.

… But interestingly, wrapping the call to the API function this wraps
in GC.disable/GC.enable makes the bug go away. I’ll annotate my rubyforge
bug, but if anyone here can tell me what I should be doing properly to
tag these things not to be collected until this function is done, I’d love
to know.
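
For anyone wanting to try the same stopgap, the GC.disable/GC.enable wrapping described above can be sketched in plain Ruby roughly as follows. The helper name without_gc and the stand-in block are assumptions for illustration only; the real database call would go inside the block. This merely keeps the collector from running during the call; it does not fix the extension itself.

```ruby
# Hypothetical helper (not part of ruby-pg): run a block with the garbage
# collector switched off, then put the GC back in the state it was in before.
def without_gc
  was_disabled = GC.disable   # GC.disable returns true if GC was already off
  yield
ensure
  GC.enable unless was_disabled
end

# Stand-in for the real call; in the actual program the prepared-statement
# call that reaches pg.c would go inside the block.
result = without_gc do
  [102, nil, true, 40.48324].map { |v| v.nil? ? nil : v.to_s }
end
```

Using ensure means the collector is switched back on even if the wrapped call raises, and the was_disabled check avoids re-enabling GC when an outer caller had already turned it off.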

On 2008-06-29, Seebs wrote:

I’m afraid I can’t make this bug report much more useful, I don’t really
understand the code. I don’t know how the garbage collector works, either.

… But interestingly, wrapping the call to the API function this wraps
in GC.disable/GC.enable makes the bug go away. I’ll annotate my rubyforge
bug, but if anyone here can tell me what I should be doing properly to
tag these things not to be collected until this function is done, I’d love
to know.

Okay, this is almost certainly wrong, but:

I experimentally added an array of N “VALUE” objects. After the first
hunk of code has obtained the “correct” value, I then stash that value
in the Nth item of the array (or store a 0 there), and call
rb_gc_register_address(&param_string_values[i]);

After calling the postgresql function, I loop through calling
rb_gc_unregister_address(&param_string_values[i]);

The program now runs on the whole data set available to me without
errors; that’s about 15x as long as it usually made it before.

I’m not saying this is the correct fix, but I think it is pretty good
confirmation that the analysis is right and the garbage collector is the
culprit.

I get random data corruption when trying to execute queries.
valgrind might tell you if memory is being trampled.

On Mon, 2008-06-30 at 15:47 +0900, Seebs wrote:

foo[i] = GetStringValue(x);

}

The idiom of using rb_obj_as_string, and then using the value, is common in
the Ruby source. It works. … It works as long as you don’t allocate
anything more before you’re done with it. What ends up happening is that,
if enough of the objects in question need a new string allocated by
rb_obj_as_string, sooner or later you end up invoking the garbage collector.
Now, since there’s only one x, the garbage collector assumes the current
rb_obj_as_string() return is in use, and the others aren’t. So it might,
if it wants the space, free one… And then the memory gets reused.

Thanks again for the detailed analysis.

To be clear, you’re saying that the new object created by
rb_obj_as_string() can be freed as soon as I allocate any new ruby
object?

Is this documented behavior? To be safe, should I always assume any
object that I allocate in C land lives only until the next object is
allocated (unless it’s referenced by some other object Ruby knows about,
of course)?

Regards,
Jeff D.

On 2008-06-30, Roger P. wrote:

I get random data corruption when trying to execute queries.
valgrind might tell you if memory is being trampled.

It is. There’s a loop of
    VALUE x;
    char **foo = malloc(buncha char *);

    for (big list of things) {
        x = rb_obj_as_string(y);
        foo[i] = GetStringValue(x);
    }

The idiom of using rb_obj_as_string, and then using the value, is common in
the Ruby source. It works. … It works as long as you don’t allocate
anything more before you’re done with it. What ends up happening is that,
if enough of the objects in question need a new string allocated by
rb_obj_as_string, sooner or later you end up invoking the garbage collector.
Now, since there’s only one x, the garbage collector assumes the current
rb_obj_as_string() return is in use, and the others aren’t. So it might,
if it wants the space, free one… And then the memory gets reused.

With “big list” being about 12 items, about half of which needed new
strings allocated, this ended up blowing up about once every thousand or
two thousand runs. Unfortunately for me, I had about 80,000 data points. :-)
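
A failure that only shows up once every thousand or so runs is hard to chase. One way to make a collector-dependent bug surface much sooner (a suggestion, not something reported as tried in this thread) is to switch on GC.stress around the suspect call, which asks the interpreter to run the collector at every opportunity. The helper name with_gc_stress and the placeholder block below are hypothetical; the call into the extension would replace the block body.

```ruby
# Hypothetical helper: temporarily force a garbage collection at every
# allocation opportunity, then restore the previous GC.stress setting.
def with_gc_stress
  previous = GC.stress
  GC.stress = true
  yield
ensure
  GC.stress = previous
end

# Placeholder workload; pure Ruby code like this is not affected by the bug,
# it only marks where the call into the C extension would go.
strings = with_gc_stress { (1..12).map { |i| i.to_s } }
```

Everything inside the block runs far more slowly, so it is probably best kept around a small number of calls rather than the whole 80,000-point run.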

I submitted a more detailed bug report to the ruby-pg project, and I’ve
adopted a workaround (possibly very inefficient) involving an array of VALUE
objects and rb_gc_{un}register_address. It’s ugly but it eliminates the bug.

Hi,

At Mon, 30 Jun 2008 15:47:19 +0900,
Seebs wrote in [ruby-talk:306636]:

It is. There’s a loop of
    VALUE x;
    char **foo = malloc(buncha char *);

    for (big list of things) {
        x = rb_obj_as_string(y);
        foo[i] = GetStringValue(x);
    }

It’s your bug.

The idiom of using rb_obj_as_string, and then using the value, is common in
the Ruby source. It works. … It works as long as you don’t allocate
anything more before you’re done with it. What ends up happening is that,
if enough of the objects in question need a new string allocated by
rb_obj_as_string, sooner or later you end up invoking the garbage collector.
Now, since there’s only one x, the garbage collector assumes the current
rb_obj_as_string() return is in use, and the others aren’t. So it might,
if it wants the space, free one… And then the memory gets reused.

Because you drop the references to the created objects. You
have to keep the objects themselves, not only the pointers into them.

I submitted a more detailed bug report to the ruby-pg project, and I’ve
adopted a workaround (possibly very inefficient) involving an array of VALUE
objects and rb_gc_{un}register_address. It’s ugly but it eliminates the bug.

    VALUE x, array = rb_ary_new();
    char **foo = malloc(buncha char *);

    for (big list of things) {
        x = rb_obj_as_string(y);
        rb_ary_push(array, x);
        foo[i] = GetStringValue(x);
    }

By keeping the values in an automatic variable `array', they
are marked and won’t be freed.

On Jul 6, 2008, at 7:15 PM, Nobuyoshi N. wrote:

foo[i] = GetStringValue(x);

if enough of the objects in question need a new string allocated by

    for (big list of things) {
        x = rb_obj_as_string(y);
        rb_ary_push(array, x);
        foo[i] = GetStringValue(x);
    }

By keeping the values in an automatic variable `array', they
are marked and won’t be freed.

I am wondering why the strings (returned from rb_obj_as_string) will
be garbage collected but the array will not be garbage collected? Both
have the same local scope, and they are not referenced by any other
ruby object.

Please explain when you have time.

Blessings,
TwP

On Jul 7, 2008, at 10:48 AM, Joel VanderWerf wrote:

}
The ‘VALUE x’ local only protects the current string.

Thanks, Joel! That makes sense.

Blessings,
TwP

On Tue, 2008-07-08 at 01:36 +0900, Tim P. wrote:

I am wondering why the strings (returned from rb_obj_as_string) will
be garbage collected but the array will not be garbage collected? Both
have the same local scope, and they are not referenced by any other
ruby object.

Thanks for mentioning that, I had the same question.

Also, what exactly can I do between:
x = rb_obj_as_string(y)
and a statement that makes “x” safe (e.g. stores a reference in some
other value that’s safe from collection)?

In other words, what ruby routines might invoke the garbage collector,
and thus possibly destroy any un-saved objects that I might have?

Regards,
Jeff D.

Hi,

At Tue, 8 Jul 2008 15:02:58 +0900,
Jeff D. wrote in [ruby-talk:307560]:

Also, what exactly can I do between:
x = rb_obj_as_string(y)
and a statement that makes “x” safe (e.g. stores a reference in some
other value that’s safe from collection)?

Basically, “x” is safe as long as it is referenced from an
automatic variable. But if you only use its internal pointer,
e.g. via RSTRING_PTR or RARRAY_PTR, the variable may be optimized
out by the compiler. You can prevent this with the RB_GC_GUARD
macro.

RB_GC_GUARD(x) = rb_obj_as_string(y);

Tim P. wrote:

On Jul 6, 2008, at 7:15 PM, Nobuyoshi N. wrote:

are marked and won’t be freed.

I am wondering why the strings (returned from rb_obj_as_string) will be
garbage collected but the array will not be garbage collected? Both have
the same local scope, and they are not referenced by any other ruby object.

IIUC, ‘VALUE array’ is a local, hence on stack, hence GC marks it. The
‘VALUE x’ local only protects the current string.
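
The same reachability rule can be observed from plain Ruby. This is only an analogy for the C case, and the names kept and refs are made up for the illustration: strings that are referenced from a live array survive a collection, just as the strings pushed onto the C local `array' do, because the collector marks everything reachable from a live reference.

```ruby
require "weakref"

kept = []                    # plays the role of the C local `array'
refs = (1..3).map do |i|
  s = "param #{i}"           # freshly allocated string, like a to_s result
  kept << s                  # the array now holds a strong reference to it
  WeakRef.new(s)             # a weak reference does not keep s alive itself
end

GC.start                     # force a full collection

# Every string is still reachable through kept, so none was collected.
all_alive = refs.all? { |r| r.weakref_alive? }
```

Without the kept array the strings would have no strong reference left after the loop and the collector would be free to reclaim them, which is the situation the original C loop created for every string except the one currently held in x.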