Forum: Ruby-core [Bug #744] memory leak in callcc?

Posted by Roger Pack (Guest)
on 2008-11-11 21:34
(Received via mailing list)
Bug #744: memory leak in callcc?
http://redmine.ruby-lang.org/issues/show/744

Author: Roger Pack
Status: Open, Priority: Normal

from 
http://rubyforge.org/tracker/?func=detail&atid=169...
this code
require 'generator'
loop {  g = Generator.new {|x| (1..3).each {|i| x.yield i}} }

seems to leak for me--I'm not sure if this is expected or not.

Thanks.
Posted by Brent Roman (brentr)
on 2008-11-13 02:49
(Received via mailing list)
Roger,

I've run into a number of issues related to Continuations and MRI's garbage
collector, so I thought I'd have a look at this one.  I investigated the
equivalent (non-generator) example described at:

http://rubyforge.org/tracker/?func=detail&atid=169...

This:

   loop {@x = callcc {|c| c}}

quickly consumes all of memory.

One of my x86 Linux machines crashed with a Segmentation Fault after a
couple of minutes of running this loop.  My guess here is that the stack,
which is unchecked during GC, got too deep.

What I saw is that the stack during garbage collection became ridiculously
deep (>15000 frames deep in the GC).  Here's a bit of the backtrace:

#11768 0x0806401a in mark_locations_array (x=0xa887004, n=1228) at gc.c:437
#11769 0x0805dceb in thread_mark (th=0xa8860f8) at eval.c:7403
#11770 0x0806438e in rb_gc_mark (ptr=59) at gc.c:881
#11771 0x0806401a in mark_locations_array (x=0xa88924c, n=1228) at gc.c:437
#11772 0x0805dceb in thread_mark (th=0xa888340) at eval.c:7403
#11773 0x0806438e in rb_gc_mark (ptr=59) at gc.c:881
#...

This looks like rb_gc_mark() got passed a bogus VALUE pointer.
I cannot even unwind the stack to the point where this happened without
gdb itself segfaulting.

Interestingly, the very same Ruby interpreter running on an ARM9 under Linux
handles this case without leaking memory or segfaulting.  So, in answer to
your original question:

I don't think this behavior is intentional.

And, I plan to spend a bit more time looking into it.
Any hints would be appreciated...

- brent
Posted by Brent Roman (brentr)
on 2008-11-13 09:36
(Received via mailing list)
Roger,

This "leak" appears to be an artifact of MRI's conservative garbage
collector.  Depending upon the compiler options and the target CPU, there
may be unused references to these continuation objects left on the thread's
stack when it is copied in whole by the callcc method.

In your example, these unused references form a linked list of
continuations across the loop iterations.  When the GC tries to mark such a
recursive structure, it consumes a lot of stack space.  With Ruby 1.6.8,
this leads to a segmentation fault when the stack size exceeds the maximum
allowed (see ulimit -s).  With later versions of Ruby, the mark phase of the
GC will "give up" when the stack grows too large.  This avoids the segfault,
as GC silently stops working instead.  At least that's what I think I see in
v1.8.7 p72.
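
The effect described above can be sketched in C.  This is only an
illustration under stated assumptions, not MRI's marking code: struct obj,
mark(), build_chain() and MARK_DEPTH_LIMIT are hypothetical names, and the
limit of 1000 is arbitrary.  Each object holds one stale reference to the
previous one, so marking recurses once per link and stops marking when the
assumed depth limit is reached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: each continuation's saved stack happens to hold
   a stale reference to the continuation created one iteration earlier. */
struct obj {
    int marked;
    struct obj *stale_ref;   /* leftover pointer found in the saved stack */
};

#define MARK_DEPTH_LIMIT 1000   /* assumed limit, not MRI's real value */

/* Recursively mark a chain; returns 0 if marking gave up at the limit. */
static int mark(struct obj *o, int depth)
{
    if (o == NULL || o->marked) return 1;
    if (depth >= MARK_DEPTH_LIMIT) return 0;   /* give up rather than crash */
    o->marked = 1;
    return mark(o->stale_ref, depth + 1);      /* one C frame per link */
}

/* Link n objects from the caller's array into a chain; returns the newest. */
static struct obj *build_chain(struct obj *pool, size_t n)
{
    assert(pool != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        pool[i].marked = 0;
        pool[i].stale_ref = (i == 0) ? NULL : &pool[i - 1];
    }
    return &pool[n - 1];
}
```

Without the depth check, a long enough chain would exhaust the C stack,
which matches the segfault reported for the older interpreter.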

When I built x86 Ruby using gcc without optimization  (CFLAGS=-g),
even this caused a memory leak:

  loop { callcc {|c| c}}

However, when I rebuilt it with CFLAGS=-O2, the memory leak only appeared
when the continuations returned by callcc were assigned to a variable.

When compiled for the ARM9 with gcc CFLAGS=-Os or CFLAGS=-O2, everything
works as it should.  No leaks observed.  However, when I changed to
CFLAGS=-g or CFLAGS=-O3, the original example leaks badly.  These tests were
performed using x86 gcc v3.3.5 and ARM gcc v3.4.5.

Can anyone suggest debugging techniques to help determine what is leaving
the dangling references to these continuations on Ruby's 'C' call stack?

If we knew what wrote these, we might be able to explicitly clear them once
they go out of scope.  This should result in significantly better GC
performance all around.

- brent
Posted by Roger Pack (Guest)
on 2008-11-15 19:03
(Received via mailing list)
Issue #744 has been updated by Roger Pack.


> If we knew what wrote these, we might be able to explicitly clear them
> once they go out of scope.  This should result in significantly better GC
> performance all around.

Yeah I've wondered that too.  Maybe we can have a hackfest for it some
Saturday :)
http://redmine.ruby-lang.org/issues/show/649
is related [and somewhat frustrating to be honest].  My thought is that
maybe there's a way to "clear the stack" of data that isn't currently
"useful" and thus clear it of old references [I realize this may be hard].

Thoughts?
-=R
----------------------------------------
http://redmine.ruby-lang.org/issues/show/744
Posted by Ken Bloom (Guest)
on 2008-11-17 04:09
(Received via mailing list)
On Sun, 16 Nov 2008 02:59:18 +0900, Roger Pack wrote:

> somewhat frustrating to be honest].  My thought is that maybe there's a
> way to "clear the stack" of data that isn't currently "useful" and thus
> clear it of old references [I realize this may be hard].

I can't reproduce http://redmine.ruby-lang.org/issues/show/649
in Debian's Ruby 1.8 or 1.9.

ruby1.8        1.8.7.72-1
ruby1.9        1.9.0.2-8

(The callcc thing, on the other hand, is broken on Debian's Ruby 1.8)
Posted by Brent Roman (brentr)
on 2008-11-17 04:58
(Received via mailing list)
Roger,

Well, I just summarized the result of this Saturday's hackfest at:

http://rubyforge.org/tracker/?func=detail&atid=169...

The main problem seems to be that 'C'/C++ compilers do not
initialize automatic variables, so one is bound to have old, unused, but
valid pointers left on the stack from previous calls at any point in time.
By the way, even this will fix the example leak:

  loop {@x = callcc {|c| c}; 2*6+4}

Pretty silly, but it works for me.  And, the fact it works proves that
the issue is unused references left on the stack.

The behavior is a fundamental design weakness of conservative GC.
We notice it most when managing large and/or highly connected objects
like threads, continuations and large arrays.
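
As a rough illustration of why stale stack words matter, here is a minimal
sketch of a conservative scan.  It is not MRI's gc.c: the function name, the
simulated stack array and the heap bounds are all assumptions made for the
example.  Any word that falls inside the heap range with slot alignment is
counted as a reference, whether or not the program still uses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the words in a (simulated) stack region that look like pointers
   into the heap range [heap_lo, heap_hi) with the given slot alignment.
   A conservative collector would keep every such target alive. */
static size_t count_apparent_refs(const uintptr_t *stack, size_t nwords,
                                  uintptr_t heap_lo, uintptr_t heap_hi,
                                  uintptr_t align)
{
    size_t hits = 0;
    assert(stack != NULL && align != 0);
    for (size_t i = 0; i < nwords; i++) {
        uintptr_t w = stack[i];
        if (w >= heap_lo && w < heap_hi && (w - heap_lo) % align == 0)
            hits++;   /* treated as live, even if it is a leftover */
    }
    return hits;
}
```

Overwriting a leftover word with zero removes it from the count, which is
the idea behind the stack clearing discussed later in this thread.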

One could hack gcc to force it to initialize automatic variables to zero
even though this violates the 'C' language spec.  But
I can't help feeling that there must be a better way.

In my on again off again quest to put Ruby on a diet, I'll probably
hack at this a bit more over the coming weeks, initially
on my patched v1.6.8 interpreter.  Thanks for the redmine link.
Found this hanging off it.  Looks better than the "reachability"
patches to GC:

http://softwareverify.com/ruby/customBuild/memtrac...

Here are some interesting posts about this problem from outside
the Ruby world.  The first is especially relevant:

http://gcc.gnu.org/ml/java/2005-05/msg00265.html
http://www.red-bean.com/guile/guile/new/msg01070.html
http://www.digitalmars.com/rtl/gcdescr.html

- brent
Posted by Martin Duerst (Guest)
on 2008-11-17 08:52
(Received via mailing list)
At 12:54 08/11/17, Brent Roman wrote:

>One could hack the gcc to force it to initialize automatic variables to zero
>even though this violates the 'C' language spec.

I haven't read the spec, but my guess (having worked on other specs)
is that the only thing that the 'C' language spec says is that the
value is undefined. A value that happens to be zero would still be
undefined, as far as I understand.

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
Posted by Brent Roman (brentr)
on 2008-11-17 13:11
(Received via mailing list)
Martin,

Well.  Ummm.  If a compiler writes zeros then is it not setting the
value of variables that the spec says should remain undefined until
explicitly initialized?

Whether or not it violates the 'C' language spec, I don't know any way 
to
make gcc do this with existing compiler options or pragmas.

Does anyone else?

- brent



Martin Duerst wrote:
> 
> At 12:54 08/11/17, Brent Roman wrote:
> 
>>One could hack the gcc to force it to initialize automatic variables to zero
Posted by Kurt Stephens (Guest)
on 2008-11-17 15:08
(Received via mailing list)
A common technique is to allocate a reasonably sized array (256-bytes)
on the C stack and zero it before and after each allocation.  This
reduces garbage left on the stack before and after allocation and
possible GC:

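#include <stdlib.h>
#include <string.h>

void *my_alloc_inner(size_t size);

void *my_alloc(size_t size)
{
   char zeros[256];
   void *ptr;
   /* scrub stale stack words before the allocator (and possibly GC) runs */
   memset(zeros, 0, sizeof(zeros));
   ptr = my_alloc_inner(size);
   /* scrub again so leftovers from the allocation do not linger */
   memset(zeros, 0, sizeof(zeros));
   return ptr;
}

void *my_alloc_inner(size_t size)
{
   /* may call GC */
   return malloc(size);   /* placeholder for the real allocator */
}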

Might need to put my_alloc_inner() in a separate compilation unit to
avoid inlining.

Kurt
Posted by Brent Roman (brentr)
on 2008-11-28 11:00
(Received via mailing list)
After a couple weeks of long nights and false starts, I feel I may have
come up with a fix for a large class of Ruby memory leak.  The basic
technique is a refinement of the one Kurt Stephens suggested.  It not only
eliminates the leaks in this one-liner:

  loop {@x=callcc{|c|c}}

but also in our multi-threaded robotics application.  Our Ruby process used
to grow to 20+ MB during a day long run.  The same run now stays smaller
than 10MB.  On an embedded ARM Linux machine with only 32MB of DRAM, this is
a great result!
The central problem is that gcc (and other compilers) tend to create
sparse stack frames such that, when a new frame is pushed onto the stack, it
does not completely overwrite the one that had been previously stored there.
The new frame gets activated with old VALUE pointers preserved inside its
holes.  These become "live" again as far as any conservative garbage
collector is concerned.  And, voilà, a leak is born!

I implemented a scheme for recording the maximum depth of the C stack in
xmalloc and during garbage collection itself.  However, I realized that
there was no point in clearing the stack when it is near its maximum depth.
Instead, stack clearing is deferred until CHECK_INTS, as this tends to
happen between evaluation of nodes, when the stack is likely to be
shallower.

At this point a tight loop quickly zeros the region between the current top
of stack, as returned by alloca(0), and the maximum recorded stack extent.
It also updates the stack extent so no memory is cleared repeatedly if the
stack contracts further.
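
The bookkeeping described above can be sketched as a simulation.  This is
not the actual patch: the real code works on the machine stack via
alloca(0), which is not portable to demonstrate, so the sketch below models
the stack as an array.  All names (sim_stack, sim_push, sim_pop,
sim_clear_dead) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_WORDS 64

/* Simulated stack: depth grows with the index. */
struct sim_stack {
    uintptr_t words[SIM_WORDS];
    size_t top;         /* words currently in use: [0, top) */
    size_t max_extent;  /* deepest point reached since the last clear */
};

static void sim_push(struct sim_stack *s, uintptr_t w)
{
    assert(s->top < SIM_WORDS);
    s->words[s->top++] = w;
    if (s->top > s->max_extent) s->max_extent = s->top;  /* record depth */
}

static void sim_pop(struct sim_stack *s, size_t n)
{
    assert(n <= s->top);
    s->top -= n;   /* popped words stay in memory as stale data */
}

/* Zero the dead region [top, max_extent) and pull max_extent back to top,
   so the same memory is not cleared again.  Returns the words cleared. */
static size_t sim_clear_dead(struct sim_stack *s)
{
    size_t n = s->max_extent - s->top;
    memset(&s->words[s->top], 0, n * sizeof(s->words[0]));
    s->max_extent = s->top;
    return n;
}
```

Calling sim_clear_dead() while the stack is shallow clears the most memory,
which is the stated reason for deferring the work until CHECK_INTS.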

This paper discusses this and similar techniques:
http://www.hpl.hp.com/personal/Hans_Boehm/gc/paper...

Another related issue is that the style of rb_eval() in eval.c in the
1.8 and 1.6 series causes gcc to emit especially large and sparse stack
frames.  Consider that gcc allocates two pairs of stack slots for r and l in
constructs like this:

    switch (nd_type(node)) {
      /* nodes for speed-up(literal match) */
      case NODE_MATCH2:
        {
            VALUE l = rb_eval(self,node->nd_recv);
            VALUE r = rb_eval(self,node->nd_value);
            result = rb_reg_match(l, r);
        }
        break;

      /* nodes for speed-up(literal match) */
      case NODE_MATCH3:
        {
            VALUE r = rb_eval(self,node->nd_recv);
            VALUE l = rb_eval(self,node->nd_value);
....

By the time the compiler's optimizer is allocating stack frame slots, all
the block structure of the original code has been lost in various
transformations.  As a result, each rb_eval() call ends up pushing about 4k
bytes onto the C stack, of which less than 20% is even initialized.  This
means that:

1)  There is a high probability that old VALUEs from previous frames
     will be resurrected as the stack grows,

2)  The GC must scan a sparse, large stack and mark the many dead object
     pointers it contains.

3)  callcc and thread context switches must copy needlessly large stacks

4)  recursive Ruby programs run out of stack space much earlier than
     they might otherwise.

When I simply re-factored rb_eval() such that it calls a (non-inline)
function for each node type it encounters, the total observed C stack size
for my application was reduced by more than two thirds.  Not surprisingly,
threading and continuation micro benchmarks run about 3 - 4 times faster.
However, I expect that benchmarks that operate repeatedly on a few large,
long lived objects will run slower.
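
The shape of that refactoring can be sketched on a toy evaluator.  This is
not rb_eval() and not the patch itself: the node types, struct node and the
eval_* helpers are hypothetical, and the NOINLINE macro assumes a
GCC-compatible compiler (it expands to nothing elsewhere).  Each node type
gets its own function, so its locals live in that function's frame instead
of enlarging the frame of the recursive dispatcher.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

enum node_type { NODE_LIT, NODE_ADD, NODE_MUL };

struct node {
    enum node_type type;
    long value;                    /* used by NODE_LIT only */
    const struct node *lhs, *rhs;  /* used by NODE_ADD and NODE_MUL */
};

static long eval(const struct node *n);

/* Per-node handlers: l and r occupy these small frames, not eval()'s. */
static NOINLINE long eval_add(const struct node *n)
{
    long l = eval(n->lhs);
    long r = eval(n->rhs);
    return l + r;
}

static NOINLINE long eval_mul(const struct node *n)
{
    long l = eval(n->lhs);
    long r = eval(n->rhs);
    return l * r;
}

/* The dispatcher keeps no per-case locals of its own. */
static long eval(const struct node *n)
{
    assert(n != NULL);
    switch (n->type) {
    case NODE_LIT: return n->value;
    case NODE_ADD: return eval_add(n);
    case NODE_MUL: return eval_mul(n);
    }
    return 0;   /* not reached for valid node types */
}
```

Whether this actually shrinks the frames depends on the compiler and flags,
so the generated assembler is worth checking.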

Keep in mind that these techniques should improve the performance of *any*
garbage collector that scans the unstructured C stack for valid object
pointers.  It may even be relevant for the 1.9 series Ruby, but I'll leave
that for those more qualified to determine.

Today, this is implemented only in my heavily patched version of Ruby 1.6.8.
In the short term, if there's interest, I can quickly post my hacked 1.6.8
Ruby to an FTP site for others to test.

Longer term, the stack clearing could be supplied as a small patch to the
1.8 series, however the refactoring of rb_eval() is probably too large to be
attached to an email message on this list.  I will take the time to produce
these patches only if at least a few people commit to testing them,
reporting detailed results and suggestions for improvement here.

- brent
Posted by Nobuyoshi Nakada (nobu)
on 2008-11-30 04:42
(Received via mailing list)
Hi,

At Fri, 28 Nov 2008 18:54:45 +0900,
Brent Roman wrote in [ruby-core:20149]:
> Longer term,
> The stack clearing could be supplied as a small patch to the 1.8 series,
> however the
> refactoring of rb_eval() is probably too large to be attached to an email
> message
> on this list.  I will take the time to produce these patches only if at
> least a few people
> commit to testing them,  reporting detailed results and suggestions for
> improvement here.

In the shorter term, if you use gcc, couldn't you try the
-mpreferred-stack-boundary=2 option?
Posted by Brent Roman (brentr)
on 2008-11-30 06:15
(Received via mailing list)
Before hacking rb_eval(), I first tried finding some compiler
options that would fill the stack holes.

Decreasing the stack slot alignment requirements
does pack the stack somewhat, however, the very sparse stack
frame generated by the huge switch statement in rb_eval() remains
largely unaffected by any compiler options I could find.
These holes still caused the GC to perform poorly for my app and
to fail utterly when presented with:  @x=loop {callcc {|c| c}}

Just have a look at the generated assembler code for rb_eval
from "gcc -S -O2 eval.c".  The function preamble decrements the stack
pointer by 566 bytes.  Which of those bytes are actually written is
determined by the node type processed.  Most of them remain uninitialized in
*all* cases.  With -mpreferred-stack-boundary=2, rb_eval() starts by
decrementing the stack pointer by 548 bytes.  Not much difference.

After factoring, rb_eval() decrements the stack pointer by
only about 20 bytes.  I got the best results with these options on x86 gcc
4.3.2:

gcc -mpreferred-stack-boundary=2 -fno-stack-protector
-fno-inline-functions-called-once

Nobu, these are not just 2%-5% memory and time reductions.
For multithreaded applications, both time and space performance
are significantly improved.  I suspect that some large single threaded
apps will also benefit.  (Maybe even rails?! :-)

There's an opportunity here.  I hope that
the core developers will find time to seriously explore it.

- brent
Posted by Roger Pack (rogerdpack)
on 2008-11-30 07:08
(Received via mailing list)
> After a couple weeks of long nights and false starts, I feel I may have
> come up with a fix for a large class of Ruby memory leak.  The basic
> technique is a refinement of the one Kurt Stephens suggested.  It not only
> eliminates the leaks in this one liner:

Wow thanks for doing that. I'd say please create a redmine bug for it
[or attach it to an existing].  A patch to 1.8.7 would be sweet :)
A patch for 1.9 would be great too :)

I'd imagine that your system is "better" than just blindly doing a
void clear_stack(void);   /* memset needs <string.h> */
void garbage_collect(void)
{
  clear_stack();
  /* ....do normal gc */
}
void clear_stack(void)
{
  char a[10000];
  memset(a, 0, sizeof(a));
}
?

Thanks!
-=R
Note that I use gcc 3.4.5; I assume that won't be a problem though.
Posted by Brian Candler (candlerb)
on 2008-11-30 12:27
Attachment: sf.c (757 Bytes)
Attachment: sf2.c (972 Bytes)
(Received via mailing list)
The problem can be demonstrated with a very simple program (attached), and
it looks to me like a bug in gcc - surely it should overlap stack
assignments for automatic variables which aren't in scope simultaneously?

One solution to rb_eval() might be an ugly union at the top of the function
(second attachment). But it seems wrong to have to do this just to code
around an implementation problem with one particular compiler, albeit a
ubiquitous one.
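
The attachments are not reproduced in this archive, so the following is only
a guess at the general shape of the union idea, not a reconstruction of
sf2.c.  VALUE_T, the *_locals structs and eval_case() are hypothetical
names.  Locals that are never live at the same time are declared as members
of one union, which forces them to share storage.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long VALUE_T;   /* stand-in for Ruby's VALUE */

/* One struct per switch case, holding that case's locals. */
struct match2_locals { VALUE_T l, r; };
struct match3_locals { VALUE_T r, l; };
struct call_locals   { VALUE_T recv; VALUE_T argv[4]; };

/* The union is as large as its largest member, not the sum of all. */
union eval_locals {
    struct match2_locals m2;
    struct match3_locals m3;
    struct call_locals   call;
};

/* Toy function with a single union of locals at the top. */
static VALUE_T eval_case(int which, VALUE_T a, VALUE_T b)
{
    union eval_locals u;
    assert(which == 2 || which == 3);
    if (which == 2) {
        u.m2.l = a;
        u.m2.r = b;
        return u.m2.l - u.m2.r;
    }
    u.m3.r = a;
    u.m3.l = b;
    return u.m3.l - u.m3.r;
}
```

This only covers variables named in the source; it cannot overlap slots that
the compiler creates on its own.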

Regards,

Brian.
Posted by Brent Roman (brentr)
on 2008-11-30 20:12
(Received via mailing list)
Brian,

Thanks for the very clear demo program to illustrate the problem.
Is there anyone who can look at the assembler code generated
for this demo by a recent Microsoft or Intel 'C' compiler?

In any case,
I doubt that the gcc maintainers would consider this behavior a bug.
It's been with them from before v3.3.5.  They've known about it for many
years.  They view it as a limitation of their register optimization
techniques and are more concerned about speeding up the code than shrinking
its stack footprint.  However, for us, larger stacks = slower code due to
stack copying and the conservative GC.

The "ugly union" solution would not be sufficient because much of the
stack is occupied by compiler generated temporaries that have no
representation in the 'C' input source.  I did consider such wholesale
code changes, but resisted because they would have been, as you say,
quite ugly and difficult to maintain.

What I did come up with was not ugly at all.  Factor the unwieldy switch
statement of rb_eval() into separate functions to handle each node
type and clear the stack at a few opportune times.  rb_eval() becomes
smaller and more likely to be optimized.  I buried the stack clearing
into macros that already exist.

- brent
Posted by Brent Roman (brentr)
on 2008-11-30 20:40
(Received via mailing list)
Roger,

I already responded in detail to this bug:

http://rubyforge.org/tracker/?func=detail&atid=169...

I just bang on Ruby 1.6.8 for our robotics application.

You seem to already be doing a lot of excellent Ruby testing with current
versions.
If I spent a couple days developing these two patches for Ruby 1.8.7,
would you be willing to run
regression tests against them and to report the results here?

I think the small stack clearing patch should improve the GC behavior,
but, by itself, it will likely slow down some apps due to its having
to clear large areas of stack.  I'd expect to see that
slow down mitigated by the larger patch that would refactor rb_eval()
and thereby keep the stack smaller.

The combined patches will likely be large, so I'll just post links to them
here.

Would anyone else be willing to test them? ...
Particularly those who have large apps, and/or apps that use multiple
threads or continuations that seem to leak memory?

- brent

P.S.  I use gcc 3.4.5 for generating code for our embedded ARM targets.
The older compiler generates fewer stack temporaries than the newer ones.
Don't rush to update :-)

P.P.S.  The way GC is currently invoked causes it to occur when the stack
is already near its maximum depth.  This patch tries to make GC normally
occur as part of CHECK_INTS, when the stack tends to be shallower.
At that point, clearing the stack can be much more effective.
Posted by Brian Candler (candlerb)
on 2008-12-01 10:35
(Received via mailing list)
> What I did come up with was not ugly at all.  Factor the unwieldy switch
> statement of rb_eval() into separate functions to handle each node
> type

Did you replace the whole switch statement with a dispatch table? That
sounds like a sensible thing to do anyway.

OTOH, if this is for ruby 1.8.x, I'm afraid you may not find much interest
in such changes while the focus is all on 1.9.

Perhaps worth checking how 1.9's bytecode interpreter stacks up under the
same conditions?

OTOH, 1.9 doesn't have callcc anyway, so maybe your application code would
need a lot of restructuring to use Fiber instead. I don't know if it's
possible to implement callcc in terms of Fiber.

Regards,

Brian.
Posted by Paul Brannan (cout)
on 2008-12-01 13:30
(Received via mailing list)
On Mon, Dec 01, 2008 at 06:29:00PM +0900, Brian Candler wrote:
> OTOH, 1.9 doesn't have callcc anyway, so maybe your application code would
> need a lot of restructuring to use Fiber instead. I don't know if it's
> possible to implement callcc in terms of Fiber.

1.9 does have callcc (require 'continuation').  It's probably not good
to use it, though.

Paul
Posted by Ezra Zygmuntowicz (Guest)
on 2008-12-01 20:16
(Received via mailing list)
Brent-

  I would love to see a version of these patches against 1.8.6 or
1.8.7. I can test them on a few hundred servers to see what kind of
resource consumption these changes have in larger deployments.

  Awesome work on this. I'm very interested in testing this for you.
You can contact me off list if you like or if you want servers to use
to test this on.

Thanks

Ezra Zygmuntowicz
ez@engineyard.com
Posted by Ezra Zygmuntowicz (Guest)
on 2008-12-01 20:19
(Received via mailing list)
On Dec 1, 2008, at 1:29 AM, Brian Candler wrote:

> in such changes while the focus is all on 1.9.
Actually I think you will find a *ton* of interest in this for the
1.8.* branch. There are thousands of production apps that are not
going to move to 1.9 anytime soon and any improvements to 1.8.* thread
and callcc handling like this would be very welcome.

Thanks
Ezra Zygmuntowicz
ez@engineyard.com
Posted by Brent Roman (brentr)
on 2008-12-01 20:54
(Received via mailing list)
Brian,

gcc optimizes the big switch in rb_eval() into a dispatch table both
before and after my factoring of it into separate node handling functions.

I realize that the Ruby world has moved on, which is why I'm not
going to bother with more work on this until at least a couple folks
commit to testing it.  The 1.8 series is similar enough to 1.6.8 that
I know I could create a patch for it in a few days.  If that tested well,
I might consider trying it with 1.9, but I suspect that would be a lot
more effort.  If 1.9 is using the same GC and gcc as 1.8, then I would
expect that it would benefit from this patch.  However, that remains
to be proven.

Also, 1.9 and its "standard libs" have gotten so large
that they simply won't fit on my target (embedded ARM linux) machines.
The 1.8 core is really not that much bigger than 1.9, I'd just have to
strip away most of its new "standard" libs.
Does anyone know the current status of "Atomic Ruby?"

As Paul has already pointed out, Matz and Koichi kept callcc
in v1.9 Ruby via some very amazing code hardwired into the VM.
It is made accessible after require "continuation".

I've traced the reliability issues with continuations to the fact that
the GC object mark function for them is incorrect, and posted
a patch to fix this in v1.8.6 about a year ago.  That fix was never
implemented so continuations continue to have a bad rap.  My own experience
with them since then has been quite good.  However, Paul Brannan told me
that he has had trouble with them due to their incompatibility with some of
the non-standard libraries with which his application links.  (Something
about callbacks, if I recall correctly)

In any case, Continuations are more general than Fibers.
Fibers can be implemented in terms of continuations quite readily, but
Continuations cannot be implemented in terms of Fibers.

- brent
Posted by Stephen Sykes (Guest)
on 2008-12-01 22:00
(Received via mailing list)
>
>        Actually I think you will find a *ton* of interest in this for the
> 1.8.* branch. There are thousands of production apps that are not going to
> move to 1.9 anytime soon and any improvements to 1.8.* thread and callcc
> handling like this would be very welcome.
>
> Thanks
> Ezra Zygmuntowicz

I would like to second that.  1.8.7 patches would be very interesting 
indeed.

-Stephen
Posted by Brian Candler (candlerb)
on 2008-12-01 23:02
(Received via mailing list)
On Tue, Dec 02, 2008 at 04:47:46AM +0900, Brent Roman wrote:
> I've traced the reliability issues with continuations to the fact that
> the GC object mark function for them is incorrect, and posted
> a patch to fix this in v1.8.6 about a year ago.  That fix was never
> implemented

I know what you mean. My own small patches (just to fix compatibility for
uClibc(*)) were also ignored.

This is what I meant when I said "not find much interest": of course the
user base is hugely interested in the development of the robust 1.8 code.
I'm just unconvinced that the ruby core developers are.

Even now that ruby 1.9 is supposedly no longer a moving target, I certainly
have no plans to move to it in any production environment. I just don't want
the pain of all those broken libraries and frameworks. Maybe in a year or
two.

Regards,

Brian.

(*) I'm interested in resource-limited platforms too. ruby 1.8 installs fine
on OpenWrt boxes with 4MB of flash, if you trim the standard libraries a
bit.
Posted by Martin Duerst (Guest)
on 2008-12-02 02:27
(Received via mailing list)
At 06:56 08/12/02, Brian Candler wrote:

>I know what you mean. My own small patches (just to fix compatibility for
>uClibc(*)) were also ignored.

Please don't assume that this was on purpose. With that much going
on, things can easily be lost. Please try again resending the patch,
or even better (now that it exists) use redmine.

Regards,    Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
Posted by Brian Candler (candlerb)
on 2008-12-02 09:54
(Received via mailing list)
On Tue, Dec 02, 2008 at 10:21:05AM +0900, Martin Duerst wrote:
> At 06:56 08/12/02, Brian Candler wrote:
> 
> >I know what you mean. My own small patches (just to fix compatibility for
> >uClibc(*)) were also ignored.
> 
> Please don't assume that this was on purpose. With that much going
> on, things can easily be lost. Please try again resending the patch,
> or even better (now that it exists) use redmine.

I posted it twice to ruby-core, once to rubyforge tracker and then 
migrated
that to redmine a few weeks ago. There was no response in any of those
locations.

Here are the links:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
http://rubyforge.org/tracker/index.php?func=detail...
http://redmine.ruby-lang.org/issues/show/720

I spent time diagnosing, fixing and reporting this particular problem. 
So
even an explicit rejection of this work would have been better than no
response at all.

As far as I can tell, I've followed the processes documented at
http://www.ruby-lang.org/en/community/ruby-core/

Regards,

Brian.
Posted by Yukihiro Matsumoto (Guest)
on 2008-12-02 10:20
(Received via mailing list)
Hi,

In message "Re: [ruby-core:20207] Re: Promising C coding techniques to reduce MRI's memory use"
    on Tue, 2 Dec 2008 17:47:25 +0900, Brian Candler <B.Candler@pobox.com> writes:

|Here are the links:
|
|http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
|http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
|http://rubyforge.org/tracker/index.php?func=detail...
|http://redmine.ruby-lang.org/issues/show/720
|
|I spent time diagnosing, fixing and reporting this particular problem. So
|even an explicit rejection of this work would have been better than no
|response at all.

My bad, somehow I (we) missed all of your posts.  I am awfully sorry.
Definitely I will check and merge them if I see no problem, after the
deadline I am facing.  Ping me, if you see no further action after a
week or two.

              matz.
Posted by Yukihiro Matsumoto (Guest)
on 2008-12-02 10:32
(Received via mailing list)
Hi,

In message "Re: [ruby-core:20208] Re: Promising C coding techniques to reduce MRI's memory use"
    on Tue, 2 Dec 2008 18:14:33 +0900, Yukihiro Matsumoto <matz@ruby-lang.org> writes:

|My bad, somehow I (we) missed all of your posts.  I am awfully sorry.
|Definitely I will check and merge them if I see no problem, after the
|deadline I am facing.  Ping me, if you see no further action after a
|week or two.

I briefly checked soon after the post, and found out that:

  * I missed the original report in the rubyforge tracker
  * after reposting to redmine, I checked in the patch into the 1.9
    trunk, so that 1.9 does not have this problem.
  * then I forgot to apply this one to 1.8.
  * I just checked in to 1.8 head.
  * next 1.8.7 maintenance release or 1.8.8 will not have the problem.

I am sorry.

              matz.
Posted by Yukihiro Matsumoto (Guest)
on 2008-12-02 10:38
(Received via mailing list)
Hi,

In message "Re: [ruby-core:20179] Re: Promising C coding techniques to reduce MRI's memory use"
    on Mon, 1 Dec 2008 04:34:12 +0900, Brent Roman <brent@mbari.org> writes:

|If I spent a couple days developing these two patches for Ruby 1.8.7, 
|would you be willing to run
|regression tests against them and to report the results here?

We have been troubled for years by the "ghost references from the machine
stack" generated by GCC.  We are more than happy to see the patch,
and merge it if it's acceptable.

              matz.
Posted by Brian Candler (candlerb)
on 2008-12-02 13:24
Attachment: webrick-patches.rb (2,69 KB)
(Received via mailing list)
On Tue, Dec 02, 2008 at 06:26:06PM +0900, Yukihiro Matsumoto wrote:
>   * I just checked in to 1.8 head.
>   * next 1.8.7 maintenance release or 1.8.8 will not have the problem.

Many thanks - I hadn't noticed that you had applied the patch to 1.9
already.

While we're at it, I also analysed some issues with WEBrick: is there 
any
interest in these?

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
  -- possible patch in [18565]
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...

I included some suggested patches in those posts, but I really wanted 
some
discussion/feedback on what was the best way forward.

For now I am using a local monkey-patch (attached) which addresses these
issues.

This patch also adds the ability to return a proc as the body of a
HTTPResponse; the proc is passed an output object, and everything 
written to
it is turned into a HTTP chunk. This is an expansion of the patch in
[18460]. It also increases block size from 4K to 16K.

I could rewrite these changes as an actual patch to WEBrick if there is
interest in applying them, and agreement on the solutions I've used.

Regards,

Brian.
Posted by Roger Pack (rogerdpack)
on 2008-12-03 18:48
(Received via mailing list)
On Tue, Dec 2, 2008 at 5:18 AM, Brian Candler <B.Candler@pobox.com> 
wrote:
> http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
>  -- possible patch in [18565]
> http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
> http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
>
> I included some suggested patches in those posts, but I really wanted some
> discussion/feedback on what was the best way forward.

Few people use webrick; maybe that's why there's no discussion :)
If they're not in redmine I'd add them there so they don't get forgotten
[hopefully the new tracker will help].
Cheers!
-=R
Posted by Roger Pack (rogerdpack)
on 2008-12-03 18:58
(Received via mailing list)
> I just bang on Ruby 1.6.8 for our robotics application.

I was wondering why the older version :)

> You seem to already be doing a lot of excellent Ruby testing with current
> versions.
> If I spent a couple days developing these two patches for Ruby 1.8.7,
> would you be willing to run
> regression tests against them and to report the results here?

Absolutely.  I'll test them against some trivial stuff and a small
rails app and see if they help memory wise and check for speed :)


> P.P.S.  The way GC is currently invoked causes it to occur when that stack
> is already near its maximum depth.  This patch tries to make GC normally
> occur is part of CHECK_INTS, when the stack tends to be shallower.
> At that point, clearing the stack can be much more effective.

I wonder if there are less intrusive ways, like changing [from a 
previous post]

          VALUE l = rb_eval(self,node->nd_recv);
           VALUE r = rb_eval(self,node->nd_value);
           result = rb_reg_match(l, r);
       }
       break;

       /* nodes for speed-up(literal match) */
     case NODE_MATCH3:
       {
           VALUE r = rb_eval(self,node->nd_recv);
           VALUE l = rb_eval(self,node->nd_value);
....
to
...

VALUE l = NULL;
VALUE r = NULL;


           l = rb_eval(self,node->nd_recv);
           r = rb_eval(self,node->nd_value);
           result = rb_reg_match(l, r);
       }
       break;

       /* nodes for speed-up(literal match) */
     case NODE_MATCH3:
       {
           r = rb_eval(self,node->nd_recv);
           l = rb_eval(self,node->nd_value);

[reuse same variable].


Also re: size --doesn't 1.9 have rubygems pre-installed so that it
isn't as large of a standard library? [just pointing out that maybe it
could use some minimizing love still?] :)
Thanks!
-=R
Posted by Brent Roman (brentr)
on 2008-12-03 19:53
(Received via mailing list)
Roger,

I'll be posting a set of patches to 1.8.7 on an ftp server
in a week or so, with a URL to it here.  Thanks for
agreeing to test it.

The "ghost VALUE references" would not be affected
by the code changes you propose.  GCC's optimizer
will just remove your attempts to initialize VALUEs to NULL.  Even if 
you
could prevent that (with volatile, perhaps), there would
remain many uninitialized anonymous temporaries that
you could not even access from the 'C' source code.
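
To illustrate the dead-store point above, here is a minimal C sketch. It is not code from eval.c: the VALUE typedef and both function names are stand-ins invented for this example. In match_plain the zero initializers are overwritten before they are ever read, so an optimizing compiler is free to drop them. Declaring the locals volatile obliges the compiler to emit each store, probably at some cost in speed. Neither version can reach the anonymous temporaries the compiler creates on its own.

```c
typedef unsigned long VALUE;   /* stand-in for Ruby's VALUE (assumption) */

/* The zero stores are dead: l and r are overwritten before being read,
 * so the compiler may legally remove them. */
static VALUE match_plain(VALUE recv, VALUE value)
{
    VALUE l = 0;
    VALUE r = 0;

    l = recv;
    r = value;
    return l ^ r;
}

/* Accesses to volatile objects count as observable behaviour, so every
 * store below has to be emitted, including the initial zeros. */
static VALUE match_volatile(VALUE recv, VALUE value)
{
    volatile VALUE l = 0;
    volatile VALUE r = 0;

    l = recv;
    r = value;
    return l ^ r;
}
```

Comparing the assembly a given compiler produces for the two functions (for example with gcc -O2 -S) is one way to check what actually survives optimization.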

- brent

P.S.  The core of 1.9 got a good deal larger due to its more 
sophisticated
VM and support for non-latin languages.  But, in all fairness,
I haven't looked at 1.9 seriously for almost a year now.  Maybe
it could benefit from some "minimizing love" now.
Posted by Michal Babej (Guest)
on 2008-12-03 21:48
(Received via mailing list)
Hello,

On Wednesday 03 December 2008 19:47:01 Brent Roman wrote:
> I'll be posting a set of patches to 1.8.7 on an ftp server
> in a week or so, with a URL to it here.  Thanks for
> agreeing to test it.

I'll definitely try it out, too.
>
> The "ghost VALUE references" would not be affected
> by the code changes you propose.  GCC's optimizer
> will just remove your attempts to initialize VALUEs to NULL.  Even if you
Actually that's not exact, according to my experiments - it optimizes away
assignments of NULL to a pointer. VALUE is not a pointer, and it optimizes
away neither NULL nor 0 assignments (I tried with gcc 4.3.2).

Coincidentally, GCC 4.4 is supposed to have an optimization for variables in
switch (see http://gcc.gnu.org/gcc-4.4/changes.html), but unfortunately, if I
understand it correctly, it's for constants only (I wonder if it's impossible
for variables, or just nobody has written it yet :)

Regards,
-- mb
Posted by Michal Babej (Guest)
on 2008-12-04 18:16
(Received via mailing list)
On Wednesday 03 December 2008 21:42:27 Michal Babej wrote:
> Actually that's not exact, according to my experiments - it optimizes away
> assignments of NULL to a pointer. VALUE is not a pointer, and it doesn't
> optimize away neither NULL nor 0 assignments (i tried with gcc 4.3.2)
Sorry, my bad, was jumping to conclusions too fast. Ignore that :)

-- mb
Posted by Kurt Stephens (Guest)
on 2008-12-05 17:55
(Received via mailing list)
The "initialization holes" that leave potential pointers on the stack 
occur
in the interpreter, any system libraries and the GC itself.  Thus 
clearing
some stack words before and *after* allocation/GC helps, but at an 
obvious cost.
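
The leaks discussed here come from conservative scanning: any stack word that looks like an object address is treated as a live reference. The following is a simplified model of that idea, not the real gc.c code. The heap array, the object struct and both function names are assumptions made up for the sketch, and the "stack" is an ordinary array passed in by the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SLOTS 8

typedef struct { int marked; } object;   /* toy heap object */
static object heap[HEAP_SLOTS];          /* toy heap */

/* Conservative test: does this word look like the address of a heap slot? */
static int looks_like_object(uintptr_t word)
{
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = (uintptr_t)&heap[HEAP_SLOTS];

    return word >= lo && word < hi && (word - lo) % sizeof(object) == 0;
}

/* Mark every heap slot whose address appears in words[0..n).
 * Returns the number of objects that end up marked. */
static size_t mark_conservatively(const uintptr_t *words, size_t n)
{
    size_t i, live = 0;

    for (i = 0; i < HEAP_SLOTS; i++)
        heap[i].marked = 0;
    for (i = 0; i < n; i++) {
        if (looks_like_object(words[i])) {
            object *o = (object *)words[i];
            if (!o->marked) {
                o->marked = 1;
                live++;
            }
        }
    }
    return live;
}
```

In this model, a stale word left behind by a frame that has already returned keeps its object marked until that word is overwritten or zeroed, which is the effect stack clearing is meant to remove.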

  Keeping stack frames small helps; perhaps moving some data structures out of
the C stack into explicit stacks would help there?  A call/cc implementation
that copies less C stack might also reduce leaks and overhead:

http://github.com/kstephens/ll/tree/master/src/ccont

  Recompiling Ruby with flags to reduce initialization holes will not prevent
leaks from appearing in initialization holes in system libraries.  We have
some Ruby processes (> 375 MB) that we'd like to keep running longer, but are
unable to do so because of leaks.

  I'll help test your patches on 1.8.6.

Kurt
Posted by Roger Pack (rogerdpack)
on 2008-12-13 03:20
(Received via mailing list)
> updates
> the stack extent so no memory is cleared repeatedly if the stack contracts
> further.

This is sweet.  I liked the idea so much I coded my own [perhaps much
smaller, definitely less effective] version.  It only includes the
stack clearing you referred to, and doesn't even monitor "exactly" the
stack size, but approximates it by metering it once every CHECK_INTS.
Ruby seems to run "as fast as normal" with it, and collect better.

In principle, you'd only have to clear the stack once "between each
GC" so if you kept track of which portions of it you'd been able to
clear, you could avoid a few stack clearings :)
I'm not sure exactly how much cpu that would save, though.
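
A rough sketch of the bookkeeping described above, assuming a stack that grows toward lower addresses. This is not taken from the patch: stack_tracker, note_depth and clear_below are hypothetical names, and the stack is modelled as a plain array so the logic can be tried in isolation.

```c
#include <stddef.h>

typedef unsigned long VALUE;   /* stand-in for Ruby's VALUE (assumption) */

/* Remembers the deepest (lowest) address that may still hold stale words. */
typedef struct {
    VALUE *dirty_low;
} stack_tracker;

/* Call while the stack is deep: records how far down it has reached. */
static void note_depth(stack_tracker *t, VALUE *sp)
{
    if (sp < t->dirty_low)
        t->dirty_low = sp;
}

/* Call while the stack is shallow: zeroes [dirty_low, sp), the region
 * below the live frames, and returns the number of words cleared. */
static size_t clear_below(stack_tracker *t, VALUE *sp)
{
    size_t cleared = 0;
    VALUE *p;

    for (p = t->dirty_low; p < sp; p++) {
        *p = 0;
        cleared++;
    }
    if (sp > t->dirty_low)
        t->dirty_low = sp;     /* nothing stale remains below sp */
    return cleared;
}
```

With this scheme a second clear_below at the same depth touches nothing, and a further contraction of the stack only clears the newly vacated words.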

This patch also doesn't fix the
 loop {@x=callcc{|c|c}}
aspect [presumably because ruby's green threads copy chunks of the
stack to heap, so they aren't cleaned]--so I'd imagine it's less
effective in multi-threaded codes [but hopefully still helpful].

Look forward to the real patch when it comes in :)

Note that as it is currently, if you run GC.start it also calls
clean_stack, so if you run GC.start when your program is at its "inner
depth" [most nested call] it will notice exactly how deep it is, and
hopefully clean up the stack "all the way" when you ascend out of deep
calls.  I suppose creating a new call "GC.clear_stack" would be
useful.

i.e. GC.start -> GC.start + "clean stack/make a note of how deep the
stack is currently"

With [1] it successfully allows the 'a'*1000 string to be garbage
collected.
With [2] it successfully collects a few more objects than the unpatched
interpreter does.
I'm not positive how well it works but I think it does.

Enjoy.

-=R

[0] patch: http://wilkboardonline.com/roger/clear_stack_only2.diff

[1] file.rb:

def does_nothing
end

def deep(how, gc = false)
  if how == 175
    'a'*1000
  end
  if how == 300
    print "222222deepest"
    GC.start
    print "222222deepest"
    return
  end
  deep(how+1)
  20.times { does_nothing }
end

puts
deep(0)
GC.start
deep(0, true)
count = 0
ObjectSpace.each_object(String) do |s|
  print s, ' '
  count = count+1
end
print count

[2] file2.rb:

count = 0
ObjectSpace.each_object{|o| count += 1 }
print count
GC.disable
def go depth
  if depth == 50
    GC.enable
    GC.start
    return
  end
  if(rand(10) == 3)
    a = 'abcd'
    go(depth+1)
    go(depth+1)
  end
  if(rand(10) == 3)
    b = 'abcd'
  end
  go(depth+1)
end

go 0
count = 0
ObjectSpace.each_object{|o| count += 1 }
print count
Posted by Brent Roman (brentr)
on 2008-12-13 18:17
(Received via mailing list)
Roger,

Look for the "real patch" next week.
In fact, there will be at least five patches:

#1:  prevents continuations from segfaulting when they refer to dead 
threads
#2:  limit each thread's stack to its own stack frames (none from other
threads)
#3:  My stack clearing patch
#4:  factor rb_eval() to reduce the size of its stack frame
#5:  replace recursive stack_extend() in eval.c, replace GC.stress with
GC.limit=

My stack clearing patch is quite small, however it does tend to clear
the same areas repeatedly.  The difficulty I had avoiding this was that
one could not know exactly when the GC would occur.  If it always
kept occurring when the stack was deep, clearing the stack just
before GC would have no real effect on the "ghost references" still
on it.  I'd be interested if anyone knows a way to cope with
this without repeatedly zeroing the stack "just in case" whenever it is
shallow.

In any case, like you, I didn't notice any measurable slowing of Ruby
due to clearing the stack this way -- just much reduced memory
usage.  It may well be that the time for stack clearing is more than
offset by the quicker GC passes.

- brent
Posted by Brent Roman (brentr)
on 2008-12-21 08:41
(Received via mailing list)
I've finally put together the promised set of patches against version
1.8.7-p72 and posted them at:

http://sites.google.com/site/brentsrubypatches

From that page:

Aside from bug fixes, the primary goal of these patches is to reduce the
memory consumption of the 1.8 series Ruby interpreters.  Happily, these 
same
techniques tend also to increase the speed of most applications, but 
speed
increase was not my primary concern.

Each of the six patches below (mbari1-6) fixes a specific problem with 
or
optimizes some facet of the Ruby interpreter.  The patches were intended 
to
be applied in order, starting with official interpreter release
1.8.7-patchlevel72 from ruby-lang.org.  However, you may be able to 
apply
only a subset of them if you don't want a particular feature or
optimization.

Until more people test them, this must all be treated as alpha quality
software. ...

My development environment today is 32-bit Intel x86 Linux compiling 
with
gcc version 4.3.2.  I've tried to keep these patches portable to other
platforms, but will make no such claims until others have tested them 
there.
If you test these under MS-Windows, I'll be interested and try to be
helpful, but I won't be able to verify your results.

Please post any bugs, flames, benchmark results, requests for 
improvement,
etc. to the ruby-core mailing list by replying to this message.
Posted by Ezra Zygmuntowicz (Guest)
on 2008-12-21 09:02
(Received via mailing list)
These look like awesome patches Brent! Thanks for making them
available. I will play with them over the holidays and let you know
what I come up with for some larger apps.

Cheers-
-Ezra




Ezra Zygmuntowicz
ez@engineyard.com
Posted by Brent Roman (brentr)
on 2008-12-21 10:15
(Received via mailing list)
Just finished running the standard regression test suite with both
unpatched and patched versions of 1.8.7.

I think the results are encouraging, but there are a couple issues:

                         Process Size      User's CPU time
                         Initial/Final     (from the time command)
Unpatched 1.8.7-p72:     30MB/97MB          92 seconds
MBARI 6 atop 1.8.7-p72:  30MB/57MB         100 seconds

The patched version reports one additional failure:
  2) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[./ruby/test_beginendblock.rb:81]:
<""> expected to be =~
</Interrupt$/>.

1878 tests, 1344988 assertions, 2 failures, 0 errors

real  2m35.696s
user  1m39.422s
sys  0m3.284s

And, the drb test segfaults with the patched version.
(so I removed it for both the patched and unpatched runs for comparison)

Looks like I also will be playing with these patches over the holidays.

Enjoy,

- brent
Posted by Kurt Stephens (Guest)
on 2008-12-21 11:49
(Received via mailing list)
How difficult to apply to 1.8.6?
Posted by Hongli Lai (Guest)
on 2008-12-21 17:20
(Received via mailing list)
Brent Roman wrote:
> I've finally put together the promised set of patches against version
> 1.8.7-p72 and posted them at:
> 
> http://sites.google.com/site/brentsrubypatches

Awesome work! Very good explanations.
Posted by Roger Pack (rogerdpack)
on 2008-12-22 09:39
(Received via mailing list)
First thanks for doing all that hard work.  I'm sure it's not pleasant
to try and figure this all out, and you seem to have done a very
thorough job :)

A few questions.

> Process Size Initial/Final       User's CPU time (from the time command)
> Unpatched 1.8.7-p72:     30MB/97MB      92 seconds
> MBARI 6 atop 1.8.7-p72:  30MB/57MB     100 seconds

Is this the time to complete test-all?

I wonder why it uses more total time... :) [the RAM usage looks nice
though].  Makes me wish we had similar patches for 1.9, too [running
make test-all on 1.9 for me typically uses like 400MB RSS for some
reason...].


> The patched version reports one additional failure:
>  2) Failure:
> test_should_propagate_signaled(TestBeginEndBlock)
> [./ruby/test_beginendblock.rb:81]:
> <""> expected to be =~
> </Interrupt$/>.

Does it report this consistently?

Interestingly, with 1.8.6 HEAD on mingw currently I get this:

 3) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[../ruby_1_8/test/ruby/test_beginendblock.rb:83]:
<nil> expected but was
<3>.

> And, the drb test segfaults with the patched version.
> (so I removed it for both the patched and unpatched for comparison)

Maybe you could post a gdb backtrace [in case someone can figure out
what's going on...]

Question--The install instructions mention using
-mpreferred-stack-boundary=2, though in the writeup it says it helps
only slightly--but you recommend it because it still helps?

re MBARI2: gc sometimes segfaults: do you have any examples of how it
does this?  So these old frames are collected but not really--is that
what happens?

re: MBARI3 is it possible to use memzero to forcefully overwrite local
variables [though as you pointed out, it would still leave
temporaries].  Are there any other culprits besides rb_eval [and
doesn't eval get called fairly rarely so this isn't a help for most
progs?]

You mention that after this the callcc stuff should work--do you think
that only applying this one patch should be sufficient for that to
happen?

why remove the dynamic malloc_limit?
One thing you might want to try would be the ruby benchmark suite with
and without [1].

MBARI5 : ruby extends the stack when it needs to thread shift from a
"smaller stack" thread to a larger stack thread, is that right?  After
shifting to a smaller stack might be a good time to clean the stack...

re: MBARI6 question: why are these included with 5 other gc patches?
[besides that they're cool and useful]?  Might be convenient to just
include the 1.9 style syntax by default [I might could come up with a
patch for it].:)

re: sourceref--it might be convenient to tie in with SCRIPT_LINES__
stuff, perhaps [thanks to nobu for pointing out its existence recently
to me].

I suppose my only wish list for these would be that it didn't clear
the stack but once per thread per GC.  I might could help out sometime
with it.

Thanks much for your work on these.  I'll give them a shot on windows
mingw/linux by next weekend.
Cheers.
-=r
[1] http://github.com/acangiano/ruby-benchmark-suite/tree/master
Posted by Brent Roman (brentr)
on 2008-12-22 11:08
(Received via mailing list)
Roger,

I just updated the patches at:

http://sites.google.com/site/brentsrubypatches

to fix the bug that was causing the drb test suite to segfault.

All the test suites now run to completion.

Responses to your questions:

R:  Is this the time to complete test-all?  What about patches for 1.9?
Why slower?

B:  This is the time to complete the command:
        ruby runner.rb
     in the test subdir of the 1.8.7p72 directory.

I suspect that the unpatched interpreter is leaking throughout the execution
of the tests.
Process size just keeps increasing.  With these patches it stabilizes about
1/3 of the way through.
These techniques may work with v1.9 as my understanding is that the GC is
largely unchanged.

Apps that don't swap context much will be a few percent slower.  Those that
do should be faster.  There certainly is more that can be done to optimize
the stack clearing.  My initial goal was to plug the memory leaks so that
Ruby apps could run for long periods without swapping (or worse).  In
practice, once a Ruby process starts swapping to virtual memory, its
performance degrades much more than a few percent.


R:
> The patched version reports one additional failure:
>  2) Failure:
> test_should_propagate_signaled(TestBeginEndBlock)
> [./ruby/test_beginendblock.rb:81]:
> <""> expected to be =~
> </Interrupt$/>.

   Does it report this consistently?

B:  Funny you should ask that...
No, it does not fail consistently.   Any ideas what's happening here?
It does feel like the same problem you see with your mingw port.


R:  The install instructions mention using
-mpreferred-stack-boundary=2, though in the writeup it says it helps
only slightly--but you recommend it because it still helps?

B:  Yes, stack-boundary=2 helps keep the frames a little smaller.
For a multi-threaded app, this is probably worth the little performance 
hit.
For a single threaded app, it may be better to leave out the
-mpreferred-stack-boundary=2
We need more benchmarking to tell.
Ruby should no longer leak memory regardless.


R:  re:  MBARI2: gc sometimes segfaults: do you have any examples of how 
it
does this?  So these old frames are collected but not really--is that
what happens?

B:  Have a look at this post of mine dated 12/03/07
http://markmail.org/message/jjmqzsxenp7oaojm


R:  re:  MBARI3 is it possible to use memzero to forcefully overwrite 
local
variables [though as you pointed out, it would still leave
temporaries].  Are there any other culprits besides rb_eval [and
doesn't eval get called fairly rarely so this isn't a help for most
progs?]

B:
I suspect memzero would be slower than the tight loop I have zeroing the
stack now.
In any case, the temporaries are critically important.  rb_eval is the 800
pound gorilla :-)


R:
You mention that after this the callcc stuff should work--do you think
that only applying this one patch should be sufficient for that to
happen?

B:
I think so.  However, I'd recommend installing at least MBARI2 as well 
to
improve performance.


R:
why remove the dynamic malloc_limit?

B:
Because I believe the malloc_limit should be tuned for your target
environment.
In a target with 32MB DRAM, malloc_limit should not be 8MB and I 
certainly
don't want it to increase on its own.  Remember, once Ruby starts 
swapping,
performance goes into the toilet.
I probably won't be motivated enough to benchmark it.  A few percent run
time change does not matter much to me.  I want my app to run for months 
at
a time and to play nice with others.


R:
MBARI5 : ruby extends the stack when it needs to thread shift from a
"smaller stack" thread to a larger stack thread, is that right?  After
shifting to a smaller stack might be a good time to clean the stack...

B:
The MBARI3 patch updates the stack extent at a number of points, 
including
on every context switch, but it defers clearing it until the next
CHECK_INTS, when the stack is likely to be smaller still.  Even so,
optimizing this further is definitely possible.  I've considered only
clearing the stack after GC.increase rises to 75% of GC.limit, for 
instance.


R:
re: MBARI6 question: why are these included with 5 other gc patches?
[besides that they're cool and useful]?  Might be convenient to just
include the 1.9 style syntax by default [I might could come up with a
patch for it].:)

B:
MBARI6 probably should have been packaged separately.
My __line__ and __file__ patches predate the 1.9 stuff by about 5 years.
See:
http://markmail.org/message/ybrbhvvzlhyv552y
I did think of redoing them in the 1.9 style, but I don't particularly 
like
the idea
of returning an array in this context, where numbered indices replace 
named
attributes.
In any case, I can emulate the 1.9 style methods with a tiny bit of Ruby
glue.


R:
I suppose my only wish list for these would be that it didn't clear
the stack but once per thread per GC.  I might could help out sometime
with it.

B:
That's on my wish list too.  I'd be very grateful for any help, even 
just
discussing ideas.

- brent
Posted by Hemant Kumar (gnufied)
on 2008-12-22 12:12
(Received via mailing list)
Hey Brent,

Thanks for patches man. I am yet to dig deeper, but I benchmarked
rails against them:

Here is the Average request/response for patched version:

Requests per second:    234.77 [#/sec] (mean)
Time per request:       42.594 [ms] (mean)
Time per request:       4.259 [ms] (mean, across all concurrent 
requests)
Transfer rate:          108.82 [Kbytes/sec] received

Memory usage stayed around 30MB

For Stock Ruby version:

Requests per second:    138.48 [#/sec] (mean)
Time per request:       72.214 [ms] (mean)
Time per request:       7.221 [ms] (mean, across all concurrent 
requests)
Transfer rate:          64.21 [Kbytes/sec] received


Memory usage stayed around 53 MB

I compiled both ruby versions without "--disable-pthread" and was
wondering if your patches modify anything there.






--
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.

http://gnufied.org
Posted by Michael Selig (Guest)
on 2008-12-23 00:39
(Received via mailing list)
On Mon, 22 Dec 2008 20:59:05 +1100, Brent Roman <brent@mbari.org> wrote:

> I suspect memzero would be slower than the tight loop I have zeroing the
> stack now.

In my experience on x86 architecture using GCC, "memset(p, 0, len)" is
substantially faster than a tight loop (between 2 & 10 times faster
depending whether the loop is byte-by-byte or word-by-word). This is
because GCC knows to optimize "memset" inline to a single instruction 
(or
close to it).

Mike
Posted by Brent Roman (brentr)
on 2008-12-23 05:36
(Received via mailing list)
My patches don't mess with any of the pthread stuff.

I'm pleasantly surprised by your rails benchmark results.
I would have expected this memory savings, but I can't think of why a single
threaded application like Rails (that doesn't use Continuations) would see
the sort of speed up you observed.  I'd expect it to be 2 to 10 percent
slower unless it was doing a lot of context switches.

I did get an off list response from a Chinese website that confirms the
rails memory savings, but
they said there was no change in speed.

Does your Rails application use threads or continuations?

Are you comparing ruby built from the same source tarball with the same
compiler options before and after patching?

- brent
Posted by robbin (Guest)
on 2008-12-23 07:37
(Received via mailing list)
Hi, Brent:

   I have tested the MBARI patch on JavaEye.com (http://www.javaeye.com), which is
a Chinese software development community website with 200,000 members
and 800,00 pageviews per day. JavaEye is written in Ruby on Rails and
runs in lighttpd/fastcgi mode. The server environment: AMD64 machine,
SuSE Linux x86-64, ruby 1.8.7-p72 and Rails 2.1.2.

   I tested Rails app performance and memory usage with 4 ruby implementations:

   1. ruby MRI  1.8.7-p72

   2. ruby 1.8.7-p72 with the Railsbench GC patch and the GC variables set as
below:
          RUBY_HEAP_MIN_SLOTS=600000
          RUBY_HEAP_SLOTS_INCREMENT=600000
          RUBY_HEAP_FREE_MIN=100000
          RUBY_GC_MALLOC_LIMIT=60000000

   3. ruby 1.8.7-p72 with MBARI patch.

   4. ruby 1.8.7-p72 with the MBARI patch, with the GC variables in gc.c
modified to the same values as above.


   Test one: Simple Rails app

    I created a simple rails app to test rails routes and template rendering:

 ab -c 1 -n 1000 http://localhost:3000/test/index

ruby version                 performance       memory
-----------------------------------------------------
ruby                         106 request/s     39MB
ruby GC patch                125 request/s     60MB
ruby MBARI patch             160 request/s     35MB
ruby MBARI merge GC patch    173 request/s     60MB

 Test One Summary: The MBARI patch saves a little memory compared to MRI and
improves rails performance significantly.

   Test two: Real Rails website test

    I selected two typical pages on JavaEye.com to benchmark:

    Page 1 : http://robbin.javaeye.com/

ab -c 1 -n 100 http://robbin.joinnet.cn/

ruby version                 performance       memory
-----------------------------------------------------
ruby                         1.69 request/s    136MB
ruby GC patch                2.81 request/s    179MB
ruby MBARI patch             1.96 request/s    103MB
ruby MBARI merge GC patch    2.90 request/s    158MB

     Page 2: http://robbin.joinnet.cn/blog/283992

ruby version                 performance       memory
-----------------------------------------------------
ruby                         2.20 request/s    136MB
ruby GC patch                3.61 request/s    179MB
ruby MBARI patch             2.47 request/s    103MB
ruby MBARI merge GC patch    3.73 request/s    158MB

Test Two Summary:

   1. The MBARI patch not only saves a lot of memory compared to MRI but also
improves rails performance by about 13%.
   2. MBARI merged with the Railsbench GC patch beats the others with the highest
rails performance, and saves some memory compared to the Railsbench GC patch
alone.

 My suggestions:

    1. The MBARI patch has some incompatibility with complicated Regexps. For
example, I hit this error: premature end of regular expression: /0ãk
\000\000\000\000x/  On line #12 of blog/index/_blog.rhtml

    2. I wish MBARI would merge the Railsbench GC patch, because the Railsbench GC
patch gives a big rails performance improvement on the JavaEye.com website.

    3. I hope MBARI gets merged into ruby trunk :)
Posted by Hemant Kumar (gnufied)
on 2008-12-23 08:10
(Received via mailing list)
Hi

On Tue, Dec 23, 2008 at 9:57 AM, Brent Roman <brent@mbari.org> wrote:
> rails memory savings, but
> they said there was no change in speed.

Are you talking about (http://www.javaeye.com)?

>
> Does your Rails application use threads or continuations?

No, I was just benchmarking a hello world rails application.

>
> Are you comparing ruby built from the same source tarball with the same
> compiler options before and after patching?

Yes. Essentially before patching and after patching.
Posted by Brent Roman (brentr)
on 2008-12-23 08:17
(Received via mailing list)
Hi Robbin,

You are the second to observe these patches improving Rails performance.
I really did not expect this.  All I can suppose is that the smaller 
call
stack caused
by the MBARI4 patch is saving more GC time than is spent by the stack
clearing of
the MBARI3 patch.  Someone running rails would have to instrument their 
code
to record the total time spent in GC in order to prove or disprove this.

Regarding your regex failure:
There was a bug in the patches originally posted to the website on
December 19th.
It was corrected yesterday.  If the output of ruby -v is:

ruby 1.8.7 (2008-12-19 MBARI 6 on patchlevel 72) ...

You have downloaded the original version with the bug.
If so, please download the patches again and retest.
ruby -v should output:

ruby 1.8.7 (2008-12-21 MBARI 6 on patchlevel 72) ...

If you can get the regex problem to occur with the latest patches,
please try to create and post a self contained test program that
demonstrates it.

Thanks for your benchmarks,

- brent
Posted by robbin (Guest)
on 2008-12-23 08:40
(Received via mailing list)
I hit the Regexp error on the MBARI 2008-12-21 version; it occurs when we use
the Rails sanitize helper to format html fragments. But I haven't been able to
reproduce the error yet. If I track it down, I will report it to you.
Posted by Brent Roman (brentr)
on 2008-12-24 08:42
(Received via mailing list)
Mike,

Certainly, if one copies byte-at-a-time, performance will be awful.
I'm copying aligned words one ruby VALUE sized word at a time.

As an experiment, I tried substituting memset for my tight stack 
clearing
loop...

and discovered that memset() is actually quite a large function,
and gcc does not inline it.  It is large because,  in this context, the
compiler
cannot tell that the pointers are already long-word aligned and that we
are copying an integer number of long words.  So it emits code to copy
bytes on either end.  And, since we're trying to clear memory from
the current stack pointer down, we must also add a kludgey offset to 
avoid
wiping memset()'s own stack frame.

If anyone else wants to try this on an x86, in rubysig.h, change:

#define __stack_zero_down(end,sp)  while (end <= --sp) *sp=0
to:
#define __stack_zero_down(end,sp) \
  if (sp-6 > end) memset(end, 0, (void *)(sp-6)-(void*)end)

My tiny "bogus1" and "bogus2" show no measurable improvement, but 
perhaps it
might
help for a larger application.
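
For anyone who wants to compare the two approaches outside the interpreter, here is a function-level sketch of both. It is an illustration, not the rubysig.h macro: the function names are made up, the VALUE typedef is a stand-in, and the pointers are ordinary arguments rather than the live stack pointer, so no offset is needed to protect memset()'s own frame. The loop is written as `*--sp = 0` so that it never forms a pointer below `end`.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long VALUE;   /* stand-in for Ruby's VALUE (assumption) */

/* Word-at-a-time loop: zeroes [end, sp), walking from sp downward. */
static void zero_down_loop(VALUE *end, VALUE *sp)
{
    while (sp > end)
        *--sp = 0;
}

/* memset variant: same range, with the length given in bytes. */
static void zero_down_memset(VALUE *end, VALUE *sp)
{
    if (sp > end)
        memset(end, 0, (size_t)(sp - end) * sizeof(VALUE));
}
```

Both functions leave the words at and above sp, and below end, untouched, so they can be timed against each other on buffers of whatever size the application's stack clearing typically covers.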

On the other hand...
Very recently, folks who've looked into this far more intensively than
I concluded that an unrolled 'C' loop was better than the venerable

  rep stosl

assembly instructions used by x86 gcc's __builtin_memset().  See:

http://sourceware.org/ml/newlib/2008/msg00286.html

They note that microcoded instructions are slower than simple ones for
the modern x86 (RISC-ish) execution cores.  The fastest way to clear
memory these days is supposedly to use MMX instructions.
(I'm not going there, but I welcome others to explore where that might 
lead
:-)

- brent
Posted by Michael Selig (Guest)
on 2008-12-25 05:36
(Received via mailing list)
On Wed, 24 Dec 2008 18:32:57 +1100, Brent Roman <brent@mbari.org> wrote:

> As an experiment, I tried substituting memset for my tight stack clearing
> loop...
>
> and discovered that memset() is actually quite a large function,
> and gcc does not inline it.  It is large because,  in this context, the
> compiler
> cannot tell that the pointers are already long-word aligned and that we
> are copying an integer number of long words.  So it emits code to copy
> bytes on either end.

Try using the gcc option "-minline-all-stringops". I think that should
force memset (and other stuff) to be inlined.


> They note that microcoded instructions are slower than simple ones for
> the modern x86 (RISC-ish) execution cores.  The fastest way to clear
> memory these days is supposedly to use MMX instructions.
> (I'm not going there, but I welcome others to explore where that might  
> lead

Thanks for this reference. I got the impression that he was saying that:
- memset() on GCC 3.4 could be slower than his tight C loop when working
on unaligned data. However, I think that this may be fixed in GCC 4.
- "rep stosl" was fastest when working on 8-byte aligned data on some
x86 platforms. His assembly patch seems to set the first few bytes until
it gets to an address divisible by 8, then uses "rep stosl" from there. I
think GCC 4.3.2 does 4-byte aligned copies using "rep stosl" when
inlined.
However, his code ALWAYS did a function call to memset or a version of
it, so it is not clear whether the function call overhead makes much
difference compared to inlining the memset call.

The fact that you didn't notice much difference between the C loop and a
function call to memset() seems to imply that this optimization may not
be all that important to ruby stack clearing. It really depends on how
often it is called, and how much it is clearing at a time. It is probably
worth benchmarking a little more, but I may be barking up the wrong tree
here!
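
One way to probe the "how often, and how much at a time" question is a small timing harness like the sketch below. It is only an illustration under stated assumptions: time_clears, clear_loop, clear_memset, buf and sink are made-up names, the buffer is an ordinary static array rather than the real stack, and clock() resolution is coarse, so many calls are needed before the numbers mean anything.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef unsigned long VALUE;    /* assumption: stand-in for Ruby's VALUE */

enum { MAX_WORDS = 4096 };
static VALUE buf[MAX_WORDS];    /* stands in for the dead part of the stack */
static volatile VALUE sink;     /* keeps the compiler from dropping the work */

static void clear_loop(VALUE *end, VALUE *sp)
{
    while (sp > end)
        *--sp = 0;
}

static void clear_memset(VALUE *end, VALUE *sp)
{
    if (sp > end)
        memset(end, 0, (size_t)(sp - end) * sizeof(VALUE));
}

/* Time `calls` invocations of `clear`, each zeroing `words` slots.
 * Returns elapsed CPU time in seconds as reported by clock(). */
static double time_clears(void (*clear)(VALUE *, VALUE *),
                          size_t words, long calls)
{
    clock_t start;
    long i;

    assert(words >= 1 && words <= MAX_WORDS);
    start = clock();
    for (i = 0; i < calls; i++) {
        buf[words / 2] = (VALUE)i + 1;   /* dirty one slot before each clear */
        clear(buf, buf + words);
        sink = buf[words / 2];           /* read back the cleared slot */
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing time_clears(clear_loop, n, calls) with time_clears(clear_memset, n, calls) for a few small and large n would show where, if anywhere, the two diverge on a given compiler and CPU.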

Cheers
Mike
Posted by Brent Roman (brentr)
on 2008-12-25 07:18
(Received via mailing list)
I just had a quick play with the gcc option "-minline-all-stringops".
It was definitely a step in the right direction.

Because it in-lined the memset, I could safely remove the offset kludge
(as there was no longer a memset() stack frame to preserve)

But, the compiler still emitted (useless) code to longword align after
the main block of the memset operation.  This reformulation of the macro
eliminates that (and removes the offset):

#define __stack_zero_down(end,sp) \
  if (sp > end) memset(end, 0, (sp-end)*sizeof(VALUE))

Now the generated code looks quite clean:

  movl  %edx, %ecx
  subl  %edi, %ecx
  andl  $-4, %ecx
  cmpl  $4, %ecx
  jb  .L1508  ;skip if sp<=end
  shrl  $2, %ecx
  xorl  %eax, %eax
  rep stosl

However, I still don't see any improvement on my little benchmarks.
If someone comes up with an app or test case where these patches appear
to slow things down, then I'll ask them to try this alternative and
perhaps we'll see an improvement.

I'm leery of this technique because, if you omit -minline-all-stringops,
one must offset the stack pointer by the size of the memset() frame to
preserve it; otherwise the memset causes a segfault.
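
A minimal sketch of how that dependency could be made explicit, assuming a hypothetical build-time knob STACK_ZERO_SKIP (0 when memset is inlined, 6 as in the earlier out-of-line experiment). The macro name stack_zero_down_sketch and the unsigned long VALUE are illustrative assumptions, and the do/while(0) wrapper is only there so the macro behaves as a single statement inside if/else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long VALUE;    /* assumption: stand-in for Ruby's VALUE */

/* Hypothetical knob: number of slots nearest sp to leave untouched. */
#ifndef STACK_ZERO_SKIP
#define STACK_ZERO_SKIP 0
#endif

/* Zero the slots in [end, sp - STACK_ZERO_SKIP); do nothing if that
 * range is empty.  Each argument is evaluated exactly once. */
#define stack_zero_down_sketch(end, sp) do {                            \
    VALUE *zero_from_ = (end);                                          \
    ptrdiff_t zero_n_ = (sp) - zero_from_ - STACK_ZERO_SKIP;            \
    assert(zero_from_ != NULL);                                         \
    if (zero_n_ > 0)                                                    \
        memset(zero_from_, 0, (size_t)zero_n_ * sizeof(VALUE));         \
} while (0)
```

Compiling with -DSTACK_ZERO_SKIP=6 would reproduce the offset variant; whether 6 is the right figure for any other compiler or platform is not something this sketch can tell you.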

This optimization is very machine/compiler dependent and the gain is not
yet demonstrated.
But, it's reassuring to have worked it out.  Thanks for the tip!

- brent
Posted by Roger Pack (rogerdpack)
on 2008-12-26 23:17
(Received via mailing list)
Seems to overall be a tidge slower for "micro" stuff--5 or 10%.
viz:
lloyd gc bench:
187 unpatched:

arrays_read.rb time 0.072516
arrays_read_yaml.rb time 0.671292
classes_read.rb time 0.040723
classes_read_yaml.rb time 0.736394
create_arrays.rb time 0.165607
create_arrays_yaml.rb time 7.638495
create_hashes.rb time 0.136778
create_hashes_yaml.rb time 20.888187
create_ostructs.rb time 2.028835
create_ostructs_yaml.rb time 10.707594
create_weak_hashes.rb time 0.946386
create_weak_hashes2.rb time 0.389543
growarray.rb time 1.788691
hashes_read.rb time 0.037333
hashes_read_yaml.rb time 1.687161
ostruct_read.rb time 1.691467
ostruct_read_yaml.rb time 1.05634
plist.rb time 4.27333
shrinkarray.rb time 1.751121
weak_hashes_read.rb time 0.293751


187 patched:

arrays_read.rb time 0.060988
arrays_read_yaml.rb time 0.706926
classes_read.rb time 0.041115
classes_read_yaml.rb time 0.736123
create_arrays.rb time 0.171677
create_arrays_yaml.rb time 7.715646
create_hashes.rb time 0.121288
create_hashes_yaml.rb time 21.457203
create_ostructs.rb time 2.020391
create_ostructs_yaml.rb time 11.011948
create_weak_hashes.rb time 1.035461
create_weak_hashes2.rb time 0.381697
growarray.rb time 1.865321
hashes_read.rb time 0.0376
hashes_read_yaml.rb time 1.802083
ostruct_read.rb time 1.705456
ostruct_read_yaml.rb time 1.108687
plist.rb time 4.64743
shrinkarray.rb time 1.833105
weak_hashes_read.rb time 0.293376

But that's for micro-benchmarks.
I think the reason we see people's performance increase is that since
the GC is suddenly more effective, it doesn't get called as often.  A
big win for larger apps.

Overall I'd call it a large win for Ruby in terms of being much more
stable size-wise in a multi-threaded environment and suggest their
incorporation verbatim.  All 6 :)

raw ruby-benchmark-suite comparison is in the footnote.
Note a few things:
one test erred with 187 normal but succeeded with MBARI patches
(core-library/bm_so_concatenate.rb)
the threaded tests do indeed run faster with MBARI.

normal (input size 10000):
core-library/bm_vm3_thread_create_join.rb,0.20678186416626
patched (input size 10000):
core-library/bm_vm3_thread_create_join.rb,0.14684009552002


Some other thoughts I've had are that theoretically you only need to
clear the stack once between GC's, so you may be able to just keep a
"range already cleared" per thread or what not, and reset it after
each GC.  This would especially work if rb_thread_alone is true.
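
A rough sketch of what such a "range already cleared" record might look like, exercised on a plain array. Everything here is hypothetical (struct cleared_range, range_reset, clear_once), and it deliberately ignores the hard part: slots inside the recorded range can be dirtied again by later calls, so a real version would have to shrink or drop the record whenever the stack grows back into it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long VALUE;    /* assumption: stand-in for Ruby's VALUE */

/* Hypothetical per-thread record of slots zeroed since the last GC: [lo, hi). */
struct cleared_range {
    VALUE *lo;
    VALUE *hi;
};

/* Forget the record; meant to be called after each GC. */
static void range_reset(struct cleared_range *r)
{
    r->lo = NULL;
    r->hi = NULL;
}

/* Zero [from, to) and return how many slots were written. */
static size_t zero_words(VALUE *from, VALUE *to)
{
    if (to <= from)
        return 0;
    memset(from, 0, (size_t)(to - from) * sizeof(VALUE));
    return (size_t)(to - from);
}

/* Zero [end, sp), skipping the part already recorded as cleared.
 * Returns the number of slots actually written. */
static size_t clear_once(struct cleared_range *r, VALUE *end, VALUE *sp)
{
    size_t written = 0;

    assert(r != NULL && end <= sp);
    if (r->lo == NULL || sp < r->lo || end > r->hi) {
        /* No record yet, or the request does not touch it: clear it all
         * and start a fresh record. */
        written = zero_words(end, sp);
        r->lo = end;
        r->hi = sp;
        return written;
    }
    if (end < r->lo) {              /* extend the record downwards */
        written += zero_words(end, r->lo);
        r->lo = end;
    }
    if (sp > r->hi) {               /* extend the record upwards */
        written += zero_words(r->hi, sp);
        r->hi = sp;
    }
    return written;
}
```

A second request for the same range writes nothing, and a wider request writes only the slots outside the recorded range.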

You might be able to get away with only checking for stack depth once
every CHECK_INT [instead of with xmalloc].

Maybe  even clear the stack only at ruby_stack_check [though this is
probably too infrequent].

I did a small experiment with memset versus tight loop and [somehow] a
tight loop seems to win.

I think there is some potential for optimization if you were to use
fixed 2K heap chunks and binary search for is_pointer_to_heap [with
caching of the most recently found heap chunk to help save on speed].
Theoretically it might bring RAM usage down even further [1.9 does
this].
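
The lookup described above could be sketched roughly as follows. This is not the gc.c implementation: struct heap_index, in_chunk and is_pointer_to_heap_sketch are hypothetical names, chunk base addresses are assumed to be kept sorted and non-overlapping, and the check that a candidate pointer lands on an object-slot boundary is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CHUNK_BYTES = 2048 };    /* the "fixed 2K heap chunks" */

/* Hypothetical index over the heap chunks. */
struct heap_index {
    const uintptr_t *starts;    /* chunk base addresses, sorted ascending */
    size_t count;               /* number of chunks */
    size_t last_hit;            /* index of the most recently matched chunk */
};

/* True if p falls inside the chunk that starts at base. */
static int in_chunk(uintptr_t base, uintptr_t p)
{
    return p >= base && p - base < CHUNK_BYTES;
}

/* Returns 1 if p points into some chunk, 0 otherwise. */
static int is_pointer_to_heap_sketch(struct heap_index *h, uintptr_t p)
{
    size_t lo = 0, hi;

    assert(h != NULL);
    if (h->count == 0)
        return 0;
    assert(h->last_hit < h->count);

    /* Fast path: the chunk that matched last time. */
    if (in_chunk(h->starts[h->last_hit], p))
        return 1;

    /* Slow path: binary search over the sorted chunk bases. */
    hi = h->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (p < h->starts[mid]) {
            hi = mid;
        } else if (in_chunk(h->starts[mid], p)) {
            h->last_hit = mid;
            return 1;
        } else {
            lo = mid + 1;
        }
    }
    return 0;
}
```

The cache pays off only if successive candidates from the stack scan tend to fall in the same chunk, which would need measuring.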

I know that at least for me I will definitely use these for my own
apps so that they have more control for memory.

Re: javaeye.com speed "almost the same" with railsbench GC patch +
these versus just railsbench GC patch--I think that what is happening
in this case is that GC is being called only when the freelist is used
up, since the malloc_limit is so large.  Tough to know how to speed it
up in that case [except for running GC in a different process and
earlier].

Thanks for your hard work.  I think it was something a few of us had
thought necessary but never got up the gumption to do :)

-=r

Some raw data [to me this means little compared to the rails stuffs
reported earlier].

ruby-benchmark-suite
with patch:
Benchmark Name,Time #1,Time #2,Average Time,Standard Deviation,Input Size
Startup,0.00860691070556641,0.00712394714355469,0.007865428924561,0.000741481781006,n/a
real-world/bm_hilbert_matrix.rb,0.0715880393981934,0.0691721439361572,0.070380091667175,0.001207947731018,10
real-world/bm_hilbert_matrix.rb,0.705732822418213,0.707005977630615,0.706369400024414,0.000636577606201,20
real-world/bm_hilbert_matrix.rb,2.71448302268982,2.73366808891296,2.724075555801392,0.009592533111572,30
real-world/bm_hilbert_matrix.rb,7.9450159072876,8.08562898635864,8.015322446823120,0.070306539535522,40
standard-library/bm_app_mandelbrot.rb,3.50128412246704,3.50250101089478,3.501892566680908,0.000608444213867,n/a
micro-benchmarks/bm_meteor_contest.rb,47.9955010414124,48.7175140380859,48.356507539749146,0.361006498336792,n/a
micro-benchmarks/bm_app_pentomino.rb,149.979510068893,150.394422769547,150.186966419219971,0.207456350326538,n/a
micro-benchmarks/bm_fasta.rb,63.8084781169891,54.929356098175,59.368917107582092,4.439561009407043,n/a
micro-benchmarks/bm_fannkuch.rb,0.0105619430541992,0.0125219821929932,0.011541962623596,0.000980019569397,6
micro-benchmarks/bm_fannkuch.rb,0.777202129364014,0.779121875762939,0.778162002563477,0.000959873199463,8
micro-benchmarks/bm_fannkuch.rb,85.7568709850311,85.9479658603668,85.852418422698975,0.095547437667847,10
micro-benchmarks/bm_nbody.rb,15.366986989975,15.376590013504,15.371788501739502,0.004801511764526,n/a
micro-benchmarks/bm_reverse_compliment.rb,8.94560790061951,8.99379897117615,8.969703435897827,0.024095535278320,n/a
micro-benchmarks/bm_quicksort.rb,7.39512896537781,7.40803289413452,7.401580929756165,0.006451964378357,n/a
micro-benchmarks/bm_mergesort.rb,4.32359004020691,4.32240605354309,4.322998046875000,0.000591993331909,n/a
micro-benchmarks/bm_nsieve_bits.rb,36.2198901176453,36.2111361026764,36.215513110160828,0.004377007484436,n/a
micro-benchmarks/bm_mandelbrot.rb,115.146002054214,115.128952026367,115.137477040290833,0.008525013923645,n/a
micro-benchmarks/bm_lucas_lehmer.rb,20.1205780506134,20.1091129779816,20.114845514297485,0.005732536315918,9689
micro-benchmarks/bm_lucas_lehmer.rb,21.7712378501892,21.766205072403,21.768721461296082,0.002516388893127,9941
micro-benchmarks/bm_lucas_lehmer.rb,31.5163369178772,31.5209879875183,31.518662452697754,0.002325534820557,11213
micro-benchmarks/bm_lucas_lehmer.rb,Timeout: 150.00 seconds,,,,19937
micro-benchmarks/bm_fractal.rb,50.184531211853,50.1739339828491,50.179232597351074,0.005298614501953,n/a
micro-benchmarks/bm_knucleotide.rb,2.19779801368713,2.3165180683136,2.257158041000366,0.059360027313232,n/a
micro-benchmarks/bm_monte_carlo_pi.rb,27.104642868042,27.153263092041,27.128952980041504,0.024310111999512,n/a
micro-benchmarks/bm_word_anagrams.rb,13.1331388950348,12.0968029499054,12.614970922470093,0.518167972564697,n/a
micro-benchmarks/bm_binary_trees.rb,101.144680023193,102.439230918884,101.791955471038818,0.647275447845459,n/a
micro-benchmarks/bm_spectral_norm.rb,1.51863193511963,1.51901316642761,1.518822550773621,0.000190615653992,n/a
micro-benchmarks/bm_nsieve.rb,33.2746829986572,33.2826149463654,33.278648972511292,0.003965973854065,n/a
micro-benchmarks/bm_regex_dna.rb,5.48113203048706,6.14786696434021,5.814499497413635,0.333367466926575,n/a
micro-benchmarks/bm_sum_file.rb,14.6941390037537,14.5948259830475,14.644482493400574,0.049656510353088,n/a
micro-benchmarks/bm_partial_sums.rb,37.6713261604309,37.645623922348,37.658475041389465,0.012851119041443,n/a
micro-benchmarks/bm_so_sieve.rb,116.000460863113,116.059950828552,116.030205845832825,0.029744982719421,n/a
core-features/bm_vm1_rescue.rb,0.296025037765503,0.280431985855103,0.288228511810303,0.007796525955200,n/a
core-features/bm_vm1_length.rb,20.3501679897308,20.2971909046173,20.323679447174072,0.026488542556763,10
core-features/bm_vm1_length.rb,20.3735370635986,20.3696429729462,20.371590018272400,0.001947045326233,100
core-features/bm_vm1_length.rb,20.3027820587158,20.3756489753723,20.339215517044067,0.036433458328247,1000
core-features/bm_vm1_length.rb,20.3291130065918,20.3042759895325,20.316694498062134,0.012418508529663,10000
core-features/bm_so_ackermann.rb,0.0445468425750732,0.0442321300506592,0.044389486312866,0.000157356262207,5
core-features/bm_so_ackermann.rb,0.727099895477295,0.724959135055542,0.726029515266418,0.001070380210876,7
core-features/bm_so_ackermann.rb,12.0094769001007,11.994206905365,12.001841902732849,0.007634997367859,9
core-features/bm_vm2_poly_method.rb,4.42595386505127,4.39683985710144,4.411396861076355,0.014557003974915,1000000
core-features/bm_vm2_poly_method.rb,8.86316585540771,8.91431498527527,8.888740420341492,0.025574564933777,2000000
core-features/bm_vm2_poly_method.rb,18.3531260490417,18.3768548965454,18.364990472793579,0.011864423751831,4000000
core-features/bm_vm2_poly_method.rb,35.1964910030365,35.3743059635162,35.285398483276367,0.088907480239868,8000000
core-features/bm_app_tak.rb,0.170708179473877,0.170413970947266,0.170561075210571,0.000147104263306,5
core-features/bm_app_tak.rb,0.616703987121582,0.613183975219727,0.614943981170654,0.001760005950928,6
core-features/bm_app_tak.rb,1.96985197067261,1.96115493774414,1.965503454208374,0.004348516464233,7
core-features/bm_so_random.rb,0.248342990875244,0.251052856445312,0.249697923660278,0.001354932785034,100000
core-features/bm_so_random.rb,1.25759100914001,1.2419810295105,1.249786019325256,0.007804989814758,500000
core-features/bm_so_random.rb,2.50672101974487,2.487135887146,2.496928453445435,0.009792566299438,1000000
core-features/bm_vm1_swap.rb,9.50407981872559,9.48183703422546,9.492958426475525,0.011121392250061,10000000
core-features/bm_vm1_swap.rb,19.0615699291229,19.1599650382996,19.110767483711243,0.049197554588318,20000000
core-features/bm_vm1_swap.rb,37.5512471199036,37.8161060810089,37.683676600456238,0.132429480552673,40000000
core-features/bm_app_fib.rb,0.02223801612854,0.0222301483154297,0.022234082221985,0.000003933906555,20
core-features/bm_app_fib.rb,2.72851300239563,2.7235860824585,2.726049542427063,0.002463459968567,30
core-features/bm_app_fib.rb,30.3188951015472,30.1245629787445,30.221729040145874,0.097166061401367,35
core-features/bm_vm2_zsuper.rb,0.935019969940186,0.979220867156982,0.957120418548584,0.022100448608398,1000000
core-features/bm_vm2_zsuper.rb,1.99105310440063,1.85130095481873,1.921177029609680,0.069876074790955,2000000
core-features/bm_vm2_zsuper.rb,3.85666489601135,3.85760307312012,3.857133984565735,0.000469088554382,4000000
core-features/bm_vm2_zsuper.rb,7.69519710540771,7.70335698127747,7.699277043342590,0.004079937934875,8000000
core-features/bm_app_factorial.rb,0.00926995277404785,0.00803399085998535,0.008651971817017,0.000617980957031,1000
core-features/bm_app_factorial.rb,0.0369241237640381,0.0322320461273193,0.034578084945679,0.002346038818359,2000
core-features/bm_app_factorial.rb,0.245321989059448,0.241642951965332,0.243482470512390,0.001839518547058,5000
core-features/bm_app_factorial.rb,Error: stack level too deep,,,,10000
core-features/bm_app_tarai.rb,6.74241399765015,6.74293112754822,6.742672562599182,0.000258564949036,3
core-features/bm_app_tarai.rb,8.1383171081543,8.13438200950623,8.136349558830261,0.001967549324036,4
core-features/bm_app_tarai.rb,9.85991907119751,9.85500311851501,9.857461094856262,0.002457976341248,5
core-features/bm_vm1_const.rb,9.00359511375427,18.2473177909851,13.625456452369690,4.621861338615417,n/a
core-features/bm_so_nested_loop.rb,0.00854110717773438,0.00856804847717285,0.008554577827454,0.000013470649719,5
core-features/bm_so_nested_loop.rb,0.471377849578857,0.481393098831177,0.476385474205017,0.005007624626160,10
core-features/bm_so_nested_loop.rb,5.18516802787781,5.36552095413208,5.275344491004944,0.090176463127136,15
core-features/bm_vm1_ensure.rb,0.0761599540710449,0.0757908821105957,0.075975418090820,0.000184535980225,100000
core-features/bm_vm1_ensure.rb,0.761255979537964,0.760828971862793,0.761042475700378,0.000213503837585,1000000
core-features/bm_vm1_ensure.rb,7.46813201904297,7.48478889465332,7.476460456848145,0.008328437805176,10000000
core-features/bm_vm2_proc.rb,1.35628509521484,1.35869193077087,1.357488512992859,0.001203417778015,1000000
core-features/bm_vm2_proc.rb,2.70739102363586,2.70443820953369,2.705914616584778,0.001476407051086,2000000
core-features/bm_vm2_proc.rb,5.41131019592285,5.4192328453064,5.415271520614624,0.003961324691772,4000000
core-features/bm_vm2_proc.rb,10.8037610054016,10.8294909000397,10.816625952720642,0.012864947319031,8000000
core-features/bm_loop_times.rb,5.33636403083801,5.50595808029175,5.421161055564880,0.084797024726868,10000000
core-features/bm_loop_times.rb,10.5615899562836,10.6464350223541,10.604012489318848,0.042422533035278,20000000
core-features/bm_loop_times.rb,16.3860912322998,15.2040379047394,15.795064568519592,0.591026663780212,30000000
core-features/bm_vm2_unif1.rb,0.562043190002441,0.56634783744812,0.564195513725281,0.002152323722839,1000000
core-features/bm_vm2_unif1.rb,1.14902997016907,1.12535285949707,1.137191414833069,0.011838555335999,2000000
core-features/bm_vm2_unif1.rb,2.30900192260742,2.34629487991333,2.327648401260376,0.018646478652954,4000000
core-features/bm_vm2_unif1.rb,4.63608813285828,4.79328298568726,4.714685559272766,0.078597426414490,8000000
core-features/bm_vm1_simplereturn.rb,6.01986694335938,6.00216197967529,6.011014461517334,0.008852481842041,10000000
core-features/bm_vm1_simplereturn.rb,12.1239230632782,12.0794620513916,12.101692557334900,0.022230505943298,20000000
core-features/bm_vm1_simplereturn.rb,17.8927519321442,17.8568298816681,17.874790906906128,0.017961025238037,30000000
core-features/bm_loop_whileloop.rb,0.0550401210784912,0.0549750328063965,0.055007576942444,0.000032544136047,100000
core-features/bm_loop_whileloop.rb,0.55099892616272,0.550863981246948,0.550931453704834,0.000067472457886,1000000
core-features/bm_loop_whileloop.rb,5.51106715202332,5.50824809074402,5.509657621383667,0.001409530639648,10000000
core-features/bm_vm2_send.rb,0.680418014526367,0.68721079826355,0.683814406394958,0.003396391868591,1000000
core-features/bm_vm2_send.rb,1.44064593315125,1.3845579624176,1.412601947784424,0.028043985366821,2000000
core-features/bm_vm2_send.rb,2.74571800231934,2.76275110244751,2.754234552383423,0.008516550064087,4000000
core-features/bm_vm2_send.rb,5.66572713851929,5.52676200866699,5.596244573593140,0.069482564926147,8000000
core-features/bm_vm1_block.rb,0.0927438735961914,0.0923471450805664,0.092545509338379,0.000198364257812,100000
core-features/bm_vm1_block.rb,0.925873994827271,0.922999858856201,0.924436926841736,0.001437067985535,1000000
core-features/bm_vm1_block.rb,9.22466993331909,9.25748586654663,9.241077899932861,0.016407966613770,10000000
core-features/bm_vm2_super.rb,0.868482828140259,0.879498958587646,0.873990893363953,0.005508065223694,1000000
core-features/bm_vm2_super.rb,1.7648811340332,1.7305908203125,1.747735977172852,0.017145156860352,2000000
core-features/bm_vm2_super.rb,3.53582000732422,3.50967812538147,3.522749066352844,0.013070940971375,4000000
core-features/bm_vm2_super.rb,7.03860211372375,7.0927209854126,7.065661549568176,0.027059435844421,8000000
core-features/bm_so_object.rb,2.1716628074646,2.17931509017944,2.175488948822021,0.003826141357422,500000
core-features/bm_so_object.rb,4.34661197662354,4.35301184654236,4.349811911582947,0.003199934959412,1000000
core-features/bm_so_object.rb,6.54321002960205,6.57184290885925,6.557526469230652,0.014316439628601,1500000
core-features/bm_app_raise.rb,6.15209579467773,6.16153502464294,6.156815409660339,0.004719614982605,n/a
core-library/bm_so_exception.rb,15.2897758483887,15.3124811649323,15.301128506660461,0.011352658271790,n/a
core-library/bm_so_concatenate.rb,94.1705470085144,86.1923739910126,90.181460499763489,3.989086508750916,5000
core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,10000
core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,15000
core-library/bm_so_count_words.rb,12.6775200366974,12.6623919010162,12.669955968856812,0.007564067840576,n/a
core-library/bm_vm2_array.rb,0.747035026550293,0.747730016708374,0.747382521629333,0.000347495079041,1000000
core-library/bm_vm2_array.rb,1.49641180038452,1.49419784545898,1.495304822921753,0.001106977462769,2000000
core-library/bm_vm2_array.rb,2.99150490760803,2.99072194099426,2.991113424301147,0.000391483306885,4000000
core-library/bm_vm2_array.rb,5.98674607276917,5.98161792755127,5.984182000160217,0.002564072608948,8000000
core-library/bm_vm2_regexp.rb,0.963823080062866,0.955650806427002,0.959736943244934,0.004086136817932,10
core-library/bm_vm2_regexp.rb,1.10640692710876,1.1020519733429,1.104229450225830,0.002177476882935,100
core-library/bm_vm2_regexp.rb,2.06975698471069,2.06661009788513,2.068183541297913,0.001573443412781,1000
core-library/bm_vm2_regexp.rb,12.8606810569763,12.8909890651703,12.875835061073303,0.015154004096985,10000
core-library/bm_vm3_thread_create_join.rb,0.0140390396118164,0.0140399932861328,0.014039516448975,0.000000476837158,1000
core-library/bm_vm3_thread_create_join.rb,0.14684009552002,0.142390966415405,0.144615530967712,0.002224564552307,10000
core-library/bm_vm3_thread_create_join.rb,1.42791819572449,1.42644500732422,1.427181601524353,0.000736594200134,100000
core-library/bm_app_strconcat.rb,5.00886416435242,4.99663209915161,5.002748131752014,0.006116032600403,n/a
core-library/bm_so_lists.rb,17.1406989097595,17.1364989280701,17.138598918914795,0.002099990844727,n/a
core-library/bm_so_matrix.rb,2.97365689277649,2.97791600227356,2.975786447525024,0.002129554748535,n/a
core-library/bm_pathname.rb,9.67186689376831,9.60481905937195,9.638342976570129,0.033523917198181,n/a
core-library/bm_so_array.rb,9.70830893516541,9.71373295783997,9.711020946502686,0.002712011337280,n/a





without patch:

Benchmark Name,Time #1,Time #2,Average Time,Standard Deviation,Input Size
Startup,0.00891494750976562,0.0071098804473877,0.008012413978577,0.000902533531189,n/a
real-world/bm_hilbert_matrix.rb,0.0650041103363037,0.0641639232635498,0.064584016799927,0.000420093536377,10
real-world/bm_hilbert_matrix.rb,0.648383140563965,0.593698024749756,0.621040582656860,0.027342557907104,20
real-world/bm_hilbert_matrix.rb,2.48112893104553,2.53910279273987,2.510115861892700,0.028986930847168,30
real-world/bm_hilbert_matrix.rb,7.38045883178711,7.66462898254395,7.522543907165527,0.142085075378418,40
standard-library/bm_app_mandelbrot.rb,3.16141700744629,3.16698312759399,3.164200067520142,0.002783060073853,n/a
micro-benchmarks/bm_meteor_contest.rb,44.2824368476868,45.2044949531555,44.743465900421143,0.461029052734375,n/a
micro-benchmarks/bm_app_pentomino.rb,123.641210079193,124.70353603363,124.172373056411743,0.531162977218628,n/a
micro-benchmarks/bm_fasta.rb,53.9337210655212,46.0915009975433,50.012611031532288,3.921110033988953,n/a
micro-benchmarks/bm_fannkuch.rb,0.00894379615783691,0.0106801986694336,0.009811997413635,0.000868201255798,6
micro-benchmarks/bm_fannkuch.rb,0.672353982925415,0.676352024078369,0.674353003501892,0.001999020576477,8
micro-benchmarks/bm_fannkuch.rb,74.7806649208069,75.063551902771,74.922108411788940,0.141443490982056,10
micro-benchmarks/bm_nbody.rb,13.873694896698,13.8695569038391,13.871625900268555,0.002068996429443,n/a
micro-benchmarks/bm_reverse_compliment.rb,8.97242403030396,8.94234395027161,8.957383990287781,0.015040040016174,n/a
micro-benchmarks/bm_quicksort.rb,7.04030704498291,7.07406520843506,7.057186126708984,0.016879081726074,n/a
micro-benchmarks/bm_mergesort.rb,3.57215404510498,3.57273411750793,3.572444081306458,0.000290036201477,n/a
micro-benchmarks/bm_nsieve_bits.rb,30.060455083847,30.0687861442566,30.064620614051819,0.004165530204773,n/a
micro-benchmarks/bm_mandelbrot.rb,97.2698559761047,97.3307840824127,97.300320029258728,0.030464053153992,n/a
micro-benchmarks/bm_lucas_lehmer.rb,20.239284992218,20.231260061264,20.235272526741028,0.004012465476990,9689
micro-benchmarks/bm_lucas_lehmer.rb,22.0810799598694,22.0774850845337,22.079282522201538,0.001797437667847,9941
micro-benchmarks/bm_lucas_lehmer.rb,31.978404045105,31.9865000247955,31.982452034950256,0.004047989845276,11213
micro-benchmarks/bm_lucas_lehmer.rb,Timeout: 150.00 seconds,,,,19937
micro-benchmarks/bm_fractal.rb,42.6185228824615,42.6280272006989,42.623275041580200,0.004752159118652,n/a
micro-benchmarks/bm_knucleotide.rb,2.08846092224121,2.21108198165894,2.149771451950073,0.061310529708862,n/a
micro-benchmarks/bm_monte_carlo_pi.rb,23.9794390201569,23.9089260101318,23.944182515144348,0.035256505012512,n/a
micro-benchmarks/bm_word_anagrams.rb,12.3941428661346,12.7155430316925,12.554842948913574,0.160700082778931,n/a
micro-benchmarks/bm_binary_trees.rb,82.171679019928,81.3044950962067,81.738087058067322,0.433591961860657,n/a
micro-benchmarks/bm_spectral_norm.rb,1.35397505760193,1.35372304916382,1.353849053382874,0.000126004219055,n/a
micro-benchmarks/bm_nsieve.rb,23.6701729297638,23.6756160259247,23.672894477844238,0.002721548080444,n/a
micro-benchmarks/bm_regex_dna.rb,5.42796611785889,6.06900095939636,5.748483538627625,0.320517420768738,n/a
micro-benchmarks/bm_sum_file.rb,15.2133920192719,15.1973860263824,15.205389022827148,0.008002996444702,n/a
micro-benchmarks/bm_partial_sums.rb,33.0055561065674,32.8080351352692,32.906795620918274,0.098760485649109,n/a
micro-benchmarks/bm_so_sieve.rb,84.2921900749207,83.7600059509277,84.026098012924194,0.266092061996460,n/a
core-features/bm_vm1_rescue.rb,0.289474964141846,0.276180028915405,0.282827496528625,0.006647467613220,n/a
core-features/bm_vm1_length.rb,16.6632568836212,17.2531778812408,16.958217382431030,0.294960498809814,10
core-features/bm_vm1_length.rb,17.1057379245758,16.5685300827026,16.837134003639221,0.268603920936584,100
core-features/bm_vm1_length.rb,16.5657980442047,17.0870249271393,16.826411485671997,0.260613441467285,1000
core-features/bm_vm1_length.rb,17.1546268463135,16.5853810310364,16.870003938674927,0.284622907638550,10000
core-features/bm_so_ackermann.rb,0.0397160053253174,0.0391659736633301,0.039440989494324,0.000275015830994,5
core-features/bm_so_ackermann.rb,0.659607887268066,0.658763885498047,0.659185886383057,0.000422000885010,7
core-features/bm_so_ackermann.rb,Error: stack level too deep,,,,9
core-features/bm_vm2_poly_method.rb,3.11452198028564,3.17220616340637,3.143364071846008,0.028842091560364,1000000
core-features/bm_vm2_poly_method.rb,6.2718460559845,6.33952903747559,6.305687546730042,0.033841490745544,2000000
core-features/bm_vm2_poly_method.rb,12.6923749446869,13.1363949775696,12.914384961128235,0.222010016441345,4000000
core-features/bm_vm2_poly_method.rb,26.4767730236053,26.1098349094391,26.293303966522217,0.183469057083130,8000000
core-features/bm_app_tak.rb,0.135724067687988,0.135273933410645,0.135499000549316,0.000225067138672,5
core-features/bm_app_tak.rb,0.493720054626465,0.491595029830933,0.492657542228699,0.001062512397766,6
core-features/bm_app_tak.rb,1.57262206077576,1.55252599716187,1.562574028968811,0.010048031806946,7
core-features/bm_so_random.rb,0.209523916244507,0.209694147109985,0.209609031677246,0.000085115432739,100000
core-features/bm_so_random.rb,1.04600214958191,1.04535102844238,1.045676589012146,0.000325560569763,500000
core-features/bm_so_random.rb,2.10444784164429,2.09432411193848,2.099385976791382,0.005061864852905,1000000
core-features/bm_vm1_swap.rb,8.73163294792175,8.67143487930298,8.701533913612366,0.030099034309387,10000000
core-features/bm_vm1_swap.rb,17.5122091770172,17.5956919193268,17.553950548171997,0.041741371154785,20000000
core-features/bm_vm1_swap.rb,34.7529518604279,34.8331568241119,34.793054342269897,0.040102481842041,40000000
core-features/bm_app_fib.rb,0.0184860229492188,0.0182771682739258,0.018381595611572,0.000104427337646,20
core-features/bm_app_fib.rb,2.23634505271912,2.2517249584198,2.244035005569458,0.007689952850342,30
core-features/bm_app_fib.rb,24.7462511062622,24.8339931964874,24.790122151374817,0.043871045112610,35
core-features/bm_vm2_zsuper.rb,0.919327974319458,0.896940946578979,0.908134460449219,0.011193513870239,1000000
core-features/bm_vm2_zsuper.rb,1.82282018661499,1.81718993186951,1.820005059242249,0.002815127372742,2000000
core-features/bm_vm2_zsuper.rb,3.54206895828247,3.70834898948669,3.625208973884583,0.083140015602112,4000000
core-features/bm_vm2_zsuper.rb,7.16689205169678,7.22776818275452,7.197330117225647,0.030438065528870,8000000
core-features/bm_app_factorial.rb,0.0101971626281738,0.00816202163696289,0.009179592132568,0.001017570495605,1000
core-features/bm_app_factorial.rb,0.0399060249328613,0.0334930419921875,0.036699533462524,0.003206491470337,2000
core-features/bm_app_factorial.rb,Error: stack level too deep,,,,5000
core-features/bm_app_factorial.rb,Error: stack level too deep,,,,10000
core-features/bm_app_tarai.rb,5.36701798439026,5.35589003562927,5.361454010009766,0.005563974380493,3
core-features/bm_app_tarai.rb,6.47303104400635,6.47742319107056,6.475227117538452,0.002196073532104,4
core-features/bm_app_tarai.rb,7.84382104873657,7.862135887146,7.852978467941284,0.009157419204712,5
core-features/bm_vm1_const.rb,8.81164598464966,18.2617099285126,13.536677956581116,4.725031971931458,n/a
core-features/bm_so_nested_loop.rb,0.0085291862487793,0.00852203369140625,0.008525609970093,0.000003576278687,5
core-features/bm_so_nested_loop.rb,0.47477388381958,0.482151031494141,0.478462457656860,0.003688573837280,10
core-features/bm_so_nested_loop.rb,5.32685899734497,5.50593280792236,5.416395902633667,0.089536905288696,15
core-features/bm_vm1_ensure.rb,0.069011926651001,0.0690209865570068,0.069016456604004,0.000004529953003,100000
core-features/bm_vm1_ensure.rb,0.69483208656311,0.688596963882446,0.691714525222778,0.003117561340332,1000000
core-features/bm_vm1_ensure.rb,6.77466106414795,6.78414702415466,6.779404044151306,0.004742980003357,10000000
core-features/bm_vm2_proc.rb,1.20864987373352,1.21058702468872,1.209618449211121,0.000968575477600,1000000
core-features/bm_vm2_proc.rb,2.42319822311401,2.42020487785339,2.421701550483704,0.001496672630310,2000000
core-features/bm_vm2_proc.rb,4.83699607849121,4.83324503898621,4.835120558738708,0.001875519752502,4000000
core-features/bm_vm2_proc.rb,9.69077706336975,9.66815495491028,9.679466009140015,0.011311054229736,8000000
core-features/bm_loop_times.rb,4.91027808189392,4.85836100578308,4.884319543838501,0.025958538055420,10000000
core-features/bm_loop_times.rb,9.6381299495697,9.79347586631775,9.715802907943726,0.077672958374023,20000000
core-features/bm_loop_times.rb,14.0774569511414,14.2317838668823,14.154620409011841,0.077163457870483,30000000
core-features/bm_vm2_unif1.rb,0.5470130443573,0.575318098068237,0.561165571212769,0.014152526855469,1000000
core-features/bm_vm2_unif1.rb,1.15082097053528,1.09408688545227,1.122453927993774,0.028367042541504,2000000
core-features/bm_vm2_unif1.rb,2.34767317771912,2.39744305610657,2.372558116912842,0.024884939193726,4000000
core-features/bm_vm2_unif1.rb,4.54664993286133,4.72602391242981,4.636336922645569,0.089686989784241,8000000
core-features/bm_vm1_simplereturn.rb,5.74362897872925,5.69164609909058,5.717637538909912,0.025991439819336,10000000
core-features/bm_vm1_simplereturn.rb,11.5744888782501,11.6653461456299,11.619917511940002,0.045428633689880,20000000
core-features/bm_vm1_simplereturn.rb,16.4999470710754,17.2785489559174,16.889248013496399,0.389300942420959,30000000
core-features/bm_loop_whileloop.rb,0.04463791847229,0.0446460247039795,0.044641971588135,0.000004053115845,100000
core-features/bm_loop_whileloop.rb,0.445381879806519,0.445470094680786,0.445425987243652,0.000044107437134,1000000
core-features/bm_loop_whileloop.rb,4.45436692237854,4.45519089698792,4.454778909683228,0.000411987304688,10000000
core-features/bm_vm2_send.rb,0.666916847229004,0.647531032562256,0.657223939895630,0.009692907333374,1000000
core-features/bm_vm2_send.rb,1.31860494613647,1.33532500267029,1.326964974403381,0.008360028266907,2000000
core-features/bm_vm2_send.rb,2.61160898208618,2.61185622215271,2.611732602119446,0.000123620033264,4000000
core-features/bm_vm2_send.rb,5.21212983131409,5.2205491065979,5.216339468955994,0.004209637641907,8000000
core-features/bm_vm1_block.rb,0.0858049392700195,0.0851829051971436,0.085493922233582,0.000311017036438,100000
core-features/bm_vm1_block.rb,0.851619005203247,0.855067014694214,0.853343009948730,0.001724004745483,1000000
core-features/bm_vm1_block.rb,8.54761481285095,8.54824185371399,8.547928333282471,0.000313520431519,10000000
core-features/bm_vm2_super.rb,0.820611000061035,0.826474905014038,0.823542952537537,0.002931952476501,1000000
core-features/bm_vm2_super.rb,1.66156506538391,1.7179229259491,1.689743995666504,0.028178930282593,2000000
core-features/bm_vm2_super.rb,3.43505811691284,3.11910891532898,3.277083516120911,0.157974600791931,4000000
core-features/bm_vm2_super.rb,6.57702493667603,6.63278102874756,6.604902982711792,0.027878046035767,8000000
core-features/bm_so_object.rb,2.12350010871887,2.12192320823669,2.122711658477783,0.000788450241089,500000
core-features/bm_so_object.rb,4.24955415725708,4.26192998886108,4.255742073059082,0.006187915802002,1000000
core-features/bm_so_object.rb,6.3975510597229,6.39479994773865,6.396175503730774,0.001375555992126,1500000
core-features/bm_app_raise.rb,6.0956289768219,6.09303498268127,6.094331979751587,0.001296997070312,n/a
core-library/bm_so_exception.rb,14.7507960796356,14.7960770130157,14.773436546325684,0.022640466690063,n/a
core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,5000
core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,10000
core-library/bm_so_concatenate.rb,Error: string sizes too big,,,,15000
core-library/bm_so_count_words.rb,12.6491630077362,12.6552991867065,12.652231097221375,0.003068089485168,n/a
core-library/bm_vm2_array.rb,0.745777130126953,0.743591070175171,0.744684100151062,0.001093029975891,1000000
core-library/bm_vm2_array.rb,1.49140596389771,1.49024105072021,1.490823507308960,0.000582456588745,2000000
core-library/bm_vm2_array.rb,2.98199105262756,2.97910499572754,2.980548024177551,0.001443028450012,4000000
core-library/bm_vm2_array.rb,5.96128010749817,5.95619893074036,5.958739519119263,0.002540588378906,8000000
core-library/bm_vm2_regexp.rb,0.969479084014893,0.94690990447998,0.958194494247437,0.011284589767456,10
core-library/bm_vm2_regexp.rb,1.14880299568176,1.13046097755432,1.139631986618042,0.009171009063721,100
core-library/bm_vm2_regexp.rb,2.10414791107178,2.11249113082886,2.108319520950317,0.004171609878540,1000
core-library/bm_vm2_regexp.rb,12.8499979972839,12.8127069473267,12.831352472305298,0.018645524978638,10000
core-library/bm_vm3_thread_create_join.rb,0.0200490951538086,0.0200908184051514,0.020069956779480,0.000020861625671,1000
core-library/bm_vm3_thread_create_join.rb,0.20678186416626,0.203513860702515,0.205147862434387,0.001634001731873,10000
core-library/bm_vm3_thread_create_join.rb,2.04601502418518,2.04799294471741,2.047003984451294,0.000988960266113,100000
core-library/bm_app_strconcat.rb,5.07870984077454,5.0784158706665,5.078562855720520,0.000146985054016,n/a
core-library/bm_so_lists.rb,14.3261790275574,14.3377418518066,14.331960439682007,0.005781412124634,n/a
core-library/bm_so_matrix.rb,2.74682712554932,2.76162505149841,2.754226088523865,0.007398962974548,n/a
core-library/bm_pathname.rb,9.27168798446655,9.26259899139404,9.267143487930298,0.004544496536255,n/a
core-library/bm_so_array.rb,8.83219408988953,8.83007097244263,8.831132531166077,0.001061558723450,n/a
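
For anyone checking the rows above: each line appears to carry two timings, then their mean and population standard deviation, then the input size. That reading is inferred from the numbers themselves, and the helper names `mean` and `pstddev` below are made up for illustration. A minimal sketch that reproduces the first bm_vm2_send.rb row:

```ruby
# Mean and population standard deviation of a list of timings (seconds).
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

def pstddev(values)
  m = mean(values)
  Math.sqrt(values.inject(0.0) { |sum, v| sum + (v - m)**2 } / values.size)
end

# First bm_vm2_send.rb row: two runs at 1,000,000 iterations.
runs = [0.666916847229004, 0.647531032562256]
puts format("%.15f,%.15f", mean(runs), pstddev(runs))
```

With only two runs per row the standard deviation is simply half the difference between them, so it says little about run-to-run noise.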
Posted by Brent Roman (brentr)
on 2008-12-27 08:31
(Received via mailing list)
Roger,

You ran this benchmark suite, correct?

http://github.com/acangiano/ruby-benchmark-suite/tree/master

I'd never heard of them before now.  Thanks!

I don't believe that these patches cause GC to run any less frequently by
default.  GC is still run (by default) after allocating 8MB of objects.  Nothing
I'm doing causes Ruby to allocate fewer or smaller objects.  I do believe we are
seeing that applications with large stack space(s) spend a lot of time
during GC scanning each and every word on those stacks.  These patches make
those stacks much smaller and zero out most ghost object pointers so they no
longer need to be marked.

see my comments below, marked Brent:
Posted by Roger Pack (rogerdpack)
on 2008-12-27 20:38
(Received via mailing list)
> You ran this benchmark suite, correct?
>
> http://github.com/acangiano/ruby-benchmark-suite/tree/master

Yeah, that and http://lloydforge.org/projects/misc/
the latter taking considerably less time to run :)


> I don't believe that these patches cause GC to run any less frequently by
> default.
> GC is still run (by default) after allocating 8MB of objects.  Nothing I'm
> doing causes Ruby to allocate fewer or smaller objects.  I do believe we are
> seeing that applications with large stack space(s) spend a lot of time
> during GC scanning each and every word on those stacks.  These patches make
> those stacks much smaller and zero out most ghost object pointers so they no
> longer need to be marked.

It would be interesting to see if the GC is being caused by malloc
versus running out of free list.  If it's the latter then the patches
could indeed cause GC to be less frequent.  If not then maybe it's as
you said--GC just takes less time as there's less to traverse during
the mark phase.
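
The two triggers can be told apart in a toy model. This is only a sketch under loud assumptions: `ToyHeap`, `run_toy` and all their fields are hypothetical names, not MRI internals, and the model ignores that the real collector can grow the heap when a pass frees too little. It counts how many collections are caused by an empty free list versus by crossing a malloc limit, where `survivors` stands for the objects a pass cannot free (reachable ones plus ghosts).

```ruby
# Toy model only: not MRI code. Counts GC passes by what triggered them.
class ToyHeap
  attr_reader :gc_counts

  def initialize(total_slots, malloc_limit = 8_000_000)
    @total_slots = total_slots
    @malloc_limit = malloc_limit
    @live = 0
    @malloc_increase = 0
    @gc_counts = Hash.new(0)
  end

  # Allocate one object slot plus `bytes` of malloc'd data.
  def allocate(bytes, survivors)
    collect(:free_list, survivors) if @live >= @total_slots
    @live += 1
    @malloc_increase += bytes
    collect(:malloc, survivors) if @malloc_increase > @malloc_limit
  end

  private

  # A pass keeps at most `survivors` objects and resets the malloc counter.
  def collect(cause, survivors)
    @gc_counts[cause] += 1
    @live = [@live, survivors].min
    @malloc_increase = 0
  end
end

def run_toy(total_slots, bytes, survivors, allocations = 10_000)
  heap = ToyHeap.new(total_slots)
  allocations.times { heap.allocate(bytes, survivors) }
  heap.gc_counts
end

p run_toy(1000, 0, 900)            # many ghosts: free list empties often
p run_toy(1000, 0, 100)            # few ghosts: far fewer passes
p run_toy(1_000_000, 16_000, 100)  # big heap: only the malloc limit fires
```

In this model fewer survivors means fewer free-list passes, while the malloc-triggered count does not depend on survivors at all, which is the distinction being asked about.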

>> Brent:  A >14x speed up.  Whoopie!  :-0
Yeah I think multi-thread apps will definitely like this.
Unfortunately most benchmarks are single threaded and micro-y so won't
show the "real" speedup [Antonio's included].


>> that a GC was about to occur, and get away with zeroing the stack at that
>> one point.  However, recall that  the collector scans each thread's stack
>> in multithreaded apps (and those using Continuations).  So, I'd need to
>> know when a GC or a context switch was going to occur while the stack was
>> still shallow.  I haven't figured out how to implement that oracle
>> function (and I doubt it is possible).

Hmm so the biggest speed hit is probably in the clearing of the stack
[over and over] right? [judging from your comment that measurement is
cheap].

I was just suggesting that once a thread has [reached a very shallow
spot and cleaned the stack in its entirety] it only needs to repeat
that after the next GC--left over references from this round will be
cleared [once] after the subsequent GC (when the thread reaches a
shallow point again).  So if you're willing to wait a couple of GC's,
you only have to clear once per GC, per thread.  So the oracle is "do
it once after each GC."
Sorry it's hard to explain.

Anyway imagine a single threaded app.  As long as that app clears the
stack "once and well" [say the first time it gets very high it cleans
off the whole thing--or accomplish this piece-wise as it grows high
the first time] then in a staggered way, every reference to garbage
will eventually be zeroed out and the item collected.
Not that it really matters I'm just trying to make sure that my
thought has been explained well.
thoughts?
-=r
Posted by Brent Roman (brentr)
on 2008-12-28 09:41
(Received via mailing list)
Roger,

I see what you mean.  If these patches let the GC collect
objects more efficiently, the object free list will not empty as often.
The speed up we observe for large single threaded apps could well be a
combination of faster ObjectSpace traversal and fewer GC passes triggered by an
empty free list.

My bogus2 benchmark switches between one thread having a very deep stack and
another with a shallow stack.  It's the worst conceivable case of stack
thrashing.  It runs about 15% faster if I disable only the clearing of the
stack.

I've spent a couple hours today "imagining" what might happen if each
thread's stack were cleared only once soon after each GC is run.  Here are
my observations thus far:

1)  I think I now see your point about VALUE pointers not necessarily
needing to be zeroed.
We just want to minimize the number of permanent ghost object pointers
residing on any stack.
When whatever transient ghost references remain change value, GC will
eventually collect the objects to which they referred.  Correct?

2)  GC is not triggered by any thread's particular activities.  It may be
that a given thread, whose stack has become full of ghost references due to
deferred stack clearing, stops running for long periods of time.  Or that
such a thread just never happens to be running when a GC is triggered.

3)  It is critical that the stack be cleared very soon after each context
swap, when the new thread's stack is shallower than the old one's.
Otherwise, VALUE pointers on the old thread's stack will likely be
incorporated into the new thread's stack when it grows (as ghosts there)
after the next context switch.

4)  More generally, there is no guarantee that any thread's stack, once it
incorporates ghost values during growth, will ever shrink later to allow
those values to be cleared off.

To me, this all adds up to requiring repeated clearing of the stack.
Because, once ghosts have been pushed onto a thread's stack, they may just
stay trapped there indefinitely.
Could you formulate some pseudo-code of an algorithm you think would (almost
always) prevent the incorporation of ghost references without repeated stack
clearing?
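
Point 3 can be illustrated with a toy model. It is not the algorithm being asked for, and nothing in it comes from the patches: `ToyMachineStack` and `ghosts_after_switch` are hypothetical names, a "word" is just an Integer with 0 meaning cleared, and wiping everything above the new top is a simplification of clearing only the difference between the two stacks.

```ruby
# Toy model of green threads sharing one machine stack. Not MRI code.
class ToyMachineStack
  def initialize(size)
    @words = Array.new(size, 0)
    @depth = 0
  end

  # Push a frame of `n` words. Only the first `initialized` words are written;
  # the rest keep their old contents, like uninitialized locals or padding.
  def push_frame(n, initialized, value)
    initialized.times { |i| @words[@depth + i] = value }
    @depth += n
  end

  # Restore a thread whose saved stack is `saved`; optionally wipe above it.
  def switch_to(saved, wipe)
    saved.each_with_index { |w, i| @words[i] = w }
    @depth = saved.size
    (@depth...@words.size).each { |i| @words[i] = 0 } if wipe
  end

  # What a conservative collector would treat as possible references.
  def scanned_roots
    @words[0, @depth].reject(&:zero?).uniq
  end
end

def ghosts_after_switch(wipe)
  stack = ToyMachineStack.new(64)
  stack.push_frame(40, 40, 0xA)     # thread A runs deep, referencing object 0xA
  stack.switch_to([0xB] * 4, wipe)  # shallow thread B is restored
  stack.push_frame(20, 5, 0xB)      # B grows; 15 of its 20 new words are never written
  stack.scanned_roots - [0xB]       # anything left is a ghost of thread A
end

p ghosts_after_switch(false)
p ghosts_after_switch(true)
```

Without the wipe, thread B's grown stack still contains thread A's stale word, so a conservative scan keeps that object alive; with the wipe it does not.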

I really want to believe :-)

- brent
Posted by Roger Pack (rogerdpack)
on 2008-12-30 21:05
(Received via mailing list)
Hmm interesting.
So I was looking at it from the single threaded perspective so
obviously missed some subtle implications.

If I understand correctly, the problem is that
1) If you have a large stacked thread "full of garbage" then this
garbage will be copied into the stack of a small-stack thread after a context
switch if it grows.
2) If a single thread creates a very "dirty" stack then goes into a
deep nested loop [ex: going to sleep forever within a very nested
call], it will not free the invalid references until it comes out of
that deep stack later.

I suppose we can operate under the assumption that when the program
starts, the extent of the stack is "clear" of bad references.

A few tricks up our sleeve:
We can do a stack cleaning around the time of a context switch:
We can clear the difference in size between the stacks after each
context switch.
We could clear that difference PLUS re-clear the "cleared once" area
below the stack, after each context switch.

Or perhaps do the "clear at most once" trick only if rb_thread_alone,
though I think the above would already do that.

So anyway we could basically reset the "already cleared" markers once
per context switch, instead of once per GC, and re-clear that stack's
damage.  Would that help?
In reality I'm not sure if these would be necessary.  How can we tell
how much is necessary?

Old notes:

So let's then keep two values, per thread.  One being the top of a
"clean section" the other the bottom of the "clean section"  [already
swept section].

Make this "clean section" grow as possible [check it every CHECK_INT,
if you're above it, grow it, if you're below it, reset it to start
below you, etc.].  So we keep track of, per thread, a growing cleaned
area.

Now when you context switch, if you switch from a large stack to a
shorter stack, clean the difference, plus the "dirty but clean now"
section--clean it again.  Reset the pointers.

I guess just try it out :)  Or I might get around to it eventually.

Comments inline:

> My bogus2 benchmark switches between one thread having a very deep stack and
> another with a shallow stack.  It's the worst conceivable case of stack
> thrashing.  It runs about 15% faster if I disable only the clearing of the
> stack.

I wonder if that's what causes the micro-benchmark slowdowns [what are
they like 5%?]  What about disabling the depth checker, too? What's
its impact?

> When whatever transient ghost references remain change value, GC will
> eventually collect the objects to which they referred.  Correct?

Yeah

> 2)  GC is not triggered by any thread's particular activities.  It may be
> that a given thread, whose stack has become full of ghost references due to
> deferred stack clearing, stops running for long periods of time.  Or that
> such a thread just never happens to be running when a GC is triggered.

True if a thread "doesn't run at all" between GC's then it won't clear
its stack until...it runs again at some point :)
A thread basically gets a window of 1 GC to create as much trash as it
wants, and, if it ceases running, retains that much trash.

-=r
Posted by Brent Roman (brentr)
on 2009-01-06 19:34
(Received via mailing list)
I've just posted a new patch at:

http://sites.google.com/site/brentsrubypatches/

The MBARI7 patch provides detailed build-time configuration control over
when stack clearing is done and optimizes the GC.  This patched interpreter
is as fast as unpatched Ruby 1.8.7 even for small, single threaded
benchmarks, while still effectively clearing ghost object references off the
stack.  MBARI7 also fixes a couple benign bugs in MBARI3.

On my 1.6GHz CoreDuo MacMini, MBARI7 runs the standard Ruby test suite,
producing exactly the same output as the unpatched ruby-1.8.7-p72 in the
same amount of time, using 30MB less memory.

Can anyone run it on some large applications to see how it performs in the
real Ruby world?

Thanks!
Posted by Sylvain Joyeux (Guest)
on 2009-01-06 19:51
(Received via mailing list)
Thanks Brent for this work.

One problem: the way you are getting the SP through alloca() is
architecture-specific. On PPC, there is data between the alloca() space
and the stack pointer, and your stack-wiping patch clears that data as
well (which, among other things, includes the return address).

I did a patch against MBARI6 for that, you can see it here:

  http://github.com/doudou/ruby/commit/0cd9b81d8ba8c...

I'll update to MBARI7 as soon as possible

Sylvain
Posted by Brent Roman (brentr)
on 2009-01-07 08:57
(Received via mailing list)
Sylvain,

Man, the PowerPC is a weird beast.

After spending a couple hours looking over its ABI docs, I've convinced
myself that the stack pointer would be valid for stack clearing if one could
get at it directly from the C code.  MBARI7 introduced a bit of gcc asm
to do this for x86.  I've posted an update to the MBARI7 patch on my website
that adds asm cases to get the stack pointer for PPC and ARM processors.  I
don't have ready access to a PowerPC machine, so I'll have to rely on others
to test this.

Oddly enough, there has long been (and still is) code in gc.c that gets the
stack pointer via alloca(0), but it does not crash on PPC because that
pointer is only used to determine the approximate stack depth for catching
infinite recursion in Ruby scripts.

If the compiler is not gcc or the CPU is not x86, PPC or ARM, MBARI7 now
falls back to the more portable method of returning the address of a local
variable from a small function flagged with NOINLINE().  This is similar to
your patch on MBARI6, but it should even work on strict ANSI 'C' compilers
as long as they don't inline that function.  I did verify that gcc had no
issues with it.  Aside from issuing warnings about returning the address of
a local variable, the resulting build worked fine.  It just ran about 1.5%
slower.

We'll see...

- brent

P.S.  I'll be trying to learn git and github over the coming weeks.  Perhaps
we can keep these patches and yours in one place eventually.
Posted by Sylvain Joyeux (Guest)
on 2009-01-07 11:27
(Received via mailing list)
On Wed, Jan 07, 2009 at 04:35:18PM +0900, Brent Roman wrote:
> don't have ready access to a PowerPC machine, so I'll have to rely on others
> to test this.
> 
> Oddly enough, there has long been (and still is) code in gc.c that gets the
> stack pointer via alloca(0), but it does not crash on PPC because that
> pointer is only used to determine the approximate stack depth for catching
> infinite recursion in Ruby scripts.
You should read commit messages when someone points you to one ;-)
That's basically what I said in the message, except that I thought it
was also used to delimit the stack in the mark phase.

Anyway, it works because
 * the stack is not modified
 * there are no Ruby variables that can be stored in-between the alloca()
   space and the stack pointer which are not already somewhere else (in
   registers, and the register window is also dumped by the GC AFAIK).

> P.S.  I'll be trying to learn git and github over the coming weeks.  Perhaps
> we can keep these patches and yours in one place eventually.

Well, it will be as simple as cloning my repository in one common place.
I already updated my patch to work on top of MBARI7 with ASM as well ...
We'll see what to keep I guess.

My updated patch is here:
  http://github.com/doudou/ruby/commit/f02bea5b10fea...

Sylvain
Posted by Michael Klishin (Guest)
on 2009-01-07 17:31
(Received via mailing list)
On 06.01.2009, at 21:50, Sylvain Joyeux wrote:

> I did a patch against MBARI6 for that


Imagine the same scenario taking place in a Subversion/diff/patch world:

"I did a patch from your patch, here is my new patch
[patch6_version_2.diff], and by the way, it needs to be applied after
[patch5_version4.diff] that I uploaded earlier" — experimental
branches, the Subversion way.

MK
Posted by Robbin Fan (Guest)
on 2009-01-08 10:15
(Received via mailing list)
Hi, Brent

   I can't compile ruby with MBARI7.

  CPU: AMD Opteron 246 * 2
  OS: SuSE Linux Enterprise Server 9 SP 4 x86-64
  gcc: 3.3.3
  glibc: 2.3.3

  I apply MBARI7 with ruby 1.8.7-p72 and configure below:

  CFLAGS="-O2 -mpreferred-stack-boundary=4" ./configure
--prefix=/usr/local/ruby187patch
 make

The error message below:

ar rcu libruby-static.a array.o bignum.o class.o compar.o dir.o dln.o
enum.o enumerator.o error.o eval.o file.o gc.o hash.o inits.o io.o
marshal.o math.o numeric.o object.o pack.o parse.o process.o prec.o
random.o range.o re.o regex.o ruby.o signal.o sprintf.o st.o string.o
struct.o time.o util.o variable.o version.o  dmyext.o
gcc -O2 -mpreferred-stack-boundary=4    -DRUBY_EXPORT -D_GNU_SOURCE=1
-I. -I.    -c main.c
gcc -O2 -mpreferred-stack-boundary=4    -DRUBY_EXPORT -D_GNU_SOURCE=1
-L.  -rdynamic -Wl,-export-dynamic   main.o  libruby-static.a -ldl
-lcrypt -lm   -o miniruby
./lib/fileutils.rb:1509:in `[]': method `hash' called on terminated
object (0x2a9559d408) (NotImplementedError)
  from ./lib/fileutils.rb:1509:in `collect_method'
  from ./lib/fileutils.rb:1509:in `select'
  from ./lib/fileutils.rb:1509:in `collect_method'
  from ./lib/fileutils.rb:1524
  from ./mkconfig.rb:11:in `require'
  from ./mkconfig.rb:11
make: *** [.rbconfig.time] Error 1






--
Robbin Fan (范凯)
JavaEye.com

Office: 021-63505501
Mobile: 13916323361
Email & MSN: fankai@gmail.com
Website: http://www.javaeye.com
Posted by Sylvain Joyeux (Guest)
on 2009-01-08 11:56
(Received via mailing list)
> ./lib/fileutils.rb:1509:in `[]': method `hash' called on terminated
> object (0x2a9559d408) (NotImplementedError)
>   from ./lib/fileutils.rb:1509:in `collect_method'
>   from ./lib/fileutils.rb:1509:in `select'
>   from ./lib/fileutils.rb:1509:in `collect_method'
>   from ./lib/fileutils.rb:1524
>   from ./mkconfig.rb:11:in `require'
>   from ./mkconfig.rb:11
> make: *** [.rbconfig.time] Error 1
It is not that it does not compile, but that the patched interpreter is broken
(this kind of error is what you get when you mess up the GC process)

Try the following change:

  in rubysig.h, replace __builtin_frame_address by alloca() in the following
  four lines:

# else  /* slower, but should work everywhere gcc does */
#  define _set_sp(sp)  VALUE *sp = _get_tos();
NOINLINE(static VALUE *_get_tos(void)) {return __builtin_frame_address(0);}
# endif
#else  /* slowest, but should work everwhere */

so that they look like that:

# else  /* slower, but should work everywhere gcc does */
#  define _set_sp(sp)  VALUE *sp = _get_tos();
NOINLINE(static VALUE *_get_tos(void)) {return alloca(0);}
# endif
#else  /* slowest, but should work everwhere */

My guess is that __builtin_frame_address does not work as expected on your GCC
version (it works fine on an amd64 with gcc 4.3)

Sylvain
Posted by Brent Roman (brentr)
on 2009-01-08 19:36
(Received via mailing list)
Robbin,

You certainly should try Sylvain's suggestion, but I'm not sure it will fix
the problem.
Whether or not it does, could you send me the assembler output of your
older opteron compiler so I might see where the stack clearing patch is
getting confused?

Here's how to get gcc to generate this file:

  CFLAGS="whatever flags you used for your Ruby make"
  gcc -S  $CFLAGS  eval.c

This will produce eval.s, the assembler source code.

If you want something much smaller that would probably contain the
information I need, try generating the eval.s file first before making the
change to rubysig.h that Sylvain suggests, save that file, change rubysig.h,
generate eval.s again, then diff the two versions of eval.s and send me just
that diff (or post it to this list).

  gcc -S  $CFLAGS  eval.c
  mv eval.s eval.s.b4
  {edit rubysig.h}
  gcc -S  $CFLAGS  eval.c
  diff -u eval.s.b4  eval.s  >eval.s.diff

After we fix this, I'd be very interested to see whether MBARI7 manages
to keep the memory size in Rails as low as MBARI6 did.

- brent
Posted by Stephen Sykes (Guest)
on 2009-01-11 11:31
(Received via mailing list)
Brent,

A report from the field...

We have been using your patches in a production Rails environment
since you released them, and this is on x86_64-linux.

We notice no problems, ruby works well and is significantly faster.

And to keep up to date, we just applied patch MBARI7 (from
http://sites.google.com/site/brentsrubypatches/ ) with the default
configuration.  FWIW we see a further small performance improvement,
something like 5% on a rough measurement.

Just a note on your build instructions: the
-mpreferred-stack-boundary=2 flag causes configure to fail on OSX,
complaining that it can't find the size of int (the program to do so
segfaults).  And that setting is not accepted by gcc on x86_64 because
it needs the boundary to be 4 or more.  In both cases I removed the
option and all works fine.

Regards,
Stephen
Posted by Charles Oliver Nutter (Guest)
on 2009-01-11 14:47
(Received via mailing list)
Stephen Sykes wrote:
> Brent,
> 
> A report from the field...
> 
> We have been using your patches in a production Rails environment
> since you released them, and this is on x86_64-linux.
> 
> We notice no problems, ruby works well and is significantly faster.

The patches also appear to help method-call performance a bit:

BEFORE:
$ ./ruby -I lib ../jruby/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
   1.580000   0.010000   1.590000 (  1.619531)
   1.570000   0.000000   1.570000 (  1.609721)
   1.610000   0.010000   1.620000 (  1.627628)
   1.570000   0.010000   1.580000 (  1.600705)
   1.580000   0.000000   1.580000 (  1.601550)
   1.570000   0.010000   1.580000 (  1.597049)
   1.570000   0.010000   1.580000 (  1.608728)
   1.570000   0.010000   1.580000 (  1.594988)
   1.570000   0.000000   1.570000 (  1.601885)
   1.570000   0.010000   1.580000 (  1.630782)
[headius @ 247:~/projects/ruby-1.8.7-p72]
$ ./ruby -I lib ../jruby/bench/bench_tak.rb 10
       user     system      total        real
  13.510000   0.060000  13.570000 ( 13.768144)
  13.530000   0.070000  13.600000 ( 13.773649)

AFTER:
$ ./ruby -I lib ../jruby/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
   1.360000   0.010000   1.370000 (  1.416073)
   1.350000   0.000000   1.350000 (  1.381519)
   1.360000   0.010000   1.370000 (  1.376705)
   1.350000   0.000000   1.350000 (  1.380676)
   1.350000   0.010000   1.360000 (  1.377904)
   1.360000   0.010000   1.370000 (  1.465818)
   1.350000   0.000000   1.350000 (  1.379431)
   1.350000   0.010000   1.360000 (  1.372702)
   1.340000   0.010000   1.350000 (  1.374763)
   1.350000   0.010000   1.360000 (  1.376614)
$ ./ruby -I lib ../jruby/bench/bench_tak.rb 10
       user     system      total        real
  12.860000   0.060000  12.920000 ( 13.091790)
  12.950000   0.060000  13.010000 ( 13.241058)

For comparison, Ruby 1.9.1RC1 numbers:

$ ruby1.9 ../jruby/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
   0.650000   0.000000   0.650000 (  0.665707)
   0.650000   0.010000   0.660000 (  0.657540)
   0.650000   0.000000   0.650000 (  0.662093)
   0.650000   0.010000   0.660000 (  0.667457)
   0.650000   0.000000   0.650000 (  0.670909)
   0.650000   0.000000   0.650000 (  0.665737)
   0.650000   0.010000   0.660000 (  0.664140)
   0.650000   0.000000   0.650000 (  0.667239)
   0.650000   0.000000   0.650000 (  0.662808)
   0.650000   0.010000   0.660000 (  0.661229)
$ ruby1.9 ../jruby/bench/bench_tak.rb 10
       user     system      total        real
   2.900000   0.010000   2.910000 (  2.951113)
   2.890000   0.020000   2.910000 (  2.958036)

- Charlie
Posted by Sylvain Joyeux (Guest)
on 2009-01-12 11:36
(Received via mailing list)
On Sun, Jan 11, 2009 at 10:46:36PM +0900, Charles Oliver Nutter wrote:
> The patches also appear to help method-call performance a bit:
Could you try the same tests with GC disabled?  I'm wondering if the change is
purely due to improvement in the GC speed ...
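
A minimal sketch of such a comparison, using only the standard Benchmark module and GC.disable / GC.enable. The contents of bench_tak.rb are not shown in this thread, so the `tak` below is the textbook Takeuchi function used as an assumed stand-in, and `realtime_without_gc` is a hypothetical helper name.

```ruby
require 'benchmark'

# Takeuchi function, a common recursion benchmark (assumed stand-in for bench_tak.rb).
def tak(x, y, z)
  return z unless y < x
  tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))
end

# Wall-clock seconds for the block with the collector switched off.
def realtime_without_gc
  GC.start    # begin from a freshly collected heap
  GC.disable
  Benchmark.realtime { yield }
ensure
  GC.enable   # always switch the collector back on
end

with_gc    = Benchmark.realtime { tak(18, 12, 6) }
without_gc = realtime_without_gc { tak(18, 12, 6) }
puts format("GC on: %.3fs, GC off: %.3fs", with_gc, without_gc)
```

If the patched and unpatched interpreters still differ with the collector off, the gain is not purely a GC effect. Memory grows unchecked while GC is disabled, so this only suits short runs.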

Sylvain
Posted by Brent Roman (brentr)
on 2009-01-12 20:37
(Received via mailing list)
Stephen,

I updated the MBARI7 patch at http://sites.google.com/site/brentsrubypatches
again last night (on 1/11/09) before I'd read your post.  (Sorry)

I had already concluded that -mpreferred-stack-boundary=2 is generally a
"bad idea" and have removed it from the recommended options.  It has
portability problems and, even where it works, the net loss in speed is not
worth the small reduction in stack usage for most Ruby scripts.  One option
that does increase speed about 7% across the board is -fomit-frame-pointer.
It seems to work well with most recent gcc compilers, but segfaults on older
ones, so I'm not recommending it by default.  I believe that Microsoft 'C'
has an analogous option.

This latest update to MBARI7 adds a configuration option to select the
method used to clear the stack among four alternatives.  The default is to
use a (new) portable method that allocates the "dirty" stack briefly with
alloca() before clearing it.  This portable method costs time (~1.5%), but
it is safer.

In practice, the 32-bit x86 is so starved for registers that I'd seen cases
where gcc would emit a PUSH %ESP between the point in the (old, fast) stack
clearing routine that read the stack pointer and the loop that was to zero
unallocated stack above the top.  This would cause the stacked base pointer
to be cleared as well and yield a segfault when it was later POP'ed from the
stack.  Fortunately, if this happens, the resulting Ruby binary fails
immediately on the (bogus1.rb and bogus2.rb) test scripts included with the
patches.

Ironically, -mpreferred-stack-boundary=2 will make the new, portable stack
clearing method ineffective due to gcc's insistence that alloca(x>0) always
return a 16-byte aligned pointer regardless of the configured
preferred-stack-boundary.  This might be considered a bug, but I'm honestly
not sure.

I cannot seem to find a stack clearing method that is both safe and
portable.  Maybe others will succeed where I have punted.  For now, my tests
indicate that, on 32-bit x86 with gcc 4.3, the combination of

CFLAGS="-O2 -fomit-frame-pointer -fno-stack-protector"
and
#define STACK_WIPE_SITES 0x4370  /* in rubysig.h */

works best.  It protects against ghost references well and runs even
micro-benchmarks slightly faster than unpatched 1.8.7-p72.


- brent
Posted by Hongli Lai (Guest)
on 2009-01-12 23:03
(Received via mailing list)
Brent Roman wrote:
> For now, my tests
> indicate that, on 32-bit x86 with gcc 4.3, the combination of
> 
> CFLAGS="-O2 -fomit-frame-pointer -fno-stack-protector"
> and
> #define STACK_WIPE_SITES 0x4370  /* in rubysig.h */
> 
> works best.

According to the GCC documentation, -O (and -O2, -O3 and -Os) implies
-fomit-frame-pointer.

--
Phusion | The Computer Science Company

Web: http://www.phusion.nl/
E-mail: info@phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)
Posted by Brent Roman (brentr)
on 2009-01-12 23:44
(Received via mailing list)
That's not the way the gcc behaves in my experience with the 32-bit x86
machines.
Could you point me at this documentation?

What I read is:
    -O also turns on -fomit-frame-pointer on machines where doing so does
not interfere with debugging.

32-bit x86 machines cannot generate stack backtraces without framepointers.
This certainly does interfere with debugging.

- brent
Posted by Hongli Lai (Guest)
on 2009-01-12 23:51
(Received via mailing list)
Brent Roman wrote:
> That's not the way the gcc behaves in my experience with the 32-bit x86
> machines.
> Could you point me at this documentation?
> 
> What I read is:
>     -O also turns on -fomit-frame-pointer on machines where doing so does
> not interfere with debugging.
> 
> 32-bit x86 machines cannot generate stack backtraces without framepointers.
> This certainly does interfere with debugging.

Both 'info gcc' and the online manual[1] say "-fomit-frame-pointer ...
Enabled at levels -O, -O2, -O3, -Os."
But if your experience is different then I guess it's a mistake in the
documentation.

[1]
http://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Optimi...

--
Phusion | The Computer Science Company

Web: http://www.phusion.nl/
E-mail: info@phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)
Posted by Stephen Sykes (Guest)
on 2009-01-13 11:57
(Received via mailing list)
On OSX -fomit-frame-pointer is turned off if you use -O2, or other
levels.  In fact, if you turn it on, the compiled ruby crashes.

OSX has an addition to gcc - a "-fast" option that turns on the following flags:
-O3 -fomit-frame-pointer -fstrict-aliasing -momit-leaf-frame-pointer
-fno-tree-pre -falign-loops

But as both -fomit-frame-pointer and -momit-leaf-frame-pointer cause
the compiled ruby to crash, I have been using these options to compile
ruby with MBARI7:

-O3 -fstrict-aliasing -fno-tree-pre -falign-loops

Also with these options I have not had any problems setting
STACK_WIPE_SITES to 0x4370

-Stephen
Posted by Brent Roman (brentr)
on 2009-01-14 01:54
(Received via mailing list)
Stephen,

I'm very much a PowerPC newbie, so please bear with me...

Yesterday, I got an off list report that ruby with the MBARI patches was
failing with:

./lib/fileutils.rb:521: stack level too deep (SystemStackError)

after applying them on a PowerBook G4, Mac OS X 10.5.6 with Apple GCC 4.0.

A kind colleague happened to have a very similar laptop I could borrow.
This let me duplicate the failure.  It was being caused by the fact that the
rlim_t type returned by the getrlimit() call to get process limits was
*signed* rather than unsigned as under Linux.  This made my patched Ruby
believe that the size of the stack area reserved for it was 0 bytes,
triggering the "stack too deep" exception.  I fixed this and posted an
update to MBARI7 last night after fairly extensive testing.
The date on the latest version is Jan 12, 2009.

So, my first questions are:
  Did you run into this issue?  If not, why not?  If so, did you fix or
work around it yourself?
The exact gcc we used was:
powerpc-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465)

Regarding your posting:
There is clearly some misleading gcc documentation out there about
-fomit-frame-pointer.
It is a machine independent option, but the exact effects of the -Ox options
are machine dependent.

If I understand you correctly, compiling with gcc and

  -fno-omit-frame-pointer

causes ruby crashes on PowerPC OSx.  Does it also
cause crashing on Intel OSx?

Are these crashes happening whether or not
the MBARI patches are applied or only after applying them?

In any case, it seems that you've managed to get the compiler and MBARI
patches very well optimized for PowerPC OSx.

Could you post some (brief) PPC OSx benchmark results
comparing runtime and peak process size before and after patching, taking
care to build ruby with the same compiler options each time?

- brent
Posted by Stephen Sykes (Guest)
on 2009-01-14 09:18
(Received via mailing list)
Brent,

Sorry, I should have mentioned, I'm running on an Intel Mac - you
assumed I was running on a PowerPC.

> If I understand you correctly, compiling with gcc and
>
>  -fno-omit-frame-pointer
>
> causes ruby crashes on PowerPC OSx.  Does it also
> cause crashing on Intel OSx?

I have no information on PowerPC, it certainly causes crashing on
Intel to compile with -fomit-frame-pointer.  Presumably
-fno-omit-frame-pointer works ok, I haven't tried it.

> Are these crashes happening whether or not
> the MBARI patches are applied or only after applying them?

Only after applying the patches.  With those same compile options and
regular ruby everything works normally.  The error when compiling
patched ruby looks like this:

gcc -O2 -fomit-frame-pointer -pipe -fno-common    -DRUBY_EXPORT  -L.
 main.o dmydln.o libruby-static.a -ldl -lobjc   -o miniruby
./lib/fileutils.rb:1165: [BUG] Bus Error
ruby 1.8.7 (2009-1-11 MBARI 7/0x2370 on patchlevel 72) [i686-darwin9.6.0]
make: *** [.rbconfig.time] Abort trap


Last evening I ran into the following issue with my recently compiled
ruby (which I had compiled with the -O3 options I gave before):

/usr/local/lib/ruby/site_ruby/1.8/rubygems/specification.rb:333: [BUG] 
Bus Error
ruby 1.8.7 (2009-1-11 MBARI 7/0x4370 on patchlevel 72) 
[i686-darwin9.6.0]
Abort trap

I recompiled with your suggested options of
-O2 -fno-stack-protector
and this problem went away.  Perhaps best to stick with these options 
for now.

> Could you post some (brief) PPC OSx benchmark results
> comparing runtime and peak process size before and after patching, taking
> care to build ruby with the same compiler options each time?

I can do this for intel OSX if you need?

Regards,
Stephen
Posted by Michal Suchanek (Guest)
on 2009-01-14 14:35
(Received via mailing list)
2009/1/14 Brent Roman <brent@mbari.org>:

> Could you post some (brief) PPC OSx benchmark results
> comparing runtime and peak process size before and after patching, taking
> care to build ruby with the same compiler options each time?
>
> - brent


How do you measure peak process size?

I have an application that takes about an hour to run and requires
about 2G RSS with ruby 1.8, and about half with JRuby.

I would be interested in comparing the performance with and without the 
patch.

Thanks

Michal
Posted by Brent Roman (brentr)
on 2009-01-14 19:27
(Received via mailing list)
Oh... Never mind :-)

If you are running Intel OSx, you've basically got a tweaked, slightly
outdated
Apple fork of GNU gcc for i386.  Others have reported on and off 
problems
compiling ruby with -O3 and/or -fomit-frame-pointer.  I was pleasantly
surprised
when I discovered that -fomit-frame-pointer no longer crashes ruby with
gcc 4.3.2.  But, I would never recommend it on the i386 as a default for
building
ruby.  I've also noticed that i386 -O3 produces a Ruby interpreter that
benchmarks
slower than one compiled with -O2.  You might want to confirm this for
yourself.

See my comments below:

Stephen Sykes-3 wrote:
> I recompiled with your suggested options of
> -O2 -fno-stack-protector
> and this problem went away.  Perhaps best to stick with these options for
> now.
> 
> 

You had mentioned setting STACK_WIPE_SITES to 0x4370.
Do you also get this error with STACK_WIPE_SITES left at its default of
0x2370 ?

I would be willing to try debugging the problem on my Mac Mini after
rebooting it into OSx,
if this failure occurs with the default STACK_WIPE_SITES 0x2*** settings
using gcc options
that otherwise yield a stable unpatched ruby build.

- brent
Posted by Brent Roman (brentr)
on 2009-01-14 19:35
(Received via mailing list)
Michal,

This is the sort of large app that I'd like to see benchmarked before 
and
after patching,
especially given that JRuby process size is half of MRI's.

I don't have a very scientific way to measure peak process size.
I just monitor the output of the "top" command while the process runs.
If your process is going to run for a long time, you might want to set 
up a
script to capture the output of ps and post process that to find the 
peak
process size.  I hope others have better ideas here.
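
A minimal sketch of the ps-polling idea described above, in Ruby. It assumes a
Unix-like `ps` that accepts `-o rss= -p PID` and reports RSS in kilobytes; the
method names (parse_rss_kb, sample_rss_kb, watch_peak_rss_kb) are made up for
this illustration and are not part of any patch or library. Polling can miss
short spikes between samples, so treat the result as a lower bound on the peak.

```ruby
# Poll `ps` for a process's resident set size and remember the largest value.
# Assumption: `ps -o rss= -p PID` prints the RSS in KB with no header line.

# Parse one line of `ps -o rss=` output; returns nil when there is no number
# (for example, because the process has already exited).
def parse_rss_kb(ps_output)
  text = ps_output.to_s.strip
  text =~ /\A\d+\z/ ? text.to_i : nil
end

# Read the current RSS of +pid+ in kilobytes, or nil if it has exited.
def sample_rss_kb(pid)
  parse_rss_kb(`ps -o rss= -p #{pid.to_i}`)
end

# Largest value in a list of samples, ignoring missing (nil) entries.
def peak_of(samples)
  samples.compact.max
end

# Sample +pid+ every +interval+ seconds until it exits; return the peak in KB.
def watch_peak_rss_kb(pid, interval = 1.0)
  peak = nil
  while (rss = sample_rss_kb(pid))
    peak = peak_of([peak, rss])
    sleep interval
  end
  peak
end

# Usage: ruby peak_rss.rb PID
if __FILE__ == $0 && ARGV[0]
  puts "peak RSS: #{watch_peak_rss_kb(ARGV[0].to_i).inspect} KB"
end
```

Run it in a second terminal against the pid of the long-running process, once
for the unpatched build and once for the patched one, and compare the two peaks.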

- brent
Posted by Stephen Sykes (Guest)
on 2009-01-14 23:30
(Received via mailing list)
Hi Brent

>I would be willing to try debugging the problem on my Mac Mini after
>rebooting it into OSx,
>if this failure occurs with the default STACK_WIPE_SITES 0x2*** settings
>using gcc options
>that otherwise yield a stable unpatched ruby build,

Yes, it appears that these options cause patched ruby to crash with
either 0x2370 or 0x4370 set for stack_wipe_sites:

-O3 -fstrict-aliasing -fno-tree-pre

It seems that any two of these options are not enough to cause the problem;
you need all three.

The crash looks like this (I get it when I run rake test to run my
rails app tests):
/usr/local/lib/ruby/site_ruby/1.8/rubygems/specification.rb:333: [BUG] 
Bus Error
ruby 1.8.7 (2009-1-11 MBARI 7/0x2370 on patchlevel 72) 
[i686-darwin9.6.0]
Abort trap

Yes, it's old gcc (gcc version 4.0.1 (Apple Inc. build 5465)), so it
may not be worth the effort to track down the issue.  On the other
hand, it might be interesting.

Perhaps contact me directly if you wish to pursue this.

-Stephen
Posted by Roger Pack (Guest)
on 2009-01-17 18:48
(Received via mailing list)
Issue #744 has been updated by Roger Pack.


Here's my field report.
I have a small rails app on a linode slice.  After running it awhile I 
noticed that the system stopped responding--it was running out of RAM.

For some reason my rails app was growing by 8MB of RSS per request.  If 
anybody wants to look into this in more depth I'd be happy to give them 
access.

Updating to 187 trunk:  same result.
Updated to 187 + MBARI patches.  Problem gone.
Also the total RSS now starts at 59MB and [4 days later] has appeared to 
stabilize at 62MB.  Without patches it starts at 78MB, so a 25% RAM use 
reduction, which is very nice for those on slices.
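
As a quick check of the arithmetic above, here is a small Ruby sketch that
computes the relative reduction from the two starting RSS figures quoted in
this report (78MB unpatched, 59MB patched). The helper name percent_reduction
is invented for the example.

```ruby
# Relative reduction in RSS, as a percentage of the unpatched figure.
def percent_reduction(before_mb, after_mb)
  (before_mb - after_mb) * 100.0 / before_mb
end

# Starting RSS figures from the report: 78MB unpatched, 59MB patched.
puts format("%.1f%%", percent_reduction(78, 59))   # prints "24.4%"
```

So the "25%" figure is a fair rounding of roughly 24.4%.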

I'd encourage the inclusion of these patches into trunk for the next 
patch release.

A few thoughts on compiler differences: would using the SET_STACK_END 
macros help?  Maybe it could revert to a method call [to force a step down 
the stack] as a way to check the stack end?  Or just always add 20 to 
what alloca returns, or whatnot?

re: measure peak process size: sys-proctable might help.

Thanks much for your work.  It spared me hours of debugging and has 
improved my opinion of Ruby.  Three cheers :)
Where to send donation?

-=r
----------------------------------------
http://redmine.ruby-lang.org/issues/show/744
Posted by Brent Roman (brentr)
on 2009-01-19 09:17
(Received via mailing list)
Yuki and Roger,

I'm glad to hear these patches are working out well for you.

I have just posted yet another update to the MBARI7 patch at:
http://sites.google.com/site/brentsrubypatches/

The latest spin uses a separate stack for garbage collection passes,
eliminating the need to clear the GC stack after each pass.  It also
disables use of assembly code to read the stack pointer on x86 machines 
by
default, because this asm code sometimes caused gcc to emit pushes to 
the
stack between reading the stack pointer and clearing the area above 
it.
Changed default STACK_WIPE_SITES value from 0x2370 to 0x4770.

This should all make the patches a little more portable and a bit faster 
in
their default configuration.
I don't plan to update MBARI7 again unless bugs are found. (We all know 
how
that goes :-)

- brent
Posted by Roger Pack (Guest)
on 2009-01-19 21:16
(Received via mailing list)
On Mon, Jan 19, 2009 at 1:15 AM, Brent Roman <brent@mbari.org> wrote:

>
> Yuki and Roger,
>
> I'm glad to hear these patches are working out well for you.
>
> I have just posted yet another update to the MBARI7 patch at:
> http://sites.google.com/site/brentsrubypatches/
>
One suggestion I might make is that I like GC#exorcise, but it seems a little
ghosty to me--stack_clear or stack_clean might be more specific :)

Thanks again.
-=r
Posted by Nobuyoshi Nakada (nobu)
on 2009-01-19 21:43
(Received via mailing list)
Hi,

At Mon, 19 Jan 2009 17:15:02 +0900,
Brent Roman wrote in [ruby-core:21429]:
> I have just posted yet another update to the MBARI7 patch at:
> http://sites.google.com/site/brentsrubypatches/

Can't you make patches against the head of stable branch?

Current status:

MBARI1: already merged except for a new method.

MBARI2: backported stack-rewind at thread creation from old
  1.9, so I think this patch is no longer needed.

MBARI4: your patch makes Emacs c-mode.el confused.
  <http://www.atdot.net/sp/readonly/rb_eval_split> is
  more c-mode.el friendly.

MBARI5: already merged.

And could you separate new features from bug fixes?
Posted by Brent Roman (brentr)
on 2009-01-20 04:29
(Received via mailing list)
Roger,

The method name is intentionally "ghosty".  Matz himself referred to 
Ruby
being "troubled by
ghost references on the stack".  I thought that was an apt description
so I adopted it as well.

exorcise:
Function:
    transitive verb
Inflected Form(s):
    ex·or·cised also ex·or·cized; ex·or·cis·ing also ex·or·ciz·ing
1 a: to expel (an evil spirit) by adjuration b: to get rid of (something
troublesome, menacing, or oppressive)

Definition 1b seemed a perfect fit to me.  GC.exorcise rids the call 
stack
of troublesome ghost references.  I found the "evil spirit" connotation
amusing.
If others are bothered by the word, I'll be happy to change it.

- brent
Posted by Michal Suchanek (Guest)
on 2009-01-20 11:35
(Received via mailing list)
2009/1/19 Nobuyoshi Nakada <nobu@ruby-lang.org>:
>
> MBARI1: already merged except for a new method.
>
> MBARI2: backported stack-rewind at thread creation from old
>        1.9, so I think this patch is no longer needed.
>
> MBARI4: your patch makes Emacs c-mode.el confused.
>        <http://www.atdot.net/sp/readonly/rb_eval_split> is
>        more c-mode.el friendly.

Perhaps it should be the other way around?

That is, the Emacs c-mode should be fixed to work with any code, rather
than code being modified to work around Emacs quirks.

Thanks

Michal
Posted by Nobuyoshi Nakada (nobu)
on 2009-01-21 04:17
(Received via mailing list)
Hi,

At Tue, 20 Jan 2009 19:33:50 +0900,
Michal Suchanek wrote in [ruby-core:21457]:
> That is the Emacs c-mode should be fixed to work with any code rather
> than code modified to work around Emacs quirks.

By implementing C preprocessor in emacs lisp?
Nice challenge. :)
Posted by Brent Roman (brentr)
on 2009-01-21 10:23
(Received via mailing list)
Hi Nobu,

Yes, I plan to rebase my patches against the HEAD after I move them to 
git.
This should also make it easier for me to separate features from fixes.
I'll be traveling next week, so expect something in 2-3 weeks.

Regarding the patches already applied to HEAD:

MBARI1:
  Can you explain why the Continuation#thread method is not acceptable?
  It does seem to be an intrinsic property of every Continuation and
  without this method, one must often maintain a separate (weak) 
reference
  to the thread on which each continuation operates.

MBARI2:
  I like push/pop_thread_anchor() better than my hack to hide other 
threads'
stacks.
  However, I don't see code in rb_thread_save_context() to copy *only* 
the
active
  stack for each thread.  This is a very important optimization.
  Are you doing this optimization some other way that I am overlooking?
  (To see how important it can be, try running my bogus1.rb and 
bogus2.rb
benchmarks)

MBARI4:
  I'll be happy to incorporate your clever eval_body() #define.
  It cleans the static inline function decls up nicely.  Does it also
restore the Emacs c-mode.el
  compatibility?  If this isn't what bothers emacs, please explain, and 
I'll
try to code around it.
  (Please understand that I haven't used emacs in any serious way for 15
years,
   when I discovered nedit :-)

MBARI5:
  Your version avoids the small cost of the alloca() when the stack only
  needs to grow by a small amount.
  Very nice.

- brent
Posted by Michal Babej (Guest)
on 2009-01-21 15:25
(Received via mailing list)
On Wednesday 21 of January 2009 10:21:19 Brent Roman wrote:
> Yes, I plan to rebase my patches against the HEAD after I move them to git.
> This should also make it easier for me to separate features from fixes.
> I'll be traveling next week, so expect something in 2-3 weeks.
>
Hi,

very nice work.  Are you (or someone else) also planning on rebasing the
patches against 1.8.6 ? I've tried that myself but it didn't work very 
well
(ruby test/runner.rb fails 3 tests on 0x2770, and segfaults when i use 
0x4770,
on x86_64 machine)

I also tried building on ppc64; with 0x4770 it won't even build, and it
segfaults on launching miniruby:

gcc -O2 -g    -DRUBY_EXPORT -D_GNU_SOURCE=1  -L.  -rdynamic
-Wl,-export-dynamic   main.o  libruby-static.a -ldl -lcrypt -lm -o miniruby
./ext/purelib.rb:2: [BUG] Segmentation fault
ruby 1.8.7 (2009-1-18 MBARI 7/0x4770 on patchlevel 72) [powerpc64-linux]
make: *** [.rbconfig.time] Aborted

With 0x2770 it builds & runs the same test suite with 6 failures & 1 
error.
(Although i'm not sure how much they are actually ruby's fault)

Regards,
-- mb
Posted by Roger Pack (Guest)
on 2009-01-21 20:07
(Received via mailing list)
>
>
> The method name is intentionally "ghosty".  Matz himself referred to Ruby
> being "troubled by
> ghost references on the stack".  I thought that was an apt description
> so I adopted it as well.


I was just referring to the fact that exorcise seems to have little in
common with "garbage" and doesn't actually say what the function does 
[who
could guess from the name what it would do?] but I'm good either way.
Cheers.
-=r
Posted by Nobuyoshi Nakada (nobu)
on 2009-01-22 03:08
(Received via mailing list)
Hi,

At Wed, 21 Jan 2009 18:21:19 +0900,
Brent Roman wrote in [ruby-core:21483]:
> Regarding the patches already applied to HEAD:
> 
> MBARI1:
>   Can you explain why the Continuation#thread method is not acceptable?
>   It does seem to be an intrinsic property of every Continuation and
>   without this method, one must often maintain a separate (weak) reference
>   to the thread on which each continuation operates.

I don't say it's not acceptable.  It's not a part of the bug
fix, so should be another request.

> MBARI2:
>   I like push/pop_thread_anchor() better than my hack to hide other threads' stacks.
>   However, I don't see code in rb_thread_save_context() to copy *only* the active
>   stack for each thread.  This is a very important optimization. 
>   Are you doing this optimization some other way that I am overlooking?
>   (To see how important it can be, try running my bogus1.rb and bogus2.rb benchmarks)

What do you mean by "active stack"?  The stack region which is
actually used by the thread?  The current code reduces that area
by rewinding the stack.

> MBARI4:
>   I'll be happy to incorporate your clever eval_body() #define.
>   It cleans the static inline function decls up nicely.  Does it also restore the Emacs c-mode.el
>   compatibility?  If this isn't what bothers emacs, please explain, and I'll try to code around it.
>   (Please understand that I haven't used emacs in any serious way for 15 years, 
>    when I discovered nedit :-)  

Compatibility against older Emacs?  I can't test it now.
Posted by Brent Roman (brentr)
on 2009-01-22 12:58
(Received via mailing list)
Michal,

I've got no immediate plans to port these patches to 1.8.6.
Why is this important for you?  I (perhaps naively) thought 1.8.7
would run just about anything that 1.8.6 does.

The Ruby build seems to do special things to configure alloca() on ppc
machines.
In particular, I just noticed that Ruby does not use GNUC's
__builtin_alloca()
on PPC even if compiled with GNUC.
Instead, it substitutes a 'C' version that just calls malloc().
When forced  to use the __builtin_alloca() on PPC, the resulting 
interpreter
failed even if all my stack clearing was disabled.

There is some interesting history here.
Perhaps someone on this list can tell me what in Ruby is
incompatible with GNU's PPC version of __builtin_alloca().

Nevertheless, I've put up a very experimental patch at:
http://sites.google.com/site/brentsrubypatches/
The patch file is an attachment called:
ruby-1.8.7-p72-mbariPPC.patch
near the bottom of the page.

Apply the usual seven MBARI patches, then this PPC patch atop them all.

The PPC patch tries to work around alloca() strangeness by invoking
__builtin_alloca()
directly for stack clearing whenever __GNUC__ is defined.
This seems to work well on the mac g4 laptop on which I tested.

The test suite ran 11m6s patched vs 11m3s unpatched.
Both versions flagged an Error in test_translit_option plus one other
failure.

I built each with CFLAGS=-O2 because -fno-stack-protector does not seem
to be supported by the apple version of gcc.

Let me know how it works for you on ppc64...
Please send (just me) the output of gcc -v if this patch fails.
You might also want to attach your config.h file

- brent
Posted by Michal Suchanek (Guest)
on 2009-01-22 13:50
(Received via mailing list)
2009/1/22 Brent Roman <brent@mbari.org>:
>
> Michal,
>
> I've got no immediate plans to port these patches to 1.8.6.
> Why is this important for you?  I (perhaps naively) thought 1.8.7
> would run just about anything that 1.8.6 does.

It's far from that simple.

1.8.7 backports a few 1.9 features that were "easy enough" to backport
breaking quite a bit of valid 1.8 code.

Sure the code can be updated easily in most cases but there is large
portion of code that hits the differences and cannot just run on 1.8.7
untouched.

Thanks

Michal
Posted by James Gray (bbazzarrakk)
on 2009-01-22 14:29
(Received via mailing list)
On Jan 22, 2009, at 5:55 AM, Brent Roman wrote:

> I've got no immediate plans to port these patches to 1.8.6.
> Why is this important for you?

I think a lot of Ruby users feel 1.8.7 was a mistake and try to avoid
it. It's just too massive a change for a simple point release.

The ruby-doc.org site has stayed with 1.8.6 and David Black has
recommended we pretend 1.8.7 doesn't exist, just to give two high
profile examples off the top of my head.

James Edward Gray II
Posted by Matthias Wächter (Guest)
on 2009-01-22 16:30
(Received via mailing list)
On 1/22/2009 2:27 PM, James Gray wrote:
> The ruby-doc.org site has stayed with 1.8.6 and David Black has
> recommended we pretend it doesn't exist, just to give two high profile
> examples off the top of my head.

Not just ruby-doc.org, but ruby-lang.org, too, at least for the
German version:

http://www.ruby-lang.org/de/downloads/

Btw: if nobody feels responsible for keeping the non-English pages
in sync, why not just drop them and link to the English ones instead
(or make someone feel responsible every time an update is required).

Cheers,
— Matthias
Posted by Michal Babej (Guest)
on 2009-01-22 17:10
(Received via mailing list)
On Thursday 22 of January 2009 12:55:08 Brent Roman wrote:
> Michal,
>
> I've got no immediate plans to port these patches to 1.8.6.
> Why is this important for you?  I (perhaps naively) thought 1.8.7
> would run just about anything that 1.8.6 does.
Well, it runs Rails, and I could fix my code for it; the biggest issue I
have with it is that it randomly raises EOF and broken pipe exceptions when
using sockets.
>
> The Ruby build seems to do special things to configure alloca() on ppc
> machines.
> In particular, I just noticed that Ruby does not use GNUC's
> __builtin_alloca()
> on PPC even if compiled with GNUC.
Interesting. I couldn't find this code in the tree, so i guess i'm 
missing
something. Can you point me to a file+line ?
> The PPC patch tries to work around alloca() strangeness by invoking the
> __builtin_alloca()
> directly for stack clearing whenever __GNUC__ is defined.
> This seems to work well on the mac g4 laptop on which I tested.
It applied cleanly, but i had to change __ppc__ to __powerpc__ at 
rubysig.h:65
otherwise i ended up with 0x4770; and i had to leave __ppc__ at 
rubysig.h:211
because that asm instruction doesn't work on this machine. So i ended up 
with
0xA770 and  __sp = _builtin_alloca(0). This way it works the same as 
0x2770
minus mbari_ppc patch (as in, same errors on running test suite, and 
same
speed)
>
> The test suite ran 11m6s patched vs 11m3s unpatched.
> Both versions flagged an Error in test_translit_option plus one other
> failure.
I have 6 fails + 1 error, most in gdbm.
>
> I built each with CFLAGS=-O2 because -fno-stack-protector does not seem
> to be supported by the apple version of gcc.
I built with ./configure --enable-pthread CFLAGS="-O2 -g"
I figured -fno-stack-protector is not required since man page says about
options "This manual documents only one of these two forms, whichever 
one is
not the default."
>
> Let me know how it works for you on ppc64...
> Please send (just me) the output of gcc -v if this patch fails.
> You might also want to attach your config.h file
Sure.

-- mb
Posted by Brent Roman (brentr)
on 2009-01-22 21:29
(Received via mailing list)
Michal,

OK. Understood.

I do intend to move over to git in the coming weeks.
After that happens, rebasing should become easier.

I'll assume that you're willing to help test the patches on 1.8.6.
Realistically, we're looking at at least one month out.

- brent
Posted by Brent Roman (brentr)
on 2009-01-22 21:38
(Received via mailing list)
Hi Nobu,

MBARI2:
  It was late when I compared the patches.  If you are actually
  rewinding the stack, that's even better than this patch's technique
  of linking directly to the base frame to skip around the parent
  threads' stacks while leaving them in place.

MBARI4:
  I meant that I don't use emacs anymore, so I won't test against it.
  Even so, I wish you would explain what about this patch confused
  Emacs c-mode.el so I can avoid such constructs in future.
  I'm guessing it had something to do with the NOINLINE function
  declarations, but I'm still not sure.

Do you think you will merge these changes into the 1.8.6 release?

- brent
Posted by Michael King (Guest)
on 2009-01-22 22:43
(Received via mailing list)
I have applied the MBARI patches to 1.8.6 p287. About half the hunks had 
to
be applied by hand. In doing so I noticed 1 hunk that looked odd. The 
hunk
was in gc.c, in the function ruby_xmalloc, the odd line is:
   if ((malloc_increase+=size) > malloc_limit) {

Was it intended to change the value of malloc_increase in the if 
statement?

When running test/runner.rb for the patched Ruby I am seeing the
following failure and error:

  1) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[./test/ruby/test_beginendblock.rb:82]:
<""> expected to be =~
</Interrupt$/>.

  2) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
    ./test/yaml/test_yaml.rb:1281:in `test_object_id_collision'

ruby -v:
ruby 1.8.6 (2009-1-18 MBARI 7/0x4770 on patchlevel 287) [i686-linux]

configure command:
CFLAGS="-O2 -fno-stack-protector" ./configure

On the plus side, the MBARI patched version completed the suite in 
417.29
seconds, MRI 416.14.

If it would help I generated a patch file between the stock 1.8.6 p287 
and
my MBARI patched version.

I am going to try a couple other stack clearing settings to see if that 
is
the issue. I will send updates if I discover something.

- Michael
Posted by Brent Roman (brentr)
on 2009-01-22 23:12
(Received via mailing list)
Michael,
I'm glad you are taking this on.  Thanks.  It must have been a tedious 
job.
See comments below --->


Michael King-2 wrote:
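> Was it intended to change the value of malloc_increase in the if
> statement?
> --->  Yes, that is the intent.  You will see this in ruby_xrealloc() and
>         ruby_xmalloc()
>         Doing it this way saves a jump at the machine code level.
> 
> </Interrupt$/>.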
> --->  This one is worrisome.  I've never seen it.
>         I've never run tests against unpatched 1.8.6
>         Do either of these failures occur there? 
> 
> ruby -v:
> ruby 1.8.6 (2009-1-18 MBARI 7/0x4770 on patchlevel 287) [i686-linux]
> 
> 

Monitor the process size while running the test suite.
If MBARI7 is working properly, you should observe that the size of the 
main
test process
near the end of its run is about 30MB less than when running with 
unpatched
1.8.6.

- brent
Posted by Michael King (Guest)
on 2009-01-23 05:51
(Received via mailing list)
On Thu, Jan 22, 2009 at 4:09 PM, Brent Roman <brent@mbari.org> wrote:

>
> Michael,
> I'm glad you are taking this on.  Thanks.  It must have been a tedious job.
> See comments below --->
>

At my company we are running several copies of 2 Rails applications 
which we
have to restart on a regular basis because of the Ruby memory leak. This
patch has the capability of ending that. Tedious or not it is worth the
effort. The memory savings is an added bonus that may let us run more 
copies.

I am also working with a copy of 1.8.6 patched with a GC stats patch
discussed here: http://blog.pluron.com/2008/02/memory-profilin.html to 
aid
in performance testing and profiling our applications. And I have also 
been
investigating Phusion's Ruby Enterprise Edition, the changes to make the 
GC
copy-on-write friendly could give us a benefit.

Combining all these patches gets a little tricky in a couple places, if 
I
need to I will use a GC stats patched MRI for performance and profiling 
and
MBARI patched for production to save memory. The REE copy-on-write is 
just
an added bonus.


> > Was it intended to change the value of malloc_increase in the if
> > statement?
> >
> > --->  Yes, that is the intent.  You will see this in ruby_xrealloc() and
> > ruby_xmalloc()
> >         Doing it this way saves a jump at the machine code level.


Ok, this was the only instance that I saw and I know I have done code 
like
this when I wasn't intending to, so I wanted to double check.

> >
> >         I've never run tests against unpatched 1.8.6
> >         Do either of these failures occur there?


I have done multiple runs now with my unpatched copy of Ruby 1.8.6 and I
have seen 0 failures and 0 errors.

I also did a run of 1.8.7 patched and unpatched and both had 0 errors and
0 failures.

I am doing this round of compiling and testing on Ubuntu 8.04 with gcc
4.2.4. I have tried compilations with the CFLAGS listed with the code and
with no CFLAGS; it doesn't change the outcome.  Our deployment environment is
Ubuntu 6.06, so I will be running the tests there as well.

>
>
> Monitor the process size while running the test suite.
> If MBARI7 is working properly, you should observe that the size of the main
> test process
> near the end of its run is about 30MB less than when running with unpatched
> 1.8.6.
>
> - brent
>

This is interesting.... I will rerun the tests tomorrow, I'm done for
tonight.

Unpatched Ruby 1.8.6 capped out at 94M
Ruby 1.8.6 patched with MBARI and GC-stats capped out at 42M
Ruby 1.8.6 patched with MBARI capped out at 53M


- Michael
Posted by Nobuyoshi Nakada (nobu)
on 2009-01-23 06:19
(Received via mailing list)
Hi,

At Fri, 23 Jan 2009 05:36:05 +0900,
Brent Roman wrote in [ruby-core:21530]:
> MBARI4:
>   I meant that I don't use emacs anymore, so I won't test against it.
>   Even so, I wish you would explain what about this patch confused
>   Emacs c-mode.el so I can avoid such constructs in future.
>   I'm guessing it had something to do with the NOINLINE function
>   declarations, but I'm still not sure.

I seem to have missed something.  It indents like:

NOINLINE(static VALUE
   eval_match2(self, node))
VALUE self;
NODE *node;

This isn't too bad, but c-beginning-of-defun jumps to the
beginning of the `NODE *node;' line, not eval_match2.

Also, since VC8 needs a prototype declaration or definition
for noinline, your patch causes a compile error with it.

# I won't object even if you propose to drop
# support for VC8 or later :)

> Do you think you will merge these changes into the 1.8.6 release?

We'll have to merge them into the 1.8 head first.
Posted by Brent Roman (brentr)
on 2009-01-23 10:12
(Received via mailing list)
Michal,

I got an account on a ppc64 (Darwin) server with apple gcc 4.01.
After testing there I updated the PPC patch on my website.
You might want to give it another try.

The G5 server I'm on normally wants to compile in 32-bit mode.
With the latest PPC patch, compiling in 32-bit mode,
I could run the ruby test suite without any unexpected errors
with STACK_WIPE_SITES set to 0x9770. (fast and thorough)
(note that the codes changed, have a look at rubysig.h for details)
On the PowerPC, it's always better to read the stack pointer via assembly
code, as __builtin_alloca(0) does not return it.

This latest PPC patch checks the __ppc64__ #define as well as __ppc__
(I mistakenly thought that __ppc__ would always be defined if __ppc64__ 
was)

When I force the compiler to produce a 64-bit binary, I find that
I must use the "safe" stack clearing method (which is now the default 
for
PowerPCs).
But, I *can* and do use assembler to read the stack pointer on my G5 
system.
I don't understand why this bit of asm fails on your ppc64 box.

Here's a typical gcc command from my 64bit build:
gcc  -m64 -pipe -fno-common    -DRUBY_EXPORT  -I. -I.  -D_XOPEN_SOURCE
-D_DARWIN_C_SOURCE   -c gc.c
file gc.o responds:  Mach-O 64-bit object ppc64

Unfortunately, ppc64 versions of the system libraries
are not installed on my test box, so I could not try the Ruby test
suite in 64-bit mode.  It did run my little benchmarks without trouble.

Again, let me know how you do there.
Also, *please* send me details on your configuration:
The config.h file from your build directory
and the output of the gcc -v command.

- brent
Posted by Roger Pack (Guest)
on 2009-01-23 20:18
(Received via mailing list)
On Thu, Jan 22, 2009 at 9:08 AM, Michal Babej <calcifer@runbox.com> 
wrote:

> > I've got no immediate plans to port these patches to 1.8.6.
> > Why is this important for you?  I (perhaps naively) thought 1.8.7
> > would run just about anything that 1.8.6 does.
> Well, it runs Rails, and i could fix my code for it, the biggest issue i
> have
> with it, is that it randomly raises EOF and broken pipe exceptions when
> using
> sockets.


Maybe you could submit a bug report for it?
-=r
Posted by Roger Pack (Guest)
on 2009-01-24 21:37
(Received via mailing list)
On Mon, Jan 19, 2009 at 1:15 AM, Brent Roman <brent@mbari.org> wrote:

>
> Yuki and Roger,
>
> I'm glad to hear these patches are working out well for you.


I assume that with 1.9 this style patch isn't as necessary as threads 
don't
"share garbage" between each other--is that right? [each thread could 
still
clean itself, but at least they don't share garbage between threads--is 
that
right?]

Also I might recommend renaming GC#limit to GC#malloc_limit or
GC#alloc_limit since "limit" is somewhat ambiguous--is it a limit to the
number of free pointers it will use? malloc size? [that type of thing].
Thanks so much!
-=r
Posted by Michael King (Guest)
on 2009-01-26 03:21
(Received via mailing list)
On Thu, Jan 22, 2009 at 10:48 PM, Michael King <kingmt@gmail.com> wrote:

>
>
> Combining all these patches gets a little tricky in a couple places, if I
> need to I will use a GC stats patched MRI for performance and profiling and
> MBARI patched for production to save memory. The REE copy-on-write is just
> an added bonus.
>

It's starting to look like it is trickier than I originally thought...


>
> Unpatched Ruby 1.8.6 capped out at 94M
> Ruby 1.8.6 patched with MBARI and GC-stats capped out at 42M
> Ruby 1.8.6 patched with MBARI capped out at 53M
>
>
> - Michael
>
>
I recompiled Ruby 1.8.6 patched with MBARI and set the STACK_WIPE_SITES 
to
0x0000. Rerunning the tests shows the same failure; however, the memory use 
was
54M. It would appear that I applied the patches wrong somehow...

- Michael
Posted by Michael King (Guest)
on 2009-01-26 21:34
(Received via mailing list)
Patching Ruby 1.8.6 p287 with MBARI patches 1 and 2 gave no warnings or
errors. MBARI patch 3 resulted in:

  1) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
    ./test/yaml/test_yaml.rb:1281:in `test_object_id_collision'

Digging through the changelog, it appears that this test is the result of
bug 8548 (
http://rubyforge.org/tracker/?func=detail&atid=169...).
I haven't looked at the code to get an idea of why this is failing.

I have tried compiles using:
CFLAGS="-O2 -fno-stack-protector -fomit-frame-pointer" configure

I compiled under Ubuntu 8.04 with gcc 4.2.3 and under Ubuntu 8.10 with 
gcc
4.3.1

So it would appear that something in MBARI patch 3 breaks 1.8.6

- Michael
Posted by Brent Roman (brentr)
on 2009-01-27 06:26
(Received via mailing list)
Roger,

With native threading, each thread gets its own private stack managed by 
the
OS.
So, yes, in Ruby 1.9, there should not be any ghost references from one
thread's stack creeping onto another's.  However, there is still the
potential for ghost object references within any given thread's stack.

GC.limit= determines the number of bytes that will be allocated (or
reallocated)
before a garbage collection pass is automatically triggered.  It 
defaults to
8e6 bytes.  I set it to 2e6 bytes on memory limited embedded targets.
The process size will "breathe" by this amount of bytes while Ruby runs.
Some might want to breathe deeper (and less often) if they've got bigger
lungs.
GC.limit is documented in ri as such.
It is the primary GC tunable.  If someone introduces a free list limit, 
they
can call it GC.freelist_limit.  I'm would not be confused by that.
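
For anyone wanting to try this tunable, a minimal sketch follows. It assumes only what is described above: that MBARI-patched builds expose GC.limit= taking a byte count. The helper name set_gc_limit is made up for illustration, and the respond_to? guard keeps the script harmless on an unpatched interpreter, where the setter does not exist.

```ruby
# Hypothetical helper: apply the MBARI GC.limit tunable only when the
# running interpreter actually provides it (i.e. an MBARI-patched build).
# Returns true if the limit was set, false otherwise.
def set_gc_limit(bytes)
  return false unless GC.respond_to?(:limit=)
  GC.limit = bytes
  true
end

# 2e6 bytes is the value mentioned above for memory-limited targets.
if set_gc_limit(2_000_000)
  puts "GC.limit set to 2,000,000 bytes"
else
  puts "GC.limit= not available on this build; leaving GC untouched"
end
```

A smaller limit should mean more frequent collections and a smaller swing in process size; a larger one trades memory for fewer GC passes.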

Nonetheless, if a couple more folks complain, I'll change GC.limit to
something longer.
I expect that it will get renamed in any case if it makes it into the
trunk.

- brent
Posted by Brent Roman (brentr)
on 2009-01-27 06:43
(Received via mailing list)
Michael,

MBARI3 factors the big rb_eval() into many smaller functions.
It's a big patch.
When I ported it from 1.6.8 to 1.8.7, it was by far the most tedious.
I put 1.6.8 and 1.8.7 side-by-side into xxdiff and worked through it
block-by-block.

You could try backing out the MBARI3 patch by replacing the
factored rb_eval() with the original one from 1.8.6.  All the rest
of the patches should work.  You'll just have slower context
switches due to the larger call stack, but the memory leaks caused by
ghost object references should still be eliminated by MBARI4 and MBARI7.
If that fixes the bug, you could start factoring half of rb_eval()
at a time (binary search) until you find its cause.
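
The binary search suggested above could be driven by something like the following sketch. The name first_bad_hunk and the block are hypothetical: the block stands in for "apply exactly these hunks, rebuild, run the tests" and returns true when the tests pass. It assumes the failure is monotonic (once the offending hunk is applied, every larger prefix also fails) and that applying all hunks does fail.

```ruby
# Hypothetical bisection over an ordered list of patch hunks.
# The block receives the prefix of hunks to apply and must return true
# when the resulting build passes the tests.
# Assumes the full list fails and that failure is monotonic in the prefix.
def first_bad_hunk(hunks)
  lo = 0
  hi = hunks.length - 1          # invariant: applying hunks[0..hi] fails
  while lo < hi
    mid = (lo + hi) / 2
    if yield(hunks[0..mid])      # tests pass: culprit is after mid
      lo = mid + 1
    else                         # tests fail: culprit is at or before mid
      hi = mid
    end
  end
  hunks[lo]
end

# Toy stand-in for "rebuild and run make check": hunk "d" breaks the build.
hunks = %w[a b c d e f]
culprit = first_bad_hunk(hunks) { |applied| !applied.include?("d") }
puts "first failing hunk: #{culprit}"
```

In practice some prefixes may not compile at all, as reported elsewhere in this thread, so the granularity of the search can be limited by which groups of hunks build together.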

I'm not surprised that you still see the memory size improvement
with STACK_WIPE_SITES set to 0x0000 -- the factored rb_eval() is
more likely to overwrite ghost object references.

- brent
Posted by Brent Roman (brentr)
on 2009-01-30 07:09
(Received via mailing list)
I cannot seem to build a working Ruby 1.8.7 for the PPC64 under Leopard
10.5.  Has anyone else managed it?

It runs simple test scripts, but hangs on the test suite.
This is 1.8.7-p72 without any patches.

Here's how I'm building:

$ export ARCHFLAG="-arch ppc64"
$ CFLAGS="-O2 -m64 -fno-stack-protector"  configure  --prefix=$HOME
$ make
$ sudo make install
$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [powerpc-darwin9.6.0]

$ uname -a
Darwin G5-Client.shore.mbari.org 9.6.0 Darwin Kernel Version 9.6.0: Thu 
Nov
6 19:35:49 PST 2008; root:xnu-1228.9.57~1/RELEASE_PPC Power Macintosh

$ gcc -v
Using built-in specs.
Target: powerpc-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/src/configure 
--disable-checking
-enable-werror --prefix=/usr --mandir=/share/man
--enable-languages=c,objc,c++,obj-c++
--program-transform-name=/^[cg][^.-]*$/s/$/-4.0/
--with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib
--build=i686-apple-darwin9 --program-prefix= 
--host=powerpc-apple-darwin9
--target=powerpc-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5465)

$ cd test
$ time ruby runner.rb
HANGS here.....

Any ideas?

- brent
Posted by Brent Roman (brentr)
on 2009-02-01 12:18
(Received via mailing list)
Michal,

I finally managed to get a working Ruby 1.8.7 on ppc64 with and without
MBARI patches under OS X 10.5 as follows:

export ARCH_FLAG="-arch ppc64"
CFLAGS="-O2 -g -m64 -fno-stack-protector"  configure

There may yet be a problem with my build configuration, but I don't 
think
the MBARI patches have anything to do with these failures.
Both patched and unpatched versions fail the same 6 tests.
Does anyone have a ppc64 (64-bit code) Ruby that does not fail these 
tests?

Note that I am using the PowerPC patch of 1/23/09.  For now, it must be
applied manually
after the MBARI7 patch.  I will integrate it after it has been tested on
x86_64.
(It should be called the 64-bit patch, as it is intended to fix x86_64 
as
well as ppc64)

http://sites.google.com/site/brentsrubypatches/Hom...

The PPC patch changes the meaning of the STACK_WIPE_SITES #define.
See rubysig.h for details.

One interesting observation is that my *unpatched* ppc64 ruby did not 
leak
when executing:

ruby -e "loop{@x=callcc{|c|c}}"

This could be because the ppc versions put ruby call arguments on the 
heap
rather than the 'C' stack.

- brent


results:

The patched ppc64 Ruby runs the test suite about 30 seconds quicker:
341 (unpatched) vs. 312 (patched) seconds.
It used a bit less RAM, but the difference wasn't large:
114 MB (unpatched) vs. 106 MB (patched) peak VSIZE.

I've included the details of the run below.
Can anyone verify whether or not these failures occur with unpatched
1.8.7-p72?

---------------

$ uname -a
Darwin G5-Client.shore.mbari.org 9.6.0 Darwin Kernel Version 9.6.0: Thu 
Nov
6 19:35:49 PST 2008; root:xnu-1228.9.57~1/RELEASE_PPC Power Mac

$ ruby -v
ruby 1.8.7 (2009-1-23 MBARI 7/0x5770 on patchlevel 72) 
[powerpc-darwin9.6.0]

$ file ~/bin/ruby
/u/brent/bin/ruby: Mach-O 64-bit executable ppc64

$ time ruby runner.rb
Loaded suite .
Started
........................................................................................................................................................................................................................................................................................................................................................F..........................Warning:
OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use OpenSSL::PKCS7
instead
.Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use
OpenSSL::PKCS7 instead
.Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use
OpenSSL::PKCS7 instead
Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use
OpenSSL::PKCS7 instead
Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use
OpenSSL::PKCS7 instead
..............................FF....F..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../ruby/test_array.rb:536:
warning: given block not used
........................................................................F.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................E........................................................................................................................................................................................................................................................................................................................................................................................
Finished in 312.740792 seconds.

  1) Failure:
test_decode(OpenSSL::TestASN1) [./openssl/test_asn1.rb:195]:
<"\217\a\362~Q38\262\332\212H6N\244\022n\267\343I8\233\000\017|\361\265\024\335\353\202\237h\016\201\032bxV\300\343N\252\227w\320\263\241%\035s\366P\2147>dy\306\004\023\367\267\v\214\272\fY\331\326\016\346\216\003\310\323\ek+Y}is\361\263\034\313\f\006e\200V\274\302\222\201\314\260\350\210\321<G\317\024\260H\371+\002\350\210\216cHk\375\246\301\324c\363\324\203\225\330\221
\036"> expected but was
<"\246\317\022M\337\207
\202\022\374\221\214\375\365\307\231\030\375t\027\306Y.\022\302\207\377\224\234\370l\a\211\r\241\225\003\220d\323k\346[>\351\004M\v\347\336\240\365\265\242\226\324?\214eR\300p\003`!m#\217\e6\250\306G\324#\004`\273\240\376\357`\265\367\3658\275t?\342\274\335.\370\261\227\325)V\376\240Z\276\206`\2056b\305\022s\tY%\025~r\207\267\323\226\315\243L\203\023\306K">.

  2) Failure:
test_create_by_factory(OpenSSL::TestX509Extension)
[./openssl/test_x509ext.rb:41]:
<"0\022\006\003U\035\023\001\001\000\004\b0\006\001\001\000\002\001\002">
expected but was
<"0\022\006\003U\035\023\001\001\377\004\b0\006\001\001\377\002\001\002">.

  3) Failure:
test_new(OpenSSL::TestX509Extension) [./openssl/test_x509ext.rb:29]:
<true> expected but was
<false>.

  4) Failure:
test_attr(OpenSSL::TestX509Request) [./openssl/test_x509req.rb:94]:
<[["keyUsage", "Digital Signature, Key Encipherment", true],
 ["subjectAltName", "email:gotoyuzo@ruby-lang.org", false]]> expected 
but
was
<[["keyUsage", "Digital Signature, Key Encipherment", false],
 ["subjectAltName", "email:gotoyuzo@ruby-lang.org", false]]>.

  5) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[./ruby/test_beginendblock.rb:81]:
<""> expected to be =~
</Interrupt$/>.

  6) Error:
test_fd_passing(TestUNIXSocket):
SocketError: file descriptor was not passed (msg_controllen=20, 24 
expected)
    ./socket/test_unix.rb:19:in `recv_io'
    ./socket/test_unix.rb:19:in `test_fd_passing'

1976 tests, 1668917 assertions, 5 failures, 1 errors

real    5m23.872s
user    3m54.124s
sys     0m27.542s
Posted by Brent Roman (brentr)
on 2009-02-10 08:06
(Received via mailing list)
I just updated the MBARI7 patch for Ruby 1.8.7-p72 at:

http://sites.google.com/site/brentsrubypatches/

I've tested this February 9, 2009 compiling with GNUC targeting the
following CPU types:

ppc, ppc64, arm, i386, and x86_64

For each CPU, no more test suite errors occurred patched than
unpatched.

I'd welcome any feedback on ppc or x86_64 in particular.
[If you run into trouble, please include the output of gcc -v and uname 
-a]
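
Since reports in this thread identify builds by their ruby -v banner, a small sketch for pulling the MBARI fields out of that banner may help when comparing results. The helper name parse_mbari_banner is made up, and reading the hex value after the slash as the STACK_WIPE_SITES setting is an assumption; the thread never states that explicitly. The two sample banners are copied from posts in this thread.

```ruby
# Hypothetical helper: extract the MBARI fields from a `ruby -v` banner.
# Assumption: the hex value after the slash is the STACK_WIPE_SITES setting.
MBARI_BANNER = /MBARI (\w+)\/0x([0-9a-fA-F]+) on patchlevel (\d+)/

def parse_mbari_banner(banner)
  m = MBARI_BANNER.match(banner)
  return nil unless m   # unpatched build: no MBARI tag in the banner
  { :patch => m[1], :wipe_sites => m[2].hex, :patchlevel => m[3].to_i }
end

patched   = "ruby 1.8.7 (2009-2-13 MBARI 7/0x8770 on patchlevel 72) [i686-darwin9.6.0]"
unpatched = "ruby 1.8.7 (2008-08-11 patchlevel 72) [powerpc-darwin9.6.0]"

p parse_mbari_banner(patched)
p parse_mbari_banner(unpatched)
```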

I'm working on a GitHub release next, including patches for 1.8.6.
Is 1.8.6-p287 (patchlevel 287) the specific version I should target?

- brent
Posted by Michael King (Guest)
on 2009-02-12 22:25
(Received via mailing list)
I was attempting to backport your MBARI patches to 1.8.6 p287, which is 
what
my company is currently using in production.

When I was backporting all 7 patches I was seeing these errors:
  1) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[./test/ruby/test_beginendblock.rb:82]:
<""> expected to be =~
</Interrupt$/>.

  2) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
    ./test/yaml/test_yaml.rb:1281:in `test_object_id_collision'

I started hand-applying a few hunks at a time from the patches and then
running make check. If all tests passed, I would move on to the next few
hunks. I applied all of patches 1, 2, and 3 with all tests passing.
Previously the error would start showing up on patch three, but it looks like
a hunk was getting applied to the wrong area. The failure started
showing up on the hunks from patch 4 that were moving the functions out of
rb_eval. I had to apply about 10 hunks just to compile, so I can't really
narrow it down more than that.

Unfortunately this is at the limits of my understanding so I can't 
really
help you fix it.

- Michael
Posted by Roger Pack (Guest)
on 2009-02-14 08:19
(Received via mailing list)
>> dynamic   main.o  libruby-static.a -ldl -lcrypt -lm -o miniruby
>> ./ext/purelib.rb:2: [BUG] Segmentation fault
>> ruby 1.8.7 (2009-1-18 MBARI 7/0x4770 on patchlevel 72) [powerpc64-linux]
>> make: *** [.rbconfig.time] Aborted


Here's an interesting one.
I built 1.8.7p72 with the mbari patches.  Works fine on the computer
where it was built.  If I run it on another computer on the same
network, same OS [slightly different cpu], it sometimes [depending on
the moon phase] results in:


[09:1721][rdp@ilab2:~/tmp_src]$ ruby driver.rb  -pbitTorrent
--name=yanc_and_bittorrent_100_take2
/home/rdp/i386/lib/ruby/site_ruby/1.8/rubygems/specification.rb:48:
[BUG] terminated node (0xb7c3505c)
ruby 1.8.7 (2009-1-18 MBARI 7/0x4770 on patchlevel 72) [i686-linux]

Aborted

[13:2228][rdp@ilab1:~]$ gcc -v
Reading specs from /home/rdp/installs/lib/gcc/i686-pc-linux-gnu/3.4.6/specs
Configured with: ./configure --prefix=/home/rdp/installs
Thread model: posix
gcc version 3.4.6
[13:2228][rdp@ilab1:~]$ uname -a
Linux ilab1 2.6.24-23-generic #1 SMP Mon Jan 26 00:13:11 UTC 2009 i686 
GNU/Linux

make test-all clears except a few zlib errors [it isn't installed] and

  4) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[./test/ruby/test_beginendblock.rb:81]:
<""> expected to be =~
</Interrupt$/>.

any thoughts?
Thanks!
-=r
Posted by Brent Roman (brentr)
on 2009-02-14 10:20
(Received via mailing list)
Michael,

I have just posted the MBARI patches on GitHub at:

http://github.com/brentr/matzruby/tree/v1_8_7_72-mbari

I believe you can pull from it via this git URL:

git://github.com/brentr/matzruby.git

A few points regarding your difficulties with porting the MBARI patches
to 1.8.6:

1)  Your report helped me identify the cause of the test failure in
     TestBeginEndBlock.  There was always a bit of a race condition in
     handling the ruby/suicide.rb test case.
     (Would CHECK_INTS get called before the interpreter terminated?)
     The big rb_eval() refactoring of MBARI4 moved the point at which
     CHECK_INTS is invoked and made that race much more likely.  Even so,
     it always worked sometimes :-)
     My fix is to invoke CHECK_INTS just after sending a signal to any
     process.  It's in the MBARI7 patch dated 2/13/09 on github.  (Not yet
     on my website)

2)  I never see the YAML failure here.  That may be a problem unique to
1.8.6
     or it may be an error in porting the patches.

3)  This is the only failure I see and you don't list it:

  1) Failure:
test_client_session(OpenSSL::TestSSL)
    [./openssl/test_ssl.rb:426:in `test_client_session'
     ./openssl/test_ssl.rb:417:in `times'
     ./openssl/test_ssl.rb:417:in `test_client_session'
     ./openssl/test_ssl.rb:129:in `call'
     ./openssl/test_ssl.rb:129:in `start_server'
     ./openssl/test_ssl.rb:416:in `test_client_session']:
<false> is not true.

Any clues?  I'm guessing that I'm missing a supporting library.

4)  I'm working on a version of these patches for 1.8.6-p287 right now.
     Stay tuned...
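
To make the scenario in point 1 concrete, here is a rough sketch of a process that signals itself and a parent that inspects the outcome. It is not the actual ruby/suicide.rb or the real test; the child script and the helper name are invented for illustration, and it relies on Open3.capture3, so it targets a current Ruby rather than 1.8. Whether the child reports Interrupt depends on the interpreter noticing the pending signal before it exits, which is the race described above.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical stand-in for the self-signalling test case: the child sends
# SIGINT to itself, then lingers briefly so the interpreter has a chance
# to notice the pending signal before it exits.
CHILD_SCRIPT = 'Process.kill(:INT, Process.pid); sleep 1'

# Runs the child under the same interpreter and reports what happened.
def run_self_interrupting_child
  _out, err, status = Open3.capture3(RbConfig.ruby, '-e', CHILD_SCRIPT)
  { :stderr => err, :signaled => status.signaled?, :exit => status.exitstatus }
end

result = run_self_interrupting_child
puts "terminated by signal: #{result[:signaled]}"
puts "stderr mentions Interrupt: #{!!(result[:stderr] =~ /Interrupt/)}"
```

The outcome can also depend on how the parent environment treats SIGINT, so the sketch reports what it observed rather than asserting a particular result.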

Git seems to be behaving.  If it's laughing at me, it is doing so behind 
my
back.
Please do try to build from my git repo and let me know how that goes.

- brent
Posted by Brent Roman (brentr)
on 2009-02-14 10:38
(Received via mailing list)
Roger,

Ummm... Moving binaries between different CPUs doesn't always work.
How, exactly, did the host and target machines differ?

Anyway...
The version I just pushed to github at:

http://github.com/brentr/matzruby/tree/v1_8_7_72-mbari

passed the Ruby test suite earlier this week on ppc, ppc64, arm, i386, 
and
x86_64 CPUs.

As mentioned in my previous post, this version should fix the failing
TestBeginEndBlock test.
Could you try building from my github repo to verify this and
let me know if your BitTorrent test still fails?

I'm still new to the git stuff.
Please let me know whether I've set up my repository correctly.

- brent
Posted by Stephen Bannasch (Guest)
on 2009-02-14 21:02
(Received via mailing list)
At 6:18 PM +0900 2/14/09, Brent Roman wrote:
>Michael,
>
>I have just posted the MBARI patches on GitHub at:
>
>http://github.com/brentr/matzruby/tree/v1_8_7_72-mbari
>
>I believe you can pull from it via this git URL:
>
>git://github.com/brentr/matzruby.git

>Please do try to build from my git repo and let me know how that goes.

Brent, thanks for putting them on github. This makes it very easy to
follow your work now.

Building it worked fine.

When I built the latest v1_8_7_72-mbari branch:

commit 6b169f9546ad52cb0edb9a19d48110e08f86a296
Author: Brent Roman <brent@mbari.org>
Date:   Fri Feb 13 23:06:56 2009 -0800

and ran the latest full suite of rubyspecs on it I got 8 failures and
14 errors.

See the full output from mspec here: http://gist.github.com/64442

It doesn't look like your patches have much to do with those errors
... but I'm not sure. I haven't worked with 1.8.7 much. The trunk
version of 1.8.7 doesn't build and install correctly on my system.

Here's how I built and tested your branch:

I already have the matzruby git repo cloned so I added the mbari repo
as another remote, fetched and checked out the remote branch
v1_8_7_72-mbari into my working dir.

$ cd ruby/src/matzruby.git/
$ git remote add mbari git://github.com/brentr/matzruby.git
$ git remote -v
mbari  git://github.com/brentr/matzruby.git
origin  git://github.com/rubyspec/matzruby.git

$ git pull
$ git fetch mbari
$ git co -b v1_8_7_72-mbari mbari/v1_8_7_72-mbari

Built it and made sure it can print its version:

$ autoconf && ./configure --prefix=/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari
$ make clean && make && make install
$ /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/bin/ruby -v
ruby 1.8.7 (2009-2-13 MBARI 7/0x8770 on patchlevel 72) [i686-darwin9.6.0]

Here's a summary of the files that have changed between Brent's mbari
branch and the tag v1_8_7_72:

$ git diff --stat v1_8_7_72
  ChangeLog        |  197 +++++
  common.mk        |    2 +-
  eval.c           | 2354 ++++++++++++++++++++++++++++++++----------------------
  gc.c             |  589 ++++++++-------
  intern.h         |    2 +-
  missing/alloca.c |    8 +-
  node.h           |    6 +-
  rubysig.h        |  212 +++++-
  signal.c         |    3 +-
  version.h        |   17 +-
  10 files changed, 2123 insertions(+), 1267 deletions(-)

A closer look at the changes in my favorite diff viewer (GitX):

$ git diff v1_8_7_72 | gitx

Run the latest rubyspecs against it

$ cd /Users/stephen/dev/ruby/src/rubyspec.git
$ which mspec
/Users/stephen/dev/ruby/src/mspec.git/bin/mspec

$ git pull
Already up-to-date.

Running just the core rubyspec tests:

$ mspec -t /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/bin/ruby core
ruby 1.8.7 (2009-2-13 MBARI 7/0x8770 on patchlevel 72) [i686-darwin9.6.0]
..EE..EE.....................................................................E........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F.......................................................................................................................................................................................................................................................
.........................................................................................................................................

1)
ARGF.bytes returns an Enumerable::Enumerator when passed no block ERROR
NoMethodError: undefined method `be_an_instance_of' for 
#<Object:0x3409b8>
./core/argf/shared/each_byte.rb:41
./core/argf/shared/each_byte.rb:39
./core/argf/bytes_spec.rb:2:in `all?'
./core/argf/bytes_spec.rb:5
./core/argf/bytes_spec.rb:4

2)
ARGF.chars returns an Enumerable::Enumerator when passed no block ERROR
NoMethodError: undefined method `be_an_instance_of' for 
#<Object:0x33acd4>
./core/argf/shared/each_char.rb:32
./core/argf/shared/each_char.rb:30
./core/argf/chars_spec.rb:2:in `all?'
./core/argf/chars_spec.rb:5
./core/argf/chars_spec.rb:4

3)
ARGF.each_byte returns an Enumerable::Enumerator when passed no block 
ERROR
NoMethodError: undefined method `be_an_instance_of' for 
#<Object:0x330d60>
./core/argf/shared/each_byte.rb:41
./core/argf/shared/each_byte.rb:39
./core/argf/each_byte_spec.rb:2:in `all?'
./core/argf/each_byte_spec.rb:4

4)
ARGF.each_char returns an Enumerable::Enumerator when passed no block 
ERROR
NoMethodError: undefined method `be_an_instance_of' for 
#<Object:0x32ddcc>
./core/argf/shared/each_char.rb:32
./core/argf/shared/each_char.rb:30
./core/argf/each_char_spec.rb:2:in `all?'
./core/argf/each_char_spec.rb:5
./core/argf/each_char_spec.rb:4

5)
An exception occurred during: before :all ERROR
LoadError:
dlopen(/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle,
9): Symbol not found: _rb_DLStdcallCallbackProcs
   Referenced from:
/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle
   Expected in: flat namespace
  -
/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle
/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle
./core/array/pack_spec.rb:2363
./core/array/pack_spec.rb:2272:in `all?'
./core/array/pack_spec.rb:2426

6)
An exception occurred during: before :all ERROR
LoadError:
dlopen(/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle,
9): Symbol not found: _rb_DLStdcallCallbackProcs
   Referenced from:
/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle
   Expected in: flat namespace
  -
/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle
/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle
./core/array/pack_spec.rb:2363
./core/array/pack_spec.rb:2363:in `all?'
./core/array/pack_spec.rb:2475

7)
Module#autoload shares the autoload request across dup'ed copies of
modules FAILED
Expected NameError
but got TypeError (wrong autoload table:
#<Proc:0x005358a4@./core/module/autoload_spec.rb:252>)
./core/module/autoload_spec.rb:252
./core/module/autoload_spec.rb:238:in `all?'
./core/module/autoload_spec.rb:15

Finished in 13.107626 seconds

1127 files, 5697 examples, 19595 expectations, 1 failure, 6 errors
Posted by Brent Roman (brentr)
on 2009-02-15 05:45
(Received via mailing list)
Stephen,

My acceptance test is that the test suite delivered with ruby produces no
new failures running patched vs. unpatched.

I really don't want to get into mspec.  However,
just a cursory glance at the errors it output leads
me to believe it was testing against ruby 1.8.6 specs.
Have you tried this same mspec against unpatched 1.8.7-p72?

I built and tested with the following:

$ cd ruby
$  git clone git://github.com/brentr/matzruby.git mri.git
$ cd mri.git
$ git checkout -b v1_8_7_72-mbari origin/v1_8_7_72-mbari
$ autoconf
$ CFLAGS="-O2 -fno-stack-protector" configure --prefix=$HOME/ruby/stage
$ make -j3
$ make install
$ cd test
$ time   ~/ruby/stage/bin/ruby runner.rb

Output:
  1) Failure:
test_client_session(OpenSSL::TestSSL)
    [./openssl/test_ssl.rb:426:in `test_client_session'
     ./openssl/test_ssl.rb:417:in `times'
     ./openssl/test_ssl.rb:417:in `test_client_session'
     ./openssl/test_ssl.rb:129:in `call'
     ./openssl/test_ssl.rb:129:in `start_server'
     ./openssl/test_ssl.rb:416:in `test_client_session']:
<false> is not true.

1985 tests, 1345472 assertions, 1 failures, 0 errors

real  4m10.124s
user  1m38.430s
sys  0m4.160s

This is the same single failure I've always seen with every 1.8.7-p72 
Ruby
on my machine.  I'm still hoping someone might tell me what
might cause this.

- brent
Posted by Brent Roman (brentr)
on 2009-02-17 21:19
(Received via mailing list)
I just pushed out a version of the MBARI patches for Ruby 1.8.6-p287 
onto:

git://github.com/brentr/matzruby.git

in the branch v1_8_6_287-mbari

I'm down to just one test failure:

  1) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
    ./test/yaml/test_yaml.rb:1281:in `test_object_id_collision'

However, I'm not motivated to investigate much further because this
test fails on every version of 1.8.6-p287 I build from source on
four different linux boxes with varying versions of gcc including those
built directly from the archive:

ftp://ruby-lang.org/pub/ruby/ruby-1.8.6-p287.tar.bz2

On the other hand, I recall Michael King claimed to have gotten 
1.8.6-p287
to complete
the test suite without any errors whatsoever.

I'm building like this:
$  CFLAGS="-O2 -fno-stack-protector" configure --prefix=$HOME/ruby/test
$ make -j3 && make install

(I've tried all sorts of CFLAGS, so please no comments about those)
and running the yaml test with:

$  cd test
$  ~/ruby/test/bin/ruby  runner.rb  yaml
Loaded suite yaml
Started
.........E................................................
Finished in 0.409985 seconds.

 1) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
   ./yaml/test_yaml.rb:1281:in `test_object_id_collision'

58 tests, 206 assertions, 0 failures, 1 errors

If you try that on your Ruby 1.8.6-p287 built from source,
do you see the error?

Is there another way to build it from source that avoids the error?

Please respond with details on your build procedure, environment etc.
only if you've built Ruby 1.8.6-p287 from source and do not see the 
above
error.
If you've got it working, I'd sure like to know exactly how!

- brent

P.S.  Note that this issue was supposedly fixed by a patch applied on
6/15/08.
That patch appears to be present in 1.8.6-p287.
See http://redmine.ruby-lang.org/issues/show/411
If no one responds, I'll add this report to redmine, but for now, I'm
assuming
I've got a problem with my build procedure.
Posted by Michael King (Guest)
on 2009-02-18 18:14
(Received via mailing list)
I will try and take a look at this soon, hopefully before the end of the
week...

- Michael
Posted by Roger Pack (Guest)
on 2009-02-18 18:53
(Received via mailing list)
> I'm building like this:
> $  CFLAGS="-O2 -fno-stack-protector" configure --prefix=$HOME/ruby/test
> $ make -j3 && make install

Question:
does the -fno-stack-protector stuff make much of a speed difference?
Thanks!
-=r
Posted by Brent Roman (brentr)
on 2009-02-18 20:40
(Received via mailing list)
I used to think it was more, but in fact, -fno-stack-protector probably 
saves
less than 1% of execution time.  The stack clearing of the MBARI patches
invokes alloca often, so any extra overhead there will be felt more than
without stack clearing.

Note also that the stack-protector stuff was added to gcc to
detect malicious attempts to hack the stack in 'C' code that processes
networking data.  Ruby's stack cannot be hacked that way as all
array indices are checked explicitly.  So, in Ruby, gcc's
stack-protector is sort of like wearing a belt and suspenders.

- brent
Posted by Michal Babej (Guest)
on 2009-02-19 18:52
(Received via mailing list)
On Saturday 14 of February 2009 08:17:22 Roger Pack wrote:
> [BUG] terminated node (0xb7c3505c)
> ruby 1.8.7 (2009-1-18 MBARI 7/0x4770 on patchlevel 72) [i686-linux]
>
> Aborted


The moon has shifted phases since January :) Seriously though, I've also 
found
Jan 18 version to segfault/abort randomly on my x86_64, however latest 
from
git (Feb 15) is working very nice so far - only 2 failures
test_client_session(OpenSSL) and test_readline. Could you try the latest 
and
report the results ?

  -- Michal
Posted by Roger Pack (Guest)
on 2009-02-20 23:15
(Received via mailing list)
> The moon has shifted phases since January :) Seriously though, I've also found
> Jan 18 version to segfault/abort randomly on my x86_64, however latest from
> git (Feb 15) is working very nice so far - only 2 failures
> test_client_session(OpenSSL) and test_readline. Could you try the latest and
> report the results ?

The moon is in a good phase. LOL.
It does seem more stable using the latest version.  I will report back
if the errors occur more.

If it does perhaps it has something to do with the same reason that GC
refuses if the yy_parse stack is on the stack [?] whatever that means,
anyway.

Thanks!
-=r
Posted by Michal Babej (Guest)
on 2009-02-21 10:57
Attachment: array_test.rb (1,29 KB)
(Received via mailing list)
Hi,

On Friday 20 of February 2009 23:14:13 Roger Pack wrote:
> The moon is in a good phase. LOL.
> It does seem more stable using the latest version.  I will report back
> if the errors occur more.
Turns out, good moon phases end right after writing positive feedback
emails :) Feb 15 ruby-mbari runs the full test suite with the same errors as
unpatched ruby on my machine, but it still segfaults on certain tests.  E.g.
running "test/runner.rb net" several times in a row quickly results in a
segfault.

P.S. I wrote a small script to see how ruby works with fork's copy-on-write
mechanism. It allocates an array of 1 million floats, then forks, and in the
child starts rewriting the array in batches (batch size is ARGV[0]). It gives
completely different results for mbari ruby, and I'd be glad if someone could
explain why :)
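
For readers without the attachment, the experiment described above might look roughly like the sketch below. This is a hypothetical reconstruction from the description only, not the attached array_test.rb. The helper names, the default batch size, and the use of /proc/self/status (Linux only) to read the resident set size are all assumptions.

```ruby
# Hypothetical reconstruction of the experiment described above; this is
# NOT the attached array_test.rb, whose contents are not shown here.

# Resident set size of the current process in kB, or nil where /proc is
# unavailable (non-Linux systems).
def rss_kb
  line = File.foreach('/proc/self/status').find { |l| l.start_with?('VmRSS:') }
  line && line.split[1].to_i
rescue SystemCallError
  nil
end

# Overwrite every element of arr with a fresh Float, batch elements at a
# time, yielding the number of elements rewritten so far after each batch.
def rewrite_in_batches(arr, batch)
  raise ArgumentError, "batch must be positive" if batch < 1
  done = 0
  while done < arr.length
    stop = [done + batch, arr.length].min
    (done...stop).each { |i| arr[i] = i * 0.5 }
    done = stop
    yield done if block_given?
  end
  done
end

if __FILE__ == $PROGRAM_NAME && Process.respond_to?(:fork)
  batch = ARGV[0].to_i
  batch = 100_000 if batch < 1          # assumed default when no size is given
  floats = Array.new(1_000_000) { |i| i.to_f }
  pid = fork do
    # The child touches the pages it inherited from the parent.
    rewrite_in_batches(floats, batch) do |done|
      puts "rewrote #{done} elements, child RSS #{rss_kb.inspect} kB"
    end
  end
  Process.wait(pid)
end
```

On Ruby 1.8 each Float is a separate heap object, so rewriting the array allocates new objects and can trigger GC in the child. On current 64-bit MRI most Floats are immediate values, so the same script may behave quite differently there.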

-- Michal
Posted by Brent Roman (brentr)
on 2009-02-21 18:15
(Received via mailing list)
Roger,

I was unaware of the interaction between YYSTACK_USE_ALLOCA and the 
MBARI
patches.
Does anyone have a test case I can debug?

- brent
Posted by Brent Roman (brentr)
on 2009-02-21 20:36
(Received via mailing list)
Michal,

What you are seeing in unpatched ruby is memory leaking between your
"passes" in array_test.rb.
This is just another manifestation of the same leak that occurs with
unpatched ruby and the script:

loop do
  @x=callcc{|c|c}
end

(see leakcheck.rb in the MBARIpatches tarball and the innocent redmine
 entry at the top of this endless thread)

The uninitialized stack for iteration n+1 contains old (dead) object
references from iteration n.  The GC strings them all together into a
linked list of object references.
It therefore cannot collect any of them until the whole loop terminates.

The stack clearing patches break this bogus chain of stale object reference
links and thus allow the GC to properly identify refs from previous
iterations of the loop as being "dead".
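
One way to watch for this kind of growth is to run the loop for a bounded number of iterations and count the objects that survive a forced GC. The harness below is a sketch of that idea only; it is not leakcheck.rb from the MBARI tarball, and the name live_objects_during and the iteration counts are made up. It uses ObjectSpace.each_object, which returns the number of objects it visited.

```ruby
# Hypothetical harness (not leakcheck.rb from the MBARI tarball): run a
# block repeatedly and record how many objects are still live after a
# forced GC, so growth across iterations becomes visible.
def live_objects_during(iterations, sample_every)
  samples = []
  iterations.times do |i|
    yield
    next unless (i + 1) % sample_every == 0
    GC.start
    samples << ObjectSpace.each_object { |_obj| }
  end
  samples
end

# callcc is built in on 1.8; later versions need the 'continuation'
# library. Skip quietly if it cannot be loaded.
begin
  require 'continuation'
rescue LoadError
end

if respond_to?(:callcc, true)
  p live_objects_during(2_000, 500) { @x = callcc { |c| c } }
else
  p live_objects_during(2_000, 500) { @x = Object.new }
end
```

If stale references keep earlier continuations alive, the samples would be expected to climb steadily; roughly flat samples suggest the GC is reclaiming them.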

I pushed an update to the patches onto github last night that seems to
improve stability of the MBARI patches on the x86_64 platform.  Other
platforms seem to be working great, but the x86_64 still exhibits vexing,
very occasional segfaults.

I'll be working on it through this rainy weekend.  If I can see it, I'm
confident I can (eventually)
fix it.

- brent
Posted by Roger Pack (Guest)
on 2009-02-21 23:02
(Received via mailing list)
> I pushed an update to the patches onto github last night that seems to
> improve
> stability of the MBARI patches on the x86_64 platform.  Others platforms
> seem to be working
> great, but the x86_64 still has exhibits vexing, very occasional segfaults.
>
> I'll be working on it through this rainy weekend.  If I can see it, I'm
> confident I can (eventually)
> fix it.

I wish I had an easy to reproduce script for it but don't [will keep
my eye out for it, though].
As a note, mine was having problems on 32-bit
ruby 1.8.7 (2009-2-13 MBARI 7/0x8770 on patchlevel 72) [i686-linux]
but that was a slightly older version. I'll update to the latest.
Thanks!
-=r
Posted by Aman Gupta (Guest)
on 2009-03-10 21:50
(Received via mailing list)
I am continuing to see random segfaults on x86_64, especially with god
(http://god.rubyforge.org/), which makes liberal use of threads and
forking.

*** glibc detected *** free(): invalid pointer: 0x00000000012b7724 ***
*** glibc detected *** free(): invalid pointer: 0x00000000012b7724 ***

./gems/local/gems/god-0.7.8/bin/../lib/god/event_handler.rb:35: [BUG]
Segmentation fault
/custom/lib/ruby/1.8/net/smtp.rb:462: [BUG] Segmentation fault
/custom/lib/ruby/1.8/timeout.rb:92: [BUG] Segmentation fault
./gems/local/gems/god-0.7.12/bin/../lib/god/process.rb:193: [BUG]
Segmentation fault
/custom/lib/ruby/1.8/net/http.rb:439: [BUG] Segmentation fault

#0  0x00007f7d5efa307b in raise () from /lib/libc.so.6
#1  0x00007f7d5efa484e in abort () from /lib/libc.so.6
#2  0x00007f7d5f596410 in rb_bug (fmt=0x7f7d5f62c195 "Segmentation
fault") at error.c:213
#3  0x00007f7d5f5fd2af in sigsegv (sig=<value optimized out>) at 
signal.c:634
#4  0x00007f7d5efa3110 in killpg () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

So far I've been unable to come up with a reproducible test case, but
I've managed to narrow the problem down to mbari2. Vanilla ruby 1.8.7
does not have this issue, whereas 1.8.7+mbari2 will segfault randomly
every few days.

Perhaps it is worth backporting thread anchors from ruby 1.8 HEAD?

  Aman
Posted by Brent Roman (brentr)
on 2009-03-11 04:49
(Received via mailing list)
Aman,

When I merge the MBARI patches with 1.8 HEAD, I also plan to replace the
stack optimization introduced in the MBARI2 patch with the (better) thread
anchors already in HEAD (which, I think, were originally backported from
1.9).  This should happen in the next week or so.  In the meantime, you
might want to try this patch against the current (MBARI 8B) patches on
1.8.6 or 1.8.7:

http://www.nabble.com/file/p22385077/rmMBARI2.patch

It just disables the MBARI2 patch and leaves the rest intact.
It would be very helpful to find out whether or not that alone eliminates
God's segfaults.

Will you give this a try?
If it works, I'll do an 8C patch to replace the stack splicing of
MBARI2 with stack anchors on 1.8.7-p72 and perhaps 1.8.6-p287 as well.

- brent
Posted by Roger Pack (Guest)
on 2009-03-11 12:11
(Received via mailing list)
> does not have this issue, whereas 1.8.7+mbari2 will segfault randomly
> every few days.

Perhaps valgrind would help?
-=r
Posted by Aman Gupta (Guest)
on 2009-03-20 06:47
(Received via mailing list)
I can confirm that removing mbari2 fixes the issue. I was able to get
a better stack trace, but am still unsure about the root cause and
unable to reproduce it consistently. It seems like a double free is
occurring for some reason and that eventually causes the segfault.

*** glibc detected *** free(): invalid pointer: 0x0000000002312734 ***
*** glibc detected *** free(): invalid pointer: 0x0000000002312734 ***

Core was generated by `ruby gems/local/gems/god-0.7.8/bin/god'.
Program terminated with signal 6, Aborted.
#0  0x00007fc3d0cbb07b in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007fc3d0cbb07b in raise () from /lib/libc.so.6
#1  0x00007fc3d0cbc84e in abort () from /lib/libc.so.6
#2  0x00007fc3d0cf15f9 in __fsetlocking () from /lib/libc.so.6
#3  0x00007fc3d0cf8163 in mallopt () from /lib/libc.so.6
#4  0x00007fc3d0cf81ee in free () from /lib/libc.so.6
#5  0x00007fc3d134b4b8 in time_free (tobj=0x2312734) at time.c:43
#6  0x00007fc3d12dfed9 in rb_gc_call_finalizer_at_exit () at gc.c:2324
#7  0x00007fc3d12b6fd9 in ruby_finalize_1 () at eval.c:1561
#8  0x00007fc3d12b7146 in ruby_cleanup (ex=0) at eval.c:1598
#9  0x00007fc3d12b733c in ruby_stop (ex=0) at eval.c:1653
#10 0x00007fc3d1317306 in rb_f_fork (obj=140478970802560) at 
process.c:1343
#11 0x00007fc3d12c425a in call_cfunc (func=0x7fc3d1317286 <rb_f_fork>,
recv=140478970802560, len=0, argc=0, argv=0x0) at eval.c:5759
#12 0x00007fc3d12c3535 in rb_call0 (klass=140479007795520,
recv=140478970802560, id=5321, oid=5321, argc=0, argv=0x0,
body=0x7fc3d159dae8, flags=2) at eval.c:5911
#13 0x00007fc3d12c4d84 in rb_call (klass=140479007795520,
recv=140478970802560, mid=5321, argc=0, argv=0x0, scope=1,
self=140478970802560) at eval.c:6158
#14 0x00007fc3d12bc82b in rb_eval (self=140478970802560,
n=0x7fc3cfde5f18) at eval.c:3508
#15 0x00007fc3d12bb0d2 in rb_eval (self=140478970802560,
n=0x7fc3cfde5f40) at eval.c:3223
#16 0x00007fc3d12bd827 in rb_eval (self=140478970802560,
n=0x7fc3cfde5d60) at eval.c:3678
#17 0x00007fc3d12bb8dc in rb_eval (self=140478970802560,
n=0x7fc3cfde57c0) at eval.c:3357
#18 0x00007fc3d12ba068 in rb_eval (self=140478970802560,
n=0x7fc3cfde6878) at eval.c:2962
#19 0x00007fc3d12c3dfc in rb_call0 (klass=140478982783720,
recv=140478970802560, id=38449, oid=38449, argc=0,
argv=0x7fffd95ab848, body=0x7fc3cfde6878, flags=0) at eval.c:6062
#20 0x00007fc3d12c4d84 in rb_call (klass=140478982783720,
recv=140478970802560, mid=38449, argc=1, argv=0x7fffd95ab840, scope=0,
self=140478970803320) at eval.c:6158
#21 0x00007fc3d12bc4f1 in rb_eval (self=140478970803320,
n=0x7fc3cfe124a0) at eval.c:3493
#22 0x00007fc3d12c3dfc in rb_call0 (klass=140478982957680,
recv=140478970803320, id=38449, oid=38449, argc=0,
argv=0x7fffd95ac3f0, body=0x7fc3cfe124a0, flags=0) at eval.c:6062
#23 0x00007fc3d12c4d84 in rb_call (klass=140478982957680,
recv=140478970803320, mid=38449, argc=2, argv=0x7fffd95ac3e0, scope=1,
self=140478970803320) at eval.c:6158
#24 0x00007fc3d12bc82b in rb_eval (self=140478970803320,
n=0x7fc3cfe135d0) at eval.c:3508
#25 0x00007fc3d12ba068 in rb_eval (self=140478970803320,
n=0x7fc3cfe12900) at eval.c:2962
#26 0x00007fc3d12c3dfc in rb_call0 (klass=140478982957680,
recv=140478970803320, id=24553, oid=24553, argc=0,
argv=0x7fffd95ad6f8, body=0x7fc3cfe12900, flags=0) at eval.c:6062
#27 0x00007fc3d12c4d84 in rb_call (klass=140478982957680,
recv=140478970803320, mid=24553, argc=1, argv=0x7fffd95ad6f0, scope=0,
self=140478970803320) at eval.c:6158
#28 0x00007fc3d12bc4f1 in rb_eval (self=140478970803320,
n=0x7fc3d0b50068) at eval.c:3493
#29 0x00007fc3d12ba068 in rb_eval (self=140478970803320,
n=0x7fc3d0b42648) at eval.c:2962
#30 0x00007fc3d12c3dfc in rb_call0 (klass=140478996330640,
recv=140478970803320, id=24537, oid=24537, argc=0,
argv=0x7fffd95aea48, body=0x7fc3d0b42648, flags=0) at eval.c:6062
#31 0x00007fc3d12c4d84 in rb_call (klass=140478996330640,
recv=140478970803320, mid=24537, argc=1, argv=0x7fffd95aea40, scope=0,
self=140478970803320) at eval.c:6158
#32 0x00007fc3d12bc4f1 in rb_eval (self=140478970803320,
n=0x7fc3d0af5500) at eval.c:3493
#33 0x00007fc3d12bb651 in rb_eval (self=140478970803320,
n=0x7fc3d0b0bbc0) at eval.c:3309
#34 0x00007fc3d12c3dfc in rb_call0 (klass=140478996330640,
recv=140478970803320, id=26833, oid=26833, argc=0,
argv=0x7fffd95afd78, body=0x7fc3d0b0bbc0, flags=0) at eval.c:6062
#35 0x00007fc3d12c4d84 in rb_call (klass=140478996330640,
recv=140478970803320, mid=26833, argc=1, argv=0x7fffd95afd70, scope=0,
self=140478970802960) at eval.c:6158
#36 0x00007fc3d12bc4f1 in rb_eval (self=140478970802960,
n=0x7fc3cfe1cd88) at eval.c:3493
#37 0x00007fc3d12ba068 in rb_eval (self=140478970802960,
n=0x7fc3cfe1ca18) at eval.c:2962
#38 0x00007fc3d12c3dfc in rb_call0 (klass=140478983025360,
recv=140478970802960, id=26777, oid=26777, argc=0, argv=0x0,
body=0x7fc3cfe1ca18, flags=0) at eval.c:6062
#39 0x00007fc3d12c4d84 in rb_call (klass=140478983025360,
recv=140478970802960, mid=26777, argc=0, argv=0x0, scope=0,
self=140478970802960) at eval.c:6158
#40 0x00007fc3d12bc4f1 in rb_eval (self=140478970802960,
n=0x7fc3cfe1dfd0) at eval.c:3493
#41 0x00007fc3d12bb651 in rb_eval (self=140478970802960,
n=0x7fc3cfe1d8c8) at eval.c:3309
#42 0x00007fc3d12c0e81 in rb_yield_0 (val=6, self=140478970802960,
klass=0, flags=0, avalue=0) at eval.c:5083
#43 0x00007fc3d12c1553 in loop_i () at eval.c:5216
#44 0x00007fc3d12c2316 in rb_rescue2 (b_proc=0x7fc3d12c152e <loop_i>,
data1=0, r_proc=0, data2=0) at eval.c:5480
#45 0x00007fc3d12c15ca in rb_f_loop () at eval.c:5241
#46 0x00007fc3d12c425a in call_cfunc (func=0x7fc3d12c1593 <rb_f_loop>,
recv=140478970802960, len=0, argc=0, argv=0x0) at eval.c:5759
#47 0x00007fc3d12c3535 in rb_call0 (klass=140479007795520,
recv=140478970802960, id=4121, oid=4121, argc=0, argv=0x0,
body=0x7fc3d15b6b88, flags=2) at eval.c:5911
#48 0x00007fc3d12c4d84 in rb_call (klass=140479007795520,
recv=140478970802960, mid=4121, argc=0, argv=0x0, scope=1,
self=140478970802960) at eval.c:6158
#49 0x00007fc3d12bc82b in rb_eval (self=140478970802960,
n=0x7fc3cfe1d850) at eval.c:3508
#50 0x00007fc3d12bb0d2 in rb_eval (self=140478970802960,
n=0x7fc3cfe1d828) at eval.c:3223
#51 0x00007fc3d12c0e81 in rb_yield_0 (val=140478970802760,
self=140478970802960, klass=0, flags=1, avalue=2) at eval.c:5083
#52 0x00007fc3d12d21d5 in rb_thread_yield (arg=140478970802760,
th=0x230b190) at eval.c:12426
#53 0x00007fc3d12d1e60 in rb_thread_start_0 (fn=0x7fc3d12d20f3
<rb_thread_yield>, arg=0x7fc3cf273248, th=0x230b190) at eval.c:12344
#54 0x00007fc3d12d2327 in rb_thread_initialize
(thread=140478970802800, args=140478970802760) at eval.c:12500
#55 0x00007fc3d12c4223 in call_cfunc (func=0x7fc3d12d2257
<rb_thread_initialize>, recv=140478970802800, len=-2, argc=0,
argv=0x0) at eval.c:5753
#56 0x00007fc3d12c3535 in rb_call0 (klass=140479007761480,
recv=140478970802800, id=2961, oid=2961, argc=0, argv=0x0, body=0x0,
flags=4) at eval.c:5911
#57 0x00007fc3d12c3535 in rb_call0 (klass=140479007761480,
recv=140478968811240, id=333, oid=333, argc=2, argv=0x7fffd95b43b0,
body=0x7fc3d15b1890, flags=0) at eval.c:5911
#58 0x00007fc3d12c4d84 in rb_call (klass=140479007761480,
recv=140478968811240, mid=333, argc=2, argv=0x7fffd95b43b0, scope=0,
self=140478969580760) at eval.c:6158
#59 0x00007fc3d12bb0d2 in rb_eval (self=140478969580760,
n=0x7fc3d0750d50) at eval.c:3223
#60 0x000000000256edd0 in ?? ()
#61 0x000000000256f068 in ?? ()
#62 0x00007fffd95b4bd0 in ?? ()
#63 0x00007fffd95bbd90 in ?? ()
#64 0x0000000000000007 in ?? ()
#65 0x00007fffd95b4df0 in ?? ()
#66 0x00007fc3d12d02e3 in rb_thread_schedule () at eval.c:11251
Previous frame inner to this frame (corrupt stack?)

(gdb) define rb_trace
>  set $frame = ruby_frame
>  while $frame
>    set $node = $frame->node
>    print $node->nd_file
>    # nd_line macro
>    print ((unsigned int)(($node->flags>>19)&35184372088831))
>    set $frame = $frame->prev
>  end
>end

(gdb) rb_trace
$16 = 0x253cc31 "./gems/local/gems/god-0.7.8/bin/../lib/god/process.rb"
$17 = 215
$18 = 0x250ff11 "./gems/local/gems/god-0.7.8/bin/../lib/god/watch.rb"
$19 = 154
$20 = 0x250ff11 "./gems/local/gems/god-0.7.8/bin/../lib/god/watch.rb"
$21 = 117
$22 = 0x2393c51 "./gems/local/gems/god-0.7.8/bin/../lib/god/task.rb"
$23 = 171
$24 = 0x2393c51 "./gems/local/gems/god-0.7.8/bin/../lib/god/task.rb"
$25 = 344
$26 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$27 = 68
$28 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$29 = 41
$30 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$31 = 36
$32 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$33 = 36
$34 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$35 = 35
$36 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$37 = 35
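
For anyone puzzling over the constants in the rb_trace macro above, here is a small Ruby sketch of the same arithmetic. It assumes a 64-bit word (as on x86_64) and a 19-bit shift, both inferred from the macro itself; the constant and method names are only illustrative, and the flags value is made up.

```ruby
# Assumptions: 64-bit node flags and a 19-bit shift, as in the gdb macro above.
NODE_LSHIFT = 19
WORD_BITS   = 64
NODE_LMASK  = (1 << (WORD_BITS - NODE_LSHIFT)) - 1   # 2**45 - 1

# Extract the source line number stored in the upper bits of a node's flags.
def nd_line(flags)
  (flags >> NODE_LSHIFT) & NODE_LMASK
end

# Hypothetical flags word: line 215 in the upper bits, arbitrary low bits.
flags = (215 << NODE_LSHIFT) | 0x2a

puts NODE_LMASK       # 35184372088831, the literal used in the macro
puts nd_line(flags)   # 215
```

On a 32-bit build the mask would be narrower, so the literal in the macro should not be reused there as is.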

God uses a double-fork to spawn processes, and it looks like the
double free usually occurs when the first forked process (in
process.rb:215) dies. God also uses a C extension
(http://github.com/mojombo/god/blob/master/ext/god/...)
which could be causing issues across the fork.
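
For readers unfamiliar with the pattern, a double-fork can be sketched roughly as below. This is not God's actual code; `spawn_detached` is a hypothetical helper. The intermediate child here leaves with `exit!`, which skips exit handlers, whereas the backtrace above shows the forked child going through `ruby_stop` and the finalizers. The sketch only illustrates the pattern and is not a fix for the underlying bug.

```ruby
# Hypothetical helper illustrating a double-fork (not taken from God).
# The intermediate child exits immediately, so the grandchild is
# re-parented and the caller only has to reap the short-lived child.
def spawn_detached
  reader, writer = IO.pipe

  child = fork do
    reader.close
    grandchild = fork do
      writer.close
      yield          # the real work runs in the grandchild
      exit!(0)
    end
    writer.puts(grandchild)   # report the grandchild's pid to the caller
    writer.close
    exit!(0)         # leave without running exit handlers
  end

  writer.close
  Process.waitpid(child)      # reap the intermediate child
  pid = reader.gets.to_i
  reader.close
  pid
end

pid = spawn_detached { sleep 0.1 }
puts "detached process pid: #{pid}"
```

Whether swapping a normal exit for `exit!` in the first forked process changes the double-free behaviour is untested; it would only be one way to narrow down where the bad free comes from.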

  Aman
Posted by Brent Roman (brentr)
on 2009-03-20 07:11
(Received via mailing list)
Aman,

It's quite possible that the double-frees are occurring both with and
without the MBARI2 patch, but they are not causing segfaults unless MBARI2
is applied.  You may want to try using valgrind or some similar tool to
catch the double frees.  (valgrind is really very good at this.)

A few days ago, I pushed a branch to my github repo with the MBARI patches
applied to ruby_1_8 HEAD.  These patches use the "thread anchors" backported
(by Nobu, I believe) from 1.9.  It seems to be a bit slower than my
approach, but it may well be more robust.  The branch is called
ruby_1_8-mbari:

git://github.com/brentr/matzruby.git

http://github.com/brentr/matzruby/commits/ruby_1_8-mbari/

It is a dev version, but this snapshot did pass the bundled ruby test suite
and all my tests as well.
It would give you the benefits of the MBARI2 patch via the thread anchors.

I'd really be most interested in finding out whether there are still
double-frees happening.  Let me know what you find.  If the double-frees
only happen with MBARI2 applied, I'll consider replacing MBARI2 with the
thread anchors from 1.8.8-dev.

- brent
Posted by Aman Gupta (Guest)
on 2009-03-31 01:06
(Received via mailing list)
I've had no more issues since reverting mbari2. I'm able to reproduce
the segfault on my mac:

ruby(83833) malloc: *** error for object 0x152e6d4: Non-aligned
pointer being freed
*** set a breakpoint in malloc_error_break to debug
ruby(83833) malloc: *** error for object 0x152d154: Non-aligned
pointer being freed
*** set a breakpoint in malloc_error_break to debug
ruby(83891) malloc: *** error for object 0x152e6d4: Non-aligned
pointer being freed
*** set a breakpoint in malloc_error_break to debug
ruby(83891) malloc: *** error for object 0x152d154: Non-aligned
pointer being freed
*** set a breakpoint in malloc_error_break to debug

./gems/local/gems/god-0.7.8/bin/../lib/god/process.rb:183: [BUG] Bus Error
/opt/ruby-fiber/lib/ruby/1.8/net/http.rb:439: [BUG] Segmentation fault

Using gdb to breakpoint malloc_error_break shows:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000016
blk_copy_prev (block=0x15623b0) at eval.c:8549
8549    for (vars = tmp->dyna_vars; vars; vars = vars->next) {

(gdb) bt
#0  blk_copy_prev (block=0x15623b0) at eval.c:8549
#1  0x00020697 in proc_alloc (klass=1236720, proc=0) at eval.c:8773
#2  0x00022ea7 in rb_eval (self=6141440, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3780
#3  0x00026480 in rb_call0 (klass=6141380, recv=6141440, id=5313,
oid=5313, argc=0, argv=0xbfff4268, body=0x5266d8, flags=<value
temporarily unavailable, due to optimizations>) at eval.c:6130
#4  0x000269dc in rb_call (klass=6141380, recv=6141440, mid=5313,
argc=2, argv=0xbfff4260, scope=0, self=18370740) at eval.c:6233
#5  0x00024002 in rb_eval (self=18370740, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3517
#6  0x000253d6 in rb_eval (self=18370740, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3242
#7  0x00024be4 in rb_eval (self=18370740, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3328
#8  0x00026480 in rb_call0 (klass=6122780, recv=18370740, id=8457,
oid=8457, argc=0, argv=0x0, body=0x52b750, flags=<value temporarily
unavailable, due to optimizations>) at eval.c:6130
#9  0x000269dc in rb_call (klass=6122780, recv=18370740, mid=8457,
argc=0, argv=0x0, scope=0, self=18375120) at eval.c:6233
#10 0x00024002 in rb_eval (self=18375120, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3517
#11 0x00023d15 in rb_eval (self=18375120, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3702
#12 0x00026480 in rb_call0 (klass=5519380, recv=18375120, id=27009,
oid=27009, argc=0, argv=0xbfff5344, body=0x5474dc, flags=<value
temporarily unavailable, due to optimizations>) at eval.c:6130
#13 0x000269dc in rb_call (klass=5519380, recv=18375120, mid=27009,
argc=1, argv=0xbfff5340, scope=0, self=18374940) at eval.c:6233
#14 0x00024002 in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3517
#15 0x0002296e in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:2966
#16 0x00026480 in rb_call0 (klass=5500380, recv=18374940, id=26953,
oid=26953, argc=0, argv=0x0, body=0x540b50, flags=<value temporarily
unavailable, due to optimizations>) at eval.c:6130
#17 0x000269dc in rb_call (klass=5500380, recv=18374940, mid=26953,
argc=0, argv=0x0, scope=0, self=18374940) at eval.c:6233
#18 0x00024002 in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3517
#19 0x00024be4 in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3328
#20 0x0002a8d1 in rb_yield_0 (val=<value temporarily unavailable, due
to optimizations>, self=18374940, klass=0, flags=0, avalue=0) at
eval.c:5116
#21 0x0002c85d in loop_i () at eval.c:5249
#22 0x0001aa53 in rb_rescue2 (b_proc=0x2c820 <loop_i>, data1=0,
r_proc=0, data2=0) at eval.c:5513
#23 0x0001ab57 in rb_f_loop () at eval.c:5274
#24 0x00025adf in rb_call0 (klass=1301660, recv=18374940, id=4121,
oid=4121, argc=-1073782088, argv=0x0, body=0x13bdc0, flags=<value
temporarily unavailable, due to optimizations>) at eval.c:5951
#25 0x000269dc in rb_call (klass=1301660, recv=18374940, mid=4121,
argc=0, argv=0x0, scope=1, self=18374940) at eval.c:6233
#26 0x00022ffd in rb_eval (self=<value temporarily unavailable, due to
optimizations>, n=<value temporarily unavailable, due to
optimizations>) at eval.c:3532
#27 0x000253d6 in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3242
#28 0x0002a8d1 in rb_yield_0 (val=<value temporarily unavailable, due
to optimizations>, self=18374940, klass=0, flags=0, avalue=2) at
eval.c:5116
#29 0x0002d8cb in rb_thread_start_0 (fn=0x2abc0 <rb_thread_yield>,
arg=0x11860b8, th=0x875a00) at eval.c:12408
#30 0x00025adf in rb_call0 (klass=1284640, recv=18374860, id=2961,
oid=3221192596, argc=1093632, argv=0xbfff6ee8, body=0x23f17,
flags=<value temporarily unavailable, due to optimizations>) at
eval.c:5951
#31 0x01184934 in ?? ()

I will try to dig into the issue a bit more with valgrind, etc.

  Aman
Posted by Brent Roman (brentr)
on 2009-03-31 02:15
(Received via mailing list)
Aman,

Could you reduce this repeatable failure to a script I could easily run to
reproduce it here?

My main machine is a Mac mini running Linux, but I can always reboot it into
OS X.  I've got access to PPC Macs too, but they only run OS X.

- brent
Posted by Roger Pack (Guest)
on 2009-04-21 16:50
(Received via mailing list)
Issue #744 has been updated by Roger Pack.


Is anybody still getting segfaults with the latest MBARI patches?
They are working well for me; at least I haven't run into the segfaults
of last Dec./Jan. for quite a while.
Thanks.
-=r
----------------------------------------
http://redmine.ruby-lang.org/issues/show/744
Posted by Brent Roman (brentr)
on 2009-05-06 05:25
(Received via mailing list)
Roger,

I have no outstanding problem reports aside from Aman's issues with God on
x86_64 reported here over a month ago.  I'm still hoping he can distill this
failure into something I can replicate and fix.

I merged the full patch set into the 1.8 trunk in mid-March, but I've had no
feedback from the core developers since then.

- brent
Posted by Nobuyoshi Nakada (nobu)
on 2009-05-06 10:53
(Received via mailing list)
Hi,

At Wed, 6 May 2009 12:25:12 +0900,
Brent Roman wrote in [ruby-core:23365]:
> I merged the full patch set into the 1.8 trunk in mid-March, but I've had no
> feedback from
> the core developers since then.

Sorry for the delay, but I have to resolve the conflicts that arose after
it and split out the changes that are not directly relevant.