Bug #744: memory leak in callcc?
http://redmine.ruby-lang.org/issues/show/744
Author: Roger Pack
Status: Open, Priority: Normal
from http://rubyforge.org/tracker/?func=detail&atid=169...
this code
require 'generator'
loop { g = Generator.new {|x| (1..3).each {|i| x.yield i}} }
seems to leak for me--I'm not sure if this is expected or not.
Thanks.
on 2008-11-11 21:34
on 2008-11-13 02:49
Roger,
I've run into a number of issues related to Continuations and MRI's garbage collector, so I thought I'd have a look at this one. I investigated the equivalent (non-generator) example described at: http://rubyforge.org/tracker/?func=detail&atid=169...
This:
loop {@x = callcc {|c| c}}
quickly consumes all of memory. One of my x86 Linux machines crashed with a Segmentation Fault after a couple of minutes of running this loop. My guess is that the stack, which is unchecked during GC, got too deep. What I saw is that the stack during garbage collection became ridiculously deep (>15000 frames deep in the GC). Here's a bit of the backtrace:
#11768 0x0806401a in mark_locations_array (x=0xa887004, n=1228) at gc.c:437
#11769 0x0805dceb in thread_mark (th=0xa8860f8) at eval.c:7403
#11770 0x0806438e in rb_gc_mark (ptr=59) at gc.c:881
#11771 0x0806401a in mark_locations_array (x=0xa88924c, n=1228) at gc.c:437
#11772 0x0805dceb in thread_mark (th=0xa888340) at eval.c:7403
#11773 0x0806438e in rb_gc_mark (ptr=59) at gc.c:881
#...
This looks like rb_gc_mark() got passed a bogus VALUE pointer. I cannot even unwind the stack to the point where this happened without gdb itself segfaulting. Interestingly, the very same Ruby interpreter running on an ARM9 under Linux handles this case without leaking memory or segfaulting.
So, in answer to your original question: I don't think this behavior is intentional. And, I plan to spend a bit more time looking into it. Any hints would be appreciated...
- brent
on 2008-11-13 09:36
Roger,
This "leak" appears to be an artifact of MRI's conservative garbage
collector.
Depending upon the compiler options and the target CPU, there may be
unused references to these continuation objects left on the thread's
stack when it is copied in whole by the callcc method.
In your example, these unused references form a linked list of
continuations across the loop iterations. When the GC tries to mark
such a recursive structure, it consumes a lot of stack space. With
Ruby 1.6.8, this leads to a segmentation fault when the stack size
exceeds the maximum allowed (see ulimit -s).
With later versions of Ruby, the mark phase of the GC will "give up"
when the stack grows too large. This avoids the segfault, as GC
silently stops working instead. At least that's what I think I see in
v1.8.7 p72.
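To make the mechanism concrete, here is a minimal C sketch of a recursive mark phase walking a chain of objects, with a depth limit standing in for the "give up" behaviour described above. The names obj, mark, make_chain and MAX_MARK_DEPTH are hypothetical and the limit is arbitrary; this is not MRI's gc.c, only an illustration under those assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object: each one refers to the previous one, like the
   chain of continuations built up across loop iterations. */
struct obj {
    struct obj *next;
    int marked;
};

#define MAX_MARK_DEPTH 1000   /* illustrative limit, not MRI's real value */

/* Recursive mark. Every object visited costs one C stack frame.
   Returns the number of objects marked; stops descending once the
   recursion reaches MAX_MARK_DEPTH. */
static size_t mark(struct obj *o, int depth)
{
    if (o == NULL || o->marked)
        return 0;
    if (depth >= MAX_MARK_DEPTH)
        return 0;               /* "give up": the rest stays unmarked */
    o->marked = 1;
    return 1 + mark(o->next, depth + 1);
}

/* Build a chain of n unmarked objects; returns the head. On allocation
   failure the chain built so far is returned. */
static struct obj *make_chain(size_t n)
{
    struct obj *head = NULL;
    for (size_t i = 0; i < n; i++) {
        struct obj *o = malloc(sizeof *o);
        if (o == NULL)
            return head;
        o->next = head;
        o->marked = 0;
        head = o;
    }
    return head;
}
```

Marking a chain of 5000 objects with this limit marks only the first 1000. Without the limit, the recursion depth equals the chain length, which is how a long enough chain can exhaust the C stack.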
When I built x86 Ruby using gcc without optimization (CFLAGS=-g),
even this caused a memory leak:
loop { callcc {|c| c}}
However, when I rebuilt it with CFLAGS=-O2, the memory leak only
appeared when the continuations returned by callcc were assigned to a
variable.
When compiled for the ARM9 with gcc CFLAGS=-Os or CFLAGS=-O2, everything
works as it should. No leaks observed. However, when I changed to
CFLAGS=-g or CFLAGS=-O3, the original example leaks badly. These tests
were performed using x86 gcc v3.3.5 and ARM gcc v3.4.5.
Can anyone suggest debugging techniques to help determine what is
leaving the dangling references to these continuations on Ruby's 'C'
call stack?
If we knew what wrote these, we might be able to explicitly clear them
once they go out of scope. This should result in significantly better
GC performance all around.
- brent
on 2008-11-15 19:03
Issue #744 has been updated by Roger Pack. > If we knew what wrote these, we might be able to explicitly clear them > once > they go out of scope. This should result in better significantly GC > performance all around. Yeah I've wondered that too. Maybe we can have a hackfest for it some saturday :) http://redmine.ruby-lang.org/issues/show/649 is related [and somewhat frustrating to be honest]. My thought is that maybe there's a way to "clear the stack" of data that isn't currently "useful" and thus clear it of old references [I realize this may be hard]. Thoughts? -=R ---------------------------------------- http://redmine.ruby-lang.org/issues/show/744
on 2008-11-17 04:09
On Sun, 16 Nov 2008 02:59:18 +0900, Roger Pack wrote: > somewhat frustrating to be honest]. My thought is that maybe there's a > way to "clear the stack" of data that isn't currently "useful" and thus > clear it of old references [I realize this may be hard]. I can't reproduce http://redmine.ruby-lang.org/issues/show/649 in Debian's Ruby 1.8 or 1.9. ruby1.8 1.8.7.72-1 ruby1.9 1.9.0.2-8 (The callcc thing, on the other hand, is broken on Debian's Ruby 1.8)
on 2008-11-17 04:58
Roger,
Well, I just summarized the result of this Saturday's hackfest at: http://rubyforge.org/tracker/?func=detail&atid=169...
The main problem seems to be that 'C'/C++ compilers do not initialize automatic variables, so one is bound to have old, unused, but valid pointers left on the stack from previous calls at any point in time.
By the way, even this will fix the example leak:
loop {@x = callcc {|c| c}; 2*6+4}
Pretty silly, but it works for me. And the fact that it works proves that the issue is unused references left on the stack.
The behavior is a fundamental design weakness of conservative GC. We notice it most when managing large and/or highly connected objects like threads, continuations and large arrays.
One could hack the gcc to force it to initialize automatic variables to zero even though this violates the 'C' language spec. But I can't help feeling that there must be a better way. In my on again, off again quest to put Ruby on a diet, I'll probably hack at this a bit more over the coming weeks, initially on my patched v1.6.8 interpreter.
Thanks for the redmine link. Found this hanging off it. Looks better than the "reachability" patches to GC: http://softwareverify.com/ruby/customBuild/memtrac...
Here are some interesting posts about this problem from outside the Ruby world. The first is especially relevant:
http://gcc.gnu.org/ml/java/2005-05/msg00265.html
http://www.red-bean.com/guile/guile/new/msg01070.html
http://www.digitalmars.com/rtl/gcdescr.html
- brent
on 2008-11-17 08:52
At 12:54 08/11/17, Brent Roman wrote: >One could hack the gcc to force it to initialize automatic variables to zero >even though this violates the 'C' langauage spec. I haven't read the spec, but my guess (having worked on other specs) is that the only thing that the 'C' language spec says is that the value is undefined. A value that happens to be zero would still be undefined, as far as I understand. Regards, Martin. #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
on 2008-11-17 13:11
Martin, Well. Ummm. If a compiler writes zeros then is it not setting the value of variables that the spec says should remain undefined until explicitly initialized? Whether or not it violates the 'C' language spec, I don't know any way to make gcc do this with existing compiler options or pragmas. Does anyone else? - brent Martin Duerst wrote: > > At 12:54 08/11/17, Brent Roman wrote: > >>One could hack the gcc to force it to initialize automatic variables to zero
on 2008-11-17 15:08
A common technique is to allocate a reasonably sized array (e.g. 256
bytes) on the C stack and zero it before and after each allocation.
This reduces the garbage left on the stack before and after the
allocation and any GC it triggers:
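#include <stddef.h>
#include <string.h>

void *my_alloc_inner(size_t size);   /* defined below */

void *my_alloc(size_t size)
{
    char zeros[256];
    void *ptr;
    /* zero a scratch region of the stack before allocating */
    memset(zeros, 0, sizeof(zeros));
    ptr = my_alloc_inner(size);
    /* and again afterwards; note that an optimizer may drop a
       dead store like this unless it is forced (e.g. volatile) */
    memset(zeros, 0, sizeof(zeros));
    return ptr;
}

void *my_alloc_inner(size_t size)
{
    /* may call GC */
    (void)size;
    return NULL;   /* placeholder so the stub compiles */
}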
Might need to put my_alloc_inner() in a separate compilation unit to
avoid inlining.
Kurt
on 2008-11-28 11:00
After a couple of weeks of long nights and false starts, I feel I may
have come up with a fix for a large class of Ruby memory leaks. The
basic technique is a refinement of the one Kurt Stephens suggested. It
not only eliminates the leaks in this one-liner:
loop {@x=callcc{|c|c}}
but also in our multi-threaded robotics application. Our Ruby process
used to grow to 20+ MB during a day-long run. The same run now stays
smaller than 10MB.
On an embedded ARM Linux machine with only 32MB of DRAM, this is a great
result!
The central problem is that gcc (and other compilers) tend to create
sparse stack frames such that, when a new frame is pushed onto the
stack, it does not completely overwrite the one that had been previously
stored there. The new frame gets activated with old VALUE pointers
preserved inside its holes. These become "live" again as far as any
conservative garbage collector is concerned. And, voilà, a leak is born!
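The effect can be shown without relying on real (and undefined) uninitialized reads by simulating the machine stack as an array. The sketch below is only a model under that assumption: sim_stack, push_sparse_frame, pop_frame and count_refs are hypothetical names, and the "conservative scan" simply counts words that equal a given reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Simulated machine stack: an array of words plus a top-of-stack index. */
struct sim_stack {
    uintptr_t words[STACK_WORDS];
    size_t top;               /* number of words currently in use */
};

/* Push a frame of `size` words but write only every `stride`-th slot,
   the way a sparse frame leaves most of its slots untouched. */
static void push_sparse_frame(struct sim_stack *s, size_t size,
                              size_t stride, uintptr_t value)
{
    assert(stride > 0);
    for (size_t i = 0; i < size && s->top + i < STACK_WORDS; i += stride)
        s->words[s->top + i] = value;
    s->top += size;
    if (s->top > STACK_WORDS)
        s->top = STACK_WORDS;
}

/* Pop a frame: only the index moves, the old contents stay in memory. */
static void pop_frame(struct sim_stack *s, size_t size)
{
    s->top = (size > s->top) ? 0 : s->top - size;
}

/* Conservative scan: count in-use stack words that equal `ref`. */
static size_t count_refs(const struct sim_stack *s, uintptr_t ref)
{
    size_t n = 0;
    for (size_t i = 0; i < s->top; i++)
        if (s->words[i] == ref)
            n++;
    return n;
}
```

If a dense 8-word frame full of one reference is pushed and popped, and then a sparse 8-word frame that writes only slots 0 and 4 is pushed over it, the scan finds the old reference again in the six slots the new frame never wrote.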
I implemented a scheme for recording the maximum depth of the C stack in
xmalloc and during garbage collection itself. However, I realized that
there was no point in clearing the stack when it is near its maximum
depth. Instead, stack clearing is deferred until CHECK_INTS, as this
tends to happen between evaluation of nodes, when the stack is likely to
be shallower. At this point a tight loop quickly zeros the region between
the current top of stack, as returned by alloca(0), and the maximum
recorded stack extent. It also updates the stack extent so no memory is
cleared repeatedly if the stack contracts further.
This paper discusses this and similar techniques:
http://www.hpl.hp.com/personal/Hans_Boehm/gc/paper...
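The bookkeeping described above (record the deepest extent, clear the dead region at a shallow point, then lower the recorded extent) can be sketched on a simulated stack. This is a model only: the real patch works on the machine stack via alloca(0), whereas here an array index plays the role of the stack pointer, and sim_stack, note_depth and clear_dead_stack are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64

struct sim_stack {
    uintptr_t words[STACK_WORDS];
    size_t top;         /* words in use right now */
    size_t max_extent;  /* deepest `top` seen since the last clear */
};

/* Call wherever the stack may be deep (the post records the depth in
   xmalloc and during GC). */
static void note_depth(struct sim_stack *s)
{
    if (s->top > s->max_extent)
        s->max_extent = s->top;
}

/* Call at a shallow point (the post uses CHECK_INTS). Zeros the dead
   words between the current top and the recorded maximum, then lowers
   the maximum so the same words are not cleared again.
   Returns the number of words cleared. */
static size_t clear_dead_stack(struct sim_stack *s)
{
    size_t cleared = 0;
    assert(s->top <= STACK_WORDS && s->max_extent <= STACK_WORDS);
    for (size_t i = s->top; i < s->max_extent; i++) {
        s->words[i] = 0;
        cleared++;
    }
    s->max_extent = s->top;
    return cleared;
}
```

With a recorded extent of 40 words and a current top of 10, the first call clears 30 words and an immediate second call clears none. If the stack then contracts to 4 words, only the 6 newly dead words are cleared.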
Another related issue is that the style of rb_eval() in eval.c in the
1.8 and 1.6 series causes gcc to emit especially large and sparse stack
frames.
Consider that gcc allocates two pairs of stack slots for r and l in
constructs like this:
switch (nd_type(node)) {
  /* nodes for speed-up(literal match) */
  case NODE_MATCH2:
    {
      VALUE l = rb_eval(self,node->nd_recv);
      VALUE r = rb_eval(self,node->nd_value);
      result = rb_reg_match(l, r);
    }
    break;
  /* nodes for speed-up(literal match) */
  case NODE_MATCH3:
    {
      VALUE r = rb_eval(self,node->nd_recv);
      VALUE l = rb_eval(self,node->nd_value);
      ....
By the time the compiler's optimizer is allocating stack frame slots,
all the block structure of the original code has been lost in various
transformations.
As a result, each rb_eval() call ends up pushing about 4k bytes onto the
C stack, of which less than 20% is even initialized. This means that:
1) There is a high probability that old VALUEs from previous frames
will be resurrected as the stack grows.
2) The GC must scan a sparse, large stack and mark the many dead object
pointers it contains.
3) callcc and thread context switches must copy needlessly large stacks.
4) Recursive Ruby programs run out of stack space much earlier than
they might otherwise.
When I simply re-factored rb_eval() such that it calls a (non-inline)
function for each node type it encounters, the total observed C stack
size for my application was reduced by more than two thirds. Not
surprisingly, threading and continuation micro benchmarks run about
3 - 4 times faster. However, I expect that benchmarks that operate
repeatedly on a few large, long lived objects will run slower.
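The shape of that refactoring can be sketched with a toy evaluator. This is not the actual patch: the node types, struct node and the eval_* handlers are hypothetical, and NOINLINE assumes GCC's noinline function attribute (it expands to nothing on other compilers). The point is only that each handler's temporaries live in their own small frame instead of in one large frame shared by every case of the switch.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE
#endif

enum node_type { NODE_LIT, NODE_ADD, NODE_MUL };

struct node {
    enum node_type type;
    long value;                     /* used by NODE_LIT */
    const struct node *lhs, *rhs;   /* used by NODE_ADD and NODE_MUL */
};

static long eval(const struct node *n);

/* Each handler owns its temporaries, so they occupy stack space only
   while that node type is being evaluated. */
static NOINLINE long eval_add(const struct node *n)
{
    long l = eval(n->lhs);
    long r = eval(n->rhs);
    return l + r;
}

static NOINLINE long eval_mul(const struct node *n)
{
    long l = eval(n->lhs);
    long r = eval(n->rhs);
    return l * r;
}

/* The dispatcher keeps no per-node locals of its own. */
static long eval(const struct node *n)
{
    assert(n != NULL);
    switch (n->type) {
    case NODE_LIT: return n->value;
    case NODE_ADD: return eval_add(n);
    case NODE_MUL: return eval_mul(n);
    }
    return 0;   /* not reached for valid node types */
}
```

Evaluating the tree for 2*6+4 with this dispatcher yields 16. Whether the frame of the dispatcher actually shrinks depends on the compiler and flags, so the generated assembler is the thing to check.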
Keep in mind that these techniques should improve the performance of
*any* garbage collector that scans the unstructured C stack for valid
object pointers. It may even be relevant for the 1.9 series Ruby, but
I'll leave that for those more qualified to determine.
Today, this is implemented only in my heavily patched version of Ruby
1.6.8.
In the short term, if there's interest, I can quickly post my hacked
1.6.8 Ruby to an FTP site for others to test.
Longer term, the stack clearing could be supplied as a small patch to
the 1.8 series, however the refactoring of rb_eval() is probably too
large to be attached to an email message on this list. I will take the
time to produce these patches only if at least a few people commit to
testing them, reporting detailed results and suggestions for
improvement here.
- brent
on 2008-11-30 04:42
Hi, At Fri, 28 Nov 2008 18:54:45 +0900, Brent Roman wrote in [ruby-core:20149]: > Longer term, > The stack clearing could be supplied as a small patch to the 1.8 series, > however the > refactoring of rb_eval() is probably too large to be attached to an email > message > on this list. I will take the time to produce these patches only if at > least a few people > commit to testing them, reporting detailed results and suggestions for > improvement here. In the shorter term, if you use gcc, can't you try the -mpreferred-stack-boundary=2 option?
on 2008-11-30 06:15
Before hacking rb_eval(), I first tried finding some compiler
options that would fill the stack holes.
Decreasing the stack slot alignment requirements
does pack stack somewhat, however, the very sparse stack
frame generated by the huge switch statement in rb_eval() remains
largely unaffected by any compiler options I could find.
These holes still caused the GC to perform poorly for my app and
to fail utterly when presented with: @x=loop {callcc {|c| c}}
Just have a look at the generated assembler code for rb_eval
from "gcc -S -O2 eval.c". The function preamble decrements the stack
pointer by 566 bytes. Which of those bytes is actually written is
determined by the node type processed. Most of them remain
uninitialized in *all* cases.
With -mpreferred-stack-boundary=2, rb_eval() starts by decrementing
the stack pointer by 548 bytes. Not much difference.
After factoring, rb_eval() decrements the stack pointer by
only about 20 bytes. I got best results with these options on x86 gcc
4.3.2:
gcc -mpreferred-stack-boundary=2 -fno-stack-protector
-fno-inline-functions-called-once
Nobu, these are not just 2%-5% memory and time reductions.
For multithreaded applications, both time and space performance
are significantly improved. I suspect that some large single threaded
apps will also benefit. (Maybe even rails?! :-)
There's an opportunity here. I hope that
the core developers will find time to seriously explore it.
- brent
on 2008-11-30 07:08
> After a couple weeks of long nights and false starts, I feel I may have come > up with > a fix for a large class of Ruby memory leak. The basic technique is a > refinement of the > one Kurt Stephens suggested. It not only eliminates the leaks in this one > liner: Wow thanks for doing that. I'd say please create a redmine bug for it [or attach it to an existing]. A patch to 1.8.7 would be sweet :) A patch for 1.9 would be great too :) I'd imagine that your system is "better" than just blindly doing a garbage_collect() { clear_stack(); ....do normal gc } void clear_stack() { a = char[10000]; memclear(a); } ? Thanks! -=R Note that I use gcc 3.4.5 I assume that won't be a problem though.
on 2008-11-30 12:27
The problem can be demonstrated with a very simple program (attached), and it looks to me like a bug in gcc - surely it should overlap stack assignments for automatic variables which aren't in scope simultaneously? One solution to rb_eval() might be an ugly union at the top of the function (second attachment). But it seems wrong to have to do this just to code around an implementation problem with one particular compiler, albeit a ubiquitous one. Regards, Brian.
on 2008-11-30 20:12
Brian,
Thanks for the very clear demo program to illustrate the problem. Is there anyone who can look at the assembler code generated for this demo by a recent Microsoft or Intel 'C' compiler?
In any case, I doubt that the gcc maintainers would consider this behavior a bug. It's been with them from before v3.3.5. They've known about it for many years. They view it as a limitation of their register optimization techniques and are more concerned about speeding up the code than shrinking its stack footprint. However, for us, larger stacks = slower code due to stack copying and the conservative GC.
The "ugly union" solution would not be sufficient because much of the stack is occupied by compiler generated temporaries that have no representation in the 'C' input source. I did consider such wholesale code changes, but resisted because they would have been, as you say, quite ugly and difficult to maintain.
What I did come up with was not ugly at all. Factor the unwieldy switch statement of rb_eval() into separate functions to handle each node type and clear the stack at a few opportune times. rb_eval() becomes smaller and more likely to be optimized. I buried the stack clearing into macros that already exist.
- brent
on 2008-11-30 20:40
Roger,
I already responded in detail to this bug: http://rubyforge.org/tracker/?func=detail&atid=169...
I just bang on Ruby 1.6.8 for our robotics application. You seem to already be doing a lot of excellent Ruby testing with current versions. If I spent a couple days developing these two patches for Ruby 1.8.7, would you be willing to run regression tests against them and to report the results here?
I think the small stack clearing patch should improve the GC behavior, but, by itself, it will likely slow down some apps due to its having to clear large areas of stack. I'd expect to see that slowdown mitigated by the larger patch that would refactor rb_eval() and thereby keep the stack smaller. The combined patches will likely be large, so I'll just post links to them here.
Would anyone else be willing to test them? ... Particularly those who have large apps, and/or apps that use multiple threads or continuations that seem to leak memory?
- brent
P.S. I use gcc 3.4.5 for generating code for our embedded ARM targets. The older compiler generates fewer stack temporaries than the newer ones. Don't rush to update :-)
P.P.S. The way GC is currently invoked causes it to occur when the stack is already near its maximum depth. This patch tries to make GC normally occur as part of CHECK_INTS, when the stack tends to be shallower. At that point, clearing the stack can be much more effective.
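How deep the C stack is at a given moment can be approximated by comparing the address of a local variable with an address saved near the start of the program. The sketch below assumes a contiguous machine stack and ordinary compiler behaviour; set_stack_base, meter_depth and recurse_and_meter are hypothetical helpers, not functions from the patch or from MRI.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t stack_base;  /* address recorded near the stack bottom */
static size_t    max_depth;   /* largest depth seen at a metering point */

/* Record the address of a local that lives in an outer frame. */
static void set_stack_base(const volatile void *p)
{
    stack_base = (uintptr_t)p;
}

/* Approximate the current depth in bytes as the distance between a
   local in this frame and the recorded base. Taking the absolute
   difference makes it independent of the growth direction. */
static size_t meter_depth(void)
{
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;
    size_t depth = here > stack_base ? (size_t)(here - stack_base)
                                     : (size_t)(stack_base - here);
    if (depth > max_depth)
        max_depth = depth;
    return depth;
}

/* Recurse `levels` deep, then meter. Each level holds a 64-byte
   buffer; reading it after the call keeps the frame alive and stops
   the call from being turned into a tail call. */
static size_t recurse_and_meter(int levels)
{
    volatile char pad[64];
    pad[0] = (char)levels;
    if (levels <= 0)
        return meter_depth();
    size_t d = recurse_and_meter(levels - 1);
    return d + (size_t)(pad[0] != (char)levels);   /* always adds 0 */
}
```

Metering at a shallow point and again 100 calls deeper should report a clearly larger depth the second time, and max_depth then holds the deeper figure. That is the kind of number a stack-clearing scheme would compare against the current depth.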
on 2008-12-01 10:35
> What I did come up with was not ugly at all. Factor the unwieldy switch > statement of rb_eval() into separate functions to handle each node > type Did you replace the whole switch statement with a dispatch table? That sounds like a sensible thing to do anyway. OTOH, if this is for ruby 1.8.x, I'm afraid you may not find much interest in such changes while the focus is all on 1.9. Perhaps worth checking how 1.9's bytecode interpreter stacks up under the same conditions? OTOH, 1.9 doesn't have callcc anyway, so maybe your application code would need a lot of restructuring to use Fiber instead. I don't know if it's possible to implement callcc in terms of Fiber. Regards, Brian.
on 2008-12-01 13:30
On Mon, Dec 01, 2008 at 06:29:00PM +0900, Brian Candler wrote: > OTOH, 1.9 doesn't have callcc anyway, so maybe your application code would > need a lot of restructuring to use Fiber instead. I don't know if it's > possible to implement callcc in terms of Fiber. 1.9 does have callcc (require 'continuation'). It's probably not good to use it, though. Paul
on 2008-12-01 20:16
Brent- I would love to see a version of these patches against 1.8.6 or 1.8.7. I can test them on a few hundred servers to see what kind of resource consumption these changes have in larger deployments. Awesome work on this. I'm very interested in testing this for you. You can contact me off list if you like or if you want servers to use to test this on. Thanks Ezra Zygmuntowicz ez@engineyard.com
on 2008-12-01 20:19
On Dec 1, 2008, at 1:29 AM, Brian Candler wrote:
> in such changes while the focus is all on 1.9.
Actually I think you will find a *ton* of interest in this for the
1.8.* branch. There are thousands of production apps that are not
going to move to 1.9 anytime soon and any improvements to 1.8.* thread
and callcc handling like this would be very welcome.
Thanks
Ezra Zygmuntowicz
ez@engineyard.com
on 2008-12-01 20:54
Brian,
gcc optimizes the big switch in rb_eval() into a dispatch table both before and after my factoring of it into separate node handling functions.
I realize that the Ruby world has moved on, which is why I'm not going to bother with more work on this until at least a couple folks commit to testing it. The 1.8 series is similar enough to 1.6.8 that I know I could create a patch for it in a few days. If that tested well, I might consider trying it with 1.9, but I suspect that would be a lot more effort. If 1.9 is using the same GC and gcc as 1.8, then I would expect that it would benefit from this patch. However, that remains to be proven.
Also, 1.9 and its "standard libs" have gotten so large that they simply won't fit on my target (embedded ARM linux) machines. The 1.8 core is really not that much bigger than 1.9, I'd just have to strip away most of its new "standard" libs. Does anyone know the current status of "Atomic Ruby?"
As Paul has already pointed out, Matz and Koichi kept callcc in v1.9 Ruby via some very amazing code hardwired into the VM. It is made accessible after require "continuation". I've traced the reliability issues with continuations to the fact that the GC object mark function for them is incorrect, and posted a patch to fix this in v1.8.6 about a year ago. That fix was never implemented, so continuations continue to have a bad rap. My own experience with them since then has been quite good. However, Paul Brannan told me that he has had trouble with them due to their incompatibility with some of the non-standard libraries with which his application links. (Something about call backs, if I recall correctly)
In any case, Continuations are more general than Fibers. Fibers can be implemented in terms of continuations quite readily, but Continuations cannot be implemented in terms of Fibers.
- brent
on 2008-12-01 22:00
> > Actually I think you will find a *ton* of interest in this for the > 1.8.* branch. There are thousands of production apps that are not going to > move to 1.9 anytime soon and any improvements to 1.8.* thread and callcc > handling like this would be very welcome. > > Thanks > Ezra Zygmuntowicz I would like to second that. 1.8.7 patches would be very interesting indeed. -Stephen
on 2008-12-01 23:02
On Tue, Dec 02, 2008 at 04:47:46AM +0900, Brent Roman wrote: > I've traced the reliability issues with continuations to the fact that > the GC object mark function for them is incorrect, and posted > a patch to fix this in v1.8.6 about a year ago. That fix was never > implemented I know what you mean. My own small patches (just to fix compatibility for uClibc(*)) were also ignored. This is what I meant when I said "not find much interest": of course the user base is hugely interested in the development of the robust 1.8 code. I'm just unconvinced that the ruby core developers are. Even now that ruby 1.9 is supposedly no longer a moving target, I certainly have no plans to move to it in any production environment. I just don't want the pain of all those broken libraries and frameworks. Maybe in a year or two. Regards, Brian. (*) I'm interested in resource-limited platforms too. ruby 1.8 installs fine on OpenWrt boxes with 4MB of flash, if you trim the standard libraries a bit.
on 2008-12-02 02:27
At 06:56 08/12/02, Brian Candler wrote: >I know what you mean. My own small patches (just to fix compatibility for >uClibc(*)) were also ignored. Please don't assume that this was on purpose. With that much going on, things can easily be lost. Please try again resending the patch, or even better (now that it exists) use redmine. Regards, Martin. #-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University #-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
on 2008-12-02 09:54
On Tue, Dec 02, 2008 at 10:21:05AM +0900, Martin Duerst wrote: > At 06:56 08/12/02, Brian Candler wrote: > > >I know what you mean. My own small patches (just to fix compatibility for > >uClibc(*)) were also ignored. > > Please don't assume that this was on purpose. With that much going > on, things can easily be lost. Please try again resending the patch, > or even better (now that it exists) use redmine. I posted it twice to ruby-core, once to rubyforge tracker and then migrated that to redmine a few weeks ago. There was no response in any of those locations. Here are the links: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... http://rubyforge.org/tracker/index.php?func=detail... http://redmine.ruby-lang.org/issues/show/720 I spent time diagnosing, fixing and reporting this particular problem. So even an explicit rejection of this work would have been better than no response at all. As far as I can tell, I've followed the processes documented at http://www.ruby-lang.org/en/community/ruby-core/ Regards, Brian.
on 2008-12-02 10:20
Hi,
In message "Re: [ruby-core:20207] Re: Promising C coding techniques to
reduceMRI's memory use"
on Tue, 2 Dec 2008 17:47:25 +0900, Brian Candler
<B.Candler@pobox.com> writes:
|Here are the links:
|
|http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
|http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
|http://rubyforge.org/tracker/index.php?func=detail...
|http://redmine.ruby-lang.org/issues/show/720
|
|I spent time diagnosing, fixing and reporting this particular problem. So
|even an explicit rejection of this work would have been better than no
|response at all.
My bad, somehow I (we) missed all of your posts. I am awfully sorry.
Definitely I will check and merge them if I see no problem, after the
deadline I am facing. Ping me, if you see no further action after a
week or two.
matz.
on 2008-12-02 10:32
Hi,
In message "Re: [ruby-core:20208] Re: Promising C coding techniques to
reduceMRI's memory use"
on Tue, 2 Dec 2008 18:14:33 +0900, Yukihiro Matsumoto
<matz@ruby-lang.org> writes:
|My bad, somehow I (we) missed all of your posts. I am awfully sorry.
|Definitely I will check and merge them if I see no problem, after the
|deadline I am facing. Ping me, if you see no further action after a
|week or two.
I briefly checked soon after the post, and found out that:
* I missed the original report in the rubyforge tracker
* after reposting to redmine, I checked the patch into the 1.9
trunk, so that 1.9 does not have this problem.
* then I forgot to apply this one to 1.8.
* I just checked in to 1.8 head.
* next 1.8.7 maintenance release or 1.8.8 will not have the problem.
I am sorry.
matz.
on 2008-12-02 10:38
Hi,
In message "Re: [ruby-core:20179] Re: Promising C coding techniques to
reduce MRI's memory use"
on Mon, 1 Dec 2008 04:34:12 +0900, Brent Roman <brent@mbari.org>
writes:
|If I spent a couple days developing these two patches for Ruby 1.8.7,
|would you be willing to run
|regression tests against them and to report the results here?
We have been troubled for years by the "ghost references from the
machine stack" generated by GCC. We are more than happy to see the
patch, and merge it if it's acceptable.
matz.
on 2008-12-02 13:24
On Tue, Dec 02, 2008 at 06:26:06PM +0900, Yukihiro Matsumoto wrote: > * I just checked in to 1.8 head. > * next 1.8.7 maintenance release or 1.8.8 will not have the problem. Many thanks - I hadn't noticed that you had applied the patch to 1.9 already. While we're at it, I also analysed some issues with WEBrick: is there any interest in these? http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... -- possible patch in [18565] http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... I included some suggested patches in those posts, but I really wanted some discussion/feedback on what was the best way forward. For now I am using a local monkey-patch (attached) which addresses these issues. This patch also adds the ability to return a proc as the body of a HTTPResponse; the proc is passed an output object, and everything written to it is turned into a HTTP chunk. This is an expansion of the patch in [18460]. It also increases block size from 4K to 16K. I could rewrite these changes as an actual patch to WEBrick if there is interest in applying them, and agreement on the solutions I've used. Regards, Brian.
on 2008-12-03 18:48
On Tue, Dec 2, 2008 at 5:18 AM, Brian Candler <B.Candler@pobox.com> wrote: > http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... > -- possible patch in [18565] > http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... > http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/... > > I included some suggested patches in those posts, but I really wanted some > discussion/feedback on what was the best way forward. Few people use webrick, maybe that's why there's no discussion :) If they're not in redmine I'd add them there so they don't get forgotten [hopefully the new tracker will help]. Cheers! -=R
on 2008-12-03 18:58
> I just bang on Ruby 1.6.8 for our robotics application. I was wondering why the older version :) > You seem to already be doing a lot of excellent Ruby testing with current > versions. > If I spent a couple days developing these two patches for Ruby 1.8.7, > would you be willing to run > regression tests against them and to report the results here? Absolutely. I'll test them against some trivial stuff and a small rails app and see if they help memory wise and check for speed :) > P.P.S. The way GC is currently invoked causes it to occur when that stack > is already near its maximum depth. This patch tries to make GC normally > occur is part of CHECK_INTS, when the stack tends to be shallower. > At that point, clearing the stack can be much more effective. I wonder if there are less intrusive ways, like changing [from a previous post] VALUE l = rb_eval(self,node->nd_recv); VALUE r = rb_eval(self,node->nd_value); result = rb_reg_match(l, r); } break; /* nodes for speed-up(literal match) */ case NODE_MATCH3: { VALUE r = rb_eval(self,node->nd_recv); VALUE l = rb_eval(self,node->nd_value); .... to ... VALUE l = NULL; VALUE r = NULL; l = rb_eval(self,node->nd_recv); r = rb_eval(self,node->nd_value); result = rb_reg_match(l, r); } break; /* nodes for speed-up(literal match) */ case NODE_MATCH3: { r = rb_eval(self,node->nd_recv); l = rb_eval(self,node->nd_value); [reuse same variable]. Also re: size --doesn't 1.9 have rubygems pre-installed so that it isn't as large of a standard library? [just pointing out that maybe it could use some minimizing love still?] :) Thanks! -=R
on 2008-12-03 19:53
Roger, I'll be posting a set of patches to 1.8.7 on an ftp server in a week or so, with a URL to it here. Thanks for agreeing to test it. The "ghost VALUE references" would not be affected by the code changes you propose. GCC's optimizer will just remove your attempts to initialize VALUEs to NULL. Even if you could prevent that (with volatile, perhaps), there would remain many uninitialized anonymous temporaries that you could not even access from the 'C' source code. - brent P.S. The core of 1.9 got a good deal larger due to its more sophisticated VM and support for non-latin languages. But, in all fairness, I haven't looked at 1.9 seriously for almost a year now. Maybe it could benefit from some "minimizing love" now.
on 2008-12-03 21:48
Hello, On Wednesday 03 December 2008 19:47:01 Brent Roman wrote: > I'll be posting a set of patches to 1.8.7 on an ftp server > in a week or so, with a URL to it here. Thanks for > agreeing to test it. I'll definitely try it out, too. > > The "ghost VALUE references" would not be affected > by the code changes you propose. GCC's optimizer > will just remove your attempts to initialize VALUEs to NULL. Even if you Actually that's not exact, according to my experiments - it optimizes away assignments of NULL to a pointer. VALUE is not a pointer, and it doesn't optimize away neither NULL nor 0 assignments (i tried with gcc 4.3.2) Coincidently, GCC 4.4 is supposed to have an optimization for variables in switch (see http://gcc.gnu.org/gcc-4.4/changes.html), but unfortunately, if i understand it correctly it's for constants only (i wonder if it's impossible for variables, or just nobody has written it yet :) Regards, -- mb
on 2008-12-04 18:16
On Wednesday 03 December 2008 21:42:27 Michal Babej wrote: > Actually that's not exact, according to my experiments - it optimizes away > assignments of NULL to a pointer. VALUE is not a pointer, and it doesn't > optimize away neither NULL nor 0 assignments (i tried with gcc 4.3.2) Sorry, my bad, was jumping to conclusions too fast. Ignore that :) -- mb
on 2008-12-05 17:55
The "initialization holes" that leave potential pointers on the stack occur in the interpreter, in any system libraries, and in the GC itself. Thus clearing some stack words before and *after* allocation/GC helps, but at an obvious cost. Keeping stack frames small helps; perhaps moving some data structures out of the C stack into explicit stacks would help there? A call/cc implementation that copies less C stack might also reduce leaks and overhead: http://github.com/kstephens/ll/tree/master/src/ccont Recompiling Ruby with flags to reduce initialization holes will not prevent leaks caused by initialization holes in system libraries. We have some Ruby processes (> 375 MB) that we'd like to keep running longer, but are unable to do so because of leaks. I'll help test your patches on 1.8.6. Kurt
on 2008-12-13 03:20
> updates > the stack extent so no memory is cleared repeatedly if the stack contracts > further. This is sweet. I liked the idea so much I coded my own [perhaps much smaller, definitely less effective] version. It only includes the stack clearing you referred to, and doesn't even monitor "exactly" the stack size, but approximates it by metering it once every CHECK_INTS. Ruby seems to run "as fast as normal" with it, and collect better. In principle, you'd only have to clear the stack once "between each GC" so if you kept track of which portions of it you'd been able to clear, you could avoid a few stack clearings :) I'm not sure exactly how much cpu that would save, though. This patch also doesn't fix the loop {@x=callcc{|c|c}} aspect [presumably because ruby's green threads copy chunks of the stack to heap, so they aren't cleaned]--so I'd imagine it's less effective in multi-threaded codes [but hopefully still helpful]. Look forward to the real patch when it comes in :) Note that as it is currently, if you run GC.start it also calls clean_stack, so if you run GC.start when your program is at it "inner depth [most nested call]" it will notice exactly how deep it is, and hopefully clean up the stack "all the way" when you ascend out of deep calls. I suppose creating a new call "GC.clear_stack" would be useful. i.e. GC.start -> GC.start + "clean stack/make a note of how deep the stack is currently" With [1] it successfully prevents the string 'a' from not being garbage collected: With [2] it successfully collects a few more objects than the unpatched does. I'm not positive how well it works but I think it does. Enjoy. 
-=R

[0] patch: http://wilkboardonline.com/roger/clear_stack_only2.diff

[1] file.rb:

    def does_nothing
    end

    def deep(how, gc = false)
      if(how == 175)
        'a'*1000
      end
      if how == 300
        print "222222deepest"
        GC.start
        print "222222deepest"
        return
      end
      deep(how+1)
      20.times {does_nothing}
    end

    puts deep(0)
    GC.start
    deep(0, true)
    count = 0
    ObjectSpace.each_object(String) do |s| print s, ' '; count = count+1; end
    print count

[2] file2.rb:

    count = 0
    ObjectSpace.each_object{|o| count += 1 }
    print count
    GC.disable

    def go depth
      if depth == 50
        GC.enable
        GC.start
        return
      end
      if(rand(10) == 3)
        a = 'abcd'
        go(depth+1)
        go(depth+1)
      end
      if(rand(10) == 3)
        b = 'abcd'
      end
      go(depth+1)
    end

    go 0
    count = 0
    ObjectSpace.each_object{|o| count += 1 }
    print count
on 2008-12-13 18:17
Roger,

Look for the "real patch" next week. In fact, there will be at least five patches:

#1: prevents continuations from segfaulting when they refer to dead threads
#2: limit each thread's stack to its own stack frames (none from other threads)
#3: My stack clearing patch
#4: factor rb_eval() to reduce the size of its stack frame
#5: replace recursive stack_extend() in eval.c, replace GC.stress with GC.limit=

My stack clearing patch is quite small; however, it does tend to clear the same areas repeatedly. The difficulty I had avoiding this was that one could not know exactly when the GC would occur. If it always kept occurring when the stack was deep, clearing the stack just before GC would have no real effect on the "ghost references" still on it. I'd be interested if anyone knows a way to cope with this without repeatedly zeroing the stack "just in case" whenever it is shallow.

In any case, like you, I didn't notice any measurable slowing of Ruby due to clearing the stack this way -- just much reduced memory usage. It may well be that the time for stack clearing is more than offset by the quicker GC passes.

- brent
on 2008-12-21 08:41
I've finally put together the promised set of patches against version 1.8.7-p72 and posted them at: http://sites.google.com/site/brentsrubypatches From that page: Aside from bug fixes, the primary goal of these patches is to reduce the memory consumption of the 1.8 series Ruby interpreters. Happily, these same techniques tend also to increase the speed of most applications, but speed increase was not my primary concern. Each of the six patches below (mbari1-6) fixes a specific problem with or optimizes some facet of the Ruby interpreter. The patches were intended to be applied in order, starting with official interpreter release 1.8.7-patchlevel72 from ruby-lang.org. However, you may be able to apply only a subset of them if you don't want a particular feature or optimization. Until more people test them, this must all be treated as alpha quality software. ... My development environment today is 32-bit Intel x86 Linux compiling with gcc version 4.3.2. I've tried to keep these patches portable to other platforms, but will make no such claims until others have tested them there. If you test these under MS-Windows, I'll be interested and try to be helpful, but I won't be able to verify your results. Please post any bugs, flames, benchmark results, requests for improvement, etc. to the ruby-core mailing list by replying to this message.
on 2008-12-21 09:02
These look like awesome patches Brent! Thanks for making them available. I will play with them over the holidays and let you know what I come up with for some larger apps.

Cheers-
-Ezra

Ezra Zygmuntowicz ez@engineyard.com
on 2008-12-21 10:15
Just finished running the standard regression test suite with both
unpatched and patched versions of 1.8.7.
I think the results are encouraging, but there are a couple issues:
                          Process Size     User's CPU time
                          Initial/Final    (from the time command)
Unpatched 1.8.7-p72:      30MB/97MB        92 seconds
MBARI 6 atop 1.8.7-p72:   30MB/57MB        100 seconds
The patched version reports one additional failure:
2) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[./ruby/test_beginendblock.rb:81]:
<""> expected to be =~
</Interrupt$/>.
1878 tests, 1344988 assertions, 2 failures, 0 errors
real 2m35.696s
user 1m39.422s
sys 0m3.284s
And, the drb test segfaults with the patched version.
(so I removed it for both the patched and unpatched for comparison)
Looks like I also will be playing with these patches over the holidays.
Enjoy,
- brent
on 2008-12-21 17:20
Brent Roman wrote: > I've finally put together the promised set of patches against version > 1.8.7-p72 and posted them at: > > http://sites.google.com/site/brentsrubypatches Awesome work! Very good explanations.
on 2008-12-22 09:39
First thanks for doing all that hard work. I'm sure it's not pleasant to try and figure this all out, and you seem to have done a very thorough job :) A few questions. > Process Size Inital/Final User's CPU time (from the time command) > Unpatched 1.8.7-p72: 30MB/97MB 92 seconds > MBARI 6 atop 1.8.7-p2: 30MB/57MB 100 seconds Is this the time to complete test-all? I wonder why it uses more total time... :) [the RAM usage looks nice though]. Makes me wish we had similar patches for 1.9, too [running make test-all on 1.9 for me typically uses like 400MB RSS for some reason...]. > The patched version reports one additional failure: > 2) Failure: > test_should_propagate_signaled(TestBeginEndBlock) > [./ruby/test_beginendblock.rb:81]: > <""> expected to be =~ > </Interrupt$/>. Does it report this consistently? Interestingly, with 1.8.6 HEAD on mingw currently I get this: 3) Failure: test_should_propagate_signaled(TestBeginEndBlock) [../ruby_1_8/test/ruby/test_beginendblock.rb:83]: <nil> expected but was <3>. > And, the drb test segfaults with the patched version. > (so I removed it for both the patched and unpatched for comparason) Maybe you could post a gdb backtrace [in case someone can figure out what's going on...] Question--The install instructions mention using -mpreferred-stack-boundary=2, though in the writeup it says it helps only slightly--but you recommend it because it still helps? re MBARI2: gc sometimes segfaults: do you have any examples of how it does this? So these old frames are collected but not really--is that what happens? re: MBARI3 is it possible to use memzero to forcefully overwrite local variables [though as you pointed out, it would still leave temporaries]. Are there any other culprits besides rb_eval [and doesn't eval get called fairly rarely so this isn't a help for most progs?] You mention that after this the callcc stuff should work--do you think that only applying this one patch should be sufficient for that to happen? 
why remove the dynamic malloc_limit? One thing you might want to try would be the ruby benchmark suite with and without [1]. MBARI5 : ruby extends the stack when it needs to thread shift from a "smaller stack" thread to a larger stack thread, is that right? After shifting to a smaller stack might be a good time to clean the stack... re: MBARI6 question: why are these included with 5 other gc patches? [besides that they're cool and useful]? Might be convenient to just include the 1.9 style syntax by default [I might could come up with a patch for it].:) re: sourceref--it might be convenient to tie in with SCRIPT_LINES__ stuff, perhaps [thanks to nobu for pointing out its existence recently to me]. I suppose my only wish list for these would be that it didn't clear the stack but once per thread per GC. I might could help out sometime with it. Thanks much for your work on these. I'll give them a shot on windows mingw/linux by next weekend. Cheers. -=r [1] http://github.com/acangiano/ruby-benchmark-suite/tree/master
on 2008-12-22 11:08
Roger,

I just updated the patches at: http://sites.google.com/site/brentsrubypatches to fix the bug that was causing the drb test suite to segfault. All the test suites now run to completion.

Responses to your questions:

R: Is this the time to complete test-all? What about patches for 1.9? Why slower?

B: This is the time to complete the command: ruby runner.rb in the test subdir of the 1.8.7p72 directory. I suspect that the unpatched interpreter is leaking throughout the execution of the tests. Process size just keeps increasing. With these patches it stabilizes about 1/3 of the way through. These techniques may work with v1.9, as my understanding is that the GC is largely unchanged. Apps that don't swap context much will be a few percent slower. Those that do should be faster. There certainly is more that can be done to optimize the stack clearing. My initial goal was to plug the memory leaks so that Ruby apps could run for long periods without swapping (or worse). In practice, once a Ruby process starts swapping to virtual memory, its performance degrades much more than a few percent.

R: > The patched version reports one additional failure: > 2) Failure: > test_should_propagate_signaled(TestBeginEndBlock) > [./ruby/test_beginendblock.rb:81]: > <""> expected to be =~ > </Interrupt$/>. Does it report this consistently?

B: Funny you should ask that... No, it does not fail consistently. Any ideas what's happening here? It does feel like the same problem you see with your mingw port.

R: The install instructions mention using -mpreferred-stack-boundary=2, though in the writeup it says it helps only slightly--but you recommend it because it still helps?

B: Yes, stack-boundary=2 helps keep the frames a little smaller. For a multi-threaded app, this is probably worth the little performance hit. For a single threaded app, it may be better to leave out the -mpreferred-stack-boundary=2. We need more benchmarking to tell. Ruby should no longer leak memory regardless.
R: re: MBARI2: gc sometimes segfaults: do you have any examples of how it does this? So these old frames are collected but not really--is that what happens?

B: Have a look at this post of mine dated 12/03/07 http://markmail.org/message/jjmqzsxenp7oaojm

R: re: MBARI3 is it possible to use memzero to forcefully overwrite local variables [though as you pointed out, it would still leave temporaries]. Are there any other culprits besides rb_eval [and doesn't eval get called fairly rarely so this isn't a help for most progs?]

B: I suspect memzero would be slower than the tight loop I have zeroing the stack now. In any case, the temporaries are critically important. rb_eval is the 800 pound gorilla :-)

R: You mention that after this the callcc stuff should work--do you think that only applying this one patch should be sufficient for that to happen?

B: I think so. However, I'd recommend installing at least MBARI2 as well to improve performance.

R: why remove the dynamic malloc_limit?

B: Because I believe the malloc_limit should be tuned for your target environment. In a target with 32MB DRAM, malloc_limit should not be 8MB and I certainly don't want it to increase on its own. Remember, once Ruby starts swapping, performance goes into the toilet. I probably won't be motivated enough to benchmark it. A few percent run time change does not matter much to me. I want my app to run for months at a time and to play nice with others.

R: MBARI5 : ruby extends the stack when it needs to thread shift from a "smaller stack" thread to a larger stack thread, is that right? After shifting to a smaller stack might be a good time to clean the stack...

B: The MBARI3 patch updates the stack extent at a number of points, including on every context switch, but it defers clearing it until the next CHECK_INTS, when the stack is likely to be smaller still. Even so, optimizing this further is definitely possible.
I've considered only clearing the stack after GC.increase rises to 75% of GC.limit, for instance. R: re: MBARI6 question: why are these included with 5 other gc patches? [besides that they're cool and useful]? Might be convenient to just include the 1.9 style syntax by default [I might could come up with a patch for it].:) B: MBARI6 probably should have been packaged separately. My __line__ and __file__ patches predate the 1.9 stuff by about 5 years. See: http://markmail.org/message/ybrbhvvzlhyv552y I did think of redoing them in the 1.9 style, but I don't particularly like the idea of returning an array in this context, where numbered indices replace named attributes. In any case, I can emulate the 1.9 style methods with a tiny bit of Ruby glue. R: I suppose my only wish list for these would be that it didn't clear the stack but once per thread per GC. I might could help out sometime with it. B: That's on my wish list too. I'd be very grateful for any help, even just discussing ideas. - brent
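The deferred clearing described for MBARI3 (record how deep the stack has been, then zero only the dead region at the next CHECK_INTS) can be illustrated with a toy model. This is only a sketch in Ruby, not the patch itself: the real code works on the C stack, and the class name `StackModel`, its methods, and the `:ghost` marker are all hypothetical names invented for illustration.

```ruby
# Toy model only: an Array stands in for the C stack, and non-nil slots
# stand in for stale ("ghost") VALUEs. All names here are hypothetical.
class StackModel
  attr_reader :words_cleared

  def initialize(size)
    @slots = Array.new(size)   # nil means "known clear"
    @depth = 0                 # slots currently in use
    @extent = 0                # deepest point reached since the last clear
    @words_cleared = 0
  end

  # Simulate calls that use `n` words of stack and leave them dirty.
  def push(n)
    n.times { @slots[@depth] = :ghost; @depth += 1 }
    @extent = @depth if @depth > @extent
  end

  # Simulate returning: the frames are dead but their contents remain.
  def pop(n)
    @depth -= n
  end

  # Called where the interpreter would poll for interrupts: zero only the
  # dead region between the current depth and the recorded extent.
  def check_ints
    (@depth...@extent).each { |i| @slots[i] = nil; @words_cleared += 1 }
    @extent = @depth
  end

  # Dead slots that a conservative scan beyond the live frames would still see.
  def ghosts
    @slots[@depth..-1].count { |s| !s.nil? }
  end
end

stack = StackModel.new(1000)
stack.push(300)
stack.pop(250)
puts stack.ghosts          # dead frames still hold stale values
stack.check_ints
puts stack.ghosts          # the dead region has been zeroed
stack.check_ints           # extent already equals depth, so nothing is re-cleared
puts stack.words_cleared
```

Because the extent is reset to the current depth after each clear, a second poll at the same depth does no work; the repeated clearing discussed in the thread only happens when the stack grows again between polls.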
on 2008-12-22 12:12
Hey Brent,

Thanks for the patches, man. I have yet to dig deeper, but I benchmarked Rails against them.

Here is the average request/response for the patched version:

Requests per second: 234.77 [#/sec] (mean)
Time per request: 42.594 [ms] (mean)
Time per request: 4.259 [ms] (mean, across all concurrent requests)
Transfer rate: 108.82 [Kbytes/sec] received
Memory usage stayed around 30MB

For the stock Ruby version:

Requests per second: 138.48 [#/sec] (mean)
Time per request: 72.214 [ms] (mean)
Time per request: 7.221 [ms] (mean, across all concurrent requests)
Transfer rate: 64.21 [Kbytes/sec] received
Memory usage stayed around 53 MB

I compiled both ruby versions without "--disable-pthread" and was wondering if your patches modify anything there.

--
Let them talk of their oriental summer climes of everlasting conservatories; give me the privilege of making my own summer with my own coals.
http://gnufied.org
on 2008-12-23 00:39
On Mon, 22 Dec 2008 20:59:05 +1100, Brent Roman <brent@mbari.org> wrote: > I suspect memzero would be slower than the tight loop I have zeroing the > stack now. In my experience on x86 architecture using GCC, "memset(p, 0, len)" is substantially faster than a tight loop (between 2 & 10 times faster depending whether the loop is byte-by-byte or word-by-word). This is because GCC knows to optimize "memset" inline to a single instruction (or close to it). Mike
on 2008-12-23 05:36
My patches don't mess with any of the pthread stuff. I'm pleasantly surprised by your Rails benchmark results. I would have expected this memory savings, but I can't think of why a single threaded application like Rails (that doesn't use Continuations) would see the sort of speed-up you observed. I'd expect it to be 2 to 10 percent slower unless it was doing a lot of context switches. I did get an off-list response from a Chinese website that confirms the Rails memory savings, but they said there was no change in speed. Does your Rails application use threads or continuations? Are you comparing ruby built from the same source tarball with the same compiler options before and after patching? - brent
on 2008-12-23 07:37
Hi, Brent:

I have tested the MBARI patch on JavaEye.com (http://www.javaeye.com), a Chinese software development community website which has 200,000 members and 800,00 pageviews per day. JavaEye is written in Ruby on Rails and runs in lighttpd/fastcgi mode.

The server environment: AMD64 machine, SuSE Linux x86-64, ruby 1.8.7-p72 and Rails 2.1.2.

I tested Rails app performance and memory usage with 4 ruby implementations:

1. ruby MRI 1.8.7-p72
2. ruby 1.8.7-p72 with the Railsbench GC patch and the GC variables set as below:
   RUBY_HEAP_MIN_SLOTS=600000
   RUBY_HEAP_SLOTS_INCREMENT=600000
   RUBY_HEAP_FREE_MIN=100000
   RUBY_GC_MALLOC_LIMIT=60000000
3. ruby 1.8.7-p72 with the MBARI patch.
4. ruby 1.8.7-p72 with the MBARI patch, but with the GC variables in gc.c modified to match the above.

Test one: Simple Rails app

I created a simple Rails app to test Rails routes and template rendering:

ab -c 1 -n 1000 http://localhost:3000/test/index

ruby version                  performance       memory
-----------------------------------------------------
ruby                          106 request/s     39MB
ruby GC patch                 125 request/s     60MB
ruby MBARI patch              160 request/s     35MB
ruby MBARI merge GC patch     173 request/s     60MB

Test One Summary: The MBARI patch saves a little memory compared with MRI and improves Rails performance significantly.

Test two: Real Rails website test

I selected two typical pages on JavaEye.com to benchmark:

Page 1 : http://robbin.javaeye.com/

ab -c 1 -n 100 http://robbin.joinnet.cn/

ruby version                  performance       memory
-----------------------------------------------------
ruby                          1.69 request/s    136MB
ruby GC patch                 2.81 request/s    179MB
ruby MBARI patch              1.96 request/s    103MB
ruby MBARI merge GC patch     2.90 request/s    158MB

Page 2: http://robbin.joinnet.cn/blog/283992

ruby version                  performance       memory
-----------------------------------------------------
ruby                          2.20 request/s    136MB
ruby GC patch                 3.61 request/s    179MB
ruby MBARI patch              2.47 request/s    103MB
ruby MBARI merge GC patch     3.73 request/s    158MB

Test Two Summary:

1. The MBARI patch not only saves a lot of memory compared with MRI but also improves Rails performance by about 13%.
2. MBARI merged with the Railsbench GC patch beats the others with the highest Rails performance, and uses somewhat less memory than the Railsbench GC patch alone.

My suggestions:

1. The MBARI patch has some incompatibility with complicated Regexps. For example, I met this error: premature end of regular expression: /0ãk \000\000\000\000x/ On line #12 of blog/index/_blog.rhtml
2. I wish MBARI would merge the Railsbench GC patch, because the Railsbench GC patch gives a large Rails performance improvement on the JavaEye.com website.
3. I hope MBARI gets merged into ruby trunk :)
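For readers who want to check the "about 13%" figure, the relative changes can be recomputed from the reported numbers. The sketch below only redoes the arithmetic for the stock and MBARI rows; the constant `RESULTS` and the helper `percent_change` are names made up for this illustration.

```ruby
# Figures copied from the results reported in this post:
# [requests per second, memory in MB] for stock ruby and the MBARI patch.
RESULTS = {
  "simple app" => { :stock => [106.0, 39], :mbari => [160.0, 35] },
  "page 1"     => { :stock => [1.69, 136], :mbari => [1.96, 103] },
  "page 2"     => { :stock => [2.20, 136], :mbari => [2.47, 103] },
}

# Relative change from `before` to `after`, in percent.
def percent_change(before, after)
  (after - before) * 100.0 / before
end

RESULTS.each do |name, r|
  speed  = percent_change(r[:stock][0], r[:mbari][0])
  memory = percent_change(r[:stock][1], r[:mbari][1])
  printf("%-10s throughput %+.1f%%  memory %+.1f%%\n", name, speed, memory)
end
```

On these numbers the two real pages gain roughly 16% and 12% in throughput with about 24% less memory, while the simple app gains about 51%, so "about 13%" is a fair summary of the real-page results only.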
on 2008-12-23 08:10
Hi On Tue, Dec 23, 2008 at 9:57 AM, Brent Roman <brent@mbari.org> wrote: > rails memory savings, but > they said there was no change in speed. Are you talking about (http://www.javaeye.com)? > > Does your Rails application use threads or continuations? No, I was just benchmarking a hello world rails application. > > Are you comparing ruby built from the same source tarball with the same > compiler options before and after patching? Yes. Essentially before patching and after patching.
on 2008-12-23 08:17
Hi Robbin,

You are the second to observe these patches improving Rails performance. I really did not expect this. All I can suppose is that the smaller call stack caused by the MBARI4 patch is saving more GC time than is spent by the stack clearing of the MBARI3 patch. Someone running Rails would have to instrument their code to record the total time spent in GC in order to prove or disprove this.

Regarding your regex failure: There was a bug in the patches originally posted to the website on December 19th. It was corrected yesterday. If the output of ruby -v is:

  ruby 1.8.7 (2008-12-19 MBARI 6 on patchlevel 72) ...

you have downloaded the original version with the bug. If so, please download the patches again and retest. ruby -v should output:

  ruby 1.8.7 (2008-12-21 MBARI 6 on patchlevel 72) ...

If you can get the regex problem to occur with the latest patches, please try to create and post a self-contained test program that demonstrates it.

Thanks for your benchmarks,
- brent
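One way to do the instrumentation suggested above is sketched here. It assumes a Ruby that provides GC::Profiler, which the 1.8.7 interpreter under discussion does not, so on 1.8.7 an equivalent timer would have to be added to the interpreter itself. The helper name `measure_gc` and the workload are hypothetical.

```ruby
# Sketch: report how much of a block's wall-clock time was spent in GC.
# Assumes a Ruby that provides GC::Profiler; 1.8.7 does not.
def measure_gc
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.count
  started = Time.now
  yield
  wall = Time.now - started
  { :wall => wall,
    :gc   => GC::Profiler.total_time,   # seconds spent in GC while profiling
    :runs => GC.count - runs_before }   # collections during the block
ensure
  GC::Profiler.disable
end

stats = measure_gc do
  # Hypothetical stand-in for handling requests: allocate short-lived strings.
  200_000.times { |i| "request #{i}" }
end

printf("wall %.3fs, gc %.3fs over %d collections\n",
       stats[:wall], stats[:gc], stats[:runs])
```

Comparing the GC share of wall time between a stock and a patched interpreter on the same workload would show whether fewer or cheaper collections account for the observed speed-up.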
on 2008-12-23 08:40
I hit the Regexp error on the MBARI 2008-12-21 version; it occurs when we use the Rails sanitize helper to format HTML fragments. I haven't been able to reproduce it yet. If I can pin it down, I will report it to you.
on 2008-12-24 08:42
Mike,

Certainly, if one copies byte-at-a-time, performance will be awful. I'm copying aligned words one ruby VALUE sized word at a time.

As an experiment, I tried substituting memset for my tight stack clearing loop... and discovered that memset() is actually quite a large function, and gcc does not inline it. It is large because, in this context, the compiler cannot tell that the pointers are already long-word aligned and that we are copying an integer number of long words. So it emits code to copy bytes on either end. And, since we're trying to clear memory from the current stack pointer down, we must also add a kludgey offset to avoid wiping memset()'s own stack frame.

If anyone else wants to try this on an x86, in rubysig.h, change:

  #define __stack_zero_down(end,sp)  while (end <= --sp) *sp=0

to:

  #define __stack_zero_down(end,sp) \
    if (sp-6 > end) memset(end, 0, (void *)(sp-6)-(void*)end)

My tiny "bogus1" and "bogus2" show no measurable improvement, but perhaps it might help for a larger application.

On the other hand... Very recently, folks who've looked into this far more intensively than I concluded that an unrolled 'C' loop was better than the venerable rep stosl assembly instructions used by x86 gcc's __builtin_memset(). See: http://sourceware.org/ml/newlib/2008/msg00286.html

They note that microcoded instructions are slower than simple ones for the modern x86 (RISC-ish) execution cores. The fastest way to clear memory these days is supposedly to use MMX instructions. (I'm not going there, but I welcome others to explore where that might lead :-)

- brent
on 2008-12-25 05:36
On Wed, 24 Dec 2008 18:32:57 +1100, Brent Roman <brent@mbari.org> wrote:
> As an experiment, I tried substituting memset for my tight stack clearing
> loop...
>
> and discovered that memset() is actually quite a large function,
> and gcc does not inline it. It is large because, in this context, the
> compiler
> cannot tell that the pointers are already long-word aligned and that we
> are copying an integer number of long words. So it emits code to copy
> bytes on either end.

Try using the gcc option "-minline-all-stringops". I think that should force memset (and other stuff) to be inlined.

> They note that microcoded instructions are slower than simple ones for
> the modern x86 (RISC-ish) execution cores. The fastest way to clear
> memory these days is supposedly to use MMX instructions.
> (I'm not going there, but I welcome others to explore where that might
> lead

Thanks for this reference. I got the impression that he was saying that:

- Memset on GCC 3.4 could be slower than his C tight loop when working on unaligned data. However I think that this may be fixed in GCC 4.
- "rep stosl" was fastest when working on 8-byte aligned data on some x86 platforms.

His assembly patch seems to set the first few bytes until it gets to an address divisible by 8, then uses "rep stosl" from there. I think GCC 4.3.2 seems to do 4 byte aligned copies using "rep stosl" when inlined. However his code ALWAYS did a function call to memset or a version of it, so it is not clear whether the function call overhead makes much difference compared to inlining the memset call.

The fact that you didn't notice much difference between the C loop and a function call to memset() seems to imply that this optimization may not be all that important to ruby stack clearing. It really depends on how often it is called, and how much it is clearing at a time. It is probably worth benchmarking a little more, but I may be barking up the wrong tree here!

Cheers
Mike
on 2008-12-25 07:18
I just had a quick play with the gcc option "-minline-all-stringops". It was definitely a step in the right direction. Because it in-lined the memset, I could safely remove the offset kludge (as there was no longer a memset() stack frame to preserve). But, the compiler still emitted (useless) code to longword align after the main block of the memset operation. This reformulation of the macro eliminates that (and removes the offset):

  #define __stack_zero_down(end,sp) \
    if (sp > end) memset(end, 0, (sp-end)*sizeof(VALUE))

Now the generated code looks quite clean:

  movl %edx, %ecx
  subl %edi, %ecx
  andl $-4, %ecx
  cmpl $4, %ecx
  jb .L1508          ;skip if sp<=end
  shrl $2, %ecx
  xorl %eax, %eax
  rep stosl

However, I still don't see any improvement on my little benchmarks. If someone comes up with an app or test case where these patches appear to slow things down, then I'll ask them to try this alternative and perhaps we'll see an improvement.

I'm leery of this technique because, if you omit -minline-all-stringops, one must offset the stack pointer for the size of the memset() frame to preserve it, otherwise the memset causes a segfault. This optimization is very machine/compiler dependent and the gain is not yet demonstrated. But, it's reassuring to have worked it out.

Thanks for the tip!
- brent
on 2008-12-26 23:17
Seems to overall be a tidge slower for "micro" stuff--5 or 10%. viz:

lloyd gc bench (times in seconds):

benchmark                   187 unpatched    187 patched
--------------------------------------------------------
arrays_read.rb              0.072516         0.060988
arrays_read_yaml.rb         0.671292         0.706926
classes_read.rb             0.040723         0.041115
classes_read_yaml.rb        0.736394         0.736123
create_arrays.rb            0.165607         0.171677
create_arrays_yaml.rb       7.638495         7.715646
create_hashes.rb            0.136778         0.121288
create_hashes_yaml.rb       20.888187        21.457203
create_ostructs.rb          2.028835         2.020391
create_ostructs_yaml.rb     10.707594        11.011948
create_weak_hashes.rb       0.946386         1.035461
create_weak_hashes2.rb      0.389543         0.381697
growarray.rb                1.788691         1.865321
hashes_read.rb              0.037333         0.0376
hashes_read_yaml.rb         1.687161         1.802083
ostruct_read.rb             1.691467         1.705456
ostruct_read_yaml.rb        1.05634          1.108687
plist.rb                    4.27333          4.64743
shrinkarray.rb              1.751121         1.833105
weak_hashes_read.rb         0.293751         0.293376

But that's for micro-benchmarks. I think the reason we see people's performance increase is that since the GC is suddenly more effective, it doesn't get called as often. A big win for larger apps. Overall I'd call it a large win for Ruby in terms of being much more stable size-wise in a multi-threaded environment and suggest their incorporation verbatim. All 6 :)

raw ruby-benchmark-suite comparison is in the footnote. Note a few things: one test erred with 187 normal but succeeded with MBARI patches (core-library/bm_so_concatenate.rb); the threaded tests do indeed run faster with MBARI.
normal: core-library/bm_vm3_thread_create_join.rb,0.20678186416626 patched: core-library/bm_vm3_thread_create_join.rb,0.0140390396118164 Some other thoughts I've had are that theoretically you only need to clear the stack once between GC's, so you may be able to just keep a "range already cleared" per thread or what not, and reset it after each GC. This would especially work if rb_thread_alone is true. You might be able to get away with only checking for stack depth once every CHECK_INT [instead of with xmalloc]. Maybe even clear the stack only at ruby_stack_check [though this is probably too infrequent]. I did a small experiment with memset versus tight loop and [somehow] a tight loop seems to win. I think there is some potential for optimization if you were to use fixed 2K heap chunks and binary search for is_pointer_to_heap [with cacheing of the most recently found heap chunk to help save on speed]. Theoretically it might bring RAM usage down even further [1.9 does this]. I know that at least for me I will definitely use these for my own apps so that they have more control for memory. Re: javaeye.com speed "almost the same" with railsbench GC patch + these versus just railsbench GC patch--I think that what is happening in this case is that GC is being called only when the freelist is used up, since the malloc_limit is so large. Tough to know how to speed it up in that case [except for running GC in a different process and earlier]. Thanks for your hard work. I think it was something a few of us had thought necessary but never got up the gumption to do :) -=r Some raw data [to me this means little compared to the rails stuffs reported earlier]. 
ruby-benchmark-suite with patch: Benchmark Name,Time #1,Time #2,Average Time,Standard Deviation,Input Size Startup,0.00860691070556641,0.00712394714355469,0.007865428924561,0.000741481781006,n/a real-world/bm_hilbert_matrix.rb,0.0715880393981934,0.0691721439361572,0.070380091667175,0.001207947731018,10 real-world/bm_hilbert_matrix.rb,0.705732822418213,0.707005977630615,0.706369400024414,0.000636577606201,20 real-world/bm_hilbert_matrix.rb,2.71448302268982,2.73366808891296,2.724075555801392,0.009592533111572,30 real-world/bm_hilbert_matrix.rb,7.9450159072876,8.08562898635864,8.015322446823120,0.070306539535522,40 standard-library/bm_app_mandelbrot.rb,3.50128412246704,3.50250101089478,3.501892566680908,0.000608444213867,n/a micro-benchmarks/bm_meteor_contest.rb,47.9955010414124,48.7175140380859,48.356507539749146,0.361006498336792,n/a micro-benchmarks/bm_app_pentomino.rb,149.979510068893,150.394422769547,150.186966419219971,0.207456350326538,n/a micro-benchmarks/bm_fasta.rb,63.8084781169891,54.929356098175,59.368917107582092,4.439561009407043,n/a micro-benchmarks/bm_fannkuch.rb,0.0105619430541992,0.0125219821929932,0.011541962623596,0.000980019569397,6 micro-benchmarks/bm_fannkuch.rb,0.777202129364014,0.779121875762939,0.778162002563477,0.000959873199463,8 micro-benchmarks/bm_fannkuch.rb,85.7568709850311,85.9479658603668,85.852418422698975,0.095547437667847,10 micro-benchmarks/bm_nbody.rb,15.366986989975,15.376590013504,15.371788501739502,0.004801511764526,n/a micro-benchmarks/bm_reverse_compliment.rb,8.94560790061951,8.99379897117615,8.969703435897827,0.024095535278320,n/a micro-benchmarks/bm_quicksort.rb,7.39512896537781,7.40803289413452,7.401580929756165,0.006451964378357,n/a micro-benchmarks/bm_mergesort.rb,4.32359004020691,4.32240605354309,4.322998046875000,0.000591993331909,n/a micro-benchmarks/bm_nsieve_bits.rb,36.2198901176453,36.2111361026764,36.215513110160828,0.004377007484436,n/a 
micro-benchmarks/bm_mandelbrot.rb,115.146002054214,115.128952026367,115.137477040290833,0.008525013923645,n/a micro-benchmarks/bm_lucas_lehmer.rb,20.1205780506134,20.1091129779816,20.114845514297485,0.005732536315918,9689 micro-benchmarks/bm_lucas_lehmer.rb,21.7712378501892,21.766205072403,21.768721461296082,0.002516388893127,9941 micro-benchmarks/bm_lucas_lehmer.rb,31.5163369178772,31.5209879875183,31.518662452697754,0.002325534820557,11213 micro-benchmarks/bm_lucas_lehmer.rb,Timeout: 150.00 seconds,,,,19937 micro-benchmarks/bm_fractal.rb,50.184531211853,50.1739339828491,50.179232597351074,0.005298614501953,n/a micro-benchmarks/bm_knucleotide.rb,2.19779801368713,2.3165180683136,2.257158041000366,0.059360027313232,n/a micro-benchmarks/bm_monte_carlo_pi.rb,27.104642868042,27.153263092041,27.128952980041504,0.024310111999512,n/a micro-benchmarks/bm_word_anagrams.rb,13.1331388950348,12.0968029499054,12.614970922470093,0.518167972564697,n/a micro-benchmarks/bm_binary_trees.rb,101.144680023193,102.439230918884,101.791955471038818,0.647275447845459,n/a micro-benchmarks/bm_spectral_norm.rb,1.51863193511963,1.51901316642761,1.518822550773621,0.000190615653992,n/a micro-benchmarks/bm_nsieve.rb,33.2746829986572,33.2826149463654,33.278648972511292,0.003965973854065,n/a micro-benchmarks/bm_regex_dna.rb,5.48113203048706,6.14786696434021,5.814499497413635,0.333367466926575,n/a micro-benchmarks/bm_sum_file.rb,14.6941390037537,14.5948259830475,14.644482493400574,0.049656510353088,n/a micro-benchmarks/bm_partial_sums.rb,37.6713261604309,37.645623922348,37.658475041389465,0.012851119041443,n/a micro-benchmarks/bm_so_sieve.rb,116.000460863113,116.059950828552,116.030205845832825,0.029744982719421,n/a core-features/bm_vm1_rescue.rb,0.296025037765503,0.280431985855103,0.288228511810303,0.007796525955200,n/a core-features/bm_vm1_length.rb,20.3501679897308,20.2971909046173,20.323679447174072,0.026488542556763,10 
core-features/bm_vm1_length.rb,20.3735370635986,20.3696429729462,20.371590018272400,0.001947045326233,100 core-features/bm_vm1_length.rb,20.3027820587158,20.3756489753723,20.339215517044067,0.036433458328247,1000 core-features/bm_vm1_length.rb,20.3291130065918,20.3042759895325,20.316694498062134,0.012418508529663,10000 core-features/bm_so_ackermann.rb,0.0445468425750732,0.0442321300506592,0.044389486312866,0.000157356262207,5 core-features/bm_so_ackermann.rb,0.727099895477295,0.724959135055542,0.726029515266418,0.001070380210876,7 core-features/bm_so_ackermann.rb,12.0094769001007,11.994206905365,12.001841902732849,0.007634997367859,9 core-features/bm_vm2_poly_method.rb,4.42595386505127,4.39683985710144,4.411396861076355,0.014557003974915,1000000 core-features/bm_vm2_poly_method.rb,8.86316585540771,8.91431498527527,8.888740420341492,0.025574564933777,2000000 core-features/bm_vm2_poly_method.rb,18.3531260490417,18.3768548965454,18.364990472793579,0.011864423751831,4000000 core-features/bm_vm2_poly_method.rb,35.1964910030365,35.3743059635162,35.285398483276367,0.088907480239868,8000000 core-features/bm_app_tak.rb,0.170708179473877,0.170413970947266,0.170561075210571,0.000147104263306,5 core-features/bm_app_tak.rb,0.616703987121582,0.613183975219727,0.614943981170654,0.001760005950928,6 core-features/bm_app_tak.rb,1.96985197067261,1.96115493774414,1.965503454208374,0.004348516464233,7 core-features/bm_so_random.rb,0.248342990875244,0.251052856445312,0.249697923660278,0.001354932785034,100000 core-features/bm_so_random.rb,1.25759100914001,1.2419810295105,1.249786019325256,0.007804989814758,500000 core-features/bm_so_random.rb,2.50672101974487,2.487135887146,2.496928453445435,0.009792566299438,1000000 core-features/bm_vm1_swap.rb,9.50407981872559,9.48183703422546,9.492958426475525,0.011121392250061,10000000 core-features/bm_vm1_swap.rb,19.0615699291229,19.1599650382996,19.110767483711243,0.049197554588318,20000000 
core-features/bm_vm1_swap.rb,37.5512471199036,37.8161060810089,37.683676600456238,0.132429480552673,40000000 core-features/bm_app_fib.rb,0.02223801612854,0.0222301483154297,0.022234082221985,0.000003933906555,20 core-features/bm_app_fib.rb,2.72851300239563,2.7235860824585,2.726049542427063,0.002463459968567,30 core-features/bm_app_fib.rb,30.3188951015472,30.1245629787445,30.221729040145874,0.097166061401367,35 core-features/bm_vm2_zsuper.rb,0.935019969940186,0.979220867156982,0.957120418548584,0.022100448608398,1000000 core-features/bm_vm2_zsuper.rb,1.99105310440063,1.85130095481873,1.921177029609680,0.069876074790955,2000000 core-features/bm_vm2_zsuper.rb,3.85666489601135,3.85760307312012,3.857133984565735,0.000469088554382,4000000 core-features/bm_vm2_zsuper.rb,7.69519710540771,7.70335698127747,7.699277043342590,0.004079937934875,8000000 core-features/bm_app_factorial.rb,0.00926995277404785,0.00803399085998535,0.008651971817017,0.000617980957031,1000 core-features/bm_app_factorial.rb,0.0369241237640381,0.0322320461273193,0.034578084945679,0.002346038818359,2000 core-features/bm_app_factorial.rb,0.245321989059448,0.241642951965332,0.243482470512390,0.001839518547058,5000 core-features/bm_app_factorial.rb,Error: stack level too deep,,,,10000 core-features/bm_app_tarai.rb,6.74241399765015,6.74293112754822,6.742672562599182,0.000258564949036,3 core-features/bm_app_tarai.rb,8.1383171081543,8.13438200950623,8.136349558830261,0.001967549324036,4 core-features/bm_app_tarai.rb,9.85991907119751,9.85500311851501,9.857461094856262,0.002457976341248,5 core-features/bm_vm1_const.rb,9.00359511375427,18.2473177909851,13.625456452369690,4.621861338615417,n/a core-features/bm_so_nested_loop.rb,0.00854110717773438,0.00856804847717285,0.008554577827454,0.000013470649719,5 core-features/bm_so_nested_loop.rb,0.471377849578857,0.481393098831177,0.476385474205017,0.005007624626160,10 
core-features/bm_so_nested_loop.rb,5.18516802787781,5.36552095413208,5.275344491004944,0.090176463127136,15 core-features/bm_vm1_ensure.rb,0.0761599540710449,0.0757908821105957,0.075975418090820,0.000184535980225,100000 core-features/bm_vm1_ensure.rb,0.761255979537964,0.760828971862793,0.761042475700378,0.000213503837585,1000000 core-features/bm_vm1_ensure.rb,7.46813201904297,7.48478889465332,7.476460456848145,0.008328437805176,10000000 core-features/bm_vm2_proc.rb,1.35628509521484,1.35869193077087,1.357488512992859,0.001203417778015,1000000 core-features/bm_vm2_proc.rb,2.70739102363586,2.70443820953369,2.705914616584778,0.001476407051086,2000000 core-features/bm_vm2_proc.rb,5.41131019592285,5.4192328453064,5.415271520614624,0.003961324691772,4000000 core-features/bm_vm2_proc.rb,10.8037610054016,10.8294909000397,10.816625952720642,0.012864947319031,8000000 core-features/bm_loop_times.rb,5.33636403083801,5.50595808029175,5.421161055564880,0.084797024726868,10000000 core-features/bm_loop_times.rb,10.5615899562836,10.6464350223541,10.604012489318848,0.042422533035278,20000000 core-features/bm_loop_times.rb,16.3860912322998,15.2040379047394,15.795064568519592,0.591026663780212,30000000 core-features/bm_vm2_unif1.rb,0.562043190002441,0.56634783744812,0.564195513725281,0.002152323722839,1000000 core-features/bm_vm2_unif1.rb,1.14902997016907,1.12535285949707,1.137191414833069,0.011838555335999,2000000 core-features/bm_vm2_unif1.rb,2.30900192260742,2.34629487991333,2.327648401260376,0.018646478652954,4000000 core-features/bm_vm2_unif1.rb,4.63608813285828,4.79328298568726,4.714685559272766,0.078597426414490,8000000 core-features/bm_vm1_simplereturn.rb,6.01986694335938,6.00216197967529,6.011014461517334,0.008852481842041,10000000 core-features/bm_vm1_simplereturn.rb,12.1239230632782,12.0794620513916,12.101692557334900,0.022230505943298,20000000 core-features/bm_vm1_simplereturn.rb,17.8927519321442,17.8568298816681,17.874790906906128,0.017961025238037,30000000 
core-features/bm_loop_whileloop.rb,0.0550401210784912,0.0549750328063965,0.055007576942444,0.000032544136047,100000 core-features/bm_loop_whileloop.rb,0.55099892616272,0.550863981246948,0.550931453704834,0.000067472457886,1000000 core-features/bm_loop_whileloop.rb,5.51106715202332,5.50824809074402,5.509657621383667,0.001409530639648,10000000 core-features/bm_vm2_send.rb,0.680418014526367,0.68721079826355,0.683814406394958,0.003396391868591,1000000 core-features/bm_vm2_send.rb,1.44064593315125,1.3845579624176,1.412601947784424,0.028043985366821,2000000 core-features/bm_vm2_send.rb,2.74571800231934,2.76275110244751,2.754234552383423,0.008516550064087,4000000 core-features/bm_vm2_send.rb,5.66572713851929,5.52676200866699,5.596244573593140,0.069482564926147,8000000 core-features/bm_vm1_block.rb,0.0927438735961914,0.0923471450805664,0.092545509338379,0.000198364257812,100000 core-features/bm_vm1_block.rb,0.925873994827271,0.922999858856201,0.924436926841736,0.001437067985535,1000000 core-features/bm_vm1_block.rb,9.22466993331909,9.25748586654663,9.241077899932861,0.016407966613770,10000000 core-features/bm_vm2_super.rb,0.868482828140259,0.879498958587646,0.873990893363953,0.005508065223694,1000000 core-features/bm_vm2_super.rb,1.7648811340332,1.7305908203125,1.747735977172852,0.017145156860352,2000000 core-features/bm_vm2_super.rb,3.53582000732422,3.50967812538147,3.522749066352844,0.013070940971375,4000000 core-features/bm_vm2_super.rb,7.03860211372375,7.0927209854126,7.065661549568176,0.027059435844421,8000000 core-features/bm_so_object.rb,2.1716628074646,2.17931509017944,2.175488948822021,0.003826141357422,500000 core-features/bm_so_object.rb,4.34661197662354,4.35301184654236,4.349811911582947,0.003199934959412,1000000 core-features/bm_so_object.rb,6.54321002960205,6.57184290885925,6.557526469230652,0.014316439628601,1500000 core-features/bm_app_raise.rb,6.15209579467773,6.16153502464294,6.156815409660339,0.004719614982605,n/a 
core-library/bm_so_exception.rb,15.2897758483887,15.3124811649323,15.301128506660461,0.011352658271790,n/a core-library/bm_so_concatenate.rb,94.1705470085144,86.1923739910126,90.181460499763489,3.989086508750916,5000 core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,10000 core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,15000 core-library/bm_so_count_words.rb,12.6775200366974,12.6623919010162,12.669955968856812,0.007564067840576,n/a core-library/bm_vm2_array.rb,0.747035026550293,0.747730016708374,0.747382521629333,0.000347495079041,1000000 core-library/bm_vm2_array.rb,1.49641180038452,1.49419784545898,1.495304822921753,0.001106977462769,2000000 core-library/bm_vm2_array.rb,2.99150490760803,2.99072194099426,2.991113424301147,0.000391483306885,4000000 core-library/bm_vm2_array.rb,5.98674607276917,5.98161792755127,5.984182000160217,0.002564072608948,8000000 core-library/bm_vm2_regexp.rb,0.963823080062866,0.955650806427002,0.959736943244934,0.004086136817932,10 core-library/bm_vm2_regexp.rb,1.10640692710876,1.1020519733429,1.104229450225830,0.002177476882935,100 core-library/bm_vm2_regexp.rb,2.06975698471069,2.06661009788513,2.068183541297913,0.001573443412781,1000 core-library/bm_vm2_regexp.rb,12.8606810569763,12.8909890651703,12.875835061073303,0.015154004096985,10000 core-library/bm_vm3_thread_create_join.rb,0.0140390396118164,0.0140399932861328,0.014039516448975,0.000000476837158,1000 core-library/bm_vm3_thread_create_join.rb,0.14684009552002,0.142390966415405,0.144615530967712,0.002224564552307,10000 core-library/bm_vm3_thread_create_join.rb,1.42791819572449,1.42644500732422,1.427181601524353,0.000736594200134,100000 core-library/bm_app_strconcat.rb,5.00886416435242,4.99663209915161,5.002748131752014,0.006116032600403,n/a core-library/bm_so_lists.rb,17.1406989097595,17.1364989280701,17.138598918914795,0.002099990844727,n/a core-library/bm_so_matrix.rb,2.97365689277649,2.97791600227356,2.975786447525024,0.002129554748535,n/a 
core-library/bm_pathname.rb,9.67186689376831,9.60481905937195,9.638342976570129,0.033523917198181,n/a core-library/bm_so_array.rb,9.70830893516541,9.71373295783997,9.711020946502686,0.002712011337280,n/a without patch: Benchmark Name,Time #1,Time #2,Average Time,Standard Deviation,Input Size Startup,0.00891494750976562,0.0071098804473877,0.008012413978577,0.000902533531189,n/a real-world/bm_hilbert_matrix.rb,0.0650041103363037,0.0641639232635498,0.064584016799927,0.000420093536377,10 real-world/bm_hilbert_matrix.rb,0.648383140563965,0.593698024749756,0.621040582656860,0.027342557907104,20 real-world/bm_hilbert_matrix.rb,2.48112893104553,2.53910279273987,2.510115861892700,0.028986930847168,30 real-world/bm_hilbert_matrix.rb,7.38045883178711,7.66462898254395,7.522543907165527,0.142085075378418,40 standard-library/bm_app_mandelbrot.rb,3.16141700744629,3.16698312759399,3.164200067520142,0.002783060073853,n/a micro-benchmarks/bm_meteor_contest.rb,44.2824368476868,45.2044949531555,44.743465900421143,0.461029052734375,n/a micro-benchmarks/bm_app_pentomino.rb,123.641210079193,124.70353603363,124.172373056411743,0.531162977218628,n/a micro-benchmarks/bm_fasta.rb,53.9337210655212,46.0915009975433,50.012611031532288,3.921110033988953,n/a micro-benchmarks/bm_fannkuch.rb,0.00894379615783691,0.0106801986694336,0.009811997413635,0.000868201255798,6 micro-benchmarks/bm_fannkuch.rb,0.672353982925415,0.676352024078369,0.674353003501892,0.001999020576477,8 micro-benchmarks/bm_fannkuch.rb,74.7806649208069,75.063551902771,74.922108411788940,0.141443490982056,10 micro-benchmarks/bm_nbody.rb,13.873694896698,13.8695569038391,13.871625900268555,0.002068996429443,n/a micro-benchmarks/bm_reverse_compliment.rb,8.97242403030396,8.94234395027161,8.957383990287781,0.015040040016174,n/a micro-benchmarks/bm_quicksort.rb,7.04030704498291,7.07406520843506,7.057186126708984,0.016879081726074,n/a micro-benchmarks/bm_mergesort.rb,3.57215404510498,3.57273411750793,3.572444081306458,0.000290036201477,n/a 
micro-benchmarks/bm_nsieve_bits.rb,30.060455083847,30.0687861442566,30.064620614051819,0.004165530204773,n/a micro-benchmarks/bm_mandelbrot.rb,97.2698559761047,97.3307840824127,97.300320029258728,0.030464053153992,n/a micro-benchmarks/bm_lucas_lehmer.rb,20.239284992218,20.231260061264,20.235272526741028,0.004012465476990,9689 micro-benchmarks/bm_lucas_lehmer.rb,22.0810799598694,22.0774850845337,22.079282522201538,0.001797437667847,9941 micro-benchmarks/bm_lucas_lehmer.rb,31.978404045105,31.9865000247955,31.982452034950256,0.004047989845276,11213 micro-benchmarks/bm_lucas_lehmer.rb,Timeout: 150.00 seconds,,,,19937 micro-benchmarks/bm_fractal.rb,42.6185228824615,42.6280272006989,42.623275041580200,0.004752159118652,n/a micro-benchmarks/bm_knucleotide.rb,2.08846092224121,2.21108198165894,2.149771451950073,0.061310529708862,n/a micro-benchmarks/bm_monte_carlo_pi.rb,23.9794390201569,23.9089260101318,23.944182515144348,0.035256505012512,n/a micro-benchmarks/bm_word_anagrams.rb,12.3941428661346,12.7155430316925,12.554842948913574,0.160700082778931,n/a micro-benchmarks/bm_binary_trees.rb,82.171679019928,81.3044950962067,81.738087058067322,0.433591961860657,n/a micro-benchmarks/bm_spectral_norm.rb,1.35397505760193,1.35372304916382,1.353849053382874,0.000126004219055,n/a micro-benchmarks/bm_nsieve.rb,23.6701729297638,23.6756160259247,23.672894477844238,0.002721548080444,n/a micro-benchmarks/bm_regex_dna.rb,5.42796611785889,6.06900095939636,5.748483538627625,0.320517420768738,n/a micro-benchmarks/bm_sum_file.rb,15.2133920192719,15.1973860263824,15.205389022827148,0.008002996444702,n/a micro-benchmarks/bm_partial_sums.rb,33.0055561065674,32.8080351352692,32.906795620918274,0.098760485649109,n/a micro-benchmarks/bm_so_sieve.rb,84.2921900749207,83.7600059509277,84.026098012924194,0.266092061996460,n/a core-features/bm_vm1_rescue.rb,0.289474964141846,0.276180028915405,0.282827496528625,0.006647467613220,n/a 
core-features/bm_vm1_length.rb,16.6632568836212,17.2531778812408,16.958217382431030,0.294960498809814,10 core-features/bm_vm1_length.rb,17.1057379245758,16.5685300827026,16.837134003639221,0.268603920936584,100 core-features/bm_vm1_length.rb,16.5657980442047,17.0870249271393,16.826411485671997,0.260613441467285,1000 core-features/bm_vm1_length.rb,17.1546268463135,16.5853810310364,16.870003938674927,0.284622907638550,10000 core-features/bm_so_ackermann.rb,0.0397160053253174,0.0391659736633301,0.039440989494324,0.000275015830994,5 core-features/bm_so_ackermann.rb,0.659607887268066,0.658763885498047,0.659185886383057,0.000422000885010,7 core-features/bm_so_ackermann.rb,Error: stack level too deep,,,,9 core-features/bm_vm2_poly_method.rb,3.11452198028564,3.17220616340637,3.143364071846008,0.028842091560364,1000000 core-features/bm_vm2_poly_method.rb,6.2718460559845,6.33952903747559,6.305687546730042,0.033841490745544,2000000 core-features/bm_vm2_poly_method.rb,12.6923749446869,13.1363949775696,12.914384961128235,0.222010016441345,4000000 core-features/bm_vm2_poly_method.rb,26.4767730236053,26.1098349094391,26.293303966522217,0.183469057083130,8000000 core-features/bm_app_tak.rb,0.135724067687988,0.135273933410645,0.135499000549316,0.000225067138672,5 core-features/bm_app_tak.rb,0.493720054626465,0.491595029830933,0.492657542228699,0.001062512397766,6 core-features/bm_app_tak.rb,1.57262206077576,1.55252599716187,1.562574028968811,0.010048031806946,7 core-features/bm_so_random.rb,0.209523916244507,0.209694147109985,0.209609031677246,0.000085115432739,100000 core-features/bm_so_random.rb,1.04600214958191,1.04535102844238,1.045676589012146,0.000325560569763,500000 core-features/bm_so_random.rb,2.10444784164429,2.09432411193848,2.099385976791382,0.005061864852905,1000000 core-features/bm_vm1_swap.rb,8.73163294792175,8.67143487930298,8.701533913612366,0.030099034309387,10000000 
core-features/bm_vm1_swap.rb,17.5122091770172,17.5956919193268,17.553950548171997,0.041741371154785,20000000 core-features/bm_vm1_swap.rb,34.7529518604279,34.8331568241119,34.793054342269897,0.040102481842041,40000000 core-features/bm_app_fib.rb,0.0184860229492188,0.0182771682739258,0.018381595611572,0.000104427337646,20 core-features/bm_app_fib.rb,2.23634505271912,2.2517249584198,2.244035005569458,0.007689952850342,30 core-features/bm_app_fib.rb,24.7462511062622,24.8339931964874,24.790122151374817,0.043871045112610,35 core-features/bm_vm2_zsuper.rb,0.919327974319458,0.896940946578979,0.908134460449219,0.011193513870239,1000000 core-features/bm_vm2_zsuper.rb,1.82282018661499,1.81718993186951,1.820005059242249,0.002815127372742,2000000 core-features/bm_vm2_zsuper.rb,3.54206895828247,3.70834898948669,3.625208973884583,0.083140015602112,4000000 core-features/bm_vm2_zsuper.rb,7.16689205169678,7.22776818275452,7.197330117225647,0.030438065528870,8000000 core-features/bm_app_factorial.rb,0.0101971626281738,0.00816202163696289,0.009179592132568,0.001017570495605,1000 core-features/bm_app_factorial.rb,0.0399060249328613,0.0334930419921875,0.036699533462524,0.003206491470337,2000 core-features/bm_app_factorial.rb,Error: stack level too deep,,,,5000 core-features/bm_app_factorial.rb,Error: stack level too deep,,,,10000 core-features/bm_app_tarai.rb,5.36701798439026,5.35589003562927,5.361454010009766,0.005563974380493,3 core-features/bm_app_tarai.rb,6.47303104400635,6.47742319107056,6.475227117538452,0.002196073532104,4 core-features/bm_app_tarai.rb,7.84382104873657,7.862135887146,7.852978467941284,0.009157419204712,5 core-features/bm_vm1_const.rb,8.81164598464966,18.2617099285126,13.536677956581116,4.725031971931458,n/a core-features/bm_so_nested_loop.rb,0.0085291862487793,0.00852203369140625,0.008525609970093,0.000003576278687,5 core-features/bm_so_nested_loop.rb,0.47477388381958,0.482151031494141,0.478462457656860,0.003688573837280,10 
core-features/bm_so_nested_loop.rb,5.32685899734497,5.50593280792236,5.416395902633667,0.089536905288696,15 core-features/bm_vm1_ensure.rb,0.069011926651001,0.0690209865570068,0.069016456604004,0.000004529953003,100000 core-features/bm_vm1_ensure.rb,0.69483208656311,0.688596963882446,0.691714525222778,0.003117561340332,1000000 core-features/bm_vm1_ensure.rb,6.77466106414795,6.78414702415466,6.779404044151306,0.004742980003357,10000000 core-features/bm_vm2_proc.rb,1.20864987373352,1.21058702468872,1.209618449211121,0.000968575477600,1000000 core-features/bm_vm2_proc.rb,2.42319822311401,2.42020487785339,2.421701550483704,0.001496672630310,2000000 core-features/bm_vm2_proc.rb,4.83699607849121,4.83324503898621,4.835120558738708,0.001875519752502,4000000 core-features/bm_vm2_proc.rb,9.69077706336975,9.66815495491028,9.679466009140015,0.011311054229736,8000000 core-features/bm_loop_times.rb,4.91027808189392,4.85836100578308,4.884319543838501,0.025958538055420,10000000 core-features/bm_loop_times.rb,9.6381299495697,9.79347586631775,9.715802907943726,0.077672958374023,20000000 core-features/bm_loop_times.rb,14.0774569511414,14.2317838668823,14.154620409011841,0.077163457870483,30000000 core-features/bm_vm2_unif1.rb,0.5470130443573,0.575318098068237,0.561165571212769,0.014152526855469,1000000 core-features/bm_vm2_unif1.rb,1.15082097053528,1.09408688545227,1.122453927993774,0.028367042541504,2000000 core-features/bm_vm2_unif1.rb,2.34767317771912,2.39744305610657,2.372558116912842,0.024884939193726,4000000 core-features/bm_vm2_unif1.rb,4.54664993286133,4.72602391242981,4.636336922645569,0.089686989784241,8000000 core-features/bm_vm1_simplereturn.rb,5.74362897872925,5.69164609909058,5.717637538909912,0.025991439819336,10000000 core-features/bm_vm1_simplereturn.rb,11.5744888782501,11.6653461456299,11.619917511940002,0.045428633689880,20000000 core-features/bm_vm1_simplereturn.rb,16.4999470710754,17.2785489559174,16.889248013496399,0.389300942420959,30000000 
core-features/bm_loop_whileloop.rb,0.04463791847229,0.0446460247039795,0.044641971588135,0.000004053115845,100000 core-features/bm_loop_whileloop.rb,0.445381879806519,0.445470094680786,0.445425987243652,0.000044107437134,1000000 core-features/bm_loop_whileloop.rb,4.45436692237854,4.45519089698792,4.454778909683228,0.000411987304688,10000000 core-features/bm_vm2_send.rb,0.666916847229004,0.647531032562256,0.657223939895630,0.009692907333374,1000000 core-features/bm_vm2_send.rb,1.31860494613647,1.33532500267029,1.326964974403381,0.008360028266907,2000000 core-features/bm_vm2_send.rb,2.61160898208618,2.61185622215271,2.611732602119446,0.000123620033264,4000000 core-features/bm_vm2_send.rb,5.21212983131409,5.2205491065979,5.216339468955994,0.004209637641907,8000000 core-features/bm_vm1_block.rb,0.0858049392700195,0.0851829051971436,0.085493922233582,0.000311017036438,100000 core-features/bm_vm1_block.rb,0.851619005203247,0.855067014694214,0.853343009948730,0.001724004745483,1000000 core-features/bm_vm1_block.rb,8.54761481285095,8.54824185371399,8.547928333282471,0.000313520431519,10000000 core-features/bm_vm2_super.rb,0.820611000061035,0.826474905014038,0.823542952537537,0.002931952476501,1000000 core-features/bm_vm2_super.rb,1.66156506538391,1.7179229259491,1.689743995666504,0.028178930282593,2000000 core-features/bm_vm2_super.rb,3.43505811691284,3.11910891532898,3.277083516120911,0.157974600791931,4000000 core-features/bm_vm2_super.rb,6.57702493667603,6.63278102874756,6.604902982711792,0.027878046035767,8000000 core-features/bm_so_object.rb,2.12350010871887,2.12192320823669,2.122711658477783,0.000788450241089,500000 core-features/bm_so_object.rb,4.24955415725708,4.26192998886108,4.255742073059082,0.006187915802002,1000000 core-features/bm_so_object.rb,6.3975510597229,6.39479994773865,6.396175503730774,0.001375555992126,1500000 core-features/bm_app_raise.rb,6.0956289768219,6.09303498268127,6.094331979751587,0.001296997070312,n/a 
core-library/bm_so_exception.rb,14.7507960796356,14.7960770130157,14.773436546325684,0.022640466690063,n/a core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,5000 core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,10000 core-library/bm_so_concatenate.rb,Error: string sizes too big,,,,15000 core-library/bm_so_count_words.rb,12.6491630077362,12.6552991867065,12.652231097221375,0.003068089485168,n/a core-library/bm_vm2_array.rb,0.745777130126953,0.743591070175171,0.744684100151062,0.001093029975891,1000000 core-library/bm_vm2_array.rb,1.49140596389771,1.49024105072021,1.490823507308960,0.000582456588745,2000000 core-library/bm_vm2_array.rb,2.98199105262756,2.97910499572754,2.980548024177551,0.001443028450012,4000000 core-library/bm_vm2_array.rb,5.96128010749817,5.95619893074036,5.958739519119263,0.002540588378906,8000000 core-library/bm_vm2_regexp.rb,0.969479084014893,0.94690990447998,0.958194494247437,0.011284589767456,10 core-library/bm_vm2_regexp.rb,1.14880299568176,1.13046097755432,1.139631986618042,0.009171009063721,100 core-library/bm_vm2_regexp.rb,2.10414791107178,2.11249113082886,2.108319520950317,0.004171609878540,1000 core-library/bm_vm2_regexp.rb,12.8499979972839,12.8127069473267,12.831352472305298,0.018645524978638,10000 core-library/bm_vm3_thread_create_join.rb,0.0200490951538086,0.0200908184051514,0.020069956779480,0.000020861625671,1000 core-library/bm_vm3_thread_create_join.rb,0.20678186416626,0.203513860702515,0.205147862434387,0.001634001731873,10000 core-library/bm_vm3_thread_create_join.rb,2.04601502418518,2.04799294471741,2.047003984451294,0.000988960266113,100000 core-library/bm_app_strconcat.rb,5.07870984077454,5.0784158706665,5.078562855720520,0.000146985054016,n/a core-library/bm_so_lists.rb,14.3261790275574,14.3377418518066,14.331960439682007,0.005781412124634,n/a core-library/bm_so_matrix.rb,2.74682712554932,2.76162505149841,2.754226088523865,0.007398962974548,n/a 
core-library/bm_pathname.rb,9.27168798446655,9.26259899139404,9.267143487930298,0.004544496536255,n/a core-library/bm_so_array.rb,8.83219408988953,8.83007097244263,8.831132531166077,0.001061558723450,n/a
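The raw rows above are easier to compare when each patched row is paired with the unpatched row of the same benchmark and the same input size. The two bm_vm3_thread_create_join figures quoted in the post come from rows with different input sizes, so a like-for-like ratio needs that pairing. The following is only a minimal sketch: the helper names `averages` and `speedups` are made up for illustration, and the sample rows are copied from the data above.

```ruby
# Rows copied from the raw data above. Columns:
# name, time 1, time 2, average, standard deviation, input size.
PATCHED = <<~ROWS
  core-library/bm_vm3_thread_create_join.rb,0.0140390396118164,0.0140399932861328,0.014039516448975,0.000000476837158,1000
  core-library/bm_vm3_thread_create_join.rb,0.14684009552002,0.142390966415405,0.144615530967712,0.002224564552307,10000
  core-library/bm_so_concatenate.rb,94.1705470085144,86.1923739910126,90.181460499763489,3.989086508750916,5000
ROWS

UNPATCHED = <<~ROWS
  core-library/bm_vm3_thread_create_join.rb,0.0200490951538086,0.0200908184051514,0.020069956779480,0.000020861625671,1000
  core-library/bm_vm3_thread_create_join.rb,0.20678186416626,0.203513860702515,0.205147862434387,0.001634001731873,10000
  core-library/bm_so_concatenate.rb,Error: failed to allocate memory,,,,5000
ROWS

# Index average times by [benchmark name, input size]. Rows that errored
# or timed out have an empty average column and are stored as nil.
def averages(text)
  text.each_line.with_object({}) do |line, table|
    name, _t1, _t2, avg, _sd, size = line.chomp.split(",", -1)
    table[[name, size]] = avg.empty? ? nil : Float(avg)
  end
end

# Ratio unpatched/patched for every key that was measured in both runs.
def speedups(unpatched, patched)
  unpatched.each_with_object({}) do |(key, slow), out|
    fast = patched[key]
    out[key] = slow / fast if slow && fast
  end
end

ratios = speedups(averages(UNPATCHED), averages(PATCHED))
ratios.each do |(name, size), ratio|
  puts format("%-45s %6s  %.2fx", name, size, ratio)
end
```

Paired this way, the sample thread_create_join rows come out at about 1.4x in favour of the patched build for both input sizes, and the bm_so_concatenate row is skipped because the unpatched run has no time to compare against.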
on 2008-12-27 08:31
Roger,

You ran this benchmark suite, correct?

http://github.com/acangiano/ruby-benchmark-suite/tree/master

I'd never heard of them before now. Thanks!

I don't believe that these patches cause GC to run any less frequently by default. GC is still run (by default) after allocating 8MB of objects. Nothing I'm doing causes Ruby to allocate fewer or smaller objects. I do believe we are seeing that applications with large stack space(s) spend a lot of time during GC scanning each and every word on those stacks. These patches make those stacks much smaller and zero out most ghost object pointers so they no longer need to be marked.

see my comments below, marked Brent:
on 2008-12-27 20:38
> You ran this benchmark suite, correct?
>
> http://github.com/acangiano/ruby-benchmark-suite/tree/master

Yeah, that and http://lloydforge.org/projects/misc/ the latter taking considerably less time to run :)

> I don't believe that these patches cause GC to run any less frequently by
> default.
> GC is still run (by default) after allocating 8MB of objects. Nothing I'm
> doing causes Ruby to allocate fewer or smaller objects. I do believe we are
> seeing that applications with large stack space(s) spend a lot of time
> during GC scanning each and every word on those stacks. These patches make
> those stacks much smaller and zero out most ghost object pointers so they no
> longer need to be marked.

It would be interesting to see if the GC is being caused by malloc versus running out of free list. If it's the latter then the patches could indeed cause GC to be less frequent. If not then maybe it's as you said--GC just takes less time as there's less to traverse during the mark phase.

>> Brent: A >14x speed up. Whoopie! :-0

Yeah I think multi-thread apps will definitely like this. Unfortunately most benchmarks are single threaded and micro-y so won't show the "real" speedup [Antonio's included].

>> that a GC was about to occur, and get away with zeroing the stack at that
>> one point. However, recall that the collector scans each thread's stack
>> in multithreaded apps (and those using Continuations). So, I'd need to
>> know when a GC or a context switch was going to occur while the stack was
>> still shallow. I haven't figured out how to implement that oracle
>> function (and I doubt it is possible).

Hmm so the biggest speed hit is probably in the clearing of the stack [over and over] right? [judging from your comment that measurement is cheap].

I was just suggesting that once a thread has [reached a very shallow spot and cleaned the stack in its entirety] it only needs to repeat that after the next GC--left over references from this round will be cleared [once] after the subsequent GC (when the thread reaches a shallow point again). So if you're willing to wait a couple of GC's, you only have to clear once per GC, per thread. So the oracle is "do it once after each GC."

Sorry it's hard to explain. Anyway imagine a single threaded app. As long as that app clears the stack "once and well" [say the first time it gets very high it cleans off the whole thing--or accomplishes this piece-wise as it grows high the first time] then in a staggered way, every reference to garbage will eventually be zeroed out and the item collected.

Not that it really matters, I'm just trying to make sure that my thought has been explained well.

Thoughts?
-=r
on 2008-12-28 09:41
Roger,

I see what you mean. If these patches let the GC collect objects more efficiently, the object free list will not empty as often. The speed up we observe for large single threaded apps could well be a combination of faster ObjectSpace traversal and fewer GC passes triggered by an empty free list.

My bogus2 benchmark switches between one thread having a very deep stack and another with a shallow stack. It's the worst conceivable case of stack thrashing. It runs about 15% faster if I disable only the clearing of the stack.

I've spent a couple hours today "imagining" what might happen if each thread's stack were cleared only once soon after each GC is run. Here are my observations thus far:

1) I think I now see your point about VALUE pointers not necessarily needing to be zeroed. We just want to minimize the number of permanent ghost object pointers residing on any stack. When whatever transient ghost references remain change value, GC will eventually collect the objects to which they referred. Correct?

2) GC is not triggered by any thread's particular activities. It may be that a given thread, whose stack has become full of ghost references due to deferred stack clearing, stops running for long periods of time. Or that such a thread just never happens to be running when a GC is triggered.

3) It is critical that the stack be cleared very soon after each context swap, when the new thread's stack is shallower than the old one's. Otherwise, VALUE pointers on the old thread's stack will likely be incorporated into the new thread's stack when it grows (as ghosts there) after the next context switch.

4) More generally, there is no guarantee that any thread's stack, once it incorporates ghost values during growth, will ever shrink later to allow those values to be cleared off.

To me, this all adds up to requiring repeated clearing of the stack, because once ghosts have been pushed onto a thread's stack, they may just stay trapped there indefinitely.
Could you formulate some pseudo-code of an algorithm you think would (almost always) prevent the incorporation of ghost references without repeated stack clearing? I really want to believe :-) - brent
on 2008-12-30 21:05
Hmm, interesting. I was looking at it from the single-threaded perspective, so I obviously missed some subtle implications. If I understand correctly, the problem is that:

1) If a thread with a large stack is "full of garbage", then this garbage will be copied into the stack of a thread with a small stack if that stack grows after a context switch.

2) If a single thread creates a very "dirty" stack and then goes into a deeply nested loop [ex: going to sleep forever within a very nested call], it will not free the invalid references until it comes out of that deep stack later.

I suppose we can operate under the assumption that when the program starts, the extent of the stack is "clear" of bad references.

A few tricks up our sleeve: we can do a stack cleaning around the time of a context switch. We can clear the difference in size between the stacks after each context switch. We could clear that difference PLUS re-clear the "cleared once" area below the stack, after each context switch. Or perhaps do the "clear at most once" trick only if rb_thread_alone, though I think the above would already do that.

So anyway, we could basically reset the "already cleared" markers once per context switch, instead of once per GC, and re-clear that stack's damage. Would that help? In reality I'm not sure if these would be necessary. How can we tell how much is necessary?

Old notes: So let's then keep two values per thread, one being the top of a "clean section", the other the bottom of the "clean section" [already swept section]. Make this "clean section" grow as possible [check it every CHECK_INT; if you're above it, grow it; if you're below it, reset it to start below you, etc.]. So we keep track of, per thread, a growing cleaned area. Now when you context switch, if you switch from a large stack to a shorter stack, clean the difference, plus the "dirty but clean now" section--clean it again. Reset the pointers.

I guess just try it out :) Or I might get around to it eventually.
Comments inline:

> My bogus2 benchmark switches between one thread having a very deep stack and
> another with a shallow stack. It's the worst conceivable case of stack
> thrashing. It runs about 15% faster if I disable only the clearing of the
> stack.

I wonder if that's what causes the micro-benchmark slowdowns [what are they, like 5%?]. What about disabling the depth checker, too? What's its impact?

> When whatever transient ghost references remain change value, GC will
> eventually collect the objects to which they referred. Correct?

Yeah

> 2) GC is not triggered by any thread's particular activities. It may be
> that a given thread, whose stack has become full of ghost references due to
> deferred stack clearing, stops running for long periods of time. Or that
> such a thread just never happens to be running when a GC is triggered.

True: if a thread "doesn't run at all" between GCs, then it won't clear its stack until...it runs again at some point :) A thread basically gets a window of 1 GC to create as much trash as it wants, and, if it ceases running, retains that much trash.

-=r
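The "clean section" bookkeeping from the old notes can be made concrete with a toy model. The C sketch below is only an illustration under stated assumptions, not code from the MBARI patches: the machine stack is modelled as an array of slots indexed upward from the base, `depth` is the number of live slots, and the names `clean_zone`, `wipe_on_switch` and `periodic_wipe` are hypothetical. It wipes the difference in depth when switching to a shallower thread and remembers which dead slots are already known to be zero so a later periodic check does not clear them twice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Dead slots in [clean_from, clean_to) are known to be zero (hypothetical). */
typedef struct {
    size_t clean_from;
    size_t clean_to;
} clean_zone;

/* Zero the slots in [from, to). */
static void wipe(uintptr_t *stack, size_t from, size_t to)
{
    if (to > from)
        memset(stack + from, 0, (to - from) * sizeof *stack);
}

/* Context switch: if the incoming thread is shallower, wipe what the
 * outgoing thread left above it, then reset the markers.
 * Returns the number of slots wiped. */
size_t wipe_on_switch(uintptr_t *stack, clean_zone *z,
                      size_t old_depth, size_t new_depth)
{
    size_t wiped = 0;
    if (new_depth < old_depth) {
        wipe(stack, new_depth, old_depth);
        wiped = old_depth - new_depth;
    }
    z->clean_from = new_depth;
    z->clean_to = new_depth + wiped;
    return wiped;
}

/* Periodic check (think: every CHECK_INT) at the current live depth.
 * Returns the number of slots wiped. */
size_t periodic_wipe(uintptr_t *stack, clean_zone *z, size_t depth)
{
    size_t wiped = 0;
    if (depth >= z->clean_to) {
        /* The live stack overran the zone: nothing above is known clean. */
        z->clean_from = z->clean_to = depth;
    } else if (depth > z->clean_from) {
        /* The live stack grew into the zone: the zone shrinks. */
        z->clean_from = depth;
    } else if (depth < z->clean_from) {
        /* The stack shrank: the slots just vacated may hold ghosts. */
        wiped = z->clean_from - depth;
        wipe(stack, depth, z->clean_from);
        z->clean_from = depth;
    }
    return wiped;
}
```

Because the periodic check only samples the depth, a stack that grows past the zone and shrinks back between two checks goes unnoticed in this model. That is the same gap raised earlier in the thread about ghosts being pushed on during growth, so the sketch shows the bookkeeping rather than a proof that deferred clearing is sufficient.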
on 2009-01-06 19:34
I've just posted a new patch at: http://sites.google.com/site/brentsrubypatches/

The MBARI7 patch provides detailed build-time configuration control over when stack clearing is done and optimizes the GC. This patched interpreter is as fast as unpatched Ruby 1.8.7 even for small, single-threaded benchmarks, while still effectively clearing ghost object references off the stack. MBARI7 also fixes a couple of benign bugs in MBARI3.

On my 1.6GHz CoreDuo MacMini, MBARI7 runs the standard Ruby test suite, producing exactly the same output as the unpatched ruby-1.8.7-p72 in the same amount of time, using 30MB less memory.

Can anyone run it on some large applications to see how it performs in the real Ruby world? Thanks!
on 2009-01-06 19:51
Thanks Brent for this work.

One problem: the way you are getting the SP through alloca() is architecture-specific. On PPC, there is data between the alloca() space and the stack pointer, and your stack-wiping patch clears that data as well (which, among other things, includes the return address).

I did a patch against MBARI6 for that, you can see it here: http://github.com/doudou/ruby/commit/0cd9b81d8ba8c...

I'll update to MBARI7 as soon as possible.

Sylvain
on 2009-01-07 08:57
Sylvain,

Man, the PowerPC is a weird beast. After spending a couple hours looking over its ABI docs, I've convinced myself that the stack pointer would be valid for stack clearing if one could get at it directly from the C code. MBARI7 introduced a bit of gcc asm to do this for x86. I've posted an update to the MBARI7 patch on my website that adds asm cases to get the stack pointer for PPC and ARM processors. I don't have ready access to a PowerPC machine, so I'll have to rely on others to test this.

Oddly enough, there has long been (and still is) code in gc.c that gets the stack pointer via alloca(0), but it does not crash on PPC because that pointer is only used to determine the approximate stack depth for catching infinite recursion in Ruby scripts.

If the compiler is not gcc or the CPU is not x86, PPC or ARM, MBARI7 now falls back to the more portable method of returning the address of a local variable from a small function flagged with NOINLINE(). This is similar to your patch on MBARI6, but it should even work on strict ANSI 'C' compilers as long as they don't inline that function. I did verify that gcc had no issues with it. Aside from issuing warnings about returning the address of a local variable, the resulting build worked fine. It just ran about 1.5% slower.

We'll see...

- brent

P.S. I'll be trying to learn git and github over the coming weeks. Perhaps we can keep these patches and yours in one place eventually.
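The portable fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in rubysig.h: `SKETCH_NOINLINE`, `approx_stack_top` and `stack_distance` are hypothetical names, and the address is returned as an integer read back through a volatile local, a variation chosen here so that the compiler neither warns about nor optimizes away a returned pointer to a local variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Keep the probe function out of line so that its frame really sits at
 * the top of the stack. The attribute is gcc/clang specific; other
 * compilers get an empty macro and may inline the function. */
#if defined(__GNUC__)
# define SKETCH_NOINLINE __attribute__((noinline))
#else
# define SKETCH_NOINLINE
#endif

/* Approximate top of stack: the address of a local in the innermost frame. */
SKETCH_NOINLINE uintptr_t approx_stack_top(void)
{
    volatile uintptr_t probe;
    probe = (uintptr_t)&probe;
    return probe;
}

/* Distance in bytes between two stack positions. Taking the absolute
 * difference makes it independent of the direction of stack growth. */
size_t stack_distance(uintptr_t a, uintptr_t b)
{
    return (size_t)(a > b ? a - b : b - a);
}
```

A value obtained this way is good enough for estimating stack depth. As Sylvain's PPC report shows, it does not follow that everything beyond that address is dead and safe to overwrite.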
on 2009-01-07 11:27
On Wed, Jan 07, 2009 at 04:35:18PM +0900, Brent Roman wrote:
> don't have ready access to a PowerPC machine, so I'll have to rely on others
> to test this.
>
> Oddly enough, there has long been (and still is) code in gc.c that gets the
> stack pointer via alloca(0), but it does not crash on PPC because that
> pointer is only used to determine the approximate stack depth for catching
> infinite recursion in Ruby scripts.

You should read commit messages when someone points you to one ;-) That's basically what I said in the message, except that I thought it was also used to delimit the stack in the mark phase.

Anyway, it works because
* the stack is not modified
* there are no ruby variables that can be stored in-between the alloca() space and the stack pointer which are not already somewhere else (in registers, and the register window is also dumped by the GC AFAIK).

> P.S. I'll be trying to learn git and github over the coming weeks. Perhaps
> we can keep these patches and yours in one place eventually.

Well, it will be as simple as cloning my repository in one common place. I already updated my patch to work on top of MBARI7 with ASM as well ... We'll see what to keep I guess.

My updated patch is here: http://github.com/doudou/ruby/commit/f02bea5b10fea...

Sylvain
on 2009-01-07 17:31
On 06.01.2009, at 21:50, Sylvain Joyeux wrote:
> I did a patch against MBARI6 for that
Imagine the same scenario taking place in a Subversion/diff/patch world:
"I did a patch from your patch, here is my new patch
[patch6_version_2.diff], and by the way, it needs to be applied after
[patch5_version4.diff] that I uploaded earlier" — experimental
branches, the Subversion way.
MK
on 2009-01-08 10:15
Hi, Brent

I can't compile ruby with MBARI7.

CPU: AMD Opteron 246 * 2
OS: SuSE Linux Enterprise Server 9 SP 4 x86-64
gcc: 3.3.3
glibc: 2.3.3

I applied MBARI7 to ruby 1.8.7-p72 and configured it as below:

CFLAGS="-O2 -mpreferred-stack-boundary=4" ./configure --prefix=/usr/local/ruby187patch
make

The error message is below:

ar rcu libruby-static.a array.o bignum.o class.o compar.o dir.o dln.o enum.o enumerator.o error.o eval.o file.o gc.o hash.o inits.o io.o marshal.o math.o numeric.o object.o pack.o parse.o process.o prec.o random.o range.o re.o regex.o ruby.o signal.o sprintf.o st.o string.o struct.o time.o util.o variable.o version.o dmyext.o
gcc -O2 -mpreferred-stack-boundary=4 -DRUBY_EXPORT -D_GNU_SOURCE=1 -I. -I. -c main.c
gcc -O2 -mpreferred-stack-boundary=4 -DRUBY_EXPORT -D_GNU_SOURCE=1 -L. -rdynamic -Wl,-export-dynamic main.o libruby-static.a -ldl -lcrypt -lm -o miniruby
./lib/fileutils.rb:1509:in `[]': method `hash' called on terminated object (0x2a9559d408) (NotImplementedError)
from ./lib/fileutils.rb:1509:in `collect_method'
from ./lib/fileutils.rb:1509:in `select'
from ./lib/fileutils.rb:1509:in `collect_method'
from ./lib/fileutils.rb:1524
from ./mkconfig.rb:11:in `require'
from ./mkconfig.rb:11
make: *** [.rbconfig.time] Error 1

--
Robbin Fan
JavaEye.com
Office: 021-63505501
Mobile: 13916323361
Email & MSN: fankai@gmail.com
Website: http://www.javaeye.com
on 2009-01-08 11:56
> ./lib/fileutils.rb:1509:in `[]': method `hash' called on terminated
> object (0x2a9559d408) (NotImplementedError)
> from ./lib/fileutils.rb:1509:in `collect_method'
> from ./lib/fileutils.rb:1509:in `select'
> from ./lib/fileutils.rb:1509:in `collect_method'
> from ./lib/fileutils.rb:1524
> from ./mkconfig.rb:11:in `require'
> from ./mkconfig.rb:11
> make: *** [.rbconfig.time] Error 1

It is not that it does not compile, but that the patched interpreter is broken (this kind of error is what you get when you mess up the GC process).

Try the following change: in rubysig.h, replace __builtin_frame_address by alloca() in the following four lines:

# else /* slower, but should work everywhere gcc does */
# define _set_sp(sp) VALUE *sp = _get_tos();
NOINLINE(static VALUE *_get_tos(void)) {return __builtin_frame_address(0);}
# endif
#else /* slowest, but should work everwhere */

so that they look like this:

# else /* slower, but should work everywhere gcc does */
# define _set_sp(sp) VALUE *sp = _get_tos();
NOINLINE(static VALUE *_get_tos(void)) {return alloca(0);}
# endif
#else /* slowest, but should work everwhere */

My guess is that __builtin_frame_address does not work as expected on your GCC version (it works fine on an amd64 with gcc 4.3).

Sylvain
on 2009-01-08 19:36
Robbin,

You certainly should try Sylvain's suggestion, but I'm not sure it will fix the problem. Whether or not it does, could you send me the assembler output of your older opteron compiler so I might see where the stack clearing patch is getting confused?

Here's how to get gcc to generate this file:

CFLAGS="whatever flags you used for your Ruby make"
gcc -S $CFLAGS eval.c

This will produce eval.s, the assembler source code.

If you want something much smaller that would probably contain the information I need, try generating the eval.s file first before making the change to rubysig.h that Sylvain suggests, save that file, change rubysig.h, generate eval.s again, then diff the two versions of eval.s and send me just that diff (or post it to this list).

gcc -S $CFLAGS eval.c
mv eval.s eval.s.b4
{edit rubysig.h}
gcc -S $CFLAGS eval.c
diff -u eval.s.b4 eval.s >eval.s.diff

After we fix this, I'd be very interested to see whether MBARI7 manages to keep the memory size in Rails as low as MBARI6 did.

- brent
on 2009-01-11 11:31
Brent,

A report from the field...

We have been using your patches in a production Rails environment since you released them, and this is on x86_64-linux.

We notice no problems, ruby works well and is significantly faster.

And to keep up to date, we just applied patch MBARI7 (from http://sites.google.com/site/brentsrubypatches/ ) with the default configuration. FWIW we see a further small performance improvement, something like 5% on a rough measurement.

Just a note on your build instructions: the -mpreferred-stack-boundary=2 flag causes configure to fail on OSX, complaining that it can't find the size of int (the program to do so segfaults). And that setting is not accepted by gcc on x86_64 because it needs the boundary to be 4 or more. In both cases I removed the option and all works fine.

Regards,
Stephen
on 2009-01-11 14:47
Stephen Sykes wrote:
> Brent,
>
> A report from the field...
>
> We have been using your patches in a production Rails environment
> since you released them, and this is on x86_64-linux.
>
> We notice no problems, ruby works well and is significantly faster.

The patches also appear to help method-call performance a bit:

BEFORE:

$ ./ruby -I lib ../jruby/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
1.580000 0.010000 1.590000 ( 1.619531)
1.570000 0.000000 1.570000 ( 1.609721)
1.610000 0.010000 1.620000 ( 1.627628)
1.570000 0.010000 1.580000 ( 1.600705)
1.580000 0.000000 1.580000 ( 1.601550)
1.570000 0.010000 1.580000 ( 1.597049)
1.570000 0.010000 1.580000 ( 1.608728)
1.570000 0.010000 1.580000 ( 1.594988)
1.570000 0.000000 1.570000 ( 1.601885)
1.570000 0.010000 1.580000 ( 1.630782)
[headius @ 247:~/projects/ruby-1.8.7-p72]
$ ./ruby -I lib ../jruby/bench/bench_tak.rb 10
user system total real
13.510000 0.060000 13.570000 ( 13.768144)
13.530000 0.070000 13.600000 ( 13.773649)

AFTER:

$ ./ruby -I lib ../jruby/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
1.360000 0.010000 1.370000 ( 1.416073)
1.350000 0.000000 1.350000 ( 1.381519)
1.360000 0.010000 1.370000 ( 1.376705)
1.350000 0.000000 1.350000 ( 1.380676)
1.350000 0.010000 1.360000 ( 1.377904)
1.360000 0.010000 1.370000 ( 1.465818)
1.350000 0.000000 1.350000 ( 1.379431)
1.350000 0.010000 1.360000 ( 1.372702)
1.340000 0.010000 1.350000 ( 1.374763)
1.350000 0.010000 1.360000 ( 1.376614)
$ ./ruby -I lib ../jruby/bench/bench_tak.rb 10
user system total real
12.860000 0.060000 12.920000 ( 13.091790)
12.950000 0.060000 13.010000 ( 13.241058)

For comparison, Ruby 1.9.1RC1 numbers:

$ ruby1.9 ../jruby/bench/language/bench_method_dispatch_only.rb
Test ruby method: 100k loops calling self's foo 100 times
0.650000 0.000000 0.650000 ( 0.665707)
0.650000 0.010000 0.660000 ( 0.657540)
0.650000 0.000000 0.650000 ( 0.662093)
0.650000 0.010000 0.660000 ( 0.667457)
0.650000 0.000000 0.650000 ( 0.670909)
0.650000 0.000000 0.650000 ( 0.665737)
0.650000 0.010000 0.660000 ( 0.664140)
0.650000 0.000000 0.650000 ( 0.667239)
0.650000 0.000000 0.650000 ( 0.662808)
0.650000 0.010000 0.660000 ( 0.661229)
$ ruby1.9 ../jruby/bench/bench_tak.rb 10
user system total real
2.900000 0.010000 2.910000 ( 2.951113)
2.890000 0.020000 2.910000 ( 2.958036)

- Charlie
on 2009-01-12 11:36
On Sun, Jan 11, 2009 at 10:46:36PM +0900, Charles Oliver Nutter wrote:
> The patches also appear to help method-call performance a bit:
Could you try the same tests with GC disabled? I'm wondering if the change is purely due to improvement in the GC speed ...
Sylvain
on 2009-01-12 20:37
Stephen,

I updated the MBARI7 patch at http://sites.google.com/site/brentsrubypatches again last night (on 1/11/09) before I'd read your post. (Sorry)

I had already concluded that -mpreferred-stack-boundary=2 is generally a "bad idea" and have removed it from the recommended options. It has portability problems and, even where it works, the net loss in speed is not worth the small reduction in stack usage for most Ruby scripts. One option that does increase speed about 7% across the board is -fomit-frame-pointer. It seems to work well with most recent gcc compilers, but segfaults on older ones, so I'm not recommending it by default. I believe that Microsoft 'C' has an analogous option.

This latest update to MBARI7 adds a configuration option to select the method used to clear the stack among four alternatives. The default is to use a (new) portable method that allocates the "dirty" stack briefly with alloca() before clearing it. This portable method costs time (~1.5%), but it is safer. In practice, the 32-bit x86 is so starved for registers that I'd seen cases where gcc would emit a PUSH %ESP between the point in the (old, fast) stack clearing routine that read the stack pointer and the loop that was to zero unallocated stack above the top. This would cause the stacked base pointer to be cleared as well and yield a segfault when it was later POP'ed from the stack. Fortunately, if this happens, the resulting Ruby binary fails immediately on the test scripts (bogus1.rb and bogus2.rb) included with the patches.

Ironically, -mpreferred-stack-boundary=2 will make the new, portable stack clearing method ineffective due to gcc's insistence that alloca(x>0) always return a 16-byte aligned pointer regardless of the configured preferred-stack-boundary. This might be considered a bug, but I'm honestly not sure.

I cannot seem to find a stack clearing method that is both safe and portable. Maybe others will succeed where I have punted.

For now, my tests indicate that, on 32-bit x86 with gcc 4.3, the combination of

CFLAGS="-O2 -fomit-frame-pointer -fno-stack-protector"
and
#define STACK_WIPE_SITES 0x4370 /* in rubysig.h */

works best. It protects against ghost references well and runs even micro-benchmarks slightly faster than unpatched 1.8.7-p72.

- brent
on 2009-01-12 23:03
Brent Roman wrote:
> For now, my tests
> indicate that, on 32-bit x86 with gcc 4.3, the combination of
>
> CFLAGS="-O2 -fomit-frame-pointer -fno-stack-protector"
> and
> #define STACK_WIPE_SITES 0x4370 /* in rubysig.h */
>
> works best.

According to the GCC documentation, -O (and -O2, -O3 and -Os) implies -fomit-frame-pointer.

--
Phusion | The Computer Science Company
Web: http://www.phusion.nl/
E-mail: info@phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)
on 2009-01-12 23:44
That's not the way gcc behaves in my experience with 32-bit x86 machines. Could you point me at this documentation?

What I read is:
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.

32-bit x86 machines cannot generate stack backtraces without frame pointers. This certainly does interfere with debugging.

- brent
on 2009-01-12 23:51
Brent Roman wrote:
> That's not the way the gcc behaves in my experience with the 32-bit x86
> machines.
> Could you point me at this documentation?
>
> What I read is:
> -O also turns on -fomit-frame-pointer on machines where doing so does
> not interfere with debugging.
>
> 32-bit x86 machines cannot generate stack backtraces without framepointers.
> This certainly does interfere with debugging.

Both 'info gcc' and the online manual[1] say "-fomit-frame-pointer ... Enabled at levels -O, -O2, -O3, -Os." But if your experience is different then I guess it's a mistake in the documentation.

[1] http://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Optimi...
on 2009-01-13 11:57
On OSX, -fomit-frame-pointer is turned off if you use -O2 or other levels. In fact, if you turn it on, the compiled ruby crashes.

OSX has an addition to gcc, a "-fast" option that turns on the following flags:

-O3 -fomit-frame-pointer -fstrict-aliasing -momit-leaf-frame-pointer -fno-tree-pre -falign-loops

But as both -fomit-frame-pointer and -momit-leaf-frame-pointer cause the compiled ruby to crash, I have been using these options to compile ruby with MBARI7:

-O3 -fstrict-aliasing -fno-tree-pre -falign-loops

Also, with these options I have not had any problems setting STACK_WIPE_SITES to 0x4370.

-Stephen
on 2009-01-14 01:54
Stephen,

I'm very much a PowerPC newbie, so please bear with me...

Yesterday, I got an off-list report that ruby with the MBARI patches was failing with:

./lib/fileutils.rb:521: stack level too deep (SystemStackError)

after applying them on a PowerBook G4, Mac OS X 10.5.6 with Apple GCC 4.0. A kind colleague happened to have a very similar laptop I could borrow. This let me duplicate the failure. It was being caused by the fact that the rlim_t type returned by the getrlimit() call to get process limits was *signed* rather than unsigned as under Linux. This made my patched Ruby believe that the size of the stack area reserved for it was 0 bytes, triggering the "stack too deep" exception. I fixed this and posted an update to MBARI7 last night after fairly extensive testing. The date on the latest version is Jan 12, 2009.

So, my first questions are: Did you run into this issue? If not, why not? If so, did you fix or work around it yourself?

The exact gcc we used was:
powerpc-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465)

Regarding your posting:

There is clearly some misleading gcc documentation out there about -fomit-frame-pointer. It is a machine-independent option, but the exact effects of the -Ox options are machine dependent.

If I understand you correctly, compiling with gcc and

-fno-omit-frame-pointer

causes ruby crashes on PowerPC OSx. Does it also cause crashing on Intel OSx? Are these crashes happening whether or not the MBARI patches are applied or only after applying them?

In any case, it seems that you've managed to get the compiler and MBARI patches very well optimized for PowerPC OSx. Could you post some (brief) PPC OSx benchmark results comparing runtime and peak process size before and after patching, taking care to build ruby with the same compiler options each time?

- brent
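The signedness pitfall with rlim_t described above can be guarded against along these lines. This is a hedged sketch rather than the fix that went into MBARI7: `usable_stack_limit` and `current_stack_limit` are hypothetical helpers, and the policy of falling back to a caller-supplied default is an assumption made here for illustration. The idea is to treat RLIM_INFINITY, zero, negative values (where rlim_t is signed) and values too large for size_t as "unknown" instead of converting them blindly.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Convert a stack limit to a byte count that is safe to compute with,
 * whether rlim_t is signed or unsigned on this platform. */
size_t usable_stack_limit(rlim_t cur, size_t fallback)
{
    if (cur == RLIM_INFINITY)
        return fallback;            /* no finite limit reported */
    if (cur <= 0)
        return fallback;            /* zero, or negative if rlim_t is signed */
    if ((uintmax_t)cur > (uintmax_t)SIZE_MAX)
        return fallback;            /* would not fit in size_t */
    return (size_t)cur;
}

/* Query the soft stack limit of the current process. */
size_t current_stack_limit(size_t fallback)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return fallback;
    return usable_stack_limit(rl.rlim_cur, fallback);
}
```

Keeping the conversion in one small function makes the platform difference easy to exercise without needing a PowerBook at hand.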
on 2009-01-14 09:18
Brent,

Sorry, I should have mentioned, I'm running on an Intel Mac - you assumed I was running on a PowerPC.

> If I understand you correctly, compiling with gcc and
>
> -fno-omit-frame-pointer
>
> causes ruby crashes on PowerPC OSx. Does it also
> cause crashing on Intel OSx?

I have no information on PowerPC; compiling with -fomit-frame-pointer certainly causes crashing on Intel. Presumably -fno-omit-frame-pointer works ok, I haven't tried it.

> Are these crashes happening whether or not
> the MBARI patches are applied or only after applying them?

Only after applying the patches. With those same compile options and regular ruby everything works normally. The error when compiling patched ruby looks like this:

gcc -O2 -fomit-frame-pointer -pipe -fno-common -DRUBY_EXPORT -L. main.o dmydln.o libruby-static.a -ldl -lobjc -o miniruby
./lib/fileutils.rb:1165: [BUG] Bus Error
ruby 1.8.7 (2009-1-11 MBARI 7/0x2370 on patchlevel 72) [i686-darwin9.6.0]
make: *** [.rbconfig.time] Abort trap

Last evening I ran into the following issue with my recently compiled ruby (which I had compiled with the -O3 options I gave before):

/usr/local/lib/ruby/site_ruby/1.8/rubygems/specification.rb:333: [BUG] Bus Error
ruby 1.8.7 (2009-1-11 MBARI 7/0x4370 on patchlevel 72) [i686-darwin9.6.0]
Abort trap

I recompiled with your suggested options of
-O2 -fno-stack-protector
and this problem went away. Perhaps best to stick with these options for now.

> Could you post some (brief) PPC OSx benchmark results
> comparing runtime and peak process size before and after patching, taking
> care to build ruby with the same compiler options each time?

I can do this for Intel OSX if you need?

Regards,
Stephen
on 2009-01-14 14:35
2009/1/14 Brent Roman <brent@mbari.org>:
> Could you post some (brief) PPC OSx benchmark results
> comparing runtime and peak process size before and after patching, taking
> care to build ruby with the same compiler options each time?
>
> - brent

How do you measure peak process size?

I have an application that takes about an hour to run and requires about 2G RSS with ruby 1.8, and about half with JRuby. I would be interested in comparing the performance with and without the patch.

Thanks

Michal
on 2009-01-14 19:27
Oh... Never mind :-)

If you are running Intel OSx, you've basically got a tweaked, slightly outdated Apple fork of GNU gcc for i386. Others have reported on-and-off problems compiling ruby with -O3 and/or -fomit-frame-pointer. I was pleasantly surprised when I discovered that -fomit-frame-pointer no longer crashes ruby with gcc 4.3.2. But I would never recommend it on the i386 as a default for building ruby. I've also noticed that i386 -O3 produces a Ruby interpreter that benchmarks slower than one compiled with -O2. You might want to confirm this for yourself.

See my comments below:

Stephen Sykes wrote:
> I recompiled with your suggested options of
> -O2 -fno-stack-protector
> and this problem went away. Perhaps best to stick with these options for
> now.

You had mentioned setting STACK_WIPE_SITES to 0x4370. Do you also get this error with STACK_WIPE_SITES left at its default of 0x2370?

I would be willing to try debugging the problem on my Mac Mini after rebooting it into OSx, if this failure occurs with the default STACK_WIPE_SITES 0x2*** settings using gcc options that otherwise yield a stable unpatched ruby build.

- brent
on 2009-01-14 19:35
Michal,

This is the sort of large app that I'd like to see benchmarked before and after patching, especially given that JRuby process size is half of MRI's.

I don't have a very scientific way to measure peak process size. I just monitor the output of the "top" command while the process runs. If your process is going to run for a long time, you might want to set up a script to capture the output of ps and post-process that to find the peak process size.

I hope others have better ideas here.

- brent
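One alternative to polling top or ps, offered here only as a sketch and not something proposed in the thread, is to launch the process from a small wrapper and ask the kernel for the peak resident set size afterwards. The helper name `peak_rss_of_command` is hypothetical. It relies on the `ru_maxrss` field that Linux and Mac OS X fill in for `getrusage(RUSAGE_CHILDREN, ...)`; the unit differs by platform (kilobytes on Linux, bytes on Mac OS X), and the figure covers the largest child waited for so far, so the wrapper should measure one command per run.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the NULL-terminated argument vector argv, wait for it
 * to exit, and return the peak resident set size the kernel recorded for
 * waited-for children, or -1 on failure. */
long peak_rss_of_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
        return -1;
    return ru.ru_maxrss;            /* KB on Linux, bytes on Mac OS X */
}
```

Called with something like `{"ruby", "app.rb", NULL}` for both the patched and the unpatched interpreter, this yields one number per run to compare, without having to sample a long-running process.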
on 2009-01-14 23:30
Hi Brent

> I would be willing to try debugging the problem on my Mac Mini after
> rebooting it into OSx,
> if this failure occurs with the default STACK_WIPE_SITES 0x2*** settings
> using gcc options
> that otherwise yield a stable unpatched ruby build,

Yes, it appears that these options cause patched ruby to crash with either 0x2370 or 0x4370 set for STACK_WIPE_SITES:

-O3 -fstrict-aliasing -fno-tree-pre

It seems that two of these options are not enough to cause the problem; you need all three.

The crash looks like this (I get it when I run rake test to run my rails app tests):

/usr/local/lib/ruby/site_ruby/1.8/rubygems/specification.rb:333: [BUG] Bus Error
ruby 1.8.7 (2009-1-11 MBARI 7/0x2370 on patchlevel 72) [i686-darwin9.6.0]
Abort trap

Yes, it's old gcc (gcc version 4.0.1 (Apple Inc. build 5465)), so it may not be worth the effort to track down the issue. On the other hand, it might be interesting. Perhaps contact me directly if you wish to pursue this.

-Stephen
on 2009-01-17 18:48
Issue #744 has been updated by Roger Pack.

Here's my field report.

I have a small rails app on a linode slice. After running it awhile I noticed that the system stopped responding--it was running out of RAM. For some reason my rails app was growing by 8MB of RSS per request. If anybody wants to look into this in more depth I'd be happy to give them access.

Updating to 187 trunk: same result.
Updated to 187 + MBARI patches: problem gone.

Also, the total RSS now starts at 59MB and [4 days later] appears to have stabilized at 62MB. Without patches it starts at 78MB, so a 25% RAM use reduction, which is very nice for those on slices.

I'd encourage the inclusion of these patches into trunk for the next patch release.

A few thoughts on compiler differences: would using the SET_STACK_END macros help? Maybe it could revert to a method call [so force it to go down on the stack] as a way to check the stack end? Or just always add 20 to what alloca returns or what not?

re: measure peak process size: sys-proctable might help.

Thanks much for your work. It spared me hours of debugging and has improved my opinion of Ruby. Three cheers :) Where to send donation?

-=r
----------------------------------------
http://redmine.ruby-lang.org/issues/show/744
on 2009-01-19 09:17
Yuki and Roger,

I'm glad to hear these patches are working out well for you.

I have just posted yet another update to the MBARI7 patch at: http://sites.google.com/site/brentsrubypatches/

The latest spin uses a separate stack for garbage collection passes, eliminating the need to clear the GC stack after each pass. It also disables use of assembly code to read the stack pointer on x86 machines by default, because this asm code sometimes caused gcc to emit pushes to the stack between reading the stack pointer and clearing the area above it. I also changed the default STACK_WIPE_SITES value from 0x2370 to 0x4770. This should all make the patches a little more portable and a bit faster in their default configuration.

I don't plan to update MBARI7 again unless bugs are found. (We all know how that goes :-)

- brent
on 2009-01-19 21:16
On Mon, Jan 19, 2009 at 1:15 AM, Brent Roman <brent@mbari.org> wrote:
>
> Yuki and Roger,
>
> I'm glad to hear these patches are working out well for you.
>
> I have just posted yet another update to the MBARI7 patch at:
> http://sites.google.com/site/brentsrubypatches/
>

One suggestion I might make: I like GC#exorcise, but it seems a little ghosty to me--stack_clear or stack_clean might be more specific :)

Thanks again.

-=r
on 2009-01-19 21:43
Hi,

At Mon, 19 Jan 2009 17:15:02 +0900, Brent Roman wrote in [ruby-core:21429]:
> I have just posted yet another update to the MBARI7 patch at:
> http://sites.google.com/site/brentsrubypatches/

Can't you make patches against the head of the stable branch?

Current status:

MBARI1: already merged except for a new method.
MBARI2: backported stack-rewind at thread creation from old 1.9, so I think this patch is no longer needed.
MBARI4: your patch makes Emacs c-mode.el confused. <http://www.atdot.net/sp/readonly/rb_eval_split> is more c-mode.el friendly.
MBARI5: already merged.

And could you separate new features from bug fixes?
on 2009-01-20 04:29
Roger,

The method name is intentionally "ghosty". Matz himself referred to Ruby being "troubled by ghost references on the stack". I thought that was an apt description so I adopted it as well.

exorcise:
Function: transitive verb
Inflected Form(s): ex·or·cised also ex·or·cized; ex·or·cis·ing also ex·or·ciz·ing
1 a: to expel (an evil spirit) by adjuration b: to get rid of (something troublesome, menacing, or oppressive)

Definition 1b seemed a perfect fit to me. GC.exorcise rids the call stack of troublesome ghost references. I found the "evil spirit" connotation amusing.

If others are bothered by the word, I'll be happy to change it.

- brent
on 2009-01-20 11:35
2009/1/19 Nobuyoshi Nakada <nobu@ruby-lang.org>:
>
> MBARI1: already merged except for a new method.
>
> MBARI2: backported stack-rewind at thread creation from old
> 1.9, so I think this patch is no longer needed.
>
> MBARI4: your patch makes Emacs c-mode.el confused.
> <http://www.atdot.net/sp/readonly/rb_eval_split> is
> more c-mode.el friendly.

Perhaps it should be the other way around?

That is, the Emacs c-mode should be fixed to work with any code, rather than the code being modified to work around Emacs quirks.

Thanks

Michal
on 2009-01-21 04:17
Hi, At Tue, 20 Jan 2009 19:33:50 +0900, Michal Suchanek wrote in [ruby-core:21457]: > That is the Emacs c-mode should be fixed to work with any code rather > than code modified to work around Emacs quirks. By implementing C preprocessor in emacs lisp? Nice challenge. :)
on 2009-01-21 10:23
Hi Nobu,

Yes, I plan to rebase my patches against the HEAD after I move them to git. This should also make it easier for me to separate features from fixes. I'll be traveling next week, so expect something in 2-3 weeks.

Regarding the patches already applied to HEAD:

MBARI1:
Can you explain why the Continuation#thread method is not acceptable? It does seem to be an intrinsic property of every Continuation and without this method, one must often maintain a separate (weak) reference to the thread on which each continuation operates.

MBARI2:
I like push/pop_thread_anchor() better than my hack to hide other threads' stacks. However, I don't see code in rb_thread_save_context() to copy *only* the active stack for each thread. This is a very important optimization. Are you doing this optimization some other way that I am overlooking? (To see how important it can be, try running my bogus1.rb and bogus2.rb benchmarks)

MBARI4:
I'll be happy to incorporate your clever eval_body() #define. It cleans the static inline function decls up nicely. Does it also restore the Emacs c-mode.el compatibility? If this isn't what bothers emacs, please explain, and I'll try to code around it. (Please understand that I haven't used emacs in any serious way for 15 years, since I discovered nedit :-)

MBARI5:
Your version avoids the small cost of the alloca() when the stack only needs to grow by a small amount. Very nice.

- brent
on 2009-01-21 15:25
On Wednesday 21 of January 2009 10:21:19 Brent Roman wrote:
> Yes, I plan to rebase my patches against the HEAD after I move them to git.
> This should also make it easier for me to separate features from fixes.
> I'll be traveling next week, so expect something in 2-3 weeks.

Hi, very nice work. Are you (or someone else) also planning on rebasing the patches against 1.8.6? I've tried that myself but it didn't work very well (ruby test/runner.rb fails 3 tests on 0x2770, and segfaults when I use 0x4770, on an x86_64 machine).

I also tried building on ppc64. With 0x4770 it won't even build; it segfaults on launching miniruby:

gcc -O2 -g -DRUBY_EXPORT -D_GNU_SOURCE=1 -L. -rdynamic -Wl,-export-dynamic main.o libruby-static.a -ldl -lcrypt -lm -o miniruby
./ext/purelib.rb:2: [BUG] Segmentation fault
ruby 1.8.7 (2009-1-18 MBARI 7/0x4770 on patchlevel 72) [powerpc64-linux]
make: *** [.rbconfig.time] Aborted

With 0x2770 it builds & runs the same test suite with 6 failures & 1 error. (Although I'm not sure how much they are actually ruby's fault)

Regards, -- mb
on 2009-01-21 20:07
> > > The method name is intentionally "ghosty". Matz himself referred to Ruby > being "troubled by > ghost references on the stack". I thought that was an apt description > so I adopted it as well. I was just referring to the fact that exorcise seems to have little in common with "garbage" and doesn't actually say what the function does [who could guess from the name what it would do?] but I'm good either way. Cheers. -=r
on 2009-01-22 03:08
Hi,

At Wed, 21 Jan 2009 18:21:19 +0900, Brent Roman wrote in [ruby-core:21483]:
> Regarding the patches already applied to HEAD:
>
> MBARI1:
> Can you explain why the Continuation#thread method is not acceptable?
> It does seem to be an intrinsic property of every Continuation and
> without this method, one must often maintain a separate (weak) reference
> to the thread on which each continuation operates.

I don't say it's not acceptable. It's not a part of the bug fix, so it should be a separate request.

> MBARI2:
> I like push/pop_thread_anchor() better than my hack to hide other threads' stacks.
> However, I don't see code in rb_thread_save_context() to copy *only* the active
> stack for each thread. This is a very important optimization.
> Are you doing this optimization some other way that I am overlooking?
> (To see how important it can be, try running my bogus1.rb and bogus2.rb benchmarks)

What do you mean by "active stack"? The stack region which is actually used by the thread? The current code reduces that area by rewinding the stack.

> MBARI4:
> I'll be happy to incorporate your clever eval_body() #define.
> It cleans the static inline function decls up nicely. Does it also restore the Emacs c-mode.el
> compatibility? If this isn't what bothers emacs, please explain, and I'll try to code around it.
> (Please understand that I haven't used emacs in any serious way for 15 years,
> when I discovered nedit :-)

Compatibility with older Emacs? I can't test it now.
on 2009-01-22 12:58
Michal,

I've got no immediate plans to port these patches to 1.8.6. Why is this important for you? I (perhaps naively) thought 1.8.7 would run just about anything that 1.8.6 does.

The Ruby build seems to do special things to configure alloca() on ppc machines. In particular, I just noticed that Ruby does not use GNUC's __builtin_alloca() on PPC even if compiled with GNUC. Instead, it substitutes a 'C' version that just calls malloc(). When forced to use the __builtin_alloca() on PPC, the resulting interpreter failed even if all my stack clearing was disabled. There is some interesting history here. Perhaps someone on this list can tell me what in Ruby is incompatible with GNU's PPC version of __builtin_alloca().

Nevertheless, I've put up a very experimental patch at: http://sites.google.com/site/brentsrubypatches/ The patch file is an attachment called ruby-1.8.7-p72-mbariPPC.patch near the bottom of the page. Apply the usual seven MBARI patches, then this PPC patch atop them all.

The PPC patch tries to work around alloca() strangeness by invoking __builtin_alloca() directly for stack clearing whenever __GNUC__ is defined. This seems to work well on the Mac G4 laptop on which I tested.

The test suite ran 11m6s patched vs 11m3s unpatched. Both versions flagged an Error in test_translit_option plus one other failure.

I built each with CFLAGS=-O2 because -fno-stack-protector does not seem to be supported by the Apple version of gcc.

Let me know how it works for you on ppc64... Please send (just me) the output of gcc -v if this patch fails. You might also want to attach your config.h file.

- brent
on 2009-01-22 13:50
2009/1/22 Brent Roman <brent@mbari.org>:
> Michal,
>
> I've got no immediate plans to port these patches to 1.8.6.
> Why is this important for you? I (perhaps naively) thought 1.8.7
> would run just about anything that 1.8.6 does.

It's far from that simple. 1.8.7 backports a few 1.9 features that were "easy enough" to backport, breaking quite a bit of valid 1.8 code. Sure, the code can be updated easily in most cases, but there is a large portion of code that hits the differences and cannot just run on 1.8.7 untouched.

Thanks Michal
on 2009-01-22 14:29
On Jan 22, 2009, at 5:55 AM, Brent Roman wrote:
> I've got no immediate plans to port these patches to 1.8.6.
> Why is this important for you?

I think a lot of Ruby users feel 1.8.7 was a mistake and try to avoid it. It's just too massive a change for a simple point release. The ruby-doc.org site has stayed with 1.8.6 and David Black has recommended we pretend it doesn't exist, just to give two high profile examples off the top of my head.

James Edward Gray II
on 2009-01-22 16:30
On 1/22/2009 2:27 PM, James Gray wrote: > The ruby-doc.org site has stayed with 1.8.6 and David Black has > recommended we pretend it doesn't exist, just two give two high profile > examples off the top of my head. Not just ruby-doc.org, but ruby-lang.org, too, at least for the German version: http://www.ruby-lang.org/de/downloads/ Btw: if nobody feels responsible for keeping the non-English pages in sync, why not just drop them and link to the English ones instead (or make someone feel responsible every time an update is required). Cheers, — Matthias
on 2009-01-22 17:10
On Thursday 22 of January 2009 12:55:08 Brent Roman wrote: > Michal, > > I've got no immediate plans to port these patches to 1.8.6. > Why is this important for you? I (perhaps naively) thought 1.8.7 > would run just about anything that 1.8.6 does. Well, it runs Rails, and i could fix my code for it, the biggest issue i have with it, is that it randomly raises EOF and broken pipe exceptions when using sockets. > > The Ruby build seems to do special things to configure alloca() on ppc > machines. > In particular, I just noticed that Ruby does not use GNUC's > __builtin_alloca() > on PPC even if compiled with GNUC. Interesting. I couldn't find this code in the tree, so i guess i'm missing something. Can you point me to a file+line ? > The PPC patch tries to work around alloca() strangeness by invoking the > _builtin_alloca() > directly for stack clearing whenever __GNUC__ is defined. > This seems to work well on the mac g4 laptop on which I tested. It applied cleanly, but i had to change __ppc__ to __powerpc__ at rubysig.h:65 otherwise i ended up with 0x4770; and i had to leave __ppc__ at rubysig.h:211 because that asm instruction doesn't work on this machine. So i ended up with 0xA770 and __sp = _builtin_alloca(0). This way it works the same as 0x2770 minus mbari_ppc patch (as in, same errors on running test suite, and same speed) > > The test suite ran 11m6s patched vs 11m3s unpatched. > Both versions flagged an Error in test_translit_option plus one other > failure. I have 6 fails + 1 error, most in gdbm. > > I built each with CFLAGS=-O2 because -fno-stack-protector does not seem > to be supported by the apple version of gcc. I built with ./configure --enable-pthread CFLAGS="-O2 -g" I figured -fno-stack-protector is not required since man page says about options "This manual documents only one of these two forms, whichever one is not the default." > > Let me know how it works for you on ppc64... > Please send (just me) the output of gcc -v if this patch fails. 
> You might also want to attach your config.h file Sure. -- mb
on 2009-01-22 21:29
Michal, OK. Understood. I do intend to move over to git in the coming weeks. After that happens, rebasing should become easier. I'll assume that you're willing to help test the patches on 1.8.6. Realistically, we're looking at at least one month out. - brent
on 2009-01-22 21:38
Hi Nobu,

MBARI2: It was late when I compared the patches. If you are actually rewinding the stack, that's even better than this patch's technique of linking directly to the base frame to skip around the parent threads' stacks while leaving them in place.

MBARI4: I meant that I don't use emacs anymore, so I won't test against it. Even so, I wish you would explain what about this patch confused Emacs c-mode.el so I can avoid such constructs in the future. I'm guessing it had something to do with the NOINLINE function declarations, but I'm still not sure.

Do you think you will merge these changes into the 1.8.6 release?

- brent
on 2009-01-22 22:43
I have applied the MBARI patches to 1.8.6 p287. About half the hunks had to be applied by hand. In doing so I noticed 1 hunk that looked odd. The hunk was in gc.c, in the function ruby_xmalloc; the odd line is:
if ((malloc_increase+=size) > malloc_limit) {
Was it intended to change the value of malloc_increase in the if statement?
When running test/runner.rb for the patched Ruby I am seeing the following error and failure:
1) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[./test/ruby/test_beginendblock.rb:82]:
<""> expected to be =~
</Interrupt$/>.
2) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
./test/yaml/test_yaml.rb:1281:in `test_object_id_collision'
ruby -v:
ruby 1.8.6 (2009-1-18 MBARI 7/0x4770 on patchlevel 287) [i686-linux]
configure command:
CFLAGS="-O2 -fno-stack-protector" ./configure
On the plus side, the MBARI patched version completed the suite in 417.29 seconds, versus 416.14 for MRI.
If it would help, I generated a patch file between the stock 1.8.6 p287 and my MBARI patched version.
I am going to try a couple of other stack clearing settings to see if that is the issue. I will send updates if I discover something.
- Michael
on 2009-01-22 23:12
Michael,

I'm glad you are taking this on. Thanks. It must have been a tedious job. See comments below --->

Michael King-2 wrote:
>
---> Yes, that is the intent. You will see this in ruby_xrealloc() and
> </Interrupt$/>.
>
---> This one is worrisome. I've never seen it.
I've never run tests against unpatched 1.8.6
Do either of these failures occur there?
>
> ruby -v:
> ruby 1.8.6 (2009-1-18 MBARI 7/0x4770 on patchlevel 287) [i686-linux]
>

Monitor the process size while running the test suite. If MBARI7 is working properly, you should observe that the size of the main test process near the end of its run is about 30MB less than when running with unpatched 1.8.6.

- brent
on 2009-01-23 05:51
On Thu, Jan 22, 2009 at 4:09 PM, Brent Roman <brent@mbari.org> wrote:
> Michael,
> I'm glad you are taking this on. Thanks. It must have been a tedious job.
> See comments below --->

At my company we are running several copies of 2 Rails applications which we have to restart on a regular basis because of the Ruby memory leak. This patch has the capability of ending that. Tedious or not, it is worth the effort. The memory savings is an added bonus that may let us run more copies.

I am also working with a copy of 1.8.6 patched with a GC stats patch discussed here: http://blog.pluron.com/2008/02/memory-profilin.html to aid in performance testing and profiling our applications. And I have also been investigating Phusion's Ruby Enterprise Edition; the changes to make the GC copy-on-write friendly could give us a benefit.

Combining all these patches gets a little tricky in a couple of places. If I need to, I will use a GC stats patched MRI for performance and profiling and MBARI patched for production to save memory. The REE copy-on-write is just an added bonus.

> > was it intended to change the value of malloc_increase in the if
> > statement?
>
> ---> Yes, that is the intent. You will see this in ruby_xrealloc() and
> ruby_xmalloc()
> Doing it this way saves a jump at the machine code level.

Ok, this was the only instance that I saw and I know I have written code like this when I wasn't intending to, so I wanted to double check.

> I've never run tests against unpatched 1.8.6
> Do either of these failures occur there?

I have done multiple runs now with my unpatched copy of Ruby 1.8.6 and I have seen 0 failures and 0 errors. I also did a run of 1.8.7 patched and unpatched and both had 0 errors and 0 failures. I am doing this round of compiling and testing on Ubuntu 8.04 with gcc 4.2.4. I have tried compilations with the CFLAGS listed with the code and with no CFLAGS; it doesn't change the outcome. Our deployment environment is Ubuntu 6.06, so I will be running the tests there as well.

> Monitor the process size while running the test suite.
> If MBARI7 is working properly, you should observe that the size of the main
> test process
> near the end of its run is about 30MB less than when running with unpatched
> 1.8.6.
>
> - brent

This is interesting.... I will rerun the tests tomorrow, I'm done for tonight.

Unpatched Ruby 1.8.6 capped out at 94M
Ruby 1.8.6 patched with MBARI and GC-stats capped out at 42M
Ruby 1.8.6 patched with MBARI capped out at 53M

- Michael
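
[Editor's illustration] The gc.c line discussed in this exchange, `if ((malloc_increase+=size) > malloc_limit)`, updates the counter and tests the new total in a single expression. The sketch below shows the same idiom in Ruby next to its two-step equivalent. The method names and the `state` hash are hypothetical and exist only for this example; this is not the interpreter's code.

```ruby
# Two-step form: update the counter first, then compare it.
def over_limit_two_step(state, size, limit)
  state[:increase] = state[:increase] + size
  state[:increase] > limit
end

# Combined form, mirroring `if ((malloc_increase+=size) > malloc_limit)`:
# the compound assignment evaluates to the new total, which is then compared.
def over_limit_combined(state, size, limit)
  (state[:increase] += size) > limit
end

a = { increase: 0 }
b = { increase: 0 }
sizes = [60, 30, 20]

two_step = sizes.map { |s| over_limit_two_step(a, s, 100) }
combined = sizes.map { |s| over_limit_combined(b, s, 100) }

# Both forms make the same decisions and leave the same counter behind.
puts two_step == combined && a == b
```

Both forms behave identically; the combined one is simply more compact, which is why it can look like an accidental assignment at first glance.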
on 2009-01-23 06:19
Hi,

At Fri, 23 Jan 2009 05:36:05 +0900, Brent Roman wrote in [ruby-core:21530]:
> MBARI4:
> I meant that I don't use emacs anymore, so I won't test against it.
> Even so, I wish you would explain what about this patch confused
> Emacs c-mode.el so I can avoid such constructs in future.
> I'm guessing it had something to do with the NOINLINE function
> declarations, but I'm still not sure.

I seem to have missed something. It indents like: NOINLINE(static VALUE eval_match2(self, node)) VALUE self; NODE *node; This isn't too bad, but c-beginning-of-defun jumps to the beginning of the `NODE *node;' line, not eval_match2.

Also, since VC8 needs a prototype declaration or definition for noinline, your patch causes a compile error with it.

# I won't object even if you were to propose dropping the
# support for VC8 or later :)

> Do you think you will merge these changes into the 1.8.6 release?

We'll have to merge them into the 1.8 head first.
on 2009-01-23 10:12
Michal,

I got an account on a ppc64 (Darwin) server with Apple gcc 4.01. After testing there I updated the PPC patch on my website. You might want to give it another try.

The G5 server I'm on normally wants to compile in 32-bit mode. With the latest PPC patch, compiling in 32-bit mode, I could run the ruby test suite without any unexpected errors with STACK_WIPE_SITES set to 0x9770. (fast and thorough) (note that the codes changed, have a look at rubysig.h for details)

On the PowerPC, it's always better to read the stack pointer via assembly code, as __builtin_alloca(0) does not return it. This latest PPC patch checks the __ppc64__ #define as well as __ppc__ (I mistakenly thought that __ppc__ would always be defined if __ppc64__ was)

When I force the compiler to produce a 64-bit binary, I find that I must use the "safe" stack clearing method (which is now the default for PowerPCs). But, I *can* and do use assembler to read the stack pointer on my G5 system. I don't understand why this bit of asm fails on your ppc64 box.

Here's a typical gcc command from my 64bit build:

gcc -m64 -pipe -fno-common -DRUBY_EXPORT -I. -I. -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -c gc.c

file gc.o responds: Mach-O 64-bit object ppc64

Unfortunately, ppc64 versions of the system libraries are not installed on my test box, so I could not try the Ruby test suite in 64-bit mode. It did run my little benchmarks without trouble.

Again, let me know how you do there. Also, *please* send me details on your configuration: the config.h file from your build directory and the output of the gcc -v command.

- brent
on 2009-01-23 20:18
On Thu, Jan 22, 2009 at 9:08 AM, Michal Babej <calcifer@runbox.com> wrote: > > I've got no immediate plans to port these patches to 1.8.6. > > Why is this important for you? I (perhaps naively) thought 1.8.7 > > would run just about anything that 1.8.6 does. > Well, it runs Rails, and i could fix my code for it, the biggest issue i > have > with it, is that it randomly raises EOF and broken pipe exceptions when > using > sockets. Maybe you could submit a bug report for it? -=r
on 2009-01-24 21:37
On Mon, Jan 19, 2009 at 1:15 AM, Brent Roman <brent@mbari.org> wrote: > > Yuki and Roger, > > I'm glad to hear these patches are working out well for you. I assume that with 1.9 this style patch isn't as necessary as threads don't "share garbage" between each other--is that right? [each thread could still clean itself, but at least they don't share garbage between threads--is that right?] Also I might recommend renaming GC#limit to GC#malloc_limit or GC#alloc_limit since "limit" is somewhat ambiguous--is it a limit to the number of free pointers it will use? malloc size? [that type of thing]. Thanks so much! -=r
on 2009-01-26 03:21
On Thu, Jan 22, 2009 at 10:48 PM, Michael King <kingmt@gmail.com> wrote:
> Combining all these patches gets a little tricky in a couple places, if I
> need to I will use a GC stats patched MRI for performance and profiling and
> MBARI patched for production to save memory. The REE copy-on-write is just
> an added bonus.

It's starting to look like it is trickier than I originally thought...

>> > </Interrupt$/>.
> tonight.
>
> Unpatched Ruby 1.8.6 capped out at 94M
> Ruby 1.8.6 patched with MBARI and GC-stats capped out at 42M
> Ruby 1.8.6 patched with MBARI capped out at 53M
>
> - Michael

I recompiled Ruby 1.8.6 patched with MBARI and set STACK_WIPE_SITES to 0x0000. Rerunning the tests shows the same failure; however, the memory use was 54M. It would appear that I applied the patches wrong somehow...

- Michael
on 2009-01-26 21:34
Patching Ruby 1.8.6 p287 with MBARI patches 1 and 2 gave no warnings or
errors. MBARI patch 3 resulted in:
1) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
./test/yaml/test_yaml.rb:1281:in `test_object_id_collision'
Digging through the changelog, it appears that this test is the result of bug 8548 (http://rubyforge.org/tracker/?func=detail&atid=169...). I haven't looked at the code to get an idea of why this is failing.
I have tried compiles using:
CFLAGS="-O2 -fno-stack-protector -fomit-frame-pointer" configure
I compiled under Ubuntu 8.04 with gcc 4.2.3 and under Ubuntu 8.10 with gcc 4.3.1.
So it would appear that something in MBARI patch 3 breaks 1.8.6
- Michael
on 2009-01-27 06:26
Roger,

With native threading, each thread gets its own private stack managed by the OS. So, yes, in Ruby 1.9, there should not be any ghost references from one thread's stack creeping onto another's. However, there is still the potential for ghost object references within any given thread's stack.

GC.limit= determines the number of bytes that will be allocated (or reallocated) before a garbage collection pass is automatically triggered. It defaults to 8e6 bytes. I set it to 2e6 bytes on memory limited embedded targets. The process size will "breathe" by this number of bytes while Ruby runs. Some might want to breathe deeper (and less often) if they've got bigger lungs.

GC.limit is documented in ri as such. It is the primary GC tunable. If someone introduces a free list limit, they can call it GC.freelist_limit. I would not be confused by that. Nonetheless, if a couple more folks complain, I'll change GC.limit to something longer. I expect that it will get renamed in any case if it makes it into the trunk.

- brent
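
[Editor's illustration] The trade-off behind the limit described above (a larger limit means fewer, deeper "breaths") can be sketched with a pure simulation. No real garbage collector is involved and GC.limit itself is not called, since it only exists in the MBARI-patched interpreter. The method name `simulated_gc_passes` and the allocation stream are assumptions made up for this example; only the 8e6 and 2e6 byte limits come from the post.

```ruby
# Count how many GC passes a stream of allocations would trigger for a
# given byte limit. A pass fires once the bytes allocated since the
# previous pass exceed the limit, after which the counter starts over.
def simulated_gc_passes(allocation_sizes, limit)
  passes = 0
  since_last_gc = 0
  allocation_sizes.each do |size|
    since_last_gc += size
    next unless since_last_gc > limit
    passes += 1
    since_last_gc = 0
  end
  passes
end

# Hypothetical workload: 1000 allocations of 100,000 bytes each.
allocations = Array.new(1_000) { 100_000 }

default_passes  = simulated_gc_passes(allocations, 8_000_000)
embedded_passes = simulated_gc_passes(allocations, 2_000_000)

puts "limit 8e6: #{default_passes} passes, limit 2e6: #{embedded_passes} passes"
```

Under these assumptions the smaller limit triggers collection roughly four times as often, in exchange for a process size that swings by about 2 MB instead of 8 MB between passes.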
on 2009-01-27 06:43
Michael,

MBARI3 factors the big rb_eval() into many smaller functions. It's a big patch. When I ported it from 1.6.8 to 1.8.7, it was by far the most tedious. I put 1.6.8 and 1.8.7 side-by-side into xxdiff and worked through it block-by-block.

You could try backing out the MBARI3 patch by replacing the factored rb_eval() with the original one from 1.8.6. All the rest of the patches should work. You'll just have slower context switches due to the larger call stack, but the memory leaks caused by ghost object references should still be eliminated by MBARI4 and MBARI7. If that fixes the bug, you could start factoring half of rb_eval() at a time (binary search) until you find its cause.

I'm not surprised that you still see the memory size improvement with STACK_WIPE_SITES set to 0x0000 -- the factored rb_eval() is more likely to overwrite ghost object references.

- brent
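
[Editor's illustration] The "half at a time" search suggested above can be sketched as a bisection over an ordered list of hunks. This is a minimal sketch under stated assumptions: hunks are applied in order, the tree with no hunks applied passes, the tree with all hunks applied fails, and once the failure appears it stays as more hunks are added. `first_bad_hunk`, the hunk names, and the block standing in for "build and run the test suite" are all hypothetical.

```ruby
# Return the first hunk whose application makes the tests fail.
# The block receives the list of hunks to apply and must return true
# when the resulting build passes the test suite.
def first_bad_hunk(hunks)
  good = 0            # applying this many hunks is known to pass
  bad = hunks.size    # applying this many hunks is known to fail
  while bad - good > 1
    mid = (good + bad) / 2
    if yield(hunks.first(mid))
      good = mid
    else
      bad = mid
    end
  end
  hunks[bad - 1]
end

# Stand-in for a real build: pretend hunk_07 is the one that breaks things.
hunks = (1..10).map { |i| format("hunk_%02d", i) }
builds = 0
culprit = first_bad_hunk(hunks) do |applied|
  builds += 1
  !applied.include?("hunk_07")
end

puts "culprit: #{culprit}, found after #{builds} builds"
```

With ten hunks this needs only a handful of builds instead of one per hunk, which matters when each build plus test run takes several minutes.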
on 2009-01-30 07:09
I cannot seem to build a working Ruby 1.8.7 for the PPC64 under Leopard 10.5. Has anyone else managed it? It runs simple test scripts, but hangs on the test suite. This is 1.8.7-p72 without any patches.

Here's how I'm building:

$ export ARCHFLAG="-arch ppc64"
$ CFLAGS="-O2 -m64 -fno-stack-protector" configure --prefix=$HOME
$ make
$ sudo make install
$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [powerpc-darwin9.6.0]
$ uname -a
Darwin G5-Client.shore.mbari.org 9.6.0 Darwin Kernel Version 9.6.0: Thu Nov 6 19:35:49 PST 2008; root:xnu-1228.9.57~1/RELEASE_PPC Power Macintosh
$ gcc -v
Using built-in specs.
Target: powerpc-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5465~16/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --program-prefix= --host=powerpc-apple-darwin9 --target=powerpc-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5465)
$ cd test
$ time ruby runner.rb

HANGS here.....

Any ideas?

- brent
on 2009-02-01 12:18
Michal, I finally managed to get a working Ruby 1.8.7 on ppc64 with and without MBARI patches under OSx 10.5 as follows: export ARCH_FLAG="-arch ppc64" CFLAGS="-O2 -g -m64 -fno-stack-protector" configure There may yet be a problem with my build configuration, but I don't think the MBARI patches have anything to do with these failures. Both patched and unpatched versions fail the same 6 tests. Does anyone have a ppc64 (64-bit code) Ruby that does not fail these tests? Note that I am using the PowerPC patch of 1/23/09. For now, it must be applied manually after the MBARI7 patch. I will integrate it after it has been tested on x86_64. (It should be called the 64-bit patch, as it is intended to fix x86_64 as well as ppc64) http://sites.google.com/site/brentsrubypatches/Hom... The PPC patch changes the meaning of the STACK_WIPE_SITES #define. See rubysig.h for details. One interesting observation is that my *unpatched* ppc64 ruby did not leak when executing: ruby -e "loop{@x=callcc{|c|c}}" This could be because the ppc versions put ruby call arguments on the heap rather than the 'C' stack. - brent results: The patched ppc64 Ruby runs the test suite about 30 seconds quicker. 341 vs. 312 seconds It used a bit less RAM, but the difference wasn't large: 114Mb vs. 106Mb peak VSIZE I've included the details of the run below. Can anyone verify whether or not these failures occur with unpatched 1.8.7-p72? --------------- $ uname -a Darwin G5-Client.shore.mbari.org 9.6.0 Darwin Kernel Version 9.6.0: Thu Nov 6 19:35:49 PST 2008; root:xnu-1228.9.57~1/RELEASE_PPC Power Mac $ ruby -v ruby 1.8.7 (2009-1-23 MBARI 7/0x5770 on patchlevel 72) [powerpc-darwin9.6.0] $ file ~/bin/ruby /u/brent/bin/ruby: Mach-O 64-bit executable ppc64 $ time ruby runner.rb Loaded suite . 
Started ........................................................................................................................................................................................................................................................................................................................................................F..........................Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use OpenSSL::PKCS7 instead .Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use OpenSSL::PKCS7 instead .Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use OpenSSL::PKCS7 instead Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use OpenSSL::PKCS7 instead Warning: OpenSSL::PKCS7::PKCS7 is deprecated after Ruby 1.9; use OpenSSL::PKCS7 instead ..............................FF....F..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../ruby/test_array.rb:536: warning: given block not used 
........................................................................F.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................E........................................................................................................................................................................................................................................................................................................................................................................................ Finished in 312.740792 seconds. 1) Failure: test_decode(OpenSSL::TestASN1) [./openssl/test_asn1.rb:195]: <"\217\a\362~Q38\262\332\212H6N\244\022n\267\343I8\233\000\017|\361\265\024\335\353\202\237h\016\201\032bxV\300\343N\252\227w\320\263\241%\035s\366P\2147>dy\306\004\023\367\267\v\214\272\fY\331\326\016\346\216\003\310\323\ek+Y}is\361\263\034\313\f\006e\200V\274\302\222\201\314\260\350\210\321<G\317\024\260H\371+\002\350\210\216cHk\375\246\301\324c\363\324\203\225\330\221 \036"> expected but was <"\246\317\022M\337\207 \202\022\374\221\214\375\365\307\231\030\375t\027\306Y.\022\302\207\377\224\234\370l\a\211\r\241\225\003\220d\323k\346[>\351\004M\v\347\336\240\365\265\242\226\324?\214eR\300p\003`!m#\217\e6\250\306G\324#\004`\273\240\376\357`\265\367\3658\275t?\342\274\335.\370\261\227\325)V\376\240Z\276\206`\2056b\305\022s\tY%\025~r\207\267\323\226\315\243L\203\023\306K">. 
2) Failure: test_create_by_factory(OpenSSL::TestX509Extension) [./openssl/test_x509ext.rb:41]: <"0\022\006\003U\035\023\001\001\000\004\b0\006\001\001\000\002\001\002"> expected but was <"0\022\006\003U\035\023\001\001\377\004\b0\006\001\001\377\002\001\002">. 3) Failure: test_new(OpenSSL::TestX509Extension) [./openssl/test_x509ext.rb:29]: <true> expected but was <false>. 4) Failure: test_attr(OpenSSL::TestX509Request) [./openssl/test_x509req.rb:94]: <[["keyUsage", "Digital Signature, Key Encipherment", true], ["subjectAltName", "email:gotoyuzo@ruby-lang.org", false]]> expected but was <[["keyUsage", "Digital Signature, Key Encipherment", false], ["subjectAltName", "email:gotoyuzo@ruby-lang.org", false]]>. 5) Failure: test_should_propagate_signaled(TestBeginEndBlock) [./ruby/test_beginendblock.rb:81]: <""> expected to be =~ </Interrupt$/>. 6) Error: test_fd_passing(TestUNIXSocket): SocketError: file descriptor was not passed (msg_controllen=20, 24 expected) ./socket/test_unix.rb:19:in `recv_io' ./socket/test_unix.rb:19:in `test_fd_passing' 1976 tests, 1668917 assertions, 5 failures, 1 errors real 5m23.872s user 3m54.124s sys 0m27.542s
on 2009-02-10 08:06
I just updated the MBARI7 patch for Ruby 1.8.7-p72 at: http://sites.google.com/site/brentsrubypatches/

I've tested this February 9, 2009 version compiling with GNUC targeting the following CPU types: ppc, ppc64, arm, i386, and x86_64. For each CPU, no more test suite errors occurred patched than unpatched. I'd welcome any feedback on ppc or x86_64 in particular. [If you run into trouble, please include the output of gcc -v and uname -a]

I'm working on a GitHub release next, including patches for 1.8.6. Is 1.8.6-p287 (patchlevel 287) the specific version I should target?

- brent
on 2009-02-12 22:25
I was attempting to backport your MBARI patches to 1.8.6 p287, which is what my company is currently using in production.
When I was backporting all 7 patches I was seeing these errors:
1) Failure:
test_should_propagate_signaled(TestBeginEndBlock)
[./test/ruby/test_beginendblock.rb:82]:
<""> expected to be =~
</Interrupt$/>.
2) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
./test/yaml/test_yaml.rb:1281:in `test_object_id_collision'
I started hand applying a few hunks at a time from the patches and then running make check. If all tests passed then I would move on to the next few hunks. I applied all of patches 1, 2, and 3 with all tests passing. Previously the error would start showing up on patch three, but it looks like maybe a hunk was getting applied to the wrong area. The failure started showing up on the hunks from patch 4 that were moving the functions out of rb_eval. I had to apply about 10 hunks just to compile, so I can't really narrow it down more than that.
Unfortunately this is at the limits of my understanding so I can't really help you fix it.
- Michael
on 2009-02-14 08:19
>> dynamic main.o libruby-static.a -ldl -lcrypt -lm -o miniruby
>> ./ext/purelib.rb:2: [BUG] Segmentation fault
>> ruby 1.8.7 (2009-1-18 MBARI 7/0x4770 on patchlevel 72) [powerpc64-linux]
>> make: *** [.rbconfig.time] Aborted

Here's an interesting one. I built 1.8.7p72 with the mbari patches. Works fine on the computer where it was built. If I run it on another computer on the same network, same OS [slightly different cpu], it sometimes [depending on the moon phase] results in:

[09:1721][rdp@ilab2:~/tmp_src]$ ruby driver.rb -pbitTorrent --name=yanc_and_bittorrent_100_take2
/home/rdp/i386/lib/ruby/site_ruby/1.8/rubygems/specification.rb:48: [BUG] terminated node (0xb7c3505c)
ruby 1.8.7 (2009-1-18 MBARI 7/0x4770 on patchlevel 72) [i686-linux]
Aborted

[13:2228][rdp@ilab1:~]$ gcc -v
Reading specs from /home/rdp/installs/lib/gcc/i686-pc-linux-gnu/3.4.6/specs
Configured with: ./configure --prefix=/home/rdp/installs
Thread model: posix
gcc version 3.4.6
[13:2228][rdp@ilab1:~]$ uname -a
Linux ilab1 2.6.24-23-generic #1 SMP Mon Jan 26 00:13:11 UTC 2009 i686 GNU/Linux

make test-all clears except a few zlib errors [it isn't installed] and

4) Failure:
test_should_propagate_signaled(TestBeginEndBlock) [./test/ruby/test_beginendblock.rb:81]:
<""> expected to be =~
</Interrupt$/>.

Any thoughts? Thanks! -=r
on 2009-02-14 10:20
Michael,

I have just posted the MBARI patches on GitHub at: http://github.com/brentr/matzruby/tree/v1_8_7_72-mbari

I believe you can pull from it via this git URL: git://github.com/brentr/matzruby.git

A few points regarding your difficulties with porting the MBARI patches to 1.8.6:

1) Your report helped me identify the cause of the test failure in TestBeginEndBlock. There was always a bit of a race condition in handling the ruby/suicide.rb test case (Would CHECK_INTS get called before the interpreter terminated?) The big rb_eval() refactoring of MBARI4 moved the point at which CHECK_INTS is invoked and made that race much more likely. Even so, it always worked sometimes :-) My fix is to invoke CHECK_INTS just after sending a signal to any process. It's in the MBARI7 patch dated 2/13/09 on github. (Not yet on my website)

2) I never see the YAML failure here. That may be a problem unique to 1.8.6 or it may be an error in porting the patches.

3) This is the only failure I see and you don't list it:

1) Failure:
test_client_session(OpenSSL::TestSSL)
[./openssl/test_ssl.rb:426:in `test_client_session'
./openssl/test_ssl.rb:417:in `times'
./openssl/test_ssl.rb:417:in `test_client_session'
./openssl/test_ssl.rb:129:in `call'
./openssl/test_ssl.rb:129:in `start_server'
./openssl/test_ssl.rb:416:in `test_client_session']:
<false> is not true.

Any clues? I'm guessing that I'm missing a supporting library.

4) I'm working on a version of these patches for 1.8.6-p287 right now. Stay tuned...

Git seems to be behaving. If it's laughing at me, it is doing so behind my back. Please do try to build from my git repo and let me know how that goes.

- brent
on 2009-02-14 10:38
Roger,
Ummm... Moving binaries between different CPUs doesn't always work.
How, exactly, did the host and target machines differ?
Anyway...
The version I just pushed to github at:
http://github.com/brentr/matzruby/tree/v1_8_7_72-mbari
passed the Ruby test suite earlier this week on ppc, ppc64, arm, i386, and x86_64 CPUs.
As mentioned in my previous post, this version should fix the failing TestBeginEndBlock test.
Could you try building from my github repo to verify this and let me know if your BitTorrent test still fails?
I'm still new to the git stuff. Please let me know whether I've set up my repository correctly.
- brent
on 2009-02-14 21:02
At 6:18 PM +0900 2/14/09, Brent Roman wrote: >Michael, > >I have just posted the MBARI patches on GitHub at: > >http://github.com/brentr/matzruby/tree/v1_8_7_72-mbari > >I believe you can pull from it via this git URL: > >git://github.com/brentr/matzruby.git >Please do try to build from my git repo and let me know how that goes. Brent, thanks for putting them on github. This makes it very easy to follow your work now. Building it worked fine. When I built the latest v1_8_7_72-mbari branch: commit 6b169f9546ad52cb0edb9a19d48110e08f86a296 Author: Brent Roman <brent@mbari.org> Date: Fri Feb 13 23:06:56 2009 -0800 and ran the latest full suite of rubyspecs on it I got 8 failures and 14 errors. See the full output from mspec here: http://gist.github.com/64442 It doesn't look like your patches have much to do with those errors ... but I'm not sure. I haven't worked with 1.8.7 much. The trunk version of 1.8.7 doesn't build and install correctly on my system. Here's how I built and tested your branch: I already have the matzruby git repo cloned so I added the mbari repo as another remote, fetched and checked out the remote branch v1_8_7_72-mbari into my working dir. 
$ cd ruby/src/matzruby.git/
$ git remote add mbari git://github.com/brentr/matzruby.git
$ git remote -v
mbari git://github.com/brentr/matzruby.git
origin git://github.com/rubyspec/matzruby.git
$ git pull
$ git fetch mbari
$ git co -b v1_8_7_72-mbari mbari/v1_8_7_72-mbari
Built it and made sure it can print its version:
$ autoconf && ./configure --prefix=/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari
$ make clean && make && make install
$ /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/bin/ruby -v
ruby 1.8.7 (2009-2-13 MBARI 7/0x8770 on patchlevel 72) [i686-darwin9.6.0]
Here's a summary of the files that have changed between Brent's mbari branch and the tag v1_8_7_72:
$ git diff --stat v1_8_7_72
 ChangeLog        |  197 +++++
 common.mk        |    2 +-
 eval.c           | 2354 ++++++++++++++++++++++++++++++++----------------------
 gc.c             |  589 ++++++++-------
 intern.h         |    2 +-
 missing/alloca.c |    8 +-
 node.h           |    6 +-
 rubysig.h        |  212 +++++-
 signal.c         |    3 +-
 version.h        |   17 +-
 10 files changed, 2123 insertions(+), 1267 deletions(-)
A closer look at the changes in my favorite diff viewer (GitX):
$ git diff v1_8_7_72 | gitx
Run the latest rubyspecs against it:
$ cd /Users/stephen/dev/ruby/src/rubyspec.git
$ which mspec
/Users/stephen/dev/ruby/src/mspec.git/bin/mspec
$ git pull
Already up-to-date.
Running just the core rubyspec tests: $ mspec -t /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/bin/ruby core ruby 1.8.7 (2009-2-13 MBARI 7/0x8770 on patchlevel 72) [i686-darwin9.6.0] ..EE..EE.....................................................................E........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F....................................................................................................................................................................................................................................................... ......................................................................................................................................... 1) ARGF.bytes returns an Enumerable::Enumerator when passed no block ERROR NoMethodError: undefined method `be_an_instance_of' for #<Object:0x3409b8> ./core/argf/shared/each_byte.rb:41 ./core/argf/shared/each_byte.rb:39 ./core/argf/bytes_spec.rb:2:in `all?' ./core/argf/bytes_spec.rb:5 ./core/argf/bytes_spec.rb:4 2) ARGF.chars returns an Enumerable::Enumerator when passed no block ERROR NoMethodError: undefined method `be_an_instance_of' for #<Object:0x33acd4> ./core/argf/shared/each_char.rb:32 ./core/argf/shared/each_char.rb:30 ./core/argf/chars_spec.rb:2:in `all?' 
./core/argf/chars_spec.rb:5 ./core/argf/chars_spec.rb:4 3) ARGF.each_byte returns an Enumerable::Enumerator when passed no block ERROR NoMethodError: undefined method `be_an_instance_of' for #<Object:0x330d60> ./core/argf/shared/each_byte.rb:41 ./core/argf/shared/each_byte.rb:39 ./core/argf/each_byte_spec.rb:2:in `all?' ./core/argf/each_byte_spec.rb:4 4) ARGF.each_char returns an Enumerable::Enumerator when passed no block ERROR NoMethodError: undefined method `be_an_instance_of' for #<Object:0x32ddcc> ./core/argf/shared/each_char.rb:32 ./core/argf/shared/each_char.rb:30 ./core/argf/each_char_spec.rb:2:in `all?' ./core/argf/each_char_spec.rb:5 ./core/argf/each_char_spec.rb:4 5) An exception occurred during: before :all ERROR LoadError: dlopen(/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle, 9): Symbol not found: _rb_DLStdcallCallbackProcs Referenced from: /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle Expected in: flat namespace - /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle ./core/array/pack_spec.rb:2363 ./core/array/pack_spec.rb:2272:in `all?' ./core/array/pack_spec.rb:2426 6) An exception occurred during: before :all ERROR LoadError: dlopen(/Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle, 9): Symbol not found: _rb_DLStdcallCallbackProcs Referenced from: /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle Expected in: flat namespace - /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle /Users/stephen/dev/ruby/builds/mri/v1_8_7_72-mbari/lib/ruby/1.8/i686-darwin9.6.0/dl.bundle ./core/array/pack_spec.rb:2363 ./core/array/pack_spec.rb:2363:in `all?' 
./core/array/pack_spec.rb:2475 7) Module#autoload shares the autoload request across dup'ed copies of modules FAILED Expected NameError but got TypeError (wrong autoload table: #<Proc:0x005358a4@./core/module/autoload_spec.rb:252>) ./core/module/autoload_spec.rb:252 ./core/module/autoload_spec.rb:238:in `all?' ./core/module/autoload_spec.rb:15 Finished in 13.107626 seconds 1127 files, 5697 examples, 19595 expectations, 1 failure, 6 errors
on 2009-02-15 05:45
Stephen,
My acceptance test is that the test suite delivered with ruby produce no
new
failures running patched vs. unpatched.
I really don't want to get into mspec. However,
just a cursory glance at the errors it output leads
me to believe it was testing against ruby 1.8.6 specs.
Have you tried this same mspec against unpatched 1.8.7-p72?
I built and tested with the following:
$ cd ruby
$ git clone git://github.com/brentr/matzruby.git mri.git
$ cd mri.git
$ git checkout -b v1_8_7_72-mbari origin/v1_8_7_72-mbari
$ autoconf
$ CFLAGS="-O2 -fno-stack-protector" configure --prefix=$HOME/ruby/stage
$ make -j3
$ make install
$ cd test
$ time ~/ruby/stage/bin/ruby runner.rb
Output:
1) Failure:
test_client_session(OpenSSL::TestSSL)
[./openssl/test_ssl.rb:426:in `test_client_session'
./openssl/test_ssl.rb:417:in `times'
./openssl/test_ssl.rb:417:in `test_client_session'
./openssl/test_ssl.rb:129:in `call'
./openssl/test_ssl.rb:129:in `start_server'
./openssl/test_ssl.rb:416:in `test_client_session']:
<false> is not true.
1985 tests, 1345472 assertions, 1 failures, 0 errors
real 4m10.124s
user 1m38.430s
sys 0m4.160s
This is the same single failure I've always seen with every 1.8.7-p72
Ruby
on my machine. I'm still hoping someone might tell me what
might cause this.
- brent
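The acceptance criterion Brent describes (no new failures from the bundled test suite, patched vs. unpatched) can be checked mechanically. What follows is only a sketch under stated assumptions: the helper names `failed_tests` and `new_failures` are hypothetical, and the parsing assumes the test/unit report layout quoted in this thread, i.e. a line such as "1) Failure:" or "1) Error:" followed by a line starting with `test_name(TestClass)`.

```ruby
# Hypothetical helpers; they are not part of Ruby's bundled test runner.

# Extract "test_name(TestClass)" for every Failure/Error entry in a report.
def failed_tests(report)
  pattern = /^\s*\d+\) (?:Failure|Error):\s*\n\s*([^\s(]+\([^\s)]+\))/
  report.scan(pattern).flatten.uniq
end

# Tests that fail in the patched run but not in the unpatched run.
def new_failures(patched_report, unpatched_report)
  failed_tests(patched_report) - failed_tests(unpatched_report)
end

# Optional command-line use: pass two saved report files.
if __FILE__ == $0 && ARGV.size == 2
  patched, unpatched = ARGV.map { |path| File.read(path) }
  fresh = new_failures(patched, unpatched)
  puts(fresh.empty? ? "no new failures" : fresh)
  exit(fresh.empty? ? 0 : 1)
end
```

With the output of `runner.rb` from each build saved to a file, passing the patched report first and the unpatched report second prints only the tests that the patches newly break.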
on 2009-02-17 21:19
I just pushed out a version of the MBARI patches for Ruby 1.8.6-p287
onto:
git://github.com/brentr/matzruby.git
in the branch v1_8_6_287-mbari
I'm down to just one test failure:
1) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
./test/yaml/test_yaml.rb:1281:in `test_object_id_collision'
However, I'm not motivated to investigate much further because this
test fails on every version of 1.8.6-p287 I build from source on
four different linux boxes with varying versions of gcc including those
built directly from the archive:
ftp://ruby-lang.org/pub/ruby/ruby-1.8.6-p287.tar.bz2
On the other hand, I recall Michael King claimed to have gotten
1.8.6-p287
to complete
the test suite without any errors whatsoever.
I'm building like this:
$ CFLAGS="-O2 -fno-stack-protector" configure --prefix=$HOME/ruby/test
$ make -j3 && make install
(I've tried all sorts of CFLAGS, so please no comments about those)
and running the yaml test with:
$ cd test
$ ~/ruby/test/bin/ruby runner.rb yaml
Loaded suite yaml
Started
.........E................................................
Finished in 0.409985 seconds.
1) Error:
test_object_id_collision(YAML_Unit_Tests):
RuntimeError: id collision in ordered map
./yaml/test_yaml.rb:1281:in `test_object_id_collision'
58 tests, 206 assertions, 0 failures, 1 errors
If you try that on your Ruby 1.8.6-p287 built from source,
do you see the error?
Is there another way to build it from source that avoids the error?
Please respond with details on your build procedure, environment etc.
only if you've built Ruby 1.8.6-p287 from source and do not see the
above
error.
If you've got it working, I'd sure like to know exactly how!
- brent
P.S. Note that this issue was supposedly fixed by a patch applied on
6/15/08.
That patch appears to be present in 1.8.6-p287.
See http://redmine.ruby-lang.org/issues/show/411
If no one responds, I'll add this report to redmine, but for now, I'm
assuming
I've got a problem with my build procedure.
on 2009-02-18 18:14
I will try and take a look at this soon, hopefully before the end of the week... - Michael
on 2009-02-18 18:53
> I'm building like this: > $ CFLAGS="-O2 -fno-stack-protector" configure --prefix=$HOME/ruby/test > $ make -j3 && make install Question: does the -fno-stack-protector stuff make much of a speed difference? Thanks! -=r
on 2009-02-18 20:40
I used to think it was more, but in fact, -fno-stack-protector probably saves less than 1% of execution time. The stack clearing of the MBARI patches invokes alloca often, so any extra overhead there will be felt more than without stack clearing. Note also that the stack-protector stuff was added to gcc to detect malicious attempts to hack the stack in 'C' code that processes networking data. Ruby's stack cannot be hacked that way as all array indices are checked explicitly. So, in Ruby, gcc's stack-protector is sort of like wearing a belt and suspenders. - brent
on 2009-02-19 18:52
On Saturday 14 of February 2009 08:17:22 Roger Pack wrote: > [BUG] terminated node (0xb7c3505c) > ruby 1.8.7 (2009-1-18 MBARI 7/0x4770 on patchlevel 72) [i686-linux] > > Aborted The moon has shifted phases since January :) Seriously though, I've also found Jan 18 version to segfault/abort randomly on my x86_64, however latest from git (Feb 15) is working very nice so far - only 2 failures test_client_session(OpenSSL) and test_readline. Could you try the latest and report the results ? -- Michal
on 2009-02-20 23:15
> The moon has shifted phases since January :) Seriously though, I've also found > Jan 18 version to segfault/abort randomly on my x86_64, however latest from > git (Feb 15) is working very nice so far - only 2 failures > test_client_session(OpenSSL) and test_readline. Could you try the latest and > report the results ? The moon is in a good phase. LOL. It does seem more stable using the latest version. I will report back if the errors occur more. If it does perhaps it has something to do with the same reason that GC refuses if the yy_parse stack is on the stack [?] whatever that means, anyway. Thanks! -=r
on 2009-02-21 10:57
Hi,
On Friday 20 of February 2009 23:14:13 Roger Pack wrote:
> The moon is in a good phase. LOL.
> It does seem more stable using the latest version. I will report back
> if the errors occur more.
Turns out, good moon phases end right after writing positive feedback emails :)
Feb 15 ruby-mbari runs the full test suite with the same errors as unpatched ruby on my machine, but it still segfaults on certain tests. E.g. running "test/runner.rb net" several times in a row quickly results in a segfault.
P.S. I wrote a small script to see how ruby works with fork's copy-on-write mechanism. It allocates an array of 1 million floats, then forks, and in the child starts rewriting the array in batches (batch size is ARGV[0]). It gives completely different results for mbari ruby, and I'd be glad if someone could explain why :)
-- Michal
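Michal's attached script is not reproduced in the thread. As a rough illustration only, a probe of the kind he describes might look like the sketch below. Every name in it (`rss_kb`, `rewrite_in_batches`, `run_cow_probe`) is hypothetical, the memory reading assumes a Linux-style /proc filesystem, and the code targets a current Ruby rather than 1.8.

```ruby
# Sketch of a fork/copy-on-write probe. All names here are hypothetical.

# Resident set size of this process in kB, or nil where /proc is unavailable.
def rss_kb
  status = "/proc/#{Process.pid}/status"
  return nil unless File.exist?(status)
  File.read(status)[/VmRSS:\s+(\d+)/, 1]&.to_i
end

# Rewrite every element of +array+ in place, +batch_size+ elements at a time.
# Yields the number of batches finished so far and returns the total.
def rewrite_in_batches(array, batch_size)
  batches = 0
  0.step(array.size - 1, batch_size) do |start|
    stop = [start + batch_size, array.size].min
    (start...stop).each { |i| array[i] += 1.0 }
    batches += 1
    yield batches if block_given?
  end
  batches
end

# Allocate in the parent, rewrite in a forked child, report over a pipe.
# Returns one "batch_number rss" line per batch.
def run_cow_probe(size, batch_size)
  data = Array.new(size) { |i| i.to_f }
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    rewrite_in_batches(data, batch_size) do |n|
      writer.puts "#{n} #{rss_kb.inspect}"
    end
    writer.close
    exit!(0)
  end
  writer.close
  lines = reader.read.split("\n")
  reader.close
  Process.wait(pid)
  lines
end

if __FILE__ == $0
  batch = ARGV[0].to_i
  batch = 100_000 if batch <= 0
  run_cow_probe(1_000_000, batch).each { |line| puts line }
end
```

Run with the batch size as the first argument. If the child's resident size climbs as batches complete, pages inherited from the parent are being copied as the child writes to them.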
on 2009-02-21 18:15
Roger, I was unaware of the interaction between YYSTACK_USE_ALLOCA and the MBARI patches. Does anyone have a test case I can debug? - brent
on 2009-02-21 20:36
Michal,
What you are seeing in unpatched ruby is memory leaking between your
"passes" in array_test.rb.
This is just another manifestation of the same leak that occurs with
unpatched ruby and the script:
loop do
@x=callcc{|c|c}
end
(see leakcheck.rb in the MBARIpatches tarball and the innocent redmine
entry at the top of this endless thread)
The uninitialized stack for iteration n+1 contains old (dead) object
references from
iteration n. The GC strings them all together into a linked list of
object
references.
It therefore cannot collect any of them until the whole loop terminates.
The stack clearing patches break this bogus chain of stale object
reference
links and
thus allow the GC to properly identify refs from previous iterations of
the
loop as
being "dead".
I pushed an update to the patches onto github last night that seems to
improve stability of the MBARI patches on the x86_64 platform. Other
platforms seem to be working great, but the x86_64 still exhibits
vexing, very occasional segfaults.
I'll be working on it through this rainy weekend. If I can see it, I'm
confident I can (eventually)
fix it.
- brent
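The chain of stale references described in this post can be illustrated with a toy model. This is not MRI's collector, only a sketch under simplifying assumptions: the "stack" is a single scratch slot, each captured continuation stores a copy of that stack, and marking follows whatever ids it finds in the roots and in the saved copies. The names `ToyContinuation`, `mark` and `surviving_ids` are invented for the illustration.

```ruby
# Toy model of conservative marking; it does not reflect MRI internals.
ToyContinuation = Struct.new(:id, :saved_stack)

# Return the ids reachable from +roots+, following saved stack copies.
def mark(roots, heap)
  marked = {}
  pending = roots.compact
  until pending.empty?
    id = pending.pop
    next if marked[id]
    marked[id] = true
    pending.concat(heap[id].saved_stack.compact)
  end
  marked.keys.sort
end

# Model `loop { @x = callcc { |c| c } }` for a fixed number of iterations.
def surviving_ids(iterations, clear_stack)
  heap = {}
  x = nil            # stands in for @x, the only intended reference
  stack = [nil]      # one scratch slot that the loop body leaves dirty
  iterations.times do |n|
    stack[0] = nil if clear_stack                 # stack clearing wipes the dead slot
    heap[n] = ToyContinuation.new(n, stack.dup)   # capture copies the stack
    stack[0] = n                                  # leftover temporary reference
    x = n
  end
  mark([x] + stack, heap)
end

p surviving_ids(5, false)   # => [0, 1, 2, 3, 4]
p surviving_ids(5, true)    # => [4]
```

Without clearing, each saved stack copy still names the previous continuation, so all of them stay reachable from the newest one. With the slot wiped before each capture, only the continuation held in `x` survives.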
on 2009-02-21 23:02
> I pushed an update to the patches onto github last night that seems to > improve > stability of the MBARI patches on the x86_64 platform. Others platforms > seem to be working > great, but the x86_64 still has exhibits vexing, very occasional segfaults. > > I'll be working on it through this rainy weekend. If I can see it, I'm > confident I can (eventually) > fix it. I wish I had an easy to reproduce script for it but don't [will keep my eye out for it, though]. As a note, mine was having problems on 32-bit ruby 1.8.7 (2009-2-13 MBARI 7/0x8770 on patchlevel 72) [i686-linux] but that was a slightly older version. I'll update to the latest. Thanks! -=r
on 2009-03-10 21:50
I am continuing to see random segfaults on x86_64, especially with god (http://god.rubyforge.org/), which makes liberal use of threads and forking.
*** glibc detected *** free(): invalid pointer: 0x00000000012b7724 ***
*** glibc detected *** free(): invalid pointer: 0x00000000012b7724 ***
./gems/local/gems/god-0.7.8/bin/../lib/god/event_handler.rb:35: [BUG] Segmentation fault
/custom/lib/ruby/1.8/net/smtp.rb:462: [BUG] Segmentation fault
/custom/lib/ruby/1.8/timeout.rb:92: [BUG] Segmentation fault
./gems/local/gems/god-0.7.12/bin/../lib/god/process.rb:193: [BUG] Segmentation fault
/custom/lib/ruby/1.8/net/http.rb:439: [BUG] Segmentation fault
#0 0x00007f7d5efa307b in raise () from /lib/libc.so.6
#1 0x00007f7d5efa484e in abort () from /lib/libc.so.6
#2 0x00007f7d5f596410 in rb_bug (fmt=0x7f7d5f62c195 "Segmentation fault") at error.c:213
#3 0x00007f7d5f5fd2af in sigsegv (sig=<value optimized out>) at signal.c:634
#4 0x00007f7d5efa3110 in killpg () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()
So far I've been unable to come up with a reproducible test case, but I've managed to narrow the problem down to mbari2. Vanilla ruby 1.8.7 does not have this issue, whereas 1.8.7+mbari2 will segfault randomly every few days.
Perhaps it is worth backporting thread anchors from ruby 1.8 HEAD?
Aman
on 2009-03-11 04:49
Aman,
When I merge the MBARI patches with 1.8 HEAD, I also plan to replace the stack optimization introduced in the MBARI2 patch with the (better) thread anchors already in HEAD (which, I think, were originally backported from 1.9). This should happen in the next week or so.
In the meantime, you might want to try this patch against the current (MBARI 8B) patches on 1.8.6 or 1.8.7:
http://www.nabble.com/file/p22385077/rmMBARI2.patch
It just disables the MBARI2 patch and leaves the rest intact. It would be very helpful to find out whether or not that alone eliminates God's segfaults. Will you give this a try?
If it works, I'll do an 8C patch to replace the stack splicing of MBARI2 with stack anchors on 1.8.7-p72 and perhaps 1.8.6-p287 as well.
- brent
on 2009-03-11 12:11
> does not have this issue, whereas 1.8.7+mbari2 will segfault randomly > every few days. Perhaps valgrind would help? -=r
on 2009-03-20 06:47
I can confirm that removing mbari2 fixes the issue. I was able to get a better stack trace, but am still unsure about the root cause and unable to reproduce it consistently. It seems like a double free is occurring for some reason and that eventually causes the segfault. *** glibc detected *** free(): invalid pointer: 0x0000000002312734 *** *** glibc detected *** free(): invalid pointer: 0x0000000002312734 *** Core was generated by `ruby gems/local/gems/god-0.7.8/bin/god'. Program terminated with signal 6, Aborted. #0 0x00007fc3d0cbb07b in raise () from /lib/libc.so.6 (gdb) bt #0 0x00007fc3d0cbb07b in raise () from /lib/libc.so.6 #1 0x00007fc3d0cbc84e in abort () from /lib/libc.so.6 #2 0x00007fc3d0cf15f9 in __fsetlocking () from /lib/libc.so.6 #3 0x00007fc3d0cf8163 in mallopt () from /lib/libc.so.6 #4 0x00007fc3d0cf81ee in free () from /lib/libc.so.6 #5 0x00007fc3d134b4b8 in time_free (tobj=0x2312734) at time.c:43 #6 0x00007fc3d12dfed9 in rb_gc_call_finalizer_at_exit () at gc.c:2324 #7 0x00007fc3d12b6fd9 in ruby_finalize_1 () at eval.c:1561 #8 0x00007fc3d12b7146 in ruby_cleanup (ex=0) at eval.c:1598 #9 0x00007fc3d12b733c in ruby_stop (ex=0) at eval.c:1653 #10 0x00007fc3d1317306 in rb_f_fork (obj=140478970802560) at process.c:1343 #11 0x00007fc3d12c425a in call_cfunc (func=0x7fc3d1317286 <rb_f_fork>, recv=140478970802560, len=0, argc=0, argv=0x0) at eval.c:5759 #12 0x00007fc3d12c3535 in rb_call0 (klass=140479007795520, recv=140478970802560, id=5321, oid=5321, argc=0, argv=0x0, body=0x7fc3d159dae8, flags=2) at eval.c:5911 #13 0x00007fc3d12c4d84 in rb_call (klass=140479007795520, recv=140478970802560, mid=5321, argc=0, argv=0x0, scope=1, self=140478970802560) at eval.c:6158 #14 0x00007fc3d12bc82b in rb_eval (self=140478970802560, n=0x7fc3cfde5f18) at eval.c:3508 #15 0x00007fc3d12bb0d2 in rb_eval (self=140478970802560, n=0x7fc3cfde5f40) at eval.c:3223 #16 0x00007fc3d12bd827 in rb_eval (self=140478970802560, n=0x7fc3cfde5d60) at eval.c:3678 #17 0x00007fc3d12bb8dc in 
rb_eval (self=140478970802560, n=0x7fc3cfde57c0) at eval.c:3357 #18 0x00007fc3d12ba068 in rb_eval (self=140478970802560, n=0x7fc3cfde6878) at eval.c:2962 #19 0x00007fc3d12c3dfc in rb_call0 (klass=140478982783720, recv=140478970802560, id=38449, oid=38449, argc=0, argv=0x7fffd95ab848, body=0x7fc3cfde6878, flags=0) at eval.c:6062 #20 0x00007fc3d12c4d84 in rb_call (klass=140478982783720, recv=140478970802560, mid=38449, argc=1, argv=0x7fffd95ab840, scope=0, self=140478970803320) at eval.c:6158 #21 0x00007fc3d12bc4f1 in rb_eval (self=140478970803320, n=0x7fc3cfe124a0) at eval.c:3493 #22 0x00007fc3d12c3dfc in rb_call0 (klass=140478982957680, recv=140478970803320, id=38449, oid=38449, argc=0, argv=0x7fffd95ac3f0, body=0x7fc3cfe124a0, flags=0) at eval.c:6062 #23 0x00007fc3d12c4d84 in rb_call (klass=140478982957680, recv=140478970803320, mid=38449, argc=2, argv=0x7fffd95ac3e0, scope=1, self=140478970803320) at eval.c:6158 #24 0x00007fc3d12bc82b in rb_eval (self=140478970803320, n=0x7fc3cfe135d0) at eval.c:3508 #25 0x00007fc3d12ba068 in rb_eval (self=140478970803320, n=0x7fc3cfe12900) at eval.c:2962 #26 0x00007fc3d12c3dfc in rb_call0 (klass=140478982957680, recv=140478970803320, id=24553, oid=24553, argc=0, argv=0x7fffd95ad6f8, body=0x7fc3cfe12900, flags=0) at eval.c:6062 ---Type <return> to continue, or q <return> to quit--- #27 0x00007fc3d12c4d84 in rb_call (klass=140478982957680, recv=140478970803320, mid=24553, argc=1, argv=0x7fffd95ad6f0, scope=0, self=140478970803320) at eval.c:6158 #28 0x00007fc3d12bc4f1 in rb_eval (self=140478970803320, n=0x7fc3d0b50068) at eval.c:3493 #29 0x00007fc3d12ba068 in rb_eval (self=140478970803320, n=0x7fc3d0b42648) at eval.c:2962 #30 0x00007fc3d12c3dfc in rb_call0 (klass=140478996330640, recv=140478970803320, id=24537, oid=24537, argc=0, argv=0x7fffd95aea48, body=0x7fc3d0b42648, flags=0) at eval.c:6062 #31 0x00007fc3d12c4d84 in rb_call (klass=140478996330640, recv=140478970803320, mid=24537, argc=1, argv=0x7fffd95aea40, scope=0, 
self=140478970803320) at eval.c:6158 #32 0x00007fc3d12bc4f1 in rb_eval (self=140478970803320, n=0x7fc3d0af5500) at eval.c:3493 #33 0x00007fc3d12bb651 in rb_eval (self=140478970803320, n=0x7fc3d0b0bbc0) at eval.c:3309 #34 0x00007fc3d12c3dfc in rb_call0 (klass=140478996330640, recv=140478970803320, id=26833, oid=26833, argc=0, argv=0x7fffd95afd78, body=0x7fc3d0b0bbc0, flags=0) at eval.c:6062 #35 0x00007fc3d12c4d84 in rb_call (klass=140478996330640, recv=140478970803320, mid=26833, argc=1, argv=0x7fffd95afd70, scope=0, self=140478970802960) at eval.c:6158 #36 0x00007fc3d12bc4f1 in rb_eval (self=140478970802960, n=0x7fc3cfe1cd88) at eval.c:3493 #37 0x00007fc3d12ba068 in rb_eval (self=140478970802960, n=0x7fc3cfe1ca18) at eval.c:2962 #38 0x00007fc3d12c3dfc in rb_call0 (klass=140478983025360, recv=140478970802960, id=26777, oid=26777, argc=0, argv=0x0, body=0x7fc3cfe1ca18, flags=0) at eval.c:6062 #39 0x00007fc3d12c4d84 in rb_call (klass=140478983025360, recv=140478970802960, mid=26777, argc=0, argv=0x0, scope=0, self=140478970802960) at eval.c:6158 #40 0x00007fc3d12bc4f1 in rb_eval (self=140478970802960, n=0x7fc3cfe1dfd0) at eval.c:3493 #41 0x00007fc3d12bb651 in rb_eval (self=140478970802960, n=0x7fc3cfe1d8c8) at eval.c:3309 #42 0x00007fc3d12c0e81 in rb_yield_0 (val=6, self=140478970802960, klass=0, flags=0, avalue=0) at eval.c:5083 #43 0x00007fc3d12c1553 in loop_i () at eval.c:5216 #44 0x00007fc3d12c2316 in rb_rescue2 (b_proc=0x7fc3d12c152e <loop_i>, data1=0, r_proc=0, data2=0) at eval.c:5480 #45 0x00007fc3d12c15ca in rb_f_loop () at eval.c:5241 #46 0x00007fc3d12c425a in call_cfunc (func=0x7fc3d12c1593 <rb_f_loop>, recv=140478970802960, len=0, argc=0, argv=0x0) at eval.c:5759 #47 0x00007fc3d12c3535 in rb_call0 (klass=140479007795520, recv=140478970802960, id=4121, oid=4121, argc=0, argv=0x0, body=0x7fc3d15b6b88, flags=2) at eval.c:5911 #48 0x00007fc3d12c4d84 in rb_call (klass=140479007795520, recv=140478970802960, mid=4121, argc=0, argv=0x0, scope=1, 
self=140478970802960) at eval.c:6158 #49 0x00007fc3d12bc82b in rb_eval (self=140478970802960, n=0x7fc3cfe1d850) at eval.c:3508 #50 0x00007fc3d12bb0d2 in rb_eval (self=140478970802960, n=0x7fc3cfe1d828) at eval.c:3223 #51 0x00007fc3d12c0e81 in rb_yield_0 (val=140478970802760, self=140478970802960, klass=0, flags=1, avalue=2) at eval.c:5083 #52 0x00007fc3d12d21d5 in rb_thread_yield (arg=140478970802760, th=0x230b190) at eval.c:12426 #53 0x00007fc3d12d1e60 in rb_thread_start_0 (fn=0x7fc3d12d20f3 <rb_thread_yield>, arg=0x7fc3cf273248, th=0x230b190) at eval.c:12344 ---Type <return> to continue, or q <return> to quit--- #54 0x00007fc3d12d2327 in rb_thread_initialize (thread=140478970802800, args=140478970802760) at eval.c:12500 #55 0x00007fc3d12c4223 in call_cfunc (func=0x7fc3d12d2257 <rb_thread_initialize>, recv=140478970802800, len=-2, argc=0, argv=0x0) at eval.c:5753 #56 0x00007fc3d12c3535 in rb_call0 (klass=140479007761480, recv=140478970802800, id=2961, oid=2961, argc=0, argv=0x0, body=0x0, flags=4) at eval.c:5911 #57 0x00007fc3d12c3535 in rb_call0 (klass=140479007761480, recv=140478968811240, id=333, oid=333, argc=2, argv=0x7fffd95b43b0, body=0x7fc3d15b1890, flags=0) at eval.c:5911 #58 0x00007fc3d12c4d84 in rb_call (klass=140479007761480, recv=140478968811240, mid=333, argc=2, argv=0x7fffd95b43b0, scope=0, self=140478969580760) at eval.c:6158 #59 0x00007fc3d12bb0d2 in rb_eval (self=140478969580760, n=0x7fc3d0750d50) at eval.c:3223 #60 0x000000000256edd0 in ?? () #61 0x000000000256f068 in ?? () #62 0x00007fffd95b4bd0 in ?? () #63 0x00007fffd95bbd90 in ?? () #64 0x0000000000000007 in ?? () #65 0x00007fffd95b4df0 in ?? () #66 0x00007fc3d12d02e3 in rb_thread_schedule () at eval.c:11251 Previous frame inner to this frame (corrupt stack?) 
(gdb) define rb_trace
> set $frame = ruby_frame
> while $frame
> set $node = $frame->node
> print $node->nd_file
> print ((unsigned int)(($node->flags>>19)&35184372088831)) # nd_line macro
> set $frame = $frame->prev
> end
>end
(gdb) rb_trace
$16 = 0x253cc31 "./gems/local/gems/god-0.7.8/bin/../lib/god/process.rb"
$17 = 215
$18 = 0x250ff11 "./gems/local/gems/god-0.7.8/bin/../lib/god/watch.rb"
$19 = 154
$20 = 0x250ff11 "./gems/local/gems/god-0.7.8/bin/../lib/god/watch.rb"
$21 = 117
$22 = 0x2393c51 "./gems/local/gems/god-0.7.8/bin/../lib/god/task.rb"
$23 = 171
$24 = 0x2393c51 "./gems/local/gems/god-0.7.8/bin/../lib/god/task.rb"
$25 = 344
$26 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$27 = 68
$28 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$29 = 41
$30 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$31 = 36
$32 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$33 = 36
$34 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$35 = 35
$36 = 0x2507e61 "./gems/local/gems/god-0.7.8/bin/../lib/god/driver.rb"
$37 = 35
God uses a double-fork to spawn processes, and it looks like the double free usually occurs when the first forked process (in process.rb:215) dies. God also uses a C extension (http://github.com/mojombo/god/blob/master/ext/god/...) which could be causing issues across the fork.
Aman
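The large constant in the rb_trace macro is easier to read as a bit mask: 35184372088831 is 2^45 - 1, which is exactly the number of bits left in a 64-bit word after the 19-bit shift used in the same expression. The sketch below just replays that arithmetic in Ruby. The constant and method names are made up for illustration, and the 64-bit word size is an assumption based on the x86_64 backtrace above.

```ruby
# Illustration of the shift-and-mask in the gdb expression; names are hypothetical.
WORD_BITS  = 64   # assumption: x86_64 build
LINE_SHIFT = 19   # shift used in the gdb expression
LINE_MASK  = (1 << (WORD_BITS - LINE_SHIFT)) - 1

# Recover the line number stored above the low LINE_SHIFT bits of a flags word.
def node_line(flags)
  (flags >> LINE_SHIFT) & LINE_MASK
end

# Pack line 215 above nineteen set flag bits, then read it back.
flags = (215 << LINE_SHIFT) | 0x7ffff
puts LINE_MASK          # prints 35184372088831
puts node_line(flags)   # prints 215
```

On a 32-bit build the same shift would leave only 13 bits, so the mask in the macro would need to change accordingly.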
on 2009-03-20 07:11
Aman,
It's quite possible that the double-frees are occurring both with and without the MBARI2 patch, but they are not causing segfaults unless MBARI2 is applied. You may want to try using valgrind or some similar tool to catch the double frees. (valgrind is really very good at this)
A few days ago, I pushed a branch to my github repo with the MBARI patches applied to ruby_1_8 head. These patches use the "thread anchors" backported (by Nobu, I believe) from 1.9. It seems to be a bit slower than my approach, but it may well be more robust.
The branch is called ruby_1_8-mbari:
git://github.com/brentr/matzruby.git
http://github.com/brentr/matzruby/commits/ruby_1_8-mbari/
It is a dev version, but this snapshot did pass the bundled ruby test suite and all my tests as well. It would give you the benefits of the MBARI2 patch via the thread anchors.
I'd really be most interested in finding out whether there are still double-frees happening. Let me know what you find. If the double-frees only happen with MBARI2 applied, I'll consider replacing MBARI2 with the thread anchors from 1.8.8-dev.
- brent
on 2009-03-31 01:06
I've had no more issues since reverting mbari2. I'm able to reproduce
the segfault on my mac:
ruby(83833) malloc: *** error for object 0x152e6d4: Non-aligned
pointer being freed
*** set a breakpoint in malloc_error_break to debug
ruby(83833) malloc: *** error for object 0x152d154: Non-aligned
pointer being freed
*** set a breakpoint in malloc_error_break to debug
ruby(83891) malloc: *** error for object 0x152e6d4: Non-aligned
pointer being freed
*** set a breakpoint in malloc_error_break to debug
ruby(83891) malloc: *** error for object 0x152d154: Non-aligned
pointer being freed
*** set a breakpoint in malloc_error_break to debug
./gems/local/gems/god-0.7.8/bin/../lib/god/process.rb:183: [BUG] Bus
Error
/opt/ruby-fiber/lib/ruby/1.8/net/http.rb:439: [BUG] Segmentation fault
Using gdb to breakpoint malloc_error_break shows:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000016
blk_copy_prev (block=0x15623b0) at eval.c:8549
8549 for (vars = tmp->dyna_vars; vars; vars = vars->next) {
(gdb) bt
#0 blk_copy_prev (block=0x15623b0) at eval.c:8549
#1 0x00020697 in proc_alloc (klass=1236720, proc=0) at eval.c:8773
#2 0x00022ea7 in rb_eval (self=6141440, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3780
#3 0x00026480 in rb_call0 (klass=6141380, recv=6141440, id=5313,
oid=5313, argc=0, argv=0xbfff4268, body=0x5266d8, flags=<value
temporarily unavailable, due to optimizations>) at eval.c:6130
#4 0x000269dc in rb_call (klass=6141380, recv=6141440, mid=5313,
argc=2, argv=0xbfff4260, scope=0, self=18370740) at eval.c:6233
#5 0x00024002 in rb_eval (self=18370740, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3517
#6 0x000253d6 in rb_eval (self=18370740, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3242
#7 0x00024be4 in rb_eval (self=18370740, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3328
#8 0x00026480 in rb_call0 (klass=6122780, recv=18370740, id=8457,
oid=8457, argc=0, argv=0x0, body=0x52b750, flags=<value temporarily
unavailable, due to optimizations>) at eval.c:6130
#9 0x000269dc in rb_call (klass=6122780, recv=18370740, mid=8457,
argc=0, argv=0x0, scope=0, self=18375120) at eval.c:6233
#10 0x00024002 in rb_eval (self=18375120, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3517
#11 0x00023d15 in rb_eval (self=18375120, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3702
#12 0x00026480 in rb_call0 (klass=5519380, recv=18375120, id=27009,
oid=27009, argc=0, argv=0xbfff5344, body=0x5474dc, flags=<value
temporarily unavailable, due to optimizations>) at eval.c:6130
#13 0x000269dc in rb_call (klass=5519380, recv=18375120, mid=27009,
argc=1, argv=0xbfff5340, scope=0, self=18374940) at eval.c:6233
#14 0x00024002 in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3517
#15 0x0002296e in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:2966
#16 0x00026480 in rb_call0 (klass=5500380, recv=18374940, id=26953,
oid=26953, argc=0, argv=0x0, body=0x540b50, flags=<value temporarily
unavailable, due to optimizations>) at eval.c:6130
#17 0x000269dc in rb_call (klass=5500380, recv=18374940, mid=26953,
argc=0, argv=0x0, scope=0, self=18374940) at eval.c:6233
#18 0x00024002 in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3517
#19 0x00024be4 in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3328
#20 0x0002a8d1 in rb_yield_0 (val=<value temporarily unavailable, due
to optimizations>, self=18374940, klass=0, flags=0, avalue=0) at
eval.c:5116
#21 0x0002c85d in loop_i () at eval.c:5249
#22 0x0001aa53 in rb_rescue2 (b_proc=0x2c820 <loop_i>, data1=0,
r_proc=0, data2=0) at eval.c:5513
#23 0x0001ab57 in rb_f_loop () at eval.c:5274
#24 0x00025adf in rb_call0 (klass=1301660, recv=18374940, id=4121,
oid=4121, argc=-1073782088, argv=0x0, body=0x13bdc0, flags=<value
temporarily unavailable, due to optimizations>) at eval.c:5951
#25 0x000269dc in rb_call (klass=1301660, recv=18374940, mid=4121,
argc=0, argv=0x0, scope=1, self=18374940) at eval.c:6233
#26 0x00022ffd in rb_eval (self=<value temporarily unavailable, due to
optimizations>, n=<value temporarily unavailable, due to
optimizations>) at eval.c:3532
#27 0x000253d6 in rb_eval (self=18374940, n=<value temporarily
unavailable, due to optimizations>) at eval.c:3242
#28 0x0002a8d1 in rb_yield_0 (val=<value temporarily unavailable, due
to optimizations>, self=18374940, klass=0, flags=0, avalue=2) at
eval.c:5116
#29 0x0002d8cb in rb_thread_start_0 (fn=0x2abc0 <rb_thread_yield>,
arg=0x11860b8, th=0x875a00) at eval.c:12408
#30 0x00025adf in rb_call0 (klass=1284640, recv=18374860, id=2961,
oid=3221192596, argc=1093632, argv=0xbfff6ee8, body=0x23f17,
flags=<value temporarily unavailable, due to optimizations>) at
eval.c:5951
#31 0x01184934 in ?? ()
I will try to dig into the issue a bit more with valgrind, etc.
Aman
on 2009-03-31 02:15
Aman, could you reduce this repeatable failure to a script I could easily run to reproduce it here? My main machine is a Mac mini running Linux, but I can always reboot it into OS X. I've got access to PPC Macs too, but they only run OS X. - brent
on 2009-04-21 16:50
Issue #744 has been updated by Roger Pack. Is anybody still getting segfaults with the latest MBARI patches? They are working well for me; at least I haven't run into the segfaults of last Dec./Jan. for quite a while. Thanks. -=r ---------------------------------------- http://redmine.ruby-lang.org/issues/show/744
on 2009-05-06 05:25
Roger, I have no outstanding problem reports aside from Aman's issues with God on x86_64 reported here over a month ago. I'm still hoping he can distill this failure into something I can replicate and fix. I merged the full patch set into the 1.8 trunk in mid-March, but I've had no feedback from the core developers since then. - brent
on 2009-05-06 10:53
Hi,
At Wed, 6 May 2009 12:25:12 +0900, Brent Roman wrote in [ruby-core:23365]:
> I merged the full patch set into the 1.8 trunk in mid-March, but I've had no
> feedback from the core developers since then.
Sorry to be late, but I have to resolve conflicts after it and split directly irrelevant changes.