NAME dike SYNOPSIS a simple memory leak detector for ruby with preconfigured rails hooks. INSTALL gem install dike URIS http://www.codeforpeople.com/lib/ruby/ http://rubyforge.org/projects/codeforpeople/ DESCRIPTION the concept behind dike is simple: Object is extended in order that the location of each object's creation is tracked. a summarizer command is given to walk ObjectSpace using each object's class and the location if it's creation to detect memory leaks. not all leaks can be detected and some that are may not really be leaks, but dike provides a simple way to see the hotspots in your code that may potentially be leaking. EXAMPLES PURE RUBY require 'dike' Dike.log STDERR # the default Thread.new do sleep 4.2 and Dike.finger end Main.start RAILS file:RAILS_ROOT/config/environment.rb ... require 'dike' shell: ./script/server shell: curl --silent http://localhost:3000 shell: cat ./log/dike/0 --- - class: String count: 90769 trace: [] - class: Array count: 18931 trace: [] - class: Class count: 2 trace: - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:222:in `class_factory' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:220:in `each' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:220:in `class_factory' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:248:in `Widget' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets/page/base.rb:1 - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:31:in `require' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:31:in `load' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:16:in `for_controller' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:243:in `widget' - /Users/ahoward/site/votelink.com/public/../config/../app/ controllers/application.rb:150 ... shell: curl --silent http://localhost:3000 shell: cat ./log/dike/1 --- - class: String count: 100769 trace: [] - class: Array count: 19931 trace: [] - class: Class count: 5 trace: - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:222:in `class_factory' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:220:in `each' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:220:in `class_factory' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:248:in `Widget' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets/page/base.rb:1 - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:31:in `require' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:31:in `load' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:16:in `for_controller' - /Users/ahoward/site/votelink.com/public/../config/../lib/ widgets.rb:243:in `widget' - /Users/ahoward/site/votelink.com/public/../config/../app/ controllers/application.rb:150 ... shell: dike ./log/dike --- - class: Proc count: 65 trace: - /opt/local/lib/ruby/1.8/cgi/session.rb:165:in `new' - /opt/local/lib/ruby/1.8/cgi/session.rb:165:in `callback' - /opt/local/lib/ruby/1.8/cgi/session.rb:299:in `initialize' - /Users/ahoward/src/ruby/dike/dike-0.0.1/lib/dike.rb:233:in `new' - /Users/ahoward/src/ruby/dike/dike-0.0.1/lib/dike.rb:233:in `call' - /Users/ahoward/src/ruby/dike/dike-0.0.1/lib/dike.rb:233:in `new' - /Users/ahoward/site/votelink.com/public/../config/../vendor/ rails/actionpack/lib/action_contr oller/cgi_process.rb:123:in `session' ... SUMMARY * the 'Dike.finger' method dumps it's log in a format showing class : the class of object being leaked/allocated count : the number of instances leaked from the trace location trace : the trace location of object birth * loading into a rails environment causes snapshots of the above format to be dumped into RAILS_ROOT/log/dike/ after each request. each snapshot is incrementally numbered 0, 1, ... * the 'dike' command line tool can be used in two ways dike directory/with/logs/dike/ dike old_dump new_dump if given a directory 'old_dump' and 'new_dump' are auto- calculated by scanning the directory. in either case the tool dups a delta running old -->> new. the delta shows only changes from old to new, so a line like - class: Proc count: 3 ... means that 3 Proc objects were created between the two dumps. note that, when given a directory, the default old and new dumps are the oldest and newest dumps respectively, to get fine grained information sumarizing the changes between two requests give the files manually, for example dike ./log/dike/41 ./log/dike/42 * options that affect logging - Dike.filter pattern pattern must respond to '===' and each object in ObjectSpace will be compared against it. for example Dike.filter Array would cause logging to restrict itself to Array, or sublcasses of Array, only - Dike.log io set the dike logging object. the object should respond to 'puts'. - Dike.logfactory directory cause logging to occur into a new log for each call the 'Dike.finger'. the logs will be auto numbered 0, 1, ... LIMITATIONS not all object creation can be tracked and not all leaks are reported. AUTHOR ara [dot] t [dot] howard [at] gmail [dot] com enjoy. a @ http://drawohara.com/
on 28.09.2007 06:10
on 28.09.2007 19:26
ara.t.howard wrote: > > NAME > > dike > > SYNOPSIS > > a simple memory leak detector for ruby with preconfigured rails hooks. That looks very useful. I'm not sure "leak" is the term I would use, since the objects are reachable. Unused reachable objects can be just as much of a problem as unreachable objects, of course. If you have Strings created in two places, are they reported separately in the output? I assume so, but I couldn't tell from your examples. If you're looking for trouble spots, it would be useful to know not just that there were N strings floating around, but that N1 of them were created on line L1, N2 on line L2, and so on. Does this have much of a cost on performance? (It doesn't use trace_func, does it?) I'd like to have something like this enabled by default for long-running deployed processes, not just in debugging.
on 28.09.2007 19:38
On Sep 28, 2007, at 11:26 AM, Joel VanderWerf wrote: > That looks very useful. I'm not sure "leak" is the term I > would use, since the objects are reachable. Unused reachable > objects can be just as much of a problem as unreachable objects, of > course. > agree. we seem to be the minority though ;-( > If you have Strings created in two places, are they reported > separately in the output? I assume so, but I couldn't tell from > your examples. If you're looking for trouble spots, it would be > useful to know not just that there were N strings floating around, > but that N1 of them were created on line L1, N2 on line L2, and so on. > yeah, that's *exactly* what it does. of course some objects, like a = [], do not allow ruby hooks to track them so they just get reported under the general stacktrace of '[]', but at least they are reported. > Does this have much of a cost on performance? (It doesn't use > trace_func, does it?) I'd like to have something like this enabled > by default for long-running deployed processes, not just in debugging. > not really. it addes hooks into Object#initialize and Class#new to make sure the creation is stored via caller. the location is stored in a big hash with finalizers to remove entries (object_id => stacktrace of creation). no reporting is done unless you call Dike.finger which, with the rails hooks, is done after every request. so i wouldn't say it has now performance hit, but it should be a lowish hit. i'd be very happy to get some feed back. so far it's been very useful - i've been able to find several leaks in rails (i think) which i'm looking into now. it found 5 leaks in my app and i was able to fix them easily in about 10 minutes, so that's nice. cheers. a @ http://drawohara.com/
on 28.09.2007 19:43
On 9/28/07, ara.t.howard <ara.t.howard@gmail.com> wrote: > > > request. so i wouldn't say it has now performance hit, but it should > be a lowish hit. > > i'd be very happy to get some feed back. so far it's been very > useful - i've been able to find several leaks in rails (i think) > which i'm looking into now. it found 5 leaks in my app and i was > able to fix them easily in about 10 minutes, so that's nice. ruby-prof includes a really nice object allocation profiler (with a small patch to 1.8.x) that may help verify the results you're seeing. It increments a live object count on allocation and decrements on garbage collection and runs super-fast. Best, jeremy
on 28.09.2007 19:45
ara.t.howard wrote: ... > not really. it addes hooks into Object#initialize and Class#new to make > sure the creation is stored via caller. Does this mean you have to make sure your initialize methods super all the way up to Object? Or does the Class#new catch it anyway even if you don't?
on 28.09.2007 20:16
On Sep 28, 2007, at 11:43 AM, Jeremy Kemper wrote: > > ruby-prof includes a really nice object allocation profiler (with a > small patch to 1.8.x) that may help verify the results you're seeing. > It increments a live object count on allocation and decrements on > garbage collection and runs super-fast. > thanks! i've actually used that before and even taught a class that used it - but i'd forgotten the name! cheers. a @ http://drawohara.com/
on 28.09.2007 20:20
On Sep 28, 2007, at 11:44 AM, Joel VanderWerf wrote: > Does this mean you have to make sure your initialize methods super > all the way up to Object? Or does the Class#new catch it anyway > even if you don't? class ::Class Methods = instance_methods.inject(Hash.new){|h, m| h.update m => instance_method(m)} Dike.ignore Methods Methods.each{|k,v| Dike.ignore k, v} verbose = $VERBOSE begin $VERBOSE = nil def new *a, &b object = Methods["new"].bind(self).call *a, &b ensure Dike.mark_birth object, caller rescue nil end def allocate *a, &b object = Methods["allocate"].bind(self).call *a, &b ensure Dike.mark_birth object, caller rescue nil end ensure $VERBOSE = verbose end end class ::Module Methods = instance_methods.inject(Hash.new){|h, m| h.update m => instance_method(m)} Dike.ignore Methods Methods.each{|k,v| Dike.ignore k, v} end .... module Dike class << self Objects = Hash.new def mark_birth object, stacktrace return if Objects[object_id] object_id = Object::Methods["object_id"].bind(object).call Objects[object_id] = stacktrace ObjectSpace.define_finalizer object, &mark_birth_finalizer (object_id) end .... so, basically, i 'super' for you and never mark the birth twice since it can occur in Object.new, Class.allocate, or Class.new. cheers. a @ http://drawohara.com/
on 28.09.2007 20:22
On Sep 28, 2007, at 11:43 AM, Jeremy Kemper wrote: > > ruby-prof includes a really nice object allocation profiler (with a > small patch to 1.8.x) that may help verify the results you're seeing. > It increments a live object count on allocation and decrements on > garbage collection and runs super-fast. forgot to mention that i'm seeing a weird potential leak out of cgi/ session.rb, but it seems hard to imagine that would have gone un- noticed. being able to verify would be nice. any change you've heard of this? cheers. a @ http://drawohara.com/
on 28.09.2007 20:45
On 9/28/07, ara.t.howard <ara.t.howard@gmail.com> wrote: > On Sep 28, 2007, at 11:43 AM, Jeremy Kemper wrote: > > ruby-prof includes a really nice object allocation profiler (with a > > small patch to 1.8.x) that may help verify the results you're seeing. > > It increments a live object count on allocation and decrements on > > garbage collection and runs super-fast. > > forgot to mention that i'm seeing a weird potential leak out of cgi/ > session.rb, but it seems hard to imagine that would have gone un- > noticed. being able to verify would be nice. any change you've > heard of this? CGI::Session uses a finalizer to automatically flush changes when it goes out scope. Perhaps that's what you're seeing? In my profiling of a simple Rails app, the live object count is very consistent (no leaks), but wow is it easy to quickly create a lot of garbage to collect. jeremy
on 28.09.2007 22:36
On Sep 28, 2007, at 12:44 PM, Jeremy Kemper wrote: > > CGI::Session uses a finalizer to automatically flush changes when it > goes out scope. Perhaps that's what you're seeing? > yeah that's it. > In my profiling of a simple Rails app, the live object count is very > consistent (no leaks), but wow is it easy to quickly create a lot of > garbage to collect. for me the count goes up. wonder if it's ruby version issue... seems unlikely that could have gone unnoticed.... a @ http://drawohara.com/
on 29.09.2007 13:26
On Sep 28, 4:34 pm, "ara.t.howard" <ara.t.how...@gmail.com> wrote: > > consistent (no leaks), but wow is it easy to quickly create a lot of > > garbage to collect. > > for me the count goes up. wonder if it's ruby version issue... > seems unlikely that could have gone unnoticed.... If using MySQL check the version of the gem you are using. > a @http://drawohara.com/ > -- > we can deny everything, except that we have the possibility of being > better. simply reflect on that. > h.h. the 14th dalai lama -- Aníbal Rojas http://www.hasmanydevelopers.com http://www.rubycorner.com http://www.anibal.tumblr.com
on 30.09.2007 09:19
On Sep 28, 2007, at 12:44 PM, Jeremy Kemper wrote: > > CGI::Session uses a finalizer to automatically flush changes when it > goes out scope. Perhaps that's what you're seeing? > > In my profiling of a simple Rails app, the live object count is very > consistent (no leaks), but wow is it easy to quickly create a lot of > garbage to collect. > this is not what i am seeing. i'm seeing an *extremely* slow leak - it takes ten of thousands of requests to see it. although this might seem OT to people i'm posting here because - dike claims the leak is in cgi/session.rb - it may or may not show the veracity of dike's reporting the whole project is available for download here http://s3.amazonaws.com/drawohara.com.downloads/leak.tgz and i blogged it here http://drawohara.tumblr.com/post/13655303 i'd love to know if other people can reproduce (it takes a *while* to run) cheers. ps. feel free to email me offline with results. a @ http://drawohara.com/
on 01.10.2007 21:48
On 28/09/2007, ara.t.howard <ara.t.howard@gmail.com> wrote:
>
Are there any unreachable objects in Ruby? They should be collected
and go away eventually. So what I am left with is a gigabyte of
reachable objects ... time to optimize space I guess.
Thanks
Michal
on 01.10.2007 22:24
Michal Suchanek wrote: > > Are there any unreachable objects in Ruby? They should be collected > and go away eventually. So what I am left with is a gigabyte of > reachable objects ... time to optimize space I guess. Sure there are. Ruby's GC is conservative. If it sees a number on the stack that looks like the address of an object, it assumes that the number is a pointer (though it could be something else) and it marks the object, even though the object might not be reachable. That's expected behavior. There can also be bugs, of course.
on 02.10.2007 10:55
On 01/10/2007, Joel VanderWerf <vjoel@path.berkeley.edu> wrote: > >> > So if I understand it correctly when I reach the 1G heap size about half of random numbers on stack appear to be possible object pointers, right? Thanks Michal
on 02.10.2007 19:00
Michal Suchanek wrote: ... > So if I understand it correctly when I reach the 1G heap size about > half of random numbers on stack appear to be possible object pointers, > right? It's not that bad: gc can ignore any pointer that points to a location outside of the heap data structures, or is not aligned on a heap entry boundary (entry size is fixed at sizeof(RVALUE)), or a location that is inside a heap but contains an empty (0) entry. IIUC, ruby's heap data structures will normally be much smaller than all the allocated memory for the process.
on 03.10.2007 22:13
On Tue, 2 Oct 2007, Joel VanderWerf wrote: >>> >> >> Are there any unreachable objects in Ruby? They should be collected >> and go away eventually. So what I am left with is a gigabyte of >> reachable objects ... time to optimize space I guess. > > Sure there are. Ruby's GC is conservative. If it sees a number on the stack > that looks like the address of an object, it assumes that the number is a > pointer (though it could be something else) and it marks the object, even > though the object might not be reachable. That's expected behavior. It occurs to me: how does conservative GC prevent the heap from filling up with false positive pointers over time? This would represent a "fully automatic memory leak for long running processes"... *t --
on 04.10.2007 02:20
On Thu, 2007-10-04 at 05:12 +0900, Tomas Pospisek's Mailing Lists wrote: > It occurs to me: how does conservative GC prevent the heap from filling up > with false positive pointers over time? This would represent a "fully > automatic memory leak for long running processes"... Perhaps it's doing some sort of heap compaction? Not sure. There's a discussion of possible memory leaks happening over on ruby-core; see the latest from Slyvain Joyeux here: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/12378 Yours, Tom
on 04.10.2007 04:21
On Oct 3, 2007, at 13:12 , Tomas Pospisek's Mailing Lists wrote: >>> Are there any unreachable objects in Ruby? They should be collected > filling up with false positive pointers over time? This would > represent a "fully automatic memory leak for long running > processes"... You'd have to be very lucky. The values on the stack would have to be properly aligned, as a ruby object is 20 bytes (32 bit), and would have to be a pointer to an object that was large in order to prevent garbage collection of a large structure.