I'm 99% sure the answer to that question is no, but I thought I'd ask anyway :-) Practically speaking it's only of use to me if it can be done on MRI (Ruby 1.8).

Trivially I can do something like this (apologies for the hackiness, and for not using the sysutil gem):

  def memory_used
    File.read("/proc/#$$/status").match(/^VmSize:\s*(\d*)/)[1].to_i
  end

  mem_used = memory_used()
  mem_used_since = memory_used - mem_used  # Returns difference in KB since mem_used was set

But that only tells me something about the Ruby process, not about a particular object.

The reason I ask is because I would like to create a cache object that stores complex objects, but is smart enough to remove the oldest things in the cache when the cache is using more than a certain amount of memory. I could Marshal.dump the objects and work out something crudely that way, but the reason I want the cache in the first place is because Marshal.dump and Marshal.load are too time expensive :-)

Thanks
-Rob
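PS: For concreteness, here's the rough shape of what I'm imagining - an untested sketch with made-up names, which evicts the oldest entries whenever the whole process grows past a KB budget:

  class MemoryBoundedCache
    def initialize(limit_kb)
      @limit_kb = limit_kb
      @store = {}
      @order = []  # insertion order, oldest first (1.8 hashes are unordered)
    end

    def memory_used_kb
      File.read("/proc/#$$/status").match(/^VmSize:\s*(\d+)/)[1].to_i
    end

    def []=(key, value)
      @order.delete(key) if @store.key?(key)
      @store[key] = value
      @order << key
      evict_while_over_budget
    end

    def [](key)
      @store[key]
    end

    private

    def evict_while_over_budget
      # Caveat: VmSize often won't shrink even after GC, since the C heap
      # rarely returns pages to the OS - part of why per-object sizes
      # would be so much nicer to have.
      while memory_used_kb > @limit_kb && @order.size > 1
        @store.delete(@order.shift)
        GC.start
      end
    end
  end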
on 2008-06-17 11:56

on 2008-06-17 12:43
Good question :) I'd like to know that as well
on 2008-06-17 17:55

On 17 Jun., 12:42, Marc Heiler <sheve...@linuxmail.org> wrote:
> Good question :)
>
> I'd like to know that as well

This comes up from time to time. If you think about it for a moment, it is pretty hard to define what "memory used by an object" actually is. The easiest (and probably not very useful) definition is: it is the space used by the instance for storing a reference to its class object and for storing its instance variable references.

Typically you would rather be interested in a particular part of the object graph that is reachable from an instance - and here it becomes difficult, because any object can have any number of references and can be referred to any number of times. Where then do you count the memory? Or do you count it multiple times? etc.

Kind regards

robert
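PS: A toy illustration of that naive "shallow" definition - one machine word for the class pointer plus one per instance variable. The real RObject layout differs, so treat the numbers as illustrative only:

  WORD = 1.size  # bytes per machine word on this platform (4 or 8)

  def shallow_size(obj)
    (1 + obj.instance_variables.size) * WORD
  end

It deliberately ignores everything the instance variables point at, which is exactly why it is rarely the number you actually want.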
on 2008-06-17 19:24

On Tue, Jun 17, 2008 at 5:53 PM, Robert Klemme <shortcutter@googlemail.com> wrote:
> On 17 Jun., 12:42, Marc Heiler <sheve...@linuxmail.org> wrote:
>> Good question :)

I think it could be defined, however I do not know if this is useful - maybe when considering running on low memory devices with a deterministic garbage collector.

I would define the memory used by an object as the number of bytes that have been (re)allocated for it and that could be freed by the GC if there were no more references to that object. I would not care at all whether collecting this object makes other objects collectable.

This, however, is where Robert's probably right: if you have to worry about memory consumption then you have to be pretty sure about the releasability of other objects too. But I humbly disagree with my homonym that the question does not make sense at all ;).

Cheers
Robert

--
http://ruby-smalltalk.blogspot.com/ ---
As simple as possible, but not simpler.
Albert Einstein
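PS: A small example of why that definition makes shared data awkward for the original cache use case (entry_a and entry_b are made-up names):

  shared  = "x" * 1_000_000           # one big string
  entry_a = { :payload => shared, :meta => "a" }
  entry_b = { :payload => shared, :meta => "b" }

By the definition above, neither entry "uses" the megabyte, because evicting either one alone would not let the GC reclaim it.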
on 2008-06-18 08:28

On Tuesday 17 June 2008 10:53:56 Robert Klemme wrote:
> Typically you would rather be interested in a particular part of the
> object graph that is reachable from an instance - and here it becomes
> difficult, because any object can have any number of references and
> can be referred to any number of times. Where then do you count the
> memory? Or do you count it multiple times? etc.

I think this is analogous to a filesystem structure. It is possible to do roughly all of the above with tools like ls and du -- for example, 'ls -l' will show the logical size of each file, but also a total that represents the sum of the actual disk space allocated for each file -- that is, "logical size" might show a file as being larger than the filesystem it's on, so long as it's a sparse file. But it will count hardlinks twice in that total. It also isn't deep -- it might show the space used to store a particular subdirectory, but not the space used by the actual files in that subdirectory.

And then there's 'du', which will show the total, real size used for any directory tree, recursively, counting each inode only once (so hardlinked files aren't counted twice).

I would argue that tools like these would be useful in Ruby, even if there have to be many twitchy options to control how it is counted. But then, I've always been more for letting Ruby take as much RAM as it wants until I run out, and then start optimizing.
on 2008-06-18 08:30

On 17.06.2008 19:23, Robert Dober wrote:
> I think it could be defined, however I do not know if this is useful -
> maybe when considering running on low memory devices with a
> deterministic garbage collector.

Exactly what I mean: a simple definition is more often than not useless for practical purposes, but a useful definition is complex and hard to automate (i.e. if you want to implement a mechanism that reports on used memory).

> I would define the memory used by an object as the number of bytes
> that have been (re)allocated for it and that could be freed by the GC
> if there were no more references to that object. I would not care at
> all whether collecting this object makes other objects collectable.

This is pretty much the definition I posted as well - at least it is what I meant. I like your definition better for its improved technical precision over mine.

> This, however, is where Robert's probably right: if you have to worry
> about memory consumption then you have to be pretty sure about the
> releasability of other objects too.

Exactly.

> But I humbly disagree with my homonym that the question does not make
> sense at all ;).

Um, where exactly did I say that?

Kind regards

robert

PS: Thanks for the "homonym"! Learn something new every day. :-)
on 2008-06-18 11:46

On 18 Jun., 08:26, David Masover <ni...@slaphack.com> wrote:
> I think this is analogous to a filesystem structure. It is possible to do
> roughly all of the above with tools like ls and du

That's a good analogy for reasoning about this topic. But there are differences as well: file systems are typically organized hierarchically, i.e. there are just a few links between different sub trees of the complete tree. This is why tools like "ls" and "du" are useful in practice: you can be pretty sure that only a negligible portion of disk usage is counted more than once. But the topology of an object graph might look totally different, i.e. far more interconnected.

Then there are different types of links - soft links and hard links. This gives file system utilities a means to ignore paths. Usually soft links are much more often used than hard links (in my personal experience anyway), while hard links more closely resemble object references in Ruby.

What files are in a file system, Strings would probably be in the Ruby world (raw allocated sequences of bytes that are not used for referencing). All other objects rather behave like directories, i.e. they hold only references to other objects.

> But then, I've always been more for letting Ruby take as much RAM as it wants
> until I run out, and then start optimizing.

Which is an approach in the pragmatic spirit of Ruby. :-)

I just hacked together a toy memory analyzer. You can find it here: http://www.pastie.org/217131

Kind regards

robert
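PS: In the same spirit, a minimal "du for objects" might look like the sketch below (not the pastie's actual code). It walks the graph recursively, counts each object once by tracking object_ids - the way du counts each inode once - and charges Strings for their payload bytes:

  WORD = 1.size  # bytes per machine word

  def reachable_size(obj, seen = {})
    return 0 if seen[obj.object_id]
    seen[obj.object_id] = true
    size = (1 + obj.instance_variables.size) * WORD
    size += obj.size if obj.is_a?(String)  # payload bytes, roughly
    obj.instance_variables.each do |iv|
      size += reachable_size(obj.instance_variable_get(iv), seen)
    end
    if obj.is_a?(Array)
      obj.each { |e| size += reachable_size(e, seen) }
    elsif obj.is_a?(Hash)
      obj.each { |k, v| size += reachable_size(k, seen) + reachable_size(v, seen) }
    end
    size
  end

Toy caveats: deep recursion can blow the stack, containers other than Array and Hash are treated as plain objects, and the per-object overhead is a guess.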
on 2008-06-18 19:48

On Wednesday 18 June 2008 04:43:50 Robert Klemme wrote:
> This is why tools like "ls" and "du" are useful in practice: you can
> be pretty sure that only a negligible portion of disk usage is
> counted more than once. But the topology of an object graph might
> look totally different, i.e. far more interconnected.

Noting that du won't count hardlink'd files more than once, it seems your point here is that such a tool would be less efficient in Ruby than it is for filesystems?

> Then there are different types of links - soft links and hard links.
> This gives file system utilities a means to ignore paths. Usually
> soft links are much more often used than hard links (in my personal
> experience anyway), while hard links more closely resemble object
> references in Ruby.

However, filesystem utilities can be made to follow softlinks. Also, I'd argue that the choice to use softlinks over hardlinks has nothing to do with making life easier for du, and everything to do with the relative semantics of actual usage here.

Typical example: if you have /bin/gunzip symlinked to gzip, you can always replace gzip with a new version, through the old standard method of making a tempfile, then rename-ing it on top of the original (so as to do an atomic replace). Then gunzip, zcat, and friends will refer to that new version -- whereas with a hardlink, you would have to manually re-link the new version everywhere it's used.

Or, conversely, if you have something which is meant to be edited in-place -- if you truncate a file which has another hardlink, or do a simple append (with the shell >>, say), you never know what you'll be changing in the rest of the filesystem. (That's why we don't often do bang operations on strings, especially those passed in -- or I don't, anyway.)
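To spell out that last parenthetical in Ruby terms - a reference behaves like a hardlink: mutate in place and every holder sees it; rebind (the rename-over trick) and they don't:

  s = "gzip"
  other_ref = s
  s << " -9"    # in-place mutation: other_ref now sees "gzip -9" too
  s = "pigz"    # rebinding: other_ref is untouched, still "gzip -9"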
on 2008-06-18 20:16

> Typically you would rather be interested in a particular part of the
> object graph that is reachable from an instance - and here it becomes
> difficult, because any object can have any number of references and
> can be referred to any number of times. Where then do you count the
> memory? Or do you count it multiple times? etc.

It sounds to me like you want to patch Ruby's garbage collection. Albeit a bit of a dark art, this will probably get you the answer you want. You really want to know the answer to: if I free this object, how much memory will I gain? As a lot of the replies here imply, answering this question is not simple at all - particularly if two of the objects in your cache happen to share one large object between them, in which case freeing either one alone won't gain you much memory.

I believe you essentially wish to modify the GC's mark and sweep algorithm to do graph traversal of the object graph, and to intelligently sum the cumulative size of objects within the graph. However, the object reference graph is not nearly as structured as a file system, and the cascading effect that adding a complex object to a hashtable has on the calculated sizes makes caching this way very difficult.

Back to your original posting...

1) Marshal dumping to determine these sizes. I think this will be faster than running a modified GC algorithm or object graph traversal to try and determine how much memory you'd gain if you freed a particular element. However, in some cases this won't answer your question correctly, due to other code holding references to internal elements of the object you just marshalled. Though if you can guarantee that your objects aren't being referenced externally, this will work well.

2) Figuring out your sizes via process size. I actually think this is your best option. Why not turn your system into a DRb engine? Unfortunately you'll be marshal dumping and loading the whole thing each time, but you'll at least get an accurate answer.

However, what I'm curious about is why do this uber-complex thing in the first place? You'd run into the problem of trying to figure out memory sizes and utilization in almost any language, due to not understanding the thing you're actually caching. Caches are generally designed to store relatively well known things, not giant spaghetti monsters - that's how they gain their efficiency. I mean, imagine a system that caches filehandles; who knows what resources you're actually tying up by holding onto those filehandles?
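A sketch of option 1, with the caveats baked in as comments (TypeError is what Marshal raises for undumpable things like IO or Proc):

  # Rough size proxy: bytes in the serialized form. Slow, and it charges
  # shared sub-objects to every entry that reaches them, but trivial to write.
  def marshal_size(obj)
    Marshal.dump(obj).size
  rescue TypeError
    nil  # undumpable (IO, Proc, ...) - no estimate available
  end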
on 2008-06-18 23:15

On Wed, Jun 18, 2008 at 8:28 AM, Robert Klemme <shortcutter@googlemail.com> wrote:
>> But I humbly disagree with my homonym that the question does not make
>> sense at all ;).
>
> Um, where exactly did I say that?

Stupid me - you did indeed say that it is pretty hard to define, not that it does not make sense. All my apologies. And congrats on the 1:0 win too; you were the better team, but that was easy ;).

Cheers
Robert

--
http://ruby-smalltalk.blogspot.com/ ---
As simple as possible, but not simpler.
Albert Einstein
on 2008-06-18 23:24
What about really simple objects?

  a = "a"
  b = "b"
  c = "c"
on 2008-06-19 04:20

On Jun 18, 4:23 pm, Marc Heiler <sheve...@linuxmail.org> wrote:
> What about really simple objects?

I agree with the previous responses. The overriding concern with this last question seems to be an examination of why you intend to cache the objects. In the case of simple objects, rebuilding objects after the garbage collector frees them is likely not going to cause a bottleneck.

For large, complex objects/structures as described in the original post, defining their logical memory size is a problem encountered in all languages where references/pointers are available. It seems the only generally useful solution to unbounded memory growth is to attempt to instantiate an object, and if the platform or OS won't allocate enough memory for it, attempt to free some space and try again.

For an interesting discussion of how Ruby's garbage collector can assist with this problem, check out this link:
http://whytheluckystiff.net/articles/theFullyUptur...

-Nick
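PS: That allocate-free-retry strategy in sketch form - cache, evict_oldest! and needed_bytes are hypothetical placeholders for whatever eviction hook and allocation you actually have:

  begin
    slot = " " * needed_bytes   # or whatever allocation the new entry requires
  rescue NoMemoryError          # not a StandardError, so name it explicitly
    raise if cache.empty?       # nothing left to evict - give up
    cache.evict_oldest!
    GC.start
    retry
  end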
on 2008-06-19 08:31

On 18.06.2008 19:46, David Masover wrote:
> Noting that du won't count hardlink'd files more than once, it seems
> your point here is that such a tool would be less efficient in Ruby
> than it is for filesystems?

Exactly. They would work, but the output would be less meaningful, or harder to interpret. In other words, if you do a "du" in the filesystem you know pretty well which folders or files you'll have to remove to get more space on your device, whereas that would not be as obvious for a similar Ruby tool.

>> Then there are different types of links - soft links and hard links.
>> This gives file system utilities a means to ignore paths. Usually
>> soft links are much more often used than hard links (in my personal
>> experience anyway), while hard links more closely resemble object
>> references in Ruby.
>
> However, filesystem utilities can be made to follow softlinks. Also, I'd argue
> that the choice to use softlinks over hardlinks has nothing to do with making
> life easier for du, and everything to do with the relative semantics of
> actual usage here.

Absolutely. The point was simply that there are different types of links which can be instrumented by the user to control what tools like "ls" will output. So the user can state "follow soft links" or "don't follow soft links", which is a distinction you cannot make with Ruby's object references. And this distinction can be used to determine the scope of objects (i.e. files and directories) that are reached by those unix tools. Since you do not have that in Ruby land, you have less control.

Kind regards

robert
on 2008-06-19 08:35

On 18.06.2008 23:23, Marc Heiler wrote:
> What about really simple objects?
>
> a = "a"
> b = "b"
> c = "c"

They are probably pretty easy to diagnose for memory. But even they can be referenced from multiple other objects, which introduces the issue again: how do you count them?

Kind regards

robert
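PS: A concrete version of that last point - one tiny string, two owners, whose budget do its bytes belong to?

  a = "a"
  list = [a]
  hash = { :key => a }
  list[0].equal?(hash[:key])  # => true: the same object, reachable twice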