Can I find out the memory used by an object?

I’m 99% sure the answer to that question is no, but I thought I’d ask anyway :)

Practically speaking, it’s only of use to me if it can be done on MRI (Ruby 1.8).

Trivially I can do something like this (apologies for the hackiness, and for not using the sysutil gem):

def memory_used
  # VmSize from /proc is the whole process's virtual size, in kB
  File.read("/proc/#{$$}/status").match(/^VmSize:\s*(\d+)/)[1].to_i
end

mem_used = memory_used
mem_used_since = memory_used - mem_used # difference in KB since mem_used was set

But that only tells me something about the Ruby process, not a
particular object.

The reason I ask is that I would like to create a cache object that
stores complex objects, but is smart enough to remove the oldest
things in the cache when the cache is using more than a certain amount
of memory.

I could Marshal.dump the objects and work out something crude that
way, but the reason I want the cache in the first place is that
Marshal.dump and Marshal.load are too time-expensive :)
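
For concreteness, here’s a rough sketch of the kind of cache I have in
mind. The size estimator block is exactly the part I don’t know how to
do cheaply – the Marshal-based one shown at the bottom is just a
placeholder:

# Rough sketch of a memory-bounded, oldest-first cache. The estimator
# block is a placeholder for whatever per-object size measure turns
# out to be cheap enough.
class BoundedCache
  def initialize(max_bytes, &estimator)
    @max_bytes = max_bytes
    @estimator = estimator   # returns an estimated size in bytes
    @order     = []          # keys, oldest first
    @data      = {}          # key => [value, estimated_size]
    @total     = 0
  end

  def []=(key, value)
    delete(key)              # replacing an entry resets its age
    size = @estimator.call(value)
    @order << key
    @data[key] = [value, size]
    @total += size
    # Evict oldest entries until we are back under the limit.
    delete(@order.first) while @total > @max_bytes && @order.size > 1
    value
  end

  def [](key)
    entry = @data[key]
    entry && entry[0]
  end

  def delete(key)
    entry = @data.delete(key) or return nil
    @order.delete(key)
    @total -= entry[1]
    entry[0]
  end
end

# e.g. cache = BoundedCache.new(50 * 1024 * 1024) { |v| Marshal.dump(v).size }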

Thanks
-Rob

Good question :)

I’d like to know that as well

On 17 Jun., 12:42, Marc H. [email protected] wrote:

Good question :)

I’d like to know that as well

This comes up from time to time. If you think about it for a moment,
it is pretty hard to define what “memory used by an object” actually
is. The easiest (and probably not very useful) definition is: it is
the space used by the instance for storing a reference to its class
object and for storing its instance variable references.

Typically you would rather be interested in a particular part of the
object graph that is reachable from an instance - and here it becomes
difficult, because any object can have any number of references and
can be referred to any number of times. Where then do you count the
memory? Or do you count it multiple times? etc.
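
Just to show how little that easiest definition captures, a
back-of-the-envelope sketch (the 4-byte slot is an assumption about a
32-bit MRI build; object flags, the ivar table itself and malloc
overhead are all ignored):

# Naive "shallow size": one VALUE slot for the class reference plus
# one per instance variable. SLOT_BYTES = 4 assumes a 32-bit MRI
# build; real per-object overhead is not counted at all.
SLOT_BYTES = 4

def shallow_size(obj)
  (1 + obj.instance_variables.size) * SLOT_BYTES
end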

Kind regards

robert

On Tue, Jun 17, 2008 at 5:53 PM, Robert K.
[email protected] wrote:

On 17 Jun., 12:42, Marc H. [email protected] wrote:

Good question :)

I think it could be defined; however, I do not know whether this is
useful – maybe when considering running on low-memory devices with a
deterministic garbage collector.

I would define the memory used by an object as the number of bytes
that have been (re)allocated for it and that could be freed by the GC
if there were no more references to that object. I would not care at
all whether collecting this object makes other objects collectable.

This, however, is where Robert’s probably right: if you have to worry
about memory consumption, then you have to be pretty sure about the
releasability of other objects too.

But I humbly disagree with my homonym that the question does not make
sense at all ;).

Cheers
Robert


http://ruby-smalltalk.blogspot.com/


As simple as possible, but not simpler.
Albert Einstein

On Tuesday 17 June 2008 10:53:56 Robert K. wrote:

Typically you would rather be interested in a particular part of the
object graph that is reachable from an instance - and here it becomes
difficult, because any object can have any number of references and
can be referred to any number of times. Where then do you count the
memory? Or do you count it multiple times? etc.

I think this is analogous to a filesystem structure. It is possible to
do roughly all of the above with tools like ls and du – for example,
‘ls -l’ will show the logical size of each file, but also a total that
represents the sum of the actual disk space allocated for each file –
that is, “logical size” might show a file as being larger than the
filesystem it’s on, so long as it’s a sparse file.

But it will count hardlinks twice in that total. It also isn’t deep –
it might show the space used to store a particular subdirectory, but
not the space used by the actual files in that subdirectory.

And then there’s ‘du’, which will show the total, real size used for
any directory tree, recursively, counting each inode only once (so
hardlinked files aren’t counted twice).

I would argue that tools like these would be useful in Ruby, even if
there have to be many twitchy options to control how it is counted.
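
For instance, a toy “du for objects” might walk the graph like this,
counting each object only once – note the per-node byte counts are
made-up guesses, not anything MRI actually reports, and object_du is
just a name I made up:

# Toy "du" for a Ruby object graph: walks instance variables, Array
# elements and Hash pairs, visiting each object_id only once. The
# per-node byte counts are guesses, not real MRI allocation sizes.
def object_du(root)
  seen  = {}
  stack = [root]
  total = 0
  until stack.empty?
    obj = stack.pop
    next if seen[obj.object_id]
    seen[obj.object_id] = true
    case obj
    when String
      total += 24 + obj.length        # assumed header + payload
    when Array
      total += 24 + obj.length * 4    # assumed header + one slot each
      stack.concat(obj)
    when Hash
      total += 24 + obj.length * 8    # assumed header + key/value slots
      obj.each { |k, v| stack.push(k, v) }
    else
      total += 24                     # assumed bare object header
    end
    obj.instance_variables.each do |iv|
      stack.push(obj.instance_variable_get(iv))
    end
  end
  total
end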

But then, I’ve always been more for letting Ruby take as much RAM as
it wants until I run out, and then start optimizing.

On 17.06.2008 19:23, Robert D. wrote:

On Tue, Jun 17, 2008 at 5:53 PM, Robert K.
[email protected] wrote:

On 17 Jun., 12:42, Marc H. [email protected] wrote:

Good question :)

I think it could be defined; however, I do not know whether this is
useful – maybe when considering running on low-memory devices with a
deterministic garbage collector.

Exactly what I mean: a simple definition is more often than not useless
for practical purposes but a useful definition is complex and hard to
automate (i.e. if you want to implement a mechanism that reports on used
memory).

I would define the memory used by an object as the number of bytes
that have been (re)allocated for it and that could be freed by the GC
if there were no more references to that object. I would not care at
all whether collecting this object makes other objects collectable.

This is pretty much the definition I posted as well. At least this is
what I mean. I like your definition better for its improved technical
precision over mine.

This, however, is where Robert’s probably right: if you have to worry
about memory consumption, then you have to be pretty sure about the
releasability of other objects too.

Exactly.

But I humbly disagree with my homonym that the question does not make
sense at all ;).

Um, where exactly did I say that?

Kind regards

robert

PS: Thanks for the “homonym”! Learn something new every day. :)

On 18 Jun., 08:26, David M. [email protected] wrote:

On Tuesday 17 June 2008 10:53:56 Robert K. wrote:

Typically you would rather be interested in a particular part of the
object graph that is reachable from an instance - and here it becomes
difficult, because any object can have any number of references and
can be referred to any number of times. Where then do you count the
memory? Or do you count it multiple times? etc.

I think this is analogous to a filesystem structure. It is possible to
do roughly all of the above with tools like ls and du

That’s a good analogy for reasoning about this topic. But there are
differences as well: file systems are typically organized mostly
hierarchically, i.e. there are just a few links between different
subtrees of the complete tree. This is why tools like “ls” and “du”
are useful in practice: you can be pretty sure that only a negligible
portion of disk usage is counted more than once. But the topology of
an object graph might look totally different, i.e. far more
interconnected.

Then there are different types of links – soft links and hard links.
This gives file system utilities a means to ignore certain paths.
Usually soft links are used much more often than hard links (in my
personal experience, anyway), while hard links more closely resemble
object references in Ruby.

What files are in a file system, Strings would probably be in the Ruby
world (raw allocated sequences of bytes that are not used for
referencing). All other objects behave more like directories, i.e.
they hold only references to other objects.

But then, I’ve always been more for letting Ruby take as much RAM as it wants
until I run out, and then start optimizing.

Which is an approach in the pragmatic spirit of Ruby. :)

I just hacked together a toy memory analyzer. You can find it here:
http://www.pastie.org/217131

Kind regards

robert

On Wednesday 18 June 2008 04:43:50 Robert K. wrote:

This is why tools like “ls” and “du” are useful in practice: you can
be pretty sure that only a negligible portion of disk usage is counted
more than once. But the topology of an object graph might look totally
different, i.e. far more interconnected.

Noting that du won’t count hardlink’d files more than once, it seems
your point here is that such a tool would be less efficient in Ruby
than it is for filesystems?

Then there are different types of links – soft links and hard links.
This gives file system utilities a means to ignore certain paths.
Usually soft links are used much more often than hard links (in my
personal experience, anyway), while hard links more closely resemble
object references in Ruby.

However, filesystem utilities can be made to follow softlinks. Also,
I’d argue that the choice to use softlinks over hardlinks has nothing
to do with making life easier for du, and everything to do with the
relative semantics of actual usage here.

Typical example: if you have /bin/gunzip symlinked to gzip, you can
always replace gzip with a new version, through the old standard
method of making a tempfile, then rename-ing it on top of the original
(so as to do an atomic replace). Then gunzip, zcat, and friends will
refer to that new version – whereas with a hardlink, you would have to
manually re-link the new version everywhere it’s used.
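
In Ruby, that tempfile-and-rename idiom might be sketched like so
(assuming the temp file is created on the same filesystem as the
target, otherwise the rename isn’t atomic):

require 'tempfile'

# Sketch of an atomic in-place replace: write the new content to a
# temp file in the target's own directory (same filesystem), then
# rename it over the original.
def atomic_replace(path, new_content)
  tmp = Tempfile.new(File.basename(path), File.dirname(path))
  tmp.write(new_content)
  tmp.close
  File.rename(tmp.path, path)   # atomic on POSIX within one filesystem
end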

Or, conversely, if you have something which is meant to be edited
in-place – if you truncate a file which has another hardlink, or do a
simple append (with the shell >>, say), you never know what you’ll be
changing in the rest of the filesystem. (That’s why we don’t often do
bang operations on strings, especially those passed in – or I don’t,
anyway.)
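
The string equivalent of the hardlink surprise, for the record:

# Two variables, one object - like two hardlinks to one inode.
a = "gzip"
b = a
b << " (patched)"      # in-place mutation: the truncate/append case
puts a                 # => "gzip (patched)" - a changed too
puts a.equal?(b)       # => true, same object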

On Wed, Jun 18, 2008 at 8:28 AM, Robert K.
[email protected] wrote:


Typically you would rather be interested in a particular part of the
object graph that is reachable from an instance - and here it becomes
difficult, because any object can have any number of references and
can be referred to any number of times. Where then do you count the
memory? Or do you count it multiple times? etc.

It sounds to me like you want to patch Ruby’s garbage collector.
Albeit a bit of a dark art, this will probably get you the answer you
want. You really want to know the answer to:

If I free this object, how much memory will I gain?

As a lot of the replies here imply, answering this question is not
simple at all – particularly if two of the objects in your cache
happen to share one large object between them, in which case freeing
either one alone will not gain you much memory. I believe you
essentially wish to modify the GC’s mark-and-sweep algorithm to
traverse the object graph and intelligently sum the cumulative size of
the objects within it. However, the object reference graph is not
nearly as well-structured as a file system, and the cascading effect
that adding a complex object to a hashtable has on the calculated
sizes makes caching very difficult.
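
A pure-Ruby sketch of that “memory gained if freed” number (what
profilers call retained size): mark everything reachable from the
entries you intend to keep, then total only what the candidate reaches
exclusively. The traversal rules and byte counts below are guesses,
not anything the GC reports, and both helper names are made up:

# Collects all objects reachable from the given roots via instance
# variables, Array elements and Hash pairs, keyed by object_id.
def reach(roots)
  seen  = {}
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next if seen[obj.object_id]
    seen[obj.object_id] = obj
    stack.concat(obj) if obj.is_a?(Array)
    obj.each { |k, v| stack.push(k, v) } if obj.is_a?(Hash)
    obj.instance_variables.each do |iv|
      stack.push(obj.instance_variable_get(iv))
    end
  end
  seen
end

# Bytes we would plausibly get back by dropping `candidate`, given
# that the other cache entries stay alive. Sizes are crude guesses.
def retained_size(candidate, other_roots)
  kept = reach(other_roots)
  mine = reach([candidate]).reject { |id, _| kept.include?(id) }
  mine.values.inject(0) do |sum, o|
    sum + (o.is_a?(String) ? 24 + o.length : 24)
  end
end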

Back to your original posting…

  1. Marshal dumping to determine these sizes.
     I think this will be faster than running a modified GC algorithm
     or object graph traversal to try to determine how much memory
     you’d gain by freeing a particular element. However, in some cases
     this won’t answer your question correctly, because other code may
     hold references to internal elements of the object you just
     marshalled. If you can guarantee that your objects aren’t
     referenced externally, though, this will work well (see the sketch
     after this list).
  2. Figuring out your sizes via process size.
     I actually think this is your best option. Why not turn your
     system into a DRb engine? Unfortunately you’ll be marshal-dumping
     and loading the whole thing each time, but you’ll at least get an
     accurate answer.
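
For option 1, the measurement itself is a one-liner; a hedged sketch
(Marshal can’t dump everything, hence the rescue):

# Crude size estimate via Marshal; returns nil for undumpable objects
# (IO, Proc, objects with singleton methods, ...). The dump length is
# not heap bytes, but it is a consistent relative measure for eviction.
def marshalled_size(obj)
  Marshal.dump(obj).size
rescue TypeError
  nil
end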

However, what I’m curious about is why do this uber-complex thing in
the first place? You’d run into the problem of trying to figure out
memory sizes and utilization in almost any language, because you don’t
really understand the thing you’re actually caching. Caches are
generally designed to store relatively well-known things, not giant
spaghetti monsters; that’s how they gain their efficiency. I mean,
imagine a system that caches filehandles – who knows what resources
you’re actually tying up by holding onto those filehandles?

What about really simple objects?

a = "a"
b = "b"
c = "c"

On Wed, Jun 18, 2008 at 8:28 AM, Robert K.
[email protected] wrote:


But I humbly disagree with my homonym that the question does not make
sense at all ;).

Um, where exactly did I say that?
Stupid me – you did indeed say it is pretty hard to define, not that
it does not make sense. All my apologies, and congrats on the 1:0 win
too; you were the better team, but that was easy ;).

Cheers
Robert


http://ruby-smalltalk.blogspot.com/


As simple as possible, but not simpler.
Albert Einstein

On Jun 18, 4:23 pm, Marc H. [email protected] wrote:

What about really simple objects?

I agree with the previous responses. The overriding concern with this
last question is still why you intend to cache the objects in the
first place. In the case of simple objects, rebuilding them after the
garbage collector frees them is unlikely to cause a bottleneck.

For large, complex objects/structures as described in the original
post, defining their logical memory size is a problem encountered in
all languages where references/pointers are available. It seems the
only generally useful solution to unbounded memory growth is to
attempt to instantiate an object, and if the platform or OS won’t
allocate enough memory for it, attempt to free some space and try
again. For an interesting discussion of how Ruby’s garbage collector
can assist with this problem, check out this link:

http://whytheluckystiff.net/articles/theFullyUpturnedBin.html
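
A sketch of that allocate-evict-retry strategy – build_big_object and
cache.evict_oldest! are hypothetical stand-ins, and rescuing
NoMemoryError is not always reliable in practice:

# Sketch: allocate; on memory exhaustion, evict, GC, and retry a few
# times. build_big_object and cache.evict_oldest! are hypothetical.
attempts = 0
begin
  obj = build_big_object
rescue NoMemoryError
  attempts += 1
  raise if attempts > 3    # give up eventually
  cache.evict_oldest!
  GC.start
  retry
end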

-Nick

On 18.06.2008 19:46, David M. wrote:

it seems your point here is that such a tool would be less efficient
in Ruby than it is for filesystems?

Exactly. They would work, but the output would be less meaningful or
harder to interpret. In other words, if you do a “du” in the
filesystem you know pretty well which folders or files you’ll have to
remove to get more space on your device, whereas that would not be as
obvious for a similar Ruby tool.

Then there are different types of links – soft links and hard links.
This gives file system utilities a means to ignore certain paths.
Usually soft links are used much more often than hard links (in my
personal experience, anyway), while hard links more closely resemble
object references in Ruby.

However, filesystem utilities can be made to follow softlinks. Also, I’d argue
that the choice to use softlinks over hardlinks has nothing to do with making
life easier for du, and everything to do with the relative semantics of
actual usage here.

Absolutely. The point was simply that there are different types of
links which can be used to control what tools like “ls” will output.
So the user can say “follow soft links” or “don’t follow soft links”,
a distinction you cannot make with Ruby’s object references. And this
distinction can be used to determine the scope of objects (i.e. files
and directories) that are reached by those Unix tools. Since you do
not have that in Ruby land, you have less control.

Kind regards

robert

On 18.06.2008 23:23, Marc H. wrote:

What about really simple objects?

a = "a"
b = "b"
c = "c"

They are probably pretty easy to diagnose memory-wise. But even they
can be referenced from multiple other objects, which introduces the
issue again: how do you count them?
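
Concretely:

s  = "c" * 1024        # one roughly 1 KB string
a1 = [s]
a2 = [s]               # the same object, referenced twice
# Whose size does s count toward: a1's, a2's, both, or neither?
puts s.equal?(a1[0])   # => true
puts s.equal?(a2[0])   # => true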

Kind regards

robert