Ruby 1.9.1 linear memory increase in a simple for loop

The following test code shows a linear rise in memory usage. Since this
is a simple loop, I am guessing there's something fundamentally wrong
(either in my understanding of Ruby's garbage collection, or in Ruby's
garbage collection itself):

memtest.rb

require 'matrix'

a = [[1, 0, 1, 0, 0, 1, 1, 1],
     [0, 0, 0, 0, 0, 1, 1, 0],
     [1, 1, 1, 1, 0, 1, 1, 1],
     [0, 0, 1, 1, 1, 1, 0, 1],
     [0, 0, 1, 1, 1, 1, 0, 1],
     [0, 0, 0, 1, 1, 1, 1, 1],
     [0, 1, 1, 1, 1, 1, 1, 1],
     [0, 1, 0, 1, 1, 0, 1, 1]]

m = Matrix.rows(a)
n = Matrix.identity(8)

for i in 1..100 do
  n = n * m
  physical_mem_usage = `ps -o rss= -p #{Process.pid}`.to_i
  virtual_mem_usage  = `ps -o vsz= -p #{Process.pid}`.to_i
  puts "#{i}: VMU = #{virtual_mem_usage}; PMU = #{physical_mem_usage}."
  puts ObjectSpace.count_objects
  # ObjectSpace.garbage_collect
  STDOUT.flush
end


This code shows a linear increase in memory usage. The
ObjectSpace.count_objects call shows a steady linear increase in the
counts of T_ARRAY, T_HASH, T_STRUCT, T_STRING and T_OBJECT.

Can anyone explain this? Typically memory explosions occur in complex
code with lots of references, and in those cases the memory rise is
exponential. In the above loop, invoking garbage collection manually
(it's commented out in the snippet above) slows the memory increase, but
only by a constant factor. For example, if the loop used up 30 kB in 100
iterations originally, it uses up 30 kB in 500 iterations after invoking
garbage collection manually.

Understanding this is critical for me since I am dealing with matrices
of enormous sizes, so even a small memory leak can jeopardize my
project.

On Fri, Oct 29, 2010 at 3:17 PM, Ritwik Banerjee
[email protected] wrote:


AFAIK there is a certain memory threshold below which the interpreter
never runs the GC; this helps small scripts run fast. You probably need
to let your program run much longer and monitor memory to see the
long-term effect. Only if you see a dramatic increase in the long run
should you conclude that you have found a leak, IMHO.

Kind regards

robert

On Fri, Oct 29, 2010 at 3:17 PM, Ritwik Banerjee
[email protected] wrote:


You are calculating n = m**i (m to the i-th power) in successive
iterations. The entries of n grow exponentially with rising i, and the
storage size of an integer is logarithmic in its value (in the Bignum
range), so the storage size of n increases linearly. That's why you see
memory increase linearly.
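A quick sketch of this effect (using Integer#bit_length, which needs Ruby >= 2.1, newer than the 1.9.x interpreters discussed here): sampling the bit length of one entry of n every 20 iterations shows roughly constant increments, i.e. linear growth in storage size even though the values themselves grow exponentially:

```ruby
require 'matrix'

# The 8x8 binary matrix from the post above.
a = [[1, 0, 1, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 1, 1, 0],
     [1, 1, 1, 1, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1, 0, 1],
     [0, 0, 1, 1, 1, 1, 0, 1], [0, 0, 0, 1, 1, 1, 1, 1],
     [0, 1, 1, 1, 1, 1, 1, 1], [0, 1, 0, 1, 1, 0, 1, 1]]
m = Matrix.rows(a)

n = Matrix.identity(8)
bits = []
1.upto(100) do |i|
  n *= m
  bits << n[0, 0].bit_length if i % 20 == 0   # sample every 20 steps
end
p bits  # the increments between samples are roughly constant
```

The per-iteration increment is about log2 of the matrix's growth factor (a bit more than 5 here, as noted below), so each 20-iteration window adds a nearly fixed number of bits.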

Peter

Calamitas wrote in post #958065:


I had thought of that too. But, as you can see, in this particular
example I only raise an initial binary matrix to the 100th power. The
numbers don't get big enough to explain the kind of memory increase I
see. Also, an increase in the size of integers should not increase the
number of objects of types like T_ARRAY, T_HASH and T_STRUCT, right?

Another point: if I use a matrix from my actual project, even going up
to 25 iterations consumes almost my entire 2 GB of RAM. There too, the
initial matrix consists only of 0/1 entries. In 25 iterations it
shouldn't be anywhere near 2 GB!

Even so, do you think it is a good idea to cast to float at every
iteration to see if that helps? I've avoided floats because of problems
such as (0.3 - 0.2 == 0.1) evaluating to false.

Robert K. wrote in post #958018:


Thanks Robert. You are absolutely right; I also think Ruby doesn't
invoke the GC for short scripts. The code I provided was a mere example.
Even in a much larger program the nature of the problem remains exactly
the same: a linear increase in memory and in the number of objects on
the heap, while manual invocation of the GC only cuts the memory
increase by a constant factor.

- Ritwik

On Sat, Oct 30, 2010 at 12:12 PM, Ritwik Banerjee
[email protected] wrote:

> I had thought of that too. But, as you can see, in this particular
> example I only raise it to the 100th power with an initial binary
> matrix. The numbers don't get big enough to explain the kind of memory
> increase I see. Also, an increase in the size of integers should not
> increase the number of objects in classes like T_ARRAY, T_HASH,
> T_STRUCT, right?

Ruby’s garbage collector is triggered by the number of objects
exceeding a threshold, and not their total size. So it is entirely
possible for a Ruby process to eat up all your computer’s memory
without Ruby’s garbage collector ever kicking in because the process
allocates few large objects rather than many small objects. This has
been a problem for libraries such as RMagick (Google for RMagick and
memory leaks), and they mitigate it by triggering garbage collection
explicitly.
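The object-count trigger is easy to observe (this sketch assumes Ruby >= 2.1 for GC.stat with a symbol argument): a burst of many small allocations pushes the heap past its slot threshold and forces GC runs, which show up in GC.stat(:count):

```ruby
# Many small, short-lived objects force the GC to run; a handful of
# huge objects holding the same total memory might never trip it.
before = GC.stat(:count)
200_000.times { Array.new(8, 0) }   # lots of small, short-lived arrays
after = GC.stat(:count)
puts "GC ran #{after - before} time(s) during the allocation burst"
```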

> Another point: If I use a matrix from my actual project, however, even
> going up to 25 iterations consumes almost my entire 2GB RAM. There too,
> the initial matrix consists only of 0/1 entries. In 25 iterations, I
> don't think it should be anywhere near 2GB!

Hmmm, I never ran your program to check the actual memory consumption;
I just looked at the matrix and the entries. On my machine I don't see
the memory usage growing for your example, and you are right, for your
example the numbers aren't that large yet.

Note though that even for a binary matrix, the entries grow
exponentially, at a rate that depends on the number of 1s per row/column
and the size of the matrix. In the example you gave, the entries
increase by a factor of a bit more than 5 per iteration. If the matrix
were all ones, the factor would be 8, the size of the matrix. How close
the factor gets to this maximum depends on the fraction of ones in the
matrix and on their distribution (or rather their pattern).

> Even so, do you think it is a good idea to cast into float at every
> iteration to see if that helps? I've avoided floats because of problems
> such as (0.3 - 0.2 == 0.1) returns false.

It depends on what you want to do with the result. Matrix
multiplication where the entries are all non-negative is numerically
stable (your binary matrix satisfies this condition, as do all of its
powers). This does not mean that the results will be exact, but it does
mean that rounding errors grow only slowly.

Personally, I would just try floats for your specific case and see if
you can use the results. Note that "zeroness" will always be exact, so
if the matrix happens to represent an adjacency matrix and you are
mostly interested in which entries are zero and which are not, then you
can safely use floats.
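A small sketch of that "zeroness" point (the 3x3 matrix here is a made-up example, not one from the thread): powers of a non-negative float matrix keep exactly the same zero pattern as the exact integer computation, because a sum of products of non-negative floats is 0.0 only when every term is exactly zero:

```ruby
require 'matrix'

mi = Matrix.rows([[1, 1, 0], [0, 1, 1], [0, 0, 1]])  # integer 0/1 matrix
mf = mi.map(&:to_f)                                  # same matrix, float entries

pi = mi ** 5
pf = mf ** 5

# The zero patterns agree entry for entry.
p pi.to_a.flatten.map(&:zero?) == pf.to_a.flatten.map(&:zero?)  # prints true
```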

Note also that you can compute a specific power of a matrix with far
fewer multiplications, as outlined here:
http://en.wikipedia.org/wiki/Exponentiation#Efficiently_computing_an_integer_power
The advantage is that you do fewer multiplications and therefore
accumulate smaller rounding errors in the case of floats.
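The linked technique (exponentiation by squaring) can be sketched as follows; matrix_power is a hypothetical helper name, and recent versions of the stdlib Matrix#** may already do something similar internally:

```ruby
require 'matrix'

# Exponentiation by squaring: O(log k) multiplications instead of k - 1.
# matrix_power is an illustrative helper, not part of the Matrix API.
def matrix_power(m, k)
  result = Matrix.identity(m.row_count)
  base = m
  while k > 0
    result *= base if k.odd?   # fold in the current power where the bit is set
    k >>= 1
    base *= base if k > 0      # square for the next bit
  end
  result
end

fib = Matrix.rows([[1, 1], [1, 0]])   # Fibonacci Q-matrix as a small check
p matrix_power(fib, 10) == fib ** 10  # prints true
p matrix_power(fib, 10)[0, 1]         # prints 55, the 10th Fibonacci number
```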

Peter

Calamitas wrote in post #958323:


Thanks a lot for this detailed reply, Peter. I did, however, try the
same for loop with explicit calls to the garbage collector; that slowed
the increase in memory usage, but only negligibly. I also restricted
each float to 6 decimal places (just to see whether the bits taken up by
the numbers could collectively matter that much), but that didn't change
anything.

So, to sum up: the memory usage can't be reduced by explicit calls to
the garbage collector at each iteration, and it can't be reduced by
restricting the number of bits used by each number.

On Mon, Nov 1, 2010 at 6:18 AM, Ritwik Banerjee
[email protected] wrote:


Curious, because I'm seeing exactly the behavior I described. Using
integers, with or without triggering garbage collection, gives a steady,
linear increase in memory consumption, but nothing anywhere near what
you are describing. I tried larger matrices and memory use increased
faster, but still nothing comparable (unless you are using 10000x10000
matrices or so). When I use floats, memory use is constant.

Can you try another version of Ruby? This is the version I’m using:

[email protected]:~$ ruby1.9 -v
ruby 1.9.0 (2008-10-04 revision 19669) [i486-linux]

Peter
