Forum: Ruby Array flatten memory "leak"?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Ron M (Guest)
on 2007-07-20 02:09
(Received via mailing list)
After I use "flatten" on a big array, the garbage collector
seems to get pretty forgetful of what memory it's allowed
to free up.

In both of the following scripts, the first two lines
create an array of many arrays and then flatten it
back into a flat array.

In both cases, after creating this array the scripts
use about 20MB of memory.   After that, they
repeatedly pack and unpack an array.   The script that
used "flatten" keeps growing quickly until it uses
about 1/4 GB of memory.   The script that flattened
the array by hand stays right at about 20MB.

What could it be about flatten that makes the
garbage collector fail to see that it could be
re-using the memory of the pack/unpack loop?


#########################################################################
#  This unexpectedly grows to 200+MB.
#########################################################################
a = (1..200000).map{|x| [x.to_s]};a.length
b = a.flatten.map{|x| x.to_i} ; b.length
puts `grep VmSize /proc/#{$$}/status`           ######## under 20 MB
(1..100).each{|x| b.pack("I*").unpack("I*");}
puts `grep VmSize /proc/#{$$}/status`           ######## WHOA GREW TO 1/4 GB


#########################################################################
#  This stays at 20MB
#########################################################################
a = (1..200000).map{|x| [x.to_s]};a.length
b = a.map{|x| x[0].to_i} ; b.length
puts `grep VmSize /proc/#{$$}/status`         ######## under 20 MB
(1..100).each{|x| b.pack("I*").unpack("I*");}
puts `grep VmSize /proc/#{$$}/status`         ######## still 20 MB
ara.t.howard (Guest)
on 2007-07-20 06:47
(Received via mailing list)
On Jul 19, 2007, at 4:07 PM, Ron M wrote:

> #########################################################################
> #  This unexpectedly grows to 200+MB.
            not
> #########################################################################
> a = (1..200000).map{|x| [x.to_s]};a.length
> b = a.flatten.map{|x| x.to_i} ; b.length

a                     #=> big array
a.flatten             #=> copy of said big array, flattened
a.flatten.map{ ... }  #=> copy of the flattened copy

so you take two copies to get one and neglect to call GC.start
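[Editor's note: a minimal sketch of the copying described above, on a current Ruby. Array#flatten allocates and returns a new array (flatten! is the in-place variant), so the intermediate result of a chained a.flatten.map { ... } becomes garbage; GC.start just forces that collection to happen now rather than later.]

```ruby
# Array#flatten returns a fresh array and leaves the receiver's
# nested structure untouched.
a = (1..5).map { |x| [x.to_s] }
b = a.flatten

fresh = !b.equal?(a)   # flatten did not return the same object

# The intermediate array from a chained a.flatten.map { ... } is
# unreferenced afterwards and eligible for collection.
GC.start
```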

> #########################################################################
> a = (1..200000).map{|x| [x.to_s]};a.length
> b = a.map{|x| x[0].to_i} ; b.length
> puts `grep VmSize /proc/#{$$}/status`         ######## under 20 MB
> (1..100).each{|x| b.pack("I*").unpack("I*");}
> puts `grep VmSize /proc/#{$$}/status`         ######## still 20 MB
>

a @ http://drawohara.com/
Eric M. (Guest)
on 2007-07-20 06:49
(Received via mailing list)
On 7/19/07, Ron M <removed_email_address@domain.invalid> wrote:
> #########################################################################
> #  This stays at 20MB
> #########################################################################
> a = (1..200000).map{|x| [x.to_s]};a.length
> b = a.map{|x| x[0].to_i} ; b.length
> puts `grep VmSize /proc/#{$$}/status`         ######## under 20 MB
> (1..100).each{|x| b.pack("I*").unpack("I*");}
> puts `grep VmSize /proc/#{$$}/status`         ######## still 20 MB
>


I narrowed this down a bit by looking at the C code.  The culprit in
Array#flatten seems to be rb_ary_splice, which also gets called when
assigning an array slice (one form of Array#[]=).  The culprit in
Array#pack seems to be rb_str_buf_cat, which also gets called from
String#<< with a Fixnum.  I also found that you could completely remove
references to the array you flattened/sliced and the problem still
persisted.  A more fundamental test showing the problem would be this:

a = Array.new(200000) { |x| [x] }
a.size.times { |i| a[i,1] = a[i][0,1] } # calls rb_ary_splice, triggering the problem
#a.size.times { |i| a[i] = a[i][0] } # calls rb_ary_store which seems OK
puts `grep VmSize /proc/#{$$}/status`
a = nil
GC.start
(1..200).each{ s="";200000.times{ s<<(?A) }} # calls rb_str_buf_cat which "leaks"
#(1..200).each{ s="";200000.times{ s<<"A" }} # calls rb_str_append which seems OK
puts `grep VmSize /proc/#{$$}/status`
GC.start
puts `grep VmSize /proc/#{$$}/status`

The amount of "leakage" seems to be about linear with respect to the
number of iterations in the last loop.  Each string generated in the
loop should be 200K, but is not used again, so the memory increase due
to the last loop should be O(1) (around 200K, independent of the number
of iterations).  Also, GC.start doesn't help.

If you use either of the alternatives, it seems to work.

This sure looks like a bug to me.
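[Editor's note: on a modern Ruby this 1.8-era behavior has long since been fixed. A quick sketch, assuming Ruby 2.3+ for GC.stat(:heap_live_slots) and the +"" unfrozen-string literal, that rebuilds the throwaway strings and checks the heap returns to roughly its baseline after a GC.]

```ruby
# Build large throwaway strings as in the loop above, then verify
# live heap slots return close to their baseline after a GC.
GC.start
before = GC.stat(:heap_live_slots)
20.times { s = +""; 200_000.times { s << "A" } }  # nothing retained
GC.start
after = GC.stat(:heap_live_slots)
growth = after - before   # should be small: the strings were garbage
```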

Eric
Ron M (Guest)
on 2007-07-22 04:10
(Received via mailing list)
ara.t.howard wrote:
> On Jul 19, 2007, at 4:07 PM, Ron M wrote:
>> #########################################################################
>> #  This unexpectedly grows to 200+MB.
>          not
really?

>> #########################################################################
>> a = (1..200000).map{|x| [x.to_s]};a.length
>> b = a.flatten.map{|x| x.to_i} ; b.length
> a         #=> big array
> a.flatten #=> copy of said big array
> a.flatten #=> copy of flattened array
>
> so you take two copies to get one and neglect to call GC.start

But at that point the program is still very small (20MB).

But the strange (to me) thing is that it's the next loop -
which seems like it shouldn't be allocating more memory, since
nothing gets saved to variables - where the program quickly
grows to over 200MB.

>> puts `grep VmSize /proc/#{$$}/status`           ######## under 20 MB
>> (1..100).each{|x| b.pack("I*").unpack("I*");}
>> puts `grep VmSize /proc/#{$$}/status`           ######## WHOA GREW TO 1/4 GB

But this loop, which increases memory seemingly without bound, seems
somewhat unrelated to the flatten lines above that only used 20MB,
doesn't it?
Ron M (Guest)
on 2007-07-22 04:32
(Received via mailing list)
Is this a good place to report bugs?

I'm getting this segfault with ruby 1.8.4 and 1.8.6 with
this one-liner:


% ruby -e 'a = [1,2,3,4,5]; (1..20).each{ a.push(*a)}'
Segmentation fault



Yes, I know I could work around it with "concat", but
push(*x) was much faster when I benchmarked them earlier:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/...
Nobuyoshi N. (Guest)
on 2007-07-22 06:01
(Received via mailing list)
Hi,

At Sun, 22 Jul 2007 09:30:58 +0900,
Ron M wrote in [ruby-core:11777]:
> Is this a good place to report bugs?

Yes.

> I'm getting this segfault with ruby 1.8.4 and 1.8.6 with
> this one liner:
>
> % ruby -e 'a = [1,2,3,4,5]; (1..20).each{ a.push(*a)}'
> Segmentation fault

SEGV without a [BUG] message means stack overflow.  a.push(*a)
doubles a, and repeating it 20 times grows a by a factor of 1<<20.

This issue is the same as [ruby-talk:255290], and can be avoided by
not using alloca(), as I mentioned in [ruby-talk:255418].
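[Editor's note: a sketch of the non-splatting alternative mentioned above. Array#concat passes the array itself rather than pushing every element as an argument, so the argument list never grows with the array and repeated doubling cannot overflow the stack. The dup guards against appending an array to itself mid-mutation.]

```ruby
# Doubling via concat instead of push(*a): no giant splat, so no
# alloca-driven stack growth however large the array becomes.
a = [1, 2, 3, 4, 5]
10.times { a.concat(a.dup) }   # each pass doubles a
```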
ara.t.howard (Guest)
on 2007-07-22 07:15
(Received via mailing list)
On Jul 21, 2007, at 6:08 PM, Ron M wrote:

> But the strange (to me) thing is that it's the next loop -
> which seems like it shouldn't be allocating more memory, since
> nothing gets saved to variables - where the program quickly
> grows to over 200MB

but assigning to variables isn't what determines whether memory is
used; it's a simple matter of object creation:

cfp:~ > cat a.rb
#! /usr/bin/env ruby
STDOUT.sync = true
require 'yaml'

def n_objects
   n = 0 and ObjectSpace.each_object{ n += 1 }
   n
end

y 'n_objects_0' => n_objects

65635.times{ "foo" }

y 'n_objects_1' => n_objects

GC.start

y 'n_objects_2' => n_objects



cfp:~ > ruby a.rb
---
n_objects_0: 2599
---
n_objects_1: 28143
---
n_objects_2: 1648



if it were, then n_objects_1 wouldn't show such a huge increase: we've
assigned no variables here.  my point is that what matters is object
creation and when those objects get freed by the GC.

regards.

a @ http://drawohara.com/
This topic is locked and can not be replied to.