On 12/2/05, Mauricio Fernández [email protected] wrote:
On Fri, Dec 02, 2005 at 07:27:32PM +0900, Ron J. wrote:
Ruby does use COW for arrays and strings, so internal pointers are
reused and the data they point to duplicated when actually needed.
somearray[start, size] will always reuse the internal array.
If I’m reading rb_str_substr correctly, somestring[start, rsize] will
copy the bytes unless start’+rsize == somestring.size (start’ being
the adjusted offset), amongst other conditions.
What does “COW” mean?
A serious problem occurs when you go modify the original (likely
large) array after taking a slice. It allocates a new area for the
original array and leaves the slice (probably small) pointing the the
original area with the entire array. This can cause serious
performance issues (memory and run-time). Here is a simple example
that shows the problem:
time ruby -e ‘n=2**13; a=(1…n).to_a; n.times { |i| a.push(a[i,1]) };
IO.readlines(“/proc/#$$/status”).grep(/VmSize/).display’
VmSize: 593020 kB
50.44user 0.65system 1:00.24elapsed 84%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (1957major+121801minor)pagefaults 0swaps
You’ll find that both memory and runtime are O(n**2). vs. the
equivalent without element sharing (O(n)):
time ruby -e ‘n=2**13; a=(1…n).to_a; n.times { |i| a.push([a[i]]) };
IO.readlines(“/proc/#$$/status”).grep(/VmSize/).display’
VmSize: 3384 kB
0.01user 0.00system 0:00.22elapsed 7%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (1major+558minor)pagefaults 0swaps
Back in September, I provided a patch to 1.9 for arrays. It addressed
this issue plus made operations near the beginning of the array (i.e.
shift/unshift) just as fast as operations near the end (i.e.
pop/push). Here was my post:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/5861
This patch hasn’t been accepted to the ruby head (yet). The guys
doing the Sydney fork of ruby (1.8 derivative) do plan on integrating
it. I’m not sure where they are on it.
I haven’t had time to do the same for strings.