Garbage collector in Ruby 2.2 provokes unexpected CoW

How do I prevent the GC from provoking copy-on-write, when I fork my
process ? I have recently been analyzing the garbage collector’s
behavior in Ruby, due to some memory issues that I encountered in my
program (I run out of memory on my 60core 0.5Tb machine even for fairly
small tasks). For me this really limits the usefulness of ruby for
running programs on multicore servers. I would like to present my
experiments and results here.

The issue arises when the garbage collector runs during forking. I have
investigated three cases that illustrate the issue.

Case 1: We allocate a lot of objects (strings no longer than 20 bytes)
in the memory using an array. The strings are created using a random
number and string formatting. When the process forks and we force the GC
to run in the child, all the shared memory goes private, causing a
duplication of the initial memory.

Case 2: We allocate a lot of objects (strings) in the memory using an
array, but the string is created using the rand.to_s function, hence we
remove the formatting of the data compared to the previous case. We end
up with a smaller amount of memory being used, presumably due to less
garbage. When the process forks and we force the GC to run in the child,
only part of the memory goes private. We have a duplication of the
initial memory, but to a smaller extent.

Case 3: We allocate fewer objects compared to before, but the objects
are bigger, such that the amount of memory allocated stays the same as
in the previous cases. When the process forks and we force the GC to run
in the child all the memory stays shared, i.e. no memory duplication.

Here I paste the Ruby code that has been used for these experiments. To
switch between cases you only need to change the “option” value in the
memory_object function. The code was tested using Ruby 2.2.2, 2.2.1,
2.1.3, 2.1.5 and 1.9.3 on an Ubuntu 14.04 machine.

Sample output for case 1:

ruby version 2.2.2
 proces   pid log                   priv_dirty   shared_dirty
 Parent  3897 post alloc                   38             0
 Parent  3897 4 fork                        0           37
 Child   3937 4 initial                     0           37
 Child   3937 8 empty GC                   35             5

The exact same code has been written in Python and in all cases the CoW
works perfectly fine.

Sample output for case 1:

python version 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2]
 proces   pid log                   priv_dirty shared_dirty
 Parent  4308 post alloc                35             0
 Parent  4308 4 fork                     0            35
 Child   4309 4 initial                  0            35
 Child   4309 10 empty GC                1            34

Ruby code

$start_time=Time.new

# Monitor use of Resident and Virtual memory.
class Memory

    shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
    priv_dirty = '.+?Private_Dirty:\s+(\d+)'
    MEM_REGEXP = /#{shared_dirty}#{priv_dirty}/m

  # get memory usage
    def self.get_memory_map( pids)
        memory_map = {}
        memory_map[ :pids_found] = {}
        memory_map[ :shared_dirty] = 0
        memory_map[ :priv_dirty] = 0

        pids.each do |pid|
            begin
                lines = nil
                lines = File.read( "/proc/#{pid}/smaps")
            rescue
                lines = nil
            end
            if lines
                lines.scan(MEM_REGEXP) do |shared_dirty, priv_dirty|
                    memory_map[ :pids_found][pid] = true
                    memory_map[ :shared_dirty] += shared_dirty.to_i
                    memory_map[ :priv_dirty] += priv_dirty.to_i
                end
            end
        end
        memory_map[ :pids_found] = memory_map[ :pids_found].keys
        return memory_map
    end

  # get the processes and get the value of the memory usage
    def self.memory_usage( )
    pids   = [ $$]
    result = self.get_memory_map( pids)

        result[ :pids]   = pids
        return result
    end

  # print the values of the private and shared memories
    def self.log( process_name='', log_tag="")
        if process_name == "header"
            puts " %-6s %5s %-12s %10s %10s\n" % ["proces", "pid",

“log”, “priv_dirty”, “shared_dirty”]
else
time = Time.new - $start_time
mem = Memory.memory_usage( )
puts " %-6s %5d %-12s %10d %10d\n" % [process_name, $$,
log_tag, mem[:priv_dirty]/1000, mem[:shared_dirty]/1000]
end
end
end

# function to delay the processes a bit
def time_step( n)
    while Time.new - $start_time < n
        sleep( 0.01)
    end
end

# create an object of specified size. The option argument can be

changed from 0 to 2 to visualize the behavior of the GC in various cases
#
# case 0 (default) : we make a huge array of small objects by
formatting a string
# case 1 : we make a huge array of small objects without formatting
a string (we use the to_s function)
# case 2 : we make a smaller array of big objects
def memory_object( size, option=1)
result = []
count = size/20

    if option > 3 or option < 1
    count.times do
      result << "%20.18f" % rand
    end
    elsif option == 1
    count.times do
      result << rand.to_s
    end
  elsif option == 2
    count = count/10
    count.times do
      result << ("%20.18f" % rand)*30
    end
  end

    return result
end

##### main #####

puts "ruby version #{RUBY_VERSION}"

GC.disable

# print the column headers and first line
Memory.log( "header")

# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10)

Memory.log( "Parent", "post alloc")

lab_time = Time.new - $start_time
if lab_time < 3.9
    lab_time = 0
end

# start the forking
pid = fork do
    time = 4
    time_step( time + lab_time)
    Memory.log( "Child", "#{time} initial")

  # force GC when nothing happened
  GC.enable; GC.start; GC.disable

    time = 8
    time_step( time + lab_time)
    Memory.log( "Child", "#{time} empty GC")

    sleep( 1)
    STDOUT.flush
    exit!
end

time = 4
time_step( time + lab_time)
Memory.log( "Parent", "#{time} fork")

# wait for the child to finish
Process.wait( pid)

Python code

import re
import time
import os
import random
import sys
import gc

start_time=time.time()

# Monitor use of Resident and Virtual memory.
class Memory:

  def __init__(self):
    self.shared_dirty = '.+?Shared_Dirty:\s+(\d+)'
    self.priv_dirty = '.+?Private_Dirty:\s+(\d+)'
    self.MEM_REGEXP =

re.compile("{shared_dirty}{priv_dirty}".format(shared_dirty=self.shared_dirty,
priv_dirty=self.priv_dirty), re.DOTALL)

  # get memory usage
  def get_memory_map(self, pids):
    memory_map = {}
    memory_map[ "pids_found" ] = {}
    memory_map[ "shared_dirty" ] = 0
    memory_map[ "priv_dirty" ] = 0

    for pid in pids:
      try:
        lines = None

        with open( "/proc/{pid}/smaps".format(pid=pid), "r" ) as

infile:
lines = infile.read()
except:
lines = None

      if lines:
        for shared_dirty, priv_dirty in re.findall( self.MEM_REGEXP,

lines ):
memory_map[ “pids_found” ][pid] = True
memory_map[ “shared_dirty” ] += int( shared_dirty )
memory_map[ “priv_dirty” ] += int( priv_dirty )

    memory_map[ "pids_found" ] = memory_map[ "pids_found" ].keys()
    return memory_map

  # get the processes and get the value of the memory usage
  def memory_usage( self):
    pids   = [ os.getpid() ]
    result = self.get_memory_map( pids)

    result[ "pids" ]   = pids

    return result

  # print the values of the private and shared memories
  def log( self, process_name='', log_tag=""):
    if process_name == "header":
      print " %-6s %5s %-12s %10s %10s" % ("proces", "pid", "log",

“priv_dirty”, “shared_dirty”)
else:
global start_time
Time = time.time() - start_time
mem = self.memory_usage( )
print " %-6s %5d %-12s %10d %10d" % (process_name,
os.getpid(), log_tag, mem[“priv_dirty”]/1000, mem[“shared_dirty”]/1000)

# function to delay the processes a bit
def time_step( n):
  global start_time
  while (time.time() - start_time) < n:
    time.sleep( 0.01)

# create an object of specified size. The option argument can be

changed from 0 to 2 to visualize the behavior of the GC in various cases
#
# case 0 (default) : we make a huge array of small objects by
formatting a string
# case 1 : we make a huge array of small objects without formatting
a string (we use the to_s function)
# case 2 : we make a smaller array of big objects
def memory_object( size, option=2):
count = size/20

  if option > 3 or option < 1:
    result = [ "%20.18f"% random.random() for i in xrange(count) ]

  elif option == 1:
    result = [ str( random.random() ) for i in xrange(count) ]

  elif option == 2:
    count = count/10
    result = [ ("%20.18f"% random.random())*30 for i in

xrange(count) ]

  return result

##### main #####

print "python version {version}".format(version=sys.version)

memory = Memory()

gc.disable()

# print the column headers and first line
memory.log( "header")  # Print the headers of the columns

# Allocation of memory
big_memory = memory_object( 1000 * 1000 * 10)  # Allocate memory

memory.log( "Parent", "post alloc")

lab_time = time.time() - start_time
if lab_time < 3.9:
    lab_time = 0

# start the forking
pid = os.fork()    # fork the process
if pid == 0:
  Time = 4
  time_step( Time + lab_time)
  memory.log( "Child", "{time} initial".format(time=Time))

  # force GC when nothing happened
  gc.enable(); gc.collect(); gc.disable();

  Time = 10
  time_step( Time + lab_time)
  memory.log( "Child", "{time} empty GC".format(time=Time))

  time.sleep( 1)

  sys.exit(0)

Time = 4
time_step( Time + lab_time)
memory.log( "Parent", "{time} fork".format(time=Time))

# Wait for child process to finish
os.waitpid( pid, 0)

This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.

| Privacy Policy | Terms of Service | Remote Ruby Jobs