Optimization tweak . Using fork as a "mark" and "release" he


#1

Some old Pascal implementations had (and I think some still have) a
facility to “mark” the heap, and then at some point “release” all
items allocated after that mark.

Here is a nifty way of doing the same (and more!) in ruby…

==========================try.rb======================================
pid = Process.fork do
  # Load any modules we need
  require 'find'

  a = 'x' * 100*1024*1024
end

pid, result = Process.waitpid2( pid)

Here is an edited version of the result of running (from root)
strace -v -f -o strace.log ruby try.rb

======================================================================
1597 execve("/usr/local/bin/ruby", ["ruby", "try.rb"], ["HZ=100",
"SHELL=/bin/bash", "TERM=xterm", "OLDPWD=/root", "USER=root",
"MAIL=/var/mail/root", "PATH=/usr/local/sbin:/usr/local/"...,
"PWD=/home/johnc/tmp", "PS1=\h:\w\$ ", "SHLVL=1", "HOME=/root",
"LOGNAME=root", "DISPLAY=:0.0", "_=/usr/bin/strace"]) = 0

…all the start up cost of invoking ruby paid once and only once…

Here is the OS level call to fork…
1597 clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xb7d57708) = 1598

Note this is really, really fast, as unix just creates a complete
copy via COW (copy-on-write) pages using virtual memory magic.

Parent proc hangs waiting for child…

1597 waitpid(1598, <unfinished ...>

Child proc loads and evals find.rb

1598 open("/usr/local/lib/ruby/1.9/find.rb", O_RDONLY|O_LARGEFILE) = 3
1598 close(3) = 0
1598 open("/usr/local/lib/ruby/1.9/find.rb", O_RDONLY|O_LARGEFILE) = 3
1598 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfd1baa8) = -1 ENOTTY
(Inappropriate ioctl for device)
1598 read(3, "#\n# find.rb: the Find module for"..., 8192) = 1922

Child proc grabs a huge chunk more memory

1598 brk(0x81c1000) = 0x81c1000
1598 mmap2(NULL, 104861696, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb1925000

Child exits…

1598 exit_group(0) = ?
1597 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0)
= 1598
1597 --- SIGCHLD (Child exited) @ 0 (0) ---

WHEE! Bang! All the memory and resources associated with the child
are reclaimed completely and instantly by the OS.

Parent continues on its merry way, light and free…
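
For anyone who wants to reuse the trick, here is a minimal sketch of the "mark"/"release" pattern wrapped in a method. The name with_released_heap is made up for illustration and is not part of any library; it assumes a platform where Process.fork is available.

```ruby
# Hypothetical helper (name invented for this sketch): run a block in a
# forked child ("mark"), then wait for the child to exit so the OS reclaims
# everything the child allocated ("release").
def with_released_heap
  pid = Process.fork do
    yield
    $stdout.flush   # exit! skips normal shutdown, so flush by hand
    exit!(0)        # skip at_exit handlers inherited from the parent
  end
  _, status = Process.waitpid2(pid)
  status
end

big = nil
status = with_released_heap do
  big = 'x' * (10 * 1024 * 1024)   # allocated in the child's copy only
end

puts "child exited cleanly: #{status.success?}"
puts "parent still sees big as: #{big.inspect}"
```

The assignment to big happens in the child's copy of the address space, so the parent's variable is untouched. That is also the catch: nothing computed in the child comes back unless you ship it over a pipe or a file.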

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : removed_email_address@domain.invalid
New Zealand

Carter’s Clarification of Murphy’s Law.

“Things only ever go right so that they may go more spectacularly wrong
later.”

From this principle, all of life and physics may be deduced.


#2

On Wed, 29 Mar 2006, John C. wrote:

a = 'x' * 100*1024*1024

end

pid, result = Process.waitpid2( pid)

you can get an object back from the child using:

 harp:~ > cat a.rb
 def child
   r, w = IO.pipe
   IO.popen('-') do |pipe|
     if pipe
       w.close
       buf = pipe.read
       pipe.close
       raise Marshal.load(r.read) unless $? == 0
       Marshal.load(buf)
     else
       r.close
       begin
         print(Marshal.dump(yield))
       rescue Exception => e
         w.print(Marshal.dump(e))
         exit! 42
       end
     end
   end
 ensure
   r.close
 end

 emsg = lambda{|e| STDERR.puts %Q[#{ e.message } (#{ e.class })\n#{ e.backtrace.join "\n" }]}

 p child{ 'value from child' } rescue emsg[$!]

 p child{ error_from_child } rescue emsg[$!]

 p 'but the parent lives'



 harp:~ > ruby a.rb
 "value from child"
 undefined local variable or method `error_from_child' for main:Object (NameError)
 a.rb:29
 a.rb:14:in `child'
 a.rb:4:in `child'
 a.rb:29
 "but the parent lives"

with all the same memory preserving side effects.
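
The same idea can also be sketched with a plain fork and a single pipe instead of IO.popen('-'). This is only an illustration under stated assumptions: fork_call is a made-up name, the result (or the exception) must be something Marshal can dump, and the [:ok, value] / [:error, exception] tagging is my own convention rather than anything from the code above.

```ruby
# Hypothetical helper (name invented for this sketch): evaluate a block in a
# forked child and hand its Marshal-able result back to the parent.
# Exceptions raised in the child are re-raised in the parent.
def fork_call
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = Process.fork do
    reader.close
    payload = begin
      Marshal.dump([:ok, yield])
    rescue Exception => e
      Marshal.dump([:error, e])
    end
    writer.write(payload)
    writer.close
    exit!(0)                     # never fall through into the parent's code
  end

  writer.close
  data = reader.read             # drain the pipe before waiting on the child
  Process.waitpid(pid)
  tag, value = Marshal.load(data)
  raise value if tag == :error
  value
ensure
  reader.close if reader && !reader.closed?
end

p fork_call { 'value from child' }

begin
  fork_call { raise 'error from child' }
rescue => e
  puts "#{e.message} (#{e.class})"
end

p 'but the parent lives'
```

Reading the pipe before calling waitpid matters: a child writing a result larger than the pipe buffer would otherwise block forever while the parent waits for it to exit.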

regards.

-a


#3

John C. wrote:

a = 'x' * 100*1024*1024

end

pid, result = Process.waitpid2( pid)

If possible, disable GC in the fork. That can greatly reduce memory
usage, because the GC mark algorithm has to touch every reachable block
of allocated heap memory. Once it does, the memory manager has to copy
most of the original process anyway, and the COW advantage is lost. This
is especially true if the parent process has a lot of objects. Example:

a = (1...2_000_000).map {[]} # emulate a big ObjectSpace

10.times do
  pid = fork do
    GC.disable if ARGV[0] == "nogc"
    a = 'x' * 10*1024*1024 # trigger GC, if enabled
    puts `free`[/Swap.*/]
  end
end

Process.waitall

$ time ruby fork-gc.rb nogc
Swap: 489940 137340 352600
Swap: 489940 137340 352600
Swap: 489940 137340 352600
Swap: 489940 137340 352600
Swap: 489940 137340 352600
Swap: 489940 137340 352600
Swap: 489940 137340 352600
Swap: 489940 137340 352600
Swap: 489940 137340 352600
Swap: 489940 137340 352600
ruby fork-gc.rb nogc 5.29s user 0.62s system 97% cpu 6.049 total
$ time ruby fork-gc.rb
Swap: 489940 326976 162964
Swap: 489940 327100 162840
Swap: 489940 327336 162604
Swap: 489940 330228 159712
Swap: 489940 334664 155276
Swap: 489940 330456 159484
Swap: 489940 329060 160880
Swap: 489940 328124 161816
Swap: 489940 327148 162792
Swap: 489940 327072 162868
ruby fork-gc.rb 8.82s user 2.97s system 28% cpu 40.712 total

Note the big increase in swap used (second column of numbers).

** Caution: on my 512MB system this can thrash for a while. If you have
less memory, change the parameters.
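
If you want to check the effect without pushing a machine into swap, here is a much smaller sketch. It only counts collector runs in the child via GC.count and confirms that GC.disable in the child does not leak into the parent; it does not measure memory. The variable names and the 200_000 allocation size are arbitrary choices for illustration.

```ruby
# Sketch: disable GC in a short-lived child and report how many collections
# ran there while it allocated a pile of small objects.
reader, writer = IO.pipe

pid = fork do
  reader.close
  GC.disable                          # affects this child process only
  before = GC.count
  junk = Array.new(200_000) { [] }    # allocation that would normally invite GC
  writer.puts(GC.count - before)      # collections that ran during the allocation
  writer.close
  exit!(0)
end

writer.close
gc_runs_in_child = reader.read.to_i
reader.close
Process.waitpid(pid)

# GC.enable returns true only if GC had been disabled in *this* process.
parent_gc_was_disabled = GC.enable

puts "GC runs in child: #{gc_runs_in_child}"
puts "parent GC was disabled: #{parent_gc_was_disabled}"
```

Since the child exits right away, the garbage it never collected is released along with the rest of its address space.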