This posting is a follow up to this thread:
http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/a9dd0d1b466909da/5c2a8a38cac4259f?lnk=gst&q=array+based+class#5c2a8a38cac4259f
In case anyone is still interested…
I decided to again pick up my testing of memory usage and speed for
object initialization complete with resident data. I settled on a
fairly pragmatic approach that mimics my experience with very large
data sets where I may have many millions of objects with anywhere from
7 to 20 attributes. While wanting these to fit into memory, I’d also
like to be able to access their attributes like a normal object (i.e.,
relatively transparent implementation). I’ve done some testing with
some possible objects to use for this, and here are the results.
Method: (this is pretty pragmatic)
[ I’ve included the test files below ] I turn off all my applications
and network interfaces. I flush the swap (with swapoff and then
swapon). For one set of tests I disabled garbage collection, for
another I left it on. Then I start creating objects for the class in
question, initializing each with a 7 member array. Every 100 objects
I check to see if we’ve started to write to system swap yet. If we
are, I calculate the total memory we’ve used and the number of objects
to calculate bytes per object and time per object. I also did a small
Benchmark.bmbm test of the creation of 500,000 objects and the times
for object creation correlate almost perfectly (r^2 ~0.998) with the
‘with GC’ results (a sanity check). I repeated the memory tests on
each object (performed in random order though it shouldn’t make a
difference) 4 times each. I also ran the speed test 4 times.
MEMORY TEST
GC disabled (total initial allocation size)
(bytes/object)
class avg stdev
class1 591.1 0.29
class2 591.8 1.96
Array 360.6 0.06
ArrayClass 360.8 0.12
ArrayFields 555.3 1.02
Struct 452.6 0.32
with GC (cleaned up object size)
class1 296.9 2.27
class2 294.7 3.41
Array 49.5 0.32
ArrayClass 49.1 1.90
ArrayFields 162.5 1.21
Struct 88.6 1.46
SPEED TEST
This is time (microseconds) per object created. ‘speed’ is the
creation of 500,000 objects.
(best viewed with a monospace font)
with GC GC disabled speed
avg stdev avg stdev avg stdev
Array 14.8 0.52 9.3 0.30 1.01 0.01
Struct 26.9 1.60 10.8 0.35 1.49 0.02
ArrayClass 17.2 0.29 11.8 0.28 1.68 0.06
class1 39.5 0.71 13.8 0.48 2.22 0.07
class2 42.1 2.27 14.3 0.66 2.33 0.04
ArrayFields 65.6 1.48 15.8 0.89 2.71 0.07
######################################################
HERE IS THE MEMORY TEST CODE:
######################################################
#!/usr/bin/ruby
require ‘yaml’
require ‘arrayfields’ # 4.5.0!
require ‘array_class’
########################################
CLASSES
########################################
class Class1
def initialize(attribute_array)
(@f1, @f2, @f3, @f4, @f5, @f6, @f7) = attribute_array
end
end
class Class2
def initialize(f1=nil, f2=nil, f3=nil, f4=nil, f5=nil, f6=nil,
f7=nil)
@f1=f1
@f2=f2
@f3=f3
@f4=f4
@f5=f5
@f6=f6
@f7=f7
end
end
StructClass = Struct.new(:f1, :f2, :f3, :f4, :f5, :f6, :f7)
class ArrayBased < Array
end
ArrayFieldsBased = Array.struct :f1, :f2, :f3, :f4, :f5, :f6, :f7
ArrayClassBased =
ArrayClass.new( [:f1, :f2, :f3, :f4, :f5, :f6, :f7] )
########################################
MEMORY TEST
########################################
FUNCTIONS:
returns [memfree, swapcached]
def get_mem_free
hsh = YAML.load_file(‘/proc/meminfo’)
hsh[‘MemFree’].split(/\s+/)[0].to_i
end
def get_swap_cached
hsh = YAML.load_file(‘/proc/meminfo’)
hsh[‘SwapCached’].split(/\s+/)[0].to_i
end
def init_loop(name, &block)
total_memory = get_mem_free
cnt = 0
start_time = Time.now
loop do
block.call
cnt += 1
if get_swap_cached > 0
total_objects = 100*cnt
gmf = get_mem_free
gsc = get_swap_cached
mem_used = total_memory - gmf + gsc
latest = [name, mem_used.to_f/total_objects, (Time.now -
start_time)/total_objects].join(“\t”)
abort latest
end
end
end
def run_it_array(klass)
myarray = [] # only used while GC is on
init_loop(klass.to_s) { 100.times { myarray <<
klass.new( $att_array.dup) } }
end
def run_it_list(klass)
myarray = [] # only used while GC is on
init_loop(klass.to_s) { 100.times { myarray <<
klass.new( *($att_array.dup)) } }
end
TEST:
$att_array = (1…7).to_a
INITIALIZE SWAP
sudo swapoff /dev/sda3
sleep(1)
sudo swapon /dev/sda3
sleep(1)
klass = ARGV.shift
Garbage collection
GC.disable # comment out to keep GC on
klass_to_use = Kernel.const_get(klass.to_sym)
takes_array = [ Class1, Array, ArrayClassBased, ArrayFieldsBased ]
takes_list = [Class2, StructClass]
if takes_array.include?(klass_to_use)
run_it_array(klass_to_use)
elsif takes_list.include?(klass_to_use)
run_it_list(klass_to_use)
end
######################################################
HERE IS THE STANDALONE ‘speed’ test code:
######################################################
#!/usr/bin/ruby
require ‘yaml’
require ‘benchmark’
include Benchmark
require ‘arrayfields’ # 4.5.0!
require ‘array_class’
########################################
CLASSES
########################################
class Class1
def initialize(attribute_array)
(@f1, @f2, @f3, @f4, @f5, @f6, @f7) = attribute_array
end
end
class Class2
def initialize(f1=nil, f2=nil, f3=nil, f4=nil, f5=nil, f6=nil,
f7=nil)
@f1=f1
@f2=f2
@f3=f3
@f4=f4
@f5=f5
@f6=f6
@f7=f7
end
end
StructBased = Struct.new(:f1, :f2, :f3, :f4, :f5, :f6, :f7)
class ArrayBased < Array
end
ArrayFieldsBased = Array.struct :f1, :f2, :f3, :f4, :f5, :f6, :f7
ArrayClassBased =
ArrayClass.new( [:f1, :f2, :f3, :f4, :f5, :f6, :f7] )
########################################
SPEED TEST
########################################
$att_array = (1…7).to_a
REP = 500_000
Benchmark.bmbm do |x|
x.report(“ArrayFieldsBased”) do
REP.times { ArrayFieldsBased.new( $att_array.dup ) }
end
x.report(“class (type 1 init)”) do
REP.times { Class1.new( $att_array.dup ) }
end
x.report(“class (type 2 init)”) do
REP.times { Class2.new( *($att_array.dup) ) }
end
x.report(“ArrayBased”) do
REP.times { ArrayBased.new( $att_array.dup ) }
end
x.report(“ArrayClassBased”) do
REP.times { ArrayClassBased.new( $att_array.dup ) }
end
x.report(“Array.new”) do
REP.times { Array.new( $att_array.dup ) }
end
x.report(“StructBased”) do
REP.times { StructBased.new( *( $att_array.dup ) ) }
end
end