Revisiting memory/speed of an Array based class

This posting is a follow up to this thread:
http://groups.google.com/group/comp.lang.ruby/browse_thread/thread/a9dd0d1b466909da/5c2a8a38cac4259f?lnk=gst&q=array+based+class#5c2a8a38cac4259f

In case anyone is still interested…

I decided to again pick up my testing of memory usage and speed for
object initialization complete with resident data. I settled on a
fairly pragmatic approach that mimics my experience with very large
data sets where I may have many millions of objects with anywhere from
7 to 20 attributes. While wanting these to fit into memory, I’d also
like to be able to access their attributes like a normal object (i.e.,
relatively transparent implementation). I’ve done some testing with
some possible objects to use for this, and here are the results.

Method: (this is pretty pragmatic)
[ I’ve included the test files below ] I turn off all my applications
and network interfaces. I flush the swap (with swapoff and then
swapon). For one set of tests I disabled garbage collection, for
another I left it on. Then I start creating objects for the class in
question, initializing each with a 7 member array. Every 100 objects
I check to see if we’ve started to write to system swap yet. If we
are, I calculate the total memory we’ve used and the number of objects
to calculate bytes per object and time per object. I also did a small
Benchmark.bmbm test of the creation of 500,000 objects and the times
for object creation correlate almost perfectly (r^2 ~0.998) with the
‘with GC’ results (a sanity check). I repeated the memory tests on
each object (performed in random order though it shouldn’t make a
difference) 4 times each. I also ran the speed test 4 times.

MEMORY TEST

GC disabled (total initial allocation size)

         (bytes/object)

class avg stdev
class1 591.1 0.29
class2 591.8 1.96
Array 360.6 0.06
ArrayClass 360.8 0.12
ArrayFields 555.3 1.02
Struct 452.6 0.32

with GC (cleaned up object size)

class1 296.9 2.27
class2 294.7 3.41
Array 49.5 0.32
ArrayClass 49.1 1.90
ArrayFields 162.5 1.21
Struct 88.6 1.46

SPEED TEST

This is time (microseconds) per object created. ‘speed’ is the
creation of 500,000 objects.
(best viewed with a monospace font)
with GC GC disabled speed
avg stdev avg stdev avg stdev
Array 14.8 0.52 9.3 0.30 1.01 0.01
Struct 26.9 1.60 10.8 0.35 1.49 0.02
ArrayClass 17.2 0.29 11.8 0.28 1.68 0.06
class1 39.5 0.71 13.8 0.48 2.22 0.07
class2 42.1 2.27 14.3 0.66 2.33 0.04
ArrayFields 65.6 1.48 15.8 0.89 2.71 0.07

######################################################

HERE IS THE MEMORY TEST CODE:

######################################################

#!/usr/bin/ruby

require ‘yaml’
require ‘arrayfields’ # 4.5.0!
require ‘array_class’

########################################

CLASSES

########################################

class Class1
def initialize(attribute_array)
(@f1, @f2, @f3, @f4, @f5, @f6, @f7) = attribute_array
end
end

class Class2
def initialize(f1=nil, f2=nil, f3=nil, f4=nil, f5=nil, f6=nil,
f7=nil)
@f1=f1
@f2=f2
@f3=f3
@f4=f4
@f5=f5
@f6=f6
@f7=f7
end
end

StructClass = Struct.new(:f1, :f2, :f3, :f4, :f5, :f6, :f7)

class ArrayBased < Array
end

ArrayFieldsBased = Array.struct :f1, :f2, :f3, :f4, :f5, :f6, :f7

ArrayClassBased =
ArrayClass.new( [:f1, :f2, :f3, :f4, :f5, :f6, :f7] )

########################################

MEMORY TEST

########################################

FUNCTIONS:

returns [memfree, swapcached]

def get_mem_free
hsh = YAML.load_file(‘/proc/meminfo’)
hsh[‘MemFree’].split(/\s+/)[0].to_i
end

def get_swap_cached
hsh = YAML.load_file(‘/proc/meminfo’)
hsh[‘SwapCached’].split(/\s+/)[0].to_i
end

def init_loop(name, &block)
total_memory = get_mem_free
cnt = 0
start_time = Time.now
loop do
block.call
cnt += 1
if get_swap_cached > 0
total_objects = 100*cnt
gmf = get_mem_free
gsc = get_swap_cached
mem_used = total_memory - gmf + gsc
latest = [name, mem_used.to_f/total_objects, (Time.now -
start_time)/total_objects].join(“\t”)
abort latest
end
end
end

def run_it_array(klass)
myarray = [] # only used while GC is on
init_loop(klass.to_s) { 100.times { myarray <<
klass.new( $att_array.dup) } }
end
def run_it_list(klass)
myarray = [] # only used while GC is on
init_loop(klass.to_s) { 100.times { myarray <<
klass.new( *($att_array.dup)) } }
end

TEST:

$att_array = (1…7).to_a

INITIALIZE SWAP

sudo swapoff /dev/sda3
sleep(1)
sudo swapon /dev/sda3
sleep(1)

klass = ARGV.shift

Garbage collection

GC.disable # comment out to keep GC on

klass_to_use = Kernel.const_get(klass.to_sym)

takes_array = [ Class1, Array, ArrayClassBased, ArrayFieldsBased ]
takes_list = [Class2, StructClass]

if takes_array.include?(klass_to_use)
run_it_array(klass_to_use)
elsif takes_list.include?(klass_to_use)
run_it_list(klass_to_use)
end

######################################################

HERE IS THE STANDALONE ‘speed’ test code:

######################################################

#!/usr/bin/ruby

require ‘yaml’
require ‘benchmark’
include Benchmark

require ‘arrayfields’ # 4.5.0!
require ‘array_class’

########################################

CLASSES

########################################

class Class1
def initialize(attribute_array)
(@f1, @f2, @f3, @f4, @f5, @f6, @f7) = attribute_array
end
end

class Class2
def initialize(f1=nil, f2=nil, f3=nil, f4=nil, f5=nil, f6=nil,
f7=nil)
@f1=f1
@f2=f2
@f3=f3
@f4=f4
@f5=f5
@f6=f6
@f7=f7
end
end

StructBased = Struct.new(:f1, :f2, :f3, :f4, :f5, :f6, :f7)

class ArrayBased < Array
end

ArrayFieldsBased = Array.struct :f1, :f2, :f3, :f4, :f5, :f6, :f7

ArrayClassBased =
ArrayClass.new( [:f1, :f2, :f3, :f4, :f5, :f6, :f7] )

########################################

SPEED TEST

########################################

$att_array = (1…7).to_a

REP = 500_000

Benchmark.bmbm do |x|

x.report(“ArrayFieldsBased”) do
REP.times { ArrayFieldsBased.new( $att_array.dup ) }
end

x.report(“class (type 1 init)”) do
REP.times { Class1.new( $att_array.dup ) }
end

x.report(“class (type 2 init)”) do
REP.times { Class2.new( *($att_array.dup) ) }
end

x.report(“ArrayBased”) do
REP.times { ArrayBased.new( $att_array.dup ) }
end

x.report(“ArrayClassBased”) do
REP.times { ArrayClassBased.new( $att_array.dup ) }
end

x.report(“Array.new”) do
REP.times { Array.new( $att_array.dup ) }
end

x.report(“StructBased”) do
REP.times { StructBased.new( *( $att_array.dup ) ) }
end

end

this is array_class.rb

module ArrayClass
def self.new(fields)
nclass = Class.new(Array)

nclass.class_eval('def self.add_member(name)
                     i = @@arr_size
                     self.class_eval("def #{name}() self[#{i}]

end")
self.class_eval(“def #{name}=(val)
self[#{i}]=val end”)
self.class_eval("@@ind[:#{name}] = #{i}")
self.class_eval("@@ind["#{name}"] = #{i}")
self.class_eval("@@attributes << :#{name}")
self.class_eval("@@arr_size =
@@attributes.size")
end’)

# This list is derived from ImageList in ImageMagick (and I've

added some
# applicable to this guy) (I’ve left zip here as it is applicable
and a
# great method)
%w(flatten flatten! assoc join rassoc push pop <<).each do |att|
nclass.class_eval(“undef_method :#{att}”)
end
nclass.class_eval(“undef_method :transpose if
Array.instance_methods(false).include?(‘transpose’)”)
nclass.class_eval("@@ind = {}")
# array to preserve order
nclass.class_eval("@@attributes = []")
nclass.class_eval("@@arr_size = 0")
fields.each_with_index do |field,i|
nclass.add_member(field.to_s)
end
#nclass.class_eval(’@@ind.each do |k,v| @@ind["#{k}"] = v end ')
nclass.class_eval "
def initialize(args=nil)
if args.is_a? Array
super(args)
elsif args.is_a? Hash
super(@@arr_size)
args.each do |k,v|
self[@@ind[k]] = v
end
else
super(@@arr_size)
end
end"

# list of
nclass.class_eval('def self.members() @@attributes end')
nclass.class_eval('def members() @@attributes end')
nclass.class_eval("def is_a?(klass)
                     if klass == Array ; false
                     else ; super(klass)
                     end
                   end
                   alias_method :kind_of?, :is_a?")

nclass.class_eval('def inspect
                     bits = @@attributes.map do |v|
                       val = self.send(v)
                       val = "nil" if val.nil?
                       "#{v}=#{val.inspect}"
                     end
                     string = bits.join(", ")
                     "<(ArrayClass based) #{string}>"
                   end ')
#nclass.class_eval('def self.indices() @@ind end')
#nclass.class_eval('def indices() @@ind end')

# NOTE that two separate objects will hash differently (even if

their
# content is the same!)
nclass.class_eval(‘def hash() object_id end’)
nclass
end
end

this is the weakest link in this guy:

we need some way for the array_class objects not to be flattened on

Array#flatten calls

we basically redefine array_class’s is_a? method to say it is NOT an

array

and redefine flatten to work only for things that say they are

arrays.

this is my best solution so far… any better ideas?

class Array

undef_method :flatten
undef_method :flatten!
#tmp_verb = $VERBOSE
#$VERBOSE = nil

WARNING! Array#flatten is redefined so that it calls the is_a?

method

(instead of doing it internally as defined in the C code). This

allows

ArrayClass derived objects to resist flattening by declaring that

they are

not Arrays. This may be slightly slower than the original, but

in all

other respects should be the same (i.e., will flatten array

derived

classes).

def flatten
new_array = []
self.each do |v|
if v.is_a?(Array)
new_array.push( *(v.flatten) )
else
new_array << v
end
end
new_array
end

WARNING! Array#flatten! is redefined flatten method discussed

above.
def flatten!
self.replace(flatten)
end
#$VERBOSE = tmp_verb
end