Hello List,
I have really just started dabbling in Ruby and have found my first
little task that I can use to try out some of what I’ve read.
Basically I’m reading in a CSV file and converting it into a proprietary
XML format. To facilitate the XML generation, I store the items in a
Hash of Hashes of Arrays: the first level of hashing is keyed on
the type of data item, the second on the item’s parent,
and the Array carries all the items for a specific parent.
e.g. hash = Hash["class1" => {"parent1" =>
[class1("a"), class1("b"), class1("c")], "parent2" =>
[class1("e"), class1("f"), class1("g")], …}, "class2" => […], …]
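To make the shape concrete, here is a minimal sketch of the nested structure. It is not my actual code: Class1 is a hypothetical stand-in for one of my item classes, and the Hash.new default blocks are just one way to create the inner levels on first access.

```ruby
# Hypothetical stand-in for one of the item classes; the real ones carry more fields.
Class1 = Struct.new(:name)

# Outer hash: item type => inner hash. Inner hash: parent => array of items.
# The default blocks create a missing level the first time it is accessed.
store = Hash.new do |outer, type|
  outer[type] = Hash.new { |inner, parent| inner[parent] = [] }
end

store["class1"]["parent1"] << Class1.new("a")
store["class1"]["parent1"] << Class1.new("b")
store["class1"]["parent2"] << Class1.new("e")

# List the names stored under one parent.
p store["class1"]["parent1"].map(&:name)
```

With default blocks like these, the explicit nil checks before each insert would not be needed.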
The code below does this in a somewhat clumsy way. I find
that running it on 10 000 lines takes 4.5 seconds,
but on 300 000 lines (30 times the data) it takes
14 min 21 sec (roughly 190 times as long). Is this nonlinear
scaling due to allocating memory to hold the data? Could I improve this
using symbols? Any other general remarks would be appreciated.
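To pin down how the time grows, the loop could be timed at a few input sizes. The following is only a sketch of such a harness: process_lines is a hypothetical stand-in that groups lines by their first field, not my real conversion code, and the generated lines are made-up test data.

```ruby
require "benchmark"

# Hypothetical stand-in for the conversion loop: groups split lines by first field.
def process_lines(lines)
  groups = Hash.new { |h, k| h[k] = [] }
  lines.each do |line|
    fields = line.chomp.split(",")
    groups[fields[0]] << fields
  end
  groups
end

# Returns [size, seconds] pairs, one per requested input size.
def time_at_sizes(sizes)
  sizes.map do |n|
    lines = Array.new(n) { |k| "parent#{k % 100},item#{k},value\n" }
    secs = Benchmark.realtime { process_lines(lines) }
    [n, secs]
  end
end

time_at_sizes([1_000, 10_000, 30_000]).each do |n, secs|
  puts format("%7d lines: %.4f s (%.2e s/line)", n, secs, secs / n)
end
```

If the seconds-per-line figure stays roughly flat as the size grows, the loop scales linearly; if it climbs, some step is getting more expensive as the data accumulates.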
# FORMAT: PARENT1,PARENT2,PARENT3,ITEM,ITEMR,ITEMTYPE,SUBITEM,ITEM_LAYER,PNAME,PVALUE,LEVEL
csv_file.each_line do |line|
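  # chomp drops the trailing newline so the last field (level) compares cleanly
  sl = line.chomp.split(",")
  parent1 = sl[0]; parent2 = sl[1]; parent3 = sl[2]; item = sl[3]
  itemr = sl[4]; itemtype = sl[5]; subitem = sl[6]; item_layer = sl[7]
  pname = sl[8]; pvalue = sl[9]; level = sl[10]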
  if parent1 == "NONE" && parent2 != "NONE"
    parent_type = "PARENT2"
    node = parent2
  elsif parent1 != "NONE" && parent2 == "NONE"
    parent_type = "PARENT1"
    node = parent1
  end
  i = case level
      when "ITEM" then Item.new(node, item)
      when "SUBITEM" then Subitem.new(node, item, subitem)
      when "ITEM_LAYER" then Item_layer.new(node, item, subitem, item_layer)
      when "RELATION" then Relation.new(node, item, itemr)
      when "PARENT3" then Parent3.new(node, parent3)
      when "PARENT2" then Parent2.new(parent2)
      when "PARENT1" then Parent1.new(parent1)
      end
  unless i.nil?
    # Create the inner hash and the array on first use
    hash[i.class] ||= {}
    hash[i.class][i.parent] ||= []
    items = hash[i.class][i.parent]
    # Array#index scans the array and compares elements with ==
    ind = items.index(i)
    if ind.nil?
      i.addSetting(Setting.new(pname, pvalue))
      items.push(i)
    else
      items[ind].addSetting(Setting.new(pname, pvalue))
    end
  end
end
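One variation I have been wondering about, though I have not measured it: since Array#index walks the array comparing with ==, the cost of each line may grow with the number of items already stored under a parent. Below is a sketch that replaces the innermost Array with a Hash keyed on the item's identifying fields. Everything here is hypothetical: Item and Setting are minimal stand-ins for my classes, and the key method and add_row helper are names I made up for illustration.

```ruby
# Hypothetical minimal stand-ins for the real classes, just enough to run the sketch.
Setting = Struct.new(:name, :value)

class Item
  attr_reader :parent, :name, :settings

  def initialize(parent, name)
    @parent = parent
    @name = name
    @settings = []
  end

  def addSetting(setting)
    @settings << setting
  end

  # Fields assumed to make two objects "the same" item.
  def key
    [parent, name]
  end
end

# hash[class][parent] is a Hash from key to item, so finding an existing item
# is a hash lookup instead of a scan with Array#index.
def add_row(hash, item, pname, pvalue)
  by_parent = (hash[item.class] ||= {})
  items = (by_parent[item.parent] ||= {})
  existing = (items[item.key] ||= item)
  existing.addSetting(Setting.new(pname, pvalue))
  existing
end

hash = {}
add_row(hash, Item.new("p1", "a"), "color", "red")
add_row(hash, Item.new("p1", "a"), "size", "10")
add_row(hash, Item.new("p1", "b"), "color", "blue")

hash[Item]["p1"].each_value do |it|
  puts "#{it.name}: #{it.settings.map(&:name).join(', ')}"
end
```

On Ruby 1.9 and later a Hash keeps insertion order, so iterating with each_value would still visit the items in the order they first appeared in the CSV.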
Thanks in advance,
Jeremy