Manipulating an Array of Structs

I have data that I’ve extracted from a database and wish to pass to
a calling (external) piece of software through DRb.

There are a large number of data set formats, but an employee-based
example might look something like:

ID  Name         Age
1   Mary Smith   23
3   Frank Zappa  52
19  Mary Jones   41

I’m trying to find a “nice” structure to pass this data around in, so
that it can be easily understood and manipulated.

I’m currently using an Array of Hashes so it looks something like:

[ { :id => 1, :name => "Mary Smith", :age => 23 }, { :id => 3, :name =>
"Frank Zappa", :age => 52 }, { :id => 19, :name => "Mary Jones", :age =>
41 } ]

or, more nicely formatted:

[
  { :id => 1,  :name => "Mary Smith",  :age => 23 },
  { :id => 3,  :name => "Frank Zappa", :age => 52 },
  { :id => 19, :name => "Mary Jones",  :age => 41 }
]

This is fine if I want to extract single pieces of information, but it’s
clumsy to manipulate the whole array, for example, extract out all the
names, sort the array by age, create a new array containing employees
over 40. It’s also inefficient as I’m repeating the labels on every
record.

I realise I can do this by writing block code to process the array, but
I feel there must be a cleaner way to store and manipulate data of this
form in Ruby. Something along the lines of Struct would be great to remove
the excess labels, but Struct doesn’t help with the array
manipulation.

Suggestions would be really welcome, preferably using standard
classes, but I will look at extensions.

thanks,

Anthony W…

On Thu, 8 Apr 2010, Anthony W. wrote:

This is fine if I want to extract single pieces of information, but it’s
clumsy to manipulate the whole array, for example, extract out all the
names, sort the array by age, create a new array containing employees
over 40. It’s also inefficient as I’m repeating the labels on every record.

I realise I can do this by writing block code to process the array, but
I feel there must be a cleaner way to store and manipulate data of this
form in Ruby. Something along the lines of Struct would be great to remove
the excess labels, but Struct doesn’t help with the array manipulation.

Take a gander at Dr Nic’s map_by_method gem; it has this functionality,
including sort_by, group_by, and index_by:

http://drnicwilliams.com/category/ruby/map_by_method/
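
For comparison, here is a rough sketch that uses only the standard library (it is not the gem's API). A Struct gives each record named accessors, and the records still sit in an ordinary Array, so the usual Enumerable methods apply. The Employee struct and the staff variable are names assumed for illustration, and `&:name` relies on Symbol#to_proc, which is built in from Ruby 1.8.7 onwards.

```ruby
# Hypothetical struct for the example rows; the field labels live on the
# class rather than on every record.
Employee = Struct.new(:id, :name, :age)

staff = [
  Employee.new(1,  "Mary Smith",  23),
  Employee.new(3,  "Frank Zappa", 52),
  Employee.new(19, "Mary Jones",  41)
]

names    = staff.map(&:name)                          # all names
by_age   = staff.sort_by(&:age)                       # youngest first
over_40  = staff.select { |e| e.age > 40 }            # filter on a field
by_first = staff.group_by { |e| e.name.split.first }  # Hash keyed by first name

p names                       # ["Mary Smith", "Frank Zappa", "Mary Jones"]
p by_age.map(&:id)            # [1, 19, 3]
p over_40.map(&:name)         # ["Frank Zappa", "Mary Jones"]
p by_first["Mary"].map(&:id)  # [1, 19]
```

One caveat if Structs are passed by value over DRb: the receiving process would also need the same struct class defined in order to unmarshal them.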

Anthony W. wrote:

This is fine if I want to extract single pieces of information, but it’s
clumsy to manipulate the whole array, for example, extract out all the
names,

arr.map { |e| e[:name] }

sort the array by age,

arr.sort_by { |e| e[:age] }

create a new array containing employees
over 40.

arr.select { |e| e[:age] > 40 }
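
Put together, the three one-liners above can be run as a single script. This is only a sketch: arr is assumed to hold the sample rows from the question, and nothing beyond Array and Enumerable is used.

```ruby
# Sample rows from the question, as an Array of Hashes.
arr = [
  { :id => 1,  :name => "Mary Smith",  :age => 23 },
  { :id => 3,  :name => "Frank Zappa", :age => 52 },
  { :id => 19, :name => "Mary Jones",  :age => 41 }
]

names   = arr.map     { |e| e[:name] }      # every name, in original order
by_age  = arr.sort_by { |e| e[:age] }       # new array, youngest first
over_40 = arr.select  { |e| e[:age] > 40 }  # new array of matching records

p names                         # ["Mary Smith", "Frank Zappa", "Mary Jones"]
p by_age.map  { |e| e[:id] }    # [1, 19, 3]
p over_40.map { |e| e[:name] }  # ["Frank Zappa", "Mary Jones"]
```

None of these calls modifies arr; each returns a new Array.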

It’s also inefficient as I’m repeating the labels on every
record.

Beware of premature optimisation… do what’s simplest or cleanest, and
only when it’s too slow, profile it to work out where the improvement is
needed.

However, I think you’ll find it’s pretty efficient as it is. You are
using symbols, so there is only a single object in the system for each
of :name, :age etc. The hash key is just the object ID (4 bytes on a
32-bit system), not a copy of the label.

Having said that, DRb will unpack each symbol to its sequence of
characters when it marshals the data:

irb(main):002:0> Marshal.dump(:name)
=> "\004\b:\tname"

Using a Hash is slightly more efficient from this point of view than a
Struct.

irb(main):006:0> Emp = Struct.new(:id,:name,:age)
=> Emp
irb(main):007:0> frank = Emp.new(3, "Frank Zappa", 52)
=> #<struct Emp id=3, name="Frank Zappa", age=52>
irb(main):008:0> Marshal.dump(frank)
=> "\004\bS:\bEmp\b:\aidi\b:\tname\"\020Frank Zappa:\bagei9"
irb(main):009:0> frank2 = {:id=>3, :name=>"Frank Zappa", :age=>52}
=> {:name=>"Frank Zappa", :age=>52, :id=>3}
irb(main):010:0> Marshal.dump(frank2)
=> "\004\b{\b:\tname\"\020Frank Zappa:\bagei9:\aidi\b"
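
The dumps above can also be compared by size rather than by eye. A small sketch, reusing Emp and the sample rows; exact byte counts depend on the Ruby version (newer versions also record each string's encoding), so only the comparisons matter here. The variable names are assumptions for illustration.

```ruby
Emp = Struct.new(:id, :name, :age)

frank_struct = Emp.new(3, "Frank Zappa", 52)
frank_hash   = { :id => 3, :name => "Frank Zappa", :age => 52 }

struct_bytes = Marshal.dump(frank_struct).bytesize
hash_bytes   = Marshal.dump(frank_hash).bytesize

# The Struct dump carries the class name ("Emp") as well as the field
# labels, so it comes out a few bytes longer than the Hash dump.
puts "struct: #{struct_bytes} bytes, hash: #{hash_bytes} bytes"

employees = [
  { :id => 1,  :name => "Mary Smith",  :age => 23 },
  frank_hash,
  { :id => 19, :name => "Mary Jones",  :age => 41 }
]

# Within a single dump, a repeated symbol is written in full once and then
# referenced by index, so the labels are not spelled out for every record.
together   = Marshal.dump(employees).bytesize
separately = employees.map { |e| Marshal.dump(e).bytesize }.inject(0) { |sum, n| sum + n }
puts "one dump: #{together} bytes, three separate dumps: #{separately} bytes"
```

So the repeated labels cost less on the wire than the literal source text suggests, provided the whole array goes across in one call.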

You’ll squidge it down a bit by using an Array for each employee instead
of a Hash, but it’s no longer a ‘“nice” structure’ that ‘can be easily
understood and manipulated.’
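
As a sketch of that trade-off: send the labels once as a header, plus one plain Array per employee, and rebuild the hashes on the receiving side if the friendlier form is wanted. The names columns, rows and records are assumptions made for this illustration.

```ruby
# Labels listed once; each employee is a bare Array in the same column order.
columns = [:id, :name, :age]
rows = [
  [1,  "Mary Smith",  23],
  [3,  "Frank Zappa", 52],
  [19, "Mary Jones",  41]
]

# Compact, but fields are now addressed by position, which is harder to read.
ages = rows.map { |r| r[2] }

# Rebuild the Array of Hashes by pairing each label with its value.
records = rows.map { |r| Hash[columns.zip(r)] }

p ages                   # [23, 52, 41]
p records.first[:name]   # "Mary Smith"

# Compare the marshalled sizes of the two representations.
rows_bytes    = Marshal.dump(rows).bytesize
records_bytes = Marshal.dump(records).bytesize
puts "rows: #{rows_bytes} bytes, records: #{records_bytes} bytes"
```

The saving is real but modest, and every consumer then has to know the column order.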