Collections of structured-data objects: what approach?

Folks:

I’m new to Ruby, though not to programming in general. So I’m looking
around
for some of the mechanisms I’m used to finding, and one of them is an
apparatus for Collections (as for example used in C++, OP, VB etc).

I’m hoping that someone can point me in the right direction as to where
this
functionality can be found, or failing that suggest the most
advantageous
starting point to create it based on more primitive classes.

Main features:

  1. The contained objects are user-defined types having multiple fields
    (data
    members). This in itself seems no problem for Ruby.

  2. Order/Retrieve by index. The collection should stay ordered by
    insertion
    order and support retrieval by integer index. (Sorted collection is a
    separate issue, of course.)

  3. Insert/Append/Remove/Delete. (List behavior) Allow appending (at end)
    or
    inserting at arbitrary location, and removal or deletion of arbitrary
    members.

  4. Find by key: (Dictionary behavior etc) Ability to retrieve member
    objects
    by supplying a value to match against one of the object’s fields. Often
    this
    is simple a Name field on each object, which functions as a key, but at
    other times one might want lookup based on some other field or fields.

So far I’ve read plenty that seems related: arrays, hashes, Enumerable,
and
several related chapters in PickAxe and Ruby for Rails. Although
“collections” are mentioned in many of these sources, I’ve not yet hit
on a
treatment of the ruby way for the Collection basics described above (eg:
R
on R’s Collection chapter’s description of “insert” is simply replacing
the
Nth element of an array).

Needless to say, I’d much appreciate comments illuminating this area!

Thanks

Graham


Graham W.
Microsoft Visio MVP

Graham W. wrote:

Folks:

I’m new to Ruby, though not to programming in general. So I’m looking around
for some of the mechanisms I’m used to finding, and one of them is an
apparatus for Collections (as for example used in C++, OP, VB etc).

I’m hoping that someone can point me in the right direction as to where this
functionality can be found, or failing that suggest the most advantageous
starting point to create it based on more primitive classes.
You can get away with just using an Array for almost all of this.

  1. Find by key: (Dictionary behavior etc) Ability to retrieve member objects
    by supplying a value to match against one of the object’s fields. Often this
    is simple a Name field on each object, which functions as a key, but at
    other times one might want lookup based on some other field or fields.
    None of these are a problem for Array:

a = [{:a => 1, :b => 2}, {:c => 4, :d => 5}]
a[1] # => {:c=>4, :d=>5}
a << {:e => 6, :f => 7}
# => [{:b=>2, :a=>1}, {:c=>4, :d=>5}, {:e=>6, :f=>7}]
a.select {|x| x[:c] == 4} # => [{:c=>4, :d=>5}]

b = [1,2,3,4]
b.insert(2,5) # => [1, 2, 5, 3, 4]

So far I’ve read plenty that seems related: arrays, hashes, Enumerable, and
several related chapters in PickAxe and Ruby for Rails. Although
“collections” are mentioned in many of these sources, I’ve not yet hit on a
treatment of the ruby way for the Collection basics described above (eg: R
on R’s Collection chapter’s description of “insert” is simply replacing the
Nth element of an array).
If so, that’s simply wrong. Array#insert(n, element) makes the inserted
element the n’th and pushes everything after it along one.

On 25.08.2006 07:59, Graham W. wrote:

Main features:
members.
on R’s Collection chapter’s description of “insert” is simply replacing the
Nth element of an array).

Needless to say, I’d much appreciate comments illuminating this area!

For the sake of discussion I present a different approach. As long as
you don’t need 4 (lookup by key) an Array is perfect. However, with the
lookup IMHO a new data structure is preferred because then consistency
can be handled internally. I’ll attach a half grown quick hack to
demonstrate what I mean.

Kind regards

robert

Graham W. wrote:

And I’d add Array#delete

(… though I’m not yet clear on the sematics. Does this simply remove the
object from the array, or also delete it’s data? Ie: if there’s another
reference to this object, is the data still there? To be tested…)
a = “foo”
b = [a, “bar”, “qux”, a]
b.delete(a)
b # => [“bar”, “qux”]
a # => “foo”

So no, Array#delete doesn’t delete the data itself, only the reference
to it. You may also want:

b.delete_at(0)
b # => [“qux”]

> Now that you've pointed out that there's an insert method called, ahem, > "insert", I see that it actually IS the array class that handles Collection > duty, and I can focus there for some more serious RTFM-ing. Array is surprisingly powerful, especially when you take the inject method into account. It's a real workhorse.

Thanks again for the quick clarification!
No worries :slight_smile:

Robert:

Question and then a comment:

Question: What does the keyword “self” do when sitting on a line by
itself,
as in several of your methods?

Comment:

For the sake of discussion I present a different approach. As long as
you don’t need 4 (lookup by key) an Array is perfect. However, with the
lookup IMHO a new data structure is preferred because then consistency
can be handled internally. I’ll attach a half grown quick hack to
demonstrate what I mean.

If I’m reading you correctly, you are saying a bit more than this:

  1. For lookup (or for that matter if one want to implement a
    SortedCollection), then it’s desirable to have an index for the array on
    the
    key field(s), which can be a ruby hash. (Ie: one could use
    Enumerable#find
    or find_all, but on larger collections that’s slow?)

  2. But if you are going to implement an index using a helper hash, then
    you
    want a way to keep the hash up-to-date as items are inserted or deleted
    from
    the array.

  3. So you advocate a class to wrap these together so that “consistency
    can
    be handled internally”.

Thanks for your reply,

Graham


Graham W.
Microsoft Visio MVP

Alex:

So no, Array#delete doesn’t delete the data itself, only the reference to
it. You may also want:

b.delete_at(0)
b # => [“qux”]
[…]
Array is surprisingly powerful, especially when you take the inject method
into account. It’s a real workhorse.

Thanks,

Graham

Alex:

Ah-ha! Thanks for getting me on the right track.

Couple of points (for others following along…)

You can get away with just using an Array for almost all of this.

a = [{:a => 1, :b => 2}, {:c => 4, :d => 5}]
a[1] # => {:c=>4, :d=>5}
a << {:e => 6, :f => 7}

=> [{:b=>2, :a=>1}, {:c=>4, :d=>5}, {:e=>6, :f=>7}]

a.select {|x| x[:c] == 4} # => [{:c=>4, :d=>5}]

b = [1,2,3,4]
b.insert(2,5) # => [1, 2, 5, 3, 4]

Now THAT’s the set of examples I needed, and that should be in the major
Ruby docs.

And I’d add Array#delete

(… though I’m not yet clear on the sematics. Does this simply remove
the
object from the array, or also delete it’s data? Ie: if there’s another
reference to this object, is the data still there? To be tested…)

[…Ruby for Rail’s Collection chapter’s description of “insert”…]
If so, that’s simply wrong.

Yep, so it seems!

PickAxe’s “Containers” etc chapter manages to get us all the way to
lambda
functions and closures, but its “SongList” example is too simple to be
as
useful as your examples above.

Meanwhile, Ruby for Rails’ “Collections, containers” etc chapter says
the
following (emphasis mine):


11.2.2 Inserting, retrieving and removing array elements
Because an array is an ordered collection, any object you add to the
array
goes either at the beginning, at the end, or somewhere in the middle.
The
most general technique for INSERTING one or more items into an array is
the
setter method []= (square brackets…

… and goes on to give “insert” examples which do not insert, they
simply
assign to an array slot. (And there’s no mention of the insert method).
So
I had written off array as being the complete answer, since its supposed
insert syntax “obviously” didn’t really know how to insert.

Now that you’ve pointed out that there’s an insert method called, ahem,
“insert”, I see that it actually IS the array class that handles
Collection
duty, and I can focus there for some more serious RTFM-ing.

Thanks again for the quick clarification!

Graham

Graham W.
Microsoft Visio MVP

Graham W. wrote:

Robert:

Question and then a comment:

Question: What does the keyword “self” do when sitting on a line by itself,
as in several of your methods?

Simen answered that one. As an additional note, traditionally #each
returns self. Also, by doing this you can chain methods.

  1. For lookup (or for that matter if one want to implement a
    SortedCollection), then it’s desirable to have an index for the array on the
    key field(s), which can be a ruby hash. (Ie: one could use Enumerable#find
    or find_all, but on larger collections that’s slow?)

find is O(n) which can seriously hurt your applications performance if
the Array grows beyond a certain limit. Hash lookup on the other hand
is O(1).

  1. But if you are going to implement an index using a helper hash, then you
    want a way to keep the hash up-to-date as items are inserted or deleted from
    the array.

  2. So you advocate a class to wrap these together so that “consistency can
    be handled internally”.

Right. You got it. That’s what OO is about.

Kind regards

robert

Simen:

Thanks for your reply…

Now THAT’s the set of examples I needed, and that should be in the major
Ruby docs.

It is in the main Ruby docs. It’s spread between the methods in use,
because we couldn’t anticipate every single combination of methods you
might need to use and document them all in one place.

I was more commenting on the major ruby docs like pickaxe, Ruby for
Rails,
FAQs and so on. While some mention collections, this mostly just
introduces
basic features of hashes and arrays. None of the discussion that I’ve
run
across so far actually puts it all together into a comprehensive
discussion
of how the collections functionality familiar from other languages
(Java, C#, C++, OP, VB etc) can be acquired from array and hash plus
whatever.

Yes, I agree that all the features and methods are documented in the
language guide (eg: in pickaxe), but one needs to know which set of
those
features are to be assembled to create the familiar collection
functionality, in the absence of a big flashing sign on an actual
Collection
class. (Especially bearing in mind that other languages/libraries go to
some
pains to distinguish the concepts of Array and Collection – not to say
that
any particular other language is the most righteous).

Anyhow, for my part I’m a great deal further along than I was a couple
of
days ago, thanks in part to helpful responses here and elsewhere.

Thanks all!

Graham

On 8/25/06, Graham W. [email protected]
wrote:

Question: What does the keyword “self” do when sitting on a line by itself,
as in several of your methods?

Ruby returns the last value of a method. self on a line by itself
returns self.

Graham W. wrote:

Now THAT’s the set of examples I needed, and that should be in the major
Ruby docs.

It is in the main Ruby docs. It’s spread between the methods in use,
because we couldn’t anticipate every single combination of methods you
might need to use and document them all in one place.

Graham W. wrote:

Robert:

Thanks, and…

You’re welcome!

Right. You got it. That’s what OO is about.

Well, I already had the OO part, just needed the ruby version of it :slight_smile: plus
connecting the dots might help others stumbling along this way, and you
never know it might turn out that your “consistency” comment was referring
to something else I didn’t see.

I probably should have put “encapsulation” somewhere in that posting.
:slight_smile: Anyway, I’m glad I could help sort these things out.

Anyhow, this discussion has been most helpful to alert me to all the
capabilities that Array has, and calibrating my sense that a little bit of
assembly required to implement collections with various features, they’re
not just in some library that “of course” everyone knows about and uses :slight_smile:

Yeah, the reason is probably that Array and Hash are so powerful on the
one hand and that requirements are so different on the other hand. Java
follows a tad different approach by providing collections for usual
situations.

Kind regards

robert

Robert:

Thanks, and…

Right. You got it. That’s what OO is about.

Well, I already had the OO part, just needed the ruby version of it :slight_smile:
plus
connecting the dots might help others stumbling along this way, and you
never know it might turn out that your “consistency” comment was
referring
to something else I didn’t see.

Anyhow, this discussion has been most helpful to alert me to all the
capabilities that Array has, and calibrating my sense that a little bit
of
assembly required to implement collections with various features,
they’re
not just in some library that “of course” everyone knows about and uses
:slight_smile:

Graham

Hi –

On Fri, 25 Aug 2006, Graham W. wrote:

[…Ruby for Rail’s Collection chapter’s description of “insert”…]
If so, that’s simply wrong.

Yep, so it seems!

Actually I don’t talk about Array#insert, so my description of it
can’t be wrong :slight_smile:

goes either at the beginning, at the end, or somewhere in the middle. The
most general technique for INSERTING one or more items into an array is the
setter method []= (square brackets…

… and goes on to give “insert” examples which do not insert, they simply
assign to an array slot. (And there’s no mention of the insert method). So
I had written off array as being the complete answer, since its supposed
insert syntax “obviously” didn’t really know how to insert.

Yeah, I tend to use the term “insert” sort of generically; and the
insert method is among the many methods that didn’t make the “for
Rails” cut. (Keep in mind that R4R isn’t a complete Ruby core method
reference.)

But I can see that using a method name as an informal way to describe
what another method does might be awkward.

David

David:

Actually I don’t talk about Array#insert, so my description of it
can’t be wrong :slight_smile:

… and for balance, I should say that I am enjoying other parts of
the
book :slight_smile:

Graham