A Hash and a Set (sitting in a tree)

Shot_SPiotr_S.S · January 24, 2007, 10:44am

(For those keeping track at home of my Ruby vs. Ruby/C vs. C++
adventures, I decided to go with pure Ruby and optimise afterwards
if/where needed and my supervisor agreed. I also started coding
yesterday, and would like to state that (a) Ruby test-driven development
is even more sexy than advertised and (b) humblelittlerubybook.com is
a really great book.)

On to my problem, then. First, the terminology: a block is a set of
vectors (say, integers), a blanket is a set of blocks, and a blanket
might be, but in most cases isnâ€™t, encoded â€“ i.e., might have its blocks
named. Thus an example, unencoded two-block blanket might look like
this: {{1, 9, 4, 7}, {1, 9, 3, 7}}

My obvious first approach was to create a Block class with two
properties, a @vectors Set and an @encoding Symbol, and a Blanket
class with a @blocks Set.

Now that I think about it, though, most of the computing will happen
with unencoded blankets, and in this case thereâ€™s no need for a Block
class at all; just a Blanket with a Set of Sets. First, this (I presume)
would make things a bit faster and less memory hungry; second, the
encoding makes sense only at the Blanket level, really.

Ufortunately, an encoded blanket screams to be represented as a Hash of
Sets (keyed with the encoding Symbols), while an unencoded blanket is
best represented as a Set of Sets, and canâ€™t be a Hash.

What would be the best approach to this situation? An EncodedBlanket
subclass of Blanket? A Blanket class that is a Hash of Symbols keyed
with Sets (i.e., the other way around, with nil hash values for
unencoded blankets)? Something else? Basically, what I need is
a hash that doesnâ€™t have to have keysâ€¦

– Shot

Shot_SPiotr_S.S · January 24, 2007, 11:50am

On 24.01.2007 10:43, Shot (Piotr S.) wrote:

named. Thus an example, unencoded two-block blanket might look like
encoding makes sense only at the Blanket level, really.

Ufortunately, an encoded blanket screams to be represented as a Hash of
Sets (keyed with the encoding Symbols), while an unencoded blanket is
best represented as a Set of Sets, and can’t be a Hash.

What would be the best approach to this situation? An EncodedBlanket
subclass of Blanket? A Blanket class that is a Hash of Symbols keyed
with Sets (i.e., the other way around, with nil hash values for
unencoded blankets)? Something else? Basically, what I need is
a hash that doesn’t have to have keys…

Initially I would defer considerations of performance. I’d start with
the cleanest design. At the moment I don’t have any clue as to what
processing you are doing so that would probably determine how I would
set up classes. The most straightforward way would probably be to have
a BaseBlanket, an EncodedBlanket, a PlainBlanket (unencoded) and a
Block.

Whether the BaseBlanket makes sense is probably determined by how much
an EncodedBlanket and a PlainBlanket have in common, i.e. how much
functionality they share. That then would go into the BaseBlanket.

Kind regards

robert

Shot_SPiotr_S.S · January 24, 2007, 12:42pm

Thanks a lot for your reply, Robert!

Robert K.:

Initially I would defer considerations of performance. I’d start with
the cleanest design. At the moment I don’t have any clue as to what
processing you are doing so that would probably determine how I would
set up classes.

Mostly basic set operations on the blocks; Set#&, Set#|,
Set#- and Set#^ seem to cover quite a lot of the base.

The most straightforward way would probably be to have a BaseBlanket,
an EncodedBlanket, a PlainBlanket (unencoded) and a Block.

With this approach (i.e., moving the encoding information to the blanket
level, which actually makes sense) Block would be just a Set, nothing
more.

Whether the BaseBlanket makes sense is probably determined by how much
an EncodedBlanket and a PlainBlanket have in common, i.e. how much
functionality they share. That then would go into the BaseBlanket.

In theory, EncodedBlanket is just a Blanket with its blocks named,
so a Blanket class with an EncodedBlanket subclass should cover it.

The catch is, an unencoded blanket can be represented as a Set of
blocks, while and encoded blanket canâ€™t; conversely, an encoded blanket
can be represented as a Hash of (encoding => block) pairs, while an
unencoded blanket canâ€™t.

Maybe I should go with a single Blanket class with an Array of blocks
and a â€˜synchronisedâ€™ Array of encodings that would simply be nil for
unencoded blankets? I donâ€™t need the ordering Array would bring in,
but a common class for un- and encoded blankets would simplify things
greatly.

– Shot

Shot_SPiotr_S.S · January 24, 2007, 1:25pm

On 24.01.2007 12:41, Shot (Piotr S.) wrote:

Set#- and Set#^ seem to cover quite a lot of the base.
And how do you access or find blocks (and blankets)?

The most straightforward way would probably be to have a BaseBlanket,
an EncodedBlanket, a PlainBlanket (unencoded) and a Block.

With this approach (i.e., moving the encoding information to the blanket
level, which actually makes sense) Block would be just a Set, nothing
more.

Well, but with a block you get clear semantics at the price of a low
overhead. The way you speak of blocks seems to indicate to me that
there is more to them which then would make them a good thing to have.
At least it’s much easier to later add methods to Block than changing
the whole code from Set to Blanket if at some point in time you discover
that you need additional methods.

Whether the BaseBlanket makes sense is probably determined by how much
an EncodedBlanket and a PlainBlanket have in common, i.e. how much
functionality they share. That then would go into the BaseBlanket.

In theory, EncodedBlanket is just a Blanket with its blocks named,
so a Blanket class with an EncodedBlanket subclass should cover it.

Maybe.

The catch is, an unencoded blanket can be represented as a Set of
blocks, while and encoded blanket can’t; conversely, an encoded blanket
can be represented as a Hash of (encoding => block) pairs, while an
unencoded blanket can’t.

You should probably try to look at it more from the interface
perspective: you talk a lot about internal representation while I think
it’s more important to get the interface straight. Questions I would as
are: what entities are there? How are they connected? How are they
used? etc. (You could do this on a whiteboard or with CRC cards as
well.)

Maybe I should go with a single Blanket class with an Array of blocks
and a ‘synchronised’ Array of encodings that would simply be nil for
unencoded blankets? I don’t need the ordering Array would bring in,
but a common class for un- and encoded blankets would simplify things
greatly.

In that case it’s you could also have a single Array containing two
element Arrays (Blanket, Encoding). Whether to use that, your version
or a Hash is mostly determined by how you access (i.e. do you need the
encoding as a key?).

Kind regards

robert

Shot_SPiotr_S.S · January 27, 2007, 6:01pm

Robert K.:

Well, but with a block you get clear
semantics at the price of a low overhead.

The catch was, I considered even just a 10% overhead to be worth
engineering the classes in a simpler, if less-obvious way. First,
though, I figured that if test-driven Ruby development was so easy
to learn, maybe benchmarking in Ruby is also easy, and I came with
these: home.pl: Nr 1 w Polsce. Domeny, Hosting, Serwery WWW, Strony, eSklep, Office 365

The results suggest the overhead is negligible (to the point of
SetBlanket being often slower than Blanket), but if you (or somebody
else) could take a quick look whether the test is worth anything I would
be most grateful.

The way you speak of blocks seems to indicate to me that there is more
to them which then would make them a good thing to have. At least it’s
much easier to later add methods to Block than changing the whole code
from Set to Blanket if at some point in time you discover that you
need additional methods.

I agree, and I concede my previous approaches.

Now that I thought of it, Block could simply be a subclass of Set with
an additional instance variable of @encoding. [Googles a bit, finds
a way for a class to call itâ€™s parentâ€™s constructor, rejoices.]
Something to the point of

class Block < Set
def initialize enum = [], encoding = nil
super enum
@encoding = encoding
end
end

Any pitfalls a Ruby newbie should be aware of with such an approach?

You should probably try to look at it more from the interface
perspective: you talk a lot about internal representation while
I think it’s more important to get the interface straight.

The problem is that I need a base library with a Blanket class that
I could experiment on; Iâ€™m not yet sure what the final decomposition
algorithm will look like, I need a Blanket class that I could twist
and bend (almost all the algorithm parts are algebra operations on the
blankets as sets of blocks and blocks as sets of integers).

Questions I would as are: what entities are there?
How are they connected? How are they used? etc.

The blankets are sets of blocks which are sets of integers. Sometimes
the blankets are encoded, i.e., their blocks have labels. I need to be
able to multiply, merge and compare the blankets.

Thanks a lot for your time and your
answers, Robert; I really appreciate it!

– Shot

Shot_SPiotr_S.S · January 27, 2007, 6:27pm

On 27.01.2007 18:00, Shot (Piotr S.) wrote:

Robert K.:

Well, but with a block you get clear
semantics at the price of a low overhead.

The catch was, I considered even just a 10% overhead to be worth
engineering the classes in a simpler, if less-obvious way. First,
though, I figured that if test-driven Ruby development was so easy
to learn, maybe benchmarking in Ruby is also easy, and I came with
these: home.pl: Nr 1 w Polsce. Domeny, Hosting, Serwery WWW, Strony, eSklep, Office 365

Btw, you can more easily create your Arrays of randoms with the block
form:

irb(main):001:0> Array.new(5) { rand 8 }
=> [0, 5, 6, 6, 6]

Much less typing.

end

Any pitfalls a Ruby newbie should be aware of with such an approach?

Hm… It seems generally considered that delegation is preferred over
inheritance with basic classes like Set. You have to be at least aware
that you inherit all methods of Set - this might or might not be what
you want. In this case I cannot see a Set method that you might not
want to inherit but one should be aware of this.

Thanks a lot for your time and your
answers, Robert; I really appreciate it!

You’re welcome!

Kind regards

robert

Shot_SPiotr_S.S · January 27, 2007, 11:33pm

Robert K.:

Btw, you can more easily create your
Arrays of randoms with the block form:

irb(main):001:0> Array.new(5) { rand 8 }
=> [0, 5, 6, 6, 6]

Much less typing.

Thanks! The randomizer was a last-minute addition, in the middle of
writing the previous email, hence the clumsy approach. This being Ruby,
I knew there should be a more elegant way.

Hm… It seems generally considered that delegation is
preferred over inheritance with basic classes like Set.

Iâ€™m still trying to wrap my mind around delegation, but both the RDoc
and Pitchfork One examples seem too clever for me. Jay Fieldsâ€™s Ruby
Delegation Module[1], on the other hand, looks quite obviousâ€¦

[1] http://jayfields.blogspot.com/2006/03/ruby-delegation-module.html

You have to be at least aware that you inherit all methods of Set

this might or might not be what you want. In this case I cannot
see a Set method that you might not want to inherit but one should
be aware of this.

In the case of blocks, I think inheritance doesnâ€™t have any drawbacks.
I might consider delegation (to Set) with the Blanket class, though.
Thanks again for pointing in the right (and/or interesting) directions!

– Shot