Subclassing Hash to enforce value uniqueness ala key uniqueness

addywaddy · November 18, 2008, 5:37am

First of all, hello to everyone. This is my first message to this list.

I find myself needing something very much like a Hash, but with unique
one-to-one mapping in both directions, both key-to-value, and
value-to-key. Now, I’ve been looking around a bit on Google, then
Rubyforge, the RAA, and the archives of this list, and I’m a bit
surprised I wasn’t able to find a prior implimentation of this. Time to
impliment my own, right?

Well, before I got too far into the process, I though I would ask for
people’s thoughts here. First off, here is the effect I would like to
see. I’ll call my subclass OneToOne for now, though I’m open to better
names.

oto = OneToOne[:a=>1, :b=>2, :c=>3]
=> {:a=>1, :b=>2, :c=>3} # or on 1.8.x, {:b=>2, :c=>3, :a=>1}

oto[:d] = 1
=> 1

oto
=> {:b=>2,:c=>3,:d=>1}

OneToOne[:a=>1,:b=>1,:c=>1]
=> {:c=>1}
…
etc

Here is some code which begins to implement such a class:

class OneToOne < Hash

def self.
super(*args).invert.invert
end

def initialize(*args)
super(*args)
self.fix
end

def []=(key,val)
self.delete_if {|k,v| v == val}
super(key,val)
end

def store(key,val)
self.delete_if {|k,v| v == val}
super(key,val)
end

def key(val) #this is just a nicety to arrange for code to run in 1.9
and 1.8
if Hash.method_defined?(:key)
super(val)
else
self.index(val)
end
end

def fix
self.replace(self.invert.invert)
end
end

This code works to a certain degree, but I see two problems going
forward. First of all, Hash is a built-in in ruby which is written (as
far as I am aware) entirely in C. As such, there is no ‘core’ method
which can be overridden to affect change in all the methods which build
on it. At least, implementing []= did not give me store, and
implimenting store did not give me []=. Neither gives me merge. How many
other methods will I therefore have to override to ensure value
uniqueness? It seems to me, every method which could modify the contents
of the hash in some way (other than simply removing entries). Even then,
could I guarantee no two keys would have the same value, with the same
level of certainty I have that no two values have the same key?

The second problem is efficiency: It seems to me that this could
probably be done much more efficiently, especially if implemented in C.
It doesn’t affect my much for what I need it for right now, but if I
want to reuse it later (or if someone else wants to use it), it could
matter a great deal.

(Incidentally: searching for information about this has been somewhat
hampered by the fact that the phrase ‘hash value’ is very ambiguous
between the a value contained in a hash and paired with a key, or value
returned by a hashing algorithm such as Object#hash or MD5. It doesn’t
help that most of other terms I can think of to search for, such as
bidirectional or unique or one-to-one, are more common when talking
about the latter than the former. Now I understand why several other
languages choose the term ‘Dictionary’.)

Anyway, if someone has either a) a prior implementation of this, or b)
some suggestions on improving my implimentation, I’d love to hear it.

Thanks for your time,

Adam G.

addywaddy · November 18, 2008, 9:28am

On Nov 17, 11:33 pm, Adam G. [email protected] wrote:

Anyway, if someone has either a) a prior implementation of this, or b)
some suggestions on improving my implimentation, I’d love to hear it.

You may be able to save yourself some time with a little meta
programming. Eg.

class HashSet < Hash

Hash.instance_methods(false).each do |m|
  define_method(m, *a, &b)
    r = super
    fix
    r
  end
end

def fix
  self.replace(self.invert.invert)
end

end

You’ll want to improve upon that code of course, probably limit it to
only certain methods. But you get the idea.

T.

addywaddy · November 18, 2008, 9:41am

Adam G. wrote:

The second problem is efficiency: It seems to me that this could
probably be done much more efficiently, especially if implemented in C.

Every time you add a value you iterate over all the other values to
check
whether the value is already there. This makes adding an element O(n).
Having
adding to a datastructure be an O(n) operation is usually a bad idea.
Here’s
how I’d probably do it (untested):

class OneToOne
def initialize()
@key_value = {}
@value_key = {}
end

def
@key_value[k]
end

def []=(k,v)
if @key_value.has_key?(k)
@value_key.delete(@key_value[k])
end
if @value_key.has_key(?v)
@key_value.delete(@value_key[v])
end
@key_value[k] = v
@value_key[v] = k
end

def to_hash
@key_value
end
def invert
@value_key
end
…
end

HTH,
Sebastian

addywaddy · November 18, 2008, 11:37pm

Every time you add a value you iterate over all the other values to check
whether the value is already there. This makes adding an element O(n).
Having
adding to a datastructure be an O(n) operation is usually a bad idea.
Here’s
how I’d probably do it (untested):

example (elided)

This seems like a good idea, I’d have suggested the same. This is
exactly
what Set in the standard library does to ensure uniqueness, it puts the
members as keys in a hash and lets Ruby sort out the hashtable as
efficiently as possible. So a pair of mutually inverted hash tables is
probably the way to go.

addywaddy · November 18, 2008, 11:10pm

On 18.11.2008 09:37, Sebastian H. wrote:

Adam G. wrote:

The second problem is efficiency: It seems to me that this could
probably be done much more efficiently, especially if implemented in C.

Every time you add a value you iterate over all the other values to check
whether the value is already there. This makes adding an element O(n). Having
adding to a datastructure be an O(n) operation is usually a bad idea. Here’s
how I’d probably do it (untested):

good example

Right, my main point would be: why subclass? You inherit a lot of
methods that you need to watch out for. Rather, as Basti showed,
implement a new class with all the necessary methods.

Kind regards

robert

PS: Sorry for the nicknaming.

addywaddy · November 19, 2008, 8:40am

Sebastian H. wrote:

def to_hash
@key_value
end

This should return @key_value.dup

def invert
@value_key
end

And this should either return @value_key.dup or another instance of
OneToOne
where the both hashes are switched.

HTH,
Sebastian