Forum: Ruby Subclassing Hash to enforce value uniqueness ala key uniqueness.

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Adam G. (Guest)
on 2008-11-18 06:37
(Received via mailing list)
First of all, hello to everyone. This is my first message to this list.

I find myself needing something very much like a Hash, but with unique
one-to-one mapping in both directions, both key-to-value, and
value-to-key. Now, I've been looking around a bit on Google, then
Rubyforge, the RAA, and the archives of this list, and I'm a bit
surprised I wasn't able to find a prior implimentation of this. Time to
impliment my own, right?

Well, before I got too far into the process, I though I would ask for
people's thoughts here. First off, here is the effect I would like to
see. I'll call my subclass OneToOne for now, though I'm open to better
names.

 >> oto = OneToOne[:a=>1, :b=>2, :c=>3]
=> {:a=>1, :b=>2, :c=>3}  # or on 1.8.x, {:b=>2, :c=>3,  :a=>1}
 >> oto[:d] = 1
=> 1
 >> oto
=> {:b=>2,:c=>3,:d=>1}
 >> OneToOne[:a=>1,:b=>1,:c=>1]
=> {:c=>1}
...
etc

Here is some code which begins to implement such a class:

class OneToOne < Hash

  def self.[](*args)
    super(*args).invert.invert
  end

  def initialize(*args)
    super(*args)
    self.fix
  end

  def []=(key,val)
    self.delete_if {|k,v| v == val}
    super(key,val)
  end

  def store(key,val)
    self.delete_if {|k,v| v == val}
    super(key,val)
  end

  def key(val) #this is just a nicety to arrange for code to run in 1.9
and 1.8
    if Hash.method_defined?(:key)
      super(val)
    else
      self.index(val)
    end
  end

  def fix
    self.replace(self.invert.invert)
  end
end

This code works to a certain degree, but I see two problems going
forward. First of all, Hash is a built-in in ruby which is written (as
far as I am aware) entirely in C. As such, there is no 'core' method
which can be overridden to affect change in all the methods which build
on it. At least, implementing []= did not give me store, and
implimenting store did not give me []=. Neither gives me merge. How many
other methods will I therefore have to override to ensure value
uniqueness? It seems to me, every method which could modify the contents
of the hash in some way (other than simply removing entries). Even then,
could I guarantee no two keys would have the same value, with the same
level of certainty I have that no two values have the same key?

The second problem is efficiency: It seems to me that this could
probably be done much more efficiently, especially if implemented in C.
It doesn't affect my much for what I need it for right now, but if I
want to reuse it later (or if someone else wants to use it), it could
matter a great deal.

(Incidentally: searching for information about this has been somewhat
hampered by the fact that the phrase 'hash value' is very ambiguous
between the a value contained in a hash and paired with a key, or value
returned by a hashing algorithm such as Object#hash or MD5. It doesn't
help that most of other terms I can think of to search for, such as
bidirectional or unique or one-to-one, are more common when talking
about the latter than the former. Now I understand why several other
languages choose the term 'Dictionary'.)

Anyway, if someone has either a) a prior implementation of this, or b)
some suggestions on improving my implimentation, I'd love to hear it.

Thanks for your time,
- Adam G.
Thomas S. (Guest)
on 2008-11-18 10:28
(Received via mailing list)
On Nov 17, 11:33 pm, Adam G. <removed_email_address@domain.invalid> wrote:
> Anyway, if someone has either a) a prior implementation of this, or b)
> some suggestions on improving my implimentation, I'd love to hear it.

You may be able to save yourself some time with a little meta
programming. Eg.

  class HashSet < Hash

    Hash.instance_methods(false).each do |m|
      define_method(m, *a, &b)
        r = super
        fix
        r
      end
    end

    def fix
      self.replace(self.invert.invert)
    end

  end

You'll want to improve upon that code of course, probably limit it to
only certain methods. But you get the idea.

T.
Sebastian H. (Guest)
on 2008-11-18 10:41
(Received via mailing list)
Adam G. wrote:
> The second problem is efficiency: It seems to me that this could
> probably be done much more efficiently, especially if implemented in C.

Every time you add a value you iterate over all the other values to
check
whether the value is already there. This makes adding an element O(n).
Having
adding to a datastructure be an O(n) operation is usually a bad idea.
Here's
how I'd probably do it (untested):

class OneToOne
  def initialize()
    @key_value = {}
    @value_key = {}
  end

  def [](k)
    @key_value[k]
  end

  def []=(k,v)
    if @key_value.has_key?(k)
      @value_key.delete(@key_value[k])
    end
    if @value_key.has_key(?v)
      @key_value.delete(@value_key[v])
    end
    @key_value[k] = v
    @value_key[v] = k
  end

  def to_hash
    @key_value
  end
  def invert
    @value_key
  end
  ...
end


HTH,
Sebastian
Robert K. (Guest)
on 2008-11-19 00:10
(Received via mailing list)
On 18.11.2008 09:37, Sebastian H. wrote:
> Adam G. wrote:
>> The second problem is efficiency: It seems to me that this could
>> probably be done much more efficiently, especially if implemented in C.
>
> Every time you add a value you iterate over all the other values to check
> whether the value is already there. This makes adding an element O(n). Having
> adding to a datastructure be an O(n) operation is usually a bad idea. Here's
> how I'd probably do it (untested):

<snip>good example</snip>

Right, my main point would be: why subclass?  You inherit a lot of
methods that you need to watch out for.  Rather, as Basti showed,
implement a new class with all the necessary methods.

Kind regards

  robert


PS: Sorry for the nicknaming. :-)
James C. (Guest)
on 2008-11-19 00:37
(Received via mailing list)
>
> Every time you add a value you iterate over all the other values to check
> whether the value is already there. This makes adding an element O(n).
> Having
> adding to a datastructure be an O(n) operation is usually a bad idea.
> Here's
> how I'd probably do it (untested):
>
> # example (elided)



This seems like a good idea, I'd have suggested the same. This is
exactly
what Set in the standard library does to ensure uniqueness, it puts the
members as keys in a hash and lets Ruby sort out the hashtable as
efficiently as possible. So a pair of mutually inverted hash tables is
probably the way to go.
Sebastian H. (Guest)
on 2008-11-19 09:40
(Received via mailing list)
Sebastian H. wrote:
>   def to_hash
>     @key_value
>   end

This should return @key_value.dup

>   def invert
>     @value_key
>   end

And this should either return @value_key.dup or another instance of
OneToOne
where the both hashes are switched.

HTH,
Sebastian
This topic is locked and can not be replied to.