Problem using set with hash objects

I’m using set from the Ruby standard library to produce collections of
unique objects from enumerable objects with duplicates, but it
doesn’t appear to work with hash objects.

$ ruby --version
ruby 1.8.5 (2006-12-25 patchlevel 12) [i686-darwin8.9.1]
$ irb
irb(main):001:0> require 'set'
=> true
irb(main):002:0> a = [1,1,2,3]
=> [1, 1, 2, 3]
irb(main):003:0> b = [{:a1 => "123"}, {:a1 => "123"}, {:b1 => "123"}]
=> [{:a1=>"123"}, {:a1=>"123"}, {:b1=>"123"}]
irb(main):004:0> seta = a.to_set
=> #<Set: {1, 2, 3}>
irb(main):005:0> setb = b.to_set
=> #<Set: {{:a1=>"123"}, {:a1=>"123"}, {:b1=>"123"}}>
irb(main):006:0> b[0] == b[1]
=> true

Am I doing something wrong?

On Monday 17 September 2007, Stephen B. wrote:


Am I doing something wrong?

According to the ri documentation, Set stores its items internally in a
hash. Because of this, it uses the eql? and hash methods, and not ==, to
test objects for equality. Hash#eql? (actually, Kernel#eql?) only
returns true if the two objects are the same object. Since b[0] and b[1]
are different objects, Set considers them not equal, and thus stores
them both. If you put the same hash in two places in the array you
convert to a set, only one of them will be kept:

irb: 001> require 'set'
true
irb: 002> h = {'a' => 1, 'b' => 2}
{"a"=>1, "b"=>2}
irb: 003> a = [h, {'c' => 3}, h]
[{"a"=>1, "b"=>2}, {"c"=>3}, {"a"=>1, "b"=>2}]
irb: 004> a.to_set.size
2
irb: 005> p a.to_set
#<Set: {{"a"=>1, "b"=>2}, {"c"=>3}}>

Other classes provide their own definition of eql?, which leads to
different (often less surprising) results. For instance, Array#eql?
returns true if the two arrays have the same elements. Strings do the
same.
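To make the eql?/hash point concrete, here is a minimal sketch. The
class names LooseRecord and StrictRecord are made up for illustration
and are not from any library. It uses a custom class rather than Hash
because, as far as I know, Ruby 1.9 and later define Hash#eql? and
Hash#hash by content, so the original hash example would not reproduce
there.

```ruby
require 'set'

# Hypothetical class: defines == by content, but inherits the
# identity-based eql? and hash from Object.
class LooseRecord
  attr_reader :fields

  def initialize(fields)
    @fields = fields
  end

  def ==(other)
    other.is_a?(LooseRecord) && fields == other.fields
  end
end

# Hypothetical subclass: additionally defines eql? and hash by content,
# which is what a hash-backed Set looks at.
class StrictRecord < LooseRecord
  def eql?(other)
    other.is_a?(StrictRecord) && fields.eql?(other.fields)
  end

  def hash
    fields.hash
  end
end

loose  = [LooseRecord.new([:a1, "123"]),  LooseRecord.new([:a1, "123"])]
strict = [StrictRecord.new([:a1, "123"]), StrictRecord.new([:a1, "123"])]

loose_equal = (loose[0] == loose[1])  # == says the two records match
loose_size  = loose.to_set.size       # Set ignores == and keeps both
strict_size = strict.to_set.size      # eql? and hash agree, so one is kept

puts loose_equal   # true
puts loose_size    # 2
puts strict_size   # 1
```

The rule of thumb is that eql? and hash have to be defined together:
two objects that are eql? must also return the same hash value,
otherwise the lookup never gets as far as comparing them.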

I hope this helps

Stefano

On 17.09.2007 22:20, Jano S. wrote:

irb(main):002:0> a = [1,1,2,3]
Am I doing something wrong?

Your problem is in hash comparison. Set internally uses a hash as its
implementation (values are keys in the hash). So in order to obtain
uniqueness of the values, you need to define a proper Hash#eql? and
Hash#hash (IIRC). The default ones compare object ids - i.e. two
totally equivalent hashes are considered different.

To make things even worse, hash calls dup when creating a new key to
avoid someone else changing the object. So even if you insert the same
object more times, it will be added each time (resp. its copy).

The dup only happens for String keys - and even then only if the key is
not frozen.

For more information search the archives for something like “hash key
dup”. There were recent (as in last two months) threads that discussed
this.

Yeah, and “Hash as Hash key” is probably another helpful search phrase.
This topic comes up from time to time.
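The key copying can be checked with a few lines. This is only a sketch
of the behaviour as I understand it for Hash#[]=; the variable names
are arbitrary.

```ruby
# An unfrozen String key is copied and the copy is frozen.
key = String.new("name")
h = {}
h[key] = 1
stored = h.keys.first

same_object   = stored.equal?(key)   # false: the hash holds a copy
stored_frozen = stored.frozen?       # true:  the copy is frozen

# Mutating the original afterwards does not disturb the stored key.
key << "x"
still_found = h.key?("name")         # true

# A String that is already frozen is stored as-is.
frozen_key = "id".freeze
h2 = {}
h2[frozen_key] = 1
frozen_same = h2.keys.first.equal?(frozen_key)   # true

# Non-String keys (here an Array) are never copied.
arr = [1, 2]
h3 = {}
h3[arr] = 1
array_same = h3.keys.first.equal?(arr)           # true

# Inserting the same String twice still gives one entry, because
# String#hash and String#eql? compare by content.
s = String.new("k")
twice = {}
twice[s] = 1
twice[s] = 2
twice_size = twice.size                          # 1

puts same_object, stored_frozen, still_found
puts frozen_same, array_same, twice_size
```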

Kind regards

robert

Robert, Stefano, and Jano, thanks for the pointers.

I’m now using ara.t.howard’s arrayfields gem to get hash-like access to
my data structures stored in arrays. This combines well with set,
because arrays are compared by content. Here’s an example:

$ sudo gem install arrayfields

$ cat a.rb
require 'set'
require 'arrayfields'
abc = Array.struct :a, :b, :c
a = abc.new [1,2,3] # => [1, 2, 3]
b = abc.new [1,2,3] # => [1, 2, 3]
c = abc.new [4,5,6] # => [4, 5, 6]
p a[:a] # => 1
p c[:a] # => 4
p a1 = [a,b,c] # => [[1, 2, 3], [1, 2, 3], [4, 5, 6]]
p b1 = a1.to_set.to_a # => [[1, 2, 3], [4, 5, 6]]
p b1[0][:a] # => 1
p b1[1][:c] # => 6

$ ruby a.rb
1
4
[[1, 2, 3], [1, 2, 3], [4, 5, 6]]
[[1, 2, 3], [4, 5, 6]]
1
6
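For comparison, a similar effect seems achievable without an extra gem
by using Struct from the core library, whose eql? and hash are based on
the member values. This is a swapped-in alternative, not what the
arrayfields example does internally, and the constant name Abc is just
a placeholder.

```ruby
require 'set'

# Hypothetical record type with three named members.
Abc = Struct.new(:a, :b, :c)

a = Abc.new(1, 2, 3)
b = Abc.new(1, 2, 3)
c = Abc.new(4, 5, 6)

# Struct#eql? and Struct#hash use the member values, so a and b
# collapse into a single set element.
unique = [a, b, c].to_set.to_a

unique_size = unique.size   # 2
first_a     = unique[0][:a] # 1
last_c      = unique[1][:c] # 6

puts unique_size, first_a, last_c
```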
