Newbie q: stripping duplicates


#1

This is certainly well known, but not to me.

a = [{"aa"=>"bb"},{"aa"=>"bb"}]
=> [{"aa"=>"bb"}, {"aa"=>"bb"}]

a.uniq
=> [{"aa"=>"bb"}, {"aa"=>"bb"}]

Why? And what should I use instead of .uniq
to remove the duplicate?

Thank you
Piero


#2

On Saturday 11 October 2008, removed_email_address@domain.invalid wrote:

Why? And what should I use instead of .uniq
to remove the duplicate?

Thank you
Piero

It works for me (with ruby-1.8.7-p72):

irb(main):005:0> a = [{"aa" => "bb"}, {"aa" => "bb"}]
=> [{"aa"=>"bb"}, {"aa"=>"bb"}]
irb(main):006:0> a.uniq
=> [{"aa"=>"bb"}]

Which version of ruby are you using?

Stefano


#3

I get the same results as Piero on 1.8.6 p114, the most recent built-in
version on Mac OS X 10.5.5 (unless an update was included in the recent
security update that I haven’t yet applied).

a = [{"aa"=>"bb"},{"aa"=>"bb"}]
=> [{"aa"=>"bb"}, {"aa"=>"bb"}]

a.uniq
=> [{"aa"=>"bb"}, {"aa"=>"bb"}]

exit
slapshot:~ cdemyanovich$ ruby -v
ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0]

Craig


#4

Stefano
$ ruby --version
ruby 1.8.6 (2008-03-03 patchlevel 114) [universal-darwin9.0]

I guess I am out of luck?
If so, is there any solution for 1.8.6?
Piero


#5

removed_email_address@domain.invalid wrote:

1.8.7 and 1.9.x use deep hashing for hashes; to achieve that in 1.8.6
you need to monkey patch Hash: http://pastie.org/pastes/272194
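
For anyone who would rather not patch a core class, one alternative on 1.8.6 is to deduplicate with == instead of hash/eql?, since the hashes in question do compare equal with ==. This is only a sketch: the helper name uniq_by_equality is hypothetical, it is not the content of the pastie above, and it is quadratic in the array length, so it suits small arrays only.

```ruby
# Hypothetical helper (not from the pastie): deduplicate using == rather
# than hash/eql?, so it does not depend on Hash#hash being content-based.
def uniq_by_equality(items)
  items.inject([]) do |kept, item|
    # Array#include? compares elements with ==
    kept.include?(item) ? kept : kept << item
  end
end

a = [{"aa" => "bb"}, {"aa" => "bb"}, {"cc" => "dd"}]
p uniq_by_equality(a)
```

The first occurrence of each element is kept and the original order is preserved.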

lopex


#6

On Oct 11, 3:59 pm, Marcin Mielżyński removed_email_address@domain.invalid wrote:

you need to monkey patch Hash: http://pastie.org/pastes/272194

Monkeypatched. What a shame.

Thanks!
Piero


#7

Thomas B. wrote:

with the three {"aa"=>"bb"} objects still reported to be ==.

Yes, but not eql?. In 1.8.6 there were no Hash#hash and Hash#eql?
methods, so it used Object#hash and Object#eql?, which consider two
objects equal only when they’re actually the same object.
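
The same effect can be reproduced on any Ruby version with a class that defines == but leaves hash and eql? at their Object defaults. A minimal sketch; the class names LooseTag and StrictTag are made up for illustration:

```ruby
# Defines == only; hash and eql? are inherited from Object (identity-based).
class LooseTag
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def ==(other)
    other.is_a?(LooseTag) && name == other.name
  end
end

# Adds content-based eql? and hash, which is what Array#uniq relies on.
class StrictTag < LooseTag
  def eql?(other)
    other.is_a?(StrictTag) && name == other.name
  end

  def hash
    name.hash
  end
end

loose  = [LooseTag.new("aa"), LooseTag.new("aa")]
strict = [StrictTag.new("aa"), StrictTag.new("aa")]

puts loose[0] == loose[1]   # the two objects are ==, yet...
puts loose.uniq.length      # ...both survive uniq
puts strict.uniq.length     # only one survives once eql? and hash agree
```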

HTH,
Sebastian


#8

Marcin Mielżyński wrote:

1.8.7 and 1.9.x use deep hashing for hashes; to achieve that in 1.8.6
you need to monkey patch Hash: http://pastie.org/pastes/272194

So I believe this is sort of a bug in the old version? Because now I can
even get this absurd result:

irb(main):007:0> z={a[0]=>:x,a[1]=>:y}
=> {{"aa"=>"bb"}=>:x, {"aa"=>"bb"}=>:y} # absurd number 1
irb(main):008:0> z[{"aa"=>"bb"}]
=> nil # absurd number two

with the three {"aa"=>"bb"} objects still reported to be ==.

TPR.


#9

Thomas B. wrote:

=> nil # absurd number two

with the three {"aa"=>"bb"} objects still reported to be ==.

It’s not absurd, just a consequence of Hash not having its own hash
method in 1.8.6 (it falls back to the default Object#hash). I agree it’s
a bit surprising, but deep hashing is slower by a fair amount.

The comparison function will not even be called here, since the lookup
fails earlier, at the hash bucket lookup:

a = {"aa"=>"bb"}
b = {"aa"=>"bb"}
a.hash != b.hash

Btw, Hash doesn’t use “==” for key comparison; it uses “eql?”.
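
That ordering (hash code first, eql? only when the hash codes match) can be observed with an instrumented key class. A sketch, assuming a made-up class ProbeKey whose hash code is set by hand and which counts calls to eql?:

```ruby
# Hypothetical instrumented key: the hash code is chosen by the caller,
# and every call to eql? is counted at class level.
class ProbeKey
  @eql_calls = 0
  class << self
    attr_accessor :eql_calls
  end

  def initialize(code)
    @code = code
  end

  def hash
    @code
  end

  def eql?(other)
    ProbeKey.eql_calls += 1
    true # claim equality whenever we are asked
  end
end

table = { ProbeKey.new(1) => :x }

miss = table[ProbeKey.new(2)]   # different hash code: the lookup misses
puts miss.inspect
puts ProbeKey.eql_calls         # eql? was not consulted for the miss

hit = table[ProbeKey.new(1)]    # same hash code: eql? gets to decide
puts hit.inspect
puts ProbeKey.eql_calls
```

Even though eql? here always answers true, the key with the different hash code is never found, because the comparison is never reached.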

lopex


#10

Marcin Mielżyński:

1.8.7 and 1.9.x use deep hashing for hashes; to achieve that in
1.8.6 you need to monkey patch Hash: http://pastie.org/pastes/272194

Right, and I use ‘alias eql? ==’ and hand-crafted hash methods
in most of my classes that need to be sane Set elements.

I understand the idea behind the hash method (let’s have a quick way
to check whether two objects are different – and look closely with eql?
only if their hashes are the same), but I wonder whether there are any
rules-of-thumb for finding the sweet spot between making it fast and
making it return different results for different objects often enough.

For example, is 137 in the above pastie snippet a ‘magic number’ that
it’s good to multiply by? I understand how the bitwise XORs make every
key and value impact the hash, but why not also multiply by 137 between
the key+value iterations?
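
One way to see what a multiplier buys is to compare a pure XOR combiner with one that multiplies the running value before each XOR. The sketch below is not the pastie's code; the function names, the Point class, and the choice of 137 as the default multiplier are assumptions for illustration only. Pure XOR ignores order and cancels equal parts, while multiplying between steps makes position matter.

```ruby
# Combine per-field hash codes with XOR only.
def xor_combine(codes)
  codes.inject(0) { |acc, code| acc ^ code }
end

# Multiply the running value by a constant before each XOR, then mask
# the result to 30 bits so it stays a small non-negative integer.
def mul_xor_combine(codes, multiplier = 137)
  codes.inject(0) { |acc, code| ((acc * multiplier) ^ code) & 0x3fffffff }
end

p xor_combine([3, 5]) == xor_combine([5, 3])          # true: order is lost
p xor_combine([7, 7])                                 # 0: equal parts cancel
p mul_xor_combine([3, 5]) == mul_xor_combine([5, 3])  # false: order matters

# A hypothetical value class using the 'alias eql? ==' idiom mentioned above.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias eql? ==

  def hash
    mul_xor_combine([x.hash, y.hash])
  end
end

p [Point.new(1, 2), Point.new(1, 2)].uniq.length      # 1
```

Whether 137 in particular is a good multiplier is not something this sketch settles; it only shows why some mixing step between iterations changes which inputs collide.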

– Shot