Hash#collate

I wanted a method like Hash#update, but that preserved the values from
both the original and argument Hash. A little searching failed to find
it. (I did find that someone somewhere wrote a Hash#collate that’s in
my ri docs, but who knows where it came from. Its description appears
not to do at all what I wanted, anyhow.)

So, I wrote my own. Comments welcome. Efficiency patches particularly
welcome. Under a different name, perhaps Trans might consider it for
inclusion in Facets.

class Hash
  # Merge the values of this hash with those from another, setting all values
  # to be arrays representing the values from both hashes.
  #
  #   { :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
  #   #=> { :a=>[1,3], :b=>[2,4], :c=>[5] }
  #
  # The 'uniq' option allows you to ensure all values are unique:
  #
  #   { :a=>1, :b=>2 }.collate( { :a=>1, :b=>3 }, :uniq=>true )
  #   #=> { :a=>[1], :b=>[2,3] }
  #
  # By default, array values in either side are merged:
  #
  #   foo = { :a=>[1,2], :b=>[3] }
  #   bar = { :a=>[4,5], :c=>[6,7] }
  #   foo.collate( bar )
  #   #=> { :a=>[1,2,4,5], :b=>[3], :c=>[6,7] }
  #
  # Use the 'preserve_arrays' option to prevent them from being merged:
  #
  #   foo = { :a=>[1,2], :b=>[3] }
  #   bar = { :a=>[4,5], :c=>[6,7] }
  #   foo.collate( bar, :preserve_arrays=>true )
  #   #=> { :a=>[[1,2],[4,5]], :b=>[[3]], :c=>[[6,7]] }
  #
  # Note that, as shown above, preserving arrays will cause array values
  # to be wrapped up in another array.
  def collate( other_hash, options={} )
    dup.collate!( other_hash, options )
  end

  # The same as #collate, but modifies the receiver in place.
  def collate!( other_hash, options={} )
    # Prepare, ensuring every existing value is already an Array.
    # Arrays are copied so that appending below never mutates an array
    # that is still shared with the hash #collate was called on (dup is shallow).
    each{ |key, value|
      if value.is_a?( Array ) && !options[ :preserve_arrays ]
        self[ key ] = value.dup
      else
        self[ key ] = [ value ]
      end
    }

    # Collate with values from other_hash
    other_hash.each{ |key, value|
      # key? rather than self[ key ], so a hash default value is never
      # mistaken for an existing entry
      if key?( key )
        if value.is_a?( Array ) && !options[ :preserve_arrays ]
          self[ key ].concat( value )
        else
          self[ key ] << value
        end
      elsif value.is_a?( Array ) && !options[ :preserve_arrays ]
        # Copy, so later collations do not modify other_hash's array
        self[ key ] = value.dup
      else
        self[ key ] = [ value ]
      end
    }

    each{ |key, value| value.uniq! } if options[ :uniq ]

    self
  end
end

if __FILE__ == $0
  require 'test/unit'
  class TestHashCollation < Test::Unit::TestCase
    def setup
      $a = { :a=>1, :b=>2, :z=>26, :all=>%w|a b z|, :stuff1=>%w|foo bar|, :whee=>%w|a b| }
      $b = { :a=>1, :b=>4, :c=>9, :all=>%w|a b c|, :stuff2=>%w|jim jam|, :whee=>%w|a b| }
      $c = { :a=>1, :b=>8, :c=>27 }
    end
    def test1_defaults
      collated = $a.collate( $b )
      assert_equal( 8, collated.keys.length, "There are 8 unique keys" )
      assert_equal( [1,1], collated[ :a ] )
      assert_equal( [2,4], collated[ :b ] )
      assert_equal( [9], collated[ :c ] )
      assert_equal( [26], collated[ :z ] )
      assert_equal( %w|a b z a b c|, collated[ :all ], "Arrays are merged by default." )
      assert_equal( %w|foo bar|, collated[ :stuff1 ] )
      assert_equal( %w|jim jam|, collated[ :stuff2 ] )
      assert_equal( %w|a b a b|, collated[ :whee ] )
    end
    def test2_uniq
      collated = $a.collate( $b, :uniq=>true )
      assert_equal( 8, collated.keys.length, "There are 8 unique keys" )
      assert_equal( [1], collated[ :a ] )
      assert_equal( [2,4], collated[ :b ] )
      assert_equal( [9], collated[ :c ] )
      assert_equal( [26], collated[ :z ] )
      assert_equal( %w|a b z c|, collated[ :all ], "Arrays are merged by default." )
      assert_equal( %w|foo bar|, collated[ :stuff1 ] )
      assert_equal( %w|jim jam|, collated[ :stuff2 ] )
      assert_equal( %w|a b|, collated[ :whee ] )
    end
    def test3_preserve_arrays
      collated = $a.collate( $b, :preserve_arrays=>true )
      assert_equal( 8, collated.keys.length, "There are 8 unique keys" )
      assert_equal( [1,1], collated[ :a ] )
      assert_equal( [2,4], collated[ :b ] )
      assert_equal( [9], collated[ :c ] )
      assert_equal( [26], collated[ :z ] )
      assert_equal( [ %w|a b z|, %w|a b c| ], collated[ :all ], "Two arrays are not merged." )
      assert_equal( [%w|foo bar|], collated[ :stuff1 ], "Arrays unique to one side are wrapped" )
      assert_equal( [%w|jim jam|], collated[ :stuff2 ], "Arrays unique to one side are wrapped" )
      assert_equal( [%w|a b|, %w|a b|], collated[ :whee ] )
    end
    def test4_preserve_and_uniq
      collated = $a.collate( $b, :preserve_arrays=>true, :uniq=>true )
      assert_equal( 8, collated.keys.length, "There are 8 unique keys" )
      assert_equal( [1], collated[ :a ] )
      assert_equal( [2,4], collated[ :b ] )
      assert_equal( [9], collated[ :c ] )
      assert_equal( [26], collated[ :z ] )
      assert_equal( [ %w|a b z|, %w|a b c| ], collated[ :all ], "Two arrays are not merged." )
      assert_equal( [%w|foo bar|], collated[ :stuff1 ], "Arrays unique to one side are wrapped" )
      assert_equal( [%w|jim jam|], collated[ :stuff2 ], "Arrays unique to one side are wrapped" )
      assert_equal( [%w|a b|], collated[ :whee ], "Preserve arrays + uniq == duplicate arrays are removed" )
    end
    def test5_multi_collate
      collated = $a.collate( $b ).collate( $c )
      assert_equal( [1,1,1], collated[ :a ] )
      assert_equal( [2,4,8], collated[ :b ] )
      assert_equal( [9,27], collated[ :c ] )
    end
    def test6_multi_collate_with_preserve
      collated = $a.collate( $b, :preserve_arrays=>1 ).collate( $c )
      assert_equal( [1,1,1], collated[ :a ] )
      assert_equal( [2,4,8], collated[ :b ] )
      assert_equal( [9,27], collated[ :c ] )

      collated = $a.collate( $b ).collate( $c, :preserve_arrays=>1 )
      assert_equal( [[1,1],1], collated[ :a ] )
      assert_equal( [[2,4],8], collated[ :b ] )
      assert_equal( [[9],27], collated[ :c ] )

      collated = $a.collate( $b, :preserve_arrays=>1 ).collate( $c, :preserve_arrays=>1 )
      assert_equal( [[1,1],1], collated[ :a ] )
      assert_equal( [[2,4],8], collated[ :b ] )
      assert_equal( [[9],27], collated[ :c ] )
    end
  end
end

On Dec 18, 5:01 pm, Phrogz [email protected] wrote:

> I wanted a method like Hash#update, but that preserved the values from
> both the original and argument Hash. A little searching failed to find
> it. (I did find that someone somewhere wrote a Hash#collate that’s in
> my ri docs, but who knows where it came from. Its description appears
> not to do at all what I wanted, anyhow.)
>
> So, I wrote my own. Comments welcome. Efficiency patches particularly
> welcome. Under a different name, perhaps Trans might consider it for
> inclusion in Facets.

Please find properly-formatted code @ http://pastie.caboo.se/130291
Sorry for the extra noise.

Phrogz wrote:

> { :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
> #=> { :a=>[1,3], :b=>[2,4], :c=>[5] }

Do these two give the same result? Does it matter?

{ :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
{ :a=>1, :b=>2, :c=>5 }.collate :a=>3, :b=>4

On Dec 18, 5:16 pm, Joel VanderWerf [email protected] wrote:

> Phrogz wrote:
> > { :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
> > #=> { :a=>[1,3], :b=>[2,4], :c=>[5] }
>
> Do these two give the same result? Does it matter?
>
> { :a=>1, :b=>2 }.collate :a=>3, :b=>4, :c=>5
> { :a=>1, :b=>2, :c=>5 }.collate :a=>3, :b=>4

They don’t. In my particular use case today, I only used the result as
a set, so a proper Set might have been more appropriate. But I don’t
know; I think that preserving the order is probably useful, at least
when not using the #uniq option. (I’m thinking perhaps of a case where
you’re specifying a series of fallback results for a variety of
options.)

Totally up for grabs, though, if there’s a faster, more elegant
solution that doesn’t use that.
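
To make the order question concrete, here is a minimal sketch. It uses a stand-alone helper, collate_pair, which is a hypothetical name and not part of the posted code, and it assumes the default behaviour described above (array values are merged). Swapping which hash comes first changes the order inside each value array, which only matters when the result is read as a list rather than as a set.

```ruby
# Hypothetical helper for illustration only; it imitates the default
# (merge arrays) behaviour without patching Hash.
def collate_pair(left, right)
  result = {}
  [left, right].each do |hash|
    hash.each do |key, value|
      items = value.is_a?(Array) ? value : [value]
      (result[key] ||= []).concat(items)
    end
  end
  result
end

x = { :a => 1, :b => 2 }
y = { :a => 3, :b => 4, :c => 5 }

forward  = collate_pair(x, y)  # values from x come first
backward = collate_pair(y, x)  # values from y come first

p forward[:a]   # [1, 3]
p backward[:a]  # [3, 1]
```

Sorting or converting each value to a Set would make the two results compare equal, at the cost of losing the fallback ordering mentioned above.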

On Dec 18, 7:05 pm, Phrogz [email protected] wrote:

> I wanted a method like Hash#update, but that preserved the values from
> both the original and argument Hash. A little searching failed to find
> it. (I did find that someone somewhere wrote a Hash#collate that’s in
> my ri docs, but who knows where it came from. Its description appears
> not to do at all what I wanted, anyhow.)

That’s from Facets, probably. But the latest version of Facets renamed
it to #mash, for “map hash”, which is more descriptive of what it
does. (#collate remains an alias for the time being).

I like your definition; actually, I’m surprised I haven’t worked this
functionality into Facets yet. I guess I thought #weave took care of
it, but that’s slightly different b/c it only combines arrays if the
value is already an array. So I’m going to add this to Facets. A
couple thoughts though…

The options don’t feel quite right. Maybe it would be more versatile to
define #uniq on Hash? So then

{ :a=>1, :b=>2 }.collate( { :a=>1, :b=>3 } ).uniq
#=> { :a=>[1], :b=>[2,3] }

As for preserving the arrays, I’m not sure. Is that really all that
useful? Well, if it is, it seems like a better definition for Hash#zip.

T.

On Dec 18, 7:49 pm, Phrogz [email protected] wrote:

> not to do at all what I wanted, anyhow.)
>
> up and need to be preserved.
>
> I would dearly love to get rid of the options hash altogether,
> though. :)

One alternative would be to drop the idea of preserving collation
order altogether, and instead accumulate the results as a Set.
Although the method would still need to branch on value type (since
set1 << set2 isn’t the same as set1.merge set2), it seems far less
likely that someone would have a Hash whose values were Sets and
wanted to maintain each set as a distinct ‘value’ during collation.
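
A rough sketch of that Set-based alternative follows. The name collate_to_sets is hypothetical, and the choice to treat only Set values (not Arrays) as collections to be merged is an assumption made for the example. It shows the branch mentioned above: << adds its argument as a single member, while merge adds each member of the argument.

```ruby
require 'set'

# Hypothetical helper: collate any number of hashes into Sets of values.
def collate_to_sets(*hashes)
  result = {}
  hashes.each do |hash|
    hash.each do |key, value|
      set = (result[key] ||= Set.new)
      if value.is_a?(Set)
        set.merge(value)  # add every member of the incoming set
      else
        set << value      # add the value itself as one member
      end
    end
  end
  result
end

first  = collate_to_sets({ :a => 1, :b => 2 }, { :a => 1, :b => 3 })
# Feeding a previous result back in keeps the sets flat.
second = collate_to_sets(first, { :b => 4 })

p first[:a] == Set[1]          # true
p second[:b] == Set[2, 3, 4]   # true
```

Because a Set discards duplicates, no separate uniq step is needed with this approach.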

On Dec 18, 6:29 pm, Trans [email protected] wrote:

> does. (#collate remains an alias for the time being).
>
> { :a=>1, :b=>2 }.collate( { :a=>1, :b=>3 } ).uniq
> #=> { :a=>[1], :b=>[2,3] }

That’s an excellent point. I needed this functionality today and so I
included it in the script; however, since it’s a simple one-liner (as
seen in the implementation) post-process step, perhaps it’s
appropriate to keep it out of this method.
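
As a sketch of what that post-process step could look like outside the method (the variable names here are made up for the example), the de-duplication can be applied to an already collated hash. This version builds a new hash instead of calling uniq! in place, which is a design choice for the example and not something taken from the posted code.

```ruby
# A hash shaped like the output of a collation, with a duplicate in :a.
collated = { :a => [1, 1], :b => [2, 3] }

# Build a new hash whose arrays have duplicates removed,
# leaving the collated hash itself untouched.
deduped = {}
collated.each { |key, values| deduped[key] = values.uniq }

p deduped[:a]   # [1]
p deduped[:b]   # [2, 3]
p collated[:a]  # [1, 1]
```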

> As for preserving the arrays, I’m not sure. Is that really all that
> useful? Well, if it is it seems like a better definition for Hash#zip.

The reason I made the arrays not be preserved by default is to enable
chained collation of 3 or more hashes. (test5_multi_collate in the unit
tests.) I was actually collating hundreds today. However, I put in the
‘preserve arrays’ because it seemed almost arbitrary to treat them
differently from every other type of value. I don’t personally have a
use case that needs it now, but I know from experience (like #flatten
versus #flatten_once) how sometimes arrays of arrays can suddenly crop
up and need to be preserved.

I would dearly love to get rid of the options hash altogether,
though. :)
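
The chaining argument can be illustrated with a small sketch. The helper collate_pair and its third parameter are hypothetical stand-ins for the method and its :preserve_arrays option. Folding a list of hashes with inject produces flat arrays when arrays are merged, and progressively deeper nesting when they are preserved.

```ruby
# Hypothetical helper; the third argument imitates the :preserve_arrays option.
def collate_pair(left, right, preserve_arrays = false)
  result = {}
  [left, right].each do |hash|
    hash.each do |key, value|
      if value.is_a?(Array) && !preserve_arrays
        (result[key] ||= []).concat(value)  # merge array contents
      else
        (result[key] ||= []) << value       # keep the value as one element
      end
    end
  end
  result
end

hashes = [{ :a => 1 }, { :a => 2 }, { :a => 3 }]

# Each step feeds the previous result (whose values are arrays) back in.
merged = hashes.inject({}) { |memo, hash| collate_pair(memo, hash) }
nested = hashes.inject({}) { |memo, hash| collate_pair(memo, hash, true) }

p merged[:a]  # [1, 2, 3]
p nested[:a]  # [[[1], 2], 3]
```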

On Dec 19, 9:32 am, Nobuyoshi N. [email protected] wrote:

> {:a=>1, :b=>2 }.update(:a=>3, :b=>4, :c=>5) {|key, *values| values}

Woh! Little known is this karate!

You can even do:

{:a=>1, :b=>2 }.update(:a=>[1,3], :b=>4, :c=>5) {|key, *values| values.flatten.uniq}
#=> {:a=>[1, 3], :b=>[2, 4], :c=>5}

T.

Hi,

At Wed, 19 Dec 2007 09:05:11 +0900,
Phrogz wrote in [ruby-talk:284104]:

> I wanted a method like Hash#update, but that preserved the values from
> both the original and argument Hash. A little searching failed to find
> it. (I did find that someone somewhere wrote a Hash#collate that’s in
> my ri docs, but who knows where it came from. Its description appears
> not to do at all what I wanted, anyhow.)

{:a=>1, :b=>2 }.update(:a=>3, :b=>4, :c=>5) {|key, *values| values}
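
One detail worth spelling out about the block form: the block is called only for keys present in both hashes, so a key found on one side only keeps its bare value. The sketch below uses Hash#merge, the non-destructive counterpart of Hash#update. The wrap lambda is a hypothetical helper added for the example; it shows one way to get arrays for every key, assuming no value is already an array.

```ruby
a = { :a => 1, :b => 2 }
b = { :a => 3, :b => 4, :c => 5 }

# The block runs only when a key exists in both hashes.
merged = a.merge(b) { |_key, old, new| [old, new] }

p merged[:a]  # [1, 3]
p merged[:c]  # 5

# Hypothetical helper: wrap every value in an array before merging,
# so keys without a conflict also end up holding arrays.
wrap = lambda do |hash|
  hash.each_with_object({}) { |(key, value), out| out[key] = [value] }
end

collated = wrap.call(a).merge(wrap.call(b)) { |_key, old, new| old + new }

p collated[:a]  # [1, 3]
p collated[:c]  # [5]
```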