Hash.merge_add! extension - how does this code look?

greg · October 26, 2008, 12:20pm

Hi,
I wanted a way to be able to “add” values to a Hash such that it keeps
the
old and the new values. For examples example adding an item to a hash
for
which the keys are dates. Here’s a first cut. Any feedback on coding
style etc?

================================
class Hash

Merges a hash with only one item into your hash. If there is

already a

hash entry for that key the both existing value(s) & new value are

kept
via

use of an Array

def merge_add!(h)
raise “Parameter passed in not a hash” if !h.instance_of?(Hash)
raise “Can only pass hash with one item to be added” if h.length > 1
h_key = h.to_a[0][0]
h_value = h.to_a[0][1]
if self.has_key?(h_key)
existing_value = self[h_key]
existing_value_as_array = existing_value.instance_of?(Array) ?
existing_value : [existing_value]
new_value = existing_value_as_array << h_value
self[h_key] = new_value
else
h_value.instance_of?(Array) ? self.merge!(h) : self.merge!( {h_key
=>
[h_value]} )
end
end
end

require File.expand_path(File.dirname(FILE) + ‘/…/spec_helper’)

describe “merge_add!” do
before(:each) do
@empty_hash = {}
@hash_with_number = {100 => 1}
@hash_with_array = {100 => [1]}
end

it “should raise an error if there is no object passed to the method”
do
lambda{ @empty_hash.merge_add!() }.should raise_error(ArgumentError)
end
it “should raise an error if the object passed is not a Hash” do
lambda{@empty_hash.merge_add!(123)}.should raise_error(RuntimeError,
“Parameter passed in not a hash”)
end
it “should raise an error if the number of items in the hash in not 1”
do
lambda{@empty_hash.merge_add!({101 => 2, 102 => 3})}.should
raise_error(RuntimeError, “Can only pass hash with one item to be
added”)
end
it “should create a new Array & wrap new value as an array” do
@empty_hash.merge_add!( {101 => 2} )
@empty_hash[101].should eql([2])
end
it “should create a new Array & use the array passed as the value” do
@empty_hash.merge_add!( {101 => [2]} )
@empty_hash[101].should eql([2])
end
it “should migrate existing value to an Array, then add the new value,
if
existing value is NOT an array” do
@hash_with_number.merge_add!( {100 => 2} )
@hash_with_number[100].should eql([1, 2])
end
it “should migrate add the new value to existing, if existing value IS
an
array” do
@hash_with_array.merge_add!( {100 => 2} )
@hash_with_array[100].should eql([1, 2])
end

end

================================

tks

greg · October 26, 2008, 1:14pm

Hi –

On Sun, 26 Oct 2008, Greg H. wrote:

Hi,
I wanted a way to be able to “add” values to a Hash such that it keeps the
old and the new values. For examples example adding an item to a hash for
which the keys are dates. Here’s a first cut. Any feedback on coding
style etc?

Just a quick (1/2-cup of coffee) point:

================================
class Hash

Merges a hash with only one item into your hash. If there is already a

hash entry for that key the both existing value(s) & new value are kept

via

use of an Array

def merge_add!(h)

That’s not a good name for it, because there’s no non-bang merge_add
method. The ! in method names only makes sense (with very few,
very specialized exceptions, like the way Jim W. uses them in Builder)
if the methods come in pairs: one regular method and one “dangerous”
method (“dangerous” being Matz’s term to describe the meaning of the
!).

The reasoning is as follows.

Bang method names never have, and never will, coincide completely with
receiver-altering methods in Ruby. That’s just not a possibility, and
has never been envisioned. (Consider Array#pop, String#replace, and
many, many more.) So adding a bang just because the receiver is being
changed doesn’t make sense, and dilutes the convention of the
regular/dangerous pairing.

Putting a bang on an unpaired method name just because the method
seems “dangerous”, in a unary way, doesn’t make sense either. Rails
does this, and it’s about my least favorite thing about the Rails
source code. If ! just means “This method does something dramatic!”,
we’re back to a very fuzzy, uninformative, impressionistic use of !.

Matz had it right: the regular/dangerous pairing of otherwise
same-named methods is an incredibly creative and useful application of
the bang. It actually tells you something. Every time people complain
because gsub! returns nil if the receiver doesn’t change, all I can
think is: “You were warned! There’s a bang in the name; that means it’s
similar to gsub but it’s dangerous. Heads up! Go read the docs! Don’t
complain!”

In Ruby, unpaired methods always have names that already incorporate
what they do and whether or not they change their receivers. Yes,
there are some glitches (like Array#delete and String#delete being
different as to receiver-changing). But Array#pop isn’t Array#pop!
because there’s no possible non-destructive “pop”; that would just be
array[-1]. So if you’ve got an unpaired method name and you feel like
it’s not communicating enough, it’s best to change the name rather
than add a !, for the reasons above.

Anyway, that’s the gist. Sorry to pounce on one thing. I’ll have more
coffee and look at the code itself

David

greg · October 26, 2008, 1:54pm

On Oct 26, 7:20 am, “Greg H.” [email protected]
wrote:

via

use of an Array

def merge_add!(h)
raise “Parameter passed in not a hash” if !h.instance_of?(Hash)

You don’t need to do this. It is “anti-ducktyping”. h could be a hash-
like object and that would be okay.

raise "Can only pass hash with one item to be added" if h.length > 1

Why artificially limit it to one entry hashes? Supporting any hash
would make it more flexible.

[h_value]} )
end
end
end

The idea is a good one. I think implementation could use some
improvement. For comparison have a look at Facets’ Hash#collate.

http://facets.rubyforge.org/doc/api/core/classes/Hash.html#M000112

t

greg · October 27, 2008, 12:11am

On Oct 26, 5:45 pm, Stefan R. [email protected] wrote:

Reading your problem and code I think you missed that Hash#merge
optionally takes a block which seems to solve your issue just fine but
in a more generic manner:

hash = {‘foo’ => ‘bar1’}
hash.merge! ‘foo’ => ‘bar2’ do |key, existing, new|
Array(existing) + [new]
end
p hash # → {“foo”=>[“bar1”, “bar2”]}

Nice. I didn’t know it could take a block either. Has that been there
all along?

T.

greg · October 26, 2008, 10:45pm

Greg H. wrote:

Hi, I wanted a way to be able to “add” values to a Hash such that it
keeps the old and the new values. For examples example adding
an item to a hash for which the keys are dates. Here’s a first cut.
Any feedback on coding style etc?

Reading your problem and code I think you missed that Hash#merge
optionally takes a block which seems to solve your issue just fine but
in a more generic manner:

hash = {‘foo’ => ‘bar1’}
hash.merge! ‘foo’ => ‘bar2’ do |key, existing, new|
Array(existing) + [new]
end
p hash # -> {“foo”=>[“bar1”, “bar2”]}

Additionally I would normalize all values to arrays upfront here, makes
dealing with the hash a lot easier.

Regards
Stefan

greg · October 27, 2008, 8:21am

Greg H. wrote:

thanks for highlighting this! Is the quickest way to normalise to Array
via
the following?
hashitem.instance_of?(Array) ? hashitem : [hashitem]

If you’re building up the hash yourself then I’d use the following
idiom:
collector = Hash.new { |h,k| h[k] = [] }
collector[‘key1’] << ‘value1’
collector[‘key2’] << ‘value1’
collector[‘key1’] << ‘value2’
p collector

Simplifies adding to and reading from the hash. But be aware that with
the hash having a different default value than nil, you can’t use ‘if
collector[key] then’ to test for existence of a key anymore, you have to
use has_key?.

If you’re not building the hash up yourself, then yes, I’d use what you
wroten in an each loop and override the existing value.

Regards
Stefan

greg · October 27, 2008, 4:48am

thanks for highlighting this! Is the quickest way to normalise to Array
via
the following?
hashitem.instance_of?(Array) ? hashitem : [hashitem]

greg · October 27, 2008, 8:42am

I just noticed the code suggested might have a problem when the
original hash is empty. That is using the ability to pass a block to
'merge!" is good, but when the source hash is empty it does not seem
to trigger use of the code in the block to merge.

-----code----
def merge_add!(h)
self.merge!(h) do |key, existing, new|
new = new.instance_of?(Array) ? new : [new]
existing = existing.instance_of?(Array) ? existing : [existing]
existing + new
end
end

—in console—
?> {}.merge_add!({1 => 100})
=> {1=>100} <<== DID NOT PUT THE ‘100’ IN AN ARRAY!!

{1 => 300}.merge_add!({1 => 100})
=> {1=>[300, 100]}

regards
Greg

greg · October 28, 2008, 1:44am

thanks all for feedback to date - here’s my latest take

def merge_add!(h)
raise “Parameter passed in not a hash” if !h.instance_of?(Hash)

# normalise input hash to contain arrays
h.each { |key, value| if !value.instance_of?(Array) then h[key] =

[value] end }

self.merge!(h) do |key, existing, new|
  existing = existing.instance_of?(Array) ? existing : [existing]
  existing + new
end

end

Macintosh-2:myequity greg$ cat spec/lib/hash_extensions_spec.rb
require File.expand_path(File.dirname(FILE) + ‘/…/spec_helper’)

describe “merge_add!” do
before(:each) do
@empty_hash = {}
@hash_with_number = {100 => 1}
@hash_with_array = {100 => [1]}
end

it “should raise an error if there is no object passed to the method”
do
lambda{ @empty_hash.merge_add!() }.should raise_error(ArgumentError)
end
it “should raise an error if the object passed is not a Hash” do
lambda{@empty_hash.merge_add!(123)}.should
raise_error(RuntimeError, “Parameter passed in not a hash”)
end
it “should raise an error if the number of items in the hash in not 1”
do
lambda{@empty_hash.merge_add!({101 => 2, 102 => 3})}.should_not
raise_error
end
it “should create a new Array & wrap new value as an array” do
@empty_hash.merge_add!( {101 => 2} )
@empty_hash[101].should eql([2])
end
it “should create a new Array & use the array passed as the value” do
@empty_hash.merge_add!( {101 => [2]} )
@empty_hash[101].should eql([2])
end
it “should migrate existing value to an Array, then add the new
value, if existing value is NOT an array” do
@hash_with_number.merge_add!( {100 => 2} )
@hash_with_number[100].should eql([1, 2])
end
it “should migrate add the new value to existing, if existing value
IS an array” do
@hash_with_array.merge_add!( {100 => 2} )
@hash_with_array[100].should eql([1, 2])
end

end

On Mon, Oct 27, 2008 at 5:41 PM, Greg H.

greg · October 28, 2008, 7:09am

interesting - is it really Ruby approach to let things ‘break’ during
a method so to speak as opposed to defensive coding and doing some
form of validation at the beginning of the method? I noted when I run
my spec’s with your code i get the error

 "<NoMethodError: undefined method `keys' for 123:Fixnum>"

which is quite a bit more cryptic than my

  "RuntimeError - "Parameter passed in not a hash"

Wouldn’t it be harder to debug code if one were getting the former
error message rather than the later?

greg · October 28, 2008, 2:39am

On Oct 27, 8:42 pm, “Greg H.” [email protected]
wrote:

thanks all for feedback to date - here’s my latest take

def merge_add!(h)
raise “Parameter passed in not a hash” if !h.instance_of?(Hash)

Again, this is considered poor form. The reason is, if it isn’t a Hash
it will blow up in the next couple of statements anyway, but more
importantly something other a Hash might emulate one. And there’s no
reason not to allow it to work.

# normalise input hash to contain arrays
h.each { |key, value| if !value.instance_of?(Array) then h[key] =

[value] end }

self.merge!(h) do |key, existing, new|
  existing = existing.instance_of?(Array) ? existing : [existing]
  existing + new
end

end

def merge_add!(h)
q = {}
(keys | h.keys).each do |k|
q[k] = Array(self[k]) + Array(h[k])
end
replace(q)
end

                          trans.

greg · October 28, 2008, 9:38am

Greg H. wrote:

interesting - is it really Ruby approach to let things ‘break’ during
a method so to speak

Generally speaking, yes it is.

Depending on your implementation, you probably only make use of a one or
two Hash methods - says ‘keys’ and ‘[]’

By not testing the class, you allow other objects which also implement
these methods to work, which typically makes your code more useful.

If you wanted to be really anal, you could do:

raise “Unsuitable object (doesn’t implement ‘keys’)” unless
hash.respond_to?(:keys)
raise “Unsuitable object (doesn’t implement ‘[]’)” unless
hash.respond_to?(:[])

However the error message you would then generate would be almost
identical to the default Ruby one:

 "<NoMethodError: undefined method `keys' for 123:Fixnum>"

Now, in this particular case, I’d say an ideal API would only depend on
‘each’, and then it could be used for any enumerable object which yields
two values. For example:

class Hash
def merge_add!(h)
h.each do |k,v|
if has_key?(k)
self[k] = Array[self[k]] + Array[v]
else
self[k] = v
end
end
end
end

tmp = {“one”=>1, “two”=>2}
tmp.merge_add!([[“three”,3], [“one”,999]])
p tmp

I should add that I don’t like the fact that the resulting hash will
have a mixture of array and non-array values, as it makes the values
harder to use later. If it were me I’d want all the values to be arrays.

A similar case I have come across before is a multi-valued Hash#invert,
where different keys map to the same value.

input: {“foo”=>1, “bar”=>1, “baz”=>2}

In that case, what I would want is

output: {1=>[“foo”,“bar”], 2=>[“baz”]}

rather than

output: {1=>[“foo”,“bar”], 2=>“baz”}

greg · October 28, 2008, 12:20pm

Thomas S. wrote:

ï¿½ # output: {1=>[“foo”,“bar”], 2=>“baz”}

{“foo”=>1, “bar”=>1, “baz”=>2}.invert
=> {1=>“bar”, 2=>“baz”}

That’s the standard Hash#invert, and as you demonstrate, it loses
information in the case where multiple keys map to the same value.

greg · October 28, 2008, 12:14pm

On Oct 28, 4:38 am, Brian C. [email protected] wrote:

these methods to work, which typically makes your code more useful.
 "<NoMethodError: undefined method `keys' for 123:Fixnum>"

Nice explanation.

Now, in this particular case, I’d say an ideal API would only depend on
‘each’, and then it could be used for any enumerable object which yields
two values. For example:

I agree. I should have done this myself, but I started playing golf

class Hash
def merge_add!(h)
h.each do |k,v|
if has_key?(k)
self[k] = Array[self[k]] + Array[v]
else
self[k] = v

self[k] = Array(v)

have a mixture of array and non-array values, as it makes the values
harder to use later. If it were me I’d want all the values to be arrays.

I think that was the intent. The above noted change might fix (?).

output: {1=>[“foo”,“bar”], 2=>“baz”}

{“foo”=>1, “bar”=>1, “baz”=>2}.invert
=> {1=>“bar”, 2=>“baz”}

t

greg · October 28, 2008, 12:42pm

Hi –

On Tue, 28 Oct 2008, Greg H. wrote:

Wouldn’t it be harder to debug code if one were getting the former
error message rather than the later?

Not really. The first message actually gives you more information
about what happened, which is likely to help you pinpoint it better.

You’ll see a fair amount of class-checking in the Ruby source code,
but mostly in the C code, which after all is a C program and not a
Ruby program, and which has the job of bootstrapping the actual Ruby
environment into being. In Ruby itself, it’s relatively unusual to
check class membership directly (or at least it’s unusual to have any
compelling reason to).

David

greg · October 28, 2008, 12:52pm

Hi –

On Tue, 28 Oct 2008, Brian C. wrote:

these methods to work, which typically makes your code more useful.
     self[k] = Array[self[k]] + Array[v]
   else
     self[k] = v
   end
 end
end
end

And to fix the name problem (the rogue “!”), you could also do:

class Hash
def merge_add(h)
dup.merge_add!(h)
end
end

which would give the ! a reason to be there and follows the semantics
of merge/merge!.

David

greg · October 28, 2008, 1:19pm

      self[k] = Array[self[k]] + Array[v]

Oops, I meant to write:

      self[k] = Array(self[k]) + Array(v)

Otherwise, if the value is already an array, it will be wrapped in
another one.

This does highlight a limitation of this API thought. What if the values
themselves are arrays?

h = { “foo” => [1,2] }
h.merge_add!({ “foo” => [3,4] })

do you want h = { “foo” => [[1,2], [3,4]] } ?

In that case, it would be better to insist that the hash values are
arrays in the first place, as shown by Stefan R.'s collector
example earlier.

Also, are you happy for duplicate values to be inserted?

h = { “foo” => “bar” }
h.merge_add!({ “foo” => “bar” })

h = { “foo” => [“bar”, “bar”] } ?

greg · October 30, 2008, 6:08am

thanks for the great feedback guys - will take all this on board

Hash.merge_add! extension - how does this code look?

Merges a hash with only one item into your hash. If there is

hash entry for that key the both existing value(s) & new value are

use of an Array

Merges a hash with only one item into your hash. If there is already a

hash entry for that key the both existing value(s) & new value are kept

use of an Array

use of an Array

-----code---- def merge_add!(h) self.merge!(h) do |key, existing, new| new = new.instance_of?(Array) ? new : [new] existing = existing.instance_of?(Array) ? existing : [existing] existing + new end end

end

input: {“foo”=>1, “bar”=>1, “baz”=>2}

output: {1=>[“foo”,“bar”], 2=>[“baz”]}

output: {1=>[“foo”,“bar”], 2=>“baz”}

output: {1=>[“foo”,“bar”], 2=>“baz”}

do you want h = { “foo” => [[1,2], [3,4]] } ?

h = { “foo” => [“bar”, “bar”] } ?

-----code----
def merge_add!(h)
self.merge!(h) do |key, existing, new|
new = new.instance_of?(Array) ? new : [new]
existing = existing.instance_of?(Array) ? existing : [existing]
existing + new
end
end