Forum: Ruby Hash.merge_add! extension - how does this code look?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Greg H. (Guest)
on 2008-10-26 13:20
(Received via mailing list)
Hi,
I wanted a way to be able to "add" values to a Hash such that it keeps
the
old and the new values.  For examples example adding an item to a hash
for
which the keys are dates.   Here's a first cut.  Any feedback on coding
style etc?

================================
class Hash
  # Merges a hash with only one item into your hash.  If there is
already a
  # hash entry for that key the both existing value(s) & new value are
kept
via
  # use of an Array
  def merge_add!(h)
    raise "Parameter passed in not a hash" if !h.instance_of?(Hash)
    raise "Can only pass hash with one item to be added" if h.length > 1
    h_key = h.to_a[0][0]
    h_value = h.to_a[0][1]
    if self.has_key?(h_key)
      existing_value = self[h_key]
      existing_value_as_array = existing_value.instance_of?(Array) ?
existing_value : [existing_value]
      new_value = existing_value_as_array << h_value
      self[h_key] = new_value
    else
      h_value.instance_of?(Array) ? self.merge!(h) : self.merge!( {h_key
=>
[h_value]} )
    end
  end
end
================================
require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')

describe "merge_add!" do
  before(:each) do
    @empty_hash = {}
    @hash_with_number = {100 => 1}
    @hash_with_array = {100 => [1]}
  end

  it "should raise an error if there is no object passed to the method"
do
    lambda{ @empty_hash.merge_add!() }.should raise_error(ArgumentError)
  end
  it "should raise an error if the object passed is not a Hash" do
    lambda{@empty_hash.merge_add!(123)}.should raise_error(RuntimeError,
"Parameter passed in not a hash")
  end
  it "should raise an error if the number of items in the hash in not 1"
do
    lambda{@empty_hash.merge_add!({101 => 2, 102 => 3})}.should
raise_error(RuntimeError, "Can only pass hash with one item to be
added")
  end
  it "should create a new Array & wrap new value as an array" do
    @empty_hash.merge_add!( {101 => 2} )
    @empty_hash[101].should eql([2])
  end
  it "should create a new Array & use the array passed as the value" do
    @empty_hash.merge_add!( {101 => [2]} )
    @empty_hash[101].should eql([2])
  end
  it "should migrate existing value to an Array, then add the new value,
if
existing value is NOT an array" do
    @hash_with_number.merge_add!( {100 => 2} )
    @hash_with_number[100].should eql([1, 2])
  end
  it "should migrate add the new value to existing, if existing value IS
an
array" do
    @hash_with_array.merge_add!( {100 => 2} )
    @hash_with_array[100].should eql([1, 2])
  end

end



================================


tks
David A. Black (Guest)
on 2008-10-26 14:14
(Received via mailing list)
Hi --

On Sun, 26 Oct 2008, Greg H. wrote:

> Hi,
> I wanted a way to be able to "add" values to a Hash such that it keeps the
> old and the new values.  For examples example adding an item to a hash for
> which the keys are dates.   Here's a first cut.  Any feedback on coding
> style etc?

Just a quick (1/2-cup of coffee) point:

> ================================
> class Hash
>  # Merges a hash with only one item into your hash.  If there is already a
>  # hash entry for that key the both existing value(s) & new value are kept
> via
>  # use of an Array
>  def merge_add!(h)

That's not a good name for it, because there's no non-bang merge_add
method. The ! in method names only makes sense (with very few,
very specialized exceptions, like the way Jim W. uses them in Builder)
if the methods come in pairs: one regular method and one "dangerous"
method ("dangerous" being Matz's term to describe the meaning of the
!).

The reasoning is as follows.

Bang method names never have, and never will, coincide completely with
receiver-altering methods in Ruby. That's just not a possibility, and
has never been envisioned. (Consider Array#pop, String#replace, and
many, many more.) So adding a bang just because the receiver is being
changed doesn't make sense, and dilutes the convention of the
regular/dangerous pairing.

Putting a bang on an unpaired method name just because the method
seems "dangerous", in a unary way, doesn't make sense either. Rails
does this, and it's about my least favorite thing about the Rails
source code. If ! just means "This method does something dramatic!",
we're back to a very fuzzy, uninformative, impressionistic use of !.

Matz had it right: the regular/dangerous pairing of otherwise
same-named methods is an incredibly creative and useful application of
the bang. It actually tells you something. Every time people complain
because gsub! returns nil if the receiver doesn't change, all I can
think is: "You were warned! There's a bang in the name; that means it's
similar to gsub but it's dangerous. Heads up! Go read the docs! Don't
complain!" :-)

In Ruby, unpaired methods always have names that already incorporate
what they do and whether or not they change their receivers. Yes,
there are some glitches (like Array#delete and String#delete being
different as to receiver-changing). But Array#pop isn't Array#pop!
because there's no possible non-destructive "pop"; that would just be
array[-1]. So if you've got an unpaired method name and you feel like
it's not communicating enough, it's best to change the name rather
than add a !, for the reasons above.

Anyway, that's the gist. Sorry to pounce on one thing. I'll have more
coffee and look at the code itself :-)


David
Thomas S. (Guest)
on 2008-10-26 14:54
(Received via mailing list)
On Oct 26, 7:20 am, "Greg H." <removed_email_address@domain.invalid>
wrote:
> via
>   # use of an Array
>   def merge_add!(h)
>     raise "Parameter passed in not a hash" if !h.instance_of?(Hash)

You don't need to do this. It is "anti-ducktyping". h could be a hash-
*like* object and that would be okay.

>     raise "Can only pass hash with one item to be added" if h.length > 1

Why artificially limit it to one entry hashes? Supporting any hash
would make it more flexible.

> [h_value]} )
>     end
>   end
> end

The idea is a good one. I think implementation could use some
improvement. For comparison have a look at Facets' Hash#collate.

  http://facets.rubyforge.org/doc/api/core/classes/H...

t
Stefan R. (Guest)
on 2008-10-26 23:45
Greg H. wrote:
> Hi, I wanted a way to be able to "add" values to a Hash such that it
> keeps  the old and the new values.  For examples example adding
> an item to a hash for which the keys are dates.   Here's a first cut.
>  Any feedback on coding style etc?

Reading your problem and code I think you missed that Hash#merge
optionally takes a block which seems to solve your issue just fine but
in a more generic manner:

hash = {'foo' => 'bar1'}
hash.merge! 'foo' => 'bar2' do |key, existing, new|
  Array(existing) + [new]
end
p hash # -> {"foo"=>["bar1", "bar2"]}

Additionally I would normalize all values to arrays upfront here, makes
dealing with the hash a lot easier.

Regards
Stefan
Thomas S. (Guest)
on 2008-10-27 01:11
(Received via mailing list)
On Oct 26, 5:45 pm, Stefan R. <removed_email_address@domain.invalid> wrote:

> Reading your problem and code I think you missed that Hash#merge
> optionally takes a block which seems to solve your issue just fine but
> in a more generic manner:
>
> hash = {'foo' => 'bar1'}
> hash.merge! 'foo' => 'bar2' do |key, existing, new|
>   Array(existing) + [new]
> end
> p hash # -> {"foo"=>["bar1", "bar2"]}

Nice. I didn't know it could take a block either. Has that been there
all along?

T.
Greg H. (Guest)
on 2008-10-27 05:48
(Received via mailing list)
thanks for highlighting this!  Is the quickest way to normalise to Array
via
the following?
    hashitem.instance_of?(Array) ? hashitem : [hashitem]
Stefan R. (Guest)
on 2008-10-27 09:21
Greg H. wrote:
> thanks for highlighting this!  Is the quickest way to normalise to Array
> via
> the following?
>     hashitem.instance_of?(Array) ? hashitem : [hashitem]

If you're building up the hash yourself then I'd use the following
idiom:
collector = Hash.new { |h,k| h[k] = [] }
collector['key1'] << 'value1'
collector['key2'] << 'value1'
collector['key1'] << 'value2'
p collector

Simplifies adding to and reading from the hash. But be aware that with
the hash having a different default value than nil, you can't use 'if
collector[key] then' to test for existence of a key anymore, you have to
use has_key?.

If you're not building the hash up yourself, then yes, I'd use what you
wroten in an each loop and override the existing value.

Regards
Stefan
Greg H. (Guest)
on 2008-10-27 09:42
(Received via mailing list)
I just noticed the code suggested might have a problem when the
original hash is empty. That is using the ability to pass a block to
'merge!" is good, but when the source hash is empty it does not seem
to trigger use of the code in the block to merge.

-----code----
  def merge_add!(h)
    self.merge!(h) do |key, existing, new|
      new = new.instance_of?(Array) ? new : [new]
      existing = existing.instance_of?(Array) ? existing : [existing]
      existing + new
    end
  end
-------------
---in console---
?> {}.merge_add!({1 => 100})
=> {1=>100}               <<== DID NOT PUT THE '100' IN AN ARRAY!!

>> {1 => 300}.merge_add!({1 => 100})
=> {1=>[300, 100]}
--------------

regards
Greg
Greg H. (Guest)
on 2008-10-28 02:44
(Received via mailing list)
thanks all for feedback to date - here's my latest take

------------------------------------
  def merge_add!(h)
    raise "Parameter passed in not a hash" if !h.instance_of?(Hash)

    # normalise input hash to contain arrays
    h.each { |key, value| if !value.instance_of?(Array) then h[key] =
[value] end }

    self.merge!(h) do |key, existing, new|
      existing = existing.instance_of?(Array) ? existing : [existing]
      existing + new
    end
  end
------------------------------------
Macintosh-2:myequity greg$ cat spec/lib/hash_extensions_spec.rb
require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')

describe "merge_add!" do
  before(:each) do
    @empty_hash = {}
    @hash_with_number = {100 => 1}
    @hash_with_array = {100 => [1]}
  end

  it "should raise an error if there is no object passed to the method"
do
    lambda{ @empty_hash.merge_add!() }.should raise_error(ArgumentError)
  end
  it "should raise an error if the object passed is not a Hash" do
    lambda{@empty_hash.merge_add!(123)}.should
raise_error(RuntimeError, "Parameter passed in not a hash")
  end
  it "should raise an error if the number of items in the hash in not 1"
do
    lambda{@empty_hash.merge_add!({101 => 2, 102 => 3})}.should_not
raise_error
  end
  it "should create a new Array & wrap new value as an array" do
    @empty_hash.merge_add!( {101 => 2} )
    @empty_hash[101].should eql([2])
  end
  it "should create a new Array & use the array passed as the value" do
    @empty_hash.merge_add!( {101 => [2]} )
    @empty_hash[101].should eql([2])
  end
  it "should migrate existing value to an Array, then add the new
value, if existing value is NOT an array" do
    @hash_with_number.merge_add!( {100 => 2} )
    @hash_with_number[100].should eql([1, 2])
  end
  it "should migrate add the new value to existing, if existing value
IS an array" do
    @hash_with_array.merge_add!( {100 => 2} )
    @hash_with_array[100].should eql([1, 2])
  end

end



On Mon, Oct 27, 2008 at 5:41 PM, Greg H.
Thomas S. (Guest)
on 2008-10-28 03:39
(Received via mailing list)
On Oct 27, 8:42 pm, "Greg H." <removed_email_address@domain.invalid>
wrote:
> thanks all for feedback to date - here's my latest take
>
> ------------------------------------
>   def merge_add!(h)
>     raise "Parameter passed in not a hash" if !h.instance_of?(Hash)

Again, this is considered poor form. The reason is, if it isn't a Hash
it will blow up in the next couple of statements anyway, but more
importantly something other a Hash might emulate one. And there's no
reason not to allow it to work.

>     # normalise input hash to contain arrays
>     h.each { |key, value| if !value.instance_of?(Array) then h[key] =
> [value] end }
>
>     self.merge!(h) do |key, existing, new|
>       existing = existing.instance_of?(Array) ? existing : [existing]
>       existing + new
>     end
>   end


  def merge_add!(h)
    q = {}
    (keys | h.keys).each do |k|
      q[k] = Array(self[k]) + Array(h[k])
    end
    replace(q)
  end


                              trans.
Greg H. (Guest)
on 2008-10-28 08:09
(Received via mailing list)
interesting - is it really Ruby approach to let things 'break' during
a method so to speak as opposed to defensive coding and doing some
form of validation at the beginning of the method?  I noted when I run
my spec's with your code i get the error

     "<NoMethodError: undefined method `keys' for 123:Fixnum>"

which is quite a bit more cryptic than my

      "RuntimeError - "Parameter passed in not a hash"

Wouldn't it be harder to debug code if one were getting the former
error message rather than the later?
Brian C. (Guest)
on 2008-10-28 10:38
Greg H. wrote:
> interesting - is it really Ruby approach to let things 'break' during
> a method so to speak

Generally speaking, yes it is.

Depending on your implementation, you probably only make use of a one or
two Hash methods - says 'keys' and '[]'

By not testing the class, you allow other objects which also implement
these methods to work, which typically makes your code more useful.

If you wanted to be really anal, you could do:

  raise "Unsuitable object (doesn't implement 'keys')" unless
        hash.respond_to?(:keys)
  raise "Unsuitable object (doesn't implement '[]')" unless
        hash.respond_to?(:[])

However the error message you would then generate would be almost
identical to the default Ruby one:

>      "<NoMethodError: undefined method `keys' for 123:Fixnum>"

Now, in this particular case, I'd say an ideal API would only depend on
'each', and then it could be used for any enumerable object which yields
two values. For example:

  class Hash
    def merge_add!(h)
      h.each do |k,v|
        if has_key?(k)
          self[k] = Array[self[k]] + Array[v]
        else
          self[k] = v
        end
      end
    end
  end

  tmp = {"one"=>1, "two"=>2}
  tmp.merge_add!([["three",3], ["one",999]])
  p tmp

I should add that I don't like the fact that the resulting hash will
have a mixture of array and non-array values, as it makes the values
harder to use later. If it were me I'd want all the values to be arrays.

A similar case I have come across before is a multi-valued Hash#invert,
where different keys map to the same value.

  # input: {"foo"=>1, "bar"=>1, "baz"=>2}

In that case, what I would want is

  # output: {1=>["foo","bar"], 2=>["baz"]}

rather than

  # output: {1=>["foo","bar"], 2=>"baz"}
Thomas S. (Guest)
on 2008-10-28 13:14
(Received via mailing list)
On Oct 28, 4:38 am, Brian C. <removed_email_address@domain.invalid> wrote:
> these methods to work, which typically makes your code more useful.
>
> >      "<NoMethodError: undefined method `keys' for 123:Fixnum>"

Nice explanation.

> Now, in this particular case, I'd say an ideal API would only depend on
> 'each', and then it could be used for any enumerable object which yields
> two values. For example:

I agree. I should have done this myself, but I started playing golf ;)

>   class Hash
>     def merge_add!(h)
>       h.each do |k,v|
>         if has_key?(k)
>           self[k] = Array[self[k]] + Array[v]
>         else
>           self[k] = v

  self[k] = Array(v)

> have a mixture of array and non-array values, as it makes the values
> harder to use later. If it were me I'd want all the values to be arrays.

I think that was the intent. The above noted change might fix (?).

>
>   # output: {1=>["foo","bar"], 2=>"baz"}

>> {"foo"=>1, "bar"=>1, "baz"=>2}.invert
=> {1=>"bar", 2=>"baz"}

t
Brian C. (Guest)
on 2008-10-28 13:20
Thomas S. wrote:
>> � # output: {1=>["foo","bar"], 2=>"baz"}
>
>>> {"foo"=>1, "bar"=>1, "baz"=>2}.invert
> => {1=>"bar", 2=>"baz"}

That's the standard Hash#invert, and as you demonstrate, it loses
information in the case where multiple keys map to the same value.
David A. Black (Guest)
on 2008-10-28 13:42
(Received via mailing list)
Hi --

On Tue, 28 Oct 2008, Greg H. wrote:

>
> Wouldn't it be harder to debug code if one were getting the former
> error message rather than the later?

Not really. The first message actually gives you more information
about what happened, which is likely to help you pinpoint it better.

You'll see a fair amount of class-checking in the Ruby source code,
but mostly in the C code, which after all is a C program and not a
Ruby program, and which has the job of bootstrapping the actual Ruby
environment into being. In Ruby itself, it's relatively unusual to
check class membership directly (or at least it's unusual to have any
compelling reason to).


David
David A. Black (Guest)
on 2008-10-28 13:52
(Received via mailing list)
Hi --

On Tue, 28 Oct 2008, Brian C. wrote:

> these methods to work, which typically makes your code more useful.
>
>          self[k] = Array[self[k]] + Array[v]
>        else
>          self[k] = v
>        end
>      end
>    end
>  end

And to fix the name problem (the rogue "!"), you could also do:

   class Hash
     def merge_add(h)
       dup.merge_add!(h)
     end
   end

which would give the ! a reason to be there and follows the semantics
of merge/merge!.


David
Brian C. (Guest)
on 2008-10-28 14:19
>           self[k] = Array[self[k]] + Array[v]

Oops, I meant to write:

          self[k] = Array(self[k]) + Array(v)

Otherwise, if the value is already an array, it will be wrapped in
another one.

This does highlight a limitation of this API thought. What if the values
themselves are arrays?

   h = { "foo" => [1,2] }
   h.merge_add!({ "foo" => [3,4] })
   # do you want h = { "foo" => [[1,2], [3,4]] } ?

In that case, it would be better to insist that the hash values are
arrays in the first place, as shown by Stefan R.'s collector
example earlier.

Also, are you happy for duplicate values to be inserted?

   h = { "foo" => "bar" }
   h.merge_add!({ "foo" => "bar" })
   # h = { "foo" => ["bar", "bar"] }  ?
Greg H. (Guest)
on 2008-10-30 07:08
(Received via mailing list)
thanks for the great feedback guys - will take all this on board
This topic is locked and can not be replied to.