Populating a hash from an array using inject

I was looking at this problem on Stack Overflow (this one:
http://stackoverflow.com/questions/1405657/reorganizing-ruby-array-into-hash)

The question is how to “convert” an array of objects into a hash.

Consider this code:

require 'pp'
p RUBY_VERSION

Product = Struct.new(:name, :category)
products = [
    ['Apple','Golden Delicious'],
    ['Apple','Granny Smith'],
    ['Orange','Navel']
].collect {|cat, name| Product.new(name, cat)}

foo = products.inject({}) {|h,p| h[p.category] ||= []; h[p.category] << p; h}
pp foo

bar = products.inject(Hash.new([])) {|h,p| h[p.category] << p; h}
pp bar

baz = products.inject(Hash.new([])) {|h,p| h[p.category] += [p]; h}
pp baz

It outputs:

"1.8.7"
{"Orange"=>[#<struct Product name="Navel", category="Orange">],
 "Apple"=>
  [#<struct Product name="Golden Delicious", category="Apple">,
   #<struct Product name="Granny Smith", category="Apple">]}
{}
{"Orange"=>[#<struct Product name="Navel", category="Orange">],
 "Apple"=>
  [#<struct Product name="Golden Delicious", category="Apple">,
   #<struct Product name="Granny Smith", category="Apple">]}

My question: why is bar empty?

I don’t have an answer, but trying out the code in JRuby, I can see that
after the inject() call the default value for bar has become an array
holding all of the products. That is one more thing that requires an
explanation. You can see this by appending this line to your code:

pp bar["random string here"]

The documentation on creating hashes with default values is not very
well written, but it is worth reading:

$ ri Hash.new

-------------------------------------------------------------- Hash::new
Hash.new => hash
Hash.new(obj) => aHash
Hash.new {|hash, key| block } => aHash

 Returns a new, empty hash. If this hash is subsequently accessed by
 a key that doesn't correspond to a hash entry, the value returned
 depends on the style of +new+ used to create the hash. In the first
 form, the access returns +nil+. If _obj_ is specified, this single
 object will be used for all _default values_. If a block is
 specified, it will be called with the hash object and the key, and
 should return the default value. It is the block's responsibility
 to store the value in the hash if required.

    h = Hash.new("Go Fish")
    h["a"] = 100
    h["b"] = 200
    h["a"]           #=> 100
    h["c"]           #=> "Go Fish"
    # The following alters the single default object
    h["c"].upcase!   #=> "GO FISH"
    h["d"]           #=> "GO FISH"
    h.keys           #=> ["a", "b"]

The key line is:

It is the block’s responsibility to store the value in the hash if
required.

Because your block does not assign the array to a hash key, the array is
discarded.

Sorry, I pasted the ri output on top of the first part of my post, which
said something to the effect of:

My question: why is bar empty?

…because h[non_existent_key] sends an array that is unassociated with
any hash to the block. In other words, creating a hash with a default
value does not cause a key/value pair to be created in the hash when a
non-existent key is accessed.

It was probably best that first sentence was erased. Here is what I was
trying to say:

h = Hash.new([])

result = h["A"]
p result

--output:--
[]

p h

--output:--
{}

When you write:

h[p.category] << p

The example above demonstrates that, when the key does not exist, that
is in effect equivalent to:

[] << p

…and that simply appends an object to an empty array, and does nothing
to the hash.
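
A small sketch of the point above, which can be pasted into irb. The variable name is only for illustration; Hash#key? and Hash#default are standard methods. It suggests that the append lands on the hash's default object rather than on any entry in the hash:

```ruby
# Reading a missing key returns the default object but stores nothing.
h = Hash.new([])

h["A"] << 1        # appends to the default array, not to an entry in h

p h                # => {}     the hash itself still has no keys
p h.key?("A")      # => false  no entry was ever created
p h.default        # => [1]    the default array now holds the value
```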

And you are in even worse shape than you realize. If you take the empty
array that is returned as the default value and append an object to the
array, and then manually assign the array to a key in the hash:

h = Hash.new([])

h["A"] = h["A"] << 10 #==>[10]

h["B"] = h["B"] << 20 #==>[10, 20]

Look what happens:

p h

{"A"=>[10, 20], "B"=>[10, 20]}

Yep, Ruby hands you a reference to the same array over and over again
when you try to access non-existent keys.
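
One way to check the claim that the same array comes back each time is Object#equal?, which compares object identity rather than contents. This is only a sketch; the variables a and b are made up for the illustration:

```ruby
h = Hash.new([])

a = h["A"]
b = h["B"]

p a.equal?(b)            # => true  both lookups return the very same object
p a.equal?(h.default)    # => true  and that object is the hash's default

h["A"] = h["A"] << 10    # stores the default array itself under "A"
h["B"] = h["B"] << 20    # stores the same array again under "B"

p h["A"].equal?(h["B"])  # => true  two keys, one array
p h["A"]                 # => [10, 20]
```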

Hi –

On Fri, 11 Sep 2009, Glenn J. wrote:

bar = products.inject(Hash.new([])) {|h,p| h[p.category] << p; h}
baz = products.inject(Hash.new([])) {|h,p| h[p.category] += [p]; h}

Is it because += explicitly assigns a new object to the hash key?

Yes. Here’s another way to look at it:

array = []
hash = Hash.new(array)
hash[:x] << 1 # equivalent to: array << 1
hash[:y] << 2 # equivalent to: array << 2
hash[:z] += [:a,:b,:c] # equivalent to: hash[:z] = array + [:a,:b,:c]

David


David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com

At 2009-09-10 09:16PM, “7stud --” wrote:

p h

{"A"=>[10, 20], "B"=>[10, 20]}

Yep, ruby hands you a reference to the same array over and over
again…when you try to access non-existent keys.

While I thank you for taking the time, I can’t say I’m enlightened by
your explanation. What’s the real difference between the blocks

bar = products.inject(Hash.new([])) {|h,p| h[p.category] << p; h}
baz = products.inject(Hash.new([])) {|h,p| h[p.category] += [p]; h}

Is it because += explicitly assigns a new object to the hash key?

On Thu, Sep 10, 2009 at 8:42 PM, 7stud – wrote:

The key line is:

It is the block’s responsibility to store the value in the hash if
required.

Because your block does not assign the array to a hash key, the array is
discarded.

The thread has moved on a bit, but he actually isn’t USING the block
form of Hash.new; the only blocks are arguments to inject.

And Hash.new {|h,k| … } is what I tend to reach for in a situation
like this.

foo = products.inject(Hash.new {|h,k| h[k] = []}) {|h,p|
h[p.category] << p; h}


Rick DeNatale

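
A runnable sketch of the block form suggested above, using the Product struct and sample data from the first post. The names by_category and prod are made up for the illustration. The comparison with Enumerable#group_by at the end is an addition that the thread itself does not discuss:

```ruby
Product = Struct.new(:name, :category)
products = [
  ['Apple', 'Golden Delicious'],
  ['Apple', 'Granny Smith'],
  ['Orange', 'Navel']
].collect { |cat, name| Product.new(name, cat) }

# The default block runs for each missing key and stores a fresh array,
# so every category gets its own array and a real entry in the hash.
by_category = products.inject(Hash.new { |h, k| h[k] = [] }) do |h, prod|
  h[prod.category] << prod
  h
end

p by_category.keys.sort                               # => ["Apple", "Orange"]
p by_category["Apple"].map { |prod| prod.name }       # => ["Golden Delicious", "Granny Smith"]
p by_category["Apple"].equal?(by_category["Orange"])  # => false

# Enumerable#group_by builds the same grouping without inject.
p products.group_by { |prod| prod.category } == by_category  # => true
```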

Glenn J. wrote:

While I thank you for taking the time, I can’t say I’m enlightened by
your explanation. What’s the real difference between the blocks

bar = products.inject(Hash.new([])) {|h,p| h[p.category] << p; h}
baz = products.inject(Hash.new([])) {|h,p| h[p.category] += [p]; h}

Is it because += explicitly assigns a new object to the hash key?

Yes. When the key doesn’t exist the first line is equivalent to:

bar = products.inject(Hash.new([])) {|h,p| [] << p; h}

which does nothing to the hash–all it does is append p to an empty
array, and then the empty array is discarded.

The second line is equivalent to:

baz = products.inject(Hash.new([])) {|h,p| h[p.category] = h[p.category] + [p]; h}

If you access a non-existent key, say h["A"], then that line is
equivalent to

baz = products.inject(Hash.new([])) {|h,p| h["A"] = [] + [p]; h}

or

baz = products.inject(Hash.new([])) {|h,p| h["A"] = [p]; h}

which explicitly sets a key in the hash to the value [p], thereby
altering the hash.
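
A short sketch of why += behaves differently from <<, for anyone who wants to try it in irb. Array#+ returns a new array and the assignment stores it under the key, so the default object is never modified:

```ruby
h = Hash.new([])

h["A"] += [1]            # shorthand for: h["A"] = h["A"] + [1]

p h.key?("A")            # => true  the assignment created a real entry
p h["A"]                 # => [1]
p h.default              # => []    Array#+ left the default untouched

h["B"] += [2]
p h["A"].equal?(h["B"])  # => false each key holds its own new array
```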

At 2009-09-11 02:07PM, “7stud --” wrote:

bar = products.inject(Hash.new([])) {|h,p| [] << p; h}

which does nothing to the hash–all it does is append p to an empty
array, and then the empty array is discarded.

As we’ve seen, it’s not discarded: it’s kept for h’s reference:

products = [[1,2],[3,4],[1,5]]
foo = products.inject(Hash.new([])) {|h,(a,b)| h[a] << b; h} # => {}
foo[:unknown] # => [2, 4, 5]

Glenn J. wrote:

As we’ve seen, it’s not discarded: it’s kept for h’s reference:

products = [[1,2],[3,4],[1,5]]
foo = products.inject(Hash.new([])) {|h,(a,b)| h[a] << b; h} # => {}
foo[:unknown] # => [2, 4, 5]

…and this is a lie too:

If you access a non-existent key, say h["A"], then that line is
equivalent to

baz = products.inject(Hash.new([])) {|h,p| h["A"] = [] + [p]; h}

You asked for lies. I gave them to you.