Problem with Hash of Arrays

I am new to Ruby, but I consider this feature to be a bug.

What is happening is that I am creating a new Hash of Arrays. The
following code works fine:

require 'yaml'

a = Hash.new(Array.new())

a[:first] += ["this"]
a[:first] += ["is a"]
a[:first] += ["string"]
puts a.to_yaml

The following also works…

a = Hash.new(Array.new())

a[:key] += ["first"]
a[:key].push("second")
a[:key].push("third")
puts a.to_yaml

But this does not

a = Hash.new(Array.new())

a[:key].push("first")
a[:key].push("second")
a[:key].push("third")

However, this does not work if you don’t use the “+=” operator first.
Note that this “feature” also occurs for the “<<” operator, or any
other method that expects a[:key] to already be a defined array.

I think if you already specified what the new object is going to be,
then you should be able to call a method of that object.

Jimi Damon wrote:

I am new to Ruby, but I consider this feature to be a bug.

What is happening is that I am creating a new Hash of Arrays. The
following code works fine

What is happening is that you create a Hash whose default value is one
Array.

a = Hash.new(Array.new())

a[:first] += ["this"]
a[:first] += ["is a"]
a[:first] += ["string"]

Here you are assigning a new array to a[:first] three times.

a = Hash.new(Array.new())

a[:key] += ["first"]
a[:key].push("second")
a[:key].push("third")

Here you are assigning a new array once and then changing it two times.

a = Hash.new(Array.new())

a[:key].push("first")
a[:key].push("second")
a[:key].push("third")

Here you are changing the default array three times.
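The three cases above can be compared in a small script. This is a sketch assuming only core Ruby; the method names `plus_equals_only`, `plus_equals_then_push` and `push_only` are made up for the illustration. Each method returns the hash together with its default object, so you can see which of the two was modified.

```ruby
# Case 1: only += (each line builds a new array and assigns it to the key)
def plus_equals_only
  a = Hash.new(Array.new)
  a[:first] += ["this"]
  a[:first] += ["is a"]
  a[:first] += ["string"]
  [a, a.default]
end

# Case 2: one += (stores a new array), then push (mutates that stored array)
def plus_equals_then_push
  a = Hash.new(Array.new)
  a[:key] += ["first"]
  a[:key].push("second")
  a[:key].push("third")
  [a, a.default]
end

# Case 3: only push (every a[:key] returns the one shared default array)
def push_only
  a = Hash.new(Array.new)
  a[:key].push("first")
  a[:key].push("second")
  a[:key].push("third")
  [a, a.default]
end

result, shared_default = push_only
p result.empty?   # prints true: no key was ever assigned
p shared_default  # prints ["first", "second", "third"]
```

In the first two cases the default array stays empty; only the third case mutates it.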

However, this does not work if you don’t use the “+=” operator first.

+= isn’t one operator. It’s a shortcut for a[:key] = a[:key] + ["first"].
This will first call the + method on a[:key] (i.e. it will call the +
method on the default object, because that’s what a[:key] points to at
this time), which will return a new array, and then assign the resulting
array to a[:key]. This does not change the default array. Calling push
would, because the push method modifies its receiver while the + method
doesn’t.
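The expansion can be spelled out by hand to check which object each call touches. In this sketch the names `default`, `h` and `sum` are illustrative only; `equal?` tests object identity.

```ruby
default = []
h = Hash.new(default)

# Explicit expansion of h[:key] += ["first"]
sum = h[:key] + ["first"]   # Array#+ builds a brand-new array
h[:key] = sum               # ...which is then stored under :key

p default                   # prints []: Array#+ left its receiver untouched
p h[:key].equal?(default)   # prints false: :key now points at the new array

h[:other].push("oops")      # :other is missing, so this is the default array
p default                   # prints ["oops"]: push modified its receiver
```

Note that `:other` never becomes a key; only the shared default changed.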

Note that this “feature” also occurs for the “<<” operator, or any
other method that expects a[:key] to already be a defined array.

No method expects any such thing. They don’t know or care about the
hash a. All they care about is their arguments and their receiver. Since
you call them on the hash’s default object, they will operate on the
hash’s default object.

I think if you already specified what the new object is going to be,
then you should be able to call a method of that object.

I’m sorry, but I don’t follow. What new object? << and push don’t create
any new objects. And where did you specify what such an object would be?
Array#+ creates a new Array (unlike Array#<< and Array#push, which just
modify an existing one), but not because you specified anywhere that you
want an array. It does so because it always returns an array; that’s
what it’s been defined to do.

In addition, to create distinct Array objects, use a block:

irb(main):001:0> a = Hash.new{|hash, key| hash[key] = Array.new;}
=> {}
irb(main):002:0> a[:first]
=> []
irb(main):003:0> a[:first] << "test"
=> ["test"]
irb(main):004:0> a[:sec]
=> []

On Dec 7, 2007, at 7:00 PM, Jimi Damon wrote:

Yes, but if you run this example and type “a” you get
But it does not…

You’ve mistyped something. a will indeed be {:first=>["test"]}
if you run Bernardo’s example.

You’re asking a lot of very common questions for programmers who
aren’t familiar with the behavior of Ruby’s Hash class. The bottom
line is that you are not pointing out bugs in Ruby’s implementation
of Hash, just common misunderstandings.

  1. Hash.new(x) returns x when a key isn’t found. It will be the
    same x for every key and won’t store x in the hash, just return
    it on a key miss.

  2. Hash.new { ... } will run the block on every key miss and
    return the resulting value but will not store the value in the
    hash.

  3. Hash.new { |h,k| h[k] = ... } will evaluate the block, store
    the value in the hash, and return it on a key miss. Because
    the value is stored in the hash on the first key miss, the block
    will not execute on a subsequent lookup of the same key.
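The three forms can be compared side by side; the variable names below are illustrative, and the comments describe what standard Hash semantics should produce for each line.

```ruby
# 1. Default object: the same object is returned for every missing key, never stored
h1 = Hash.new([])
h1[:a] << 1          # mutates the single shared default array
p h1.empty?          # prints true
p h1[:b]             # prints [1]: :b sees the same (mutated) default

# 2. Block that only returns a value: runs on every miss, stores nothing
h2 = Hash.new { [] }
h2[:a] << 1          # appends to a throwaway array
p h2.empty?          # prints true
p h2[:a]             # prints []: a fresh array again

# 3. Block that assigns: runs once per key, then the stored value is used
h3 = Hash.new { |hash, key| hash[key] = [] }
h3[:a] << 1
p h3[:a]             # prints [1]
p h3.keys            # prints [:a]
```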

Gary W.

Bernardo Rufino wrote:

In addition, to create distinct Array objects, use a block:

irb(main):001:0> a = Hash.new{|hash, key| hash[key] = Array.new;}
=> {}
irb(main):002:0> a[:first]
=> []
irb(main):003:0> a[:first] << "test"
=> ["test"]

Yes, but if you run this example and type “a” you get

irb(main):003:0> a
=> {}

I’m sorry…but I think this is incorrect… You have defined an array
as being the default type, hence after
you have performed << "test", a should contain

=> {:first=>["test"]}

But it does not…

As for the other post.

When you define Hash.new(x), x is the default value returned when you
HAVE NOT defined the key for that value.

Hence, if I do

a = Hash.new("tmp")

and type in irb
irb(main):006 a = Hash.new("tmp")
=> {}
irb(main):007 a[:first]
=> "tmp"
irb(main):008 a
=> {}
irb(main):009

This makes sense because it is accessing the default value for the key
which is “tmp”

However, with that being said, if my default type is an “array”, then I
should be able to push a value into that array and have it stay around;
otherwise why does

a[:key] += ["value"]
and then
a[:key].push("another value")
work?

How do you suggest creating new Hash entries whereby I want them to be
Arrays by default?

In Perl I can easily (albeit it is ugly) type

push(@{$hash{key}}, "New value");

This works as long as $hash{key} either has not been defined yet or is
already an anonymous array.

How can you use operators such as “push” or “<<” on a hash key/value
pair when the key has not yet been defined for the hash? I want a
constructor for each “value” to make it an Array.

Please note, I don’t want to write

if !hash.has_key?("key")
  hash["key"] = ["new value"]
else
  hash["key"].push("new value")
end

Thanks for any suggestions and also for straightening me out about
“Default” values.

However, really what I am looking for is a default constructor for the
blank value.

On Dec 7, 2007, at 7:17 PM, Jimi Damon wrote:

However, really what I am looking for is a default constructor for
the blank value.

Bernardo already showed you:

In addition, to create distinct Array objects, use a block:

irb(main):001:0> a = Hash.new{|hash, key| hash[key] = Array.new;}
=> {}
irb(main):002:0> a[:first]
=> []
irb(main):003:0> a[:first] << "test"
=> ["test"]
irb(main):004:0> a[:sec]
=> []

A block provided to Hash.new is called every time there is a miss
during key lookup. The two arguments to the block are the hash
itself and the key that caused the missed lookup. If the code
in the block stores a value into the hash with that key, then there
won’t be any future misses for that key.

In the example that Bernardo showed, a new array is constructed
every time a key lookup fails and then stored in the hash using
that key. Subsequent lookups with that key get that array; that
is to say, a new array is only created once per key.
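One way to confirm that the block runs only once per key is to count its invocations. The counter `calls` and the `groups` hash are hypothetical names chosen for this sketch.

```ruby
calls = 0
groups = Hash.new do |hash, key|
  calls += 1          # count how often the block actually runs
  hash[key] = []      # store a fresh array under the missing key
end

groups[:fruit] << "apple"
groups[:fruit] << "pear"   # :fruit is stored now, so the block is not called
groups[:veg]   << "leek"

p calls                               # prints 2: one miss per distinct key
p groups[:fruit]                      # prints ["apple", "pear"]
p groups[:fruit].equal?(groups[:veg]) # prints false: separate arrays
```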

Gary W.

Jimi Damon wrote:

Yes, but if you run this example and type “a” you get

irb(main):003:0> a
=> {}

This is not true.

a = Hash.new{|hash, key| hash[key] = Array.new}
=> {}

a[:first] << "test"
=> ["test"]

a
=> {:first=>["test"]}

I’m sorry…but I think this is incorrect… You have defined an array
as being the default type, hence after
you have performed << "test", a should contain

=> {:first=>["test"]}

But it does not…

Yes, it does. See above.

irb(main):007 a[:first]
=> "tmp"
irb(main):008 a
=> {}
irb(main):009

This makes sense because it is accessing the default value for the key
which is “tmp”

Exactly.

However, with that being said, if my default type is an “array”, then I
should be able to Push into that array a value and have it stay around,

If you push an item into the default array, it does stay around. The
array doesn’t get assigned to the key, but the item stays in the array.
See:

h = Hash.new(Array.new)
=> {}

h[:bla]
=> []

h[:bla] << "la"
=> ["la"]

h[:bla]
=> ["la"]

h[:blubb]
=> ["la"]

h
=> {}

After the first line the default item is an empty array. h[:bla] will
give you this array. Calling << on it will put "la" into the array. The
default item is now ["la"]. It is not assigned to any hash key, but
every time you get the default item, this is what you will get.

otherwise why does
a[:key] += ["value"]
and then
a[:key].push("another value")
work?

As I explained in my previous post, the first line expands to:
a[:key] = a[:key] + ["value"]
This will first evaluate a[:key] + ["value"].
This calls the + method on the default array with ["value"] as an
argument. What the method does is create a new array containing all
the items of the default array as well as the items of the argument.
Assuming the default array was empty before, this will evaluate to the
array ["value"] (without changing the default array). So the expression
now reads
a[:key] = ["value"]
This assigns the new array ["value"] to a[:key]. a[:key] no longer
points to the default array. As such, calling push on it no longer
calls push on the default array, but on the new array it points to.

HTH,
Sebastian

Fearless F. wrote:

This is a long-inactive thread, but it glosses over an important gotcha
that nobody has mentioned.

As mentioned above, the sexy ruby-esque way to make a hash of arrays is
to use a block initializer for Hash.new:

h = Hash.new{|hash, key| hash[key] = []}
=> {}

This allows you to push items onto any slot of the hash without first
checking to see if it’s nil:

h[:first] << "yum!"
=> ["yum!"]

h[:first] << "tastes like chicken"
=> ["yum!", "tastes like chicken"]

But here’s the gotcha: If you later test for a non-empty slot with
h[], h will automatically create an empty array whether you
intended to or not. Consider:

process(h[:second]) if h[:second]

Even though we never pushed anything onto h[:second], process() WILL get
called (and the hash will grow by one element), since the test
h[:second] automagically creates an empty array. This may not be a
problem if you use h entirely within your own code, but could be
surprising behavior – to you or to some other user – and lead to
elusive bugs. (As I discovered…)
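A minimal reproduction of the gotcha, assuming a block-initialized hash like the one above; the lambda `process` is a hypothetical stand-in that only records what it receives. The last two lines show that `key?` (an alias of `has_key?`) does not trigger the default block.

```ruby
h = Hash.new { |hash, key| hash[key] = [] }
h[:first] << "yum!"

called = []
# Hypothetical stand-in for process(): just records what it was given
process = ->(list) { called << list }

process.(h[:second]) if h[:second]   # the lookup itself stores [] under :second

p called          # prints [[]]: the empty array is truthy, so process ran
p h.size          # prints 2: the hash grew by one entry

p h.key?(:third)  # prints false: key? does not invoke the default block
p h.size          # prints 2
```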

MORAL: A better idiom for pushing items onto a hash of arrays is:

h = Hash.new
(h[:first] ||= []) << "yum!"

… since this avoids any unexpected behavior with the hash later on.
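A short sketch of the `||=` idiom with a plain Hash, showing that reading a missing key afterwards leaves the hash unchanged:

```ruby
h = Hash.new                 # plain hash: missing keys return nil
(h[:first] ||= []) << "yum!"
(h[:first] ||= []) << "tastes like chicken"

p h[:first]    # prints ["yum!", "tastes like chicken"]
p h[:second]   # prints nil: reading a missing key stores nothing
p h.size       # prints 1
```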

-- ff

This here is why you learn the core library.

irb(main):025:0> h = Hash.new {|h, k| h[k] = []}
=> {}
irb(main):026:0> h[:rocks].push "cliff"
=> ["cliff"]
irb(main):027:0> h[:rocks]
=> ["cliff"]
irb(main):028:0> h.has_key? :face
=> false

On Sep 8, 2010, at 4:11 PM, Fearless F. wrote:

But here’s the gotcha: If you later test for a non-empty slot with
h[], h will automatically create an empty array whether you
intended to or not. Consider:

process(h[:second]) if h[:second]

There are lots of other ways around that problem:

process(h[:second]) if h.has_key?(:second)

h.each { |k,v|
  case k
  when :first
    process_first(v)
  when :second
    process_second(v)
  end
}

if list = h.fetch(:second, nil)
  process(list)
end

Your use case seems a bit strange though since the choice of having an
empty array as the default value suggests that you intend to process
arrays, in which case you should just arrange for your processing code
to correctly handle empty lists rather than testing for the ‘existence’
of the key/value pair in the first place.

def process(list)
  list.each { |item|
    # do something with item
  }
end

process(h[:second]) # works whether a value was stored or the default empty array is returned

Gary W.

@Gary, @Chris:

You’re both right, of course: a real programmer will always use
has_key?() (or equivalent) to check for a non-empty slot in a hash
table! Or will she?

Consider the following example:

1: sentence = ParserModule.parse(string)
2: if (p = sentence[:preposition])
3: …

Depending on how the ‘sentence’ hash table is constructed, line 2 may or
may not modify the state of the hash table, and YOU CANNOT KNOW unless
you look inside the ParserModule. Why is it bad if the state of the
hash table changes unexpectedly? I dunno: maybe your code depends on
the number of elements in the hash, or (as the code above suggests) the
distinction between a nil slot and a non-nil slot is important.

But for line 2 to modify the state of the hash table is a clear
violation of “The Principle of (Matz’s) Least Surprise”.

-- ff

P.S.: If you claim that it’s better to write this as:

1: sentence = ParserModule.parse(string)
2: if sentence.has_key?(:preposition)
3:   p = sentence[:preposition]

… I’ll sic the DRY police on you. :-)

On Sep 8, 2010, at 5:25 PM, Fearless F. wrote:

1: sentence = ParserModule.parse(string)
2: if sentence.has_key?(:preposition)
3:   p = sentence[:preposition]

… I’ll sic the DRY police on you. :-)

It seems to me you’ve described two mutually contradictory
‘patterns’:

– a hash that defaults to a new array when a key is referenced
– a hash that is queried ‘randomly’ (as in individual keys are
accessed by name rather than all the entries being processed
similarly).

You claim that the combination is bad because the ‘state’ of
the hash changes as a side effect of the queries. These two
patterns don’t seem to go together. The first suggests that the hash
is a named collection of arrays but the second suggests that
the hash values are not homogeneous and that sometimes you don’t
want an array as the default value.

As I pointed out, you can always override the hash default on a per
‘lookup’ basis to avoid the problem you are complaining about:

if prep = sentence.fetch(:preposition, nil)
  # do something with prep
end

sentence.fetch(:nouns, []).each { |n|
  process_noun(n)
}

process_all_verbs(sentence[:verbs]) # if you don't care whether this creates an empty entry for :verbs

Another solution would be to not use a Hash that defaults to an Array.
In that case you could still use ‘fetch’ to change the behavior on
each individual access when you really do want an array installed.

sentence.fetch(:nouns) { |miss| sentence[miss] = [] }
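The two uses of `fetch` above differ in whether anything is stored. A sketch, assuming `sentence` is an ordinary Hash with no default (the keys are illustrative):

```ruby
sentence = {}   # hypothetical parse result with no special default

# fetch with a default value: returns it, stores nothing
prep = sentence.fetch(:preposition, nil)
p prep            # prints nil
p sentence.size   # prints 0

# fetch with a block that stores: installs an array only on this access
sentence.fetch(:nouns) { |miss| sentence[miss] = [] } << "dog"
p sentence[:nouns]  # prints ["dog"]
p sentence.size     # prints 1
```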

Gary W.

Gary:

All of your suggestions are clean and correct.

I apologize for not being clearer. For the “ParserModule.parse(string)”
example, I didn’t mean to suggest that the slots default to empty arrays
(as they did in the previous examples). I was just saying that – as a
developer of a module – handing a hash table to a user that changes
state when accessing its slots via [] can lead to surprising behavior
that would be difficult for the user to track down.

Of course, you can document the behavior and tell the user to use
fetch() rather than []. But isn’t it better to create a hash table that
doesn’t have this behavior in the first place?

I still believe the best solution is to avoid hash tables with block
initializers. Going back to the OP’s question about creating a hash of
arrays, we could use your suggestion of:
hash.fetch(key) {|miss| hash[miss] = []} << value
or the somewhat more succinct:
(hash[key] ||= []) << value
Either one works for me.

Thanks for the suggestions and feedback.

-- ff

On Sep 8, 2010, at 7:25 PM, Fearless F. wrote:

Of course, you can document the behavior and tell the user to use
fetch() rather than []. But isn’t it better to create a hash table that
doesn’t have this behavior in the first place?

I don’t think your question is valid. Does there need to be one best
way to use a Hash? A strength of Ruby’s core API is that it gives you
incredible flexibility to address different situations by picking and
choosing the features you want to use.

Sometimes nil as default is best, sometimes 0, sometimes [], sometimes
the value should be stored when a key is referenced and sometimes not.

There is no need to have a single Hash recipe that is the ‘best’.
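As a small illustration of picking the default per job, here is a sketch with an invented word list: an Integer default for counting, and a storing block for grouping into arrays.

```ruby
words = %w[to be or not to be]

# An Integer default is safe to share: += rebinds the key, it never mutates 0
counts = Hash.new(0)
words.each { |w| counts[w] += 1 }

# An array per key needs the storing block so each key gets its own array
by_length = Hash.new { |hash, key| hash[key] = [] }
words.each { |w| by_length[w.length] << w }

p counts["be"]    # prints 2
p counts["xyz"]   # prints 0, and nothing is stored
p by_length[3]    # prints ["not"]
```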

Gary W.