Newbie question > Hash workings


#1

Hi,

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

h = Hash.new([])
h[0] = [1,2,3]
p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}
h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12] ?
p h[2] #>
[20]
???
h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]} ?
h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
21, 22]} ???

Grtz. Andr


#2

2009/2/28 AD60 removed_email_address@domain.invalid:

Hi,

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

There’s no bug involved.

h = Hash.new([])

Here you create Hash with an empty array as
default value. For every lookup of a key that is
not in the hash, this single array instance is
returned, but not automatically stored in the hash.
Again: It’s a single object, not a new one on every
lookup.

h[0] = [1,2,3]

Here you associate 0 with an array. The hash default
value is not involved.

p h #> {0=>[1, 2, 3]}
h[0] += [4]
p h #> {0=>[1, 2, 3, 4]}

This is a syntactic shortcut for “h[0] = h[0] + [4]”.
It looks up the value of 0 in the hash, which is
an array. array + array creates a new array,
which overwrites the previous value of h[0].

h[1] += [10,11]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11]}

Again, this is a shortcut for “h[1] = h[1] + [10, 11]”.
The hash doesn’t have a value for 1 yet, so
the default value ([]) is returned. [] + [10, 11] is
the new array [10, 11] which is then assigned
to h[1].

h[1] << 12
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}

Look up the existing array stored in h[1] and append
12. No new array created.

h[2] << 20
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12] ?

The hash doesn’t have a value for 2, so the default
value, [], is returned. The default value is destructively
modified by appending 20. No new hash entry is
created (there is no assignment!).

Future lookups of non-existent keys will return
the modified default value!

p h[2] #> [20] ???

The hash still doesn’t have an entry for 2, so
the default value is returned, which by now is
our modified array.

h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]} ?

Still no entry for 2, default value is returned and modified
by appending 21.

h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20, 21, 22]} ???

Again, this is equivalent to “h[2] = h[2] + [22]”.
Still no entry for h[2], so the default value is returned,
which by now is [20, 21]. Then the two arrays
[20, 21] and [22] are concatenated, returning the
new array [20, 21, 22] which is then finally associated
with 2 in the hash (see: here is an assignment!).

What you probably want to achieve can be
done using Hash’s block constructor:

h = Hash.new { |hash, key| hash[key] = [] }

This automatically stores a new empty array
on every lookup of a non-existent key.

HTH,
Stefan


#3

On 28.02.2009 22:15, AD60 wrote:

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

p works correctly but…

p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12] ?
p h[2] #>
[20] ???
h[2] << 21
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]} ?
h[2] += [22]
p h #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
21, 22]} ???

… you are modifying the default value of the Hash in those cases where
you place the question marks. The default value is the one returned in
case a key does not exist. But there is no automatic insertion, that’s
why you do not see key 2 after doing h[2] << 20.

You probably rather want this famous idiom:

h = Hash.new {|ha,ke| ha[ke] = []}

Now run your tests again. :slight_smile:

Kind regards

robert


#4

Thank you both for seeing the light at the end of the tunnel. I fully
understand what I have to do to get it working but just to clarify;

By doing “h[2] << 20” I modified the original empty array I set as a
default value and inserted no hash key because it is not an assignment
to h?

Replacing the “h = Hash.new([])” with “h = Hash.new { |h,k| h[k] =
[]}” and just doing “p h[2]” before any assigment to h[2] has the
surprising/logical side effect of creating a new key “2”, thank god
with value “[]” so no problems there, but still… I am now thinking
that stating “h = Hash.new([])” is problematic idiom under all
circumstances. Am I correct?

Kind regards,
Andr


#5

2009/3/1 AD60 removed_email_address@domain.invalid:

Thank you both for seeing the light at the end of the tunnel. I fully
understand what I have to do to get it working but just to clarify;

Good!

By doing “h[2] << 20” I modified the original empty array I set as a
default value and inserted no hash key because it is not an assignment
to h?

Yes. Although “assignment to h” is a bit ambiguous. I guess you
meant the correct thing but keep in mind that normally “assignment to
h” means “h = something” and not “h[key] = something”. The latter is
actually a method defined in class Hash.

Replacing the “h = Hash.new([])” with “h = Hash.new { |h,k| h[k] =
[]}” and just doing “p h[2]” before any assigment to h[2] has the
surprising/logical side effect of creating a new key “2”, thank god
with value “[]” so no problems there, but still… I am now thinking
that stating “h = Hash.new([])” is problematic idiom under all
circumstances. Am I correct?

No, because it works well in other cases - the most popular probably
being

counters = Hash.new 0

counters[foo] += 1

Kind regards

robert


#6

On 01.03.2009 12:35, AD60 wrote:

I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Ah. Since you talked about “idiom” I thought you were referring to the
general pattern.

Just to detail the difference:

hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]

Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you’ll end up creating and deleting Array instances
all the time without any benefit.

it’s the difference between “{0=>[1], 1=>[]}” and “{0=>1}” result that
had me wondering. Anyway thanks again, I understand how to use it and
that’s more than enough for now.

Good.

Kind regards

robert


#7

I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Just to detail the difference:

hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]
p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

hiv = Hash.new 0
hiv[0] += 1
p hiv # {0=>1}
p hiv[1] # 0
p hiv # {0=>1}

it’s the difference between “{0=>[1], 1=>[]}” and “{0=>1}” result that
had me wondering. Anyway thanks again, I understand how to use it and
that’s more than enough for now.

Kind regards,
Andr


#8

Robert K. wrote:

On 01.03.2009 12:35, AD60 wrote:

I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Ah. Since you talked about “idiom” I thought you were referring to the
general pattern.

Just to detail the difference:

hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]

Actually this is very inefficient. With this idiom you should rather do

hav[0] << [1]

because otherwise you’ll end up creating and deleting Array instances
all the time without any benefit.

it’s the difference between “{0=>[1], 1=>[]}” and “{0=>1}” result that
had me wondering. Anyway thanks again, I understand how to use it and
that’s more than enough for now.

Good.

Kind regards

robert

The method I would use is this:

hav = Hash.new {[]}

This gives you the benefit of creating a new array when necessary
(rather than a single default array which can be destructively modified
in troubling ways), but means that you only get permanent effects if you
explicitly use assignment, so you don’t get this

p hav # {0=>[1]}
p hav[1] # []
p hav # {0=>[1], 1=>[]}

Of course, it means you can’t use the << operator to stick values into
non-present keys, but I rarely use the << operator anyway.


#9

You probably should get rid of that habit. And here’s why: your
approach has no means to efficiently work with Arrays as Hash values.
Either, you have to use += all the time which is inefficient because it
will create a lot of garbage

The problem is creating a lot of useless objects and copying over data,
not leaving garbage behind.

mfg, simon … l


#10

On 02.03.2009 19:18, Adam G. wrote:

hav = Hash.new { |h,k| h[k] = []}
that’s more than enough for now.
Good.

p hav[1] # []
p hav # {0=>[1], 1=>[]}

Of course, it means you can’t use the << operator to stick values into
non-present keys, but I rarely use the << operator anyway.

You probably should get rid of that habit. And here’s why: your
approach has no means to efficiently work with Arrays as Hash values.
Either, you have to use += all the time which is inefficient because it
will create a lot of garbage - or you have to explicitly test for
presence of the key, which is less efficient than letting Hash’s C
implementation do it. It is much more elegant to just always do
hash[key] << value.

If you just want to make sure that you always get an Array back, you
could as well use Hash.new([].freeze) to avoid accidentally appending to
the single default all the time.

But it’s a free country so if you like to stick with your habits…

Cheers

robert


#11

On 05.03.2009 14:47, Simon K. wrote:

You probably should get rid of that habit. And here’s why: your
approach has no means to efficiently work with Arrays as Hash values.
Either, you have to use += all the time which is inefficient because it
will create a lot of garbage

The problem is creating a lot of useless objects and copying over data,
not leaving garbage behind.

Well, both (creation of objects as well as garbage collection left
overs) contribute to wasted CPU and thus slower execution.

robert