Forum: Ruby Newbie question > Hash workings

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
97723949bfa51df51a8c4087a3b1ccd0?d=identicon&s=25 AD60 (Guest)
on 2009-02-28 22:20
(Received via mailing list)
Hi,

I am puzzled by the workings a the following program. I placed the
output after the corresponding lines. To me it lookes like a sort of
bug in the workings of de p method. Can anybody set me straight?

h = Hash.new([])
h[0] = [1,2,3]
p h                      #> {0=>[1, 2, 3]}
h[0] += [4]
p h                      #> {0=>[1, 2, 3, 4]}
h[1] += [10,11]
p h                      #> {0=>[1, 2, 3, 4], 1=>[10, 11]}
h[1] << 12
p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}
h[2] << 20
p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]                          ?
p h[2]                 #>
[20]
???
h[2] << 21
p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11,
12]}                          ?
h[2] += [22]
p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
21, 22]}  ????????


Grtz. Andr
40613e55d7082e5f08429dfb50d0680e?d=identicon&s=25 Stefan Lang (Guest)
on 2009-02-28 23:30
(Received via mailing list)
2009/2/28 AD60 <andredeen@gmail.com>:
> Hi,
>
> I am puzzled by the workings a the following program. I placed the
> output after the corresponding lines. To me it lookes like a sort of
> bug in the workings of de p method. Can anybody set me straight?

There's no bug involved.

> h = Hash.new([])

Here you create Hash with an empty array as
default value. For every lookup of a key that is
not in the hash, this single array instance is
returned, but not automatically stored in the hash.
Again: It's a single object, not a new one on every
lookup.

> h[0] = [1,2,3]

Here you associate 0 with an array. The hash default
value is not involved.

> p h                      #> {0=>[1, 2, 3]}
> h[0] += [4]
> p h                      #> {0=>[1, 2, 3, 4]}

This is a syntactic shortcut for "h[0] = h[0] + [4]".
It looks up the value of 0 in the hash, which is
an array. array + array creates a new array,
which overwrites the previous value of h[0].

> h[1] += [10,11]
> p h                      #> {0=>[1, 2, 3, 4], 1=>[10, 11]}

Again, this is a shortcut for "h[1] = h[1] + [10, 11]".
The hash doesn't have a value for 1 yet, so
the default value ([]) is returned. [] + [10, 11] is
the new array [10, 11] which is then assigned
to h[1].

> h[1] << 12
> p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]}

Look up the existing array stored in h[1] and append
12. No new array created.

> h[2] << 20
> p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]  ?

The hash doesn't have a value for 2, so the default
value, [], is returned. The default value is destructively
modified by appending 20. _No new hash entry is
created (there is no assignment!)._

Future lookups of non-existent keys will return
the modified default value!

> p h[2]                 #> [20]  ???

The hash still doesn't have an entry for 2, so
the default value is returned, which by now is
our modified array.

> h[2] << 21
> p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12]} ?

Still no entry for 2, default value is returned and modified
by appending 21.

> h[2] += [22]
> p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20, 21, 22]}  ????????

Again, this is equivalent to "h[2] = h[2] + [22]".
Still no entry for h[2], so the default value is returned,
which by now is [20, 21]. Then the two arrays
[20, 21] and [22] are concatenated, returning the
new array [20, 21, 22] which is then finally associated
with 2 in the hash (see: here is an assignment!).

What you probably want to achieve can be
done using Hash's block constructor:

  h = Hash.new { |hash, key| hash[key] = [] }

This automatically stores a new empty array
on every lookup of a non-existent key.

HTH,
  Stefan
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2009-02-28 23:35
(Received via mailing list)
On 28.02.2009 22:15, AD60 wrote:
> I am puzzled by the workings a the following program. I placed the
> output after the corresponding lines. To me it lookes like a sort of
> bug in the workings of de p method. Can anybody set me straight?

p works correctly but...

> p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11,
> 12]                          ?
> p h[2]                 #>
> [20]                                                                   ???
> h[2] << 21
> p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11,
> 12]}                          ?
> h[2] += [22]
> p h                     #> {0=>[1, 2, 3, 4], 1=>[10, 11, 12], 2=>[20,
> 21, 22]}  ????????

... you are modifying the default value of the Hash in those cases where
you place the question marks.  The default value is the one returned in
case a key does not exist.  But there is no automatic insertion, that's
why you do not see key 2 after doing h[2] << 20.

You probably rather want this famous idiom:

h = Hash.new {|ha,ke| ha[ke] = []}

Now run your tests again. :-)

Kind regards

  robert
97723949bfa51df51a8c4087a3b1ccd0?d=identicon&s=25 AD60 (Guest)
on 2009-03-01 01:42
(Received via mailing list)
Thank you both for seeing the light at the end of the tunnel. I fully
understand what I have to do to get it working but just to clarify;

By doing "h[2] << 20" I modified the original empty array I set as a
default value and inserted no hash key because it is not an assignment
to h?

Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
[]}" and just doing "p h[2]" before any assigment to h[2] has the
surprising/logical side effect of creating a new key "2", thank god
with value "[]" so no problems there, but still... I am now thinking
that stating "h = Hash.new([])" is problematic idiom under all
circumstances. Am I correct?

Kind regards,
Andr
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2009-03-01 11:48
(Received via mailing list)
2009/3/1 AD60 <andredeen@gmail.com>:
> Thank you both for seeing the light at the end of the tunnel. I fully
> understand what I have to do to get it working but just to clarify;

Good!

> By doing "h[2] << 20" I modified the original empty array I set as a
> default value and inserted no hash key because it is not an assignment
> to h?

Yes.  Although "assignment to h" is a bit ambiguous.  I guess you
meant the correct thing but keep in mind that normally "assignment to
h" means "h = something" and not "h[key] = something".  The latter is
actually a method defined in class Hash.

> Replacing the "h = Hash.new([])" with "h = Hash.new { |h,k| h[k] =
> []}" and just doing "p h[2]" before any assigment to h[2] has the
> surprising/logical side effect of creating a new key "2", thank god
> with value "[]" so no problems there, but still... I am now thinking
> that stating "h = Hash.new([])" is problematic idiom under all
> circumstances. Am I correct?

No, because it works well in other cases - the most popular probably
being

counters = Hash.new 0
...
counters[foo] += 1

Kind regards

robert
97723949bfa51df51a8c4087a3b1ccd0?d=identicon&s=25 AD60 (Guest)
on 2009-03-01 12:41
(Received via mailing list)
I see the use with an integer default, I specifically meant the
Hash.new([]) syntax as problematic, but it probably all depends on
knowing what your doing.

Just to detail the difference:


hav = Hash.new { |h,k| h[k] = []}
hav[0] += [1]
p hav               # {0=>[1]}
p hav[1]            # []
p hav               # {0=>[1], 1=>[]}

hiv = Hash.new 0
hiv[0] += 1
p hiv               # {0=>1}
p hiv[1]            # 0
p hiv               # {0=>1}

it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
had me wondering. Anyway thanks again, I understand how to use it and
that's more than enough for now.

Kind regards,
Andr
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2009-03-01 17:00
(Received via mailing list)
On 01.03.2009 12:35, AD60 wrote:
> I see the use with an integer default, I specifically meant the
> Hash.new([]) syntax as problematic, but it probably all depends on
> knowing what your doing.

Ah.  Since you talked about "idiom" I thought you were referring to the
general pattern.

> Just to detail the difference:
>
>
> hav = Hash.new { |h,k| h[k] = []}
> hav[0] += [1]

Actually this is very inefficient.  With this idiom you should rather do

hav[0] << [1]

because otherwise you'll end up creating and deleting Array instances
all the time without any benefit.

> it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
> had me wondering. Anyway thanks again, I understand how to use it and
> that's more than enough for now.

Good.

Kind regards

  robert
Eacf5ed623e72db0b87707baabd6f21a?d=identicon&s=25 Adam Gardner (adamgardner)
on 2009-03-02 19:19
Robert Klemme wrote:
> On 01.03.2009 12:35, AD60 wrote:
>> I see the use with an integer default, I specifically meant the
>> Hash.new([]) syntax as problematic, but it probably all depends on
>> knowing what your doing.
>
> Ah.  Since you talked about "idiom" I thought you were referring to the
> general pattern.
>
>> Just to detail the difference:
>>
>>
>> hav = Hash.new { |h,k| h[k] = []}
>> hav[0] += [1]
>
> Actually this is very inefficient.  With this idiom you should rather do
>
> hav[0] << [1]
>
> because otherwise you'll end up creating and deleting Array instances
> all the time without any benefit.
>
>> it's the difference between "{0=>[1], 1=>[]}" and "{0=>1}" result that
>> had me wondering. Anyway thanks again, I understand how to use it and
>> that's more than enough for now.
>
> Good.
>
> Kind regards
>
>   robert

The method I would use is this:

hav = Hash.new {[]}

This gives you the benefit of creating a new array when necessary
(rather than a single default array which can be destructively modified
in troubling ways), but means that you only get permanent effects if you
explicitly use assignment, so you don't get this

p hav               # {0=>[1]}
p hav[1]            # []
p hav               # {0=>[1], 1=>[]}

Of course, it means you can't use the << operator to stick values into
non-present keys, but I rarely use the << operator anyway.
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2009-03-02 21:56
(Received via mailing list)
On 02.03.2009 19:18, Adam Gardner wrote:
>>> hav = Hash.new { |h,k| h[k] = []}
>>> that's more than enough for now.
>> Good.

> p hav[1]            # []
> p hav               # {0=>[1], 1=>[]}
>
> Of course, it means you can't use the << operator to stick values into
> non-present keys, but I rarely use the << operator anyway.

You probably should get rid of that habit.  And here's why: your
approach has no means to efficiently work with Arrays as Hash values.
Either, you have to use += all the time which is inefficient because it
will create a lot of garbage - or you have to explicitly test for
presence of the key, which is less efficient than letting Hash's C
implementation do it.  It is much more elegant to just always do
hash[key] << value.

If you just want to make sure that you always get an Array back, you
could as well use Hash.new([].freeze) to avoid accidentally appending to
the single default all the time.

But it's a free country so if you like to stick with your habits...

Cheers

  robert
1d53b088a989e069b94597c282eebbbc?d=identicon&s=25 Simon Krahnke (Guest)
on 2009-03-05 15:07
(Received via mailing list)
* Robert Klemme <shortcutter@googlemail.com> (2009-03-02) schrieb:

> You probably should get rid of that habit.  And here's why: your
> approach has no means to efficiently work with Arrays as Hash values.
> Either, you have to use += all the time which is inefficient because it
> will create a lot of garbage

The problem is creating a lot of useless objects and copying over data,
not leaving garbage behind.

mfg,                          simon .... l
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2009-03-05 18:01
(Received via mailing list)
On 05.03.2009 14:47, Simon Krahnke wrote:
> * Robert Klemme <shortcutter@googlemail.com> (2009-03-02) schrieb:
>
>> You probably should get rid of that habit.  And here's why: your
>> approach has no means to efficiently work with Arrays as Hash values.
>> Either, you have to use += all the time which is inefficient because it
>> will create a lot of garbage
>
> The problem is creating a lot of useless objects and copying over data,
> not leaving garbage behind.

Well, both (creation of objects as well as garbage collection left
overs) contribute to wasted CPU and thus slower execution.

  robert
This topic is locked and can not be replied to.