Why Hash corrupts 'key' object?


#1

Hi, I have next script:
t.rb:

class TestStr < String
attr_accessor :dupstr
def initialize ( str )
@dupstr = str
super(str)
end
end
ts=TestStr.new(“aaa”)
puts ts.dupstr
h=Hash.new()
h[ts]=true
puts h.keys.first.class
puts h.keys.first
puts h.keys.first.dupstr

run it:

$ ruby t.rb
aaa
TestStr
aaa
nil
$

The question is - why there is ‘nil’ in the last line instead of “aaa” ?


#2

Mike G. wrote:

It must be the T_STRING test in hash.c, an optimization, I assume.

VALUE
rb_hash_aset(hash, key, val)
VALUE hash, key, val;
{
rb_hash_modify(hash);
if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
st_insert(RHASH(hash)->tbl, key, val);
}
else {
st_add_direct(RHASH(hash)->tbl, rb_str_new4(key), val);
}
return val;
}

You can still use your code as is, with this adjustment:

require ‘delegate’

class TestStr < DelegateClass(String)
# …
end

Also consider delegation as a primary strategy, not just as a workaround
for this case.

Thanks! It works (although I had to implement hash() and eql?() methods
for TestStr)
BTW, why delegation is more primary than standard inheritance ?


#3

Dmitry P. wrote:

Hi, I have next script:
t.rb:

class TestStr < String
attr_accessor :dupstr
def initialize ( str )
@dupstr = str
super(str)
end
end
ts=TestStr.new(“aaa”)
puts ts.dupstr
h=Hash.new()
h[ts]=true
puts h.keys.first.class
puts h.keys.first
puts h.keys.first.dupstr

run it:

$ ruby t.rb
aaa
TestStr
aaa
nil
$

The question is - why there is ‘nil’ in the last line instead of “aaa” ?

It must be the T_STRING test in hash.c, an optimization, I assume.

VALUE
rb_hash_aset(hash, key, val)
VALUE hash, key, val;
{
rb_hash_modify(hash);
if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
st_insert(RHASH(hash)->tbl, key, val);
}
else {
st_add_direct(RHASH(hash)->tbl, rb_str_new4(key), val);
}
return val;
}

You can still use your code as is, with this adjustment:

require ‘delegate’

class TestStr < DelegateClass(String)
# …
end

Also consider delegation as a primary strategy, not just as a workaround
for this case.


#4

2008/12/11 Dmitry P. removed_email_address@domain.invalid:

ts=TestStr.new(“aaa”)
$ ruby t.rb
aaa
TestStr
aaa
nil
$

The question is - why there is ‘nil’ in the last line instead of “aaa” ?

Because Hash copies #dup an unfrozen String.

http://www.ruby-doc.org/core/classes/Hash.html#M002878

Try this:

class TestStr < String
attr_accessor :dupstr
def initialize ( str )
@dupstr = str
super(str)
end
end
ts=TestStr.new(“aaa”)
puts ts.dupstr, ts.object_id
h=Hash.new()
h[ts]=true
puts h.keys.first.class, h.keys.first, h.keys.first.object_id,
h.keys.first.dupstr
puts “----”
ts.freeze
puts ts.dupstr, ts.object_id
h=Hash.new()
h[ts]=true
puts h.keys.first.class, h.keys.first, h.keys.first.object_id,
h.keys.first.dupstr

It is generally a bad idea to inherit core classes. Delegation is
much better (see Mike’s reply).

Cheers

robert


#5

Robert K. removed_email_address@domain.invalid writes:

end

$ ruby t.rb
aaa
TestStr
aaa
nil
$

The question is - why there is ‘nil’ in the last line instead of “aaa” ?

Because Hash copies #dup an unfrozen String.

Why would that cause this effect though? The docs say “a String passed
as a key will be duplicated and frozen”, but duplicating and freezing
a TestStr doesn’t nullify dupstr:

$ cat -n temp.rb
1 class TestStr < String
2 attr_accessor :dupstr
3 def initialize ( str )
4 @dupstr = str
5 super(str)
6 end
7 end
8 ts=TestStr.new(“aaa”)
9 puts ts.dupstr, ts.object_id
10 puts ‘—’
11 ts2 = ts.dup
12 ts2.freeze
13 puts ts2.dupstr, ts2.object_id
$ ruby temp.rb
aaa
81910

aaa
81880


#6

Dmitry P. wrote:

BTW, why delegation is more primary than standard inheritance ?

Delegation carries a greater degree of decoupling and flexibility.

Sometimes it’s useful to replace your car engine while you’re cruising
down the highway. That’s delegation.

With inheritance, the engine is welded onto the car body. You’ve
already tossed aside the possibility of hot-swapping. Perhaps that was
an arbitrary decision.

Delegation makes something new out of existing parts. Inheritance
grafts stuff onto existing parts.


#7

Mike G. removed_email_address@domain.invalid writes:

rb_str_new4(orig)

The docs could be made clearer: An unfrozen String passed as a key
will be duplicated and frozen.

That doesn’t answer my question. I showed that dup+freeze did not
adversely affect TestStr (i.e. dupstr was still accessible), so
something else is going on.


#8

Brian A. wrote:

Why would that cause this effect though? The docs say “a String passed
as a key will be duplicated and frozen”, but duplicating and freezing
a TestStr doesn’t nullify dupstr:

The answer lies in the rb_hash_aset() function I quoted. rb_str_new4()
begins like this:

VALUE
rb_str_new4(orig)
VALUE orig;
{
VALUE klass, str;

  if (OBJ_FROZEN(orig)) return orig;
  /* ... */

}

Like Robert showed, if it’s already frozen then the key is untouched (no
dup+freeze).

The docs could be made clearer: An unfrozen String passed as a key
will be duplicated and frozen.


#9

Brian A. wrote:

Mike G. removed_email_address@domain.invalid writes:

rb_str_new4(orig)

The docs could be made clearer: An unfrozen String passed as a key
will be duplicated and frozen.

That doesn’t answer my question. I showed that dup+freeze did not
adversely affect TestStr (i.e. dupstr was still accessible), so
something else is going on.

You passed in a frozen String, so Hash#[]= did not touch it. So it was
not adversely affected.

Could you write an assertion which you think should succeed but doesn’t?


#10

Mike G. removed_email_address@domain.invalid writes:

something else is going on.

You passed in a frozen String, so Hash#[]= did not touch it. So it was
not adversely affected.

No I didn’t. In fact, I didn’t even use Hash in the example. Maybe
you’re referring to a different post. I’ll repost below:

$ cat -n temp.rb
1 class TestStr < String
2 attr_accessor :dupstr
3 def initialize ( str )
4 @dupstr = str
5 super(str)
6 end
7 end
8 ts=TestStr.new(“aaa”)
9 puts ts.dupstr, ts.object_id
10 puts ‘—’
11 ts2 = ts.dup
12 ts2.freeze
13 puts ts2.dupstr, ts2.object_id
$ ruby temp.rb
aaa
81910

aaa
81880

What the above demonstrates is that dup’ing and freezing a TestStr
does not nilify TestStr#dupstr, so the C code for Hash seems to be
doing more than just dup’ing and freezing. Does that make sense?


#11

Brian A. wrote:

What the above demonstrates is that dup’ing and freezing a TestStr
does not nilify TestStr#dupstr, so the C code for Hash seems to be
doing more than just dup’ing and freezing. Does that make sense?

Yes, sorry, I misunderstood. I did re-read your post several times but
was still a little puzzled.

You are right that something appears to be amiss. rb_str_new4()
duplicates the String portion of the TestStr object, but does not copy
over the TestStr-specific data. A subclassed String is not duplicated;
the documentation is wrong.

I see the reason for it: the raw string data is being shared between the
object and its “dup”. I guess sharing should only be done if it’s a
real String and not a subclass thereof. This fixes the problem reported
in this thread.

— a/ruby-1.8.7-p72/hash.c 2008-06-08 14:25:01.000000000 -0400
+++ b/ruby-1.8.7-p72/hash.c 2008-12-13 13:47:33.000000000 -0500
@@ -986,7 +986,8 @@
VALUE hash, key, val;
{
rb_hash_modify(hash);

  • if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
  • if (rb_obj_class(key) != rb_cString ||
  •    st_lookup(RHASH(hash)->tbl, key, 0)) {
    
    st_insert(RHASH(hash)->tbl, key, val);
    }
    else {

#12

Mike G. removed_email_address@domain.invalid writes:

You are right that something appears to be amiss. rb_str_new4()
+++ b/ruby-1.8.7-p72/hash.c 2008-12-13 13:47:33.000000000 -0500
@@ -986,7 +986,8 @@
VALUE hash, key, val;
{
rb_hash_modify(hash);

  • if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
  • if (rb_obj_class(key) != rb_cString ||
  •    st_lookup(RHASH(hash)->tbl, key, 0)) {
    
    st_insert(RHASH(hash)->tbl, key, val);
    }
    else {

Wow, complete with a patch! Have you passed this on to the appropriate
channel for possible inclusion ?


#13

e$B$^$D$b$He(B e$B$f$-$R$m$G$9e(B

In message “Re: why Hash corrupts ‘key’ object ?”
on Thu, 18 Dec 2008 05:07:16 +0900, Brian A.
removed_email_address@domain.invalid writes:

|Wow, complete with a patch! Have you passed this on to the appropriate
|channel for possible inclusion ?

I committed the patch to the trunk. So it will be available somewhere
in 1.9 (1.9.1 or 1.9.2). I am not sure yet it goes into 1.8, since
it has small incompatibility.

          matz.