Forum: Ruby why Hash corrupts 'key' object ?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Dmitry P. (Guest)
on 2008-12-11 02:04
Hi, I have next script:
t.rb:
===========================
class TestStr < String
  attr_accessor :dupstr
  def initialize ( str )
    @dupstr = str
    super(str)
  end
end
ts=TestStr.new("aaa")
puts ts.dupstr
h=Hash.new()
h[ts]=true
puts h.keys.first.class
puts h.keys.first
puts h.keys.first.dupstr
===========================

run it:

$ ruby t.rb
aaa
TestStr
aaa
nil
$

The question is - why there is 'nil' in the last line instead of "aaa" ?
Mike G. (Guest)
on 2008-12-11 03:28
Dmitry P. wrote:
> Hi, I have next script:
> t.rb:
> ===========================
> class TestStr < String
>   attr_accessor :dupstr
>   def initialize ( str )
>     @dupstr = str
>     super(str)
>   end
> end
> ts=TestStr.new("aaa")
> puts ts.dupstr
> h=Hash.new()
> h[ts]=true
> puts h.keys.first.class
> puts h.keys.first
> puts h.keys.first.dupstr
> ===========================
>
> run it:
>
> $ ruby t.rb
> aaa
> TestStr
> aaa
> nil
> $
>
> The question is - why there is 'nil' in the last line instead of "aaa" ?

It must be the T_STRING test in hash.c, an optimization, I assume.

VALUE
rb_hash_aset(hash, key, val)
    VALUE hash, key, val;
{
    rb_hash_modify(hash);
    if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
  st_insert(RHASH(hash)->tbl, key, val);
    }
    else {
  st_add_direct(RHASH(hash)->tbl, rb_str_new4(key), val);
    }
    return val;
}

You can still use your code as is, with this adjustment:

  require 'delegate'

  class TestStr < DelegateClass(String)
    # ...
  end

Also consider delegation as a primary strategy, not just as a workaround
for this case.
Dmitry P. (Guest)
on 2008-12-11 09:41
Mike G. wrote:

> It must be the T_STRING test in hash.c, an optimization, I assume.
>
> VALUE
> rb_hash_aset(hash, key, val)
>     VALUE hash, key, val;
> {
>     rb_hash_modify(hash);
>     if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
>   st_insert(RHASH(hash)->tbl, key, val);
>     }
>     else {
>   st_add_direct(RHASH(hash)->tbl, rb_str_new4(key), val);
>     }
>     return val;
> }
>
> You can still use your code as is, with this adjustment:
>
>   require 'delegate'
>
>   class TestStr < DelegateClass(String)
>     # ...
>   end
>
> Also consider delegation as a primary strategy, not just as a workaround
> for this case.

Thanks! It works (although I had to implement hash() and eql?() methods
for TestStr)
BTW, why delegation is more primary than standard inheritance ?
Robert K. (Guest)
on 2008-12-11 10:17
(Received via mailing list)
2008/12/11 Dmitry P. <removed_email_address@domain.invalid>:
> ts=TestStr.new("aaa")
> $ ruby t.rb
> aaa
> TestStr
> aaa
> nil
> $
>
> The question is - why there is 'nil' in the last line instead of "aaa" ?

Because Hash copies #dup an unfrozen String.

http://www.ruby-doc.org/core/classes/Hash.html#M002878

Try this:

class TestStr < String
 attr_accessor :dupstr
 def initialize ( str )
   @dupstr = str
   super(str)
 end
end
ts=TestStr.new("aaa")
puts ts.dupstr, ts.object_id
h=Hash.new()
h[ts]=true
puts h.keys.first.class, h.keys.first, h.keys.first.object_id,
h.keys.first.dupstr
puts "----"
ts.freeze
puts ts.dupstr, ts.object_id
h=Hash.new()
h[ts]=true
puts h.keys.first.class, h.keys.first, h.keys.first.object_id,
h.keys.first.dupstr

It is generally a bad idea to inherit core classes.  Delegation is
much better (see Mike's reply).

Cheers

robert
Brian A. (Guest)
on 2008-12-11 19:51
(Received via mailing list)
Robert K. <removed_email_address@domain.invalid> writes:

>> end
>>
>> $ ruby t.rb
>> aaa
>> TestStr
>> aaa
>> nil
>> $
>>
>> The question is - why there is 'nil' in the last line instead of "aaa" ?
>
> Because Hash copies #dup an unfrozen String.

Why would that cause this effect though? The docs say "a String passed
as a key will be duplicated and frozen", but duplicating and freezing
a TestStr doesn't nullify dupstr:

$ cat -n temp.rb
     1  class TestStr < String
     2   attr_accessor :dupstr
     3   def initialize ( str )
     4     @dupstr = str
     5     super(str)
     6   end
     7  end
     8  ts=TestStr.new("aaa")
     9  puts ts.dupstr, ts.object_id
    10  puts '---'
    11  ts2 = ts.dup
    12  ts2.freeze
    13  puts ts2.dupstr, ts2.object_id
$ ruby temp.rb
aaa
81910
---
aaa
81880
Mike G. (Guest)
on 2008-12-11 21:43
Brian A. wrote:
>
> Why would that cause this effect though? The docs say "a String passed
> as a key will be duplicated and frozen", but duplicating and freezing
> a TestStr doesn't nullify dupstr:

The answer lies in the rb_hash_aset() function I quoted.  rb_str_new4()
begins like this:

  VALUE
  rb_str_new4(orig)
      VALUE orig;
  {
      VALUE klass, str;

      if (OBJ_FROZEN(orig)) return orig;
      /* ... */
  }

Like Robert showed, if it's already frozen then the key is untouched (no
dup+freeze).

The docs could be made clearer: An _unfrozen_ String passed as a key
will be duplicated and frozen.
Mike G. (Guest)
on 2008-12-11 22:32
Dmitry P. wrote:
> BTW, why delegation is more primary than standard inheritance ?

Delegation carries a greater degree of decoupling and flexibility.

Sometimes it's useful to replace your car engine while you're cruising
down the highway.  That's delegation.

With inheritance, the engine is welded onto the car body.  You've
already tossed aside the possibility of hot-swapping.  Perhaps that was
an arbitrary decision.

Delegation makes something new out of existing parts.  Inheritance
grafts stuff onto existing parts.
Brian A. (Guest)
on 2008-12-12 20:20
(Received via mailing list)
Mike G. <removed_email_address@domain.invalid> writes:

>   rb_str_new4(orig)
>
> The docs could be made clearer: An _unfrozen_ String passed as a key
> will be duplicated and frozen.

That doesn't answer my question. I showed that dup+freeze did not
adversely affect TestStr (i.e. dupstr was still accessible), so
something else is going on.
Mike G. (Guest)
on 2008-12-12 21:18
Brian A. wrote:
> Mike G. <removed_email_address@domain.invalid> writes:
>
>>   rb_str_new4(orig)
>>
>> The docs could be made clearer: An _unfrozen_ String passed as a key
>> will be duplicated and frozen.
>
> That doesn't answer my question. I showed that dup+freeze did not
> adversely affect TestStr (i.e. dupstr was still accessible), so
> something else is going on.

You passed in a frozen String, so Hash#[]= did not touch it.  So it was
not adversely affected.

Could you write an assertion which you think should succeed but doesn't?
Brian A. (Guest)
on 2008-12-13 20:40
(Received via mailing list)
Mike G. <removed_email_address@domain.invalid> writes:

>> something else is going on.
>
> You passed in a frozen String, so Hash#[]= did not touch it.  So it was
> not adversely affected.

No I didn't. In fact, I didn't even use Hash in the example. Maybe
you're referring to a different post. I'll repost below:

$ cat -n temp.rb
     1  class TestStr < String
     2   attr_accessor :dupstr
     3   def initialize ( str )
     4     @dupstr = str
     5     super(str)
     6   end
     7  end
     8  ts=TestStr.new("aaa")
     9  puts ts.dupstr, ts.object_id
    10  puts '---'
    11  ts2 = ts.dup
    12  ts2.freeze
    13  puts ts2.dupstr, ts2.object_id
$ ruby temp.rb
aaa
81910
---
aaa
81880

What the above demonstrates is that dup'ing and freezing a TestStr
does not nilify TestStr#dupstr, so the C code for Hash seems to be
doing more than just dup'ing and freezing. Does that make sense?
Mike G. (Guest)
on 2008-12-14 00:59
Brian A. wrote:
>
> What the above demonstrates is that dup'ing and freezing a TestStr
> does not nilify TestStr#dupstr, so the C code for Hash seems to be
> doing more than just dup'ing and freezing. Does that make sense?
>

Yes, sorry, I misunderstood.  I did re-read your post several times but
was still a little puzzled.

You are right that something appears to be amiss.  rb_str_new4()
duplicates the String portion of the TestStr object, but does not copy
over the TestStr-specific data.  A subclassed String is not duplicated;
the documentation is wrong.

I see the reason for it: the raw string data is being shared between the
object and its "dup".  I guess sharing should only be done if it's a
real String and not a subclass thereof.  This fixes the problem reported
in this thread.

--- a/ruby-1.8.7-p72/hash.c  2008-06-08 14:25:01.000000000 -0400
+++ b/ruby-1.8.7-p72/hash.c  2008-12-13 13:47:33.000000000 -0500
@@ -986,7 +986,8 @@
     VALUE hash, key, val;
 {
     rb_hash_modify(hash);
-    if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
+    if (rb_obj_class(key) != rb_cString ||
+        st_lookup(RHASH(hash)->tbl, key, 0)) {
   st_insert(RHASH(hash)->tbl, key, val);
     }
     else {
Brian A. (Guest)
on 2008-12-17 22:15
(Received via mailing list)
Mike G. <removed_email_address@domain.invalid> writes:

> You are right that something appears to be amiss.  rb_str_new4()
> +++ b/ruby-1.8.7-p72/hash.c  2008-12-13 13:47:33.000000000 -0500
> @@ -986,7 +986,8 @@
>      VALUE hash, key, val;
>  {
>      rb_hash_modify(hash);
> -    if (TYPE(key) != T_STRING || st_lookup(RHASH(hash)->tbl, key, 0)) {
> +    if (rb_obj_class(key) != rb_cString ||
> +        st_lookup(RHASH(hash)->tbl, key, 0)) {
>    st_insert(RHASH(hash)->tbl, key, val);
>      }
>      else {

Wow, complete with a patch! Have you passed this on to the appropriate
channel for possible inclusion ?
Yukihiro M. (Guest)
on 2008-12-18 17:39
(Received via mailing list)
まつもと ゆきひろです

In message "Re: why Hash corrupts 'key' object ?"
    on Thu, 18 Dec 2008 05:07:16 +0900, Brian A.
<removed_email_address@domain.invalid> writes:

|Wow, complete with a patch! Have you passed this on to the appropriate
|channel for possible inclusion ?

I committed the patch to the trunk.  So it will be available somewhere
in 1.9 (1.9.1 or 1.9.2).  I am not sure yet it goes into 1.8, since
it has small incompatibility.

              matz.
This topic is locked and can not be replied to.