Forum: Ruby WeakRef Hash

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-23 17:19
(Received via mailing list)
I need a Hash-like structure, using WeakRef, so that the key value
pairs can be garbage collected as needed.  I only need to support []
() and []=(), not the whole range of Hash functions.  If the pair has
been collected, I just need a simple nil returned.

I've tried implementing this a couple of different ways, but keep
running into trouble.  Does anyone have any tips?

James Edward Gray II
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-23 19:03
(Received via mailing list)
On Jan 23, 2006, at 10:17 AM, James Edward Gray II wrote:

> I need a Hash-like structure, using WeakRef, so that the key value
> pairs can be garbage collected as needed.  I only need to support []
> () and []=(), not the whole range of Hash functions.  If the pair
> has been collected, I just need a simple nil returned.
>
> I've tried implementing this a couple of different ways, but keep
> running into trouble.  Does anyone have any tips?

The following code, adapted from an old post by Guy Decoux seems to
do the trick:

class WeakCache
    def initialize( cache = Hash.new )
       @cache = cache
    end

    def []( key )
       value_id = @cache[key]
       return ObjectSpace._id2ref(value_id) unless value_id.nil?
       nil
    end

    def []=( key, value )
       ObjectSpace.define_finalizer(value, lambda { @cache.delete
(key) })
       @cache[key] = value.object_id
    end
end

__END__

I'm still interested in seeing a WeakRefHash though, if anyone has
rolled something similar in the past...

James Edward Gray II
Dd76a12d66f843de5c5f8782668e7127?d=identicon&s=25 Mauricio Fernandez (Guest)
on 2006-01-24 01:03
(Received via mailing list)
On Tue, Jan 24, 2006 at 03:02:04AM +0900, James Edward Gray II wrote:
> The following code, adapted from an old post by Guy Decoux seems to
> do the trick:
>
> class WeakCache
>    def initialize( cache = Hash.new )
>       @cache = cache
>    end
[...]
>    def []=( key, value )
>       ObjectSpace.define_finalizer(value, lambda { @cache.delete(key) })
                                            ==============================
This keeps a reference to value so it will never be reclaimed:

class WeakCache
   attr_reader :cache
   def initialize( cache = Hash.new )
      @cache = cache
   end

   def []( key )
      value_id = @cache[key]
      return ObjectSpace._id2ref(value_id) unless value_id.nil?
      nil
   end

   def []=( key, value )
      ObjectSpace.define_finalizer(value, lambda { @cache.delete(key) })
      @cache[key] = value.object_id
   end
end

RUBY_VERSION                                       # => "1.8.4"
RUBY_RELEASE_DATE                                  # => "2005-12-24"
c = WeakCache.new
puts c.cache.size
100000.times{|i| c[i] = i.to_s}
GC.start
puts c.cache.size
__END__
# >> 0
# >> 100000


Compare to this:


class WeakCache
   attr_reader :cache
   def initialize( cache = Hash.new )
      @cache = cache
   end

   def []( key )
      value_id = @cache[key]
      return ObjectSpace._id2ref(value_id) unless value_id.nil?
      nil
   end

   def make_lambda(key)
     lambda{|value| @cache.delete(key) }
   end

   def []=( key, value )
      ObjectSpace.define_finalizer(value, make_lambda(key))
      @cache[key] = value.object_id
   end
end

RUBY_VERSION                                       # => "1.8.4"
RUBY_RELEASE_DATE                                  # => "2005-12-24"
c = WeakCache.new
puts c.cache.size
100000.times{|i| c[i] = i.to_s}
GC.start
puts c.cache.size
__END__
# >> 0
# >> 3



That is very inefficient though, you want something more like


class WeakCache
   attr_reader :cache
   def initialize( cache = Hash.new )
      @cache = cache
      @rev_cache = {}
      @reclaim_method = method(:reclaim_value)
   end

   def []( key )
      value_id = @cache[key]
      return ObjectSpace._id2ref(value_id) unless value_id.nil?
      nil
   end

   def reclaim_value(value_id)
     @cache.delete @rev_cache[value_id]
     @rev_cache.delete value_id
   end

   def []=( key, value )
      @rev_cache[value.object_id] = key
      @cache[key] = value.object_id
      ObjectSpace.define_finalizer(value, @reclaim_method)
   end
end

RUBY_VERSION                                       # => "1.8.4"
RUBY_RELEASE_DATE                                  # => "2005-12-24"
c = WeakCache.new
puts c.cache.size
100000.times{|i| c[i] = i.to_s}
GC.start
puts c.cache.size
__END__
# >> 0
# >> 4
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-24 03:09
(Received via mailing list)
On Jan 23, 2006, at 6:01 PM, Mauricio Fernandez wrote:

>>       ObjectSpace.define_finalizer(value, lambda { @cache.delete
>> (key) })
>
> ==============================
> This keeps a reference to value so it will never be reclaimed:

[snip sensational examples]

Thank you for the excellent lesson!

James Edward Gray II
5befe95e6648daec3dd5728cd36602d0?d=identicon&s=25 Robert Klemme (Guest)
on 2006-01-24 17:55
(Received via mailing list)
James Edward Gray II wrote:
> I need a Hash-like structure, using WeakRef, so that the key value
> pairs can be garbage collected as needed.  I only need to support []
> () and []=(), not the whole range of Hash functions.  If the pair has
> been collected, I just need a simple nil returned.
>
> I've tried implementing this a couple of different ways, but keep
> running into trouble.  Does anyone have any tips?

Can't you just do something like this?

require 'delegate'
require 'weakref'
WeakHash = DelegateClass(Hash)
class WeakHash
  def []=(key,val)
    __getobj__[WeakRef.new(key)] = WeakRef.new(val)
  end

  def [](key)
    __getobj__[WeakRef.new(key)]
  end

  def each(&b)
    __getobj__.each do |k,v|
      b[k.__getobj__, v.__getobj__] unless k.__getobj__.nil?
    end
    self
  end

  def cleanup
    delete_if {|k,v| k.__getobj__.nil?}
  end
end

Kind regards

    robert
149379873fe2cb70e550c6bff8fedd0c?d=identicon&s=25 Jeffrey Schwab (Guest)
on 2006-01-24 18:20
(Received via mailing list)
Mauricio Fernandez wrote:

>    end
>
>
>       @reclaim_method = method(:reclaim_value)
>      @rev_cache.delete value_id
> RUBY_RELEASE_DATE                                  # => "2005-12-24"
> c = WeakCache.new
> puts c.cache.size
> 100000.times{|i| c[i] = i.to_s}
> GC.start
> puts c.cache.size
> __END__
> # >> 0
> # >> 4
>
>

I like your technique, but suppose the same value is stored for more
than one key in a WeakCache?  When the value is finalized, only the last
key will be removed from the cache.  I think it might be better to let
the reverse cache hold more than one key per value.
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-24 18:51
(Received via mailing list)
On Jan 24, 2006, at 9:51 AM, Robert Klemme wrote:

>   def [](key)
>   def cleanup
>     delete_if {|k,v| k.__getobj__.nil?}
>   end
> end

I didn't see this message while the gateway was down.  Sorry about that.

I can't use this code as is.  I assume you didn't set up delegation
quite right:

class WeakHash < DelegateClass(Hash)
   def initialize
     super(Hash.new)
   end

   # ...
end

Some questions the above raises for me:

1.  What will Ruby do if a key is GCed, but not a value?
2.  How does this code deal with a value being GCed when the key
remains?
3.  Don't we need to shut off GC in places, to keep references from
disappearing before we can use them?

Thanks for the help.

James Edward Gray II
Dd76a12d66f843de5c5f8782668e7127?d=identicon&s=25 Mauricio Fernandez (Guest)
on 2006-01-24 20:00
(Received via mailing list)
On Wed, Jan 25, 2006 at 12:58:04AM +0900, Jeffrey Schwab wrote:
> >end
[...]
> I like your technique, but suppose the same value is stored for more
> than one key in a WeakCache?  When the value is finalized, only the last
> key will be removed from the cache.  I think it might be better to let
> the reverse cache hold more than one key per value.

You're very right, I should have written something like

class WeakCache
   attr_reader :cache
   def initialize( cache = Hash.new )
      @cache = cache
      @rev_cache = Hash.new{|h,k| h[k] = {}}
      @reclaim_lambda = lambda do |value_id|
        if @rev_cache.has_key? value_id
          @rev_cache[value_id].each_key{|key| @cache.delete key}
          @rev_cache.delete value_id
        end
      end
   end

   def []( key )
      value_id = @cache[key]
      return ObjectSpace._id2ref(value_id) unless value_id.nil?
      nil
   end

   def []=( key, value )
      @rev_cache[value.object_id][key] = true
      @cache[key] = value.object_id
      ObjectSpace.define_finalizer(value, @reclaim_lambda)
   end
end


puts "=" * 20
c = WeakCache.new
puts c.cache.size
100000.times{|i| c[i] = (i % 1000).to_s}
GC.start
puts c.cache.size
puts "=" * 20
# >> ====================
# >> 0
# >> 3
# >> ====================



(not thread-safe yet though)
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2006-01-24 20:13
(Received via mailing list)
2006/1/24, James Edward Gray II <james@grayproductions.net>:
> >   end
> >   end
> >
> >   def cleanup
> >     delete_if {|k,v| k.__getobj__.nil?}
> >   end
> > end
>
> I didn't see this message while the gateway was down.  Sorry about that.
>
> I can't use this code as is.  I assume you didn't set up delegation
> quite right:

What problem exactly do you have with it?

> class WeakHash < DelegateClass(Hash)
>    def initialize
>      super(Hash.new)
>    end
>
>    # ...
> end

No need to inherit to fix this. You can simply do

require 'delegate'
Wh=DelegateClass(Hash)
class Wh
  def initialize(h={})
    __setobj__ h
  end
end

> Some questions the above raises for me:
>
> 1.  What will Ruby do if a key is GCed, but not a value?

This was just rough outline code. For a production version I'd
probably change it to consider a pair as gone if either of them is
GC'ed.

> 2.  How does this code deal with a value being GCed when the key
> remains?

It will yield nil - but see 1.

> 3.  Don't we need to shut off GC in places, to keep references from
> disappearing before we can use them?

No.  Because while the yield takes place instances are hard referenced
and cannot be gc'ed.

> Thanks for the help.

Ywc

Kind regards

robert
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-24 20:55
(Received via mailing list)
On Jan 24, 2006, at 1:06 PM, Robert Klemme wrote:

>> I can't use this code as is.  I assume you didn't set up delegation
>> quite right:
>
> What problem exactly do you have with it?

When I placed the code you posted in a file and tried to create
WeahHash, the program crashed.  (Wrong number of arguments to
initialize().)  I assume it would have worked if I passed the Hash
manually though.

> require 'delegate'
> Wh=DelegateClass(Hash)
> class Wh
>   def initialize(h={})
>     __setobj__ h
>   end
> end

Here's another question for you:  What are we gaining by delegating
to an object here, as opposed to subclassing?

>
> It will yield nil - but see 1.
>
>> 3.  Don't we need to shut off GC in places, to keep references from
>> disappearing before we can use them?
>
> No.  Because while the yield takes place instances are hard referenced
> and cannot be gc'ed.

Good answers all around.  Thanks.

James Edward Gray II
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2006-01-24 21:50
(Received via mailing list)
2006/1/24, James Edward Gray II <james@grayproductions.net>:
> manually though.
Yep.

> > require 'delegate'
> > Wh=DelegateClass(Hash)
> > class Wh
> >   def initialize(h={})
> >     __setobj__ h
> >   end
> > end
>
> Here's another question for you:  What are we gaining by delegating
> to an object here, as opposed to subclassing?

Note, the difference between yours and mine was that you first
delegated and then subclassed the delegating class. I'd say do either
of both in this case but not both.  Generally I'd prefer delegation in
this use case because we really have a case of wrapping and unwrapping
during insertion and retrieval ops. IMHO inheritance is often used in
many cases where it's not appropriate (although in this case you could
argue that a WeakHash is really a Hash - but then again, it doesn't
share some of Hash's basic properties, namely that the contents do not
change suddenly). There's no hard written rule to be found here.

> >
> > It will yield nil - but see 1.
> >
> >> 3.  Don't we need to shut off GC in places, to keep references from
> >> disappearing before we can use them?
> >
> > No.  Because while the yield takes place instances are hard referenced
> > and cannot be gc'ed.
>
> Good answers all around.  Thanks.

In fact to be really sure, one should probably do something like this:

def each
  __getobj__.each do |wk,wv|
   k,v=wk.__getobj__, wv.__getobj__
  yield unless k.nil? || v.nil?
  end
end

Cheers

robert
Dd76a12d66f843de5c5f8782668e7127?d=identicon&s=25 Mauricio Fernandez (Guest)
on 2006-01-24 22:17
(Received via mailing list)
On Tue, Jan 24, 2006 at 01:17:07AM +0900, James Edward Gray II wrote:
> I need a Hash-like structure, using WeakRef, so that the key value
> pairs can be garbage collected as needed.  I only need to support []
> () and []=(), not the whole range of Hash functions.  If the pair has
> been collected, I just need a simple nil returned.
>
> I've tried implementing this a couple of different ways, but keep
> running into trouble.  Does anyone have any tips?

I summarized the solutions seen in this thread so far, and presented a
couple
new ones at
  http://eigenclass.org/hiki.rb?weakhash+and+weakref

There are other ways to implement WeakHash, I'll probably add some to
the
above page (later).

HTH
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2006-01-24 22:38
(Received via mailing list)
2006/1/24, Mauricio Fernandez <mfp@acm.org>:
> I summarized the solutions seen in this thread so far, and presented a couple
> new ones at
>   http://eigenclass.org/hiki.rb?weakhash+and+weakref
>
> There are other ways to implement WeakHash, I'll probably add some to the
> above page (later).

Nice work. Btw, it's expected that my lookup code is slow because it
has to create a weakref for every insertion.  We could try to optimize
this by providing a single weakref for the hash that is used for
lookups.

class WR < WeakRef
  attr_accessor :__getobj__
end

class RKWeakHash
  def initialize(...)
    @lookup = WR.new nil
  end

  def [](x)
     @lookup.__getobj__ = x
    __getobj__[@lookup]
  end
end

Unfortunately this is not thread safe, unless one invests more and
stores lookup thread local...

robert
4299e35bacef054df40583da2d51edea?d=identicon&s=25 James Gray (bbazzarrakk)
on 2006-01-25 04:13
(Received via mailing list)
On Jan 24, 2006, at 2:47 PM, Robert Klemme wrote:

> Note, the difference between yours and mine was that you first
> delegated and then subclassed the delegating class.

My usage was straight out of the delegate docs:

http://www.ruby-doc.org/stdlib/libdoc/delegate/rdo...

Technically, I wrote those docs, but I certainly didn't invent that
idiom.  I mainly copied it from Matz's usage in TempFile.  (I believe
he wrote that, my apologies if I mis-credited there.)

James Edward Gray II
This topic is locked and can not be replied to.