Subclass of hash, that times last read

Hi all. I’m rather unfamiliar with the subclass / superclass principle,
but I just tried to write one. The result is below here, and should be a
hash-like class, with the exception that it stores the last-accessed
time (the purpose of this being that I can more easily delete entries
from the hash that haven’t been used in a while - I’m using a hash for
caching a costly function, but over time the hash grows big and slow,
with a lot of old entries in it… I’m trying out different ways of
pruning the hash, and this seemed to be an interesting option).

The below code seems to work fine, but it would be a comfort to me to
know if I’m introducing several hairy bug-prone things if I add this to
my code, or that this is the relatively safe and correct way to do it.

class Myhash < Hash
alias :old_write :[]=
def []=(key,value)
old_write(key,[value,Time.now])
end

alias :old_read :[]
def
value = old_read(key)
old_write(key,[value[0],Time.now]) if value.respond_to?(:[])
old_read(key)
end
end

hash = Myhash.new
1000.times {hash[rand] = rand}
sleep 7
1000.times {hash[rand] = rand}
hash.delete_if {|k,v| Time.now - v[1] > 5}

Which should delete the 1000 entries from the hash that were created in
the first loop.

On Wednesday 28 May 2008 13:28:58 Boris S. wrote:

know if I’m introducing several hairy bug-prone things if I add this to
my code, or that this is the relatively safe and correct way to do it.

The only bug I’ve found so far is that it won’t work if the user
provides a
default value for the hash.

But allow me to nitpick a bit:

class Myhash < Hash
alias :old_write :[]=
def []=(key,value)
old_write(key,[value,Time.now])
end

I would avoid aliasing things out of the way. It pollutes the namespace
– you
now have an old_write method, which is really only used by your one
writer
method.

And I think a conventional way to do that is to choose names that
reflect what
you’re doing to change it – for example, here, you’d choose something
like :write_without_myhash. (Imagine someone inherits from the stack,
and you
both do the exact same thing, and define a method called old_write.
You’ll
create an infinite loop.)

Since you’re in a subclass anyway, though, you get this for free:

def []=(key,value)
super(key,[value,Time.now])
end

alias :old_read :[]
def
value = old_read(key)
old_write(key,[value[0],Time.now]) if value.respond_to?(:[])
old_read(key)
end
end

The same can be done here, reducing some code duplication:

def
value = super(key)
value[1] = Time.now if self.include? key
value
end

If that makes sense, you can skip the part where I take five times as
long to
explain it…

I’m setting value[1] to Time.now. This will stay, because it’s still the
same
array object – I don’t have to change anything in the hash.

I’m checking for self.include?, because this will tell me if that
particular
key has been assigned yet. If it hasn’t, we’re dealing with a default
value,
and I’m not sure quite what you want to do – probably store it and
assign a
timestamp anyway? If so:

def
value = super(key)
if self.include? key
value[1] = Time.now
else
self[key] = value
value = self[key]
end
value
end

And since you already have the result of super(key) in a value, I’m
returning
that value, rather than looking it up again, when I an. Because it’s
still
the same array, it will have the new timestamp, which is what your old
code
did.

hash.delete_if {|k,v| Time.now - v[1] > 5}

You might consider putting this in your class. For example:

def delete_before(time)
delete_if { |k,v| v.last < time }
end
hash.delete_before(Time.now - 5)

Maybe even:

def delete_older_than(seconds)
delete_before(Time.now - seconds)
end
hash.delete_older_than(5)

My approach would be quite different. I’d use files to store the values
(“costly functions”), with the name of the file being determined by the
key. (A sort of mini-database I suppose.)

It’s very easy to determine when a file was last accessed, so you could
have a separate process running in the background looking for files to
delete. When I did this, I used a cron job, but other approaches are
possible of course.

File accesses are going to be slower than hash lookups, but whether this
is important or not depends on your application. And if some keys are
used very frequently, the associated files should get cached in memory
by the OS, so they should be quick to read.

Of course you could wrap this in a class called MyHash. :wink: