Get the real object in a Hash key

Hi, suppose this simple code in which I add instance variables
to String instances and use those String objects as Hash keys:


h = {}

k1 = "aaa"
k1.instance_variable_set :@name, "Aaa-011"

k2 = "bbb"
k2.instance_variable_set :@name, "Bbb-268"

h[k1] = "Hello"
h[k2] = "Bye"

Now I want to look up the element in the hash whose key matches “aaa”
(using String#eql?):

h["aaa"]
=> "Hello"

But I don’t want to get just the value associated with the key (“Hello”);
I also want the key object itself (not the “aaa” I passed, but the k1 object)
so I can check its @name attribute. And I need this to be very efficient.

However, I’ve just realized that it’s not possible. The hash doesn’t
store the given key as a reference to that object:


puts k1.object_id
=> 18140060

puts k2.object_id
=> 16245980

h.keys.each {|k| puts k.object_id}
=> 16182220
=> 20359940
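The same check can be done with `equal?`, which compares object identity, instead of reading `object_id`s by eye. A minimal sketch, assuming a Ruby where Hash#[]= copies unfrozen plain String keys (the behaviour discussed in this thread); `String.new` is used only to guarantee a mutable String:

```ruby
h = {}

# String.new guarantees a mutable (unfrozen) String, like k1 above
k1 = String.new("aaa")
k1.instance_variable_set(:@name, "Aaa-011")

h[k1] = "Hello"

stored = h.keys.first
p stored == k1       # => true  (same contents)
p stored.equal?(k1)  # => false (Hash#[]= stored a frozen copy, not k1)
p stored.frozen?     # => true
p k1.frozen?         # => false
```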
------------------------------------------.

I realized this while writing this mail, so forget the previous
question. Now I have another question:


myobject = MyCustomClass.new

@h = {}

@h[myobject] = "lalalala"

In this case, will the Ruby GC collect myobject, or will it remain alive
because it has been used as a key of a hash (which, in this hypothetical
code, is never garbage-collected)?

Thanks a lot.

On Fri, Apr 15, 2011 at 2:50 PM, Iñaki Baz C. [email protected]
wrote:

However I’ve realized right now that it’s not possible. The hash key
doesn’t store the given key as a reference to such object:

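This is a special case for unfrozen Strings used as Hash keys: Hash#[]= stores a frozen copy instead of the object you passed in, so later changes to your String cannot corrupt the Hash.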

In this case, will Ruby GC delete myobject? or will it remain alive as
it has been used as a key of a hash (which is not GC’d in a supposed
code)?

The key stays alive at least as long as it is stored in the Hash and the Hash instance itself is reachable.
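A small sketch of that guarantee: the method below returns only the key's `object_id`, so once it returns, the Hash is the only thing referencing the key object. `store_key` is a hypothetical helper for the demo, and `GC.start` merely requests a collection:

```ruby
# Hypothetical helper: only the object_id escapes, not the object itself
def store_key(hash)
  key = Object.new
  hash[key] = "lalalala"
  key.object_id
end

h = {}
id = store_key(h)

GC.start  # the Hash still holds a strong reference to the key

p h.keys.first.object_id == id  # => true
p h.size                        # => 1
```

Once the entry is removed (for example with `h.delete`) or `h` itself becomes unreachable, the Hash no longer keeps the key alive.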

Cheers

robert

2011/4/15 Robert K. [email protected]:

On Fri, Apr 15, 2011 at 2:50 PM, Iñaki Baz C. [email protected] wrote:

However I’ve realized right now that it’s not possible. The hash key
doesn’t store the given key as a reference to such object:

This is a special optimization for unfrozen Strings as Hash keys.

Oops, if I freeze the string before inserting it as a Hash key, it
doesn’t happen (I get the same object_id) :)
The same is true if I use a class inheriting from String: the key is not copied. Good to know!
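A sketch of the three cases side by side, assuming current MRI behaviour (only unfrozen instances of String itself are copied on insertion); `MyString` is a hypothetical subclass used only for the demo:

```ruby
class MyString < String; end  # hypothetical String subclass

frozen_key = String.new("aaa").freeze
sub_key    = MyString.new("bbb")
plain_key  = String.new("ccc")

h = {}
h[frozen_key] = 1
h[sub_key]    = 2
h[plain_key]  = 3

stored = h.keys  # insertion order is preserved
p stored[0].equal?(frozen_key)  # => true:  frozen String stored as-is
p stored[1].equal?(sub_key)     # => true:  subclass instance not copied
p stored[2].equal?(plain_key)   # => false: unfrozen String was copied
```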

Then I come back to my original question:


k1 = "aaa"
k1.freeze

h = {}

h[k1] = "HELLO"

On Fri, Apr 15, 2011 at 3:14 PM, Iñaki Baz C. [email protected]
wrote:


Given a string “aaa”, how can I get the object k1 from the hash? (I
mean without comparing String#eql? each key with the string “aaa”)
Unfortunately I think the Hash class does not provide a method for it.

Exactly. And you don’t want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information, add it to the value and not
the key. The simplest way would be to define a Struct, e.g.

Value = Struct.new :name, :val

Then put this into the Hash as values:

h[k1] = Value["a name", "HELLO"]
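A sketch of the full round trip with the data from the first example; the `Value` Struct is redefined here so the snippet runs on its own:

```ruby
# The extra data lives in the value; keys stay plain Strings
Value = Struct.new(:name, :val)

h = {}
h["aaa"] = Value["Aaa-011", "Hello"]
h["bbb"] = Value["Bbb-268", "Bye"]

entry = h["aaa"]  # ordinary lookup with any equal String
p entry.name      # => "Aaa-011"
p entry.val       # => "Hello"
```

Because the String key carries no extra state, it no longer matters that Hash#[]= stores a copy of it.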

Kind regards

robert

Robert K. wrote in post #993000:

On Fri, Apr 15, 2011 at 3:14 PM, Iñaki Baz C. [email protected]

Given a string “aaa”, how can I get the object k1 from the hash? (I
mean without comparing String#eql? each key with the string “aaa”)
Unfortunately I think the Hash class does not provide a method for it.

Exactly. And you don’t want to do it. A Hash is an associative
storage which associates the value with your key. If you need to
stuff in more information - you need to add it to the value and not
the key…

Well, you may want to do it – that’s why Hash#assoc exists. Hash keys
can be objects of any sort, and there are use cases for storing
non-simple keys.

There’s no constant-time equivalent of Hash#assoc in Hash’s API, but
that is not because one should never be interested in the key object.
Hash#assoc is there for a reason.
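If you control insertion, you can still get constant-time access to the original key object by storing the key next to the value, since values (unlike unfrozen String keys) are never copied. A minimal sketch; `KeyedHash` and `entry` are hypothetical names, not part of Ruby's Hash API:

```ruby
# Hypothetical wrapper that remembers the caller's own key object
class KeyedHash
  def initialize
    @entries = {}  # key => [original key object, value]
  end

  def []=(key, value)
    @entries[key] = [key, value]
  end

  def [](key)
    pair = @entries[key]
    pair && pair[1]
  end

  # O(1) lookup returning [original key, value], or nil if absent
  def entry(key)
    @entries[key]
  end
end

k1 = String.new("aaa")
k1.instance_variable_set(:@name, "Aaa-011")

kh = KeyedHash.new
kh[k1] = "Hello"

original, value = kh.entry("aaa")
p original.equal?(k1)                     # => true
p original.instance_variable_get(:@name)  # => "Aaa-011"
p value                                   # => "Hello"
```

The cost is one extra two-element Array per entry.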

Lispers will recognize assoc as relating to the Lisp function of the
same name, which has exactly that use case: key/value pairs where the
key and the value matter as objects in their own right, apart from the
hashing function result.

On Fri, Apr 15, 2011 at 4:47 PM, Kevin M. [email protected]
wrote:

Well you may want to do it – that’s why Hash#assoc exists. Hash keys
can be objects of any sort, and there are use cases for storing
nonsimple keys.

I did not argue against complex keys. The issue is with mutable
keys. And since adding data to the key object is also a form of
association (which is what the value does as well), the most natural way
would be to place that additional information in the value. Not to mention
the questionable practice of stuffing something into what is usually
considered a simple value (a String).

Kind regards

robert

2011/4/15 Robert K. [email protected]:

Then put this into the Hash as values

h[k1] = Value["a name", "HELLO"]

Yes, that seems like a good solution.

Thanks.

Robert K. wrote in post #993026:

I did not argue against complex keys. The issue is with mutable
keys. And since adding data to the key object is also associating
(which is done with the value as well) the most natural way would be
to place that additional information there. Not to mention the
questionable approach to stuff something into what is usually
considered a simple value (String).

You said “And you don’t want to do it.” In fact doing it has its uses.
Whether the keys are mutable or not is totally irrelevant, especially when
the data was there before the hash was introduced, as in the original example.

Of course making repeated calls to Hash#assoc in order to update
stuff in the key would be stupid. That goes without saying. What would
the purpose of the hash be? If that was your only point then we agree,
although it was a vacuous point.

Also do you realize that an example tends to stand for something which
is not literally the example itself? He has a key. It contains some
data. It’s not necessarily true that he should duplicate that data in
the mapped-to values. Mutable or not is beside the point.

I notice this phenomenon a lot: undergeneralization. The String stands
for something. It’s his key data. If it were a simple value then the
example wouldn’t make sense in the first place. Gee, thanks for
telling us that we shouldn’t stuff random shit into a simple value and
then use that as a hash key, whereupon we can’t look up stuff in the
hash directly but must use Hash#assoc instead. Again, if that was your
point then we agree, albeit in the obvious and nearly information-free
sense. I’m sure we would also agree that cats would be a poor building
material for helicopters.

On 15.04.2011 19:39, Kevin M. wrote:

Robert K. wrote in post #993026:

I did not argue against complex keys. The issue is with mutable
keys. And since adding data to the key object is also associating
(which is done with the value as well) the most natural way would be
to place that additional information there. Not to mention the
questionable approach to stuff something into what is usually
considered a simple value (String).

You said “And you don’t want to do it.” In fact doing it has its uses.

Please do not quote out of context: that was referring to the example
with a String instance used as a Hash key and stuffed with additional
instance variables.

Mutable keys or not is totally irrelevant, especially when the data
was there before the hash was introduced, as in the original example.

The topic of key mutability is especially relevant for keys stored in a
Hash. Of course mutations before storing are irrelevant. But if you
change fields of an object which are part of the key (i.e. included in
#hash and #eql?) you need to rehash in order for the Hash to do lookups
properly.
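A sketch of what that means, using a Struct (whose #hash and #eql? are derived from all of its members) as a mutable key; `Point` is a hypothetical example type:

```ruby
# Struct derives #hash and #eql? from its members, so x and y are key properties
Point = Struct.new(:x, :y)  # hypothetical example type

h = {}
pt = Point.new(1, 2)
h[pt] = "found"

pt.x = 10  # mutate a key property while pt is stored in h

# unreliable until rehash: h still files pt under its old hash code
p h[Point.new(10, 2)]

h.rehash  # recompute the hash codes of all stored keys

p h[Point.new(10, 2)]  # => "found"
p h[pt]                # => "found"
```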

Basically you can have two types of fields in an object used as a Hash
key:

  1. key properties (used in #hash and #eql?)

  2. non key properties (neither used in #hash nor #eql?)

Type 1 properties must of course be part of the key, and you need
to know them to do any lookup.

Type 2 properties are irrelevant for lookups; you can merely consider
them “associated with the key”. This leads to a situation where
you have one instance (per key) with the associated data and potentially
many other instances which might or might not have these properties. If
they are actually defined as properties (either through attr_accessor
or manually), you end up carrying around baggage which is not used most
of the time.

Type 2 properties should rather go into a separate instance which is
stored as the value. This also makes it much clearer what’s going on.
Splitting associated data between properties of key objects and an
instance stored in the Hash doesn’t really make sense. Then we could just
as well store everything in the key instance and not need the Hash
at all.
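A sketch of the two property types, with a hypothetical `HeaderName` key class whose `name` is a type 1 property and whose `real_name` is a type 2 property. The instance used for the lookup never carries `real_name`, which is the "baggage" problem described above:

```ruby
# Hypothetical key class: +name+ is a key property, +real_name+ is not
class HeaderName
  attr_reader :name
  attr_accessor :real_name  # type 2: ignored by #hash and #eql?

  def initialize(name, real_name = nil)
    @name = name.upcase
    @real_name = real_name
  end

  # type 1: only +name+ takes part in hashing and equality
  def hash
    @name.hash
  end

  def eql?(other)
    other.is_a?(HeaderName) && name == other.name
  end
end

h = {}
h[HeaderName.new("frOM", "frOM")] = ["sip:alice@example.com"]

probe = HeaderName.new("From")  # lookup instance without real_name
p h[probe]                      # => ["sip:alice@example.com"]
p probe.real_name               # => nil
p h.keys.first.real_name        # => "frOM" (only the stored key has it)
```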

Of course making repeated calls to Hash#assoc in order to update
stuff in the key would be stupid. That goes without saying. What would
the purpose of the hash be? If that was your only point then we agree,
although it was a vacuous point.

Why is the point vacuous? Apparently OP has / had some questions about
these topics and what may look obvious to you might not to others.

Also do you realize that an example tends to stand for something which
is not literally the example itself? He has a key. It contains some
data. It’s not necessarily true that he should duplicate that data in
the mapped-to values. Mutable or not is beside the point.

Well, but we cannot read other people’s minds. We have to take the
example at face value. Stuffing additional data into a String is not a
good idea and I am not sure whether that occurred to OP or not. So this
might really be what he is attempting. In this case “stuffing the data
into the key” was part of the example and it was nowhere expressed that
this is a fact that could not be changed.

And btw, I did not recommend duplicating that data in the mapped-to
value. I specifically suggested placing it there exclusively.

I notice this phenomenon a lot: undergeneralization. The String stands
for something. It’s his key data. If it were a simple value then the
example wouldn’t make sense in the first place. Gee, thanks for
telling us that we shouldn’t stuff random shit into a simple value and
then use that as a hash key, whereupon we can’t look up stuff in the
hash directly but must use Hash#assoc instead. Again, if that was your
point then we agree, albeit in the obvious and nearly information-free
sense. I’m sure we would also agree that cats would be a poor building
material for helicopters.

As is rudeness for a community.

robert

hi Iñaki,

i may well not understand exactly what you need to do, and so be
oversimplifying, but could you do something similar to what Robert
suggested (but a bit simpler) and just use an array as each key’s
value? the header’s original name could be added as the first element
of the array - something like this:

request = Hash.new { |hash, key| hash[key] = [] }

request["FROM"] = ["fRoM", "sip:[email protected]"]

p request["FROM"][0]

#=> "fRoM"

  • j

2011/4/15 Kevin M. [email protected]:

He has a key. It contains some
data. It’s not necessarily true that he should duplicate that data in
the mapped-to values.

To clarify, my exact case is the following:

I’ve coded a parser for SIP (similar to HTTP). The parser generates a
Request object which inherits from Hash, and each SIP request header
(e.g. “From: sip:[email protected]”) becomes an entry of the hash
(Request object) as follows:

  • The key is “FROM” (upper-cased).
  • The value is an Array of strings (a header can have multiple
    values).

I need to store the key upper-cased for the fastest lookup, but I also
want to store the original header name (which could be “from”, “From”,
“frOM” and so on).
So my parser adds an instance variable @real_name to the header
name string (“FROM”).

When I look up a header in the Request object, I would also like
to retrieve the key’s @real_name, but I’ve already understood
that this is only possible if I freeze the key string before inserting it
into the hash and then use Hash#assoc. This solution is not good for
performance.

The solution suggested by Robert is to add that information (the
header’s original name) as a field in the hash entry’s value, so instead
of having:

request["FROM"]
=> ["sip:[email protected]"]

I would end up with something like:

request["FROM"]
=> #<struct Value name="From", val=["sip:[email protected]"]>

The problem with this last suggestion is that it breaks the
existing API and makes the Request class more complex for a developer
to use (it should be as easy as handling a Hash).
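One possible way to keep the existing Array-returning API while still carrying the original name is to make the value itself an Array subclass with one extra attribute. A sketch only; `HeaderValues` and `real_name` are hypothetical names, and some Array methods (for example `map`) return plain Arrays, which would drop `real_name`:

```ruby
# Hypothetical value type: the header's values plus its original name
class HeaderValues < Array
  attr_accessor :real_name

  def initialize(real_name, values)
    super(values)  # copy the values into this Array
    @real_name = real_name
  end
end

request = {}
request["FROM"] = HeaderValues.new("frOM", ["sip:alice@example.com"])

values = request["FROM"]
p values.first         # => "sip:alice@example.com"
p values.is_a?(Array)  # => true
p values.real_name     # => "frOM"
```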

Thanks to both for your comments.

On Sat, Apr 16, 2011 at 9:51 AM, Iñaki Baz C. [email protected]
wrote:


You don’t have to have a Hash to implement a Hash-like interface. How about
simply creating your own class that supports the interface you want as
well as the functionality you want? Something like this:

class Request

  Header = Struct.new :key, :value

  def self.parse(headers)
    request = Request.new
    headers.each_line do |header|
      # split only on the first ": " so values may contain it
      key, value = header.split(": ", 2)
      request.add_header key, value.chomp
    end
    request
  end

  def initialize
    @headers = Hash.new
  end

  def add_header(key, value)
    @headers[key.upcase] = Header[key, value]
  end

  # look up a header's value by name, in any letter case
  def [](key)
    @headers[key.upcase][:value]
  end

  # the header name exactly as it appeared in the input
  def original(key)
    @headers[key.upcase][:key]
  end

end

headers = <<HEADER
frOM: sip:[email protected]
To: sip:[email protected]
HEADER

request = Request.parse headers

request["FROM"] # => "sip:[email protected]"
request.original "FROM" # => "frOM"

request["TO"] # => "sip:[email protected]"
request.original "TO" # => "To"

2011/4/16 Robert K. [email protected]:

I’ve coded a parser for SIP (similar to HTTP). The parser generates a
Request object which inherits from Hash,

Usually it’s better to use composition instead of inheritance to achieve
this. Now your SipRequest inherits all methods from Hash including some
that you might not want users to be able to invoke.

Thanks to both. However, the SIP parser is already done. I’ve coded it
in C as a Ruby extension (similar to the Mongrel HTTP parser, which
returns a Hash instance). I could change it to generate a plain Hash object
rather than a custom SipRequest object, and then do as both of you
suggest:

class SipRequest
def initialize(headers={})
@headers = headers
end
end
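A minimal sketch of that composition, assuming the C extension hands over a plain Hash whose keys are upper-cased header names. It uses the standard `forwardable` library to expose only a few read-only Hash methods; which methods to delegate is an illustrative choice:

```ruby
require "forwardable"

# Hypothetical wrapper around the Hash produced by the C parser
class SipRequest
  extend Forwardable

  # expose a few read-only Hash methods; []= and friends stay hidden
  def_delegators :@headers, :key?, :each, :size

  def initialize(headers = {})
    @headers = headers
  end

  # case-insensitive lookup, assuming the parser stores upper-cased names
  def [](name)
    @headers[name.to_s.upcase]
  end
end

req = SipRequest.new({ "FROM" => ["sip:alice@example.com"] })
p req["from"]            # => ["sip:alice@example.com"]
p req.key?("FROM")       # => true
p req.size               # => 1
p req.respond_to?(:[]=)  # => false
```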

I will consider it and also the suggested methods to handle header
names and values.

Thanks a lot.

On 16.04.2011 16:51, Iñaki Baz C. wrote:

2011/4/15 Kevin M. [email protected]:

He has a key. It contains some
data. It’s not necessarily true that he should duplicate that data in
the mapped-to values.

To clarify, my exact case is the following:

Now it gets interesting. :)

I’ve coded a parser for SIP (similar to HTTP). The parser generates a
Request object which inherits from Hash,

Usually it’s better to use composition instead of inheritance to achieve
this. Now your SipRequest inherits all methods from Hash including
some that you might not want users to be able to invoke.

and each SIP request header
(i.e. “From: sip:[email protected]”) becomes an entry of the hash
(Request object) as follows:

  • The key is “FROM” (capitalized).
  • The value is an Array of strings (a s header can have multiple values).

I need to store the key capitalized for fastest lookup, but I also
want to store the original header name (which could be “from”, “From”,
“frOM” and so).

So, to sum it up: you want a class for SIP requests which allows
(efficient) header field access through [] using the header name in any
letter case.

The problem this last suggestion introduces is that it breaks the
existing API and makes more complext for a developer to handle the
Request class (which should be as easy as handling a Hash).

Here’s how I’d do it. First, I would start with the interface, maybe
something like this:

module SIP
  class Request
    def self.parse(io)
      # …
    end

    # get a header field by symbol
    def [](header_name_sym)
    end

    # return the real name used
    def header_name(header_name_sym)
    end
  end
end

Then I’d think about how to make that API work properly. For example, with
two variants for a missing header, raising an error and returning a default value:

module SIP
  class Request
    HdrInfo = Struct.new :name, :values
    DUMMY = HdrInfo[nil, [].freeze].freeze
    LT = "\r\n".freeze

    def self.parse(io)
      hdr = {}

      io.each_line LT do |l|
        line = l.chomp(LT)
        case line
        when ""
          # an empty line ends the header section
          break
        when /\A([^:]+):\s*(.*)\z/
          # too simplistic parsing!
          hdr[$1] = $2.split(",").map(&:strip)
        else
          raise "Not a header line: %p" % line
        end
      end

      new(hdr)
    end

    def initialize(headers)
      @hdr = {}

      # assume each name is a String and its values are already parsed
      headers.each do |hdr, values|
        @hdr[normalize(hdr)] = HdrInfo[hdr, values]
      end
    end

    # get a header field by symbol (or String, in any letter case);
    # unknown headers yield an empty, frozen Array
    def [](header_name_sym)
      @hdr.fetch(normalize(header_name_sym)) { DUMMY }.values
    end

    # return the real name used; unknown headers raise ArgumentError
    def header_name(header_name_sym)
      @hdr.fetch(normalize(header_name_sym)) do
        raise ArgumentError, "Header not found %p" % header_name_sym
      end.name
    end

    private

    def normalize(h)
      (/[A-Z]/ =~ h ? h.downcase : h).to_sym
    end
  end
end

Of course we could build the internal hash straight away during parsing.
The main focus of the example was how to use the header once parsed.
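A sketch of that variant, parsing header lines straight into the internal `{ normalized_symbol => HdrInfo }` Hash. `parse_headers` is a hypothetical helper, the sample input is made up, and the parsing is as simplistic as in the example above (it splits values on every comma):

```ruby
HdrInfo = Struct.new(:name, :values)

# Hypothetical helper: header text -> { normalized_symbol => HdrInfo }
def parse_headers(text, lt = "\r\n")
  hdr = {}
  text.each_line(lt) do |l|
    line = l.chomp(lt)
    break if line.empty?  # blank line ends the header section

    name, value = line.split(":", 2)
    raise ArgumentError, "Not a header line: %p" % line if value.nil?

    name = name.strip
    hdr[name.downcase.to_sym] = HdrInfo[name, value.split(",").map(&:strip)]
  end
  hdr
end

raw = "frOM: sip:alice@example.com\r\nTo: sip:bob@example.com\r\n\r\nbody"
hdr = parse_headers(raw)

p hdr[:from].name    # => "frOM"
p hdr[:from].values  # => ["sip:alice@example.com"]
p hdr.keys           # => [:from, :to]
```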

Thanks to both for your comments.

You’re welcome.

Kind regards

robert