OpenStruct problem

pool007 · February 11, 2006, 10:39am

OpenStruct class seems to be misbehave when it comes to hashing.

For example,
irb(main):008:0> o1 = OpenStruct.new
=>
irb(main):009:0> o1.val = “a”
=> “a”
irb(main):010:0> o1 = OpenStruct.new
=>
irb(main):011:0> o1.foo = “a”
=> “a”
irb(main):012:0> o1.bar = “b”
=> “b”
irb(main):013:0> o2 = OpenStruct.new
=>
irb(main):014:0> o2.foo = “a”
=> “a”
irb(main):015:0> o2.bar = “b”
=> “b”
irb(main):016:0> h = { o1 => 1 }
=> {=>1}
irb(main):017:0> h[o2]
=> nil
irb(main):018:0>

If it’s not possible to retrieve the same hash entry using
OpenStruct, what should I have to use instead?

Don’t you think OpenStruct have to support eql? method
automatically?

Best,
Minkoo S.

pool007 · February 11, 2006, 11:08am

Hashes don’t behave well as keys either so it’s no surprise openstructs
(which are based on them) don’t. Nested hashes are usually sufficient,
but I agree the hashing of hashes and openstructs is pretty strange
strange. If you really need to use openstructs as keys you could use:

class OpenStruct
def hash
@table.to_a.hash
end
end

This is, however, very inefficient and depends on all variables stored
in the openstruct also being good keys. As a general rule, a key should
be a simple value. Try to avoid using complex structures as keys.

pool007 · February 11, 2006, 11:18am

As a general rule, a key should be a simple value.
Try to avoid using complex structures as keys.

Or rather, if you are going to use objects as keys, think of them as
each instance being a unique key, not “this object happens to have
the same contents as that other one, so I think that they should act
like the same key”.

As Timothy notes: you can do what you’re asking for, but it’s
reasonably inefficient (from a CPU standpoint, if not a
programmer-enabling one).

pool007 · February 11, 2006, 12:23pm

That’s because the hash (I mean the result of #hash) of both
OpenStruct objects differ.

$ ri Object#hash

… gives you:

“Generates a +Fixnum+ hash value for this object. This function
must have the property that +a.eql?(b)+ implies +a.hash ==
b.hash+. The hash value is used by class +Hash+. Any hash value
that exceeds the capacity of a +Fixnum+ will be truncated
before being used.”

gegroet,
Erik V. - http://www.erikveen.dds.nl/

require “ostruct”

o1 = OpenStruct.new
o1.foo = “a”
o1.bar = “b”

o2 = OpenStruct.new
o2.foo = “a”
o2.bar = “b”

p o1.hash
p o2.hash

pool007 · February 11, 2006, 1:13pm

“Timothy G.” [email protected] writes:

Hashes don’t behave well as keys either so it’s no surprise openstructs
(which are based on them) don’t. Nested hashes are usually sufficient,
but I agree the hashing of hashes and openstructs is pretty strange
strange. If you really need to use openstructs as keys you could use:

class OpenStruct
def hash
@table.to_a.hash
end
end

Careful! @table.to_a is ordered arbitrarily, so that hash function
won’t quite work. Better sort that array, or use another hashing
strategy.

@table.to_a.sort_by{|k,v| k.to_i}.hash

pool007 · February 11, 2006, 5:27pm

Minkoo S. wrote:

That being said, what’s the benefit of OpenStruct? If OpenStruct
is nothing but a hashtable with syntactic sugar, I see reason
of using it. Am I missing something?

Well, I talked to the author once (on this list – you could look
it up) and he said something like, “OpenStruct was only a toy, I
am surprised people use it so much.”

The file says Matz wrote it, although I thought it was someone
else. My memory is faulty.

My impression is that it was thrown together as a little proof-
of-concept and was not meant as a serious tool.

Have you ever looked at the code? It’s instructive if you haven’t.
Once you strip out the comments, it’s only about 77 lines.

You might consider my SuperStruct class [1] – I think it has bugs,
so let me know if you find them. And it doesn’t reimplement eql?
so you may still have a problem there.

Cheers,
Hal

[1] http://sstruct/rubyforge.org

pool007 · February 11, 2006, 9:06pm

2006/2/11, Minkoo S. [email protected]:

generation.
That being said, what’s the benefit of OpenStruct? If OpenStruct
is nothing but a hashtable with syntactic sugar, I see reason
of using it. Am I missing something?

Probably not. That being said, an OpenStruct does not fit well as a
hash key because a hash key should be immutable (containers like Array
and Hash typically calculate their hash based on elements and so the
hash code will change once the container changes).

In Your case the best solution is to use a Struct.

Bigram = Struct.new :s1, :s2

You get #hash and #eql? for free and it’s a one liner. This seems to
be much more appropriate than using an OpenStruct as the number of
fields is fixed, isn’t it? An OpenStruct really only makes sense when
the number of fields changes frequently or if you do not not all the
fields beforehand and find Hash a bit too clumnsy. It’s often used for
configuration data (like in the sample code of OptionParser).

Kind regards

robert

pool007 · February 11, 2006, 12:18pm

What I want to do with OpenStruct is to store bigram in hash table.
(bigram can be thought as two consecutive strings)

So, I have two choices. First one is to define new class. The other
option is to use OpenStruct. However, OpenStruct can’t be used
as key of hash, so I made my own class.

At first, I thought that OpenStruct might be implemented by
adding methods and instance variables on the fly. But, it’s
hashtable. Also, it doesn’t have automatic hash, eql? method
generation.

I understand that computing good hash value is beyond
the scope of langauge design, but I’m talking about library.
OpenStruct might use one of the field for computing hash
key, stating in the docuement that “default implementation
is not efficient enough. If you want, override hash.”

If such a implementaion is available, people like me, who is
developing a prototype sysmte, will gladly use the default.

That being said, what’s the benefit of OpenStruct? If OpenStruct
is nothing but a hashtable with syntactic sugar, I see reason
of using it. Am I missing something?

Best,
Minkoo S.

pool007 · February 11, 2006, 10:13pm

You’re right. The order depends on the order keys were added (although
not directly). This makes it even more inefficient and completely
impractical for normal use. It would be really useful to have a more
meaningful hash function for hash objects, but this is probably
difficult in practice as it uses a 3rd party hash table library.

pool007 · February 12, 2006, 11:13am

I meant to say “The .hash (of the hash) depends on the order keys were
added…”, but on second thoughts I think I was wrong. Two identical
hashes seem to have different .hash values even when the keys are added
in the same order. This obviously makes them completely unsuitable as
hash keys.

pool007 · February 12, 2006, 12:05am

Timothy G. wrote:

You’re right. The order depends on the order keys were added (although
not directly). This makes it even more inefficient and completely
impractical for normal use. It would be really useful to have a more
meaningful hash function for hash objects, but this is probably
difficult in practice as it uses a 3rd party hash table library.

Your comments about order lost me, can you clarify?

Thanks,
Hal

pool007 · February 12, 2006, 1:14pm

“Timothy G.” [email protected] writes:

I meant to say “The .hash (of the hash) depends on the order keys were
added…”, but on second thoughts I think I was wrong. Two identical
hashes seem to have different .hash values even when the keys are added
in the same order. This obviously makes them completely unsuitable as
hash keys.

Hash does not provide a #hash method at all; it falls back to
Object#hash, which is essentially #object_id.

But you’d be correct to infer that inserting keys in a hash, h', in a different order could yield a different value for h.to_a.hash’.
You’ll probably see the difference with:

h1 = {}
h2 = {}
(1…1000).each{|i| h1[i] = true}
(1…1000).sort_by{rand}.each{|i| h2[i] = true}

p h1.to_a.hash == h2.to_a.hash
p h1.to_a.sort.hash == h2.to_a.sort.hash

The reason is that if two keys hit the same bucket, then which ever
one was inserted last will appear at the head of the chain (ruby
hashes use chaining). #each (and #to_a) simply scan each chain in
turn and yields each element as it finds it. Thus insertion order can
easily have an effect on the order of the array, and hence the array’s
#hash.

As for usefulness, that will always depend on the application.
Claiming it as “completely impractical for normal use” sounds akin to
premature optimization to me, unless you’re referring to a specific
use case which you know well. You could also use a faster hash
function which doesn’t depend on the order, such as XORing values
computed from each pair.

I agree though, that a standard Hash#hash could be useful. It’s come
up before; I think the issue was how to handle the #default and
#default_proc, especially the latter. I would probably favor using
default_proc.object_id in the computed hash, even if it means two
default_procs that always yield the same value end up giving different
hashes.