Forum: Ruby OpenStruct problem

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Minkoo S. (Guest)
on 2006-02-11 11:39
(Received via mailing list)
OpenStruct class seems to be misbehave when it comes to hashing.

For example,
irb(main):008:0> o1 = OpenStruct.new
=> <OpenStruct>
irb(main):009:0> o1.val = "a"
=> "a"
irb(main):010:0> o1 = OpenStruct.new
=> <OpenStruct>
irb(main):011:0> o1.foo = "a"
=> "a"
irb(main):012:0> o1.bar = "b"
=> "b"
irb(main):013:0> o2 = OpenStruct.new
=> <OpenStruct>
irb(main):014:0> o2.foo = "a"
=> "a"
irb(main):015:0> o2.bar = "b"
=> "b"
irb(main):016:0> h = { o1 => 1 }
=> {<OpenStruct bar="b" foo="a">=>1}
irb(main):017:0> h[o2]
=> nil
irb(main):018:0>

If it's not possible to retrieve the same hash entry using
OpenStruct, what should I have to use instead?

Don't you think OpenStruct have to support eql? method
automatically?

Best,
Minkoo S.
Timothy G. (Guest)
on 2006-02-11 12:08
(Received via mailing list)
Hashes don't behave well as keys either so it's no surprise openstructs
(which are based on them) don't. Nested hashes are usually sufficient,
but I agree the hashing of hashes and openstructs is pretty strange
strange. If you really need to use openstructs as keys you could use:

class OpenStruct
  def hash
    @table.to_a.hash
  end
end

This is, however, very inefficient and depends on all variables stored
in the openstruct also being good keys. As a general rule, a key should
be a simple value. Try to avoid using complex structures as keys.
Phrogz (Guest)
on 2006-02-11 12:18
(Received via mailing list)
> As a general rule, a key should be a simple value.
> Try to avoid using complex structures as keys.

Or rather, if you are going to use objects as keys, think of them as
each *instance* being a unique key, not "this object happens to have
the same contents as that other one, so I think that they should act
like the same key".

As Timothy notes: you *can* do what you're asking for, but it's
reasonably inefficient (from a CPU standpoint, if not a
programmer-enabling one).
Minkoo S. (Guest)
on 2006-02-11 13:18
(Received via mailing list)
What I want to do with OpenStruct is to store bigram in hash table.
(bigram can be thought as two consecutive strings)

So, I have two choices. First one is to define new class. The other
option is to use OpenStruct. However, OpenStruct can't be used
as key of hash, so I made my own class.

At first, I thought that OpenStruct might be implemented by
adding methods and instance variables on the fly. But, it's
hashtable. Also, it doesn't have automatic hash, eql? method
generation.

I understand that computing good hash value is beyond
the scope of langauge design, but I'm talking about library.
OpenStruct might use one of the field for computing hash
key, stating in the docuement that "default implementation
is not efficient enough. If you want, override hash."

If such a implementaion is available, people like me, who is
developing a prototype sysmte, will gladly use the default.

That being said, what's the benefit of OpenStruct? If OpenStruct
is nothing but a hashtable with syntactic sugar, I see reason
of using it. Am I missing something?

Best,
Minkoo S.
Erik V. (Guest)
on 2006-02-11 13:23
(Received via mailing list)
That's because the hash (I mean the result of #hash) of both
OpenStruct objects differ.

$ ri Object#hash

.... gives you:

"Generates a +Fixnum+ hash value for this object. This function
must have the property that +a.eql?(b)+ implies +a.hash ==
b.hash+. The hash value is used by class +Hash+. Any hash value
that exceeds the capacity of a +Fixnum+ will be truncated
before being used."

gegroet,
Erik V. - http://www.erikveen.dds.nl/

----------------------------------------------------------------

 require "ostruct"

 o1 = OpenStruct.new
 o1.foo = "a"
 o1.bar = "b"

 o2 = OpenStruct.new
 o2.foo = "a"
 o2.bar = "b"

 p o1.hash
 p o2.hash
George O. (Guest)
on 2006-02-11 14:13
(Received via mailing list)
"Timothy G." <removed_email_address@domain.invalid> writes:

> Hashes don't behave well as keys either so it's no surprise openstructs
> (which are based on them) don't. Nested hashes are usually sufficient,
> but I agree the hashing of hashes and openstructs is pretty strange
> strange. If you really need to use openstructs as keys you could use:
>
> class OpenStruct
>   def hash
>     @table.to_a.hash
>   end
> end

Careful!  @table.to_a is ordered arbitrarily, so that hash function
won't quite work.  Better sort that array, or use another hashing
strategy.

    @table.to_a.sort_by{|k,v| k.to_i}.hash
Hal F. (Guest)
on 2006-02-11 18:27
(Received via mailing list)
Minkoo S. wrote:
>
> That being said, what's the benefit of OpenStruct? If OpenStruct
> is nothing but a hashtable with syntactic sugar, I see reason
> of using it. Am I missing something?
>

Well, I talked to the author once (on this list -- you could look
it up) and he said something like, "OpenStruct was only a toy, I
am surprised people use it so much."

The file says Matz wrote it, although I thought it was someone
else. My memory is faulty.

My impression is that it was thrown together as a little proof-
of-concept and was not meant as a serious tool.

Have you ever looked at the code? It's instructive if you haven't.
Once you strip out the comments, it's only about 77 lines.

You might consider my SuperStruct class [1] -- I think it has bugs,
so let me know if you find them. And it doesn't reimplement eql?
so you may still have a problem there.


Cheers,
Hal


[1]  http://sstruct/rubyforge.org
Robert K. (Guest)
on 2006-02-11 22:06
(Received via mailing list)
2006/2/11, Minkoo S. <removed_email_address@domain.invalid>:
> generation.
> That being said, what's the benefit of OpenStruct? If OpenStruct
> is nothing but a hashtable with syntactic sugar, I see reason
> of using it. Am I missing something?

Probably not. That being said, an OpenStruct does not fit well as a
hash key because a hash key should be immutable (containers like Array
and Hash typically calculate their hash based on elements and so the
hash code will change once the container changes).

In Your case the best solution is to use a Struct.

Bigram = Struct.new :s1, :s2

You get #hash and #eql? for free and it's a one liner.  This seems to
be much more appropriate than using an OpenStruct as the number of
fields is fixed, isn't it? An OpenStruct really only makes sense when
the number of fields changes frequently or if you do not not all the
fields beforehand and find Hash a bit too clumnsy. It's often used for
configuration data (like in the sample code of OptionParser).

Kind regards

robert
Timothy G. (Guest)
on 2006-02-11 23:13
(Received via mailing list)
You're right. The order depends on the order keys were added (although
not directly). This makes it even more inefficient and completely
impractical for normal use. It would be really useful to have a more
meaningful hash function for hash objects, but this is probably
difficult in practice as it uses a 3rd party hash table library.
Hal F. (Guest)
on 2006-02-12 01:05
(Received via mailing list)
Timothy G. wrote:
> You're right. The order depends on the order keys were added (although
> not directly). This makes it even more inefficient and completely
> impractical for normal use. It would be really useful to have a more
> meaningful hash function for hash objects, but this is probably
> difficult in practice as it uses a 3rd party hash table library.
>

Your comments about order lost me, can you clarify?

Thanks,
Hal
Timothy G. (Guest)
on 2006-02-12 12:13
(Received via mailing list)
I meant to say "The .hash (of the hash) depends on the order keys were
added...", but on second thoughts I think I was wrong. Two identical
hashes seem to have different .hash values even when the keys are added
in the same order. This obviously makes them completely unsuitable as
hash keys.
George O. (Guest)
on 2006-02-12 14:14
(Received via mailing list)
"Timothy G." <removed_email_address@domain.invalid> writes:

> I meant to say "The .hash (of the hash) depends on the order keys were
> added...", but on second thoughts I think I was wrong. Two identical
> hashes seem to have different .hash values even when the keys are added
> in the same order. This obviously makes them completely unsuitable as
> hash keys.

Hash does not provide a #hash method at all; it falls back to
Object#hash, which is essentially #object_id.

But you'd be correct to infer that inserting keys in a hash, `h', in a
different order could yield a different value for `h.to_a.hash'.
You'll probably see the difference with:

  h1 = {}
  h2 = {}
  (1..1000).each{|i| h1[i] = true}
  (1..1000).sort_by{rand}.each{|i| h2[i] = true}

  p h1.to_a.hash == h2.to_a.hash
  p h1.to_a.sort.hash == h2.to_a.sort.hash

The reason is that if two keys hit the same bucket, then which ever
one was inserted last will appear at the head of the chain (ruby
hashes use chaining).  #each (and #to_a) simply scan each chain in
turn and yields each element as it finds it.  Thus insertion order can
easily have an effect on the order of the array, and hence the array's
#hash.

As for usefulness, that will always depend on the application.
Claiming it as "completely impractical for normal use" sounds akin to
premature optimization to me, unless you're referring to a specific
use case which you know well.  You could also use a faster hash
function which doesn't depend on the order, such as XORing values
computed from each pair.

I agree though, that a standard Hash#hash could be useful.  It's come
up before; I think the issue was how to handle the #default and
#default_proc, especially the latter.  I would probably favor using
default_proc.object_id in the computed hash, even if it means two
default_procs that always yield the same value end up giving different
hashes.
This topic is locked and can not be replied to.