Duck Typing Hash-Like Objects

pfharlock · March 1, 2007, 11:25pm

I often find that when writing initialize (or alternate constructors)
I want to examine the class of the arguments to decide how to
proceed. An example is Array.new, which behaves differently if it
is given an integer argument or an array argument:

Array.new 2      # [nil,nil]
Array.new [1,2]  # [1,2]

These sorts of tests can be done via Class#=== or Kernel#is_a? or
Kernel#kind_of? but that can lead to artificial constraints. Using
Kernel#respond_to? seems to avoid many of those constraints.

My question is: What is the least constraining test to determine
if you’ve got a hash-like object? Is arg.respond_to?(:has_key?)
reasonable? At first I thought a test for :[] would be great but
that catches strings also. I’m thinking that if someone hands my
method a Hash or a HashWithIndifferentAccess or an OrderedHash or
a tree of some sort, I’d like to be able to accept all of them.

All I really want to know is “Does this object provide key/value
pair lookups via the #[] method?”, but I don’t want to get strings
and integers along for the ride (for example).
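
To make the false positives concrete, here is a minimal sketch (the sample objects and the probe helper are made up for illustration) showing which core objects answer to :[] and which answer to :has_key?:

```ruby
# Sample objects are arbitrary; probe is a hypothetical helper name.
SAMPLES = [{ 1 => 2 }, "abc", 42, [1, 2], lambda { |x| x }]

# Returns [responds to :[], responds to :has_key?] for one object.
def probe(obj)
  [obj.respond_to?(:[]), obj.respond_to?(:has_key?)]
end

SAMPLES.each do |obj|
  indexable, keyed = probe(obj)
  puts "#{obj.class}: []=#{indexable} has_key?=#{keyed}"
end
```

Every sample responds to :[], but only the Hash responds to :has_key?, which is why :[] alone is too permissive as a test.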

Gary W.

pfharlock · March 1, 2007, 11:32pm

On Mar 1, 2007, at 4:25 PM, Gary W. wrote:

Kernel#respond_to? seems to avoid many of those constraints.

My question is: What is the least constraining test to determine
if you’ve got a hash-like object? Is arg.respond_to?(:has_key?)
reasonable?

I would check for to_hash(), then call that method on the argument to
get its Hash representation.
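
Roughly, that could look like the sketch below. The coerce_to_hash helper and the Settings class are hypothetical names, used only to show the idea:

```ruby
# coerce_to_hash is a hypothetical helper name, not a core method.
def coerce_to_hash(arg)
  unless arg.respond_to?(:to_hash)
    raise ArgumentError, "expected an object that implements to_hash"
  end
  arg.to_hash
end

# A made-up class that opts in to hash conversion.
class Settings
  def initialize(pairs)
    @pairs = pairs
  end

  # Builds a Hash from the stored [key, value] pairs.
  def to_hash
    Hash[@pairs]
  end
end

p coerce_to_hash(Settings.new([[:a, 1]]))
```

A String or an Integer does not define to_hash, so both are rejected by this check.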

James Edward G. II

pfharlock · March 2, 2007, 12:23am

On Mar 1, 2007, at 5:30 PM, James Edward G. II wrote:

I would check for to_hash(), then call that method on the argument
to get its Hash representation.

That might work, but what if the object is an interface to some sort
of database? You don’t really want to convert the external data
structure into a Hash just to access a single item.

Gary W.

pfharlock · March 2, 2007, 1:21am

On 3/1/07, Gary W. wrote:

My question is: What is the least constraining test to determine
if you’ve got a hash-like object? Is arg.respond_to?(:has_key?)
reasonable? At first I thought a test for :[] would be great but
that catches strings also. I’m thinking that if someone hands my
method a Hash or a HashWithIndifferentAccess or an OrderedHash or
a tree of some sort, I’d like to be able to accept all of them.

I’d do it like this:

def foo(duck)
  # if the duck claims to have keys and indexing, we’ll just use it as is
  unless duck.respond_to?(:keys) and duck.respond_to?(:[])
    # otherwise, we’ll ask it to turn itself into a hash for us
    if duck.respond_to?(:to_hash)
      duck = duck.to_hash
    else
      # not close enough to a hash…
      raise ArgumentError,
            "want something with keys and indexing, or that supports to_hash"
    end
  end
  …
end

This requires the keys method though, which thinking back, I usually
don’t provide in my hash-like classes. So I don’t know…

Jacob F.

pfharlock · March 2, 2007, 1:03am

On Mar 1, 2007, at 5:22 PM, Gary W. wrote:

On Mar 1, 2007, at 5:30 PM, James Edward G. II wrote:

I would check for to_hash(), then call that method on the argument
to get its Hash representation.

That might work, but what if the object is an interface to some sort
of database? You don’t really want to convert the external data
structure into a Hash just to access a single item.

OK, what about using Hash#fetch and trapping the IndexError for an
invalid key?
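
Something like this sketch, where lookup is a hypothetical helper and the only assumption is that the collection's fetch raises IndexError (or a subclass such as KeyError) for a missing key:

```ruby
# lookup is a hypothetical helper name. It asks for a single item and never
# converts the whole collection.
def lookup(collection, key, fallback = nil)
  collection.fetch(key)
rescue IndexError
  # Covers Array#fetch (IndexError) and Hash#fetch, which raises KeyError,
  # a subclass of IndexError, on current Rubies.
  fallback
end

p lookup({ :a => 1 }, :a)
p lookup([10, 20], 5, :none)
```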

James Edward G. II

pfharlock · March 2, 2007, 1:32am

On Mar 1, 2007, at 7:03 PM, James Edward G. II wrote:

really want to convert the external data structure into a Hash
just to access a single item.

OK, what about using Hash#fetch and trapping the IndexError for an
invalid key?

Yes, I think #fetch might be a better choice, but not exactly in the
way you suggest.
I’m thinking specifically about the construction of objects such as:

class A
  def initialize(arg, &b)
    case
    when arg.respond_to?(:nonzero?)
      # do construction based on integer-like behavior
    when arg.respond_to?(:fetch)
      # do construction based on hash-like behavior
    when arg.respond_to?(:to_str)
      # do construction based on string-like behavior
    else
      # punt
    end
  end
end

I was going to use :[] for hash-like behavior, but that doesn’t sift
out Integers and Strings, so I started using :has_key?, but that seemed
wrong, so I posted my question.

Your suggestion to use fetch seems promising, but ActiveRecord, for
example, doesn’t define ActiveRecord::Base.fetch. The correct choice
would be find for ActiveRecord. Hash#fetch and Array#fetch exist, so
that does permit some nice duck-typing between those two collections.
RBTree also defines #fetch, which is convenient.

It looks like #fetch might be the best approach.

Gary W.

pfharlock · March 2, 2007, 2:02am

On Mar 1, 2007, at 7:38 PM, James Edward G. II wrote:

You’re still type checking, you’re just doing it in a more fragile
way. If you want to type check, use the class, I say.

Yet if I test for (Hash === mystery_obj) that would not
allow someone to pass an RBTree object instead, which I think
is a very reasonable thing to allow and works just fine if
I only use #fetch.

A minimum interface to an indexable collection might be:

has_key?(key)
fetch(key)
store(key, val)

In a quick look it seems like only Hash and RBTree implement
those methods though.
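
As a rough sketch, a test for that minimum interface could be written like this (INDEXABLE_METHODS and indexable? are names invented for the example):

```ruby
# INDEXABLE_METHODS and indexable? are hypothetical names for this sketch.
INDEXABLE_METHODS = [:has_key?, :fetch, :store]

# True only if the object responds to every method in the minimum interface.
def indexable?(obj)
  INDEXABLE_METHODS.all? { |name| obj.respond_to?(name) }
end

p indexable?({})
p indexable?([])
p indexable?("abc")
```

Among the core classes tried here only the Hash passes: Array has fetch but neither has_key? nor store, and String has none of the three.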

Gary W.

pfharlock · March 2, 2007, 3:11am

On Fri, 02 Mar 2007 10:01:06 +0900, Gary W. wrote:

has_key?(key)
fetch(key)
store(key, val)

In a quick look it seems like only Hash and RBTree implement those
methods though.

Is there a good reason why you can’t just use different constructors for
different types of objects, then just trust that they duck-type OK?

–Ken

pfharlock · March 2, 2007, 1:39am

On Mar 1, 2007, at 6:31 PM, Gary W. wrote:

class A
def initialize(arg, &b)
case
when arg.respond_to?(:nonzero?)
# do construction based on integer-like behavior

Floats have nonzero?() too. I really think picking arbitrary methods
like this to find a type is a big mistake.

You’re still type checking, you’re just doing it in a more fragile
way. If you want to type check, use the class, I say.

If you want it to be an Integer, ask it if it can:

Integer(…) rescue # nope…

when arg.respond_to?(:fetch)
# do construction based on hash-like behavior

Arrays have fetch too.

when arg.respond_to?(:to_str)
# do construction based on string-like behavior

String(…) rescue # nope…

else
# punt
end
end
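
Spelled out a little more, the conversion-function idea might look like this sketch. The as_integer and as_string helpers are hypothetical names; Integer() and String() are the Kernel conversion functions:

```ruby
# as_integer and as_string are hypothetical helper names.
# Each returns nil when the Kernel conversion function rejects the argument.
def as_integer(arg)
  Integer(arg)
rescue ArgumentError, TypeError
  nil
end

def as_string(arg)
  String(arg)
rescue TypeError
  nil
end

p as_integer("42")
p as_integer("abc")
p as_string(42)
```

Note that String() accepts almost anything, because nearly every object has to_s, so it is a much weaker filter than Integer().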

James Edward G. II

pfharlock · March 2, 2007, 3:27am

Hi –

On Fri, 2 Mar 2007, Gary W. wrote:

On Mar 1, 2007, at 7:38 PM, James Edward G. II wrote:

You’re still type checking, you’re just doing it in a more fragile way.
If you want to type check, use the class, I say.

Yet if I test for (Hash === mystery_obj) that would not
allow someone to pass an RBTree object instead, which I think
is a very reasonable thing to allow and works just fine if
I only use #fetch.

I had the impression James was talking about the Integer and String
methods, though then again those aren’t actually the classes. So I’m
not sure what he meant. But I don’t think it was just to test
class membership, since that manifestly doesn’t help in the kind of
situation you’re describing.

David

pfharlock · March 2, 2007, 3:23am

Hi –

On Fri, 2 Mar 2007, Jacob F. wrote:

def foo(duck)
end
end
…
end

Or you could just do:

duck[whatever]…

and rescue the exception(s), possibly cascading down into a to_hash
operation. You might as well fail without bothering with the
respond_to? calls – just ask the object to do what it’s supposed to,
and handle the error cases.
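
A sketch of what I mean, with read_key as a made-up helper name. Which exceptions to rescue is an assumption about how a non-hash-like object is likely to fail:

```ruby
# read_key is a hypothetical helper. The rescued exception classes are an
# assumption about how an object that is not hash-like will fail on [].
def read_key(duck, key)
  duck[key]
rescue NoMethodError, TypeError, ArgumentError
  # Indexing failed, so cascade down to a to_hash conversion if one exists.
  unless duck.respond_to?(:to_hash)
    raise ArgumentError, "want something indexable by key, or that supports to_hash"
  end
  duck.to_hash[key]
end

p read_key({ :a => 1 }, :a)
```

One caveat with this approach: a NoMethodError raised from inside the object's own [] implementation is caught as well, so a genuine bug there would be masked by the fallback.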

David

pfharlock · March 2, 2007, 3:50am

Hi –

On Fri, 2 Mar 2007, David wrote:

is a very reasonable thing to allow and works just fine if
I only use #fetch.

I had the impression James was talking about the Integer and String
methods, though then again those aren’t actually the classes. So I’m
not sure what he meant. But I don’t think it was just to test
class membership, since that manifestly doesn’t help in the kind of
situation you’re describing.

Well, I should say: it’s a way to deal with some of the practicalities
of a situation where you really only want objects of certain classes,
at the expense of duck typing. But (a) it sounds like you want
something more elastic, and (b) testing class membership doesn’t tell
you anything definitive, so it doesn’t solve the problem if you’re
thinking that rogue objects might be coming in to the method (since if
someone can roguely send it, say, a Proc, which responds to [], they
can presumably send it a hash that responds to [] irresponsibly).

I guess I tend to think in terms of error handling: that is, let
objects call [], but catch the ones that fail, or the ones that hand
back nonsense (in the context) values.

It’s funny sometimes how discussions of duck typing come at the same
thing from two directions: protecting systems from supposed gremlins
that are engineering its demise by extending objects with destructive
but well-camouflaged behaviors, and exploring the coolness of the
openness of Ruby objects. Or something.

David

pfharlock · March 2, 2007, 3:46am

On Mar 1, 2007, at 8:26 PM, David wrote:

is a very reasonable thing to allow and works just fine if
I only use #fetch.

I had the impression James was talking about the Integer and String
methods, though then again those aren’t actually the classes. So I’m
not sure what he meant

I was probably just babbling, not making sense. I do that.

But I don’t think it was just to test class membership, since that
manifestly doesn’t help in the kind of situation you’re describing.

Yeah, you’re right. I was feeling that this is just an attempt to
sidestep type checking by inventing a clever new type checking
system. It’s really just trying to provide a flexible interface though.

Given that, I’m changing my answer.

This is a documentation problem. As long as the documentation tells
me your method needs a put_stuff_in() and a pull_stuff_out() to work,
tells me what they will be passed, and doesn’t type check, you
support ALL data structures. I can always wrap Hash, RBTree,
Integer, JamesCustomDataVoid, or whatever in a trivial class
implementing those calls.
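
For instance, a wrapper around Hash could be as small as this sketch. put_stuff_in and pull_stuff_out are the placeholder names from above; HashStore and Registry are classes made up for the example:

```ruby
# HashStore adapts a Hash to the two documented calls.
class HashStore
  def initialize(hash = {})
    @hash = hash
  end

  def put_stuff_in(key, value)
    @hash[key] = value
  end

  def pull_stuff_out(key)
    @hash[key]
  end
end

# The consumer never inspects the class of its store; it only calls the
# two documented methods.
class Registry
  def initialize(store)
    @store = store
  end

  def register(name, object)
    @store.put_stuff_in(name, object)
  end

  def find(name)
    @store.pull_stuff_out(name)
  end
end

registry = Registry.new(HashStore.new)
registry.register(:answer, 42)
p registry.find(:answer)
```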

Am I making sense yet, or do I just need to go to sleep now?

James Edward G. II

pfharlock · March 2, 2007, 3:59am

Hi –

On Fri, 2 Mar 2007, James Edward G. II wrote:

Yeah, you’re right. I was feeling that this is just an attempt to
sidestep type checking by inventing a clever new type checking system.

Or an attempt to sidestep class-checking by inventing a type-checking
system. A few years ago there were some interesting attempts to
come up with a systematic way to determine an object’s type, in the
sense of its full profile and interface, at any given point in its
life. The idea was to be able to get some kind of rich response from
the object, well beyond what respond_to? and is_a? provide, in order
to determine whether you’d gotten hold of the type of object you
needed. I seem to recall it turned out to be very challenging,
perhaps impossible, to come up with a complete system for this. I’m
not sure if anyone is still working on it. But it’s an interesting
area.

Am I making sense yet, or do I just need to go to sleep now?

Definitely the former, and perhaps the latter too – 'tis up to you.
I’m also very tired, and feeling semi-coherent at best, but
enjoying the thread.

David

pfharlock · March 2, 2007, 4:44am

On Mar 1, 2007, at 9:49 PM, David wrote:

I guess I tend to think in terms of error handling: that is, let
objects call [], but catch the ones that fail, or the ones that hand
back nonsense (in the context) values.

Let me make the situation a little more concrete.

I’d like to define a class that accepts the following syntax for
construction:

A.new
A.new(1)
A.new(1,2)
A.new(3 => 4)
A.new(1, 3 => 4)
A.new(1, 2, 3 => 4)

So the arguments to A.new are zero or more objects followed by an
optional hash. I can certainly look for that trailing hash via
(Hash === args.last) but what if I don’t want to lock it down to
a Hash?

tree = RBTree.new
A.new(1, 2, tree)

I’d like that to work also and I’m sure there are other sorts of
objects that would work just fine (i.e. respond to #fetch/#[], has_key?,
and perhaps is Enumerable). If I use a class based test to discover
if the last argument is an instance of Hash, I’m eliminating those
other possibilities. I also don’t want to use args.last[key] and
catch an exception because that is only useful after I’ve
discovered if an optional final hash-like object has been passed.

I could have different constructors:

A.new(1)
A.new_with_hash(1, 1=>2)

but it really isn’t as nice, IMHO.

At first I thought I could use respond_to?(:[]) on the last argument,
but as I said in the original post integers and strings will create
a false-positive for a hash-like trailing argument using that test.

Perhaps I’m trying to push the duck-typing too far and should just stick
with testing for Hash but it seems like testing for #fetch gives at
least
a little more flexibility.

It also seems like it might be nice to encourage a practice of defining
#fetch, #store, and #has_key? for data structures that are ‘indexable’.
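
Here is a minimal sketch of the constructor I have in mind. It assumes that "hash-like" means responding to both #fetch and #has_key?; the hash_like? helper and the PairList class are invented for the example:

```ruby
class A
  attr_reader :items, :options

  def initialize(*args)
    # Treat the last argument as options only if it looks indexable.
    @options = hash_like?(args.last) ? args.pop : {}
    @items = args
  end

  private

  # Assumption: "hash-like" means responding to both fetch and has_key?.
  def hash_like?(obj)
    obj.respond_to?(:fetch) && obj.respond_to?(:has_key?)
  end
end

# A made-up indexable class that is not a Hash.
class PairList
  def initialize(pairs)
    @pairs = pairs
  end

  def has_key?(key)
    @pairs.any? { |k, _| k == key }
  end

  def fetch(key)
    pair = @pairs.find { |k, _| k == key }
    raise IndexError, "key not found: #{key.inspect}" unless pair
    pair.last
  end
end

a = A.new(1, 2, PairList.new([[3, 4]]))
p a.items
p a.options.fetch(3)
```

An Array in last position stays in the item list, since Array has #fetch but not #has_key?.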

Gary W.

pfharlock · March 2, 2007, 6:10am

On Fri, 02 Mar 2007 10:01:06 +0900, Gary W. wrote:

has_key?(key)
fetch(key)
store(key, val)

In a quick look it seems like only Hash and RBTree implement those
methods though.

Sounds like you want C++200x concept checking, but that depends very
heavily on static typing.

Basically, I think you want to know (in a non-mutating way) whether #[]
supports various types of non-integer parameters. I doubt there’s any way
to do that in Ruby. You could try indexing it and see if it throws a
TypeError (like an Array will), but when you call #[] on
Hash.new{|h,v| h[v]=0}, #[] is mutating.
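
The mutation is easy to see in a small sketch (counting_hash and size_after_probe are helper names made up for the demonstration):

```ruby
# A hash whose default block stores a value on every missed lookup.
def counting_hash
  Hash.new { |hash, key| hash[key] = 0 }
end

# Returns the size of a fresh counting hash after the block has probed it.
def size_after_probe
  hash = counting_hash
  yield hash
  hash.size
end

puts size_after_probe { |h| h[:missing] }           # [] runs the default block and stores the key
puts size_after_probe { |h| h.has_key?(:missing) }  # has_key? only asks and stores nothing
```

Hash#fetch also skips the default block, so probing with fetch or has_key? leaves such a hash untouched.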

–Ken

pfharlock · March 2, 2007, 7:16am

On Mar 2, 2007, at 12:39 AM, Joel VanderWerf wrote:

There seems to be still some ambiguity in this description. In this
case:

h = {3 => 4}
A.new(1, 2, h)

how do you know if h is intended as the third object (in the
“zero or more objects” part) or as the optional hash?

You don’t. There just has to be clear documentation for
the disambiguation rule. The caller could use:

A.new(1,2, h, {})

if they wanted to force h to be part of the list of objects
instead of the optional trailing hash.

Gary W.

pfharlock · March 2, 2007, 6:39am

Gary W. wrote:

a Hash?

There seems to be still some ambiguity in this description. In this
case:

h = {3 => 4}
A.new(1, 2, h)

how do you know if h is intended as the third object (in the “zero or
more objects” part) or as the optional hash?

Sometimes I have wished that the hash generated by this syntax:

meth(k=>v)

were flagged in some way, so that you could distinguish it from

meth({k=>v})

But I’m not sure that would help in this case anyway.
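
For what it's worth, later Ruby versions provide something close to this flag. The sketch below assumes Ruby 3.0 or later, where braceless key/value arguments are collected by a double-splat parameter and an explicit hash stays positional. meth is the placeholder name from above:

```ruby
# Assumes Ruby 3.0 or later. meth is a placeholder name for the sketch.
def meth(*args, **opts)
  [args, opts]
end

p meth(:k => 1)      # the pair lands in opts
p meth({ :k => 1 })  # the hash lands in args
```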

pfharlock · March 2, 2007, 7:36am

Gary W. wrote:

You don’t. There just has to be clear documentation for
the disambiguation rule. The caller could use:

A.new(1,2, h, {})

if they wanted to force h to be part of the list of objects
instead of the optional trailing hash.

Another possibility, unless you need to use the block that’s passed to
A.new for something else:

class A
  def initialize(*args)
    @args = args
    @opts = block_given? ? yield : {}
    puts "args=#{@args.inspect} opts=#{@opts.inspect}"
  end
end

A.new(1, 2, 3) # args=[1, 2, 3] opts={}
A.new(1, 2, 3) {{4=>5, 6=>7}} # args=[1, 2, 3] opts={6=>7, 4=>5}
A.new(1, 2, {3=>4}) # args=[1, 2, {3=>4}] opts={}
A.new(1, 2, 3=>4) # args=[1, 2, {3=>4}] opts={}

But that’s syntactically less tidy.

pfharlock · March 2, 2007, 1:55pm

On 3/2/07, Gary W. wrote:

a Hash?
discovered if an optional final hash-like object has been passed.

I could have different constructors:
    A.new(1)
    A.new_with_hash(1, 1=>2)
but it really isn’t as nice, IMHO.

At first I thought I could use respond_to?(:[]) on the last argument,
but as I said in the original post integers and strings will create

Is it really a problem that strings and integers produce values that
your method would make use of? Say someone wants to encode those input
parameters into a string - as long as [] works, they can. Why is this
a problem?