Making Hash a first-class citizen

Erik_Michaels-Ober · August 9, 2009, 9:21pm

I’ve noticed a couple of inconsistencies in the way Ruby hashes are treated
(compared to, say, arrays). While I’ve been able to monkey-patch solutions,
I thought I would bring these inconsistencies to the attention of the group
with the hope that they might be resolved in a future version of the
language, which I love dearly.

For starters, there’s no to_hash method on nil. This causes a problem when
I want to call an instance method on an object that may or may not be nil
(for example, a hash of parameters from an HTTP request).

A common way to avoid a NoMethodError is by casting an object to a
particular type before calling a method on it. For example:

string_or_nil.to_s.capitalize

This technique can be used for arrays, floats, integers, and strings. It
would seem to follow that I could do the same for hashes:

hash_or_nil.to_hash.rehash

but instead I must do something like this:

(hash_or_nil || nil).rehash

or patch NilClass like this:

class NilClass
  def to_hash
    {}
  end
end

which seems to me like it shouldn’t be necessary for a “primitive” type.
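
For comparison, here is a minimal sketch of the same nil-safe call without reopening NilClass. The helper name hash_or_empty and the sample params values are made up for illustration; they are not part of Ruby or of any library.

```ruby
# Hypothetical helper: return the hash unchanged, or an empty hash for nil.
def hash_or_empty(hash_or_nil)
  hash_or_nil || {}
end

# nil.to_s is "", so chaining a String method is safe.
nil.to_s.capitalize                     # => ""

# The same effect for hashes, using the fallback instead of nil.to_hash.
params = nil                            # e.g. a request with no parameters
hash_or_empty(params).fetch(:page, 1)   # => 1

params = { :page => 3 }
hash_or_empty(params).fetch(:page, 1)   # => 3
```

The trade-off is that every call site has to remember the wrapper, whereas the NilClass patch changes behaviour globally.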

Second, I would argue that there should be + (plus), - (minus), and &
(ampersand) methods on hashes that function the same way they do for arrays
(concatenation, difference, and intersection, respectively).

These few changes would go a long way toward making Hash a first-class
citizen in Ruby.

Erik_Michaels-Ober · August 9, 2009, 9:57pm

Erik Michaels-Ober wrote:

Second, I would argue that there should be + (plus), - (minus), and &
(ampersand) methods on hashes, that function the same way they do for arrays
(concatenation, difference, and intersection, respectively).

Hash is a very flexible citizen, though. It plays many roles. So how
does this come out:

{0=>1, 1=>0} - {1=>1} = ?

{0=>1, 1=>0}

if you look at a hash as a set of pairs

{0=>0}

if you look at this hash as another way of expressing the same indexed
collection as the array [1,0], and use Array#-

{0=>1}

if you look at a hash as a set of keys, with the value, as a boolean,
representing membership or non-membership in the set

{0=>1, 1=>-1}

maybe, if you look at a hash as a mathematical function

And then there are “Bags”:

{1=>6} - {1=>1} == {1=>5}
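
To make the ambiguity concrete, here is a rough sketch of three of these readings written as plain methods. The method names are invented for this example and none of them is existing Hash API; each is just one possible way to spell that interpretation.

```ruby
# Hypothetical helpers, one per reading of "hash minus hash".

# Set of pairs: drop a pair only when both key and value match.
def minus_pairs(a, b)
  a.reject { |k, v| b.key?(k) && b[k] == v }
end

# Set of keys: drop a pair whenever its key appears in b.
def minus_keys(a, b)
  a.reject { |k, _| b.key?(k) }
end

# Bag (multiset): subtract the counts key by key.
def minus_counts(a, b)
  result = {}
  a.each { |k, count| result[k] = count - b.fetch(k, 0) }
  result
end

a = { 0 => 1, 1 => 0 }
b = { 1 => 1 }

minus_pairs(a, b)                      # => {0=>1, 1=>0}
minus_keys(a, b)                       # => {0=>1}
minus_counts({ 1 => 6 }, { 1 => 1 })   # => {1=>5}
```

All three are a line or two of code, which is part of why picking one of them as the meaning of a built-in operator is contentious.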

Erik_Michaels-Ober · August 9, 2009, 10:50pm

{0=>1, 1=>0} - {1=>1} = ?

{0=>1, 1=>0}

if you look at a hash as a set of pairs

I do look at Hash as a set of pairs and I would consider the other uses you
cite to be “non-standard”. The first line of RDoc for the class states: “A
Hash is a collection of key-value pairs.”

I would argue it’s better for +, -, and & to be defined for the standard use
case than to remain undefined in the language. Those using Hash in a
non-standard way can simply avoid these methods.

You could just as well argue that someone might want [6] - [1] to return
[5], but that wouldn’t make sense given that “Arrays are ordered,
integer-indexed collections of any object.”

Any objections to nil.to_hash?

Note: in the example I meant to type (hash_or_nil || {}).rehash instead of
(hash_or_nil || nil).rehash

Erik_Michaels-Ober · August 10, 2009, 7:00am

Concatenation does not make sense with an unordered collection.

There are also the merge and delete methods. Should + and - simply be
aliases for those? I personally would favour making those operators
work on the set of keys.
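
As a quick illustration of what aliasing would imply, this is how the two existing methods behave today (the sample hashes are arbitrary):

```ruby
h1 = { :a => 1, :b => 2 }
h2 = { :b => 20, :c => 30 }

# merge returns a new hash; on a key collision the argument's value wins,
# so the result depends on operand order.
h1.merge(h2)   # => {:a=>1, :b=>20, :c=>30}
h2.merge(h1)   # => {:b=>2, :c=>30, :a=>1}

# delete takes a single key, mutates the receiver, and returns the
# removed value (or nil when the key is absent).
h = h1.dup
h.delete(:a)   # => 1
h              # => {:b=>2}
h.delete(:zzz) # => nil
```

So a + built on merge would silently discard one of two colliding values, and delete is not a hash-minus-hash operation at all: it removes one key, in place.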

Any objections to nil.to_hash?

Nil is nil. Maybe there is a reason why the variable/value in question
is nil and not {}? If so, you probably shouldn’t ignore that
distinction?

Erik_Michaels-Ober · August 9, 2009, 11:10pm

On 09.08.2009 22:49, Erik Michaels-Ober wrote:

{0=>1, 1=>0}

if you look at a hash as a set of pairs

I do look at Hash as a set of pairs and I would consider the other uses you
cite to be “non-standard”. The first line of RDoc for the class states: “A
Hash is a collection of key-value pairs.”

You omit a very important additional property: no key can occur twice in
a Hash. Actually I would consider at least options one and three of
Joel’s list as common usage of a Hash.

I would argue it’s better for +, -, and & to be defined for the standard use
case than to remain undefined in the language. Those using Hash in a
non-standard way can simply avoid these methods.

The problem is that there is no standard use case. Removing based on
the identical pair seems to me as valid as removing based on the key
only as implementations for Hash#-.

Concatenation does not make sense with an unordered collection. See
also various discussions why Hash does not (or rather did not) implement
#hash and #eql? as many people expected.

You could just as well argue that someone might want [6] - [1] to return
[5], but that wouldn’t make sense given that “Arrays are ordered,
integer-indexed collections of any object.”

Well, the implementation of Array#- does not fit there well, does it?
Because it works like set subtraction while Array#+ works as array
concatenation - not very consistent either - but apparently useful.
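
A few one-liners show the asymmetry:

```ruby
# Array#+ concatenates: order and duplicates are preserved.
[1, 2] + [2, 3]        # => [1, 2, 2, 3]

# Array#- removes every element that appears in the right operand,
# including all repeated occurrences.
[1, 1, 2, 3] - [1]     # => [2, 3]

# So adding and then subtracting does not round-trip.
([1, 2] + [2]) - [2]   # => [1]

# Array#& behaves like set intersection and drops duplicates.
[1, 1, 2] & [1, 3]     # => [1]

# And [6] - [1] leaves [6] untouched, since 1 is not an element.
[6] - [1]              # => [6]
```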

Any objections to nil.to_hash?

Note: in the example I meant to type (hash_or_nil || {}).rehash instead of
(hash_or_nil || nil).rehash

What other classes do implement to_hash? In 1.9.1:

irb(main):007:0> ObjectSpace.each_object(Module) {|cl| p cl if cl.instance_methods.include? :to_hash}
Hash
=> 407
irb(main):008:0>

Btw, invoking x.to_hash is not a cast. First, there are no casts in
Ruby because variables do not have types; second, to_hash is an
ordinary method. Casts on the other hand are usually not implemented
via ordinary methods. (In C++ for example casts are operators which can
be overloaded.)

Kind regards

robert

Erik_Michaels-Ober · August 10, 2009, 8:01pm

Erik, when I’ve needed set-like behavior in ruby, I just used the Set
class. You should see if a set of hashes does what you need.
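
One way to read that suggestion is to treat each hash as a Set of [key, value] pairs. This is only a sketch under that assumption, with made-up sample data:

```ruby
require 'set'

a = { 0 => 1, 1 => 0 }
b = { 1 => 1 }
c = { 1 => 0, 2 => 2 }

# Hash#to_a yields [key, value] arrays, which Set compares by value.
pairs_a = Set.new(a.to_a)

difference   = pairs_a - Set.new(b.to_a)   # pairs in a but not in b
intersection = pairs_a & Set.new(c.to_a)   # pairs present in both

# Hash[] rebuilds a hash from an array of [key, value] pairs.
Hash[difference.to_a]     # => {0=>1, 1=>0}
Hash[intersection.to_a]   # => {1=>0}
```

Difference and intersection convert back cleanly. A union of pairs may hold two pairs with the same key, which cannot become a hash again without picking a winner.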

As far as having a .to_h method for Nil, I think you’ve got a good idea
there. Nil has to_a, to_c, to_f, to_i, to_r, and to_s; having a to_h
seems consistent and expected to me. I recommend you write a patch and
submit it to the ruby developers. Even if C isn’t your language, you
should be able to figure out something this simple by looking at the
source for similar methods.