#map, #select semantics

james-calivar · August 28, 2009, 11:47am

I imagine this has come up before, though I can’t find anything about
it. I
was talking to a friend last night who mentioned that in Smalltalk, the
#collect and #select methods (and some others) return the same type as
the
object they’re called on, and in Ruby they always return an Array, no
matter
what the receiver type is. This seems like a good idea to me, though
there
are some questions:

What do they mean when applied to a Hash? Do you want a list of
values,
or keys, or pairs? Or do you want to map each key and value to a
different
key and value, and have #map build a Hash for you? If so, how do you
represent that as a return value? Is it [key, value] or {key => value} ?
Is part of the contract of #map that you get back a collection of the
same size as the input? If so, what happens if you #map a Set and
produce
duplicates? The output will be smaller than the input (similarly for
hash
key collisions). Certainly from an FP viewpoint, a map is a one-to-one
transform from input to output values, so it would seem logical to have
as
many output values as input values.
There is no uniform method for adding items to a collection. We have
Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to
the
collection would involve changing pointers on existing members of the
collection. Is it possible to come up with a single method/message whose
role is to provide a uniform way to add something to any collection?

These are pretty much of the top of my head, but I’d be very interested
in
hearing discussion of pros and cons, and any history that shows why Ruby
differs from Smalltalk. I also heard that Ruby 2.0 will “correct” this
behaviour – is this true? Any discussion and/or links to further
reading
welcome.

P.S. This was raised through a discussion of JS.Class[1], which
implements
Enumerable, Hash and Set. The Enumerable methods return native
JavaScript
arrays, which don’t have the Enumerable methods – this causes some
frustration and breaks the Ruby-like model the library is designed to
support.

[1] http://jsclass.jcoglan.com

james-calivar · August 28, 2009, 12:13pm

Hi –

On Fri, 28 Aug 2009, James C. wrote:

I imagine this has come up before, though I can’t find anything about it. I
was talking to a friend last night who mentioned that in Smalltalk, the
#collect and #select methods (and some others) return the same type as the
object they’re called on, and in Ruby they always return an Array, no matter
what the receiver type is. This seems like a good idea to me, though there
are some questions:

I’m one of the few, I think, who prefer to get everything back in an
array. I like the idea of a uniform structure for result sets from
Enumerable operations. Among other things, it’s impossible for it to
work universally the other way, so it’s always going to be “return an
array, except for a few special cases.”

What do they mean when applied to a Hash? Do you want a list of values,
or keys, or pairs? Or do you want to map each key and value to a different
key and value, and have #map build a Hash for you? If so, how do you
represent that as a return value? Is it [key, value] or {key => value} ?

Hash#select and #reject return hashes in 1.9:

h = { 1 => 2, 3 => 4, 5 => 6 }
=> {1=>2, 3=>4, 5=>6}

h.select {|k,v| k > 1 }
=> {3=>4, 5=>6}

#map returns an array. I don’t think it can be otherwise, since you’re
only returning one value from the code block. Of course it would be
possible to write something like this:

module Enumerable
def map2hash
res = {}
each do |k,v|
res[k] = yield(v)
end
res
end
end

Is part of the contract of #map that you get back a collection of the
same size as the input? If so, what happens if you #map a Set and produce
duplicates? The output will be smaller than the input (similarly for hash
key collisions). Certainly from an FP viewpoint, a map is a one-to-one
transform from input to output values, so it would seem logical to have as
many output values as input values.

I think any quasi-mappish operation that departs from real map
semantics would have to be a different method. Mapping an enumerable
through a function to an array is just too basic and too useful not to
be available.

There is no uniform method for adding items to a collection. We have
Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to the
collection would involve changing pointers on existing members of the
collection. Is it possible to come up with a single method/message whose
role is to provide a uniform way to add something to any collection?

You could try – it should be implementable in Ruby, at least for
proof of concept.

David

james-calivar · August 28, 2009, 3:35pm

On Fri, Aug 28, 2009 at 5:42 AM, James C.[email protected]
wrote:

I imagine this has come up before, though I can’t find anything about it. I
was talking to a friend last night who mentioned that in Smalltalk, the
#collect and #select methods (and some others) return the same type as the
object they’re called on, and in Ruby they always return an Array, no matter
what the receiver type is. This seems like a good idea to me, though there
are some questions:

I’m not sure what you mean is a good idea, the Smalltalk way, the Ruby
way, or that there’s a difference?

Actually, it’s not completely true that in Smalltalk collect and
select return the same class (and in both Ruby and Smalltalk classes
are not types), as the receiver. In Smalltalk they return an instance
of the species of the receiver. The abstract Collection class
implements collect and select which calls self species to get the
class to instantiate the result.

The collect method then enumerates the receiver and adds each element
to the result using the add: method. Select pretty much is the same
except that adds the element conditional on what the block evaluation
returns.

The abstract implementation of species is just self class, but it is
overridden when that’s not appropriate. For example the Interval class
(ST’s rough equivalent of Range) overrides species to return Array.

I’m going to reorder your questions a bit.

Is part of the contract of #map that you get back a collection of the
same size as the input? If so, what happens if you #map a Set and produce
duplicates? The output will be smaller than the input (similarly for hash
key collisions). Certainly from an FP viewpoint, a map is a one-to-one
transform from input to output values, so it would seem logical to have as
many output values as input values.

Not in Smalltalk. If you send collect to a Set, you’ll get back a Set
of the results of evaluating the block for each element of the
receiver, which means that duplicates will be removed. So the result
Set may be smaller than the original.

role is to provide a uniform way to add something to any collection?
Smalltalk implements Dictionary (which is the rough equivalent of
Ruby’s Hash) as a subclass of Set. A Smalltalk Dictionary is a Set of
Associations. An instance of Association represents a key-value pair,
and two Associations are considered equal if the keys are equal. When
you enumerate a Dictionary in Smalltalk with the do: method (the rough
equivalent of #each), the values of the associations are enumerated.
There’s also an associationsDo: method which enumerates the
associations.

In Smalltalk there is a uniform method to add an item to a collection
add: So to add an item to a Dictionary you add an association.

So what about collect/select for Dictionaries.

In Smalltalk, Dictionary overrides collect: to return an instance of
Bag holding the values in the Dictionary, a Bag is just what it sounds
like, it’s a bag of objects which is unordered and may have
duplicates.

Dictionary also overrides select: to use associationsDo: instead of
do: it returns an instance of the species of the receiver (a
Dictionary for Dictionary instances, but maybe something else for a
subclass) after adding each association value from evaluating the
block argument for each association in the receiver. So in this case
you’d get back a Dictionary or something like a Dictionary.

All of this is based on “Blue-Book” Smalltalk-80, there might be some
slight variation in more modern Smalltalk implementations/dialects.

Now, is the Ruby or Smalltalk approach ‘better’ I guess if I had my
druthers, I guess I differ a bit from my friend David A. Black here,
and I’d prefer it if Ruby were a bit more Smalltalk like here, and
Ruby 1.9 has made some steps in that direction.

But it is what it is.

–
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

james-calivar · August 28, 2009, 3:59pm

Hi –

On Fri, 28 Aug 2009, Rick DeNatale wrote:

Now, is the Ruby or Smalltalk approach ‘better’ I guess if I had my
druthers, I guess I differ a bit from my friend David A. Black here,
and I’d prefer it if Ruby were a bit more Smalltalk like here, and
Ruby 1.9 has made some steps in that direction.

But it is what it is.

The problem is that it isn’t what it is Maybe it’s just Hash
that’s special-cased. It’s certainly impossible to return, say, a file
handle from doing fh.map, meaningless to return a range from
range.map (though range.map is sort of hybrid behavior anyway), and
maybe or maybe not meaningful for other classes, including
user-defined ones, to return an object of the same class.

I guess I like the simplicity of defining each, including Enumerable,
and getting pre-defined behavior. I know that Hash and Array already
override a few things, though, so it’s never been quite that simple.
Still, given the underlying mechanism, there’s no way to generalize
returning a same-class object.

David

james-calivar · August 28, 2009, 3:58pm

On Fri, Aug 28, 2009 at 6:11 AM, David A. Black[email protected]
wrote:

#map returns an array. I don’t think it can be otherwise, since you’re
only returning one value from the code block.

Both of these are more like Smalltalk. As I pointed out in another
posting to this thread, in the second case Smalltalk would return a
Bag of values from the receiver.

I honestly can’t recall ever using map with a Dictionary in Smalltalk,
or with a Hash in Ruby for that matter.

end
end

It’s trickier coming up with a map for Hash which enumerates over the
key,value pairs and allows the block to return a new key,value pair,
what do you do if there are duplicate keys?

Smalltalk finessed that question.

through a function to an array is just too basic and too useful not to
be available.

I think there are two sides to that. If you think of the world of
collections primarily as arrays, then this makes sense.

OTOH, if you think of collections more abstractly, then having map on
a Set return a Set which might be smaller because of duplication in
the results of the block is useful as well, and if you want the array
behavior you can make an Array from the set ant then map that.

There is no uniform method for adding items to a collection. We have
Array#<<, Hash#[]=, Set#add and Set#add?. For linked lists, adding to the
collection would involve changing pointers on existing members of the
collection. Is it possible to come up with a single method/message whose
role is to provide a uniform way to add something to any collection?

You could try – it should be implementable in Ruby, at least for
proof of concept.

The difficulty is that for some Ruby classes, I’m thinking of Hash in
particular here, it’s not clear what an element is, in Smalltalk every
collection is a collection of elements, with Dictionary elements being
key-value associations.

–
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

james-calivar · August 29, 2009, 6:51am

On Aug 28, 9:54 am, “David A. Black” [email protected] wrote:

I guess I like the simplicity of defining each, including Enumerable,
and getting pre-defined behavior. I know that Hash and Array already
override a few things, though, so it’s never been quite that simple.
Still, given the underlying mechanism, there’s no way to generalize
returning a same-class object.

Sure there is --or at least it can be made complete predictable. Just
define a special class method, eg.

class Array
def self.enumerable_instance; []; end
end

class Hash
def self.enumerable_instance; {}; end
end

Then a uniform “add”. Eg. #<< (btw I love Hash#<<) and the whole thing
can be generalized.

This has been discussed before, but for whatever reason the idea has
been disregarded. I suspect efficiency hacks have come to out way any
of these nice “uniformalizaions”.

T.

james-calivar · August 28, 2009, 4:12pm

Hi –

On Fri, 28 Aug 2009, Rick DeNatale wrote:

semantics would have to be a different method. Mapping an enumerable
through a function to an array is just too basic and too useful not to
be available.

I think there are two sides to that. If you think of the world of
collections primarily as arrays, then this makes sense.

I wouldn’t extrapolate a world-view from it None of this has to
be winner-take-all; there’s room for all of these methods to exist. I
just like having a foundational map operation that returns a result
set in an array. I’m not ruling out other kinds of functionality, nor
claiming that non-array collections are actually arrays.

For example:

class Hash
def map2 # or whatever
h = {}
each do |k,v|
k2, v2 = yield(k,v)
h[k2] = v2
end
h
end
end

h = {1 => 2, 3 => 4 }

p h.map2 {|k,v| [k*10, v-1] }

=> {30=>3, 10=>1}

Nothing wrong with having such a method. I just don’t think it’s a
candidate to replace Hash#map.

David

james-calivar · August 29, 2009, 2:37pm

On Aug 29, 5:50 am, “David A. Black” [email protected] wrote:

I don’t mean that something can’t be defined for every class; I mean
specifically that it will never be the case that every Enumerable
class will return an instance of itself from its iterations.

That’s true. And we would never really want it to. There are many
classes we may make Enumerable just to have the conveniences for some
internal property. For example, I once defined an #each method for
ObjectSpace, and made it enumerable. I sure wouldn’t want a new
ObjectSpace returned!

I think people who suggest the idea, don’t really mean exactly what
they think they do. That is to say, they see Set and Hash enumerable
methods returning Arrays and think, why not Sets and Hashes? Then they
take the mental leap to the idea that enumerables ought to return an
instance of the originating class. But on further reflection, they
would realize that the ideal solution is a predictable and
controllable system for determining the return container.

It’s a pretty powerful idea. It could even be designed to allow the
container to be changed on the fly. Let’s say we have

[[:a,1], [:b,2]].map{ |(k,v)| [k, v+1] } #=> [[:a,2], [:b,3]]

We might have something like:

[[:a,1], [:b,2]].using({}).map{ |(k,v)| [k, v+1] } #=>
{:a=>2, :b=>3}

james-calivar · August 29, 2009, 11:51am

Hi –

On Sat, 29 Aug 2009, 7rans wrote:

Sure there is --or at least it can be made complete predictable. Just
Then a uniform “add”. Eg. #<< (btw I love Hash#<<) and the whole thing
can be generalized.

I don’t mean that something can’t be defined for every class; I mean
specifically that it will never be the case that every Enumerable
class will return an instance of itself from its iterations.

David

–
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Ruby/Rails training, mentoring, consulting, code-review
Latest book: The Well-Grounded Rubyist (The Well-Grounded Rubyist)

September Ruby training in NJ has been POSTPONED. Details to follow.

james-calivar · August 29, 2009, 3:26pm

Hi,

In message “Re: #map, #select semantics”
on Sat, 29 Aug 2009 21:32:21 +0900, 7rans [email protected]
writes:

|Let’s say we have
|
| [[:a,1], [:b,2]].map{ |(k,v)| [k, v+1] } #=> [[:a,2], [:b,3]]
|
|We might have something like:
|
| [[:a,1], [:b,2]].using({}).map{ |(k,v)| [k, v+1] } #=> {:a=>2, :b=>3}

[[:a,1], [:b,2]].inject({}){|h,(k,v)| h[k]=v+1; h} #=> {:a=>2,
:b=>3}

          matz.

james-calivar · August 29, 2009, 5:14pm

Hi –

On Sat, 29 Aug 2009, Rick DeNatale wrote:

|We might have something like:
preference for having ruby enumeration work more like Smalltalk, from
a pragmatic perspective, I’d prefer it if changes wouldn’t break my
existing code.

Does that mean that if I learn Smalltalk, I have to want it to be more
like Ruby? (I know, I know.)

David

–
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Ruby/Rails training, mentoring, consulting, code-review
Latest book: The Well-Grounded Rubyist (The Well-Grounded Rubyist)

September Ruby training in NJ has been POSTPONED. Details to follow.

james-calivar · August 29, 2009, 5:15pm

Hi –

On Sat, 29 Aug 2009, Yukihiro M. wrote:

|
| [[:a,1], [:b,2]].using({}).map{ |(k,v)| [k, v+1] } #=> {:a=>2, :b=>3}

[[:a,1], [:b,2]].inject({}){|h,(k,v)| h[k]=v+1; h} #=> {:a=>2, :b=>3}

Or (at least in 1.9):

[[:a,1], [:b,2]].each_with_object({}){|(k,v),h| h[k]=v+1 }

David

–
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Ruby/Rails training, mentoring, consulting, code-review
Latest book: The Well-Grounded Rubyist (The Well-Grounded Rubyist)

September Ruby training in NJ has been POSTPONED. Details to follow.

james-calivar · August 29, 2009, 4:01pm

On Sat, Aug 29, 2009 at 9:25 AM, Yukihiro M.[email protected]
wrote:

|
| [[:a,1], [:b,2]].using({}).map{ |(k,v)| [k, v+1] } #=> {:a=>2, :b=>3}

[[:a,1], [:b,2]].inject({}){|h,(k,v)| h[k]=v+1; h} #=> {:a=>2, :b=>3}

+1

Proving that inside every map there’s an inject struggling to break
free!

Seriously, although my Smalltalk background gives me a philosophical
preference for having ruby enumeration work more like Smalltalk, from
a pragmatic perspective, I’d prefer it if changes wouldn’t break my
existing code.

–
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

james-calivar · August 30, 2009, 12:00am

On Sun, 30 Aug 2009, Rick DeNatale wrote:

On Sat, Aug 29, 2009 at 11:13 AM, David A. Black[email protected] wrote:

Does that mean that if I learn Smalltalk, I have to want it to be more
like Ruby?

No, but it wouldnt hurt you!

I would say: no, and it wouldn’t hurt me I’d be happy to know
more about Smalltalk, and I strongly suspect I wouldn’t want it to be
more like Ruby.

David

–
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Ruby/Rails training, mentoring, consulting, code-review
Latest book: The Well-Grounded Rubyist (The Well-Grounded Rubyist)

September Ruby training in NJ has been POSTPONED. Details to follow.

james-calivar · August 29, 2009, 11:43pm

On Sat, Aug 29, 2009 at 11:13 AM, David A. Black[email protected]
wrote:

Does that mean that if I learn Smalltalk, I have to want it to be more
like Ruby?

No, but it wouldnt hurt you!

–
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

james-calivar · August 30, 2009, 6:12pm

On Aug 29, 9:25 am, Yukihiro M. [email protected] wrote:

|
|[[:a,1], [:b,2]].using({}).map{|(k,v)| [k, v+1]} #=> {:a=>2, :b=>3}

[[:a,1], [:b,2]].inject({}){|h,(k,v)| h[k]=v+1; h} #=> {:a=>2, :b=>3}

Ok. There are lots of ways to do it. Often I actually will use the
faster:

h={};[[:a,1], [:b,2]].each{|(k,v)| h[k]=v+1}; h

But the ideal was to also achieve:

{:a=>1, :b=>2}.map{ |k,v| [k, v+1] } #=> {:a=>2, :b=>3}

Which is far more clear then the alternatives (and easily converted to
an Array with #to_a). The idea of #using is just icing on the cake,
but is itself clearer than #inject (or the new #each_with_object),
both of which strike me as primarily inventions to make up for the
lack of the above, which I imagine represents a large chunk of the use
cases.

However, it does break backwards compatibility, so I don’t expect it’s
uptake irregardless of it’s merits.