Is this a sensible implementation for Array#group_by?

I was looking for a method to split an array into smaller arrays based
on some property of the members (set with a code block). I came up with
the following and wanted to know what you guys think about a) whether
it’s sensible b) whether it’s been done already:

class Array
  def group_by
    result_array = []
    self.each do |element|
      key = yield element
      if (found = result_array.assoc(key))
        found[1] << element
      else
        result_array << [key, [element]]
      end
    end
    return result_array.collect{|a| a[1]}
  end
end

arr = %w(apple banana pear plum nectarine orange melon)
=> ["apple", "banana", "pear", "plum", "nectarine", "orange", "melon"]

# group by length of name
arr.group_by{|s| s.length}
=> [["apple", "melon"], ["banana", "orange"], ["pear", "plum"], ["nectarine"]]

# group by whether it has an e
arr.group_by{|s| (s =~ /e/) == nil}
=> [["apple", "pear", "nectarine", "orange", "melon"], ["banana", "plum"]]

One thing I haven't done is make it sort, since the grouping key is just
an object and most objects aren't sortable. I could get round this, but
not without slowing it down, and the user can always sort the results by
comparing the first member of each subarray.
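
As a rough sketch of that last point: the groups can be sorted afterwards by recomputing the key from the first member of each subarray. The helper name `group_in_order` and the constants are hypothetical, chosen only for this example, and the approach assumes the keys respond sensibly to `<=>`.

```ruby
# Hypothetical helper: same logic as the Array#group_by above,
# written as a plain method so that Array is not reopened.
def group_in_order(items)
  pairs = []
  items.each do |element|
    key = yield(element)
    if (found = pairs.assoc(key))
      found[1] << element
    else
      pairs << [key, [element]]
    end
  end
  pairs.collect { |pair| pair[1] }
end

FRUITS = %w(apple banana pear plum nectarine orange melon)

groups = group_in_order(FRUITS) { |s| s.length }

# Sort the finished groups by the key of each group's first member.
# This only works when the keys are comparable with each other.
SORTED_GROUPS = groups.sort_by { |group| group.first.length }
p SORTED_GROUPS
```

Every member of a group shares the same key, so the first member is enough to order the whole group.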

I'm just after some feedback really. I'm guessing I just couldn't find
the good implementation of it :)

On Mon, Sep 8, 2008 at 10:04 AM, Max W. wrote:

I'm just after some feedback really. I'm guessing I just couldn't find
the good implementation of it :)

It seems sensible to me, although I’d use a hash instead of a
associative array - it just looks cleaner. I didn’t check the
performance difference though.

class Array
  def group_by
    result = {}
    self.each do |element|
      (result[yield(element)] ||= []) << element
    end
    return result.values
  end
end

I think the Facets library already has a similar method.
-Adam
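
The performance question left open above could be checked with the standard Benchmark library. The following is only a sketch: the method names `group_with_assoc` and `group_with_hash` and the test data are assumptions made for this example, and no timings are quoted because they depend on the machine and the Ruby version.

```ruby
require 'benchmark'

# Hypothetical stand-in for the associative-array approach.
def group_with_assoc(items)
  pairs = []
  items.each do |element|
    key = yield(element)
    if (found = pairs.assoc(key))
      found[1] << element
    else
      pairs << [key, [element]]
    end
  end
  pairs.collect { |pair| pair[1] }
end

# Hypothetical stand-in for the hash approach.
def group_with_hash(items)
  result = {}
  items.each { |element| (result[yield(element)] ||= []) << element }
  result.values
end

# Assumed test data: 20,000 integers that fall into 500 groups.
numbers = (1..20_000).to_a

Benchmark.bm(6) do |bm|
  bm.report("assoc") { group_with_assoc(numbers) { |n| n % 500 } }
  bm.report("hash")  { group_with_hash(numbers)  { |n| n % 500 } }
end
```

Array#assoc scans the pairs one by one, so its cost grows with the number of distinct keys, while a hash lookup does not depend on how many keys are already stored. Note that the hash version returns the groups in first-seen order only on Ruby 1.9 and later, where hashes remember insertion order.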

Adam S. wrote:

It seems sensible to me, although I’d use a hash instead of a
associative array - it just looks cleaner. I didn’t check the
performance difference though.

-Adam

It's also in Ruby 1.8.7 Enumerable.
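
For reference, a short sketch of the built-in method on Ruby 1.8.7 or later. It returns a Hash rather than an Array of Arrays, so `values` is needed to get just the groups. The constant names are only for this example.

```ruby
FRUIT_NAMES = %w(apple banana pear plum nectarine orange melon)

# Enumerable#group_by maps each block result to the members that produced it.
BY_LENGTH = FRUIT_NAMES.group_by { |s| s.length }
p BY_LENGTH

# Hash#values drops the keys and leaves only the groups.
p BY_LENGTH.values
```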

Another possibility: use a set.

require 'set'
fruits = %w(apple banana pear plum nectarine orange melon).to_set
p fruits.classify{|s| s.length}
p fruits.classify{|s| s.include?("e")}
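
A small sketch of what comes back from `classify`: a Hash whose values are Set objects, so one more step is needed to get plain arrays. The constant names are assumptions for this example.

```ruby
require 'set'

FRUIT_SET = %w(apple banana pear plum nectarine orange melon).to_set

# Set#classify returns a Hash from each block result to a Set of members.
CLASSIFIED = FRUIT_SET.classify { |s| s.length }

# Convert every Set back to an Array to get an array of arrays.
CLASSIFIED_GROUPS = CLASSIFIED.values.map { |set| set.to_a }
p CLASSIFIED_GROUPS
```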

regards,

Siep

Adam S. wrote:

It seems sensible to me, although I’d use a hash instead of a
associative array - it just looks cleaner. I didn’t check the
performance difference though.

I thought about a hash first, but for some reason shied away from a hash
where the keys could be any object, including nil. There’s no reason to
be afraid of that though, is there? I think a hash would probably be
faster.
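
A quick sketch suggesting there is nothing to be afraid of: nil is an ordinary hash key, and so is any object with consistent `hash` and `eql?` behaviour, such as an Array. The helper `group_to_hash` and the constants are hypothetical names for this example.

```ruby
# Hypothetical helper: hash-based grouping that keeps the keys.
def group_to_hash(items)
  result = {}
  items.each { |element| (result[yield(element)] ||= []) << element }
  result
end

WORDS = %w(apple banana pear plum)

# String#=~ returns the match index or nil, so nil becomes one of the keys.
BY_E_INDEX = group_to_hash(WORDS) { |s| s =~ /e/ }
p BY_E_INDEX[nil]

# An Array works as a key too, because equal arrays hash alike.
BY_LENGTH_AND_INITIAL = group_to_hash(WORDS) { |s| [s.length, s[0, 1]] }
p BY_LENGTH_AND_INITIAL[[4, "p"]]
```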

It’s also in Ruby 1.8.7 Enumerable.

Ah, we're still on 1.8.6 round these parts. We really need to upgrade;
I keep seeing this cool stuff.

Another possibility: use a set.

investigates… Ah yes, sets, I'd completely overlooked those. For some
reason the Set class is hard to find in the API, or at least hard for me
to find in this particular API - RDoc Documentation

Converting to_set, then calling divide, then calling to_a again can’t be
very efficient though, can it?
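
For what it's worth, a sketch of that round trip. Each conversion is one extra pass over the data. A bigger difference from the array versions is that `to_set` discards duplicate members. The constant names are assumptions for this example.

```ruby
require 'set'

FRUIT_LIST = %w(apple banana pear plum nectarine orange melon)

# Array -> Set -> Set of subsets.
# With a one-argument block, Set#divide puts members with equal block
# results into the same subset.
SUBSETS = FRUIT_LIST.to_set.divide { |s| s.length }

# Set of subsets -> Array of Arrays.
ROUND_TRIP = SUBSETS.map { |subset| subset.to_a }
p ROUND_TRIP
```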

thanks

Adam Shelley wrote

I think the Facets library already has a similar method.

Erik V. wrote:

Indeed, Facets does have an Enumerable#group_by. And it has an
Enumerable#cluster_by as well. And the latter is the one you’re
looking for, because you want an Array and not a Hash.

Facets - investigates again… now that is very good indeed. Wow. I
had a feeling this would exist already in a better form :)

thanks a lot everyone.

2008/9/9 Max W.:

Facets - investigates again… now that is very good indeed. Wow. I
had a feeling this would exist already in a better form :)

If I wanted to do it myself, I'd probably write:

module Enumerable
  def group_by
    result = Hash.new {|h,k| h[k] = []}
    each {|el| result[yield(el)] << el}
    result
  end
end
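
One thing to keep in mind with the default-block hash, shown as a sketch: the returned hash keeps its default block, so merely looking up a missing key later inserts an empty group. The method name `group_with_default_block` and the constant are hypothetical, used only for this example.

```ruby
# Hypothetical standalone version of the default-block approach.
def group_with_default_block(items)
  result = Hash.new { |hash, key| hash[key] = [] }
  items.each { |element| result[yield(element)] << element }
  result
end

GROUPED = group_with_default_block(%w(pear plum apple)) { |s| s.length }

# Hash#fetch with a fallback does not run the default block.
p GROUPED.fetch(42, [])
p GROUPED.key?(42)

# Hash#[] on a missing key runs the default block and stores an empty Array.
p GROUPED[99]
p GROUPED.key?(99)
```

Callers that only want to test for a group can use `key?` or `fetch` to avoid growing the hash.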

Kind regards

robert

Indeed, Facets does have an Enumerable#group_by. And it has an
Enumerable#cluster_by as well. And the latter is the one you’re
looking for, because you want an Array and not a Hash.

Group_by uses each, because it’s faster than inject.

gegroet,
Erik V. - http://www.erikveen.dds.nl/


module Enumerable
  def group_by
    res = {}
    each{|e| (res[yield(e)] ||= []) << e}
    res
  end

  def cluster_by(&block)
    #group_by(&block).values # In case of unsortable keys.
    group_by(&block).sort.transpose.pop || []
  end
end


a = %w(apple banana pear plum nectarine orange melon)

a.group_by{|e| e.length}   # ==> {5=>["apple", "melon"], 6=>["banana", "orange"], 9=>["nectarine"], 4=>["pear", "plum"]}
a.cluster_by{|e| e.length} # ==> [["pear", "plum"], ["apple", "melon"], ["banana", "orange"], ["nectarine"]]
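
The commented-out line in cluster_by hints at the unsortable-key case. A possible way to combine both behaviours, offered only as a sketch: try the sorted version first and fall back to `values` when the keys cannot be compared. The method name `cluster` is hypothetical, and the sketch relies on the built-in Enumerable#group_by from Ruby 1.8.7 or later.

```ruby
# Hypothetical helper: sorted clusters when the keys are comparable,
# unsorted groups otherwise.
def cluster(items, &block)
  groups = items.group_by(&block)
  begin
    # Hash#sort yields [key, value] pairs ordered by key;
    # transpose.pop keeps only the values. An empty hash gives nil, hence || [].
    groups.sort.transpose.pop || []
  rescue ArgumentError, NoMethodError
    # Keys such as nil and integers cannot be compared with each other.
    groups.values
  end
end

CLUSTER_WORDS = %w(apple banana pear plum)

# Integer keys sort fine.
p cluster(CLUSTER_WORDS) { |s| s.length }

# Keys are 4, nil and 1 here, so sorting fails and the fallback is used.
p cluster(CLUSTER_WORDS) { |s| s =~ /e/ }
```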