I was looking for a method to split an array into smaller arrays based
on some property of the members (set with a code block). I came up with
the following and wanted to know what you guys think about (a) whether
it's sensible and (b) whether it's been done already:
class Array
  def group_by
    result_array = []
    self.each do |element|
      key = yield element
      if (found = result_array.assoc(key))
        found[1] << element
      else
        result_array << [key, [element]]
      end
    end
    return result_array.collect{|a| a[1]}
  end
end
arr = %w(apple banana pear plum nectarine orange melon)
=> ["apple", "banana", "pear", "plum", "nectarine", "orange", "melon"]
# group by length of name
arr.group_by{|s| s.length}
=> [["apple", "melon"], ["banana", "orange"], ["pear", "plum"], ["nectarine"]]
# group by whether it has an e
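Here is the same assoc-based approach as a self-contained sketch, grouping by whether each name contains an "e". It is written as a standalone method (the name `group_into_arrays` is made up for this example) so it doesn't shadow the `Enumerable#group_by` that Ruby 1.8.7 and later already define.

```ruby
# Same assoc-based grouping as Array#group_by above, as a standalone method.
# group_into_arrays is a hypothetical name chosen to avoid the built-in.
def group_into_arrays(array)
  pairs = [] # [[key, [elements...]], ...] in order of first appearance
  array.each do |element|
    key = yield element
    if (found = pairs.assoc(key)) # linear scan for a pair whose first item == key
      found[1] << element
    else
      pairs << [key, [element]]
    end
  end
  pairs.map { |_key, members| members } # drop the keys, keep the groups
end

arr = %w(apple banana pear plum nectarine orange melon)
by_e = group_into_arrays(arr) { |s| s.include?("e") }
p by_e
# => [["apple", "pear", "nectarine", "orange", "melon"], ["banana", "plum"]]
```

Because `assoc` scans the pairs linearly, each element costs time proportional to the number of distinct keys seen so far. That is fine for a handful of groups but grows when there are many distinct keys.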
One thing I haven't done is make it sort the groups, since the grouping
key is just an object and most objects aren't sortable. I could get round
this, but not without slowing it down, and the user can always sort the
results by comparing the first member of each subarray.
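Sorting afterwards only needs the key recomputed from each group's first member. A minimal sketch, assuming the key is cheap to recompute and comparable (here the name length from the example above):

```ruby
# Groups as produced by the length example (order of first appearance).
groups = [["apple", "melon"], ["banana", "orange"], ["pear", "plum"], ["nectarine"]]

# Recompute each group's key from its first member and sort on it.
# Works here because Integer keys are comparable with <=>.
sorted = groups.sort_by { |members| members.first.length }
p sorted
# => [["pear", "plum"], ["apple", "melon"], ["banana", "orange"], ["nectarine"]]
```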
I'm just after some feedback really. I'm guessing there's already a good
implementation of this that I just couldn't find.
It seems sensible to me, although I'd use a hash instead of an
associative array - it just looks cleaner. I didn't check the
performance difference though.
class Array
  def group_by
    result = {}
    self.each do |element|
      (result[yield(element)] ||= []) << element
    end
    return result.values
  end
end
I think the Facets library already has a similar method.
-Adam
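A self-contained run of the hash-based version, as a standalone method under the made-up name `group_values_by`. One thing worth knowing: `result.values` follows insertion order only from Ruby 1.9 on; on 1.8.x a Hash had no guaranteed order, so the groups could come back in any order there.

```ruby
# Hash-based grouping as a standalone method (group_values_by is hypothetical).
# Each lookup is a hash lookup instead of assoc's linear scan.
def group_values_by(enum)
  result = {}
  enum.each do |element|
    (result[yield(element)] ||= []) << element
  end
  # Ruby 1.9+ hashes keep insertion order, so groups appear in order of
  # first appearance; on 1.8.x that order was not guaranteed.
  result.values
end

fruits = %w(apple banana pear plum nectarine orange melon)
p group_values_by(fruits) { |s| s.length }
# => [["apple", "melon"], ["banana", "orange"], ["pear", "plum"], ["nectarine"]]
```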
It’s also in Ruby 1.8.7 Enumerable.
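The built-in returns a Hash keyed by the block's result rather than an array of arrays; calling `values` on it gives the shape the original method returned. A short sketch, assuming Ruby 1.9+ so the hash keeps insertion order:

```ruby
fruits = %w(apple banana pear plum nectarine orange melon)

# Built-in since 1.8.7: Enumerable#group_by returns a Hash of key => Array.
by_length = fruits.group_by { |s| s.length }

# The arrays the original method returned are just the hash's values.
groups = by_length.values
p groups
# => [["apple", "melon"], ["banana", "orange"], ["pear", "plum"], ["nectarine"]]
```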
Another possibility: use a set.
require 'set'
fruits = %w(apple banana pear plum nectarine orange melon).to_set
p fruits.classify{|s| s.length}
p fruits.classify{|s| s.include?("e")}
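`Set#classify` returns a Hash whose values are Sets, so a little conversion is needed to get arrays back. A small sketch, which also shows a side effect of going through a Set: duplicate elements are dropped before grouping.

```ruby
require 'set'

fruits = %w(apple banana pear plum nectarine orange melon).to_set

# Set#classify returns a Hash mapping each block result to a Set of members.
by_e = fruits.classify { |s| s.include?("e") }

# Convert the Sets back to Arrays if that's the shape you need.
with_e    = by_e[true].to_a
without_e = by_e[false].to_a

# Caveat: building a Set first silently drops duplicate elements.
dupes = %w(plum plum pear)
p dupes.group_by(&:length)[4]              # both "plum"s kept
p dupes.to_set.classify(&:length)[4].to_a  # only one "plum" survives
```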
I thought about a hash first, but for some reason shied away from a hash
where the keys could be any object, including nil. There’s no reason to
be afraid of that though, is there? I think a hash would probably be
faster.
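Nothing special is needed for `nil` (or `false`) as a hash key. A sketch using a block that returns `nil` for names without an "e". It also shows one genuine behavioural difference from the `assoc` version: a Hash matches keys with `eql?`/`hash`, while `Array#assoc` uses `==`.

```ruby
fruits = %w(apple banana pear plum nectarine orange melon)

# String#index returns nil when there's no "e", so nil becomes a key
# like any other.
groups = {}
fruits.each { |s| (groups[s.index("e")] ||= []) << s }

p groups[nil]  # => ["banana", "plum"]
p groups[1]    # => ["pear", "nectarine", "melon"]

# Hash keys match with eql?/hash, Array#assoc with ==,
# so 1 and 1.0 count as the same key only in the assoc version.
p([[1, :int]].assoc(1.0))  # => [1, :int]
p({ 1 => :int }[1.0])      # => nil
```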
Ah… we're still on 1.8.6 round these parts. We really need to upgrade;
I keep seeing this cool stuff.
(investigates…) Ah yes, sets, I'd completely overlooked those. For some
reason the Set class is hard to find in the API docs, or at least hard
for me to find in this particular set of docs (RDoc Documentation).
Converting with to_set, then calling divide, then calling to_a again
can't be very efficient though, can it?
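For comparison, the full round trip from the question as one chain. A minimal sketch; note that a Set keeps no duplicates, so this only matches the array-based versions when the input has none.

```ruby
require 'set'

fruits = %w(apple banana pear plum nectarine orange melon)

# With a one-parameter block, Set#divide returns a Set of Sets, one per
# distinct block value; Enumerable#map then turns that into an Array of Arrays.
groups = fruits.to_set.divide { |s| s.length }.map(&:to_a)
p groups
```

Each step is roughly one pass over the data, so the extra cost is mostly the intermediate Set objects rather than anything worse than linear; only a benchmark on real data would say whether that matters in practice.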
Erik V. wrote:
Indeed, Facets does have an Enumerable#group_by. And it has an
Enumerable#cluster_by as well. And the latter is the one you’re
looking for, because you want an Array and not a Hash.
Facets (investigates again…): now that is very good indeed. Wow. I
had a feeling this would already exist in a better form.
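Without pulling in a library, a `cluster_by`-style helper can be approximated on top of the built-in `group_by` (1.8.7+). This is a sketch of the idea, not Facets' source; the top-level method is hypothetical and assumes the keys can be compared with `<=>`, which is exactly the sorting limitation raised earlier.

```ruby
# Hypothetical cluster_by-style helper: group, order the groups by key,
# and return only the arrays. Assumes keys are mutually comparable.
def cluster_by(enum, &block)
  enum.group_by(&block)
      .sort_by { |key, _members| key }
      .map { |_key, members| members }
end

fruits = %w(apple banana pear plum nectarine orange melon)
p cluster_by(fruits) { |s| s.length }
# => [["pear", "plum"], ["apple", "melon"], ["banana", "orange"], ["nectarine"]]
```

With keys that don't compare, such as the `true`/`false` from `include?("e")`, the `sort_by` call raises `ArgumentError`.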
If I wanted to do it myself, I'd probably do:
module Enumerable
  def group_by
    result = Hash.new {|h,k| h[k] = []}
    each {|el| result[yield el] << el}
    result
  end
end
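A standalone sketch of the `Hash.new`-with-block approach (the name `group_with_default` is made up so the built-in stays untouched). It shows why the block form is used instead of `Hash.new([])`, and one side effect of returning a hash that still carries its default proc.

```ruby
def group_with_default(enum)
  # The block runs only when a key is missing, and stores a fresh array there.
  result = Hash.new { |h, k| h[k] = [] }
  enum.each { |el| result[yield el] << el }
  result
end

groups = group_with_default(%w(apple banana pear plum)) { |s| s.length }
p groups[4]  # => ["pear", "plum"]

# Why not Hash.new([])? That shares ONE default array and never stores it,
# so << mutates the shared default and the hash itself stays empty.
shared = Hash.new([])
%w(apple pear).each { |s| shared[s.length] << s }
p shared.empty?  # => true
p shared[99]     # => ["apple", "pear"]

# Side effect of the block form: the default proc travels with the returned
# hash, so even reading a missing key inserts it.
p groups[0]       # => []
p groups.key?(0)  # => true

# Clearing the default proc (Ruby 2.0+) makes missing keys return nil again.
groups.default_proc = nil
p groups[42]      # => nil
```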
group_by uses each because it's faster than inject.
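Two equivalent versions side by side, one with `each` and one with `inject`, plus a `Benchmark` harness for comparing them on your own Ruby. The method names are hypothetical and timings will vary by Ruby version; the main structural difference is that `inject` has to hand the accumulator back from every iteration.

```ruby
require 'benchmark'

# Hypothetical names, for comparison only; both return the same Hash.
def group_with_each(enum)
  result = Hash.new { |h, k| h[k] = [] }
  enum.each { |el| result[yield el] << el }
  result
end

def group_with_inject(enum)
  enum.inject(Hash.new { |h, k| h[k] = [] }) do |acc, el|
    acc[yield el] << el
    acc # inject needs the accumulator returned from every iteration
  end
end

words = %w(apple banana pear plum nectarine orange melon) * 10_000

Benchmark.bm(7) do |x|
  x.report("each:")   { group_with_each(words) { |s| s.length } }
  x.report("inject:") { group_with_inject(words) { |s| s.length } }
end
```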