Thoughts on a more generic Array#partition function

Rudi_C · July 3, 2008, 5:36am

An experiment in a more generic partition function. The current
Array#partition function takes a code block and interprets the result
in a boolean way to create two lists. What if the user wants to create
more than two lists?

A common idea is to define a sort of indicator function that maps each
element of the array to a value that indicates which list to assign the
element to. Then, all elements that resulted in the same value would be
assigned the same partition list. This is a good system, but has one
small problem: It cannot return the results in a definitely ordered
Array because there is no way to know what order the return values are
unless
they are Sortable. Since TrueClass and FalseClass are not sortable, it
means that it is hopeless to try to preserve compatibility with the old
partition function without extending the TrueClass and FalseClass to at
least allow <=> comparisons between true,true and true,false, etc.

Therefore, in this demonstration I want to show four points:

It is possible to write a generic partition function that can sort
into
any number of lists and not just two. This generic partition function,
called npartition, can use the boolean types true and false as
well as any integer, string, symbol, or other code block return types
instead.
In the program below, I demonstrate using the boolean and integer modes
to
partition a small array of integers.
A generic partition function can be perfectly compatible with
the pre-existing two-way split partition function as long as users are
willing to apply a boolean cast in the cases where booleans are not
explicitly returned: return i must become return i ? true : false to be
safe
in those cases that previously relied on a Boolean context cast.
It is impossible to have a non-array-data-dependent (canonical)
ordering
for the returned partition lists unless we require the possible sorting
key (returned from code block) values to be Sortable. If we do not
require
this, then we would be forced to use the input key appearance sequence
as
the output ordering and this would shift depending on which element is
listed first in the Array. It is desirable to have a standard order
that
occurs regardless of the order of appearance in the input array. In
fact,
the current partition function effectively imposes an ordering on the
true
and false implicitly by its return ordering of true first then false.
It is convenient, therefore, to allow a (any) standard ordering to be
imposed on TrueClass and FalseClass. By not imposing a standard
ordering,
we make it difficult for others to do so in a safe way that won’t
conflict.
Yet, there are often cases where code could make great use of any
particular
ordering for true,false as is implied by mathematical lattice theory.
It
would also be convenient to include nil in this ordering: true nil false
I cannot tell if this is possible in Ruby without breaking other things.

class Array
def npartition(&cb)
labels = inject( { } ) { |n, a|
k = cb.call(a)
n[k] = [ ] unless n.has_key?(k)
n[k] << a
n
}
labels.keys.sort.map { |a| labels[a] }
end
end

returns a boolean sorting key given an integer

def bfunc(q)
q % 2 == 0
end

returns an integer sorting key given an integer

def ifunc(q)
q % 2
end

Reverse Alphabetical by class name for singleton classes only

def ofunc(sa, sb)
sb.class.to_s <=> sa.class.to_s
end

def testpart(a)
r1 = r2 = “failed”
begin
r1 = a.partition() { |i| yield i }
rescue
puts $!
end
puts “From partition: #{r1.inspect}”
begin
r2 = a.npartition() { |i| yield i }
rescue
puts $!
end
puts “From npartition: #{r2.inspect}”
end

a = [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
puts “Without TrueClass FalseClass ordering extension:”
puts “Boolean case:”
testpart(a) { |i| bfunc(i) }
puts “Integer case:”
testpart(a) { |i| ifunc(i) }
class TrueClass
def <=>(b) ofunc(self,b) end
end
class FalseClass
def <=>(b) ofunc(self,b) end
end
puts “\nWith TrueClass FalseClass ordering extension:”
puts “Boolean case:”
testpart(a) { |i| bfunc(i) }
puts “Integer case:”
testpart(a) { |i| ifunc(i) }

puts “\nSpecial (new extension) three-way list split”
testpart(a) { |i| i % 3 }
exit 0

Rudi_C · July 3, 2008, 5:48am

From: Rudi C. [mailto:[email protected]]

Subject: thoughts on a more generic Array#partition function

…

puts “\nSpecial (new extension) three-way list split”

testpart(a) { |i| i % 3 }

how does it compare to #group_by

RUBY_VERSION
=> “1.9.0”
a = [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
=> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]
a.group_by { |i| i % 3 }
=> {1=>[1, 4, 7, 13], 2=>[2, 11, 14], 0=>[3, 9, 12]}

RUBY_VERSION
=> “1.8.7”
a = [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
=> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]
a.group_by { |i| i % 3 }
=> {0=>[3, 9, 12], 1=>[1, 4, 7, 13], 2=>[2, 11, 14]}

kind regards -botp

Rudi_C · July 3, 2008, 6:05am

Hi,

In message “Re: thoughts on a more generic Array#partition function”
on Thu, 3 Jul 2008 12:31:51 +0900, “Rudi C.”
[email protected] writes:

|An experiment in a more generic partition function. The current
|Array#partition function takes a code block and interprets the result
|in a boolean way to create two lists. What if the user wants to create
|more than two lists?

How about using group_by method found in 1.9/1.8.7?

          matz.

Rudi_C · July 3, 2008, 6:20am

Ooops, I just realized npartition requires eql? and hash methods
defined too of course.
So there is not much difference left. I am wondering if it is worth
it for the language
otherwise though to consider each of the small adjustments.

Best regards,

Rudi

On Wed, Jul 2, 2008 at 9:13 PM, Rudi C. [email protected]
wrote:

Hi Botp,
This is not so bad, but perhaps has a small disadvantage that it
requires eql? and hash methods defined unlike npartition. But I
^^ meant
like

Rudi_C · July 3, 2008, 6:16am

Hi Botp,

On Wed, Jul 2, 2008 at 8:44 PM, Peña, Botp [email protected] wrote:

From: Rudi C. [mailto:[email protected]]

Subject: thoughts on a more generic Array#partition function

…

puts “\nSpecial (new extension) three-way list split”

testpart(a) { |i| i % 3 }

how does it compare to #group_by

It is very similar. Thank you Botp for calling it to my attention.
And thank you Matz for reading my email.
The differences I can understand are as follows:

group_by requires eql? and hash methods defined for return types.
npartition requires Sortable .
group_by returns a hash, npartition returns an array of arrays

Now that I see group_by I can see that it is not hard to implement the
general partition function using it as just:
h = a.group_by() { |i| func(i) }
h.keys.sort.map { |k| h[k] }

This is not so bad, but perhaps has a small disadvantage that it
requires eql? and hash methods defined unlike npartition. But I
cannot figure out a simpler version to get array of array’s output.
Best regards,

Rudi

Rudi_C · July 3, 2008, 6:57am

From: Rudi C. [mailto:[email protected]]

h = a.group_by() { |i| func(i) }

h.keys.sort.map { |k| h[k] }

This is not so bad, but perhaps has a small disadvantage that it

requires eql? and hash methods defined unlike npartition. But I

cannot figure out a simpler version to get array of array’s output.

just apply #values to it,

h=a.group_by { |i| i % 3 }
=> {0=>[3, 9, 12], 1=>[1, 4, 7, 13], 2=>[2, 11, 14]}
h.values
=> [[3, 9, 12], [1, 4, 7, 13], [2, 11, 14]]

kind regards -botp

Rudi_C · July 3, 2008, 2:17pm

2008/7/3 Rick DeNatale [email protected]:

Not exactly the same, values doesn’t guarantee the desired ordering by key.

That’s easily fixed:

arbitrary_hash.sort_by {|k,| k}.map {|k,v| v}

Having said that the #group_by implementation seems the more general
solution (because of less requirements, i.e. keys do not need to
provide <=>).

Kind regards

robert

Rudi_C · July 3, 2008, 1:44pm

On Thu, Jul 3, 2008 at 12:53 AM, PeÃ±a, Botp [email protected]
wrote:

From: Rudi C. [mailto:[email protected]]

h = a.group_by() { |i| func(i) }

h.keys.sort.map { |k| h[k] }

This is not so bad, but perhaps has a small disadvantage that it

requires eql? and hash methods defined unlike npartition. But I

cannot figure out a simpler version to get array of array’s output.

just apply #values to it,

Not exactly the same, values doesn’t guarantee the desired ordering by
key.
Ruby 1.9 orders hashes by insertion history, Ruby < 1.8.7 doesn’t order
them
at all, not sure about 1.8.7

–
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Rudi_C · July 3, 2008, 3:45pm

On Thu, Jul 3, 2008 at 3:16 PM, Rudi C. [email protected]
wrote:

small typing difference (Arrays versus hashses – can go either way here)

less requirements for group_by

Thanks for the great input so far. Best regards,

Maybe I’m being dense, but isn’t the expression you assign to labels
in your definition of npartition an implementation of group_by? If so,
I would expect npartition to be slower than Ruby’s group_by unless
your implementation of group_by is more efficient than Ruby’s.

Peter

Rudi_C · July 3, 2008, 3:21pm

On Thu, Jul 3, 2008 at 5:13 AM, Robert K.
[email protected] wrote:

solution (because of less requirements, i.e. keys do not need to
provide <=>).

Hi Robert,

I have come to agree with your analysis. I think this one-line
expression you made is very good: As far as source-code brevity,
clarity, and correctness this expression you invented gets high marks
because there are no dangling outer variable etc and that is great.
But my small problem now is the extra code-block calls during runtime.
I guess there is one more code-block call per element in the
expression as written compared to npartition. So it is now down to a

small efficiency difference (probably npartition is slightly faster
but untested)
small typing difference (Arrays versus hashses – can go either way
here)
less requirements for group_by

Thanks for the great input so far. Best regards,

Rudi

Rudi_C · July 3, 2008, 5:04pm

2008/7/3 Rudi C. [email protected]:

Having said that the #group_by implementation seems the more general
solution (because of less requirements, i.e. keys do not need to
provide <=>).

I have come to agree with your analysis. I think this one-line
expression you made is very good: As far as source-code brevity,
clarity, and correctness this expression you invented gets high marks
because there are no dangling outer variable etc and that is great.

Oh, thank you!

But my small problem now is the extra code-block calls during runtime.
I guess there is one more code-block call per element in the
expression as written compared to npartition. So it is now down to a

small efficiency difference (probably npartition is slightly faster
but untested)

I’d be guessing that #group_by is faster - because it’s implemented in
C. But you should benchmark for a definitive answer. I have been
surprise more often than not - following big O calculus can be quite
misleading at times.

Cheers

robert

Rudi_C · July 4, 2008, 3:18am

From: Rick DeNatale [mailto:[email protected]]

Not exactly the same, values doesn’t guarantee the desired

ordering by key. Ruby 1.9 orders hashes by insertion history,

Ruby < 1.8.7 doesn’t order them

at all, not sure about 1.8.7

current test:

1.9 not sorted
1.8.6 not sorted
1.8.7 sorted

i prefer 1.8.7 default behaviour

kind regards -botp

Rudi_C · July 4, 2008, 9:24am

From: Yukihiro M. [mailto:[email protected]]

[email protected] writes:

|current test:

|

|1.9 not sorted

|1.8.6 not sorted

|1.8.7 sorted

|

|i prefer 1.8.7 default behaviour

Hmm, from my understanding, 1.9 preserves the order, prior version

does not. Could you show us the test you got the above conclusion?

sir Matz, as requested,

[RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_REVISION]
=> [“1.9.0”, “2008-06-20”, 17482]

RUBY_PLATFORM
=> “i386-mswin32”

a = [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
=> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]

a.group_by { |i| i % 3 }
=> {1=>[1, 4, 7, 13], 2=>[2, 11, 14], 0=>[3, 9, 12]}

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
at this point, 1.9 and 1.8.7 differs

h=a.group_by { |i| i % 3 }
=> {1=>[1, 4, 7, 13], 2=>[2, 11, 14], 0=>[3, 9, 12]}

h.values
=> [[1, 4, 7, 13], [2, 11, 14], [3, 9, 12]]

kind regards -botp

Rudi_C · July 4, 2008, 8:32am

Hi,

In message “Re: thoughts on a more generic Array#partition function”
on Fri, 4 Jul 2008 10:14:18 +0900, Peña, Botp [email protected]
writes:

Hmm, from my understanding, 1.9 preserves the order, prior version
does not. Could you show us the test you got the above conclusion?

          matz.

Rudi_C · July 4, 2008, 9:31am

2008/7/4 Peña, Botp [email protected]:

does not. Could you show us the test you got the above conclusion?

=> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]

a.group_by { |i| i % 3 }
=> {1=>[1, 4, 7, 13], 2=>[2, 11, 14], 0=>[3, 9, 12]}

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
at this point, 1.9 and 1.8.7 differs

Where exactly? I cannot seem to see it. Above and below look
identical. What am I missing?

h=a.group_by { |i| i % 3 }
=> {1=>[1, 4, 7, 13], 2=>[2, 11, 14], 0=>[3, 9, 12]}

I get

09:29:30 bas$ ruby --version
ruby 1.8.7 (2008-06-20 patchlevel 22) [i386-cygwin]
09:29:40 bas$ ruby -e ‘p [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ].group_by
{ |i| i % 3 }’
{0=>[3, 9, 12], 1=>[1, 4, 7, 13], 2=>[2, 11, 14]}

09:29:43 bas$ ruby19 --version
ruby 1.9.0 (2008-03-01 revision 15664) [i386-cygwin]
09:29:47 bas$ ruby19 -e ‘p [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14
].group_by { |i| i % 3 }’
{1=>[1, 4, 7, 13], 2=>[2, 11, 14], 0=>[3, 9, 12]}

And there is a difference.

Kind regards

robert

Rudi_C · July 4, 2008, 5:35pm

On Fri, Jul 4, 2008 at 3:27 AM, Robert K.
[email protected]
wrote:

Hmm, from my understanding, 1.9 preserves the order, prior version

=> {1=>[1, 4, 7, 13], 2=>[2, 11, 14], 0=>[3, 9, 12]}
ruby 1.9.0 (2008-03-01 revision 15664) [i386-cygwin]
09:29:47 bas$ ruby19 -e ‘p [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14
].group_by { |i| i % 3 }’
{1=>[1, 4, 7, 13], 2=>[2, 11, 14], 0=>[3, 9, 12]}

And there is a difference.

Yes there is. And I suspect that some following this thread are
confused by
the difference between sorted and ordered.

Ruby 1.8 (at least before 1.8.7) gives no specification of the order in
which a hash will yield it’s keys, values or key-value pairs for methods
in
the each family.

Ruby 1.9 keeps track of the INSERTION order so what we are seeing in the
1.9
output is a result of the fact that group_by inserts a key value pair
each
time it encounters a NEW value from the block., so with the array [1, 2,
3,
…] the block (which returns element % 3) will first return 1, then 2,
then
0, after which all of the possible return values have been exhausted.
So
the hash is ordered by key as 1 => … , 2=> …, 3=>…

In the case of Ruby 1.8, the enumeration order is an accident of the way
the
key values hash, since the hash value of a fixnum n in MRI Ruby 1.8 is
n*2+1, in this case the enumeration order is somewhat predictable.

But in general hashes are not enumerated in key-sort order in any
version
of Ruby as far as I know.

–
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/