On Jun 19, 2006, at 9:08, Kroeger, Simon (ext) wrote:
I think it’s a little bit to simple to be put in the standard lib.
a.sort_by{rand}.each {|e| print e , ’ '}
But it turns out to not be that simple. This can give a biased sort
depending on the underlying sort algorithm, and it’s not as efficient
as it could be, as Alex observes:
On Jun 19, 2006, at 10:52, Alex Y. wrote:
For large collections, it’d be much nicer not to have to either
duplicate or sort - isn’t this something that quasirandom sequences
are good for? Is there some GSL-guru out there who can comment?
A quasi-random sequence is a theoretically better flavour of random
than a pseudo-random sequence in a lot of cases, but there’s no
direct application to this problem (apart from being used in place of
Kernel#rand). It’d be overkill for all but the most specialised
uses, I’d say.
On Jun 19, 2006, at 11:29, [email protected] wrote:
On second thought, that still sorts an array of the same size. Hmm…
What we’re looking for is a shuffle; a random selection of elements
from an array might not include all the elements from the source,
whereas a shuffle is a permutation of the source.
Like a lot of simple-sounding problems, this one’s already been
solved, see:
The linked Haskell implementation also gives a very thorough
discussion, including a demonstration of how a stable sorting
algorithm will yield biased results. The imperative solution given
in that discussion is identical to the Fisher-Yates shuffle on the
NIST site, which is what I’ll use here:
- for i: 0…n, exchange each element e_i with another element
e_i…e_n.
Which is the best possible case of O(n), compared to O(n lg n) when
using #sort_by{rand}. This is proven to be unbiased (see the
reference to Knuth given on the NIST site).
Intuitively, the best place for this seems like Enumerable, since
anything that can be sorted can be shuffled. Is this the sort of
thing that needs an RCR? Anyway, I doubt I’d use it myself, but I
can see the following arguments for its inclusion:
- It’s easy to do wrong:
- inefficiently and possibly biased using #sort_by.
- biased under any other sort of element exchange:
e.g., exchanging e_i with any element in the
array instead of just the unshuffled elements
leads to a biased shuffle.
- It seems fairly generic.
- People migrating from other languages may expect
it (e.g., PHP, Java).
On the other hand, people who REALLY care about unbiased shuffles
will already know how to do them.
Quick Ruby implementation for Array: In-place shuffling is the
simplest case. We can use it to create a more Ruby-esque shuffle by
using an approach similar to trans’ one given above: shuffle an array
of indices and use that as a mapping between the original and
shuffled arrays.
class Array
def shuffle!
self.each_index { |i|
r = rand(size - i) + i
tmp = self[i]
self[i] = self[r]
self[r] = tmp
}
end
this would work if dup produced a deep copy.
def dup_shuffle
self.dup.shuffle!
end
use an in-place shuffle of an array of indices
def shuffle
res = []
indices = (0…size).to_a.shuffle!
indices.each_with_index { |x, i|
block_given? ? yield(self[x]) : res[i] = self[x]
}
block_given? ? self : res
end
end
matthew smillie.