Need some Ruby magic

hammed · December 7, 2005, 9:28pm

This is really just meant to be a simple hack. I think a better
algorithm could assign a random index to each element, then sort on
those indices.

http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
with a nice proof that it’s unbiased.

http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
shuffling algorithms.

martin

hammed · December 7, 2005, 4:14pm

On 07/12/05, [email protected] [email protected] wrote:

Why not
a.dup.sort { 2*rand(12345) <=> 12345 }

? that way, we’d have NO colisions ever.

What are you all trying to achieve by variations of
rand <=> rand,

why not simply use
rand(3) - 1
?

which is not in danger of decreasing randomness by combining random
numbers.

cheers,

Brian

–
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

hammed · December 7, 2005, 10:18pm

Martin DeMello wrote:

http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
with a nice proof that it’s unbiased.

http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
shuffling algorithms.

Thanks Martin!

–Steve

hammed · December 6, 2005, 10:19pm

On Wed, Dec 07, 2005 at 05:04:13AM +0900, Stephen W. wrote:

Stephen W. wrote:

I forgot to deal with collisions in the hash!

Here’s another shot… still faster than sort_by{rand} on my system, and
now should be unbiased:

def shuffle3
h = Hash.new
self.each do |v|
k = rand(1000000000) until !h.has_key?(k)

This is not equivalent to
begin
k = rand(1000000000)
end until !h.has_key?

Compare
k = 1 until true
k # => nil
to
begin; k = 1 end until true
k # => 1

  h[k] = v

You forgot
h.keys.sort.collect { |k| h[k] }

end
end

shuffle3 (10000) 3.625000 0.016000 3.641000 ( 3.656000)
sort_by (10000) 5.250000 0.000000 5.250000 ( 5.312000)

After fixing shuffle3, I get:

module Enumerable
def shuffle3
h = Hash.new
k = rand(1000000000)
self.each do |v|
k = rand(1000000000) until !h.has_key?(k)
h[k] = v
end
h.keys.sort.collect { |k| h[k] }
end

def shuffle4
h = {}
k = rand(1000000000)
self.sort_by{ k = rand(1000000000) while h.has_key?(k); h[k] = k }
end
end

GC.disable
(1…10).shuffle4 # => [8, 3, 10, 2, 4,
6, 7, 5, 9, 1]
a = 1…100000
t0 = Time.new # => Tue Dec 06
22:13:56 CET 2005
a.shuffle4
(t1 = Time.new) - t0 # => 0.766411
a.shuffle3
(t2 = Time.new) - t1 # => 0.575967
a.sort_by{ rand }
Time.new - t2 # => 0.637288

hammed · December 7, 2005, 4:22pm

[email protected] wrote:

Why not
a.dup.sort { 2*rand(12345) <=> 12345 }

? that way, we’d have NO colisions ever.

What you have posted does not avoid collisions.
You probably mean something like

a.sort {2*rand(12345) <=> 2 * rand(12345) + 1}

However, the problem of the sort {rand<=>rand}
methods are not collisions, but the skewed results
from the algorithm that assumes a consistent
comparison operator. Quoting from an earlier
post of mine:

ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result

=> {[2, 1, 0]=>25235,

   [2, 0, 1]=>12285,
   [1, 0, 2]=>12426,
   [0, 1, 2]=>25111,
   [1, 2, 0]=>12468,
   [0, 2, 1]=>12475}

You get basically the same result with your method

HTH,

Michael

hammed · December 6, 2005, 4:07pm

Uwe S. wrote:

from random import random

def shuffle(a):
b = [ (random(), i) for i in a]
b.sort()
return [ x[1] for x in b ]

print shuffle(range(10))

I would have thought those two functions do the same thing
with the exception that shuffle does not sort in place.
I’m probably missing something obvious. Could you explain
the difference between the two versions?

Michael