On Tue, Dec 06, 2005 at 08:23:52AM +0900, [email protected] wrote:
the big difference is that a value will have rand called multiple times for
That might be the very reason why it is ultimately much more biased than
sort_by{ rand }… I’m keeping the latter.
If the limited precision of rand() became a problem, I could always use
arr.sort_by{ [rand, rand] } where collisions happen with a probability
around 6.16298e-19 for a 10_000_000 element array, or more generally
Array.new(N){rand} (but then we’d have to look into MT for another
possible source of residual bias).
it - in otherwords an element compareds differently every time. this is
bound to help shuffle more uniformly it seems:
We cannot trust our intuition in these cases; in the following example,
the
tenth element is more likely to stay where it was (look at pos[9]):
a = (1…100); pos = Array.new(100, 0)
t = Time.new; 100000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 }
Time.new - t
=> 223.891608
pos
=> [1176, 1208, 1083, 1153, 1191, 1130, 1098, 1133, 1189, 1438, 1150,
1043,
========
1121, 1067, 1076, 1069, 1078, 1113, 1129, 1143,1115, 1074, 1081, 1069,
1057,
[…]
t = Time.new; 10000000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1
}
Time.new - t
=> 22781.664168
pos
=> [105944, 105861, 105546, 105681, 104059, 101196, 99080, 100675,
110851,
132223, 101670, 96018, 96147, 96688, 97709, 99051, 97823, 97345,
98799,
[…]
The bias can be observed easily with smaller arrays:
a = (1…20); pos = Array.new(20, 0)
t = Time.new
100000.times{ pos[a.sort{rand <=> rand}.index(3)] += 1 }
Time.new - t # => 20.848851
pos
=> [5084, 5694, 6561, 4326, 4205, 4792, 4913, 4827, 4791, 4571, 4560,
4707,
4863, 4934, 5112, 5011, 5071, 5326, 5482, 5170]
a = (1…10); pos = Array.new(10, 0)
t = Time.new
1_000_000.times{ pos[a.sort{rand <=> rand}.index(5)] += 1 }
Time.new - t # => 77.438912
pos
=> [113836, 103238, 99442, 88682, 101306, 83294, 93296, 100830,
100744, 115332]
Compare the latter to
a = (1…10); pos = Array.new(10, 0)
t = Time.new
1_000_000.times{ pos[a.sort_by{rand}.index(5)] += 1 }
Time.new - t # => 32.163499
pos
=> [99822, 100020, 100328, 100040, 99603, 100098, 100195, 100023,
99875, 99996]