I'd like to sort collections randomly. This is what I tried first: my_array.sort { |a,b| rand(2) } > but the results weren't very random. I then tried the following: class Array > def random_copy > b = Array.new > while b.length < self.length > random_value = self[rand self.length] > b.push random_value unless b.include? random_value > end > b > end > end This works but I'm sure there's some Ruby magic for doing this better. thanks

on 2005-12-01 20:45

on 2005-12-01 20:48

hammed wrote: > I'd like to sort collections randomly. This is what I tried first: > > my_array.sort { |a,b| rand(2) } my_array.sort_by { rand }

on 2005-12-01 20:49

Hi -- On Fri, 2 Dec 2005, Hammed Malik wrote: >> while b.length < self.length >> random_value = self[rand self.length] >> b.push random_value unless b.include? random_value >> end >> b >> end >> end > > > > This works but I'm sure there's some Ruby magic for doing this better. I think the most common idiom is: array.sort_by { rand } David -- David A. Black dblack@wobblini.net "Ruby for Rails", forthcoming from Manning Publications, April 2006!

on 2005-12-01 22:51

Take a look @ the Facets project. There are extension methods for array to have a number of randomized behaviors. j. On 12/1/05, Vincent Foley <vfoley@gmail.com> wrote: > > Maybe there should be Array#randomize... > > > -- "Remember. Understand. Believe. Yield! -> http://ruby-lang.org" Jeff Wood

on 2005-12-03 18:33

In article <dd3f270e4d20842e121bb970bc9a8386@ruby-forum.com>, Jim Weirich <jim-keyword-rforum.c88827@weirichhouse.org> wrote: > hammed wrote: > > I'd like to sort collections randomly. This is what I tried first: > > > > my_array.sort { |a,b| rand(2) } > > my_array.sort_by { rand } A quick test seems to indicate that this works with the ruby 1.8.2 implementation on my Mac. However, there is no guarantee that this call will select each of the n! permutations with equal probability. Whether it does will depend on the exact algorithm used by 'sort'. For instance, if sort uses the (inefficient for large arrays) bubble sort, the last key will end up staying there in half the cases. Reinder

on 2005-12-03 20:48

On Dec 3, 2005, at 9:32 AM, Reinder Verlinde wrote: > However, there is no guarantee that this call > will select each of the n! permutations with equal probability. How about just doing this? class Array def shuffle newarr = self self.length.times do |x| y = rand(self.length) newarr[x], newarr[y] = newarr[y], newarr[x] end newarr end end shuffled_array = my_array.shuffle --Steve

on 2005-12-03 22:23

On 12/4/05, Reinder Verlinde <reinder@verlinde.invalid> wrote: > A quick test seems to indicate that this works with the ruby 1.8.2 > implementation on my Mac. However, there is no guarantee that this call > will select each of the n! permutations with equal probability. > > Whether it does will depend on the exact algorithm used by 'sort'. For > instance, if sort uses the (inefficient for large arrays) bubble sort, > the last key will end up staying there in half the cases. Actually, sort_by caches every value passed and uses those for comparison (aka a Schwartzian Transform), so it would essentially return the same results as sorting a list of random numbers, regardless of algorithm. Sam

on 2005-12-04 00:48

reinder wrote: >> my_array.sort_by { rand } > > A quick test seems to indicate that this works with the ruby 1.8.2 > implementation on my Mac. However, there is no guarantee that this call > will select each of the n! permutations with equal probability. Actually, it will. sort_by uses a Schwartzian transform to do the sorting. That means that rand is only called once for each element of the array. As long as rand gives a decent distribution of random numbers, the permutation will be random too. Here's a little experiment: ------------------------------------------------- N = (ARGV.shift || 1000).to_i A = %w(a b c) Perms = %w(abc acb bac bca cab cba) Score = Hash.new { |h, k| h[k] = 0 } N.times do sorted = A.dup.sort_by { rand } Score[sorted.join("")] += 1 end Score.keys.sort.each do |key| puts "#{key}: #{Score[key]}" end ------------------------------------------------- Gives: $ ruby sortdist.rb 100000 abc: 16693 acb: 16688 bac: 16752 bca: 16590 cab: 16475 cba: 16802 Thats a pretty good distribution for the number of iterations. -- Jim Weirich

on 2005-12-04 02:06

```
On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim Weirich wrote:
> numbers, the permutation will be random too.
sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.
%w[a b c d].sort_by{ 0 } # => ["a", "b", "c",
"d"]
i = 0
%w[a b c d].sort_by{ 10 - (i += 1) / 2 } # => ["d", "b", "c",
"a"]
This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.
In praxis, the bias this introduces is often so small that the mere
succinctness of sort_by{ rand } more than makes up for it.
```

on 2005-12-04 02:22

On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote: >> the array. As long as rand gives a decent distribution of random > more) pair of elements of the original array are a bit more probable. > > In praxis, the bias this introduces is often so small that the mere > succinctness of sort_by{ rand } more than makes up for it. does %w( a b c d ).sort{ rand <=> rand } help? -a

on 2005-12-04 16:59

> sort_by{ rand } is actually biased, since sort_by will preserve the > relative order of the elements for which rand() returned the same value. How frequently would rand return a same value? (In theory it would be never but does anyone know the reality?)

on 2005-12-04 17:56

On Dec 4, 2005, at 7:59 AM, Brian Buckley wrote: > How frequently would rand return a same value? (In theory it would be > never but does anyone know the reality?) According to the Pickaxe, Kernel.rand is an implementation of the [Mersenne Twister][1] algorithm. If that is still the case, then in reality it will begin repeating exactly the same numbers after 2^19937-1 (approx. 4e6001) pseudo-random numbers, as that's the period of the MT. The period is the point at which the exact same series of PRNs will begin again. For example, if it started out as 1, 2, 3, ..., then after 2^19937-1, it'd start again at 1, 2, 3. But, that's not what you really care about. Numbers will certainly repeat within the period, since the period is so much larger than our integer representation. For example, a 32 bit integer can only represent about 4e9. So, the likelihood of getting any specific integer, in the 32 bit case, is 1/(2^32). This is an extreme example: 5.times { p rand(2) } 5.times { p rand(1000) } The bottom line is that the PRNs should be approximately uniformly distributed. Apply that to your specific range, and you have your answer. --Steve [1]: http://en.wikipedia.org/wiki/Mersenne_twister

on 2005-12-04 18:17

On Dec 4, 2005, at 8:54 AM, Stephen Waits wrote: > Numbers will certainly repeat within the period, since the period > is so much larger than our integer representation. For example, a > 32 bit integer can only represent about 4e9. So, the likelihood of > getting any specific integer, in the 32 bit case, is 1/(2^32). I should add to this that you can approximate when you'll get repeats, and it's likely to be much sooner than you'd guess. For more information, see the [Birthday Problem][1]. --Steve [1]: http://mathworld.wolfram.com/BirthdayProblem.html

on 2005-12-04 21:33

> end > end > > shuffled_array = my_array.shuffle "Just" doing that 1) changes the receiver (use new_arr = self.dup instead) 2) definitely has a bias. Look e.g. at an array of length three. You will first swap the first element with 3 possibilities, then the second and so on; you're going through 3**3 possibilities. There are 3! possible permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 == 27, thus you have a bias.

on 2005-12-04 22:22

On Dec 4, 2005, at 12:32 PM, Kero wrote: > 1) changes the receiver (use new_arr = self.dup instead) > 2) definitely has a bias. Look e.g. at an array of length three. > You will > first swap the first element with 3 possibilities, then the > second and so > on; you're going through 3**3 possibilities. There are 3! possible > permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 > == 27, > thus you have a bias. Good points, thanks. How about this? class Array def shuffle newarr = self.dup 2.times do self.length.times do |x| y = rand(self.length) newarr[x], newarr[y] = newarr[y], newarr[x] end end newarr end end # test code lifted from Jim Weirich's earlier post N = (ARGV.shift || 1000).to_i A = %w(a b c) Perms = %w(abc acb bac bca cab cba) Score = Hash.new { |h, k| h[k] = 0 } N.times do sorted = A.shuffle Score[sorted.join("")] += 1 end Score.keys.sort.each do |key| puts "#{key}: #{Score[key]}" end [~/Code/private/code/snippets] 382% ./shuffle.rb 100000 abc: 16596 acb: 16394 bac: 16700 bca: 16684 cab: 16841 cba: 16785 0:07.87 seconds, 96.4% CPU This is really just meant to be a simple hack. I think a better algorithm could assign a random index to each element, then sort on those indices. --Steve

on 2005-12-05 15:50

> repeats, and it's likely to be much sooner than you'd guess. For > more information, see the [Birthday Problem][1]. > > --Steve > > [1]: http://mathworld.wolfram.com/BirthdayProblem.html I liked the Birthday Problem. Thanks.

on 2005-12-05 21:26

On Sun, Dec 04, 2005 at 10:21:02AM +0900, ara.t.howard@noaa.gov wrote: > On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote: > >sort_by{ rand } is actually biased, since sort_by will preserve the > >relative order of the elements for which rand() returned the same value. I did some numbers and found the following probability estimates: array size P(#rand() collision) 1000 5.54558e-11 10000 5.55056e-09 1000000 5.55096e-05 10000000 5.53574e-03 A collision implies that the associated pair of elements has got the same ordering in the output array, instead of 50-50 chances of it being reversed. More info, including the Ruby code I used, available at http://eigenclass.org/hiki.rb?sort_by+rand+is+biased > does > > %w( a b c d ).sort{ rand <=> rand } > > help? I'm not totally sure it's really unbiased, but at first sight it looks good. When called with no arguments, rand will use static double genrand_real(void) { unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6; return(a*67108864.0+b)*(1.0/9007199254740992.0); } so it would seem that two consecutive calls to rand() cannot return the same value, but I still have to think about the effect of qsort. It is however somewhat slower than sort_by{ rand }. [100, 1000, 10000, 100000].each do |n| GC.start t0 = Time.new (1..n).sort_by{rand} a = (t1 = Time.new) - t0 # => 0.000425, 0.003002, 0.054392, 0.939692 (1..n).sort{rand <=> rand} b = Time.new - t1 # => 0.001161, 0.026275, 0.289807, 3.791833 "%4.2f" % [b / a] # => "2.73", "8.75", "5.33", "4.04" end RUBY_VERSION # => "1.8.3" RUBY_RELEASE_DATE # => "2005-09-24"

on 2005-12-05 23:45

Mauricio Fernández wrote: >> does >> >> %w( a b c d ).sort{ rand <=> rand } >> >> help? > > It is however somewhat slower than sort_by{ rand }. It's considerably slower. Changing it to sort { rand(10000) <=> 5000 } shows a 30% speedup on 10k element Arrays on my system. Don't know why I chose those numbers, I suppose bigger numbers reduce the "0" comparison, considering { rand(3) <=> 1 }. Anyway, I put together a small benchmark to compare my (revised) shuffle, to sort, and sort_by: #!/usr/bin/env ruby class Array def shuffle newarr = self.dup 2.times do self.length.times do |x| y = rand(self.length) newarr[x], newarr[y] = newarr[y], newarr[x] end end newarr end end printf(" size shuffle sort sort_by\n") [10, 100, 1000, 10000].each do |n| # populate an Array size n a = Array.new(n) { |i| i } # my method start = Time.new 100.times do b = a.shuffle end a_time = Time.new - start # Array's sort start = Time.new 100.times do b = a.sort { rand(10000) <=> 5000 } end b_time = Time.new - start # Enumerable's sort_by start = Time.new 100.times do b = a.sort_by { rand } end c_time = Time.new - start # results printf("%6d %5.2f %5.2f %5.2f\n", n, a_time, b_time, c_time) end Which, on my P4/3.0GHz (Win32) emits this: size shuffle sort sort_by 10 0.00 0.02 0.00 100 0.16 0.13 0.03 1000 1.61 2.03 0.44 10000 16.31 28.14 4.78 --Steve

on 2005-12-06 00:26

On Tue, 6 Dec 2005, Mauricio [iso-8859-1] Fernández wrote: > 1000000 5.55096e-05 >> %w( a b c d ).sort{ rand <=> rand } >> >> help? > > I'm not totally sure it's really unbiased, but at first sight it looks good. the big difference is that a value will have rand called multiple times for it - in otherwords an element compareds differently every time. this is bound to help shuffle more uniformly it seems: harp:~ > cat a.rb %w( a b c d ).sort{|a,b| ra = rand; rb = rand; p a => ra; p b => rb; ra <=> rb } harp:~ > ruby a.rb {"a"=>0.181439380218727} {"c"=>0.579403886629126} {"c"=>0.713513989939958} {"d"=>0.573687508540169} {"a"=>0.851534740906118} {"d"=>0.00156074151427565} {"b"=>0.803591007691647} {"a"=>0.494066110872563} {"b"=>0.834676789626288} {"c"=>0.792106134460754} so, note that "a" was sorted using a different value each time. can anyone who remembers stats and knows the sort algorithm weigh in? -a

on 2005-12-06 11:33

|| || > More info, including the Ruby code I used, available at || > http://eigenclass.org/hiki.rb?sort_by+rand+is+biased || > || >> does || >> || >> %w( a b c d ).sort{ rand <=> rand } || >> || >> help? Hi, I'm new to ruby, but I use Python for some years now. So I suggest the following 'pythonic' solution ;-) def shuffle(a) b=[] # decorate a.each { |x| b << [rand,x]; } # sort b.sort! { |x,y| x[0] <=> y[0] } # undecorate b.collect { |x| x[1] } end a=%w{a b c d e f} puts shuffle a Is there a shorter or more elegant way to do this 'decorate-sort-undecorate' in ruby ? In Python it is quite easy by using list comprehensions... Greetings, Uwe

on 2005-12-06 11:41

On Tue, Dec 06, 2005 at 08:23:52AM +0900, ara.t.howard@noaa.gov wrote: > the big difference is that a value will have rand called multiple times for That might be the very reason why it is ultimately much more biased than sort_by{ rand }... I'm keeping the latter. If the limited precision of rand() became a problem, I could always use arr.sort_by{ [rand, rand] } where collisions happen with a probability around 6.16298e-19 for a 10_000_000 element array, or more generally Array.new(N){rand} (but then we'd have to look into MT for another possible source of residual bias). > it - in otherwords an element compareds differently every time. this is > bound to help shuffle more uniformly it seems: We cannot trust our intuition in these cases; in the following example, the tenth element is more likely to stay where it was (look at pos[9]): a = (1..100); pos = Array.new(100, 0) t = Time.new; 100000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 } Time.new - t # => 223.891608 pos # => [1176, 1208, 1083, 1153, 1191, 1130, 1098, 1133, 1189, 1438, 1150, 1043, ======== # 1121, 1067, 1076, 1069, 1078, 1113, 1129, 1143,1115, 1074, 1081, 1069, 1057, # [...] t = Time.new; 10000000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 } Time.new - t # => 22781.664168 pos # => [105944, 105861, 105546, 105681, 104059, 101196, 99080, 100675, 110851, # 132223, 101670, 96018, 96147, 96688, 97709, 99051, 97823, 97345, 98799, ======== # [...] The bias can be observed easily with smaller arrays: a = (1..20); pos = Array.new(20, 0) t = Time.new 100000.times{ pos[a.sort{rand <=> rand}.index(3)] += 1 } Time.new - t # => 20.848851 pos # => [5084, 5694, 6561, 4326, 4205, 4792, 4913, 4827, 4791, 4571, 4560, 4707, # 4863, 4934, 5112, 5011, 5071, 5326, 5482, 5170] a = (1..10); pos = Array.new(10, 0) t = Time.new 1_000_000.times{ pos[a.sort{rand <=> rand}.index(5)] += 1 } Time.new - t # => 77.438912 pos # => [113836, 103238, 99442, 88682, 101306, 83294, 93296, 100830, 100744, 115332] Compare the latter to a = (1..10); pos = Array.new(10, 0) t = Time.new 1_000_000.times{ pos[a.sort_by{rand}.index(5)] += 1 } Time.new - t # => 32.163499 pos # => [99822, 100020, 100328, 100040, 99603, 100098, 100195, 100023, 99875, 99996]

on 2005-12-06 14:25

Mauricio Fernández wrote: > On Tue, Dec 06, 2005 at 08:23:52AM +0900, ara.t.howard@noaa.gov wrote: > >>the big difference is that a value will have rand called multiple times for > > > That might be the very reason why it is ultimately much more biased than > sort_by{ rand }... I'm keeping the latter. Indeed, ar = [0, 1, 2]; result = Hash.new(0) 100000.times {result[ar.sort {rand <=> rand}] += 1} p result # => {[2, 1, 0]=>25235, [2, 0, 1]=>12285, [1, 0, 2]=>12426, [0, 1, 2]=>25111, [1, 2, 0]=>12468, [0, 2, 1]=>12475} Michael

on 2005-12-06 14:37

On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote: > Hi, I'm new to ruby, but I use Python for some years now. > So I suggest the following 'pythonic' solution ;-) [...] > Is there a shorter or more elegant way to do this > 'decorate-sort-undecorate' in ruby ? arr.sort_by{rand} > In Python it is quite easy by using list comprehensions... What does that look like in Python? Is it shorter than the above one? }:-)

on 2005-12-06 15:34

|| || On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe Schmitt wrote: || > Hi, I'm new to ruby, but I use Python for some years now. || > So I suggest the following 'pythonic' solution ;-) || [...] || > Is there a shorter or more elegant way to do this || > 'decorate-sort-undecorate' in ruby ? || || arr.sort_by{rand} that is not equivalent to my solution. the python equivalent to yours is arr.sort(key = lambda x: random()) || || > In Python it is quite easy by using list comprehensions... || What does that look like in Python? from random import random def shuffle(a): b = [ (random(), i) for i in a] b.sort() return [ x[1] for x in b ] print shuffle(range(10)) || Is it shorter than the above one? }:-) || || -- || Mauricio Fernandez greetings, Uwe ||

on 2005-12-06 16:07

Uwe Schmitt wrote: > > from random import random > > def shuffle(a): > b = [ (random(), i) for i in a] > b.sort() > return [ x[1] for x in b ] > > print shuffle(range(10)) > I would have thought those two functions do the same thing with the exception that shuffle does not sort in place. I'm probably missing something obvious. Could you explain the difference between the two versions? Michael

on 2005-12-06 16:44

```
Mauricio Fernández <mfp@acm.org> writes:
> same value, but I still have to think about the effect of qsort.
And that is a good thing? How can the sequence be "random" if it
never contains the same two numbers in row? (Remember Enigma...)
```

on 2005-12-06 17:04

|| > || arr.sort_by{rand} || > that is not equivalent to my solution. || > the python equivalent to yours is || > arr.sort(key = lambda x: random()) || > || > In Python it is quite easy by using list comprehensions... || > || What does that look like in Python? || > || > from random import random || > || > def shuffle(a): || > b = [ (random(), i) for i in a] || > b.sort() || > return [ x[1] for x in b ] || > || > print shuffle(range(10)) || I would have thought those two functions do the same thing || with the exception that shuffle does not sort in place. || I'm probably missing something obvious. Could you explain || the difference between the two versions? If you use random as sorting key, the result of a<=>b will differ each time you evaluate a<=>b. In my case I attach a random number to each item, then I sort my list according to this number and remove this 'decoration' afterwards. Pythonprogrammers call this pattern "decorate-sort-undecorate". As I do not now anything about the ruby interanls of sort, I prefer my version. Greetings, Uwe.

on 2005-12-06 17:08

```
>>>>> "U" == Uwe Schmitt <schmitt@num.uni-sb.de> writes:
U> If you use random as sorting key, the result of a<=>b
U> will differ each time you evaluate a<=>b.
U> In my case I attach a random number to each item,
U> then I sort my list according to this number and remove
U> this 'decoration' afterwards.
run
ri Enum#sort_by
and see what it do.
Guy Decoux
```

on 2005-12-06 17:12

```
Hi,
From: "Uwe Schmitt" <schmitt@num.uni-sb.de>
> arr.sort(key = lambda x: random())
No, Array#sort_by does the 'decorate-sort-undecorate', aka.
Schwartzian Transform.
I don't read Python very well, but I think the ruby equivalent
to 'arr.sort(key = lambda x: random())' might be:
arr.sort { rand <=> rand }
But that is different from arr.sort_by {rand}
Regards,
Bill
```

on 2005-12-06 17:37

|| || >>>>> "U" == Uwe Schmitt <schmitt@num.uni-sb.de> writes: || || U> If you use random as sorting key, the result of a<=>b || U> will differ each time you evaluate a<=>b. || U> In my case I attach a random number to each item, || U> then I sort my list according to this number and remove || U> this 'decoration' afterwards. || || run || || ri Enum#sort_by || || and see what it do. Thanks. As I told you before I'm new to ruby ;-) Greetings, Uwe

on 2005-12-06 18:26

On Wed, Dec 07, 2005 at 12:41:39AM +0900, Christian Neukirchen wrote: > > > > so it would seem that two consecutive calls to rand() cannot return the > > same value, but I still have to think about the effect of qsort. > > And that is a good thing? How can the sequence be "random" if it > never contains the same two numbers in row? (Remember Enigma...) Nah, I misread genrand_real; the above shows it is possible.

on 2005-12-06 18:58

Uwe Schmitt ha scritto: > || > def shuffle(a): > > If you use random as sorting key, the result of a<=>b > will differ each time you evaluate a<=>b. > In my case I attach a random number to each item, > then I sort my list according to this number and remove > this 'decoration' afterwards. > > Pythonprogrammers call this pattern "decorate-sort-undecorate". Are you sure of what you say? AFAIK sort(key=lambda x: somefunc..)) will do decorate/sort/undecorate under the hood, behaving like ruby's #sort_by and like your version. OTOH sort(cmp=lambda x,y: something.. ) will behave like ruby's #sort.

on 2005-12-06 20:53

Ok, I think I've beat sort_by{ rand } on large Arrays , by about 30%. I ended up with this: h = Hash.new self.each { |v| h[rand(1000000000)] = v } h.keys.sort.collect { |k| h[k] } Explanation.. * Hash works better than an Array of Arrays [[rand,x[0]],[rand,x[1]],...] because Array#<=> seems to be slow. * Got some speedup by avoiding Float#<=>, hence the Integer version of rand. * On rand(1000000000); chose 1billion rather arbitrarily to keep things out of the range of BigNum. Oddly, using 2^31-1 caused BigNums to show up. Ideally we'd use the largest FixNum, but I don't know how to retrieve that. * Bias should be minimal for most applications, even at rand(1Billion). * Can it be improved further??? Find my benchmark results, and script below. --Steve user system total real shuffle (10) 0.016000 0.000000 0.016000 ( 0.016000) shuffle2 (10) 0.000000 0.000000 0.000000 ( 0.000000) shuffle3 (10) 0.000000 0.000000 0.000000 ( 0.000000) sort (10) 0.000000 0.000000 0.000000 ( 0.000000) sort_by (10) 0.000000 0.000000 0.000000 ( 0.000000) user system total real shuffle (100) 0.156000 0.000000 0.156000 ( 0.156000) shuffle2 (100) 0.047000 0.000000 0.047000 ( 0.046000) shuffle3 (100) 0.031000 0.000000 0.031000 ( 0.032000) sort (100) 0.125000 0.000000 0.125000 ( 0.125000) sort_by (100) 0.031000 0.000000 0.031000 ( 0.031000) user system total real shuffle (1000) 1.625000 0.000000 1.625000 ( 1.625000) shuffle2 (1000) 0.656000 0.000000 0.656000 ( 0.672000) shuffle3 (1000) 0.266000 0.000000 0.266000 ( 0.281000) sort (1000) 2.094000 0.000000 2.094000 ( 2.093000) sort_by (1000) 0.453000 0.000000 0.453000 ( 0.453000) user system total real shuffle (10000) 16.469000 0.032000 16.501000 ( 16.578000) shuffle2 (10000) 8.156000 0.000000 8.156000 ( 8.203000) shuffle3 (10000) 3.344000 0.047000 3.391000 ( 3.391000) sort (10000) 25.453000 0.000000 25.453000 ( 25.593000) sort_by (10000) 4.765000 0.000000 4.765000 ( 4.796000) #!/usr/bin/env ruby require 'benchmark' class Array def shuffle newarr = self.dup 2.times do self.length.times do |x| y = rand(self.length) newarr[x], newarr[y] = newarr[y], newarr[x] end end newarr end def shuffle2 tmparr = self.collect { |x| [rand(1000000000),x] } tmparr.sort.collect { |x| x[1] } end def shuffle3 h = Hash.new self.each { |v| h[rand(1000000000)] = v } h.keys.sort.collect { |k| h[k] } end end [10, 100, 1000, 10000].each do |n| # populate an Array size n a = Array.new(n) { |i| i } # run our benchmarks Benchmark.bm(16) do |bb| # my method bb.report("shuffle ("+n.to_s+")") do 100.times do a.shuffle end end # decorate, sort, undecorate bb.report("shuffle2 ("+n.to_s+")") do 100.times do a.shuffle2 end end # decorate, sort, undecorate with a hash bb.report("shuffle3 ("+n.to_s+")") do 100.times do a.shuffle3 end end # Array's sort bb.report("sort ("+n.to_s+")") do 100.times do a.sort { rand(10000) <=> 5000 } end end # Enumerable's sort_by bb.report("sort_by ("+n.to_s+")") do 100.times do a.sort_by { rand } end end printf("\n") end end

on 2005-12-06 20:57

Stephen Waits wrote: > * On rand(1000000000); chose 1billion rather arbitrarily to keep things > out of the range of BigNum. Oddly, using 2^31-1 caused BigNums to show > up. Ideally we'd use the largest FixNum, but I don't know how to > retrieve that. 2**30 - 1 is reliably a fixnum (though I guess it might not be the biggest one on some platforms).

on 2005-12-06 21:01

Stephen Waits wrote: > > h = Hash.new > self.each { |v| h[rand(1000000000)] = v } > h.keys.sort.collect { |k| h[k] } OOPS! I forgot to deal with collisions in the hash! :( --Steve

on 2005-12-06 21:05

```
Stephen Waits wrote:
> I forgot to deal with collisions in the hash! :(
Here's another shot.. still faster than sort_by{rand} on my system, and
now should be unbiased:
def shuffle3
h = Hash.new
self.each do |v|
k = rand(1000000000) until !h.has_key?(k)
h[k] = v
end
end
shuffle3 (10000) 3.625000 0.016000 3.641000 ( 3.656000)
sort_by (10000) 5.250000 0.000000 5.250000 ( 5.312000)
--Steve
```

on 2005-12-06 22:19

On Wed, Dec 07, 2005 at 05:04:13AM +0900, Stephen Waits wrote: > Stephen Waits wrote: > >I forgot to deal with collisions in the hash! :( > > Here's another shot.. still faster than sort_by{rand} on my system, and > now should be unbiased: > > def shuffle3 > h = Hash.new > self.each do |v| > k = rand(1000000000) until !h.has_key?(k) This is not equivalent to begin k = rand(1000000000) end until !h.has_key? Compare k = 1 until true k # => nil to begin; k = 1 end until true k # => 1 > h[k] = v You forgot h.keys.sort.collect { |k| h[k] } > end > end > > shuffle3 (10000) 3.625000 0.016000 3.641000 ( 3.656000) > sort_by (10000) 5.250000 0.000000 5.250000 ( 5.312000) After fixing shuffle3, I get: module Enumerable def shuffle3 h = Hash.new k = rand(1000000000) self.each do |v| k = rand(1000000000) until !h.has_key?(k) h[k] = v end h.keys.sort.collect { |k| h[k] } end def shuffle4 h = {} k = rand(1000000000) self.sort_by{ k = rand(1000000000) while h.has_key?(k); h[k] = k } end end GC.disable (1..10).shuffle4 # => [8, 3, 10, 2, 4, 6, 7, 5, 9, 1] a = 1..100000 t0 = Time.new # => Tue Dec 06 22:13:56 CET 2005 a.shuffle4 (t1 = Time.new) - t0 # => 0.766411 a.shuffle3 (t2 = Time.new) - t1 # => 0.575967 a.sort_by{ rand } Time.new - t2 # => 0.637288

on 2005-12-07 16:01

Why not a.dup.sort { 2*rand(12345) <=> 12345 } ? that way, we'd have NO colisions ever.

on 2005-12-07 16:14

On 07/12/05, hmassa@gmail.com <hmassa@gmail.com> wrote: > Why not > a.dup.sort { 2*rand(12345) <=> 12345 } > > ? that way, we'd have NO colisions ever. > > > What are you all trying to achieve by variations of rand <=> rand, why not simply use rand(3) - 1 ? which is not in danger of decreasing randomness by combining random numbers. cheers, Brian -- http://ruby.brian-schroeder.de/ Stringed instrument chords: http://chordlist.brian-schroeder.de/

on 2005-12-07 16:22

hmassa@gmail.com wrote: > Why not > a.dup.sort { 2*rand(12345) <=> 12345 } > > ? that way, we'd have NO colisions ever. > What you have posted does not avoid collisions. You probably mean something like a.sort {2*rand(12345) <=> 2 * rand(12345) + 1} However, the problem of the sort {rand<=>rand} methods are not collisions, but the skewed results from the algorithm that assumes a consistent comparison operator. Quoting from an earlier post of mine: ar = [0, 1, 2]; result = Hash.new(0) 100000.times {result[ar.sort {rand <=> rand}] += 1} p result # => {[2, 1, 0]=>25235, [2, 0, 1]=>12285, [1, 0, 2]=>12426, [0, 1, 2]=>25111, [1, 2, 0]=>12468, [0, 2, 1]=>12475} You get basically the same result with your method HTH, Michael

on 2005-12-07 16:54

On 07/12/05, Michael Ulm <michael.ulm@isis-papyrus.com> wrote: > > a.sort {2*rand(12345) <=> 2 * rand(12345) + 1} > > [snip] > and then he should do us a favour and write > a.sort { [-1, 1][rand(2)] } instead, which gives the same skewed results but at least does not do it as complicated. Brian -- http://ruby.brian-schroeder.de/ Stringed instrument chords: http://chordlist.brian-schroeder.de/

on 2005-12-07 17:31

On Dec 7, 2005, at 6:57 AM, hmassa@gmail.com wrote: > Why not > a.dup.sort { 2*rand(12345) <=> 12345 } Because it's the slowest version I've tested. See the rest of the thread for my benchmarks. In order of speed, we have 'shuffle3', 'sort_by {rand}', and 'shuffle4'. --Steve

on 2005-12-07 21:28

Stephen Waits <steve@waits.net> wrote: > > > This is really just meant to be a simple hack. I think a better > algorithm could assign a random index to each element, then sort on > those indices. http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code, with a nice proof that it's unbiased. http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on shuffling algorithms. martin

on 2005-12-07 22:18

Martin DeMello wrote: > http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code, > with a nice proof that it's unbiased. > > http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on > shuffling algorithms. Thanks Martin! --Steve