Forum: Ruby need some Ruby magic

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
hammed (Guest)
on 2005-12-01 21:45
(Received via mailing list)
I'd like to sort collections randomly. This is what I tried first:

my_array.sort { |a,b| rand(2) }
>

but the results weren't very random. I then tried the following:

class Array
>   def random_copy
>     b = Array.new
>     while b.length < self.length
>       random_value = self[rand self.length]
>       b.push random_value unless b.include? random_value
>     end
>     b
>   end
> end



This works but I'm sure there's some Ruby magic for doing this better.

thanks
Jim W. (Guest)
on 2005-12-01 21:48
hammed wrote:
> I'd like to sort collections randomly. This is what I tried first:
>
> my_array.sort { |a,b| rand(2) }

my_array.sort_by { rand }
dblack (Guest)
on 2005-12-01 21:49
(Received via mailing list)
Hi --

On Fri, 2 Dec 2005, Hammed M. wrote:

>>     while b.length < self.length
>>       random_value = self[rand self.length]
>>       b.push random_value unless b.include? random_value
>>     end
>>     b
>>   end
>> end
>
>
>
> This works but I'm sure there's some Ruby magic for doing this better.

I think the most common idiom is:

   array.sort_by { rand }


David

--
David A. Black
removed_email_address@domain.invalid

"Ruby for Rails", forthcoming from Manning Publications, April 2006!
hammed (Guest)
on 2005-12-01 22:10
(Received via mailing list)
excellent! thanks.
vfoley (Guest)
on 2005-12-01 23:47
(Received via mailing list)
Maybe there should be Array#randomize...
jeff.darklight (Guest)
on 2005-12-01 23:51
(Received via mailing list)
Take a look @ the Facets project. There are extension methods for array
to
have a number of randomized behaviors.

j.

On 12/1/05, Vincent F. <removed_email_address@domain.invalid> wrote:
>
> Maybe there should be Array#randomize...
>
>
>


--
"Remember. Understand. Believe. Yield! -> http://ruby-lang.org"

Jeff W.
reinder (Guest)
on 2005-12-03 19:33
(Received via mailing list)
In article <removed_email_address@domain.invalid>,
 Jim W. <removed_email_address@domain.invalid> wrote:

> hammed wrote:
> > I'd like to sort collections randomly. This is what I tried first:
> >
> > my_array.sort { |a,b| rand(2) }
>
> my_array.sort_by { rand }

A quick test seems to indicate that this works with the ruby 1.8.2
implementation on my Mac. However, there is no guarantee that this call
will select each of the n! permutations with equal probability.

Whether it does will depend on the exact algorithm used by 'sort'. For
instance, if sort uses the (inefficient for large arrays) bubble sort,
the last key will end up staying there in half the cases.

Reinder
steve (Guest)
on 2005-12-03 21:48
(Received via mailing list)
On Dec 3, 2005, at 9:32 AM, Reinder V. wrote:

> However, there is no guarantee that this call
> will select each of the n! permutations with equal probability.

How about just doing this?

class Array
   def shuffle
     newarr = self
     self.length.times do |x|
       y = rand(self.length)
       newarr[x], newarr[y] = newarr[y], newarr[x]
     end
     newarr
   end
end

shuffled_array = my_array.shuffle

--Steve
sgentle (Guest)
on 2005-12-03 23:23
(Received via mailing list)
On 12/4/05, Reinder V. <removed_email_address@domain.invalid> wrote:
> A quick test seems to indicate that this works with the ruby 1.8.2
> implementation on my Mac. However, there is no guarantee that this call
> will select each of the n! permutations with equal probability.
>
> Whether it does will depend on the exact algorithm used by 'sort'. For
> instance, if sort uses the (inefficient for large arrays) bubble sort,
> the last key will end up staying there in half the cases.

Actually, sort_by caches every value passed and uses those for
comparison (aka a Schwartzian Transform), so it would essentially
return the same results as sorting a list of random numbers,
regardless of algorithm.

Sam
Jim W. (Guest)
on 2005-12-04 01:48
reinder wrote:
>> my_array.sort_by { rand }
>
> A quick test seems to indicate that this works with the ruby 1.8.2
> implementation on my Mac. However, there is no guarantee that this call
> will select each of the n! permutations with equal probability.

Actually, it will.  sort_by uses a Schwartzian transform to do the
sorting.  That means that rand is only called once for each element of
the array.  As long as rand gives a decent distribution of random
numbers, the permutation will be random too.

Here's a little experiment:

-------------------------------------------------
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
  sorted = A.dup.sort_by { rand }
  Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
  puts "#{key}: #{Score[key]}"
end
-------------------------------------------------

Gives:

  $ ruby sortdist.rb 100000
  abc: 16693
  acb: 16688
  bac: 16752
  bca: 16590
  cab: 16475
  cba: 16802

Thats a pretty good distribution for the number of iterations.

-- Jim W.
mfp (Guest)
on 2005-12-04 03:06
(Received via mailing list)
On Sun, Dec 04, 2005 at 08:48:11AM +0900, Jim W. wrote:
> numbers, the permutation will be random too.
sort_by{ rand } is actually biased, since sort_by will preserve the
relative order of the elements for which rand() returned the same value.

%w[a b c d].sort_by{ 0 }                           # => ["a", "b", "c",
"d"]
i = 0
%w[a b c d].sort_by{ 10 - (i += 1) / 2 }           # => ["d", "b", "c",
"a"]

This means that permutations preserving the relative order of one (or
more) pair of elements of the original array are a bit more probable.

In praxis, the bias this introduces is often so small that the mere
succinctness of sort_by{ rand } more than makes up for it.
ara.t.howard (Guest)
on 2005-12-04 03:22
(Received via mailing list)
On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:

>> the array.  As long as rand gives a decent distribution of random
> more) pair of elements of the original array are a bit more probable.
>
> In praxis, the bias this introduces is often so small that the mere
> succinctness of sort_by{ rand } more than makes up for it.

does

   %w( a b c d ).sort{ rand <=> rand }

help?

-a
Brian B. (Guest)
on 2005-12-04 17:59
(Received via mailing list)
> sort_by{ rand } is actually biased, since sort_by will preserve the
> relative order of the elements for which rand() returned the same value.

How frequently would rand return a same value?  (In theory it would be
never but does anyone know the reality?)
steve (Guest)
on 2005-12-04 18:56
(Received via mailing list)
On Dec 4, 2005, at 7:59 AM, Brian B. wrote:

> How frequently would rand return a same value?  (In theory it would be
> never but does anyone know the reality?)

According to the Pickaxe, Kernel.rand is an implementation of the
[Mersenne Twister][1] algorithm.  If that is still the case, then in
reality it will begin repeating exactly the same numbers after
2^19937-1 (approx. 4e6001) pseudo-random numbers, as that's the
period of the MT.  The period is the point at which the exact same
series of PRNs will begin again.  For example, if it started out as
1, 2, 3, ..., then after 2^19937-1, it'd start again at 1, 2, 3.

But, that's not what you really care about.

Numbers will certainly repeat within the period, since the period is
so much larger than our integer representation.  For example, a 32
bit integer can only represent about 4e9.  So, the likelihood of
getting any specific integer, in the 32 bit case, is 1/(2^32).  This
is an extreme example:

5.times { p rand(2) }
5.times { p rand(1000) }

The bottom line is that the PRNs should be approximately uniformly
distributed.  Apply that to your specific range, and you have your
answer.

--Steve

[1]: http://en.wikipedia.org/wiki/Mersenne_twister
steve (Guest)
on 2005-12-04 19:17
(Received via mailing list)
On Dec 4, 2005, at 8:54 AM, Stephen W. wrote:

> Numbers will certainly repeat within the period, since the period
> is so much larger than our integer representation.  For example, a
> 32 bit integer can only represent about 4e9.  So, the likelihood of
> getting any specific integer, in the 32 bit case, is 1/(2^32).

I should add to this that you can approximate when you'll get
repeats, and it's likely to be much sooner than you'd guess.  For
more information, see the [Birthday Problem][1].

--Steve

[1]: http://mathworld.wolfram.com/BirthdayProblem.html
kero (Guest)
on 2005-12-04 22:33
(Received via mailing list)
>    end
> end
>
> shuffled_array = my_array.shuffle

"Just" doing that
1) changes the receiver (use new_arr = self.dup instead)
2) definitely has a bias. Look e.g. at an array of length three. You
will
   first swap the first element with 3 possibilities, then the second
and so
   on; you're going through 3**3 possibilities. There are 3! possible
   permutations, but alas, 3! == 6 is not a perfect divisor of 3**3 ==
27,
   thus you have a bias.
steve (Guest)
on 2005-12-04 23:22
(Received via mailing list)
On Dec 4, 2005, at 12:32 PM, Kero wrote:

> 1) changes the receiver (use new_arr = self.dup instead)
> 2) definitely has a bias. Look e.g. at an array of length three.
> You will
>    first swap the first element with 3 possibilities, then the
> second and so
>    on; you're going through 3**3 possibilities. There are 3! possible
>    permutations, but alas, 3! == 6 is not a perfect divisor of 3**3
> == 27,
>    thus you have a bias.

Good points, thanks.  How about this?

class Array
   def shuffle
     newarr = self.dup
     2.times do
       self.length.times do |x|
         y = rand(self.length)
         newarr[x], newarr[y] = newarr[y], newarr[x]
       end
     end
     newarr
   end
end

# test code lifted from Jim W.'s earlier post
N = (ARGV.shift || 1000).to_i
A = %w(a b c)
Perms = %w(abc acb bac bca cab cba)
Score = Hash.new { |h, k| h[k] = 0 }
N.times do
   sorted = A.shuffle
   Score[sorted.join("")] += 1
end

Score.keys.sort.each do |key|
   puts "#{key}: #{Score[key]}"
end


[~/Code/private/code/snippets] 382% ./shuffle.rb 100000
abc: 16596
acb: 16394
bac: 16700
bca: 16684
cab: 16841
cba: 16785

0:07.87 seconds, 96.4% CPU


This is really just meant to be a simple hack.  I think a better
algorithm could assign a random index to each element, then sort on
those indices.

--Steve
Brian B. (Guest)
on 2005-12-05 16:50
(Received via mailing list)
> repeats, and it's likely to be much sooner than you'd guess.  For
> more information, see the [Birthday Problem][1].
>
> --Steve
>
> [1]: http://mathworld.wolfram.com/BirthdayProblem.html

I liked the Birthday Problem.   Thanks.
mfp (Guest)
on 2005-12-05 22:26
(Received via mailing list)
On Sun, Dec 04, 2005 at 10:21:02AM +0900, removed_email_address@domain.invalid 
wrote:
> On Sun, 4 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:
> >sort_by{ rand } is actually biased, since sort_by will preserve the
> >relative order of the elements for which rand() returned the same value.

I did some numbers and found the following probability estimates:

array size  P(#rand() collision)
 1000       5.54558e-11
 10000      5.55056e-09
 1000000    5.55096e-05
 10000000   5.53574e-03

A collision implies that the associated pair of elements has got the
same
ordering in the output array, instead of 50-50 chances of it being
reversed.

More info, including the Ruby code I used, available at
 http://eigenclass.org/hiki.rb?sort_by+rand+is+biased

> does
>
>   %w( a b c d ).sort{ rand <=> rand }
>
> help?

I'm not totally sure it's really unbiased, but at first sight it looks
good.

When called with no arguments, rand will use

static double
genrand_real(void)
{
    unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6;
    return(a*67108864.0+b)*(1.0/9007199254740992.0);
}

so it would seem that two consecutive calls to rand() cannot return the
same value, but I still have to think about the effect of qsort.

It is however somewhat slower than sort_by{ rand }.

[100, 1000, 10000, 100000].each do |n|
    GC.start
    t0 = Time.new
    (1..n).sort_by{rand}
    a = (t1 = Time.new) - t0            # => 0.000425, 0.003002,
0.054392, 0.939692
    (1..n).sort{rand <=> rand}
    b = Time.new - t1                   # => 0.001161, 0.026275,
0.289807, 3.791833
    "%4.2f" % [b / a]                   # => "2.73", "8.75", "5.33",
"4.04"
end
RUBY_VERSION                            # => "1.8.3"
RUBY_RELEASE_DATE                       # => "2005-09-24"
steve (Guest)
on 2005-12-06 00:45
(Received via mailing list)
Mauricio Fernández wrote:
>> does
>>
>>   %w( a b c d ).sort{ rand <=> rand }
>>
>> help?
>
> It is however somewhat slower than sort_by{ rand }.

It's considerably slower.  Changing it to sort { rand(10000) <=> 5000 }
shows a 30% speedup on 10k element Arrays on my system.  Don't know why
I chose those numbers, I suppose bigger numbers reduce the "0"
comparison, considering { rand(3) <=> 1 }.

Anyway, I put together a small benchmark to compare my (revised)
shuffle, to sort, and sort_by:

#!/usr/bin/env ruby

class Array
   def shuffle
     newarr = self.dup
     2.times do
       self.length.times do |x|
         y = rand(self.length)
         newarr[x], newarr[y] = newarr[y], newarr[x]
       end
     end
     newarr
   end
end

printf(" size shuffle sort sort_by\n")

[10, 100, 1000, 10000].each do |n|

   # populate an Array size n
   a = Array.new(n) { |i| i }

   # my method
   start = Time.new
   100.times do
     b = a.shuffle
   end
   a_time = Time.new - start

   # Array's sort
   start = Time.new
   100.times do
     b = a.sort { rand(10000) <=> 5000 }
   end
   b_time = Time.new - start

   # Enumerable's sort_by
   start = Time.new
   100.times do
     b = a.sort_by { rand }
   end
   c_time = Time.new - start

   # results
   printf("%6d %5.2f %5.2f %5.2f\n", n, a_time, b_time, c_time)

end


Which, on my P4/3.0GHz (Win32) emits this:

  size shuffle sort sort_by
     10  0.00  0.02  0.00
    100  0.16  0.13  0.03
   1000  1.61  2.03  0.44
  10000 16.31 28.14  4.78

--Steve
ara.t.howard (Guest)
on 2005-12-06 01:26
(Received via mailing list)
On Tue, 6 Dec 2005, Mauricio [iso-8859-1] Fernández wrote:

> 1000000    5.55096e-05
>>   %w( a b c d ).sort{ rand <=> rand }
>>
>> help?
>
> I'm not totally sure it's really unbiased, but at first sight it looks good.

the big difference is that a value will have rand called multiple times
for it
- in otherwords an element compareds differently every time.  this is
bound to
help shuffle more uniformly it seems:

   harp:~ > cat a.rb
   %w( a b c d ).sort{|a,b| ra = rand; rb = rand; p a => ra; p b => rb;
ra <=> rb }

   harp:~ > ruby a.rb
   {"a"=>0.181439380218727}
   {"c"=>0.579403886629126}
   {"c"=>0.713513989939958}
   {"d"=>0.573687508540169}
   {"a"=>0.851534740906118}
   {"d"=>0.00156074151427565}
   {"b"=>0.803591007691647}
   {"a"=>0.494066110872563}
   {"b"=>0.834676789626288}
   {"c"=>0.792106134460754}

so, note that "a" was sorted using a different value each time.

can anyone who remembers stats and knows the sort algorithm weigh in?

-a
schmitt (Guest)
on 2005-12-06 12:33
(Received via mailing list)
||

|| > More info, including the Ruby code I used, available at
|| > http://eigenclass.org/hiki.rb?sort_by+rand+is+biased
|| >
|| >> does
|| >>
|| >>   %w( a b c d ).sort{ rand <=> rand }
|| >>
|| >> help?

Hi, I'm new to ruby, but I use Python for some years now.
So I suggest the following 'pythonic' solution ;-)

  def shuffle(a)

      b=[]
      # decorate
      a.each { |x| b << [rand,x]; }

      # sort
      b.sort! { |x,y| x[0] <=> y[0] }

      # undecorate
      b.collect { |x| x[1] }
  end

  a=%w{a b c d e f}
  puts shuffle a

Is there a shorter or more elegant way to do this
'decorate-sort-undecorate' in ruby ?
In Python it is quite easy by using list comprehensions...


Greetings, Uwe
mfp (Guest)
on 2005-12-06 12:41
(Received via mailing list)
On Tue, Dec 06, 2005 at 08:23:52AM +0900, removed_email_address@domain.invalid 
wrote:
> the big difference is that a value will have rand called multiple times for

That might be the very reason why it is ultimately much more biased than
sort_by{ rand }... I'm keeping the latter.
If the limited precision of rand() became a problem, I could always use
arr.sort_by{ [rand, rand] } where collisions happen with a probability
around 6.16298e-19 for a 10_000_000 element array, or more generally
Array.new(N){rand} (but then we'd have to look into MT for another
possible source of residual bias).

> it - in otherwords an element compareds differently every time.  this is
> bound to help shuffle more uniformly it seems:

We cannot trust our intuition in these cases; in the following example,
the
tenth element is more likely to stay where it was (look at pos[9]):

a = (1..100); pos = Array.new(100, 0)

t = Time.new; 100000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1 }
Time.new - t
# => 223.891608

pos
# => [1176, 1208, 1083, 1153, 1191, 1130, 1098, 1133, 1189, 1438, 1150,
1043,
                                                          ========
# 1121, 1067, 1076, 1069, 1078, 1113, 1129, 1143,1115, 1074, 1081, 1069,
1057,
# [...]

t = Time.new; 10000000.times{ pos[a.sort{rand <=> rand}.index(10)] += 1
}
Time.new - t
# => 22781.664168

pos
# => [105944, 105861, 105546, 105681, 104059, 101196, 99080, 100675,
110851,
# 132223, 101670, 96018, 96147, 96688, 97709, 99051, 97823, 97345,
98799,
 ========
# [...]

The bias can be observed easily with smaller arrays:

 a = (1..20); pos = Array.new(20, 0)
 t = Time.new
 100000.times{ pos[a.sort{rand <=> rand}.index(3)] += 1 }
 Time.new - t                                      # => 20.848851
 pos
 # => [5084, 5694, 6561, 4326, 4205, 4792, 4913, 4827, 4791, 4571, 4560,
4707,
 # 4863, 4934, 5112, 5011, 5071, 5326, 5482, 5170]

 a = (1..10); pos = Array.new(10, 0)
 t = Time.new
 1_000_000.times{ pos[a.sort{rand <=> rand}.index(5)] += 1 }
 Time.new - t                                      # => 77.438912
 pos
 # => [113836, 103238, 99442, 88682, 101306, 83294, 93296, 100830,
100744, 115332]

Compare the latter to

 a = (1..10); pos = Array.new(10, 0)
 t = Time.new
 1_000_000.times{ pos[a.sort_by{rand}.index(5)] += 1 }
 Time.new - t                                      # => 32.163499
 pos
 # => [99822, 100020, 100328, 100040, 99603, 100098, 100195, 100023,
99875, 99996]
michael.ulm (Guest)
on 2005-12-06 15:25
(Received via mailing list)
Mauricio Fernández wrote:

> On Tue, Dec 06, 2005 at 08:23:52AM +0900, removed_email_address@domain.invalid wrote:
>
>>the big difference is that a value will have rand called multiple times for
>
>
> That might be the very reason why it is ultimately much more biased than
> sort_by{ rand }... I'm keeping the latter.

Indeed,

ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result
# => {[2, 1, 0]=>25235,
       [2, 0, 1]=>12285,
       [1, 0, 2]=>12426,
       [0, 1, 2]=>25111,
       [1, 2, 0]=>12468,
       [0, 2, 1]=>12475}

Michael
mfp (Guest)
on 2005-12-06 15:37
(Received via mailing list)
On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe S. wrote:
> Hi, I'm new to ruby, but I use Python for some years now.
> So I suggest the following 'pythonic' solution ;-)
[...]
> Is there a shorter or more elegant way to do this
> 'decorate-sort-undecorate' in ruby ?

arr.sort_by{rand}

> In Python it is quite easy by using list comprehensions...

What does that look like in Python?
Is it shorter than the above one? }:-)
schmitt (Guest)
on 2005-12-06 16:34
(Received via mailing list)
||
|| On Tue, Dec 06, 2005 at 07:32:34PM +0900, Uwe S. wrote:
|| > Hi, I'm new to ruby, but I use Python for some years now.
|| > So I suggest the following 'pythonic' solution ;-)
|| [...]
|| > Is there a shorter or more elegant way to do this
|| > 'decorate-sort-undecorate' in ruby ?
||
|| arr.sort_by{rand}

that is not equivalent to my solution.
the python equivalent to yours is

  arr.sort(key = lambda x: random())

||
|| > In Python it is quite easy by using list comprehensions...
|| What does that look like in Python?

from random import random

def shuffle(a):
    b = [ (random(), i) for i in a]
    b.sort()
    return [ x[1] for x in b ]

print shuffle(range(10))

|| Is it shorter than the above one? }:-)
||
|| --
|| Mauricio F.

greetings, Uwe


||
michael.ulm (Guest)
on 2005-12-06 17:07
(Received via mailing list)
Uwe S. wrote:

>
> from random import random
>
> def shuffle(a):
>     b = [ (random(), i) for i in a]
>     b.sort()
>     return [ x[1] for x in b ]
>
> print shuffle(range(10))
>

I would have thought those two functions do the same thing
with the exception that shuffle does not sort in place.
I'm probably missing something obvious. Could you explain
the difference between the two versions?

Michael
chneukirchen (Guest)
on 2005-12-06 17:44
(Received via mailing list)
Mauricio Fernández <removed_email_address@domain.invalid> writes:

> same value, but I still have to think about the effect of qsort.
And that is a good thing?  How can the sequence be "random" if it
never contains the same two numbers in row?  (Remember Enigma...)
schmitt (Guest)
on 2005-12-06 18:04
(Received via mailing list)
|| > || arr.sort_by{rand}
|| > that is not equivalent to my solution.
|| > the python equivalent to yours is
|| >   arr.sort(key = lambda x: random())

|| > || > In Python it is quite easy by using list comprehensions...
|| > || What does that look like in Python?
|| >
|| > from random import random
|| >
|| > def shuffle(a):
|| >     b = [ (random(), i) for i in a]
|| >     b.sort()
|| >     return [ x[1] for x in b ]
|| >
|| > print shuffle(range(10))

|| I would have thought those two functions do the same thing
|| with the exception that shuffle does not sort in place.
|| I'm probably missing something obvious. Could you explain
|| the difference between the two versions?

If you use random as sorting key, the result of a<=>b
will differ each time you evaluate a<=>b.
In my case I attach a random number to each item,
then I sort my list according to this number and remove
this 'decoration' afterwards.

Pythonprogrammers call this pattern "decorate-sort-undecorate".

As I do not now anything about the ruby interanls of sort,
I prefer my version.

Greetings, Uwe.
decoux (Guest)
on 2005-12-06 18:08
(Received via mailing list)
>>>>> "U" == Uwe S. <removed_email_address@domain.invalid> writes:

U> If you use random as sorting key, the result of a<=>b
U> will differ each time you evaluate a<=>b.
U> In my case I attach a random number to each item,
U> then I sort my list according to this number and remove
U> this 'decoration' afterwards.

 run

  ri Enum#sort_by

 and see what it do.


Guy Decoux
billk (Guest)
on 2005-12-06 18:12
(Received via mailing list)
Hi,

From: "Uwe S." <removed_email_address@domain.invalid>
>   arr.sort(key = lambda x: random())
No, Array#sort_by does the 'decorate-sort-undecorate', aka.
Schwartzian Transform.

I don't read Python very well, but I think the ruby equivalent
to 'arr.sort(key = lambda x: random())' might be:

  arr.sort { rand <=> rand }

But that is different from arr.sort_by {rand}


Regards,

Bill
schmitt (Guest)
on 2005-12-06 18:37
(Received via mailing list)
||
|| >>>>> "U" == Uwe S. <removed_email_address@domain.invalid> writes:
||
|| U> If you use random as sorting key, the result of a<=>b
|| U> will differ each time you evaluate a<=>b.
|| U> In my case I attach a random number to each item,
|| U> then I sort my list according to this number and remove
|| U> this 'decoration' afterwards.
||
||  run
||
||   ri Enum#sort_by
||
||  and see what it do.

Thanks. As I told you before I'm new to ruby ;-)

Greetings, Uwe
mfp (Guest)
on 2005-12-06 19:26
(Received via mailing list)
On Wed, Dec 07, 2005 at 12:41:39AM +0900, Christian N. wrote:
> >
> > so it would seem that two consecutive calls to rand() cannot return the
> > same value, but I still have to think about the effect of qsort.
>
> And that is a good thing?  How can the sequence be "random" if it
> never contains the same two numbers in row?  (Remember Enigma...)

Nah, I misread genrand_real; the above shows it is possible.
surrender_it (Guest)
on 2005-12-06 19:58
(Received via mailing list)
Uwe S. ha scritto:
> || > def shuffle(a):
>
> If you use random as sorting key, the result of a<=>b
> will differ each time you evaluate a<=>b.
> In my case I attach a random number to each item,
> then I sort my list according to this number and remove
> this 'decoration' afterwards.
>
> Pythonprogrammers call this pattern "decorate-sort-undecorate".

Are you sure of what you say? AFAIK
sort(key=lambda x: somefunc..))

will do decorate/sort/undecorate under the hood, behaving like ruby's
#sort_by and like your version.
OTOH
sort(cmp=lambda x,y: something.. ) will behave like ruby's #sort.
steve (Guest)
on 2005-12-06 21:53
(Received via mailing list)
Ok, I think I've beat sort_by{ rand } on large Arrays , by about 30%.  I
ended up with this:

     h = Hash.new
     self.each { |v| h[rand(1000000000)] = v }
     h.keys.sort.collect { |k| h[k] }

Explanation..

* Hash works better than an Array of Arrays
[[rand,x[0]],[rand,x[1]],...] because Array#<=> seems to be slow.

* Got some speedup by avoiding Float#<=>, hence the Integer version of
rand.

* On rand(1000000000); chose 1billion rather arbitrarily to keep things
out of the range of BigNum.  Oddly, using 2^31-1 caused BigNums to show
up.  Ideally we'd use the largest FixNum, but I don't know how to
retrieve that.

* Bias should be minimal for most applications, even at rand(1Billion).

* Can it be improved further???

Find my benchmark results, and script below.

--Steve


                       user     system      total        real
shuffle (10)      0.016000   0.000000   0.016000 (  0.016000)
shuffle2 (10)     0.000000   0.000000   0.000000 (  0.000000)
shuffle3 (10)     0.000000   0.000000   0.000000 (  0.000000)
sort (10)         0.000000   0.000000   0.000000 (  0.000000)
sort_by (10)      0.000000   0.000000   0.000000 (  0.000000)

                       user     system      total        real
shuffle (100)     0.156000   0.000000   0.156000 (  0.156000)
shuffle2 (100)    0.047000   0.000000   0.047000 (  0.046000)
shuffle3 (100)    0.031000   0.000000   0.031000 (  0.032000)
sort (100)        0.125000   0.000000   0.125000 (  0.125000)
sort_by (100)     0.031000   0.000000   0.031000 (  0.031000)

                       user     system      total        real
shuffle (1000)    1.625000   0.000000   1.625000 (  1.625000)
shuffle2 (1000)   0.656000   0.000000   0.656000 (  0.672000)
shuffle3 (1000)   0.266000   0.000000   0.266000 (  0.281000)
sort (1000)       2.094000   0.000000   2.094000 (  2.093000)
sort_by (1000)    0.453000   0.000000   0.453000 (  0.453000)

                       user     system      total        real
shuffle (10000)  16.469000   0.032000  16.501000 ( 16.578000)
shuffle2 (10000)  8.156000   0.000000   8.156000 (  8.203000)
shuffle3 (10000)  3.344000   0.047000   3.391000 (  3.391000)
sort (10000)     25.453000   0.000000  25.453000 ( 25.593000)
sort_by (10000)   4.765000   0.000000   4.765000 (  4.796000)



#!/usr/bin/env ruby

require 'benchmark'

class Array

   def shuffle
     newarr = self.dup
     2.times do
       self.length.times do |x|
         y = rand(self.length)
         newarr[x], newarr[y] = newarr[y], newarr[x]
       end
     end
     newarr
   end

   def shuffle2
     tmparr = self.collect { |x| [rand(1000000000),x] }
     tmparr.sort.collect { |x| x[1] }
   end

   def shuffle3
     h = Hash.new
     self.each { |v| h[rand(1000000000)] = v }
     h.keys.sort.collect { |k| h[k] }
   end

end

[10, 100, 1000, 10000].each do |n|

   # populate an Array size n
   a = Array.new(n) { |i| i }

   # run our benchmarks
   Benchmark.bm(16) do |bb|

     # my method
     bb.report("shuffle ("+n.to_s+")") do
       100.times do
         a.shuffle
       end
     end

     # decorate, sort, undecorate
     bb.report("shuffle2 ("+n.to_s+")") do
       100.times do
         a.shuffle2
       end
     end

     # decorate, sort, undecorate with a hash
     bb.report("shuffle3 ("+n.to_s+")") do
       100.times do
         a.shuffle3
       end
     end

     # Array's sort
     bb.report("sort ("+n.to_s+")") do
       100.times do
         a.sort { rand(10000) <=> 5000 }
       end
     end

     # Enumerable's sort_by
     bb.report("sort_by ("+n.to_s+")") do
       100.times do
         a.sort_by { rand }
       end
     end

     printf("\n")

   end

end
vjoel (Guest)
on 2005-12-06 21:57
(Received via mailing list)
Stephen W. wrote:

> * On rand(1000000000); chose 1billion rather arbitrarily to keep things
> out of the range of BigNum.  Oddly, using 2^31-1 caused BigNums to show
> up.  Ideally we'd use the largest FixNum, but I don't know how to
> retrieve that.

2**30 - 1 is reliably a fixnum (though I guess it might not be the
biggest one on some platforms).
steve (Guest)
on 2005-12-06 22:01
(Received via mailing list)
Stephen W. wrote:
>
>     h = Hash.new
>     self.each { |v| h[rand(1000000000)] = v }
>     h.keys.sort.collect { |k| h[k] }

OOPS!

I forgot to deal with collisions in the hash!   :(

--Steve
steve (Guest)
on 2005-12-06 22:05
(Received via mailing list)
Stephen W. wrote:
> I forgot to deal with collisions in the hash!   :(

Here's another shot.. still faster than sort_by{rand} on my system, and
now should be unbiased:

   def shuffle3
     h = Hash.new
     self.each do |v|
       k = rand(1000000000) until !h.has_key?(k)
       h[k] = v
     end
   end

shuffle3 (10000)  3.625000   0.016000   3.641000 (  3.656000)
sort_by (10000)   5.250000   0.000000   5.250000 (  5.312000)

--Steve
mfp (Guest)
on 2005-12-06 23:19
(Received via mailing list)
On Wed, Dec 07, 2005 at 05:04:13AM +0900, Stephen W. wrote:
> Stephen W. wrote:
> >I forgot to deal with collisions in the hash!   :(
>
> Here's another shot.. still faster than sort_by{rand} on my system, and
> now should be unbiased:
>
>   def shuffle3
>     h = Hash.new
>     self.each do |v|
>       k = rand(1000000000) until !h.has_key?(k)

This is not equivalent to
    begin
      k = rand(1000000000)
    end until !h.has_key?


Compare
    k = 1 until true
    k                                                  # => nil
to
    begin; k = 1 end until true
    k                                                  # => 1


>       h[k] = v

You forgot
     h.keys.sort.collect { |k| h[k] }

>     end
>   end
>
> shuffle3 (10000)  3.625000   0.016000   3.641000 (  3.656000)
> sort_by (10000)   5.250000   0.000000   5.250000 (  5.312000)


After fixing shuffle3, I get:

module Enumerable
  def shuffle3
    h = Hash.new
    k = rand(1000000000)
    self.each do |v|
      k = rand(1000000000) until !h.has_key?(k)
      h[k] = v
    end
    h.keys.sort.collect { |k| h[k] }
  end

  def shuffle4
    h = {}
    k = rand(1000000000)
    self.sort_by{ k = rand(1000000000) while h.has_key?(k); h[k] = k }
  end
end

GC.disable
(1..10).shuffle4                                   # => [8, 3, 10, 2, 4,
6, 7, 5, 9, 1]
a = 1..100000
t0 = Time.new                                      # => Tue Dec 06
22:13:56 CET 2005
a.shuffle4
(t1 = Time.new) - t0                               # => 0.766411
a.shuffle3
(t2 = Time.new) - t1                               # => 0.575967
a.sort_by{ rand }
Time.new - t2                                      # => 0.637288
hmassa (Guest)
on 2005-12-07 17:01
(Received via mailing list)
Why not
 a.dup.sort { 2*rand(12345) <=> 12345 }

? that way, we'd have NO colisions ever.
ruby.brian (Guest)
on 2005-12-07 17:14
(Received via mailing list)
On 07/12/05, removed_email_address@domain.invalid 
<removed_email_address@domain.invalid> wrote:
> Why not
>  a.dup.sort { 2*rand(12345) <=> 12345 }
>
> ? that way, we'd have NO colisions ever.
>
>
>

What are you all trying to achieve by variations of
   rand <=> rand,

 why not simply use
  rand(3) - 1
?

which is not in danger of decreasing randomness by combining random
numbers.

cheers,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/
michael.ulm (Guest)
on 2005-12-07 17:22
(Received via mailing list)
removed_email_address@domain.invalid wrote:

> Why not
>  a.dup.sort { 2*rand(12345) <=> 12345 }
>
> ? that way, we'd have NO colisions ever.
>

What you have posted does not avoid collisions.
You probably mean something like

a.sort {2*rand(12345) <=> 2 * rand(12345) + 1}

However, the problem of the  sort {rand<=>rand}
methods are not collisions, but the skewed results
from the algorithm that assumes a consistent
comparison operator. Quoting from an earlier
post of mine:

ar = [0, 1, 2]; result = Hash.new(0)
100000.times {result[ar.sort {rand <=> rand}] += 1}
p result
# => {[2, 1, 0]=>25235,
       [2, 0, 1]=>12285,
       [1, 0, 2]=>12426,
       [0, 1, 2]=>25111,
       [1, 2, 0]=>12468,
       [0, 2, 1]=>12475}

You get basically the same result with your method

HTH,

Michael
ruby.brian (Guest)
on 2005-12-07 17:54
(Received via mailing list)
On 07/12/05, Michael U. <removed_email_address@domain.invalid> wrote:
>
> a.sort {2*rand(12345) <=> 2 * rand(12345) + 1}
>
> [snip]
>

and then he should do us a favour and write
> a.sort { [-1, 1][rand(2)] }
instead, which gives the same skewed results but at least does not do
it as complicated.

Brian


--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/
steve (Guest)
on 2005-12-07 18:31
(Received via mailing list)
On Dec 7, 2005, at 6:57 AM, removed_email_address@domain.invalid wrote:

> Why not
>  a.dup.sort { 2*rand(12345) <=> 12345 }

Because it's the slowest version I've tested.  See the rest of the
thread for my benchmarks.  In order of speed, we have 'shuffle3',
'sort_by {rand}', and 'shuffle4'.

--Steve
martindemello (Guest)
on 2005-12-07 22:28
(Received via mailing list)
Stephen W. <removed_email_address@domain.invalid> wrote:
>
>
> This is really just meant to be a simple hack.  I think a better
> algorithm could assign a random index to each element, then sort on
> those indices.

http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
with a nice proof that it's unbiased.

http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
shuffling algorithms.

martin
steve (Guest)
on 2005-12-07 23:18
(Received via mailing list)
Martin DeMello wrote:
> http://www.rubygarden.org/ruby?ArrayShuffle is a variant of your code,
> with a nice proof that it's unbiased.
>
> http://c2.com/cgi/wiki?LinearShuffle has a lot of discussion on
> shuffling algorithms.

Thanks Martin!

--Steve
This topic is locked and can not be replied to.