Not So Random (#173)

mansfiem · August 22, 2008, 5:30am

Apologies for the late summary… I’m moving to a new place today,
tomorrow, this weekend… and start classes on Monday. Also, look for
the next quiz to show up this Saturday, not tomorrow.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Procedural generation is a valuable technique for computer
graphics and games that relies on algorithms to produce content on the
fly. Use of procedural techniques have increased in recent years, due
to the large volume of content that must be created… too large or
unwieldy to be completed by hand in a short duration. But procedural
generation is not new. Early computer games such as Elite and
Rescue on Fractalus used algorithms, rather than preformed data, to
represent large worlds on machines with limited storage.

On one hand, CPU cycles are traded for storage limitations. On the
other, CPU cycles are exchanged for human and schedule limitations.
Essentially, procedural generation is a way to swap CPU time for other
limited resources. But if the same algorithms are run repeatedly,
won’t everything look the same? An algorithm, given the same input, by
definition produces the same output.

That’s where pseudorandom numbers can help, both to provide variety
and maintain repeatability. A random sequence starting with a seed
value of 42 will – when put through a particular algorithm – always
generate the same output. Yet, start the sequence with seed value 43,
and the same algorithm will generate new output. We can generate a
forest of randomly generated trees, each looking similar but unique,
and not have the trees randomly change every frame.

Do we need more than one generator? It depends on your algorithms and
how you generate the content. It could work, but it may be easier to
keep some things separate. One random number generator for the forest.
Another generator for the city buildings. Another one for people. That
way, I can avoid generating information for the forest if it happens
to go offscreen, because the PRNGs for the buildings and the people
are independent of the forest PRNG.

Generally, for task such as this, you don’t usually need a generator
such as the Mersenne Twister. A linear congruential generator,
while not really very random, is quite simple, fast and often
sufficient. For such a task, Ruby code of a few lines would have been
completely sufficient.

But I wanted to see how we could convince Ruby’s rand to do what we
wanted, without writing a new PRNG. I’m glad I asked, because after
playing with closures and continuations, I conceived only one solution
myself… I knew there had to be another, better way.

But first, let’s look at the first solution from Frederick C.,
which implements the only idea I had.

class Random
   def initialize(*rand_args)
     @rand_args = rand_args
     ignored, @start_seed = srand, srand
     @sequence = 0
   end

   def next
     srand(@start_seed)
     @sequence.times { rand *@rand_args}
     @sequence += 1
     rand *@rand_args
   end

   def reset
     @sequence = 0
   end
end

At first, I thought Frederick was the only one who implemented a
solution that would allow rand to produce random floats between zero
and one (i.e. what you get when you call rand with no parameters). I
consider this important behavior. However, it turns out that passing
zero to rand induces the same behavior, so props to everyone who
provided a default value of zero.

This implementation is straightforward. Each time we call next to
get the next random number in the sequence, we restart the sequence,
calling rand a number of times equal to @sequence, which is
incremented each call. This is very simple and guarantees we always
get back to the correct point, whether or not other Random objects
have generated numbers.

(Note that while I did mention concurrent in the quiz description, I
did not mean to imply multithreaded, so for our purposes here, this is
sufficient.)

I found this line in the initializer a little curious:

   ignored, @start_seed = srand, srand

srand returns the previous seed. Calling it twice in a row will
first set the seed, then return that and set it again to a new value.
But the saved seed is what is used in calls to next. It’s a nice
trick to get the (time-based) seed generated by srand.

While simple, this solution is highly inefficient. Procedural
generation can require a lot of random numbers, and this PRNG
algorithm is quadratic. (That is, O(n^2).) Very quickly, this
generator would become painfully slow.

One possible improvement, provided in a few solutions, is caching.
Since we’re generating the same exact data, we can save our work and
reuse it later. In our contrived situation, This isn’t as bad as it
sounds. Although memory use must be considered, it’s not unlikely that
a particular Random object – a particular sequence – will always
generate the same quantity of numbers every time. If we’re not
uprooting trees in that forest, it’s likely we need to continue
generating the same number of trees. Of course, in typical situations,
you wouldn’t force yourself into this quiz’s confines, but if a small
cache and a one-time generation is sufficient, you could trade a
little memory for CPU time, if need be.

To finish up, let’s look at brabuhr’s clean, compact solution.

require 'slave'

class Random
  def initialize(ceiling, seed = nil)
    @ceiling = ceiling
    @seed = seed || rand(2112)
    @slave = Slave::new{
      lambda {|reset|
        if reset
          srand(@seed)
        else
          rand(@ceiling)
        end
      }
    }
    self.reset
  end

  def next
    @slave.object.call(false)
  end

  def reset
    @slave.object.call(true)
  end
end

The Slave library combines a forked Ruby process with DRb
(Distributed Ruby) and creates a client-server relationship between
the two processes. This relationship is created in the call to
Slave::new, which passes the following block to the slave:

      lambda { |reset|
        if reset
          srand(@seed)
        else
          rand(@ceiling)
        end
      }

Calling this block with a boolean (as done in the next and reset
methods) simply invokes either srand or rand. But because the
slave object is a forked Ruby process, it contains a separate built-in
seed for those methods. Each Random instance will create such a
slave, thereby creating independent seeds that cannot interfere with
one another. In my own experience, I am so used to threading (i.e.
shared state and address space), the idea of using a forked process
(i.e. independent state and address space) didn’t cross my mind.

Using Slave may have some overhead from the interprocess
communication. However, it is a constant-time operation: it should
easily beat the first Random implementation for performance for
typical use.

mansfiem · August 23, 2008, 6:11pm

On 22 Aug 2008, at 04:24, Matthew M. wrote:

I found this line in the initializer a little curious:
  ignored, @start_seed = srand, srand
srand returns the previous seed. Calling it twice in a row will
first set the seed, then return that and set it again to a new value.
But the saved seed is what is used in calls to next. It’s a nice
trick to get the (time-based) seed generated by srand.

It’s worth noting that (when available) srand with no arguments will
try and use things like /dev/random to seed itself so you can get
yourself a nice random seed.

Fred