Forum: Ruby Genetic Programming (#212)

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
33117162fff8a9cf50544a604f60c045?d=identicon&s=25 Daniel X Moore (yahivin)
on 2009-07-03 20:52
(Received via mailing list)
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz:

1.  Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have elapsed from the time this message was
sent.

2.  Support Ruby Quiz by submitting ideas and responses
as often as you can!
Visit: http://rubyquiz.strd6.com/suggestions

3.  Enjoy!

Suggestion:  A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion.  Please reply to
the original quiz message, if you can.

RSS Feed: http://rubyquiz.strd6.com/quizzes.rss

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

## Genetic Programming (#212)

Buon giorno Rubyists,

Let's say that we have a table of outputs for an unknown function, and
we want to find out what that function is. One possible method of
finding a solution is to use genetic programming[1]. In genetic
programming we begin with a random population of programs, then rank
them according to how well they meet the solution criteria. The top
ranked programs are then modified to create a new generation of
programs. The new generation is ranked by how well the solve the
problem in the same way and the process repeats until, hopefully,
we'll have evolved a strong solution.

The modifications made to the programs are based on techniques
observed in biological evolution: mutation and crossover. In mutation
a piece of one program is altered randomly, for example `x + 3` could
become `x + (y - 1)`. The mutation occurs at one node in the parse
tree and replaces it with a new randomly generated node.

Crossover works in much the same way, except instead of randomly
altering the node it is taken from another node of another program.
For example: `x + (y * x)` and `3 - (y + x*x)` could yield something
like `(y*x) - (y + x*x)`.

In order to mutate our programs it would be nice to work with the
structure of the code directly. The ParseTree[2] gem provides a way of
getting an Abstract Syntax Tree[3] to manipulate. The AST represents
the code as S-expressions[4], arrays containing operators as the first
argument and the operands as the remaining arguments. If you are
familiar with LISP then you know that LISP programmers program
directly in S-expressions. Using S-expressions makes evolution and
manipulation of the programs much easier, as S-expressions are the
essence of programs.

When manipulating your ASTs you may need a deep copy to prevent
unwanted side effects Here is a deep copy idiom that may be useful:

    def deep_copy(tree)
      Marshal.load(Marshal.dump(tree)
    end

In order to get the AST back into executable ruby you'll need
ruby2ruby[5] or something like it. As always, feel encouraged to use
and share other tools if you feel they are good for the task. Also, if
you have any questions don't be afraid to ask, there are many people
on the mailing list who are happy to help!

Here is the table of outputs for the mystery function that we would
like to evolve an algorithm for:
    ["x", "y", "result"]
    [8, 16, 20808]
    [22, 31, 150847]
    [5, 16, 20685]
    [12, 19, 34895]
    [18, 25, 79349]
    [20, 33, 181525]
    [31, 1, -119]
    [19, 33, 181433]
    [0, 12, 8640]
    [13, 12, 9017]

In order to determine how well suited a program is we can use the
following metric function:

    # Return how far off from the data the given method is
    # A lower score is better, a score of 0 matches the data perfectly
    def fitness(program, data)
      delta = 0
      data.each do |row|
        value = program.calculate(row[0], row[1])
        delta += (value - row[2]).abs
      end
      return delta
    end

Have Fun!

[1]: http://en.wikipedia.org/wiki/Genetic_programming
[2]: http://rubyforge.org/projects/parsetree/
[3]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
[4]: http://en.wikipedia.org/wiki/S-Expression
[5]: http://seattlerb.rubyforge.org/ruby2ruby/
38d738b4c7b89ab44ad6c0f7d56fb2e4?d=identicon&s=25 Sander Land (Guest)
on 2009-07-08 15:06
(Received via mailing list)
%w[rubygems charlie].each{|lib| require lib}
class Quiz212 < TreeGenotype([:x,:y,proc{rand(20)-10}],  [:-@],
[:+,:*,:-])
  DATA = [[8, 16, 20808]  , [22, 31, 150847],  [5, 16, 20685], [12,
19, 34895], [18, 25, 79349],
          [20, 33, 181525], [31, 1, -119]  , [19, 33, 181433],  [0,
12, 8640], [13, 12, 9017]]
  SIZE_PENALTY = 100
  def fitness
    -DATA.inject(0){|f,(x,y,z)| f + ( z - eval_genes(:x=>x,:y=>y)
).abs } - size * SIZE_PENALTY
  end
end

Population.new(Quiz212).evolve_on_console(200)

The best result I got using size penalty 100 has fitness -2296 and tree:
[:*, [:term, :y], [:+, [:+, [:term, :y], [:term, :x]], [:*, [:*,
[:term, -5], [:term, :y]], [:-@, [:term, :y]]]]]
Which simplifies to 5y^3 + y^2 + xy.

Not including a penalty for size in the fitness function causes bloat.
However, including such a penalty causes it to never find the lower
order terms as the smaller size is considered more important than
slightly better accuracy. This is a well-known problem in GP.
E0c987f680cd640c14912ebfbf0f0f07?d=identicon&s=25 unknown (Guest)
on 2009-07-09 22:35
(Received via mailing list)
> them according to how well they meet the solution criteria. The top
> ranked programs are then modified to create a new generation of
> programs. The new generation is ranked by how well the solve the
> problem in the same way and the process repeats until, hopefully,
> we'll have evolved a strong solution.

Some time ago, I ran across DRP, Directed Ruby Programming[1], and
thought that might be something to look at later; I thought this quiz
might be a bit of a chance.  (Though I only barely scratched the surface
of the library's documentation.)  First, here are some sample functions
my programs generated (fitness: function):

89416: 4312 + y * 7 * y * x
82779: ((x * 4 * (93 + y * 6 + x + 9 - x)) - 3) * y / 8 - 62 + 1 * x - 3
       - 80 + 1395 + 0 / 870 + y * 6 + y - y - ((y)) - x - y + y - 16
       - 657 + 880 - 7 * (6 / 6) * (y / 1) - 870 + 19 + y * y * (y) - y
       - (62935) / (((x - x) - y / 1 * y / y) - x) - ((y + 4 - y + y * y
       + x)) + 6 - x * 6 * (((8 + 6 / 2563 * (y) + x / 9871671)) - y - 7
       - x - x - 8) - 3361 + 9 - (43) / 675670 - y - x - 45 + 8 + (y /
1)
       - 870 + 19 + y * y * (y) - y + y * x * (6 / 6) + 870 + 19 + y + x
       * x - ((((10)))) - 8 + y - (x)
17387: x + (y) * 5 * y * y - ((x + y)) / 65 / y / y + y + 2 / (y * x +
       95381 - x + x) - y * 0 / 50 * x * ((x)) * 7 - x * 4 + 4 * 9 * 6 *
       y - 0 * (((y + x + x))) * (x + y) * (y + 78 * ((x / y * (y + 22 -
       9) - ((x + y) - 684) - x - y - x + ((7 + y * ((785 / 4 - x * 8))
+
       (8 - 4) + 8 / 7 + x + x))) / 4) - (8 - 9 * x - 0) * x + 2) * 91 *
       76 * x + 220 + x * 4 - 2 * ((x)) * (y) * 4 - y * ((x / (y * 34 +
y
       + 4 - x * 8)) + 24) + y * 1 - (y) * (((y + x + x)))
8570:  5 * y * + + y * y - 29
8424:  y * + 5 * + y * y + - x
8360:  5 * y * y * y / y * y - x / - - - y
8346:  5 * y * y * y - - - 1 + x - x
8338:  y * ((5 * y)) * ((y))
8102:  (8) - y + 60 + y * y * 5 * y
8056:  47 + 5 * y * y * y
7288:  988 + 5 * y * y * y - y * y / y + 0 * 894 / 2 - x * x / - + 1 - -
7
3948:  y * 5 * (y + (y) * y - x)

Following the first example in drp's INTRO file, I created a simple
class
to generate random initial functions:

require 'drp'

class ProgramGenerator
  extend DRP::RuleEngine

  begin_rules
    def program
      "Class.new{ #{function} }"
    end

    def function
      "def calculate(x, y); #{expression}; end"
    end

    def expression; "#{expression} #{binaryop} #{expression}"; end
    def expression; "#{atom} #{binaryop} #{expression}"; end
    def expression; "#{expression} #{binaryop} #{atom}"; end
    def expression; "(#{expression})"; end
    def expression; "#{atom}"; end

    def binaryop; "+"; end
    def binaryop; "-"; end
    def binaryop; "*"; end
    def binaryop; "/"; end

    def unaryop; "-"; end

    def atom; "#{variable}"; end
    def atom; "#{literal}"; end

    def variable; "x"; end
    def variable; "y"; end

    def literal; "#{rand(10)}"; end
    def literal; "#{rand(9) + 1}#{literal}"; end
  end_rules
end

irb(main):001:0> require 'program_generator'
=> true
irb(main):002:0> pg = ProgramGenerator.new
...
irb(main):003:0> pg.expression
=> "x"
irb(main):004:0> pg.expression
=> "(y * 4 + 924)"
irb(main):006:0> pg.function
=> "def calculate(x, y); 26 / x + (60 - x) + y; end"
irb(main):007:0> pg.program
=> "Class.new{ def calculate(x, y); 136 * x + 14; end }"

I setup a "magic" hash to store my population of programs and a few
helpful methods:

Programs = Hash.new{|hash, program|
  hash[program] = fitness(program)
}

class <<Programs
  def generate_some_more(n = 100, pg = ProgramGenerator.new)
    n.times{ self[pg.expression] }
    self
  end

  def reject_the_unfit
    self.delete_if{|program, fit| !fit}
  end

  def select_the_best(n = 5)
    self.sort_by_fitness[0, n]
  end

  def sort_by_fitness(dont_randomize = true)
    self.reject_the_unfit
    self.sort_by{|program, fitness| dont_randomize ? fitness :
rand(fitness)}
  end
end

irb(main):001:0> Programs
=> {}
irb(main):002:0> Programs.generate_some_more(2)
=> {"y"=>687122, "(x - 4 * x / 31 + (0 * 964 / 65 / y - 8 + 4 + (y * ((y
/ x *
 54 - (x / ((986 - y + x - 24 + ((6 / 197 / 29 - 287)) / 9 + y * 2 / 2 *
7)) *
 23 - 7) + x - 44 * 659 + (y * 1 - 8 / y / ((5 * x)) - 5) * 7632 / ((y))
/ x /
 8856))) / x + x) - ((x))) - 7 - 0 - x"=>nil}
irb(main):003:0> Programs.select_the_best(1)
=> [["y", 687122]]

And wrapped that in a loop to generate/mutate a bunch of functions:

pm = ProgramMutator.new

1000.times do
  Programs.generate_some_more.
           sort_by_fitness(true).
           each_cons(2).
           each do |(p1, f1), (p2, f2)|
    new_program = case rand(10)
      when 3..9 then pm.cross_breed(p1, p2)
      when    2 then pm.mutate(p2)
      when    1 then pm.mutate(p1)
      else           ProgramGenerator.new.expression
      end
    new_fitness = Programs[new_program]
    #puts "#{f1} X #{f2}\t#{new_fitness}"
  end

  program, fitness = Programs.select_the_best(1).flatten
  puts "Best: #{fitness}\t#{program[0..60]}"
end

> ruby1.9 -rubygems 212.rb
Best: 569018  28298
Best: 494580  3170 * x - y
Best: 494580  3170 * x - y
Best: 494580  3170 * x - y
Best: 357554  (4257 * y - 1)
Best: 357554  (4257 * y - 1)
Best: 357554  (4257 * y - 1)
Best: 357554  (4257 * y - 1)
Best: 212330  (y) * - 142 / - 25 * y * x
Best: 212330  (y) * - 142 / - 25 * y * x
Best: 187980  0 / y - - 8 * 25 * y * x
Best: 173870  248 * y * (x)
Best: 104800  26 - 2 + x * 7 * y * (y)
Best: 104800  26 - 2 + x * 7 * y * (y)
Best: 104800  26 - 2 + x * 7 * y * (y)
Best: 92312  8526 - x + y * + 8 * y * x
^C

That leaves the ProgramMutator class:

class ProgramMutator
  def cross_breed(p1, p2)
    #...
  end

  def mutate(program)
    #...
  end
end

The machine I'm using today doesn't have ParseTree ("ParseTree doesn't
work
with ruby 1.9.0 (LoadError)").  So, I was just doing string
manipulations.
I don't know which functions produced which of the results from above,
but
a few samples:

  def cross_breed(p1, p2)
    p1 + %{w + - * /}[rand(4)] + p2
  end

  def cross_breed(p1, p2)
    s1 = p1[0..rand(p1.size)]
    s2 = p2[rand(p2.size)..-1]

    s1 + %w{ + - * / }[rand(4)] + s2
  end

  def mutate(program)
    program.gsub(
      "#{%w{ + - * / }[rand(4)]}",
      "#{%w{ + - * / }[rand(4)]}"
    )
  end

[1] http://drp.rubyforge.org/
E0c987f680cd640c14912ebfbf0f0f07?d=identicon&s=25 unknown (Guest)
on 2009-07-10 16:00
(Received via mailing list)
On Thu, Jul 9, 2009 at 4:34 PM, <brabuhr@gmail.com> wrote:
> some sample functions my programs generated (fitness: function):
>
> 89416: ...
> 8338:  y * ((5 * y)) * ((y))
> 8102:  (8) - y + 60 + y * y * 5 * y
> 8056:  47 + 5 * y * y * y
> 7288:  988 + 5 * y * y * y - y * y / y + 0 * 894 / 2 - x * x / - + 1 - - 7
> 3948:  y * 5 * (y + (y) * y - x)

I let it run over night:

3804:  y * y + y * + y * + + - + - y * 5
33117162fff8a9cf50544a604f60c045?d=identicon&s=25 Daniel X Moore (yahivin)
on 2009-07-15 05:50
(Received via mailing list)
Attachment: 212.tar.gz (3 KB)
Sander Land's solution used Charlie[1] a library for genetic
algorithms and genetic programming. Let's take a look at Sander's
solution:

    %w[rubygems charlie].each{|lib| require lib}
    class Quiz212 < TreeGenotype([:x,:y,proc{rand(20)-10}],  [:-@],
[:+,:*,:-])
      DATA = [[8, 16, 20808], [22, 31, 150847],  [5, 16, 20685], [12,
19, 34895], [18, 25, 79349],
             [20, 33, 181525], [31, 1, -119], [19, 33, 181433],  [0,
12, 8640], [13, 12, 9017]]
      SIZE_PENALTY = 100

      def fitness
        -DATA.inject(0) do |f,(x,y,z)|
          f + ( z - eval_genes(:x=>x,:y=>y)).abs
        end - size * SIZE_PENALTY
      end
    end

Here Sander creates a class named `Quiz212`. `Quiz212` extends
TreeGenotype, provided by the Charlie gem. TreeGenotype takes three
arguments: `terminals, unary_ops, binary_ops`. These are used to
construct the tree representing the program. The class also defines a
fitness function which is used to evaluate the programs generated by
TreeGenotype. The data, as well as a size penalty are stored as
constants within the class.

The Charlie gem also provides a `Population` class. The population's
initializer takes a genotype class as a parameter and uses that to
evolve the population. The population also provides an
`evolve_on_console` method that displays the current best program each
generation.

    Population.new(Quiz212).evolve_on_console(200)

Charlie looks like a great way to get started trying out genetic
programming. The TreeGenotype is just one of many possible options
available in the library. Please do check it out.

Brabuhr provided a solution using Directed Ruby Programming[2]. DRP
uses Grammatical Evolution[3] as a foundation for Genetic Programming.
DRP let's you define a grammar to constrain the kinds of programs that
are generated. Here's an example of defining an expression using
`DRP::RuleEngine`:

    def expression; "#{expression} #{binaryop} #{expression}"; end

We can also define a binary operator in a similar way:

    def binaryop; "/"; end

These rules, along with any others that we define, are used by the
rule engine as a grammar to generate candidate programs. brabuhr
collects these generated programs and applies a fitness function to
see how well their output matches the data. These programs are then
sorted and mutated/crossed to create the next generation of programs.
Be sure to take a look at brabuhr's full solution for more details.

This week we saw two interesting options for working with genetic
programming, Charlie, and DRP. Both of these libraries make working
with genetic programming much easier by providing an interface to the
common tools and techniques used.

And now what you've all been waiting for... the mystery function:

    def mystery(x, y)
      3 * x * y + 5 * (y**3) - 7 * x
    end


Thank you Sander and brabuhr for your solutions to this week's quiz!

[1]: http://charlie.rubyforge.org/
[2]: http://drp.rubyforge.org/
[3]: http://en.wikipedia.org/wiki/Grammatical_evolution
E0c987f680cd640c14912ebfbf0f0f07?d=identicon&s=25 unknown (Guest)
on 2009-07-16 04:28
(Received via mailing list)
On Tue, Jul 14, 2009 at 11:49 PM, Daniel Moore<yahivin@gmail.com> wrote:
> And now what you've all been waiting for... the mystery function:
>
>    def mystery(x, y)
>      3 * x * y + 5 * (y**3) - 7 * x
>    end

Cool.  I was not thinking and I rebooted while my script and output
were in /tmp; the machine is set to clear tmp on boot: so, I lost my
lowest-delta functions.  But, I recovered these:

> cat foo
DATA = [[8, 16, 20808], [22, 31, 150847],  [5, 16, 20685], [12, 19,
34895],
        [18, 25, 79349], [20, 33, 181525], [31, 1, -119]  , [19, 33,
181433],
        [0, 12, 8640], [13, 12, 9017]]

# An example of one of the closer ones I found:

f = "(y) * + x - - - - + x - x - 3 * 82 / y + - 5 + + + x - y * y + y
- 2 * (x + y) / 459 + + - + - - 5 + (y) - 2 + - + 5 + (y) / + 5 - - -
+ + - + x - 3 * 8 + x - + (y) * 5 - x - - x + (y) * 5 + y * (y) / (y)
* y * (y) * 5 + y * (y) * 5 / 3 / (y) * - - + x - 3 + (y) * y / (y) *
y"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =
eval(f)}\t#{delta = (ev - av).abs}";  delta}.inject{|s,v| v += s}
puts

# I think this was the second best I found; unfortunately, I did not
save
# the original output.  The version here was after some hand editing
while
# pasting it into a spreadsheet, e.g.:
#   "x + + + - + y" => "x - y";  "x + - + - + y" => "x + y";  "(x)" =>
"x"

f = "-5 * y / 4 - x - y - 5 * 8 + y / 5 * -y / 5 + 5 * y - 8 / -8 - y
/ 5 / 4 * 4 + 5 - 7 - 3 / 6 - y / 5 * -4 / y * y + 3 / 6 * -x + 4 + 5
+ 1 - 3 / 6 / -8 * y + 8 / -8 + 8 * + 1 + y / 5 * -y / 5 * -4 * 4 / +
y + 6 - y - 8 + 8 / -8 + 5 / 8 * -8 + y / 5 * 8 / -8 - y + 7 + y * x +
5 * y * y * y + 3 * -x + y + 8 / 3 / 6 - y - 8 / -8 - y / 5 * -4 + y *
y / y * x + 8 * y / -4 + 5 / 84 * 8 * -3 * 4 - x + y * x"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =
eval(f)}\t#{delta = (ev - av).abs}";  delta}.inject{|s,v| v += s}
puts

# I added "- 5" to the above:

f = "-5 * y / 4 - x - y - 5 * 8 + y / 5 * -y / 5 + 5 * y - 8 / -8 - y
/ 5 / 4 * 4 + 5 - 7 - 3 / 6 - y / 5 * -4 / y * y + 3 / 6 * -x + 4 + 5
+ 1 - 3 / 6 / -8 * y + 8 / -8 + 8 * + 1 + y / 5 * -y / 5 * -4 * 4 / +
y + 6 - y - 8 + 8 / -8 + 5 / 8 * -8 + y / 5 * 8 / -8 - y + 7 + y * x +
5 * y * y * y + 3 * -x + y + 8 / 3 / 6 - y - 8 / -8 - y / 5 * -4 + y *
y / y * x + 8 * y / -4 + 5 / 84 * 8 * -3 * 4 - x + y * x - 5"
puts DATA.map{|(x, y, ev)| puts "#{x}\t#{y}\t#{ev}\t#{av =
eval(f)}\t#{delta = (ev - av).abs}";  delta}.inject{|s,v| v += s}
puts

> ruby foo
8  16  20808  20816  8
22  31  150847  150842  5
5  16  20685  20681  4
12  19  34895  34916  21
18  25  79349  79369  20
20  33  181525  181526  1
31  1  -119  -128  9
19  33  181433  181435  2
0  12  8640  8602  38
13  12  9017  9057  40
148

8  16  20808  20810  2
22  31  150847  150859  12
5  16  20685  20681  4
12  19  34895  34902  7
18  25  79349  79361  12
20  33  181525  181530  5
31  1  -119  -78  41
19  33  181433  181436  3
0  12  8640  8625  15
13  12  9017  9028  11
112

8  16  20808  20805  3
22  31  150847  150854  7
5  16  20685  20676  9
12  19  34895  34897  2
18  25  79349  79356  7
20  33  181525  181525  0
31  1  -119  -83  36
19  33  181433  181431  2
0  12  8640  8620  20
13  12  9017  9023  6
92
This topic is locked and can not be replied to.