Genetic Programming (#212)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Q.:

  1. Please do not post any solutions or spoiler discussion for this
    quiz until 48 hours have elapsed from the time this message was
    sent.

  2. Support Ruby Q. by submitting ideas and responses
    as often as you can!
    Visit: http://rubyquiz.strd6.com/suggestions

  3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby T. follow the discussion. Please reply to
the original quiz message, if you can.

RSS Feed: http://rubyquiz.strd6.com/quizzes.rss

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Genetic Programming (#212)

Buon giorno Rubyists,

Let’s say that we have a table of outputs for an unknown function, and
we want to find out what that function is. One possible method of
finding a solution is to use genetic programming1. In genetic
programming we begin with a random population of programs, then rank
them according to how well they meet the solution criteria. The top
ranked programs are then modified to create a new generation of
programs. The new generation is ranked by how well the solve the
problem in the same way and the process repeats until, hopefully,
we’ll have evolved a strong solution.

The modifications made to the programs are based on techniques
observed in biological evolution: mutation and crossover. In mutation
a piece of one program is altered randomly, for example x + 3 could
become x + (y - 1). The mutation occurs at one node in the parse
tree and replaces it with a new randomly generated node.

Crossover works in much the same way, except instead of randomly
altering the node it is taken from another node of another program.
For example: x + (y * x) and 3 - (y + x*x) could yield something
like (y*x) - (y + x*x).

In order to mutate our programs it would be nice to work with the
structure of the code directly. The ParseTree2 gem provides a way of
getting an Abstract Syntax Tree3 to manipulate. The AST represents
the code as S-expressions4, arrays containing operators as the first
argument and the operands as the remaining arguments. If you are
familiar with LISP then you know that LISP programmers program
directly in S-expressions. Using S-expressions makes evolution and
manipulation of the programs much easier, as S-expressions are the
essence of programs.

When manipulating your ASTs you may need a deep copy to prevent
unwanted side effects Here is a deep copy idiom that may be useful:

def deep_copy(tree)
  Marshal.load(Marshal.dump(tree)
end

In order to get the AST back into executable ruby you’ll need
ruby2ruby5 or something like it. As always, feel encouraged to use
and share other tools if you feel they are good for the task. Also, if
you have any questions don’t be afraid to ask, there are many people
on the mailing list who are happy to help!

Here is the table of outputs for the mystery function that we would
like to evolve an algorithm for:
[“x”, “y”, “result”]
[8, 16, 20808]
[22, 31, 150847]
[5, 16, 20685]
[12, 19, 34895]
[18, 25, 79349]
[20, 33, 181525]
[31, 1, -119]
[19, 33, 181433]
[0, 12, 8640]
[13, 12, 9017]

In order to determine how well suited a program is we can use the
following metric function:

# Return how far off from the data the given method is
# A lower score is better, a score of 0 matches the data perfectly
def fitness(program, data)
  delta = 0
  data.each do |row|
    value = program.calculate(row[0], row[1])
    delta += (value - row[2]).abs
  end
  return delta
end

Have Fun!

%w[rubygems charlie].each{|lib| require lib}
class Quiz212 < TreeGenotype([:x,:y,proc{rand(20)-10}], [:-@],
[:+,:*,:-])
DATA = [[8, 16, 20808] , [22, 31, 150847], [5, 16, 20685], [12,
19, 34895], [18, 25, 79349],
[20, 33, 181525], [31, 1, -119] , [19, 33, 181433], [0,
12, 8640], [13, 12, 9017]]
SIZE_PENALTY = 100
def fitness
-DATA.inject(0){|f,(x,y,z)| f + ( z - eval_genes(:x=>x,:y=>y)
).abs } - size * SIZE_PENALTY
end
end

Population.new(Quiz212).evolve_on_console(200)

The best result I got using size penalty 100 has fitness -2296 and tree:
[:, [:term, :y], [:+, [:+, [:term, :y], [:term, :x]], [:, [:*,
[:term, -5], [:term, :y]], [:-@, [:term, :y]]]]]
Which simplifies to 5y^3 + y^2 + xy.

Not including a penalty for size in the fitness function causes bloat.
However, including such a penalty causes it to never find the lower
order terms as the smaller size is considered more important than
slightly better accuracy. This is a well-known problem in GP.

them according to how well they meet the solution criteria. The top
ranked programs are then modified to create a new generation of
programs. The new generation is ranked by how well the solve the
problem in the same way and the process repeats until, hopefully,
we’ll have evolved a strong solution.

Some time ago, I ran across DRP, Directed Ruby P.ming[1], and
thought that might be something to look at later; I thought this quiz
might be a bit of a chance. (Though I only barely scratched the surface
of the library’s documentation.) First, here are some sample functions
my programs generated (fitness: function):

89416: 4312 + y * 7 * y * x
82779: ((x * 4 * (93 + y * 6 + x + 9 - x)) - 3) * y / 8 - 62 + 1 * x - 3
- 80 + 1395 + 0 / 870 + y * 6 + y - y - ((y)) - x - y + y - 16
- 657 + 880 - 7 * (6 / 6) * (y / 1) - 870 + 19 + y * y * (y) - y
- (62935) / (((x - x) - y / 1 * y / y) - x) - ((y + 4 - y + y * y
+ x)) + 6 - x * 6 * (((8 + 6 / 2563 * (y) + x / 9871671)) - y - 7
- x - x - 8) - 3361 + 9 - (43) / 675670 - y - x - 45 + 8 + (y /
1)
- 870 + 19 + y * y * (y) - y + y * x * (6 / 6) + 870 + 19 + y + x
* x - ((((10)))) - 8 + y - (x)
17387: x + (y) * 5 * y * y - ((x + y)) / 65 / y / y + y + 2 / (y * x +
95381 - x + x) - y * 0 / 50 * x * ((x)) * 7 - x * 4 + 4 * 9 * 6 *
y - 0 * (((y + x + x))) * (x + y) * (y + 78 * ((x / y * (y + 22 -
9) - ((x + y) - 684) - x - y - x + ((7 + y * ((785 / 4 - x * 8))
+
(8 - 4) + 8 / 7 + x + x))) / 4) - (8 - 9 * x - 0) * x + 2) * 91 *
76 * x + 220 + x * 4 - 2 * ((x)) * (y) * 4 - y * ((x / (y * 34 +
y
+ 4 - x * 8)) + 24) + y * 1 - (y) * (((y + x + x)))
8570: 5 * y * + + y * y - 29
8424: y * + 5 * + y * y + - x
8360: 5 * y * y * y / y * y - x / - - - y
8346: 5 * y * y * y - - - 1 + x - x
8338: y * ((5 * y)) * ((y))
8102: (8) - y + 60 + y * y * 5 * y
8056: 47 + 5 * y * y * y
7288: 988 + 5 * y * y * y - y * y / y + 0 * 894 / 2 - x * x / - + 1 - -
7
3948: y * 5 * (y + (y) * y - x)

Following the first example in drp’s INTRO file, I created a simple
class
to generate random initial functions:

require ‘drp’

class ProgramGenerator
extend DRP::RuleEngine

begin_rules
def program
“Class.new{ #{function} }”
end

def function
  "def calculate(x, y); #{expression}; end"
end

def expression; "#{expression} #{binaryop} #{expression}"; end
def expression; "#{atom} #{binaryop} #{expression}"; end
def expression; "#{expression} #{binaryop} #{atom}"; end
def expression; "(#{expression})"; end
def expression; "#{atom}"; end

def binaryop; "+"; end
def binaryop; "-"; end
def binaryop; "*"; end
def binaryop; "/"; end

def unaryop; "-"; end

def atom; "#{variable}"; end
def atom; "#{literal}"; end

def variable; "x"; end
def variable; "y"; end

def literal; "#{rand(10)}"; end
def literal; "#{rand(9) + 1}#{literal}"; end

end_rules
end

irb(main):001:0> require ‘program_generator’
=> true
irb(main):002:0> pg = ProgramGenerator.new

irb(main):003:0> pg.expression
=> “x”
irb(main):004:0> pg.expression
=> “(y * 4 + 924)”
irb(main):006:0> pg.function
=> “def calculate(x, y); 26 / x + (60 - x) + y; end”
irb(main):007:0> pg.program
=> “Class.new{ def calculate(x, y); 136 * x + 14; end }”

I setup a “magic” hash to store my population of programs and a few
helpful methods:

Programs = Hash.new{|hash, program|
hash[program] = fitness(program)
}

class <<Programs
def generate_some_more(n = 100, pg = ProgramGenerator.new)
n.times{ self[pg.expression] }
self
end

def reject_the_unfit
self.delete_if{|program, fit| !fit}
end

def select_the_best(n = 5)
self.sort_by_fitness[0, n]
end

def sort_by_fitness(dont_randomize = true)
self.reject_the_unfit
self.sort_by{|program, fitness| dont_randomize ? fitness :
rand(fitness)}
end
end

irb(main):001:0> Programs
=> {}
irb(main):002:0> Programs.generate_some_more(2)
=> {“y”=>687122, “(x - 4 * x / 31 + (0 * 964 / 65 / y - 8 + 4 + (y * ((y
/ x *
54 - (x / ((986 - y + x - 24 + ((6 / 197 / 29 - 287)) / 9 + y * 2 / 2 *
7)) *
23 - 7) + x - 44 * 659 + (y * 1 - 8 / y / ((5 * x)) - 5) * 7632 / ((y))
/ x /
8856))) / x + x) - ((x))) - 7 - 0 - x”=>nil}
irb(main):003:0> Programs.select_the_best(1)
=> [[“y”, 687122]]

And wrapped that in a loop to generate/mutate a bunch of functions:

pm = ProgramMutator.new

1000.times do
Programs.generate_some_more.
sort_by_fitness(true).
each_cons(2).
each do |(p1, f1), (p2, f2)|
new_program = case rand(10)
when 3…9 then pm.cross_breed(p1, p2)
when 2 then pm.mutate(p2)
when 1 then pm.mutate(p1)
else ProgramGenerator.new.expression
end
new_fitness = Programs[new_program]
#puts “#{f1} X #{f2}\t#{new_fitness}”
end

program, fitness = Programs.select_the_best(1).flatten
puts “Best: #{fitness}\t#{program[0…60]}”
end

ruby1.9 -rubygems 212.rb
Best: 569018 28298
Best: 494580 3170 * x - y
Best: 494580 3170 * x - y
Best: 494580 3170 * x - y
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 357554 (4257 * y - 1)
Best: 212330 (y) * - 142 / - 25 * y * x
Best: 212330 (y) * - 142 / - 25 * y * x
Best: 187980 0 / y - - 8 * 25 * y * x
Best: 173870 248 * y * (x)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 104800 26 - 2 + x * 7 * y * (y)
Best: 92312 8526 - x + y * + 8 * y * x
^C

That leaves the ProgramMutator class:

class ProgramMutator
def cross_breed(p1, p2)
#…
end

def mutate(program)
#…
end
end

The machine I’m using today doesn’t have ParseTree (“ParseTree doesn’t
work
with ruby 1.9.0 (LoadError)”). So, I was just doing string
manipulations.
I don’t know which functions produced which of the results from above,
but
a few samples:

def cross_breed(p1, p2)
p1 + %{w + - * /}[rand(4)] + p2
end

def cross_breed(p1, p2)
s1 = p1[0…rand(p1.size)]
s2 = p2[rand(p2.size)…-1]

s1 + %w{ + - * / }[rand(4)] + s2

end

def mutate(program)
program.gsub(
“#{%w{ + - * / }[rand(4)]}”,
“#{%w{ + - * / }[rand(4)]}”
)
end

[1] http://drp.rubyforge.org/

Sander L.'s solution used Charlie1 a library for genetic
algorithms and genetic programming. Let’s take a look at Sander’s
solution:

%w[rubygems charlie].each{|lib| require lib}
class Quiz212 < TreeGenotype([:x,:y,proc{rand(20)-10}],  [:-@], 

[:+,:*,:-])
DATA = [[8, 16, 20808], [22, 31, 150847], [5, 16, 20685], [12,
19, 34895], [18, 25, 79349],
[20, 33, 181525], [31, 1, -119], [19, 33, 181433], [0,
12, 8640], [13, 12, 9017]]
SIZE_PENALTY = 100

  def fitness
    -DATA.inject(0) do |f,(x,y,z)|
      f + ( z - eval_genes(:x=>x,:y=>y)).abs
    end - size * SIZE_PENALTY
  end
end

Here Sander creates a class named Quiz212. Quiz212 extends
TreeGenotype, provided by the Charlie gem. TreeGenotype takes three
arguments: terminals, unary_ops, binary_ops. These are used to
construct the tree representing the program. The class also defines a
fitness function which is used to evaluate the programs generated by
TreeGenotype. The data, as well as a size penalty are stored as
constants within the class.

The Charlie gem also provides a Population class. The population’s
initializer takes a genotype class as a parameter and uses that to
evolve the population. The population also provides an
evolve_on_console method that displays the current best program each
generation.

Population.new(Quiz212).evolve_on_console(200)

Charlie looks like a great way to get started trying out genetic
programming. The TreeGenotype is just one of many possible options
available in the library. Please do check it out.

Brabuhr provided a solution using Directed Ruby P.ming2. DRP
uses Grammatical Evolution3 as a foundation for Genetic Programming.
DRP let’s you define a grammar to constrain the kinds of programs that
are generated. Here’s an example of defining an expression using
DRP::RuleEngine:

def expression; "#{expression} #{binaryop} #{expression}"; end

We can also define a binary operator in a similar way:

def binaryop; "/"; end

These rules, along with any others that we define, are used by the
rule engine as a grammar to generate candidate programs. brabuhr
collects these generated programs and applies a fitness function to
see how well their output matches the data. These programs are then
sorted and mutated/crossed to create the next generation of programs.
Be sure to take a look at brabuhr’s full solution for more details.

This week we saw two interesting options for working with genetic
programming, Charlie, and DRP. Both of these libraries make working
with genetic programming much easier by providing an interface to the
common tools and techniques used.

And now what you’ve all been waiting for… the mystery function:

def mystery(x, y)
  3 * x * y + 5 * (y**3) - 7 * x
end

Thank you Sander and brabuhr for your solutions to this week’s quiz!

On Thu, Jul 9, 2009 at 4:34 PM, [email protected] wrote:

some sample functions my programs generated (fitness: function):

89416: …
8338: y * ((5 * y)) * ((y))
8102: (8) - y + 60 + y * y * 5 * y
8056: 47 + 5 * y * y * y
7288: 988 + 5 * y * y * y - y * y / y + 0 * 894 / 2 - x * x / - + 1 - - 7
3948: y * 5 * (y + (y) * y - x)

I let it run over night:

3804: y * y + y * + y * + + - + - y * 5

On Tue, Jul 14, 2009 at 11:49 PM, Daniel M.[email protected] wrote:

And now what you’ve all been waiting for… the mystery function:

def mystery(x, y)
3 * x * y + 5 * (y**3) - 7 * x
end

Cool. I was not thinking and I rebooted while my script and output
were in /tmp; the machine is set to clear tmp on boot: so, I lost my
lowest-delta functions. But, I recovered these:

cat foo
DATA = [[8, 16, 20808], [22, 31, 150847], [5, 16, 20685], [12, 19,
34895],
[18, 25, 79349], [20, 33, 181525], [31, 1, -119] , [19, 33,
181433],
[0, 12, 8640], [13, 12, 9017]]

An example of one of the closer ones I found:

f = "(y) * + x - - - - + x - x - 3 * 82 / y + - 5 + + + x - y * y + y

  • 2 * (x + y) / 459 + + - + - - 5 + (y) - 2 + - + 5 + (y) / + 5 - - -
        • x - 3 * 8 + x - + (y) * 5 - x - - x + (y) * 5 + y * (y) / (y)
  • y * (y) * 5 + y * (y) * 5 / 3 / (y) * - - + x - 3 + (y) * y / (y) *
    y"
    puts DATA.map{|(x, y, ev)| puts “#{x}\t#{y}\t#{ev}\t#{av =
    eval(f)}\t#{delta = (ev - av).abs}”; delta}.inject{|s,v| v += s}
    puts

I think this was the second best I found; unfortunately, I did not

save

the original output. The version here was after some hand editing

while

pasting it into a spreadsheet, e.g.:

“x + + + - + y” => “x - y”; “x + - + - + y” => “x + y”; “(x)” =>

“x”

f = "-5 * y / 4 - x - y - 5 * 8 + y / 5 * -y / 5 + 5 * y - 8 / -8 - y
/ 5 / 4 * 4 + 5 - 7 - 3 / 6 - y / 5 * -4 / y * y + 3 / 6 * -x + 4 + 5

  • 1 - 3 / 6 / -8 * y + 8 / -8 + 8 * + 1 + y / 5 * -y / 5 * -4 * 4 / +
    y + 6 - y - 8 + 8 / -8 + 5 / 8 * -8 + y / 5 * 8 / -8 - y + 7 + y * x +
    5 * y * y * y + 3 * -x + y + 8 / 3 / 6 - y - 8 / -8 - y / 5 * -4 + y *
    y / y * x + 8 * y / -4 + 5 / 84 * 8 * -3 * 4 - x + y * x"
    puts DATA.map{|(x, y, ev)| puts “#{x}\t#{y}\t#{ev}\t#{av =
    eval(f)}\t#{delta = (ev - av).abs}”; delta}.inject{|s,v| v += s}
    puts

I added “- 5” to the above:

f = "-5 * y / 4 - x - y - 5 * 8 + y / 5 * -y / 5 + 5 * y - 8 / -8 - y
/ 5 / 4 * 4 + 5 - 7 - 3 / 6 - y / 5 * -4 / y * y + 3 / 6 * -x + 4 + 5

  • 1 - 3 / 6 / -8 * y + 8 / -8 + 8 * + 1 + y / 5 * -y / 5 * -4 * 4 / +
    y + 6 - y - 8 + 8 / -8 + 5 / 8 * -8 + y / 5 * 8 / -8 - y + 7 + y * x +
    5 * y * y * y + 3 * -x + y + 8 / 3 / 6 - y - 8 / -8 - y / 5 * -4 + y *
    y / y * x + 8 * y / -4 + 5 / 84 * 8 * -3 * 4 - x + y * x - 5"
    puts DATA.map{|(x, y, ev)| puts “#{x}\t#{y}\t#{ev}\t#{av =
    eval(f)}\t#{delta = (ev - av).abs}”; delta}.inject{|s,v| v += s}
    puts

ruby foo
8 16 20808 20816 8
22 31 150847 150842 5
5 16 20685 20681 4
12 19 34895 34916 21
18 25 79349 79369 20
20 33 181525 181526 1
31 1 -119 -128 9
19 33 181433 181435 2
0 12 8640 8602 38
13 12 9017 9057 40
148

8 16 20808 20810 2
22 31 150847 150859 12
5 16 20685 20681 4
12 19 34895 34902 7
18 25 79349 79361 12
20 33 181525 181530 5
31 1 -119 -78 41
19 33 181433 181436 3
0 12 8640 8625 15
13 12 9017 9028 11
112

8 16 20808 20805 3
22 31 150847 150854 7
5 16 20685 20676 9
12 19 34895 34897 2
18 25 79349 79356 7
20 33 181525 181525 0
31 1 -119 -83 36
19 33 181433 181431 2
0 12 8640 8620 20
13 12 9017 9023 6
92