Genetic Algorithms, Part 2

Improve Your Computer’s Sex Life

by Matt Neuburg

Matt Neuburg is the author of REALbasic: The Definitive Guide, and a member of the editorial board of REALbasic Developer. This article was originally published in REALbasic Developer Issue 2.1 (Aug/Sept 2003).

Last month’s column introduced genetic algorithms, where we simulate Mendelian / Darwinian evolution as a way of solving some problem. We start with a population of individuals, each endowed with some DNA. We regard each individual’s DNA as a possible answer to our problem, and we rank it according to how good an answer it is. We take the individuals with the best DNA and mate them, generating a new population, repeating the process again and again.

The “problem” in this case, you may recall, was to arrive at a good sequence of bytes, where “goodness” is defined as follows. Every byte is translated into an alphanumeric character or a space, thus arriving at a series of words. A word like “9BJ6” is bad, and reduces the cumulative value of the DNA by multiplying it by 0.8. A word comprising entirely digits, like “46”, adds to the cumulative value of the DNA a number of points equal to its length. The word “M” doubles the cumulative value of the DNA. And a word like “AG12589” — the letter “A” followed by only non-digits followed by only digits — multiplies the cumulative value of the DNA by the numeric value of its digits part. Although this is not an earth-shattering problem, it makes for a good example, since it turns out that in 100 generations the computer is able to optimize its byte sequence very strongly with respect to these rules.

As I said last time, the heart of the algorithm is a loop that looks a lot like one’s idea of real-life evolution:

Core loop:

  do
    evaluateFitness
    letTotallyUnfitDie
    selectParents
    breed
    mutate
    generation = generation + 1
  loop until generation >= generation_max

Recall that we have a class called Individual; that an Individual has a DNA property, which is a memoryblock of size 64, and a Fitness (double) property, to hold the results of evaluating its DNA; and that there is a Population, which is an array of 1024 Individuals. EvaluateFitness is a simple process of expressing in REALbasic the rules we’ve already described verbally. LetTotallyUnfitDie eliminates all Individuals whose Fitness is below 0.1. SelectParents was described last time: based on each Individual’s relative Fitness, he gets a certain number of “tickets” each of which allots him the right to have sex once. These tickets appear as the value of each Individual’s Parent property (an integer). We are now ready to breed our current Population of Individuals and obtain the Population of the next generation.

My approach to breeding is extremely simple. I mate two Individuals chosen at random, and append the resulting Individual to an array called Children. I also decrement the Parent property of each of the two parent Individuals, and if it is zero, I remove that Individual from the Population. Thus, I’m eventually left with an empty Population array and a full Children array; I then move all the Children into the Population array. Thus, my approach to the notion of generations involves complete simultaneous replacement: a generation breeds, Individuals die the moment they become sterile, and the entire next generation moves in without ever meeting their parents. As explained, however, in the superb online article by Hartmut Pohlheim (see References), this step, technically known as reinsertion, can be performed in other ways. What I’m doing is “pure reinsertion”, which has the disadvantage that parents can be replaced by worse children; a better approach is to use some cut-off point to replace just the least fit parents by the most fit children, thus giving every Individual a chance to live and breed for as long as he continues to excel amongst the Population. Be that as it may, here’s my Breed routine:

Breed:

Sub breed()
   dim children(popsize) as individual
   redim children(popsize-1)
   dim i,u as integer
   dim parent1, parent2 as individual
   dim parent1index, parent2index as integer
   dim origPop as integer
   origPop = ubound(population) + 1
   for i = ubound(population) downto 0
      if population(i).sterile then
         population.remove i
      end
   next
   u = popsize-1
   for i = 0 to u
      parent1index = ubound(population)
      parent2index = app.r.lessThan(parent1index)
      parent1 = population(parent1index)
      parent2 = population(parent2index)
      parent1.decreaseParent
      if parent1.sterile then
         population.remove parent1index
      end
      parent2.decreaseParent
      if parent2.sterile then
         population.remove parent2index
      end
      children(i) = parent1.breedWith(parent2)
   next
   redim population(u)
   for i = 0 to u
      population(i) = children(i)
   next
End Sub

(The property app.r is an instance of REALbasic 5’s Random class; because of the very annoying way this is implemented, you have to maintain an instance of it if you want a coherent random sequence over your program’s lifetime.)

The Individual method BreedWith simply calls an Individual constructor that makes a new Individual (the child) given two other Individuals (the parents). This, of course, is all-important. How should the DNA of two Individuals be combined to make the DNA of a new Individual? My original approach was to choose at random, for each byte of the child’s DNA, the value from one of the parents. Upon reading Polheim, I learned that this is called “uniform crossover”, and I decided I didn’t like it; eventually I opted for “multi-point crossover”, where the DNA is divided randomly into some predetermined number of segments (I chose 10), and the random choice between the parents is performed for each segment.

Individual.Individual:

Sub individual(parent1 as individual, parent2 as individual)
   self.dna = newMemoryBlock(dna_byte_length)
   self.survived = true
   self.parent = 0
   self.fitness = 0.0
   dim segs(crossover_segments) as integer
   dim i as integer
   for i = 1 to crossover_segments
      segs(i) = app.r.lessThan(dna_byte_length)
   next
   segs.remove(0)
   segs.sort
   segs.append dna_byte_length
   dim pickAParent as double
   dim byte as integer
   for each i in segs
      pickAParent = app.r.number
      while byte < i
         if pickAParent < 0.5 then
            dna.byte(byte) = parent1.dna.byte(byte)
         else
            dna.byte(byte) = parent2.dna.byte(byte)
         end
         byte = byte + 1
      wend
   next
End Sub

The final step in the core loop is Mutate. This calls a Mutate method for every Individual of the Population; some small percentage of Individuals respond by altering their DNA in some way, thus contributing to the overall creativity of the Population.

Individual.Mutate:

Function mutate() As boolean
   dim shouldMutate as boolean
   shouldMutate = app.r.number < mut_rate
   if not shouldMutate then
      return false
   end
   dim byte as integer
   dim i,u as integer
   u = dna_byte_length - 1
   for i = 0 to u
      byte = dna.byte(i)
      byte = bitwise.bitand(byte, app.r.lessThan(256))
      byte = bitwise.bitor(byte, app.r.lessThan(256))
      dna.byte(i) = byte
   next
   return true
End Function

The entire algorithm that I ultimately arrived at is packaged as a REALbasic 5.1 project that shows each generation’s best DNA and graphs that DNA’s fitness value, along with several other interesting statistics, as the generations progress. Feel free to download it, experiment with it, and improve upon it. It would be interesting to implement various other selection, reinsertion, replacement and mutation algorithms, as described in Pohlheim’s article. I also would have liked to make the fitness function editable by the user, but bugs in RBScript prevented that idea from bearing fruit for the present.

References
http://www.geatbx.com/docu/algindex-01.html#TopOfPage