Introduction

In The Blind Watchmaker, Richard Dawkins discusses a very simple genetic algorithm (GA) to evolve the phrase "METHINKS IT IS LIKE A WEASEL" starting from randomly generated strings. Despite the fact that Dr. Dawkins was very careful to point out the limitations of this simulation:

Although the monkey/Shakespeare model is useful for explaining the distinction between single-step selection and cumulative selection, it is misleading in important ways. One of these is that, in each generation of selective 'breeding', the mutant 'progeny' phrases were judged according to the criterion of resemblance to a distant ideal target, the phrase METHINKS IT IS LIKE A WEASEL. Life isn't like that. Evolution has no long-term goal. There is no long-distance target, no final perfection to serve as a criterion for selection, although human vanity cherishes the absurd notion that our species is the final goal of evolution. In real life, the criterion for selection is always short-term, either simple survival or, more generally, reproductive success.

Intelligent Design proponents and other creationists have made a lot of noise about how it doesn't prove anything about evolution. Several have made factually incorrect claims regarding the implementation of the simulation in their attempts to discredit it.

I've implemented a variant of Dr. Dawkins' simulation to make it clear that, while simplistic, the Weasel Program does demonstrate exactly what he says it does.

The Goal

The purpose of the Weasel Program is to show that cumulative selection can search a solution space much faster than single-step selection. The biological analogy, of course, is that differential reproductive success operating on small variations leads to significant changes in populations over generations.

A secondary goal of the Weasel Program is to address the common misconception of evolution as "random chance" promulgated by many creationists. The cumulative selection demonstrated by this simple simulation shows how just one of many observed mechanisms results in evolution.

Variations

My version of the Weasel Program uses the GA Engine I wrote to solve Dave Thomas' Steiner Network design challenge. This engine operates on bit strings, so I use bits instead of characters. Each character in the string is represented by the seven bits of its ASCII representation. This significantly increases the search space — there are 128 possible values for each character rather than the 27 used by Dr. Dawkins.

Another significant variation is that reproduction in this GA engine is stochastic. The most fit individual in a population is not guaranteed to be selected as a parent of the next generation population. In the tournament selection model used, each parent is generated by picking two genomes at random from the population and selecting the most fit of those two.

Finally, it is important to note that, contrary to the claims of some of Dr. Dawkins' detractors, no bits or characters are "locked in" at any time. All bits are subject to mutation. Any character may be disrupted either by mutation or crossover.

Implementation

As described in the GA engine description, applying the GA engine to a problem requires following a few simple steps:

  1. Create a class to represent the characteristics of the problem
  2. The problem class weasel-problem contains only the target string and the target string's bit vector genome.
  3. Implement a method to create instances of the problem class
  4. The make-weasel-problem function creates an instance of weasel-problem from a target string and ensures that the bit vector genome is also set.
  5. Implement the required generic functions for the problem:
    • genome-length
    • The genome length is simply the number of characters in the target string times seven.
    • fitness
    • The fitness of a genome is the number of bits that match the target genome. A higher value means the genome is more fit. A value equal to the length of the genome indicates the solution has been found.
    • fitness-comparator
    • This problem can use the greater-comparator provided by the GA engine.
  6. Implement a terminator function
  7. This problem can use the fitness-terminator provided by the GA engine.
  8. Run solve

Results

My first simulation used a population of 1000 and a mutation rate of 2%:

  (solve-weasel (make-weasel-problem "Methinks it is like a weasel") 1000 0.02)
The fitness of the best individual rapidly climbed to above 180. After hovering betweeen 190 and 194 for several dozen generations, an optimal genome was found at generation 458.

Summary

This variation of the Weasel Program searches a much larger solution space than the original. Dr. Dawkins' version used a set of 27 characters (the uppercase alphabet plus a space). There are 27^28 unique strings of the correct length using that set of characters, or roughly 10^40. When using 7 bits for each character, the number of unique strings of the correct length is 2^196, which is around 10^59. The power of differential reproductive success is shown by the fact that at most 229,500 different strings out of that enormous space were evaluated before the optimal solution was found.

Dr. Dawkins' Weasel Program does exactly what he claims: it demonstrates the power of cumulative search over single-step search. Further, it provides a gentle introduction to one of the important mechanisms of biological evolution, addressing the all too common misconception that evolution is based on "blind chance."

What the Weasel Program does not do, or claim to do, is demonstrate the capabilities of general purpose genetic algorithms. The use of an explicit target helps simplify the program, making it more useful as a teaching aid. The capabilities of more general GAs are better demonstrated in domains such as the Steiner Problem.

Source Code

The code is available in two files:

This requires the GA Engine code:

Please contact me by email if you have any questions, comments, or suggestions.

www.softwarematters.org