Schematic Theorem

from Wikipedia, the free encyclopedia

The scheme theorem according to John H. Holland deals with the convergence behavior of genetic algorithms . The theorem proves that individuals with above-average fitness are more likely to prevail.

Derivation

The scheme theorem looks at the genome of an individual, usually a bit string that encodes values. First, the term scheme must be explained: A scheme is a bit pattern that represents a set of bit strings. A scheme consists of the characters 0, 1 or # . The # character acts as a placeholder for a 0 or 1 .

For example, the scheme # 101 # denotes the following set of bit strings:

01010, 01011, 11011, 11010 .

The schema theorem now calculates the expected value that a certain schema H "survives" from one generation to the next. For this purpose, the three central steps of a genetic algorithm are examined:

A population consisting of N binary genomes of length 1 at a point in time t is considered. The fitness function f used is normalized and defined for all bit strings of length l .

In the course of the derivation, the following notations are used:

     Anzahl der Genome zum Zeitpunkt t, die das Schema H enthalten.
     Der Durchmesser des Schemas H, wird definiert als Länge der kürzesten
     Teilkette, die noch alle festen Bits des Schemas enthält.
     Z. B. d(#0101##1##)=7.
     Steht für die Anzahl der festen Bits in H. Z. B. b(##0##11#)=3

selection

Since the fitness is normalized, the following applies to the probability p that a certain parent chain is selected from a population:

Let us now, without restriction, be all those bit strings of the population at time t which contain the scheme H.

The fitness of Scheme H is then defined by: .

The probability that a chain containing H is selected is thus:

For the probability that two parents chains, both H will contain selected, the following applies: .

Recombination (crossover)

In the case of 1-point recombination, a separation point is first selected between the bit positions 1 and l-1 . If both parents contain H , the daughter chain also contains this schema. If only one parent chain contains the schema, then H is passed on in half of the cases, if it is not severed during the crossover itself.

The likelihood that it won't cut is:

The following applies to the probability W that the scheme H is passed on in the crossover :

If scheme H is severed during the crossover , there is a possibility that the missing section is contained in the other parent chain at a suitable point. Hence the inequality. If a 2-point crossover is carried out, this only has an impact on the probability that the scheme will be severed increases.

mutation

Let p be the mutation probability, that is, each bit of the new chain is negated with the probability p . This means that the schema H with fixed bits with probability remains.

If this effect is taken into account, the probability that a chain generated by the operators crossover and mutation contains the scheme H is:

With .

Conclusion

If a total of N new chains are generated, then the following applies to the expected value of the number of chains that contain the scheme H at the point in time :

The last two formulas show that schemes with above-average fitness and small diameter are very likely to prevail. However, the probability of reproduction also increases with the frequency of a schema . That is, an average scheme can prevail within a population if it occurs often enough. This effect is called genetic drift .

The factor also deserves attention. A high mutation rate p causes an increased destruction of successful patterns. On the other hand, a certain mutation frequency is necessary in order to search the search area as comprehensively as possible. The search activity of the algorithm can thus be controlled by adjusting p .