# Schematic Theorem

The **scheme** theorem according to John H. Holland deals with the convergence behavior of genetic algorithms . The theorem proves that individuals with above-average fitness are more likely to prevail.

## Derivation

The scheme theorem looks at the genome of an individual, usually a bit string that encodes values. First, the term **scheme** must be explained: A scheme is a bit pattern that represents a set of bit strings. A scheme consists of the characters *0, * *1* or *#* . The *#* character acts as a placeholder for a *0* or *1* .

For example, the scheme *# 101 #* denotes the following set of bit strings:

*01010, 01011, 11011, 11010* .

The *schema* theorem now calculates the expected value that a certain schema *H* "survives" from one generation to the next. For this purpose, the three central steps of a genetic algorithm are examined:

A population consisting of *N* binary genomes of length *1* at a point in time *t is* considered. The fitness function *f used* is normalized and defined for all bit strings of length *l* .

In the course of the derivation, the following *notations* are used:

Anzahl der Genome zum Zeitpunktt, die das SchemaHenthalten.

DerDurchmesserdes SchemasH, wird definiert als Länge der kürzesten Teilkette, die noch alle festen Bits des Schemas enthält. Z. B.d(#0101##1##)=7.

Steht für die Anzahl der festen Bits inH. Z. B.b(##0##11#)=3

### selection

Since the fitness is normalized, the following applies to the probability *p* that a certain parent chain is selected from a population:

Let us now, without restriction, be all those bit strings of the population at time *t* which contain
the scheme *H.*

The fitness of Scheme *H* is then defined by: .

The probability that a chain containing *H* is selected is thus:

For the probability that two parents chains, both *H* will contain selected, the following applies: .

### Recombination (crossover)

In the case of 1-point recombination, a separation point is first selected between the bit positions 1 and *l-1* . If both parents contain *H* , the daughter chain also contains this schema. If only one parent chain contains the schema, then *H is passed* on in half of the cases, if it is not severed during the crossover itself.

The likelihood that it won't cut is:

The following applies to the probability *W* that the scheme *H is* passed on in the crossover :

If scheme *H is* severed during the crossover , there is a possibility that the missing section is contained in the other parent chain at a suitable point. Hence the inequality. If a 2-point crossover is carried out, this only has an impact on the probability that the scheme will be severed increases.

### mutation

Let *p be* the mutation probability, that is, each bit of the new chain is negated with the probability *p* . This means that the schema *H* with fixed bits with probability remains.

If this effect is taken into account, the probability that a chain generated by the operators crossover and mutation contains the scheme *H* is:

With .

### Conclusion

If a total of *N* new chains are generated, then the following applies to the expected value of the number of chains that contain the scheme *H* at the point in time :

The last two formulas show that schemes with above-average fitness and small diameter are very likely to prevail. However, the probability of reproduction also increases with the frequency of a schema . That is, an average scheme can prevail within a population if it occurs often enough. This effect is called **genetic drift** .

The factor also deserves attention. A high mutation rate *p* causes an increased destruction of successful patterns. On the other hand, a certain mutation frequency is necessary in order to search the search area as comprehensively as possible. The search activity of the algorithm can thus be controlled by adjusting *p* .