# Hardy-Weinberg equilibrium

Figure: Hardy-Weinberg equilibrium for two alleles. The horizontal axis shows the two allele frequencies $p$ and $q$; the vertical axis shows the genotype frequencies. The three possible genotypes are represented by different characters.

The Hardy-Weinberg equilibrium (HWE), named after the mathematician G. H. Hardy and the physician and genetics researcher Wilhelm Weinberg, is a concept in population genetics.

The mathematical model assumes an ideal population that does not exist in reality. In such a population no evolution takes place, because none of the evolutionary factors that could change the gene pool are at work. In this case, for any genotype distribution in the parent generation, the genotype distribution of the first daughter generation depends only on the allele frequencies and no longer changes in subsequent generations. Mathematically, this equilibrium is a fixed point of the function defined by the inheritance mechanism.
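The fixed-point property can be illustrated with a small Python sketch. The function name `next_generation` and the starting distribution are assumptions chosen for illustration; the model is the idealized one described above (random mating, no selection, mutation, migration or drift).

```python
def next_generation(freq_AA, freq_Aa, freq_aa):
    """Genotype frequencies (AA, Aa, aa) after one round of random mating."""
    total = freq_AA + freq_Aa + freq_aa
    # Each Aa individual carries one copy of A, each AA individual two.
    p = (freq_AA + freq_Aa / 2) / total  # frequency of allele A
    q = 1 - p                            # frequency of allele a
    return (p * p, 2 * p * q, q * q)

# Arbitrary (hypothetical) starting distribution that is NOT in equilibrium.
start = (0.5, 0.1, 0.4)
gen1 = next_generation(*start)
gen2 = next_generation(*gen1)

print("generation 1:", gen1)
print("generation 2:", gen2)  # same as generation 1, up to rounding error
```

The first daughter generation already depends only on the allele frequency (here 0.55 for A), and applying the function again leaves the distribution unchanged.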

Despite its idealized character, the Hardy-Weinberg equilibrium is used to draw population-genetic conclusions about real populations. The model is a realistic approximation especially for relatively large populations. The rule is also used to calculate the proportion of heterozygous individuals (Aa) under dominant-recessive inheritance, since heterozygous organisms cannot be distinguished phenotypically from homozygous dominant ones (AA) because the dominant allele prevails.

## History

### G. H. Hardy's contribution


Mendel's laws were rediscovered in 1900, but they remained in doubt for several more years because there was as yet no explanation of how they could produce stable successive generations. In 1902 Udny Yule argued against their application, believing that dominant alleles would have to spread through the population over time. The American William Ernest Castle showed in 1903 that genotype frequencies remain stable in the absence of selection. Karl Pearson, known today for his contributions to statistics, found an equilibrium point at $p = q = 0.5$ in the same year. The British geneticist Reginald Punnett, unable to refute Yule's objection, put the problem to his cricket partner Godfrey Harold Hardy, a pure mathematician who actually despised applied mathematics. In 1908 Hardy published an article in which he explained the "very simple" problem (his words) in terms accessible to biologists.

"Suppose that Aa is a pair of Mendelian characters, A being dominant, and that in any given generation the number of pure dominants (AA), heterozygotes (Aa), and pure recessives (aa) are as $p : 2q : r$. Finally, suppose that the numbers are fairly large, so that mating may be regarded as random, that the sexes are evenly distributed among the three varieties, and that all are equally fertile. A little mathematics of the multiplication-table type is enough to show that in the next generation the numbers will be as $(p + q)^2 : 2(p + q)(q + r) : (q + r)^2$, or as $p_1 : 2q_1 : r_1$, say.
The interesting question is - in what circumstances will this distribution be the same as that in the generation before? It is easy to see that the condition for this is $q^2 = pr$. And since $q_1^2 = p_1 r_1$, whatever the values of $p$, $q$, and $r$ may be, the distribution will in any case continue unchanged after the second generation."

- Godfrey Harold Hardy: article in Science, 1908

As a result, the principle became known in the English-speaking world as "Hardy's law".

### Wilhelm Weinberg's contribution

Also in 1908, the German physician and genetics researcher Wilhelm Weinberg gave a scientific lecture in Stuttgart entitled “On the Demonstration of Heredity in Man”. In it he stated:

“The situation is quite different if one considers Mendelian inheritance under the influence of panmixia. I start from the general assumption that originally there were m male and m female pure representatives of type A, and likewise n pure representatives of type B of each sex. If these are crossed at random, one obtains, by symbolic application of the binomial theorem, as the composition of the daughter generation:

$(mAA + nBB)^{2} = \frac{m^{2}}{(m+n)^{2}}\,AA + \frac{2mn}{(m+n)^{2}}\,AB + \frac{n^{2}}{(m+n)^{2}}\,BB$

or, if $m + n = 1$,

$m^{2}AA + 2mnAB + n^{2}BB$.

If you now cross the male and female members of the 1st generation at random, you get the following frequency of the different cross combinations:

$m^{2}m^{2}\,(AA \times AA) = m^{4}AA$
$4m^{2}mn\,(AA \times AB)$ (sic) $= 2m^{3}n\,AA + 2m^{3}n\,AB$
$m^{2}n^{2}\,(AA \times BB) = 2m^{2}n^{2}\,AB$
$4(mn)^{2}\,(AB \times AB) = m^{2}n^{2}\,AA + 2m^{2}n^{2}\,AB + m^{2}n^{2}\,BB$
$4mn\,n^{2}\,(AB \times BB) = 2mn^{3}\,AB + 2mn^{2}\,BB$ (sic)
$n^{2}n^{2}\,(BB \times BB) = n^{4}BB$

or the relative frequency is for

$AA: m^{2}(m+n)^{2}$
$AB: 2m(m+n)^{2}\,n$
$BB: (m+n)^{2}\,n^{2}$

and the composition of the second daughter generation is again

$m^{2}AA + 2mnAB + n^{2}BB$.

So, under the influence of panmixia, we get the same distribution of pure types and hybrids in every generation, and thus the possibility of calculating for each generation how these types are represented.”

- Wilhelm Weinberg: lecture at the scientific evening in Stuttgart on January 13, 1908

Weinberg's work remained completely unknown in the English-speaking world until the German émigré Curt Stern drew attention to it in 1943. Since then, the law has borne the names of both men. The name of Castle, who had recognized the principle early on, is only rarely added; his formulation was not identical.

## Characteristics of an ideal population

• Very large number of individuals: in a small population, the chance loss of an individual (genetic drift) would have a relatively large effect on the allele frequencies; in a very large population it changes them practically not at all.
• Panmixia: all matings, including those between carriers of different genotypes, are equally likely and equally successful.
• No selection: carriers of particular genotypes have neither advantages nor disadvantages through the phenotypic effects of their genes.
• No mutations.
• No immigration or emigration (migration) that would change the allele frequencies.

The ideal population is a theoretical construct, since in reality at least one of these conditions is not fulfilled; each violation corresponds to an evolutionary factor. Evolution takes place whenever the above conditions do not hold.

## Calculation formula for 2 alleles

In the case in which only two different alleles P and Q exist with the relative frequencies ("allele frequencies") p and q , the formula for the Hardy-Weinberg equilibrium is:

$(p + q)^{2} = p^{2} + 2pq + q^{2} = 1$

where:

• $p$: allele frequency of allele $P$
• $q$: allele frequency of allele $Q$

The notation $p^{2} + 2pq + q^{2} = 1$ is useful in a biological context. In Hardy-Weinberg equilibrium:

• $p^{2}$: frequency of homozygotes with characteristic P
• $q^{2}$: frequency of homozygotes with characteristic Q
• $2pq$: frequency of heterozygotes (characteristics P and Q)

Since the homozygote and heterozygote frequencies can usually be determined experimentally, the corresponding allele frequencies can be calculated from them. Conversely, if an allele frequency is known, the expected numbers of heterozygotes and homozygotes can be calculated.
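Both directions of this calculation can be sketched in a few lines of Python. The function names and the genotype counts (360, 480, 160) are hypothetical values chosen for illustration, not data from this article.

```python
def allele_frequencies(n_PP, n_PQ, n_QQ):
    """Allele frequencies (p, q) from observed genotype counts."""
    n = n_PP + n_PQ + n_QQ
    # Every individual carries two alleles; PP contributes two copies of P,
    # PQ contributes one.
    p = (2 * n_PP + n_PQ) / (2 * n)
    return p, 1 - p

def expected_counts(p, n):
    """Expected genotype counts (PP, PQ, QQ) under Hardy-Weinberg equilibrium."""
    q = 1 - p
    return (p * p * n, 2 * p * q * n, q * q * n)

# Hypothetical sample of 1000 individuals.
p, q = allele_frequencies(360, 480, 160)
print("p =", p, "q =", q)
print("expected counts:", expected_counts(p, 1000))
```

Comparing the expected counts with the observed ones is the usual first check of whether a sample is compatible with the equilibrium.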

### 1st example: phenylketonuria

Phenylketonuria is a metabolic disease with autosomal recessive inheritance. In Germany (approx. 80 million inhabitants) roughly 8,000 people are affected. With $p$ denoting the frequency of the recessive disease allele, this gives the homozygote frequency $p^{2}$:

$p^{2} = \frac{8{,}000}{80{,}000{,}000} = 0.0001$

With

$p = \sqrt{p^{2}} = 0.01$

and

$p + q = 1$

it follows that

$q = 1 - p = 0.99$

The frequency of the heterozygotes, $2pq$, is:

$2pq = 2 \cdot (0.01 \cdot 0.99) = 0.0198$

Multiplied by the total population, this gives the absolute number of heterozygotes:

$0.0198 \cdot 80{,}000{,}000 = 1{,}584{,}000$

That is, almost 1.6 million people in Germany (approx. 2% of the population, about one in 50) are heterozygous for the disease-causing phenylketonuria allele.

For a very small value of $p$, one can say as a first approximation that $q \approx 1$, so that $2pq \approx 2p$ for the heterozygote frequency. In the example above, this estimate gives 1.6 million.
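The arithmetic of this example can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the input figures are the ones quoted above.

```python
import math

population = 80_000_000  # approximate population of Germany (from the text)
affected = 8_000         # approximate number of affected people (from the text)

hom_freq = affected / population  # frequency of recessive homozygotes
p = math.sqrt(hom_freq)           # frequency of the disease allele
q = 1 - p                         # frequency of the normal allele

carriers = 2 * p * q * population         # Hardy-Weinberg heterozygotes
carriers_approx = 2 * p * population      # approximation with q close to 1

print(f"p = {p:.4f}, q = {q:.4f}")
print(f"heterozygous carriers:  {carriers:,.0f}")
print(f"approximation 2p * N:   {carriers_approx:,.0f}")
```

The approximation overestimates the carrier number by only about 1%, which is why it is commonly used for rare recessive alleles.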

### 2nd example: Huntington's disease

Huntington's disease is a neurological disorder with autosomal dominant inheritance. Both heterozygotes and homozygotes are clinically ill. The frequency of the disease is given as 5 : 100,000. The affected consist of the homozygotes ($p^{2}$) and the heterozygotes ($2pq$) for the disease-causing allele, whose frequency is $p$, so that:

$p^{2} + 2pq = 0.00005$

With

$q = 1 - p$

you get:

$p^{2} + 2p(1 - p) = p^{2} + 2p - 2p^{2} = -p^{2} + 2p = 0.00005$

Multiplying by $-1$ and adding 1 to both sides gives:

$p^{2} - 2p + 1 = (p - 1)^{2} = 0.99995$

The two solutions of this quadratic equation are:

$p = 1 - \sqrt{0.99995} \approx 0.000025$ and $p = 1 + \sqrt{0.99995} \approx 1.999975$

The second solution makes no sense in a biological context, since $p$ must always be less than or equal to 1, and can be rejected. For $q$ we get:

$q = 1 - p \approx 0.999975$

The homozygote frequency is thus:

$p^{2} \approx 0.000000000625$

That would correspond to one person in around 1.6 billion. In other words, it is very likely that all people suffering from Huntington's disease in Germany are heterozygous for the disease-causing allele. As a first approximation, $2pq \approx 2p$ also applies here.
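A minimal Python sketch of this calculation, using the closed-form root of the quadratic equation derived above (the variable names are assumptions for illustration):

```python
import math

affected_fraction = 5 / 100_000  # frequency of the disease (from the text)

# p^2 + 2p(1 - p) = affected_fraction  <=>  (1 - p)^2 = 1 - affected_fraction
# Only the root with p <= 1 is biologically meaningful.
p = 1 - math.sqrt(1 - affected_fraction)
q = 1 - p

homozygotes = p * p
heterozygotes = 2 * p * q

print(f"p = {p:.3e}")
print(f"homozygote frequency:   {homozygotes:.3e}")
print(f"heterozygote frequency: {heterozygotes:.3e}")
print(f"one homozygote in about {1 / homozygotes:,.0f} people")
```

Heterozygotes make up practically the whole affected fraction, which is the numerical content of the statement that essentially all patients are heterozygous.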

## Generalization for more than 2 alleles

The Hardy-Weinberg formula can easily be generalized to the case of more than 2 alleles. The following describes the case of 3 different alleles P, Q, R with allele frequencies $p$, $q$, $r$. Then:

$(p + q + r)^{2} = p^{2} + q^{2} + r^{2} + 2pq + 2pr + 2qr = 1$

where:

• $p^{2}$: frequency of homozygotes for characteristic P
• $q^{2}$: frequency of homozygotes for characteristic Q
• $r^{2}$: frequency of homozygotes for characteristic R
• $2pq$: frequency of heterozygotes for characteristics P and Q
• $2qr$: frequency of heterozygotes for characteristics Q and R
• $2pr$: frequency of heterozygotes for characteristics P and R

Generalized to $n$ alleles $A_1, \dots, A_n$ with relative frequencies $p_1, \dots, p_n$, the following applies in Hardy-Weinberg equilibrium:

$(p_1 + \cdots + p_n)^{2} = \sum_{i=1}^{n} \sum_{j=1}^{n} p_i p_j = 1$

with the homozygote frequency for characteristic $A_i$:

$p_i^{2}$

and the heterozygote frequency for characteristics $A_i$ and $A_j$ ($i \neq j$):

$2 p_i p_j$.
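The general diploid case can be sketched in Python as follows. The function name `genotype_frequencies` and the example allele frequencies (0.5, 0.3, 0.2) are hypothetical and only serve as an illustration.

```python
from itertools import combinations_with_replacement

def genotype_frequencies(allele_freqs):
    """Hardy-Weinberg frequencies of all unordered diploid genotypes.

    allele_freqs maps allele names to frequencies that sum to 1.
    """
    result = {}
    pairs = combinations_with_replacement(list(allele_freqs.items()), 2)
    for (name_i, p_i), (name_j, p_j) in pairs:
        if name_i == name_j:
            result[(name_i, name_j)] = p_i * p_i      # homozygote: p_i^2
        else:
            result[(name_i, name_j)] = 2 * p_i * p_j  # heterozygote: 2 p_i p_j
    return result

# Hypothetical three-allele example.
freqs = genotype_frequencies({"P": 0.5, "Q": 0.3, "R": 0.2})
for genotype, freq in freqs.items():
    print(genotype, round(freq, 4))
print("sum:", round(sum(freqs.values()), 4))
```

For $n$ alleles the function returns $n(n+1)/2$ genotypes, whose frequencies add up to 1.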

### Example: ABO blood group system (3 alleles)

Figure: Frequency of the ABO blood groups in Germany.
Figure: Frequencies of the underlying genotypes, calculated from the blood group frequencies (with slight deviations due to rounding).

The alleles for blood groups A and B are codominant, while the allele for blood group O is recessive. If the frequencies of the alleles for A, B and O in the gene pool are $a$, $b$ and $o$ (with $a + b + o = 1$), the frequencies of the blood groups (phenotypes) are:

• $a^{2} + 2ao$: frequency of people with blood group A
• $b^{2} + 2bo$: frequency of people with blood group B
• $2ab$: frequency of people with blood group AB
• $o^{2}$: frequency of people with blood group O

The blood group frequencies observed in Germany are: blood group A 43%, blood group O 41%, blood group B 11% and blood group AB 5%.

This results in the following relationships (numerical values rounded):

1. For the allele frequency $o$:

$o = \sqrt{0.41} = 0.64$

2. For the allele frequency $a$:

$a^{2} + 2ao = a^{2} + 1.28a = 0.43$
$a^{2} + 1.28a + 0.41 = 0.43 + 0.41$
$(a + 0.64)^{2} = 0.84$
$a = \sqrt{0.84} - 0.64 = 0.28$

3. For the allele frequency $b$:

$b^{2} + 2bo = b^{2} + 1.28b = 0.11$
$b^{2} + 1.28b + 0.41 = 0.11 + 0.41$
$(b + 0.64)^{2} = 0.52$
$b = \sqrt{0.52} - 0.64 = 0.08$

The allele frequencies calculated from the observed data are consistent with a Hardy-Weinberg equilibrium:

$a + b + o = 0.28 + 0.08 + 0.64 = 1$
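The same estimate can be sketched in Python. The sketch completes the square exactly, using $(a + o)^2 = a^2 + 2ao + o^2$, instead of working with rounded intermediate values, so the results differ slightly before rounding; the dictionary layout and variable names are assumptions for illustration.

```python
import math

# Observed phenotype frequencies in Germany (from the text); "O" is blood group 0.
phenotypes = {"A": 0.43, "B": 0.11, "AB": 0.05, "O": 0.41}

o = math.sqrt(phenotypes["O"])                        # o^2 = freq(O)
a = math.sqrt(phenotypes["A"] + phenotypes["O"]) - o  # (a + o)^2 = freq(A) + freq(O)
b = math.sqrt(phenotypes["B"] + phenotypes["O"]) - o  # (b + o)^2 = freq(B) + freq(O)

print(f"a = {a:.2f}, b = {b:.2f}, o = {o:.2f}")
print(f"a + b + o = {a + b + o:.3f}")
# The AB phenotype was not used for the estimate, so it serves as a cross-check.
print(f"predicted AB frequency 2ab = {2 * a * b:.3f} (observed {phenotypes['AB']})")
```

Without rounding, the three estimates sum to slightly less than 1, and the predicted AB frequency is close to the observed 5%; both small deviations are compatible with the rounded input percentages.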

## Generalization for a polyploid set of chromosomes

The example calculations given above relate to a diploid set of chromosomes, as found, for example, in humans, in which every allele is present twice (each chromosome is present twice, the only exception being the sex chromosomes). However, many organisms have polyploid sets of chromosomes in which each allele is present more than twice. The Hardy-Weinberg formula can also be generalized to such cases.

### Two alleles in a polyploid chromosome set

$(p + q)^{x} = 1$

where $x$ is the degree of ploidy (diploid: $x = 2$; triploid: $x = 3$; tetraploid: $x = 4$; etc.), and

• $p$: allele frequency of allele $P$
• $q$: allele frequency of allele $Q$

For a triploid chromosome set we get:

$(p + q)^{3} = p^{3} + 3p^{2}q + 3pq^{2} + q^{3} = 1$

Frequency of the genotypes in a triploid chromosome set:

| Genotype | Frequency |
|---|---|
| $PPP$ | $p^{3}$ |
| $PPQ$ | $3p^{2}q$ |
| $PQQ$ | $3pq^{2}$ |
| $QQQ$ | $q^{3}$ |

For a tetraploid set of chromosomes we get:

$(p + q)^{4} = p^{4} + 4p^{3}q + 6p^{2}q^{2} + 4pq^{3} + q^{4} = 1$

Frequency of the genotypes in a tetraploid chromosome set:

| Genotype | Frequency |
|---|---|
| $PPPP$ | $p^{4}$ |
| $PPPQ$ | $4p^{3}q$ |
| $PPQQ$ | $6p^{2}q^{2}$ |
| $PQQQ$ | $4pq^{3}$ |
| $QQQQ$ | $q^{4}$ |

### More than two alleles in a polyploid chromosome set

In the most general case, a polyploid set of chromosomes with ploidy degree $x$ and $n$ different alleles gives:

$(p_1 + \cdots + p_n)^{x} = \sum_{k_1 + k_2 + \cdots + k_n = x} \frac{x!}{k_1!\, k_2! \cdots k_n!} \prod_{1 \leq t \leq n} p_t^{k_t},$

or, equivalently,

$(p_1 + \cdots + p_n)^{x} = \sum_{|k| = x} \binom{x}{k} p^{k}$

with the “multi-indices” $k = (k_1, k_2, \dots, k_n)$ and $p^{k} = p_1^{k_1} p_2^{k_2} \cdots p_n^{k_n}$.
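The multinomial formula can be sketched in Python as follows. The function name, the use of string allele names and the example frequencies are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial

def polyploid_genotype_frequencies(allele_freqs, ploidy):
    """Equilibrium genotype frequencies for any ploidy and number of alleles.

    allele_freqs maps allele names (strings) to frequencies summing to 1.
    """
    result = {}
    for genotype in combinations_with_replacement(sorted(allele_freqs), ploidy):
        counts = Counter(genotype)     # k_t: copies of each allele in the genotype
        coefficient = factorial(ploidy)
        probability = 1.0
        for allele, k in counts.items():
            coefficient //= factorial(k)             # x! / (k_1! k_2! ... k_n!)
            probability *= allele_freqs[allele] ** k  # product of p_t^{k_t}
        result["".join(genotype)] = coefficient * probability
    return result

# Hypothetical tetraploid example with two equally frequent alleles.
tetraploid = polyploid_genotype_frequencies({"P": 0.5, "Q": 0.5}, ploidy=4)
for genotype, freq in tetraploid.items():
    print(genotype, freq)
```

With two alleles and ploidy 4, the coefficients are the binomial coefficients 1, 4, 6, 4, 1; with ploidy 2 the function reduces to the ordinary diploid Hardy-Weinberg formula.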

4. Sic in the original; the left-hand side should actually read $2m^{2}mn\,(AA \times AB)$.
5. Sic in the original; the right-hand side should actually read $2mn^{3}AB + 2mn^{3}BB$.