# Hypergeometric distribution

Probability function of the hypergeometric distribution for $n = 20$. Red: $M = 20, N = 60$; blue: $M = 20, N = 30$; green: $M = 50, N = 60$.

The hypergeometric distribution is a probability distribution in stochastics. It is univariate and is one of the discrete probability distributions. To distinguish it from the general hypergeometric distribution, it is also called the classical hypergeometric distribution.

In a sample, $n$ elements are drawn at random and without replacement from a dichotomous population. The hypergeometric distribution then gives the probability that a certain number of elements with the desired property occur in the sample. This distribution is therefore important in quality control, for example.

The hypergeometric distribution is modeled on the urn model without replacement (see also variation without repetition). In this context, an urn containing two kinds of balls is considered, from which $n$ balls are removed without replacement. The random variable $X$ is the number of balls of the first kind in this sample.

The hypergeometric distribution thus describes the probability that, given $N$ elements ("population of size $N$"), of which $M$ have the desired property, exactly $k$ hits are obtained when $n$ elements are selected ("sample of size $n$"), i.e. the probability of $X = k$ successes in $n$ trials.

Example 1: There are 30 balls in an urn, 20 of which are blue, so 10 are not blue. What is the probability $p$ of finding exactly thirteen blue balls in a sample of twenty balls drawn without replacement? Answer: $p = 0.3096$. This corresponds to the blue bar at $k = 13$ in the diagram "Probability function of the hypergeometric distribution for $n = 20$".

Example 2: There are 45 balls in an urn, 20 of which are yellow. What is the probability $p$ of finding exactly four yellow balls in a sample of ten balls? Answer: $p = 0.269$. This example is calculated in detail below.

## Definition

The hypergeometric distribution depends on three parameters:

• $N$: the number of elements in a population.
• $M \leq N$: the number of elements with a certain property in this population (the number of possible successes).
• $n \leq N$: the number of elements in a sample.

The distribution gives the probability that $k$ elements with the property in question (successes or hits) are in the sample. The result space is therefore $\Omega = \{\max\{0, n + M - N\}, \dotsc, \min\{n, M\}\}$.

A discrete random variable $X$ follows the hypergeometric distribution with the parameters $M$, $N$ and $n$ if it has the probabilities

${\displaystyle h(k \mid N; M; n) := P(X = k) = {\frac {\binom{M}{k} \binom{N-M}{n-k}}{\binom{N}{n}}}}$

for $k \in \Omega$. The binomial coefficient $\tbinom{N}{n}$ is read "$N$ choose $n$". One then writes $X \sim \operatorname{Hyp}_{N, M, n}$ or $X \sim H(N, M, n)$.

The distribution function $H(k \mid N; M; n)$ then gives the probability that at most $k$ elements with the property in question are in the sample. This cumulative probability is the sum

${\displaystyle H(k \mid N; M; n) := P(X \leq k) = \sum_{y=0}^{k} h(y \mid N; M; n) = \sum_{y=0}^{k} {\frac {\binom{M}{y} \binom{N-M}{n-y}}{\binom{N}{n}}}}$.
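The probability function and the distribution function can be evaluated directly from their definitions. The following is a minimal sketch, assuming Python 3.8 or later (for `math.comb`); the function names `hypergeom_pmf` and `hypergeom_cdf` are illustrative and not part of the article.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, M: int, n: int) -> float:
    """P(X = k) for population size N, M successes and sample size n."""
    # Outside the result space the probability is zero.
    if k < max(0, n + M - N) or k > min(n, M):
        return 0.0
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


def hypergeom_cdf(k: int, N: int, M: int, n: int) -> float:
    """P(X <= k), obtained by summing the probability function from 0 to k."""
    return sum(hypergeom_pmf(y, N, M, n) for y in range(0, k + 1))


# Example 1 from the introduction: N = 30, M = 20, n = 20, k = 13.
print(round(hypergeom_pmf(13, 30, 20, 20), 4))  # 0.3096
```

Floating-point division is sufficient for parameters of this size; for exact results the quotient could be kept as a `fractions.Fraction` instead.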

### Alternative parameterization

Occasionally the probability function

${\displaystyle \operatorname{Hyp}_{B_1, B_2, n}(\{k\}) := {\frac {\binom{B_2}{k} \binom{B_1}{n-k}}{\binom{B_1 + B_2}{n}}}}$

is used instead. With $N = B_1 + B_2$ and $M = B_2$ it turns into the variant given above.

## Properties of the hypergeometric distribution

### Symmetries

The following symmetries apply:

• Interchanging drawn balls and successes: $h(k \mid N; M; n) = h(k \mid N; n; M)$
• Interchanging successes and failures: $h(k \mid N; M; n) = h(n - k \mid N; N - M; n)$
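Both symmetries can be checked numerically with exact rational arithmetic. The sketch below is illustrative only: the helper names `h` and `check_symmetries` are assumptions made for this example, and Python 3.8 or later is assumed for `math.comb`.

```python
from fractions import Fraction
from math import comb


def h(k: int, N: int, M: int, n: int) -> Fraction:
    """Exact probability h(k | N; M; n); zero outside the result space."""
    if k < max(0, n + M - N) or k > min(n, M):
        return Fraction(0)
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))


def check_symmetries(N: int, M: int, n: int) -> bool:
    """Return True if both symmetries hold for every k from 0 to n."""
    for k in range(n + 1):
        # Interchanging drawn balls and successes.
        if h(k, N, M, n) != h(k, N, n, M):
            return False
        # Interchanging successes and failures.
        if h(k, N, M, n) != h(n - k, N, N - M, n):
            return False
    return True


print(check_symmetries(45, 20, 10))
```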

### Expected value

The expected value of the hypergeometrically distributed random variable $X$ is

${\displaystyle \operatorname{E}(X) = \sum_{k=0}^{n} k \, {\frac {\binom{M}{k} \binom{N-M}{n-k}}{\binom{N}{n}}} = n \, {\frac {M}{N}}}$.
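The closed form can be derived in a few steps. The derivation below is a sketch that is not taken from the article; it assumes $n \geq 1$ and uses the absorption identity for binomial coefficients together with Vandermonde's identity.

```latex
\begin{align*}
\operatorname{E}(X)
  &= \sum_{k=1}^{n} k \, \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}} \\
  &= \frac{M}{\binom{N}{n}} \sum_{k=1}^{n} \binom{M-1}{k-1}\binom{N-M}{n-k}
     && \text{since } k\binom{M}{k} = M\binom{M-1}{k-1} \\
  &= \frac{M \binom{N-1}{n-1}}{\binom{N}{n}}
     && \text{by Vandermonde's identity} \\
  &= n\,\frac{M}{N}
     && \text{since } \binom{N}{n} = \frac{N}{n}\binom{N-1}{n-1}.
\end{align*}
```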

### Mode

The mode of the hypergeometric distribution is

${\displaystyle \left\lfloor {\frac {(n+1)(M+1)}{N+2}} \right\rfloor}$.

Here $\lfloor \cdot \rfloor$ is the Gaussian bracket (floor function).
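The mode formula can be compared with a direct search for the most probable value. This is a minimal sketch with illustrative function names, assuming Python 3.8 or later. Note that when $(n+1)(M+1)/(N+2)$ is an integer, two neighbouring values are equally probable, and the search below returns the smaller of them.

```python
from math import comb


def mode_formula(N: int, M: int, n: int) -> int:
    """Mode according to the floor formula."""
    return ((n + 1) * (M + 1)) // (N + 2)


def mode_by_search(N: int, M: int, n: int) -> int:
    """Most probable k, found by scanning the whole result space."""
    lo, hi = max(0, n + M - N), min(n, M)
    # The denominator C(N, n) is the same for every k,
    # so comparing the numerators is sufficient.
    return max(range(lo, hi + 1), key=lambda k: comb(M, k) * comb(N - M, n - k))


for params in [(45, 20, 10), (30, 20, 20), (49, 6, 6)]:
    print(params, mode_formula(*params), mode_by_search(*params))
```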

### Variance

The variance of the hypergeometrically distributed random variable $X$ is

${\displaystyle \operatorname{Var}(X) = \sum_{k=0}^{n} k^{2} \, {\frac {\binom{M}{k} \binom{N-M}{n-k}}{\binom{N}{n}}} - \left(n \, {\frac {M}{N}}\right)^{2} = n \, {\frac {M}{N}} \left(1 - {\frac {M}{N}}\right) {\frac {N-n}{N-1}}}$.

The last fraction is the so-called finite population correction (correction factor) for the model without replacement.
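The closed forms for the expected value and the variance can be checked against the defining sums. The sketch below uses exact fractions; the function names are illustrative, and Python 3.8 or later is assumed for `math.comb`.

```python
from fractions import Fraction
from math import comb


def pmf(k: int, N: int, M: int, n: int) -> Fraction:
    """Exact h(k | N; M; n) for 0 <= k <= n (comb returns 0 off the support)."""
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))


def moments_by_summation(N: int, M: int, n: int):
    """Expected value and variance computed from the defining sums."""
    mean = sum(k * pmf(k, N, M, n) for k in range(n + 1))
    second = sum(k * k * pmf(k, N, M, n) for k in range(n + 1))
    return mean, second - mean ** 2


def moments_closed_form(N: int, M: int, n: int):
    """Expected value and variance from the closed-form expressions."""
    p = Fraction(M, N)
    return n * p, n * p * (1 - p) * Fraction(N - n, N - 1)


mean, var = moments_closed_form(45, 20, 10)
print(float(mean), float(var))  # approximately 4.4444 and 1.9641
```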

### Skewness

The skewness of the hypergeometric distribution is

${\displaystyle \operatorname{v}(X) = {\frac {(N-2M)(N-1)^{\frac{1}{2}}(N-2n)}{[nM(N-M)(N-n)]^{\frac{1}{2}}(N-2)}}}$.
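As a plausibility check, the skewness formula can be compared with the third standardized central moment computed directly from the probability function. This is an illustrative sketch with assumed function names; it requires $N > 2$ and a non-degenerate distribution (positive variance).

```python
from math import comb, sqrt


def pmf(k: int, N: int, M: int, n: int) -> float:
    """h(k | N; M; n) for 0 <= k <= n."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


def skewness_by_summation(N: int, M: int, n: int) -> float:
    """Third central moment divided by the variance to the power 3/2."""
    ks = range(n + 1)
    mean = sum(k * pmf(k, N, M, n) for k in ks)
    var = sum((k - mean) ** 2 * pmf(k, N, M, n) for k in ks)
    third = sum((k - mean) ** 3 * pmf(k, N, M, n) for k in ks)
    return third / var ** 1.5


def skewness_closed_form(N: int, M: int, n: int) -> float:
    """Skewness according to the closed-form expression."""
    numerator = (N - 2 * M) * sqrt(N - 1) * (N - 2 * n)
    denominator = sqrt(n * M * (N - M) * (N - n)) * (N - 2)
    return numerator / denominator


print(skewness_by_summation(45, 20, 10), skewness_closed_form(45, 20, 10))
```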

### Characteristic function

The characteristic function has the following form:

${\displaystyle \phi_{X}(t) = {\frac {\binom{N-M}{n} \, {}_{2}F_{1}(-n, -M; N-M-n+1; e^{it})}{\binom{N}{n}}}}$

Here ${}_{2}F_{1}(\cdot, \cdot; \cdot; \cdot)$ denotes the Gaussian hypergeometric function.

### Moment generating function

The moment generating function can also be expressed using the hypergeometric function:

${\displaystyle M_{X}(t) = {\frac {\binom{N-M}{n} \, {}_{2}F_{1}(-n, -M; N-M-n+1; e^{t})}{\binom{N}{n}}}}$

### Probability generating function

The probability generating function is given by

${\displaystyle m_{X}(t) = {\frac {\binom{N-M}{n} \, {}_{2}F_{1}(-n, -M; N-M-n+1; t)}{\binom{N}{n}}}}$
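Because the first two arguments of ${}_{2}F_{1}$ are non-positive integers, the hypergeometric series terminates and can be summed as a polynomial. The sketch below compares this closed form with the defining sum $\sum_k P(X = k)\, t^k$. It is an illustration under the assumption $N - M \geq n$, so that the third argument $N - M - n + 1$ is a positive integer; the function names are hypothetical.

```python
from math import comb


def hyp2f1_polynomial(a: int, b: int, c: int, z: float) -> float:
    """2F1(a, b; c; z) for non-positive integers a, b and a positive integer c."""
    total, term = 0.0, 1.0
    for j in range(min(-a, -b) + 1):
        total += term
        # Ratio of consecutive series terms: (a + j)(b + j) / ((c + j)(j + 1)) * z.
        term *= (a + j) * (b + j) / ((c + j) * (j + 1)) * z
    return total


def pgf_closed_form(t: float, N: int, M: int, n: int) -> float:
    """Probability generating function via the terminating 2F1 series."""
    return comb(N - M, n) * hyp2f1_polynomial(-n, -M, N - M - n + 1, t) / comb(N, n)


def pgf_by_summation(t: float, N: int, M: int, n: int) -> float:
    """Probability generating function as the sum of P(X = k) * t**k."""
    return sum(
        comb(M, k) * comb(N - M, n - k) / comb(N, n) * t ** k for k in range(n + 1)
    )


print(pgf_closed_form(0.5, 45, 20, 10), pgf_by_summation(0.5, 45, 20, 10))
```

Under the same assumption, evaluating the same expression at $e^{t}$ or $e^{it}$ instead of $t$ gives the moment generating function and the characteristic function, respectively.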

## Relationship to other distributions

### Relationship to the binomial distribution

In contrast to the binomial distribution, in the hypergeometric distribution the sampled elements are not returned to the population for further selection. If the sample size $n$ is small relative to the population size $N$ (roughly $n/N < 0.05$), the probabilities calculated from the binomial distribution and from the hypergeometric distribution do not differ significantly. In these cases, the binomial distribution, which is mathematically easier to handle, is often used as an approximation.
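The quality of the approximation can be illustrated by comparing both probability functions for a small and a large sampling fraction. The parameter sets and function names below are chosen for illustration only; the binomial distribution is used with success probability $p = M/N$.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, M: int, n: int) -> float:
    """h(k | N; M; n) for 0 <= k <= n."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials with replacement."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def max_abs_difference(N: int, M: int, n: int) -> float:
    """Largest pointwise gap between the two probability functions."""
    p = M / N
    return max(
        abs(hypergeom_pmf(k, N, M, n) - binom_pmf(k, n, p)) for k in range(n + 1)
    )


# n/N = 0.02: the two distributions nearly coincide.
print(max_abs_difference(1000, 300, 20))
# n/N = 0.67 (Example 1): the binomial distribution is a poor substitute.
print(max_abs_difference(30, 20, 20))
```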

### Relationship to the Pólya distribution

The hypergeometric distribution is a special case of the Pólya distribution (choose $c = -1$).

### Relationship to the urn model

The hypergeometric distribution arises from the discrete uniform distribution through the urn model. An urn contains a total of $N$ balls, $M$ of which are colored, and $n$ balls are drawn. For $\max\{0, n + M - N\} \leq k \leq \min\{n, M\}$, the hypergeometric distribution gives the probability that $k$ colored balls are drawn. Alternatively, the binomial distribution can also be used for modeling in practice. See also the example.
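The urn model can also be simulated directly: every subset of $n$ balls is equally likely, and the relative frequency of samples with exactly $k$ colored balls approaches the hypergeometric probability. The following Monte Carlo sketch is an illustration only; the function name, the number of trials and the fixed seed are assumptions made for this example.

```python
import random
from math import comb


def simulate_urn(N: int, M: int, n: int, k: int, trials: int, seed: int = 0) -> float:
    """Estimate P(X = k) by repeatedly drawing n balls without replacement."""
    rng = random.Random(seed)
    urn = [1] * M + [0] * (N - M)  # 1 marks a colored ball
    hits = 0
    for _ in range(trials):
        # random.sample draws without replacement, uniformly over all subsets.
        if sum(rng.sample(urn, n)) == k:
            hits += 1
    return hits / trials


exact = comb(20, 4) * comb(25, 6) / comb(45, 10)
estimate = simulate_urn(45, 20, 10, 4, trials=20_000)
print(exact, estimate)
```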

### Relationship to the multivariate hypergeometric distribution

The multivariate hypergeometric distribution is a generalization of the hypergeometric distribution. It answers the question about the number of balls of one color drawn from an urn if the urn contains more than two different colors of balls. For two colors it agrees with the hypergeometric distribution.
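The agreement in the two-color case can be made concrete with a short sketch. The article does not state the multivariate probability function; the code below assumes the standard form, a product of one binomial coefficient per color divided by the number of possible samples. The function name and the three-color urn are hypothetical.

```python
from math import comb, prod


def multivariate_hypergeom_pmf(counts, color_sizes) -> float:
    """Probability of drawing counts[i] balls of color i without replacement."""
    n = sum(counts)        # sample size
    N = sum(color_sizes)   # total number of balls in the urn
    return prod(comb(m, k) for m, k in zip(color_sizes, counts)) / comb(N, n)


# Two colors: 20 yellow and 25 other balls, sample of 4 yellow and 6 others.
two_colors = multivariate_hypergeom_pmf((4, 6), (20, 25))
univariate = comb(20, 4) * comb(25, 6) / comb(45, 10)
print(two_colors, univariate)

# Three colors (hypothetical urn with 5, 10 and 15 balls), two of each drawn.
print(multivariate_hypergeom_pmf((2, 2, 2), (5, 10, 15)))
```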

## Examples

### Various examples

There are 45 balls in a container, 20 of which are yellow. 10 balls are removed without replacement.

The hypergeometric distribution gives the probability that exactly x = 0, 1, 2, 3, ..., 10 of the balls removed are yellow.

An example of the practical application of the hypergeometric distribution is the lottery: in the number lottery there are 49 numbered balls, 6 of which are drawn; 6 numbers are marked on the lottery ticket.

$h(x \mid 49; 6; 6)$ gives the probability of achieving exactly x = 0, 1, 2, 3, ..., 6 "hits".
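The lottery probabilities follow directly from the probability function with $N = 49$, $M = 6$ and $n = 6$. A minimal sketch, assuming Python 3.8 or later; the variable names are illustrative.

```python
from math import comb

N, M, n = 49, 6, 6
total_draws = comb(N, n)  # number of possible draws of 6 out of 49

# Number of draws with exactly x hits, and the corresponding probability.
lottery = {}
for x in range(n + 1):
    favourable = comb(M, x) * comb(N - M, n - x)
    lottery[x] = favourable / total_draws
    print(x, favourable, f"{100 * lottery[x]:.7f} %")
```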

### Detailed calculation example for the balls

For the example of the colored balls given above, we determine the probability of drawing exactly 4 yellow balls. The 25 balls that are not yellow are referred to as purple in the following.

| Quantity | Value |
|---|---|
| Total number of balls | $N = 45$ |
| Number with the property "yellow" | $M = 20$ |
| Sample size | $n = 10$ |
| Target number of yellow balls | $x = 4$ |

The quantity sought is therefore $h(4 \mid 45; 20; 10)$.

The probability results from:

Number of possibilities to choose exactly 4 yellow (and therefore exactly 6 purple) balls
divided by
Number of ways to choose exactly 10 balls of any color

There are

${\displaystyle {\binom{M}{x}} = {\binom{20}{4}} = 4\,845}$

ways to pick exactly 4 yellow balls.

There are

${\displaystyle {\binom{N-M}{n-x}} = {\binom{25}{6}} = 177\,100}$

ways to pick exactly 6 purple balls.

Since every "yellow possibility" can be combined with every "purple possibility", this results in

${\displaystyle {\binom{M}{x}} \cdot {\binom{N-M}{n-x}} = 4\,845 \cdot 177\,100 = 858\,049\,500}$

possibilities for exactly 4 yellow and 6 purple balls.

Altogether there are

${\displaystyle {\binom{N}{n}} = {\binom{45}{10}} = 3\,190\,187\,286}$

ways to draw 10 balls.

So we get the probability

${\displaystyle P(X = 4) = h(4 \mid 45; 20; 10) = {\frac {\binom{20}{4} \binom{25}{6}}{\binom{45}{10}}} = {\frac {4\,845 \cdot 177\,100}{3\,190\,187\,286}} \approx 0.2690}$,

that is, in around 27 percent of cases, exactly 4 yellow (and 6 purple) balls are removed.

Alternatively, the result can also be found using the following equation, in which the roles of the sample and of the yellow balls are interchanged:

${\displaystyle P(X = 4) = h(4 \mid 45; 10; 20) = {\frac {\binom{10}{4} \binom{35}{16}}{\binom{45}{20}}} \approx 0.2690}$

Here, 4 of the 20 yellow balls are among the $n = 10$ balls of the sample. The remaining 16 yellow balls are among the 35 balls that do not belong to the sample.
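The individual steps of the calculation can be reproduced as follows. This is a minimal sketch with illustrative variable names, assuming Python 3.8 or later for `math.comb`.

```python
from math import comb

N, M, n, x = 45, 20, 10, 4

yellow_ways = comb(M, x)              # ways to pick 4 of the 20 yellow balls
purple_ways = comb(N - M, n - x)      # ways to pick 6 of the 25 purple balls
favourable = yellow_ways * purple_ways
total = comb(N, n)                    # ways to pick any 10 of the 45 balls

p = favourable / total

# Alternative form with the roles of sample and yellow balls interchanged.
p_alt = comb(n, x) * comb(N - n, M - x) / comb(N, M)

print(yellow_ways, purple_ways, favourable, total)
print(round(p, 4), round(p_alt, 4))
```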

## Numerical values for the examples

$h(x \mid 45; 20; 10)$

| x | Number of possible results | Probability in % |
|---|---:|---:|
| 0 | 3,268,760 | 0.1025 |
| 1 | 40,859,500 | 1.2808 |
| 2 | 205,499,250 | 6.4416 |
| 3 | 547,998,000 | 17.1776 |
| 4 | 858,049,500 | 26.8965 |
| 5 | 823,727,520 | 25.8207 |
| 6 | 490,314,000 | 15.3694 |
| 7 | 178,296,000 | 5.5889 |
| 8 | 37,791,000 | 1.1846 |
| 9 | 4,199,000 | 0.1316 |
| 10 | 184,756 | 0.0058 |
| Total | 3,190,187,286 | 100.0000 |

Expected value: 4.4444. Variance: 1.9641.
$h(x \mid 45; 10; 20)$

| x | Number of possible results | Probability in % |
|---|---:|---:|
| 0 | 3,247,943,160 | 0.1025 |
| 1 | 40,599,289,500 | 1.2808 |
| 2 | 204,190,544,250 | 6.4416 |
| 3 | 544,508,118,000 | 17.1776 |
| 4 | 852,585,079,500 | 26.8965 |
| 5 | 818,481,676,320 | 25.8207 |
| 6 | 487,191,474,000 | 15.3694 |
| 7 | 177,160,536,000 | 5.5889 |
| 8 | 37,550,331,000 | 1.1846 |
| 9 | 4,172,259,000 | 0.1316 |
| 10 | 183,579,396 | 0.0058 |
| 11…20 | 0 | 0 |
| Total | 3,169,870,830,126 | 100.0000 |

Expected value: 4.4444. Variance: 1.9641.
$h(x \mid 49; 6; 6)$

| x | Number of possible results | Probability in % |
|---|---:|---:|
| 0 | 6,096,454 | 43.5965 |
| 1 | 5,775,588 | 41.3019 |
| 2 | 1,851,150 | 13.2378 |
| 3 | 246,820 | 1.7650 |
| 4 | 13,545 | 0.0969 |
| 5 | 258 | 0.0018 |
| 6 | 1 | 0.0000072 |
| Total | 13,983,816 | 100.0000 |

Expected value: 0.7347. Variance: 0.5776.