Mixed distribution

The term mixed distribution or composite distribution comes from probability theory. It denotes the probability distribution of a mixture of random variables drawn from several different populations.

Introductory example

If, for example, one considers the characteristic of body height in small children (first population) and adults (second population), this characteristic is usually approximately normally distributed within each individual population, with the mean for small children significantly lower than for adults. The mixed distribution is then the distribution of body height when the two populations of small children and adults are considered not individually but together, i.e. the distribution of the body height of a person who is not known to be a small child or an adult.

Mathematically, in this example the height of the small children is a random variable $X_1$ from one population $G_1$, and the height of the adults is a random variable $X_2$ from the other population $G_2$. The mixture of these two random variables is a further random variable $X$ that with probability $a_1$ comes from the first population (and then behaves like $X_1$) and with probability $a_2$ from the other (and then behaves like $X_2$). Since only these two populations are available, $a_1 + a_2 = 1$ must hold. The probabilities $a_1$ and $a_2$ can also be interpreted as the relative proportions of the populations $G_1$ and $G_2$ in the combined population, in this example the proportion of small children and adults in the total sample. The distribution of $X$ is determined via the law of total probability:

$$
\begin{aligned}
P(X \leq x) &= P(X \leq x \mid X \text{ from } G_1) \cdot a_1 + P(X \leq x \mid X \text{ from } G_2) \cdot a_2 \\
&= P(X \leq x \mid X = X_1) \cdot a_1 + P(X \leq x \mid X = X_2) \cdot a_2 \\
&= P(X_1 \leq x) \cdot a_1 + P(X_2 \leq x) \cdot a_2 \,.
\end{aligned}
$$

If $X_1$ and $X_2$ have distribution functions $F_1$ and $F_2$, the distribution function $F$ of $X$ is thus

$$F(x) = F_1(x) \cdot a_1 + F_2(x) \cdot a_2.$$
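This two-stage structure translates directly into code: to sample from the mixture, first pick the population, then draw from it; the mixture CDF is the weighted sum of the component CDFs. A minimal sketch in Python, with made-up illustrative parameters for the height example (children ~ N(95 cm, 10 cm), adults ~ N(172 cm, 9 cm), 20% children):

```python
import math
import random

def norm_cdf(x, mu, sigma):
    """Normal distribution function, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture_cdf(x, a1, mu1, s1, a2, mu2, s2):
    """F(x) = F1(x)*a1 + F2(x)*a2 for a two-component normal mixture."""
    return a1 * norm_cdf(x, mu1, s1) + a2 * norm_cdf(x, mu2, s2)

def sample_mixture(a1, mu1, s1, mu2, s2, rng=random):
    """Draw from the mixture: first pick the population, then draw from it."""
    if rng.random() < a1:
        return rng.gauss(mu1, s1)
    return rng.gauss(mu2, s2)

# Illustrative (made-up) parameters: 20% children ~ N(95, 10), 80% adults ~ N(172, 9).
# At 130 cm, essentially all children but almost no adults lie below,
# so F(130) is close to the children's share 0.2.
print(mixture_cdf(130.0, 0.2, 95.0, 10.0, 0.8, 172.0, 9.0))
```

The sampler mirrors the definition of the mixture exactly: the population label plays the role of the latent indicator with probabilities $a_1$ and $a_2$.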

Definition

If the density function of a continuous random variable $X$ can be expressed as

$$f(x) = \sum_{k=1}^{K} a_k f_k(x),$$

we say that $X$ follows a mixed distribution. Here the $f_k(x)$ are the density functions of continuous random variables $X_k$, and the probabilities $a_k$ satisfy

$$\sum_{k=1}^{K} a_k = 1.$$

$f$ is therefore a convex combination of the densities $f_1, \ldots, f_K$.

One can easily show that under these conditions $f$ is nonnegative and the normalization property

$$\int_{-\infty}^{\infty} f(x) \, \mathrm{d}x = 1$$

is satisfied.
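The normalization property can be checked numerically: integrating a convex combination of normal densities over a wide interval should give a value close to 1. A small sketch with three illustrative components:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """f(x) = sum_k a_k f_k(x), a convex combination of normal densities."""
    return sum(a * normal_pdf(x, m, s) for a, m, s in zip(weights, mus, sigmas))

# Three illustrative components; the weights sum to 1.
weights, mus, sigmas = [0.3, 0.5, 0.2], [-2.0, 1.0, 5.0], [0.5, 1.0, 2.0]

# Trapezoidal integration over [-20, 20]; the tails beyond are negligible,
# so the result should be very close to 1.
h = 0.01
xs = [-20.0 + i * h for i in range(4001)]
ys = [mixture_pdf(x, weights, mus, sigmas) for x in xs]
integral = sum((ys[i] + ys[i + 1]) / 2 * h for i in range(len(xs) - 1))
print(integral)
```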

Accordingly, the probability function of a discrete mixed distribution results as

$$\rho(x_i) = \sum_{k=1}^{K} a_k \rho_k(x_i)$$

from the probability functions $\rho_k$ of discrete random variables $X_k$.
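The discrete case works the same way, pointwise on the common support. A tiny sketch with two made-up probability functions (the values and weights are purely illustrative):

```python
# Two illustrative discrete probability functions rho1 and rho2,
# mixed with weights a1 + a2 = 1: rho(x) = a1*rho1(x) + a2*rho2(x).
rho1 = {1: 0.5, 2: 0.5}             # first population: only the values 1 and 2
rho2 = {1: 0.1, 2: 0.3, 3: 0.6}     # second population: values 1, 2 and 3
a1, a2 = 0.25, 0.75

support = set(rho1) | set(rho2)     # union of the two supports
rho = {x: a1 * rho1.get(x, 0.0) + a2 * rho2.get(x, 0.0) for x in support}

print(rho)                  # the mixture probability function
print(sum(rho.values()))    # sums to 1, since a1 + a2 = 1
```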

Properties

For the moments of $X$:

$$\operatorname{E}(X^p) = \sum_{k=1}^{K} a_k \, \operatorname{E}(X_k^p), \quad p \in \{1, 2, 3, \dotsc\}.$$

This follows (in the continuous case) from

$$\operatorname{E}(X^p) = \int_{-\infty}^{\infty} x^p f(x) \, \mathrm{d}x = \int_{-\infty}^{\infty} x^p \left( \sum_{k=1}^{K} a_k f_k(x) \right) \mathrm{d}x = \sum_{k=1}^{K} a_k \left( \int_{-\infty}^{\infty} x^p f_k(x) \, \mathrm{d}x \right).$$

A similar calculation gives the formula for the discrete case.
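The moment formula can be checked by Monte Carlo simulation: sample from the mixture and compare the empirical mean with $\sum_k a_k \operatorname{E}(X_k)$. The parameters below are those of the trout example in the final section:

```python
import random

# Two-component normal mixture: weights, means, standard deviations
# (parameters taken from the trout example: 0.4*N(400, 70^2) + 0.6*N(600, 90^2)).
a, mu, sigma = [0.4, 0.6], [400.0, 600.0], [70.0, 90.0]

random.seed(0)
n = 200_000
samples = []
for _ in range(n):
    k = 0 if random.random() < a[0] else 1   # pick the component
    samples.append(random.gauss(mu[k], sigma[k]))

mc_mean = sum(samples) / n
exact_mean = a[0] * mu[0] + a[1] * mu[1]     # first moment via the formula
print(mc_mean, exact_mean)
```

With these parameters the exact first moment is $0.4 \cdot 400 + 0.6 \cdot 600 = 520$, and the Monte Carlo estimate agrees up to sampling error.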

Frequent special case: Gaussian mixture models

[Figure: Example of a mixed distribution, calculated from a model with the parameters of three individual weighted Gaussian distributions using the EM algorithm (computed with the R package mclust).]

A common special case of mixed distributions are so-called Gaussian mixture models (GMMs). Here the density functions $f_1, \ldots, f_K$ are those of normal distributions with potentially different means $\mu_1, \ldots, \mu_K$ and standard deviations $\sigma_1, \ldots, \sigma_K$ (or mean vectors and covariance matrices in the $d$-dimensional case). Thus

$$f_k(x) = \mathcal{N}\left(\mu_k, \Sigma_k\right)(x) = \frac{1}{(2\pi)^{\frac{d}{2}} |\Sigma_k|^{\frac{1}{2}}} \exp\left( -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right)$$

and the density $f$ of the mixed distribution has the form

$$f(x) = \sum_{k=1}^{K} a_k f_k(x) = \sum_{k=1}^{K} \frac{a_k}{(2\pi)^{\frac{d}{2}} |\Sigma_k|^{\frac{1}{2}}} \exp\left( -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right).$$
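A minimal sketch of evaluating such a GMM density in the 2-dimensional case, with the 2x2 linear algebra written out by hand (the component parameters are illustrative):

```python
import math

def gauss2d(x, mu, Sigma):
    """Density of a 2-D normal N(mu, Sigma) at point x; Sigma is a 2x2 matrix."""
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    inv = [[ Sigma[1][1] / det, -Sigma[0][1] / det],
           [-Sigma[1][0] / det,  Sigma[0][0] / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    q = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
         + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    # (2*pi)^(d/2) with d = 2 gives the factor 2*pi
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def gmm_pdf(x, weights, mus, Sigmas):
    """f(x) = sum_k a_k * N(mu_k, Sigma_k)(x)."""
    return sum(a * gauss2d(x, m, S) for a, m, S in zip(weights, mus, Sigmas))

# Two illustrative components with identity covariances
weights = [0.5, 0.5]
mus = [[0.0, 0.0], [3.0, 3.0]]
Sigmas = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
print(gmm_pdf([0.0, 0.0], weights, mus, Sigmas))
```

At the first component's mean, almost all of the density comes from that component, so the value is close to $0.5 / (2\pi) \approx 0.0796$.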

Parameter estimation

Estimators for the parameters of probability distributions are often derived using the maximum likelihood method. In the case of mixed distributions, however, this usually leads to equations whose solutions cannot be given in closed form and must therefore be determined numerically. A typical method for this is the expectation-maximization algorithm (EM algorithm), which, starting from initial values for the parameters, generates a sequence of increasingly good estimates that in many cases converges to the true parameters.
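A minimal, illustrative EM sketch for a two-component one-dimensional Gaussian mixture. The initialization and stopping rule are deliberately naive (a fixed iteration count); production implementations such as mclust are considerably more careful:

```python
import math
import random

def em_gmm_1d(data, n_iter=100):
    """Naive EM sketch for a two-component 1-D Gaussian mixture.
    Returns (weights, means, standard deviations)."""
    # crude initialization: split the sorted data in half
    xs = sorted(data)
    half = len(xs) // 2
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = max(1e-6, (max(xs) - min(xs)) / 4.0)
    sd = [spread, spread]
    a = [0.5, 0.5]

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    for _ in range(n_iter):
        # E step: responsibilities r[i][k] = P(component k | x_i)
        r = []
        for x in data:
            p = [a[k] * pdf(x, mu[k], sd[k]) for k in range(2)]
            tot = p[0] + p[1]
            r.append([p[0] / tot, p[1] / tot])
        # M step: re-estimate weights, means and standard deviations
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            a[k] = nk / len(data)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
            var = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / nk
            sd[k] = math.sqrt(max(var, 1e-12))
    return a, mu, sd

# Synthetic data from a known mixture, then recover its parameters:
# 30% from N(0, 1), 70% from N(10, 2^2).
random.seed(1)
data = [random.gauss(0.0, 1.0) if random.random() < 0.3 else random.gauss(10.0, 2.0)
        for _ in range(2000)]
weights, means, sds = em_gmm_1d(data)
print(weights, means, sds)
```

With well-separated components like these, the recovered weights and means land close to the generating values (up to the arbitrary ordering of the two components).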

Example

[Figure: Distribution of trout weight (g)]

A trout farmer sells trout in bulk. An inventory is taken in autumn when the ponds are emptied, and the trout that have been fished out are weighed. The result is the weight distribution shown in the graphic. The two-peaked distribution indicates a mixed distribution. It turns out that the trout came from two different ponds: the trout weights from the first pond are normally distributed with expected value 400 g and variance 4900 g², those from the second pond with expected value 600 g and variance 8100 g². 40% of the trout come from the first pond and 60% from the second. This yields the density function (see figure)

$$f(x) = 0.4 \cdot \frac{1}{70 \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - 400}{70} \right)^2} + 0.6 \cdot \frac{1}{90 \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - 600}{90} \right)^2}.$$
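This density can be transcribed directly; a sketch in Python, together with the expected weight E(X) = 0.4*400 + 0.6*600 = 520 g from the moment formula above (note that 70 and 90 are the standard deviations, i.e. the square roots of the variances 4900 g² and 8100 g²):

```python
import math

def trout_density(x):
    """Mixture density of the trout weights:
    40% from N(400, 70^2), 60% from N(600, 90^2), weights in grams."""
    f1 = math.exp(-0.5 * ((x - 400.0) / 70.0) ** 2) / (70.0 * math.sqrt(2 * math.pi))
    f2 = math.exp(-0.5 * ((x - 600.0) / 90.0) ** 2) / (90.0 * math.sqrt(2 * math.pi))
    return 0.4 * f1 + 0.6 * f2

# Expected weight via the moment formula for mixtures
mean_weight = 0.4 * 400.0 + 0.6 * 600.0
print(mean_weight)

# The density is bimodal: it dips between the two component means
print(trout_density(400.0), trout_density(500.0), trout_density(600.0))
```

Evaluating the density at 400 g, 500 g and 600 g shows the two-peaked shape numerically: the value at 500 g is lower than at either component mean.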