Multivariate distribution

In probability and statistics, a multivariate distribution is the distribution of a random vector, i.e. a random variable whose values are vectors in $\mathbb{R}^n$. In the two-dimensional case $n = 2$, one also speaks of a bivariate distribution. The joint probability distribution of a random vector $X = (X_1, \dotsc, X_n)$ is thus a probability measure on $\mathbb{R}^n$ that assigns to each measurable subset $A \subseteq \mathbb{R}^n$ the probability that $X$ takes a value in $A$. In this context, the distributions of the individual components $X_i$ are called the marginal distributions of $X$. Examples of multivariate distributions are the multinomial distribution and the multivariate normal distribution; others can be found in the list of multivariate and matrix-variate probability distributions.

Introductory example

We consider two random experiments:

Rolling a fair die twice. This is equivalent to an urn experiment with six distinguishable balls, drawing twice with replacement. There are 36 possible pairs of outcomes (since the order of the rolls or draws is taken into account), and all 36 possibilities are equally likely, i.e. each has probability 1/36.
A similar urn experiment, but without replacement. In this case the outcomes (1,1), (2,2), …, (6,6) do not occur, since the i-th ball cannot appear in the second draw if it was already removed in the first draw. The remaining 30 pairs are equally likely and therefore each has probability 1/30.

These two experiments yield two-dimensional discrete random variables $Z_1$ and $Z_2$, which have the same marginal distributions (in both experiments and in both draws, every number from 1 to 6 is equally likely and occurs with probability 1/6).

However, the two draws are independent in the first experiment, since the drawn ball is put back, whereas they are not independent in the second experiment. This becomes clearest when one realizes that the pairs (1,1), (2,2), …, (6,6) must each occur with probability 1/36 in an independent experiment (the product of the marginal probabilities, 1/6 · 1/6), but cannot occur at all in the second experiment (probability 0), since the ball is not put back.

The distributions of $Z_1$ and $Z_2$ are therefore different; this is an example of two different discrete multivariate distributions with the same marginal distributions.
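
The comparison of the two experiments can be checked by direct enumeration. The following is a minimal Python sketch; the names `joint_z1`, `joint_z2` and `marginal` are chosen here for illustration and do not come from the article.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Experiment 1: two draws with replacement, all 36 ordered pairs equally likely.
joint_z1 = {(a, b): Fraction(1, 36) for a, b in product(faces, repeat=2)}

# Experiment 2: two draws without replacement, the 30 pairs with a != b equally likely.
joint_z2 = {(a, b): Fraction(1, 30) for a, b in product(faces, repeat=2) if a != b}


def marginal(joint, axis):
    """Sum the joint probabilities over the other component."""
    result = {}
    for outcome, p in joint.items():
        key = outcome[axis]
        result[key] = result.get(key, Fraction(0)) + p
    return result


# Every face has marginal probability 1/6 in both experiments and both draws.
uniform = {k: Fraction(1, 6) for k in faces}
same_marginals = all(
    marginal(joint, axis) == uniform
    for joint in (joint_z1, joint_z2)
    for axis in (0, 1)
)

# The joint distributions nevertheless differ, for instance on the pair (1, 1).
p11_z1 = joint_z1.get((1, 1), Fraction(0))
p11_z2 = joint_z2.get((1, 1), Fraction(0))

print(same_marginals)
print(p11_z1, p11_z2)
```

Exact fractions are used instead of floating-point numbers so that the equality checks are not affected by rounding.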

Two-dimensional distribution function

Figure: 10,000 samples of a distribution modeled with the Clayton copula (with $\alpha = 2.88$), in which the marginal distributions are one-dimensional standard normal distributions.

The distribution function of a two-dimensional random variable $Z = (X, Y)$ is defined as follows:

$$F_Z(x, y) = P(X \leq x, Y \leq y).$$
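
As a worked illustration with the two experiments from the introductory example (the evaluation point $(2, 3)$ is chosen arbitrarily here), the distribution functions can be evaluated by counting the pairs whose first component is at most 2 and whose second component is at most 3:

$$F_{Z_1}(2, 3) = \frac{2 \cdot 3}{36} = \frac{1}{6}, \qquad F_{Z_2}(2, 3) = \frac{2 \cdot 3 - 2}{30} = \frac{2}{15}.$$

In the second experiment the pairs (1,1) and (2,2) cannot occur, so only 4 of the 6 pairs remain.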

If the random variable $Z$ under consideration has a (two-dimensional) density $f_{X,Y}$, then the distribution function is

$$F_Z(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v) \, \mathrm{d}u \, \mathrm{d}v.$$
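
The double integral can be approximated numerically. The sketch below assumes, purely for illustration, the density $f_{X,Y}(u, v) = e^{-(u+v)}$ for $u, v \geq 0$ (two independent exponential components with rate 1), which is not taken from the article; it was chosen because the closed form $(1 - e^{-x})(1 - e^{-y})$ is available as a check. The function names and the midpoint rule are likewise illustrative choices.

```python
import math


def density(u, v):
    """Hypothetical example density: two independent Exp(1) components."""
    return math.exp(-(u + v)) if u >= 0 and v >= 0 else 0.0


def cdf_numeric(x, y, lower=0.0, steps=400):
    """Midpoint-rule approximation of the double integral of the density.

    The example density vanishes below `lower`, so the integral from minus
    infinity reduces to an integral over [lower, x] x [lower, y].
    """
    if x <= lower or y <= lower:
        return 0.0
    du = (x - lower) / steps
    dv = (y - lower) / steps
    total = 0.0
    for i in range(steps):
        u = lower + (i + 0.5) * du
        for j in range(steps):
            v = lower + (j + 0.5) * dv
            total += density(u, v)
    return total * du * dv


def cdf_exact(x, y):
    """Closed form for this particular density, used only as a check."""
    return (1 - math.exp(-max(x, 0.0))) * (1 - math.exp(-max(y, 0.0)))


approx = cdf_numeric(1.0, 2.0)
exact = cdf_exact(1.0, 2.0)
print(approx, exact)
```

For densities with unbounded support in the negative direction, the lower limit would have to be truncated at a point where the remaining mass is negligible.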

If the random variable is discrete, the joint distribution can be written using conditional probabilities as:

$$\begin{aligned} \mathrm{P}(X = x \ \mathrm{and} \ Y = y) &= \mathrm{P}(Y = y \mid X = x) \cdot \mathrm{P}(X = x) \\ &= \mathrm{P}(X = x \mid Y = y) \cdot \mathrm{P}(Y = y) \end{aligned}$$

and in the continuous case accordingly

$$f_{X,Y}(x, y) = f_{Y \mid X}(y \mid x) \, f_X(x) = f_{X \mid Y}(x \mid y) \, f_Y(y).$$

Here $f_{Y \mid X}(y \mid x)$ and $f_{X \mid Y}(x \mid y)$ are the conditional densities (of $Y$ under the condition $X = x$, and of $X$ under the condition $Y = y$), and $f_X(x), f_Y(y)$ are the densities of the marginal distributions of $X$ and $Y$.
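
As a worked instance of the discrete factorization, consider the urn experiment without replacement from the introductory example, and let $X$ and $Y$ denote the first and second draw (this naming is an assumption made here). Once ball 1 has been removed, each of the five remaining balls is equally likely in the second draw, so

$$\mathrm{P}(X = 1 \ \mathrm{and} \ Y = 2) = \mathrm{P}(Y = 2 \mid X = 1) \cdot \mathrm{P}(X = 1) = \frac{1}{5} \cdot \frac{1}{6} = \frac{1}{30},$$

which agrees with the probability 1/30 of each admissible pair.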

The figure shows an example of modeling the dependency structure with the help of copulas. In particular, it illustrates that a bivariate random variable with normal marginal distributions need not be bivariate normal.
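
A sample of this kind can be generated as follows. This is a sketch under stated assumptions, not the procedure used for the figure: it assumes that $\alpha$ is the parameter of the Clayton copula in the common parametrization $C(u, v) = (u^{-\alpha} + v^{-\alpha} - 1)^{-1/\alpha}$, and it uses conditional inversion as the sampling method. The function names and the threshold 1.5 are illustrative.

```python
import random
from statistics import NormalDist

ALPHA = 2.88  # assumed to be the Clayton parameter from the figure caption


def sample_clayton_normal(n, alpha=ALPHA, seed=0):
    """Draw n pairs with a Clayton copula and standard normal margins.

    Conditional inversion: draw u and w uniformly, then solve
    dC(u, v)/du = w for v, which gives the closed form used below.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    eps = 1e-12  # keeps values strictly inside (0, 1) for inv_cdf
    pairs = []
    for _ in range(n):
        u = min(max(rng.random(), eps), 1 - eps)
        w = min(max(rng.random(), eps), 1 - eps)
        v = (u ** (-alpha) * (w ** (-alpha / (1 + alpha)) - 1) + 1) ** (-1 / alpha)
        v = min(max(v, eps), 1 - eps)
        # Transform the uniform margins to standard normal margins.
        pairs.append((std_normal.inv_cdf(u), std_normal.inv_cdf(v)))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


pairs = sample_clayton_normal(10_000)
# A bivariate normal distribution is symmetric under (x, y) -> (-x, -y),
# so joint extremes in both tails would be about equally frequent.
lower_tail = sum(1 for x, y in pairs if x < -1.5 and y < -1.5)
upper_tail = sum(1 for x, y in pairs if x > 1.5 and y > 1.5)
print(pearson(pairs), lower_tail, upper_tail)
```

The Clayton copula concentrates dependence in the lower tail, so joint small values are expected to occur more often than joint large values, even though each margin is standard normal.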

The general multidimensional case

If the n-dimensional random variable $Z = (X_1, \dots, X_n)$ has a density, then the distribution function is, analogously to the two-dimensional case,

$$F_Z(x_1, \dots, x_n) = \int_{-\infty}^{x_n} \dots \int_{-\infty}^{x_1} f_{X_1, \dots, X_n}(u_1, \dots, u_n) \, \mathrm{d}u_1 \dots \mathrm{d}u_n.$$

There are more possibilities for marginal distributions than in the two-dimensional case, since marginal distributions now exist for every lower dimension $1 \leq k < n$, and there are ${n \choose k}$ ways to select the subspace. For example, in the three-dimensional case there are 3 one-dimensional and 3 two-dimensional marginal distributions.
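
The count for the three-dimensional case can be reproduced by listing the index sets of the components that are kept. The helper name `marginal_index_sets` in this short sketch is hypothetical.

```python
from itertools import combinations
from math import comb


def marginal_index_sets(n):
    """Map each dimension k with 1 <= k < n to the index sets of its marginals."""
    return {k: list(combinations(range(1, n + 1), k)) for k in range(1, n)}


index_sets = marginal_index_sets(3)
for k, subsets in index_sets.items():
    # The number of k-dimensional marginals equals the binomial coefficient.
    print(k, comb(3, k), subsets)
```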

Joint distribution of independent random variables

If $P(X = x \ \text{and} \ Y = y) = P(X = x) \cdot P(Y = y)$ for all x and y in the case of discrete random variables, or $f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y)$ for all x and y in the case of continuous random variables, then X and Y are independent.
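
For a discrete joint distribution given as a table, the product criterion can be tested directly. The following sketch applies it to the two urn experiments from the introductory example; `is_independent` and the variable names are illustrative and not part of the article.

```python
from fractions import Fraction
from itertools import product


def is_independent(joint):
    """Check P(X=x, Y=y) == P(X=x) * P(Y=y) for every pair of values."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    # Marginal probabilities obtained by summing out the other component.
    px = {x: sum(joint.get((x, y), Fraction(0)) for y in ys) for x in xs}
    py = {y: sum(joint.get((x, y), Fraction(0)) for x in xs) for y in ys}
    return all(
        joint.get((x, y), Fraction(0)) == px[x] * py[y]
        for x, y in product(xs, ys)
    )


faces = range(1, 7)
with_replacement = {(a, b): Fraction(1, 36) for a, b in product(faces, repeat=2)}
without_replacement = {
    (a, b): Fraction(1, 30) for a, b in product(faces, repeat=2) if a != b
}

print(is_independent(with_replacement))
print(is_independent(without_replacement))
```

Pairs missing from the table are treated as having probability 0, which is what makes the criterion fail for the experiment without replacement.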
