Multi-dimensional normal distribution

from Wikipedia, the free encyclopedia
[Figure: Density of the bivariate normal distribution in three-dimensional space]

The multidimensional or multivariate normal distribution is a multivariate distribution studied in multivariate statistics. It generalizes the (one-dimensional) normal distribution to several dimensions. A two-dimensional normal distribution is also called a bivariate normal distribution.

A multidimensional normal distribution is determined by two distribution parameters: the expected value vector μ and the covariance matrix Σ, which correspond to the parameters μ (expected value) and σ² (variance) of the one-dimensional normal distribution.

Multidimensional, normally distributed random variables appear as limits of certain sums of independent multidimensional random variables; this generalizes the central limit theorem to the multidimensional central limit theorem.

Because they arise wherever multidimensional random quantities can be viewed as a superposition of many independent individual effects, they are of great importance in practice.

Thanks to the so-called reproductive property of the multidimensional normal distribution, the distribution of sums (and of linear combinations) of multidimensional normally distributed random variables can be given explicitly.

The multidimensional normal distribution: regular case

[Figure: 10,000 samples of a two-dimensional normal distribution with ρ = 0.7]

A p-dimensional real random variable X is multidimensionally normally distributed with expected value vector μ and (positive definite, hence regular) covariance matrix Σ if it has a density function of the form

f_X(x) = \frac{1}{\sqrt{(2\pi)^p \det\Sigma}} \, \exp\!\Big(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\Big).

One writes

X \sim \mathcal{N}_p(\mu, \Sigma).

The subscript p is the dimension of the p-dimensional normal distribution and indicates the number of variables; i.e. x is p×1, μ is p×1 and Σ is p×p. There is no closed form for the associated distribution function; the corresponding integrals must be evaluated numerically.

The expression (x−μ)^⊤ Σ^{−1} (x−μ) in the exponent of the density function is the squared Mahalanobis distance, which measures the distance from the test point x to the mean μ. Compared with the density function of the one-dimensional normal distribution, the covariance matrix Σ plays the role of the scalar variance σ² in the multidimensional normal distribution.
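
As a concrete illustration, the density and the Mahalanobis distance can be evaluated numerically. The following Python sketch (all parameter values are invented for illustration; NumPy and SciPy are assumed available) compares the closed form above with SciPy's implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (not from the article): a bivariate normal
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])
x = np.array([1.0, 2.0])

p = len(mu)
diff = x - mu
maha_sq = diff @ np.linalg.inv(Sigma) @ diff   # squared Mahalanobis distance

# Closed-form density of the regular multivariate normal distribution
density = np.exp(-0.5 * maha_sq) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

# Cross-check against SciPy's implementation
print(np.isclose(density, multivariate_normal(mean=mu, cov=Sigma).pdf(x)))  # True
```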

Properties

The multidimensional normal distribution has the following properties:

  • The affine transformation Y = AX + b with a q×p matrix A (with rank(A) = q ≤ p) and b ∈ ℝ^q is q-dimensionally normally distributed: Y ∼ N_q(Aμ + b, AΣA^⊤). According to the definition given here, this only applies if AΣA^⊤ is non-singular, i.e. has a non-vanishing determinant.
  • The affine transformation Z = Σ^{−1/2}(X − μ) standardizes the random vector X: Z ∼ N_p(0, I_p) (with the p×p identity matrix I_p).
  • Conditional distribution with partial knowledge of the random vector: if a multidimensionally normally distributed random vector X = (X_1, X_2)^⊤ ∼ N_p(μ, Σ) is conditioned on the sub-vector X_2, with the corresponding partitions μ = (μ_1, μ_2)^⊤ and Σ into the blocks Σ_{11}, Σ_{12}, Σ_{21}, Σ_{22}, the result is itself again multidimensionally normally distributed:

X_1 \mid X_2 = x_2 \;\sim\; \mathcal{N}\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right);

in particular, the conditional expected value depends linearly on the value x_2 of X_2, and the conditional covariance matrix is independent of x_2.
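
The conditional-distribution property can be sketched numerically. The following Python fragment (all parameter values invented for illustration) computes the conditional mean and covariance of X_1 given X_2 = x_2 from the partitioned μ and Σ:

```python
import numpy as np

# Invented parameters of a 3-dimensional normal vector X = (X1, X2),
# with X1 one-dimensional and X2 two-dimensional
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

mu1, mu2 = mu[:1], mu[1:]
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

x2 = np.array([2.5, 2.0])                    # observed value of the sub-vector X2

S22_inv = np.linalg.inv(S22)
mu_cond = mu1 + S12 @ S22_inv @ (x2 - mu2)   # linear in x2
Sigma_cond = S11 - S12 @ S22_inv @ S21       # does not depend on x2

print(mu_cond, Sigma_cond)
```

Note that conditioning always shrinks the variance: Sigma_cond is smaller than S11 whenever X_1 and X_2 are correlated.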

The multidimensional normal distribution: general case

If the covariance matrix Σ is singular, it cannot be inverted, and there is then no density of the form given above. Nevertheless, one can still define the multidimensional normal distribution, now using the characteristic function.

A p-dimensional real random variable X is called normally distributed with expected value vector μ and (positive semidefinite, i.e. not necessarily regular) covariance matrix Σ if it has a characteristic function of the form

\varphi_X(s) = \exp\!\big(\mathrm{i}\, s^\top \mu - \tfrac{1}{2}\, s^\top \Sigma\, s\big), \qquad s \in \mathbb{R}^p.

If Σ is regular, there is a probability density of the form above; if Σ is singular, then there is no density in p-dimensional space with respect to the Lebesgue measure. Suppose rank(Σ) = q < p; then there is a q-dimensional linear form Y = AX, where A is a q×p matrix, which follows a q-dimensional normal distribution with an existing density in ℝ^q.

The marginal distribution of the multidimensional normal distribution

[Figure: Bivariate normal distribution with marginal distributions]

Let X ∼ N_p(μ, Σ) be multidimensionally normally distributed. For any partition X = (X_1, X_2)^⊤ with X_1 ∈ ℝ^q and X_2 ∈ ℝ^{p−q}, 0 < q < p, the marginal distributions of X_1 and X_2 are themselves (multidimensional) normal distributions.

However, the reverse does not apply, as the following example shows:

Let X ∼ N(0, 1) and let Y be defined by

Y = X if |X| ≤ c,  and  Y = −X if |X| > c,

where c > 0. Then Y ∼ N(0, 1) as well, and

Cov(X, Y) = E[XY] = E[X² 1_{|X| ≤ c}] − E[X² 1_{|X| > c}].

Accordingly, the covariance (and thus the correlation coefficient) of X and Y is zero if and only if c takes one particular value (numerically c ≈ 1.54). If (X, Y)^⊤ were multidimensionally normally distributed, the uncorrelatedness of the two random variables X and Y would immediately imply their independence (a special feature of the multivariate normal distribution); but since X and Y are by definition not independent (|Y| = |X| always), (X, Y)^⊤ cannot in particular be multidimensionally normally distributed.
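
A small simulation makes the counterexample concrete. The following Python sketch (assuming X ∼ N(0, 1) and Y as defined above) determines the threshold c numerically from the covariance condition and then checks both uncorrelatedness and dependence empirically:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Cov(X, Y) = 1 - 4*(c*phi(c) + 1 - Phi(c)) for Y = X on |X| <= c, Y = -X otherwise
def cov_xy(c):
    return 1.0 - 4.0 * (c * norm.pdf(c) + 1.0 - norm.cdf(c))

c = brentq(cov_xy, 0.1, 3.0)       # the unique c making X and Y uncorrelated
print(round(c, 2))                 # ~1.54

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = np.where(np.abs(x) <= c, x, -x)

print(np.corrcoef(x, y)[0, 1])        # close to 0: uncorrelated
print(np.all(np.abs(x) == np.abs(y))) # True: |Y| = |X|, hence dependent
```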

The p-dimensional standard normal distribution

The probability measure on ℝ^p given by the density function

f(x) = (2\pi)^{-p/2} \exp\!\big(-\tfrac{1}{2}\, x^\top x\big)

is called the standard normal distribution of dimension p. The p-dimensional standard normal distribution is, apart from translations (i.e. expected value μ ≠ 0), the only multidimensional distribution whose components are stochastically independent and whose density is at the same time rotationally symmetric.

Moments and cumulants

As in the one-dimensional case, all moments of the multidimensional normal distribution are determined by the first two moments. All cumulants except the first two vanish; the first two cumulants are precisely the expected value vector μ and the covariance matrix Σ. With regard to the multidimensional moment problem, the normal distribution has the property of being uniquely determined by its moments: if all moments of a multidimensional probability distribution exist and coincide with the moments of a multidimensional normal distribution, then the distribution is the unique multidimensional normal distribution with these moments.

Density of the two-dimensional normal distribution

The density function of the two-dimensional normal distribution with mean values 0, variances 1 and correlation coefficient ρ is

f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\!\Big(-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}\Big).

[Figure: 10,000 samples each of two-dimensional normal distributions with ρ = −0.8, 0, 0.8 (all variances are 1).]

In the two-dimensional case with mean values 0 and arbitrary variances σ_x², σ_y², the density function is

f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\!\Big(-\frac{1}{2(1-\rho^2)}\Big(\frac{x^2}{\sigma_x^2} - \frac{2\rho x y}{\sigma_x \sigma_y} + \frac{y^2}{\sigma_y^2}\Big)\Big).

The general case with arbitrary mean values μ_x, μ_y is obtained by translation (replace x by x − μ_x and y by y − μ_y).
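
The ρ-parameterized density can be cross-checked against the matrix form. The following Python sketch (parameter values invented) assembles Σ from σ_x, σ_y and ρ and compares both expressions:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_density(x, y, mx, my, sx, sy, rho):
    """Bivariate normal density written explicitly in terms of rho."""
    q = ((x - mx) ** 2 / sx ** 2
         - 2 * rho * (x - mx) * (y - my) / (sx * sy)
         + (y - my) ** 2 / sy ** 2)
    return np.exp(-q / (2 * (1 - rho ** 2))) / (
        2 * np.pi * sx * sy * np.sqrt(1 - rho ** 2))

# Invented parameters; Sigma is assembled from the variances and rho
mx, my, sx, sy, rho = 1.0, -0.5, 2.0, 1.5, 0.8
Sigma = np.array([[sx ** 2, rho * sx * sy],
                  [rho * sx * sy, sy ** 2]])
mvn = multivariate_normal(mean=[mx, my], cov=Sigma)

print(np.isclose(bivariate_density(0.3, 0.7, mx, my, sx, sy, rho),
                 mvn.pdf([0.3, 0.7])))  # True
```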

Example of a multidimensional normal distribution

Consider an apple tree plantation with a large number of apple trees of the same age, i.e. comparable trees. One is interested in the characteristics height, yield and number of leaves of the apple trees. The random variables are therefore defined as:

X_1: height of a tree [m]; X_2: yield [100 kg]; X_3: number of leaves [1000 pieces].

Each of the three variables is normally distributed.

Most trees are therefore of about average height; very small or very large trees are rather rare. A large tree tends to have a higher yield than a small tree, but of course there will be a large tree with little yield every now and then. Height and yield are thus correlated, as are the other pairs of variables.

If the three random variables are combined into the random vector X = (X_1, X_2, X_3)^⊤, then X is multidimensionally normally distributed; however, this does not hold in general (cf. the section on the marginal distributions of the multidimensional normal distribution). In the present case the joint distribution of X is then specified by the expected value vector μ and the covariance matrix Σ, from which the corresponding correlation matrix follows.

Estimation of the parameters of the multidimensional normal distribution

In reality, the distribution parameters of a p-dimensional normal distribution will usually not be known, so these parameters need to be estimated.

One draws a sample of size n. Each realization of the random vector X can be understood as a point in p-dimensional space. This yields the n × p data matrix X (also called design or data matrix), which contains the coordinates of one point in each row (see the multiple linear model in matrix notation).

The expected value vector μ is estimated by the mean value vector \bar{x} of the arithmetic means of the columns of X, with the components

\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}.

This estimator is the best unbiased estimator of the expected value vector with respect to the mean squared error. However, for p ≥ 3 it is inadmissible in the sense of decision theory; there are then better estimators, e.g. the James–Stein estimator.

The data matrix X* centered with respect to the arithmetic means proves useful for estimating the covariance matrix. It is calculated as

X^* = X - \mathbf{1}_n \bar{x}^\top,

with the elements x^*_{ij} = x_{ij} - \bar{x}_j, where \mathbf{1}_n denotes a column vector of length n consisting entirely of ones. Thus the arithmetic mean of the corresponding column is subtracted from every entry.

The covariance matrix has the estimated components

s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k).

In matrix form it is obtained as

S = \frac{1}{n-1}\, X^{*\top} X^*.

The population correlation matrix is estimated by the pairwise correlation coefficients

r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}};

on its main diagonal there are ones.
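
The estimation steps above translate directly into code. The following Python sketch (with an invented population, drawn via NumPy) computes the mean value vector, the centered data matrix, the sample covariance matrix and the sample correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([4.0, 2.0, 6.0])               # invented population parameters
Sigma = np.array([[0.25, 0.40, 0.30],
                  [0.40, 1.00, 0.50],
                  [0.30, 0.50, 1.50]])
X = rng.multivariate_normal(mu, Sigma, size=500)   # n x p data matrix

n = X.shape[0]
xbar = X.mean(axis=0)              # estimated expected value vector
Xc = X - xbar                      # centered data matrix X*
S = Xc.T @ Xc / (n - 1)            # sample covariance matrix
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)             # sample correlation matrix

print(xbar)                        # close to mu
```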

Example based on a sample

10 apple trees were randomly selected and three characteristics were measured on each: X_1: height of a tree [m]; X_2: yield [100 kg]; X_3: number of leaves [1000 pieces]. These observations are summarized in the data matrix X:

.

The mean values are calculated, as shown here by way of example, as

.

They give the mean value vector

.

The centered observations are obtained for the centered data matrix by subtracting the corresponding mean value from the columns:

,

so

.

For the covariance matrix, the covariances are calculated as in the example,

and accordingly the variances

,

so that the sample covariance matrix

results.

Correspondingly, for example , one obtains for the sample correlation matrix

or in total

.

Generation of multi-dimensional, normally distributed random numbers

An often-used method for generating a random vector X of a p-dimensional normal distribution with expected value vector μ and (symmetric and positive definite) covariance matrix Σ is as follows:

  1. Find a matrix D such that D D^⊤ = Σ. The Cholesky decomposition of Σ or the square root of Σ can be used for this.
  2. Let Z = (Z_1, …, Z_p)^⊤ be a vector whose components are stochastically independent, standard normally distributed random numbers. These can be generated using the Box–Muller method, for example.
  3. The desired p-dimensionally normally distributed random vector results from the affine transformation X = μ + D Z.
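
The three steps can be sketched in Python (parameter values invented; NumPy's standard normal generator stands in for Box–Muller, which yields the same distribution):

```python
import numpy as np

mu = np.array([1.0, 2.0])                    # invented target parameters
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Step 1: D with D @ D.T == Sigma via the Cholesky decomposition
D = np.linalg.cholesky(Sigma)

# Step 2: independent standard normal components (Box-Muller would work too)
rng = np.random.default_rng(1)
Z = rng.standard_normal((100_000, 2))

# Step 3: affine transformation X = mu + D Z, applied row-wise
X = mu + Z @ D.T

print(X.mean(axis=0))            # close to mu
print(np.cov(X, rowvar=False))   # close to Sigma
```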

Scatter regions of the multidimensional normal distribution

For one-dimensional normally distributed random variables, approximately 68.27% of the realizations lie in the interval μ ± σ; for multidimensional normally distributed random variables, the regions of constant probability density are ellipses (in higher dimensions, ellipsoids), the so-called standard deviation ellipses, which are centered around the mean. The principal axes of the ellipse are given by the eigenvectors of the covariance matrix Σ, and the length of each semi-axis is the square root of the eigenvalue belonging to the respective principal axis. For a multidimensional normally distributed random variable, the probability of a realization lying in the region bounded by the (multidimensional) standard deviation ellipse is smaller than in the one-dimensional case.

[Figure: Standard deviation ellipse of a two-dimensional normal distribution, together with the two marginal distributions.]

After a principal axis transformation, the axes can be normalized by their respective standard deviations σ_i. The probability that a measured value lies within a given radius r of the mean can then be calculated. With

r^2 = \sum_{i=1}^{p} \frac{x_i^2}{\sigma_i^2},

the share

F(r) = \frac{\gamma\!\left(\frac{p}{2}, \frac{r^2}{2}\right)}{\Gamma\!\left(\frac{p}{2}\right)}

of the measured values lies at most at distance r from the mean value of a p-dimensional normal distribution; here F is the regularized incomplete gamma function with upper integration limit r²/2.

F(r) in %    r = 1    r = 2    r = 3
p = 1        68.27    95.45    99.73
p = 2        39.35    86.47    98.89
p = 3        19.87    73.85    97.07

Correspondingly, the inverse function can be used to specify the scattering radius r within which a specified proportion q of the measured values lies:

r in units of σ    q = 50%    q = 90%    q = 99%
p = 1              0.675      1.645      2.576
p = 2              1.177      2.146      3.035
p = 3              1.538      2.500      3.368
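
Both tables can be reproduced with the regularized incomplete gamma function. A short Python sketch using scipy.special:

```python
import numpy as np
from scipy.special import gammainc, gammaincinv

def share_within(r, p):
    """Fraction of a p-dimensional normal distribution within radius r."""
    return gammainc(p / 2, r ** 2 / 2)

def radius_for_share(q, p):
    """Radius r (in units of sigma) that contains the fraction q."""
    return np.sqrt(2 * gammaincinv(p / 2, q))

# Reproduce entries of the tables above
print(round(100 * share_within(1, 1), 2))   # 68.27
print(round(100 * share_within(2, 2), 2))   # 86.47
print(round(radius_for_share(0.50, 2), 3))  # 1.177
print(round(radius_for_share(0.99, 3), 3))  # 3.368
```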

literature

  • Mardia, K. V., Kent, J. T., Bibby, J. M.: Multivariate Analysis. New York 1979.
  • Fahrmeir, Ludwig, Hamerle, Alfred, Tutz, Gerhard (eds.): Multivariate statistical methods. New York 1996.
  • Hartung, Joachim, Elpelt, Bärbel: Multivariate Statistics. Munich, Vienna 1999.
  • Flury, Bernhard: A First Course in Multivariate Statistics. New York 1997.

Remarks

  1. Multidimensional and multivariate normal distribution are used synonymously in this article. In Hartung/Elpelt: Multivariate Statistics they have different meanings (Chapter 1, Section 5): there the multivariate normal distribution is a matrix distribution.
  2. ^ Rencher, Alvin C., and G. Bruce Schaalje: Linear Models in Statistics. John Wiley & Sons, 2008, p. 89.
  3. ^ Rencher, Alvin C., and G. Bruce Schaalje: Linear Models in Statistics. John Wiley & Sons, 2008, p. 90.
  4. Kleiber, Stoyanov: Multivariate distributions and the moment problem. Journal of Multivariate Analysis, Volume 113, January 2013, pages 7–18, doi:10.1016/j.jmva.2011.06.001.
  5. Bin Wang, Wenzhong Shi, Zelang Miao: Confidence Analysis of Standard Deviational Ellipse and Its Extension into Higher Dimensional Euclidean Space. In: PLOS ONE, Vol. 10, No. 3, March 13, 2015, ISSN 1932-6203, p. 11, doi:10.1371/journal.pone.0118537.
  6. would be described in the normalization .