# Covariance matrix

A two-dimensional Gaussian distribution centered at ${\displaystyle (0,0)}$, with covariance matrix ${\displaystyle \mathbf{\Sigma}={\begin{pmatrix}1&0.5\\0.5&1\end{pmatrix}}}$

In probability theory, the covariance matrix is the generalization of the variance of a one-dimensional random variable to a multidimensional random variable, i.e., to a random vector ${\displaystyle \mathbf{X}}$. The elements on the main diagonal of the covariance matrix are the respective variances, and all other elements are covariances. The covariance matrix is also called the variance-covariance matrix or, more rarely, the scattering matrix or dispersion matrix (Latin dispersio "dispersion", from dispergere "to spread, scatter"), and it is a positive semidefinite matrix. If all components of the random vector are linearly independent, the covariance matrix is positive definite.

## Definition

Let ${\displaystyle \mathbf{X}}$ be a random vector

${\displaystyle \mathbf{X}={\begin{pmatrix}X_{1}\\X_{2}\\\vdots\\X_{n}\end{pmatrix}}}$,

where ${\displaystyle \operatorname{E}(X_{i})=\mu_{i}}$ denotes the expectation of ${\displaystyle X_{i}}$, ${\displaystyle \operatorname{Var}(X_{i})=\sigma_{i}^{2}}$ the variance of ${\displaystyle X_{i}}$, and ${\displaystyle \operatorname{Cov}(X_{i},X_{j})=\sigma_{ij}}$ for ${\displaystyle i\neq j}$ the covariance of the real random variables ${\displaystyle X_{i}}$ and ${\displaystyle X_{j}}$. The expectation vector of ${\displaystyle \mathbf{X}}$ is then given by (see expectation of matrices and vectors)

${\displaystyle \operatorname{E}(\mathbf{X})=\operatorname{E}{\begin{pmatrix}X_{1}\\X_{2}\\\vdots\\X_{n}\end{pmatrix}}={\begin{pmatrix}\mu_{1}\\\mu_{2}\\\vdots\\\mu_{n}\end{pmatrix}}={\boldsymbol{\mu}}}$,

i.e., the expected value of the random vector is the vector of the expected values. The covariance matrix of the random vector ${\displaystyle \mathbf{X}}$ can be defined as follows:

${\displaystyle {\begin{aligned}\operatorname{Cov}(\mathbf{X})&=\operatorname{E}\left((\mathbf{X}-{\boldsymbol{\mu}})(\mathbf{X}-{\boldsymbol{\mu}})^{\top}\right)\\&=\operatorname{E}{\begin{pmatrix}(X_{1}-\mu_{1})^{2}&(X_{1}-\mu_{1})(X_{2}-\mu_{2})&\cdots&(X_{1}-\mu_{1})(X_{n}-\mu_{n})\\(X_{2}-\mu_{2})(X_{1}-\mu_{1})&(X_{2}-\mu_{2})^{2}&\cdots&(X_{2}-\mu_{2})(X_{n}-\mu_{n})\\\vdots&\vdots&\ddots&\vdots\\(X_{n}-\mu_{n})(X_{1}-\mu_{1})&(X_{n}-\mu_{n})(X_{2}-\mu_{2})&\cdots&(X_{n}-\mu_{n})^{2}\end{pmatrix}}\\&={\begin{pmatrix}\operatorname{Var}(X_{1})&\operatorname{Cov}(X_{1},X_{2})&\cdots&\operatorname{Cov}(X_{1},X_{n})\\\operatorname{Cov}(X_{2},X_{1})&\operatorname{Var}(X_{2})&\cdots&\operatorname{Cov}(X_{2},X_{n})\\\vdots&\vdots&\ddots&\vdots\\\operatorname{Cov}(X_{n},X_{1})&\operatorname{Cov}(X_{n},X_{2})&\cdots&\operatorname{Var}(X_{n})\end{pmatrix}}\\&={\begin{pmatrix}\sigma_{1}^{2}&\sigma_{12}&\cdots&\sigma_{1n}\\\sigma_{21}&\sigma_{2}^{2}&\cdots&\sigma_{2n}\\\vdots&\vdots&\ddots&\vdots\\\sigma_{n1}&\sigma_{n2}&\cdots&\sigma_{n}^{2}\end{pmatrix}}\\&=\mathbf{\Sigma}\end{aligned}}}$

The covariance matrix is denoted by ${\displaystyle \operatorname{Cov}(\mathbf{X})}$, ${\displaystyle \mathbf{\Sigma}_{X}}$, or ${\displaystyle {\boldsymbol{\operatorname{V}}}}$, and the covariance matrix of the asymptotic distribution of a random variable by ${\displaystyle {\overline{\boldsymbol{\operatorname{V}}}}}$. The covariance matrix and the expectation vector are the most important parameters of a probability distribution. They are noted alongside a random variable as additional information as follows: ${\displaystyle X\;\sim\;({\boldsymbol{\mu}},\mathbf{\Sigma})}$. The covariance matrix, as the matrix of all pairwise covariances of the components of the random vector, contains information about its dispersion and about correlations between its components. If none of the random variables ${\displaystyle X_{1},\ldots,X_{n}}$ is degenerate (i.e., if none of them has variance zero) and there is no exact linear relationship between the ${\displaystyle X_{i}}$, then the covariance matrix is positive definite. One speaks of a scalar covariance matrix if all off-diagonal entries of the matrix are zero and the diagonal elements are the same positive constant.
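As a small worked illustration of the definition, the covariance matrix of a discrete random vector can be computed exactly by taking the expectation over its finitely many outcomes. The following sketch assumes NumPy; the distribution itself is hypothetical and chosen only for the example.

```python
# Exact Cov(X) = E[(X - mu)(X - mu)^T] for a finite discrete distribution.
import numpy as np

values = np.array([[0.0, 0.0],
                   [1.0, 1.0],
                   [2.0, 0.0]])          # possible outcomes of X = (X_1, X_2)
probs = np.array([0.25, 0.5, 0.25])      # their probabilities (sum to 1)

mu = probs @ values                      # expectation vector E(X)

centered = values - mu
Sigma = (probs[:, None] * centered).T @ centered  # sum_k p_k (x_k - mu)(x_k - mu)^T

print(mu)      # [1.  0.5]
print(Sigma)   # variances on the diagonal, covariances off the diagonal
```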

## Properties

### Basic properties

• For ${\displaystyle i=j}$ the following applies: ${\displaystyle \operatorname{Cov}(X_{i},X_{j})=\operatorname{Var}(X_{i})}$. The main diagonal of the covariance matrix thus contains the variances of the individual components of the random vector; all elements on the main diagonal are therefore nonnegative.
• A real covariance matrix is symmetric because the covariance of two random variables is symmetric.
• The covariance matrix is positive semidefinite: due to the symmetry, every covariance matrix can be diagonalized by means of a principal axis transformation, and the resulting diagonal matrix is again a covariance matrix. Since it has only variances on the diagonal, the diagonal matrix is positive semidefinite, and hence so is the original covariance matrix.
• Conversely, every symmetric positive semidefinite ${\displaystyle d\times d}$ matrix can be understood as the covariance matrix of a ${\displaystyle d}$-dimensional random vector.
• Because covariance matrices are diagonalizable with nonnegative eigenvalues (on the diagonal) due to positive semidefiniteness, they can be represented as ellipsoids.
• For all matrices ${\displaystyle \mathbf{A}\in\mathbb{R}^{m\times n}}$ the following applies: ${\displaystyle \operatorname{Cov}(\mathbf{A}\mathbf{X})=\operatorname{E}{\Big(}\mathbf{A}(\mathbf{X}-\operatorname{E}(\mathbf{X}))(\mathbf{X}-\operatorname{E}(\mathbf{X}))^{\top}\mathbf{A}^{\top}{\Big)}=\mathbf{A}\operatorname{Cov}(\mathbf{X})\,\mathbf{A}^{\top}}$ (see the numerical check after this list).
• For all vectors ${\displaystyle \mathbf{b}\in\mathbb{R}^{n}}$ the following applies: ${\displaystyle \operatorname{Cov}(\mathbf{X}+\mathbf{b})=\operatorname{Cov}(\mathbf{X})}$.
• If ${\displaystyle \mathbf{X}}$ and ${\displaystyle \mathbf{Y}}$ are uncorrelated random vectors, then ${\displaystyle \operatorname{Cov}(\mathbf{X}+\mathbf{Y})=\operatorname{Cov}(\mathbf{X})+\operatorname{Cov}(\mathbf{Y})}$.
• With the diagonal matrix ${\displaystyle \mathbf{D}=\left(\operatorname{diag}({\boldsymbol{\Sigma}})\right)^{1/2}=\operatorname{diag}(\sigma_{1},\sigma_{2},\dotsc,\sigma_{n})}$, the covariance matrix can be written as ${\displaystyle {\boldsymbol{\Sigma}}=\mathbf{D}\,\mathbf{P}\,\mathbf{D}}$, where ${\displaystyle \mathbf{P}}$ is the correlation matrix of the population.
• If the random variables are standardized, the covariance matrix contains just the correlation coefficients, and the correlation matrix is obtained.
• The inverse ${\displaystyle \mathbf{\Sigma}^{-1}}$ of the covariance matrix is called the precision matrix or concentration matrix.
• For the trace of the covariance matrix: ${\displaystyle \operatorname{tr}(\mathbf{\Sigma})=\sum\nolimits_{i=1}^{n}\sigma_{i}^{2}}$.
• Covariance is additive in the first argument: ${\displaystyle \operatorname{Cov}(\mathbf{X}+\mathbf{Y},\mathbf{Z})=\operatorname{Cov}(\mathbf{X},\mathbf{Z})+\operatorname{Cov}(\mathbf{Y},\mathbf{Z})}$.
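The transformation rule ${\displaystyle \operatorname{Cov}(\mathbf{A}\mathbf{X})=\mathbf{A}\operatorname{Cov}(\mathbf{X})\,\mathbf{A}^{\top}}$ from the list above is easy to verify numerically. A minimal sketch, assuming NumPy; the matrices, sample size, and seed are illustrative only:

```python
# Numerical check of Cov(AX) = A Cov(X) A^T on sampled data.
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth covariance matrix (symmetric positive definite) and mean.
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
mu = np.array([0.0, 0.0])

# Draw many samples of X ~ (mu, Sigma) and transform them with a matrix A.
X = rng.multivariate_normal(mu, Sigma, size=200_000)   # shape (N, 2)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
Y = X @ A.T                                            # each row is A x

# The empirical covariances approximate Sigma and A Sigma A^T.
print(np.cov(X, rowvar=False))   # ~ Sigma
print(np.cov(Y, rowvar=False))   # ~ A Sigma A^T
print(A @ Sigma @ A.T)           # exact value for comparison
```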

### Relationship to the expected value of the random vector

If ${\displaystyle {\boldsymbol{\mu}}=\operatorname{E}(\mathbf{X})}$ is the expectation vector, then it can be shown with the shift theorem of Steiner, applied to multidimensional random variables, that

${\displaystyle {\begin{aligned}\operatorname{Cov}(\mathbf{X})&=\operatorname{E}{\bigl(}(\mathbf{X}-{\boldsymbol{\mu}})(\mathbf{X}-{\boldsymbol{\mu}})^{\top}{\bigr)}\\&=\operatorname{E}(\mathbf{X}\mathbf{X}^{\top})-{\boldsymbol{\mu}}{\boldsymbol{\mu}}^{\top}\end{aligned}}}$.

Here the expectation values of vectors and matrices are to be understood component-wise.
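The shift theorem can also be checked on sampled data. A minimal sketch, assuming NumPy, with an illustrative mean and covariance matrix (the match is approximate due to sampling noise):

```python
# Check of Cov(X) = E(X X^T) - mu mu^T using component-wise expectations.
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([1.0, -2.0],
                            [[2.0, 0.8],
                             [0.8, 1.0]], size=500_000)

mu = X.mean(axis=0)              # empirical expectation vector
E_XXt = X.T @ X / len(X)         # empirical E(X X^T), component-wise

print(E_XXt - np.outer(mu, mu))  # ~ Cov(X)
print(np.cov(X, rowvar=False))   # reference value (nearly identical)
```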

A random vector that is to obey a given covariance matrix and have the expected value ${\displaystyle {\boldsymbol{\mu}}}$ can be simulated as follows: first, the covariance matrix has to be decomposed (e.g. with the Cholesky decomposition):

${\displaystyle \operatorname{Cov}(\mathbf{X})=\mathbf{D}\mathbf{D}^{\top}}$.

The random vector can then be computed as

${\displaystyle \mathbf{X}=\mathbf{D}{\boldsymbol{\xi}}+{\boldsymbol{\mu}}}$

where ${\displaystyle {\boldsymbol{\xi}}}$ is a (different) random vector with independent standard normally distributed components.
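This simulation recipe translates directly into code. A minimal sketch, assuming NumPy; the target ${\displaystyle \mathbf{\Sigma}}$ and ${\displaystyle {\boldsymbol{\mu}}}$ are illustrative:

```python
# Simulate X = D xi + mu with Sigma = D D^T from a Cholesky decomposition.
import numpy as np

rng = np.random.default_rng(1)

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
mu = np.array([2.0, -1.0])

D = np.linalg.cholesky(Sigma)            # lower triangular, Sigma = D @ D.T

xi = rng.standard_normal((100_000, 2))   # independent standard normal components
X = xi @ D.T + mu                        # each row is D xi + mu

print(X.mean(axis=0))                    # ~ mu
print(np.cov(X, rowvar=False))           # ~ Sigma
```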

### Covariance matrix of two vectors

The covariance matrix of two random vectors ${\displaystyle \mathbf{x}}$ and ${\displaystyle \mathbf{y}}$ is

${\displaystyle \operatorname{Cov}(\mathbf{x},\mathbf{y})=\operatorname{E}{\bigl(}(\mathbf{x}-{\boldsymbol{\mu}})(\mathbf{y}-{\boldsymbol{\nu}})^{\top}{\bigr)}}$

where ${\displaystyle {\boldsymbol{\mu}}}$ is the expected value of the random vector ${\displaystyle \mathbf{x}}$ and ${\displaystyle {\boldsymbol{\nu}}}$ is the expected value of the random vector ${\displaystyle \mathbf{y}}$.
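Empirically, this cross-covariance matrix is estimated from paired samples of the two vectors. A minimal sketch, assuming NumPy; the data-generating process below is illustrative:

```python
# Estimate Cov(x, y) = E[(x - mu)(y - nu)^T] from paired samples.
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal((50_000, 2))
y = x @ np.array([[1.0, 0.0],
                  [1.0, 1.0]]) + rng.standard_normal((50_000, 2))

mu, nu = x.mean(axis=0), y.mean(axis=0)
Cov_xy = (x - mu).T @ (y - nu) / (len(x) - 1)   # empirical cross-covariance
print(Cov_xy)                                    # in general not symmetric
```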

### Covariance matrix as an efficiency criterion

The efficiency or precision of a point estimator can be measured by means of the variance-covariance matrix, since it contains the information about the dispersion of the random vector between its components. In general, the efficiency of a parameter estimator is measured by the "size" of its variance-covariance matrix: the "smaller" the variance-covariance matrix, the greater the efficiency of the estimator. Let ${\displaystyle {\tilde{\boldsymbol{\theta}}}}$ and ${\displaystyle {\hat{\boldsymbol{\theta}}}}$ be two unbiased ${\displaystyle (K\times 1)}$ estimators of ${\displaystyle {\boldsymbol{\theta}}}$. If ${\displaystyle {\hat{\boldsymbol{\theta}}}}$ is a ${\displaystyle (K\times 1)}$ random vector, then ${\displaystyle \operatorname{Cov}({\hat{\boldsymbol{\theta}}})}$ is a positive definite and symmetric ${\displaystyle (K\times K)}$ matrix. One says that ${\displaystyle \operatorname{Cov}({\hat{\boldsymbol{\theta}}})}$ is "smaller" than ${\displaystyle \operatorname{Cov}({\tilde{\boldsymbol{\theta}}})}$ in the sense of the Loewner partial order, i.e., that ${\displaystyle \operatorname{Cov}({\tilde{\boldsymbol{\theta}}})-\operatorname{Cov}({\hat{\boldsymbol{\theta}}})}$ is a positive semidefinite matrix, as checked in the sketch below.
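Checking the Loewner order amounts to testing whether the difference of the two covariance matrices is positive semidefinite, e.g. via its eigenvalues. A minimal sketch, assuming NumPy; the two matrices are illustrative:

```python
# Cov_tilde - Cov_hat is positive semidefinite iff all eigenvalues are >= 0.
import numpy as np

Cov_tilde = np.array([[2.0, 0.5],
                      [0.5, 1.5]])
Cov_hat = np.array([[1.0, 0.2],
                    [0.2, 1.0]])

diff = Cov_tilde - Cov_hat
eigenvalues = np.linalg.eigvalsh(diff)   # diff is symmetric, so eigvalsh applies
print(eigenvalues)                        # all >= 0 here
print(np.all(eigenvalues >= -1e-12))      # True: Cov_hat is "smaller" (more efficient)
```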

## Sample covariance matrix

An estimate ${\displaystyle {\widehat{\mathbf{\Sigma}}}}$ of the covariance matrix of the population is obtained by replacing the variances ${\displaystyle \operatorname{Var}(X_{i})=\sigma_{i}^{2}}$ and the covariances ${\displaystyle \operatorname{Cov}(X_{i},X_{j})=\sigma_{ij}}$, ${\displaystyle i\neq j}$, in the population by their empirical counterparts, the empirical variances ${\displaystyle {\hat{\sigma}}_{j}^{2}=s_{j}^{2}}$ and the empirical covariances ${\displaystyle {\hat{\sigma}}_{jk}=s_{jk}}$ (where the ${\displaystyle x}$ values are realizations of the random variables). These are given by

${\displaystyle {\hat{\sigma}}_{j}^{2}=s_{j}^{2}:={\frac{1}{n-1}}\sum\limits_{i=1}^{n}\left(x_{ij}-{\overline{x}}_{j}\right)^{2}}$ and ${\displaystyle {\hat{\sigma}}_{jk}=s_{jk}:={\frac{1}{n-1}}\sum_{i=1}^{n}(x_{ij}-{\overline{x}}_{j})(x_{ik}-{\overline{x}}_{k})}$.

This leads to the sample covariance matrix ${\displaystyle \mathbf{S}}$:

${\displaystyle \mathbf{S}={\widehat{\mathbf{\Sigma}}}={\widehat{\operatorname{Cov}(\mathbf{X})}}={\begin{pmatrix}s_{1}^{2}&s_{12}&\cdots&s_{1k}\\s_{21}&s_{2}^{2}&\cdots&s_{2k}\\\vdots&\vdots&\ddots&\vdots\\s_{k1}&s_{k2}&\cdots&s_{k}^{2}\end{pmatrix}}}$.

For example, ${\displaystyle s_{2}^{2}}$ and ${\displaystyle s_{12}}$ are given by

${\displaystyle {\hat{\sigma}}_{2}^{2}=s_{2}^{2}:={\frac{1}{n-1}}\sum\limits_{i=1}^{n}\left(x_{i2}-{\overline{x}}_{2}\right)^{2}}$ and ${\displaystyle {\hat{\sigma}}_{12}=s_{12}:={\frac{1}{n-1}}\sum_{i=1}^{n}(x_{i1}-{\overline{x}}_{1})(x_{i2}-{\overline{x}}_{2})}$,

with the arithmetic mean

${\displaystyle {\overline{x}}_{2}:={\frac{1}{n}}\sum_{i=1}^{n}x_{i2}}$.
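The sample covariance matrix ${\displaystyle \mathbf{S}}$ is straightforward to compute from a data matrix whose rows are observations and whose columns are variables. A minimal sketch, assuming NumPy; the data are illustrative:

```python
# Sample covariance matrix with the 1/(n-1) factor from the formulas above.
import numpy as np

x = np.array([[1.0, 2.0],     # n = 5 observations of k = 2 variables
              [2.0, 1.5],
              [3.0, 3.5],
              [4.0, 3.0],
              [5.0, 5.0]])

n = x.shape[0]
x_bar = x.mean(axis=0)                      # arithmetic means of the columns

S = (x - x_bar).T @ (x - x_bar) / (n - 1)   # empirical variances and covariances

print(S)
print(np.cov(x, rowvar=False))              # NumPy's np.cov agrees (ddof = 1)
```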

## Special covariance matrices

### Ordinary Least Squares Estimator Covariance Matrix

For the covariance matrix of the ordinary least squares estimator

${\displaystyle \mathbf{b}=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y};\quad\operatorname{Cov}(\mathbf{Y})=\sigma^{2}\mathbf{I}}$

the calculation rules above yield:

${\displaystyle \operatorname{Cov}(\mathbf{b})=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\operatorname{Cov}(\mathbf{Y})\,\mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}=\sigma^{2}(\mathbf{X}^{\top}\mathbf{X})^{-1}(\mathbf{X}^{\top}\mathbf{X})(\mathbf{X}^{\top}\mathbf{X})^{-1}=\sigma^{2}(\mathbf{X}^{\top}\mathbf{X})^{-1}=\Sigma_{\mathbf{b}}}$.

This covariance matrix is unknown because the variance ${\displaystyle \sigma^{2}}$ of the disturbances is unknown. An estimator ${\displaystyle {\hat{\Sigma}}_{\mathbf{b}}}$ for the covariance matrix is obtained by replacing the unknown disturbance variance ${\displaystyle \sigma^{2}}$ with its unbiased estimator ${\displaystyle {\hat{\sigma}}^{2}}$ (see: unbiased estimation of the unknown variance parameter).
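Putting the pieces together, the estimate ${\displaystyle {\hat{\Sigma}}_{\mathbf{b}}={\hat{\sigma}}^{2}(\mathbf{X}^{\top}\mathbf{X})^{-1}}$ can be computed as follows. A minimal sketch, assuming NumPy; the design matrix, true coefficients, and seed are illustrative:

```python
# OLS estimator b and the estimate sigma2_hat * (X^T X)^{-1} of Cov(b).
import numpy as np

rng = np.random.default_rng(2)

n, k = 100, 3
X = np.column_stack([np.ones(n),                      # intercept column
                     rng.standard_normal((n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.standard_normal(n)            # disturbances with sigma^2 = 1

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                                 # OLS estimator

residuals = y - X @ b
sigma2_hat = residuals @ residuals / (n - k)          # unbiased variance estimator

Cov_b_hat = sigma2_hat * XtX_inv                      # estimated covariance of b
print(b)
print(Cov_b_hat)
```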

### Covariance matrix for seemingly unrelated regression equations

In the model of seemingly unrelated regression equations (SURE for short)

${\displaystyle y_{it}={\boldsymbol{x}}_{it}^{\top}{\boldsymbol{\beta}}+e_{it}}$,

where the error term ${\displaystyle e_{it}}$ is idiosyncratic and ${\displaystyle {\boldsymbol{e}}_{i}}$ collects the errors of equation ${\displaystyle i}$ over time, the covariance matrix of the stacked error vector ${\displaystyle \mathbf{e}}$ results as

${\displaystyle {\begin{aligned}\operatorname{Cov}(\mathbf{e})=\operatorname{E}(\mathbf{e}\mathbf{e}^{\top})&={\begin{pmatrix}\operatorname{E}({\boldsymbol{e}}_{1}{\boldsymbol{e}}_{1}^{\top})&\cdots&\operatorname{E}({\boldsymbol{e}}_{1}{\boldsymbol{e}}_{N}^{\top})\\\vdots&\ddots&\vdots\\\operatorname{E}({\boldsymbol{e}}_{N}{\boldsymbol{e}}_{1}^{\top})&\cdots&\operatorname{E}({\boldsymbol{e}}_{N}{\boldsymbol{e}}_{N}^{\top})\end{pmatrix}}={\begin{pmatrix}\sigma_{11}\mathbf{I}_{T}&\cdots&\sigma_{1N}\mathbf{I}_{T}\\\vdots&\ddots&\vdots\\\sigma_{N1}\mathbf{I}_{T}&\cdots&\sigma_{NN}\mathbf{I}_{T}\end{pmatrix}}={\begin{pmatrix}\sigma_{11}&\cdots&\sigma_{1N}\\\vdots&\ddots&\vdots\\\sigma_{N1}&\cdots&\sigma_{NN}\end{pmatrix}}\otimes\mathbf{I}_{T}\\&=\mathbf{\Sigma}\otimes\mathbf{I}_{T}=\mathbf{\Phi}\end{aligned}}}$
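The Kronecker structure ${\displaystyle \mathbf{\Phi}=\mathbf{\Sigma}\otimes\mathbf{I}_{T}}$ can be constructed directly. A minimal sketch, assuming NumPy; ${\displaystyle N}$, ${\displaystyle T}$, and ${\displaystyle \mathbf{\Sigma}}$ are illustrative:

```python
# Build Phi = Sigma ⊗ I_T; each block of Phi is sigma_ij * I_T.
import numpy as np

N, T = 2, 3                          # equations and time periods

Sigma = np.array([[1.0, 0.3],        # cross-equation error covariances
                  [0.3, 2.0]])

Phi = np.kron(Sigma, np.eye(T))      # (N*T, N*T) block matrix
print(Phi.shape)                     # (6, 6)
print(Phi)
```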