# Covariance (stochastics)

The covariance (from Latin con- = "with" and variance (dispersion), from variare = "to change, to be different"; hence occasionally also called Mitstreuung in German) is, in probability theory, a non-standardized measure of association for a monotonic relationship between two random variables with a joint probability distribution. The value of this quantity indicates, as a tendency, whether high values of one random variable go along with high or rather with low values of the other random variable. The covariance is thus a measure of the association between two random variables.

## Definition

If $X$ and $Y$ are two real, integrable random variables whose product is also integrable, i.e., the expected values $\operatorname{E}(X)$, $\operatorname{E}(Y)$ and $\operatorname{E}(XY)$ exist, then

$$\operatorname{Cov}(X,Y) := \operatorname{E}\bigl[(X-\operatorname{E}(X))\cdot(Y-\operatorname{E}(Y))\bigr]$$

is called the covariance of $X$ and $Y$.

If $X$ and $Y$ are square-integrable, i.e., if $\operatorname{E}(|X|^{2}) = \operatorname{E}(X^{2}) < \infty$ and $\operatorname{E}(|Y|^{2}) = \operatorname{E}(Y^{2}) < \infty$ hold, then the Cauchy-Schwarz inequality implies

$$\operatorname{E}(|X|) = \operatorname{E}(|X|\cdot 1) \leq \sqrt{\operatorname{E}(|X|^{2})} < \infty$$

and analogously $\operatorname{E}(|Y|) \leq \sqrt{\operatorname{E}(|Y|^{2})} < \infty$, and in addition

$$\operatorname{E}(|X\cdot Y|) \leq \operatorname{E}(|X|\cdot|Y|) \leq \sqrt{\operatorname{E}(|X|^{2})\cdot\operatorname{E}(|Y|^{2})} < \infty.$$

Thus the required existence of the expected values is guaranteed for square-integrable random variables.

## Properties and rules of calculation

### Interpretation of the covariance

• The covariance is positive if $X$ and $Y$ have a monotonic relationship in the same direction, i.e., high (low) values of $X$ go along with high (low) values of $Y$.
• The covariance is negative, on the other hand, if $X$ and $Y$ have a monotonic relationship in opposite directions, i.e., high values of one random variable go along with low values of the other random variable and vice versa.
• If the covariance is zero, there is no monotonic relationship between $X$ and $Y$ (non-monotonic relationships are possible, however).

The covariance indicates the direction of a relationship between two random variables, but it makes no statement about the strength of the relationship. This is due to the linearity of the covariance. To make relationships comparable, the covariance must be normalized. The most common normalization, by means of the standard deviations, leads to the correlation coefficient.
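The three cases can be illustrated with a small numerical sketch. It uses the sample covariance from NumPy (`numpy.cov`) as a stand-in for the theoretical covariance; the data and the helper name `sample_cov` are made up for illustration only.

```python
import numpy as np

# Illustrative data: values symmetric around zero.
x = np.arange(-3.0, 4.0)  # -3, -2, ..., 3


def sample_cov(a, b):
    """Sample covariance of two equally long 1-D arrays (hypothetical helper)."""
    return np.cov(a, b)[0, 1]


cov_same = sample_cov(x, 2 * x + 1)   # y rises with x: positive covariance
cov_opposite = sample_cov(x, -x)      # y falls as x rises: negative covariance
cov_none = sample_cov(x, x ** 2)      # symmetric parabola: zero covariance, yet y depends on x

print(cov_same, cov_opposite, cov_none)
```

The last case shows that a covariance of zero does not rule out a (non-monotonic) relationship.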

### Shift theorem

To simplify the calculation of the covariance, one can often use the shift theorem, which gives an alternative representation of the covariance.

Theorem (shift theorem for the covariance):

$$\operatorname{Cov}(X,Y) = \operatorname{E}(XY) - \operatorname{E}(X)\operatorname{E}(Y).$$

Proof:

$$\begin{aligned}
\operatorname{Cov}(X,Y) &= \operatorname{E}\bigl[(X-\operatorname{E}(X))\cdot(Y-\operatorname{E}(Y))\bigr] \\
&= \operatorname{E}\bigl[XY - X\operatorname{E}(Y) - Y\operatorname{E}(X) + \operatorname{E}(X)\operatorname{E}(Y)\bigr] \\
&= \operatorname{E}(XY) - \operatorname{E}(X)\operatorname{E}(Y) - \operatorname{E}(Y)\operatorname{E}(X) + \operatorname{E}(X)\operatorname{E}(Y) \\
&= \operatorname{E}(XY) - \operatorname{E}(X)\operatorname{E}(Y) \qquad \Box
\end{aligned}$$

### Relationship to variance

Theorem: The covariance is a generalization of the variance, since

$$\operatorname{Var}(X) = \operatorname{Cov}(X,X).$$

Proof:

$$\begin{aligned}
\operatorname{Cov}(X,X) &= \operatorname{E}\bigl[(X-\operatorname{E}(X))^{2}\bigr] \\
&= \operatorname{Var}(X) \qquad \Box
\end{aligned}$$

The variance is therefore the covariance of a random variable with itself.

Covariances can also be used to calculate the variance of a sum of square-integrable random variables. In general,

$$\begin{aligned}
\operatorname{Var}\left(\sum_{i=1}^{n} X_{i}\right) &= \sum_{i,j=1}^{n} \operatorname{Cov}(X_{i},X_{j}) \\
&= \sum_{i=1}^{n} \operatorname{Var}(X_{i}) + \sum_{i,j=1,\, i\neq j}^{n} \operatorname{Cov}(X_{i},X_{j}) \\
&= \sum_{i=1}^{n} \operatorname{Var}(X_{i}) + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \operatorname{Cov}(X_{i},X_{j}).
\end{aligned}$$

In particular, for the sum of two random variables the formula reads

$$\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y).$$

As can be seen immediately from the definition, the covariance changes sign when one of the variables changes sign:

$$\operatorname{Cov}(X,-Y) = -\operatorname{Cov}(X,Y)$$

This yields the formula for the difference of two random variables:

$$\operatorname{Var}(X-Y) = \operatorname{Var}(X+(-Y)) = \operatorname{Var}(X) + \operatorname{Var}(Y) - 2\operatorname{Cov}(X,Y).$$

### Linearity, symmetry and definiteness

Theorem: The covariance is a positive semidefinite symmetric bilinear form on the vector space of the square integrable random variables.

So the following three theorems hold:

Theorem (bilinearity): For $a,b,c,d,e,f,g,h \in \mathbb{R}$:

$$\operatorname{Cov}(aX+b,\, cY+d) = ac\operatorname{Cov}(X,Y)$$

and

$$\operatorname{Cov}[X,\, (eY+f)+(gZ+h)] = e\operatorname{Cov}(X,Y) + g\operatorname{Cov}(X,Z).$$

Proof:

$$\begin{aligned}
\operatorname{Cov}(aX+b,\, cY+d) &= \operatorname{E}\bigl[(aX+b-\operatorname{E}(aX+b))\cdot(cY+d-\operatorname{E}(cY+d))\bigr] \\
&= \operatorname{E}\bigl[(aX-a\operatorname{E}(X))\cdot(cY-c\operatorname{E}(Y))\bigr] \\
&= ac\operatorname{E}\bigl[(X-\operatorname{E}(X))\cdot(Y-\operatorname{E}(Y))\bigr] \\
&= ac\operatorname{Cov}(X,Y)
\end{aligned}$$

$$\begin{aligned}
\operatorname{Cov}[X,\, (eY+f)+(gZ+h)] &= \operatorname{E}\bigl[(X-\operatorname{E}(X))\cdot(eY+f+gZ+h-\operatorname{E}(eY+f+gZ+h))\bigr] \\
&= \operatorname{E}\bigl[(X-\operatorname{E}(X))\cdot(eY-e\operatorname{E}(Y)+gZ-g\operatorname{E}(Z))\bigr] \\
&= \operatorname{E}\bigl[(X-\operatorname{E}(X))\cdot e(Y-\operatorname{E}(Y)) + (X-\operatorname{E}(X))\cdot g(Z-\operatorname{E}(Z))\bigr] \\
&= e\operatorname{E}\bigl[(X-\operatorname{E}(X))\cdot(Y-\operatorname{E}(Y))\bigr] + g\operatorname{E}\bigl[(X-\operatorname{E}(X))\cdot(Z-\operatorname{E}(Z))\bigr] \\
&= e\operatorname{Cov}(X,Y) + g\operatorname{Cov}(X,Z) \qquad \Box
\end{aligned}$$

The covariance is evidently invariant under the addition of constants to the random variables. By symmetry, the linearity in the second argument shown in the second equation also holds in the first argument.
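Both identities can be checked numerically. The following is a minimal sketch, assuming simulated data and the sample covariance from `numpy.cov`; the sample covariance is itself bilinear, so the two sides should agree up to floating-point rounding. The coefficient values and the helper name `cov` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is reproducible
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)     # correlated with x
z = rng.normal(size=1000)


def cov(u, v):
    """Sample covariance of two 1-D arrays (hypothetical helper)."""
    return np.cov(u, v)[0, 1]


# Arbitrary real coefficients, named as in the theorem.
a, b, c, d = 2.0, 5.0, -3.0, 1.0
e, f, g, h = 1.5, -2.0, 0.5, 4.0

# First identity: Cov(aX + b, cY + d) = ac Cov(X, Y)
lhs1 = cov(a * x + b, c * y + d)
rhs1 = a * c * cov(x, y)

# Second identity: Cov[X, (eY + f) + (gZ + h)] = e Cov(X, Y) + g Cov(X, Z)
lhs2 = cov(x, (e * y + f) + (g * z + h))
rhs2 = e * cov(x, y) + g * cov(x, z)

print(np.isclose(lhs1, rhs1), np.isclose(lhs2, rhs2))
```

The additive constants `b`, `d`, `f` and `h` drop out of both results, in line with the invariance under the addition of constants.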

Theorem (symmetry):

$$\operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X)$$

Proof:

$$\begin{aligned}
\operatorname{Cov}(X,Y) &= \operatorname{E}\bigl[(Y-\operatorname{E}(Y))\cdot(X-\operatorname{E}(X))\bigr] \\
&= \operatorname{Cov}(Y,X) \qquad \Box
\end{aligned}$$

Theorem (positive semi-definiteness):

$$\operatorname{Cov}(X,X) \geq 0.$$

Proof:

$$\operatorname{Cov}(X,X) = \operatorname{Var}(X) \geq 0 \qquad \Box$$

Overall, as for every positive semidefinite symmetric bilinear form, the Cauchy-Schwarz inequality follows:

$$|\operatorname{Cov}(X,Y)| \leq \sqrt{\operatorname{Var}(X)}\cdot\sqrt{\operatorname{Var}(Y)}$$

The linearity of the covariance means that the covariance depends on the scale of the random variables. For example, one obtains ten times the covariance if one considers the random variable $10X$ instead of $X$. In particular, the value of the covariance depends on the units of measurement used for the random variables. Since this property makes the absolute values of the covariance difficult to interpret, when investigating a linear relationship between $X$ and $Y$ one often considers the scale-independent correlation coefficient instead. The scale-independent correlation coefficient of two random variables $X$ and $Y$ is the covariance of the standardized (i.e., divided by their standard deviations) random variables $\tilde{X} = X/\sigma_{X}$ and $\tilde{Y} = Y/\sigma_{Y}$:

$$\operatorname{Cov}(\tilde{X},\tilde{Y}) = \operatorname{Cov}(X/\sigma_{X},\, Y/\sigma_{Y}) = \frac{1}{\sigma_{X}\sigma_{Y}}\operatorname{Cov}(X,Y) =: \rho(X,Y).$$
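The scale dependence of the covariance and the scale independence of the correlation coefficient can be seen in a short sketch. It assumes simulated data and uses the sample estimates `numpy.cov` and `numpy.corrcoef` in place of the theoretical quantities; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)          # fixed seed so the run is reproducible
x = rng.normal(size=500)
y = x + rng.normal(size=500)

# Covariance scales with the variable: replacing X by 10X multiplies it by 10.
cov_xy = np.cov(x, y)[0, 1]
cov_10x_y = np.cov(10 * x, y)[0, 1]

# The correlation coefficient is unaffected by the rescaling.
rho_xy = np.corrcoef(x, y)[0, 1]
rho_10x_y = np.corrcoef(10 * x, y)[0, 1]

# Correlation as the covariance of the standardized variables X/sigma_X, Y/sigma_Y.
# ddof=1 matches the normalization that np.cov uses by default.
rho_from_cov = np.cov(x / x.std(ddof=1), y / y.std(ddof=1))[0, 1]

print(cov_xy, cov_10x_y)
print(rho_xy, rho_10x_y, rho_from_cov)
```

By the Cauchy-Schwarz inequality, the correlation values printed in the second line lie between -1 and 1.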

### Uncorrelatedness and independence

Definition (uncorrelatedness): Two random variables $X$ and $Y$ are called uncorrelated if $\operatorname{Cov}(X,Y) = 0$.

Theorem: Two stochastically independent random variables are uncorrelated.

Proof: For stochastically independent random variables $X$ and $Y$ we have $\operatorname{E}(XY) = \operatorname{E}(X)\operatorname{E}(Y)$, i.e.,

$$\begin{aligned}
\operatorname{E}(XY) - \operatorname{E}(X)\operatorname{E}(Y) &= 0 \\
\Leftrightarrow \qquad \operatorname{Cov}(X,Y) &= 0.
\end{aligned}$$

The converse is generally not true. A counterexample is given by a random variable $X$ that is uniformly distributed on the interval $[-1,1]$ together with $Y = X^{2}$. Obviously, $X$ and $Y$ are dependent on each other. Nevertheless,

$$\operatorname{Cov}(X,Y) = \operatorname{Cov}(X,X^{2}) = \operatorname{E}(X^{3}) - \operatorname{E}(X)\operatorname{E}(X^{2}) = 0 - 0\cdot\operatorname{E}(X^{2}) = 0.$$
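The counterexample can be verified with exact rational arithmetic. The sketch below computes the moments of the uniform distribution on $[-1,1]$ in closed form and then compares one joint probability with the product of the marginal probabilities. The helper names `moment` and `prob`, and the choice of the events $\{X > 1/2\}$ and $\{Y \leq 1/4\}$, are assumptions made for illustration.

```python
from fractions import Fraction


def moment(k):
    """E(X^k) for X uniform on [-1, 1], i.e. the integral of x^k / 2 over [-1, 1]."""
    if k % 2 == 1:
        return Fraction(0)          # odd powers cancel by symmetry
    return Fraction(1, k + 1)       # (1/2) * 2 / (k + 1)


def prob(lo, hi):
    """P(lo <= X <= hi) for X uniform on [-1, 1] (hypothetical helper)."""
    lo, hi = max(lo, Fraction(-1)), min(hi, Fraction(1))
    return max(hi - lo, Fraction(0)) / 2


# Shift theorem with Y = X^2: Cov(X, Y) = E(X^3) - E(X) E(X^2)
cov_xy = moment(3) - moment(1) * moment(2)

half = Fraction(1, 2)
p_x = prob(half, 1)                 # P(X > 1/2)
p_y = prob(-half, half)             # P(Y <= 1/4) = P(|X| <= 1/2)
# {X > 1/2} and {|X| <= 1/2} overlap only in the single point 1/2.
p_joint = prob(max(half, -half), min(Fraction(1), half))

print(cov_xy, p_joint, p_x * p_y)
```

The covariance is zero, while the joint probability differs from the product of the marginals, so $X$ and $Y$ are uncorrelated but not independent.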

Stochastically independent random variables whose covariance exists are therefore also uncorrelated. Conversely, however, uncorrelatedness does not necessarily imply that the random variables are stochastically independent, because there may be a non-monotonic dependence that the covariance does not capture.

Further examples of uncorrelated but stochastically dependent random variables:

• Let $X$ and $Y$ be random variables with $P(X=0,Y=1) = \tfrac{1}{2}$ and $P(X=2,Y=0) = P(X=2,Y=2) = \tfrac{1}{4}$. Then $P(X=0) = P(X=2) = \tfrac{1}{2}$ and $P(Y=0) = P(Y=2) = \tfrac{1}{4}$, $P(Y=1) = \tfrac{1}{2}$. It follows that $\operatorname{E}(X) = \operatorname{E}(Y) = 1$ and also $\operatorname{E}(XY) = 1$, so $\operatorname{Cov}(X,Y) = 0$. On the other hand, $X$ and $Y$ are not stochastically independent because $P(X=0,Y=1) = \tfrac{1}{2} \neq \tfrac{1}{2}\cdot\tfrac{1}{2} = P(X=0)P(Y=1)$.
• If the random variables $X$ and $Y$ are Bernoulli distributed with parameter $p \in (0,1)$ and independent, then $(X+Y)$ and $(X-Y)$ are uncorrelated but not independent. The uncorrelatedness is clear because $\operatorname{Cov}(X+Y,X-Y) = \operatorname{Cov}(X,X) - \operatorname{Cov}(X,Y) + \operatorname{Cov}(Y,X) - \operatorname{Cov}(Y,Y) = 0$. But $(X+Y)$ and $(X-Y)$ are not independent because $P(X+Y=0,X-Y=1) = 0 \neq p(1-p)^{3} = P(X+Y=0)P(X-Y=1)$.
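Both examples can be recomputed exactly from their joint probability tables. The following sketch represents a joint distribution as a dictionary from value pairs to probabilities; the helper names `expectation`, `covariance` and `is_independent` are hypothetical, and the value $p = 1/3$ in the Bernoulli example is an arbitrary assumption.

```python
from fractions import Fraction
from itertools import product


def expectation(pmf, g):
    """E[g(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def covariance(pmf):
    """Cov(X, Y) computed directly from the definition."""
    ex = expectation(pmf, lambda x, y: x)
    ey = expectation(pmf, lambda x, y: y)
    return expectation(pmf, lambda x, y: (x - ex) * (y - ey))


def is_independent(pmf):
    """True if every joint probability equals the product of the marginals."""
    px, py = {}, {}
    for (x, y), p in pmf.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return all(pmf.get((x, y), 0) == px[x] * py[y] for x, y in product(px, py))


# First example: three points carrying all the probability mass.
pmf1 = {(0, 1): Fraction(1, 2), (2, 0): Fraction(1, 4), (2, 2): Fraction(1, 4)}

# Second example: X, Y independent Bernoulli(p); joint law of (X + Y, X - Y).
p = Fraction(1, 3)  # assumed value, any p strictly between 0 and 1 works
pmf2 = {}
for x, y in product((0, 1), repeat=2):
    weight = (p if x else 1 - p) * (p if y else 1 - p)
    key = (x + y, x - y)
    pmf2[key] = pmf2.get(key, 0) + weight

print(covariance(pmf1), is_independent(pmf1))
print(covariance(pmf2), is_independent(pmf2))
```

Using `Fraction` keeps all probabilities exact, so the comparison of joint and marginal probabilities is not affected by rounding.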