# Gauss-Markov theorem

In statistics, the Gauss-Markov theorem (in German-language literature also written Gauss-Markow, after the German transliteration of Markov) or Gauss's theorem is a mathematical theorem about the class of linear unbiased estimators. It provides a theoretical justification of the method of least squares and is named after the mathematicians Carl Friedrich Gauß and Andrei Andreyevich Markov. It has recently been suggested that the theorem should simply be called Gauss's theorem, since the attribution to Markov is based on an error (see History). The theorem states that in a linear regression model in which the disturbance variables have expected value zero and constant variance and are uncorrelated (assumptions of the classical linear regression model), the least squares estimator - provided it exists - is the best linear unbiased estimator (BLUE). Here "best" means that, within the class of linear unbiased estimators, it has the "smallest" covariance matrix and is therefore of minimal variance. The disturbance variables do not necessarily have to be normally distributed. In the case of generalized least squares estimation, they do not have to be independent and identically distributed either.

## History

The theorem was proved in 1821 by Carl Friedrich Gauß. Versions of his proof were published by Helmert (1872), Czuber (1891), and Markov (1912), among others. Jerzy Neyman, who did not know Gauss's work, named the theorem after Markov, among others. Since then the theorem has been known as the Gauss-Markov theorem. Since the current name is thus primarily based on Neyman's ignorance of Gauss's proof, it has recently been suggested - especially in English-language literature - to name the theorem after Gauss alone, for example Gauss's theorem. Historical information on the Gauss-Markov theorem can be found in Seal (1967), Plackett (1972), Stigler (1986) and in A History of Mathematical Statistics from 1750 to 1930 by Hald (1998).

## Formulation of the theorem

In words, the theorem reads: the least squares estimator is the best linear unbiased estimator if the random disturbance variables (the following formulas refer to simple linear regression):

• are uncorrelated:
${\displaystyle \operatorname{Cov}(\varepsilon_{i},\varepsilon_{j})=\mathbb{E}[(\varepsilon_{i}-\mathbb{E}(\varepsilon_{i}))(\varepsilon_{j}-\mathbb{E}(\varepsilon_{j}))]=\mathbb{E}(\varepsilon_{i}\varepsilon_{j})=0\quad \forall i\neq j,\;i=1,\dotsc ,n,\;j=1,\dotsc ,n}$.
Independent random variables are always also uncorrelated. In this context, one speaks of the absence of autocorrelation.
• have mean zero, ${\displaystyle \operatorname{E}(\varepsilon_{i})=0,\quad i=1,\ldots ,n}$: if the model contains an intercept different from zero, it is reasonable at least to demand that the mean of the disturbance variables ${\displaystyle \varepsilon_{i}}$ in the population is zero, i.e. that the fluctuations of the individual disturbance variables balance out over the entirety of the observations. This assumption makes no statement about the relationship between ${\displaystyle x}$ and ${\displaystyle \varepsilon}$, but only about the distribution of the unsystematic component in the population. It means that the model under consideration corresponds on average to the true relationship. If the expected value were not zero, one would on average estimate a wrong relationship. This assumption can be violated if a relevant variable is omitted from the regression model (see omitted-variable bias).
• have a finite, constant variance (homoscedasticity): ${\displaystyle \forall i\colon \operatorname{Var}(\varepsilon_{i})=\operatorname{Var}(Y_{i})=\sigma^{2}=\mathrm{const.}<\infty}$
If the variance of the disturbance variables (and thus the variance of the explained variable itself) is the same for all values of the regressors, homoscedasticity (homogeneity of variance) is present.

All of the above assumptions about the disturbance variables can be summarized as follows:

${\displaystyle \forall i\colon \varepsilon_{i}\sim (0,\sigma^{2})}$,

that is, all disturbance variables follow the distribution ${\displaystyle \varepsilon_{i}\sim (0,\sigma^{2})}$ with expected value ${\displaystyle \mathbb{E}(\varepsilon_{i})=0}$ and variance ${\displaystyle \operatorname{Var}(\varepsilon_{i})=\sigma^{2}}$. The distribution itself is not specified in detail at this point.

These assumptions are also known as the Gauss-Markov assumptions. In econometrics, the Gauss-Markov theorem is often stated differently, and further assumptions are made.
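These assumptions can be illustrated with a small simulation (a sketch using numpy; the model and parameter values are invented for the example). The disturbance variables below are uniformly distributed rather than normal, since the theorem does not require normality:

```python
import numpy as np

# Simple linear regression y = beta0 + beta1*x + eps under the
# Gauss-Markov assumptions: E(eps) = 0, Var(eps) = sigma^2 constant,
# draws uncorrelated.  The disturbances are deliberately NOT normally
# distributed (centered uniform): the theorem does not require normality.
rng = np.random.default_rng(0)
n = 10_000
beta0, beta1, sigma = 2.0, 0.5, 1.0          # invented example values
x = rng.uniform(0.0, 10.0, size=n)
# Uniform on (-sqrt(3)*sigma, sqrt(3)*sigma) has mean 0 and variance sigma^2.
eps = rng.uniform(-np.sqrt(3.0) * sigma, np.sqrt(3.0) * sigma, size=n)
y = beta0 + beta1 * x + eps

# Least squares estimator b = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # close to the true values (2.0, 0.5)
```

Despite the non-normal disturbances, the least squares estimates land close to the true coefficients, as the theorem promises for any distribution satisfying the assumptions.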

## General formulation of the Gauss-Markov theorem (regular case)

As a starting point, we consider a typical multiple linear regression model with given data ${\displaystyle \{y_{i},x_{ij}\}_{i=1,\dots ,n;\,j=1,\dots ,k}}$ for ${\displaystyle n}$ statistical units and ${\displaystyle k}$ regressors. The relationship between the dependent variable and the independent variables can be written as

${\displaystyle y_{i}=\beta_{0}+x_{i1}\beta_{1}+x_{i2}\beta_{2}+\ldots +x_{ik}\beta_{k}+\varepsilon_{i}=\mathbf{x}_{i}^{\top}{\boldsymbol{\beta}}+\varepsilon_{i},\quad i=1,2,\dotsc ,n}$.

In matrix notation,

${\displaystyle {\begin{pmatrix}y_{1}\\y_{2}\\\vdots \\y_{n}\end{pmatrix}}_{(n\times 1)}={\begin{pmatrix}1&x_{11}&x_{12}&\cdots &x_{1k}\\1&x_{21}&x_{22}&\cdots &x_{2k}\\\vdots &\vdots &\vdots &\ddots &\vdots \\1&x_{n1}&x_{n2}&\cdots &x_{nk}\end{pmatrix}}_{(n\times p)}\cdot {\begin{pmatrix}\beta_{0}\\\beta_{1}\\\vdots \\\beta_{k}\end{pmatrix}}_{(p\times 1)}+{\begin{pmatrix}\varepsilon_{1}\\\varepsilon_{2}\\\vdots \\\varepsilon_{n}\end{pmatrix}}_{(n\times 1)}}$

with ${\displaystyle p=k+1}$. In compact notation,

${\displaystyle \mathbf{y}=\mathbf{X}{\boldsymbol{\beta}}+{\boldsymbol{\varepsilon}}}$.

Here ${\displaystyle {\boldsymbol{\beta}}}$ represents a vector of unknown parameters (known as regression coefficients) that must be estimated from the data. Furthermore, the disturbance variables are assumed to be zero on average, ${\displaystyle \mathbb{E}[{\boldsymbol{\varepsilon}}]=\mathbf{0}}$, which means that the model can be assumed to be correct on average. The data matrix ${\displaystyle \mathbf{X}\in \mathbb{R}^{n\times p}}$ is assumed to have full (column) rank, that is, ${\displaystyle \operatorname{Rank}(\mathbf{X})=p}$ holds. In particular, ${\displaystyle \mathbf{X}^{\top}\mathbf{X}}$ is then a regular, i.e. invertible, matrix; this is why one speaks here of the regular case (see heading). Furthermore, the covariance matrix of the vector of disturbance variables is assumed to satisfy ${\displaystyle \operatorname{Cov}({\boldsymbol{\varepsilon}})=\mathbb{E}\left({\boldsymbol{\varepsilon}}{\boldsymbol{\varepsilon}}^{\top}\right)=\sigma^{2}\mathbf{I}_{n}}$. The Gauss-Markov assumptions in the multiple case can therefore be summarized as

${\displaystyle {\boldsymbol{\varepsilon}}\sim (\mathbf{0},\sigma^{2}\mathbf{I}_{n})}$,

where the expected value of the disturbance variables is the zero vector ${\displaystyle \mathbf{0}}$ and the covariance matrix is the expected value of the dyadic product of the disturbance variables,

${\displaystyle \mathbf{\Sigma}=\mathbb{E}({\boldsymbol{\varepsilon}}{\boldsymbol{\varepsilon}}^{\top})={\begin{pmatrix}\sigma^{2}&0&\cdots &0\\0&\sigma^{2}&\ddots &\vdots \\\vdots &\ddots &\ddots &0\\0&\cdots &0&\sigma^{2}\end{pmatrix}}_{(n\times n)}=\sigma^{2}\mathbf{I}_{n}}$.

This assumption is the homoscedasticity assumption in the multiple case. The above specification of the linear model thus yields for the random vector ${\displaystyle \mathbf{y}}$

${\displaystyle \mathbf{y}\sim (\mathbf{X}{\boldsymbol{\beta}},\sigma^{2}\mathbf{I}_{n})}$.

These assumptions give:

1. That the least squares estimator of the true parameter vector ${\displaystyle {\boldsymbol{\beta}}}$, which reads ${\displaystyle \mathbf{b}=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}}$, is a minimum-variance linear unbiased estimator of ${\displaystyle {\boldsymbol{\beta}}}$.
2. That ${\displaystyle \operatorname{Cov}(\mathbf{b})=\sigma^{2}(\mathbf{X}^{\top}\mathbf{X})^{-1}}$ is the covariance matrix of the least squares estimator ${\displaystyle \mathbf{b}}$.
3. That the estimated variance of the disturbance variables ${\displaystyle {\hat{\sigma}}^{2}={\frac{{\hat{\boldsymbol{\varepsilon}}}^{\top}{\hat{\boldsymbol{\varepsilon}}}}{n-p}}}$ is an unbiased estimator of the unknown variance ${\displaystyle \sigma^{2}}$ of the disturbance variables.
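These three statements can be checked numerically with a small Monte Carlo sketch (using numpy; the design matrix and parameter values are invented for illustration):

```python
import numpy as np

# Monte Carlo sketch of the three statements: the least squares estimator
# b is unbiased for beta, Cov(b) = sigma^2 (X'X)^{-1}, and
# sigma_hat^2 = e'e / (n - p) is unbiased for sigma^2.
rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed design
beta = np.array([1.0, -2.0, 0.5])   # invented true parameters
sigma2 = 4.0

bs, s2s = [], []
for _ in range(2000):
    eps = rng.normal(0.0, np.sqrt(sigma2), size=n)
    y = X @ beta + eps
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    bs.append(b)
    s2s.append(resid @ resid / (n - p))

bs = np.array(bs)
print(bs.mean(axis=0))  # close to beta
print(np.mean(s2s))     # close to sigma2
print(np.cov(bs.T))     # close to sigma2 * inv(X'X)
```

Averaged over many replications, the estimates of the coefficients and of the disturbance variance match the true values, and the empirical covariance of the estimates matches the formula in point 2.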

### Minimum-variance linear unbiased estimator

#### Minimum variance

The minimum-variance, or "best", estimator is characterized by the fact that it has the "smallest" covariance matrix (with respect to the Loewner partial order) and is thus of minimal variance. An estimator with this property is therefore also called a minimum-variance or efficient estimator. Under the additional assumption of unbiasedness, one speaks of a minimum-variance unbiased estimator.

Every estimator from the class of linear unbiased estimators can be represented as

${\displaystyle {\overline{\boldsymbol{\beta}}}=\mathbf{A}\mathbf{y}}$ (linearity)

with a ${\displaystyle (p\times n)}$ matrix ${\displaystyle \mathbf{A}}$. An example of an estimator in this class is the least squares estimator ${\displaystyle \mathbf{b}}$.

Unbiasedness means that the estimator "on average" equals the true parameter vector:

${\displaystyle \mathbb{E}({\overline{\boldsymbol{\beta}}})={\boldsymbol{\beta}}}$.

Under the above conditions, the inequality then holds for all ${\displaystyle (p\times 1)}$ vectors ${\displaystyle \mathbf{R}_{1}}$:

${\displaystyle \operatorname{Var}(\mathbf{R}_{1}^{\top}\mathbf{b})\;\leq \;\operatorname{Var}(\mathbf{R}_{1}^{\top}{\overline{\boldsymbol{\beta}}})}$ (efficiency property),

where ${\displaystyle \mathbf{b}}$ is the least squares estimator, i.e. the estimator determined by least squares estimation. This efficiency property can also be rewritten as

${\displaystyle \mathbf{R}_{1}^{\top}\operatorname{Cov}[\mathbf{b}]\mathbf{R}_{1}\;\leq \;\mathbf{R}_{1}^{\top}\operatorname{Cov}[{\overline{\boldsymbol{\beta}}}]\mathbf{R}_{1}}$

or

${\displaystyle \mathbf{R}_{1}^{\top}\left(\operatorname{Cov}[{\overline{\boldsymbol{\beta}}}]-\operatorname{Cov}[\mathbf{b}]\right)\mathbf{R}_{1}\;\geq \;0}$.

This property is called positive semidefiniteness (see also covariance matrix as an efficiency criterion). So if the above inequality holds, ${\displaystyle \mathbf{b}}$ can be said to be better than ${\displaystyle {\overline{\boldsymbol{\beta}}}}$.

#### Linearity

The least squares estimator ${\displaystyle \mathbf{b}}$ is, moreover, linear:

${\displaystyle \mathbf{b}=\underbrace{(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}}_{:=\mathbf{A}}\mathbf{y}=\mathbf{A}\mathbf{y}}$.

The above inequality states that, according to the Gauss-Markov theorem, ${\displaystyle \mathbf{b}}$ is a best linear unbiased estimator (BLUE), or a minimum-variance linear unbiased estimator; that is, within the class of linear unbiased estimators, ${\displaystyle \mathbf{b}}$ is the one with the smallest variance or covariance matrix. No information about the distribution of the disturbance variables is required for this property of the estimator. A strengthening of the BLUE property is the so-called BUE property (best unbiased estimator), which is not restricted to linear estimators. The maximum likelihood estimator often turns out to be BUE. In fact, for normally distributed disturbance variables the least squares estimator is a maximum likelihood estimator, and the BUE property can then be proven with the Lehmann-Scheffé theorem.

## Proof

Given that the true relationship is described by a linear model, the least squares estimator ${\displaystyle \mathbf{b}}$ must be compared with all other linear estimators. To make this comparison possible, the analysis is restricted to the class of linear unbiased estimators. Any estimator in this class, besides the least squares estimator, can be represented as

${\displaystyle {\overline{\boldsymbol{\beta}}}=\mathbf{A}\mathbf{y}}$ with ${\displaystyle \mathbf{A}\neq \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}}$.

If ${\displaystyle \mathbf{A}=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}}$, the least squares estimator ${\displaystyle \mathbf{b}=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}}$ is obtained. The class of all linear estimators is thus given by

${\displaystyle {\overline{\boldsymbol{\beta}}}=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}+\mathbf{A}\mathbf{y}-\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}+\underbrace{(\mathbf{A}-\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top})}_{=\mathbf{C}}\mathbf{y}=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}+\mathbf{C}\mathbf{y}}$, where the matrix ${\displaystyle \mathbf{C}}$ is given by ${\displaystyle \mathbf{C}=\mathbf{A}-\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}}$.

It now remains to find restrictions on ${\displaystyle \mathbf{C}}$ that ensure that ${\displaystyle {\overline{\boldsymbol{\beta}}}}$ is unbiased for ${\displaystyle {\boldsymbol{\beta}}}$. In addition, the covariance matrix of ${\displaystyle {\overline{\boldsymbol{\beta}}}}$ must be determined. The expected value of ${\displaystyle {\overline{\boldsymbol{\beta}}}}$ is

${\displaystyle {\begin{aligned}\mathbb{E}({\overline{\boldsymbol{\beta}}})&=\mathbb{E}\left(\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}(\mathbf{X}{\boldsymbol{\beta}}+{\boldsymbol{\varepsilon}})+\mathbf{C}(\mathbf{X}{\boldsymbol{\beta}}+{\boldsymbol{\varepsilon}})\right)\\&=\mathbb{E}\left(\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{X}{\boldsymbol{\beta}}+\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}{\boldsymbol{\varepsilon}}+\mathbf{C}\mathbf{X}{\boldsymbol{\beta}}+\mathbf{C}{\boldsymbol{\varepsilon}}\right)\\&={\boldsymbol{\beta}}+\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\underbrace{\mathbb{E}({\boldsymbol{\varepsilon}})}_{=\mathbf{0}}+\mathbf{C}\mathbf{X}{\boldsymbol{\beta}}+\mathbf{C}\underbrace{\mathbb{E}({\boldsymbol{\varepsilon}})}_{=\mathbf{0}}\\&={\boldsymbol{\beta}}+\mathbf{C}\mathbf{X}{\boldsymbol{\beta}}\end{aligned}}}$

That is, ${\displaystyle {\overline{\boldsymbol{\beta}}}}$ is unbiased for ${\displaystyle {\boldsymbol{\beta}}}$ if and only if ${\displaystyle \mathbf{C}\mathbf{X}=\mathbf{0}}$ holds, i.e. ${\displaystyle \mathbb{E}({\overline{\boldsymbol{\beta}}})={\boldsymbol{\beta}}\Longleftrightarrow \mathbf{C}\mathbf{X}=\mathbf{0}}$.

For the covariance matrix of ${\displaystyle {\overline{\boldsymbol{\beta}}}}$ it follows that:

${\displaystyle {\begin{aligned}\mathbf{\Sigma}_{\overline{\boldsymbol{\beta}}}=\operatorname{Cov}({\overline{\boldsymbol{\beta}}})&=\mathbb{E}\left\{\left[{\overline{\boldsymbol{\beta}}}-\mathbb{E}({\overline{\boldsymbol{\beta}}})\right]\left[{\overline{\boldsymbol{\beta}}}-\mathbb{E}({\overline{\boldsymbol{\beta}}})\right]^{\top}\right\}\\&=\mathbb{E}\left\{\left[\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}{\boldsymbol{\varepsilon}}+\mathbf{C}{\boldsymbol{\varepsilon}}\right]\left[\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}{\boldsymbol{\varepsilon}}+\mathbf{C}{\boldsymbol{\varepsilon}}\right]^{\top}\right\}\\&=\mathbb{E}\left[\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}{\boldsymbol{\varepsilon}}{\boldsymbol{\varepsilon}}^{\top}\mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}+\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}{\boldsymbol{\varepsilon}}{\boldsymbol{\varepsilon}}^{\top}\mathbf{C}^{\top}+\mathbf{C}{\boldsymbol{\varepsilon}}{\boldsymbol{\varepsilon}}^{\top}\mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}+\mathbf{C}{\boldsymbol{\varepsilon}}{\boldsymbol{\varepsilon}}^{\top}\mathbf{C}^{\top}\right]\\&=\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\sigma^{2}\mathbf{I}_{n}\mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}+\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\sigma^{2}\mathbf{I}_{n}\mathbf{C}^{\top}+\mathbf{C}\sigma^{2}\mathbf{I}_{n}\mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}+\mathbf{C}\sigma^{2}\mathbf{I}_{n}\mathbf{C}^{\top}\\&=\sigma^{2}\left[\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}+\mathbf{C}\mathbf{C}^{\top}\right]\end{aligned}}}$

Here the mixed terms vanish in the last step because the unbiasedness condition ${\displaystyle \mathbf{C}\mathbf{X}=\mathbf{0}}$ implies ${\displaystyle \mathbf{X}^{\top}\mathbf{C}^{\top}=(\mathbf{C}\mathbf{X})^{\top}=\mathbf{0}}$.

It follows

${\displaystyle \mathbf{\Sigma}_{\overline{\boldsymbol{\beta}}}-\mathbf{\Sigma}_{\mathbf{b}}=\sigma^{2}\mathbf{C}\mathbf{C}^{\top}}$.

This matrix is always positive semidefinite - regardless of how ${\displaystyle \mathbf{C}}$ is chosen - since a matrix multiplied by its own transpose is always positive semidefinite.
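This conclusion can be illustrated numerically (a sketch with numpy; the dimensions and the construction of C via a projection matrix are chosen purely for illustration): any C with C X = 0 yields a positive semidefinite variance difference.

```python
import numpy as np

# For any C with C X = 0, the variance difference
# Cov(beta_bar) - Cov(b) = sigma^2 * C C' is positive semidefinite.
# C is built here from a random M via (I - P), where P projects onto
# the column space of X; this guarantees C X = M (X - X) = 0.
rng = np.random.default_rng(2)
n, p = 20, 4
X = rng.normal(size=(n, p))
P = X @ np.linalg.solve(X.T @ X, X.T)   # projection onto col(X)
M = rng.normal(size=(p, n))
C = M @ (np.eye(n) - P)                 # hence C X = 0

sigma2 = 1.0
diff = sigma2 * (C @ C.T)               # Cov(beta_bar) - Cov(b)
eigs = np.linalg.eigvalsh(diff)
print(np.allclose(C @ X, 0.0))          # True
print(eigs.min() >= -1e-8)              # True: positive semidefinite
```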

## Singular case, estimable functions

We now consider the so-called singular case, i.e. ${\displaystyle \operatorname{Rank}(\mathbf{X})<p}$. Then ${\displaystyle \mathbf{X}^{\top}\mathbf{X}}$ is also not of full rank, i.e. not invertible. The least squares estimator ${\displaystyle \mathbf{b}}$ given above does not exist; one says that ${\displaystyle {\boldsymbol{\beta}}}$ cannot be estimated or identified.

The singular case occurs if ${\displaystyle n<p}$, or if only ${\displaystyle q<p}$ different regressor settings are observed, or if there are linear dependencies in the data matrix ${\displaystyle \mathbf{X}}$.

Now let ${\displaystyle \operatorname{Rank}(\mathbf{X})=m<p}$. Then at best ${\displaystyle m}$-dimensional linear forms ${\displaystyle {\boldsymbol{\gamma}}=\mathbf{A}{\boldsymbol{\beta}}}$ can be estimated linearly and unbiasedly, where ${\displaystyle \mathbf{A}}$ is an ${\displaystyle (m\times p)}$ matrix.

### Estimability criterion

${\displaystyle {\boldsymbol{\gamma}}=\mathbf{A}{\boldsymbol{\beta}}}$ with an ${\displaystyle (s\times p)}$ matrix ${\displaystyle \mathbf{A}}$, ${\displaystyle s\leq m}$, is estimable if and only if there is an ${\displaystyle (s\times n)}$ matrix ${\displaystyle \mathbf{C}}$ such that ${\displaystyle \mathbf{C}\mathbf{X}=\mathbf{A}}$ holds, i.e. if each row vector of ${\displaystyle \mathbf{A}}$ is a linear combination of the row vectors of ${\displaystyle \mathbf{X}}$.

The estimability criterion can be formulated much more elegantly with a pseudoinverse. ${\displaystyle \mathbf{B}^{-}}$ is called a pseudoinverse of ${\displaystyle \mathbf{B}}$ if ${\displaystyle \mathbf{B}\mathbf{B}^{-}\mathbf{B}=\mathbf{B}}$ holds.

${\displaystyle {\boldsymbol{\gamma}}=\mathbf{A}{\boldsymbol{\beta}}}$ with an ${\displaystyle (s\times p)}$ matrix ${\displaystyle \mathbf{A}}$, ${\displaystyle s\leq m}$, is estimable if and only if ${\displaystyle \mathbf{A}(\mathbf{X}^{\top}\mathbf{X})^{-}\mathbf{X}^{\top}\mathbf{X}=\mathbf{A}}$ holds, where ${\displaystyle (\mathbf{X}^{\top}\mathbf{X})^{-}}$ is an arbitrary pseudoinverse of ${\displaystyle \mathbf{X}^{\top}\mathbf{X}}$.

#### Example

For the quadratic regression equation ${\displaystyle y=\beta_{0}+\beta_{1}x+\beta_{2}x^{2}+\varepsilon}$, ${\displaystyle n=4}$ observations were made at ${\displaystyle x_{1}=0,\ x_{2}=0,\ x_{3}=1,\ x_{4}=1}$. This results in

${\displaystyle \mathbf{X}={\begin{pmatrix}1&0&0\\1&0&0\\1&1&1\\1&1&1\end{pmatrix}},\quad \operatorname{Rank}(\mathbf{X})=2}$.

Then

${\displaystyle {\boldsymbol{\gamma}}={\begin{pmatrix}\beta_{0}\\\beta_{1}+\beta_{2}\end{pmatrix}}=\mathbf{A}{\boldsymbol{\beta}},\quad \mathbf{A}={\begin{pmatrix}1&0&0\\0&1&1\end{pmatrix}},\quad {\boldsymbol{\beta}}={\begin{pmatrix}\beta_{0}\\\beta_{1}\\\beta_{2}\end{pmatrix}}}$

estimable, because the row vectors of ${\displaystyle \mathbf{A}}$ are linear combinations of the row vectors of ${\displaystyle \mathbf{X}}$. For example, the second row vector of ${\displaystyle \mathbf{A}}$ equals the difference between the third and first row vectors of ${\displaystyle \mathbf{X}}$.

On the other hand,

${\displaystyle {\boldsymbol{\gamma}}={\begin{pmatrix}\beta_{0}+\beta_{1}\\\beta_{2}\end{pmatrix}}=\mathbf{A}{\boldsymbol{\beta}},\quad \mathbf{A}={\begin{pmatrix}1&1&0\\0&0&1\end{pmatrix}}}$

is not estimable, because none of the row vectors of ${\displaystyle \mathbf{A}}$ can be represented as a linear combination of the row vectors of ${\displaystyle \mathbf{X}}$.
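Both cases can be verified with the pseudoinverse criterion from above (a numpy sketch; `np.linalg.pinv` supplies one possible pseudoinverse, and any pseudoinverse would do):

```python
import numpy as np

# Estimability check A (X'X)^- X'X = A for the example's design matrix.
X = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 1],
              [1, 1, 1]], dtype=float)
XtX = X.T @ X
P = np.linalg.pinv(XtX) @ XtX   # projector onto the row space of X

A1 = np.array([[1, 0, 0],       # gamma = (beta0, beta1 + beta2)
               [0, 1, 1]], dtype=float)
A2 = np.array([[1, 1, 0],       # gamma = (beta0 + beta1, beta2)
               [0, 0, 1]], dtype=float)

print(np.allclose(A1 @ P, A1))  # True: estimable
print(np.allclose(A2 @ P, A2))  # False: not estimable
```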

### Gauss-Markov theorem in the singular case

Let ${\displaystyle {\boldsymbol{\gamma}}=\mathbf{A}{\boldsymbol{\beta}}}$ be estimable. Then

${\displaystyle {\boldsymbol{g}}=\mathbf{A}(\mathbf{X}^{\top}\mathbf{X})^{-}\mathbf{X}^{\top}\mathbf{y}}$

is a best linear unbiased estimator for ${\displaystyle {\boldsymbol{\gamma}}}$, where ${\displaystyle (\mathbf{X}^{\top}\mathbf{X})^{-}}$ is an arbitrary pseudoinverse of ${\displaystyle \mathbf{X}^{\top}\mathbf{X}}$.

The estimator ${\displaystyle {\boldsymbol{g}}}$ can also be expressed without a pseudoinverse:

${\displaystyle {\boldsymbol{g}}=\mathbf{A}{\boldsymbol{b}}}$

Here ${\displaystyle {\boldsymbol{b}}}$ is an arbitrary solution of the normal equations ${\displaystyle \mathbf{X}^{\top}\mathbf{X}{\boldsymbol{b}}=\mathbf{X}^{\top}{\boldsymbol{y}}}$.
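A small sketch (using numpy; the design matrix is taken from the example above, the response vector is invented) illustrates that for an estimable linear form the value of A b does not depend on which solution of the normal equations is chosen:

```python
import numpy as np

# For estimable gamma = A beta, g = A b is the same for every solution b
# of the normal equations X'X b = X'y.
X = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 1],
              [1, 1, 1]], dtype=float)
y = np.array([1.0, 3.0, 4.0, 6.0])
A = np.array([[1, 0, 0],
              [0, 1, 1]], dtype=float)       # estimable linear forms

b1 = np.linalg.pinv(X.T @ X) @ X.T @ y       # minimum-norm solution
b2 = b1 + 7.0 * np.array([0.0, 1.0, -1.0])   # (0, 1, -1) spans the null space of X

assert np.allclose(X.T @ X @ b1, X.T @ y)    # both solve the normal equations
assert np.allclose(X.T @ X @ b2, X.T @ y)
print(A @ b1, A @ b2)                        # identical: [2. 3.] [2. 3.]
```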

## Generalized Least Squares Estimation

The generalized least squares (GLS) estimation developed by Aitken extends the Gauss-Markov theorem to the case in which the vector of disturbance variables has a non-scalar covariance matrix, i.e. ${\displaystyle \mathbf{\Sigma}\neq \sigma^{2}\mathbf{I}_{n}}$ holds. The GLS estimator is also BLUE.
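A minimal sketch of the GLS (Aitken) estimator (using numpy; the heteroscedastic variance structure and parameter values are invented for illustration):

```python
import numpy as np

# Generalized least squares sketch: with a known non-scalar covariance
# Sigma (here diagonal, i.e. heteroscedastic), the Aitken estimator is
# b_GLS = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y.
rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0.0, 5.0, size=n)])
beta = np.array([1.0, 2.0])          # invented true parameters
var_i = 0.5 + X[:, 1] ** 2           # disturbance variance grows with the regressor
y = X @ beta + rng.normal(0.0, np.sqrt(var_i))

W = np.diag(1.0 / var_i)             # Sigma^{-1} for diagonal Sigma
b_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(b_gls)  # close to (1.0, 2.0)
```

For a diagonal Sigma this reduces to weighted least squares, with each observation weighted by the inverse of its disturbance variance.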

## Literature

• George G. Judge, R. Carter Hill, W. Griffiths, Helmut Lütkepohl, T. C. Lee: Introduction to the Theory and Practice of Econometrics. 2nd edition. John Wiley & Sons, New York 1988, ISBN 978-0471624141.

## References

1. Ulrich Kockelkorn: Linear statistical methods. De Gruyter, 2018, ISBN 978-3-486-78782-5, p. 329 (accessed via De Gruyter Online).
2. Ludwig von Auer: Econometrics. An introduction. 6th revised and updated edition. Springer, 2013, ISBN 978-3-642-40209-8, p. 49.
3. Jeffrey Marc Wooldridge: Introductory econometrics: A modern approach. 5th edition. Nelson Education, 2015, p. 24.
4. George G. Judge, R. Carter Hill, W. Griffiths, Helmut Lütkepohl, T. C. Lee: Introduction to the Theory and Practice of Econometrics. 2nd edition. John Wiley & Sons, New York 1988, ISBN 978-0471624141, p. 202.
5. George G. Judge, R. Carter Hill, W. Griffiths, Helmut Lütkepohl, T. C. Lee: Introduction to the Theory and Practice of Econometrics. 2nd edition. John Wiley & Sons, New York 1988, ISBN 978-0471624141, p. 203.
6. International Statistical Institute: Glossary of statistical terms.
7. George G. Judge, R. Carter Hill, W. Griffiths, Helmut Lütkepohl, T. C. Lee: Introduction to the Theory and Practice of Econometrics. 2nd edition. John Wiley & Sons, New York 1988, ISBN 978-0471624141, p. 205.
8. C. R. Rao, H. Toutenburg, Shalabh, C. Heumann: Linear Models and Generalizations. 3rd edition. Springer-Verlag, 2008.
9. F. Pukelsheim: Optimal Design of Experiments. Wiley, New York 1993.
10. A. C. Aitken: On Least Squares and Linear Combinations of Observations. In: Proceedings of the Royal Society of Edinburgh. 55, 1935, pp. 42-48.
11. David S. Huang: Regression and Econometric Methods. John Wiley & Sons, New York 1970, ISBN 0-471-41754-8, pp. 127-147.