Generalized Least Squares Estimation


In statistics, the generalized least squares estimation (VKQ estimation for short, from German verallgemeinerte Kleinste-Quadrate-Schätzung), also called the generalized method of least squares (VMKQ for short; English: generalized least squares, GLS), is a procedure for efficiently estimating the unknown regression parameters of a linear regression equation under problematic conditions (presence of autocorrelation and heteroscedasticity). The VKQ method makes it possible to apply linear regression to a model with a general disturbance-variable structure. A general disturbance-variable structure is present when a certain degree of correlation between the disturbance terms and a non-constant disturbance variance are permitted. In these cases the ordinary least squares estimation and the weighted least squares estimation can be statistically inefficient or even lead to invalid results of statistical inference. To obtain valid inferential results, the classical linear model is therefore transformed in such a way that the assumptions required for statistical inference are again satisfied. In contrast to the ordinary least squares method, the VKQ method minimizes a weighted residual sum of squares. The method was developed by Alexander Aitken and published in 1935, and is therefore also called the Aitken estimator.

History

On New Year's Day 1801, the Italian astronomer Giuseppe Piazzi discovered the dwarf planet Ceres. He was able to follow its path for 40 days before Ceres disappeared behind the sun. Over the course of the year, many scientists tried unsuccessfully to calculate the orbit from Piazzi's observations, assuming a circular orbit, because at that time the orbital elements could be determined mathematically from observed sky positions only for such an orbit. The 24-year-old Gauss, however, was able to compute elliptical orbits from three individual observations. Since considerably more orbital points were available, he used his least squares method to increase the accuracy. When Franz Xaver von Zach and Heinrich Wilhelm Olbers found the minor planet in December 1801 exactly at the position predicted by Gauss, this was not only a great success for Gauss's method: the reputation of Piazzi, which had suffered greatly because his orbital points would not fit a circular orbit, was also restored.

Gauss had already laid the foundations of the least squares method in 1795 at the age of 18. It built on an idea of Pierre-Simon Laplace to sum the absolute values of the errors in such a way that the errors add up to zero. Gauss instead took the squares of the errors and could thereby drop the zero-sum requirement. Independently of this, the Frenchman Adrien-Marie Legendre developed the same method, publishing it first in 1805 at the end of a small work on the calculation of comet orbits and in a second treatise on the subject in 1810. The name méthode des moindres carrés (method of least squares) comes from him.

In 1809, Gauss published his method, including the normal equations and the Gaussian elimination method, in the second volume of his work on celestial mechanics, Theoria motus corporum coelestium in sectionibus conicis solem ambientium (Theory of the motion of the heavenly bodies moving about the sun in conic sections). He mentioned there that he had discovered and used the method before Legendre, which led to a priority dispute between the two. The least squares method quickly became the standard procedure for handling astronomical and geodetic data sets.

Gauss then used the method intensively in his triangulation survey of the Kingdom of Hanover. The two-part work Theoria combinationis observationum erroribus minimis obnoxiae (Theory of the combination of observations subject to the smallest errors) appeared in 1821 and 1823, followed by a supplement in 1826, in which Gauss justified why his method was so successful in comparison with others: the least squares method is optimal in a broad sense, better than other methods. The precise statement is known as the Gauss–Markov theorem, since Gauss's work initially received little attention and was finally rediscovered and popularized in the 20th century by Andrei Andreyevich Markov. The Theoria combinationis also contains significant advances in the efficient solution of the linear systems of equations that arise, such as the Gauss–Seidel method and the LU decomposition.

Finally, in 1935, Alexander Aitken published a paper in which he introduced the concept of generalized least squares together with the now widely used generalized least squares estimator. He also proved there that this estimator is the best linear unbiased estimator (BLUE), i.e., the estimator with the smallest covariance matrix in the class of linear unbiased estimators. Aitken also applied statistical methods to the theory of linear models and developed much of what is now the standard vector–matrix notation. Together with one of his students, Harold Silverstone, he published a paper introducing the lower bound for the variance of an estimator, now known as the Cramér–Rao inequality. In contrast to his predecessors, he found an efficient way to deal with the problem of non-constant variance and correlated disturbance terms. Generalized least squares estimation is based on the Gauss–Markov theory and still plays a large role in theoretical and practical aspects of statistical inference in generalized linear (multiple) regression models.

Starting position

Since many variables of interest depend on more than one independent variable, we consider a dependent variable that is to be explained by several independent variables. For example, the total production of an economy depends on its capital input, labor input, and land. Such a multiple dependence comes much closer to reality, and one abandons the assumption of simple linear regression, in which the variable of interest depends on only one variable. To model such a multiple dependence, we consider as a starting point a typical multiple linear regression model with given data for $T$ statistical units. Note that in addition to the dimension of the independent variables we also include a time dimension, which yields a linear system of equations that can also be represented in matrix form. The relationship between the dependent variable and the independent variables can be written as

$$y_t = \beta_1 x_{t1} + \beta_2 x_{t2} + \dotsb + \beta_k x_{tk} + \varepsilon_t, \qquad t = 1, \dotsc, T,$$

in vector–matrix form as

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{T1} & x_{T2} & \cdots & x_{Tk} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{pmatrix},$$

or in compact notation

$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$

Here $\boldsymbol{\beta}$ represents the vector of unknown regression parameters that must be estimated from the data. Furthermore, it is assumed that the disturbance variables are zero on average, $\operatorname{E}(\boldsymbol{\varepsilon}) = \mathbf{0}$, which means that we can assume our model to be correct on average. Usually such a model requires that the Gauss–Markov assumptions hold. This is deliberately not required here, since problematic conditions are explicitly allowed. For this reason, a model is considered in which a general disturbance-variable structure is permissible.

The generalized linear regression model (VLR)

Furthermore, it is assumed for the model that the expected value of $\mathbf{y}$ is linear in $\mathbf{X}$, i.e., $\operatorname{E}(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$. The covariance matrix of the disturbance variables is $\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Psi}$, where $\boldsymbol{\Psi}$ is assumed to be a known, real, nonsingular, positive definite matrix and $\sigma^2$ is a still unknown scalar. The peculiarity in contrast to the ordinary least squares method is that heteroscedasticity (i.e., the variance of the disturbance terms is not constant across the explanatory variables) and autocorrelation (i.e., a correlation of the disturbance terms) are allowed:

  1. The variance of the disturbance variables could be heteroscedastic: $\operatorname{Var}(\varepsilon_t) = \sigma_t^2$.
    If the variance of the disturbances (and thus the variance of the explained variables themselves) is the same for all values of the regressors, homoscedasticity ((residual) variance homogeneity) is present. If this assumption is violated, one speaks of heteroscedasticity.
  2. The disturbance variables could be random variables that are not independent of one another, i.e., autocorrelated:
    $\operatorname{Cov}(\varepsilon_t, \varepsilon_s) \neq 0$ for some $t \neq s$.
    That is, the assumption of the absence of autocorrelation could be violated.

There are different specifications for the matrix $\boldsymbol{\Psi}$ depending on the context. In the presence of pure heteroscedasticity it takes the diagonal form

$$\sigma^2 \boldsymbol{\Psi} = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_T^2 \end{pmatrix},$$

and in the presence of autocorrelation (here first-order autocorrelation with coefficient $\rho$) the form

$$\boldsymbol{\Psi} = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\ \rho & 1 & \rho & \cdots & \rho^{T-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1 \end{pmatrix}.$$

A model of the form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\operatorname{E}(\boldsymbol{\varepsilon}) = \mathbf{0}$ and $\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Psi}$, where $\boldsymbol{\Psi}$ is known, is called a generalized (multiple) linear regression model (with fixed regressors), or VLR for short. Note that a constant factor $\sigma^2$ can always be extracted from the covariance matrix. The scalar $\sigma^2$ represents an arbitrary constant proportionality factor. Sometimes it is useful, especially in the case of heteroscedasticity, to assume that $\sigma^2 = 1$; this assumption is equivalent to saying that the covariance matrix is completely known. If one writes $\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Psi}$, where $\boldsymbol{\Psi}$ is known and $\sigma^2$ is unknown, one is saying that the covariance matrix need not be fully known; it suffices to know $\boldsymbol{\Psi}$ (the matrix obtained after extracting the unknown scaling parameter $\sigma^2$). The generalized linear regression model with heteroscedastic disturbance covariance matrix can be reduced to the usual multiple regression model with homoscedastic disturbance covariance matrix by a suitable transformation.

Effects of using the ordinary least squares (KQ) method

Effects on properties of the point estimators

As a first naive approach, suppose the least squares estimator $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$, obtained by minimizing the residual sum of squares, were a useful candidate point estimator for a model with a general disturbance-variable structure; the residual vector is then given by $\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$. It turns out that this point estimator remains unbiased in a model with a general disturbance-variable structure, but it is no longer efficient. In this naive approach the covariance matrix is no longer $\sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$, but is given by

$$\operatorname{Cov}\bigl(\hat{\boldsymbol{\beta}}\bigr) = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Psi} \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1}.$$

This results mainly from the fact that a non-constant disturbance variance (heteroscedasticity) is permitted. If one assumes that there is no heteroscedasticity and no autocorrelation ($\boldsymbol{\Psi} = \mathbf{I}$), the covariance matrix of the ordinary least squares method, $\sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$, results again.

If a non-scalar covariance matrix is present, the least squares estimator thus remains unbiased, but unbiasedness no longer holds for the usual estimator of the disturbance variance. For the conventional variance estimator $\hat{\sigma}^2 = \hat{\boldsymbol{\varepsilon}}^\top \hat{\boldsymbol{\varepsilon}} / (T - k)$ one obtains

$$\operatorname{E}\bigl(\hat{\sigma}^2\bigr) = \frac{\sigma^2}{T - k} \operatorname{tr}\!\bigl(\boldsymbol{\Psi} (\mathbf{I} - \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top)\bigr) \neq \sigma^2.$$

It is therefore a biased estimator of the true variance $\sigma^2$.

Effects on hypothesis testing

An important effect arises for interval estimation and hypothesis-testing procedures. The results of statistical inference are no longer valid, since the result presented above for the covariance matrix of $\hat{\boldsymbol{\beta}}$ implies that $\hat{\sigma}^2 (\mathbf{X}^\top \mathbf{X})^{-1}$ is wrongly used to estimate it. Since this is a biased estimator, it leads to invalid inferential results. A further consequence for inference is that the usual test statistic for general linear hypotheses is no longer F-distributed. For this reason, interval estimation should be based on the generalized least squares estimator, or robust standard errors à la Eicker–Huber–White should be used.
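The practical consequence can be illustrated numerically. The following is a minimal sketch in Python (the synthetic heteroscedastic design and all names are illustrative, not from the source) comparing the naive KQ covariance $\hat{\sigma}^2 (\mathbf{X}^\top \mathbf{X})^{-1}$ with the correct covariance under a known $\boldsymbol{\Psi}$:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.uniform(1, 10, T)])
beta = np.array([2.0, 0.5])
sd = 0.3 * X[:, 1]                    # disturbance sd grows with the regressor
Psi = np.diag(sd**2)                  # sigma^2 * Psi with sigma^2 = 1 absorbed
y = X @ beta + rng.normal(0.0, sd)

XtX_inv = np.linalg.inv(X.T @ X)
b_kq = XtX_inv @ X.T @ y              # ordinary KQ estimate
resid = y - X @ b_kq
s2 = resid @ resid / (T - X.shape[1])

cov_naive = s2 * XtX_inv                          # what naive inference would use
cov_true = XtX_inv @ X.T @ Psi @ X @ XtX_inv      # correct covariance under Psi
print("naive se:", np.sqrt(np.diag(cov_naive)))
print("true se :", np.sqrt(np.diag(cov_true)))
```

The reported naive standard errors differ noticeably from the correct ones, which is exactly the invalid-inference problem described above.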

Obtaining the generalized least squares estimator (VKQ)

From the first naive approach above it becomes clear that the ordinary least squares method is not suitable for a general disturbance-variable structure, since it leads to inefficiencies. These are overcome by the generalized least squares method, which estimates $\boldsymbol{\beta}$ by minimizing the squared Mahalanobis distance of the residual vector:

$$\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}} = \operatorname*{arg\,min}_{\boldsymbol{\beta}} \; (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top \boldsymbol{\Psi}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}).$$

Since the expression to be minimized is a quadratic form in $\boldsymbol{\beta}$, the result of the minimization is

$$\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}} = (\mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{y}.$$

The estimator $\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}}$ is called the generalized least squares estimator, VKQ estimator for short, or Aitken estimator (English: generalized least squares estimator, GLSE). The covariance matrix of the generalized least squares estimator is given by

$$\operatorname{Cov}\bigl(\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}}\bigr) = \sigma^2 (\mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X})^{-1}.$$
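The closed form above translates directly into code. Here is a minimal sketch (the function name gls and the test data are illustrative assumptions) that computes the VKQ estimate while avoiding explicit matrix inverses:

```python
import numpy as np

def gls(X, y, Psi):
    """VKQ/GLS estimate b = (X' Psi^-1 X)^-1 X' Psi^-1 y.

    Linear systems are solved instead of forming explicit
    inverses, which is numerically more stable."""
    Psi_inv_X = np.linalg.solve(Psi, X)   # Psi^-1 X
    Psi_inv_y = np.linalg.solve(Psi, y)   # Psi^-1 y
    return np.linalg.solve(X.T @ Psi_inv_X, X.T @ Psi_inv_y)

# Tiny usage example with an arbitrary positive definite (diagonal) Psi.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
Psi = np.diag(rng.uniform(0.5, 2.0, 50))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50) * np.sqrt(np.diag(Psi))
print(gls(X, y, Psi))
```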

The point estimator obtained from this minimization problem is the best linear unbiased estimator (BLUE). Another way to obtain the VKQ estimator is by transforming the multiple linear model.

Transformation of the multiple linear model

The VKQ method is equivalent to applying the ordinary least squares method (English: ordinary least squares, OLS) to a linearly transformed model. A transformation matrix $\mathbf{P}$ with $\boldsymbol{\Psi}^{-1} = \mathbf{P}^\top \mathbf{P}$ can be obtained, for example, from the Cholesky decomposition of $\boldsymbol{\Psi}$. Both sides of the model are then multiplied by $\mathbf{P}$. With the transformations $\mathbf{y}^* = \mathbf{P}\mathbf{y}$, $\mathbf{X}^* = \mathbf{P}\mathbf{X}$, and $\boldsymbol{\varepsilon}^* = \mathbf{P}\boldsymbol{\varepsilon}$, the generalized linear model can be transformed into a classical linear model

$$\mathbf{y}^* = \mathbf{X}^* \boldsymbol{\beta} + \boldsymbol{\varepsilon}^*.$$

Properties of the transformed disturbance variables

The question then arises what the transformed disturbance variables yield on average. In the transformed model, too, the disturbance variables are zero on average, since

$$\operatorname{E}(\boldsymbol{\varepsilon}^*) = \operatorname{E}(\mathbf{P}\boldsymbol{\varepsilon}) = \mathbf{P} \operatorname{E}(\boldsymbol{\varepsilon}) = \mathbf{0}.$$

This property ensures that, on average, one estimates the true model and not a distorted form of it. For the covariance matrix of the transformed disturbance variables one obtains

$$\operatorname{Cov}(\boldsymbol{\varepsilon}^*) = \operatorname{Cov}(\mathbf{P}\boldsymbol{\varepsilon}) = \mathbf{P} \operatorname{Cov}(\boldsymbol{\varepsilon}) \mathbf{P}^\top = \sigma^2 \mathbf{P} \boldsymbol{\Psi} \mathbf{P}^\top.$$

In order for the homoscedasticity assumption to be fulfilled, $\mathbf{P}$ is chosen such that $\mathbf{P} \boldsymbol{\Psi} \mathbf{P}^\top = \mathbf{I}$, where $\mathbf{I}$ represents the identity matrix (this is possible because a positive definite matrix $\boldsymbol{\Psi}$ always admits a decomposition with this property). With this choice, the homoscedasticity assumption and all other Gauss–Markov assumptions are fulfilled for the transformed model, since $\operatorname{Cov}(\boldsymbol{\varepsilon}^*) = \sigma^2 \mathbf{I}$. From $\mathbf{P} \boldsymbol{\Psi} \mathbf{P}^\top = \mathbf{I}$ it follows that

$$\boldsymbol{\Psi}^{-1} = \mathbf{P}^\top \mathbf{P}.$$

This result will be needed below for the calculation of the VKQ estimator. Since the transformed model satisfies the Gauss–Markov assumptions, the least squares estimator of this model must be given by

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{*\top} \mathbf{X}^*)^{-1} \mathbf{X}^{*\top} \mathbf{y}^*$$

and be the best linear unbiased estimator (BLUE). Expressed differently,

$$\hat{\boldsymbol{\beta}} = \bigl((\mathbf{P}\mathbf{X})^\top \mathbf{P}\mathbf{X}\bigr)^{-1} (\mathbf{P}\mathbf{X})^\top \mathbf{P}\mathbf{y} = (\mathbf{X}^\top \mathbf{P}^\top \mathbf{P} \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{P}^\top \mathbf{P} \mathbf{y}.$$

With the above result $\boldsymbol{\Psi}^{-1} = \mathbf{P}^\top \mathbf{P}$, this approach also finally yields the VKQ estimator

$$\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}} = (\mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{y}.$$
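The equivalence of "transform, then run OLS" and the direct VKQ formula can be checked numerically. A minimal sketch (synthetic data; constructing $\mathbf{P}$ from the Cholesky factor is one possible choice, as described above):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Psi = np.diag(rng.uniform(0.5, 2.0, T))          # any positive definite Psi
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(T), Psi)

L = np.linalg.cholesky(Psi)                      # Psi = L L'
P = np.linalg.inv(L)                             # then P Psi P' = I and Psi^-1 = P'P
Xs, ys = P @ X, P @ y                            # transformed ("whitened") model

b_ols_transformed = np.linalg.lstsq(Xs, ys, rcond=None)[0]
b_vkq = np.linalg.solve(X.T @ np.linalg.solve(Psi, X),
                        X.T @ np.linalg.solve(Psi, y))
print(np.allclose(b_ols_transformed, b_vkq))     # True: both give the VKQ estimate
```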

It can be shown that multiplying the disturbance covariance matrix by a scalar $c > 0$ does not change the value of the VKQ estimator, since

$$\bigl(\mathbf{X}^\top (c\boldsymbol{\Psi})^{-1} \mathbf{X}\bigr)^{-1} \mathbf{X}^\top (c\boldsymbol{\Psi})^{-1} \mathbf{y} = (\mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{y}$$

holds.

Properties

Disturbance covariance matrix

In the generalized least squares estimation, the covariance matrix of the disturbance variables is $\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Psi}$. The unknown scalar $\sigma^2$ can be estimated unbiasedly by

$$\hat{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}})^\top \boldsymbol{\Psi}^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}})}{T - k},$$

so that the disturbance covariance matrix is estimated by $\hat{\sigma}^2 \boldsymbol{\Psi}$.

Maximum Likelihood Estimation (MLS)

In the case of a non-scalar covariance matrix, as used in the generalized least squares method, the joint probability density of a maximum likelihood estimation in the classical linear model of normal regression can be written as

$$f(\mathbf{y} \mid \mathbf{X}; \boldsymbol{\beta}, \sigma^2) = (2\pi\sigma^2)^{-T/2} \,\lvert \boldsymbol{\Psi} \rvert^{-1/2} \exp\!\left( -\frac{1}{2\sigma^2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top \boldsymbol{\Psi}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right),$$

where $\lvert \boldsymbol{\Psi} \rvert$ is the determinant of $\boldsymbol{\Psi}$. Maximizing this density in $\boldsymbol{\beta}$ is equivalent to minimizing the weighted residual sum of squares above, so under normality the ML estimator of $\boldsymbol{\beta}$ coincides with the VKQ estimator.
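For numerical work one uses the logarithm of this density. A small sketch (the function name is illustrative) that evaluates the log-likelihood via a log-determinant, which is more stable than computing the determinant directly:

```python
import numpy as np

def gls_loglik(beta, sigma2, X, y, Psi):
    """Log of the normal density given above."""
    T = len(y)
    r = y - X @ beta                          # residual vector
    _, logdet = np.linalg.slogdet(Psi)        # log|Psi|, stable for large T
    quad = r @ np.linalg.solve(Psi, r)        # r' Psi^-1 r
    return -0.5 * (T * np.log(2 * np.pi * sigma2) + logdet + quad / sigma2)
```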

Unbiasedness

The VKQ estimator is unbiased, i.e., it equals the true parameter vector on average, since its expected value equals the true value:

$$\operatorname{E}\bigl(\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}}\bigr) = \operatorname{E}\bigl(\boldsymbol{\beta} + (\mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Psi}^{-1} \boldsymbol{\varepsilon}\bigr) = \boldsymbol{\beta}.$$

This implies that there is no bias. Assuming normally distributed disturbances, the distribution of the VKQ estimator is thus given by

$$\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}} \sim \mathcal{N}\bigl(\boldsymbol{\beta},\; \sigma^2 (\mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X})^{-1}\bigr).$$

Best linear unbiased estimator (BLUE)

It can be shown that the VKQ estimator is a best linear unbiased estimator. One estimator is "better" than another if it has a smaller variance, since the variance is a measure of uncertainty. The best estimator is thus characterized by having minimal variance and therefore the lowest uncertainty. For every other linear unbiased estimator $\tilde{\boldsymbol{\beta}}$,

$$\operatorname{Cov}\bigl(\tilde{\boldsymbol{\beta}}\bigr) - \operatorname{Cov}\bigl(\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}}\bigr) \;\text{ is positive semidefinite}.$$

Since the VKQ estimator is BLUE, it must be at least as good as the ordinary KQ estimator. The efficiency gain of this approach can be seen from the fact that the difference

$$\operatorname{Cov}\bigl(\hat{\boldsymbol{\beta}}_{\mathrm{KQ}}\bigr) - \operatorname{Cov}\bigl(\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}}\bigr) = \sigma^2 \Bigl( (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Psi} \mathbf{X} (\mathbf{X}^\top \mathbf{X})^{-1} - (\mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X})^{-1} \Bigr)$$

is positive semidefinite, which means that the covariance matrix of the KQ approach (if heteroscedasticity is present, $\boldsymbol{\Psi} \neq \mathbf{I}$) overestimates the variation and is therefore "larger" than the covariance matrix obtained by the generalized least squares estimation (see also covariance matrix). The KQ estimator coincides with the VKQ estimator if the difference equals the zero matrix, i.e., if

$$\boldsymbol{\Psi} = \mathbf{I}.$$
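This ordering can be verified numerically by checking the eigenvalues of the difference of the two covariance matrices. A minimal sketch with a synthetic heteroscedastic $\boldsymbol{\Psi}$ (taking $\sigma^2 = 1$ for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 50
X = np.column_stack([np.ones(T), rng.normal(size=T)])
Psi = np.diag(rng.uniform(0.2, 5.0, T))   # pronounced heteroscedasticity

XtX_inv = np.linalg.inv(X.T @ X)
cov_kq = XtX_inv @ X.T @ Psi @ X @ XtX_inv              # KQ covariance
cov_vkq = np.linalg.inv(X.T @ np.linalg.solve(Psi, X))  # VKQ covariance

# All eigenvalues of the difference are (numerically) non-negative.
print(np.linalg.eigvalsh(cov_kq - cov_vkq).min() >= -1e-10)
```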

Asymptotic properties

A key asymptotic assumption is that the product-sum matrix, averaged over its $T$ summands, converges in probability to a positive definite, finite, nonsingular matrix $\mathbf{Q}$:

$$\operatorname*{plim}_{T \to \infty} \frac{1}{T} \mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X} = \mathbf{Q}.$$

From this assumption follow the consistency of the VKQ estimator, the consistency of its variance estimator, and the property that the estimator converges in distribution to a normal distribution. The last property is important for statistical inference.

Consistency

The VKQ estimator is unbiased under the above assumptions, and the sample size has no influence on unbiasedness. An estimator is consistent for the unknown parameter if and only if it converges in probability to the true value (weak law of large numbers). The property of consistency thus describes the behavior of the estimator as the number of observations increases.

For the sequence of estimators $\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}}$, given the asymptotic result above, it holds that it converges in probability to the true parameter vector,

$$\operatorname*{plim}_{T \to \infty} \hat{\boldsymbol{\beta}}_{\mathrm{VKQ}} = \boldsymbol{\beta},$$

or, put simply: the VKQ estimator is consistent for $\boldsymbol{\beta}$. This property says that as the sample size increases, the probability that the estimator deviates from the true parameter decreases. Likewise, the variance estimator $\hat{\sigma}^2$ is consistent for $\sigma^2$:

$$\operatorname*{plim}_{T \to \infty} \hat{\sigma}^2 = \sigma^2.$$

Convergence to normal distribution

Another property of the VKQ estimator is that its distribution converges to a normal distribution:

$$\sqrt{T}\,\bigl(\hat{\boldsymbol{\beta}}_{\mathrm{VKQ}} - \boldsymbol{\beta}\bigr) \;\overset{d}{\longrightarrow}\; \mathcal{N}\bigl(\mathbf{0},\, \sigma^2 \mathbf{Q}^{-1}\bigr).$$

This asymptotic normality is especially important for statistical inference.

Prediction matrix

The prediction matrix of the VKQ estimator is given by

$$\mathbf{P}_{\mathrm{VKQ}} = \mathbf{X} (\mathbf{X}^\top \boldsymbol{\Psi}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Psi}^{-1}.$$

It can be shown that $\mathbf{P}_{\mathrm{VKQ}}$, in contrast to the KQ prediction matrix, is in general no longer symmetric (it remains idempotent).

Feasible Generalized KQ Estimation (GVKQ)

In practice, the covariance matrix of the disturbance variables is often unknown, so that the generalized least squares method as described is not feasible. If, however, a consistent estimate $\hat{\boldsymbol{\Psi}}$ of $\boldsymbol{\Psi}$ is available, it can be used in its place. In this case, in which the matrix has to be estimated, one speaks of the feasible generalized least squares estimation (English: feasible generalized least squares, FGLS) or of the estimated generalized least squares estimation, GVKQ estimation for short (English: estimated generalized least squares, EGLS); its estimator is called the estimated VKQ estimator, or GVKQ estimator for short. It is given by

$$\hat{\boldsymbol{\beta}}_{\mathrm{GVKQ}} = (\mathbf{X}^\top \hat{\boldsymbol{\Psi}}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \hat{\boldsymbol{\Psi}}^{-1} \mathbf{y}.$$

Because the unknown covariance matrix of the disturbance variables is replaced by an estimate before the VKQ estimator is computed, one speaks of the feasible generalized KQ estimation. It is important to note that the covariance matrix has $T(T+1)/2$ distinct elements and thus cannot be estimated from only $T$ estimated disturbances. For this reason it is assumed that the elements of $\boldsymbol{\Psi}$ are functions of a small number of unknown parameters.
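A common two-step implementation under heteroscedasticity models the variance with a small number of parameters, here by regressing the log of the squared KQ residuals on the regressors. This is a sketch under that assumed parametrization, not the only possible one:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
X = np.column_stack([np.ones(T), rng.uniform(1, 10, T)])
y = X @ np.array([1.0, 0.7]) + rng.normal(0.0, 0.2 * X[:, 1])

# Step 1: ordinary KQ to obtain residuals.
b_kq = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ b_kq

# Step 2: few-parameter variance model, log(u_t^2) = x_t' gamma + error.
gamma = np.linalg.lstsq(X, np.log(u**2), rcond=None)[0]
var_hat = np.exp(X @ gamma)              # estimated diagonal of Psi-hat

# Step 3: VKQ with the estimated diagonal Psi-hat (= GVKQ/FGLS).
Wx = X / var_hat[:, None]                # Psi-hat^-1 X for diagonal Psi-hat
b_gvkq = np.linalg.solve(Wx.T @ X, Wx.T @ y)
print("KQ  :", b_kq)
print("GVKQ:", b_gvkq)
```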

Weighted Least Squares (GKQ)

A special case of the VKQ method is the so-called weighted least squares method (English: weighted least squares, WLS; German GKQ). It is used when all elements of $\boldsymbol{\Psi}$ off the main diagonal are zero, i.e., when the variances of the observed values are not constant (heteroscedasticity is present) but there is no correlation between the disturbance variables. The weight $w_t$ of unit $t$ is proportional to the reciprocal of the variance of the endogenous variable of unit $t$. The optimality criterion is the weighted residual sum of squares

$$\sum_{t=1}^{T} w_t \bigl(y_t - \mathbf{x}_t^\top \boldsymbol{\beta}\bigr)^2.$$
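Minimizing this criterion is equivalent to running ordinary KQ after scaling each row of the data by $\sqrt{w_t}$. A minimal sketch (the function name wls and the test data are illustrative):

```python
import numpy as np

def wls(X, y, w):
    """Weighted KQ: minimize sum_t w_t * (y_t - x_t' b)^2.

    Implemented as OLS on rows scaled by sqrt(w_t)."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

# Weights proportional to 1/Var(y_t), as described above.
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(100), rng.uniform(1, 10, 100)])
sd = 0.5 * X[:, 1]
y = X @ np.array([3.0, 1.5]) + rng.normal(0.0, sd)
print(wls(X, y, 1.0 / sd**2))
```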

Applications

Multiplicative heteroscedasticity

If the assumption of homoscedasticity is not fulfilled, i.e., the diagonal elements of the covariance matrix are not identical, the following model results:

$$y_t = \mathbf{x}_t^\top \boldsymbol{\beta} + \varepsilon_t, \qquad t = 1, \dotsc, T,$$

with

$$\operatorname{E}(\varepsilon_t) = 0$$

and

$$\operatorname{Var}(\varepsilon_t) = \sigma_t^2.$$

The general covariance matrix under heteroscedasticity is

$$\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \boldsymbol{\Psi} = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_T^2 \end{pmatrix}.$$

Here $\boldsymbol{\Psi}$ is assumed to be a known, real, positive definite, and symmetric matrix of dimension $T \times T$.

If the special form of multiplicative heteroscedasticity is present, the variances take the form $\sigma_t^2 = \sigma^2 \exp(\mathbf{z}_t^\top \boldsymbol{\alpha})$, where $\mathbf{z}_t$ is a vector of observable variables and $\boldsymbol{\alpha}$ a parameter vector, so that the general covariance matrix becomes

$$\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \operatorname{diag}\bigl(\exp(\mathbf{z}_1^\top \boldsymbol{\alpha}), \dotsc, \exp(\mathbf{z}_T^\top \boldsymbol{\alpha})\bigr).$$

If this form of heteroscedasticity is present, the generalized least squares estimation can be applied.
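A short sketch constructing such a covariance matrix (the variables $\mathbf{z}_t$ and the parameters $\boldsymbol{\alpha}$ here are purely illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 8
Z = np.column_stack([np.ones(T), rng.normal(size=T)])  # observable z_t
alpha = np.array([0.0, 0.5])                           # hypothetical parameters
sigma2 = 1.3

# Diagonal covariance under multiplicative heteroscedasticity.
Cov = sigma2 * np.diag(np.exp(Z @ alpha))
print(np.round(np.diag(Cov), 3))
```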

Seemingly unrelated regression

The seemingly unrelated regression (English: seemingly unrelated regression, SUR), a generalization of the linear regression model, comprises several regression equations, each of which has its own dependent variable and potentially different explanatory variables. Each equation is itself a valid linear regression and can be estimated separately from the others; hence the resulting system of equations is called seemingly unrelated. The equations are linked only through their disturbances: the disturbance covariance matrix of the seemingly unrelated regression has the structure

$$\operatorname{Cov}(\boldsymbol{\varepsilon}) = \boldsymbol{\Sigma} \otimes \mathbf{I}_T,$$

where $\boldsymbol{\Sigma}$ contains the contemporaneous covariances between the disturbances of the individual equations and $\otimes$ denotes the Kronecker product. The seemingly unrelated regression therefore yields the following VKQ estimator:

$$\hat{\boldsymbol{\beta}}_{\mathrm{SUR}} = \bigl(\mathbf{X}^\top (\boldsymbol{\Sigma} \otimes \mathbf{I}_T)^{-1} \mathbf{X}\bigr)^{-1} \mathbf{X}^\top (\boldsymbol{\Sigma} \otimes \mathbf{I}_T)^{-1} \mathbf{y}.$$

It can be shown that this VKQ estimator is equivalent to equation-by-equation KQ estimation if all equations contain the same explanatory variables or if $\boldsymbol{\Sigma}$ is diagonal.
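A compact numerical sketch of this estimator for two equations (synthetic data; the stacking is equation by equation so that the Kronecker structure above applies):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 100                                          # observations per equation
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
X = np.block([[X1, np.zeros((T, 2))],
              [np.zeros((T, 2)), X2]])           # block-diagonal regressor matrix

Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])                   # cross-equation error covariance
E = rng.multivariate_normal(np.zeros(2), Sigma, size=T)
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + E.T.ravel()  # stacked equation by equation

Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))  # (Sigma (x) I_T)^-1
b_sur = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
print(b_sur)
```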



