Unbiased estimation of the variance of the disturbance variables

In statistics, the unbiased estimation of the variance of the disturbance variables, also called the unbiased estimation of the error variance, is a point estimator with the quality property that it estimates the unknown variance of the disturbance variables without bias whenever the Gauss-Markov assumptions hold.

Introduction to the problem

The error variance, also called residual variance, experimental error, disturbance variance, or unexplained variance, is the variance of the regression function in the population, i.e. the variance ${\displaystyle \sigma^{2}}$ of the error terms or disturbances:

${\displaystyle \sigma^{2}=\operatorname{E}[(\varepsilon_{i}-\operatorname{E}(\varepsilon_{i}))^{2}],\quad i=1,\ldots,n.}$

The error variance is an unknown parameter that must be estimated from the sample information. It measures the variation that can be traced back to measurement errors or disturbance variables. A first obvious approach would be to estimate the variance of the disturbance variables in the usual way by maximum likelihood estimation (see classical linear model of normal regression). However, as explained below, this estimator is problematic.

Unbiased estimator for the variance of the disturbance variables

Simple linear regression

Although the homoscedastic variance in the population, ${\displaystyle \operatorname{Var}(y\mid X=x)=\operatorname{Var}(\beta_{0}+\beta_{1}x+\varepsilon)=\operatorname{Var}(\varepsilon)=\sigma^{2}=\operatorname{const.}}$, is sometimes assumed to be known, in most applications it must be assumed to be unknown (for example, when estimating demand parameters in economic models, or production functions). Since the disturbance variance has an unknown value, the numerical values of the variances of the slope parameter and of the intercept cannot be computed, because the formulas depend on it. These quantities can, however, be estimated from the available data. An obvious estimator of the disturbance variable ${\displaystyle \varepsilon_{i}}$ is the residual ${\displaystyle {\hat{\varepsilon}}_{i}=y_{i}-{\hat{y}}_{i}}$, where ${\displaystyle {\hat{y}}_{i}={\hat{\beta}}_{0}+{\hat{\beta}}_{1}x_{i}}$ represents the sample regression function. The information contained in the residuals could therefore be used for an estimator of the disturbance variance. Because ${\displaystyle \operatorname{E}(\varepsilon_{i}^{2})=\sigma^{2}}$, from a frequentist point of view ${\displaystyle \sigma^{2}}$ is the "mean value" of ${\displaystyle \varepsilon_{i}^{2}}$. The quantity ${\displaystyle \varepsilon_{i}^{2}}$ cannot be observed, however, since the disturbance variables themselves are unobservable. If the observable counterpart ${\displaystyle {\hat{\varepsilon}}_{i}^{2}}$ is used instead of ${\displaystyle \varepsilon_{i}^{2}}$, this leads to the following estimator for the disturbance variance:

${\displaystyle {\tilde{s}}^{2}={\frac{1}{n}}\sum\nolimits_{i=1}^{n}{\hat{\varepsilon}}_{i}^{2}={\frac{1}{n}}{\hat{\boldsymbol{\varepsilon}}}^{\top}{\hat{\boldsymbol{\varepsilon}}}={\frac{1}{n}}\sum\limits_{i=1}^{n}(y_{i}-{\hat{\beta}}_{0}-{\hat{\beta}}_{1}x_{i})^{2}={\frac{1}{n}}SQR}$,

where ${\displaystyle SQR}$ is the residual sum of squares. This estimator is the sample mean of the squared residuals and could be used to estimate the disturbance variance ${\displaystyle \sigma^{2}}$. It can be shown that the above definition also corresponds to the maximum likelihood estimator (${\displaystyle {\tilde{s}}^{2}={\hat{\sigma}}_{\text{ML}}^{2}}$). However, this estimator does not meet common quality criteria for point estimators and is therefore not often used. In particular, it is not unbiased for ${\displaystyle \sigma^{2}}$: the expected value of the residual sum of squares is ${\displaystyle \operatorname{E}({\hat{\boldsymbol{\varepsilon}}}^{\top}{\hat{\boldsymbol{\varepsilon}}})=\sigma^{2}(n-p)}$, where ${\displaystyle p}$ is the number of estimated parameters (in simple linear regression, ${\displaystyle p=2}$), so the expected value of this estimator is ${\displaystyle \operatorname{E}({\hat{\sigma}}_{\text{ML}}^{2})={\frac{n-p}{n}}\sigma^{2}}$. Under the assumptions of the classical model of simple linear regression it can be shown that an unbiased estimator for ${\displaystyle \sigma^{2}}$, i.e. an estimator satisfying ${\displaystyle \operatorname{E}({\hat{\sigma}}^{2})=\sigma^{2}}$, is given by

${\displaystyle {\hat{\sigma}}^{2}=s^{2}={\frac{1}{n-2}}\sum\limits_{i=1}^{n}(y_{i}-{\hat{\beta}}_{0}-{\hat{\beta}}_{1}x_{i})^{2}={\frac{1}{n-2}}SQR}$,

assuming that ${\displaystyle n>2}$. This unbiased estimator for ${\displaystyle \sigma^{2}}$ is the residual mean square and is sometimes referred to as the residual variance. The square root of this unbiased estimator, i.e. of the residual variance, is called the standard error of the regression. The residual variance can be interpreted as the mean model estimation error and forms the basis for all further calculations (confidence intervals, standard errors of the regression parameters, etc.). It differs from the expression above in that the residual sum of squares is adjusted by the number of degrees of freedom. This adjustment can be explained intuitively by the fact that two degrees of freedom are lost by estimating the two unknown regression parameters ${\displaystyle \beta_{0}}$ and ${\displaystyle \beta_{1}}$.
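A minimal Python sketch of this computation. The data values below are purely illustrative (not from the article); the estimator is the residual sum of squares divided by n − 2:

```python
# Sketch: unbiased estimate of the error variance in simple linear
# regression (illustrative data, hypothetical values).
n = 6
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimators for slope and intercept
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Residual sum of squares SQR and the unbiased estimator SQR/(n-2)
sqr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma2_hat = sqr / (n - 2)
print(b1, b0, sigma2_hat)
```

The square root of `sigma2_hat` is the standard error of the regression mentioned above.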

As mentioned above, an unbiased estimator for ${\displaystyle \sigma^{2}}$ in simple linear regression is given by

${\displaystyle {\hat{\sigma}}^{2}=s^{2}={\frac{1}{n-2}}\sum\limits_{i=1}^{n}(y_{i}-{\hat{\beta}}_{0}-{\hat{\beta}}_{1}x_{i})^{2}}$,

where ${\displaystyle {\hat{\beta}}_{1}={\frac{\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})(y_{i}-{\overline{y}})}{\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}}}}$ and ${\displaystyle {\hat{\beta}}_{0}={\overline{y}}-{\hat{\beta}}_{1}{\overline{x}}}$ are the least squares estimators for ${\displaystyle \beta_{1}}$ and ${\displaystyle \beta_{0}}$, respectively.

To show the unbiasedness, one uses the property that the residuals can be represented as a function of the disturbance variables as ${\displaystyle {\hat{\varepsilon}}_{i}=\varepsilon_{i}-({\hat{\beta}}_{0}-\beta_{0})-({\hat{\beta}}_{1}-\beta_{1})x_{i}}$. Furthermore, one uses the property that the variance of the KQ (least squares) estimator ${\displaystyle {\hat{\beta}}_{1}}$ is given by ${\displaystyle \operatorname{Var}({\hat{\beta}}_{1})=\sigma^{2}{\frac{1}{\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}}}}$. It should also be noted that the expected value of the KQ estimator ${\displaystyle {\hat{\beta}}_{1}}$ is ${\displaystyle \beta_{1}}$, and likewise the expected value of ${\displaystyle {\hat{\beta}}_{0}}$ is ${\displaystyle \beta_{0}}$.
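The representation of the residuals as a function of the disturbances can be checked numerically. The sketch below simulates a small dataset with hypothetical parameter values and compares each residual with the disturbance-based expression:

```python
import random

# Sketch: numerical check (simulated data, hypothetical parameters) of the
# identity  eps_hat_i = eps_i - (b0 - beta0) - (b1 - beta1) * x_i.
random.seed(1)
beta0, beta1 = 1.0, 2.0                    # true parameters (unknown in practice)
x = [float(i) for i in range(1, 9)]
eps = [random.gauss(0.0, 1.0) for _ in x]  # unobservable disturbances
y = [beta0 + beta1 * xi + e for xi, e in zip(x, eps)]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Compare observable residuals with the disturbance-based representation
max_diff = max(
    abs((yi - b0 - b1 * xi) - (e - (b0 - beta0) - (b1 - beta1) * xi))
    for xi, yi, e in zip(x, y, eps)
)
print(max_diff)  # agrees up to floating-point rounding
```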

Proof:

${\displaystyle {\begin{aligned}\operatorname{E}({\hat{\sigma}}^{2})&=\operatorname{E}\left({\tfrac{1}{n-2}}\sum\nolimits_{i=1}^{n}{\hat{\varepsilon}}_{i}^{2}\right)\\&=\operatorname{E}\left({\tfrac{1}{n-2}}\sum\nolimits_{i=1}^{n}({\hat{\varepsilon}}_{i}-{\overline{\hat{\varepsilon}}})^{2}\right)\\&=\operatorname{E}\left({\tfrac{1}{n-2}}\sum\nolimits_{i=1}^{n}({\hat{\varepsilon}}_{i}-({\overline{\varepsilon}}-({\hat{\beta}}_{0}-\beta_{0})-({\hat{\beta}}_{1}-\beta_{1}){\overline{x}}))^{2}\right)\\&=\operatorname{E}\left({\tfrac{1}{n-2}}\sum\nolimits_{i=1}^{n}(\varepsilon_{i}-({\hat{\beta}}_{0}-\beta_{0})-({\hat{\beta}}_{1}-\beta_{1})x_{i}-({\overline{\varepsilon}}-({\hat{\beta}}_{0}-\beta_{0})-({\hat{\beta}}_{1}-\beta_{1}){\overline{x}}))^{2}\right)\\&=\operatorname{E}\left({\tfrac{1}{n-2}}\sum\nolimits_{i=1}^{n}((\varepsilon_{i}-{\overline{\varepsilon}})-({\hat{\beta}}_{1}-\beta_{1})(x_{i}-{\overline{x}}))^{2}\right)\\&=\operatorname{E}\left({\tfrac{1}{n-2}}\sum\nolimits_{i=1}^{n}((\varepsilon_{i}-{\overline{\varepsilon}})^{2}-2(\varepsilon_{i}-{\overline{\varepsilon}})({\hat{\beta}}_{1}-\beta_{1})(x_{i}-{\overline{x}})+({\hat{\beta}}_{1}-\beta_{1})^{2}(x_{i}-{\overline{x}})^{2})\right)\\&={\tfrac{1}{n-2}}\operatorname{E}\left(\sum\nolimits_{i=1}^{n}(\varepsilon_{i}-{\overline{\varepsilon}})^{2}-2({\hat{\beta}}_{1}-\beta_{1})\sum\nolimits_{i=1}^{n}\varepsilon_{i}(x_{i}-{\overline{x}})+({\hat{\beta}}_{1}-\beta_{1})^{2}\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}\right)\\&={\tfrac{1}{n-2}}\left(\operatorname{E}\left(\sum\nolimits_{i=1}^{n}(\varepsilon_{i}-{\overline{\varepsilon}})^{2}\right)-2\operatorname{E}\left(({\hat{\beta}}_{1}-\beta_{1})\sum\nolimits_{i=1}^{n}\varepsilon_{i}(x_{i}-{\overline{x}})\right)+\operatorname{E}\left(({\hat{\beta}}_{1}-\beta_{1})^{2}\right)\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}\right)\\&={\tfrac{1}{n-2}}\left((n-1)\sigma^{2}-2\operatorname{E}\left(({\hat{\beta}}_{1}-\beta_{1})^{2}\right)\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}+\operatorname{E}\left(({\hat{\beta}}_{1}-\beta_{1})^{2}\right)\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}\right)\\&={\tfrac{1}{n-2}}\left((n-1)\sigma^{2}-2\operatorname{Var}({\hat{\beta}}_{1})\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}+\operatorname{Var}({\hat{\beta}}_{1})\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}\right)\\&={\tfrac{1}{n-2}}\left((n-1)\sigma^{2}-2\sigma^{2}+\sigma^{2}\right)\\&={\tfrac{1}{n-2}}(n-2)\sigma^{2}\\&=\sigma^{2}\end{aligned}}}$
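The bias computation can also be illustrated by simulation. The sketch below (hypothetical parameter values, standard library only) averages both estimators over many simulated samples: SQR/(n − 2) centers on σ², while the ML estimator SQR/n centers on ((n − 2)/n)σ²:

```python
import random

# Sketch: Monte Carlo check that SQR/(n-2) is unbiased for sigma^2 while
# the ML estimator SQR/n underestimates it (hypothetical setup).
random.seed(0)
n, sigma2 = 10, 4.0
beta0, beta1 = 1.0, 2.0
x = [float(i) for i in range(1, n + 1)]
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

reps = 20000
s2_unbiased, s2_ml = 0.0, 0.0
for _ in range(reps):
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma2 ** 0.5) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sqr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2_unbiased += sqr / (n - 2)
    s2_ml += sqr / n
s2_unbiased /= reps
s2_ml /= reps

# Expected: s2_unbiased near 4.0, s2_ml near (n-2)/n * 4.0 = 3.2
print(s2_unbiased, s2_ml)
```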

The variances of the KQ estimators ${\displaystyle {\hat{\beta}}_{0}}$ and ${\displaystyle {\hat{\beta}}_{1}}$ can also be estimated with the unbiased estimator ${\displaystyle {\hat{\sigma}}^{2}}$. For example, ${\displaystyle \operatorname{Var}({\hat{\beta}}_{1})}$ can be estimated by replacing ${\displaystyle \sigma^{2}}$ with ${\displaystyle {\hat{\sigma}}^{2}}$. The estimated variance of the slope parameter is then given by

${\displaystyle {\widehat{\operatorname{Var}({\hat{\beta}}_{1})}}={\frac{{\tfrac{1}{n-2}}\sum\nolimits_{i=1}^{n}{\hat{\varepsilon}}_{i}^{2}}{\sum\nolimits_{i=1}^{n}(x_{i}-{\overline{x}})^{2}}}}$.
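A short sketch of this estimated slope variance, using small illustrative data (hypothetical values, not from the article):

```python
# Sketch: estimated variance of the slope estimator, with the unbiased
# error variance SQR/(n-2) in the numerator (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sqr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Var-hat(b1) = [SQR/(n-2)] / sum (x_i - x_bar)^2
var_b1 = (sqr / (n - 2)) / sxx
se_b1 = var_b1 ** 0.5  # standard error of the slope
print(var_b1, se_b1)
```

`se_b1` is the standard error used, for example, in t-tests and confidence intervals for the slope.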

Multiple linear regression

In multiple linear regression, the unbiased estimator of the variance of the disturbance variables (the residual variance) is given by

${\displaystyle {\hat{\sigma}}^{2}=SQR/(n-k-1)={\frac{\sum\nolimits_{i=1}^{n}(y_{i}-\mathbf{x}_{i}^{\top}{\hat{\boldsymbol{\beta}}})^{2}}{n-k-1}}={\frac{{\hat{\boldsymbol{\varepsilon}}}^{\top}{\hat{\boldsymbol{\varepsilon}}}}{n-k-1}}={\frac{\left(\mathbf{y}-\mathbf{X}\mathbf{b}\right)^{\top}\left(\mathbf{y}-\mathbf{X}\mathbf{b}\right)}{n-k-1}}}$,

where ${\displaystyle \mathbf{b}=(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}}$ represents the least squares estimator and ${\displaystyle \mathbf{x}_{i}^{\top}}$ the ${\displaystyle i}$-th row of the experimental design matrix ${\displaystyle \mathbf{X}}$. Alternatively, the unbiased estimator of the variance of the disturbance variables in the multiple case can be represented as

${\displaystyle {\hat{\sigma}}^{2}={\frac{\mathbf{y}^{\top}\mathbf{y}-\mathbf{b}^{\top}\mathbf{X}^{\top}\mathbf{y}}{n-k-1}}}$.
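A plain-Python sketch of this matrix form with illustrative data (hypothetical values; X contains a constant column and k = 2 regressors). The normal equations are solved by Gauss-Jordan elimination to avoid external dependencies:

```python
# Sketch: sigma2_hat = (y'y - b'X'y) / (n - k - 1) in multiple regression
# (illustrative data, hypothetical values).
X = [[1.0, 1.0, 2.0],
     [1.0, 2.0, 1.0],
     [1.0, 3.0, 4.0],
     [1.0, 4.0, 3.0],
     [1.0, 5.0, 6.0],
     [1.0, 6.0, 5.0]]
y = [5.1, 6.0, 11.2, 11.9, 17.0, 17.8]
n, p = len(X), len(X[0])   # p = k + 1 estimated parameters
k = p - 1

# Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination
xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
       for r in range(p)]
xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
A = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
for col in range(p):
    pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
    A[col], A[pivot] = A[pivot], A[col]
    for r in range(p):
        if r != col:
            f = A[r][col] / A[col][col]
            A[r] = [a - f * v for a, v in zip(A[r], A[col])]
b = [A[r][p] / A[r][r] for r in range(p)]

# Residual sum of squares via SQR = y'y - b'X'y
yty = sum(yi * yi for yi in y)
sqr = yty - sum(br * g for br, g in zip(b, xty))
sigma2_hat = sqr / (n - k - 1)
print(b, sigma2_hat)
```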

This representation results from the fact that the residual sum of squares ${\displaystyle \sum_{i=1}^{n}{\hat{\varepsilon}}_{i}^{2}=\sum_{i=1}^{n}(y_{i}-{\hat{y}}_{i})^{2}}$ can be written as ${\displaystyle \mathbf{y}^{\top}\mathbf{y}-\mathbf{b}^{\top}\mathbf{X}^{\top}\mathbf{y}}$. Another alternative representation of the residual variance results from the fact that the residual sum of squares can also be expressed using the residual-generating matrix ${\displaystyle \mathbf{Q}}$ as ${\displaystyle SQR={\hat{\boldsymbol{\varepsilon}}}^{\top}{\hat{\boldsymbol{\varepsilon}}}={\boldsymbol{\varepsilon}}^{\top}\mathbf{Q}{\boldsymbol{\varepsilon}}}$. This yields the residual variance

${\displaystyle {\hat{\sigma}}^{2}={\frac{\mathbf{y}^{\top}\mathbf{y}-\mathbf{b}^{\top}\mathbf{X}^{\top}\mathbf{y}}{n-k-1}}={\frac{\mathbf{y}^{\top}\mathbf{Q}\mathbf{y}}{n-k-1}}={\frac{{\boldsymbol{\varepsilon}}^{\top}\mathbf{Q}{\boldsymbol{\varepsilon}}}{n-k-1}}}$.

This estimator can in turn be used to compute the covariance matrix of the KQ estimator vector ${\displaystyle \mathbf{b}}$. If ${\displaystyle \sigma^{2}}$ is replaced by ${\displaystyle {\hat{\sigma}}^{2}}$, the estimated covariance matrix of the KQ estimator vector is obtained:

${\displaystyle {\hat{\Sigma}}_{\mathbf{b}}={\hat{\sigma}}^{2}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}={\frac{{\hat{\boldsymbol{\varepsilon}}}^{\top}{\hat{\boldsymbol{\varepsilon}}}}{n-k-1}}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}}$.
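A sketch of the estimated covariance matrix, specialized to the simple regression case where X consists of a constant column and one regressor, so that the 2×2 inverse of X'X has a closed form (illustrative data, hypothetical values):

```python
# Sketch: estimated covariance matrix sigma2_hat * (X'X)^{-1} for the
# special case X = [1, x] (simple regression, illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sqr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma2_hat = sqr / (n - 2)  # unbiased error variance (n - k - 1 with k = 1)

# X'X = [[n, sum x], [sum x, sum x^2]]; its determinant is n * Sxx
sx, sx2 = sum(x), sum(xi * xi for xi in x)
det = n * sx2 - sx * sx
inv = [[sx2 / det, -sx / det],
       [-sx / det, n / det]]
cov = [[sigma2_hat * inv[r][c] for c in range(2)] for r in range(2)]
# cov[1][1] reproduces Var-hat(b1) = sigma2_hat / Sxx from above
print(cov)
```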

Regression with stochastic regressors

In the case of regression with stochastic regressors, with stochastic regressor matrix ${\displaystyle \mathbf{Z}}$, the unbiased estimator of the variance of the disturbance variables is likewise given by

${\displaystyle {\hat{\sigma}}^{2}={\frac{\left(\mathbf{y}-\mathbf{Z}\mathbf{b}\right)^{\top}\left(\mathbf{y}-\mathbf{Z}\mathbf{b}\right)}{n-k-1}}}$.

The unbiasedness can be shown by means of the law of iterated expectations.