# Disturbance and residual

Theoretical true line $y$ and estimated regression line $\hat{y}$. The residual $\hat{\varepsilon}_i$ is the difference between the measured value $y_i$ and the estimate $\hat{y}_i$.

In statistics, disturbance and residual are two closely related concepts. The disturbances (not to be confused with nuisance parameters or confounding factors), also called disturbance terms, error terms or simply errors, are unobservable random variables in a simple or multiple regression equation that measure the vertical distance between the observation point and the true line (the regression function of the population). They are usually assumed to be uncorrelated, to have an expectation of zero and a homogeneous (homoscedastic) variance (the Gauss-Markov assumptions). They capture unobserved factors that affect the dependent variable, and they can also contain measurement errors in the observed dependent or independent variables.

In contrast to the disturbances, residuals (Latin residuum = "that which remains") are calculated quantities and measure the vertical distance between the observation point and the estimated regression line. Sometimes the residual is also referred to as the "estimated disturbance". This naming is problematic because the disturbance is a random variable and not a parameter; one therefore cannot speak of estimating the disturbance.

The problem with so-called regression diagnostics is that the Gauss-Markov assumptions relate only to the disturbances, not to the residuals. Although the residuals also have an expected value of zero, they are neither uncorrelated nor homoscedastic. To take this into account, the residuals are usually modified so that they satisfy the required assumptions, e.g. studentized residuals. The residual sum of squares plays an important role in many statistical applications, e.g. in the method of least squares. The notation $\varepsilon_i$ or $e_i$ for the disturbances is based on the Latin word erratum (error). The residuals can be generated using the residual matrix.

## Disturbance and residual

Disturbances are not to be confused with the residuals. A distinction is made between the two concepts as follows:

• Unobservable random disturbances $\varepsilon_i$: measure the vertical distance between the observation point and the theoretical (true) line
• Residuals $\hat{\varepsilon}_i = y_i - \hat{y}_i$: measure the vertical distance between the empirical observation and the estimated regression line

## Simple linear regression

This graphic shows the decomposition of the "deviation to be explained" $(y_i - \overline{y})$ into the "explained deviation" $(\hat{y}_i - \overline{y})$ and the "residual" $(y_i - \hat{y}_i)$.

In simple linear regression with the model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, the ordinary residuals are given by

$\hat{\varepsilon}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$.

These are genuine residuals because an estimated value is subtracted from an observed value: the fitted values $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ are subtracted from the observation values $y_i$. In simple linear regression, numerous assumptions are usually made about the disturbances (see assumptions about the disturbances).
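As a minimal numerical sketch (with made-up data and illustrative variable names), the ordinary residuals of a simple linear regression can be computed directly from the least-squares estimates:

```python
import numpy as np

# Illustrative data (made up for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates for the simple model y = b0 + b1*x + eps
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x   # fitted values
resid = y - y_hat     # ordinary residuals: eps_hat_i = y_i - y_hat_i
```

Each residual is the observation minus its fitted value, so observations and residuals always recombine to the data: `y_hat + resid == y`.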

### Residual variance

The residual variance is an estimate of the variance of the regression function in the population, $\operatorname{Var}(y \mid X = x) = \operatorname{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2 = \operatorname{const}$. In simple linear regression, the estimate obtained by maximum likelihood estimation is given by

$\tilde{s}_{\varepsilon}^2 = \frac{1}{n}\sum\limits_{i=1}^{n} \hat{\varepsilon}_i^2 = \frac{1}{n}\sum\limits_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$.

However, this estimator does not meet common quality criteria for point estimators and is therefore rarely used: for example, it is not unbiased for $\sigma^2$. In simple linear regression it can be shown, under the assumptions of the classical linear regression model, that an unbiased estimate of the variance of the disturbances $\sigma^2$, i.e. an estimate satisfying $\operatorname{E}(\hat{\sigma}^2) = \sigma^2$, is given by the variance adjusted for the number of degrees of freedom:

$\hat{\sigma}^2 = \frac{1}{n-2}\sum\limits_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$.

The positive square root of this unbiased estimator is also known as the standard error of regression .
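A short sketch, using simulated data, contrasting the maximum-likelihood estimate $\tilde{s}_\varepsilon^2$ with the degrees-of-freedom-adjusted, unbiased estimate $\hat{\sigma}^2$ (all data and names here are illustrative):

```python
import numpy as np

# Hypothetical data for the sketch
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.9, 3.1, 4.8, 5.9, 7.2])

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

ssr = np.sum(resid ** 2)             # residual sum of squares
s2_ml = ssr / n                      # ML estimate (biased)
sigma2_hat = ssr / (n - 2)           # unbiased estimate, adjusted for 2 degrees of freedom
se_regression = np.sqrt(sigma2_hat)  # standard error of the regression
```

Dividing by $n-2$ instead of $n$ makes the estimate larger, compensating for the two estimated coefficients.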

### Residuals as a function of the disturbance variables

In simple linear regression, the residuals can be written for each individual observation as a function of the disturbances $\varepsilon_i$ as

$\hat{\varepsilon}_i = \varepsilon_i - (\hat{\beta}_0 - \beta_0) - (\hat{\beta}_1 - \beta_1) x_i$.
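This identity can be checked numerically: in a simulation the true parameters and disturbances are known, so the residuals can be compared against the right-hand side (a sketch with made-up parameter values):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(0, 5, size=n)
eps = rng.normal(size=n)        # true disturbances (normally unobservable)
beta0_true, beta1_true = 1.0, 2.0
y = beta0_true + beta1_true * x + eps

# Least-squares estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x

# Identity: resid_i = eps_i - (b0 - beta0) - (b1 - beta1) * x_i
check = eps - (b0 - beta0_true) - (b1 - beta1_true) * x
```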

### Sum of the residuals

The least-squares regression equation is determined so that the residual sum of squares is minimized. If the linear regression model contains an intercept that differs from zero, the sum of the residuals must then be zero, meaning that positive and negative deviations from the regression line cancel each other out:

${\ displaystyle \ sum _ {i = 1} ^ {n} {\ hat {\ varepsilon}} _ {i} = 0}$
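A numerical illustration of this property (data simulated for the sketch): with an intercept column in the design matrix, the residuals sum to zero up to rounding, whereas a regression through the origin generally does not share this property.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Fit WITH intercept: columns [1, x]
X1 = np.column_stack([np.ones_like(x), x])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
r_with = y - X1 @ beta1

# Fit WITHOUT intercept: column [x] only
X0 = x[:, None]
beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
r_without = y - X0 @ beta0

# r_with sums to (numerically) zero; r_without generally does not
```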

## Multiple linear regression

Regression plane that runs through a point cloud for two regressors.

Since the residuals, in contrast to the disturbances, are observable, calculated quantities, they can be displayed graphically or examined in other ways. In contrast to simple linear regression, in which a straight line is determined, multiple linear regression (the extension of simple linear regression to $p$ regressors) determines a hyperplane that runs through the point cloud. If there are two regressors, the observations lie, figuratively speaking, above or below the regression plane. The differences between the observed $y$ values and the predicted values lying on the hyperplane represent the residuals:

$\hat{\varepsilon}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \dotsc - \hat{\beta}_k x_{ik}$.

The residuals obtained by least-squares estimation are called ordinary residuals. Given $n$ observations, the ordinary least-squares residuals in multiple linear regression are

$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\mathbf{b} = \left(\mathbf{I} - \mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\right)\mathbf{y} = (\mathbf{I} - \mathbf{P})\mathbf{y}$,

where $\mathbf{Q} := (\mathbf{I} - \mathbf{P})$ is a projection matrix, more precisely the idempotent and symmetric residual matrix, and $\mathbf{b} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$ is the least-squares estimator in the multiple case.
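A minimal sketch of the matrix formulation on a small simulated design matrix; the residual matrix $\mathbf{Q}$ is formed explicitly here for illustration (in practice one would avoid building the $n \times n$ matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 2
# Design matrix with intercept column and k random regressors (simulated)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T  # prediction (hat) matrix
Q = np.eye(n) - P                     # residual matrix: idempotent and symmetric
resid = Q @ y                         # ordinary residuals

b = np.linalg.inv(X.T @ X) @ X.T @ y  # least-squares estimator
```

The residuals obtained from $\mathbf{Q}\mathbf{y}$ agree with the elementwise definition $\mathbf{y} - \mathbf{X}\mathbf{b}$.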

### Properties

The ordinary residuals have an expected value of zero, i.e.

$\operatorname{E}(\hat{\boldsymbol{\varepsilon}}) = \operatorname{E}\begin{pmatrix}\hat{\varepsilon}_1\\\hat{\varepsilon}_2\\\vdots\\\hat{\varepsilon}_n\end{pmatrix} = \begin{pmatrix}0\\0\\\vdots\\0\end{pmatrix} = \mathbf{0}$

The covariance matrix of the ordinary residuals is given by

$\operatorname{Cov}(\hat{\boldsymbol{\varepsilon}}) = \operatorname{Cov}(\mathbf{Q}\mathbf{y}) = \mathbf{Q}\operatorname{Cov}(\mathbf{y})\mathbf{Q}^{\top} = \mathbf{Q}\operatorname{Cov}(\boldsymbol{\varepsilon})\mathbf{Q} = \operatorname{Cov}(\boldsymbol{\varepsilon})\mathbf{Q}\mathbf{Q} = \sigma^2(\mathbf{I} - \mathbf{P}) = \sigma^2\mathbf{Q}$.

The ordinary residuals are thus heteroscedastic, since

$\operatorname{Cov}(\hat{\boldsymbol{\varepsilon}}) = \sigma^2(\mathbf{I} - \mathbf{P}) = \sigma^2\mathbf{Q} \neq \sigma^2\mathbf{I}$.

This means that the Gauss-Markov assumptions are not fulfilled for the ordinary residuals, since the homoscedasticity assumption $\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}$ does not hold for them.

With the help of the prediction and residual matrices, using the idempotence of $\mathbf{P}$, it can be shown that the residuals are uncorrelated with the predicted values:

$\hat{\boldsymbol{\varepsilon}}^{\top}\hat{\mathbf{y}} = \left(\left(\mathbf{I} - \mathbf{P}\right)\mathbf{y}\right)^{\top}\mathbf{P}\mathbf{y} = \mathbf{y}^{\top}\left(\mathbf{I} - \mathbf{P}\right)\mathbf{P}\mathbf{y} = \mathbf{y}^{\top}\left(\mathbf{P} - \mathbf{P}\right)\mathbf{y} = 0$.
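This orthogonality can be verified numerically on simulated data (a sketch; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # simulated design matrix
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T  # prediction matrix
y_hat = P @ y                         # fitted values
resid = (np.eye(n) - P) @ y           # ordinary residuals

# The inner product resid @ y_hat is zero up to rounding error
```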

## Partial residuals

Partial residual plots are built using partial residuals, which are defined by

$\hat{\varepsilon}_{x_j,i} := y_i - \hat{\beta}_1 - \hat{\beta}_2 x_{i2} - \ldots - \hat{\beta}_{j-1} x_{i,j-1} - \hat{\beta}_{j+1} x_{i,j+1} - \ldots - \hat{\beta}_k x_{ik} = y_i - \mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}} + \hat{\beta}_j x_{ij}$.
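A sketch of partial residuals for a model with two regressors (the intercept is indexed as `beta[0]` here, and all data are made up): the partial residual with respect to a regressor is the ordinary residual with that regressor's fitted contribution added back.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Partial residuals with respect to x1: ordinary residual plus the x1 contribution
partial_x1 = resid + beta[1] * x1
```

Plotting `partial_x1` against `x1` shows the relationship between $y$ and $x_1$ after adjusting for the other regressors.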

## Studentized residuals

For the simple linear regression model, let the design matrix be

$\mathbf{X} = \begin{pmatrix}1 & x_1\\\vdots & \vdots\\1 & x_n\end{pmatrix}$

given. The prediction matrix $\mathbf{P}$ is the matrix of the orthogonal projection onto the column space of the design matrix. $\mathbf{P}$ is given by

$\mathbf{P} = \mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}$.

The statistical leverage values $p_{ii}$ are the $i$-th diagonal elements of the prediction matrix. The variance of the $i$-th residual is given by

$\operatorname{Var}(\hat{\varepsilon}_i) = \sigma^2(1 - p_{ii})$.

In this case the design matrix $\mathbf{X}$ has only two columns, which leads to the following variance:

$\operatorname{Var}(\hat{\varepsilon}_i) = \sigma^2\left(1 - \frac{1}{n} - \frac{(x_i - \overline{x})^2}{\sum_{j=1}^{n}(x_j - \overline{x})^2}\right)$.

The corresponding studentized residuals are

$t_i = \frac{\hat{\varepsilon}_i}{\hat{\sigma}\sqrt{1 - p_{ii}}}$.

The studentized residuals are identically (but not independently) distributed and, in particular, homoscedastic. They can therefore offer a remedy when the homoscedasticity assumption is violated.
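A minimal sketch computing (internally) studentized residuals from the leverage values, on simulated data for the simple model above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.7 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])        # design matrix [1, x]
P = X @ np.linalg.inv(X.T @ X) @ X.T        # prediction matrix
p_ii = np.diag(P)                           # leverage values
resid = y - P @ y                           # ordinary residuals

sigma2_hat = np.sum(resid ** 2) / (n - 2)   # unbiased variance estimate
t = resid / np.sqrt(sigma2_hat * (1 - p_ii))  # studentized residuals
```

Dividing each residual by its own estimated standard deviation $\hat{\sigma}\sqrt{1-p_{ii}}$ removes the leverage-dependent heteroscedasticity.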

## Measures based on the residuals

### Residual Sum of Squares

If one sums the squared residuals over all observations, one obtains the residual sum of squares:

${\ displaystyle SQR: = \ sum _ {i = 1} ^ {n} {\ hat {\ varepsilon}} _ {i} ^ {2} = \ sum _ {i = 1} ^ {n} (y_ { i} - {\ hat {y}} _ {i}) ^ {2}}$.

This particular sum of squared deviations appears in many statistical measures, e.g. the coefficient of determination, the F statistic, and various standard errors such as the standard error of the regression. Minimizing the residual sum of squares leads to the least-squares estimator.
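A short sketch relating the residual sum of squares to the coefficient of determination (simulated data; variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = rng.normal(size=n)
y = 1.0 + 3.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta

sqr = np.sum((y - y_hat) ** 2)     # residual sum of squares (SQR)
sqt = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - sqr / sqt          # coefficient of determination
```

With an intercept in the model, $SQR$ can never exceed the total sum of squares, so the coefficient of determination lies between 0 and 1.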