# Residual Sum of Squares

*Figure: the sum of the blue squares is the total sum of squares; the sum of the red squares is the residual sum of squares.*

The residual sum of squares, also called the sum of squared residuals, denotes in statistics the sum of the squared (least squares) residuals (the deviations between the observed values and the predicted values) over all observations. Since squared deviations (here: squared residuals) are formed first and then summed across all observations, it is a sum of squared deviations. The residual sum of squares is a quality criterion for a linear model and describes the inaccuracy of the model. It captures the dispersion of the observed values around the predicted values of the target variable, i.e. the dispersion that cannot be explained by the sample regression line. It is therefore also called the unexplained sum of squared deviations (or, for short, the unexplained sum of squares). In addition to the residual sum of squares, the total sum of squares and the explained sum of squares also play a major role in statistics.

To perform a global F test, the mean squares are of interest. Dividing the residual sum of squares by the residual degrees of freedom yields the mean square of the residuals. The test statistic of a global F test is then the quotient of the "mean square of the explained deviations" and the "mean square of the residuals".
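This quotient can be sketched in a few lines of Python. The function name is hypothetical; the input values are illustrative and consistent with the warship example later in the article (where SQE = SQT − SQR = 574.849 − 44.7405):

```python
# Sketch: forming the global F statistic from sums of squares.
def f_statistic(sqe, sqr, n, k):
    """F = (SQE / k) / (SQR / (n - k - 1)) for a model with k regressors."""
    mqe = sqe / k            # mean square of the explained deviations
    mqr = sqr / (n - k - 1)  # mean square of the residuals
    return mqe / mqr

# Illustrative values (n = 10 observations, k = 1 regressor):
F = f_statistic(sqe=530.1085, sqr=44.7405, n=10, k=1)
```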

## Abbreviations and naming problems

There is no international agreement on the exact name and its abbreviations. The natural German abbreviation for the residual sum of squares, from *Summe der (Abweichungs-)Quadrate der Restabweichungen* (or "residuals"), is SAQ_Rest or SQR. The English abbreviation SSR is ambiguous and leads to persistent confusion: both *sum of squared residuals* and *sum of squares due to regression* (the regression sum of squares) are abbreviated SSR. However, the regression sum of squares is often called the *explained sum of squares*, whose natural abbreviation is SSE. The problem of abbreviations is exacerbated by the fact that the residual sum of squares is often also referred to as the *sum of squares error*, whose natural English abbreviation is likewise SSE (this designation is particularly misleading because the errors and the residuals are different quantities). Furthermore, the English abbreviation RSS is also used for the residual sum of squares instead of SSR, since the term *residual sum of squares* is often used instead of *sum of squared residuals*. This abbreviation can in turn be confused with the regression sum of squares, whose natural English abbreviation is also RSS.

## Definition

The residual sum of squares is defined as the sum of the squares of the residual deviations, or residuals:

$SQR := SQ_{\text{rest}} := \sum_{i=1}^{n}\left(\hat{\varepsilon}_{i} - \underbrace{\overline{\hat{\varepsilon}}}_{=0}\right)^{2} = \sum_{i=1}^{n}\hat{\varepsilon}_{i}^{2} = \sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2}$.

The second equality holds because $\overline{\hat{\varepsilon}} = \frac{1}{n}\sum_{i=1}^{n}\hat{\varepsilon}_{i} = 0$.

### Simple linear regression

In simple linear regression (model with only one explanatory variable), the residual sum of squares can also be expressed as follows:

$SQR = \sum_{i=1}^{n}\hat{\varepsilon}_{i}^{2} = \sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2} = \sum_{i=1}^{n}\bigl(y_{i} - (\hat{\beta}_{0} + \hat{\beta}_{1}x_{i})\bigr)^{2}$

Here $\hat{\varepsilon}_{i} = y_{i} - \hat{y}_{i}$ are the residuals, $\hat{\beta}_{0}$ is the estimate of the intercept, and $\hat{\beta}_{1}$ is the estimate of the slope parameter. The least squares method minimizes exactly this sum of squared residuals (see minimizing the sum of squared errors). A more specific concept is the PRESS statistic, the predictive residual sum of squares.

It can be shown that in simple linear regression the residual sum of squares can also be written as follows (for a proof, see Explained sum of squares § Simple linear regression):

$SQR = SQT \cdot (1 - r_{xy}^{2})$,

where $SQT$ denotes the total sum of squares and $r_{xy}$ the Bravais-Pearson correlation coefficient.
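This identity is easy to check numerically. A minimal sketch, using a small made-up data set (all values here are illustrative, not from the article):

```python
# Check SQR = SQT * (1 - r_xy^2) for simple linear regression.
from math import fsum, sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 9.9]
n = len(x)
xbar, ybar = fsum(x) / n, fsum(y) / n

sxx = fsum((xi - xbar) ** 2 for xi in x)
syy = fsum((yi - ybar) ** 2 for yi in y)  # total sum of squares SQT
sxy = fsum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx           # slope estimate
b0 = ybar - b1 * xbar    # intercept estimate
sqr = fsum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)  # Bravais-Pearson correlation coefficient
assert abs(sqr - syy * (1 - r ** 2)) < 1e-9
```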

### Multiple linear regression

In multiple linear regression, the ordinary residuals obtained by the least squares estimation are given by

$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\mathbf{b}$,

where $\mathbf{b} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$ is the least squares estimate vector. The residual sum of squares results from the product of the transposed residual vector $\hat{\boldsymbol{\varepsilon}}^{\top}$ with the residual vector $\hat{\boldsymbol{\varepsilon}}$:

$SQR = \sum_{i=1}^{n}\hat{\varepsilon}_{i}^{2} = \hat{\boldsymbol{\varepsilon}}^{\top}\hat{\boldsymbol{\varepsilon}} = (\mathbf{y} - \mathbf{X}\mathbf{b})^{\top}(\mathbf{y} - \mathbf{X}\mathbf{b}) = \sum_{i=1}^{n}(y_{i} - \hat{\beta}_{0} - \hat{\beta}_{1}x_{i1} - \hat{\beta}_{2}x_{i2} - \ldots - \hat{\beta}_{k}x_{ik})^{2}$.
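The matrix computation can be sketched with NumPy. The data below are synthetic (random design matrix and coefficients chosen only for illustration); the sketch also checks the alternative representation $\mathbf{y}^{\top}\mathbf{y} - \mathbf{b}^{\top}\mathbf{X}^{\top}\mathbf{y}$:

```python
import numpy as np

# Toy sketch: b = (X^T X)^{-1} X^T y and SQR = (y - Xb)^T (y - Xb).
rng = np.random.default_rng(0)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimate vector
resid = y - X @ b                      # ordinary residuals epsilon-hat
sqr = resid @ resid                    # epsilon-hat^T epsilon-hat

# The alternative representation y^T y - b^T X^T y yields the same value:
assert np.isclose(sqr, y @ y - b @ (X.T @ y))
```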

Alternatively, it can also be written as:

$SQR = \mathbf{y}^{\top}\mathbf{y} - \mathbf{b}^{\top}\mathbf{X}^{\top}\mathbf{y} = \mathbf{y}^{\top}\mathbf{y} - \mathbf{b}^{\top}\mathbf{X}^{\top}\hat{\mathbf{y}}$

The residual sum of squares can also be represented by means of the residual-generating matrix $\mathbf{Q} = \mathbf{I} - \mathbf{X}(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}$ as:

$SQR = \hat{\boldsymbol{\varepsilon}}^{\top}\hat{\boldsymbol{\varepsilon}} = \boldsymbol{\varepsilon}^{\top}\left(\mathbf{I} - \mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\right)\boldsymbol{\varepsilon} = \boldsymbol{\varepsilon}^{\top}\mathbf{Q}\boldsymbol{\varepsilon}$.

This shows that the residual sum of squares is a quadratic form in the theoretical disturbance variables. An alternative representation is as a quadratic form in the $y$ values:

$SQR = \mathbf{y}^{\top}\left(\mathbf{I} - \mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\right)\mathbf{y} = \mathbf{y}^{\top}\mathbf{Q}\mathbf{y}$.
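The properties of $\mathbf{Q}$ that make this work (symmetry and idempotence of the projection) can be verified on synthetic data; the design matrix and seed below are arbitrary illustrations:

```python
import numpy as np

# Sketch: the residual-generating matrix Q = I - X (X^T X)^{-1} X^T.
rng = np.random.default_rng(1)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)

Q = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
assert np.allclose(Q, Q.T)    # Q is symmetric
assert np.allclose(Q @ Q, Q)  # ... and idempotent (a projection matrix)

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
assert np.isclose(y @ Q @ y, resid @ resid)  # SQR as a quadratic form in y
```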

## Calculation example

*Figure: scatter plot of the lengths and widths of ten randomly selected warships.*

The following example illustrates the calculation of the residual sum of squares. Ten warships were randomly selected (see warship data) and their length and width (in meters) recorded. It is to be examined whether the width of a warship stands in a fixed relation to its length.

The scatter plot suggests a linear relationship between the length and width of a ship. A simple linear regression using least squares estimation yields the intercept $\hat{\beta}_{0} = -8.6450715$ and the slope $\hat{\beta}_{1} = 0.1612340$ (for the calculation of the regression parameters, see the example with a best-fit straight line). The estimated regression line is thus

$\widehat{\mathtt{width}} = -8.6450715 + 0.1612340 \cdot \mathtt{length}$.

The equation gives the estimated width $\hat{y} = \widehat{\mathtt{width}}$ as a function of the length $x = \mathtt{length}$. The function shows that the width of the selected warships is roughly one sixth of their length.

| Warship $i$ | Length $x_i$ (m) | Width $y_i$ (m) | $y_i - \overline{y}$ | $(y_i - \overline{y})^2$ | $\hat{y}_i$ | $\hat{\varepsilon}_i = y_i - \hat{y}_i$ | $\hat{\varepsilon}_i^2$ |
|---|---|---|---|---|---|---|---|
| 1 | 208 | 21.6 | 3.19 | 10.1761 | 24.8916 | −3.2916 | 10.8347 |
| 2 | 152 | 15.5 | −2.91 | 8.4681 | 15.8625 | −0.3625 | 0.1314 |
| 3 | 113 | 10.4 | −8.01 | 64.1601 | 9.5744 | 0.8256 | 0.6817 |
| 4 | 227 | 31.0 | 12.59 | 158.5081 | 27.9550 | 3.0450 | 9.2720 |
| 5 | 137 | 13.0 | −5.41 | 29.2681 | 13.4440 | −0.4440 | 0.1971 |
| 6 | 238 | 32.4 | 13.99 | 195.7201 | 29.7286 | 2.6714 | 7.1362 |
| 7 | 178 | 19.0 | 0.59 | 0.3481 | 20.0546 | −1.0546 | 1.1122 |
| 8 | 104 | 10.4 | −8.01 | 64.1601 | 8.1233 | 2.2767 | 5.1835 |
| 9 | 191 | 19.0 | 0.59 | 0.3481 | 22.1506 | −3.1506 | 9.9265 |
| 10 | 130 | 11.8 | −6.61 | 43.6921 | 12.3154 | −0.5154 | 0.2656 |
| Σ | 1678 | 184.1 | | 574.8490 | | 0.0000 | 44.7405 |
| Σ/n | 167.8 | 18.41 | | 57.48490 | | 0.0000 | 4.47405 |

In addition to the total sum of squares of the measured values, $SQT = 574.849\;\text{m}^{2}$, the residual sum of squares $SQR = 44.7405\;\text{m}^{2}$ can be read off from the last column of the table. From these two quantities the coefficient of determination can also be calculated (see also Coefficient of determination § Calculation example).
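The example can be reproduced in plain Python from the data in the table:

```python
# Reproducing the warship example (data taken from the table above).
x = [208, 152, 113, 227, 137, 238, 178, 104, 191, 130]            # length (m)
y = [21.6, 15.5, 10.4, 31.0, 13.0, 32.4, 19.0, 10.4, 19.0, 11.8]  # width (m)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx         # slope, approx. 0.1612340
b0 = ybar - b1 * xbar  # intercept, approx. -8.6450715

sqt = sum((yi - ybar) ** 2 for yi in y)                        # approx. 574.8490
sqr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # approx. 44.7405
```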

## Properties of the residual sum of squares

### Distribution of the residual sum of squares

If the observations are multivariate normally distributed, then the quotient of the residual sum of squares $SQR$ and the disturbance variance $\sigma^{2}$ follows a chi-square distribution with $n-p$ (where $p = k+1$) degrees of freedom:

$\frac{SQR}{\sigma^{2}} = \frac{\hat{\boldsymbol{\varepsilon}}^{\top}\hat{\boldsymbol{\varepsilon}}}{\sigma^{2}} = (n-p)\frac{\hat{\sigma}^{2}}{\sigma^{2}} \sim \chi^{2}(n-p)$,

where $\hat{\sigma}^{2}$ represents the unbiased estimate of the variance of the disturbance variables.

### Expected value of the residual sum of squares

It can be shown that the expected value of the residual sum of squares is $\sigma^{2}(n-k-1)$:

$\operatorname{E}(\hat{\boldsymbol{\varepsilon}}^{\top}\hat{\boldsymbol{\varepsilon}}) = \operatorname{E}\left(\boldsymbol{\varepsilon}^{\top}\left(\mathbf{I} - \mathbf{X}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\right)\boldsymbol{\varepsilon}\right) = \sigma^{2}(n-k-1)$,

where $(n-k-1)$ is the number of degrees of freedom of the residual sum of squares and $\sigma^{2}$ is the disturbance variance. From this it can be concluded that the unbiased estimator for the unknown scalar disturbance variance must be given by $\hat{\boldsymbol{\varepsilon}}^{\top}\hat{\boldsymbol{\varepsilon}}/(n-k-1)$.
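A quick Monte Carlo sketch makes the expected value plausible; the design matrix, coefficients, noise level, and seed below are arbitrary illustrative choices:

```python
import numpy as np

# Monte Carlo sketch of E[SQR] = sigma^2 * (n - k - 1).
rng = np.random.default_rng(42)
n, k, sigma = 30, 2, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, -2.0, 0.5])

sqrs = []
for _ in range(2000):
    y = X @ beta + rng.normal(scale=sigma, size=n)   # fresh disturbances
    b = np.linalg.solve(X.T @ X, X.T @ y)            # least squares estimate
    resid = y - X @ b
    sqrs.append(resid @ resid)

mean_sqr = float(np.mean(sqrs))      # should be close to sigma^2 * (n - k - 1)
expected = sigma ** 2 * (n - k - 1)  # here: 2.25 * 27 = 60.75
```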

## Mean residual square

Dividing the residual sum of squares by its number of degrees of freedom yields, as a mean squared deviation, the mean residual square (from the German *Mittleres Quadrat der Residuen*, MQR for short):

$MQR = \frac{\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2}}{n-k-1} = \frac{SQR}{n-k-1}$.

The square root of the mean residual square is the standard error of the regression. In simple linear regression, which models the relationship between the explanatory variable and the target variable with two regression parameters, the mean residual square is given by

$MQR = \frac{\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2}}{n-2} = \frac{SQR}{n-2}$.
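For the warship example above ($n = 10$, $SQR = 44.7405$), the mean residual square and the standard error of the regression work out to:

```python
from math import sqrt

# Mean residual square and standard error of the regression for the
# warship example (simple linear regression, n = 10).
sqr = 44.7405        # residual sum of squares from the table
n = 10
mqr = sqr / (n - 2)  # mean residual square, approx. 5.5926
se = sqrt(mqr)       # standard error of the regression, approx. 2.3649
```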

## Weighted residual sum of squares

In generalized least squares estimation and other applications, a weighted version of the residual sum of squares is often used:

$GSQR = \sum_{i=1}^{n}\frac{1}{w_{i}}(y_{i} - \mathbf{x}_{i}^{\top}\boldsymbol{\beta})^{2} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\top}\,\mathbf{W}^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \quad \text{with} \quad \mathbf{W} = \operatorname{diag}(w_{1},\ldots,w_{n})$,

where $\mathbf{W} = \operatorname{diag}(w_{1},\ldots,w_{n})$ represents the weight matrix.
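The equivalence of the sum form and the matrix form can be sketched on made-up data; the data, parameter vector, and weights below are illustrative only:

```python
import numpy as np

# Sketch: weighted residual sum of squares with W = diag(w_1, ..., w_n).
rng = np.random.default_rng(7)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 1.5])
y = X @ beta + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)  # per-observation weights w_i

resid = y - X @ beta
gsqr = np.sum(resid ** 2 / w)      # sum_i (1 / w_i) * (y_i - x_i^T beta)^2

# Matrix form (y - X beta)^T W^{-1} (y - X beta) gives the same value:
W = np.diag(w)
assert np.isclose(gsqr, resid @ np.linalg.inv(W) @ resid)
```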

## Penalized residual sum of squares

In the context of penalized splines (P-splines for short), a so-called penalized residual sum of squares is used, which roughly corresponds to the usual residual sum of squares.
