Total sum of squares

Figure: The sum of the blue squared deviations (the squared deviations of the measured values from their mean) is the total sum of squares.

In statistics, and particularly in regression analysis, the total sum of squares (German: Summe der Quadrate der Totalen Abweichungen, abbreviated SQT; English: sum of squared total deviations, SST, or total sum of squares, TSS), also called the total sum of squared deviations and denoted $SAQ_y$ (for Summe der Abweichungsquadrate of the $y$ values) or $SAQ_{\text{Total}}$, is the sum of squares of the dependent variable. It is calculated as the sum of the squared centered measured values of the dependent variable and can be interpreted as the "total variation" of the dependent variable. In the context of the decomposition of the sum of squares, the total sum of squares is also referred to as the sum of squares to be explained. There is no international agreement on the exact name and its abbreviations; in German-language literature, the German name is sometimes used with the English abbreviations.

Definition

The total sum of squares is calculated as the sum of the squared total deviations (the deviations of the measured values $y_i$ from their arithmetic mean $\bar{y}$):

$SQT = \sum_{i=1}^{n} (y_i - \bar{y})^2,$

where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ denotes the arithmetic mean. The total sum of squares captures the "total variation" in the dependent variable. Dividing the total sum of squares by the number of degrees of freedom $(n-1)$ yields the total variance as the empirical variance:

$s_y^2 = \frac{SQT}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2.$

The name total variance comes from the fact that the “total variance” can be broken down into the “explained variance” and the “residual variance”.
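
As a brief numerical illustration, a minimal sketch using NumPy (the sample values are invented purely for demonstration) computes the total sum of squares and the total variance directly from the definition:

```python
import numpy as np

# Hypothetical sample of the dependent variable (values chosen only for illustration)
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

y_bar = y.mean()                      # arithmetic mean
sqt = np.sum((y - y_bar) ** 2)        # total sum of squares (SQT / TSS)
total_variance = sqt / (len(y) - 1)   # empirical variance = SQT / degrees of freedom

print(sqt, total_variance)            # 10.0 and 2.5 for this sample
```

For this sample the result agrees with np.var(y, ddof=1), which computes the empirical variance with $(n-1)$ degrees of freedom.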

Decomposition of the total sum of squares

Animation: the decomposition of dispersion, i.e. the decomposition of the total sum of squares into the explained sum of squares (the portion of the total variation that can be explained by the regression) and the residual sum of squares. It can also be seen that the regression line obtained by least squares estimation passes through the "center of gravity" $(\bar{x}, \bar{y})$ of the point cloud in the scatter plot (see also the algebraic properties of the least squares estimator).

The sum of squares decomposition, also known as the decomposition of the sum of squared deviations, decomposition of the total sum of squares, or dispersion decomposition, describes a decomposition of the total sum of squared deviations. Consider a multiple or simple linear regression model with an intercept, based on a sample of $n$ observations. The total sum of squares

$SQT = \sum_{i=1}^{n} (y_i - \bar{y})^2$

can then be broken down into the explained sum of squares

$SQE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

and the residual sum of squares

$SQR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$:

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$

which is equivalent to

$SQT = SQE + SQR$ or $TSS = ESS + RSS$.

The sum of squares decomposition, or decomposition of dispersion, thus states that the "total variation in $y_i$" is the sum of the "total variation in $\hat{y}_i$" and the "total variation in $\hat{\varepsilon}_i$".
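
The decomposition can also be checked numerically. The following sketch assumes an ordinary least squares fit with an intercept (via NumPy's polyfit) and uses invented data for illustration; it verifies that $SQT = SQE + SQR$:

```python
import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Ordinary least squares fit with intercept: y_hat = b0 + b1 * x
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
residuals = y - y_hat

y_bar = y.mean()
sqt = np.sum((y - y_bar) ** 2)        # total sum of squares
sqe = np.sum((y_hat - y_bar) ** 2)    # explained sum of squares
sqr = np.sum(residuals ** 2)          # residual sum of squares

print(np.isclose(sqt, sqe + sqr))     # True: SQT = SQE + SQR
```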

Proof

Adding and subtracting $\hat{y}_i$ inside the square and expanding gives

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^{n} (y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$

The cross term vanishes using the property that the residuals are uncorrelated with the predicted values, i.e. $\sum_{i=1}^{n} \hat{\varepsilon}_i \hat{y}_i = 0$. This uncorrelatedness of the fitted values with the residuals can be interpreted to mean that all relevant information of the explanatory variables with regard to the dependent variable is already contained in the prediction. In addition, the property that the sum, and thus the arithmetic mean, of the residuals is zero if the model contains an intercept, i.e. $\sum_{i=1}^{n} \hat{\varepsilon}_i = 0$, was used (see statistical properties of least squares estimators). Together these yield $SQT = SQR + SQE$, so the sum-of-squares decomposition can be interpreted as a "decomposition of dispersion".
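
These two orthogonality properties can likewise be checked numerically. The following sketch repeats the hypothetical least squares fit with an intercept from above (data invented for illustration; this is a numerical check, not a formal proof):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1, b0 = np.polyfit(x, y, 1)          # OLS fit with intercept
y_hat = b0 + b1 * x
residuals = y - y_hat

# Sum of residuals is (numerically) zero when the model has an intercept
print(np.isclose(residuals.sum(), 0.0))           # True
# Residuals are orthogonal to the fitted values
print(np.isclose(np.dot(residuals, y_hat), 0.0))  # True
```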

The ratio of the explained sum of squares to the total sum of squares is called the coefficient of determination. The residual sum of squares is also called the residual sum (or unexplained sum of squares). Various statistical analysis methods, such as regression analysis, try to find a model that explains the observed values better than their mean value does.
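
In the notation used above, this ratio reads

$R^2 = \frac{SQE}{SQT} = 1 - \frac{SQR}{SQT},$

and because of the decomposition $SQT = SQE + SQR$ (for a model with an intercept) it lies between zero and one.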
