Sum of squares of deviations
In statistics, the sum of squared deviations (German: Abweichungsquadratsumme, abbreviated SAQ; English: sum of squared deviations, short SSD), also called the deviation sum of squares or simply the sum of squares (SQ, Q, or English sum of squares, short SS), is the sum of the squared deviations of the measured values from their arithmetic mean. It measures the strength of the fluctuation of the measured values around their mean and is thus a measure of the "variation" of a feature.

A generalization of the sum of squared deviations is the sum of deviation products (SAP; English: sum of products of deviations, short SPD), also called the deviation product sum, sum of products (SP), or product sum (rarely: sum of cross products or cross product sum). It measures the degree of common variation (the "covariation") of two features $x$ and $y$ and is the sum of the products of the mean-adjusted pairs of measured values.

The "mean squared deviation" of the measured values is called the empirical variance, and the "mean deviation product" of the pairs of measured values is called the empirical covariance: the sum of squares is the numerator of the empirical variance, and the sum of products is the numerator of the empirical covariance. Both quantities enter a variety of statistical measures, e.g. the coefficient of determination and the Bravais-Pearson correlation coefficient. The shift theorem provides important calculation rules for the sum of squares and the sum of products. Sums of squares that are important in statistics are the total sum of squares and the explained sum of squares. Another important sum of squares is the residual sum of squares (SQR, from the German Summe der Quadrate der Restabweichungen, i.e. of the "residuals"; English: sum of squared residuals, short SSR), which plays a large role in the method of least squares.
Definition
The sum of squared deviations is the sum of the squared deviations of the measured values $x_1, x_2, \ldots, x_n$ from their arithmetic mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$:

$$SQ_x = \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 .$$
Alternatively, by the shift theorem (Verschiebungssatz, also known as Steiner's theorem), the sum of squared deviations can be written as

$$SQ_x = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 ,$$

which follows by expanding the square and using $\sum_{i=1}^{n} x_i = n\bar{x}$.
To distinguish it more clearly from the sum of products, the sum of squares is also written $SQ_{xx}$. In applications, especially in the analysis of variance, the notation $SQ$ is preferred.
If the feature shows no variability, i.e. $x_1 = x_2 = \ldots = x_n = \bar{x}$, then the sum of squares (and hence the variance) is zero. A total of $n$ squared deviations enter the calculation of the sum, so the sum of squared deviations is larger the larger the sample size.
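The following minimal Python sketch (not from the source; the sample values are made up) illustrates the definition and confirms numerically that the shift theorem gives the same result:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample
x_bar = x.mean()
n = len(x)

# Sum of squared deviations, directly from the definition
sq_direct = np.sum((x - x_bar) ** 2)

# The same quantity via the shift theorem: sum(x_i^2) - n * x_bar^2
sq_shift = np.sum(x ** 2) - n * x_bar ** 2

print(sq_direct, sq_shift)  # 32.0 32.0
```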
Mean square of deviation
In order to obtain a measure of the variation of the characteristic values that is independent of the sample size, a normalization must be carried out. The normalization is done by dividing the sum of squares by the number of degrees of freedom $(n-1)$:
$$MQ_x = \frac{SQ_x}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 .$$
The dispersion measure obtained in this way represents a kind of "mean" or "average" squared deviation (English: mean square, short MS), abbreviated $MQ$ or $DQ$ (from the German Mittleres Quadrat or Durchschnittliches Quadrat der Abweichungen). The "mean square of deviation" (often incorrectly called the "mean sum of squares") is the empirical variance; in the analysis of variance, however, it is referred to not as the variance but as the mean square of deviation. The sum of squares is therefore $(n-1)$ times the empirical variance of the measured values. The mean square of the residuals is called the mean residual square.
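As a quick numerical check (again a sketch with made-up data), the mean square of deviation agrees with the empirical variance as computed by NumPy's np.var with ddof=1, which divides by $n-1$:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

sq = np.sum((x - x.mean()) ** 2)  # sum of squared deviations
mq = sq / (len(x) - 1)            # mean square of deviation

# np.var with ddof=1 divides by (n - 1), i.e. it computes the empirical variance
print(mq, np.var(x, ddof=1))      # both print 4.571428571428571
```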
Generalization
The sum of deviation products (deviation product sum) is a generalization of the sum of squared deviations. It is defined as the sum of the products of the mean-adjusted pairs of measured values $(x_i, y_i)$:

$$SP_{xy} = \sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) .$$
In particular, $SP_{xx} = SQ_x$. The empirical covariance is the sum of the deviation products of the measured values of $x$ and $y$, divided by $(n-1)$:

$$s_{xy} = \frac{SP_{xy}}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) .$$
The empirical covariance can thus be interpreted as the “mean” or “average” deviation product.
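A minimal sketch with made-up paired data: by default, NumPy's np.cov normalizes by $n-1$, so its off-diagonal entry reproduces this "average" deviation product.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up pairs of measured values
y = np.array([2.0, 2.0, 4.0, 5.0, 7.0])

sp = np.sum((x - x.mean()) * (y - y.mean()))  # sum of deviation products SP_xy
s_xy = sp / (len(x) - 1)                      # empirical covariance

# np.cov returns the covariance matrix; the off-diagonal entry is s_xy
print(s_xy, np.cov(x, y)[0, 1])  # both print 3.25
```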
Special sums of squares
Residual Sum of Squares
Based on the residuals $\hat{\varepsilon}_i = y_i - \hat{y}_i$, which measure the vertical distance between an observation point and the estimated regression line, the residual sum of squares can be defined as the sum of the squared residuals:

$$SQR = \sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 .$$
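The following sketch (made-up data; np.polyfit fits the least-squares line, which is one standard way to obtain the fitted values $\hat{y}_i$) computes the residual sum of squares:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares regression line
y_hat = slope * x + intercept               # fitted values on the line

residuals = y - y_hat            # vertical distances to the regression line
sqr = np.sum(residuals ** 2)     # residual sum of squares SQR
print(sqr)
```

NumPy's np.polyfit(x, y, deg=1, full=True) also reports this sum of squared residuals among its diagnostics, which can serve as a cross-check.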
Hypothesis Sum of Squares
The hypothesis sum of squares (English: sum of squares due to hypothesis) occurs when testing the general linear hypothesis $H_0\colon R\beta = r$. Let $R$ be a $q \times p$ restriction matrix with $\operatorname{rank}(R) = q$. Assume further that the $q$ restrictions on the parameter vector $\beta$ can be expressed as $R\beta = r$, where $r$ is a $q \times 1$ vector of known constants. The hypothesis sum of squares is then given by

$$SQH = \left(R\hat{\beta} - r\right)^{\top}\left[R\left(X^{\top}X\right)^{-1}R^{\top}\right]^{-1}\left(R\hat{\beta} - r\right) ,$$

where $\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y$ is the least squares estimator and $X$ the design matrix.
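A minimal numerical sketch of the quadratic form above, with a made-up design matrix and a single restriction ($\beta_1 = 0$); all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # design matrix
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y     # least squares estimator

R = np.array([[0.0, 1.0, 0.0]])  # q = 1 restriction: beta_1 = 0
r = np.array([0.0])

d = R @ beta_hat - r
sqh = d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d  # hypothesis sum of squares
print(sqh)
```

Under the classical normal linear model, dividing $SQH$ by $q$ and by the mean residual square yields the usual $F$ statistic for the hypothesis.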
Remarks
- The common variation of two or more characteristics is called "covariation".
- Werner Timischl: Applied Statistics. An Introduction for Biologists and Medical Professionals. 3rd edition, 2013, p. 109.
- Lothar Sachs: Statistical Evaluation Methods, p. 400.
- Ludwig von Auer: Econometrics. An Introduction. 6th revised and updated edition, Springer, 2013, ISBN 978-3-642-40209-8, p. 46.
- Werner Timischl: Applied Statistics. An Introduction for Biologists and Medical Professionals. 3rd edition, 2013, p. 335.
- Lothar Sachs, Jürgen Hedderich: Applied Statistics: Collection of Methods with R. 8th revised and extended edition, Springer Spektrum, Berlin/Heidelberg 2018, ISBN 978-3-662-56657-2, p. 404.
- Jeffrey Marc Wooldridge: Introductory Econometrics: A Modern Approach. 4th edition, Nelson Education, 2015, p. 810.