Homoscedasticity and heteroscedasticity

from Wikipedia, the free encyclopedia
Heteroscedasticity: the scatter of the points around the straight line increases towards the right.

Heteroscedasticity (also variance heterogeneity or heteroskedasticity; from ancient Greek σκεδαστός skedastós, "scattered", "distributed", "scatterable") means in statistics that the variance of the error terms (disturbances) is not constant. If the variance of the error terms (and thus the variance of the dependent variable itself) does not differ significantly across all values of the exogenous (predictor) variables, homoscedasticity (homogeneity of variance) is present. The term plays an important role especially in econometrics and empirical research. The homoscedasticity assumption is an important part of the Gauss-Markov assumptions.

Homoscedasticity and heteroscedasticity

Homoscedasticity: The spread of the points around the straight line in the vertical direction is constant.

The distribution of variables plays a decisive role in statistics. In regression analysis, for example, a straight line is fitted as closely as possible to a given set of data points. The deviations of the data points from the line are called error terms (disturbances) or residuals and, in probability-theoretic terms, each of them is a random variable. Homoscedasticity and heteroscedasticity describe how these error terms are distributed, as measured by their variance. If all error terms have the same variance, there is homogeneity of variance (i.e., homoscedasticity):

$$\operatorname{Var}(\varepsilon_i) = \sigma^2 = \text{const.} \quad \text{for all } i = 1, \dots, n,$$

or, conditional on the explanatory variable,

$$\operatorname{Var}(\varepsilon_i \mid x_i) = \sigma^2 \quad \text{for all } i = 1, \dots, n.$$
Heteroscedasticity: The spread of the points around the straight line increases more than linearly towards the right.

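Heteroscedasticity, on the other hand, means that the variance of the error terms is not constant but depends on the explanatory variable:

$$\operatorname{Var}(\varepsilon_i \mid x_i) = \sigma_i^2, \quad \text{where } \sigma_i^2 \text{ varies with } x_i.$$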

In this case the error terms do not share a common variance, and the ordinary least squares (OLS) method no longer produces efficient estimates of the regression coefficients. The estimates remain unbiased (provided the other Gauss-Markov assumptions hold), but they no longer have the smallest possible variance. In addition, the usual standard errors of the regression coefficients are biased, so the t-test cannot be applied naively; the reported t-values are no longer reliable. In many cases a suitable data transformation provides a remedy: if heteroscedasticity is present, it can make good sense to transform the data using the logarithm or the square root in order to achieve homoscedasticity. The Gauss-Markov theorem then applies again.
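As a minimal sketch of the logarithm remedy, the snippet below simulates data under an assumed multiplicative error model, y = 2·x·exp(ε) with ε ~ N(0, 0.3²), which is not taken from the text. It compares the residual spread above and below x = 50 before and after taking logarithms of both variables. The helper names `ols_residuals` and `spread_ratio` are hypothetical.

```python
import math
import random
import statistics

def ols_residuals(x, y):
    """Residuals (observed minus fitted) of an OLS fit y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def spread_ratio(x, res, cut):
    """Residual SD for x >= cut divided by residual SD for x < cut."""
    low = [r for xi, r in zip(x, res) if xi < cut]
    high = [r for xi, r in zip(x, res) if xi >= cut]
    return statistics.stdev(high) / statistics.stdev(low)

rng = random.Random(0)  # fixed seed for reproducibility
x = [rng.uniform(1, 100) for _ in range(2000)]
# Assumed multiplicative error: the spread of y grows proportionally with x.
y = [2 * xi * math.exp(rng.gauss(0, 0.3)) for xi in x]

# Raw scale: fit y on x directly.
raw_ratio = spread_ratio(x, ols_residuals(x, y), cut=50)
# Log scale: log y = log 2 + log x + eps, so the error variance is constant.
log_x = [math.log(xi) for xi in x]
log_y = [math.log(yi) for yi in y]
log_ratio = spread_ratio(x, ols_residuals(log_x, log_y), cut=50)
print(f"raw scale: {raw_ratio:.2f}, log scale: {log_ratio:.2f}")
```

On the raw scale the ratio should come out well above 1, while on the log scale it should be close to 1. The logarithm works here because the assumed error is multiplicative; other patterns of heteroscedasticity may call for a different transformation.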

In practice, heteroscedasticity occurs when the dispersion of the dependent variable depends on the magnitude of the explanatory variable. For example, spending on vacations is likely to vary more widely among households with a higher monthly disposable income.
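The vacation-spending example can be sketched as a small simulation. All numbers are assumptions for illustration only: income is drawn uniformly between 1,000 and 6,000, mean spending is 10% of income, and the error's standard deviation is 5% of income. A straight line is fitted by OLS, and the residual spread is compared for lower and higher incomes.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def residuals(x, y):
    """Observed minus fitted values of the OLS line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

rng = random.Random(42)  # fixed seed for reproducibility
# Hypothetical monthly disposable income.
income = [rng.uniform(1000, 6000) for _ in range(2000)]
# Assumed model: the error SD is proportional to income (heteroscedastic).
spending = [0.10 * inc + rng.gauss(0, 0.05 * inc) for inc in income]

res = residuals(income, spending)
low = [r for inc, r in zip(income, res) if inc < 3500]
high = [r for inc, r in zip(income, res) if inc >= 3500]
print(f"residual SD, income <  3500: {statistics.stdev(low):.1f}")
print(f"residual SD, income >= 3500: {statistics.stdev(high):.1f}")
```

Under these assumptions the fitted slope should land near 0.10, while the residual standard deviation for the higher-income group should be roughly twice that of the lower-income group.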

Consequences of heteroscedasticity for linear regression

Examples

Heteroscedasticity in time series

A typical example of heteroscedasticity is a time series whose deviations from the trend line grow over time (e.g. the accuracy of weather forecasts: the further into the future, the less likely an accurate forecast is). Even in time series without constant variance, however, certain characteristic patterns such as volatility clusters can be observed. Volatility models therefore attempt to explain the evolution of the variance systematically.
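The text does not name a specific volatility model. As one well-known example, the ARCH(1) model makes the conditional variance depend on the previous squared deviation, σ²_t = ω + α·e²_{t-1}. The sketch below simulates such a process with assumed parameters ω = 0.2 and α = 0.25; the function names are hypothetical.

```python
import random
import statistics

def simulate_arch1(n, omega, alpha, seed=1):
    """Simulate ARCH(1): e_t = sigma_t * z_t with
    sigma_t^2 = omega + alpha * e_{t-1}^2 and z_t ~ N(0, 1)."""
    rng = random.Random(seed)
    e_prev = 0.0
    out = []
    for _ in range(n):
        sigma2 = omega + alpha * e_prev ** 2  # variance depends on the past
        e_prev = (sigma2 ** 0.5) * rng.gauss(0, 1)
        out.append(e_prev)
    return out

def lag1_autocorr(v):
    """Sample autocorrelation at lag 1."""
    m = statistics.fmean(v)
    num = sum((a - m) * (b - m) for a, b in zip(v[:-1], v[1:]))
    den = sum((a - m) ** 2 for a in v)
    return num / den

series = simulate_arch1(20000, omega=0.2, alpha=0.25)
squares = [e * e for e in series]
print(f"lag-1 autocorrelation of e_t:   {lag1_autocorr(series):.3f}")
print(f"lag-1 autocorrelation of e_t^2: {lag1_autocorr(squares):.3f}")
```

The deviations themselves are essentially uncorrelated, but their squares are positively autocorrelated (theoretically at lag 1 with value α): large deviations tend to follow large ones, which is what a volatility cluster looks like.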

Heteroscedasticity in linear regression

Linear regression and residual plot for the Boston Housing data.

Heteroscedasticity can occur even in simple linear regression. This is a problem because classical linear regression analysis assumes that the residuals are homoscedastic. The figure shows the average number of rooms per dwelling (X) and the median purchase price per house (Y) for (almost) every district in Boston (Boston Housing data). The linear regression plot shows the relationship between the two variables. The red line shows the residual for one observation on the right, i.e. the difference between the observed value (circle) and the value estimated by the regression line.

The heteroscedastic residuals plot shows the residuals for all observations. The scatter of the residuals in the range of 4–5 rooms and above 7.5 rooms is greater than in the range of 5–7.5 rooms. The spread of the residuals therefore differs between ranges, i.e. the residuals are heteroscedastic. If the spread of the residuals were the same in all ranges, they would be homoscedastic.
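The visual comparison of ranges can be made numeric by computing the residual standard deviation per bin of the predictor. The helper `binned_residual_sd` below is hypothetical, and since the Boston Housing data are not bundled here, the usage example feeds it a few made-up residuals. The bin edges follow the ranges in the text; the upper edge of 9 rooms is an assumption.

```python
import statistics

def binned_residual_sd(x, residuals, edges):
    """Hypothetical helper: for each bin [edges[k], edges[k+1]) of the
    predictor x, return ((lo, hi), count, SD of the residuals in the bin)."""
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [r for xi, r in zip(x, residuals) if lo <= xi < hi]
        # The sample SD needs at least two values.
        sd = statistics.stdev(in_bin) if len(in_bin) >= 2 else float("nan")
        result.append(((lo, hi), len(in_bin), sd))
    return result

# Made-up example data (not the Boston Housing values).
rooms = [4.2, 4.5, 4.8, 5.5, 6.0, 6.5, 7.0, 8.0, 8.5]
res = [-6, 0, 6, -1, 0, 1, 0, -5, 5]

for (lo, hi), count, sd in binned_residual_sd(rooms, res, [4, 5, 7.5, 9]):
    print(f"{lo}-{hi} rooms: n={count}, residual SD={sd:.2f}")
```

With these illustrative numbers the middle bin has a much smaller residual spread than the two outer bins, mirroring the pattern described for the plot.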

Test procedure

Well-known methods for testing the null hypothesis “homoscedasticity is present” are the Goldfeld–Quandt test, the White test, the Levene test, the Glejser test, Ramsey's RESET test and the Breusch–Pagan test.
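To illustrate one of these procedures, here is a minimal sketch of the Breusch–Pagan test in Koenker's studentized form for a single regressor: the squared OLS residuals are regressed on x, and LM = n·R² of that auxiliary regression is, under the null hypothesis, approximately χ²-distributed with one degree of freedom. The data are simulated under assumed models and the function names are hypothetical; in practice an established implementation such as `het_breuschpagan` in statsmodels would usually be preferred.

```python
import math
import random
import statistics

def ols(x, y):
    """OLS fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def r_squared(x, y):
    """Coefficient of determination of the OLS fit of y on x."""
    a, b = ols(x, y)
    my = statistics.fmean(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def breusch_pagan_1d(x, y):
    """Koenker's studentized Breusch-Pagan test with one regressor.
    Returns (LM statistic, p-value) with LM = n * R^2 ~ chi^2(1) under H0."""
    a, b = ols(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    lm = max(0.0, len(x) * r_squared(x, e2))  # guard against rounding below 0
    p_value = math.erfc(math.sqrt(lm / 2))  # P(chi^2 with 1 df > lm)
    return lm, p_value

rng = random.Random(7)
x = [rng.uniform(0, 10) for _ in range(500)]
# Assumed models: constant error SD vs. error SD growing with x.
y_homo = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
y_hetero = [1 + 2 * xi + rng.gauss(0, 0.3 + 0.5 * xi) for xi in x]

lm_homo, p_homo = breusch_pagan_1d(x, y_homo)
lm_het, p_het = breusch_pagan_1d(x, y_hetero)
print(f"homoscedastic:   LM={lm_homo:.2f}, p={p_homo:.3g}")
print(f"heteroscedastic: LM={lm_het:.2f}, p={p_het:.3g}")
```

The p-value uses the identity P(χ²₁ > s) = erfc(√(s/2)), which avoids a dependency on a statistics library. For the heteroscedastic data the null hypothesis of homoscedasticity should be rejected clearly.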

Literature

  • J. Wooldridge: Introductory Econometrics. A Modern Approach. 5th edition. Mason, Ohio 2013, ISBN 978-1-111-53439-4.
  • M.-W. Stoetzer: Regression Analysis in Empirical Economic and Social Research. Volume 1: A Non-Mathematical Introduction with SPSS and Stata. Berlin 2017, ISBN 978-3-662-53823-4, pp. 135–147.
