Homoscedasticity and heteroscedasticity
Heteroscedasticity (also heterogeneity of variance; from Ancient Greek σκεδαστός skedastós, "scattered", "dispersed") means in statistics that the variance of the disturbance terms is not constant. If the variance of the disturbance terms (and thus the variance of the explained variable itself) does not differ across the values of the exogenous (predictor) variables, homoscedasticity (homogeneity of variance) is present. The concept plays an important role especially in econometrics and empirical research. The homoscedasticity assumption is an important part of the Gauss-Markov assumptions.
Homoscedasticity and heteroscedasticity
The dispersion of characteristics plays a decisive role in statistics. In regression analysis, for example, a set of data points is given and a straight line is fitted to them as closely as possible. The deviations of the data points from the straight line are called disturbance terms or residuals and, in terms of probability theory, are each random variables. Homoscedasticity and heteroscedasticity refer to the distribution of these disturbance terms, which is measured by the variance. If the disturbance terms all have the same variance, there is homogeneity of variance (i.e., homoscedasticity):
- $\operatorname{Var}(\varepsilon_i) = \sigma^2 = \text{const.}$, respectively $\operatorname{Var}(\varepsilon_i \mid x_i) = \sigma^2$, where $\varepsilon_i$ denotes the disturbance term and $x_i$ the explanatory variable of observation $i$.
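Heteroscedasticity, on the other hand, means that the variance of the disturbance terms is not constant but depends on the explanatory variables:
- $\operatorname{Var}(\varepsilon_i \mid x_i) = \sigma_i^2$, where $\varepsilon_i$ is the disturbance term of observation $i$, $x_i$ its explanatory variable, and the variance $\sigma_i^2$ may differ between observations.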
In this case the disturbance terms do not all have the same variance, and consequently the ordinary least squares method no longer yields efficient estimates of the regression coefficients; that is, the estimates do not have the smallest possible variance. The standard errors of the regression coefficients are estimated with bias, so the t-test cannot be applied naively and the usual t-values are no longer reliable. In many cases a suitable data transformation provides a remedy: when heteroscedasticity is present, transforming the data with the logarithm or the square root can help to achieve homoscedasticity. The Gauss-Markov theorem can then be applied correctly.
In practice, heteroscedasticity occurs when the dispersion of the dependent variable depends on the magnitude of the explanatory variable. For example, spending on vacations is likely to vary more widely when the monthly disposable income is higher.
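The effect of a logarithmic transformation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are simulated from a hypothetical multiplicative error model, and the names `income`, `spending`, `ols_fit`, `spread_ratio` and all coefficients are invented for the example. It compares the residual spread for high and low values of the explanatory variable, before and after taking logarithms.

```python
import math
import random
import statistics


def ols_fit(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def spread_ratio(x, y):
    """Residual SD for above-median x divided by residual SD for the rest."""
    intercept, slope = ols_fit(x, y)
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    cut = statistics.median(x)
    low = [r for xi, r in zip(x, resid) if xi <= cut]
    high = [r for xi, r in zip(x, resid) if xi > cut]
    return statistics.pstdev(high) / statistics.pstdev(low)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical data: income in arbitrary units, spending with a
# multiplicative error, so its spread grows with income.
income = [rng.uniform(1.0, 5.0) for _ in range(2000)]
spending = [math.exp(0.5 + 0.4 * x + rng.gauss(0.0, 0.3)) for x in income]

ratio_raw = spread_ratio(income, spending)
ratio_log = spread_ratio(income, [math.log(s) for s in spending])

print(f"spread ratio, raw scale: {ratio_raw:.2f}")
print(f"spread ratio, log scale: {ratio_log:.2f}")
```

A ratio well above 1 indicates that the residuals scatter more widely for high incomes; a ratio close to 1 is what homoscedasticity would suggest. Whether a logarithm or a square root is appropriate for real data depends on how the variance actually relates to the level of the variable.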
Consequences of heteroscedasticity on linear regression
- The first OLS assumption still holds, so the exogenous explanatory variable remains uncorrelated with the disturbance term.
- The disturbance terms (and hence the endogenous variable, given the exogenous variables) are no longer identically distributed. As a result, the OLS estimators are no longer efficient, and the standard errors of the regression coefficients are biased and inconsistent. It follows, as mentioned above, that the t-values are no longer reliable either, because the t-value is calculated by dividing the coefficient estimate of an exogenous (predictor) variable by its standard error. In the presence of heteroscedasticity, however, other standard errors can be used, e.g. heteroscedasticity-robust standard errors (the Eicker-Huber-White estimator, named after Friedhelm Eicker, Peter J. Huber and Halbert L. White; sometimes named after only one of its developers, for example as the White estimator). Another option in the presence of heteroscedasticity is the weighted least squares estimator (WLSE), a special case of the generalized least squares estimator (GLS estimator).
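The two remedies named in the list above can be sketched for a regression with a single predictor. The example is a hedged illustration, not a reference implementation: the data are simulated, the error standard deviation is assumed to be known and equal to 0.2·x², and all variable names are invented. The robust standard error uses the simplest (HC0) form of the White estimator, which for the slope of a simple regression reduces to sqrt(Σ(x_i − x̄)² ê_i²) / Σ(x_i − x̄)²; weighted least squares uses weights equal to the inverse of the assumed error variance.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility
n = 5000
x = [rng.uniform(1.0, 10.0) for _ in range(n)]
# Assumed data-generating process: y = 1 + 2x + error, error SD = 0.2 * x**2.
sd = [0.2 * xi ** 2 for xi in x]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, si) for xi, si in zip(x, sd)]

# Ordinary least squares
x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b_ols = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a_ols = y_bar - b_ols * x_bar
resid = [yi - a_ols - b_ols * xi for xi, yi in zip(x, y)]

# Conventional standard error of the slope (valid under homoscedasticity)
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_conventional = math.sqrt(s2 / sxx)

# Heteroscedasticity-robust (White, HC0) standard error of the slope
se_robust = math.sqrt(
    sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
) / sxx

# Weighted least squares with weights 1 / variance (variance assumed known)
w = [1.0 / si ** 2 for si in sd]
w_sum = sum(w)
xw = sum(wi * xi for wi, xi in zip(w, x)) / w_sum  # weighted mean of x
yw = sum(wi * yi for wi, yi in zip(w, y)) / w_sum  # weighted mean of y
b_wls = sum(
    wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y)
) / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
a_wls = yw - b_wls * xw

print(f"OLS slope {b_ols:.3f}, conventional SE {se_conventional:.4f}, "
      f"robust SE {se_robust:.4f}")
print(f"robust t-value {b_ols / se_robust:.1f}")
print(f"WLS slope {b_wls:.3f}")
```

In applied work the error variances are rarely known, so the weights have to be estimated or motivated by the data, and statistical software typically offers several finite-sample variants of the robust estimator beyond HC0.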
Examples
Heteroscedasticity in time series
A typical example of heteroscedasticity is a time series in which the deviations from the trend line grow over time (e.g. the accuracy of a weather forecast: the further into the future, the less likely an accurate forecast becomes). Even in time series whose variance is not constant, however, certain characteristic patterns such as volatility clusters can be observed. Volatility models were therefore developed in an attempt to explain the course of the variance systematically.
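Volatility clustering can be made concrete with a small simulation. The sketch below uses an ARCH(1) process, one common volatility model, chosen here purely as an illustration; the text above does not name a specific model, and the parameters `omega` and `alpha` are arbitrary assumptions. The conditional variance of each value depends on the square of the previous value, so large deviations tend to follow large deviations.

```python
import math
import random
import statistics


def lag1_autocorr(series):
    """Sample autocorrelation of a series at lag 1."""
    mean = statistics.fmean(series)
    dev = [s - mean for s in series]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


rng = random.Random(2)    # fixed seed for reproducibility
omega, alpha = 1.0, 0.3   # assumed ARCH(1) parameters

eps = []
prev = 0.0
for _ in range(20000):
    sigma2 = omega + alpha * prev ** 2   # conditional variance at time t
    prev = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
    eps.append(prev)

acf_levels = lag1_autocorr(eps)
acf_squares = lag1_autocorr([e * e for e in eps])

print(f"lag-1 autocorrelation of the series:   {acf_levels:.3f}")
print(f"lag-1 autocorrelation of the squares:  {acf_squares:.3f}")
```

The series itself is serially uncorrelated, while its squares are positively autocorrelated; this is the signature of clustered volatility that such models are designed to capture.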
Heteroscedasticity in linear regression
Heteroscedasticity can occur even in a simple linear regression. This is a problem because classical linear regression analysis assumes homoscedastic disturbance terms. The graphic below shows the variables mean number of rooms per house (X) and mean purchase price per house (Y) for (almost) every district in Boston (Boston housing data). The linear regression graphic shows the relationship between the two variables. The red line shows the residual for the observation on the right, i.e. the difference between the observed value (circle) and the estimated value on the regression line.
The Heteroscedastic Residuals graphic shows the residuals for all observations. The scatter of the residuals in the range of 4–5 rooms and in the range above 7.5 rooms is greater than the scatter in the range of 5–7.5 rooms. The spread of the residuals therefore differs between the individual ranges, i.e. it is heteroscedastic. If the spread of the residuals were the same in all ranges, it would be homoscedastic.
Test procedure
Well-known methods for testing the null hypothesis "homoscedasticity is present" are the Goldfeld-Quandt test, the White test, the Levene test, the Glejser test, the RESET test according to Ramsey, and the Breusch-Pagan test.
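As an illustration of how such a test works, the following sketch implements the Breusch-Pagan idea for a regression with a single predictor, in the studentized form usually attributed to Koenker: the squared OLS residuals are regressed on the predictor, and n·R² of this auxiliary regression is compared with a chi-squared distribution with one degree of freedom. The function name `breusch_pagan` and the simulated data are assumptions made for the example; with one degree of freedom the p-value can be written as erfc(sqrt(LM/2)).

```python
import math
import random
import statistics


def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for a simple regression of y on x."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Squared OLS residuals
    e2 = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression of e2 on x; with one regressor,
    # R^2 equals the squared correlation between e2 and x.
    e2_bar = statistics.fmean(e2)
    sxe = sum((xi - x_bar) * (ei - e2_bar) for xi, ei in zip(x, e2))
    see = sum((ei - e2_bar) ** 2 for ei in e2)
    r2 = sxe ** 2 / (sxx * see)
    lm = n * r2
    # Survival function of the chi-squared distribution with 1 df
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


rng = random.Random(3)  # fixed seed for reproducibility
x = [rng.uniform(1.0, 10.0) for _ in range(2000)]
# Simulated examples: error SD proportional to x versus constant error SD.
y_het = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]
y_hom = [1.0 + 2.0 * xi + rng.gauss(0.0, 2.0) for xi in x]

lm_het, p_het = breusch_pagan(x, y_het)
lm_hom, p_hom = breusch_pagan(x, y_hom)

print(f"heteroscedastic data: LM = {lm_het:.1f}, p = {p_het:.3g}")
print(f"homoscedastic data:   LM = {lm_hom:.2f}, p = {p_hom:.3f}")
```

A small p-value leads to rejection of the null hypothesis of homoscedasticity. With several predictors the auxiliary regression includes all of them and the degrees of freedom equal their number, which this single-predictor sketch does not cover.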
Literature
- J. Wooldridge: Introductory Econometrics. A Modern Approach. 5th edition. Mason, Ohio 2013, ISBN 978-1-111-53439-4.
- M.-W. Stoetzer: Regression analysis in empirical economic and social research. Volume 1: A Non-Mathematical Introduction to SPSS and Stata. Berlin 2017, ISBN 978-3-662-53823-4, pp. 135-147.