Goldfeld-Quandt test
The Goldfeld-Quandt test is a statistical test for heteroscedasticity (non-constant variance of the error terms) in regression analysis. It is based on comparing the residual variances of two disjoint parts of the sample. The test is named after Stephen Goldfeld and Richard E. Quandt.
Procedure
The sample is split into two subsets according to the values of an explanatory variable (see the figure). The two subsets must be disjoint, so that no observation belongs to both. They do not, however, have to cover the entire sample: in the figure, for example, the middle part of the observations (gray) belongs to neither subset. A separate regression is estimated for each subset, and the sample variance of the residuals, $s_i^2$ for $i = 1, 2$, is computed for each. The test value, the ratio of the two variances with the larger one in the numerator, is then compared with a critical value from the F-distribution. The example in the figure shows heteroscedasticity, because the regression for one subset has a high residual variance (red) while the regression for the other subset has a low residual variance (blue).
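The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a single explanatory variable with an intercept (so two estimated parameters per regression), and the function names `fit_residual_variance` and `goldfeld_quandt`, the `drop_frac` parameter and the toy data are hypothetical choices made for this sketch.

```python
def fit_residual_variance(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns the residual variance SSR / (n - k) and its degrees of freedom.
    """
    n = len(x)
    k = 2  # assumption: intercept and one slope are estimated
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return ssr / (n - k), n - k


def goldfeld_quandt(x, y, drop_frac=0.2):
    """Return (test value, numerator df, denominator df).

    drop_frac is the share of central observations left out of both subsets.
    """
    pairs = sorted(zip(x, y))  # order observations by the explanatory variable
    n = len(pairs)
    n_each = (n - int(drop_frac * n)) // 2
    low, high = pairs[:n_each], pairs[-n_each:]  # disjoint outer subsets
    s2_low, df_low = fit_residual_variance(*zip(*low))
    s2_high, df_high = fit_residual_variance(*zip(*high))
    # Put the larger residual variance in the numerator.
    if s2_high >= s2_low:
        return s2_high / s2_low, df_high, df_low
    return s2_low / s2_high, df_low, df_high


# Toy data (made up): the noise amplitude jumps from 0.1 to 2.0 for x > 10.
x = list(range(1, 21))
y = [2 * xi + (0.1 if xi <= 10 else 2.0) * (1 if xi % 2 else -1) for xi in x]
f_stat, df_num, df_den = goldfeld_quandt(x, y)
print(f_stat, df_num, df_den)
```

A test value far above 1 points towards unequal residual variances in the two subsets; whether it is significant still has to be judged against the F-distribution with the returned degrees of freedom.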
Mathematical formulation
Assumptions
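In the classical linear regression model $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_{k-1} x_{i,k-1} + \varepsilon_i$, or in matrix form $y = X\beta + \varepsilon$, the error terms are assumed to satisfy $\varepsilon_i \sim N(0, \sigma^2)$. The test is sensitive to violations of this normality assumption.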
Hypotheses and test statistics
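The null and alternative hypotheses are
- $H_0\colon \sigma_1^2 = \sigma_2^2$ (homoscedasticity) vs. $H_1\colon \sigma_1^2 > \sigma_2^2$ (heteroscedasticity),
where subset 1 is the subset suspected of having the larger error variance. Under the null hypothesis the test statistic is distributed as
- $F = \dfrac{s_1^2}{s_2^2} \sim F_{n_1 - k,\; n_2 - k}$
with $n_i$ the number of observations in the $i$-th subset, $k$ the number of estimated regression parameters, and
- $s_i^2 = \dfrac{1}{n_i - k} \sum_{j=1}^{n_i} \hat{\varepsilon}_{ij}^2, \quad i = 1, 2$,
where $\hat{\varepsilon}_{ij}$ is the $j$-th residual of the regression fitted to subset $i$.
The null hypothesis (homoscedasticity) is rejected if the test value is greater than the critical value of the F-distribution with $n_1 - k$ and $n_2 - k$ degrees of freedom at a predefined significance level $\alpha$.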
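The decision rule can be written equivalently in terms of a p-value: the test value exceeds the critical value at level $\alpha$ exactly when the upper-tail probability of the F-distribution at the test value is below $\alpha$. The following sketch computes that probability with the Python standard library only, using the regularized incomplete beta function evaluated by a continued fraction. The function names and the default $\alpha = 0.05$ are assumptions of this sketch; in practice a statistics library (for example `scipy.stats.f.sf`) would normally be used instead.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f_value, df_num, df_den):
    """P(F > f_value) for an F-distribution with (df_num, df_den) df."""
    if f_value <= 0.0:
        return 1.0
    x = df_den / (df_den + df_num * f_value)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


def reject_homoscedasticity(f_value, df_num, df_den, alpha=0.05):
    """True if the null hypothesis is rejected at level alpha (assumed 0.05)."""
    return f_upper_tail(f_value, df_num, df_den) < alpha


print(reject_homoscedasticity(400.0, 6, 6))  # very large variance ratio
print(reject_homoscedasticity(1.0, 5, 5))    # equal variances
```

Working with the upper-tail probability avoids inverting the F-distribution to obtain the critical value, while leading to the same decision.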
Example
Variable | Meaning |
---|---|
medv | Median purchase price of a house, in thousands of US dollars |
lstat | Proportion of lower-status population |
rm | Average number of rooms |
dis | Weighted distance to the five most important employment centers |
For the example, linear regressions were fitted to the Boston Housing data set. The variables listed in the table were recorded for each of the 506 districts, and a multiple linear regression was carried out:
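- $\text{medv}_i = \beta_0 + \beta_1\,\text{lstat}_i + \beta_2\,\text{rm}_i + \beta_3\,\text{dis}_i + \varepsilon_i$.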
If the residuals are plotted against the variable dis (figure above), the variance of the residuals can be seen to decrease as the values of dis increase. The data are therefore divided into two parts, the red part and the blue part. A separate regression model is then fitted to each part, and the sum of squared residuals is calculated for each fit.
Subset | Sum of squared residuals |
---|---|
red | |
blue | |
The test value is then obtained as the ratio of the two residual variances, and the critical value for the chosen significance level is taken from the F-distribution with 108 and 45 degrees of freedom. Since the test value is greater than the critical value, the null hypothesis of homoscedasticity is rejected.
Literature
- William E. Griffiths, R. Carter Hill, George G. Judge: Learning and Practicing Econometrics. 1st edition. 1993, ISBN 0-471-51364-4, p. 494 ff.
References
- Stephen M. Goldfeld, R. E. Quandt: Some Tests for Homoscedasticity. In: Journal of the American Statistical Association. 60, No. 310, June 1965, pp. 539-547. JSTOR 2282689. doi:10.1080/01621459.1965.10480811.