Spurious correlation
Spurious correlation, also called a spurious relationship, denotes a correlation between two variables that rests not on a causal relationship but only on a coincidental or indirect one.
The German term, Scheinkorrelation (literally "apparent correlation"), is misleading, because what is actually meant is apparent causality: the correlation is real, not merely apparent, but there is no causality (for the distinction between the concepts see: Correlation and causal relationship). In any case, correlation is a statistical concept that neither implies causality nor is implied by it.
Spurious correlation is the statistical counterpart of the fallacy discussed in philosophy under the heading "correlation does not imply causation" (joint occurrence alone does not imply causality).
A spurious correlation arises, for example, when confounding variables (interfering variables) or other intervening variables exert an influence. The phenomenon has been known since the early days of statistics; the term spurious correlation was coined in 1954 by Herbert A. Simon.
Example
A well-known example is the correlation between the human birth rate and the number of stork pairs in different European regions. Although the number of births and the number of stork pairs are correlated, there is no causal relationship between them. The correlation arises because more storks nest in rural regions and because more children per couple tend to be born there.
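The stork example can be imitated with a small simulation. The following is only a sketch under invented assumptions: the variable `rurality`, all coefficients and the noise levels are hypothetical and are not taken from any real data set. Both stork pairs and births depend on `rurality`, but not on each other. The raw correlation is nevertheless high, and it largely disappears once the linear effect of `rurality` is removed from both variables (a partial correlation).

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, z):
    """Residuals of y after a least-squares line fit on z."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


def simulate_regions(n=2000, seed=0):
    """Hypothetical regions: the numbers below are invented for illustration."""
    rng = random.Random(seed)
    rurality = [rng.random() for _ in range(n)]                # confounder in [0, 1)
    storks = [50 * r + rng.gauss(0, 5) for r in rurality]      # depends on rurality only
    births = [10 + 8 * r + rng.gauss(0, 1) for r in rurality]  # depends on rurality only
    return rurality, storks, births


rurality, storks, births = simulate_regions()
raw_corr = pearson(storks, births)
partial_corr = pearson(residuals(storks, rurality), residuals(births, rurality))
print(f"raw correlation:     {raw_corr:.2f}")
print(f"partial correlation: {partial_corr:.2f}")
```

Controlling for the third variable in this way only works when that variable has been identified and measured; the sketch assumes a purely linear influence.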
Spurious regression
Spurious regression is a special case of regression in which a statistically significant relationship between a dependent variable and an explanatory variable is found that cannot be justified on substantive grounds. Spurious regressions are caused by a common trend in the variables involved. Indications of a spurious regression are a high coefficient of determination together with a Durbin-Watson statistic close to zero (strong positive first-order autocorrelation). A further indication is a Dickey-Fuller test that identifies one of the time series as non-stationary.
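The two warning signs named above, a high coefficient of determination and a Durbin-Watson value near zero, can be reproduced in a few lines. The sketch below is an illustration under assumptions, not a reference implementation: the helper names `drifting_walk` and `ols_diagnostics`, the drift of 1.0, the series length and the seed are all chosen freely. Two series are generated independently of each other and share nothing but an upward drift.

```python
import random


def drifting_walk(n, drift, rng):
    """Random walk with drift: each step adds `drift` plus standard normal noise."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def ols_diagnostics(x, y):
    """Return (R^2, Durbin-Watson) for the simple regression y = a + b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    rss = sum(e * e for e in resid)
    r_squared = 1.0 - rss / syy
    # Durbin-Watson: squared successive residual differences over the residual sum of squares
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / rss
    return r_squared, dw


rng = random.Random(42)           # arbitrary seed, for reproducibility only
x = drifting_walk(500, 1.0, rng)  # the two series are generated independently
y = drifting_walk(500, 1.0, rng)
r_squared, durbin_watson = ols_diagnostics(x, y)
print(f"R^2 = {r_squared:.3f}, Durbin-Watson = {durbin_watson:.3f}")
```

For uncorrelated residuals the Durbin-Watson statistic lies near 2, so a value close to 0 alongside a high R^2 is a reason to distrust the fit.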
An example from applied work is the spurious regression problem of econometrics, pointed out by Clive W. J. Granger and Paul Newbold in 1974: two independent random walks without a deterministic trend component (or other stochastic processes with unit roots) appear correlated even though they are stochastically independent. More precisely, the violation of the regression model's assumptions caused by autocorrelation means, for example, that the test statistic for the hypothesis that the slope of the regression line equals zero (the t statistic) diverges as the sample size grows; that is, if only enough data are collected, a seemingly significant relationship will always be found.
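The growth of the t statistic with the sample size can be made visible with a small Monte Carlo experiment. This is a minimal sketch, assuming standard normal increments, a nominal 5 % two-sided test with critical value 1.96, and freely chosen sample sizes, replication count and seed; it does not reproduce the design of the original study. As a control, the same test is applied to independent white noise, where the rejection rate should stay near the nominal level.

```python
import math
import random


def random_walk(n, rng):
    """Driftless random walk with standard normal increments."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(n, rng):
    """Independent standard normal draws (stationary control case)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def slope_t_statistic(x, y):
    """t statistic for H0: slope = 0 in the regression y = a + b*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    rss = syy - slope * sxy                  # residual sum of squares
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope / std_err


def rejection_rate(make_series, n, reps, rng, critical=1.96):
    """Share of independent series pairs for which |t| exceeds the critical value."""
    hits = 0
    for _ in range(reps):
        x = make_series(n, rng)
        y = make_series(n, rng)
        if abs(slope_t_statistic(x, y)) > critical:
            hits += 1
    return hits / reps


rng = random.Random(1974)                    # arbitrary seed
sample_sizes = (50, 200, 800)
walk_rates = {n: rejection_rate(random_walk, n, 200, rng) for n in sample_sizes}
noise_rate = rejection_rate(white_noise, 200, 200, rng)
for n in sample_sizes:
    print(f"random walks, n = {n:3d}: rejection rate {walk_rates[n]:.2f}")
print(f"white noise,  n = 200: rejection rate {noise_rate:.2f}")
```

Because every pair of series is independent by construction, each rejection in the random-walk runs is a false positive.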
See also
- Third variable control
- Interaction effect
- Intervening variable
- Mediator variable
- Moderator variable
- Bogus causality
- Confounding factor
- Big data
Literature
General literature
- Günter Bamberg, Franz Baur, Michael Krapp: Statistics. 13th edition. Oldenbourg Wissenschaftsverlag, 2007, ISBN 978-3-486-58188-1.
- Udo Kelle: The integration of qualitative and quantitative methods in empirical social research: theoretical foundations and methodological concepts. VS Verlag, 2007, ISBN 978-3-531-15312-4, p. 203.
Original work
- Herbert A. Simon: Spurious correlation: a causal interpretation. In: Journal of the American Statistical Association. Vol. 49, 1954, pp. 467-479, doi:10.1080/01621459.1954.10483515, JSTOR 2281124.
- Clive W. J. Granger, Paul Newbold: Spurious regressions in econometrics. In: Journal of Econometrics. No. 2, 1974, pp. 111-120, doi:10.1016/0304-4076(74)90034-7.
Web links
- Econometrics at the University of Illinois: Econ 508 - Fall 2007. e-Tutorial 10: Monte Carlo Simulation and Nonlinear Regression
- Collection of spurious correlations
- spurious-correlations
References
- R. Matthews: Storks deliver babies (p = 0.008). In: Teaching Statistics. 22 (2), 2000, pp. 36-38, doi:10.1111/1467-9639.00013.
- Christopher Dougherty: Introduction to Econometrics. 3rd edition. Oxford University Press, 2007, ISBN 978-0-19-928096-4, p. 388.