Spurious correlation

from Wikipedia, the free encyclopedia
The number of storks and the human birth rate in a country: an example of a spurious correlation

A spurious correlation (also spurious relationship) is a correlation, i.e. a statistical association, between two variables that is not based on a causal relationship but only on a random or indirect one.

The German term Scheinkorrelation ("sham correlation") is misleading, because what is actually meant is a pseudo-causality: the correlation is not merely apparent but real, and only the causal relationship is missing (for the distinction between the two concepts, see Correlation and causal relationship). In general, correlation is a statistical concept that neither implies causality nor is implied by it.

Spurious correlation is the statistical counterpart of the fallacy discussed in philosophy as "correlation does not imply causation" (cum hoc ergo propter hoc): the joint occurrence of two phenomena does not establish a causal link between them.

A spurious correlation arises, for example, when confounding variables or other intervening variables exert an influence. The phenomenon has been known since the early days of statistics; the term spurious correlation was coined by Herbert A. Simon in 1954.

Example

A well-known example is the correlation between the human birth rate and the number of stork pairs across European regions. Although the two quantities are correlated, there is no causal relationship between them. The correlation arises because rural regions have both more nesting storks and, on average, more children per couple; the degree of rurality acts as a common cause.

Spurious regression

A spurious regression is a special case of regression in which a statistically significant relationship between two variables is found that cannot be justified logically. Spurious regressions typically arise because the variables involved share a common trend. Indications of a spurious regression are a high coefficient of determination combined with a Durbin-Watson statistic close to zero (strong positive first-order autocorrelation of the residuals). A further indication is that the Dickey-Fuller test classifies the time series involved as non-stationary.

Regression line between two independent AR(1) processes with a unit root, together with the regression statistics.

An example from applied work is the spurious regression problem in econometrics, pointed out by Clive W. J. Granger and Paul Newbold in 1974: two independent random walks without a deterministic trend component (or other stochastic processes with a unit root) appear correlated even though they are stochastically independent. More precisely, the violation of the regression model's assumptions caused by autocorrelation means, for example, that the test statistic for the hypothesis that the slope of the regression line is zero (the t statistic) diverges as the sample size grows: if only enough data are collected, a relationship is always found.


Literature

General literature

  • Günter Bamberg, Franz Baur, Michael Krapp: Statistics. 13th edition. Oldenbourg Wissenschaftsverlag, 2007, ISBN 978-3-486-58188-1.
  • Udo Kelle: The integration of qualitative and quantitative methods in empirical social research: theoretical foundations and methodological concepts. VS Verlag, 2007, ISBN 978-3-531-15312-4, p. 203.

References

  1. R. Matthews: Storks deliver babies (p = 0.008). In: Teaching Statistics. 22 (2), 2000, pp. 36–38, doi:10.1111/1467-9639.00013.
  2. Christopher Dougherty: Introduction to Econometrics. 3rd edition. Oxford University Press, 2007, ISBN 978-0-19-928096-4, p. 388.