Effect size

Effect size describes the magnitude of a statistical effect. It can be used to illustrate the practical relevance of statistically significant results. Various effect size measures are used to quantify the strength of an effect.

Definition

Various measures of effect size are in use. According to Cohen, a measure of effect size should satisfy the following requirements:

  1. it is a dimensionless number,
  2. it does not depend on the unit of measurement of the original data,
  3. unlike a test statistic, it is independent of the sample size, and
  4. its value is close to zero if the null hypothesis of the associated test was not rejected.

Example

The intelligence performance of children taught with a new method is compared with that of children taught with the conventional method. If a very large number of children is included in each sample, even a difference of 0.1 IQ points between the groups can become significant. However, a difference of 0.1 IQ points hardly amounts to an improvement, despite the significant test result.

Purely on the basis of the significance (p-value) of the result, one could conclude that the new method produces better intelligence performance, and the old teaching method might be abolished at possibly high cost, although the actual effect achieved, an increase of 0.1 points, hardly justifies this effort.
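The dependence of the p-value on the sample size can be illustrated with a short simulation. The following minimal Python sketch (all names, the group means, and the standard deviation of 15 IQ points are illustrative assumptions) applies a two-sample t-test to the same 0.1-point difference for a small and a very large sample:

```python
# Illustrative simulation: the same 0.1-point mean difference is
# non-significant for n = 100 but "significant" for n = 1,000,000.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (100, 1_000_000):
    old = rng.normal(loc=100.0, scale=15.0, size=n)  # conventional method
    new = rng.normal(loc=100.1, scale=15.0, size=n)  # new method, +0.1 IQ points
    t, p = stats.ttest_ind(new, old)
    print(f"n = {n}: p = {p:.4f}")
```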

Use in research

Effect size in experiments (especially in medicine, the social sciences and psychology) describes the magnitude of the effect of an experimental factor. In regression models, it serves as an indicator of the influence of a variable on the explained (dependent) variable. In meta-analyses, effect sizes are calculated in order to compare the results of different studies on a uniform scale, the effect size.

On the one hand, the effect size can be calculated after a study in order to compare differences between groups on a standardized basis. On the other hand, it can be useful to specify a minimum effect size before carrying out a study or a test, because if the sample is sufficiently large, a statistical test will practically always be significant, and the null hypothesis can practically always be rejected.

Effect size and statistical significance

In the practical application of statistical tests, a small p-value is often associated with a large effect size. It is indeed the case that, if the other parameters of a test situation (sample size, chosen significance level, required power) are held constant, a smaller p-value goes along with a larger effect size. However, the p-value is only the probability of error; its specific value depends on the particular statistical test (or the underlying distributions) and on the sample size (larger samples systematically produce smaller p-values), so it is not meaningful, for example, for comparisons between the results of different tests or between samples of different sizes. A measure of effect size, by contrast, is expected to be useful for precisely such comparisons.

It is possible, for example when performing a meta-analysis, to determine an associated effect size from a reported error probability if the sample size is known. A statistical test essentially consists of using a special (suitably noncentral) sampling distribution of the test statistic employed (e.g. the F-test for an analysis of variance, or the t-test) to check whether the empirically found value of the statistic is plausible (or implausible) under the assumption that a specific null hypothesis is correct. The effect size of the test result can then be calculated from the reported error probability, the sample size, and the other required parameters of the chosen distribution. In a similar way, a reported significance level can be used to estimate how large the effect size must at least have been for the reported significance level to be reached at the given sample size.
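As a minimal sketch of this back-calculation (assuming a two-sided two-sample t-test with equal group sizes n; the function name is our own), a reported p-value can be converted into the Cohen's d that just attains it:

```python
# Sketch: recover the effect size d corresponding to a reported two-sided
# p-value from a two-sample t-test with n cases per group.
import math
from scipy import stats

def d_from_p(p, n):
    df = 2 * n - 2                      # degrees of freedom of the t-test
    t = stats.t.ppf(1 - p / 2, df)      # |t| that yields the reported p
    return t * math.sqrt(2 / n)         # d = t * sqrt(1/n1 + 1/n2)

print(d_from_p(0.05, n=30))             # approx. 0.52
```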

In Fisher's theory of testing, the p-value might be taken to represent an effect measure, since a smaller p-value is interpreted as stronger evidence for the research hypothesis. Because of the standardization of test statistics, however, any effect can be "made" significant by increasing the sample size. Under the Neyman-Pearson framework, by contrast, it must be taken into account that accepting the research hypothesis always goes along with rejecting the null hypothesis. A result that is highly significant under the null hypothesis can be even more improbable under the research hypothesis if the power is extremely reduced. The p-value is therefore not suitable as an effect size, since the effect under the research hypothesis may be too small to have any practical significance.

Measures of effect strength

Bravais-Pearson correlation coefficient

The Bravais-Pearson correlation coefficient r is one of the oldest and most widely used measures of effect size in regression models. It naturally fulfills the requirements that Cohen placed on a measure of effect size.

According to Cohen, $r = 0.10$ indicates a small effect, $r = 0.30$ a medium effect and $r = 0.50$ a strong effect.

Alternatively, the coefficient of determination can be used.

Cohen's d

Cohen's d is the effect size for mean differences between two groups with equal group sizes and equal group variances, and helps to assess the practical relevance of a significant mean difference (see also t-test):

$$d = \frac{\mu_1 - \mu_2}{\sigma}$$

As an estimator for equal group sizes and different variances, Cohen gives

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s},$$

where $\bar{x}_1$ and $\bar{x}_2$ denote the respective means of the two samples and $s_1^2$ and $s_2^2$ the estimated variances of the two samples according to the equation

$$s = \sqrt{\frac{s_1^2 + s_2^2}{2}}.$$

According to Cohen, a value of d between 0.2 and 0.5 indicates a small effect, between 0.5 and 0.8 a medium effect, and above 0.8 a strong effect.
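A minimal sketch of this estimator (function and variable names are our own):

```python
# Sketch: Cohen's d for two samples of equal size.
import numpy as np

def cohens_d(x1, x2):
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)  # sample variances
    s = np.sqrt((s1 + s2) / 2)                       # s = sqrt((s1^2 + s2^2)/2)
    return (np.mean(x1) - np.mean(x2)) / s
```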

Unequal group sizes and group variances

Authors other than Cohen estimate the standard deviation using the pooled variance as

$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

with

$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2.$$
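A corresponding sketch of the pooled standard deviation (argument names are our own):

```python
# Sketch: pooled standard deviation for two groups of possibly unequal size.
import numpy as np

def pooled_sd(x1, x2):
    n1, n2 = len(x1), len(x2)
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    return np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
```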

Conversion to r

If membership in one sample is coded as zero and membership in the other as one, a correlation coefficient can be calculated. It arises from Cohen's d as

$$r = \frac{d}{\sqrt{d^2 + 4}}.$$

In contrast to Cohen's d, the correlation coefficient has an upper limit of one. Cohen suggested speaking of a weak effect from r = 0.10, a medium effect from r = 0.30 and a strong effect from r = 0.50. Depending on the research context, this classification has since been revised. For psychology, for example, it has been shown empirically that r = 0.05 represents a very small, r = 0.10 a small, r = 0.20 a medium, r = 0.30 a large and r ≥ 0.40 a very large effect.
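A one-line sketch of the conversion (assuming equal group sizes, as in the formula above):

```python
# Sketch: convert Cohen's d into the point-biserial correlation r.
import math

def d_to_r(d):
    return d / math.sqrt(d ** 2 + 4)

print(d_to_r(0.8))   # approx. 0.37: a "strong" d maps to a moderate r
```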

Glass' Δ

Glass (1976) suggested using only the standard deviation of the second group:

$$\Delta = \frac{\bar{x}_1 - \bar{x}_2}{s_2}$$

The second group is regarded here as a control group. If comparisons are made with several experimental groups, it is better to estimate the standard deviation from the control group, so that the effect size does not depend on the estimated variances of the experimental groups.

However, assuming equal variances in both groups, the pooled variance is the better estimator.
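A minimal sketch (argument names are our own; the control group supplies the standard deviation):

```python
# Sketch: Glass' delta, standardized by the control group only.
import numpy as np

def glass_delta(treatment, control):
    return (np.mean(treatment) - np.mean(control)) / np.std(control, ddof=1)
```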

Hedges g

Larry Hedges proposed another modification in 1981. It is the same approach as Cohen's d, but with a correction of the pooled standard deviation. Unfortunately, the terminology is often imprecise: this corrected effect size was originally also called d, and Hedges' g is also referred to as Cohen's d_s. Cohen's d and Hedges' g are largely comparable, but Hedges' modification is considered more error-prone; in particular, Hedges' g is not an unbiased estimator for small samples, though it can be corrected. Hedges' g is particularly useful when the sample sizes differ.

Hedges' g is calculated as follows:

$$g = \frac{\bar{x}_1 - \bar{x}_2}{s^*}$$

with

$$s^* = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},$$

which gives a biased estimate of the effect size. An unbiased estimator $g^*$ can be obtained by the following correction:

$$g^* = \frac{\Gamma\left(\frac{m}{2}\right)}{\sqrt{\frac{m}{2}}\;\Gamma\left(\frac{m-1}{2}\right)}\, g$$

with

$$m = n_1 + n_2 - 2.$$

This yields an unbiased estimator that is better suited for calculating confidence intervals for the effect sizes of sample differences than Cohen's d, which estimates the effect size in the population. $\Gamma$ denotes the gamma function.
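A sketch of both steps (names are our own; the pooled standard deviation from the previous section appears again here), showing the exact gamma-function correction alongside the common approximation:

```python
# Sketch: Hedges' g and its small-sample bias correction g*.
import numpy as np
from scipy.special import gamma

def hedges_g(x1, x2, exact=True):
    n1, n2 = len(x1), len(x2)
    s1, s2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
    s_star = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    g = (np.mean(x1) - np.mean(x2)) / s_star        # biased estimate
    m = n1 + n2 - 2
    if exact:
        # exact factor; use exact=False for large samples to avoid overflow
        j = gamma(m / 2) / (np.sqrt(m / 2) * gamma((m - 1) / 2))
    else:
        j = 1 - 3 / (4 * (n1 + n2) - 9)             # common approximation
    return j * g                                    # corrected estimate g*
```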

Cohen's f²

Cohen's $f^2$ is a measure of effect size used in the analysis of variance (F-test) and in regression analysis.

Regression analysis

The effect size is calculated as

$$f^2 = \frac{R_{\text{full}}^2 - R_{\text{reduced}}^2}{1 - R_{\text{full}}^2}$$

with $R_{\text{full}}^2$ the coefficient of determination of the regression model with all variables and $R_{\text{reduced}}^2$ that of the model without the variable to be tested. If only the joint effect of all variables is of interest, the above formula reduces to

$$f^2 = \frac{R^2}{1 - R^2}.$$

According to Cohen, $f^2 \ge 0.02$ indicates a small effect, $f^2 \ge 0.15$ a medium effect and $f^2 \ge 0.35$ a strong effect.
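A minimal sketch of both variants (parameter names are our own):

```python
# Sketch: Cohen's f^2 from coefficients of determination.
def cohens_f2(r2_full, r2_reduced=0.0):
    return (r2_full - r2_reduced) / (1 - r2_full)

print(cohens_f2(0.30, 0.25))  # effect of the tested variable: approx. 0.07
print(cohens_f2(0.30))        # joint effect of all variables: approx. 0.43
```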

F-test or analysis of variance

For $G$ groups, the effect size $f$ is calculated as

$$f = \frac{1}{\sigma}\sqrt{\frac{1}{G}\sum_{g=1}^{G}(\mu_g - \mu)^2}$$

with $\sigma$ an estimate of the standard deviation within the groups and $\mu$ the overall mean. According to Cohen, $f \ge 0.10$ indicates a small effect, $f \ge 0.25$ a medium effect and $f \ge 0.40$ a strong effect.
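A minimal sketch for sample data (equal group sizes and our own names are assumed; the population quantities are replaced by sample estimates):

```python
# Sketch: Cohen's f for G groups of equal size.
import numpy as np

def cohens_f(groups):
    means = np.array([np.mean(g) for g in groups])
    sigma_m = np.sqrt(np.mean((means - means.mean()) ** 2))       # spread of group means
    sigma = np.sqrt(np.mean([np.var(g, ddof=1) for g in groups])) # within-group SD
    return sigma_m / sigma
```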

Partial eta squared

The size of the effect can also be specified using the partial eta squared. It is calculated as follows:

$$\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{residual}}}$$

with $SS_{\text{effect}}$ the sum of squares of the effect to be determined and $SS_{\text{residual}}$ the sum of squares of the residuals. Multiplied by 100, the partial eta squared can be interpreted in terms of variance explained: the measure then indicates what percentage of the variance in the dependent variable is explained by the independent variable. By default, IBM's SPSS program reports the partial eta squared in analyses of variance; in older program versions it was incorrectly labeled eta squared. In a one-way analysis of variance there is no difference between eta squared and partial eta squared; as soon as a multi-factor analysis of variance is calculated, however, the partial eta squared must be used.

However, eta squared as an effect size measure overestimates the proportion of explained variance. Rasch et al. and Bortz therefore recommend using the population effect estimator omega squared ($\omega^2$) instead, which, following Cohen, is calculated as:

$$\hat{\omega}^2 = \frac{SS_{\text{effect}} - df_{\text{effect}} \cdot MS_{\text{within}}}{SS_{\text{total}} + MS_{\text{within}}}$$

with $df_{\text{effect}}$ the degrees of freedom of the effect and $MS_{\text{within}}$ the mean square within groups.
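A minimal sketch of both measures from one-way ANOVA sums of squares (names are our own):

```python
# Sketch: partial eta squared and the omega-squared population estimate.
def partial_eta_squared(ss_effect, ss_residual):
    return ss_effect / (ss_effect + ss_residual)

def omega_squared(ss_effect, df_effect, ss_total, ms_within):
    return (ss_effect - df_effect * ms_within) / (ss_total + ms_within)
```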

Cramér's phi, Cramér's V and Cohen's w

A measure of effect size can be calculated not only from differences in means or variances, but also with respect to probabilities (see also Funder and Ozer, 2019). In this case, the entries of a crosstab that contains probabilities instead of absolute frequencies are combined, and the square root is taken of the result. This yields Cohen's w:

$$w = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{k}\frac{(p_{1ij} - p_{0ij})^2}{p_{0ij}}}$$

Here $k$ is the number of categories of the column variable, $m$ the number of categories of the row variable, $p_{1ij}$ the observed probability in cell $ij$, and $p_{0ij}$ the expected probability in cell $ij$. Expected cell probabilities are calculated by multiplying the corresponding marginal probabilities. For the calculation, see Benninghaus (1989) and Cohen (1988). Since crosstabs that contain probabilities rather than absolute frequencies always have a one in the place where the number of cases $n$ is normally found, the phi coefficient

$$\phi = \sqrt{\frac{\chi^2}{n}}$$

can also be calculated, which is numerically identical.

Also numerically identical, when calculated with reference to crosstabs that contain probabilities, is Cramér's V:

$$V = \sqrt{\frac{\chi^2}{n \cdot \min(R-1,\,C-1)}}$$

Here $R$ is the number of rows, $C$ the number of columns, and $\min(R-1, C-1)$ the smaller of the two numbers.

For Cohen's w, the conventional value of 0.1 is considered small, 0.3 medium and 0.5 large.
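A minimal sketch of Cohen's w for a crosstab of probabilities (the input convention is our own; the entries must sum to one):

```python
# Sketch: Cohen's w from observed cell probabilities of a crosstab.
import numpy as np

def cohens_w(p_obs):
    p_obs = np.asarray(p_obs, dtype=float)
    rows = p_obs.sum(axis=1, keepdims=True)   # marginal row probabilities
    cols = p_obs.sum(axis=0, keepdims=True)   # marginal column probabilities
    p_exp = rows * cols                       # expected cell probabilities
    return np.sqrt(((p_obs - p_exp) ** 2 / p_exp).sum())

print(cohens_w([[0.30, 0.20], [0.10, 0.40]]))  # approx. 0.41
```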

Small, medium and large effect sizes

The values given above for small, medium or large effect sizes depend heavily on the subject area. Cohen chose them on the basis of his analysis of common practice in social science research.

“This is an elaborate way to arrive at the same sample size that has been used in past social science studies of large, medium, and small size (respectively). The method uses a standardized effect size as the goal. Think about it: for a "medium" effect size, you'll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. "Medium" is definitely not the message! "

– R. V. Lenth

They are therefore accepted by many researchers only as guidelines, or are critically questioned. An empirical study of the frequency of effect sizes in differential psychology showed that Cohen's classification of Pearson correlations (small = 0.10; medium = 0.30; large = 0.50) does not adequately reflect the findings in this research area: only in fewer than 3% of the study results examined (708 correlations in total) was an effect size of at least 0.50 observed. Based on this investigation, it is recommended to interpret r = 0.10 as a small, r = 0.20 as a medium and r = 0.30 as a large effect size in this area.

Literature

  • Wynne W. Chin: The Partial Least Squares Approach to Structural Equation Modeling. In: George A. Marcoulides (Ed.): Modern Methods for Business Research. Lawrence Erlbaum Associates, Mahwah 1998, pp. 295-336.
  • Jacob Cohen: A power primer. In: Psychological Bulletin. 112, 1992, pp. 155-159.
  • Oswald Huber: The Psychological Experiment. Bern 2000.
  • Brigitte Maier-Riehle, Christian Zwingmann: Effect size variants in one-group pre-post design: A critical consideration. In: Rehabilitation. 39, 2000, pp. 189-199.
  • Rainer Schnell, Paul B. Hill, Elke Esser: Methods of empirical social research. Munich/Vienna 1999.
  • Jürgen Bortz, Nicola Döring: Research methods and evaluation. 2nd Edition. Springer, Berlin 1996, ISBN 3-540-59375-6.

References

  1. J. Cohen: Statistical Power Analysis for the Behavioral Sciences. 2nd Edition. Lawrence Erlbaum Associates, Hillsdale 1988, ISBN 0-8058-0283-5.
  2. W. Lenhard: Calculation of the effect sizes d (Cohen, 2001), d_korr (after Klauer, 2001), d from t-tests, r, eta-square and conversion of various measures: Psychometrica. In: psychometrica.de. Retrieved April 28, 2016.
  3. J. Hartung, G. Knapp, B. K. Sinha: Statistical Meta-Analysis with Application. Wiley, New Jersey 2008, ISBN 978-0-470-29089-7.
  4. D. C. Funder, D. J. Ozer: Evaluating Effect Size in Psychological Research: Sense and Nonsense. In: Advances in Methods and Practices in Psychological Science. 2, 2019, pp. 156-168. doi:10.1177/2515245919847202
  5. L. V. Hedges: Distribution theory for Glass's estimator of effect size and related estimators. In: Journal of Educational Statistics. 6 (2), 1981, pp. 107-128. doi:10.3102/10769986006002107
  6. Comparison of groups with different sample sizes (Cohen's d, Hedges' g): explanation and calculation of Hedges' g.
  7. Markus Bühner, Matthias Ziegler: Statistics for psychologists and social scientists. Pearson Germany, 2009, p. 175.
  8. Henriette Reinecke: Clinical relevance of the therapeutic reduction of chronic non-tumor-related pain. Logos Verlag, Berlin 2010, p. 49.
  9. Markus Bühner, Matthias Ziegler: Statistics for psychologists and social scientists. Pearson Germany, 2009, p. 175.
  10. Paul D. Ellis: The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge University Press, 2010, p. 10.
  11. Jürgen Margraf: Costs and Benefits of Psychotherapy. A critical evaluation of the literature. 2009, p. 15.
  12. B. Rasch, M. Friese, W. Hofmann, E. Naumann: Quantitative methods 2. Introduction to statistics for psychologists and social scientists. Springer, Heidelberg 2010, pp. 78-79.
  13. J. Bortz: Statistics for social and human scientists. Springer, Heidelberg 2005, pp. 280-281.
  14. Dirk Wentura: A small guide to test strength analysis. Department of Psychology at Saarland University, Saarbrücken 2004 (online).
  15. Hans Benninghaus: Statistics for Sociologists 1. Descriptive Statistics. (= Teubner study scripts. 22). Teubner, Stuttgart 1989, p. 100 ff.
  16. Jürgen Bortz: Statistics for human and social scientists. Springer, Heidelberg 2005, pp. 167-168.
  17. R. V. Lenth: Java applets for power and sample size. Division of Mathematical Sciences, College of Liberal Arts, The University of Iowa, 2006, accessed December 26, 2008.
  18. Jacob Cohen: A power primer. Retrieved April 30, 2020.
  19. G. E. Gignac, E. T. Szodorai: Effect size guidelines for individual differences researchers. In: Personality and Individual Differences. 102, 2016, pp. 74-78. doi:10.1016/j.paid.2016.06.069