Wilcoxon-Mann-Whitney test


The Wilcoxon-Mann-Whitney test (also: Mann-Whitney U test, U test, Wilcoxon rank sum test) is the collective name for two equivalent nonparametric statistical tests for rank data (ordinally scaled data). They test whether, for two populations, a randomly selected value from one population is equally likely to be greater or smaller than a randomly selected value from the other population. If this hypothesis is rejected, it can be assumed that the values from one population tend to be larger or smaller than those from the other. Unlike the median test, the Mann-Whitney U test (Wilcoxon rank sum test) is not a priori a test for the equality of two medians. It is one only if the shape and spread of the distribution of the dependent variable are the same in both groups.

The tests were developed by Henry Mann and Donald Whitney (U test, 1947) and by Frank Wilcoxon (Wilcoxon rank sum test, 1945). The central idea of the test was developed as early as 1914 by the German educator Gustaf Deuchler.

In practice, the Wilcoxon rank sum test or U test is used as an alternative to the t-test for independent samples when the assumptions of the t-test are violated. This is the case, among other things, if the variable to be tested is only ordinally scaled, or if interval-scaled variables are not (approximately) normally distributed in the two populations.

The Wilcoxon rank sum test for two independent samples is not to be confused with the Wilcoxon signed rank test, which is used for two dependent (paired) samples.

Assumptions

  • There are independent samples $X_1, \ldots, X_m$ from $X$ and $Y_1, \ldots, Y_n$ from $Y$, which are also independent of one another.

Test statistics

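For testing the hypotheses of the Wilcoxon-Mann-Whitney test

$$H_0\colon P(X > Y) = P(Y > X) \quad \text{vs.} \quad H_1\colon P(X > Y) \neq P(Y > X)$$

there are two test statistics: the Mann-Whitney U statistic $U$ and the Wilcoxon rank sum statistic $W_{m,n}$, where $m$ and $n$ are the sizes of the $X$ and $Y$ samples. Because of the relationship between the test statistics

$$W_{m,n} = U + \frac{m(m+1)}{2}$$

the Wilcoxon rank sum test and the Mann-Whitney U test are equivalent.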
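
The relationship between the two statistics can be checked directly. The following is a short derivation sketch, assuming no ties, sample sizes $m$ (for $X$) and $n$ (for $Y$), a statistic $U$ that counts the pairs with $Y_j < X_i$, and $W$ as the rank sum of the $X$ values in the pooled sample. The notation $X_{(i)}$ for the $i$-th smallest $X$ value is introduced here only for illustration.

```latex
\begin{align*}
R\bigl(X_{(i)}\bigr) &= i + \#\{\, j : Y_j < X_{(i)} \,\}, \\
W = \sum_{i=1}^{m} R\bigl(X_{(i)}\bigr)
  &= \sum_{i=1}^{m} i + \sum_{i=1}^{m} \#\{\, j : Y_j < X_{(i)} \,\}
   = \frac{m(m+1)}{2} + U .
\end{align*}
```

Since the two statistics differ only by the constant $m(m+1)/2$, any rejection region for one of them translates directly into a rejection region for the other.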

Mann-Whitney U statistic

The Mann-Whitney U test statistic is

$$U = \sum_{i=1}^{m} \sum_{j=1}^{n} S(X_i, Y_j),$$

where $S(X, Y)$ is $1$ if $Y < X$, $\tfrac{1}{2}$ if $Y = X$, and $0$ otherwise. Depending on the alternative hypothesis, the null hypothesis is rejected for values of $U$ that are too small or too large. This is the form found in Mann and Whitney, and the test based on it is often referred to as the Mann-Whitney U test.
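
As an illustration, the pairwise definition can be written down almost literally. This is a minimal sketch, assuming the scoring convention of 1 for $Y < X$, 1/2 for a tie and 0 otherwise; the function name `mann_whitney_u` is hypothetical and not taken from any library.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U of sample x against sample y by direct pair counting.

    Each pair (x_i, y_j) scores 1 if y_j < x_i, 0.5 if y_j == x_i, else 0.
    Runs in O(len(x) * len(y)) time, which is fine for small samples.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if yj < xi:
                u += 1.0
            elif yj == xi:
                u += 0.5
    return u


# Small illustration with made-up numbers.
x = [3, 5]
y = [1, 4]
u_xy = mann_whitney_u(x, y)
u_yx = mann_whitney_u(y, x)
```

Every pair contributes a total of 1 to `u_xy + u_yx`, so the two values always add up to `len(x) * len(y)`, which is a convenient sanity check.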

Exact critical values

Exact critical values are only available in tabular form; for small sample sizes they can be taken from the table below ($\alpha = 5\%$ for the two-sided test and $\alpha = 2.5\%$ for the one-sided test).

There is a recursion formula that allows the critical values for small sample sizes to be determined step by step and with little computing time.
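
The text does not state the recursion, so the following sketch uses one standard counting recursion as an assumption; it is not necessarily the formula the source refers to. It assumes no ties, a statistic $U$ that counts pairs with $Y < X$, and the convention that the null hypothesis is rejected when $U$ is at most the critical value. The names `count_u` and `critical_value` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(m, n, u):
    """Number of orderings of m X's and n Y's (no ties) whose U equals u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest pooled value is either an X (it exceeds all n Y's and
    # adds n to U) or a Y (no X exceeds it, so it adds nothing to U).
    return count_u(m - 1, n, u - n) + count_u(m, n - 1, u)


def critical_value(m, n, alpha_one_sided=Fraction(1, 40)):
    """Largest u with P(U <= u) <= alpha_one_sided under H0, or None.

    The default 1/40 = 2.5% corresponds to a two-sided level of 5%.
    None plays the role of a "-" table entry: H0 can never be rejected.
    """
    total = comb(m + n, m)  # all orderings are equally likely under H0
    cumulative = 0
    critical = None
    for u in range(m * n + 1):
        cumulative += count_u(m, n, u)
        if Fraction(cumulative, total) > alpha_one_sided:
            break
        critical = u
    return critical
```

Calling `critical_value(5, 5)` or `critical_value(7, 13)` gives values that can be compared against the corresponding entries of a printed table. Exact fractions are used so that the comparison with the significance level is not affected by rounding.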

Approximate critical values

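For sufficiently large sample sizes $m$ and $n$, the distribution of $U$ can be approximated by the normal distribution:

$$U \approx N\!\left(\frac{mn}{2},\ \frac{mn(m+n+1)}{12}\right).$$

The critical values then follow from the critical values of the approximating normal distribution.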

Wilcoxon rank sum statistic

The Wilcoxon rank sum statistic is

$$W_{m,n} = \sum_{i=1}^{m} R(X_i)$$

with $R(X_i)$ the rank of the $i$-th $X$ in the pooled, ordered sample. In this form, the test is often called the Wilcoxon rank sum test.
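
The rank sum can be computed by ranking the pooled sample and adding up the ranks of the $X$ values. The sketch below assumes that tied values share the arithmetic mean of the ranks they occupy (midranks); the helper names `midranks` and `rank_sum` are hypothetical.

```python
def midranks(values):
    """Ranks 1..N of the values, with ties receiving the average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Wilcoxon rank sum of x within the pooled sample of x and y."""
    pooled = list(x) + list(y)
    return sum(midranks(pooled)[:len(x)])


w = rank_sum([3, 5], [1, 4])  # pooled order 1, 3, 4, 5: the X values have ranks 2 and 4
```

For this made-up data the X values exceed a Y value in three pairs, so the U statistic is 3 and the rank sum is 3 + 2·3/2 = 6, in line with the relationship between the two statistics.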

Exact critical values

The exact distribution of $W_{m,n}$ under the null hypothesis can easily be found by combinatorial considerations. However, the computational effort increases rapidly for large values of $m$ and $n$. The exact critical values $w_{m,n}(\alpha)$ for the significance level $\alpha$ can be calculated using a recursion formula:

$$P(W_{m,n} = w) = \frac{m}{m+n}\, P\bigl(W_{m-1,n} = w - (m+n)\bigr) + \frac{n}{m+n}\, P\bigl(W_{m,n-1} = w\bigr)$$

(with the starting values $P(W_{0,n} = 0) = 1$ and $P\bigl(W_{m,0} = \tfrac{m(m+1)}{2}\bigr) = 1$).

The formula arises by conditioning on whether the last value in the ordered arrangement is an $X$ (... X) or a $Y$ (... Y).
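
The conditioning argument can be turned into a small memoized function. This is a sketch under the assumption of no ties: if the largest pooled value is an X (probability m/(m+n)), it contributes the rank m+n to the rank sum; if it is a Y (probability n/(m+n)), the rank sum of the X values is unchanged. The names `rank_sum_pmf` and `rank_sum_cdf` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def rank_sum_pmf(m, n, w):
    """P(W_{m,n} = w) under H0 for m X's and n Y's without ties."""
    if w < 0:
        return Fraction(0)
    if m == 0:
        # No X values: the rank sum is 0 with certainty.
        return Fraction(1) if w == 0 else Fraction(0)
    if n == 0:
        # Only X values: they occupy the ranks 1..m.
        return Fraction(1) if w == m * (m + 1) // 2 else Fraction(0)
    total = m + n
    return (Fraction(m, total) * rank_sum_pmf(m - 1, n, w - total)
            + Fraction(n, total) * rank_sum_pmf(m, n - 1, w))


def rank_sum_cdf(m, n, w):
    """Lower-tail probability P(W_{m,n} <= w) under H0."""
    return sum(rank_sum_pmf(m, n, v) for v in range(w + 1))
```

A lower critical value for a given significance level is then the largest `w` for which `rank_sum_cdf(m, n, w)` does not exceed that level. Exact fractions avoid rounding problems for small samples, at the cost of speed for large ones.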

Approximate critical values

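For sufficiently large $m$ or $n$ (the thresholds recommended in the literature vary), the distribution of the test statistic $W_{m,n}$ can be approximated by the normal distribution:

$$W_{m,n} \approx N\!\left(\frac{m(m+n+1)}{2},\ \frac{mn(m+n+1)}{12}\right).$$

The critical values then follow from the critical values of the approximating normal distribution.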

One-sided hypotheses

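The test can also be formulated for the one-sided hypotheses

$$H_0\colon P(X > Y) \ge P(Y > X) \quad \text{vs.} \quad H_1\colon P(X > Y) < P(Y > X)$$

or

$$H_0\colon P(X > Y) \le P(Y > X) \quad \text{vs.} \quad H_1\colon P(X > Y) > P(Y > X).$$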

Derived hypotheses

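The test is particularly interesting because, under the requirements listed below, a decision about its null and alternative hypotheses carries over to the following null and alternative hypotheses:

$$H_0\colon E(X) = E(Y) \quad \text{vs.} \quad H_1\colon E(X) \neq E(Y),$$

i.e. the means of the distributions of $X$ and $Y$ differ.

$$H_0\colon \operatorname{Med}(X) = \operatorname{Med}(Y) \quad \text{vs.} \quad H_1\colon \operatorname{Med}(X) \neq \operatorname{Med}(Y),$$

i.e. the medians of the distributions of $X$ and $Y$ differ.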

Requirements:

  • The random variables $X$ and $Y$ have continuous distribution functions $F_X$ and $F_Y$, which differ from each other only by a shift $a$, that is:
$$F_X(x) = F_Y(x + a).$$
Because the two distribution functions are identical except for the shift, $\sigma_X^2 = \sigma_Y^2$ (homogeneity of variance) must hold in particular. That is, if homogeneity of variance is rejected by the Bartlett test or the Levene test, the two random variables $X$ and $Y$ differ not only by a shift.

If the requirements for the hypothesis about the medians are not met, the median test can be used.

Example

From the data of the General Population Survey of the Social Sciences (ALLBUS) 2006, 20 people were randomly drawn and their net income was recorded:

Rank          1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
Net income    0  400  500  550  600  650  750  800  900  950 1000 1100 1200 1500 1600 1800 1900 2000 2200 3500
Gender        M    W    M    W    M    W    M    M    W    W    M    M    W    M    W    M    M    M    M    M

There are two samples: the sample of men with $n_M = 13$ values and the sample of women with $n_W = 7$ values. We could now test whether the incomes of men and women are equal (two-sided test) or whether the income of women is lower (one-sided test), where $F_M$ denotes the distribution function of the income of men and $F_W$ that of the income of women. Both the two-sided test and the one-sided test are considered here.

First, a test statistic $U$ is formed from the two series of values:

$$U_1 = R_1 - \frac{n_1(n_1+1)}{2}, \qquad U_2 = R_2 - \frac{n_2(n_2+1)}{2}.$$

Here $n_1$ and $n_2$ are the numbers of values per sample, and $R_1$ and $R_2$ are the respective sums of all ranks per sample. (If several values are identical across the two data sets, each of them is assigned the median or the arithmetic mean of the ranks they occupy.) For the following tests, the minimum of $U_1$ and $U_2$ is required: $U = \min(U_1, U_2)$.

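For our example we get (index M = men, W = women)

$n_M = 13$ and $n_W = 7$,
$R_M = 151$ and $R_W = 59$, and
$U_M = 151 - \tfrac{13 \cdot 14}{2} = 60$ and $U_W = 59 - \tfrac{7 \cdot 8}{2} = 31$, so that $U = \min(60, 31) = 31$.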

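If the calculation is correct, $U_M + U_W = n_M \cdot n_W$ (here $60 + 31 = 91 = 13 \cdot 7$) or, equivalently, $R_M + R_W = \tfrac{(n_M + n_W)(n_M + n_W + 1)}{2}$ (here $151 + 59 = 210$) must hold. The test statistic $U$ is now compared with the critical value(s). The example was chosen so that a comparison both with the exact critical values and with the approximate values is possible.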
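
The rank sums and U values of the example can be recomputed from the data table. This is a sketch that assumes the convention $U_i = R_i - n_i(n_i+1)/2$ (rank sum minus its smallest possible value); the variable names are chosen for illustration only. The incomes are all distinct, so no tie handling is needed.

```python
incomes = [0, 400, 500, 550, 600, 650, 750, 800, 900, 950,
           1000, 1100, 1200, 1500, 1600, 1800, 1900, 2000, 2200, 3500]
genders = "M W M W M W M M W W M M W M W M M M M M".split()

# Collect the ranks (1-based positions in the sorted pooled sample) per group.
ranks = {"M": [], "W": []}
for rank, (_, gender) in enumerate(sorted(zip(incomes, genders)), start=1):
    ranks[gender].append(rank)

n_m, n_w = len(ranks["M"]), len(ranks["W"])
r_m, r_w = sum(ranks["M"]), sum(ranks["W"])

# U for each group: rank sum minus the minimum possible rank sum.
u_m = r_m - n_m * (n_m + 1) // 2
u_w = r_w - n_w * (n_w + 1) // 2
u = min(u_m, u_w)

# Consistency checks described in the text.
total = n_m + n_w
checks_ok = (u_m + u_w == n_m * n_w) and (r_m + r_w == total * (total + 1) // 2)

print(n_m, n_w, r_m, r_w, u_m, u_w, u, checks_ok)
```

The smaller of the two U values is the one that is compared with the tabulated critical value.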

Two-sided test

Exact critical values

According to the table below, sample sizes of 13 and 7 give a critical value of $U_{\text{crit}} = 20$ for a significance level of $\alpha = 5\%$. The null hypothesis is rejected if $U \le U_{\text{crit}}$; this is not the case here, since $U = 31$.

Approximate critical values

Since the test statistic $U$ is approximately normally distributed, it follows that

$$Z = \frac{U - \frac{n_M n_W}{2}}{\sqrt{\frac{n_M n_W (n_M + n_W + 1)}{12}}}$$

is approximately standard normally distributed. For a significance level of $\alpha = 5\%$, the non-rejection region of the null hypothesis in the two-sided test is bounded by the 2.5% and 97.5% quantiles of the standard normal distribution, i.e. $[-1.96;\ 1.96]$. Here $z = \frac{31 - 45.5}{\sqrt{159.25}} \approx -1.149$, i.e. the test value lies within the interval and the null hypothesis cannot be rejected.
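
The approximate two-sided test can be reproduced with the Python standard library. The sketch below takes the sample sizes 13 and 7 and the value U = 31 from the example data, uses no continuity correction and assumes no ties; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

m, n, u = 13, 7, 31  # sample sizes and U statistic of the income example

mean_u = m * n / 2                      # expected value of U under H0
sd_u = sqrt(m * n * (m + n + 1) / 12)   # standard deviation of U under H0
z = (u - mean_u) / sd_u

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 97.5% quantile
reject = abs(z) > z_crit
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(round(z, 3), round(z_crit, 3), reject)
```

Reporting the two-sided p-value alongside the decision makes it easier to compare the result with the output of statistics packages, which may additionally apply a continuity correction.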

One-sided test

Exact critical values

According to the table below, sample sizes of 13 and 7 give a critical value of $U_{\text{crit}} = 20$ for a significance level of $\alpha = 2.5\%$ (a different significance level than in the two-sided test!). The null hypothesis is rejected if $U \le U_{\text{crit}}$; this is not the case here, since $U = 31$.

Approximate critical values

For a significance level of $\alpha = 5\%$, the critical value is the 5% quantile of the standard normal distribution, $z_{0.05} \approx -1.645$, and the non-rejection region of the null hypothesis is $[-1.645;\ \infty)$. Here $z \approx -1.149$, i.e. the null hypothesis cannot be rejected.

Table of critical values of the Mann-Whitney U statistic

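The following table is valid for $\alpha = 5\%$ (two-sided) or $\alpha = 2.5\%$ (one-sided). Rows are indexed by the smaller and columns by the larger of the two sample sizes. The entry “-” means that the null hypothesis cannot be rejected in any case at the given level of significance. For example, for sample sizes 7 and 13 the critical value is 20.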

       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
   1   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   0   0
   2       -   -   -   -   -   -   0   0   0   0   1   1   1   1   1   2   2   2   2   3   3   3   3   3   4   4   4   4   5   5   5   5   5   6   6   6   6   7   7
   3           -   -   0   1   1   2   2   3   3   4   4   5   5   6   6   7   7   8   8   9   9  10  10  11  11  12  13  13  14  14  15  15  16  16  17  17  18  18
   4               0   1   2   3   4   4   5   6   7   8   9  10  11  11  12  13  14  15  16  17  17  18  19  20  21  22  23  24  24  25  26  27  28  29  30  31  31
   5                   2   3   5   6   7   8   9  11  12  13  14  15  17  18  19  20  22  23  24  25  27  28  29  30  32  33  34  35  37  38  39  40  41  43  44  45
   6                       5   6   8  10  11  13  14  16  17  19  21  22  24  25  27  29  30  32  33  35  37  38  40  42  43  45  46  48  50  51  53  55  56  58  59
   7                           8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74
   8                              13  15  17  19  22  24  26  29  31  34  36  38  41  43  45  48  50  53  55  57  60  62  65  67  69  72  74  77  79  81  84  86  89
   9                                  17  20  23  26  28  31  34  37  39  42  45  48  50  53  56  59  62  64  67  70  73  76  78  81  84  87  89  92  95  98 101 103
  10                                      23  26  29  33  36  39  42  45  48  52  55  58  61  64  67  71  74  77  80  83  87  90  93  96  99 103 106 109 112 115 119
  11                                          30  33  37  40  44  47  51  55  58  62  65  69  73  76  80  83  87  90  94  98 101 105 108 112 116 119 123 127 130 134
  12                                              37  41  45  49  53  57  61  65  69  73  77  81  85  89  93  97 101 105 109 113 117 121 125 129 133 137 141 145 149
  13                                                  45  50  54  59  63  67  72  76  80  85  89  94  98 102 107 111 116 120 125 129 133 138 142 147 151 156 160 165
  14                                                      55  59  64  69  74  78  83  88  93  98 102 107 112 117 122 127 131 136 141 146 151 156 161 165 170 175 180
  15                                                          64  70  75  80  85  90  96 101 106 111 117 122 127 132 138 143 148 153 159 164 169 174 180 185 190 196
  16                                                              75  81  86  92  98 103 109 115 120 126 132 137 143 149 154 160 166 171 177 183 188 194 200 206 211
  17                                                                  87  93  99 105 111 117 123 129 135 141 147 154 160 166 172 178 184 190 196 202 209 215 221 227
  18                                                                      99 106 112 119 125 132 138 145 151 158 164 171 177 184 190 197 203 210 216 223 230 236 243
  19                                                                         113 119 126 133 140 147 154 161 168 175 182 189 196 203 210 217 224 231 238 245 252 258
  20                                                                             127 134 141 149 156 163 171 178 186 193 200 208 215 222 230 237 245 252 259 267 274

Implementation

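In many software packages, the Mann-Whitney-Wilcoxon test (the hypothesis of equal distributions versus suitable alternatives) is poorly documented. Some packages handle ties incorrectly or fail to document the asymptotic techniques they use (e.g. the continuity correction). A review published in 2000 discussed several such packages. Available implementations include scipy.stats.mannwhitneyu (SciPy) and org.apache.commons.math3.stat.inference.MannWhitneyUTest (Apache Commons Math).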
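
To make the two points of divergence concrete, the following sketch computes the normal-approximation z value with a correction for tied values and an optional continuity correction. It is an illustration under stated assumptions, not the method of any particular package: ties receive midranks, the variance uses the usual tie correction $\frac{mn}{12}\bigl[(N+1) - \sum_k (t_k^3 - t_k)/(N(N-1))\bigr]$ with $N = m + n$ and $t_k$ the size of the $k$-th group of equal values, and the continuity correction moves $U$ by 0.5 towards its expected value. The function name `z_statistic` is hypothetical, and the function assumes that not all pooled values are identical.

```python
from collections import Counter
from math import copysign, sqrt


def z_statistic(x, y, continuity=True):
    """Normal-approximation z value for the U statistic of x against y."""
    m, n = len(x), len(y)
    total = m + n
    counts = Counter(list(x) + list(y))  # group size t_k for each distinct value

    # Midrank of each distinct value in the pooled, ordered sample.
    midrank = {}
    position = 0
    for value in sorted(counts):
        t = counts[value]
        midrank[value] = position + (t + 1) / 2
        position += t

    w = sum(midrank[value] for value in x)  # rank sum of x
    u = w - m * (m + 1) / 2                 # U of x against y

    # Tie-corrected standard deviation of U under H0.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (total * (total - 1))
    sd = sqrt(m * n / 12 * ((total + 1) - tie_term))

    diff = u - m * n / 2
    if continuity:
        # Shrink the distance to the mean by 0.5, but never past zero.
        diff = copysign(max(abs(diff) - 0.5, 0.0), diff)
    return diff / sd
```

Running the function once with `continuity=True` and once with `continuity=False` on the same data shows how much two otherwise identical implementations can differ, which is one reason to check the documentation of the package in use.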

References

  1. Frank Wilcoxon: Individual Comparisons by Ranking Methods. In: Biometrics Bulletin. Vol. 1, 1945, pp. 80-83, JSTOR 3001968.
  2. Henry Mann, Donald Whitney: On a test of whether one of two random variables is stochastically larger than the other. In: Annals of Mathematical Statistics. Vol. 18, 1947, pp. 50-60, doi:10.1214/aoms/1177730491.
  3. William H. Kruskal: Historical Notes on the Wilcoxon Unpaired Two-Sample Test. In: Journal of the American Statistical Association. Vol. 52, 1957, pp. 356-360, JSTOR 2280906.
  4. A. Löffler: About a partition of natural numbers and their application in the U-test. In: Wiss. Z. Univ. Halle. Vol. XXXII, Issue 5, 1983, pp. 87-89. (lms.fu-berlin.de)
  5. B. Rönz, H. G. Strohe (eds.): Lexicon Statistics. Gabler, Wiesbaden 1994, ISBN 3-409-19952-7.
  6. H. Rinne: Pocket book of statistics. 3rd edition. Verlag Harri Deutsch, 2003, p. 534.
  7. S. Kotz, C. B. Read, N. Balakrishnan: Encyclopedia of Statistical Sciences. Wiley, Volume?, 2003, p. 208.
  8. Reinhard Bergmann, John Ludbrook, Will P. J. M. Spooren: Different Outcomes of the Wilcoxon-Mann-Whitney Test from Different Statistics Packages. In: The American Statistician. Vol. 54, No. 1, 2000, pp. 72-77, doi:10.1080/00031305.2000.10474513, JSTOR 2685616.
  9. scipy.stats.mannwhitneyu. In: SciPy v0.16.0 Reference Guide. The SciPy community, July 24, 2015: "scipy.stats.mannwhitneyu(x, y, use_continuity=True): Computes the Mann-Whitney rank test on samples x and y."
  10. org.apache.commons.math3.stat.inference.MannWhitneyUTest.

Literature

  • Herbert Büning, Götz Trenkler: Nonparametric statistical methods. de Gruyter, 1998, ISBN 3-11-016351-9.
  • Sidney Siegel: Nonparametric Statistical Methods. 2nd edition. Fachbuchhandlung für Psychologie, Eschborn near Frankfurt am Main 1985, ISBN 3-88074-102-6.
