Wilcoxon-Mann-Whitney test

The Wilcoxon-Mann-Whitney test (also: Mann-Whitney U test, U test, Wilcoxon rank sum test) is the collective name for two nonparametric statistical tests for rank data (ordinally scaled data). When two populations are compared, they test whether a randomly selected value from one population is equally likely to be greater or smaller than a randomly selected value from the other population. If this hypothesis is rejected, it can be assumed that the values from one population tend to be larger or smaller than those from the other population. Unlike the median test, the Mann-Whitney U test (Wilcoxon rank sum test) is not a priori a test for the equality of two medians. It becomes one only if the dependent variable has the same distributional shape and the same dispersion in both groups.

The tests were developed by Henry Mann and Donald Whitney (U test, 1947) and by Frank Wilcoxon (Wilcoxon rank sum test, 1945). The central idea of the test had already been developed in 1914 by the German educator Gustaf Deuchler.

In practice, the Wilcoxon rank sum test or the U test is used as an alternative to the t-test for independent samples when the t-test's prerequisites are violated. This is the case, among other things, if the variable to be tested is only ordinally scaled, or if interval-scaled variables are not (approximately) normally distributed in the two populations.

The Wilcoxon rank sum test for two independent samples is not to be confused with the Wilcoxon signed-rank test, which is used for two dependent (paired) samples.

Assumptions

There are independent samples $X_1, \dots, X_m$ from $X$ and $Y_1, \dots, Y_n$ from $Y$, and the two samples are also independent of one another.

Test statistics

For testing the hypotheses of the Wilcoxon-Mann-Whitney test

$$H_0: a = 0 \quad \text{vs.} \quad H_1: a \neq 0,$$

where $a$ denotes the location shift between the two distributions, there are two test statistics: the Mann-Whitney statistic $U$ and the Wilcoxon rank sum statistic $W_{m,n}$. Because of the relationship between these test statistics,

$$W_{m,n} = U + \frac{m(m+1)}{2},$$

the Wilcoxon rank sum test and the Mann-Whitney U test are equivalent.

Mann-Whitney U statistic

The Mann-Whitney U test statistic is

$$U = \sum_{i=1}^{m} \sum_{j=1}^{n} S(X_i, Y_j),$$

where $S(X, Y) = 1$ if $Y < X$, $S(X, Y) = \frac{1}{2}$ if $Y = X$, and $S(X, Y) = 0$ otherwise. Depending on the alternative hypothesis, the null hypothesis is rejected for values of $U$ that are too small or too large. This is the form found in Mann and Whitney, and it is often referred to as the Mann-Whitney U test.
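The definition of $U$ translates directly into a double loop over all $m \cdot n$ pairs. The following Python sketch is illustrative only. The helper names `s` and `mann_whitney_u` and the small data set are hypothetical. Exact fractions keep the tie score of 1/2 exact. Dividing $U$ by $m\,n$ gives the share of pairs in which the X value is larger (ties counting half). This is the sample counterpart of the probability comparison described in the introduction.

```python
from fractions import Fraction

def s(x, y) -> Fraction:
    """Scoring function S(X, Y): 1 if Y < X, 1/2 if Y == X, 0 otherwise."""
    if y < x:
        return Fraction(1)
    if y == x:
        return Fraction(1, 2)
    return Fraction(0)

def mann_whitney_u(xs, ys) -> Fraction:
    """U = sum over all m*n pairs (X_i, Y_j) of S(X_i, Y_j)."""
    return sum((s(x, y) for x in xs for y in ys), Fraction(0))

# Hypothetical data with one tie (5 appears in both samples).
xs = [3, 5, 8]
ys = [1, 5, 6, 7]
u = mann_whitney_u(xs, ys)
print(u)                          # 13/2
print(u / (len(xs) * len(ys)))    # 13/24, share of pairs with X > Y (ties count 1/2)
```

The quadratic pairwise loop is only practical for small samples. The rank-based form of the statistic gives the same value more efficiently.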

Exact critical values

Exact critical values are usually taken from tables. For small sample sizes they can be read from the table below ($\alpha = 5\,\%$ for the two-sided test and $\alpha = 2.5\,\%$ for the one-sided test).

There is a recursion formula that allows the critical values for small sample sizes to be determined step-by-step and with little computing time.
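The recursion itself is not spelled out in this section. The Python sketch below assumes the analogous recursion for $U$, obtained by conditioning on whether the largest pooled value is an X or a Y. This is the same argument used for $W_{m,n}$ in the next section. The sketch also assumes the convention of the table below: $H_0$ is rejected when the statistic is at most the tabulated value. The function names are hypothetical.

For each $u$, the code counts how many of the $\binom{m+n}{m}$ equally likely arrangements yield $U = u$. It then returns the largest $u$ with $P(U \le u) \le 2.5\,\%$, or `None` where the table shows "-".

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_u(m: int, n: int, u: int) -> int:
    """Number of the comb(m + n, m) equally likely X/Y orderings with U = u."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # Largest pooled value is an X: it exceeds all n Y values, adding n to U.
    # Largest pooled value is a Y: it exceeds no X value, U is unchanged.
    return count_u(m - 1, n, u - n) + count_u(m, n - 1, u)

def critical_value(m: int, n: int, tail: Fraction = Fraction(1, 40)):
    """Largest u with P(U <= u | H0) <= tail (default 2.5 %), or None."""
    total = comb(m + n, m)
    cumulative = 0
    crit = None
    for u in range(m * n + 1):
        cumulative += count_u(m, n, u)
        if Fraction(cumulative, total) > tail:
            break
        crit = u
    return crit

print(critical_value(7, 7))   # 8
print(critical_value(4, 3))   # None, shown as "-" in the table
```

For small sample sizes this reproduces table entries such as 0 for $n_1 = n_2 = 4$, 8 for $n_1 = n_2 = 7$, and "-" for $n_1 = 4$, $n_2 = 3$. Exact fractions avoid rounding problems at the boundary case $P(U \le u) = 2.5\,\%$ (for example $n_1 = 39$, $n_2 = 1$).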

Approximate critical values

For $m > 3$, $n > 3$ and $m + n > 19$, the distribution of $U$ can be approximated by the normal distribution (second parameter: variance)

$$U \approx N\left(\frac{m\,n}{2};\ \frac{n\,m\,(n+m+1)}{12}\right).$$

The critical values then follow from the critical values of this approximating normal distribution.
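The normal approximation amounts to standardizing $U$ with the mean $mn/2$ and the variance $mn(m+n+1)/12$ given above. In this Python sketch (standard library only, hypothetical function name), the two-sided p-value is computed with the complementary error function. No tie correction or continuity correction is applied, so results may differ slightly from software that uses them. The call uses the values from the income example further below.

```python
from math import erfc, sqrt

def u_normal_approx(u: float, m: int, n: int):
    """Standardize U with mean m*n/2 and variance m*n*(m+n+1)/12 (no tie or
    continuity correction); return (z, two-sided p-value)."""
    mean = m * n / 2
    var = m * n * (m + n + 1) / 12
    z = (u - mean) / sqrt(var)
    p_two_sided = erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

z, p = u_normal_approx(31, 13, 7)   # min(U) = 31, n1 = 13, n2 = 7
print(round(z, 2), round(p, 2))     # -1.15 0.25
```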

Wilcoxon rank sum statistic

The Wilcoxon rank sum statistic is

$$W_{m,n} = \sum_{i=1}^{m} R(X_i),$$

where $R(X_i)$ is the rank of the $i$-th X value in the pooled, ordered sample. In this form, the test is often called the Wilcoxon rank sum test.
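The following Python sketch computes the rank sum, giving tied values the mean of their ranks (the convention used in the example below). It also checks the relation $W_{m,n} = U + m(m+1)/2$ on a small hypothetical data set. The helper names are illustrative assumptions.

```python
def midranks(values):
    """Ranks 1..N in the pooled sample; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2   # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def wilcoxon_w(xs, ys):
    """W_{m,n}: sum of the pooled-sample ranks of the X values."""
    ranks = midranks(list(xs) + list(ys))
    return sum(ranks[: len(xs)])

xs = [3, 5, 8]
ys = [1, 5, 6, 7]
w = wilcoxon_w(xs, ys)
m = len(xs)
u = w - m * (m + 1) / 2   # relation W = U + m(m+1)/2
print(w, u)               # 12.5 6.5
```

Here W = 12.5 and hence U = 6.5. Counting the pairs directly, with a score of 1/2 for the tie at 5, gives the same value of U.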

Exact critical values

Under the null hypothesis, the exact distribution of $W_{m,n}$ can easily be found by combinatorial considerations. However, the computational effort increases rapidly for large values of $m$ and $n$. The exact critical values $w$ for the significance level $\alpha$ are determined by the condition

$$P(W_{m,n} \le w) = \alpha$$

(or $= \alpha/2$, $= 1 - \alpha$, $= 1 - \alpha/2$, depending on the direction of the test and on whether it is one- or two-sided). Because $W_{m,n}$ is discrete, this condition can in general only be met approximately. The required probabilities can be calculated with a recursion formula.

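The recursion follows from conditioning on whether the last (largest) value in the ordered pooled sample is an X (…X) or a Y (…Y). Under the null hypothesis, these cases have probabilities $m/(m+n)$ and $n/(m+n)$.

- If the last value is an X, it has rank $m+n$, so the remaining $m-1$ X values must have rank sum $w - m - n$.
- If the last value is a Y, the rank sum of the X values is unaffected.

$$P(W_{m,n} = w) = P(W_{m,n} = w \mid \ldots X)\,P(\ldots X) + P(W_{m,n} = w \mid \ldots Y)\,P(\ldots Y)$$

$$= P(W_{m-1,n} = w - m - n)\,\frac{m}{m+n} + P(W_{m,n-1} = w)\,\frac{n}{m+n}$$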
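The recursion can be evaluated directly with exact fractions. The following Python sketch (hypothetical function names) implements it with memoization. For checking, it also computes the same probability by enumerating all $\binom{m+n}{m}$ possible rank sets of the X values. The base cases are $W_{0,n} = 0$ and $W_{m,0} = m(m+1)/2$, each with probability 1.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def prob_w(m: int, n: int, w: int) -> Fraction:
    """P(W_{m,n} = w) under H0 via the recursion on the last (largest) value."""
    if m == 0:
        return Fraction(int(w == 0))
    if n == 0:
        # Only X values: their ranks are 1..m, so W = m(m+1)/2 with certainty.
        return Fraction(int(w == m * (m + 1) // 2))
    # Last value is an X (probability m/(m+n)): it carries rank m + n.
    # Last value is a Y (probability n/(m+n)): the X rank sum is unchanged.
    return (prob_w(m - 1, n, w - m - n) * Fraction(m, m + n)
            + prob_w(m, n - 1, w) * Fraction(n, m + n))

def prob_w_brute_force(m: int, n: int, w: int) -> Fraction:
    """Same probability by enumerating all rank sets of the X values."""
    subsets = list(combinations(range(1, m + n + 1), m))
    hits = sum(1 for ranks in subsets if sum(ranks) == w)
    return Fraction(hits, len(subsets))

print(prob_w(2, 2, 5))   # 1/3
```

For $m = n = 2$ there are six equally likely rank sets for the X values: {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}. Two of them sum to 5, so $P(W_{2,2} = 5) = 2/6 = 1/3$.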

Approximate critical values

For $m > 25$ or $n > 25$ (according to other sources, already for $m > 10$ or $n > 10$), the test statistic

$$W_{m,n} \approx N\left(\frac{m\,(n+m+1)}{2};\ \frac{n\,m\,(n+m+1)}{12}\right)$$

can be approximated by the normal distribution. The critical values then follow from the critical values of this approximating normal distribution.

One-sided hypotheses

The test can also be formulated for the one-sided hypotheses

$$H_0: a \le 0 \quad \text{vs.} \quad H_1: a > 0$$

or

$$H_0: a \ge 0 \quad \text{vs.} \quad H_1: a < 0.$$

Derived hypotheses

The test is particularly useful because, under the conditions listed below, a decision about its null hypothesis (rejection or non-rejection) carries over to the following hypotheses:

$$H_0: \mu_A = \mu_B \quad \text{vs.} \quad H_1: \mu_A \neq \mu_B,$$

i.e. whether the means of the distributions A and B differ, and

$$H_0: \tilde{x}_A = \tilde{x}_B \quad \text{vs.} \quad H_1: \tilde{x}_A \neq \tilde{x}_B,$$

i.e. whether the medians of the distributions A and B differ.

Requirements:

The random variables $X$ and $Y$ have continuous distribution functions $F_X$ and $F_Y$ that differ from each other only by a shift $a$, that is:

$$F_Y(x) = F_X(x - a).$$

Because the two distribution functions are identical except for the shift, in particular $\sigma_X = \sigma_Y$ (homogeneity of variance) must hold. Consequently, if homogeneity of variance is rejected by the Bartlett test or the Levene test, the two random variables X and Y differ by more than a shift.

If the prerequisites for the hypothesis about the medians are not met, the median test can be used.

Example

From the data of the 2006 German General Social Survey (ALLBUS), 20 people were randomly drawn and their net income was determined:

Rank	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
Net income	0	400	500	550	600	650	750	800	900	950	1000	1100	1200	1500	1600	1800	1900	2000	2200	3500
Gender	M	W	M	W	M	W	M	M	W	W	M	M	W	M	W	M	M	M	M	M

There are two samples: the sample of men with $13$ values and the sample of women with $7$ values. Let $F$ be the distribution function of the income of men and $G$ the distribution function of the income of women. We could now check whether the incomes of men and women are equal (two-sided test) or whether the income of women is lower (one-sided test). The tests considered here are:

Two-sided test	One-sided test
$H_0: a = 0$ vs. $H_1: a \neq 0$	$H_0: a \ge 0$ vs. $H_1: a < 0$

First, a test statistic $U$ is computed for each of the two samples:

$$U_1 = n_1 \cdot n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 \cdot n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2$$

Here $n_1$ and $n_2$ are the numbers of values in each sample, and $R_1$ and $R_2$ are the sums of the ranks in each sample. (If several values are tied, each of them is assigned the arithmetic mean, equivalently the median, of the ranks they jointly occupy.) The following tests use the minimum of $U_1$ and $U_2$, $\min(U) = \min(U_1, U_2)$.

For our example we get (index M = men, W = women)

$$R_M = 151 \quad \text{and} \quad U_M = 31,$$

$$R_W = 59 \quad \text{and} \quad U_W = 60,$$

and thus

$$\min(U) = 31.$$

If the calculation is correct, $R_1 + R_2 = (n_1 + n_2)(n_1 + n_2 + 1)/2$ and $U_1 + U_2 = n_1 n_2$ must hold. Here $151 + 59 = 210 = 20 \cdot 21/2$ and $31 + 60 = 91 = 13 \cdot 7$. The test statistic $\min(U)$ is now compared with the critical value(s). The example has been chosen so that a comparison with both the exact and the approximate critical values is possible.
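The computation above can be reproduced in a few lines of Python. This sketch hard-codes the income values and gender labels from the table; the variable names are illustrative. Because all incomes are distinct, no averaged ranks for ties are needed.

```python
# Income data and gender labels from the example table, in rank order.
income = [0, 400, 500, 550, 600, 650, 750, 800, 900, 950,
          1000, 1100, 1200, 1500, 1600, 1800, 1900, 2000, 2200, 3500]
gender = "MWMWMWMMWWMMWMWMMMMM"

# All incomes are distinct, so each rank is the 1-based position in sorted order.
rank = {value: r for r, value in enumerate(sorted(income), start=1)}

r_m = sum(rank[v] for v, g in zip(income, gender) if g == "M")
r_w = sum(rank[v] for v, g in zip(income, gender) if g == "W")
n_m = gender.count("M")
n_w = gender.count("W")

u_m = n_m * n_w + n_m * (n_m + 1) // 2 - r_m
u_w = n_m * n_w + n_w * (n_w + 1) // 2 - r_w

# Consistency checks from the text.
assert r_m + r_w == (n_m + n_w) * (n_m + n_w + 1) // 2
assert u_m + u_w == n_m * n_w
print(r_m, r_w, u_m, u_w, min(u_m, u_w))   # 151 59 31 60 31
```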

Two-sided test

Exact critical values

Using the table below with $n_1 = 13$ and $n_2 = 7$, the critical value is $U_{\text{crit}} = 20$ for a significance level of $\alpha = 5\,\%$. The null hypothesis is rejected if $\min(U) \le U_{\text{crit}}$. This is not the case here, since $31 > 20$.

Approximate critical values

Since the test statistic $U$ is approximately normally distributed, the standardized statistic

$$Z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}} \approx N(0; 1)$$

is approximately standard normally distributed. For a significance level of $\alpha = 5\,\%$, the non-rejection region of the null hypothesis in the two-sided test is bounded by the 2.5 % and 97.5 % quantiles of the standard normal distribution $N(0; 1)$, i.e. it is $[-1.96; +1.96]$. Here $z = \frac{31 - 45.5}{\sqrt{159.25}} \approx -1.15$. The test value therefore lies within the interval, and the null hypothesis cannot be rejected.

One-sided test

Exact critical values

Using the table below with $n_1 = 13$ and $n_2 = 7$, the critical value is $U_{\text{crit}} = 20$ for a significance level of $\alpha = 2.5\,\%$. Note that this is a different significance level than in the two-sided test. The null hypothesis is rejected if $\min(U) \le U_{\text{crit}}$. This is not the case here, since $31 > 20$.

Approximate critical values

For a significance level of $\alpha = 5\,\%$, the critical value is the 5 % quantile of the standard normal distribution $N(0; 1)$, and the non-rejection region of the null hypothesis is $[-1.65; +\infty)$. Here $z = \frac{31 - 45.5}{\sqrt{159.25}} \approx -1.15$, so the null hypothesis cannot be rejected.
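The quantiles used in the two approximate tests can be computed with Python's standard-library `statistics.NormalDist`. This sketch recomputes the test value from the example and both decisions. The exact 5 % quantile is about −1.645, which the text rounds to −1.65.

```python
from statistics import NormalDist

z = (31 - 45.5) / (159.25 ** 0.5)   # test value from the example, about -1.15
std_normal = NormalDist()           # N(0; 1)

# Two-sided test at alpha = 5 %: keep H0 if z lies between the 2.5 % and 97.5 % quantiles.
lower, upper = std_normal.inv_cdf(0.025), std_normal.inv_cdf(0.975)
reject_two_sided = not (lower <= z <= upper)

# One-sided test at alpha = 5 % (H1: a < 0): keep H0 if z >= 5 % quantile.
crit_one_sided = std_normal.inv_cdf(0.05)
reject_one_sided = z < crit_one_sided

print(round(lower, 3), round(upper, 3), round(crit_one_sided, 3))  # -1.96 1.96 -1.645
print(reject_two_sided, reject_one_sided)                          # False False
```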

Table of critical values of the Mann-Whitney U statistic

The following table is valid for $\alpha = 5\,\%$ (two-sided) or $\alpha = 2.5\,\%$ (one-sided) with $n_2 \le n_1$. The entry “-” means that the null hypothesis cannot be rejected at the given significance level, whatever the data. For example:

$$P(U \le 55 \mid H_0, n_1 = 20, n_2 = 10) \le 0.025.$$

	$n_1$
$n_2$	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	36	37	38	39	40
1	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-	0	0
2		-	-	-	-	-	-	0	0	0	0	1	1	1	1	1	2	2	2	2	3	3	3	3	3	4	4	4	4	5	5	5	5	5	6	6	6	6	7	7
3			-	-	0	1	1	2	2	3	3	4	4	5	5	6	6	7	7	8	8	9	9	10	10	11	11	12	13	13	14	14	15	15	16	16	17	17	18	18
4				0	1	2	3	4	4	5	6	7	8	9	10	11	11	12	13	14	15	16	17	17	18	19	20	21	22	23	24	24	25	26	27	28	29	30	31	31
5					2	3	5	6	7	8	9	11	12	13	14	15	17	18	19	20	22	23	24	25	27	28	29	30	32	33	34	35	37	38	39	40	41	43	44	45
6						5	6	8	10	11	13	14	16	17	19	21	22	24	25	27	29	30	32	33	35	37	38	40	42	43	45	46	48	50	51	53	55	56	58	59
7							8	10	12	14	16	18	20	22	24	26	28	30	32	34	36	38	40	42	44	46	48	50	52	54	56	58	60	62	64	66	68	70	72	74
8								13	15	17	19	22	24	26	29	31	34	36	38	41	43	45	48	50	53	55	57	60	62	65	67	69	72	74	77	79	81	84	86	89
9									17	20	23	26	28	31	34	37	39	42	45	48	50	53	56	59	62	64	67	70	73	76	78	81	84	87	89	92	95	98	101	103
10										23	26	29	33	36	39	42	45	48	52	55	58	61	64	67	71	74	77	80	83	87	90	93	96	99	103	106	109	112	115	119
11											30	33	37	40	44	47	51	55	58	62	65	69	73	76	80	83	87	90	94	98	101	105	108	112	116	119	123	127	130	134
12												37	41	45	49	53	57	61	65	69	73	77	81	85	89	93	97	101	105	109	113	117	121	125	129	133	137	141	145	149
13													45	50	54	59	63	67	72	76	80	85	89	94	98	102	107	111	116	120	125	129	133	138	142	147	151	156	160	165
14														55	59	64	69	74	78	83	88	93	98	102	107	112	117	122	127	131	136	141	146	151	156	161	165	170	175	180
15															64	70	75	80	85	90	96	101	106	111	117	122	127	132	138	143	148	153	159	164	169	174	180	185	190	196
16																75	81	86	92	98	103	109	115	120	126	132	137	143	149	154	160	166	171	177	183	188	194	200	206	211
17																	87	93	99	105	111	117	123	129	135	141	147	154	160	166	172	178	184	190	196	202	209	215	221	227
18																		99	106	112	119	125	132	138	145	151	158	164	171	177	184	190	197	203	210	216	223	230	236	243
19																			113	119	126	133	140	147	154	161	168	175	182	189	196	203	210	217	224	231	238	245	252	258
20																				127	134	141	149	156	163	171	178	186	193	200	208	215	222	230	237	245	252	259	267	274

Implementation

In many software packages, the Mann-Whitney-Wilcoxon test (of the hypothesis of equal distributions against suitable alternatives) is poorly documented. Some packages handle ties incorrectly or fail to document their asymptotic techniques (e.g. the continuity correction). A review published in 2000 compared several of the following packages:

MATLAB implements the test as the ranksum function in its Statistics Toolbox.
R implements the test as wilcox.test in its "stats" package.
SAS implements the test in its PROC NPAR1WAY procedure.
Python implements the test via SciPy (scipy.stats.mannwhitneyu).
SigmaStat (SPSS Inc., Chicago, IL)
SYSTAT (SPSS Inc., Chicago, IL)
Java implements the test via Apache Commons Math (org.apache.commons.math3.stat.inference.MannWhitneyUTest).
JMP (SAS Institute Inc., Cary, NC)
S-Plus (MathSoft, Inc., Seattle, WA)
STATISTICA (StatSoft, Inc., Tulsa, OK)
UNISTAT (Unistat Ltd, London)
SPSS (SPSS Inc, Chicago)
StatsDirect (StatsDirect Ltd, Manchester, UK) implements the test via Analysis_Nonparametric_Mann-Whitney.
Stata (Stata Corporation, College Station, TX) implements the test in its ranksum command.
StatXact (Cytel Software Corporation, Cambridge, Massachusetts).
PSPP implements the test in its WILCOXON function.

References

^ Frank Wilcoxon: Individual Comparisons by Ranking Methods. In: Biometrics Bulletin. Vol. 1, 1945, pp. 80-83, JSTOR 3001968.
^ Henry Mann, Donald Whitney: On a test of whether one of two random variables is stochastically larger than the other. In: Annals of Mathematical Statistics. Vol. 18, 1947, pp. 50-60, doi:10.1214/aoms/1177730491.
^ William H. Kruskal: Historical Notes on the Wilcoxon Unpaired Two-Sample Test. In: Journal of the American Statistical Association. Vol. 52, 1957, pp. 356-360, JSTOR 2280906.
^ A. Löffler: About a partition of natural numbers and their application in the U-test. In: Wiss. Z. Univ. Halle. Vol. XXXII, no. 5, 1983, pp. 87-89. (lms.fu-berlin.de)
^ B. Rönz, H. G. Strohe (eds.): Lexicon Statistics. Gabler, Wiesbaden 1994, ISBN 3-409-19952-7.
^ H. Rinne: Pocket Book of Statistics. 3rd edition. Verlag Harri Deutsch, 2003, p. 534.
^ S. Kotz, C. B. Read, N. Balakrishnan: Encyclopedia of Statistical Sciences. Wiley, Volume?, 2003, p. 208.
^ Reinhard Bergmann, John Ludbrook, Will P. J. M. Spooren: Different Outcomes of the Wilcoxon-Mann-Whitney Test from Different Statistics Packages. In: The American Statistician. Vol. 54, no. 1, 2000, pp. 72-77, doi:10.1080/00031305.2000.10474513, JSTOR 2685616.
^ scipy.stats.mannwhitneyu. In: SciPy v0.16.0 Reference Guide. The SciPy community, July 24, 2015: "scipy.stats.mannwhitneyu(x, y, use_continuity=True): Computes the Mann-Whitney rank test on samples x and y."
^ org.apache.commons.math3.stat.inference.MannWhitneyUTest.

Literature

Herbert Büning, Götz Trenkler: Nonparametric statistical methods. de Gruyter, 1998, ISBN 3-11-016351-9.
Sidney Siegel: Nonparametric Statistical Methods. 2nd edition. Specialized bookstore for psychology, Eschborn near Frankfurt am Main 1985, ISBN 3-88074-102-6.

Web links

Social Science Statistics: Mann-Whitney test (online calculator)
VassarStats: Mann-Whitney test (English, online calculator)
Mann-Whitney U test (English)
