Wilcoxon signed rank test


The Wilcoxon signed-rank test is a non-parametric statistical test. Using two paired samples, it tests whether the central tendencies of the underlying (paired) populations are equal. In its area of application it complements the sign test, since it takes into account not only the direction (i.e. the sign) of the differences between two paired samples, but also their magnitude.

The Wilcoxon signed-rank test was proposed in 1945 by the chemist and statistician Frank Wilcoxon (1892–1965) and was popularized by Sidney Siegel's textbook Nonparametric Statistics for the Behavioral Sciences.

Hypotheses and assumptions

There are three possible pairs of hypotheses for the test with regard to the two medians $\tilde{\mu}_X$ and $\tilde{\mu}_Y$:

  1. two-sided: $H_0\colon \tilde{\mu}_X = \tilde{\mu}_Y$ vs. $H_1\colon \tilde{\mu}_X \neq \tilde{\mu}_Y$.
  2. one-sided: $H_0\colon \tilde{\mu}_X \geq \tilde{\mu}_Y$ vs. $H_1\colon \tilde{\mu}_X < \tilde{\mu}_Y$, or $H_0\colon \tilde{\mu}_X \leq \tilde{\mu}_Y$ vs. $H_1\colon \tilde{\mu}_X > \tilde{\mu}_Y$.

A requirement is that the difference variables

    $D_i = X_i - Y_i, \quad i = 1, \dots, n,$

are independent, identically distributed and symmetrically distributed around their median. The last requirement, however, is often neglected in practice.

Test statistics

First, the ranks $R_i$ of the absolute differences $|D_i| = |X_i - Y_i|$ are determined for the test statistic:

    $R_i = \operatorname{rg}\left(|D_i|\right).$

The test statistic is calculated as the minimum of the negative and the positive rank sum:

    $W^+ = \sum_{i=1}^{n} R_i \, \mathbf{1}_{\{D_i > 0\}}, \qquad W^- = \sum_{i=1}^{n} R_i \, \mathbf{1}_{\{D_i < 0\}}, \qquad W = \min(W^+, W^-).$

Here $\mathbf{1}_{\{\cdot\}}$ denotes the indicator function.
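The computation can be sketched in a few lines of Python; this is a minimal illustration, not part of the original article, and the function and variable names (signed_rank_statistic, w_plus, w_minus) are chosen freely. Zero differences are dropped here, as in the second option of the following list:

```python
import numpy as np
from scipy.stats import rankdata

def signed_rank_statistic(x, y):
    """Return (W+, W-, W) for the paired samples x and y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]                    # discard zero differences (see option 2 below)
    ranks = rankdata(np.abs(d))      # average ranks of the absolute differences
    w_plus = ranks[d > 0].sum()      # positive rank sum W+
    w_minus = ranks[d < 0].sum()     # negative rank sum W-
    return w_plus, w_minus, min(w_plus, w_minus)
```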

In the event that one or more differences $D_i = 0$ occur, there are two options:

  1. The associated rank values are assigned half to the positive and half to the negative rank sum.
  2. The affected observations are excluded from the test; i.e., $n$ must be reduced accordingly. However, a larger number of zero differences in itself speaks for the validity of the null hypothesis.

For $n > 50$ the test statistic is approximately normally distributed; the standardized value

    $z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$

approximately follows a standard normal distribution $N(0, 1)$.

In addition, a continuity correction should be carried out:

    $z = \frac{\left|W - \frac{n(n+1)}{4}\right| - 0.5}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}.$
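As a sketch of this large-sample approximation (the helper name z_approximation and the optional continuity flag are illustrative assumptions, not taken from the article):

```python
import math
from scipy.stats import norm

def z_approximation(w, n, continuity=True):
    """Normal approximation of the signed-rank statistic W for large n."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (abs(w - mean) - (0.5 if continuity else 0.0)) / math.sqrt(var)
    p_two_sided = 2 * norm.sf(z)     # two-sided p-value from |z|
    return z, p_two_sided
```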

For values of $n$ less than or equal to 50, the critical values can be taken from the following table.

Critical values which the test statistic $W$ must fall below in order to reject the null hypothesis

 α (two-sided)   0.100   0.050   0.020   0.010   0.005   0.001
 α (one-sided)   0.050   0.025   0.010   0.005   0.0025  0.0005

 n =  4            -       -       -       -       -       -
 n =  5            0       -       -       -       -       -
 n =  6            2       0       -       -       -       -
 n =  7            3       2       0       -       -       -
 n =  8            5       3       1       0       -       -
 n =  9            8       5       3       1       0       -
 n = 10           10       8       5       3       1       -
 n = 11           13      10       7       5       3       0
 n = 12           17      13       9       7       5       1
 n = 13           21      17      12       9       7       2
 n = 14           25      21      15      12       9       4
 n = 15           30      25      19      15      12       6
 n = 16           35      29      23      19      15       8
 n = 17           41      34      27      23      19      11
 n = 18           47      40      32      27      23      14
 n = 19           53      46      37      32      27      18
 n = 20           60      52      43      37      32      21
 n = 25          100      89      76      68      60      45
 n = 30          151     137     120     109      98      78
 n = 35          213     195     173     159     146     120
 n = 40          286     264     238     220     204     172
 n = 45          371     343     312     291     272     233
 n = 50          466     434     397     373     350     304

(A dash means that no rejection is possible at that level for the given $n$.)

Ties in the ranks

In the event that ties occur among the ranks of $|D_i|$ (i.e., several absolute differences receive the same rank), each of these differences is assigned the mean of the corresponding ranks (see the example below).

If $t_j$ denotes the number of tied absolute differences in the $j$-th group of ties, then the variance of $W$ becomes

    $\operatorname{Var}(W) = \frac{n(n+1)(2n+1)}{24} - \frac{1}{48}\sum_{j}\left(t_j^3 - t_j\right),$

and the approximation becomes

    $z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{1}{48}\sum_{j}\left(t_j^3 - t_j\right)}}.$

If the correction factor is omitted, the test is too conservative; that is, it decides in favour of the null hypothesis too often.
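A sketch of this tie-corrected approximation, under the assumption that the correction sums over groups of tied absolute differences (function and variable names are illustrative):

```python
import math
from collections import Counter

import numpy as np
from scipy.stats import rankdata

def tie_corrected_z(x, y):
    """z approximation with the variance reduced for tied ranks."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]
    n = d.size
    ranks = rankdata(np.abs(d))                       # average ranks
    w = min(ranks[d > 0].sum(), ranks[d < 0].sum())   # W = min(W+, W-)
    tie_sizes = Counter(np.abs(d)).values()           # sizes t_j of the tie groups
    correction = sum(t ** 3 - t for t in tie_sizes) / 48
    var = n * (n + 1) * (2 * n + 1) / 24 - correction
    return (w - n * (n + 1) / 4) / math.sqrt(var)
```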

Example

An example of its application: a statistically savvy farmer wants to determine whether his cattle prefer hay or straw. He divides an enclosure into two areas, between which the animals can move back and forth freely. In one area he offers the five cattle straw, in the other hay. Every half hour he notes how many animals are in which area and thus obtains $n = 6$ pairs of samples.

The result of his observations, including the differences between the paired values, is given in the following tables:

Animals in the hay    Animals by the straw    Difference
        4                      1                  +3
        3                      2                  +1
        2                      3                  −1
        5                      0                  +5
        5                      0                  +5
        3                      2                  +1

Difference    Rank    Contribution to W+    Contribution to W−
    +1         2              2
    +1         2              2
    −1         2                                     2
    +3         4              4
    +5         5.5            5.5
    +5         5.5            5.5
    Sum                       19                     2

Ranks: the three differences with absolute value 1 would occupy ranks 1 to 3; since they are tied, each is assigned the mean of these ranks, (1 + 2 + 3) / 3 = 2. The same applies to the two differences with absolute value 5: (5 + 6) / 2 = 5.5.

In general, the differences are sorted by magnitude (the sign is ignored), and each difference is assigned a rank; the largest absolute difference receives the highest rank. If several differences are tied, each of them is assigned the mean of the ranks in question.

The sum of the ranks of the positive differences is $W^+ = 2 + 2 + 4 + 5.5 + 5.5 = 19$ and the sum of the ranks of the negative differences is $W^- = 2$, that is,

    $W = \min(W^+, W^-) = \min(19, 2) = 2.$
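These rank sums can be checked with a few lines of Python; the data are those from the tables above, and rankdata assigns average ranks to ties:

```python
import numpy as np
from scipy.stats import rankdata

hay   = np.array([4, 3, 2, 5, 5, 3])
straw = np.array([1, 2, 3, 0, 0, 2])

d = hay - straw                  # +3, +1, -1, +5, +5, +1
ranks = rankdata(np.abs(d))      # 4.0, 2.0, 2.0, 5.5, 5.5, 2.0
w_plus = ranks[d > 0].sum()      # 19.0
w_minus = ranks[d < 0].sum()     # 2.0
print(w_plus, w_minus, min(w_plus, w_minus))
```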

Two-sided test

In the two-sided test with

    $H_0\colon \tilde{\mu}_{\text{hay}} = \tilde{\mu}_{\text{straw}}$ (cattle like hay and straw equally well) vs.
    $H_1\colon \tilde{\mu}_{\text{hay}} \neq \tilde{\mu}_{\text{straw}}$ (cattle prefer one of the two),

the null hypothesis cannot be rejected at the significance level $\alpha = 0.10$ or $\alpha = 0.05$, because

  • for $n = 6$ and $\alpha = 0.10$ (two-sided) the table above gives a critical value of 2; since the test value $W = 2$ is not less than this critical value, the null hypothesis cannot be rejected, and
  • for $n = 6$ and $\alpha = 0.05$ (two-sided) the table above gives a critical value of 0; since the test value $W = 2$ is not less than this critical value, the null hypothesis cannot be rejected.

One-sided tests

Also with the one-sided tests

  • left-sided: $H_0\colon \tilde{\mu}_{\text{hay}} \geq \tilde{\mu}_{\text{straw}}$ (cattle like hay more, or both types equally well) vs. $H_1\colon \tilde{\mu}_{\text{hay}} < \tilde{\mu}_{\text{straw}}$ (cattle like straw more),
  • right-sided: $H_0\colon \tilde{\mu}_{\text{hay}} \leq \tilde{\mu}_{\text{straw}}$ (cattle like straw more, or both types equally well) vs. $H_1\colon \tilde{\mu}_{\text{hay}} > \tilde{\mu}_{\text{straw}}$ (cattle like hay more),

the null hypotheses cannot be rejected, because

  • for $n = 6$ and $\alpha = 0.05$ (one-sided) the table above gives a critical value of 2; since the test value $W = 2$ is not less than this critical value, the null hypothesis cannot be rejected, and
  • for $n = 6$ and $\alpha = 0.025$ (one-sided) the table above gives a critical value of 0; since the test value $W = 2$ is not less than this critical value, the null hypothesis cannot be rejected.

Approximation with the normal distribution in the two-sided test

If one calculates, as an approximation, the normally distributed z-value (here including the tie correction described above),

    $z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{1}{48}\sum_{j}\left(t_j^3 - t_j\right)}} = \frac{2 - 10.5}{\sqrt{22.75 - 0.625}} \approx -1.81,$

the standard normal distribution table gives for the two-sided test

  • for $\alpha = 0.05$ critical values of $\pm 1.96$; since the test value lies in the interval $[-1.96, 1.96]$, the null hypothesis cannot be rejected, and
  • for $\alpha = 0.10$ critical values of $\pm 1.645$; since the test value does not lie in the interval $[-1.645, 1.645]$, the null hypothesis can be rejected.

This means that, at the 10% level of significance, the cattle have a preference for one of the two types of feed.
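This z-value can be reproduced with a short calculation; the sketch below uses the tie correction from above, and the thresholds 1.645 and 1.96 are the standard two-sided 10% and 5% critical values:

```python
import math

n = 6
w_minus = 2.0
mean = n * (n + 1) / 4                                   # 10.5
tie_correction = ((3**3 - 3) + (2**3 - 2)) / 48          # three tied |1|s, two tied |5|s -> 0.625
var = n * (n + 1) * (2 * n + 1) / 24 - tie_correction    # 22.125
z = (w_minus - mean) / math.sqrt(var)                    # approx. -1.81
print(z, abs(z) > 1.645, abs(z) > 1.96)                  # -1.807..., True, False
```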

This seems to contradict the result of the exact two-sided test. However, the z-value calculated with the formula above is only an approximation and is only reliable for sample sizes $n > 50$.

For the approximation in a two-sided test it does not matter whether $W^-$ or $W^+$ (or the minimum of the two) is inserted into the formula, because $W^+ + W^- = \frac{n(n+1)}{2}$ implies

    $W^+ - \frac{n(n+1)}{4} = -\left(W^- - \frac{n(n+1)}{4}\right),$

so the two z-values differ only in their sign (in the example, $z \approx +1.81$ instead of $-1.81$). That is, the test decision would be the same.

Comparison with the sign test

Five of the six differences have a positive sign (+), one has a negative sign (−). According to the table of critical values for the sign test (MacKinnon, 1964), in this example one can only conclude p < 0.5 (i.e. a probability of error of less than 50 percent). Only if all six differences had the same sign would p lie between 0.02 and 0.1. This illustrates that the Wilcoxon method, which also uses the magnitude of the differences, delivers useful results particularly with smaller sample sizes.

Literature

  • Sidney Siegel: Nonparametric Statistical Methods. Verlag Dietmar Klotz, Eschborn near Frankfurt am Main 2001, ISBN 3-88074-102-6.
  • Sidney Siegel: Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York et al., ca. 1988 (out of print).

References

  1. Jürgen Bortz, Gustav A. Lienert, Klaus Boehnke: Distribution-free Methods in Biostatistics. 3rd edition. Springer, Berlin 2008, pp. 256, 259.
  2. Frank Wilcoxon: Individual Comparisons by Ranking Methods. In: Biometrics Bulletin, 1 (6), 1945, pp. 80–83. JSTOR 3001968.
  3. Leonard A. Marascuilo, Mary Ellen McSweeney: Nonparametric and Distribution-free Methods for the Social Sciences. Brooks/Cole Publishing Co., 1977, ISBN 978-0-8185-0202-6.
  4. Jürgen Bortz, Gustav A. Lienert, Klaus Boehnke: Distribution-free Methods in Biostatistics. 3rd edition. Springer, Berlin 2010, p. 729.