Sign test

from Wikipedia, the free encyclopedia

The sign test is a non-parametric statistical test and a special case of the binomial test. It can be used to test distributional hypotheses in one-sample and two-sample problems. It remains applicable when the data are measured only on an ordinal scale.

One-sample problem

Test for the median

The sign test can be used to test hypotheses about the median of a distribution.

Test for symmetry

The sign test can also be used as a test for the symmetry of a distribution. Suppose the true arithmetic mean of the population is known, or an estimate is taken to be the true value. One can then check whether the mean coincides with the median, i.e. whether 50% of the probability mass lies to the right of the mean and 50% to the left. The mean and median of a symmetric distribution coincide, so a rejection indicates that the distribution is not symmetric. Equality of mean and median alone, however, does not prove symmetry.

Test for the mean

If the distribution is assumed to be symmetric (and its mean exists), the population mean equals the population median. The sign test can then also be used to test hypotheses about the arithmetic mean of the population.

Assumptions

  • The observations are independent of one another.
  • The underlying random variable is continuously distributed in the population.
  • Since each observation is compared with the hypothetical median, the characteristic under study must be measured at least on an ordinal scale.

Hypotheses

In a two-sided test, the hypothesis is that the population median $m$ equals a hypothetical value $m_0$. If $m_0$ really is the median, the probability that an observation exceeds it is $P(X > m_0) = 0.5$. In a one-sided test, one checks whether the median is greater (or less) than $m_0$, i.e. whether $P(X > m_0)$ is greater (or less) than 0.5.

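                    One-sided (right-sided)            Two-sided
Null hypothesis     $H_0\colon m \le m_0$              $H_0\colon m = m_0$
Alternative         $H_1\colon m > m_0$                $H_1\colon m \neq m_0$
Null hypothesis     $H_0\colon P(X > m_0) \le 0.5$     $H_0\colon P(X > m_0) = 0.5$
Alternative         $H_1\colon P(X > m_0) > 0.5$       $H_1\colon P(X > m_0) \neq 0.5$

($m$: population median, $m_0$: hypothetical value. For the left-sided test the inequalities are reversed.)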

Further equivalent formulations of the hypotheses are possible. The test principle extends to any quantile by adapting the hypotheses and the parameters of the test statistic's distribution. When testing a quantile other than the median, the hypothetical probability (here 1/2) must be adjusted accordingly (see binomial test).
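The extension to other quantiles can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation. The helper names `binom_upper_tail` and `quantile_sign_test` are hypothetical, the data are assumed continuous, and only the right-sided alternative is shown (the quantile of order `level` exceeds $q_0$). If the quantile equals $q_0$, each observation lies above $q_0$ with probability $1 - \text{level}$, so the number of "+" signs follows $B(n', 1 - \text{level})$. With `level = 0.5` this reduces to the median test.

```python
from math import comb


def binom_upper_tail(k, n, prob):
    """Return P(K >= k) for K ~ Binomial(n, prob)."""
    return sum(comb(n, i) * prob**i * (1 - prob) ** (n - i) for i in range(k, n + 1))


def quantile_sign_test(data, q0, level):
    """Right-sided sign test for the quantile of order `level` (hypothetical helper).

    H0: the quantile is at most q0; H1: it is greater than q0.
    If the quantile equals q0, an observation exceeds q0 with probability
    1 - level, so the number of '+' signs is Binomial(n', 1 - level).
    Observations equal to q0 are dropped as ties, which defines n'.
    """
    plus = sum(1 for x in data if x > q0)
    minus = sum(1 for x in data if x < q0)
    n_eff = plus + minus
    return plus, n_eff, binom_upper_tail(plus, n_eff, 1 - level)


# Same toy data, tested once for the median and once for the first quartile
data = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3, 0.7, 2.5]
print(quantile_sign_test(data, 2.0, 0.5))
print(quantile_sign_test(data, 2.0, 0.25))
```

Only the binomial success probability changes between the two calls; the sign counting is identical.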

Procedure

Sample values greater than the hypothetical median are assigned a "+", and values smaller than it a "-". In other words, the sample variable is dichotomized at the hypothetical median. The number of "+" signs is counted and used as the test statistic.
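The dichotomization step can be sketched in a few lines of Python. The function names are hypothetical. Observations exactly equal to the hypothetical median $m_0$ are marked "=" and, in this sketch, simply excluded from $n'$, which is one of the usual ways of handling ties. Because only comparisons are used, the code works unchanged for ordinal codes.

```python
def one_sample_signs(data, m0):
    """Dichotomize each observation at the hypothetical median m0."""
    signs = []
    for x in data:
        if x > m0:
            signs.append("+")
        elif x < m0:
            signs.append("-")
        else:
            signs.append("=")  # tie with m0
    return signs


def sign_statistic(signs):
    """Return the number of '+' signs and n', the count of non-tied signs."""
    plus = signs.count("+")
    minus = signs.count("-")
    return plus, plus + minus


signs = one_sample_signs([3.1, 4.7, 5.0, 2.2, 6.8], m0=5.0)
print(signs)                  # ['-', '-', '=', '-', '+']
print(sign_statistic(signs))  # (1, 4)
```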

Two-sample problem

In the two-sample case, the sign test is used for two paired (dependent) samples. Samples are paired when the observations of the two groups depend on each other pair by pair. An example is assessing the health of the same person before and after a treatment. A sign ("+" or "-") is obtained for each pair by comparing its two values.

The sign test checks whether the distributions of two paired, continuously distributed random variables are equal. If the signs are significantly unbalanced, i.e. the median of the pairwise differences differs significantly from zero, one can conclude that the distributions in the population differ.

Assumptions

  • The pairs of observations must be independent of one another, i.e. the pair $(X_{i1}, X_{i2})$ must be independent of the pair $(X_{j1}, X_{j2})$ for $i \neq j$.
  • The underlying random variables are continuously distributed in the population.
  • Since the two observations of each pair are compared with each other, the characteristic under study must be measured at least on an ordinal scale.

Hypotheses

If the differences $D_i = X_{i1} - X_{i2}$ have median zero (for example because the treatment has no effect), then $P(X_{i1} > X_{i2}) = P(X_{i1} < X_{i2})$. The following hypotheses can be tested with the sign test:

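                    One-sided (right-sided)                                  Two-sided
Null hypothesis     $H_0\colon P(X_{i1} > X_{i2}) \le P(X_{i1} < X_{i2})$    $H_0\colon P(X_{i1} > X_{i2}) = P(X_{i1} < X_{i2})$
Alternative         $H_1\colon P(X_{i1} > X_{i2}) > P(X_{i1} < X_{i2})$      $H_1\colon P(X_{i1} > X_{i2}) \neq P(X_{i1} < X_{i2})$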

Procedure

Pairs with $x_{i1} > x_{i2}$ are assigned a "+", and pairs with $x_{i1} < x_{i2}$ a "-". The number of "+" signs is counted and used as the test statistic.
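For paired samples, the same idea applies to each pair. A minimal Python sketch follows. The function name and the before/after scores are hypothetical and serve only as illustration. Passing the "after" values as the first sample makes "+" mean "improved". As in the one-sample case, only comparisons are used, so ordinal codes such as ratings work as well.

```python
def paired_signs(first, second):
    """Sign of x_i1 - x_i2 for each pair: '+', '-', or '=' for a zero difference."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    signs = []
    for a, b in zip(first, second):
        if a > b:
            signs.append("+")
        elif a < b:
            signs.append("-")
        else:
            signs.append("=")
    return signs


# Hypothetical scores: first sample = after treatment, second = before
after = [7, 5, 6, 4, 8]
before = [5, 5, 7, 3, 6]
signs = paired_signs(after, before)
print(signs, signs.count("+"))  # ['+', '=', '-', '+', '+'] 3
```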

Test statistic

Exact distribution

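The test statistic $T$ is the number of positive comparisons (differences of values or of ranks):

$$T = \sum_{i=1}^{n'} Z_i$$

with

$$Z_i = \begin{cases} 1 & \text{if } x_{i1} - x_{i2} > 0, \\ 0 & \text{if } x_{i1} - x_{i2} < 0. \end{cases}$$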

For the one-sample problem, the values of the second sample are replaced by the hypothetical median $m_0$. If the null hypothesis holds, the number $T$ of positive differences is binomially distributed, $T \sim B(n', 1/2)$, because the median is the 50% quantile. Here $n'$ denotes the sample size remaining after ties (zero differences, tied ranks) have been handled. Under the null hypothesis, the distribution of the test statistic is symmetric about $n'/2$.
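The exact test can be computed directly from $B(n', 1/2)$. The Python sketch below is one possible implementation under these assumptions: the function name and the `alternative` keyword are hypothetical, and ties are assumed to have been removed already. Dividing integer binomial sums by $2^{n'}$ keeps the computation exact until the final division. The two-sided p-value doubles the smaller tail, the usual convention when the null distribution is symmetric.

```python
from math import comb


def sign_test_pvalue(plus, n_eff, alternative="two-sided"):
    """Exact sign-test p-value; under H0 the '+' count T is Binomial(n_eff, 1/2)."""
    total = 2**n_eff
    upper = sum(comb(n_eff, k) for k in range(plus, n_eff + 1)) / total  # P(T >= plus)
    lower = sum(comb(n_eff, k) for k in range(0, plus + 1)) / total      # P(T <= plus)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # The null distribution is symmetric about n_eff / 2: double the smaller tail.
        return min(1.0, 2 * min(upper, lower))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


print(sign_test_pvalue(25, 36, "greater"))  # about 0.01441
```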

Approximation by the normal distribution

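As $n' \to \infty$, the binomial distribution approaches a normal distribution with $\mu = n'p$ and $\sigma^2 = n'p(1-p)$. A common rule of thumb for a useful approximation is $n'p(1-p) \ge 9$. If the null hypothesis holds, $p = 1/2$, and the test statistic $T$ satisfies

$$E(T) = \frac{n'}{2}, \qquad \operatorname{Var}(T) = \frac{n'}{4}.$$

So if $n' \cdot \frac{1}{2} \cdot \frac{1}{2} \ge 9$, i.e. $n' \ge 36$, the z-standardized quantity

$$Z = \frac{T - n'/2}{\sqrt{n'/4}}$$

is approximately standard normally distributed. The critical values for the test decision can then be read from the table of the standard normal distribution.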
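The z-standardization can be sketched as follows. The function names are hypothetical. No continuity correction is applied, so the statistic is exactly $(T - n'/2)/\sqrt{n'/4}$. The upper-tail probability $1 - \Phi(z)$ is computed with the standard identity $1 - \Phi(z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$.

```python
from math import erfc, sqrt


def sign_test_z(plus, n_eff):
    """z-standardized '+' count: under H0, E(T) = n'/2 and Var(T) = n'/4."""
    return (plus - n_eff / 2) / sqrt(n_eff / 4)


def upper_normal_pvalue(z):
    """P(Z >= z) for standard normal Z, i.e. 1 - Phi(z), via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))


z = sign_test_z(25, 36)
print(z, upper_normal_pvalue(z))
```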

Ties (zero differences)

In practice, continuous random variables are usually recorded only with limited precision, i.e. discretely, so ties can occur. In the two-sample problem, an observation may be unchanged from the first to the second sample. In the one-sample problem, some values may equal the hypothetical median. Either way, zero differences (ties) result. A binomial test, however, can only handle two categories (here + and -), which raises the question of how ties should be treated. Possible methods are:

  • Tied observations are discarded, i.e. the sample size is reduced.
  • The tied observations are split equally between the two groups. If the number of ties is odd, one of them is discarded.
  • Each tied observation is assigned to one of the two groups (+ or -) with probability 0.5.
  • Zero differences are given the less frequent sign (a very conservative approach).
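The four rules can be expressed as adjustments to the sign counts. The sketch below is illustrative only: the function name and method labels are hypothetical. For the "rarer" rule, when both counts are equal the ties go to "+", an arbitrary choice made here.

```python
import random


def handle_ties(plus, minus, ties, method, rng=None):
    """Return adjusted (plus, minus) counts for one of the tie-handling rules.

    "drop":   discard the ties, so n' = plus + minus
    "split":  share the ties equally between + and -; an odd leftover is discarded
    "random": assign each tie to + or - with probability 0.5
    "rarer":  give every tie the less frequent sign (conservative)
    """
    if method == "drop":
        return plus, minus
    if method == "split":
        half = ties // 2
        return plus + half, minus + half
    if method == "random":
        if rng is None:
            rng = random.Random()
        to_plus = sum(1 for _ in range(ties) if rng.random() < 0.5)
        return plus + to_plus, minus + ties - to_plus
    if method == "rarer":
        # With equal counts the choice is arbitrary; '+' is used here.
        if plus <= minus:
            return plus + ties, minus
        return plus, minus + ties
    raise ValueError(f"unknown method: {method!r}")


for method in ("drop", "split", "rarer"):
    print(method, handle_ties(25, 11, 7, method))
print("random", handle_ties(25, 11, 7, "random", random.Random(1)))
```

For the counts 25, 11 and 7 from the example, "drop", "split" and "rarer" give (25, 11), (28, 14) and (25, 18).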

Example of a two-sample problem

A school authority wants to investigate whether a new learning method (e.g. e-learning) has improved students' school performance. The performance of a random sample of 43 students is measured with a suitable test. The students are then taught with the new learning method, after which the performance of the same students is measured again. The school authority uses these observations to carry out a right-sided sign test of

$H_0\colon P(+) \le P(-)$ versus $H_1\colon P(+) > P(-)$,

where "+" denotes an improvement and "-" a deterioration.

For the evaluation, the frequencies of the signs (+, -, =) of the differences are determined:

Sign      +     -     =     Total
Count    25    11     7     43

Performance improved for 25 students, worsened for 11 and stayed the same for 7. Can one conclude from this result that the new learning method has a positive effect in the population?

Ties

The ties are eliminated, which reduces the sample size to $n' = 43 - 7 = 36$.

Binomial test

With the binomial distribution $B(36, 0.5)$ as the test distribution, the critical value at a (maximum) significance level of 0.05 is 23, the 0.95 quantile of this distribution. The p-value, i.e. the probability of 25 or more "+" signs under the null hypothesis, is 0.01441. Since 25 > 23, the null hypothesis (no improvement) is rejected. The school authority can therefore conclude that e-learning has a positive influence on school performance.
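The numbers of the exact test can be reproduced with a short Python sketch. The helper names are hypothetical, and ties are dropped as in the example. The critical value is found by searching for the smallest $c$ with $P(T \le c) \ge 0.95$.

```python
from math import comb


def binom_cdf_half(k, n):
    """P(T <= k) for T ~ Binomial(n, 1/2), summed with exact integers."""
    return sum(comb(n, i) for i in range(k + 1)) / 2**n


def critical_value(n, alpha=0.05):
    """Smallest c with P(T <= c) >= 1 - alpha; H0 is rejected when T > c."""
    c = 0
    while binom_cdf_half(c, n) < 1 - alpha:
        c += 1
    return c


plus, minus, ties = 25, 11, 7   # sign counts from the example table
n_eff = plus + minus            # ties dropped: n' = 36
c = critical_value(n_eff)
p_value = sum(comb(n_eff, i) for i in range(plus, n_eff + 1)) / 2**n_eff  # P(T >= 25)
print(c, round(p_value, 5), plus > c)  # 23 0.01441 True
```

The actual size of this test is $P(T \ge 24) \approx 0.033$, below the nominal 0.05, because the binomial distribution is discrete.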

Normal distribution approximation

The critical value of the standard normal distribution for α = 0.05 is 1.6449 (0.95 quantile of the standard normal distribution).

Approximating the distribution of the test statistic by the normal distribution gives

$$z = \frac{25 - 36/2}{\sqrt{36/4}} = \frac{7}{3} \approx 2.33 > 1.6449.$$

The associated p-value, i.e. the probability under the null hypothesis of obtaining the observed test value or a larger one, is about 0.0098. Here, too, the school authority can conclude at the 5% significance level that e-learning has a positive influence on school performance.
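The normal-approximation figures can be checked with Python's standard `statistics.NormalDist`. This is a sketch without continuity correction. The variable names are illustrative, and the critical value is recomputed as the 0.95 quantile rather than taken from a table.

```python
from math import sqrt
from statistics import NormalDist

plus, n_eff = 25, 36
z = (plus - n_eff / 2) / sqrt(n_eff / 4)   # (25 - 18) / 3 = 7/3
p_value = 1 - NormalDist().cdf(z)          # P(Z >= z) under H0
z_crit = NormalDist().inv_cdf(0.95)        # 0.95 quantile of N(0, 1)
print(round(z, 4), round(p_value, 4), round(z_crit, 4), z > z_crit)
# 2.3333 0.0098 1.6449 True
```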
