Walsh outlier test

From Wikipedia, the free encyclopedia

The Walsh outlier test is a statistical test used to identify outliers in a sample. It does not require a specific frequency distribution of the data and is therefore a non-parametric method. The test was developed by the American statistician John E. Walsh, who first described it in 1950.

The Walsh outlier test is not affected by the problem of most other outlier tests, which are based on the assumption of a normal distribution and can produce false positive results for samples whose values are, for example, lognormally distributed. However, the test requires a sample size of more than 60 values for a significance level of α = 0.10 and more than 220 values for α = 0.05.

In addition, the number of suspected outliers must be specified a priori in order to carry out the test. The test's null hypothesis is that all observations belong to the sample and that the sample contains no outliers. The alternative hypothesis is that the specified number of highest or lowest individual values are in fact outliers.

Test execution

Null hypothesis | Alternative hypothesis
The r smallest values x_(1), …, x_(r) belong to the distribution. | The r smallest values do not belong to the distribution; they are outliers.
The r largest values x_(n+1−r), …, x_(n) belong to the distribution. | The r largest values do not belong to the distribution; they are outliers.

The following calculation steps are carried out:

  • c = ⌈√(2n)⌉, where ⌈x⌉ is the smallest whole number greater than or equal to x (round up),
  • k = r + c,
  • b² = 1/α and
  • a = (1 + b·√((c − b²)/(c − 1))) / (c − b² − 1).

If now

  • x_(r) − (1 + a)·x_(r+1) + a·x_(k) < 0, then the null hypothesis can be rejected at the significance level α (the r smallest values are outliers), or
  • x_(n+1−r) − (1 + a)·x_(n−r) + a·x_(n+1−k) > 0, then the null hypothesis can be rejected at the significance level α (the r largest values are outliers).

The value x_(i) denotes the i-th smallest observation of the sample, i.e. the i-th order statistic; see also rank (statistics).

Since the value a must be positive, c − b² − 1 > 0 must hold, i.e. ⌈√(2n)⌉ > 1/α + 1. Therefore, for a significance level of α = 0.10 at least 61 observations are required, and for a significance level of α = 0.05 at least 221 observations.
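The calculation steps above can be sketched in Python (a minimal sketch; the function name walsh_test and its interface are illustrative and not part of the original description):

```python
import math

def walsh_test(data, r, alpha=0.10):
    """Walsh outlier test for the r smallest and the r largest values.

    Returns (smallest_are_outliers, largest_are_outliers).
    Requires more than 60 values for alpha = 0.10 and more than
    220 values for alpha = 0.05.
    """
    x = sorted(data)                  # x[i - 1] is the order statistic x_(i)
    n = len(x)
    c = math.ceil(math.sqrt(2 * n))   # round sqrt(2n) up to a whole number
    k = r + c
    b2 = 1.0 / alpha                  # b^2 = 1 / alpha
    if c - b2 - 1 <= 0:
        raise ValueError("sample too small for this significance level")
    a = (1 + math.sqrt(b2) * math.sqrt((c - b2) / (c - 1))) / (c - b2 - 1)

    # The r smallest values are outliers if x_(r) - (1+a) x_(r+1) + a x_(k) < 0.
    low = x[r - 1] - (1 + a) * x[r] + a * x[k - 1] < 0
    # The r largest values are outliers if x_(n+1-r) - (1+a) x_(n-r) + a x_(n+1-k) > 0.
    high = x[n - r] - (1 + a) * x[n - r - 1] + a * x[n - k] > 0
    return low, high
```

For instance, a sample of 61 values containing one extremely low value leads to rejection of the null hypothesis for the smallest observation, while a sample without extreme values does not.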

Example

If n = 61, α = 0.10 and r = 1, then c = 12, k = 13, b² = 10 and a ≈ 2.35. That is, if

  • x_(1) − 3.35·x_(2) + 2.35·x_(13) < 0, then the null hypothesis for the smallest value is rejected, or
  • x_(61) − 3.35·x_(60) + 2.35·x_(49) > 0, then the null hypothesis for the largest value is rejected.
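The constants in this example can be reproduced directly from the calculation steps (a quick check; the variable names are illustrative):

```python
import math

n, alpha, r = 61, 0.10, 1
c = math.ceil(math.sqrt(2 * n))   # c = 12
k = r + c                         # k = 13
b2 = 1 / alpha                    # b^2 = 10
a = (1 + math.sqrt(b2) * math.sqrt((c - b2) / (c - 1))) / (c - b2 - 1)
print(round(a, 2))                # a is approximately 2.35
```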

Math background

Walsh considers a linear combination of order statistics of the form

W = x_(r) − (1 + a)·x_(r+1) + a·x_(k)

with a > 0 and r + 1 < k ≤ n.

If the null hypothesis holds, then E(W) > 0 provided a is chosen sufficiently large; so that the test remains as sensitive as possible, a should be minimal. If in addition E(W) ≥ b·√(Var(W)), then using the Chebyshev inequality it follows:

P(W < 0) ≤ Var(W) / E(W)² ≤ 1/b² = α.

However, some not very restrictive requirements must be met:

  1. If F⁻¹ is the inverse distribution function (quantile function) of the population and (F⁻¹)′ its first derivative, then for n → ∞ (possibly with r → ∞ as well) under the null hypothesis, F⁻¹ and (F⁻¹)′ must be sufficiently smooth and bounded at the quantiles corresponding to the order statistics x_(r), x_(r+1) and x_(k), with analogous conditions for the largest observations.
  2. For large n the remainder terms can be neglected, and the stated bound α on the probability of a false rejection then results.

Literature

  • John Edward Walsh: Some Nonparametric Tests of whether the Largest Observations of a Set are too Large or too Small. In: Annals of Mathematical Statistics. Vol. 21, No. 4, 1950, ISSN 0003-4851, pp. 583–592, doi:10.1214/aoms/1177729753.
  • John Edward Walsh: Correction to "Some Nonparametric Tests of Whether the Largest Observations of a Set Are Too Large or Too Small". In: Annals of Mathematical Statistics. Vol. 24, No. 1, 1953, pp. 134–135, doi:10.1214/aoms/1177729095.
  • John Edward Walsh: Large Sample Nonparametric Rejection of Outlying Observations. In: Annals of the Institute of Statistical Mathematics. Vol. 10, 1958. The Institute of Statistical Mathematics, pp. 223–232, ISSN 0020-3157.
  • Large sample outlier detection. In: Douglas M. Hawkins: Identification of Outliers. Chapman & Hall, London and New York 1980, ISBN 0-412-21900-X, pp. 83–84.
