Unlike most other outlier tests, the Walsh outlier test does not assume a normal distribution. Tests that rely on that assumption can produce false positive results in samples whose values are, for example, lognormally distributed. However, the test can only be applied to samples with more than 60 values for a significance level of α = 0.10 and more than 220 values for α = 0.05.
In addition, the number $r$ of suspected outliers must be specified in advance. The null hypothesis of the test is that all observations belong to the same distribution, i.e. that the sample contains no outliers. The alternative hypothesis is that the $r$ smallest or the $r$ largest values are in fact outliers.
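$H_0$: The $r$ smallest values belong to the distribution.
$H_1$: The $r$ smallest values do not belong to the distribution; they are outliers.
$H_0$: The $r$ largest values belong to the distribution.
$H_1$: The $r$ largest values do not belong to the distribution; they are outliers.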
The following calculation steps are carried out:
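$c = \lceil \sqrt{2n}\, \rceil$, where $n$ is the sample size and $\lceil x \rceil$ is the smallest integer greater than or equal to $x$ (rounding up),
$k = r + c$,
$b^2 = 1/\alpha$ and
$a = \dfrac{1 + b\sqrt{(c - b^2)/(c - 1)}}{c - b^2 - 1}$.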
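The calculation steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name `walsh_constants` is hypothetical. It rounds $\sqrt{2n}$ up using exact integer arithmetic, which is the rounding that reproduces the 61 and 221 thresholds stated for α = 0.10 and α = 0.05, and it refuses inputs for which $a$ would not be a positive real number.

```python
import math


def walsh_constants(n, r, alpha):
    """Return (c, k, b, a) for Walsh's outlier test (hypothetical helper).

    n: sample size, r: number of suspected outliers, alpha: significance level.
    """
    # c = ceil(sqrt(2n)), computed exactly with integer arithmetic
    c = math.isqrt(2 * n)
    if c * c < 2 * n:
        c += 1
    k = r + c                      # rank of the far order statistic x_(k)
    b2 = 1.0 / alpha               # b^2 = 1/alpha
    b = math.sqrt(b2)
    denom = c - b2 - 1
    if denom <= 0:
        raise ValueError("sample too small: c = ceil(sqrt(2n)) must exceed 1 + 1/alpha")
    if k > n:
        raise ValueError("r + c exceeds the sample size")
    # a = (1 + b * sqrt((c - b^2) / (c - 1))) / (c - b^2 - 1)
    a = (1 + b * math.sqrt((c - b2) / (c - 1))) / denom
    return c, k, b, a


c, k, b, a = walsh_constants(n=100, r=1, alpha=0.10)
print(c, k, round(b, 4), round(a, 4))  # 15 16 3.1623 0.7225
```

The values n = 100, r = 1 and α = 0.10 in the usage line are purely illustrative.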
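If
$$x_{(r)} - (1 + a)\,x_{(r+1)} + a\,x_{(k)} < 0,$$
then the null hypothesis that the $r$ smallest values belong to the distribution can be rejected at significance level $\alpha$; if
$$x_{(n+1-r)} - (1 + a)\,x_{(n-r)} + a\,x_{(n+1-k)} > 0,$$
then the null hypothesis that the $r$ largest values belong to the distribution can be rejected at significance level $\alpha$.
Here $x_{(i)}$ denotes the $i$-th smallest observation of the sample; see also rank (statistics).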
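Both decision rules can be applied to a raw sample with a short sketch like the one below. The function name `walsh_outlier_test` and the test data are hypothetical; the only subtlety is translating the 1-based ranks $x_{(i)}$ of the text into 0-based Python indices.

```python
import math


def walsh_outlier_test(data, r, alpha):
    """Apply both one-sided Walsh rules to `data` (hypothetical helper).

    Returns (reject_low, reject_high): whether the r smallest / r largest
    values are declared outliers at level alpha.
    """
    x = sorted(data)
    n = len(x)
    c = math.isqrt(2 * n)
    if c * c < 2 * n:              # round sqrt(2n) up
        c += 1
    k = r + c
    b2 = 1.0 / alpha
    if c - b2 - 1 <= 0 or k > n:
        raise ValueError("sample too small for this alpha and r")
    b = math.sqrt(b2)
    a = (1 + b * math.sqrt((c - b2) / (c - 1))) / (c - b2 - 1)
    # x_(i) in the text (1-based rank) is x[i - 1] here
    t_low = x[r - 1] - (1 + a) * x[r] + a * x[k - 1]
    t_high = x[n - r] - (1 + a) * x[n - r - 1] + a * x[n - k]
    return t_low < 0, t_high > 0


values = list(range(1, 101))      # evenly spaced: no outliers
print(walsh_outlier_test(values, r=1, alpha=0.10))   # (False, False)
values[-1] = 1000                 # make the largest value extreme
print(walsh_outlier_test(values, r=1, alpha=0.10))   # (False, True)
```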
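Since $a$ must be positive, $c - b^2 - 1 > 0$, i.e. $c > 1 + 1/\alpha$, must hold. Therefore, for a significance level of $\alpha = 0.10$ ($c \ge 12$, i.e. $\sqrt{2n} > 11$) at least 61 observations are required, and for $\alpha = 0.05$ ($c \ge 22$, i.e. $\sqrt{2n} > 21$) at least 221 observations.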
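The minimum sample sizes follow from a simple search for the smallest $n$ with $\lceil\sqrt{2n}\,\rceil > 1 + 1/\alpha$. A minimal sketch, with hypothetical helper names:

```python
import math


def ceil_sqrt(m):
    """Smallest integer s with s * s >= m (exact integer arithmetic)."""
    s = math.isqrt(m)
    return s if s * s == m else s + 1


def min_sample_size(alpha):
    """Smallest n with ceil(sqrt(2n)) > 1 + 1/alpha, i.e. with a > 0."""
    bound = 1 + 1.0 / alpha
    n = 1
    while ceil_sqrt(2 * n) <= bound:
        n += 1
    return n


print(min_sample_size(0.10), min_sample_size(0.05))  # 61 221
```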
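Example
For a single suspected outlier ($r = 1$) we have $k = c + 1$. Thus, if
$$x_{(1)} - (1 + a)\,x_{(2)} + a\,x_{(c+1)} < 0,$$
then $x_{(1)}$ is discarded, or if
$$x_{(n)} - (1 + a)\,x_{(n-1)} + a\,x_{(n-c)} > 0,$$
then $x_{(n)}$ is discarded.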
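As a worked illustration with hypothetical values $n = 100$, $r = 1$ and $\alpha = 0.10$, the constants and the two rules become:

```latex
\begin{aligned}
c &= \lceil \sqrt{2 \cdot 100}\, \rceil = \lceil 14.14\ldots \rceil = 15, \qquad k = r + c = 16,\\
b^2 &= 1/0.10 = 10, \qquad b = \sqrt{10} \approx 3.162,\\
a &= \frac{1 + \sqrt{10}\,\sqrt{(15 - 10)/14}}{15 - 10 - 1} = \frac{1 + 5/\sqrt{7}}{4} \approx 0.7225,\\
&\text{discard } x_{(1)} \text{ if } x_{(1)} - 1.7225\,x_{(2)} + 0.7225\,x_{(16)} < 0,\\
&\text{discard } x_{(100)} \text{ if } x_{(100)} - 1.7225\,x_{(99)} + 0.7225\,x_{(85)} > 0.
\end{aligned}
```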
Math background
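Walsh considers a linear combination of order statistics of the form
$$T = x_{(r)} - (1 + a)\,x_{(r+1)} + a\,x_{(k)}$$
with $a > 0$ and $k > r + 1$.
Under the null hypothesis, $a$ is chosen as small as possible subject to $\operatorname{E}(T) \ge b\sqrt{\operatorname{Var}(T)}$, which gives $\operatorname{E}(T) = b\sqrt{\operatorname{Var}(T)}$. Since then $\operatorname{E}(T) > 0$, the Chebyshev inequality yields
$$P(T < 0) \le P\bigl(|T - \operatorname{E}(T)| \ge b\sqrt{\operatorname{Var}(T)}\bigr) \le \frac{1}{b^2} = \alpha.$$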
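The Chebyshev bound $P(T < 0) \le \alpha$ for $T = x_{(r)} - (1 + a)\,x_{(r+1)} + a\,x_{(k)}$ is conservative, which a small Monte Carlo sketch can illustrate. The setup is an assumption chosen for illustration: a uniform(0, 1) population (so $H_0$ holds), $n = 100$, $r = 1$, $\alpha = 0.10$. For uniform data, writing the order statistics as normalized cumulative sums of i.i.d. standard exponential variables shows that the exact lower-tail rejection probability is $(1 + a)^{-(c-1)}$, which the function returns alongside the simulated rate.

```python
import math
import random


def simulated_low_rejection_rate(n=100, r=1, alpha=0.10, reps=20000, seed=1):
    """Rejection rate of the lower Walsh rule on uniform(0, 1) samples (H0 true).

    Returns (simulated_rate, exact_rate_for_uniform_data).
    """
    rng = random.Random(seed)
    c = math.isqrt(2 * n)
    if c * c < 2 * n:              # round sqrt(2n) up
        c += 1
    k = r + c
    b2 = 1.0 / alpha
    b = math.sqrt(b2)
    a = (1 + b * math.sqrt((c - b2) / (c - 1))) / (c - b2 - 1)
    hits = 0
    for _ in range(reps):
        x = sorted(rng.random() for _ in range(n))
        t = x[r - 1] - (1 + a) * x[r] + a * x[k - 1]
        hits += t < 0              # count false rejections of H0
    return hits / reps, (1 + a) ** -(c - 1)


rate, exact = simulated_low_rejection_rate()
print(f"simulated {rate:.4f}, exact {exact:.4f}, Chebyshev bound 0.10")
```

Both numbers come out far below α, reflecting the price the test pays for not assuming a particular distribution.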
However, a few requirements, none of them very restrictive, must be met:
Let $F^{-1}$ denote the inverse distribution function of the population and $(F^{-1})'$ its first derivative. Then, for $n \to \infty$ and under $H_0$, three conditions on $F^{-1}$ and $(F^{-1})'$ must hold, together with analogous conditions for the upper tail.
For $n \to \infty$ the remaining correction terms can be neglected, and one obtains
$$a = \frac{1 + b\sqrt{(c - b^2)/(c - 1)}}{c - b^2 - 1}.$$
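Where this limiting value of $a$ comes from can be sketched heuristically. The sketch assumes a finite lower endpoint $F^{-1}(0)$ with $(F^{-1})'(0) > 0$ and uses the standard representation of uniform order statistics by cumulative sums of i.i.d. standard exponential variables $E_j$; it is a reconstruction, not necessarily Walsh's own argument. With $T = x_{(r)} - (1 + a)\,x_{(r+1)} + a\,x_{(k)}$ and $k = r + c$:

```latex
\begin{aligned}
x_{(i)} &\approx F^{-1}(0) + (F^{-1})'(0)\,\frac{E_1 + \dots + E_i}{n},
  \qquad E_j \overset{\text{iid}}{\sim} \operatorname{Exp}(1),\\
T &\approx \frac{(F^{-1})'(0)}{n}\Bigl(-E_{r+1} + a \sum_{j=r+2}^{k} E_j\Bigr),
  \qquad k - r - 1 = c - 1,\\
\frac{\operatorname{E}(T)}{\sqrt{\operatorname{Var}(T)}}
  &\approx \frac{a(c-1) - 1}{\sqrt{1 + a^2(c-1)}} = b
  \;\Longrightarrow\;
  a = \frac{1 + b\sqrt{(c - b^2)/(c - 1)}}{c - b^2 - 1}.
\end{aligned}
```

The ratio in the last line increases with $a$, so the root shown is the smallest $a$ satisfying $\operatorname{E}(T) \ge b\sqrt{\operatorname{Var}(T)}$, and it is positive only when $c > 1 + b^2$.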
Literature
John Edward Walsh: Some Nonparametric Tests of Whether the Largest Observations of a Set Are Too Large or Too Small. In: Annals of Mathematical Statistics. vol. 21, no. 4, 1950, ISSN 0003-4851, pp. 583-592, doi:10.1214/aoms/1177729753.
John Edward Walsh: Correction to "Some Nonparametric Tests of Whether the Largest Observations of a Set Are Too Large or Too Small". In: Annals of Mathematical Statistics. vol. 24, no. 1, 1953, pp. 134-135, doi:10.1214/aoms/1177729095.
John Edward Walsh: Large Sample Nonparametric Rejection of Outlying Observations. In: Annals of the Institute of Statistical Mathematics. 10/1958, The Institute of Statistical Mathematics, ISSN 0020-3157, pp. 223-232.
Large sample outlier detection. In: Douglas M. Hawkins: Identification of Outliers. Chapman & Hall, London and New York 1980, ISBN 0-412-21900-X, pp. 83-84.