David-Hartley-Pearson test
The David-Hartley-Pearson test was developed in 1954 by the statisticians H. A. David, H. O. Hartley and E. S. Pearson. It is a statistical procedure for identifying outliers: it checks whether an observed extreme value (the smallest or the largest) plausibly belongs to a normally distributed population or should instead be regarded as an outlier.
Requirements
To make statements about an extreme observation, the David-Hartley-Pearson test assumes that the underlying population is normally distributed; it is therefore a parametric test.
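Hypotheses
The David-Hartley-Pearson test considers the following pairs of null and alternative hypotheses:
- $H_0$: $x_{(1)}$ is not an outlier vs. $H_1$: $x_{(1)}$ is an outlier
- $H_0$: $x_{(n)}$ is not an outlier vs. $H_1$: $x_{(n)}$ is an outlier
Here $x_{(1)}$ denotes the smallest and $x_{(n)}$ the largest observation of a sample of size $n$.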
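Test statistic
The following test statistic is used to check the hypotheses:
- $Q = \frac{R}{s} = \frac{x_{(n)} - x_{(1)}}{s}$,
that is, the range $R$ of the sample (largest minus smallest observation) divided by its standard deviation $s$.
The null hypothesis is rejected at significance level $\alpha$ if:
- $Q > Q_{n;1-\alpha}$.
Here $Q_{n;1-\alpha}$ denotes the critical value for sample size $n$.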
If the null hypothesis is rejected, the extreme value that is the greatest distance from the mean value is identified as an outlier. If the smallest and largest values are at the same distance from the mean, both are considered to be outliers.
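The decision rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation: the function names `dhp_statistic` and `dhp_outliers` are hypothetical, the standard deviation is assumed to be the sample standard deviation with n - 1 in the denominator, and the critical value has to be supplied by the caller (for example, read from a table).

```python
import math
import statistics


def dhp_statistic(sample):
    """Return Q = R / s: the sample range divided by the sample standard deviation."""
    if len(sample) < 3:
        raise ValueError("need at least three observations")
    sample_range = max(sample) - min(sample)
    s = statistics.stdev(sample)  # assumption: n - 1 in the denominator
    if s == 0:
        raise ValueError("all observations are equal; Q is undefined")
    return sample_range / s


def dhp_outliers(sample, critical_value):
    """Return the extreme value(s) flagged as outliers, or [] if H0 is not rejected.

    critical_value is supplied by the caller, e.g. looked up in a table.
    """
    if dhp_statistic(sample) <= critical_value:
        return []  # null hypothesis cannot be rejected
    mean = statistics.fmean(sample)
    low, high = min(sample), max(sample)
    dist_low, dist_high = mean - low, high - mean
    if math.isclose(dist_low, dist_high):
        return [low, high]  # both extremes are equally far from the mean
    return [low] if dist_low > dist_high else [high]
```

Calling `dhp_outliers(values, critical_value)` returns an empty list when the null hypothesis is kept, one value when a single extreme is flagged, and both extremes when they lie equally far from the mean.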
Critical values
Extensive tables with critical values for the David-Hartley-Pearson test can be found in David et al. (1954). A selection of these is shown in the following table:
n | $Q_{n;0.90}$ | $Q_{n;0.95}$ | $Q_{n;0.975}$ | $Q_{n;0.99}$ | $Q_{n;0.995}$ | n | $Q_{n;0.90}$ | $Q_{n;0.95}$ | $Q_{n;0.975}$ | $Q_{n;0.99}$ | $Q_{n;0.995}$ |
3 | 1.997 | 1.999 | 2.000 | 2.000 | 2.000 | 17 | 4.15 | 4.31 | 4.44 | 4.59 | 4.69 |
4 | 2.409 | 2.429 | 2.439 | 2.445 | 2.447 | 18 | 4.21 | 4.38 | 4.51 | 4.66 | 4.77 |
5 | 2.712 | 2.753 | 2.782 | 2.803 | 2.813 | 19 | 4.27 | 4.43 | 4.57 | 4.73 | 4.84 |
6 | 2.949 | 3.012 | 3.056 | 3.095 | 3.115 | 20 | 4.32 | 4.49 | 4.63 | 4.79 | 4.91 |
7 | 3.143 | 3.222 | 3.282 | 3.338 | 3.369 | 30 | 4.70 | 4.89 | 5.06 | 5.25 | 5.39 |
8 | 3.308 | 3.399 | 3.471 | 3.543 | 3.585 | 40 | 4.96 | 5.15 | 5.34 | 5.54 | 5.69 |
9 | 3.449 | 3.552 | 3.634 | 3.720 | 3.772 | 50 | 5.15 | 5.35 | 5.54 | 5.77 | 5.91 |
10 | 3.57 | 3.69 | 3.78 | 3.88 | 3.94 | 60 | 5.29 | 5.50 | 5.70 | 5.93 | 6.09 |
11 | 3.68 | 3.80 | 3.91 | 4.02 | 4.08 | 80 | 5.51 | 5.73 | 5.93 | 6.18 | 6.35 |
12 | 3.78 | 3.91 | 4.01 | 4.14 | 4.21 | 100 | 5.68 | 5.90 | 6.11 | 6.36 | 6.54 |
13 | 3.87 | 4.00 | 4.11 | 4.25 | 4.33 | 150 | 5.96 | 6.18 | 6.39 | 6.64 | 6.84 |
14 | 3.95 | 4.09 | 4.21 | 4.34 | 4.44 | 200 | 6.15 | 6.38 | 6.59 | 6.85 | 7.03 |
15 | 4.02 | 4.17 | 4.29 | 4.43 | 4.53 | 500 | 6.72 | 6.94 | 7.15 | 7.42 | 7.60 |
16 | 4.09 | 4.24 | 4.37 | 4.51 | 4.62 | 1000 | 7.11 | 7.33 | 7.54 | 7.80 | 7.99 |
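For use in a program, a few rows of the table can be stored in a dictionary and looked up by sample size. The sketch below is hedged in two ways: the names `CRITICAL_VALUES`, `LEVELS` and `critical_value` are hypothetical, and the assignment of the five columns to the confidence levels 0.90, 0.95, 0.975, 0.99 and 0.995 is an assumption that should be checked against David et al. (1954) before the values are relied on. Only a handful of rows are copied here, and sample sizes that are not tabulated are rejected rather than interpolated.

```python
# Selected rows copied from the table above: n -> five tabulated critical values.
CRITICAL_VALUES = {
    3: (1.997, 1.999, 2.000, 2.000, 2.000),
    5: (2.712, 2.753, 2.782, 2.803, 2.813),
    10: (3.57, 3.69, 3.78, 3.88, 3.94),
    12: (3.78, 3.91, 4.01, 4.14, 4.21),
    20: (4.32, 4.49, 4.63, 4.79, 4.91),
    100: (5.68, 5.90, 6.11, 6.36, 6.54),
}

# ASSUMPTION: the columns correspond to these confidence levels 1 - alpha.
LEVELS = (0.90, 0.95, 0.975, 0.99, 0.995)


def critical_value(n, level):
    """Return the tabulated critical value for sample size n and confidence level."""
    if n not in CRITICAL_VALUES:
        raise KeyError(f"sample size {n} is not in the stored excerpt")
    if level not in LEVELS:
        raise ValueError(f"level {level} is not tabulated")
    return CRITICAL_VALUES[n][LEVELS.index(level)]
```

Refusing untabulated sample sizes is a deliberate choice in this sketch: the table does not say how intermediate values should be obtained, so no interpolation scheme is assumed.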
Example
To illustrate the test, the following observed series of measurements (already sorted) is assumed:
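Measurement | $x_{(1)}$ | $x_{(2)}$ | $x_{(3)}$ | $x_{(4)}$ | $x_{(5)}$ | $x_{(6)}$ | $x_{(7)}$ | $x_{(8)}$ | $x_{(9)}$ | $x_{(10)}$ | $x_{(11)}$ | $x_{(12)}$ |
Measured value (speed in m/s) | 36 | 37 | 39 | 39 | 40 | 40 | 41 | 41 | 41 | 42 | 44 | 46 |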
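From these data, the range and the sample standard deviation are:
- $R = 46 - 36 = 10$ and $s \approx 2.75$,
so that the test statistic is $Q = R/s \approx 3.64$.
This value is smaller than every critical value tabulated above for $n = 12$ (the smallest of which is 3.78). The null hypothesis therefore cannot be rejected, and neither the largest nor the smallest value is identified as an outlier at any of the tabulated significance levels.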
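The arithmetic of this example can be re-checked with a short script. It is a minimal sketch that assumes the sample standard deviation with n - 1 in the denominator; the variable names are chosen here for illustration, and the critical values are simply the row for n = 12 copied from the table above.

```python
import statistics

# Measured speeds in m/s, as listed in the example.
speeds = [36, 37, 39, 39, 40, 40, 41, 41, 41, 42, 44, 46]

sample_range = max(speeds) - min(speeds)  # R
s = statistics.stdev(speeds)              # sample standard deviation (n - 1)
q = sample_range / s                      # test statistic Q = R / s

# Row n = 12 of the critical-value table.
critical_values_n12 = [3.78, 3.91, 4.01, 4.14, 4.21]
rejected = [q > c for c in critical_values_n12]

print(f"R = {sample_range}, s = {s:.2f}, Q = {q:.2f}")
print("null hypothesis rejected in any column:", any(rejected))
```

Since Q stays below all five tabulated values, the script reports no rejection, in line with the conclusion of the example.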
References
- H. A. David, H. O. Hartley, E. S. Pearson: The distribution of the ratio, in a single normal sample, of range to standard deviation. In: Biometrika. Vol. 41, 1954, pp. 482-493, doi:10.1093/biomet/41.3-4.482, JSTOR 2332728.
- J. Hartung: Statistics - teaching and manual of applied statistics. 13th edition. R. Oldenbourg Verlag, Munich/Vienna 2002.