David-Hartley-Pearson test

The David-Hartley-Pearson test was developed in 1954 by the statisticians H. A. David, H. O. Hartley and E. S. Pearson. It is a statistical procedure for identifying outliers: it checks whether an observed extreme value (the smallest or the largest) plausibly belongs to a normally distributed population or is an outlier.

Requirements

To make statements about an extreme observation, the David-Hartley-Pearson test assumes that the underlying population is normally distributed; it is therefore a parametric test.

Hypotheses

The David-Hartley-Pearson test considers the following pairs of hypotheses:

$H_0$: $x_{(1)}$ is not an outlier vs. $H_1$: $x_{(1)}$ is an outlier
$H_0'$: $x_{(n)}$ is not an outlier vs. $H_1'$: $x_{(n)}$ is an outlier

Here $x_{(1)}$ denotes the smallest and $x_{(n)}$ the largest observation of the sample $x_1, \dots, x_n$.

Test statistic

The following test statistic is used to test both null hypotheses:

$$Q = \frac{R}{s} = \frac{x_{(n)} - x_{(1)}}{s},$$

that is, the range $R$ of the sample divided by its standard deviation $s$.
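
As a minimal sketch, the ratio of range to standard deviation could be computed in Python as follows. The function name `range_to_sd_ratio` is illustrative and not taken from the source, and the use of the sample standard deviation with divisor n - 1 (`statistics.stdev`) is an assumption about which standard deviation is meant.

```python
import statistics


def range_to_sd_ratio(sample):
    """Return (max - min) / s for a sample of numbers.

    Assumption: s is the sample standard deviation (divisor n - 1).
    """
    s = statistics.stdev(sample)  # raises StatisticsError for fewer than 2 values
    if s == 0:
        raise ValueError("all observations are identical; the ratio is undefined")
    return (max(sample) - min(sample)) / s


# Three equally spaced values: range 2, standard deviation 1.
print(range_to_sd_ratio([1, 2, 3]))  # prints 2.0
```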

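The null hypothesis is rejected at the significance level $\alpha$ if the test statistic $Q$ exceeds the critical value:

$$Q > Q_{n;1-\alpha}$$

Here $Q_{n;1-\alpha}$ denotes the critical value for sample size $n$.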

If the null hypothesis is rejected, the extreme value farthest from the mean is identified as an outlier. If the smallest and largest values are equally far from the mean, both are considered outliers.
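
The decision rule described above might be sketched in Python as follows. The function name `flag_extremes` is hypothetical, the critical value is assumed to be looked up by the caller in a table for the given sample size and significance level, and the sample standard deviation (divisor n - 1) is again an assumption. The two samples are made-up values chosen only to trigger a rejection.

```python
import math
import statistics


def flag_extremes(sample, critical_value):
    """Return the extreme value(s) flagged as outliers, or [] if none.

    Assumptions: critical_value comes from a table (it is not computed
    here) and s is the sample standard deviation (divisor n - 1).
    """
    s = statistics.stdev(sample)
    if s == 0:
        raise ValueError("all observations are identical")
    smallest, largest = min(sample), max(sample)
    q = (largest - smallest) / s
    if q <= critical_value:
        return []  # null hypothesis not rejected
    mean = statistics.mean(sample)
    d_low, d_high = mean - smallest, largest - mean
    if math.isclose(d_low, d_high):
        return [smallest, largest]  # equal distance from the mean: both flagged
    return [smallest] if d_low > d_high else [largest]


# Made-up sample, n = 10; 3.69 is one of the tabulated values for n = 10.
print(flag_extremes([30, 40, 40, 40, 40, 40, 40, 40, 40, 52], 3.69))  # prints [52]
# Made-up symmetric sample, n = 6; 3.012 is one of the tabulated values for n = 6.
print(flag_extremes([30, 40, 40, 40, 40, 50], 3.012))  # prints [30, 50]
```

Passing the critical value as an argument keeps the sketch independent of any particular table or significance level.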

Critical values

Extensive tables of critical values for the David-Hartley-Pearson test can be found in David et al. (1954). A selection is shown in the following table:

Critical values $Q_{n;1-\alpha}$ by sample size $n$ (rows) and $1-\alpha$ (columns):

  n  0.90   0.95   0.975  0.99   0.995       n  0.90  0.95  0.975  0.99  0.995
  3  1.997  1.999  2.000  2.000  2.000      17  4.15  4.31  4.44   4.59  4.69
  4  2.409  2.429  2.439  2.445  2.447      18  4.21  4.38  4.51   4.66  4.77
  5  2.712  2.753  2.782  2.803  2.813      19  4.27  4.43  4.57   4.73  4.84
  6  2.949  3.012  3.056  3.095  3.115      20  4.32  4.49  4.63   4.79  4.91
  7  3.143  3.222  3.282  3.338  3.369      30  4.70  4.89  5.06   5.25  5.39
  8  3.308  3.399  3.471  3.543  3.585      40  4.96  5.15  5.34   5.54  5.69
  9  3.449  3.552  3.634  3.720  3.772      50  5.15  5.35  5.54   5.77  5.91
 10  3.57   3.69   3.78   3.88   3.94       60  5.29  5.50  5.70   5.93  6.09
 11  3.68   3.80   3.91   4.02   4.08       80  5.51  5.73  5.93   6.18  6.35
 12  3.78   3.91   4.01   4.14   4.21      100  5.68  5.90  6.11   6.36  6.54
 13  3.87   4.00   4.11   4.25   4.33      150  5.96  6.18  6.39   6.64  6.84
 14  3.95   4.09   4.21   4.34   4.44      200  6.15  6.38  6.59   6.85  7.03
 15  4.02   4.17   4.29   4.43   4.53      500  6.72  6.94  7.15   7.42  7.60
 16  4.09   4.24   4.37   4.51   4.62     1000  7.11  7.33  7.54   7.80  7.99

Example

To illustrate, assume the following series of measurements (already sorted):

Measurement                      x_1   x_2   x_3   x_4   x_5   x_6   x_7   x_8   x_9  x_10  x_11  x_12
Measured value (speed in m/s)     36    37    39    39    40    40    41    41    41    42    44    46

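From these data, the components of the test statistic are

$$R = 46 - 36 = 10 \quad\text{and}\quad s \approx 2.75,$$

so that

$$Q = \frac{R}{s} \approx 3.64 < 3.78,$$

where 3.78 is the smallest critical value tabulated for $n = 12$. This means that the null hypothesis cannot be rejected at any of the tabulated significance levels, and neither the largest nor the smallest value is identified as an outlier.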
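
The arithmetic of this example can be checked with a short Python sketch. The variable names are illustrative, the critical values are copied from the n = 12 row of the table above, and the sample standard deviation with divisor n - 1 is an assumption.

```python
import statistics

# Measured speeds in m/s, copied from the example.
speeds = [36, 37, 39, 39, 40, 40, 41, 41, 41, 42, 44, 46]
# Critical values from the n = 12 row of the table, left to right.
critical_values_n12 = [3.78, 3.91, 4.01, 4.14, 4.21]

sample_range = max(speeds) - min(speeds)
s = statistics.stdev(speeds)  # assumption: divisor n - 1
q = sample_range / s

print(f"R = {sample_range}, s = {s:.3f}, Q = {q:.2f}")  # prints R = 10, s = 2.747, Q = 3.64

# The null hypothesis would be rejected only where Q exceeds the critical value.
rejected = [q > c for c in critical_values_n12]
print(rejected)  # prints [False, False, False, False, False]
```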

References

  1. H. A. David, H. O. Hartley, E. S. Pearson: The distribution of the ratio, in a single normal sample, of range to standard deviation. In: Biometrika. Vol. 41, 1954, pp. 482-493, doi:10.1093/biomet/41.3-4.482, JSTOR 2332728.
  2. J. Hartung: Statistics - teaching and manual of applied statistics. 13th edition. R. Oldenbourg Verlag, Munich / Vienna 2002.