Robust estimation methods

From Wikipedia, the free encyclopedia

Robust estimation procedure is a term used in inferential statistics. An estimation or test procedure is called robust if it does not react sensitively to outliers (values outside the range expected under an assumed distribution).

The classical estimation methods, developed in the first half of the 20th century, often give misleading results when the sample contains outliers. A robust estimation method therefore relies on the bulk of the data and incorporates an outlier analysis in order to reduce the influence of model deviations, driving that influence toward zero as the deviation grows.

The development of robust estimators to increase the efficiency of estimation methods has been an important research concern in mathematical statistics since the 1980s. Robust procedures include, for example, the RANSAC algorithm and procedures with a high breakdown point.

Example

[Figure: The expected value of a t-distribution with 2 degrees of freedom, estimated from samples of size 10.]

A simple robust estimation method is the (empirical) median, which can be used instead of the arithmetic mean to estimate the expected value of a symmetric distribution. The empirical median is obtained by sorting the observations by size and taking the middle value as the estimate.

An example: a number of measurements are carried out in order to determine a physical quantity (such as the gravitational constant) experimentally. The measurement errors are assumed to be unsystematic and to go in both directions, i.e. the measured values are sometimes too large and sometimes too small; more formally: the observations are independent and identically distributed with a symmetric distribution whose expected value is the true value of the quantity. Now and then individual measured values deviate markedly from the rest ("outliers", the model deviations described above); as a rule they are due to errors in carrying out the experiment (a jolted apparatus, a transcription slip, and so on). Although an extreme deviation tends to indicate an error, so that such observations should have little influence on the result, they influence the arithmetic mean strongly, and the more pronounced the deviation, the greater the influence. The median, by contrast, is insensitive to such outliers; it is "robust". If there are no outliers, however, the median generally yields a less precise estimate from the same number of measured values, since locally the estimate is determined by a single observation, namely the middle one.
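This behavior is easy to demonstrate. The following sketch uses Python's standard library; the "measurement" values are invented for illustration and do not come from a real experiment:

```python
# Sketch: a single gross outlier shifts the arithmetic mean noticeably,
# while the median stays with the bulk of the data.
from statistics import mean, median

measurements = [9.79, 9.81, 9.80, 9.82, 9.78]  # invented, plausible readings
contaminated = measurements + [15.0]           # one gross outlier

print(round(mean(measurements), 2))    # 9.8   (clean mean)
print(round(mean(contaminated), 2))    # 10.67 (mean dragged toward the outlier)
print(round(median(contaminated), 3))  # 9.805 (median barely moves)
```

The more extreme the outlier, the further the mean is pulled, while the median does not change at all once the outlier is the largest observation.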

For normally distributed random variables, outliers are rather unlikely and the arithmetic mean provides a good estimate of the expected value. By contrast, for a t-distribution with few degrees of freedom, outliers are considerably more likely because of the heavy tails. In the figure, both estimators are unbiased, but the median has a lower variance than the arithmetic mean, which reflects the median's robustness against outliers. As the number of degrees of freedom increases, the t-distribution converges to the normal distribution and outliers become less likely; in that case the variance of the arithmetic mean is smaller, because it uses more of the information in the data.
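The comparison in the figure can be reproduced with a short Monte Carlo simulation. This is a sketch using only the standard library; the replication count and seed are arbitrary choices, and the t(2) variates are generated from the textbook representation Z / sqrt(chi²(2)/2):

```python
# Monte Carlo sketch: variance of the sample mean vs. the sample median
# as estimators of the center of a t-distribution with 2 degrees of freedom,
# using samples of size 10 (as in the figure described above).
import random
from math import log, sqrt
from statistics import mean, median, pvariance

random.seed(0)  # arbitrary seed for reproducibility

def t2_variate():
    """One t(2) draw: Z / sqrt(chi2(2)/2), where chi2(2) = -2 ln U."""
    z = random.gauss(0.0, 1.0)
    chi2 = -2.0 * log(1.0 - random.random())  # 1 - U avoids log(0)
    return z / sqrt(chi2 / 2.0)

means, medians = [], []
for _ in range(20_000):
    sample = [t2_variate() for _ in range(10)]
    means.append(mean(sample))
    medians.append(median(sample))

# Under the heavy tails of t(2), the median's empirical variance is smaller.
print(pvariance(means) > pvariance(medians))  # True
```

Note that the t-distribution with 2 degrees of freedom has infinite theoretical variance, so the empirical variance of the sample means is unstable across runs; the ordering relative to the median's variance nevertheless comes out the same way.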

Literature

  • P. Huber: Robust Estimation of a Location Parameter. In: The Annals of Mathematical Statistics. 35, 1964, pp. 73–101.
  • Frank R. Hampel et al.: Robust Statistics. The Approach Based on Influence Functions. Wiley, New York 1986, ISBN 0-471-73577-9.
  • Helmuth Späth: Mathematical Software for Linear Regression. Oldenbourg, Munich 1987, ISBN 3-486-20375-4.
  • Helga Bunke, Olaf Bunke: Nonlinear Regression, Functional Relations and Robust Methods. Volume 2: Non-Linear Functional Relations and Robust Methods. Wiley, New York et al. 1989, ISBN 0-471-91239-5.
  • Werner Stahel (Ed.): Directions in Robust Statistics and Diagnostics. 2 volumes (Volumes 33 and 34 of The IMA Volumes in Mathematics and its Applications). Springer, Berlin et al. 1991, ISBN 3-540-97530-6, ISBN 3-540-97531-4.
  • Karl-Rudolf Koch: Parameter Estimation and Hypothesis Tests. 3rd edition. Dümmler, Bonn 1997, ISBN 3-427-78923-3.
  • David C. Hoaglin, Frederick Mosteller, John W. Tukey: Understanding Robust and Exploratory Data Analysis. Wiley, New York 2000, ISBN 0-471-38491-7.
  • Mia Hubert (Ed.): Theory and Application of Recent Robust Methods. Birkhäuser, Basel et al. 2004, ISBN 3-7643-7060-2.
  • Ricardo A. Maronna, R. Douglas Martin, Victor J. Yohai: Robust Statistics: Theory and Methods. Wiley, New York et al. 2006, ISBN 0-470-01092-4.