Order statistics

from Wikipedia, the free encyclopedia

In statistics, the $k$-th order statistic (also called an order quantity) denotes the $k$-th smallest value of a sample. Order statistics are therefore special random variables. They are obtained from a given collection of random variables by rearranging them so that the realizations of the order statistics coincide with the realizations of the underlying random variables, but are always sorted by size.

Order statistics therefore arise in particular when studying random structures that carry an ordering. Examples include the analysis of waiting-time processes and the construction of estimators for the median or for quantiles.

Definition

Let $X_1, \dots, X_n$ be random variables. If the random variables are free of ties, that is, if almost surely no two of them take the same value, formally

$$P(X_i = X_j) = 0 \quad \text{for all } i \neq j,$$

then one defines

$$X_{(1)} := \min\{X_1, \dots, X_n\}$$

and

$$X_{(k+1)} := \min\{X_i : X_i > X_{(k)}\}$$

for $k = 1, \dots, n-1$. The random variables $X_{(1)}, \dots, X_{(n)}$ are then called the order statistics of $X_1, \dots, X_n$, and $X_{(k)}$ is called the $k$-th order statistic. An alternative notation for $X_{(k)}$ is $X_{k:n}$.

If the random variables are not free of ties, the order statistics can be defined as

$$X_{(k)} := \min\Big\{ c \in \mathbb{R} : \sum_{i=1}^{n} \mathbf{1}_{(-\infty, c]}(X_i) \geq k \Big\}.$$

Here $\mathbf{1}_A$ denotes the indicator function of the set $A$. In the tie-free case both definitions agree. Not all authors require that the random variables almost surely take distinct values; the properties of the order statistics then differ slightly.
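The two definitions can be compared on a concrete sample. The following is a minimal sketch, not a reference implementation: the function names and the sample values are chosen here purely for illustration. It computes the order statistics once by sorting and once through the counting definition (the smallest value c such that at least k observations are less than or equal to c), using a sample that contains a tie.

```python
def order_statistics(sample):
    """Return the realizations x_(1) <= ... <= x_(n) by sorting the sample."""
    return sorted(sample)


def order_statistic_by_counting(sample, k):
    """k-th order statistic (1-based) via the counting definition.

    Returns the smallest c such that at least k sample values are <= c.
    That minimum is always attained at a sample value, so only the
    sample values need to be searched.
    """
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    candidates = [c for c in sample if sum(1 for x in sample if x <= c) >= k]
    return min(candidates)


# Illustrative sample with a tie (3.2 occurs twice).
sample = [3.2, 1.5, 3.2, 0.7, 2.9]
sorted_sample = order_statistics(sample)
by_counting = [
    order_statistic_by_counting(sample, k) for k in range(1, len(sample) + 1)
]
```

Both approaches return the same ordered values here. Sorting is the practical choice; the counting version only mirrors the formal definition and takes quadratic time per call.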

Properties

If the definition requires

$$P(X_i = X_j) = 0 \quad \text{for all } i \neq j,$$

then

$$X_{(1)} < X_{(2)} < \dots < X_{(n)}$$

holds almost surely. The same applies to the realizations:

$$X_{(1)}(\omega) < X_{(2)}(\omega) < \dots < X_{(n)}(\omega)$$

for almost all outcomes $\omega$.

The realizations of the order statistics are thus (almost surely) strictly increasing.

If the requirement that the random variables almost surely take distinct values is dropped, then correspondingly

$$X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}$$

holds almost surely.

The realizations are then only (almost surely) non-decreasing.

Distribution of the order statistics

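If $X_1, \dots, X_n$ are independent and identically distributed with distribution function $F$, the distribution function of the $k$-th order statistic is

$$F_{X_{(k)}}(x) = \sum_{j=k}^{n} \binom{n}{j} F(x)^{j} \, (1 - F(x))^{n-j}.$$

Important special cases are the minimum ($k = 1$) and the maximum ($k = n$):

$$F_{X_{(1)}}(x) = 1 - (1 - F(x))^{n}, \qquad F_{X_{(n)}}(x) = F(x)^{n}.$$

If the distribution of the $X_i$ has a density function $f$, differentiation yields the density function

$$f_{X_{(k)}}(x) = k \binom{n}{k} F(x)^{k-1} \, (1 - F(x))^{n-k} \, f(x)$$

of the $k$-th order statistic.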
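As a minimal sketch, assuming independent and identically distributed variables with common distribution function F and density f, the standard binomial-sum form of the distribution function of the k-th order statistic and its derivative can be evaluated numerically. The function names and the test point are illustrative choices; the functions take the values F(x) and f(x) as inputs rather than a particular distribution.

```python
from math import comb


def order_stat_cdf(k, n, F_x):
    """P(X_(k) <= x) for n i.i.d. variables, given F_x = F(x).

    At least k of the n variables must be <= x, which is a binomial tail sum.
    """
    return sum(comb(n, j) * F_x**j * (1 - F_x) ** (n - j) for j in range(k, n + 1))


def order_stat_pdf(k, n, F_x, f_x):
    """Density of X_(k) at x, given F_x = F(x) and f_x = f(x)."""
    return k * comb(n, k) * F_x ** (k - 1) * (1 - F_x) ** (n - k) * f_x


# Illustrative check at a point where F(x) = 0.3, with n = 5 variables.
n = 5
F_x = 0.3
cdf_min = order_stat_cdf(1, n, F_x)  # should equal 1 - (1 - F(x))^n
cdf_max = order_stat_cdf(n, n, F_x)  # should equal F(x)^n
```

For k = 1 and k = n the sum collapses to the closed forms for the minimum and the maximum, which gives a quick sanity check on any implementation.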

Application

In nonparametric statistics, rank statistics and empirical distribution functions can be expressed in terms of order statistics. In addition, weakly consistent estimators for quantiles can be derived from order statistics. Furthermore, the distributions of important statistics such as the median or the range can be obtained from the distribution given above by means of convolutions and transformation theorems.
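The link between order statistics, the empirical distribution function, and quantile estimation can be illustrated with a short sketch. The quantile rule used here, taking the order statistic with index ceil(n·p), is only one of several common conventions and is an assumption of this example; the function names and sample values are likewise illustrative.

```python
import math


def empirical_cdf(sample, x):
    """Fraction of observations <= x, i.e. the empirical distribution function."""
    return sum(1 for v in sample if v <= x) / len(sample)


def sample_quantile(sample, p):
    """Estimate the p-quantile by the order statistic x_(ceil(n*p)), 0 < p <= 1.

    This is one convention among several; others interpolate between
    neighbouring order statistics.
    """
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    ordered = sorted(sample)
    k = math.ceil(len(ordered) * p)  # 1-based index of the order statistic
    return ordered[k - 1]


sample = [4.1, 0.5, 2.2, 3.3, 1.0]
median_estimate = sample_quantile(sample, 0.5)
ecdf_at_median = empirical_cdf(sample, median_estimate)
```

With an odd sample size this rule returns the middle order statistic as the median estimate, and the empirical distribution function jumps by 1/n at each order statistic.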

Example

Probability densities of ranks 10 (gold), 9 (silver) and 8 (bronze)

The final of an athletics competition is held among the $n = 10$ best participants. In this example it is assumed that the field in the final is very evenly matched, so that there are no favorites for the medals. The random total score of each athlete is assumed to follow the same continuous uniform distribution on a point range from $a$ to $b$. All athletes thus have the same performance potential, and only form on the day, which is subject to strong fluctuations, determines the total score. Substituting the density function and the distribution function

$$f(x) = \frac{1}{b-a}, \qquad F(x) = \frac{x-a}{b-a}, \qquad a \leq x \leq b,$$

of the continuous uniform distribution into the density function of the order statistics given above yields the distributions for the individual ranks. Since the scores in the order statistics are sorted in ascending order, $k = 10$ gives the probability distribution for the gold medal, $k = 9$ for the silver medal, and $k = 8$ for the bronze medal. The figure shows that a higher score can be expected for the gold medal than for the silver or bronze medal. Since the scores in this example were modeled as continuous uniform, the $k$-th order statistic for $n = 10$ (see figure), after subtracting $a$ and dividing by $b - a$, is beta distributed with parameters $k$ and $n - k + 1$. The expected value of such a beta distribution is $\frac{k}{n+1}$. The expected score is therefore $a + \frac{10}{11}(b-a)$ for gold, $a + \frac{9}{11}(b-a)$ for silver, and $a + \frac{8}{11}(b-a)$ for bronze. If an athlete has already received $x$ points and is waiting for the scores of the other athletes, he can calculate his own chances of gold under the assumptions made. The probability that the other nine athletes all do worse is $F(x)^{9} = \left(\frac{x-a}{b-a}\right)^{9}$. If the athlete receives exactly the score expected for the gold medal, the probability that he actually wins gold is still only $\left(\frac{10}{11}\right)^{9} \approx 42\,\%$.
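The arithmetic of this example can be reproduced with a short sketch. It assumes ten finalists (as implied by ranks 10, 9 and 8 in the figure caption) and works on the normalized point range from 0 to 1, since the conclusions do not depend on the actual endpoints of the range; a score on the original scale is obtained by scaling the fraction to that range. The variable and function names are illustrative.

```python
from fractions import Fraction

n = 10  # number of finalists, inferred from the ranks 10, 9 and 8
medal_ranks = {"gold": 10, "silver": 9, "bronze": 8}

# Expected position of the k-th order statistic within the point range:
# a Beta(k, n - k + 1) distribution has mean k / (n + 1).
expected_fraction = {
    medal: Fraction(k, n + 1) for medal, k in medal_ranks.items()
}


def prob_all_others_worse(u, n):
    """Probability that the other n - 1 uniform scores all fall below u.

    u is the athlete's own score as a fraction of the point range (0 to 1).
    """
    return u ** (n - 1)


# Chance of gold for an athlete who scores exactly the expected gold score.
p_gold_at_expected = float(prob_all_others_worse(expected_fraction["gold"], n))
```

Even at the expected gold score, which lies 10/11 of the way through the range, the chance of actually winning is (10/11)^9, well below one half.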

Literature

  • Herbert Büning, Götz Trenkler: Nonparametric statistical methods. 2nd edition. de Gruyter, Berlin and New York 1994, ISBN 3-11-016351-9.

References

  1. Claudia Czado, Thorsten Schmidt: Mathematical Statistics. Springer-Verlag, Berlin Heidelberg 2011, ISBN 978-3-642-17260-1, p. 23, doi:10.1007/978-3-642-17261-8.
  2. Hans-Otto Georgii: Stochastics. Introduction to probability theory and statistics. 4th edition. Walter de Gruyter, Berlin 2009, ISBN 978-3-11-021526-7, pp. 242-243, doi:10.1515/9783110215274.
  3. Norbert Henze: Stochastics for beginners. An introduction to the fascinating world of chance. 10th edition. Springer Spectrum, Wiesbaden 2013, ISBN 978-3-658-03076-6, p. 323, doi:10.1007/978-3-658-03077-3.
  4. David Meintrup, Stefan Schäffler: Stochastics. Theory and applications. Springer-Verlag, Berlin Heidelberg New York 2005, ISBN 978-3-540-21676-6, p. 290, doi:10.1007/b137972.