Empirical quantile

An empirical $p$-quantile, also simply called a quantile for short, is a summary statistic of a sample in statistics. For every number $p$ between 0 and 1, in simplified terms, an empirical $p$-quantile divides the sample in such a way that a proportion $p$ of the sample is smaller than the empirical $p$-quantile and a proportion $1-p$ of the sample is greater than the empirical $p$-quantile. For example, if a sample of shoe sizes is given, the empirical 0.35-quantile is the shoe size $s$ such that 35% of the shoe sizes in the sample are smaller than $s$ and 65% are larger than $s$.

Some empirical $p$-quantiles have proper names. They include the median ($p = 0.5$), the upper quartile, and the lower quartile, as well as the terciles, quintiles, deciles, and percentiles.

The quantiles in the sense of probability theory must be distinguished from the empirical quantiles discussed here. The former are characteristics of a probability distribution and thus of an abstract (set) function, similar to the expected value, while the empirical quantiles are characteristics of a sample, similar to the arithmetic mean.

Definition

Let $\lfloor x \rfloor$ denote the floor function. It rounds each number $x$ down to the largest integer that does not exceed $x$. For example, $\lfloor 1.2 \rfloor = 1$ and $\lfloor 3.99 \rfloor = 3$.

Let a sample $\left(x_1, x_2, \dotsc, x_n\right)$ of size $n$ be given whose elements are sorted in ascending order, that is,

$$x_1 \leq x_2 \leq \dotsb \leq x_n.$$

Then, for a number $p \in (0,1)$,

$$x_p = \begin{cases} \tfrac{1}{2}\left(x_{n \cdot p} + x_{n \cdot p + 1}\right), & \text{if } n \cdot p \text{ is an integer,} \\ x_{\lfloor n \cdot p + 1 \rfloor}, & \text{if } n \cdot p \text{ is not an integer,} \end{cases}$$

is called the empirical $p$-quantile of $x_1, x_2, \dotsc, x_n$.

There are some definitions that differ from the definition given here.
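The case distinction in the definition can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `empirical_quantile` is hypothetical, and converting $p$ to an exact fraction is an assumption made here so that the test "is $n \cdot p$ an integer" is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def empirical_quantile(sample, p):
    """Empirical p-quantile according to the two-case definition above.

    `empirical_quantile` is a hypothetical helper name used for illustration.
    """
    # Assumption: p is converted to an exact fraction (0.25 -> 1/4) so that
    # the integer test on n * p is exact.
    p = Fraction(str(p))
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(sample)          # x_1 <= x_2 <= ... <= x_n
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    k = n * p
    if k.denominator == 1:
        # n * p is an integer: average x_{np} and x_{np+1}
        # (1-based indices np and np+1 are 0-based indices np-1 and np).
        i = int(k)
        return (xs[i - 1] + xs[i]) / 2
    # n * p is not an integer: take x_{floor(np + 1)},
    # whose 0-based index is floor(np).
    return xs[floor(k)]


sample = [82, 91, 12, 92, 63, 9, 28, 55, 96, 97]
print(empirical_quantile(sample, 0.5))   # 72.5
print(empirical_quantile(sample, 0.25))  # 28
print(empirical_quantile(sample, 0.75))  # 92
```

Library routines for quantiles often follow one of the differing definitions mentioned above, so their results need not agree with this sketch.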

Example

The following sample consists of ten random integers, drawn from the numbers between zero and one hundred according to the discrete uniform distribution:

$$82;\ 91;\ 12;\ 92;\ 63;\ 9;\ 28;\ 55;\ 96;\ 97$$

Sorting yields the sample

$$x_1 = 9;\ x_2 = 12;\ x_3 = 28;\ x_4 = 55;\ x_5 = 63;\ x_6 = 82;\ x_7 = 91;\ x_8 = 92;\ x_9 = 96;\ x_{10} = 97.$$

The sample size is $n = 10$.

For $p = 0.5$ one obtains $p \cdot n = 5$. Since this is an integer, the definition gives

$$x_{0.5} = \tfrac{1}{2}\left(x_5 + x_{5+1}\right) = \tfrac{1}{2}(63 + 82) = 72.5.$$

For $p = 0.25$ the product $p \cdot n = 2.5$ is not an integer, and one obtains $p \cdot n + 1 = 0.25 \cdot 10 + 1 = 2.5 + 1 = 3.5$. The floor function then yields $\lfloor 3.5 \rfloor = 3$ and thus

$$x_{0.25} = x_3 = 28.$$

Analogously, for $p = 0.75$ one obtains directly $p \cdot n + 1 = 0.75 \cdot 10 + 1 = 8.5$ and thus $\lfloor 8.5 \rfloor = 8$, so that

$$x_{0.75} = x_8 = 92.$$

In contrast to the arithmetic mean, the empirical quantile is robust against outliers. This means that if values of a sample above (or below) a certain quantile are replaced with other values above (or below) that quantile, the quantile itself does not change. The reason is that quantiles are determined only by the order of the sample values, that is, by their position relative to one another, and not by the specific numerical values far from the quantile. For the sample above, the arithmetic mean is $\overline{x} = 62.5$. If the largest value of the sample is now modified, for example to

$$x_{10} = 1000,$$

then $\overline{x} = 152.8$, whereas the median and the lower and upper quartiles remain unchanged because the order of the sample has not changed.
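The effect of the outlier can be checked numerically. The following sketch uses `mean` and `median` from Python's standard `statistics` module. For this sample `statistics.median` gives the same value as the $0.5$-quantile defined above. The quartiles are left out here because library quantile routines may follow a different definition.

```python
from statistics import mean, median

sample = [82, 91, 12, 92, 63, 9, 28, 55, 96, 97]

# Replace the largest value (97) by the outlier 1000.
modified = sorted(sample)
modified[-1] = 1000

print(mean(sample), median(sample))      # 62.5 72.5
print(mean(modified), median(modified))  # 152.8 72.5
```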

Special quantiles

For certain values of $p$, the corresponding quantiles have proper names. They are briefly presented below. Note that the corresponding quantiles of probability distributions are sometimes referred to by the same names.

Median

The median is the $0.5$-quantile and thus divides the sample into two halves: one half is smaller than the median, the other half is larger than the median. Together with the mode and the arithmetic mean, it is an important measure of location in descriptive statistics.

Terciles

The two $p$-quantiles for $p = \tfrac{1}{3}$ and $p = \tfrac{2}{3}$ are called terciles. They divide the sample into three equal parts: one part is smaller than the lower tercile (the $\tfrac{1}{3}$-quantile), one part is larger than the upper tercile (the $\tfrac{2}{3}$-quantile), and one part lies between the two terciles.

Quartiles

The two quantiles with $p = 0.25$ and $p = 0.75$ are called quartiles. The $0.25$-quantile is called the lower quartile and the $0.75$-quantile is called the upper quartile. Half of the sample lies between the lower and upper quartile, a quarter of the sample lies below the lower quartile, and a quarter lies above the upper quartile. The interquartile range, a measure of dispersion, is defined on the basis of the quartiles.

Quintiles

The four quantiles for $p = 0.2;\ 0.4;\ 0.6;\ 0.8$ are referred to as quintiles. Accordingly, 20% of the sample is below the first quintile and 80% above, 40% of the sample is below the second quintile and 60% above, etc.

Deciles

The quantiles for multiples of $0.1$, i.e. for $p = 0.1;\ 0.2;\ \dotsc;\ 0.9$, are called deciles. The $0.1$-quantile is called the first decile, the $0.2$-quantile the second decile, etc. 10% of the sample is below the first decile and 90% of the sample is above. Likewise, 40% of the sample is below the fourth decile and 60% above.

Percentiles

The percentiles are the quantiles from $0.01$ to $0.99$ in steps of $0.01$.
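The named quantiles all follow the same pattern: the sample is cut into $q$ equal parts at the orders $p = k/q$ for $k = 1, \dotsc, q-1$. A small sketch of this pattern, with the hypothetical helper name `cut_points` and exact fractions so that $\tfrac{1}{3}$ is not rounded:

```python
from fractions import Fraction


def cut_points(q):
    """Return the q - 1 orders p = k/q that cut a sample into q equal parts.

    `cut_points` is a hypothetical helper name used for illustration.
    """
    if q < 2:
        raise ValueError("q must be at least 2")
    return [Fraction(k, q) for k in range(1, q)]


print(cut_points(3))         # terciles: 1/3 and 2/3
print(cut_points(5))         # quintiles: 1/5, 2/5, 3/5, 4/5
print(len(cut_points(100)))  # 99 percentiles
```

For $q = 4$ the list also contains $\tfrac{1}{2}$, which is the median. The text above reserves the name quartile for $p = 0.25$ and $p = 0.75$.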

Derived terms

Certain measures of dispersion can also be derived from the quantiles. The most important is the interquartile range

$$\mathrm{IQR} := x_{0.75} - x_{0.25}.$$

It indicates how far apart the upper and lower quartiles are and thus also how wide the range is in which the middle 50% of the sample lie. Somewhat more generally, the (inter)quantile range can be defined as $x_{1-p} - x_p$ for $p \in (0, 0.5)$. It indicates how wide the range is in which the middle $100 \cdot (1 - 2p)\,\%$ of the sample lie. For $p = 0.25$ it corresponds to the interquartile range.
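As a worked illustration, assume the ten-value sample from the example section and the definition given above. The quartiles computed there give

$$\mathrm{IQR} = x_{0.75} - x_{0.25} = 92 - 28 = 64.$$

For $p = 0.1$ the products $n \cdot p = 1$ and $n \cdot (1-p) = 9$ are integers, so the averaging case of the definition applies to both quantiles:

$$x_{0.9} - x_{0.1} = \tfrac{1}{2}\left(x_9 + x_{10}\right) - \tfrac{1}{2}\left(x_1 + x_2\right) = 96.5 - 10.5 = 86.$$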

Another derived measure of dispersion is the median absolute deviation. If a sample $x_1, x_2, \dotsc, x_n$ with median $x_{0.5}$ is given, the median absolute deviation is the empirical median of the modified sample $|x_1 - x_{0.5}|, |x_2 - x_{0.5}|, \dotsc, |x_n - x_{0.5}|$.
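The two-step construction (median of the sample, then median of the absolute deviations from it) can be sketched as follows. The function name `median_absolute_deviation` is hypothetical. `statistics.median` from the Python standard library is used because it agrees with the empirical $0.5$-quantile as defined above: it returns the middle value for odd $n$ and the mean of the two middle values for even $n$.

```python
from statistics import median


def median_absolute_deviation(sample):
    """Median of the absolute deviations from the sample median.

    Hypothetical helper name; no scaling factor is applied.
    """
    m = median(sample)
    return median(abs(x - m) for x in sample)


sample = [82, 91, 12, 92, 63, 9, 28, 55, 96, 97]
print(median_absolute_deviation(sample))  # 21.5
```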

Presentation

Box plot of a sample

One way to represent quantiles is the box plot. The entire sample is represented by a box with two whiskers. The outer boundaries of the box are the lower and upper quartiles, so that half of the sample lies within the box. The box itself is subdivided again by a dividing line at the median of the sample. The whiskers are not defined uniformly. One possibility is to let them end at the first and the ninth decile.
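The five values needed to draw such a box plot can be collected as in the following sketch. It assumes the whisker convention mentioned last (first and ninth decile) and re-implements the two-case quantile definition from this article in compact form. The names `quantile` and `box_plot_summary` are hypothetical, and no plotting library is involved.

```python
from fractions import Fraction
from math import floor


def quantile(sample, p):
    """Empirical p-quantile following the two-case definition of this article."""
    xs = sorted(sample)
    k = len(xs) * Fraction(str(p))  # exact n * p
    if k.denominator == 1:          # n * p is an integer: average two neighbours
        return (xs[int(k) - 1] + xs[int(k)]) / 2
    return xs[floor(k)]             # otherwise x_{floor(np + 1)}, 0-based floor(np)


def box_plot_summary(sample):
    """Whisker ends, box edges and median, assuming decile whiskers."""
    return {
        "lower_whisker": quantile(sample, 0.1),
        "lower_quartile": quantile(sample, 0.25),
        "median": quantile(sample, 0.5),
        "upper_quartile": quantile(sample, 0.75),
        "upper_whisker": quantile(sample, 0.9),
    }


print(box_plot_summary([82, 91, 12, 92, 63, 9, 28, 55, 96, 97]))
```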

References

[1] Norbert Henze: Stochastics for beginners. An introduction to the fascinating world of chance. 10th edition. Springer Spectrum, Wiesbaden 2013, ISBN 978-3-658-03076-6, p. 30, doi:10.1007/978-3-658-03077-3.
[2] Eric W. Weisstein: Quantile. In: MathWorld (English).
[3] Eric W. Weisstein: Interquartile Range. In: MathWorld (English).
[4] Norbert Henze: Stochastics for beginners. An introduction to the fascinating world of chance. 10th edition. Springer Spectrum, Wiesbaden 2013, ISBN 978-3-658-03076-6, p. 32, doi:10.1007/978-3-658-03077-3.
