Quantile (probability theory)

Two examples: the standard normal distribution and a chi-square distribution with three degrees of freedom (a skewed distribution). Quantiles are shown for selected probabilities; the area under the density curve from minus infinity to the quantile equals the respective probability.

A quantile is a measure of position in statistics. Intuitively, a quantile is a threshold value: a certain proportion of the values is smaller than the quantile, the rest is larger. For example, the 25% quantile is the value below which 25% of all values lie. Quantiles formalize practical statements such as "25% of all women are shorter than 1.62 m", where 1.62 m is the 25% quantile.

A well-known illustration of a quantile function from econometrics is Pen's Parade (the parade of incomes), devised by the economist Jan Pen to depict the income distribution.

More precisely, the $p$-quantile, where $p$ is a real number between 0 and 1, is a value of a variable or random variable that divides the set of all characteristic values (casually "the distribution") into two sections: to the left of the $p$-quantile lies the proportion $p$ of all observed values, of the total number of random values, or of the area under the density curve; to the right lies the remaining proportion $1 - p$. The number $p$ is also called the undershoot proportion.

Special quantiles are the median, quartiles, quintiles, deciles and percentiles.

In statistics, the quantile of order $p$, or $p$-quantile $x_p$ (formerly also called "fractile"), is a characteristic value below which a specified proportion $p$ of all cases of the distribution lies. Below any value smaller than $x_p$ lies less than this specified proportion. The undershoot proportion $p$ can be any real number between 0 (no case of the distribution) and 1 (all cases, or 100% of the distribution).

Definition

For probability distributions

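Let $P$ be a probability distribution on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, i.e. on the real numbers equipped with the Borel σ-algebra.

Then a real number $x_p$ is called a $p$-quantile (of $P$) if

$$P((-\infty, x_p]) \ge p \quad\text{and}\quad P([x_p, \infty)) \ge 1 - p.$$

In particular, more than one $p$-quantile can exist.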

For random variables

Let $X$ be a real random variable.

Then a real number $x_p$ is called a $p$-quantile (of $X$) if

$$P(X \le x_p) \ge p \quad\text{and}\quad P(X \ge x_p) \ge 1 - p.$$

Thus the $p$-quantile of the random variable $X$ is exactly the $p$-quantile of its distribution $P_X$.

Definition via distribution functions

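Quantiles can also be defined using distribution functions. If $F$ is the distribution function of $X$ or of $P$, then $x_p$ is called a $p$-quantile if

$$F(x_p) \ge p \quad\text{and}\quad F(x_p-) \le p.$$

Here $F(x-) = \lim_{t \uparrow x} F(t)$ denotes the left-hand limit of $F$ at $x$.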
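The two conditions $F(x) \ge p$ and $F(x-) \le p$ can be checked directly for a distribution on finitely many points. The following is a minimal sketch; the fair six-sided die and the helper names `cdf`, `cdf_left` and `is_quantile` are illustrative assumptions, not part of the definition.

```python
from fractions import Fraction

# Assumed example distribution: a fair six-sided die.
PMF = {k: Fraction(1, 6) for k in range(1, 7)}


def cdf(x):
    """F(x) = P(X <= x)."""
    return sum(prob for value, prob in PMF.items() if value <= x)


def cdf_left(x):
    """F(x-) = P(X < x), the left-hand limit of F at x."""
    return sum(prob for value, prob in PMF.items() if value < x)


def is_quantile(x, p):
    """True if x satisfies F(x) >= p and F(x-) <= p."""
    return cdf(x) >= p and cdf_left(x) <= p


half = Fraction(1, 2)
# Every point of the interval [3, 4] is a 1/2-quantile of the die.
print([x for x in (2, 3, 3.5, 4, 5) if is_quantile(x, half)])
```

Exact fractions are used so that the comparisons with $p$ are not affected by floating-point rounding.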

Determination and examples

With continuous distribution functions

If the distribution function $F$ of the random variable or of the probability distribution is continuous, i.e. the distribution is a continuous probability distribution, the definition simplifies. The $p$-quantile is then a solution $x_p$ of the equation

$$F(x_p) = p.$$

This follows from the definition of the quantile via the distribution function, since by continuity the left-hand limit $F(x_p-)$ in that definition coincides with the function value $F(x_p)$.

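Example

Consider the exponential distribution with parameter $\lambda > 0$. It has the distribution function

$$F(x) = \begin{cases} 1 - e^{-\lambda x} & \text{if } x \ge 0, \\ 0 & \text{if } x < 0. \end{cases}$$

Solving the equation

$$1 - e^{-\lambda x_p} = p$$

for $x_p$, with $p \in (0, 1)$, yields the $p$-quantile

$$x_p = -\frac{\ln(1 - p)}{\lambda}.$$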
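As a quick numerical check of the exponential example, the closed form $x_p = -\ln(1-p)/\lambda$ can be evaluated and inserted back into the distribution function. This is only a sketch; the function names `exp_cdf` and `exp_quantile` and the value $\lambda = 2$ are assumptions chosen for illustration.

```python
import math


def exp_cdf(x, lam):
    """Distribution function of the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_quantile(p, lam):
    """p-quantile x_p = -ln(1 - p) / lam, valid for 0 < p < 1 and lam > 0."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    # log1p(-p) computes ln(1 - p) accurately even for small p.
    return -math.log1p(-p) / lam


lam = 2.0  # assumed rate parameter
median = exp_quantile(0.5, lam)  # equals ln(2) / lam
print(median, exp_cdf(median, lam))
```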

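If the distribution function is constant over an interval, the corresponding $p$-quantile is not unique. Consider a continuous distribution function $F$ that takes the value $\tfrac{1}{2}$ on an entire interval. The equation

$$F(x) = \tfrac{1}{2}$$

then has infinitely many solutions. Each $x$ from that interval is a $\tfrac{1}{2}$-quantile (i.e. a median).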

If there is a probability density function

If the random variable $X$ or the probability distribution $P$ has a probability density function $f$ (so that the distribution is absolutely continuous), the $p$-quantile $x_p$ is a solution of the equation

$$\int_{-\infty}^{x_p} f(x)\,\mathrm{d}x = p.$$

This follows directly from the fact that absolutely continuous distributions always have a continuous distribution function, which is given by the integral of the density, together with the statement in the section above.
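When the integral of the density has no convenient closed form, the equation can be solved numerically. The sketch below combines a midpoint-rule integral with bisection. The example density $f(x) = 2x$ on $[0, 1]$ (for which $x_p = \sqrt{p}$) and all function names are assumptions made for illustration, and the code assumes that the support of the density is contained in `[lower, upper]`.

```python
def density(x):
    """Assumed example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def cdf_from_density(x, lower=0.0, steps=2000):
    """Approximate the integral of the density from `lower` to x (midpoint rule)."""
    if x <= lower:
        return 0.0
    width = (x - lower) / steps
    return sum(density(lower + (i + 0.5) * width) for i in range(steps)) * width


def quantile_from_density(p, lower=0.0, upper=1.0, tol=1e-9):
    """Solve integral_{lower}^{x} f(t) dt = p for x by bisection on [lower, upper]."""
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf_from_density(mid, lower) < p:
            lo = mid  # too little probability mass to the left of mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(quantile_from_density(0.25))  # close to sqrt(0.25) = 0.5
```

Bisection is used because the distribution function is monotonically non-decreasing, so no derivative information is needed.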

Example

For distributions with a probability density function, ambiguous quantiles occur when the density is constantly zero over an interval. The distribution considered above, whose distribution function is constant on an interval, has a probability density function that vanishes on that interval. The ambiguous median derived above is caused by this interval, over which the probability density function is constantly equal to zero.

Ambiguity and unambiguous definition

Quantiles for selected probabilities
The quantile function

If $F$ is invertible, for example for continuous distributions with a strictly monotonic distribution function, the smallest and the largest $p$-quantile coincide, so that the set of $p$-quantiles consists of a single element and the quantile is unique.

The function $F^{-1}$, defined by $F^{-1}(p) = \inf\{x \in \mathbb{R} : F(x) \ge p\}$, is called the quantile function or generalized inverse distribution function. Its value $F^{-1}(p)$, sometimes also written $x_p$, is accordingly called the $p$-quantile of $X$ or of $P$ (when it is clear which random variable is meant, this is often omitted).
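For a distribution on finitely many points, the generalized inverse can be computed by accumulating probabilities until the level $p$ is reached. The sketch below assumes the usual convention $F^{-1}(p) = \inf\{x : F(x) \ge p\}$, i.e. the smallest possible $p$-quantile; the name `quantile_function` and the fair-die example are illustrative assumptions.

```python
from fractions import Fraction


def quantile_function(pmf, p):
    """Smallest x with F(x) >= p for a distribution given as {value: probability}."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    cumulative = 0
    for value in sorted(pmf):
        cumulative += pmf[value]  # cumulative now equals F(value)
        if cumulative >= p:
            return value
    raise ValueError("probabilities do not sum to at least p")


# Assumed example: a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(quantile_function(die, Fraction(1, 2)))   # smallest median of the die
print(quantile_function(die, Fraction(3, 10)))  # 0.3-quantile
```

With exact fractions, the comparison `cumulative >= p` is not disturbed by rounding when $p$ coincides with a value of $F$.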

In the figures on the right, several of the marked quantiles are unique.

If $F$ has a jump at a point $x$, i.e. $F(x-) < F(x)$, then $F^{-1}(p) = x$ for all $p$ with $F(x-) < p \le F(x)$.


If $F$ cannot be inverted at some $p$, i.e. $F$ is constant and equal to $p$ on an interval, the quantile function has a jump at this $p$, and its value there is the smallest possible $p$-quantile. In the figure,

  • the left endpoint of this interval is the smallest possible $p$-quantile,
  • the right endpoint is the largest possible $p$-quantile, and
  • each point in between is a further $p$-quantile.

The frequently used 50% quantile even has its own terminology to distinguish these cases: the lower median is the smallest possible 50% quantile, the median is the middle 50% quantile and the upper median is the largest possible 50% quantile. All three can differ from one another.
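The three medians can be computed for a distribution on finitely many points as follows. This is a sketch under the assumption that the "middle" 50% quantile is the midpoint of the lower and upper median; the name `median_interval` and the example distributions are hypothetical.

```python
from fractions import Fraction


def median_interval(pmf):
    """Return (lower median, upper median) for a distribution {value: probability}."""
    half = Fraction(1, 2)
    cumulative = 0
    lower = upper = None
    for value in sorted(pmf):
        cumulative += pmf[value]  # cumulative now equals F(value)
        if lower is None and cumulative >= half:
            lower = value  # smallest x with F(x) >= 1/2
        if cumulative > half:
            upper = value  # smallest x with F(x) > 1/2, the largest median
            break
    return lower, upper


# Assumed examples: a fair die (ambiguous median) and a three-point distribution.
die = {k: Fraction(1, 6) for k in range(1, 7)}
lower, upper = median_interval(die)
middle = Fraction(lower + upper, 2)
print(lower, middle, upper)

three_points = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
print(median_interval(three_points))  # lower and upper median coincide
```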

Example

The quantile $x_{0.3}$ (i.e. the 0.3-quantile) is the point of a distribution below which 30% of all cases of the distribution lie.

A quantile with undershoot proportion $p$

Special quantiles

For some specific values of $p$, the $p$-quantiles have additional names.

Median

The median or central value is the quantile $x_{0.5}$ (0.5-quantile). It divides all cases of the distribution into two parts of equal size. For every division into an odd number of quantiles with equidistantly spaced $p$ (which yields an even number of parts of equal size), the median is the middle quantile (for example the 2nd quartile Q2 or the 50th percentile P50).

Tercile

Terciles divide the set of values, ordered by magnitude, into three equal parts: the lower, middle and upper third.

Quartile

Representation of the interquartile range of a normal distribution

Quartiles (Latin for "quarter values") are the quantiles $x_{0.25}$ (0.25-quantile), $x_{0.5}$ (0.5-quantile, the median) and $x_{0.75}$ (0.75-quantile), which are also called Q1 ("lower quartile"), Q2 ("middle quartile") and Q3 ("upper quartile"). They are among the most frequently used quantiles in statistics.

The interquartile range (also called the interquartile distance) is the difference between the upper and the lower quartile, $x_{0.75} - x_{0.25}$; the interval between the two quartiles therefore contains 50% of the distribution. The interquartile range is used as a measure of dispersion.

See also: Variation (statistics)
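For the standard normal distribution, the quartiles and the interquartile range can be computed with the quantile function from Python's standard library. This is a small sketch; the choice of the standard normal distribution follows the figure caption above, and the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

q1 = std_normal.inv_cdf(0.25)  # lower quartile x_0.25
q2 = std_normal.inv_cdf(0.50)  # median x_0.5
q3 = std_normal.inv_cdf(0.75)  # upper quartile x_0.75
iqr = q3 - q1                  # interquartile range, about 1.349

# The interval [q1, q3] carries half of the probability mass.
mass_between = std_normal.cdf(q3) - std_normal.cdf(q1)
print(q1, q2, q3, iqr, mass_between)
```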

Quintile

Quintiles (Latin for "fifth values") divide the set of values of the distribution into 5 equal parts. Below the first quintile, i.e. the quantile $x_{0.2}$, lie 20% of the values of the distribution; below the second quintile (quantile $x_{0.4}$) lie 40%, and so on.

Decile

Deciles (Latin for "tenth values") divide the set of values of the distribution, sorted by size, into 10 parts of equal size. Accordingly, for example, 30% of the values lie below the third decile (quantile $x_{0.3}$). The 1st decile (10% quantile) is the value that separates the lower 10% from the upper 90% of the data values, the 2nd decile separates the lower 20% from the upper 80% of the values, and so on. The distance between the 1st decile and the 9th decile (the 10% and 90% quantiles) is called the interdecile range.

Percentile

Percentiles (Latin for "hundredth values"), also called percentile ranks, divide the distribution into 100 parts of equal size, i.e. into 1% segments. Percentiles can therefore be thought of as quantiles $x_p$ for which $100 \cdot p$ is an integer. The quantile $x_{0.97}$ corresponds to the percentile P97: 97% of all cases of the distribution lie below this point.
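Deciles and percentiles are simply quantiles at equally spaced levels, as the following sketch for the standard normal distribution illustrates. The distribution and the helper name `normal_quantile` are assumptions chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution


def normal_quantile(p):
    """p-quantile x_p of the standard normal distribution, for 0 < p < 1."""
    return std_normal.inv_cdf(p)


# Deciles: the quantiles x_0.1, x_0.2, ..., x_0.9.
deciles = [normal_quantile(k / 10) for k in range(1, 10)]

# Interdecile range: distance between the 1st and the 9th decile.
interdecile_range = deciles[-1] - deciles[0]

# Percentile P97: the quantile x_0.97, about 1.881.
p97 = normal_quantile(0.97)

print(deciles, interdecile_range, p97)
```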

α-fractile

For $\alpha$ from $(0, 1)$, the $(1 - \alpha)$-quantile is also known as the $\alpha$-fractile. This division is used, for example, in the so-called "Pareto principle" conjecture.


Literature

  • Hans-Otto Georgii: Stochastics. Introduction to probability theory and statistics (= De Gruyter textbook). 2nd edition. de Gruyter, Berlin / New York 2004, ISBN 3-11-018282-3, p. 225 (definition: quantile, quartile, α-fractile).

References

  1. Hans-Otto Georgii: Stochastics. Introduction to probability theory and statistics. 4th edition. Walter de Gruyter, Berlin 2009, ISBN 978-3-11-021526-7, p. 233, doi:10.1515/9783110215274.