Measure of dispersion (statistics)
Measures of dispersion (Latin dispersio "scattering", from dispergere "to spread, to scatter"), also called scattering parameters, are various metrics in descriptive statistics that describe the spread of the values of a sample or of a frequency distribution around a suitable location parameter. The various calculation methods differ chiefly in how strongly they are influenced by, i.e. how sensitive they are to, outliers.
Requirements for a measure of dispersion
Let $x = (x_1, x_2, \dots, x_n)$ be a sample and $S$ a function. $S(x)$ is called a measure of dispersion if it generally meets the following requirements:
- $S(x)$ is a nonnegative real number that is zero when all observations are equal (there is no variability in the data) and grows as the data become more diverse. If at least two values differ from one another, then the data scatter among themselves or around a mean value, and this should be reflected in the measure of dispersion.
- Non-negativity is required because for dispersion the "extent", not the "direction", of deviation is what matters. A measure of dispersion should therefore be larger the further the observed values lie apart. Often the stricter requirement is imposed that a measure of dispersion must not decrease when an observed value is replaced by a more extreme one.
- $S$ is translation invariant, i.e. a shift of the zero point has no influence on the dispersion. The following must therefore apply: $S(x_1 + c, \dots, x_n + c) = S(x_1, \dots, x_n)$ for all real $c$.
- It is also desirable that the measure of dispersion behave predictably under changes of scale, i.e. rescaling all values by a factor $c$ rescales the measure by $|c|$: $S(c x_1, \dots, c x_n) = |c|\,S(x_1, \dots, x_n)$.
Measures
Around the arithmetic mean
Sum of squares of deviations
The most intuitive measure of dispersion is the sum of squared deviations. It equals $n$ times the empirical variance $\tilde{s}^2$:

- $SQ = \sum_{i=1}^{n} (x_i - \bar{x})^2 = n\,\tilde{s}^2$.
Empirical variance
One of the most important parameters of dispersion is the variance, which is defined in two slightly different variants. The origin of these differences and their use is explained in the main article Empirical variance. The versions are

- $\tilde{s}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$ respectively $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$.

In each case, $\bar{x}$ denotes the arithmetic mean of the sample.
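As a sketch, the two variants can be computed directly with Python's standard library; the sample values below are invented for illustration:

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
n = len(sample)
mean = sum(sample) / n  # arithmetic mean

# Variant with divisor n ...
var_n = sum((x - mean) ** 2 for x in sample) / n
# ... and variant with divisor n - 1 (the "corrected" sample variance).
var_n1 = sum((x - mean) ** 2 for x in sample) / (n - 1)
```

The standard library also exposes the two variants directly as `statistics.pvariance` and `statistics.variance`.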
Empirical standard deviation
The standard deviation is defined as the square root of the variance and therefore also comes in two versions:

- $\tilde{s} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ respectively $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$.
An essential difference from the empirical variance is that the empirical standard deviation has the same dimension, and thus the same units, as the sample.
Coefficient of variation
The empirical coefficient of variation is formed as the quotient of the empirical standard deviation $s$ and the arithmetic mean $\bar{x}$:

- $v = \frac{s}{\bar{x}}$ (for $\bar{x} > 0$).
It is dimensionless and therefore not subject to units.
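A minimal sketch in Python (the sample values are invented; the coefficient is only meaningful for a nonzero, usually positive, mean):

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mean = statistics.mean(sample)
# Quotient of the (n-1)-version standard deviation and the mean;
# dimensionless, since numerator and denominator carry the same units.
cv = statistics.stdev(sample) / mean
```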
Mean absolute deviation
The mean absolute deviation of a random variable $X$ from its expected value is defined by

- $d(X) = \mathbb{E}\bigl(|X - \mathbb{E}(X)|\bigr)$.

This makes it the first absolute centered moment of the random variable $X$. In the case of a concrete sample with arithmetic mean $\bar{x}$, it is calculated by

- $d = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$.
The mean absolute deviation is usually avoided in mathematical statistics in favor of the quadratic deviation, which is easier to handle analytically: the absolute value function used in the definition is not differentiable everywhere, which complicates the calculation of the minimum.
By the inequality between the arithmetic and the quadratic mean, the mean absolute deviation is less than or equal to the standard deviation (with equality only for constant random variables).
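The definition and this inequality against the standard deviation can be sketched as follows (sample values invented):

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mean = statistics.mean(sample)
# First absolute centered moment: average distance from the mean.
d = sum(abs(x - mean) for x in sample) / len(sample)
# The mean absolute deviation never exceeds the standard deviation
# (n-divisor version used here for consistency).
assert d <= statistics.pstdev(sample)
```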
For symmetric distributions, i.e. distributions with the property $F(x_{\mathrm{med}} - x) = 1 - F(x_{\mathrm{med}} + x)$ for all real $x$, whose density decreases monotonically for $x \ge x_{\mathrm{med}}$, the following holds:

- $d \le \frac{\sqrt{3}}{2}\,\sigma$.

Equality holds for the continuous uniform distribution.
Around the median
Quantile distance
The quantile distance is the difference between the $(1-p)$-quantile and the $p$-quantile:

- $QA_p = \tilde{x}_{1-p} - \tilde{x}_p$ with $0 < p < \tfrac{1}{2}$

$(1 - 2p) \cdot 100\,\%$ of all measured values lie within $QA_p$.
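As an illustration, the decile distance $QA_{0.1}$, which contains about 80 % of the values, can be computed with the standard library; the data and the interpolation method are choices made for this sketch:

```python
import statistics

sample = list(range(1, 21))  # 1, 2, ..., 20 (illustrative data)
# quantiles(..., n=10) returns the nine cut points x_0.1, ..., x_0.9;
# method="inclusive" interpolates as if the data were the full population.
deciles = statistics.quantiles(sample, n=10, method="inclusive")
qa_10 = deciles[-1] - deciles[0]  # x_0.9 - x_0.1
```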
Interquartile range
The interquartile range, abbreviated IQR, is calculated as the difference between the quartiles $\tilde{x}_{0.75}$ and $\tilde{x}_{0.25}$:

- $IQR = \tilde{x}_{0.75} - \tilde{x}_{0.25}$

50 % of all measured values lie within the IQR. Like the median, it is insensitive to outliers; it can be shown to have a breakdown point of 25 %.
The interquartile range is equal to the quantile distance $QA_{0.25}$.
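A short sketch of the quartiles and the IQR (data invented):

```python
import statistics

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
# quantiles(..., n=4) returns [Q1, Q2, Q3]; Q2 is the median.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1  # the middle 50 % of the data lie in this interval
```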
Mean absolute deviation from median
The mean absolute deviation from the median (English mean deviation from the median, abbreviated MD) is defined by

- $MD(X) = \mathbb{E}\bigl(|X - \operatorname{med}(X)|\bigr)$

In the case of a concrete sample with median $\tilde{x}$, it is calculated by

- $MD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \tilde{x}|$
Due to the extremal property of the median, in comparison with the mean absolute deviation from the mean the following always holds:

- $MD \le d \le \sigma$,

i.e. the mean absolute deviation with respect to the median is even smaller than the standard deviation.
For symmetric distributions the median and the expected value, and hence $MD$ and $d$, coincide.
The following applies to the normal distribution:

- $MD = \sqrt{\tfrac{2}{\pi}}\,\sigma \approx 0.80\,\sigma$
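The extremal property of the median stated above can be checked numerically; the skewed sample is invented:

```python
import statistics

sample = [1.0, 2.0, 4.0, 8.0, 16.0]  # hypothetical, skewed data
med = statistics.median(sample)
mean = statistics.mean(sample)
md_median = sum(abs(x - med) for x in sample) / len(sample)
d_mean = sum(abs(x - mean) for x in sample) / len(sample)
# The median minimizes the mean absolute deviation, so MD <= d.
assert md_median <= d_mean
```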
Median of the absolute deviations
The median absolute deviation (English median absolute deviation, also MedMed), abbreviated MAD, is defined by

- $MAD(X) = \operatorname{med}\bigl(|X - \operatorname{med}(X)|\bigr)$

In the case of a concrete sample with median $\tilde{x}$, it is calculated by

- $MAD = \operatorname{med}\bigl(|x_1 - \tilde{x}|, \dots, |x_n - \tilde{x}|\bigr)$
In the case of normally distributed data, the definition results in the following relationship to the standard deviation:

- $\sigma \approx 1.4826 \cdot MAD$

The factor is the reciprocal of $z_{0.75}$, the 0.75 quantile (75th percentile) of the standard normal distribution, which is approximately 0.6745.
The median absolute deviation is a robust estimator of the standard deviation. It can be shown to have a breakdown point of 50 %.
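A sketch of the sample MAD and of the scaled robust sigma estimate (data invented; the factor 1.4826 assumes approximately normal data):

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
med = statistics.median(sample)
# Median of the absolute deviations from the median.
mad = statistics.median(abs(x - med) for x in sample)
# Robust sigma estimate for roughly normal data: 1.4826 ~ 1 / 0.6745.
sigma_hat = 1.4826 * mad
```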
Further measures of dispersion
Range

The range (German Spannweite) is calculated as the difference between the largest and the smallest measured value:

- $R = x_{\max} - x_{\min}$
Since the range is calculated from only the two extreme values, it is not robust against outliers.
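In code the range is a one-liner (data invented):

```python
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
value_range = max(sample) - min(sample)  # largest minus smallest value
# A single outlier moves max() or min() directly, so the range is not robust.
```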
Geometric standard deviation
The geometric standard deviation is a measure of the dispersion around the geometric mean .
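Under the common definition of the geometric standard deviation as the exponential of the standard deviation of the logarithms, a sketch (positive-valued data invented; the geometric mean requires $x > 0$):

```python
import math
import statistics

sample = [1.0, 2.0, 4.0, 8.0, 16.0]  # hypothetical positive data
logs = [math.log(x) for x in sample]
geo_mean = math.exp(statistics.mean(logs))  # geometric mean
# Multiplicative spread factor around the geometric mean.
geo_sd = math.exp(statistics.pstdev(logs))
```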
Literature
- Günter Buttler, Norman Fickel (2002), “Introduction to Statistics”, Rowohlt Verlag
- Jürgen Bortz (2005), Statistics: For human and social scientists (6th edition), Springer Verlag, Berlin
- Bernd Rönz, Hans G. Strohe (1994), Lexicon Statistics , Gabler Verlag