Skewness (statistics)

Skewness (also skew) is a statistical measure that describes the type and strength of the asymmetry of a probability distribution. It indicates whether, and how strongly, the distribution leans to the right (right-steep, left-skewed, negative skewness) or to the left (left-steep, right-skewed, positive skewness). Any non-symmetric distribution is called skewed.

Definition

The skewness $\gamma_m$ of a random variable $X$ is the third central moment $\mu_3$ (provided the third moment exists), normalized by the third power of the standard deviation $\sigma$:

$$\gamma_m := \frac{\mu_3}{\sigma^3} = \operatorname{E}\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{\operatorname{E}\left[(X-\mu)^3\right]}{\sigma^3} = \frac{\operatorname{E}(X^3) - 3\operatorname{E}(X^2)\,\mu + 2\mu^3}{\sigma^3} = \frac{\operatorname{E}(X^3) - 3\mu\sigma^2 - \mu^3}{\sigma^3}$$

with the expected value $\mu = \operatorname{E}(X)$ and the variance $\sigma^2 = \operatorname{Var}(X)$. This representation is also called the moment coefficient of skewness. In terms of the cumulants $\kappa_i$, it reads

$$\gamma_m = \frac{\kappa_3}{\sqrt{\kappa_2^3}} = \frac{\kappa_3}{\operatorname{Var}(X)^{3/2}}.$$
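As an illustration of the definition, the following minimal sketch evaluates the moment coefficient for a discrete distribution in two of the equivalent forms given above. The function names and the choice of a Bernoulli distribution with $p = 0.2$ are assumptions made for this example only.

```python
def moment_skewness(values, probs):
    """Moment coefficient of skewness: third central moment / sigma^3."""
    mu = sum(p * x for x, p in zip(values, probs))               # E(X)
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))  # Var(X)
    mu3 = sum(p * (x - mu) ** 3 for x, p in zip(values, probs))  # E[(X - mu)^3]
    return mu3 / var ** 1.5


def raw_moment_skewness(values, probs):
    """Same quantity via (E(X^3) - 3*mu*sigma^2 - mu^3) / sigma^3."""
    mu = sum(p * x for x, p in zip(values, probs))
    ex2 = sum(p * x ** 2 for x, p in zip(values, probs))         # E(X^2)
    ex3 = sum(p * x ** 3 for x, p in zip(values, probs))         # E(X^3)
    var = ex2 - mu ** 2
    return (ex3 - 3 * mu * var - mu ** 3) / var ** 1.5


# Illustrative case: Bernoulli distribution on {0, 1} with p = 0.2
p = 0.2
g_central = moment_skewness([0, 1], [1 - p, p])
g_raw = raw_moment_skewness([0, 1], [1 - p, p])
print(g_central, g_raw)
```

Both forms should agree up to rounding. For the Bernoulli distribution the closed form is $(1-2p)/\sqrt{p(1-p)}$, which is positive for $p < 1/2$.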

The skewness can take on any real value.

With negative skewness, $\gamma_m < 0$, one speaks of a left-skewed or right-steep distribution; in typical cases it falls off more gently on the left side than on the right.

With positive skewness, $\gamma_m > 0$, one speaks of a right-skewed or left-steep distribution; conversely, it typically falls off more gently on the right side than on the left.

Typical examples of right-skewed distributions are the Bernoulli distribution for $p < 1/2$, the exponential distribution and the Pareto distribution for $k > 3$.

The skewness is invariant under linear transformations with $a > 0$:

$$\gamma_m(aX + b) = \gamma_m(X)$$
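The invariance can be checked numerically. The sketch below treats a small, hypothetical data set as a discrete distribution with equal weights; the helper name `skewness` and the values of `a` and `b` are assumptions for illustration. It also shows that a negative factor flips the sign, which follows because the third power is odd.

```python
def skewness(xs):
    """Moment coefficient of skewness of equally weighted values."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5


data = [1.0, 2.0, 2.0, 3.0, 10.0]            # hypothetical values
scaled = [3.0 * x + 7.0 for x in data]       # a = 3 > 0, b = 7
flipped = [-3.0 * x + 7.0 for x in data]     # a = -3 < 0

print(skewness(data), skewness(scaled), skewness(flipped))
```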

For the sum of independent standardized random variables $X_i$, the following holds:

$$\gamma_m(X_1 + X_2 + \ldots + X_n) = \frac{\gamma_m(X_1) + \gamma_m(X_2) + \ldots + \gamma_m(X_n)}{n^{3/2}},$$

i.e. the skewness of the sum of $n$ independent and identically distributed random variables is the original skewness divided by $\sqrt{n}$.
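A sketch of the $\sqrt{n}$ rule, using the fact that a sum of $n$ independent Bernoulli variables follows a binomial distribution. The Bernoulli variables are not standardized here. This does not change the result, since skewness is invariant under positive linear transformations. The parameter values and function name are assumptions chosen for the example.

```python
from math import comb, sqrt


def moment_skewness(values, probs):
    """Moment coefficient of skewness of a discrete distribution."""
    mu = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    mu3 = sum(p * (x - mu) ** 3 for x, p in zip(values, probs))
    return mu3 / var ** 1.5


p, n = 0.2, 16                                   # illustrative parameters

# Skewness of a single Bernoulli(p) variable
single = moment_skewness([0, 1], [1 - p, p])

# Exact distribution of the sum of n such variables: Binomial(n, p)
ks = list(range(n + 1))
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in ks]
total = moment_skewness(ks, pmf)

print(total, single / sqrt(n))                   # the two values should agree
```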

Empirical skewness

The following formula is used to calculate the skewness of an empirical frequency distribution:

$$g_m = \frac{m_3}{s^3} = \frac{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \overline{x})^3}{\left(\sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \overline{x})^2}\right)^{3}} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \overline{x}}{s}\right)^3$$

To make the skewness independent of the unit of measurement of the variable, the measured values $x_i$ are standardized using the arithmetic mean $\overline{x}$ and the empirical standard deviation $s$ of the observed values:

$$z_i = \frac{x_i - \overline{x}}{s}$$

After standardization,

$$\overline{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = 0 \quad \text{and} \quad s_z^2 = \frac{1}{n}\sum_{i=1}^{n} z_i^2 = 1.$$
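The empirical skewness can be sketched as the mean of the cubed standardized values. The function names and the data below are hypothetical. The standard deviation uses the factor $1/n$, as in the formula for $g_m$.

```python
from math import sqrt


def standardize(xs):
    """Return z_i = (x_i - mean) / s with the 1/n standard deviation s."""
    n = len(xs)
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / s for x in xs]


def empirical_skewness(xs):
    """g_m: arithmetic mean of the cubed standardized values."""
    z = standardize(xs)
    return sum(v ** 3 for v in z) / len(z)


data = [2.0, 3.0, 3.0, 4.0, 8.0]          # hypothetical measurements
z = standardize(data)

z_mean = sum(z) / len(z)                  # close to 0
z_var = sum(v * v for v in z) / len(z)    # close to 1
g = empirical_skewness(data)

# Changing the unit of measurement (here: factor 100) leaves g unchanged
g_other_unit = empirical_skewness([100.0 * x for x in data])
print(z_mean, z_var, g, g_other_unit)
```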

Estimating the skewness of a population

To estimate the unknown skewness $\gamma_m$ of a population from sample data $x_1, \ldots, x_n$ ($n$ being the sample size), the expected value and the variance must be estimated from the sample; i.e. the theoretical moments are replaced by the empirical ones:

$$\tilde{\gamma}_m = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \overline{x}}{s}\right)^3$$

with the sample mean $\overline{x}$ and the sample standard deviation $s$. This estimator is, however, not unbiased for $\gamma_m$. A commonly used bias-adjusted alternative is

$$\hat{\gamma}_m = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \overline{x}}{s}\right)^3.$$
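The two estimators can be compared in a short sketch. It rests on an assumption the text does not spell out: the first estimator uses the standard deviation with factor $1/n$, while the adjusted one uses the factor $1/(n-1)$. Under that assumption the two differ only by the factor $\sqrt{n(n-1)}/(n-2)$. Function names and data are hypothetical.

```python
from math import sqrt


def skewness_plugin(xs):
    """Plug-in estimator; assumes s is computed with the factor 1/n."""
    n = len(xs)
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum(((x - mean) / s) ** 3 for x in xs) / n


def skewness_adjusted(xs):
    """Adjusted estimator; assumes s is computed with the factor 1/(n-1)."""
    n = len(xs)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


sample = [2.0, 3.0, 3.0, 4.0, 8.0]        # hypothetical sample
n = len(sample)
plain = skewness_plugin(sample)
adjusted = skewness_adjusted(sample)
factor = sqrt(n * (n - 1)) / (n - 2)      # ratio between the two estimators
print(plain, adjusted, factor * plain)
```

The correction factor tends to 1 as $n$ grows, so the difference matters mainly for small samples.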

Other measures of skewness

Location of mean and median

The following definition goes back to Karl Pearson:

$$S = \frac{\mu - x_{\text{med}}}{\sigma}$$

with the expected value $\mu$, the median $x_{\text{med}}$ and the standard deviation $\sigma$. The range of $S$ is the interval $[-1, 1]$. For symmetric distributions, $S = 0$. Right-skewed distributions often have a positive $S$, but there are exceptions to this rule of thumb.

If the standard deviation diverges, Pearson's definition can be generalized by calling a distribution right-skewed when the median is less than the expected value. In this sense, the Pareto distribution is right-skewed for any parameter $k > 1$.
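For observed data, an empirical counterpart of $S$ can be sketched with Python's `statistics` module. Here the sample mean, the sample median and the $1/n$ standard deviation (`pstdev`) stand in for $\mu$, $x_{\text{med}}$ and $\sigma$; this substitution, the function name and the data are assumptions of the example.

```python
from statistics import mean, median, pstdev


def pearson_median_skewness(xs):
    """Empirical version of S = (mean - median) / standard deviation."""
    return (mean(xs) - median(xs)) / pstdev(xs)


right_skewed = [1, 2, 2, 3, 12]       # hypothetical data with a long right tail
symmetric = [1, 2, 3, 4, 5]

s_right = pearson_median_skewness(right_skewed)
s_sym = pearson_median_skewness(symmetric)
print(s_right, s_sym)
```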

Quantile coefficient of skewness

The quantile coefficient of skewness describes the normalized difference between the distances of the $\alpha$-quantile and the $(1-\alpha)$-quantile from the median. It is calculated as follows:

$$\gamma_p = \frac{(x_{1-\alpha} - x_{\text{med}}) - (x_{\text{med}} - x_{\alpha})}{x_{1-\alpha} - x_{\alpha}}, \quad \alpha \in \left(0, \tfrac{1}{2}\right)$$

The quantile coefficient can take values between $-1$ and $1$. It exists for any distribution, even if the expected value or the standard deviation is not defined.

A symmetric distribution has the quantile coefficient $0$; a right-skewed (left-skewed) distribution usually has a positive (negative) quantile coefficient. For $\alpha = \tfrac{1}{4}$ it is the quartile coefficient. The Pareto distribution has positive quantile coefficients for any parameter $k > 0$.
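The quantile coefficient can be sketched for any distribution whose quantile function is known in closed form. The function names below are assumptions. The quantile functions for the exponential distribution, $-\ln(1-q)/\lambda$, and the Pareto distribution, $x_{\min}(1-q)^{-1/k}$, are the standard closed forms. The Pareto case uses $k = 0.5$, for which the expected value does not exist.

```python
from math import log


def quantile_skewness(quantile, alpha=0.25):
    """Quantile coefficient of skewness for a given quantile function."""
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must lie in the open interval (0, 1/2)")
    low, med, high = quantile(alpha), quantile(0.5), quantile(1 - alpha)
    return ((high - med) - (med - low)) / (high - low)


def exponential_quantile(q, rate=1.0):
    """Quantile function of the exponential distribution."""
    return -log(1.0 - q) / rate


def pareto_quantile(q, k=0.5, x_min=1.0):
    """Quantile function of the Pareto distribution with shape k."""
    return x_min * (1.0 - q) ** (-1.0 / k)


g_exp = quantile_skewness(exponential_quantile)    # quartile coefficient
g_pareto = quantile_skewness(pareto_quantile)      # exists although k < 1
g_uniform = quantile_skewness(lambda q: q)         # symmetric case
print(g_exp, g_pareto, g_uniform)
```

For the exponential distribution the quartile coefficient works out to $\ln(4/3)/\ln 3$, independent of the rate.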

Interpretation

[Figure: example of experimental data with positive skewness (right-skewed)]

If $\gamma_p > 0$, the distribution is right-skewed; if $\gamma_p < 0$, it is left-skewed. For well-behaved distributions the following holds: in right-skewed distributions, values smaller than the mean are observed more frequently, so that the peak (mode) lies to the left of the mean; the right part of the graph is flatter than the left. If $\gamma_p = 0$, the distribution is balanced on both sides. For symmetric distributions, $\gamma_p = 0$ always holds. Conversely, distributions with $\gamma_p = 0$ need not be symmetric.

The following rules of thumb can be stated for well-behaved distributions:

right-skewed: $x_{\text{mod}} < x_{\text{med}} < \overline{x}$
symmetric: $x_{\text{mod}} = x_{\text{med}} = \overline{x}$
left-skewed: $x_{\text{mod}} > x_{\text{med}} > \overline{x}$
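The rules of thumb can be illustrated on a small, hypothetical data set using the `statistics` module. Mirroring the data produces the left-skewed case. As the text notes, these orderings are rules of thumb, not guarantees.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9]        # hypothetical data
left_skewed = [10 - x for x in right_skewed]      # mirror image of the above

for name, xs in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    # Print mode, median and mean side by side to compare their order
    print(name, mode(xs), median(xs), round(mean(xs), 3))
```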

Skewness is a measure of the asymmetry of a probability distribution. Since the Gaussian distribution is symmetric, i.e. has zero skewness, skewness is one possible measure for comparing a distribution with the normal distribution. (For a test of this property, see e.g. the Kolmogorov-Smirnov test.)

Interpretation of the skew

Right-skewed distributions are often found, for example, in per capita income. There are a few people with extremely high incomes and very many people with rather low incomes. The third power gives the few very extreme values a high weight and produces a skewness with a positive sign. There are different formulas for calculating skewness. Common statistics packages such as SPSS, SYSTAT and Stata use formulas that differ from the moment-based calculation rules above, especially for small sample sizes.
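The weighting effect of the third power can be made concrete with a sketch. The income figures below are invented for illustration and are not real data. The code computes how much of the total absolute cubed deviation comes from the single largest value.

```python
# Hypothetical incomes (in thousands); not real data
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 45, 300]

n = len(incomes)
mean = sum(incomes) / n
cubes = [(x - mean) ** 3 for x in incomes]          # cubed deviations
m2 = sum((x - mean) ** 2 for x in incomes) / n      # second central moment

skew = (sum(cubes) / n) / m2 ** 1.5                 # moment-based skewness
top_share = abs(cubes[-1]) / sum(abs(c) for c in cubes)

print(skew, top_share)
```

In this invented example nearly all of the cubed deviation comes from the one extreme income, which is why the sign of the skewness is positive even though nine of the ten values lie below the mean.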

Literature

W. H. Press et al.: Numerical Recipes in C. 2nd edition. Cambridge University Press, 1992, chapter 14.1.

References

[1] Andreas Handl: Symmetrie und Schiefe. Bielefeld University, p. 4 (archived April 13, 2014 in the Internet Archive; PDF, 248 kB).
[2] Felix Brosius: SPSS 16, p. 361.
[3] Paul T. von Hippel: Mean, Median, and Skew: Correcting a Textbook Rule. In: Journal of Statistics Education. 13, No. 2, 2005.

