# Skewness (statistics)

Skewness (or skew) is a statistical measure that describes the type and strength of the asymmetry of a probability distribution. It indicates whether, and to what extent, the distribution leans to the right (right-steep, left-skewed, negative skewness) or to the left (left-steep, right-skewed, positive skewness). Any non-symmetric distribution is called skewed.

## Definition

The skewness $\gamma_m$ of a random variable $X$ is the third central moment $\mu_3$ (provided the third moment exists), normalized by the cube of the standard deviation $\sigma$:

$$\gamma_m := \frac{\mu_3}{\sigma^3} = \operatorname{E}\left[\left(\frac{X-\mu}{\sigma}\right)^3\right] = \frac{\operatorname{E}\left[(X-\mu)^3\right]}{\sigma^3} = \frac{\operatorname{E}(X^3) - 3\operatorname{E}(X^2)\mu + 2\mu^3}{\sigma^3} = \frac{\operatorname{E}(X^3) - 3\mu\sigma^2 - \mu^3}{\sigma^3}$$

with the expected value $\mu = \operatorname{E}(X)$ and the variance $\sigma^2 = \operatorname{Var}(X)$. This representation is also called the moment coefficient of skewness. In terms of the cumulants $\kappa_i$, it reads

$$\gamma_m = \frac{\kappa_3}{\sqrt{\kappa_2^3}} = \frac{\kappa_3}{\operatorname{Var}(X)^{3/2}}.$$

The skewness can take on any real value.
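As a small illustration, the last form of the definition can be evaluated directly from the raw moments $\operatorname{E}(X)$, $\operatorname{E}(X^2)$ and $\operatorname{E}(X^3)$. The following Python sketch does this for the exponential distribution with rate 1 and for a Bernoulli distribution. The function names are chosen here for illustration only and do not come from any library.

```python
def skewness_from_raw_moments(m1, m2, m3):
    """Moment coefficient of skewness from the raw moments
    m1 = E[X], m2 = E[X^2], m3 = E[X^3]."""
    variance = m2 - m1 ** 2
    if variance <= 0:
        raise ValueError("variance must be positive")
    # Third central moment: E[X^3] - 3*mu*sigma^2 - mu^3
    mu3 = m3 - 3.0 * m1 * variance - m1 ** 3
    return mu3 / variance ** 1.5


def bernoulli_skewness(p):
    """Skewness of a Bernoulli(p) variable; all raw moments equal p."""
    return skewness_from_raw_moments(p, p, p)


# Exponential distribution with rate 1: the raw moments are E[X^k] = k!
print("exponential:", skewness_from_raw_moments(1.0, 2.0, 6.0))

# Bernoulli distribution: positive skewness for p < 1/2, negative for p > 1/2
for p in (0.2, 0.5, 0.8):
    print("bernoulli p =", p, "->", bernoulli_skewness(p))
```

For the Bernoulli case the result agrees with the closed form $(1-2p)/\sqrt{p(1-p)}$, which is positive exactly when $p < 1/2$.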

- With negative skewness, $\gamma_m < 0$, one speaks of a left-skewed or right-steep distribution; in typical cases it falls off more gently on the left than on the right.
- With positive skewness, $\gamma_m > 0$, one speaks of a right-skewed or left-steep distribution; in typical cases it falls off more gently on the right than on the left.

Typical examples of right-skewed distributions are the Bernoulli distribution for $p < 1/2$, the exponential distribution, and the Pareto distribution for $k > 3$.

The skewness is invariant under linear transformations with $a > 0$:

$$\gamma_m(aX + b) = \gamma_m(X)$$

The following applies to the sum of independent standardized random variables $X_i$:

$$\gamma_m(X_1 + X_2 + \ldots + X_n) = \frac{\gamma_m(X_1) + \gamma_m(X_2) + \ldots + \gamma_m(X_n)}{n^{3/2}}$$

That is, the skewness of the sum of $n$ independent and identically distributed random variables is the original skewness divided by $\sqrt{n}$.

### Empirical skewness

The following formula is used to calculate the skewness of an empirical frequency distribution:

$$g_m = \frac{m_3}{s^3} = \frac{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \overline{x})^3}{\left(\sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(x_i - \overline{x})^2}\right)^3} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \overline{x}}{s}\right)^3$$

So that the skewness does not depend on the unit of measurement of the variable, the observed values $x_i$ are standardized using their arithmetic mean $\overline{x}$ and their empirical standard deviation $s$:

$$z_i = \frac{x_i - \overline{x}}{s}$$

After standardization,

$$\overline{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = 0 \quad\text{and}\quad s_z^2 = \frac{1}{n}\sum_{i=1}^{n} z_i^2 = 1.$$

### Estimating the skewness of a population

To estimate the unknown skewness $\gamma_m$ of a population from sample data $x_1, \ldots, x_n$ (with $n$ the sample size), the expected value and the variance must be estimated from the sample, i.e. the theoretical moments are replaced by the empirical ones:

$$\tilde{\gamma}_m = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \overline{x}}{s}\right)^3$$

with the sample mean $\overline{x}$ and the sample standard deviation $s$. However, this estimator is not unbiased for $\gamma_m$. A widely used bias-adjusted alternative is

$$\hat{\gamma}_m = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \overline{x}}{s}\right)^3,$$

where $s$ is usually computed with the $n-1$ denominator. The adjustment reduces the bias, although for general distributions the result is still not exactly unbiased.
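Both estimators can be sketched in plain Python. The sketch assumes that $s$ uses the $1/n$ denominator in the unadjusted form and the $1/(n-1)$ denominator in the adjusted form, which is the usual convention. The function names and the sample values are made up for illustration.

```python
import math


def empirical_skewness(xs):
    """Unadjusted skewness: mean of cubed z-scores, s with the 1/n denominator."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


def adjusted_skewness(xs):
    """Adjusted skewness: n / ((n-1)(n-2)) times the sum of cubed z-scores,
    with s computed using the n-1 denominator (assumed convention)."""
    n = len(xs)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


# Hypothetical right-skewed sample: one large value far above the rest
sample = [1.0, 2.0, 2.0, 3.0, 10.0]
print("unadjusted:", empirical_skewness(sample))
print("adjusted:  ", adjusted_skewness(sample))
```

Under these conventions the two values differ only by the factor $\sqrt{n(n-1)}/(n-2)$, which is noticeably larger than 1 for small $n$ and approaches 1 as $n$ grows.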

## Other measures of skewness

### Location of mean and median

The following definition goes back to Karl Pearson:

$$S = \frac{\mu - x_{\text{med}}}{\sigma}$$

with the expected value $\mu$, the median $x_{\text{med}}$ and the standard deviation $\sigma$. The range of $S$ is the interval $[-1, 1]$. For symmetric distributions, $S = 0$. Right-skewed distributions often have a positive $S$, but there are exceptions to this rule of thumb.

If the standard deviation diverges, Pearson's definition can be generalized by calling a distribution right-skewed when the median is less than the expected value. In this sense, the Pareto distribution is right-skewed for any parameter $k > 1$.

### Quantile coefficient of skewness

The quantile coefficient of skewness describes the normalized difference between the distances of the $\alpha$-quantile and the $(1-\alpha)$-quantile from the median. It is calculated as follows:

$$\gamma_p = \frac{(x_{1-\alpha} - x_{\text{med}}) - (x_{\text{med}} - x_{\alpha})}{x_{1-\alpha} - x_{\alpha}}, \quad \alpha \in \left(0, \tfrac{1}{2}\right)$$

The quantile coefficient takes values between $-1$ and $1$. It exists for any distribution, even if the expected value or the standard deviation is not defined. A symmetric distribution has quantile coefficient $0$; a right-skewed (left-skewed) distribution usually has a positive (negative) quantile coefficient. For $\alpha = \tfrac{1}{4}$ it is the quartile coefficient. The Pareto distribution has positive quantile coefficients for any parameter $k > 0$.

## Interpretation

If $\gamma_p > 0$, the distribution is right-skewed; if $\gamma_p < 0$, it is left-skewed. For well-behaved distributions the following holds: in right-skewed distributions, values smaller than the mean are observed more frequently, so that the peak (mode) lies to the left of the mean; the right part of the graph is flatter than the left. If $\gamma_p = 0$, the distribution is balanced on both sides. For symmetric distributions, $\gamma_p = 0$ always holds. Conversely, distributions with $\gamma_p = 0$ need not be symmetric.

The following rules of thumb can be stated for well-behaved distributions:

- right-skewed: $x_{\text{mod}} < x_{\text{med}} < \overline{x}$
- symmetric: $x_{\text{mod}} = x_{\text{med}} = \overline{x}$
- left-skewed: $x_{\text{mod}} > x_{\text{med}} > \overline{x}$

Skewness is a measure of the asymmetry of a probability distribution. Since the Gaussian distribution is symmetric, i.e. has zero skewness, skewness is one possible measure for comparing a distribution with the normal distribution. (For a test of this property, see e.g. the Kolmogorov-Smirnov test.)
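The rule of thumb for right-skewed distributions, together with Pearson's $S$ and the quantile coefficient $\gamma_p$, can be checked in closed form for two of the distributions mentioned earlier. The following Python sketch is only an illustration: it assumes an exponential distribution with rate 1 and Pareto distributions with minimum value 1 and shape $k$, and all function names are hypothetical.

```python
import math


def quantile_skewness(quantile, alpha=0.25):
    """Quantile coefficient of skewness for a given quantile function.
    alpha = 0.25 gives the quartile coefficient."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 1/2)")
    lower, median, upper = quantile(alpha), quantile(0.5), quantile(1.0 - alpha)
    return ((upper - median) - (median - lower)) / (upper - lower)


def pearson_median_skewness(mean, median, std):
    """Pearson's S = (mean - median) / standard deviation."""
    return (mean - median) / std


def exponential_quantile(p):
    """Quantile function of the exponential distribution with rate 1."""
    return -math.log(1.0 - p)


def pareto_quantile(k):
    """Quantile function of a Pareto distribution with minimum 1 and shape k."""
    return lambda p: (1.0 - p) ** (-1.0 / k)


# Exponential distribution with rate 1: mode 0, median ln 2, mean 1, std 1
exp_mode, exp_median, exp_mean, exp_std = 0.0, math.log(2.0), 1.0, 1.0
exp_S = pearson_median_skewness(exp_mean, exp_median, exp_std)
exp_quartile = quantile_skewness(exponential_quantile)
print("mode < median < mean:", exp_mode < exp_median < exp_mean)
print("Pearson S:", exp_S, " quartile coefficient:", exp_quartile)

# Pareto distributions: the quartile coefficient is defined even for k <= 1,
# where the mean does not exist
for k in (0.5, 1.5, 3.0, 10.0):
    print("Pareto k =", k, "-> quartile coefficient:",
          quantile_skewness(pareto_quantile(k)))
```

Working with quantile functions rather than simulated samples keeps the example deterministic. For empirical data the same `quantile_skewness` function could be given an empirical quantile function instead.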

## Interpretation of skewness

Right-skewed distributions are often found, for example, in per capita income: there are a few people with extremely high incomes and very many people with rather low incomes. The third power gives the few very extreme values a high weight and produces a skewness with a positive sign. There are different formulas for calculating skewness. Common statistics packages such as SPSS, SYSTAT and Stata use formulas that differ from the moment-based calculation rules above, especially for small sample sizes.