Skewness (statistics)


Skewness (also called skew) is a statistical measure that describes the type and strength of the asymmetry of a probability distribution. It indicates whether and how strongly the distribution leans to the right (steep on the right, left-skewed, negative skewness) or to the left (steep on the left, right-skewed, positive skewness). Every non-symmetric distribution is called skewed.

Definition

The skewness v(X) of a random variable X is its central moment of third order \mu_3 (provided the third moment exists), normalized by the third power of the standard deviation \sigma:

v(X) = \operatorname{E}\!\left[\left(\frac{X - \mu}{\sigma}\right)^{3}\right] = \frac{\mu_3}{\sigma^3}

with the expected value \mu = \operatorname{E}(X) and the variance \sigma^2 = \operatorname{Var}(X). This representation is also called the moment coefficient of skewness. Expressed in terms of the cumulants \kappa_2 and \kappa_3, it becomes

v(X) = \frac{\kappa_3}{\kappa_2^{3/2}}.

The skewness can take on any real value.

[Figures: Linksschief.svg (left-skewed distribution), Right skew.svg (right-skewed distribution)]
  • With negative skewness, v(X) < 0, one speaks of a left-skewed (right-steep) distribution; in typical cases it falls off more gently on the left side than on the right.
  • With positive skewness, v(X) > 0, one speaks of a right-skewed (left-steep) distribution; in typical cases it falls off more gently on the right side than on the left.

Typical representatives of right-skewed distributions are the Bernoulli distribution for p < 1/2, the exponential distribution and the Pareto distribution (for parameters for which the skewness exists).
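As a concrete illustration (not part of the original article, and assuming Python with NumPy and SciPy), the following sketch checks the moment coefficient of skewness for the exponential distribution, whose theoretical skewness is 2:

```python
# Illustrative check (assumes NumPy and SciPy): the moment coefficient of
# skewness v(X) = E[((X - mu) / sigma)^3] for the exponential distribution,
# whose theoretical skewness is 2 (right-skewed, positive sign).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

mu = x.mean()
sigma = x.std()                                # divisor n ("population" form)
v_moment = np.mean(((x - mu) / sigma) ** 3)    # moment coefficient from the definition

v_theory = stats.expon.stats(moments="s")      # theoretical skewness of Exp(1)

print(v_moment)   # close to 2, up to Monte Carlo error
print(v_theory)   # 2.0
```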

The skewness is invariant under linear transformations aX + b with a > 0:

v(aX + b) = v(X)

For the sum of n independent standardized random variables X_1, \ldots, X_n the following holds:

v(X_1 + \cdots + X_n) = \frac{1}{n^{3/2}} \sum_{i=1}^{n} v(X_i),

i.e. the skewness of the sum of n independent and identically distributed random variables is the original skewness divided by \sqrt{n}.
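A small Monte Carlo sketch of this scaling behaviour (again an illustrative assumption using NumPy, not taken from the source):

```python
# Illustrative Monte Carlo check (assumes NumPy) of the scaling stated above:
# for n i.i.d. summands the skewness of the sum is roughly v(X_1) / sqrt(n).
# Skewness is shift-invariant, so the Exp(1) variables need not be centered.
import numpy as np

rng = np.random.default_rng(1)

def skew(values):
    """Moment coefficient of skewness of a sample."""
    z = (values - values.mean()) / values.std()
    return np.mean(z ** 3)

n = 16
reps = 200_000
# Exp(1) has skewness 2, so a sum of 16 i.i.d. terms should have skewness near 2 / 4 = 0.5.
sums = rng.exponential(size=(reps, n)).sum(axis=1)

print(skew(sums))        # close to 0.5
print(2 / np.sqrt(n))    # 0.5
```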

Empirical skewness

The following formula is used to compute the skewness of an empirical frequency distribution:

v = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3}

So that the skewness does not depend on the unit of measurement of the variable, the measured values are standardized using the arithmetic mean \bar{x} and the empirical standard deviation s of the observations x_1, \ldots, x_n:

z_i = \frac{x_i - \bar{x}}{s}

For the standardized values the following holds:

\bar{z} = 0 \quad \text{and} \quad s_z = 1.
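A minimal sketch of the empirical formula above, assuming Python/NumPy; the function name empirical_skewness is purely illustrative, and the divisor used for the empirical standard deviation is one possible convention:

```python
# Minimal sketch (assumes NumPy) of the empirical skewness formula above;
# the function name is illustrative.
import numpy as np

def empirical_skewness(values):
    x = np.asarray(values, dtype=float)
    x_bar = x.mean()                 # arithmetic mean
    s = x.std(ddof=0)                # empirical standard deviation (divisor n here)
    z = (x - x_bar) / s              # standardized values: mean 0, standard deviation 1
    return np.mean(z ** 3)           # average of the cubed standardized values

data = [1, 2, 2, 3, 3, 3, 4, 10]     # small right-skewed toy sample
print(empirical_skewness(data))      # positive value, indicating right skew
```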

Estimating the skewness of a population

To estimate the unknown skewness v of a population from sample data x_1, \ldots, x_n (with n the sample size), the expected value and the variance have to be estimated from the sample, i.e. the theoretical moments are replaced by the empirical ones:

\hat{v} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3}

with the sample mean \bar{x} and the sample standard deviation s. However, this estimator is not unbiased for v, unlike

\hat{v}' = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3}.
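To illustrate the difference between the two estimators, one option (a tool choice on my part, not prescribed by the article) is scipy.stats.skew, whose bias flag switches between the plain moment formula and an adjusted variant:

```python
# Illustration (assumes NumPy and SciPy): scipy.stats.skew exposes both the plain
# moment-based estimator and a small-sample-adjusted variant via its `bias` flag.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.lognormal(size=30)            # small, clearly right-skewed sample

g1 = stats.skew(x, bias=True)         # plain moment coefficient, as defined above
G1 = stats.skew(x, bias=False)        # adjusted estimator with an (n-1)(n-2)-type correction

print(g1, G1)                         # the adjusted value is slightly larger here
```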

Further measures of skewness

Location of mean and median

This definition goes back to Karl Pearson:

S = \frac{\mu - x_{0.5}}{\sigma}

with the expected value \mu, the median x_{0.5} and the standard deviation \sigma. The range of values of S is the interval [-1, 1]. For symmetric distributions S = 0. Right-skewed distributions often have a positive S, but there are exceptions to this rule of thumb.

If the standard deviation diverges, Pearson's definition can be generalized by calling a distribution right-skewed when the median is smaller than the expected value. In this sense, the Pareto distribution is right-skewed for every parameter for which its expected value exists.
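A minimal sketch of Pearson's measure, assuming Python/NumPy (the helper name pearson_skewness is illustrative):

```python
# Minimal sketch (assumes NumPy) of Pearson's measure S = (mean - median) / sigma.
import numpy as np

def pearson_skewness(values):
    x = np.asarray(values, dtype=float)
    return (x.mean() - np.median(x)) / x.std()

rng = np.random.default_rng(3)
x = rng.exponential(size=100_000)
# Exp(1): mean 1, median ln 2, standard deviation 1, so S is roughly 0.31.
print(pearson_skewness(x))            # positive and inside [-1, 1]
```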

Quantile coefficient of skewness

The quantile coefficient of skewness describes the normalized difference between the distances of the (1-p)-quantile and of the p-quantile from the median. It is computed as:

\gamma_p = \frac{(x_{1-p} - x_{0.5}) - (x_{0.5} - x_p)}{x_{1-p} - x_p}, \qquad 0 < p < \tfrac{1}{2}

The quantile coefficient can take values between -1 and 1. It exists for every distribution, even if the expected value or the standard deviation is not defined.

A symmetric distribution has quantile coefficient 0; a right-skewed (left-skewed) distribution usually has a positive (negative) quantile coefficient. For p = 0.25 one obtains the quartile coefficient of skewness. The Pareto distribution has positive quantile coefficients for every parameter.
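A minimal sketch of the quantile coefficient, assuming Python/NumPy; the helper name quantile_skewness and the choice p = 0.25 are illustrative:

```python
# Minimal sketch (assumes NumPy) of the quantile coefficient of skewness;
# p = 0.25 gives the quartile coefficient.
import numpy as np

def quantile_skewness(values, p=0.25):
    x_p, x_med, x_1p = np.quantile(values, [p, 0.5, 1 - p])
    return ((x_1p - x_med) - (x_med - x_p)) / (x_1p - x_p)

rng = np.random.default_rng(4)
x = rng.pareto(a=1.0, size=100_000)   # heavy-tailed sample without a finite mean
print(quantile_skewness(x))           # positive, as expected for a right-skewed law
```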

Interpretation

[Figure: example of experimental data with positive skewness (right-skewed)]

If v > 0, the distribution is right-skewed; if v < 0, it is left-skewed. For well-behaved distributions the following holds: in right-skewed distributions, values smaller than the mean are observed more frequently, so that the peak (mode) lies to the left of the mean; the right part of the graph is flatter than the left. If v = 0, the distribution is balanced on both sides. For symmetric distributions v = 0 always holds. Conversely, distributions with v = 0 need not be symmetric.

The following rules of thumb can be stated for well-behaved distributions:

  • right-skewed: mode < median < arithmetic mean
  • symmetric: mode = median = arithmetic mean
  • left-skewed: arithmetic mean < median < mode

Skewness is a measure of the asymmetry of a probability distribution. Since the Gaussian distribution is symmetric, i.e. has zero skewness, skewness is one possible measure for comparing a distribution with the normal distribution. (For a test of this property see, e.g., the Kolmogorov–Smirnov test.)
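One concrete way to carry out such a skewness-based comparison in code, assuming SciPy is available (this tool choice is mine, not the article's), is scipy.stats.skewtest, which tests the null hypothesis that the sample comes from a population with the skewness of a normal distribution, i.e. zero:

```python
# One possible skewness-based comparison with the normal distribution (assumes SciPy):
# scipy.stats.skewtest tests the null hypothesis of zero skewness.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
normal_sample = rng.normal(size=5_000)
skewed_sample = rng.exponential(size=5_000)

print(stats.skewtest(normal_sample))   # large p-value: no evidence of skewness
print(stats.skewtest(skewed_sample))   # tiny p-value: clearly skewed
```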

Interpretation of the skew

Right-skewed distributions occur frequently, for example, in per capita incomes: there are a few people with extremely high incomes and very many people with rather low incomes. The third power gives the few very extreme values a high weight and produces a skewness with a positive sign. There are different formulas for calculating skewness. Common statistics packages such as SPSS, SYSTAT, Stata etc. use formulas that differ from the moment-based calculation rule above, especially for small sample sizes.
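To illustrate that packages differ, the following sketch (assuming NumPy, SciPy and pandas; that pandas applies the small-sample adjustment is my understanding, not stated in the article) compares the plain moment formula with adjusted variants on a tiny sample:

```python
# Illustration of differing package conventions (assumes NumPy, SciPy, pandas):
# on a tiny sample, the plain moment formula and the adjusted formula disagree.
import numpy as np
import pandas as pd
from scipy import stats

x = np.array([1.0, 2.0, 2.0, 3.0, 9.0])   # tiny right-skewed toy sample

print(stats.skew(x))                 # moment-based formula, no small-sample correction
print(stats.skew(x, bias=False))     # adjusted variant
print(pd.Series(x).skew())           # pandas' default; to my knowledge it matches the adjusted variant
```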

Literature

  • W. H. Press et al.: Numerical Recipes in C. 2nd Edition. Cambridge University Press, 1992, Chapter 14.1.

