Chi-square distribution

From Wikipedia, the free encyclopedia

The chi-square distribution or $\chi^2$ distribution (older name: Helmert-Pearson distribution, after Friedrich Robert Helmert and Karl Pearson) is a continuous probability distribution over the set of non-negative real numbers. Usually, "chi-square distribution" means the central chi-square distribution. The chi-square distribution has a single parameter, namely the number of degrees of freedom $n$.

Densities of the chi-square distribution with different numbers of degrees of freedom k

It is one of the distributions that can be derived from the normal distribution: if one has $n$ random variables $Z_i$ that are independent and standard normally distributed, the chi-square distribution with $n$ degrees of freedom is defined as the distribution of the sum of the squared random variables $Z_1^2 + \dots + Z_n^2$. Such sums of squared random variables occur in estimators such as the sample variance. The chi-square distribution thus enables, among other things, an assessment of the compatibility of a presumed functional relationship (dependence on time, temperature, pressure, etc.) with empirically determined measuring points. For example, can a straight line explain the data, or is a parabola or perhaps a logarithm needed? Different models are fitted, and the one with the best goodness of fit, i.e. the smallest chi-square value, gives the best explanation of the data. By quantifying the random fluctuations, the chi-square distribution puts the selection of different explanatory models on a numerical basis. In addition, once the empirical variance has been determined, it allows the estimation of the confidence interval that includes the (unknown) value of the variance of the population with a certain probability. These and other uses are described below and in the Chi-square test article.

The chi-square distribution was introduced in 1876 by Friedrich Robert Helmert; the name goes back to Karl Pearson (1900).

Definition

Density and distribution of several chi-squared random variables

The square of a standard normally distributed random variable $Z \sim \mathcal{N}(0,1)$ follows a chi-square distribution with one degree of freedom:

$Z^2 \sim \chi^2_1$.

Furthermore, if $X \sim \chi^2_m$ and $Y \sim \chi^2_n$ are stochastically independent chi-square distributed random variables, then their sum is chi-square distributed with the sum of the respective degrees of freedom:

$X + Y \sim \chi^2_{m+n}$.

So the chi-square distribution is reproductive. Let $Z_1, \dots, Z_n$ be stochastically independent and standard normally distributed random variables; then their sum of squares is chi-square distributed with $n$ degrees of freedom:

$Z_1^2 + Z_2^2 + \dots + Z_n^2 \sim \chi^2_n$.

The symbol $\sim$ is an abbreviation for "follows the distribution". For example, $X \sim \chi^2_n$ means: the random variable $X$ follows a chi-square distribution with $n$ degrees of freedom. As a sum of squared quantities, such a random variable cannot take negative values.

In contrast to this, the simple sum $Z_1 + Z_2 + \dots + Z_n$ is normally distributed with mean $0$ and variance $n$, i.e. it has a distribution symmetric about the zero point.

Density

The density of the $\chi^2$ distribution with $n$ degrees of freedom has the form:

$f(x) = \dfrac{x^{n/2-1}\, e^{-x/2}}{2^{n/2}\, \Gamma(n/2)}$ for $x > 0$, and $f(x) = 0$ otherwise.

Here $\Gamma$ stands for the gamma function. Its values at the half-integer arguments needed here can be calculated using

$\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}, \qquad \Gamma(1) = 1, \qquad \Gamma(r+1) = r\,\Gamma(r)$.
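The density formula can be checked numerically with the standard library alone; the following is a minimal sketch (the helper name `chi2_pdf` is illustrative, not a library function):

```python
import math

def chi2_pdf(x, n):
    """Density of the chi-square distribution with n degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

# For n = 2 the density reduces to exp(-x/2)/2,
# and for n = 1 to exp(-x/2)/sqrt(2*pi*x).
```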

Distribution function

The distribution function can be written using the regularized incomplete gamma function $P$:

$F(x) = P\!\left(\tfrac{n}{2}, \tfrac{x}{2}\right)$.

If $n$ is a natural number, then the distribution function can (more or less) be represented in an elementary way:

for even $n$: $F(x) = 1 - e^{-x/2} \sum_{j=0}^{n/2-1} \frac{(x/2)^j}{j!}$,

for odd $n$: $F(x) = \operatorname{erf}\!\left(\sqrt{x/2}\right) - e^{-x/2} \sum_{k=1}^{(n-1)/2} \frac{(x/2)^{k-1/2}}{\Gamma(k+1/2)}$,

where $\operatorname{erf}$ denotes the error function. The distribution function describes the probability that $\chi^2_n$ lies in the interval $[0, x]$.
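The elementary closed forms for integer $n$ can be sketched with `math.erf` and `math.gamma` from the standard library (`chi2_cdf` is an illustrative helper, not a library API):

```python
import math

def chi2_cdf(x, n):
    """Chi-square distribution function for integer n,
    using the elementary closed forms for even and odd n."""
    if x <= 0:
        return 0.0
    if n % 2 == 0:
        s = sum((x / 2) ** j / math.factorial(j) for j in range(n // 2))
        return 1.0 - math.exp(-x / 2) * s
    s = sum((x / 2) ** (k - 0.5) / math.gamma(k + 0.5)
            for k in range(1, (n - 1) // 2 + 1))
    return math.erf(math.sqrt(x / 2)) - math.exp(-x / 2) * s
```

For $n = 1$ this reproduces $P(|Z| \leq z) $ for a standard normal $Z$, e.g. $F(1) \approx 0.6827$.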

Properties

Expected value

The expected value of the chi-square distribution with $n$ degrees of freedom is equal to the number of degrees of freedom:

$\operatorname{E}(\chi^2_n) = n$.

Assuming a standard normally distributed population, the value $\chi^2_n/n$ should therefore be close to $1$ if the population variance has been estimated correctly.

Variance

The variance of the chi-square distribution with $n$ degrees of freedom is equal to 2 times the number of degrees of freedom:

$\operatorname{Var}(\chi^2_n) = 2n$.
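Expected value and variance can be checked by simulation directly from the definition as a sum of squared standard normal variables; a small Monte Carlo sketch (sample size and tolerances chosen for illustration):

```python
import random

random.seed(0)
n, N = 5, 100_000  # degrees of freedom, number of simulated variates
samples = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(N)]

mean = sum(samples) / N                       # should be close to n = 5
var = sum((s - mean) ** 2 for s in samples) / N  # should be close to 2n = 10
```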

Mode

The mode of the chi-square distribution with $n$ degrees of freedom is $n - 2$ for $n \geq 2$.

Skewness

The skewness of the chi-square distribution with $n$ degrees of freedom is

$\gamma = \sqrt{\dfrac{8}{n}}$.

The chi-square distribution has positive skewness, i.e. it is skewed to the right (steep on the left). The higher the number of degrees of freedom $n$, the less skewed the distribution.

Kurtosis

The kurtosis (curvature) of the chi-square distribution with $n$ degrees of freedom is given by

$\beta_2 = 3 + \dfrac{12}{n}$.

The excess kurtosis compared to the normal distribution is thus $\gamma_2 = \dfrac{12}{n}$. Therefore: the higher the number of degrees of freedom $n$, the smaller the excess.

Moment generating function

The moment-generating function for $X \sim \chi^2_n$ has the form

$M_X(t) = (1 - 2t)^{-n/2}$ for $t < \tfrac{1}{2}$.
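The moment-generating function can be verified numerically by integrating $e^{tx}$ against the density; a rough Riemann-sum sketch (cutoff and step count are arbitrary choices made for illustration):

```python
import math

def chi2_pdf(x, n):
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def mgf_numeric(t, n, upper=200.0, steps=100_000):
    """Riemann-sum approximation of E[exp(t*X)] for X ~ chi-square(n), t < 1/2."""
    h = upper / steps
    return sum(math.exp(t * i * h) * chi2_pdf(i * h, n) for i in range(1, steps)) * h

# closed form for comparison: (1 - 2t)^(-n/2)
```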

Characteristic function

The characteristic function for $X \sim \chi^2_n$ results from the moment-generating function as:

$\varphi_X(t) = (1 - 2it)^{-n/2}$.

Entropy

The entropy of the chi-square distribution (expressed in nats) is

$H(\chi^2_n) = \dfrac{n}{2} + \ln\!\left(2\,\Gamma\!\left(\tfrac{n}{2}\right)\right) + \left(1 - \tfrac{n}{2}\right)\psi\!\left(\tfrac{n}{2}\right)$,

where $\psi(p)$ denotes the digamma function.

Non-central chi-square distribution

If the normally distributed random variables $X_i \sim \mathcal{N}(\mu_i, 1)$ are not centered on their expected value (i.e., if not all $\mu_i = 0$), the non-central chi-square distribution is obtained. In addition to $n$, it has a second parameter, the non-centrality parameter $\lambda$.

Let $X_i \sim \mathcal{N}(\mu_i, 1)$ be independent; then

$\sum_{i=1}^{n} X_i^2 \sim \chi^2(n, \lambda)$ with $\lambda = \sum_{i=1}^{n} \mu_i^2$.

In particular, it follows from $X \sim \chi^2_{n-1}$ and an independent $Z \sim \mathcal{N}(\mu, 1)$ that $X + Z^2 \sim \chi^2(n, \mu^2)$.

A second possibility to generate a non-central chi-square distribution is as a mixed distribution of central chi-square distributions. It holds that

$\chi^2(n, \lambda) = \chi^2_{n+2m}$,

when $m$ is drawn from a Poisson distribution with parameter $\lambda/2$.
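Both constructions can be compared by simulation; a sketch assuming $n = 3$ and $\lambda = 4$ (the Poisson sampler below follows Knuth's classic inverse-transform method, since the standard library has none):

```python
import math
import random

random.seed(1)
n, lam = 3, 4.0
mus = [2.0, 0.0, 0.0]  # non-centrality: mu_1^2 + mu_2^2 + mu_3^2 = lam = 4
N = 50_000

# direct construction: sum of squares of shifted normals
direct = [sum(random.gauss(mu, 1.0) ** 2 for mu in mus) for _ in range(N)]

def poisson(rate):
    """Knuth's inverse-transform sampler for a Poisson variate."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# mixture construction: central chi-square with n + 2m dof, m ~ Poisson(lam/2)
mixed = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n + 2 * poisson(lam / 2)))
         for _ in range(N)]

mean_direct = sum(direct) / N  # both means should be close to n + lam = 7
mean_mixed = sum(mixed) / N
```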

Density function

The density function of the non-central chi-square distribution is

$f(x) = \sum_{j=0}^{\infty} \dfrac{e^{-\lambda/2}\,(\lambda/2)^j}{j!}\, f_{\chi^2_{n+2j}}(x)$ for $x \geq 0$, and $f(x) = 0$ for $x < 0$,

where $f_{\chi^2_{n+2j}}$ denotes the density of the central chi-square distribution with $n + 2j$ degrees of freedom. The sum over $j$ leads to a modified Bessel function of the first kind $I_q$. This gives the density function the following form:

$f(x) = \dfrac{1}{2}\, e^{-(x+\lambda)/2} \left(\dfrac{x}{\lambda}\right)^{n/4 - 1/2} I_{n/2-1}\!\left(\sqrt{\lambda x}\right)$ for $x > 0$.

The expected value and variance of the non-central chi-square distribution, $n + \lambda$ and $2n + 4\lambda$, go over, like the density itself, into the corresponding expressions of the central chi-square distribution as $\lambda \to 0$.

Distribution function

The distribution function of the non-central chi-square distribution can be expressed using the Marcum Q-function $Q_M$:

$F(x) = 1 - Q_{n/2}\!\left(\sqrt{\lambda}, \sqrt{x}\right)$.

Example

Suppose $n$ measurements of a quantity are made, drawn from a normally distributed population. Let $\bar{x}$ be the empirical mean of the measured values and

$s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

the corrected sample variance. Then, for example, the 95 % confidence interval for the population variance $\sigma^2$ can be specified:

$\dfrac{(n-1)s^2}{\chi^2_{0.975}} \leq \sigma^2 \leq \dfrac{(n-1)s^2}{\chi^2_{0.025}}$,

where $\chi^2_q$ is determined by $F(\chi^2_q) = q$ for the chi-square distribution with $n-1$ degrees of freedom. The limits result from the fact that $(n-1)s^2/\sigma^2$ is distributed as $\chi^2_{n-1}$.
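A numerical version of such an interval can be sketched with a bisection-based quantile function; the sample size and sample variance below are made-up values for illustration:

```python
import math

def chi2_cdf_even(x, n):
    """Elementary chi-square CDF for even n (sufficient for this example)."""
    s = sum((x / 2) ** j / math.factorial(j) for j in range(n // 2))
    return 1.0 - math.exp(-x / 2) * s

def chi2_quantile(p, n):
    """Invert the CDF by bisection on [0, 1000]."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# hypothetical example: n = 21 measurements with sample variance s2 = 2.5
n, s2 = 21, 2.5
lower = (n - 1) * s2 / chi2_quantile(0.975, n - 1)  # lower bound for sigma^2
upper = (n - 1) * s2 / chi2_quantile(0.025, n - 1)  # upper bound for sigma^2
```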

Derivation of the distribution of the sample variance

Let $x_1, \dots, x_n$ be a sample of $n$ measured values, drawn from a normally distributed random variable, with empirical mean $\bar{x}$ and sample variance $s^2$ as estimators for the expected value $\mu$ and variance $\sigma^2$ of the population.

Then it can be shown that $(n-1)s^2/\sigma^2$ is distributed as $\chi^2_{n-1}$.

According to Helmert, the measured values $x_i$ are transformed into $n-1$ new variables $y_k$ using an orthonormal linear combination. The transformation (the Helmert transformation) is:

$y_k = \dfrac{x_1 + \dots + x_k - k\,x_{k+1}}{\sqrt{k(k+1)}}, \qquad k = 1, \dots, n-1.$

The new independent variables $y_k$ are, like the $x_i$, normally distributed with the same variance $\sigma^2$, but with expected value $0$, both owing to the convolution invariance of the normal distribution.

In addition, because of the orthonormality of the coefficients $a_{ki}$ in $y_k = \sum_i a_{ki} x_i$ ($\sum_i a_{ki} a_{li} = \delta_{kl}$, with the Kronecker delta $\delta_{kl}$), the sum of the squared deviations results as

$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{k=1}^{n-1} y_k^2,$

and finally, after division by $\sigma^2$,

$\dfrac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{k=1}^{n-1} \left(\dfrac{y_k}{\sigma}\right)^2.$

The expression on the left is evidently distributed like a sum of $n-1$ squared, standard normally distributed, independent variables, as required for $\chi^2_{n-1}$.

Accordingly, the sum $(n-1)s^2/\sigma^2$ is chi-square distributed with $n-1$ degrees of freedom, while according to the definition the chi-square sum $\sum_i (x_i - \mu)^2/\sigma^2$ has $n$ degrees of freedom. One degree of freedom is "consumed" here because, owing to the centering property of the empirical mean, $\sum_i (x_i - \bar{x}) = 0$, the last deviation is already determined by the first $n-1$. Consequently, only $n-1$ deviations vary freely, and the empirical variance is therefore averaged by dividing by the number of degrees of freedom $n-1$.
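The key identity $\sum_k y_k^2 = \sum_i (x_i - \bar{x})^2$ can be verified deterministically for arbitrary data; a minimal sketch of the Helmert transformation (the sample values are arbitrary):

```python
import math

def helmert(xs):
    """Helmert transformation: n values -> n-1 orthonormal contrasts
    y_k = (x_1 + ... + x_k - k*x_{k+1}) / sqrt(k*(k+1))."""
    return [(sum(xs[:k]) - k * xs[k]) / math.sqrt(k * (k + 1))
            for k in range(1, len(xs))]

xs = [3.1, 4.7, 2.2, 5.9, 4.1]
xbar = sum(xs) / len(xs)
ys = helmert(xs)
ss_dev = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
ss_y = sum(y * y for y in ys)              # equals ss_dev by orthonormality
```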

Relationship to other distributions

Relationship to the gamma distribution

The chi-square distribution is a special case of the gamma distribution. If $X \sim \chi^2_n$, then

$X \sim \gamma\!\left(\tfrac{n}{2}, \tfrac{1}{2}\right)$,

i.e. a gamma distribution with shape parameter $n/2$ and rate parameter $1/2$ (equivalently, scale parameter $2$).

Relationship to normal distribution

Quantiles of a normal distribution and a chi-square distribution
  • Let $Z_1, \dots, Z_n$ be independent and standard normally distributed random variables; then their sum of squares is chi-square distributed with $n$ degrees of freedom:
$Z_1^2 + \dots + Z_n^2 \sim \chi^2_n$.
  • For large $n$, $\sqrt{2\chi^2_n} - \sqrt{2n-1}$ is approximately standard normally distributed (Fisher's approximation).
  • For large $n$, the random variable $\chi^2_n$ is approximately normally distributed, with expected value $n$ and standard deviation $\sqrt{2n}$; in the case of a non-central chi-square distribution, with expected value $n + \lambda$ and standard deviation $\sqrt{2n + 4\lambda}$.

Relationship to the exponential distribution

A chi-square distribution with 2 degrees of freedom is an exponential distribution with the parameter $\lambda = \tfrac{1}{2}$.
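A quick deterministic check that the two densities coincide (helper names are illustrative):

```python
import math

def chi2_pdf(x, n):
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def exp_pdf(x, lam):
    return lam * math.exp(-lam * x)

# chi-square with 2 degrees of freedom vs exponential with rate 1/2:
checks = [(chi2_pdf(x, 2), exp_pdf(x, 0.5)) for x in (0.5, 1.0, 3.0, 7.0)]
```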

Relationship to the Erlang distribution

A chi-square distribution with $2n$ degrees of freedom is identical to an Erlang distribution with $n$ degrees of freedom and $\lambda = \tfrac{1}{2}$.

Relationship to the F distribution

Let $\chi^2_m$ and $\chi^2_n$ be independent chi-square distributed random variables with $m$ and $n$ degrees of freedom; then the quotient

$F = \dfrac{\chi^2_m / m}{\chi^2_n / n}$

is F-distributed with $m$ numerator degrees of freedom and $n$ denominator degrees of freedom.

Relationship to the Poisson distribution

The distribution functions of the Poisson distribution and the chi-square distribution are related in the following way:

The probability of finding $n$ or more events in an interval within which one expects $\lambda$ events on average equals the probability that the value of $\chi^2_{2n}$ is $\leq 2\lambda$. It holds that

$1 - F_{\text{Poisson}(\lambda)}(n-1) = P(n, \lambda) = 1 - Q(n, \lambda) = F_{\chi^2_{2n}}(2\lambda)$,

with $P$ and $Q = 1 - P$ as the regularized (lower and upper) incomplete gamma functions.
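The identity can be checked exactly with the elementary formulas for the two distribution functions; a sketch for the case $n = 3$, $\lambda = 4.5$ (values chosen for illustration):

```python
import math

def poisson_sf(n, lam):
    """P(N >= n) for N ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n))

def chi2_cdf_even(x, dof):
    """Elementary chi-square CDF for even dof."""
    s = sum((x / 2) ** j / math.factorial(j) for j in range(dof // 2))
    return 1.0 - math.exp(-x / 2) * s

lam, n = 4.5, 3
lhs = poisson_sf(n, lam)             # P(N >= 3) with mean 4.5
rhs = chi2_cdf_even(2 * lam, 2 * n)  # P(chi2 with 6 dof <= 9)
```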

Relationship to the continuous uniform distribution

For even $n = 2m$, the $\chi^2$ distribution can be formed as an $m$-fold convolution with the help of the continuous uniform density:

$\chi^2_{2m} = -2 \ln\!\left(\prod_{i=1}^{m} u_i\right) = -2 \sum_{i=1}^{m} \ln u_i$,

where the $u_i$ are independent random variables uniformly distributed on $(0, 1)$.

For odd $n$, on the other hand,

$\chi^2_n = \chi^2_{n-1} + Z^2$, with $Z \sim \mathcal{N}(0,1)$ independent of $\chi^2_{n-1}$.
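The convolution representation doubles as a sampling recipe; a Monte Carlo sketch for $2m = 6$ degrees of freedom (sample size and tolerances chosen for illustration):

```python
import math
import random

random.seed(2)
m, N = 3, 100_000  # chi-square with 2m = 6 degrees of freedom
samples = [-2.0 * math.log(math.prod(random.random() for _ in range(m)))
           for _ in range(N)]

mean = sum(samples) / N                          # expected value: 2m = 6
var = sum((s - mean) ** 2 for s in samples) / N  # variance: 2 * (2m) = 12
```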

Derivation of the density function

The density of the random variable $\chi^2_n = Z_1^2 + \dots + Z_n^2$, with $Z_i$ independent and standard normally distributed, results from the joint density of the random variables $Z_1, \dots, Z_n$. This joint density is the $n$-fold product of the standard normal density:

$f(z_1, \dots, z_n) = \left(\dfrac{1}{\sqrt{2\pi}}\right)^{n} \exp\!\left(-\dfrac{z_1^2 + \dots + z_n^2}{2}\right).$

The following applies to the density sought:

$f_{\chi^2}(z)\,\mathrm{d}z = \int\limits_{z \,\leq\, \sum z_i^2 \,\leq\, z + \mathrm{d}z} \left(\dfrac{1}{\sqrt{2\pi}}\right)^{n} \exp\!\left(-\dfrac{\sum_i z_i^2}{2}\right) \mathrm{d}z_1 \cdots \mathrm{d}z_n.$

In the limit $\mathrm{d}z \to 0$, the sum in the argument of the exponential function is equal to $z$. One can show that the integrand may then be pulled in front of the integral:

$f_{\chi^2}(z)\,\mathrm{d}z = \left(\dfrac{1}{\sqrt{2\pi}}\right)^{n} e^{-z/2} \int\limits_{z \,\leq\, \sum z_i^2 \,\leq\, z + \mathrm{d}z} \mathrm{d}z_1 \cdots \mathrm{d}z_n.$

The remaining integral

$V_{\text{shell}} = V_n(\sqrt{z + \mathrm{d}z}) - V_n(\sqrt{z})$

corresponds to the volume of the shell between the sphere with radius $\sqrt{z + \mathrm{d}z}$ and the sphere with radius $\sqrt{z}$,

where $V_n(R) = \dfrac{\pi^{n/2} R^n}{\Gamma\!\left(\frac{n}{2} + 1\right)}$ indicates the volume of the $n$-dimensional sphere with radius $R$.

It follows:

$V_{\text{shell}} \approx \dfrac{\mathrm{d}V_n}{\mathrm{d}z}\,\mathrm{d}z = \dfrac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2} + 1\right)} \cdot \dfrac{n}{2}\, z^{n/2-1}\,\mathrm{d}z,$

and after insertion into the expression for the required density: $f_{\chi^2}(z) = \dfrac{z^{n/2-1}\, e^{-z/2}}{2^{n/2}\, \Gamma\!\left(\frac{n}{2}\right)}$.

Quantile function

The quantile function $x_p$ of the chi-square distribution is the solution to the equation $p = P\!\left(\tfrac{n}{2}, \tfrac{x_p}{2}\right)$ and can therefore in principle be calculated using the inverse function. Specifically, here

$x_p = 2\, P^{-1}\!\left(\tfrac{n}{2}, p\right),$

with $P^{-1}$ as the inverse of the regularized incomplete gamma function. This value is entered in the quantile table under the coordinates $p$ and $n$.

Quantile function for small sample sizes

For a few values of $n$ the quantile function can alternatively be given in closed form:

$n = 1$: $x_p = 2\left(\operatorname{erf}^{-1}(p)\right)^2$,
$n = 2$: $x_p = -2 \ln(1 - p)$,
$n = 4$: $x_p = -2\left(1 + W_{-1}\!\left(\dfrac{p-1}{e}\right)\right)$,

where $\operatorname{erf}^{-1}$ denotes the inverse error function, $W_{-1}$ the lower branch of the Lambert W function, and $e$ Euler's number.
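The three closed forms can be evaluated with simple bisection-based inverses for the error function and the Lambert W function (illustrative, not optimized):

```python
import math

def erfinv(y):
    """Inverse error function on [0, 1) by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def lambert_w_lower(y):
    """Lower branch W_{-1}: solve w*exp(w) = y for w <= -1, -1/e <= y < 0.
    On (-inf, -1] the map w -> w*exp(w) is decreasing, so bisection works."""
    lo, hi = -700.0, -1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * math.exp(mid) < y:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def chi2_quantile_closed(p, n):
    if n == 1:
        return 2.0 * erfinv(p) ** 2
    if n == 2:
        return -2.0 * math.log(1.0 - p)
    if n == 4:
        return -2.0 * (1.0 + lambert_w_lower((p - 1.0) / math.e))
    raise ValueError("closed form implemented only for n in {1, 2, 4}")
```

For $p = 0.95$ these reproduce the familiar table values 3.841, 5.991 and 9.488.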

Approximation of the quantile function for fixed probabilities

For certain fixed probabilities $p$, the associated quantiles $x_p$ can be approximated by a simple function of the sample size $n$ with the parameters $a$, $b$, $c$ from the following table, where $\operatorname{sgn}$ denotes the signum function, which simply represents the sign of its argument:

p:  0.005   0.01    0.025   0.05    0.1     0.5     0.9     0.95    0.975   0.99    0.995
a:  −3.643  −3.298  −2.787  −2.34   −1.83   0       1.82    2.34    2.78    3.29    3.63
b:  1.8947  1.327   0.6     0.082   −0.348  −0.67   −0.58   −0.15   0.43    1.3     2
c:  −2.14   −1.46   −0.69   −0.24   0       0.104   −0.34   −0.4    −0.4    −0.3    0

The comparison with a $\chi^2$ table shows a relative error below 0.4 %, and for larger $n$ below 0.1 %. Since the $\chi^2$ distribution passes into a normal distribution with standard deviation $\sqrt{2n}$ for large $n$, the parameter $a$ from the table, which was fitted freely here, has at the corresponding probability $p$ approximately the size of $\sqrt{2}$ times the quantile $\sqrt{2}\,\operatorname{erf}^{-1}(2p-1)$ of the standard normal distribution, where $\operatorname{erf}^{-1}$ denotes the inverse function of the error function.

In this way, for example, the 95 % confidence interval for the population variance from the Example section can be represented graphically in a simple manner as a function of $n$, using the two functions from the rows with $p = 0.025$ and $p = 0.975$.

The median is found in the column of the table with $p = 0.5$.

Literature

  • Joachim Hartung, Bärbel Elpelt, Karl-Heinz Klösener: Statistics. 12th edition. Oldenbourg, 1999, ISBN 3-486-24984-3, pp. 152 ff.


References

  1. R. Barlow: Statistics. Wiley, 1989, p. 152 (Goodness of Fit).
  2. Kendall, Stuart: The Advanced Theory of Statistics. Vol. 2, 3rd edition, London 1973, p. 436 (Goodness of Fit).
  3. F. R. Helmert. In: Zeitschrift für Mathematik und Physik 21, 1876, pp. 102-219. Karl Pearson: On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is such that it Can Reasonably Be Supposed to have Arisen from Random Sampling. In: Philosophical Magazine 5, Volume 50, 1900, pp. 157-175. Quoted from L. Butterer: Mathematical Statistics. Springer, Vienna 1966, p. 93.
  4. George G. Judge, R. Carter Hill, W. Griffiths, Helmut Lütkepohl, T. C. Lee: Introduction to the Theory and Practice of Econometrics. 2nd edition. John Wiley & Sons, New York / Chichester / Brisbane / Toronto / Singapore 1988, ISBN 0-471-62414-4, p. 51.
  5. Wolfram MathWorld
  6. A. C. Davison: Statistical Models. Cambridge University Press, 2008, ISBN 1-4672-0331-9, Chapter 3.2.
  7. Albert H. Nuttall: Some Integrals Involving the Q_M Function. In: IEEE Transactions on Information Theory. No. 21, 1975, pp. 95-96, doi:10.1109/TIT.1975.1055327.
  8. Helmert. In: Astronomische Nachrichten, 88, 1876, pp. 113-132.
  9. George G. Judge, R. Carter Hill, W. Griffiths, Helmut Lütkepohl, T. C. Lee: Introduction to the Theory and Practice of Econometrics. 2nd edition. John Wiley & Sons, New York / Chichester / Brisbane / Toronto / Singapore 1988, ISBN 0-471-62414-4, p. 51.
  9. George G. Judge, R. Carter Hill, W. Griffiths, Helmut Lütkepohl , TC Lee. Introduction to the Theory and Practice of Econometrics. 2nd Edition. John Wiley & Sons, New York / Chichester / Brisbane / Toronto / Singapore 1988, ISBN 0-471-62414-4 , p. 51.