Normal distribution model

In statistics, the normal distribution model or Gaussian product model is a statistical model characterized by simple model assumptions: on the one hand, the data are assumed to be stochastically independent of one another; on the other hand, each data point is assumed to be normally distributed, with one or two unknown parameters depending on the specification.

The importance of the normal distribution model stems both from the fact that it is a very well studied model for which good parameter estimators, confidence intervals, and tests are available, and from the special position of the normal distribution, which by the central limit theorem arises whenever many independent random influences are superimposed.

Three cases can be distinguished:

  • One starts from a known expected value of the normal distributions and tries to make statements about the variance. An example of this would be the calibration of a scale with a specified, standardized weight.
  • One starts from a known variance of the normal distributions and tries to make statements about the expected value. This case would arise, for example, in the case of a measurement with a measuring instrument of known inaccuracy that is specified by the manufacturer.
  • Both the variance and the expected value are unknown. An example of this case would be estimating the shoe size of men: it is neither clear what shoe size a man has “on average”, nor how much shoe sizes vary.

Different methods are available for each of the three cases.

Expected value known and variance unknown

If the expected value is known and the variance is unknown, the setup is formalized as follows: the statistical model is given by

$(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), (\mathcal{N}(\mu, \sigma^2)^{\otimes n})_{\sigma^2 \in (0,\infty)})$,

where the distribution class is defined more precisely as

$\{\mathcal{N}(\mu, \sigma^2)^{\otimes n} \mid \sigma^2 \in (0, \infty)\}$.

Here $\mu$ denotes the known expected value, and $\mathcal{N}(\mu, \sigma^2)^{\otimes n}$ denotes the $n$-fold product measure of the probability measure $\mathcal{N}(\mu, \sigma^2)$. The model is consequently a one-parameter model and a product model. The distribution class belongs to the one-parameter exponential family, because the probability density of the normal distribution can be written as

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \exp\bigl(\eta(\sigma^2)\, T(x) - A(\sigma^2)\bigr)$

with $T(x) = (x-\mu)^2$ and $\eta(\sigma^2) = -\frac{1}{2\sigma^2}$.

This gives the representation of the probability density over the entire space

$f(x_1, \ldots, x_n) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right)$.

The unknown variance is to be estimated; the parameter function to be estimated is thus given by

$\gamma(\sigma^2) = \sigma^2$.

Parameter estimation

Both the maximum likelihood method and the method of moments yield as an estimator for the unknown variance

$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2$.

This estimator is unbiased, since the known expected value $\mu$ is used in place of an estimated mean. Its sufficiency follows from the representation of the normal distribution as a member of the exponential family and the corresponding canonical statistic. In addition, the estimator is complete and therefore, by the Lehmann–Scheffé theorem, a uniformly best unbiased estimator.
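
A minimal Python sketch of this estimator, using simulated data with made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
mu = 5.0                                        # known expected value
data = rng.normal(loc=mu, scale=2.0, size=100)  # simulated sample

# ML / moment estimator of the variance with known mean:
# average squared deviation from mu, dividing by n
sigma2_hat = np.mean((data - mu) ** 2)
print(sigma2_hat)
```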

Confidence intervals

Confidence intervals for the unknown variance are based on the pivot statistic

$\chi^2 = \frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu)^2$.

It is chi-square distributed with $n$ degrees of freedom, i.e. $\chi^2 \sim \chi^2_n$ for every $\sigma^2 \in (0, \infty)$. A two-sided confidence interval at confidence level $1 - \alpha$ is thus given by

$\left[ \frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,\, 1-\alpha/2}}, \; \frac{\sum_{i=1}^n (X_i - \mu)^2}{\chi^2_{n,\, \alpha/2}} \right]$.

Here $\chi^2_{n,\,p}$ denotes the $p$-quantile of the chi-square distribution with $n$ degrees of freedom. Concrete values of the quantiles can be looked up in the quantile table of the chi-square distribution.
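
As an illustration, a Python sketch of this interval using SciPy's chi-square quantile function (the data and parameters are simulated for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
mu, alpha = 5.0, 0.05                            # known mean, 95% confidence
data = rng.normal(loc=mu, scale=2.0, size=100)

n = data.size
q = np.sum((data - mu) ** 2)                     # numerator of the pivot
lower = q / stats.chi2.ppf(1 - alpha / 2, df=n)  # divide by the upper quantile
upper = q / stats.chi2.ppf(alpha / 2, df=n)      # divide by the lower quantile
print((lower, upper))                            # two-sided CI for the variance
```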

Testing

For one-sample problems there is the chi-square test for checking a variance; for two-sample problems, the F-test for comparing two variances.

Variance known and expected value unknown

If the variance is known and the expected value is unknown, the setup is formalized as follows: the statistical model is given by

$(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), (\mathcal{N}(\mu, \sigma^2)^{\otimes n})_{\mu \in \mathbb{R}})$,

where the distribution class is defined more precisely as

$\{\mathcal{N}(\mu, \sigma^2)^{\otimes n} \mid \mu \in \mathbb{R}\}$.

Here $\sigma^2$ denotes the known variance. The model is consequently a one-parameter model and a product model. The distribution class again belongs to the one-parameter exponential family, because the probability density of the normal distribution can be written as

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) \exp\left(\frac{\mu}{\sigma^2} x - \frac{\mu^2}{2\sigma^2}\right) = h(x) \exp\bigl(\eta(\mu)\, T(x) - A(\mu)\bigr)$

with $T(x) = x$ and $\eta(\mu) = \frac{\mu}{\sigma^2}$.

This gives the representation of the probability density over the entire space

$f(x_1, \ldots, x_n) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right)$.

The unknown expected value is to be estimated; the parameter function to be estimated is thus given by

$\gamma(\mu) = \mu$.

Parameter estimation

Both the maximum likelihood method and the method of moments yield the sample mean

$\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$

as an estimator for the expected value of the sample. The maximum likelihood estimator follows, for example, by maximizing the log-likelihood function; the moment estimator follows directly from the fact that the arithmetic mean is the first empirical moment and is used to estimate the first stochastic moment, the expected value.

The estimator is unbiased. Since it is also the canonical statistic of the exponential family, it is sufficient. In addition, the estimator is complete and therefore, by the Lehmann–Scheffé theorem, a uniformly best unbiased estimator.
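
The claim that maximizing the log-likelihood reproduces the arithmetic mean can be checked numerically; a Python sketch with simulated data and made-up parameter values:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(seed=0)
sigma = 2.0                                   # known standard deviation
data = rng.normal(loc=5.0, scale=sigma, size=100)

# Negative log-likelihood as a function of the unknown mean mu
def neg_log_likelihood(mu):
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))

result = optimize.minimize_scalar(neg_log_likelihood)
print(result.x, data.mean())                  # both values coincide
```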

Confidence intervals

Confidence intervals with known variance are based on the pivot statistic

$Z = \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma}$.

It is standard normally distributed, i.e. $Z \sim \mathcal{N}(0, 1)$ for every $\mu \in \mathbb{R}$.

Let $z_p$ denote the $p$-quantile of the standard normal distribution; it can be found in the quantile table of the standard normal distribution. Then a right-sided unbounded confidence interval for the unknown expected value at confidence level $1 - \alpha$ is given by

$\left[ \bar{X} - \frac{\sigma}{\sqrt{n}}\, z_{1-\alpha}, \; \infty \right)$.

Similarly, a left-sided unbounded confidence interval for the unknown expected value at confidence level $1 - \alpha$ is given by

$\left( -\infty, \; \bar{X} + \frac{\sigma}{\sqrt{n}}\, z_{1-\alpha} \right]$.

A two-sided confidence interval at confidence level $1 - \alpha$ is given by

$\left[ \bar{X} - \frac{\sigma}{\sqrt{n}}\, z_{1-\alpha/2}, \; \bar{X} + \frac{\sigma}{\sqrt{n}}\, z_{1-\alpha/2} \right]$.
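
A Python sketch of the two-sided interval (simulated data, made-up parameters):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
sigma, alpha = 2.0, 0.05                 # known std. deviation, 95% confidence
data = rng.normal(loc=5.0, scale=sigma, size=100)

xbar, n = data.mean(), data.size
z = stats.norm.ppf(1 - alpha / 2)        # standard normal quantile
half_width = z * sigma / np.sqrt(n)
print((xbar - half_width, xbar + half_width))  # two-sided CI for the mean
```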

Testing

For one-sample problems there are the one-sample z-test and the one-sample t-test; for two-sample problems there is the two-sample z-test.

Variance and expected value unknown

If both the expected value and the variance are unknown, the setup is formalized as follows: the statistical model is given by

$(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), (\mathcal{N}(\mu, \sigma^2)^{\otimes n})_{(\mu, \sigma^2) \in \mathbb{R} \times (0, \infty)})$,

where the distribution class is defined more precisely as

$\{\mathcal{N}(\mu, \sigma^2)^{\otimes n} \mid \mu \in \mathbb{R},\, \sigma^2 \in (0, \infty)\}$.

This is then a two-parameter model and a product model. The distribution class belongs to the two-parameter exponential family, since the probability density of the normal distribution can be written as

$f(x) = \exp\bigl(\eta_1(\mu, \sigma^2)\, T_1(x) + \eta_2(\mu, \sigma^2)\, T_2(x) - A(\mu, \sigma^2)\bigr)$

with $T(x) = (x, x^2)$ and $\eta(\mu, \sigma^2) = \left( \frac{\mu}{\sigma^2},\, -\frac{1}{2\sigma^2} \right)$.

The expected value and the variance are to be estimated; the parameter functions to be estimated are thus given by

$\gamma_1(\mu, \sigma^2) = \mu$ and $\gamma_2(\mu, \sigma^2) = \sigma^2$.

Parameter estimation

Both the maximum likelihood method and the method of moments yield the sample mean

$\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$

as an estimator for the unknown expected value. This estimator is unbiased.

Both the maximum likelihood method and the method of moments yield the (uncorrected) sample variance

$\tilde{s}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2$

as an estimator for the unknown variance. It is not unbiased, only asymptotically unbiased. The Bessel correction is therefore applied, which yields the corrected sample variance

$s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$.

It is an unbiased estimator for the unknown variance.
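
A short Python sketch contrasting the two estimators on simulated data; NumPy's ddof argument switches between the uncorrected and the Bessel-corrected version:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=5.0, scale=2.0, size=100)

var_uncorrected = data.var(ddof=0)  # divides by n   (biased)
var_corrected = data.var(ddof=1)    # divides by n-1 (Bessel correction, unbiased)
print(var_uncorrected, var_corrected)
```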

Confidence intervals

Confidence intervals for the expected value, i.e. for $\gamma_1(\mu, \sigma^2) = \mu$, are based in this model on the pivot statistic

$T = \frac{\sqrt{n}\,(\bar{X} - \mu)}{s}$,

where

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2}$.

This yields the one-sided confidence interval for the expected value at confidence level $1 - \alpha$

$\left[ \bar{X} - \frac{s}{\sqrt{n}}\, t_{n-1,\, 1-\alpha}, \; \infty \right)$,

and the two-sided confidence interval for the expected value at confidence level $1 - \alpha$

$\left[ \bar{X} - \frac{s}{\sqrt{n}}\, t_{n-1,\, 1-\alpha/2}, \; \bar{X} + \frac{s}{\sqrt{n}}\, t_{n-1,\, 1-\alpha/2} \right]$.

Here $t_{n-1,\,p}$ denotes the $p$-quantile of Student's t-distribution with $n - 1$ degrees of freedom. Concrete values of the quantiles can be looked up in the quantile table of Student's t-distribution.
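
A Python sketch of the two-sided t-interval (simulated data, made-up parameters):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
alpha = 0.05
data = rng.normal(loc=5.0, scale=2.0, size=100)

xbar, s, n = data.mean(), data.std(ddof=1), data.size
t = stats.t.ppf(1 - alpha / 2, df=n - 1)       # t quantile with n-1 df
half_width = t * s / np.sqrt(n)
print((xbar - half_width, xbar + half_width))  # two-sided CI for the mean
```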

Confidence intervals for the variance, i.e. for $\gamma_2(\mu, \sigma^2) = \sigma^2$, are based on the pivot statistic

$\chi^2 = \frac{(n-1)\, s^2}{\sigma^2}$.

It is chi-square distributed with $n - 1$ degrees of freedom and provides the one-sided confidence interval for the variance at confidence level $1 - \alpha$

$\left( 0, \; \frac{(n-1)\, s^2}{\chi^2_{n-1,\, \alpha}} \right]$,

and the two-sided confidence interval for the variance at confidence level $1 - \alpha$

$\left[ \frac{(n-1)\, s^2}{\chi^2_{n-1,\, 1-\alpha/2}}, \; \frac{(n-1)\, s^2}{\chi^2_{n-1,\, \alpha/2}} \right]$.

Here $\chi^2_{n-1,\,p}$ denotes the $p$-quantile of the chi-square distribution with $n - 1$ degrees of freedom. Concrete values of the quantiles can be looked up in the quantile table of the chi-square distribution.
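
The corresponding computation in Python (simulated data, made-up parameters):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
alpha = 0.05
data = rng.normal(loc=5.0, scale=2.0, size=100)

n = data.size
q = (n - 1) * data.var(ddof=1)                      # (n-1) * corrected variance
lower = q / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = q / stats.chi2.ppf(alpha / 2, df=n - 1)
print((lower, upper))                               # two-sided CI for the variance
```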

Testing

For one-sample problems there is the chi-square test for the variance to check a single variance. For two-sample problems there is the F-test for comparing two variances; for the expected value, see the Behrens–Fisher problem.
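
SciPy has no built-in one-sample variance test, so the test statistic can be computed directly from its definition; a sketch under a hypothetical null variance:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
data = rng.normal(loc=5.0, scale=2.0, size=100)

sigma2_0, n = 4.0, data.size                  # null hypothesis: sigma^2 = 4
chi2_stat = (n - 1) * data.var(ddof=1) / sigma2_0
cdf = stats.chi2.cdf(chi2_stat, df=n - 1)
p_value = 2 * min(cdf, 1 - cdf)               # two-sided p-value
print(chi2_stat, p_value)
```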
