# Unbiasedness

In mathematical statistics, unbiasedness (German: *Erwartungstreue*) denotes a property of an estimating function (in short: an estimator). An estimator is called unbiased if its expected value equals the true value of the parameter to be estimated. If an estimator is not unbiased, it is called biased. The extent to which its expected value deviates from the true value is called the bias (or distortion); the bias expresses the systematic error of the estimator.

Along with consistency, sufficiency and (asymptotic) efficiency, unbiasedness is one of the four common criteria for assessing the quality of estimators. Furthermore, together with sufficiency and invariance/equivariance, it belongs to the typical reduction principles of mathematical statistics.

## Significance

Unbiasedness is an important property of an estimator because the variance of most estimators converges to zero as the sample size increases. In other words, the distribution concentrates around the expected value of the estimator, and thus, for an unbiased estimator, around the desired true parameter of the population. With unbiased estimators one can expect that the larger the sample size, the smaller the difference between the estimate computed from the sample and the true parameter.

Beyond the practical assessment of estimator quality, the concept of unbiasedness is also of great importance for mathematical estimation theory. In the class of all unbiased estimators it is possible, under suitable conditions on the underlying distribution model, to prove the existence and uniqueness of best estimators: unbiased estimators that have minimal variance among all unbiased estimators.

## Basic idea and introductory examples

To estimate an unknown real parameter $\gamma$ of a population, mathematical statistics computes an estimate $g(X_{1}, \dotsc, X_{n})$ from a random sample $X_{1}, \dotsc, X_{n}$ by means of a suitably chosen function $g$. In general, suitable estimating functions can be obtained by estimation methods, e.g. the maximum likelihood method.

Since the sample variables $X_{1}, \dotsc, X_{n}$ are random variables, the estimator $g(X_{1}, \dotsc, X_{n})$ is itself a random variable. It is called unbiased if the expected value of this random variable always equals the parameter $\gamma$, regardless of the actual value of $\gamma$.

### Example: sample mean

To estimate the expected value $\gamma = \mu$ of the population, the sample mean

$${\overline{X}}_{n} = g(X_{1}, \dotsc, X_{n}) = \frac{1}{n} \sum_{i=1}^{n} X_{i}$$

is usually used. If all sample variables $X_{i}$ are drawn randomly from the population, then they all have expected value $\operatorname{E}(X_{i}) = \mu$. The expected value of the sample mean is thus

$$\operatorname{E}({\overline{X}}_{n}) = \operatorname{E}\left(\frac{1}{n} \sum_{i=1}^{n} X_{i}\right) = \frac{1}{n} \sum_{i=1}^{n} \operatorname{E}(X_{i}) = \frac{1}{n} \cdot n \cdot \mu = \mu.$$

The sample mean is therefore an unbiased estimator of the unknown distribution parameter $\mu$.

*(Figure: distribution of the estimator ${\overline{X}}_{n}$ for different sample sizes $n$.)*

If the population is normally distributed with expected value $\mu$ and variance $\sigma^{2}$, then the distribution of ${\overline{X}}_{n}$ can be specified exactly. In this case

$${\overline{X}}_{n} \sim {\mathcal{N}}(\mu, \sigma^{2}/n),$$

that is, the sample mean is also normally distributed, with expected value $\mu$ and variance $\tfrac{\sigma^{2}}{n}$. If the sample size $n$ is large, this distributional statement holds at least approximately by the central limit theorem, even if the population is not normally distributed. The variance of this estimator thus converges to 0 as the sample size tends to infinity. The figure shows how the distribution of the sample mean contracts to a fixed value for increasing sample sizes $n$. Unbiasedness ensures that this value is the sought parameter $\mu$.
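Both statements above (the sample mean is centered at $\mu$, and its spread shrinks like $\sigma/\sqrt{n}$) can be checked by simulation. The following sketch is illustrative only; the concrete numbers $\mu = 3$, $\sigma = 1$, $n = 25$ are arbitrary choices, not taken from the text.

```python
import numpy as np

# Draw many samples of size n from N(mu, sigma^2) and look at the
# distribution of the sample means.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 1.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)          # one sample mean per replication

print(means.mean())   # close to mu (unbiasedness)
print(means.std())    # close to sigma / sqrt(n) = 0.2
```

Increasing `n` makes the empirical standard deviation of `means` shrink further, which is the contraction shown in the figure.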

### Example relative frequency

To estimate the probability $p$ with which a certain characteristic occurs in the population, a sample of size $n$ is selected at random and the absolute frequency $X$ of the characteristic in the sample is counted. The random variable $X$ is then binomially distributed with parameters $p$ and $n$; in particular, its expected value is $\operatorname{E}(X) = np$. For the relative frequency

$$h_{n} = \frac{X}{n}$$

it then follows that $\operatorname{E}(h_{n}) = \tfrac{1}{n}\operatorname{E}(X) = \tfrac{np}{n} = p$, that is, $h_{n}$ is an unbiased estimator of the unknown probability $p$.
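The identity $\operatorname{E}(h_{n}) = p$ can also be verified exactly by summing over the binomial probability mass function. The following snippet is an illustrative check (the function name and the values of $n$ and $p$ are chosen here, not from the text).

```python
from math import comb

def expected_relative_frequency(n: int, p: float) -> float:
    # E[X/n] for X ~ Bin(n, p), computed as sum_k (k/n) * P(X = k).
    return sum((k / n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

# The expectation equals p for every choice of p.
for p in (0.1, 0.5, 0.9):
    assert abs(expected_relative_frequency(20, p) - p) < 1e-12
```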

## Definition

In modern, measure-theoretically founded mathematical statistics, a statistical experiment is described by a statistical model $({\mathcal{X}}, {\mathcal{F}}, (P_{\vartheta})_{\vartheta \in \Theta})$. This consists of a set ${\mathcal{X}}$, the sample space, together with a σ-algebra ${\mathcal{F}}$ and a family of probability measures $(P_{\vartheta})_{\vartheta \in \Theta}$ on ${\mathcal{X}}$.

Let a point estimator

$$T \colon {\mathcal{X}} \to \mathbb{R}$$

as well as a function

$$\gamma \colon \Theta \to \mathbb{R}$$

be given (in the parametric case the so-called parameter function), which assigns to each probability distribution $P_{\vartheta}$ the quantity $\gamma(\vartheta)$ to be estimated (variance, median, expected value, etc.).

Then the estimator $T$ is called unbiased if

$$\operatorname{E}_{\vartheta}(T) = \gamma(\vartheta) \quad \text{for all } \vartheta \in \Theta.$$

Here $\operatorname{E}_{\vartheta}$ denotes the expected value with respect to the probability measure $P_{\vartheta}$.

In applications, $P_{\vartheta}$ is often the distribution of a (real- or vector-valued) random variable $X \colon \Omega \to {\mathcal{X}}$ on a probability space $(\Omega, \Sigma, Q)$ with an unknown parameter or parameter vector $\vartheta$. A point estimator $g \colon {\mathcal{X}} \to \mathbb{R}$ for $\gamma(\vartheta)$ in the above sense then yields a function $T = g \circ X$, and this is called an unbiased estimator if

$$\operatorname{E}(T) = \operatorname{E}(g(X)) = \gamma(\vartheta),$$

where the expected value is now taken with respect to $Q$.

## Properties

### Existence

Unbiased estimators need not exist in general. The choice of the function $g(\vartheta)$ to be estimated is essential here. If it is not chosen appropriately, the set of unbiased estimators can be small, have nonsensical properties, or be empty.

In the binomial model

$$X = \{0, 1, \dots, n\}, \; {\mathcal{A}} = {\mathcal{P}}(X), \; P_{\vartheta} = \operatorname{Bin}_{n, \vartheta} \quad \text{for all } \vartheta \in [0, 1],$$

for example, only polynomials in $\vartheta$ of degree less than or equal to $n$ can be estimated unbiasedly. For functions to be estimated that are not of the form

$$g(\vartheta) = a_{n}\vartheta^{n} + a_{n-1}\vartheta^{n-1} + \dots + a_{1}\vartheta + a_{0},$$

there is thus no unbiased estimator.

Even if an unbiased estimator exists, it does not have to be a practically meaningful estimator: for example in the Poisson model

$$X = \mathbb{N}, \; {\mathcal{A}} = {\mathcal{P}}(\mathbb{N}), \; P_{\vartheta} = \operatorname{Poi}_{\vartheta} \quad \text{for all } \vartheta \in (0, \infty)$$

and when using the function to be estimated

$$g(\vartheta) = \exp(-3\vartheta),$$

the only unbiased estimator turns out to be

$$T(k) = (-2)^{k} \quad \text{for } k \in \mathbb{N}.$$

Obviously, this estimator is nonsensical. Note that the choice of the function to be estimated is not exotic: it estimates the probability that no event occurs in three consecutive periods (with independent repetition).
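That $T(k) = (-2)^{k}$ is indeed unbiased for $\exp(-3\vartheta)$ follows from $\operatorname{E}(T) = \sum_{k} (-2)^{k} e^{-\vartheta} \vartheta^{k}/k! = e^{-\vartheta} e^{-2\vartheta}$. The following numerical check is illustrative (function name and parameter value are chosen here); the terms are accumulated iteratively to avoid overflowing factorials.

```python
from math import exp

def expectation_T(lam: float, terms: int = 200) -> float:
    # E[(-2)^X] for X ~ Poisson(lam): sum_k (-2)^k * e^{-lam} * lam^k / k!,
    # built term by term via term_{k+1} = term_k * (-2*lam) / (k+1).
    total, term = 0.0, exp(-lam)
    for k in range(terms):
        total += term
        term *= (-2.0 * lam) / (k + 1)
    return total

lam = 1.0
# The expectation matches exp(-3*lam), even though the estimator itself
# oscillates wildly between large positive and negative values.
assert abs(expectation_T(lam) - exp(-3 * lam)) < 1e-9
```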

### Structure

Given a fixed statistical model, let $D_{g}$ be the set of unbiased estimators for the function $g$ to be estimated and $D_{0}$ the set of all null estimators, that is,

$$D_{0} = \{T \mid \operatorname{E}_{\vartheta}(T) = 0 \quad \text{for all } \vartheta \in \Theta\}.$$

If one now selects a $T \in D_{g}$, then

$$D_{g} = T + D_{0}.$$

The set of all unbiased estimators for $g$ thus arises from one unbiased estimator for $g$ combined with the null estimators.

### Relationship to bias and mean squared error

Unbiased estimators, by definition, have a bias of zero:

$$\operatorname{Bias}_{\vartheta}(T) := \operatorname{E}_{\vartheta}(T) - g(\vartheta) = 0 \quad \text{for all } \vartheta \in \Theta.$$

This reduces the mean squared error to the variance of the estimator:

$$\operatorname{MSE}(T, \vartheta) := \operatorname{Var}_{\vartheta}(T) + \left(\operatorname{Bias}_{\vartheta}(T)\right)^{2} = \operatorname{Var}_{\vartheta}(T).$$

### Optimality

Unbiasedness is a quality criterion in itself, since unbiased estimators always have a bias of zero and thus deliver the value to be estimated on average; they have no systematic error. In the set of unbiased estimators, the central quality criterion for estimators, the mean squared error, reduces to the variance of the estimators. Accordingly, the two common optimality criteria compare the variances of point estimators.

• Locally minimal estimators compare the variances of point estimators for a given $\vartheta_{0} \in \Theta$. An unbiased estimator $S$ is called a locally minimal estimator in $\vartheta_{0}$ if
$$\operatorname{Var}_{\vartheta_{0}}(S) \leq \operatorname{Var}_{\vartheta_{0}}(T)$$
holds for all further unbiased estimators $T$.
• Uniformly best unbiased estimators tighten this requirement: the estimator $S$ should have a smaller variance than any other unbiased estimator for all $\vartheta \in \Theta$. Then
$$\operatorname{Var}_{\vartheta}(S) \leq \operatorname{Var}_{\vartheta}(T) \quad \text{for all } \vartheta \in \Theta$$
holds for all unbiased estimators $T$.

### Unbiasedness vs. mean squared error

Unbiased estimators can be viewed as "good" in two respects:

• On the one hand, their bias is always zero; accordingly, they have the desirable property of exhibiting no systematic error.
• On the other hand, due to the decomposition of the mean squared error into bias and variance, the mean squared error of an unbiased estimator reduces to its variance, since the bias term vanishes.

However, it is not always possible to achieve both goals (unbiasedness and minimal mean squared error) at the same time. For example, in the binomial model $X = \{0, \dots, n\}$, ${\mathcal{A}} = {\mathcal{P}}(X)$, $P_{\vartheta} = \operatorname{Bin}_{n, \vartheta}$ with $\vartheta \in [0, 1]$, the uniformly best unbiased estimator is given by

$$T_{1}(x) = \frac{x}{n}.$$

The estimator

$$T_{2}(x) = \frac{x+1}{n+2}$$

is not unbiased and therefore biased, but it has a smaller mean squared error for values of $\vartheta$ close to $0{.}5$.

Thus it is not always possible to minimize bias and mean squared error simultaneously.
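The trade-off can be computed exactly by summing over the binomial probability mass function. The sketch below is illustrative (function names and the choice $n = 10$ are assumptions for the demonstration, not from the text).

```python
from math import comb

def mse(estimator, n: int, p: float) -> float:
    # Exact mean squared error E[(T(X) - p)^2] for X ~ Bin(n, p).
    return sum((estimator(k) - p)**2 * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

n = 10
t1 = lambda k: k / n             # unbiased: MSE = Var = p(1-p)/n
t2 = lambda k: (k + 1) / (n + 2) # biased

# Near p = 0.5 the biased estimator T2 has the smaller MSE ...
assert mse(t2, n, 0.5) < mse(t1, n, 0.5)
# ... but near the boundary of the parameter range it is worse.
assert mse(t2, n, 0.05) > mse(t1, n, 0.05)
```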

## Biased estimators

*(Figure: an estimator violating the properties of unbiasedness and consistency. On average, the value 3 is estimated instead of the true value −2, resulting in a bias of 5: $\mathrm{Bias}({\hat{\beta}}_{1}) = \operatorname{E}({\hat{\beta}}_{1}) - \beta_{1} = 3 - (-2) = 5$.)*

It follows from the definition that "good" estimators should be at least approximately unbiased, i.e. they should be characterized by being on average close to the value to be estimated. Usually, however, unbiasedness is not the only important criterion for the quality of an estimator; it should, for example, also have a small variance, i.e. fluctuate as little as possible around the value to be estimated. In summary, the classical criterion for optimal estimators is a minimal mean squared deviation.

The bias $\mathrm{Bias}_{\vartheta}(T)$ of an estimator $T$ is defined as the difference between its expected value and the quantity to be estimated:

$$\mathrm{Bias}_{\vartheta}(T) := \operatorname{E}_{\vartheta}(T) - \gamma(\vartheta) = \operatorname{E}_{\vartheta}(T - \gamma(\vartheta)).$$

Its mean squared error $\mathrm{MSE}_{\vartheta}(T)$ is

$$\mathrm{MSE}_{\vartheta}(T) := \operatorname{E}_{\vartheta}{\bigl(}(T - \gamma(\vartheta))^{2}{\bigr)}.$$

The mean squared error equals the sum of the squared bias and the variance of the estimator:

$$\mathrm{MSE}_{\vartheta}(T) = {\bigl(}\mathrm{Bias}_{\vartheta}(T){\bigr)}^{2} + \operatorname{Var}_{\vartheta}(T).$$

In practice, bias can be tolerable if it contributes to the estimator having a smaller mean squared deviation than an unbiased one.

## Asymptotic unbiasedness

As a rule, it is not essential that an estimator be unbiased for finite samples. Most results of mathematical statistics are valid only asymptotically, i.e. as the sample size tends to infinity. It is therefore usually sufficient for a sequence of estimators $T_{n}$ to be unbiased in the limit, i.e. for the convergence statement $\lim_{n \to \infty} \operatorname{E}_{\vartheta}(T_{n}) = \gamma(\vartheta)$ to hold.

## Another example: sample variance in the normal distribution model

A typical example is given by estimators for the parameters of normal distributions. In this case, consider the parametric family

$$P_{\vartheta}, \; \vartheta \in \Theta \quad \text{with} \quad \vartheta = (\mu, \sigma^{2}) \quad \text{and} \quad \Theta = \mathbb{R} \times \mathbb{R}^{+},$$

where $P_{\vartheta}$ is the normal distribution with expected value $\mu$ and variance $\sigma^{2}$. Usually, observations $X_{1}, \dotsc, X_{n}$ are given that are stochastically independent and each have distribution $P_{\vartheta}$.

As already seen, the sample mean ${\overline{X}}_{n}$ is an unbiased estimator of $\gamma_{1}(\vartheta) = \mu$.

For the variance $\gamma_{2}(\vartheta) = \sigma^{2}$, the maximum likelihood estimator $s_{n}^{2} = \frac{1}{n} \sum_{i=1}^{n} (X_{i} - {\overline{X}}_{n})^{2}$ is obtained. However, this estimator is not unbiased, since it can be shown that $\operatorname{E}(s_{n}^{2}) = \frac{n-1}{n}\sigma^{2}$ (see sample variance (estimator)). The bias is thus $\operatorname{E}(s_{n}^{2}) - \sigma^{2} = -\frac{1}{n}\sigma^{2}$. Since this vanishes asymptotically, i.e. for $n \to \infty$, the estimator is nevertheless asymptotically unbiased.

Moreover, in this case the bias can be specified exactly and corrected by multiplying by $\tfrac{n}{n-1}$ (the so-called Bessel correction), which yields an estimator for the variance that is unbiased even for small samples.

In general, however, it is not possible to determine the expected bias exactly and thus correct it completely. But there are methods to at least reduce the bias of an asymptotically unbiased estimator for finite samples, for example the so-called jackknife method.
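The bias of $s_{n}^{2}$ and its removal by the Bessel correction can be verified exactly by enumerating all samples from a small discrete population. The following sketch is illustrative only; the two-point population $\{0, 1\}$ with $p = 0{.}5$ (so $\sigma^{2} = 0{.}25$) and $n = 3$ are chosen for the demonstration.

```python
from itertools import product

n, sigma2 = 3, 0.25  # population {0, 1}, p = 0.5, variance 0.25

def ml_var(xs):
    # ML variance estimator s_n^2 = (1/n) * sum (x_i - xbar)^2
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs) / len(xs)

# All 2^n equally likely samples of size n.
samples = list(product([0, 1], repeat=n))
e_ml = sum(ml_var(s) for s in samples) / len(samples)
e_bessel = sum(ml_var(s) * n / (n - 1) for s in samples) / len(samples)

assert abs(e_ml - (n - 1) / n * sigma2) < 1e-12  # biased: E = (n-1)/n * sigma^2
assert abs(e_bessel - sigma2) < 1e-12            # unbiased after correction
```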

## Further concepts

An unbiased estimator $T$ is called a regular unbiased estimator if

$$\frac{\partial}{\partial \vartheta} \int T(x) \cdot f_{\vartheta}(x) \, dx = \int T(x) \cdot \frac{\partial}{\partial \vartheta} f_{\vartheta}(x) \, dx$$

holds. Here $f_{\vartheta}$ denotes the density function for the parameter $\vartheta$; differentiation and integration must therefore be interchangeable. Regular unbiased estimators play an important role in the Cramér–Rao inequality.

## Generalizations

A generalization of unbiasedness is L-unbiasedness, which generalizes it by means of more general loss functions. Using the Gaussian loss, one obtains unbiasedness as a special case; using the Laplace loss, one obtains median-unbiasedness.
