# Estimation theory

In addition to test theory, estimation theory is a central area of inductive statistics. On the one hand, it deals with developing estimators for unknown parameters of a population; on the other hand, it seeks to make quality statements about the estimators thus developed.

## Basic modeling

Estimation theory is based on a statistical model $(\mathcal{X}, \mathcal{A}, (P_\vartheta)_{\vartheta \in \Theta})$. It contains

• $\mathcal{X}$, the set of all possible values that the sample can assume,
• $\mathcal{A}$, the collection of all sets to which one wants to assign a probability,
• $(P_\vartheta)_{\vartheta \in \Theta}$, all probability measures on $(\mathcal{X}, \mathcal{A})$ that one deems relevant or possible.

Furthermore, there is a function

$g \colon \Theta \to E$

which assigns to each probability measure $P_\vartheta$, via its index $\vartheta$, the value to be estimated, for example a distribution parameter or a variable from which such a parameter can be calculated. Usually this is the expected value, the variance, or the median; then $E = \mathbb{R}$. In the case of a parametric statistical model, this function is called a parameter function.

A point estimator, or simply estimator, is then a function

$T \colon (\mathcal{X}, \mathcal{A}) \to (E, \mathcal{E})$

for a decision space $(E, \mathcal{E})$. It assigns to each sample $x \in \mathcal{X}$ an estimated value for the value to be estimated. The most common choice is $(E, \mathcal{E}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, or corresponding subsets or higher-dimensional equivalents.

The underlying probability measure $P_\vartheta$ for this estimate is unknown. However, the samples are distributed according to this probability measure and therefore allow conclusions to be drawn about some of its properties.

The distribution of the samples according to a probability measure $P_\vartheta$ is formalized by writing the sample as a realization $x$ of a random variable $X$ with distribution $P_\vartheta$; $X$ is the random variable that arises when the sample itself is viewed as random. Analogously, $T(x)$ denotes the evaluation of $T$ at the realization $x$ of the random variable $X$. Thus $T(X)$ is itself a random variable, while $T(x)$ is an evaluation of this function at the point $x$.

## Methods of obtaining estimators

One starts with sampling variables $X_i$, i.e. random variables whose distribution indicates with which probability a certain characteristic value (for discrete data) or a certain range of characteristic values (for continuous data) occurs at the $i$-th observation of a sample. The population parameters sought appear in the distribution of these sampling variables.

Over time, various methods for obtaining estimators have been developed, e.g. the maximum likelihood method, the method of moments, and the method of least squares.

These estimators and their distributions are then the basis of point estimates and interval estimates (confidence intervals).
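As a minimal simulation sketch (standard library only; the model and helper names are chosen here for illustration), two of these construction methods can be compared on the model $X_i \sim \mathrm{Uniform}(0, \theta)$: the method of moments solves $\operatorname{E}[X] = \theta/2$ for $\theta$, while maximum likelihood selects the sample maximum.

```python
import random

def moment_estimator(sample):
    # Method of moments for theta in Uniform(0, theta):
    # E[X] = theta / 2, so solve 2 * (sample mean) = theta.
    return 2 * sum(sample) / len(sample)

def ml_estimator(sample):
    # Maximum likelihood for the same model: the likelihood (1/theta)^n,
    # valid for theta >= max(sample), is maximised at the sample maximum.
    return max(sample)

random.seed(0)
theta = 3.0  # the "unknown" parameter, fixed here for the simulation
sample = [random.uniform(0, theta) for _ in range(1000)]
print(moment_estimator(sample), ml_estimator(sample))
```

Both estimators approach $\theta$ for large samples, but they are different functions of the data, which is why the quality criteria below are needed to compare them.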

## Quality criteria for estimators

The quality of a point estimator is measured according to different criteria. There are two different classes of quality criteria:

1. Criteria that allow a direct comparison in the sense of better / worse between estimators.
2. Limitations on classes of estimators that have certain desirable structural properties.

The former include, for example, efficiency and the mean square error , and the latter include sufficiency .

The classic quality criteria of estimation theory are efficiency, unbiasedness, consistency, and sufficiency.

### Efficiency

The quality of an estimator is usually measured by its mean squared error,

$\operatorname{MSE}(T, \vartheta) := \operatorname{E}_\vartheta \left( \left( T - g(\vartheta) \right)^2 \right)$.

The square weights larger deviations from the function to be estimated more strongly. An estimator $T$ is then called more efficient than $S$ if

$\operatorname{MSE}(T, \vartheta) \leq \operatorname{MSE}(S, \vartheta) \quad \text{for all } \vartheta \in \Theta$.

In the unbiased case, this reduces to

$\operatorname{Var}_\vartheta(T) \leq \operatorname{Var}_\vartheta(S) \quad \text{for all } \vartheta \in \Theta$.

The search is mostly for "absolutely" efficient estimators, i.e. those that are more efficient than any other estimator in a given set. Under relatively mild assumptions, the Cramér–Rao inequality provides a lower bound for the variance of unbiased estimators in an estimation problem. Once an estimator with this variance has been found, there can be no more efficient unbiased estimator.
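The MSE comparison can be made concrete by simulation. The following sketch (assumed setup: i.i.d. standard normal samples with true variance 1; standard library only) compares the sample variance with divisor $n-1$ against the maximum likelihood version with divisor $n$; the latter trades a small bias for lower variance and achieves the smaller mean squared error in this setting.

```python
import random

def var_unbiased(x):
    # sample variance with divisor n - 1 (unbiased)
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)

def var_mle(x):
    # sample variance with divisor n (biased, but lower variance)
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def mse(estimator, true_value, n=30, reps=10000, seed=1):
    # Monte Carlo approximation of MSE(T, theta) = E[(T - g(theta))^2]
    # for i.i.d. N(0, 1) samples; the shared seed pairs the comparisons.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += (estimator(sample) - true_value) ** 2
    return total / reps

print(mse(var_unbiased, 1.0), mse(var_mle, 1.0))
```

This illustrates why the two classes of criteria are kept separate: under pure MSE comparison a biased estimator can beat an unbiased one.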

### Unbiasedness

An unbiased estimator hits the value to be estimated "on average": it satisfies

$\operatorname{E}_\vartheta(T) = g(\vartheta) \quad \text{for all } \vartheta \in \Theta$.

If an estimator is not unbiased, it is called biased. A weakening of unbiasedness is asymptotic unbiasedness, in which the equality above only holds in the limit. A generalization of unbiasedness is L-unbiasedness; besides unbiasedness itself, it also contains median-unbiasedness as a special case.
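The standard textbook example is the sample variance: dividing by $n$ underestimates the true variance on average, while dividing by $n-1$ is unbiased. A small simulation sketch (assuming i.i.d. standard normal data, chosen here purely for illustration):

```python
import random

def sample_variance(x, ddof):
    # ddof = 0 gives the divisor-n estimator, ddof = 1 the divisor-(n-1) one
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - ddof)

rng = random.Random(7)
n, reps = 10, 20000
avg_biased = avg_unbiased = 0.0
for _ in range(reps):
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]
    avg_biased += sample_variance(s, 0) / reps    # expectation is (n-1)/n = 0.9
    avg_unbiased += sample_variance(s, 1) / reps  # expectation is the true variance 1
print(avg_biased, avg_unbiased)
```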

### Consistency

Consistency is an asymptotic quality criterion; it formalizes the requirement that for large samples, the probability that the estimated value deviates from the value to be estimated by more than a given amount should be very small. It should therefore hold that

$\lim_{n \to \infty} P\left(|T_n - g(\vartheta)| > \varepsilon\right) = 0 \quad \text{for all } \varepsilon > 0$.

There are different versions of the concept of consistency, which differ in the types of convergence used .
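Weak consistency of the sample mean can be observed directly by simulation. A sketch (assuming i.i.d. Exp(1) observations, so the value to be estimated is the mean 1, with $\varepsilon = 0.1$; all setup values are illustrative):

```python
import random

def deviation_probability(n, eps=0.1, reps=1000, seed=3):
    # Empirical estimate of P(|T_n - g(theta)| > eps), where T_n is the
    # sample mean of n i.i.d. Exp(1) draws and the true mean is 1.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(mean - 1.0) > eps:
            hits += 1
    return hits / reps

for n in (10, 100, 1000):
    print(n, deviation_probability(n))  # shrinks towards 0 as n grows
```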

### Sufficiency

Sufficiency formalizes the idea that all information relevant to the estimate is taken into account. A distinction is made between sufficient statistics, which transmit all relevant data, and sufficient σ-algebras, which contain all relevant data. A strengthening of sufficiency is minimal sufficiency; it deals with the question of how strongly data can be compressed without loss of information. Sufficiency gains its significance, among other things, through the Rao–Blackwell theorem, which states that optimal estimators can always be found in the class of sufficient estimators.
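The Rao–Blackwell improvement can be made concrete for Bernoulli($p$) samples: starting from the crude unbiased estimator $T = X_1$ and conditioning on the sufficient statistic $S = \sum_i X_i$ gives $\operatorname{E}[X_1 \mid S] = S/n$, the sample mean, which has markedly smaller variance. A simulation sketch (parameter values chosen arbitrarily for illustration):

```python
import random

rng = random.Random(5)
p, n, reps = 0.3, 20, 20000

crude, improved = [], []
for _ in range(reps):
    x = [1 if rng.random() < p else 0 for _ in range(n)]
    crude.append(x[0])           # T = X_1: unbiased for p, but high variance
    improved.append(sum(x) / n)  # E[X_1 | S] = S/n: the Rao-Blackwellised T

def empirical_variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

print(empirical_variance(crude), empirical_variance(improved))
```

Both estimators are unbiased, so the variance reduction translates directly into a smaller mean squared error.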

## Central statements

The central statements of estimation theory include the Cramér–Rao inequality and the Rao–Blackwell theorem.

## Point estimation as a decision problem

Many optimality and reduction principles of estimation theory can be meaningfully classified in a statistical decision problem and compared with one another within the framework of decision theory .

As in estimation theory, the basis of the statistical decision problem is a statistical model $(\mathcal{X}, \mathcal{A}, (P_\vartheta)_{\vartheta \in \Theta})$ and a decision space $(E, \mathcal{E})$. The decision functions are then exactly the point estimators

$S \colon (\mathcal{X}, \mathcal{A}) \to (E, \mathcal{E})$.

If

$g \colon \Theta \to E$

is a function to be estimated (called a parameter function in the parametric case), then various loss functions

$L \colon \Theta \times E \to [0, +\infty]$

can be defined. Typical loss functions are

• the Gaussian loss $L_2(\vartheta, e) := \Vert e - g(\vartheta) \Vert^2$,
• the Laplace loss $L_1(\vartheta, e) := \Vert e - g(\vartheta) \Vert$,
• more generally, convex loss functions of the form $L(e - g(\vartheta))$.

The risk function associated with the Gaussian loss is then the mean square error , and the risk function associated with the Laplace loss is the mean absolute error . The statistical model, the function to be estimated, the decision space and the loss function are then combined to form an estimation problem .
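The connection between loss and risk can be sketched by Monte Carlo: the risk of an estimator is its expected loss over samples. The following sketch assumes Laplace-distributed samples centred at the value to be estimated (generated as the difference of two exponential draws); for such heavy-tailed data the sample median typically achieves a lower risk than the sample mean.

```python
import random
import statistics

def gauss_loss(theta, e):
    return (e - theta) ** 2   # L2: its risk is the mean squared error

def laplace_loss(theta, e):
    return abs(e - theta)     # L1: its risk is the mean absolute error

def risk(estimator, loss, true_value, n=25, reps=5000, seed=11):
    # Monte Carlo approximation of the risk R(theta, T) = E[L(theta, T(X))]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Laplace(true_value, 1) draw = difference of two Exp(1) draws
        sample = [true_value + rng.expovariate(1.0) - rng.expovariate(1.0)
                  for _ in range(n)]
        total += loss(true_value, estimator(sample))
    return total / reps

def sample_mean(s):
    return sum(s) / len(s)

for loss in (gauss_loss, laplace_loss):
    print(risk(sample_mean, loss, 0.0), risk(statistics.median, loss, 0.0))
```

Which estimator minimizes the risk thus depends on both the loss function and the assumed family of distributions.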

Typical reduction criteria are:

• Sufficiency: for all convex loss functions (and thus also for the Laplace and Gaussian loss), the Rao–Blackwell theorem guarantees that conditioning on a sufficient statistic always yields a uniform reduction of the risk. This justifies restricting the search for minimal-risk estimators to sufficient estimators.
• L-unbiasedness: the restriction to L-unbiased estimators is pragmatically motivated, since they show no systematic error. Special cases are unbiasedness (Gaussian loss) and median-unbiasedness (Laplace loss). For unbiased estimators, the risk then reduces to the variance of the estimator.

For example, the admissible decision functions with respect to the Gaussian loss within the set of unbiased estimators are precisely the uniformly best unbiased estimators, and one estimator is relatively more efficient than another if its risk is always smaller than that of the second.