Estimator


An estimator, also called an estimation function or estimation statistic, is used in mathematical statistics to determine an estimated value on the basis of existing empirical data from a sample and thereby to obtain information about unknown parameters of a population. Estimators are the basis for calculating point estimates and for determining confidence intervals by means of interval estimation, and they are used as test statistics in hypothesis tests. They are special sampling functions and can be determined using estimation methods, for example least squares estimation, maximum likelihood estimation or the method of moments.

In the context of decision theory , estimation functions can also be viewed as decision functions for decisions under uncertainty .

Formal definition

Let

T = h(X_1, …, X_n)

be a real-valued statistic based on a random sample X_1, …, X_n from a distribution with probability density f(x; θ), where θ is an unknown scalar parameter. If the random variable T is calculated in order to perform statistical inference on θ, it is called an estimator. If the sample size n is not relevant, one also writes T instead of T_n. The concrete value t = h(x_1, …, x_n) that an estimator takes for a realization (x_1, …, x_n) of the random sample is called the estimate.
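
The distinction between the estimator (a rule applied to the random sample) and the estimate (its value for one concrete realization) can be illustrated with a minimal sketch; Python is used here, the statistic h is chosen as the arithmetic mean purely for illustration, and the "true" parameter is known only because the data are simulated:

```python
import numpy as np

def h(sample):
    """Statistic h(x_1, ..., x_n); here the arithmetic mean, chosen only as an example."""
    return np.mean(sample)

rng = np.random.default_rng(0)
theta_true = 5.0                                     # unknown in practice; known here because we simulate
x = rng.normal(loc=theta_true, scale=2.0, size=30)   # one realization of the random sample

estimate = h(x)   # the estimate: the value the estimator takes for this realization
print(f"estimate of theta: {estimate:.3f}")
```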

Basic concepts: sample variables and functions

As a rule, the experimenter is in the situation of wanting to make statements about the underlying distribution, or its parameters, in the population on the basis of a finite number of observations (a sample).

Only in rare cases can the entire population be surveyed completely (census or full survey), so that it delivers exactly the required information. An example of a full survey is the unemployment statistics of official statistics.

In most cases, however, the entire population cannot be surveyed, for example because it is too large. If one is interested in the mean height of 18-year-olds in the EU, one would have to measure all 18-year-olds, which is practically infeasible. Instead, only a sample, a random selection of elements, is collected (partial survey).

Sample variable

This is where statistical modeling comes in. The sample variable X_i, a random variable, describes with its distribution the probability with which a certain characteristic value occurs in the i-th draw from the population. Each observation value x_i is a realization of a sample variable X_i.

Sampling function

The definition of sample variables allows the definition of sampling functions analogously, for example, to characteristic values from descriptive statistics:

Arithmetic mean of the sample: x̄ = (1/n) · Σ_{i=1}^n x_i
Corresponding sampling function: X̄ = (1/n) · Σ_{i=1}^n X_i

Since every sample turns out differently due to randomness, these sampling functions are themselves random variables, whose distribution depends on

  • the type of drawing of the sample from the population and
  • the distribution of the characteristic in the population.

Sample distribution

The sample distribution is the distribution of a sampling function over all possible samples from the population. The sampling function is typically an estimator of an unknown population parameter or a test statistic for a hypothesis about such a parameter. Therefore, instead of the sample distribution, one often simply speaks of the distribution of an estimator or test statistic. The distribution of the sampling function is used to obtain information about unknown parameters of the population on the basis of a sample.

The sample distribution is a frequentist concept; the Bayesian counterpart is the posterior (a posteriori) distribution.

Calculation of the sample distribution

The sample distribution for a sampling function with a given sample size n from a finite population can always be calculated (see the following examples), but in general one is more interested in general formulas, e.g. for an unspecified sample size n. The following statements are important tools:

  • Reproductivity of the normal distribution: If the sample variables X_1, …, X_n are independent of one another and normally distributed (X_i ~ N(μ; σ²)), then the sample mean X̄ is also normally distributed (X̄ ~ N(μ; σ²/n)).
  • Central limit theorem: If the sample variables X_1, …, X_n are independent of one another and if the expected value μ and the variance σ² exist for them, then for large n the distribution of X̄ is approximately normal (N(μ; σ²/n)), as illustrated by the sketch below.
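
The central limit theorem can be illustrated with a small simulation sketch; Python with NumPy is used here, and the exponential distribution (mean 1, variance 1) as well as the sample size n = 50 are arbitrary choices made only for the example:

```python
import numpy as np

rng = np.random.default_rng(42)
n, repetitions = 50, 10_000

# Sample means of exponential samples (mean 1, variance 1) -- the X_i are clearly not normal
means = rng.exponential(scale=1.0, size=(repetitions, n)).mean(axis=1)

# Central limit theorem: X_bar is approximately N(mu, sigma^2 / n)
print("simulated mean / std of X_bar:", means.mean(), means.std(ddof=1))
print("CLT prediction  mean / std   :", 1.0, (1.0 / n) ** 0.5)
```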

Bootstrap sample distributions

If a sufficiently large sample is representative of the population, the sample distribution of any sampling function can be estimated non-parametrically using the bootstrap method, without the distribution of the X_i having to be known. In general, however, it must be shown mathematically that the bootstrap sample distributions converge to the true sample distribution as the number of bootstrap samples increases.
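
A minimal non-parametric bootstrap sketch might look as follows; Python with NumPy is used, and the lognormal data, the sample size of 200 and the number of bootstrap samples B = 2000 are assumptions made only for this illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # the observed sample (simulated here)

B = 2000                                                # number of bootstrap samples (assumed)
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()   # resample with replacement
    for _ in range(B)
])

# Bootstrap estimate of the sample distribution of the mean:
print("bootstrap standard error:", boot_means.std(ddof=1))
print("95% bootstrap interval  :", np.percentile(boot_means, [2.5, 97.5]))
```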

Examples

Example 1

Consider an urn with seven balls labeled 10, 11, 11, 12, 12, 12 and 16. If two balls are drawn with replacement, the following table shows all possible samples from the population (rows: first draw, columns: second draw):

10 11 11 12 12 12 16
10 10; 10 10; 11 10; 11 10; 12 10; 12 10; 12 10; 16
11 11; 10 11; 11 11; 11 11; 12 11; 12 11; 12 11; 16
11 11; 10 11; 11 11; 11 11; 12 11; 12 11; 12 11; 16
12 12; 10 12; 11 12; 11 12; 12 12; 12 12; 12 12; 16
12 12; 10 12; 11 12; 11 12; 12 12; 12 12; 12 12; 16
12 12; 10 12; 11 12; 11 12; 12 12; 12 12; 12 12; 16
16 16; 10 16; 11 16; 11 16; 12 16; 12 16; 12 16; 16

Each of the 49 possible samples occurs with probability 1/49. If one now calculates the sample mean X̄ of the two drawn balls, the result is:

10 11 11 12 12 12 16
10 10.0 10.5 10.5 11.0 11.0 11.0 13.0
11 10.5 11.0 11.0 11.5 11.5 11.5 13.5
11 10.5 11.0 11.0 11.5 11.5 11.5 13.5
12 11.0 11.5 11.5 12.0 12.0 12.0 14.0
12 11.0 11.5 11.5 12.0 12.0 12.0 14.0
12 11.0 11.5 11.5 12.0 12.0 12.0 14.0
16 13.0 13.5 13.5 14.0 14.0 14.0 16.0

If one summarizes the values of X̄ according to the probability of occurrence of each sample, one obtains the sample distribution of X̄:

x̄           10.0  10.5  11.0   11.5   12.0  13.0  13.5  14.0  16.0
P(X̄ = x̄)   1/49  4/49  10/49  12/49  9/49  2/49  4/49  6/49  1/49

If the type of drawing is changed from drawing with replacement to drawing without replacement, a different distribution results for X̄. In the tables above, the main diagonal is then omitted, so there are only 42 possible samples. Therefore the following distribution results for X̄:

x̄           10.0  10.5  11.0   11.5   12.0  13.0  13.5  14.0  16.0
P(X̄ = x̄)   0     4/42  8/42   12/42  6/42  2/42  4/42  6/42  0
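
Both sample distributions can be reproduced by simple enumeration of all ordered pairs; the following sketch (Python, with exact fractions) is one possible way to do this:

```python
from itertools import product, permutations
from collections import Counter
from fractions import Fraction

balls = [10, 11, 11, 12, 12, 12, 16]

def distribution(pairs):
    """Sample distribution of the mean over the given ordered pairs."""
    counts = Counter((a + b) / 2 for a, b in pairs)
    total = sum(counts.values())
    return {mean: Fraction(c, total) for mean, c in sorted(counts.items())}

with_replacement = distribution(product(balls, repeat=2))     # 49 ordered pairs
without_replacement = distribution(permutations(balls, 2))    # 42 ordered pairs

print(with_replacement)     # e.g. {10.0: Fraction(1, 49), 10.5: Fraction(4, 49), ...}
print(without_replacement)  # fractions are reduced, e.g. 4/42 appears as Fraction(2, 21)
```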

Example 2

There are five red and four blue balls in an urn. Three balls are drawn from this urn without replacement. If the sampling function R is defined as the number of red balls among the three drawn, then R is hypergeometrically distributed with M = 5 the number of red balls in the urn, N = 9 the total number of balls in the urn and n = 3 the number of draws. All information about the distribution of R can be obtained here, because both the stochastic model (drawing from an urn) and the associated parameters (number of red and blue balls) are known.
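
As a small sketch, this distribution can be written down directly; Python with SciPy is used here, with the parameters from the example (9 balls in total, 5 of them red, 3 draws):

```python
from scipy.stats import hypergeom

total, red, draws = 9, 5, 3                 # balls in the urn, red balls, number of draws
R = hypergeom(total, red, draws)            # SciPy argument order: population size, successes, sample size

for k in range(draws + 1):
    print(f"P(R = {k}) = {R.pmf(k):.4f}")
print("expected number of red balls:", R.mean())   # draws * red / total = 5/3
```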

Example 3

A wholesale grocery store receives a shipment of 2,000 jars of plum compote. Stones left in the fruit are a problem. The customer tolerates a proportion of jars containing stones of 5%. With this delivery, he would like to make sure that this quota is not exceeded. A complete survey of the population of 2,000 jars is not feasible, however, because checking 2,000 jars is too time-consuming and, moreover, opening a jar destroys the goods.

However, one could choose a small number of jars at random, i.e. draw a random sample, and count the number of objectionable jars. If this number exceeds a certain limit, the critical value of the test statistic, it is assumed that there are too many objectionable jars in the delivery.

A possible sampling function is S = X_1 + … + X_n, where X_i denotes a random variable that only takes the values 1 (jar contains plums with a stone) or 0 (jar contains no plums with a stone).

If the random variables X_i are Bernoulli distributed, then S is approximately normally distributed for large n due to the central limit theorem.
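
The following sketch plays through this idea numerically; Python with SciPy is used, the sample size n = 50 is an assumption, and the number of objectionable jars is modelled by a binomial distribution (an approximation, since the jars are drawn without replacement from the 2,000):

```python
from scipy.stats import binom, norm

n, p = 50, 0.05                      # assumed sample size and tolerated proportion of 5%
S = binom(n, p)                      # approximate model for the number of objectionable jars

# Critical value c: reject the delivery if more than c objectionable jars are found,
# chosen so that P(S > c) <= 0.05 if the true proportion is exactly 5%.
c = S.ppf(0.95)
print("critical value:", int(c), "  P(S >", int(c), ") =", 1 - S.cdf(c))

# Normal approximation via the central limit theorem:
approx = norm(loc=n * p, scale=(n * p * (1 - p)) ** 0.5)
print("normal approximation of P(S <=", int(c), "):", approx.cdf(c))
```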

Estimators

Basic idea and concept of the estimator

Estimators are special sampling functions that are used to determine parameters or distributions of the population. Like every sampling function, they are influenced by, among other things, the way the sample is drawn and the distribution of the characteristic in the population.

Ultimately, one would like to use only the knowledge of the underlying model and the observed sample to specify intervals that contain the true parameter with high probability. Alternatively, accepting a certain probability of error, one wants to test whether a specific assumption about the parameter (for example, that too many jars contain stones) can be confirmed. In this sense, estimation functions form the basis for every well-founded decision about the characteristics of the population; the best possible choice of such functions is the result of mathematical investigation.

If a decision is made on this basis, e.g. that the delivery is sent back, there is a possibility that the decision is wrong. The following sources of error exist:

  1. The sample is not representative of the population; i.e., it does not reflect the population.
  2. The model for the random variables is wrong.
  3. The random sample could have turned out to be atypical so that the delivery is wrongly rejected.

However, in practice there is usually no alternative to statistical methods of this type. The problems mentioned above can be countered in various ways:

  1. One tries to draw a sample that comes as close as possible to a simple random sample.
  2. On the one hand, the class of models for the random variables is chosen as broad as possible (so that the "correct" model is included) and, on the other hand, the estimation function is chosen so that its distribution can be calculated for many models (see central limit theorem).
  3. Based on the estimator, a probability of error is given.

Formal definition of the estimator

The basis of every estimator is the observation of a statistical characteristic. In terms of the model, this characteristic is idealized: it is assumed that the observations are actually realizations of random variables whose "true" distribution and "true" distribution parameters are unknown.

In order to obtain information about the actual properties of the characteristic, a sample of elements is taken. With the help of these sample elements, one then estimates the parameters or distribution sought (see kernel density estimation ).

In order to estimate, for example, a parameter θ of an unknown distribution, one formally deals with a random sample of size n, i.e. n realizations (x_1, …, x_n) of the random variables X_1, …, X_n are observed. The random variables are then combined into a suitable estimator T = g(X_1, …, X_n) using an estimation method. Formally, it is assumed that g is a measurable function.

To simplify the calculation of the estimator, it is often assumed that the random variables X_1, …, X_n are independent and identically distributed, i.e. have the same distribution and the same distribution parameters.

Selected estimation functions

In statistical practice, the following population parameters are often of interest:

  • the mean and
  • the variance of a metric feature as well as
  • the proportional value of a dichotomous population.

Estimators and estimates for the mean

The expected value is usually estimated using the arithmetic mean of the sample:

Estimator: X̄ = (1/n) · Σ_{i=1}^n X_i        Estimate: x̄ = (1/n) · Σ_{i=1}^n x_i

If the distribution is symmetrical , the median of the sample can also be used as an estimate for the expected value:

Estimator: X̃ = X_(⌊n/2⌋+1) if n is odd,  X̃ = ½ (X_(n/2) + X_(n/2+1)) if n is even
Estimate:  x̃ = x_(⌊n/2⌋+1) if n is odd,  x̃ = ½ (x_(n/2) + x_(n/2+1)) if n is even

where ⌊·⌋ denotes the lower Gaussian bracket (floor function) and X_(k) the k-th order statistic. The median is therefore the value of the random variable that lies "in the middle" after the data have been sorted; there are as many values above as below the median.

Which estimator is better in the case of symmetric distributions depends on the distribution family under consideration.
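
This dependence can be checked by simulation; the sketch below uses Python with NumPy, and the sample size as well as the two symmetric distributions (standard normal and Laplace) are arbitrary choices: for normal data the sample mean has the smaller variance, for the heavier-tailed Laplace distribution the sample median does.

```python
import numpy as np

rng = np.random.default_rng(7)
n, repetitions = 100, 20_000

for name, draw in [("normal ", lambda: rng.normal(size=(repetitions, n))),
                   ("Laplace", lambda: rng.laplace(size=(repetitions, n)))]:
    samples = draw()                    # true expected value is 0 in both cases
    var_mean = np.mean(samples, axis=1).var(ddof=1)
    var_median = np.median(samples, axis=1).var(ddof=1)
    print(f"{name}: Var(mean) = {var_mean:.5f}, Var(median) = {var_median:.5f}")
```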

Estimator and estimate of the variance

For the population variance, the corrected sample variance is usually used as the estimator :

Estimator: S² = 1/(n−1) · Σ_{i=1}^n (X_i − X̄)²        Estimate: s² = 1/(n−1) · Σ_{i=1}^n (x_i − x̄)²

Typical other pre-factors are 1/n and 1/(n+1). All these estimators are asymptotically equivalent, but are used differently depending on the type of sample (see also sample variance (estimation function)).
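
The effect of the pre-factor can be made visible in a short simulation; Python with NumPy is used, and normal data with σ² = 4 and n = 10 are assumed only for illustration: the pre-factor 1/(n−1) reproduces σ² on average, while 1/n systematically underestimates it.

```python
import numpy as np

rng = np.random.default_rng(3)
n, repetitions, sigma2 = 10, 100_000, 4.0

samples = rng.normal(loc=0.0, scale=sigma2 ** 0.5, size=(repetitions, n))
s2_corrected = samples.var(axis=1, ddof=1)   # pre-factor 1/(n-1)
s2_plain = samples.var(axis=1, ddof=0)       # pre-factor 1/n

print("true variance          :", sigma2)
print("mean of 1/(n-1) version:", s2_corrected.mean())   # close to 4.0 (unbiased)
print("mean of 1/n version    :", s2_plain.mean())       # close to (n-1)/n * 4.0 = 3.6
```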

Estimator and estimate for the proportion

One looks at the urn model with two types of balls. The proportional value of the balls of the first kind in the population is to be estimated. The proportion of balls of the first kind in the sample is used as an estimation function,

Estimator: P̂ = X/n = (1/n) · Σ_{i=1}^n X_i        Estimate: p̂ = x/n

with X = Σ_{i=1}^n X_i the number of balls of the first kind in the sample and X_i a binary random variable: a ball of the first kind is drawn in the i-th draw (X_i = 1) or not drawn (X_i = 0).

The distribution of X is a binomial distribution in the model with replacement and a hypergeometric distribution in the model without replacement.

Distribution of the estimation functions

The distribution of the estimator functions naturally depends on the distribution of the characteristic in the population .

Let X_1, …, X_n be independent and identically normally distributed random variables with expected value μ and variance σ². The estimator X̄ (sample mean), as a linear transformation of the X_i, then has the distribution

X̄ ~ N(μ; σ²/n).

The variance estimator S² contains a sum of squares of centered normally distributed random variables. That is why the expression

(n − 1) S² / σ²

is chi-square distributed with n − 1 degrees of freedom.

If the distribution of the characteristic is unknown, the distribution of the estimator can be given approximately by the normal distribution or one of its derived distributions, provided the requirements of the central limit theorem are met.
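
Both distributional statements can be checked by simulation; the following sketch uses Python with NumPy and SciPy, and the parameter values (n = 8, μ = 10, σ = 2) are arbitrary choices:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(11)
n, repetitions, mu, sigma = 8, 50_000, 10.0, 2.0

samples = rng.normal(loc=mu, scale=sigma, size=(repetitions, n))

# Sample mean: N(mu, sigma^2 / n)
xbar = samples.mean(axis=1)
print("X_bar: mean, var =", xbar.mean(), xbar.var(ddof=1), "(theory:", mu, sigma**2 / n, ")")

# (n-1) S^2 / sigma^2: chi-square with n-1 degrees of freedom
q = (n - 1) * samples.var(axis=1, ddof=1) / sigma**2
print("(n-1)S^2/sigma^2: mean, var =", q.mean(), q.var(ddof=1),
      "(theory:", chi2(df=n - 1).mean(), chi2(df=n - 1).var(), ")")
```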

Quality criteria of estimation functions

(Figure: probability densities of a sequence of consistent estimators T_n for increasing sample sizes n. As the sample size increases, the unknown parameter is estimated more and more accurately.)

Unbiasedness

An unbiased estimator T is on average (in expectation) equal to the true parameter θ:

E(T) = θ.

If, however, E(T) differs systematically from θ, the estimator is biased. The bias of an estimator is calculated as

Bias(T) = E(T) − θ.

For a merely asymptotically unbiased estimator, on the other hand, only the following must apply:

lim_{n→∞} E(T_n) = θ.

Consistency

An estimator T_n is said to be consistent if, for every ε > 0 (arbitrarily small number),

lim_{n→∞} P(|T_n − θ| ≥ ε) = 0

holds, where θ is the true parameter. One speaks here of stochastic convergence (convergence in probability).

The figure described above illustrates the process: for each ε, the shaded areas outside the interval [θ − ε, θ + ε] have to become smaller and smaller as the sample size increases.

In simple terms: a consistent estimator approaches the true parameter as the sample size grows (it estimates the true parameter more and more precisely).

Consistent estimation functions must therefore be at least asymptotically unbiased (see above).

This property is fundamental for all of inductive statistics; it guarantees that increasing the sample size allows more accurate estimates, smaller confidence intervals or smaller acceptance regions in hypothesis tests.
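
Consistency can be observed directly in a simulation; the following sketch uses Python with NumPy, the sample mean of exponentially distributed data as the estimator, and the tolerance ε = 0.1 as an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, eps, repetitions = 1.0, 0.1, 5_000   # true parameter, tolerance, Monte Carlo repetitions

for n in (10, 100, 1000):
    means = rng.exponential(scale=theta, size=(repetitions, n)).mean(axis=1)
    prob_outside = np.mean(np.abs(means - theta) >= eps)
    print(f"n = {n:5d}:  P(|T_n - theta| >= {eps}) ~ {prob_outside:.4f}")
```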

Minimal variance, efficiency

The estimator should have the smallest possible variance. The estimator that has the smallest variance among all unbiased estimators is called the efficient, best or most effective estimator:

Var(T*) ≤ Var(T) for every unbiased estimator T.

Under certain conditions, the Cramér-Rao inequality also gives a lower bound for this variance. That is, for an estimator it can then be shown that no more efficient estimator exists, at most ones that are equally efficient.

Mean square error

The accuracy of an estimation function or of an estimator is often expressed by its mean squared error (MSE). An estimation function (not necessarily unbiased) should therefore always have the smallest possible mean squared error, which is defined as the expected value of the squared deviation of the estimator from the true parameter θ:

MSE(T) = E[(T − θ)²] = Var(T) + (Bias(T))²

As can be seen, the mean squared error of an estimator is the sum of its variance and its squared bias; for unbiased estimators, variance and mean squared error therefore coincide.
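
The decomposition can be verified numerically, for example for the two variance estimators from above; Python with NumPy is used, and normal data with σ² = 4 and n = 10 are assumed only for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n, repetitions, sigma2 = 10, 200_000, 4.0
samples = rng.normal(scale=sigma2 ** 0.5, size=(repetitions, n))

for ddof, label in [(1, "1/(n-1)"), (0, "1/n")]:
    est = samples.var(axis=1, ddof=ddof)                      # variance estimates per sample
    mse = np.mean((est - sigma2) ** 2)                        # E[(T - theta)^2]
    decomposition = est.var() + (est.mean() - sigma2) ** 2    # Var(T) + Bias(T)^2
    print(f"{label:8s}: MSE = {mse:.4f}, Var + Bias^2 = {decomposition:.4f}")
```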

Web links

  • Volker Schmidt: Methods of Statistics from the Lecture Notes Stochastics for Computer Scientists, Physicists, Chemists and Economists
  • Wikibooks: Statistics – learning and teaching materials
