# Estimation method (statistics)

Estimation methods (also called estimation procedures) are used in mathematical statistics to construct estimators for unknown parameters of a statistical population.

Three classic estimation methods are:

- the maximum likelihood method,
- the least squares method, and
- the moment method.

With all three methods, outliers have a very strong influence on the result. The moment method can also be understood as a special case of the substitution principle. Methods based on quantiles are also frequently used and are often more robust (for example, the median can often replace the mean as an estimator).

The various procedures are partly competing and partly complementary.

## Maximum likelihood method

The observations $x_{i}$ are viewed here as sample realizations of $n$, usually stochastically independent, random variables $X_{i}$ with a known distribution type. The parameters of this distribution depend on the parameter sought, and the estimate is the parameter value under which the observed sample is most plausible.

The advantage of the maximum likelihood method (method of greatest plausibility) lies in the properties of its estimators. They are often consistent (the more observations in the sample, the more accurately the parameter can be estimated) and asymptotically efficient (for large sample sizes there is no better estimator). Moreover, significance tests for model comparisons can be formulated in a very general way.

A major disadvantage is that the distribution type of the sample variables must be known. If this assumption is wrong, the estimator can deliver completely wrong values. Furthermore, a numerical maximization must usually be carried out to find the parameter, and it may end up in a local instead of the global maximum.

However, since the advantages outweigh the disadvantages, the maximum likelihood method is probably the most widely used estimation method. In the case of a normal distribution, the moment method and the maximum likelihood method yield almost identical results; the moment method produces a somewhat smaller systematic error with regard to the standard deviation. With the maximum likelihood method, such errors are generally not negligible if the sample size is small.

## Least squares method

Here, too, the observations $x_{i}$ are viewed as realizations of $n$ random variables $X_{i}$. The expected value $\operatorname{E}(X_{i})$ depends, directly or through a known function, on the parameter sought, and the observations deviate from it by a disturbance variable. The parameter is therefore determined so that the sum of the squared disturbances is as small as possible.

The classic example is simple linear regression: the regression line $y=\beta_{0}+\beta_{1}x$ with parameters $\beta_{0}$ and $\beta_{1}$ is superimposed by a disturbance variable, so one observes $(x_{i},\,y_{i}=\beta_{0}+\beta_{1}x_{i}+\varepsilon_{i})$. For the random variable $Y_{i}$ this means $\operatorname{E}(Y_{i})=\beta_{0}+\beta_{1}x_{i}$ and $\operatorname{Var}(Y_{i})=\sigma_{\varepsilon}^{2}$. One therefore computes the sum of the squared disturbances

$$\sum_{i=1}^{n}\left(y_{i}-(\beta_{0}+\beta_{1}x_{i})\right)^{2}$$

and minimizes it in order to find estimates for $\beta_{0}$ and $\beta_{1}$. The goodness of fit of the estimation can be quantified with the coefficient of determination.
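As a minimal sketch (with made-up illustrative data; the variable names are not from the text), the least squares estimates for simple linear regression can be computed from the closed-form normal equations:

```python
# Least squares fit of y = beta0 + beta1 * x; the data here are illustrative.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 8.0]  # roughly y = 2x plus small disturbances

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Closed-form least squares solutions for the two parameters.
beta1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
beta0 = my - beta1 * mx

# Coefficient of determination R^2 = 1 - SS_res / SS_tot quantifies the fit.
ss_res = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot
```

For these data the fit is nearly perfect, so $R^{2}$ is close to 1.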

The advantage of the least squares method is that no assumption has to be made about the distribution type, but only about the relationship between the expected value and the unknown parameter. This estimation method can therefore be used in a broader range of problems.

However, this advantage is also a disadvantage: since only information about the expected value is used, rather than the full distribution as in the maximum likelihood method, the resulting estimators do not have properties as good as those of the maximum likelihood estimators. If the expected value does not depend linearly on the parameter, numerical approximation methods must generally be used here as well to find the minimum.

### Example

In a new game, one can lose €1.00 with probability $p$, win €1.00 with probability $1-2p$, and neither lose nor win with probability $p$. The game is played six times with the result: EUR −1, EUR 1, EUR −1, EUR 0, EUR 1, EUR 1. What is the value of $p$?

#### Maximum likelihood method

According to the maximum likelihood method, the probability of the observed sample is

$$P(X_{1}=-1,\,X_{2}=1,\,X_{3}=-1,\,X_{4}=0,\,X_{5}=1,\,X_{6}=1)=p\cdot(1-2p)\cdot p\cdot p\cdot(1-2p)\cdot(1-2p)=p^{3}\cdot(1-2p)^{3}.$$

The maximization then gives the estimate $p_{ML}=1/4$.
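A quick numerical check of this maximization (a simple grid search over the admissible range $0<p<1/2$; the helper name `likelihood` is just for illustration):

```python
# Likelihood of the observed sample: L(p) = p^3 * (1 - 2p)^3.
def likelihood(p):
    return p ** 3 * (1 - 2 * p) ** 3

# Grid search over the admissible parameter range 0 < p < 1/2.
grid = [i / 100000 for i in range(1, 50000)]
p_ml = max(grid, key=likelihood)  # close to 1/4
```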

#### Least squares method

For the least squares method one needs the expected value $\operatorname{E}(X_{i})=-1\cdot p+0\cdot p+1\cdot(1-2p)=1-3p$; i.e., on average one expects a profit of $1-3p$ EUR per game. For each observation one calculates the squared deviation between the observed profit and the expected profit per game and sums them up:

$$Q(p)=(-1-(1-3p))^{2}+(1-(1-3p))^{2}+(-1-(1-3p))^{2}+(0-(1-3p))^{2}+(1-(1-3p))^{2}+(1-(1-3p))^{2}=9-30p+54p^{2}$$

The minimization then gives the estimate $p_{LS}=5/18$.
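The same result can be checked numerically; the quadratic $9-30p+54p^{2}$ has its minimum at the vertex $p=30/(2\cdot 54)=5/18$ (variable names here are illustrative):

```python
# Sum of squared deviations from the expected profit 1 - 3p per game.
sample = [-1, 1, -1, 0, 1, 1]

def q(p):
    return sum((x - (1 - 3 * p)) ** 2 for x in sample)

# Grid search over 0 <= p <= 1/2; the minimizer is close to 5/18.
grid = [i / 100000 for i in range(0, 50001)]
p_ls = min(grid, key=q)
```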

## Minimum chi-square method

The minimum chi-square method is related to the least squares method. However, it is assumed that the random variables $X_{i}$ are discrete (this also includes classified data). Finding the minimum of the squared errors is difficult here because the minimization algorithm has to deal with points of discontinuity. Instead, one considers the random variables $H_{j}$, the frequencies with which the characteristic value (or class) $x_{j}$ occurs.

If the expected frequencies can be expressed in terms of the parameters sought, one minimizes the test statistic of the chi-square goodness-of-fit test in order to find estimates for those parameters.

### Example

Six sentences were randomly selected from a book and the number of subordinate clauses in each was counted. Three sentences contained no subordinate clause, two contained one subordinate clause, and one contained more than one. Assuming that the number of subordinate clauses per sentence is Poisson distributed, the question is how large the mean number $\lambda$ of subordinate clauses per sentence is.

#### Maximum likelihood method

According to the maximum likelihood method, the probability of the observed sample is

$$P(X_{1}=0,\,X_{2}=0,\,X_{3}=0,\,X_{4}=1,\,X_{5}=1,\,X_{6}>1)=e^{-\lambda}\cdot e^{-\lambda}\cdot e^{-\lambda}\cdot\lambda e^{-\lambda}\cdot\lambda e^{-\lambda}\cdot\left(1-e^{-\lambda}-\lambda e^{-\lambda}\right)=\lambda^{2}e^{-5\lambda}\left(1-e^{-\lambda}-\lambda e^{-\lambda}\right).$$

Numerical maximization then gives the estimate $\lambda_{ML}\approx 0.711$.
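Since the maximum has no simple closed form, a numerical check is useful; a grid search over the likelihood derived above (helper names are illustrative):

```python
import math

# Likelihood L(lam) = lam^2 * e^(-5*lam) * (1 - e^(-lam) - lam*e^(-lam)).
def likelihood(lam):
    return lam ** 2 * math.exp(-5 * lam) * (1 - math.exp(-lam) - lam * math.exp(-lam))

# Grid search over a plausible range for the mean number of clauses.
grid = [i / 10000 for i in range(1, 30001)]
lam_ml = max(grid, key=likelihood)  # approximately 0.711
```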

#### Minimum chi-square method

For the minimum chi-square method one needs the expected frequencies $H_{0}(\lambda)=ne^{-\lambda}$, $H_{1}(\lambda)=n\lambda e^{-\lambda}$ and $H_{>1}(\lambda)=n\left(1-e^{-\lambda}-\lambda e^{-\lambda}\right)$:

$$\chi^{2}(\lambda)=\frac{(3-H_{0}(\lambda))^{2}}{H_{0}(\lambda)}+\frac{(2-H_{1}(\lambda))^{2}}{H_{1}(\lambda)}+\frac{(1-H_{>1}(\lambda))^{2}}{H_{>1}(\lambda)}$$

Numerical minimization then gives the estimate $\lambda_{CQ}\approx 0.711$.
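The minimization can also be checked numerically with a grid search over the statistic above, using the observed frequencies 3, 2 and 1 with $n=6$ (helper names are illustrative):

```python
import math

# Chi-square statistic for observed frequencies 3, 2, 1 with n = 6 sentences.
def chi2(lam):
    n = 6
    h0 = n * math.exp(-lam)                               # expected: no clause
    h1 = n * lam * math.exp(-lam)                         # expected: one clause
    hg = n * (1 - math.exp(-lam) - lam * math.exp(-lam))  # expected: more than one
    return (3 - h0) ** 2 / h0 + (2 - h1) ** 2 / h1 + (1 - hg) ** 2 / hg

grid = [i / 10000 for i in range(1, 30001)]
lam_cq = min(grid, key=chi2)  # approximately 0.711
```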

## Moment method

The observations $x_{i}$ are viewed here as sample realizations of $n$, usually stochastically independent, random variables $X_{i}$ with a known distribution type. The moments $\operatorname{E}(X_{i}^{r})$ of this distribution depend on the distribution parameters, which in turn contain the parameter sought, so one obtains equations between the parameters sought and the moments. The moments can in turn be estimated from the observation data ($\tfrac{1}{n}\sum_{i=1}^{n}x_{i}^{r}$), which yields a system of equations that can be solved for the parameters sought. The solution is then an estimate of the desired parameter.

The advantage of the moment method is that it is easy to compute, even if a numerical iteration method has to be used to solve a possibly non-linear system of equations. It can also be used when the sample variables $X_{i}$ are not independent; in such a case, maximum likelihood estimation can become very complicated.

This simple computability is also its disadvantage, since not all the information in the sample is used. For small samples, this can lead to estimates outside the parameter space (e.g. negative values for estimated variances). The estimators from the moment method are usually inefficient, i.e. for a given sample size there are better estimators. For example, for a uniform distribution the moment estimator is less efficient than the maximum likelihood estimator.

Sometimes the moment method is used for complex problems in order to obtain starting values for the parameters in the maximum likelihood method.

### Example

The wages of employees are Pareto distributed ${\mathcal{P}}(k;1)$ on the interval $[1;\infty)$ (let $1$ be the minimum wage). A sample of three employees was observed who earned 1.2, 1.5 and 1.8 times the minimum wage, respectively. We are looking for the parameter $k$; the larger $k$, the lower the probability of a high wage: $P(X>x)=\tfrac{1}{x^{k}}$.

#### Maximum likelihood method

According to the maximum likelihood method, the likelihood function for the observed sample is

$$L(1.2;\,1.5;\,1.8)=k\left(\frac{1}{1.2}\right)^{k}\cdot k\left(\frac{1}{1.5}\right)^{k}\cdot k\left(\frac{1}{1.8}\right)^{k}=k^{3}\left(\frac{1}{1.2\cdot 1.5\cdot 1.8}\right)^{k}$$

The maximization then gives the estimate $k_{ML}=3/\ln(3.24)\approx 2.55$; i.e. the probability of earning more than double the minimum wage in this model is $2^{-2.55}\approx 17\,\%$.
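Setting the derivative of $\log L(k)=3\log k-k\ln(3.24)$ to zero gives $k=3/\ln(3.24)$, which a grid search over the likelihood above confirms (helper names are illustrative):

```python
import math

# Likelihood L(k) = k^3 * (1 / (1.2 * 1.5 * 1.8))^k as derived above.
c = 1.2 * 1.5 * 1.8  # = 3.24

def likelihood(k):
    return k ** 3 * c ** (-k)

# Grid search over 0 < k <= 10.
grid = [i / 10000 for i in range(1, 100001)]
k_ml = max(grid, key=likelihood)
# Closed form from d/dk log L = 3/k - ln(c) = 0, i.e. k = 3 / ln(3.24).
```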

#### Moment method

For the Pareto distribution, $\operatorname{E}(X_{i})=\tfrac{k}{k-1}$ (if $k>1$). The expected value is estimated by the arithmetic mean, i.e.

$$\frac{1}{3}(1.2+1.5+1.8)=\frac{k}{k-1}.$$

Solving the equation then gives the estimate $k_{MM}=3$; i.e. the probability of earning more than double the minimum wage in this model is $2^{-3}=12.5\,\%$.
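The moment equation above can be solved directly, since $\bar{x}=k/(k-1)$ rearranges to $k=\bar{x}/(\bar{x}-1)$ (variable names here are illustrative):

```python
# Moment estimator: set the sample mean equal to E(X) = k / (k - 1).
wages = [1.2, 1.5, 1.8]
xbar = sum(wages) / len(wages)  # = 1.5

# Solving xbar = k / (k - 1) for k gives k = xbar / (xbar - 1) = 3.
k_mm = xbar / (xbar - 1)

# Probability of earning more than twice the minimum wage: P(X > 2) = 2^(-k).
p_gt2 = 2 ** (-k_mm)  # = 0.125
```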