# Mean squared error

*Figure: Two estimators. The choice of a biased statistic can be advantageous with regard to its expected deviation from the true value, compared to an unbiased statistic.*

The mean squared error, also called the expected squared error or mean square deviation and abbreviated MSE (from the English name *mean squared error*), is a concept of mathematical statistics. In estimation theory, it indicates how much a point estimator scatters around the value to be estimated. This makes it a central quality criterion for estimators. In regression analysis it is interpreted as the expected squared distance of an estimator from the true value.

## Definition

Given a statistical model $(\mathcal{X}, \mathcal{A}, (P_\vartheta)_{\vartheta \in \Theta})$ and a point estimator

$$T \colon (\mathcal{X}, \mathcal{A}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))$$

for a function to be estimated (in the parametric case, a parameter function)

$$g \colon \Theta \to \mathbb{R},$$

the quantity

$$\operatorname{MSE}(T, \vartheta) := \operatorname{E}_\vartheta \left( \left( T - g(\vartheta) \right)^2 \right)$$

is called the mean squared error of $T$. Here $\operatorname{E}_\vartheta$ denotes the expected value with respect to the probability measure $P_\vartheta$. By the shift theorem for the variance, the equivalent representation

$$\operatorname{MSE}(T, \vartheta) = \operatorname{Var}_\vartheta(T) + \left( \mathbb{B}_T(\vartheta) \right)^2$$

follows, where $\mathbb{B}_T(\vartheta)$ denotes the bias of the estimator.

For estimators taking values in a general decision space equipped with a norm $\| \cdot \|$, the mean squared error can be defined as

$$\operatorname{MSE}(T, \vartheta) := \operatorname{E}_\vartheta \left( \| T - g(\vartheta) \|^2 \right).$$
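The decomposition into variance and squared bias can be checked numerically. The following is a minimal Monte Carlo sketch in Python, using a deliberately biased estimator $T = 0.9\,\overline{X}_n$ for the mean of a normal distribution; all numerical choices (true mean, sample size, shrinkage factor) are illustrative.

```python
import random

# Monte Carlo check of MSE(T) = Var(T) + Bias(T)^2 for the deliberately
# biased estimator T = 0.9 * sample_mean of a N(mu, 1) distribution.
# mu, n and reps are arbitrary illustrative choices.
random.seed(0)
mu, n, reps = 2.0, 20, 20000

estimates = []
for _ in range(reps):
    sample = [random.gauss(mu, 1.0) for _ in range(n)]
    estimates.append(0.9 * sum(sample) / n)

mean_T = sum(estimates) / reps
var_T = sum((t - mean_T) ** 2 for t in estimates) / reps  # empirical variance
bias_T = mean_T - mu                                      # empirical bias
mse_T = sum((t - mu) ** 2 for t in estimates) / reps      # empirical MSE

print(round(mse_T, 4), round(var_T + bias_T ** 2, 4))  # the two agree
```

Note that with the empirical (population-style) variance the identity holds exactly for the simulated values, not just approximately; only the individual terms carry Monte Carlo noise.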

## Interpretation

In the classical case, a small mean squared error means that both the bias and the variance of the estimator are small: on average the estimator is close to the functional to be estimated (small bias), and at the same time the estimated values scatter little (small variance) and are very likely to lie close to their expected value.

The MSE therefore makes it possible to compare estimation methods with one another. The idea is that it can pay to prefer a slightly biased estimator if it has a much smaller variance in return. The estimation procedure with the smaller MSE is generally considered the better one.

A practical difficulty is that the MSE usually depends on the unknown population parameter to be estimated.

## Example

A typical case is the estimation of the mean of a normal distribution. Consider random variables $X_1, \ldots, X_n$, each normally distributed with unknown expected value $\gamma$ and variance 1. The classical estimator is the sample mean $\overline{X}_n$. Its bias is zero:

$$\operatorname{Bias}(\overline{X}_n) = 0,$$

since the empirical mean is an unbiased estimator of $\gamma$. Since $\overline{X}_n$ is normally distributed with expected value $\gamma$ and variance $\tfrac{1}{n}$, it follows that

$$\operatorname{MSE}(\overline{X}_n) = \frac{1}{n}.$$
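The value $1/n$ can be verified by simulation. A small Python sketch follows; the parameter value $\gamma = 1.5$ and the sample size are arbitrary illustrative choices.

```python
import random

# Monte Carlo check of the example: for X_1, ..., X_n ~ N(gamma, 1),
# the sample mean is unbiased and MSE(X̄_n) = 1/n.
# gamma = 1.5 stands in for the unknown parameter.
random.seed(1)
gamma, n, reps = 1.5, 25, 40000

sq_errors = []
for _ in range(reps):
    xbar = sum(random.gauss(gamma, 1.0) for _ in range(n)) / n
    sq_errors.append((xbar - gamma) ** 2)

mse_hat = sum(sq_errors) / reps
print(round(mse_hat, 3), 1 / n)  # empirical MSE is close to 1/n = 0.04
```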

## Mean square consistency

An estimator $T_n$ is called consistent in the mean square if, as $n \to \infty$,

$$\operatorname{MSE}(T_n) \to 0.$$

## MSE efficiency of estimators

Let $T_1$ and $T_2$ be two estimators. The estimator $T_1$ is called MSE-efficient relative to $T_2$ if

$$\operatorname{MSE}(T_1) \leq \operatorname{MSE}(T_2)$$

holds for all admissible distributions. Furthermore, an estimator is called most MSE-efficient if its MSE is the smallest among all estimators for all admissible distributions.
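The trade-off behind such comparisons, namely that a biased estimator can beat an unbiased one in MSE, can be illustrated by simulation. The sketch below compares the sample mean with the shrunk estimator $T_2 = \tfrac{n}{n+1}\overline{X}$; the shrinkage factor and all parameter values are illustrative choices, not part of any standard definition.

```python
import random

# MSE comparison: the unbiased sample mean T1 = X̄ versus the shrunk
# estimator T2 = c * X̄ with c = n/(n+1).  For data from N(mu, 1) with
# mu close to 0, T2 trades a small bias for lower variance and can
# undercut MSE(T1) = 1/n.  mu, n, reps are illustrative choices.
random.seed(2)
mu, n, reps = 0.5, 10, 50000
c = n / (n + 1)

mse1 = mse2 = 0.0
for _ in range(reps):
    xbar = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
    mse1 += (xbar - mu) ** 2
    mse2 += (c * xbar - mu) ** 2
mse1 /= reps
mse2 /= reps

print(round(mse1, 4), round(mse2, 4))  # here the biased estimator wins
```

For these parameters the theoretical values are $\operatorname{MSE}(T_1) = 1/n = 0.1$ and $\operatorname{MSE}(T_2) = (n + \mu^2)/(n+1)^2 \approx 0.0847$; note that for large $|\mu|$ the comparison would flip, which is exactly the dependence on the unknown parameter mentioned above.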

## Classification and related concepts

If estimation theory is interpreted as a statistical decision problem, then every point estimator is a decision function. The deviation of the decision function from the value to be estimated is weighted by a loss function, which indicates how great the "damage" caused by an estimate is. The loss function is then combined with the decision function to form the risk function, which indicates the mean damage incurred when using a given decision function.

In this context, the mean squared error is the risk function obtained when using the Gaussian loss function

$$L(\cdot, \vartheta) = (\cdot - g(\vartheta))^2.$$

The risk function then arises by taking the expected value.

With an analogous construction using the Laplace loss, one obtains the mean absolute error

$$\operatorname{E}_\vartheta \left( \left| T - g(\vartheta) \right| \right).$$
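Both risk functions can be approximated by simulation. The sketch below estimates the risk of the sample mean under squared (Gaussian) and absolute (Laplace) loss; for $\overline{X}_n$ with $X_i \sim \mathcal{N}(\gamma, 1)$ the closed forms are $1/n$ and $\sqrt{2/(\pi n)}$ respectively, and the parameter values are illustrative.

```python
import math
import random

# Risk of the sample mean under two loss functions:
#   Gaussian loss  (x - gamma)^2  ->  risk 1/n          (the MSE)
#   Laplace loss   |x - gamma|    ->  risk sqrt(2/(pi*n))  (mean absolute error)
# gamma, n, reps are illustrative choices.
random.seed(3)
gamma, n, reps = 0.0, 16, 40000

sq = ab = 0.0
for _ in range(reps):
    xbar = sum(random.gauss(gamma, 1.0) for _ in range(n)) / n
    sq += (xbar - gamma) ** 2
    ab += abs(xbar - gamma)

print(round(sq / reps, 4), 1 / n)                                   # Gaussian-loss risk vs 1/n
print(round(ab / reps, 4), round(math.sqrt(2 / (math.pi * n)), 4))  # Laplace-loss risk vs closed form
```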