# Normal approximation

The normal approximation is a method in probability theory for approximating the binomial distribution for large samples by the normal distribution. It is an application of the de Moivre-Laplace theorem and thus also of the central limit theorem.

## Formulation

According to the de Moivre-Laplace theorem,

$$\lim_{n \to \infty} \left( \operatorname{P}(S_n \leq x) - \Phi\left( \frac{x - np}{\sqrt{np(1-p)}} \right) \right) = 0,$$

where $S_n$ is a binomially distributed random variable with parameters $n$ and $p$, $\Phi(z)$ is the cumulative distribution function of the standard normal distribution, and the convergence is uniform in $x$. Setting $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, it follows that

$$
\begin{aligned}
P(x_1 \leq S_n \leq x_2) &= \underbrace{\sum_{k=x_1}^{x_2} \binom{n}{k} \cdot p^k \cdot (1-p)^{n-k}}_{\text{binomial distribution}} \\
&\approx \underbrace{\Phi\left( \frac{x_2 + 0.5 - \mu}{\sigma} \right) - \Phi\left( \frac{x_1 - 0.5 - \mu}{\sigma} \right)}_{\text{normal distribution}}.
\end{aligned}
$$
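The two sides of this formula can be compared numerically. The following is a minimal sketch using only the Python standard library; the function names `phi`, `binom_interval_exact` and `normal_interval_approx` are illustrative and not taken from any library, and the parameters $n = 100$, $p = 0.3$, $25 \leq S_n \leq 35$ are hypothetical values chosen only for the demonstration. $\Phi$ is evaluated through `math.erf` using $\Phi(z) = \tfrac{1}{2}\bigl(1 + \operatorname{erf}(z/\sqrt{2})\bigr)$.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_interval_exact(n: int, p: float, x1: int, x2: int) -> float:
    """Left-hand side: sum of binomial probabilities for k = x1, ..., x2.

    math.comb(n, k) is converted to float in the product, which is fine for
    moderate n (roughly n <= 1000); larger n would need log-space arithmetic.
    """
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x1, x2 + 1))


def normal_interval_approx(n: int, p: float, x1: int, x2: int) -> float:
    """Right-hand side: normal approximation with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return phi((x2 + 0.5 - mu) / sigma) - phi((x1 - 0.5 - mu) / sigma)


# Hypothetical parameters for illustration: n = 100, p = 0.3, 25 <= S_n <= 35.
exact = binom_interval_exact(100, 0.3, 25, 35)
approx = normal_interval_approx(100, 0.3, 25, 35)
print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

Here $\sigma^2 = 21 \geq 9$, so the two values should agree to about two decimal places.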

Adding and subtracting 0.5 is known as the "continuity correction". It treats each integer $k$ as the interval $[k - 0.5, k + 0.5]$: the value $x_1 - 0.5$ is de facto the upper limit $(x_1 - 1) + 0.5$ of the interval belonging to $x_1 - 1$. This gives a better approximation in the transition from the discrete to the continuous calculation.
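The effect of the continuity correction is most visible for a single value. The sketch below, assuming the hypothetical case $n = 20$, $p = 0.5$ and the event $S_n = 10$, evaluates the approximation with and without the correction; the helper names `phi` and `approx` are illustrative only. Without the correction, the interval $[10, 10]$ has length zero under the normal density, so the approximated probability is exactly 0.

```python
import math


def phi(z: float) -> float:
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx(n: int, p: float, x1: int, x2: int, correction: bool = True) -> float:
    """Normal approximation of P(x1 <= S_n <= x2), optionally continuity-corrected."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    c = 0.5 if correction else 0.0  # widen [x1, x2] by half a unit on each side
    return phi((x2 + c - mu) / sigma) - phi((x1 - c - mu) / sigma)


n, p, k = 20, 0.5, 10
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
with_cc = approx(n, p, k, k, correction=True)
without_cc = approx(n, p, k, k, correction=False)
print(f"exact={exact:.4f}  with={with_cc:.4f}  without={without_cc:.4f}")
```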

According to the Berry-Esseen theorem, the approximation is better the smaller the term

$$\frac{p(1-p)^3 + (1-p)p^3}{\bigl(p(1-p)\bigr)^{3/2}} \cdot \frac{1}{\sqrt{n}}$$

is. This term simplifies to $\dfrac{p^2 + (1-p)^2}{\sigma}$, and since $\tfrac{1}{2} \leq p^2 + (1-p)^2 \leq 1$, it is small exactly when $\sigma^2$ is large. The approximation is considered sufficiently good if $\sigma^2 = np(1-p) \geq 9$. If this is not the case, then at least $np \geq 5$ and $n(1-p) \geq 5$ should hold. The more asymmetric the binomial distribution, i.e. the greater the difference between $p$ and $1-p$, the larger $n$ should be. For $p$ close to 0, the Poisson approximation is more suitable. For $p$ close to 1 both approximations are poor, but then $S_n' = n - S_n$ can be considered instead, i.e. successes and failures in the binomial distribution are swapped. $S_n'$ is again binomially distributed with parameters $n$ and $1-p$ and can therefore be approximated with the Poisson approximation.
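The rules of thumb and the swap trick can be sketched in a few lines of Python. This is an illustration under stated assumptions: `berry_esseen_term` evaluates the term above without the absolute constant of the Berry-Esseen theorem, `rule_of_thumb` is a hypothetical classifier returning the labels "good", "acceptable" or "poor" for the three cases in the text, and the parameter values are chosen only for demonstration.

```python
import math


def berry_esseen_term(n: int, p: float) -> float:
    """The term from the text, without the theorem's absolute constant."""
    q = 1.0 - p
    return (p * q**3 + q * p**3) / (p * q) ** 1.5 / math.sqrt(n)


def rule_of_thumb(n: int, p: float) -> str:
    """Hypothetical classifier for the rules of thumb quoted in the text."""
    if n * p * (1 - p) >= 9:
        return "good"  # sigma^2 = np(1-p) >= 9
    if n * p >= 5 and n * (1 - p) >= 5:
        return "acceptable"  # weaker minimum: np >= 5 and n(1-p) >= 5
    return "poor"


def binom_pmf(n: int, k: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(lam: float, k: int) -> float:
    return math.exp(-lam) * lam**k / math.factorial(k)


print(rule_of_thumb(1000, 1 / 6))  # the die example: sigma^2 is about 138.9
print(rule_of_thumb(40, 0.2))      # sigma^2 = 6.4, but np = 8 and n(1-p) = 32
print(rule_of_thumb(100, 0.98))    # p close to 1: n(1-p) = 2

# For p close to 1, swap successes and failures: S_n' = n - S_n ~ Bin(n, 1 - p),
# so P(S_n = k) = P(S_n' = n - k), which is approximated by Poisson(n(1 - p)).
n, p, k = 100, 0.98, 99
direct = binom_pmf(n, k, p)
swapped_poisson = poisson_pmf(n * (1 - p), n - k)
print(f"P(S_n = {k}) = {direct:.4f}, Poisson via swap = {swapped_poisson:.4f}")
```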

## Example

A fair die is thrown 1000 times. We want the probability that a six is rolled between 100 and 150 times.

### Exact solution

To model this, define the probability space $(\Omega, \Sigma, P)$ with the result set $\Omega := \{0, \dotsc, 1000\}$, the number of sixes rolled. The σ-algebra is then canonically the power set $\Sigma := \mathcal{P}(\Omega)$ of the result set, and the probability distribution is the binomial distribution $P(\{k\}) := B_{n,p}(\{k\})$ with $n = 1000$ and $p = 1/6$. Then

$$P(100 \leq S_n \leq 150) = \sum_{i=100}^{150} \binom{1000}{i} p^i (1-p)^{1000-i} \approx 0.0837$$

So with a probability of around 8.4%, a six is rolled between 100 and 150 times.
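The exact sum can be evaluated without rounding error in each term. A minimal sketch, assuming a fair die so that $p = 1/6$: each summand $\binom{n}{i}(1/6)^i(5/6)^{n-i}$ equals $\binom{n}{i}\, 5^{n-i} / 6^n$, so the whole numerator is an exact integer and only the final division rounds. The function name `die_sixes_prob` is hypothetical.

```python
from math import comb


def die_sixes_prob(n: int, a: int, b: int) -> float:
    """P(a <= S_n <= b) for the number S_n of sixes in n throws of a fair die.

    Uses C(n, i) (1/6)^i (5/6)^(n-i) = C(n, i) * 5^(n-i) / 6^n, so the sum is
    computed with exact Python integers and rounded only once at the end.
    """
    numerator = sum(comb(n, i) * 5 ** (n - i) for i in range(a, b + 1))
    return numerator / 6**n


p_exact = die_sixes_prob(1000, 100, 150)
print(f"P(100 <= S_n <= 150) = {p_exact:.4f}")
```

As a sanity check, summing over all $i$ from 0 to 1000 gives $(1 + 5)^{1000} / 6^{1000} = 1$ by the binomial theorem.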

### Approximate solution

Here $\sigma^2 = np(1-p) \approx 138.9 > 9$, so the approximate solution is sufficiently accurate. Hence

$$
\begin{aligned}
P(100 \leq S_n \leq 150) &\approx \Phi\left( \frac{150 + 0.5 - 1000/6}{\sigma} \right) - \Phi\left( \frac{100 - 0.5 - 1000/6}{\sigma} \right) \\
&\approx \Phi(-1.3718) - \Phi(-5.6993) \approx 1 - \Phi(1.3718) \approx 0.0851
\end{aligned}
$$

The values of $\Phi(z)$ are usually looked up in a table, because the density of the standard normal distribution has no antiderivative in closed form. Nevertheless, the approximate solution is numerically more convenient, since no extensive calculations of binomial coefficients have to be carried out.
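In software, a table is not needed: $\Phi$ can be evaluated numerically. The sketch below reproduces the approximate solution with `statistics.NormalDist` from the Python standard library (available since Python 3.8), whose `cdf` method computes the normal distribution function; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1000, 1 / 6
mu = n * p
sigma = sqrt(n * p * (1 - p))

std = NormalDist()  # standard normal distribution; std.cdf plays the role of Phi
z_upper = (150 + 0.5 - mu) / sigma  # continuity-corrected upper limit
z_lower = (100 - 0.5 - mu) / sigma  # continuity-corrected lower limit
approx = std.cdf(z_upper) - std.cdf(z_lower)

print(f"sigma^2 = {sigma**2:.1f}")
print(f"z_upper = {z_upper:.4f}, z_lower = {z_lower:.4f}")
print(f"P(100 <= S_n <= 150) ~ {approx:.4f}")
```

Equivalently, `NormalDist(mu, sigma).cdf(150.5) - NormalDist(mu, sigma).cdf(99.5)` skips the explicit standardization.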
