Confidence interval
A confidence interval, abbreviated CI (also called a confidence range or expected range), is in statistics an interval that indicates the precision of the location estimate of a parameter (for example, a mean). The confidence interval specifies the range that, if a random experiment is repeated infinitely often, contains the true location of the parameter with a certain probability (the confidence level).
A frequently used confidence level is 95%, so that in this case, if the random experiment is repeated in the same way, a 95% confidence interval will cover the unknown “true” parameter in approximately 95% of all cases.
The frequently used formulation that the true value lies with 95% probability in the confidence interval, that is, in the interval actually calculated, is strictly speaking incorrect, since the true value is assumed to be given (fixed) and not stochastic.
On closer inspection, it is the upper and lower bounds of the confidence interval that are stochastic, because they are functions of random variables. Consequently, the correct formulation is: when a confidence interval is calculated, its bounds enclose the true parameter in 95% of cases and fail to do so in 5% of cases. The confidence interval is constructed in such a way that the true parameter $\theta$ is covered with probability $1 - \alpha$ if the estimation procedure is repeated for many samples.
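The repeated-sampling interpretation can be checked with a small simulation. The following is a minimal sketch, assuming a normally distributed population with known standard deviation; the values of `mu`, `sigma`, `n` and the seed are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist

def coverage_rate(mu=10.0, sigma=2.0, n=25, level=0.95, repeats=5000, seed=1):
    """Fraction of simulated intervals that cover the fixed true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided normal quantile
    half_width = z * sigma / n ** 0.5               # assumes sigma is known
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # The interval bounds are random; mu stays fixed.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / repeats

print(coverage_rate())
```

The printed relative frequency should lie close to 0.95. Each individual interval either covers `mu` or does not; only the long-run frequency is tied to the confidence level.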
The estimation of parameters with the help of confidence intervals is called interval estimation, and the corresponding estimator a range or interval estimator. One advantage over point estimators is that the significance can be read directly from a confidence interval: a wide interval for a given confidence level indicates a small sample size or a high variability in the population.
Confidence intervals must be distinguished from prediction intervals as well as from confidence and prediction bands.
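Definition
Let $X_1, \dots, X_n$ be a random sample, $\theta$ an unknown parameter, and $\gamma \in (0, 1)$ a confidence level. The confidence interval for the parameter $\theta$ is the interval $[U, V]$ with the two limits $U$ and $V$, both statistics of the sample, such that:
$$P(U \le \theta \le V) = \gamma$$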
Explanations
Confidence level
A 95% confidence interval ($\gamma = 0.95$) is often used in statistics. The confidence level $\gamma$ is also called the coverage probability.
Interval limits
For the statistics $U$ and $V$, $U < V$ is always assumed. The realizations $u$ and $v$ of $U$ and $V$ form the estimation interval $[u, v]$. The boundaries of the confidence interval are functions of the random sample and are therefore also random. In contrast, the unknown parameter $\theta$ is fixed.
Confidence intervals are random
If the random experiment is repeated in an identical manner, then a $\gamma$ confidence interval will cover the unknown parameter $\theta$ in $\gamma \cdot 100\%$ of all cases. Since the unknown parameter $\theta$ is not a random variable, however, it cannot be said that $\theta$ lies in a given confidence interval with probability $\gamma$. Such an interpretation is reserved for the Bayesian counterpart of the confidence interval, the so-called credible interval.
Often one sets $\gamma = 1 - \alpha$. The probability $1 - \alpha$ can be interpreted as a relative frequency: if one uses intervals that each have the same level $1 - \alpha$ for a large number of confidence estimates, the relative frequency with which the specific intervals cover the parameter approaches the value $1 - \alpha$.
Formal definition
Framework
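Let a statistical model $(\mathcal{X}, \mathcal{A}, (P_\theta)_{\theta \in \Theta})$ and a function
 $g\colon \Theta \to \Gamma$
be given, where $g$ in the parametric case is also called the parameter function. The set $\Gamma$ contains the values that can be the result of an estimate. Mostly $\Gamma = \mathbb{R}$.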
Confidence range
A mapping
 $C\colon \mathcal{X} \to \mathcal{P}(\Gamma)$
is called a confidence range, confidence region, range estimator, or region estimator if it meets the following condition:
 For all $\theta \in \Theta$, the set $\{x \in \mathcal{X} : g(\theta) \in C(x)\}$ is contained in $\mathcal{A}$. (M)
A confidence range is therefore a mapping that initially assigns an arbitrary subset of $\Gamma$ to each observation $x \in \mathcal{X}$ (here $\mathcal{P}(\Gamma)$ is the power set of the set $\Gamma$, i.e. the set of all subsets of $\Gamma$).
The condition (M) ensures that a probability can be assigned to all sets $\{x \in \mathcal{X} : g(\theta) \in C(x)\}$. This is needed to define the confidence level.
Confidence interval
If $\Gamma = \mathbb{R}$ and $C(x)$ is an interval for each $x \in \mathcal{X}$, then $C$ is also called a confidence interval.
If confidence intervals are defined in the form
 $C(x) = [l(x), u(x)]$,
then $l$ and $u$ are also called the lower and upper confidence bounds.
Confidence level and level of error
Let a confidence range $C$ be given. Then $C$ is called a confidence range at confidence level (or security level) $1 - \alpha$ if
 $P_\theta(\{x \in \mathcal{X} : g(\theta) \in C(x)\}) \ge 1 - \alpha$ for all $\theta \in \Theta$.
The value $\alpha$ is then also called the level of error. A more general formulation is possible with form hypotheses (see Form hypotheses, section "Confidence ranges for form hypotheses").
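For the above-mentioned special cases of confidence ranges with a lower confidence bound $l$ and an upper confidence bound $u$, the result is
 $P_\theta(l(X) \le g(\theta) \le u(X)) \ge 1 - \alpha$
or, for one-sided bounds,
 $P_\theta(l(X) \le g(\theta)) \ge 1 - \alpha$
and
 $P_\theta(g(\theta) \le u(X)) \ge 1 - \alpha$.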
Description of the procedure
One is interested in the unknown parameter $\theta$ of a population. This is estimated by an estimator $\hat{\theta}$ from a sample of size $n$. It is assumed that the sample is a simple random sample that roughly reflects the population and that the estimate should therefore be close to the true parameter. The estimator is a random variable with a distribution that contains the parameter $\theta$.
First of all, with the help of this distribution, one can specify an interval that covers the unknown true parameter $\theta$ with probability $1 - \alpha$. If, for example, we determine the 95% confidence interval for the true expected value $\mu$ of a population, this means that we are determining a confidence interval that contains the expected value for an average of 95 out of 100 random samples of the same size.
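Example
The method can be demonstrated for a normally distributed characteristic with unknown expected value $\mu$ and known variance $\sigma^2$: the expected value of this normal distribution is to be estimated. The unbiased estimator used is the sample mean $\bar{X}$.
The expected value of the population is estimated using our sample:
 Estimator: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
 Point estimate: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
where the random variable $X_i$ stands for the $i$-th observation (before the sample is drawn). The sample mean follows a normal distribution with expected value $\mu$ and variance $\sigma^2/n$ (see Sample mean, section Properties):
$$\bar{X} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$$
The limits of the central fluctuation interval
$$[\mu - c,\ \mu + c]$$
which contains $\bar{X}$ with probability $1 - \alpha$, are determined by the relationship
$$P(\mu - c \le \bar{X} \le \mu + c) = 1 - \alpha$$
Standardizing to the standard normal distribution gives, for the standardized random variable
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
the probability
$$P(-z_{1-\alpha/2} \le Z \le z_{1-\alpha/2}) = 1 - \alpha$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, so that $c = z_{1-\alpha/2}\,\sigma/\sqrt{n}$. Solving for the unknown parameter $\mu$ yields
$$P\left(\bar{X} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
and thus the $(1-\alpha)$ confidence interval for $\mu$:
$$\left[\bar{X} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
The estimation interval, the realization of a confidence interval based on a specific sample, then results as
$$\left[\bar{x} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$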
The limits of the estimation interval, however, depend on $\bar{x}$ and thus change from sample to sample. If the sample turns out to be extreme, the interval does not cover the parameter. This is the case in $\alpha \times 100\%$ of all samples; that is, the interval determined by $\bar{X}$ covers the true parameter with a probability of $1 - \alpha$.
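The estimation interval for a normally distributed characteristic with known variance can be computed in a few lines. This is a minimal sketch; the function name `z_interval` and the input values (sample mean 50, standard deviation 4, sample size 64) are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, alpha=0.05):
    """Estimation interval for the expected value when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) quantile of N(0, 1)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_interval(x_bar=50.0, sigma=4.0, n=64)
print(f"[{lower:.2f}, {upper:.2f}]")
```

With these inputs the half-width is about 0.98, so the interval runs from roughly 49.02 to 50.98.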
The width of the confidence interval is of particular interest. This is determined by the standard deviation of the estimator and the chosen confidence level. Increasing the sample size can decrease the width. As a rule, a confidence interval that is as narrow as possible is desirable, because this indicates an accurate estimate given a constant confidence level.
Half the width of the confidence interval is called the absolute error $e$. In the above case,
$$e = z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$$
The absolute error is a measure of the accuracy of the estimate (width of the confidence interval: $2e$).
The absolute error is important when determining the required sample size for a given confidence level and length of the confidence interval. The question is: what sample size is needed to estimate a parameter (e.g. the arithmetic mean) with a given accuracy and a given degree of certainty?
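For the normal case with known standard deviation $\sigma$, the absolute error (half the interval width) is $z_{1-\alpha/2}\,\sigma/\sqrt{n}$, and requiring it to be at most a target value $e$ gives $n \ge (z_{1-\alpha/2}\,\sigma/e)^2$. A minimal sketch of this calculation follows; the function name and the example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, abs_error, alpha=0.05):
    """Smallest n whose absolute error z * sigma / sqrt(n) is at most abs_error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / abs_error) ** 2)

# Assumed inputs: sigma = 4, target absolute error 1, confidence level 95%.
print(required_sample_size(sigma=4.0, abs_error=1.0))
```

Because the sample size enters through $\sqrt{n}$, halving the absolute error requires roughly four times as many observations.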
Selected estimation intervals
Overview for continuous distributions
An overview of all cases with normally distributed characteristics can be found in the article Normal Distribution Model .
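Expected value of a normally distributed characteristic with known variance $\sigma^2$:
$$\left[\bar{x} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
Here $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution.

Expected value of a normally distributed characteristic with unknown variance:
$$\left[\bar{x} - t_{1-\alpha/2;\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{1-\alpha/2;\,n-1}\frac{s}{\sqrt{n}}\right]$$
The variance of the population is estimated by the corrected sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Here $t_{1-\alpha/2;\,n-1}$ is the $(1-\alpha/2)$ quantile of the t-distribution with $n-1$ degrees of freedom. For sufficiently large $n$ (a common rule of thumb is $n > 30$), the quantile of the t-distribution can be replaced approximately by the corresponding quantile of the standard normal distribution.

Expected value of a characteristic with unknown distribution and unknown variance:
$$\left[\bar{x} - z_{1-\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2}\frac{s}{\sqrt{n}}\right]$$
If $n$ is sufficiently large, the confidence interval can be determined on the basis of the central limit theorem.

Standard deviation of a normally distributed characteristic:
$$\left[\sqrt{\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2;\,n-1}}},\ \sqrt{\frac{(n-1)\,s^2}{\chi^2_{\alpha/2;\,n-1}}}\right]$$
Here $\chi^2_{p;\,k}$ is the $p$-quantile of the chi-square distribution with $k$ degrees of freedom.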
Discrete distributions
Confidence intervals for the parameter $p$ of the binomial distribution are described in a separate article.
The so-called Clopper-Pearson confidence interval can be determined with the help of the beta or F distribution. This confidence interval is also called exact because the required confidence level is actually maintained. In the case of approximation methods that are (mostly) based on the approximation of the binomial distribution by the normal distribution, the confidence level is often not maintained.
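That normal-approximation intervals can miss the nominal level can be illustrated by computing their actual coverage exactly. The sketch below uses the simple Wald interval $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ as one representative approximation method; this choice and the values $n = 20$, $p = 0.1$ are assumptions made for illustration and are not taken from the text.

```python
from math import comb, sqrt
from statistics import NormalDist

def wald_interval(k, n, alpha=0.05):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = k / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def actual_coverage(n, p, alpha=0.05):
    """Exact probability that the Wald interval covers p, summed over all outcomes k."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = wald_interval(k, n, alpha)
        if lower <= p <= upper:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

print(actual_coverage(n=20, p=0.1))
```

For these inputs the computed coverage falls clearly below the nominal 95%, mainly because the outcome $k = 0$ produces a degenerate interval that cannot contain $p$.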
If the number of elements in the population is known, a confidence interval for the parameter $p$ can also be specified for an urn model without replacement (with the help of a correction factor).
Confidence intervals and hypothesis tests
The terms confidence range and statistical test are dual to one another; under general conditions, statistical tests for corresponding point hypotheses can be obtained from a confidence range for a parameter and vice versa:
If the null hypothesis $H_0\colon \theta = \theta_0$ is tested for a parameter $\theta$, then the null hypothesis is not rejected at significance level $\alpha$ if the corresponding $(1-\alpha)$ confidence interval, calculated with the same data, contains the value $\theta_0$. Therefore, confidence intervals sometimes replace hypothesis tests.
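The duality can be made concrete for the expected value of a normal distribution with known standard deviation, where the two-sided z-test and the z-interval use the same quantile. The following is a sketch under that assumption; the function names and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, alpha=0.05):
    """(1 - alpha) estimation interval for mu with known sigma."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

def z_test_rejects(x_bar, mu_0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu_0 at significance level alpha."""
    z_stat = (x_bar - mu_0) / (sigma / sqrt(n))
    return abs(z_stat) > NormalDist().inv_cdf(1 - alpha / 2)

x_bar, sigma, n = 50.0, 4.0, 64
lower, upper = z_interval(x_bar, sigma, n)
for mu_0 in (49.5, 51.5):
    inside = lower <= mu_0 <= upper
    rejected = z_test_rejects(x_bar, mu_0, sigma, n)
    # H0 is rejected exactly when mu_0 lies outside the interval.
    print(mu_0, "inside interval:", inside, "| H0 rejected:", rejected)
```

In this setting the hypothesized value 49.5 lies inside the interval and is not rejected, while 51.5 lies outside and is rejected.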
For example, in regression analysis one tests whether, in the multiple linear regression model with the estimated regression hyperplane
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$
the true regression coefficients $\beta_j$ are zero (see Global F test). If the hypothesis is not rejected, the corresponding regressors are likely to be irrelevant in explaining the dependent variable $y$. Corresponding information is provided by the confidence interval for a regression coefficient: if the confidence interval covers zero, the regression coefficient is not statistically different from zero at significance level $\alpha$.
The concepts of unbiasedness and of the uniformly most powerful test can be transferred to confidence ranges.
Examples of a confidence interval
Example 1
A company wants to introduce a new detergent. In order to sound out buyer acceptance, the detergent is placed in a test supermarket. This action is intended to estimate the average daily sales in a supermarket of this size. The daily sales are now defined as a random variable $X$ [units] with the unknown parameters expected value $\mu$ and variance $\sigma^2$. Based on long-term observations, it is assumed that $X$ is approximately normally distributed. The market research department has found a confidence level of 0.95 (95%) to be sufficient. Daily sales are then recorded for 16 days. The results are:
Day    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16

Sales  110  112  106  90   96   118  108  114  107  90   85   84   113  105  90   104
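For a normally distributed population with unknown variance, the confidence interval for the expected value is given as
$$\left[\bar{x} - t_{1-\alpha/2;\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{1-\alpha/2;\,n-1}\frac{s}{\sqrt{n}}\right]$$
The mean of the sample is
$$\bar{x} = \frac{1}{16}(110 + 112 + \dots + 104) = \frac{1632}{16} = 102$$
and the variance of the sample is
$$s^2 = \frac{1}{15}\sum_{i=1}^{16}(x_i - 102)^2 = \frac{1856}{15} \approx 123.73$$
The quantile of the t-distribution with 15 degrees of freedom is
$$t_{0.975;\,15} \approx 2.131$$
The value for $t$ is not easy to calculate by hand and is therefore usually read from a table.
The 95% confidence interval is then calculated as
$$\left[102 - 2.131\cdot\frac{\sqrt{123.73}}{4},\ 102 + 2.131\cdot\frac{\sqrt{123.73}}{4}\right] \approx [96.07,\ 107.93]$$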
On average, 95% of the estimated intervals contain the true mean $\mu$, i.e. the average daily sales of detergent bottles in comparable supermarkets. For this specific interval, however, the statement that it contains the true mean with 95% probability does not apply. All we know is that this interval comes from a set (of intervals) of which 95% contain the true mean.
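The calculation for this example can be retraced with the 16 observed daily sales. This is a minimal sketch using only the Python standard library; since the standard library offers no t-distribution, the quantile 2.131 is entered as a table value, in line with the remark above that it is read from a table.

```python
from math import sqrt
from statistics import mean, variance

sales = [110, 112, 106, 90, 96, 118, 108, 114, 107, 90, 85, 84, 113, 105, 90, 104]

n = len(sales)
x_bar = mean(sales)      # sample mean
s2 = variance(sales)     # corrected sample variance (divisor n - 1)
t_quantile = 2.131       # 0.975 quantile of the t-distribution, 15 degrees of freedom (table value)

half_width = t_quantile * sqrt(s2 / n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"mean = {x_bar}, s^2 = {s2:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

The sketch yields a sample mean of 102, a sample variance of about 123.73 and an interval of roughly [96.07, 107.93].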
Example 2
A company has delivered a lot (batch) of 6000 pieces (e.g. screws) to a customer. The customer carries out an incoming inspection by random sampling in accordance with the international standard ISO 2859-1. Here, for example, 200 screws (the number depends on the selected AQL) are drawn at random from across the entire lot and checked for compliance with the agreed requirements (quality characteristics). Of the 200 screws tested, 10 did not meet the requirements. By calculating the confidence interval (Excel function BETAINV), the customer can estimate how large the expected proportion of defective screws in the whole lot is: at a confidence level of 95%, the Clopper-Pearson confidence interval for the proportion of defective screws in the batch is [2.4%, 9%] (parameters: n = 200, k = 10).
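The interval in this example can be approximated without a beta quantile function. The sketch below does not use BETAINV; instead it solves the defining tail conditions of the Clopper-Pearson interval, $P(X \ge k \mid p_{\text{lower}}) = \alpha/2$ and $P(X \le k \mid p_{\text{upper}}) = \alpha/2$, by bisection. The helper names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iterations=60):
    """Root of a decreasing function f on [lo, hi] with f(lo) > 0 > f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Clopper-Pearson interval for a binomial proportion via tail probabilities."""
    # Lower bound: P(X >= k | p) = alpha / 2 (set to 0 when k == 0).
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: P(X <= k | p) = alpha / 2 (set to 1 when k == n).
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

lower, upper = clopper_pearson(k=10, n=200)
print(f"[{lower:.1%}, {upper:.1%}]")
```

For n = 200 and k = 10 this gives bounds of about 2.4% and 9.0%, in agreement with the interval stated above.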
Literature
 Ulrich Krengel: Introduction to probability theory and statistics. 8th edition. Vieweg, 2005.
 Joachim Hartung: Statistics. 14th edition. Oldenbourg, 2005.
Web links
 Confidence intervals and hypothesis tests
 Confidence intervals explained as simply as possible (PDF; 109 kB)
 Java applet for evaluating your own measurement series
 Interactive illustration
References
 Significance Test Controversy (English)
 What is the Real Result in the Target Population? In: Statistics in Brief: Confidence Intervals. PMC 2947664 (free full text) (English)
 Leonhard Held and Daniel Sabanés Bové: Applied Statistical Inference: Likelihood and Bayes. Springer, Heidelberg / New York / Dordrecht / London 2014, ISBN 9783642378867, p. 56.
 Leonhard Held and Daniel Sabanés Bové: Applied Statistical Inference: Likelihood and Bayes. Springer, Heidelberg / New York / Dordrecht / London 2014, ISBN 9783642378867, p. 57.
 Karl Mosler and Friedrich Schmid: Probability calculation and conclusive statistics. Springer-Verlag, 2011, p. 214.
 Hans-Otto Georgii: Stochastics. Introduction to probability theory and statistics. 4th edition. Walter de Gruyter, Berlin 2009, ISBN 9783110215267, p. 229, doi:10.1515/9783110215274.
 Ludger Rüschendorf: Mathematical Statistics. Springer Verlag, Berlin / Heidelberg 2014, ISBN 9783642419966, pp. 230-231, doi:10.1007/9783642419973.
 Ludger Rüschendorf: Mathematical Statistics. Springer Verlag, Berlin / Heidelberg 2014, ISBN 9783642419966, p. 245, doi:10.1007/9783642419973.
 See, for example, Hartung, chap. IV, Sections 3.1.1 and 3.2. The Wilson and Clopper-Pearson intervals as well as the correction factor for the hypergeometric distribution are discussed there.
 Acceptance sampling inspection based on the number of defective units or defects [attribute inspection], Part 1: Sampling plans for the inspection of a series of lots, indexed by acceptance quality limit (AQL).