The empirical variance, also called the sample variance (obsolete: empirical scattering square) or simply the variance (Latin variantia = "diversity", from variare = "to change, to be different"), is a statistical measure of the spread of the values of a sample and, in descriptive statistics, a key figure of a sample. It is one of the measures of dispersion and describes the mean squared deviation of the individual measured values from the empirical mean. It thus represents a kind of average squared deviation. The positive square root of the empirical variance is the empirical standard deviation, which is the most common measure of dispersion.
The terms "variance", "sample variance" and "empirical variance" are not used consistently in the literature. In general, a distinction must be made between
- the variance (in the sense of probability theory) as a key figure of a probability distribution or of the distribution of a random variable,
- the sample variance (in the sense of inductive statistics) as an estimator for the variance (in the sense of probability theory),
- the empirical variance discussed here as a key figure of a specific sample, i.e. of several numbers.
A precise delimitation and the relationships between these terms can be found in the section Relationship of the concepts of variance.
The variance of a finite population of size $N$ is a measure of the dispersion of the individual values $x_i$, $i \in \{1, \ldots, N\}$, around the population mean $\mu$ and is defined as
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
- with the population mean $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$.
Since the population variance is unknown in practical situations and still has to be determined, the empirical variance is often used instead. This is necessary in particular when the population is so large that not every individual in it can be counted.
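Let a sample of $n$ elements $x_1, \ldots, x_n$ be given, and let
$$\bar{x} := \frac{1}{n} (x_1 + \cdots + x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i$$
denote the empirical mean of the sample. This empirical mean $\bar{x}$ is an estimate of the population mean $\mu$. The empirical variance can be defined in two ways. Either the empirical variance of the sample is defined as the sum of the squared deviations divided by the number of measured values:
$$\tilde{s}^2 := \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,$$
or it is defined, in a slightly modified form, as the sum of the squared deviations divided by the number of degrees of freedom:
$$s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
The empirical variance thus represents a kind of "mean squared deviation". It is an estimator for the population variance $\sigma^2$. The following representations follow directly from the definitions:
- $\tilde{s}^2 = \frac{n-1}{n} s^2$ and, respectively, $s^2 = \frac{n}{n-1} \tilde{s}^2$.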
This slightly modified form is often referred to as the sample variance and is preferred by software packages such as SPSS and R. If the sample shows no variability, i.e. $x_1 = x_2 = \ldots = x_n = \bar{x}$, then the variance is $0$. The division by $(n-1)$ instead of $n$ in the modified form of the empirical variance can be explained intuitively as follows: because of the center of gravity property of the empirical mean, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, the last deviation $(x_n - \bar{x})$ is already determined by the first $(n-1)$ deviations. Consequently, only $(n-1)$ deviations vary freely, and one therefore averages by dividing by the number of degrees of freedom $(n-1)$.
If only "the" empirical variance is mentioned, one must pay attention to which convention or definition applies in the given context. Neither the naming of the definitions nor the corresponding notation is uniform in the literature, but the term empirical variance is often used for the unmodified form $\tilde{s}^2$ and the term sample variance for the modified form $s^2$. The notation $s^2_{\text{emp}}$ is also found for $\tilde{s}^2$, while $s^2$ is also written as $s^2_{n-1}$ or $s^{*2}$. Some authors refer to $\tilde{s}^2$ as the mean squared deviation from the empirical mean and to $s^2$ as the theoretical variance or inductive variance, in contrast to $\tilde{s}^2$ as the empirical variance.
$s^2$ is also called the unbiased sample variance (and $\tilde{s}^2$ the biased sample variance), because $s^2$ is an unbiased estimator for the variance $\sigma^2$.
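As a minimal sketch of the two conventions described above, the following Python snippet computes both forms by hand and compares them with the standard library functions `statistics.pvariance` (division by the number of values) and `statistics.variance` (division by the degrees of freedom). The function names `empirical_variance` and `sample_variance` and the data are hypothetical and chosen only for illustration.

```python
import statistics


def empirical_variance(xs):
    """Unmodified form: sum of squared deviations divided by n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def sample_variance(xs):
    """Modified form: sum of squared deviations divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample
n = len(data)

print("divide by n:    ", empirical_variance(data), statistics.pvariance(data))
print("divide by n - 1:", sample_variance(data), statistics.variance(data))

# The two forms differ only by the factor (n - 1) / n.
print("rescaled:       ", (n - 1) / n * sample_variance(data))
```

Which convention a given tool uses by default is worth checking in its documentation before comparing results across packages.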
Empirical variance for frequency data
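The empirical standard deviation is also a measure of how far the sample spreads on average around the empirical mean. Let $h_j = h(a_j)$ be the absolute frequency of the characteristic value $a_j$, i.e. the number of values $x_i$ for which $x_i = a_j$. Furthermore, let $f_j = h_j / n$ be the relative frequency of $a_j$, i.e. the proportion of values for which $x_i = a_j$ applies. The absolute frequency distribution $h_1, \ldots, h_k$ and the relative frequency distribution $f_1, \ldots, f_k$ are often summarized in a frequency table. The characteristic values $a_1, \ldots, a_k$ together with the frequencies $h_j$ or $f_j$ are also referred to as frequency data. For frequency data with the characteristic values $a_1, \ldots, a_k$ and relative frequencies $f_1, \ldots, f_k$, the empirical variance is calculated as follows:
$$\tilde{s}^2 = \frac{1}{n} \sum_{j=1}^{k} (a_j - \bar{x})^2 h_j = \sum_{j=1}^{k} (a_j - \bar{x})^2 f_j, \quad \text{with} \quad \bar{x} = \sum_{j=1}^{k} f_j a_j = \frac{1}{n} \sum_{j=1}^{k} h_j a_j.$$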
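The calculation from frequency data can be sketched in Python as follows. This is an illustration under assumptions: the raw data are hypothetical, the helper name `variance_from_frequencies` is not taken from any library, and the unmodified form (division by the number of values) is used. The result is compared with the variance computed directly from the raw values.

```python
from collections import Counter


def variance_from_frequencies(values, rel_freqs):
    """Empirical variance (divide-by-n form) from distinct values
    and their relative frequencies."""
    mean = sum(f * a for a, f in zip(values, rel_freqs))
    return sum(f * (a - mean) ** 2 for a, f in zip(values, rel_freqs))


raw = [1, 2, 2, 3, 3, 3, 4, 4]  # hypothetical sample
n = len(raw)

# Build the frequency table: distinct values with absolute frequencies.
counts = Counter(raw)
values = sorted(counts)
rel_freqs = [counts[a] / n for a in values]

var_freq = variance_from_frequencies(values, rel_freqs)

# Reference: the same quantity computed from the raw values.
mean_raw = sum(raw) / n
var_raw = sum((x - mean_raw) ** 2 for x in raw) / n

print(var_freq, var_raw)
```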
Behavior during transformations
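The variance does not change when the data are shifted by a constant value $c$: if $x = (x_1, \ldots, x_n)$ and $y = (x_1 + c, \ldots, x_n + c)$, then
- $\tilde{s}^2(x) = \tilde{s}^2(y)$ as well as $s^2(x) = s^2(y)$.
If the data are scaled by a factor $\alpha$, that is $y = \alpha x$, then
- $\tilde{s}^2(y) = \alpha^2 \tilde{s}^2(x)$ as well as $s^2(y) = \alpha^2 s^2(x)$.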
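The behavior under shifting and scaling can be checked numerically. The following sketch uses a hypothetical sample and hypothetical constants `c` and `alpha`; the helper `variance` with its `divisor_offset` parameter is an illustrative construction, not a library function.

```python
def variance(xs, divisor_offset):
    """Sum of squared deviations divided by (n - divisor_offset).

    divisor_offset=0 gives the unmodified form, divisor_offset=1 the
    modified form with n - 1 degrees of freedom.
    """
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - divisor_offset)


x = [1.0, 2.0, 4.0, 7.0]  # hypothetical sample
c = 10.0                  # hypothetical shift
alpha = -3.0              # hypothetical scaling factor

shifted = [xi + c for xi in x]
scaled = [alpha * xi for xi in x]

for offset in (0, 1):
    base = variance(x, offset)
    print("shifted:", variance(shifted, offset), "expected:", base)
    print("scaled: ", variance(scaled, offset), "expected:", alpha ** 2 * base)
```

A negative factor was chosen on purpose: because the factor enters squared, its sign does not matter.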
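As an average squared deviation
In the analysis of variance, the variance is often referred to as the "mean" or "average" squared deviation $MQ$:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{SQ}{n-1} = MQ,$$
where $SQ$ denotes the sum of the squared deviations and $n-1$ the number of degrees of freedom.
The mean squared deviations of the respective variables are summarized in a so-called analysis of variance table.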
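Representation by means of the displacement theorem
Another representation can be obtained from the displacement theorem, according to which
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \left( \sum_{i=1}^{n} x_i^2 \right) - n \bar{x}^2$$
applies. Multiplying by $\frac{1}{n}$ gives
$$\tilde{s}^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2.$$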
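The displacement theorem can be verified on a small example. The sketch below uses exact rational arithmetic (`fractions.Fraction`) so that both sides can be compared for equality; the data and function names are hypothetical.

```python
from fractions import Fraction


def squared_deviations_direct(xs):
    """Sum of squared deviations from the empirical mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs)


def squared_deviations_shifted(xs):
    """Same quantity via the displacement theorem:
    sum of squares minus n times the squared mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) - n * mean ** 2


data = [Fraction(v) for v in (3, 5, 8, 12)]  # hypothetical sample

direct = squared_deviations_direct(data)
shifted = squared_deviations_shifted(data)
print(direct, shifted, direct == shifted)

# Dividing by n yields the unmodified empirical variance.
print(direct / len(data))
```

With floating-point numbers, the second form subtracts two potentially large and nearly equal quantities and can lose precision, so the direct form is usually the safer choice in practice.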
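Representation without the empirical mean
Another representation that does not require the empirical mean is
$$\tilde{s}^2 = \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2 \quad \text{or} \quad s^2 = \frac{1}{2n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2.$$
If the arithmetic mean $\bar{x}$ of the observed values is added and subtracted (i.e. zero is inserted) in the summands of the double sum
$$\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2,$$
then
$$\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x} + \bar{x} - x_j)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x})^2 + 2 \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - \bar{x})(\bar{x} - x_j) + \sum_{i=1}^{n} \sum_{j=1}^{n} (\bar{x} - x_j)^2 = 2n \sum_{i=1}^{n} (x_i - \bar{x})^2,$$
because the middle term vanishes by the center of gravity property. This is equivalent to
$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2.$$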
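The representation by pairwise differences can be checked with a short sketch. It assumes the unmodified form, i.e. the double sum divided by twice the squared number of values; the data and function names are hypothetical, and exact fractions are used so that equality can be tested.

```python
from fractions import Fraction


def variance_pairwise(xs):
    """Empirical variance (divide-by-n form) from all pairwise
    squared differences, without using the empirical mean."""
    n = len(xs)
    total = sum((xi - xj) ** 2 for xi in xs for xj in xs)
    return Fraction(total) / (2 * n * n)


def variance_centered(xs):
    """Empirical variance (divide-by-n form) via the empirical mean."""
    n = len(xs)
    mean = Fraction(sum(xs)) / n
    return sum((x - mean) ** 2 for x in xs) / n


data = [3, 5, 8, 12]  # hypothetical sample

print(variance_pairwise(data), variance_centered(data))
```

The double sum visits all ordered pairs, so its cost grows quadratically with the sample size, whereas the centered form needs only two passes over the data.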
Empirical standard deviation
The empirical standard deviation, also known as the sample spread or sample standard deviation, denotes the positive square root of the empirical variance, i.e.
$$\tilde{s} := \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{or} \quad s := \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
In contrast to the empirical variance, the empirical standard deviation has the same units as the empirical mean or the sample itself. As with the empirical variance, the naming and notation of the empirical standard deviation are not uniform. The empirical standard deviation should be distinguished from the standard deviation in the sense of probability theory. The latter is a key figure of a probability distribution or of the distribution of a random variable, whereas the empirical standard deviation is a key figure of a sample.
Empirical coefficient of variation
The empirical coefficient of variation is a dimensionless measure of dispersion and is defined as the empirical standard deviation divided by the empirical mean, i.e.
$$v := \frac{s}{\bar{x}}, \quad \bar{x} \neq 0.$$
In contrast to the standard deviation, $v$ is a dimensionless measure of dispersion and therefore carries no units. Its advantage is that it is expressed as a percentage of the empirical mean $\bar{x}$.
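The difference between the unit-bearing standard deviation and the dimensionless coefficient of variation can be illustrated with a small sketch. The measurements are hypothetical, and the sketch assumes the standard deviation based on $n - 1$ degrees of freedom; the same reasoning applies to the form that divides by $n$.

```python
import math


def sample_std(xs):
    """Empirical standard deviation (n - 1 in the denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def coefficient_of_variation(xs):
    """Empirical standard deviation divided by the empirical mean."""
    mean = sum(xs) / len(xs)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return sample_std(xs) / mean


lengths_cm = [10.0, 12.0, 14.0]                # hypothetical measurements
lengths_mm = [10.0 * v for v in lengths_cm]    # same data in millimetres

# The standard deviation changes with the unit of measurement ...
print(sample_std(lengths_cm), sample_std(lengths_mm))
# ... whereas the coefficient of variation does not.
print(coefficient_of_variation(lengths_cm), coefficient_of_variation(lengths_mm))
```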
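Example
Let the sample
$$x_1 = 10, \quad x_2 = 9, \quad x_3 = 13, \quad x_4 = 15, \quad x_5 = 16$$
be given, so $n = 5$. The empirical mean is
$$\bar{x} = \frac{1}{5} (10 + 9 + 13 + 15 + 16) = \frac{63}{5} = 12.6.$$
Calculating term by term then gives
$$\sum_{i=1}^{5} (x_i - \bar{x})^2 = (10 - 12.6)^2 + (9 - 12.6)^2 + (13 - 12.6)^2 + (15 - 12.6)^2 + (16 - 12.6)^2 = (-2.6)^2 + (-3.6)^2 + 0.4^2 + 2.4^2 + 3.4^2 = 37.2.$$
The first definition gives
$$\tilde{s}^2 = \frac{1}{5} \sum_{i=1}^{5} (x_i - \bar{x})^2 = \frac{37.2}{5} = 7.44,$$
whereas the second definition gives
$$s^2 = \frac{1}{5-1} \sum_{i=1}^{5} (x_i - \bar{x})^2 = \frac{37.2}{4} = 9.3.$$
The standard deviation can also be calculated from the variance example above, simply by taking the square root. If the uncorrected sample variance is used, then (according to the first definition)
$$\tilde{s} = \sqrt{7.44} \approx 2.73.$$
However, if the empirical standard deviation is determined via the corrected sample variance, then (according to the second definition)
$$s = \sqrt{9.3} \approx 3.05.$$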
Origin of the various definitions
The definition of $\tilde{s}^2$ corresponds to the definition of the empirical variance as the mean squared deviation from the empirical mean. It is based on the idea of defining a measure of dispersion around the empirical mean. Let $x = (x_1, \ldots, x_n)$. A first approach is to add up the differences between the measured values and the empirical mean. This leads to
$$S_1(x) := \sum_{i=1}^{n} (x_i - \bar{x}).$$
However, this always results in 0 (center of gravity property), so it is not suitable for quantifying the dispersion. To obtain a value greater than or equal to 0, one can either take the absolute values of the differences, i.e. consider the sum of the absolute deviations
$$S_2(x) := \sum_{i=1}^{n} |x_i - \bar{x}|,$$
or square them, i.e. form the sum of squares
$$S_3(x) := \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
Squaring has the advantage that larger deviations from the empirical mean are weighted more heavily. In addition, squaring ensures that identical positive and negative elements of the sum cannot cancel each other out and are therefore taken into account in the calculation. To make the measure of dispersion independent of the number of measured values in the sample, it is divided by this number. The result of this pragmatically derived measure of dispersion is the mean squared deviation from the empirical mean, i.e. the variance $\tilde{s}^2$ defined above.
The definition of $s^2$ has its roots in estimation theory. There,
$$S^2 := \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
is used as an unbiased estimator for the unknown variance $\sigma^2$ of a probability distribution. This is justified by the following theorem: if $X_1, \ldots, X_n$ are independent and identically distributed random variables with $\operatorname{E}(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$, then $\operatorname{E}(S^2) = \sigma^2$. Hence $S^2$ is an estimator for the unknown population variance $\sigma^2$.
If one now moves from the random variables $X_i$ to the realizations $x_i = X_i(\omega)$, the estimated value $s^2$ is obtained from the abstract estimator $S^2$. The relationship between $S^2$ and $s^2$ thus corresponds to the relationship between a function $f$ and its function value $f(x_0)$ at a point $x_0$.
Thus $\tilde{s}^2$ can be seen as a practically motivated measure of dispersion in descriptive statistics, whereas $s^2$ is an estimate of an unknown variance in inductive statistics. These different origins justify the above-mentioned terminology of calling $\tilde{s}^2$ the empirical variance and $s^2$ the inductive variance or theoretical variance. It should be noted that $\tilde{s}^2$ can also be interpreted as an estimated value of an estimator. Applying the method of moments, one obtains as an estimator for the variance
$$\tilde{S}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2,$$
whose realization corresponds to $\tilde{s}^2$.
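The unbiasedness of the form with $n - 1$ in the denominator can be illustrated without simulation by enumerating all possible samples of a very small model. The sketch below assumes a fair coin with outcomes 0 and 1 (mean 1/2, variance 1/4) and samples of size 3; both the model and the names are hypothetical choices for illustration. Exact fractions are used so that the expected values are not affected by rounding.

```python
from fractions import Fraction
from itertools import product


def squared_deviation_sum(xs):
    """Sum of squared deviations from the empirical mean (exact)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs)


support = (0, 1)               # assumed model: fair coin, variance 1/4
n = 3                          # assumed sample size
true_variance = Fraction(1, 4)

# All 2**n samples are equally likely under independent fair tosses.
samples = list(product(support, repeat=n))
weight = Fraction(1, len(samples))

# Expected value of the estimator with n - 1 in the denominator.
expected_unbiased = sum(weight * squared_deviation_sum(s) / (n - 1) for s in samples)
# Expected value of the estimator with n in the denominator.
expected_biased = sum(weight * squared_deviation_sum(s) / n for s in samples)

print(expected_unbiased, expected_biased, true_variance)
```

In this model the form with $n$ in the denominator underestimates the variance on average by the factor $(n-1)/n$, while the form with $n - 1$ reproduces it exactly.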
Relationship of the concepts of variance
As already mentioned in the introduction, there are several concepts of variance, some of which share the same name. Their relationship to one another becomes clear when one considers their role in the modeling of inductive statistics:
- The variance (in the sense of probability theory) is a measure of dispersion of an abstract probability distribution or of the distribution of a random variable in stochastics.
- The sample variance (in the sense of inductive statistics) is an estimator for the variance (in the sense of probability theory) of an unknown probability distribution. It is therefore not a key figure, but an estimation method for estimating the variance of an unknown probability distribution as well as possible.
- The empirical variance discussed here is, in addition to its role in descriptive statistics, a concrete estimate of the underlying variance according to the estimation method given by the sample variance (in the sense of inductive statistics).
The key point is the difference between the estimation method (sample variance in the sense of inductive statistics) and its concrete estimate (empirical variance). It corresponds to the difference between a function and its function value.
Annualized variance
In financial market theory, variances or volatilities of returns are often calculated. These variances, if based on daily data, must be annualized, i.e. extrapolated to one year. This is done using an annualization factor $A$ (the approximate number of trading days per year). The volatility can thus be estimated as the square root of the annualized variance:
$$\hat{\sigma} = \sqrt{A \cdot s^2}.$$
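The annualization step can be sketched as follows. The daily returns are hypothetical, the function name `annualized_volatility` is illustrative, and the value 250 used for the number of trading days is an assumption made for this example only; the sketch also assumes the variance form with $n - 1$ in the denominator.

```python
import math


def annualized_volatility(daily_returns, trading_days):
    """Square root of the annualized variance of daily returns.

    Uses the variance with n - 1 in the denominator and multiplies it
    by the annualization factor `trading_days`.
    """
    n = len(daily_returns)
    if n < 2:
        raise ValueError("at least two returns are required")
    mean = sum(daily_returns) / n
    daily_variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return math.sqrt(trading_days * daily_variance)


daily_returns = [0.010, -0.020, 0.015, 0.000, -0.005]  # hypothetical data
TRADING_DAYS = 250  # assumption for illustration

print(annualized_volatility(daily_returns, TRADING_DAYS))
```

Multiplying the variance by the factor is equivalent to multiplying the daily standard deviation by the square root of the factor.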