# Location parameters (descriptive statistics)

In descriptive statistics, a location parameter, or measure of central tendency, is a summary statistic of a sample that expresses a central tendency of the data set. In the simplest case, it indicates where the center of the sample lies, i.e. in which region a large part of the sample is located. Typical examples of location parameters are the median income and the mean income in income surveys.

## Definition

Some authors require location parameters to be translation equivariant (shift equivariant). If $L(x)$ is a location parameter and $y = (x_1 + a, x_2 + a, \dots, x_n + a)$ is a data set shifted by the value $a$, then $L(y) = a + L(x)$ should hold. A shift of the data by a certain value thus always results in a shift of the location parameter by the same value. Not all statistics commonly referred to as location parameters meet this condition. For this reason, location parameters are usually described more loosely as statistics that express a central tendency of the data set.
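
The equivariance condition can be checked numerically. The following is a minimal sketch, assuming an illustrative sample and shift value that are not part of the definition; it uses the arithmetic mean, the median and the geometric mean, all of which this article lists among the location parameters.

```python
from statistics import mean, median, geometric_mean

def shift(sample, a):
    """Return the sample with every value shifted by a."""
    return [v + a for v in sample]

# Illustrative sample and shift value (assumptions for this sketch).
x = [10, 1, 3, 1, 9, 8, 9]
a = 5
y = shift(x, a)

# L(y) = a + L(x) holds for the arithmetic mean and the median ...
mean_ok = abs(mean(y) - (a + mean(x))) < 1e-9
median_ok = median(y) == a + median(x)

# ... but not for the geometric mean, which is not translation equivariant.
geo_ok = abs(geometric_mean(y) - (a + geometric_mean(x))) < 1e-9

print(mean_ok, median_ok, geo_ok)
```

A single sample can only refute equivariance, not prove it; for the mean and the median the property follows directly from their definitions.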

## Important location parameters

### Mode

The mode or modal value $D$ of a sample is defined as the value that occurs most frequently in the sample. If several values occur with the same highest frequency, they are all referred to as modes, so the mode is not unique. One then speaks of multimodal distributions. The mode exists for arbitrary samples because, unlike the other measures of location, it can be defined even if only a nominal scale is given.

### Median

The median, denoted by $\tilde{x}$, $\tilde{x}_{0.5}$ or $x_{med}$, is the value that divides the sample into two halves:

• one half is smaller than the median,
• the other half is greater than the median.

To do this, the sample $(x_1, x_2, \dots, x_n)$ is first sorted according to the size of the values. The resulting data set is then denoted by $(x_{(1)}, x_{(2)}, \dots, x_{(n)})$. Thus, $x_{(k)}$ is the $k$-th smallest value of the initial sample. The median is then defined as

$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd,} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if } n \text{ is even.} \end{cases}$

### Arithmetic mean

The arithmetic mean, also called the empirical mean or simply the mean and denoted by $\overline{x}$, is the sum of the characteristic values in the sample divided by the size of the sample (values that occur several times are added several times). So it is

$\overline{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

After aggregation, when the frequencies are available, the formula

$\overline{x} = \frac{1}{n} \sum_{j=1}^{m} a_j F_j$

can be used instead. Here $n$ denotes the size of the sample, $i$ the index over all observational units, $j$ the index over the set of possible values (the result space) of cardinality $m$, $a_j$ the $j$-th possible value and $F_j$ its absolute frequency.
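
The two formulas give the same result, which a short sketch can illustrate. The sample used here and the function names are assumptions chosen for illustration.

```python
from collections import Counter

def mean_raw(sample):
    """Arithmetic mean as (1/n) * sum of all x_i."""
    return sum(sample) / len(sample)

def mean_from_frequencies(sample):
    """Arithmetic mean as (1/n) * sum of a_j * F_j over the distinct values."""
    freq = Counter(sample)      # maps each distinct value a_j to its frequency F_j
    n = sum(freq.values())      # the frequencies add up to the sample size
    return sum(a_j * F_j for a_j, F_j in freq.items()) / n

# Illustrative sample (an assumption for this sketch).
x = [10, 1, 3, 1, 9, 8, 9]
print(mean_raw(x), mean_from_frequencies(x))
```

The frequency form is convenient when the data are only available as a frequency table.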

## Examples and characteristics

Consider the sample

$x = (10, 1, 3, 1, 9, 8, 9)$.

The values $10$, $3$ and $8$ each occur only once in the sample, the values $1$ and $9$ twice. No value occurs three times. Hence there are two modes, $D_1 = 1$ and $D_2 = 9$.

To determine the median, the sample is sorted by size, which gives

$s = (1, 1, 3, 8, 9, 9, 10)$.

Since $n = 7$ is odd, the definition gives $\tilde{x} = x_{((7+1)/2)} = x_{(4)} = 8$.
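
The modes and the median of this sample can be reproduced with a small sketch that follows the definitions given above. The function names are illustrative, and the code assumes numeric, sortable values.

```python
from collections import Counter

def modes(sample):
    """Return all values that occur with the highest frequency."""
    counts = Counter(sample)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def median(sample):
    """Median via the order statistics x_(1) <= ... <= x_(n)."""
    s = sorted(sample)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]            # x_((n+1)/2), shifted to a 0-based index
    return (s[n // 2 - 1] + s[n // 2]) / 2    # mean of x_(n/2) and x_(n/2+1)

x = [10, 1, 3, 1, 9, 8, 9]
print(modes(x))    # [1, 9]
print(median(x))   # 8
```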

The arithmetic mean is

$\overline{x} = \frac{1}{7}\left(10 + 1 + 3 + 1 + 9 + 8 + 9\right) = \frac{1}{7} \cdot 41 \approx 5.9$.

### Existence

The advantage of the mode is that it always exists. This also applies to arbitrary samples such as

$(\text{Zebra}, \text{Elephant}, \text{Giraffe}, \text{Zebra})$,

for which the mode can still be determined to be Zebra. Determining the median does not make sense here because there is no clearly defined order. Determining the arithmetic mean would be even more nonsensical, since it is unclear what is meant by $\text{Zebra} + \text{Giraffe}$.

In situations in which there is an order structure, the median is also defined. Even in such situations, however, the arithmetic mean is generally not defined, since the existence of greater-than and less-than relations does not mean that addition is possible.
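
The same distinction shows up when working with nominal data in code. The sketch below uses `multimode` from Python's standard library `statistics` module (available since Python 3.8); representing the animals as plain strings is an assumption of this example.

```python
from statistics import multimode

animals = ["Zebra", "Elephant", "Giraffe", "Zebra"]

# The mode only needs equality comparisons, so it works on nominal data.
animal_modes = multimode(animals)

# The arithmetic mean needs addition and division, which are not
# meaningful here; summing the strings raises a TypeError.
try:
    sum(animals) / len(animals)
    mean_defined = True
except TypeError:
    mean_defined = False

print(animal_modes, mean_defined)
```

Note that Python would happily sort these strings alphabetically, but that order is an artifact of the representation and carries no statistical meaning for the median.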

### Uniqueness

As shown in the example above, the mode is generally not unique. In contrast, the median is unique for a given definition, but slightly different definitions appear in the literature, arising from different pragmatic considerations. Therefore, depending on the definition used, the median of the same sample can take different values.
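
Python's standard library offers an illustration of such differing conventions: for a sample of even size, `statistics.median`, `statistics.median_low` and `statistics.median_high` return different values. This is only a sketch with an assumed sample; these three functions are not necessarily the definitions the literature has in mind.

```python
from statistics import median, median_low, median_high

# Illustrative sample with an even number of values (an assumption).
sample = [1, 3, 8, 9]

m_mid = median(sample)        # mean of the two middle values
m_low = median_low(sample)    # smaller of the two middle values
m_high = median_high(sample)  # larger of the two middle values

print(m_mid, m_low, m_high)   # 5.5 3 8
```

For samples of odd size all three functions agree, since there is a single middle value.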

### Robustness

In contrast to the arithmetic mean, the median is robust. This means that it changes only slightly when a few values of the sample change, e.g. individual outliers. For example, consider the sample given above,

$x = (10, 1, 3, 1, 9, 8, 9)$,

for which, as already shown, $x_{med} = 8$ and $\overline{x} = \frac{41}{7} \approx 5.9$. If you now look at the sample

$x = (10, 1, 3, 1, 9, 8, 1000)$,

in which only one value was changed, a new calculation still gives $x_{med} = 8$ for the median, whereas the arithmetic mean becomes $\overline{x} = \frac{1032}{7} \approx 147$. The outlier is therefore very noticeable in the arithmetic mean, while it does not change the median.

## Further measures of location

### Quartiles and quantiles

The so-called ($p$-)quantiles are closely related to the median. A $p$-quantile is defined as a number such that a proportion $p$, i.e. $p \cdot 100\,\%$, of the sample is smaller than the $p$-quantile and a proportion $1-p$, i.e. $(1-p) \cdot 100\,\%$, of the sample is greater than the $p$-quantile. Thus the median is exactly the $\tfrac{1}{2}$-quantile. Some $p$-quantiles for special values of $p$ have proper names, including the tercile, the quartile, the quintile, the decile and the percentile.
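
As with the median, several conventions exist for computing an empirical $p$-quantile. The sketch below assumes one common convention, chosen here because it reduces to the median formula given above for $p = \tfrac{1}{2}$: if $n p$ is not an integer, take the order statistic with index $\lceil n p \rceil$; otherwise average the order statistics with indices $n p$ and $n p + 1$. The function name and the use of `Fraction` for $p$ (to avoid floating-point rounding in the integer test) are choices of this example.

```python
import math
from fractions import Fraction

def quantile(sample, p):
    """Empirical p-quantile for 0 < p < 1, with p given as a Fraction."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    s = sorted(sample)
    k = len(s) * Fraction(p)              # n * p as an exact fraction
    if k.denominator == 1:                # n * p is an integer
        i = int(k)
        return (s[i - 1] + s[i]) / 2      # mean of x_(np) and x_(np+1)
    return s[math.ceil(k) - 1]            # x_(ceil(np)), shifted to a 0-based index

x = [10, 1, 3, 1, 9, 8, 9]
print(quantile(x, Fraction(1, 4)))   # lower quartile: 1
print(quantile(x, Fraction(1, 2)))   # median: 8
print(quantile(x, Fraction(3, 4)))   # upper quartile: 9
```

Other conventions, for example those based on linear interpolation between neighbouring order statistics, can give slightly different values for the same sample.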

### Trimmed mean

The trimmed mean is obtained by omitting a certain proportion of the largest and smallest values from a data set and forming the arithmetic mean of the remaining data.
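
A minimal sketch of a trimmed mean follows. It assumes that the given proportion is removed at each end of the sorted sample and that the number of removed values is rounded down; both are conventions of this example, not statements from the text. The sample with the outlier $1000$ from the robustness example is reused.

```python
def trimmed_mean(sample, proportion):
    """Arithmetic mean after dropping `proportion` of the values at each end."""
    s = sorted(sample)
    k = int(len(s) * proportion)    # values dropped at each end, rounded down
    if 2 * k >= len(s):
        raise ValueError("proportion too large: no data would remain")
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

x_outlier = [10, 1, 3, 1, 9, 8, 1000]
print(trimmed_mean(x_outlier, 0.0))   # plain arithmetic mean, about 147.4
print(trimmed_mean(x_outlier, 0.2))   # drops 1 and 1000, giving 6.2
```

With 20 % trimming the outlier no longer influences the result, which places the trimmed mean between the arithmetic mean and the median in terms of robustness.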

### Geometric mean

The geometric mean is also one of the location parameters. It is defined as the $n$-th root of the product of the sample elements, that is

$x_{\text{geom}} = \sqrt[n]{x_1 \cdot x_2 \dotsm x_n}$

for a sample $(x_1, x_2, \dots, x_n)$.

### Harmonic mean

Another location parameter is the harmonic mean. It is given as

$x_{\text{harm}} = \frac{n}{\frac{1}{x_1} + \dotsb + \frac{1}{x_n}}$.
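
Both formulas translate directly into code. The sketch below assumes strictly positive sample values, for which both means are well defined; the sample and the function names are illustrative.

```python
import math

def geometric_mean(sample):
    """n-th root of the product of the sample elements."""
    return math.prod(sample) ** (1 / len(sample))

def harmonic_mean(sample):
    """n divided by the sum of the reciprocals of the sample elements."""
    return len(sample) / sum(1 / v for v in sample)

# Illustrative sample of positive values (an assumption for this sketch).
x = [2, 8]
arithmetic = sum(x) / len(x)
print(harmonic_mean(x), geometric_mean(x), arithmetic)
```

For this sample the harmonic mean is $3.2$, the geometric mean $4$ and the arithmetic mean $5$. For long samples the product can overflow or underflow; the standard library functions `statistics.geometric_mean` and `statistics.harmonic_mean` are an alternative in that case.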

### Winsorized mean and Lehmann-Hodges mean

Further measures of location are the so-called winsorized mean and the Lehmann-Hodges mean (also known as the Hodges-Lehmann estimator).
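
The text does not define these two measures, so the following sketch rests on commonly used definitions, which are assumptions here. The winsorized mean replaces the $k$ smallest and $k$ largest values by their nearest remaining neighbours before averaging. The Lehmann-Hodges mean is taken as the median of all pairwise averages $(x_i + x_j)/2$ with $i \le j$; some authors use only pairs with $i < j$.

```python
from statistics import median

def winsorized_mean(sample, k):
    """Mean after replacing the k smallest and k largest values by their neighbours."""
    s = sorted(sample)
    n = len(s)
    if 2 * k >= n:
        raise ValueError("k too large for this sample")
    w = [s[k]] * k + s[k:n - k] + [s[n - k - 1]] * k
    return sum(w) / n

def hodges_lehmann(sample):
    """Median of all pairwise averages (x_i + x_j) / 2 with i <= j."""
    n = len(sample)
    walsh = [(sample[i] + sample[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)

x_outlier = [10, 1, 3, 1, 9, 8, 1000]
print(winsorized_mean(x_outlier, 1))   # 1000 is replaced by 10, giving 6.0
print(hodges_lehmann([1, 2, 6]))       # 2.75
```

Like the trimmed mean, both measures limit the influence of single outliers such as the value $1000$ in the robustness example.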