# Frequency distribution

Example of an (absolute) frequency distribution: projected age distribution for Germany in 2050

In mathematical statistics, a frequency distribution is first of all a function that indicates, for each value that has occurred or could occur, how often it has occurred. In terms of the concept of a function, each element of the domain, which here is the set of possible values of a characteristic, is assigned an element of the codomain, which here consists of the counted absolute or relative frequencies.
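Written as a formula, the definition might be sketched as follows. The notation is an assumption made here for illustration and does not come from the text above: $x_1, \dots, x_n$ are the observed values, $M$ is the set of possible values, $h$ is the absolute frequency and $f$ is the relative frequency.

$$
h\colon M \to \mathbb{N}_0, \qquad h(a) = \bigl|\{\, i \in \{1, \dots, n\} : x_i = a \,\}\bigr|, \qquad f(a) = \frac{h(a)}{n}
$$

Possible values that never occur in the data are assigned $h(a) = 0$.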

At the same time, this function, and in particular its graphical representation, is a highly suitable method for the statistical description of data (measured values, characteristic values) and for presenting them clearly. Like any function, such a distribution can be given as a table, as a graph or as a model in the form of a function equation.

In descriptive statistics, the frequency distribution is the counterpart of the probability distribution in probability theory and inferential statistics. The latter provides a number of mathematical functions that are used to approximate and analyze frequency distributions (such as the normal distribution).

## Procedure

The data set (measured values, survey data) forms the initially unordered original list. First, it is sorted. The median, the range (statistical spread), the quantiles and the interquartile range can be read from the ordered original list (ranking list), and the standard deviation can be estimated.
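As a minimal sketch of this first step, the following Python snippet sorts a small data set and derives these summary values using only the standard library. The data are invented for illustration. Quartile conventions differ between textbooks and tools, so the quartile values depend on the method used (here the default of `statistics.quantiles`).

```python
import statistics

# Hypothetical unordered original list (illustrative data only).
raw = [4, 2, 7, 4, 5, 2, 4, 9, 5, 4]

# Ordered original list (ranking list).
ordered = sorted(raw)

median = statistics.median(ordered)
value_range = ordered[-1] - ordered[0]            # largest minus smallest value
q1, q2, q3 = statistics.quantiles(ordered, n=4)   # the three quartile cut points
iqr = q3 - q1                                     # interquartile range
std = statistics.stdev(ordered)                   # sample standard deviation (estimate)

print(ordered)
print(median, value_range, iqr, round(std, 3))
```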

Then we group equal values and note, for each value, how often it occurs, i.e. its absolute frequency. If we divide the absolute frequencies by the total number of values, the sample size, we get the relative frequencies. We now have an ordered set of value pairs (characteristic value and associated relative frequency), a so-called ranking.
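The counting step can be sketched in Python as follows. The data are the same invented example values, and `table` is a hypothetical name for the resulting list of (value, absolute frequency, relative frequency) triples.

```python
from collections import Counter

# Hypothetical data (illustrative only).
raw = [4, 2, 7, 4, 5, 2, 4, 9, 5, 4]

n = len(raw)              # sample size
absolute = Counter(raw)   # absolute frequency of each value that occurs

# One row per distinct value, ordered by value:
# (value, absolute frequency, relative frequency)
table = [(value, count, count / n) for value, count in sorted(absolute.items())]

for value, count, rel in table:
    print(f"{value:>5} {count:>5} {rel:>6.2f}")
```

By construction, the relative frequencies add up to 1.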

If we add up the relative frequencies, starting with the smallest characteristic value, and assign to each characteristic value the total reached up to that point (including its own contribution), we obtain the distribution sum or cumulative frequency. For each characteristic value, it indicates the proportion of values that are less than or equal to that characteristic value. The proportion starts at 0 and rises to 1 or 100 percent. If the table is plotted as a graph, the result is a weakly monotonically increasing curve, usually in the shape of a stretched S. There have been numerous attempts to approximate real distribution sums by means of function equations. The distribution sum as a function of the characteristic value is the simplest way of representing a frequency distribution.
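A short Python sketch of the cumulative frequency, assuming the values and their absolute frequencies are already available in ascending order. The numbers and the helper name `cumulative_frequency` are illustrative assumptions.

```python
from itertools import accumulate

# Hypothetical distinct values in ascending order with their absolute frequencies.
values = [2, 4, 5, 7, 9]
counts = [2, 4, 2, 1, 1]

n = sum(counts)
# Running total of the counts, divided by the sample size.
cumulative = [running / n for running in accumulate(counts)]

def cumulative_frequency(x):
    """Proportion of observations that are less than or equal to x."""
    share = 0.0
    for value, cum in zip(values, cumulative):
        if value <= x:
            share = cum
        else:
            break
    return share

for value, cum in zip(values, cumulative):
    print(value, cum)
```

Below the smallest value the function returns 0, and from the largest value onwards it returns 1, so plotting it gives a step curve that never decreases.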

Further calculation requires dividing the characteristic values into classes. To do this, the range of values that occur is divided into, for example, 10 or 20 classes, mostly of equal width; the rare values at the edges (see "Outliers") are sometimes grouped together in wider classes. This leads to the density functions, which in the case of a continuous distribution are the derivative of the empirical distribution function with respect to the characteristic value. Furthermore, the frequency can be determined not only by counting but also, for example, by weighing. We then get a mass distribution instead of a number distribution. In principle, any additive quantity is suitable for measuring the frequency.
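The following Python sketch illustrates the division into equally wide classes and the resulting densities. It is one possible implementation under stated assumptions: the function name `class_densities`, its parameters and the example data are hypothetical, and all values are assumed to lie between `lower` and `upper`. Passing `weights` (for example masses) gives a mass distribution instead of a number distribution.

```python
def class_densities(values, lower, upper, num_classes, weights=None):
    """Relative frequency density for each of num_classes equally wide classes.

    Without weights every value counts once (number distribution); with
    weights, each value contributes its weight (e.g. a mass distribution).
    """
    if weights is None:
        weights = [1.0] * len(values)
    width = (upper - lower) / num_classes
    totals = [0.0] * num_classes
    for v, w in zip(values, weights):
        if not lower <= v <= upper:
            raise ValueError(f"value {v} lies outside [{lower}, {upper}]")
        # Class index; a value equal to the upper bound goes into the last class.
        idx = min(int((v - lower) / width), num_classes - 1)
        totals[idx] += w
    grand_total = sum(totals)
    # Share of the total per class, divided by the class width.
    return [t / grand_total / width for t in totals]

# Hypothetical particle sizes and their masses (illustrative data only).
sizes = [1, 2, 2, 3, 7, 8]
masses = [10, 1, 1, 1, 5, 2]

number_density = class_densities(sizes, 0, 10, 5)
mass_density = class_densities(sizes, 0, 10, 5, weights=masses)
print(number_density)
print(mass_density)
```

Multiplying each density by the class width and summing gives 1 in both cases, which is the discrete counterpart of a density function integrating to 1.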

If a random sample deviates significantly from the expected distribution, this may be due to chance in the sampling, but also to undetected influences, selection effects or a trend. If the sample is a superposition of several subsets (age distribution, occupations, groups), the frequency distribution can also have two or more peaks instead of a single maximum.