Probability density function

from Wikipedia, the free encyclopedia
The probability that a random variable takes a value between $a$ and $b$ equals the area under the graph of the probability density function $f$ between $a$ and $b$.

A probability density function, often briefly called density function, probability density, distribution density or just density, and abbreviated pdf (from probability density function; in German-language literature also WDF), is a special real-valued function in stochastics, a branch of mathematics. In stochastics, probability density functions are used to construct probability distributions with the help of integrals and to investigate and classify probability distributions.

Unlike probabilities, probability density functions can also take values above one. The construction of probability distributions via probability density functions is based on the idea that the area between the graph of the probability density function and the x-axis from a point $a$ to a point $b$ corresponds to the probability of obtaining a value between $a$ and $b$. What matters is not the function value of the probability density function, but the area under its graph, i.e. the integral.

In a more general context, probability density functions are density functions (in the sense of measure theory) with respect to the Lebesgue measure.

While in the discrete case probabilities of events can be calculated by adding up the probabilities of the individual elementary events (for example, a fair die shows each number with probability $\tfrac{1}{6}$), this no longer works in the continuous case. For example, two people are hardly ever exactly the same height, but only to within a hair's breadth or less. In such cases, probability density functions are useful. With their help, the probability for any interval, for example a height between 1.80 m and 1.81 m, can be determined, although this interval contains infinitely many values, each of which has probability $0$.

Definition

Probability densities can be defined in two ways: on the one hand as a function from which a probability distribution can be constructed, and on the other hand as a function that is derived from a probability distribution. The difference lies in the direction of the approach.

For the construction of probability measures

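Let a real function $f\colon \mathbb{R} \to \mathbb{R}$ be given for which the following holds:
  • $f$ is nonnegative, that is, $f(x) \ge 0$ for all $x \in \mathbb{R}$.
  • $f$ is integrable.
  • $f$ is normalized in the sense that $\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1$.

Then $f$ is called a probability density function, and
$$P([a,b]) := \int_a^b f(x)\,\mathrm{d}x$$
defines a probability distribution on the real numbers.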

Derived from probability measures

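Let a probability distribution $P$ or a real-valued random variable $X$ be given.

If there exists a real function $f$ such that for all $a \in \mathbb{R}$
$$P((-\infty, a]) = \int_{-\infty}^{a} f(x)\,\mathrm{d}x \quad\text{or}\quad P(X \le a) = \int_{-\infty}^{a} f(x)\,\mathrm{d}x$$
holds, then $f$ is called the probability density function of $P$ or of $X$.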

Examples

Probability density functions of the exponential distribution for different parameters $\lambda$.

A probability distribution that can be defined using a probability density function is the exponential distribution. It has the probability density function
$$f_\lambda(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0,\\ 0 & \text{for } x < 0.\end{cases}$$

Here $\lambda > 0$ is a real parameter. In particular, for parameters $\lambda > 1$ the probability density function exceeds the value 1 at the point $x = 0$, as described in the introduction. That $f_\lambda$ really is a probability density function follows from the elementary integration rules for the exponential function; positivity and integrability of the exponential function are clear.
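Both claims about $f_\lambda$ can be checked numerically. The following minimal sketch uses only the Python standard library; the parameter value $\lambda = 2$ is a hypothetical choice, and `simpson` is a hand-written composite Simpson rule, not a library function. The integral is truncated at $x = 50/\lambda$, where the remaining tail mass $e^{-50}$ is negligible.

```python
import math


def exp_pdf(x: float, lam: float) -> float:
    """Density of the exponential distribution: lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def simpson(f, a: float, b: float, n: int = 10_000) -> float:
    """Composite Simpson rule for the integral of f over [a, b] (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


lam = 2.0  # hypothetical parameter value with lam > 1
# The mass beyond x = 50 / lam is exp(-50), so truncating there is harmless.
area = simpson(lambda x: exp_pdf(x, lam), 0.0, 50.0 / lam)

print(round(area, 6))     # → 1.0  (normalization)
print(exp_pdf(0.0, lam))  # → 2.0  (density value above one)
```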

A probability distribution from which a probability density function can be derived is the continuous uniform distribution on the interval $[a,b]$ with $a < b$. It is defined by
$$P([c,d]) = \frac{d-c}{b-a}$$
for $a \le c \le d \le b$.

Outside the interval, all events are assigned probability zero. We are now looking for a function $f$ for which
$$\int_c^d f(x)\,\mathrm{d}x = \frac{d-c}{b-a}$$
holds whenever $a \le c \le d \le b$. The function
$$f(x) = \frac{1}{b-a}$$
fulfills this. It is then extended by zero outside the interval $[a,b]$ so that it can be integrated over arbitrary subsets of the real numbers without problems. A probability density function of the continuous uniform distribution would thus be:
$$f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b,\\ 0 & \text{otherwise.}\end{cases}$$

Equally possible would be the probability density function
$$\tilde f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a < x < b,\\ 0 & \text{otherwise,}\end{cases}$$
because the two differ only on a Lebesgue null set and both satisfy the requirements. Arbitrarily many further probability density functions could be generated by changing the value at a single point. This does not affect the property of being a probability density function, since the integral is unaffected by such modifications.
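The irrelevance of single points can be made visible with a numerical rule that, unlike the integral, does sample the endpoints. The sketch below (standard library only; the interval $[1,3]$ is a hypothetical choice, and the helper names are illustrative) applies the trapezoidal rule to the closed-interval density $f$ and to the open-interval variant $\tilde f$. The two sums differ by exactly $1/n$, a gap that vanishes as the grid is refined, so both tend to the same integral 1.

```python
def f_closed(x: float, a: float, b: float) -> float:
    """Uniform density that equals 1/(b-a) on the closed interval [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def f_open(x: float, a: float, b: float) -> float:
    """Variant that is 0 at the endpoints a and b, i.e. differs on a null set."""
    return 1.0 / (b - a) if a < x < b else 0.0


def trapezoid(f, lo: float, hi: float, n: int) -> float:
    """Trapezoidal rule with n subintervals; it does evaluate f at lo and hi."""
    h = (hi - lo) / n
    inner = sum(f(lo + i * h) for i in range(1, n))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


a, b = 1.0, 3.0  # hypothetical interval
diffs = []
for n in (10, 100, 1_000, 10_000):
    total_closed = trapezoid(lambda x: f_closed(x, a, b), a, b, n)
    total_open = trapezoid(lambda x: f_open(x, a, b), a, b, n)
    diffs.append(total_closed - total_open)  # equals 1/n up to rounding
    print(n, total_closed, total_open)
```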

Further examples of probability densities can be found in the list of univariate probability distributions.

Comments on the definition

Strictly speaking, the integral in the definition is a Lebesgue integral with respect to the Lebesgue measure $\lambda$, and it should accordingly be written as $\int_a^b f(x)\,\mathrm{d}\lambda(x)$. In most cases the conventional Riemann integral is sufficient, which is why $\int_a^b f(x)\,\mathrm{d}x$ is written here. The structural disadvantage of the Riemann integral is that, unlike the Lebesgue integral, it cannot be embedded in a general measure-theoretic framework. For details on the relationship between the Lebesgue and the Riemann integral, see Riemann and Lebesgue integral.

Some authors also distinguish the two approaches above by name. The function used to construct probability distributions is called probability density, whereas the function derived from a probability distribution is called distribution density.

Existence and uniqueness

Construction of probability distributions

The object described in the definition really is a probability distribution. The normalization gives $P(\mathbb{R}) = 1$. That all probabilities are nonnegative follows from the nonnegativity of the function. The σ-additivity follows from the dominated convergence theorem, with the probability density function as majorant and the sequence of functions
$$f_n = \sum_{i=1}^{n} f \cdot \mathbf{1}_{A_i}$$
with pairwise disjoint sets $A_i$.

Here $\mathbf{1}_{A_i}$ is the indicator (characteristic) function of the set $A_i$.

That the probability distribution is uniquely determined follows from the uniqueness theorem for measures and the intersection stability of the generator of the Borel σ-algebra, in this case the set system of closed intervals.

Derived from a probability distribution

The central statement about the existence of a probability density function for a given probability distribution $P$ is the Radon-Nikodým theorem:

The probability distribution $P$ has a probability density function if and only if it is absolutely continuous with respect to the Lebesgue measure $\lambda$. That is, $\lambda(A) = 0$ must always imply $P(A) = 0$.

There may well be more than one such probability density function, but any two of them differ only on a set of Lebesgue measure 0, so they are equal almost everywhere.

Thus, discrete probability distributions cannot have a probability density function, because for them $P(\{x\}) > 0$ holds for a suitable element $x$. Such single-point sets always have Lebesgue measure 0, so discrete probability distributions are not absolutely continuous with respect to the Lebesgue measure.

Calculation of probabilities


The probability for an interval $[a,b]$ can be calculated with the probability density $f$ as
$$P([a,b]) = \int_a^b f(x)\,\mathrm{d}x.$$

This formula also applies to the intervals $(a,b]$, $[a,b)$ and $(a,b)$, because for continuous random variables the probability of taking any particular value is $0$ (an almost impossible event). Expressed formally:
$$P(X = a) = 0 \quad\text{and hence}\quad P([a,b]) = P((a,b]) = P([a,b)) = P((a,b)).$$

For more complex sets, the probability can be determined analogously by integrating over the subintervals. In general, the probability takes the form
$$P(X \in A) = \int_A f(x)\,\mathrm{d}x.$$


The σ-additivity of the probability distribution is often helpful. It means: if $I_1, I_2, \dots$ are pairwise disjoint intervals and
$$A = \bigcup_{i=1}^{\infty} I_i$$
is the union of all these intervals, then
$$P(X \in A) = \sum_{i=1}^{\infty} P(X \in I_i) = \sum_{i=1}^{\infty} \int_{I_i} f(x)\,\mathrm{d}x.$$

The intervals may be closed, open or half-open. The same applies to a finite number of intervals. To compute the probability of a union of disjoint intervals, one can therefore first calculate the probability of each individual interval and then add up these probabilities.
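As an illustration of σ-additivity, the sketch below adds up interval probabilities of an exponential distribution via its distribution function $F(x) = 1 - e^{-\lambda x}$ and cross-checks the sum by integrating the density directly over the union with a midpoint rule. The rate $\lambda = 0.5$ and the three intervals are hypothetical choices; only the Python standard library is used.

```python
import math

lam = 0.5  # hypothetical rate parameter


def pdf(x: float) -> float:
    """Exponential density with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def cdf(x: float) -> float:
    """Distribution function F(x) = 1 - exp(-lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


# Hypothetical pairwise disjoint intervals; whether the endpoints belong to them
# does not matter for a distribution with a density.
intervals = [(0.0, 1.0), (2.0, 3.0), (5.0, 8.0)]

# sigma-additivity: add up the probabilities of the individual intervals
prob_sum = sum(cdf(b) - cdf(a) for a, b in intervals)


def in_union(x: float) -> bool:
    """Indicator of the union of the intervals."""
    return any(a <= x <= b for a, b in intervals)


# Cross-check: integrate the density over the union directly (midpoint rule).
n, lo, hi = 200_000, 0.0, 10.0
h = (hi - lo) / n
direct = h * sum(
    pdf(lo + (i + 0.5) * h) for i in range(n) if in_union(lo + (i + 0.5) * h)
)
print(prob_sum, direct)
```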

Example: time between calls to a call center

Experience shows that the time between two calls in a call center is roughly exponentially distributed with a parameter $\lambda > 0$ and therefore has the probability density function
$$f_\lambda(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0,\\ 0 & \text{for } x < 0;\end{cases}$$

see also the Examples section and the article Poisson process. The x-axis can be given any time unit (hours, minutes, seconds). The parameter $\lambda$ then corresponds to the average number of calls per time unit.

The probability that the next call occurs between one and two time units after the previous one is then
$$P([1,2]) = \int_1^2 \lambda e^{-\lambda x}\,\mathrm{d}x = e^{-\lambda} - e^{-2\lambda}.$$

Suppose a service employee at the call center needs five time units for a break. The probability that she does not miss a call equals the probability that the next call arrives at time five or later. It is
$$P([5,\infty)) = \int_5^\infty \lambda e^{-\lambda x}\,\mathrm{d}x = e^{-5\lambda}.$$
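To make the two probabilities concrete, the sketch below evaluates them for a hypothetical rate of $\lambda = 0.5$ calls per time unit (the source gives no value). It simply uses the closed-form antiderivative $-e^{-\lambda x}$ of the density.

```python
import math

lam = 0.5  # hypothetical: on average 0.5 calls per time unit


def prob_between(lo: float, hi: float) -> float:
    """P(lo <= X <= hi) for X ~ Exp(lam), using the antiderivative -exp(-lam * x)."""
    return math.exp(-lam * lo) - math.exp(-lam * hi)


p_one_to_two = prob_between(1.0, 2.0)    # e^{-lam} - e^{-2 lam}
p_no_missed_call = math.exp(-5.0 * lam)  # P(X >= 5) = e^{-5 lam}

print(f"{p_one_to_two:.4f}")      # → 0.2387
print(f"{p_no_missed_call:.4f}")  # → 0.0821
```

With this rate, the employee would miss a call during her break with probability of roughly 92 percent.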


Relationship between distribution function and density function

Probability density of the log-normal distribution
Cumulative distribution function of the log-normal distribution

The distribution function $F$ of a random variable $X$ or of a probability distribution $P$ with probability density function $f$ is obtained as an integral over the density function:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,\mathrm{d}t.$$

This follows directly from the definition of the distribution function. The distribution functions of random variables or probability distributions with probability density functions are therefore always continuous.

If the distribution function $F$ is differentiable, its derivative is a density function of the distribution:
$$F'(x) = f(x).$$

This relationship still holds if $F$ is continuous and there are at most countably many points at which it is not differentiable; which values are used at these points is irrelevant.

In general, a density function exists if and only if the distribution function $F$ is absolutely continuous. This condition implies, among other things, that $F$ is continuous and has almost everywhere a derivative that coincides with the density.

Note, however, that there are distributions such as the Cantor distribution whose distribution function is continuous and differentiable almost everywhere, but which nevertheless have no probability density. Distribution functions are always differentiable almost everywhere, but the derivative in general captures only the absolutely continuous part of the distribution.
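The relation $F' = f$ can be checked numerically with a symmetric difference quotient. This sketch uses the exponential distribution with a hypothetical $\lambda = 1.5$ and evaluates only at points $x > 0$; at $x = 0$ the distribution function has a kink, which is one of the countably many exceptional points mentioned above.

```python
import math

lam = 1.5  # hypothetical parameter


def F(x: float) -> float:
    """Distribution function of Exp(lam)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def f(x: float) -> float:
    """Density of Exp(lam)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def central_difference(g, x: float, h: float = 1e-5) -> float:
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


errors = []
for x in (0.2, 1.0, 3.0):
    approx = central_difference(F, x)
    errors.append(abs(approx - f(x)))
    print(x, approx, f(x))
```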

Densities on subintervals

The probability density of a random variable that takes values only in a subinterval $I$ of the real numbers can be chosen so that it has the value $0$ outside the interval. An example is the exponential distribution with $I = [0, \infty)$. Alternatively, the probability density can be viewed as a function $f\colon I \to \mathbb{R}$, i.e. as a density of the distribution on $I$ with respect to the Lebesgue measure on $I$.

Nonlinear transformation

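For a nonlinear transformation $Y = g(X)$ of the random variable $X$, the expected value is
$$\operatorname{E}(Y) = \operatorname{E}(g(X)) = \int_{-\infty}^{\infty} g(x)\, f(x)\,\mathrm{d}x.$$

It is therefore not necessary to calculate the probability density function of $Y$ itself.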
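The following sketch applies this rule to a hypothetical transformation $g(x) = x^2$ of an exponentially distributed $X$ with hypothetical $\lambda = 2$, for which $\operatorname{E}(X^2) = 2/\lambda^2 = 0.5$. The integral uses only the density of $X$; a seeded Monte Carlo estimate from samples of $X$ (via `random.Random.expovariate`) serves as an independent cross-check.

```python
import math
import random

lam = 2.0  # hypothetical parameter of X ~ Exp(lam)


def f(x: float) -> float:
    """Density of X."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def g(x: float) -> float:
    """Hypothetical nonlinear transformation Y = g(X) = X**2."""
    return x * x


def simpson(func, a: float, b: float, n: int = 10_000) -> float:
    """Composite Simpson rule on [a, b] (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


# E(g(X)) = integral of g(x) f(x); the density of Y = X**2 is never needed.
expected_lotus = simpson(lambda x: g(x) * f(x), 0.0, 40.0 / lam)

# Monte Carlo cross-check with samples of X (seeded for reproducibility).
rng = random.Random(0)
samples = [g(rng.expovariate(lam)) for _ in range(200_000)]
expected_mc = sum(samples) / len(samples)

print(expected_lotus, expected_mc, 2 / lam**2)
```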

Convolution and sum of random variables

For probability distributions with probability density functions, the convolution (of probability distributions) can be reduced to the convolution (of functions) of the corresponding probability density functions. If $P$ and $Q$ are probability distributions with probability density functions $f_P$ and $f_Q$, then $P * Q$ has the probability density function $f_P * f_Q$, that is,
$$(P * Q)(A) = \int_A (f_P * f_Q)(x)\,\mathrm{d}x, \qquad (f_P * f_Q)(x) = \int_{-\infty}^{\infty} f_P(x - y)\, f_Q(y)\,\mathrm{d}y.$$

Here $P * Q$ denotes the convolution of $P$ and $Q$, and $f_P * f_Q$ denotes the convolution of the functions $f_P$ and $f_Q$. The probability density function of the convolution of two probability distributions is therefore exactly the convolution of the probability density functions of the probability distributions.

This property carries over directly to the sum of stochastically independent random variables. If two stochastically independent random variables $X$ and $Y$ with probability density functions $f_X$ and $f_Y$ are given, then
$$f_{X+Y}(z) = (f_X * f_Y)(z) = \int_{-\infty}^{\infty} f_X(z - y)\, f_Y(y)\,\mathrm{d}y.$$

The probability density function of the sum is thus the convolution of the probability density functions of the individual random variables.
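A numerical illustration: for two independent random variables that are both exponentially distributed with the same (hypothetical) rate $\lambda = 1$, the density of the sum is $\lambda^2 z e^{-\lambda z}$ for $z \ge 0$ (an Erlang distribution with shape 2). The sketch below approximates the convolution integral with a midpoint rule over a window $[-1, 10]$ that contains the support of the integrand for the chosen points $z$, and compares it with this closed form.

```python
import math

lam = 1.0  # hypothetical common rate of X and Y


def f_exp(x: float) -> float:
    """Exponential density with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def convolution_at(f1, f2, z: float, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint approximation of (f1 * f2)(z) = integral of f1(z - y) f2(y) dy over [lo, hi].

    Assumes the integrand vanishes outside [lo, hi].
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += f1(z - y) * f2(y)
    return total * h


results = []
for z in (0.5, 1.0, 2.5):
    approx = convolution_at(f_exp, f_exp, z, lo=-1.0, hi=10.0)
    exact = lam**2 * z * math.exp(-lam * z)  # density of X + Y (Erlang, shape 2)
    results.append((z, approx, exact))
    print(z, approx, exact)
```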

Determining characteristic values using probability density functions

Many of the typical characteristic values of a random variable or a probability distribution can be derived directly from the probability density function, if it exists.

Mode

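The mode of a probability distribution or random variable is defined directly via the probability density function. A point $x_{\mathrm{mod}}$ is called a mode if the probability density function $f$ has a local maximum at $x_{\mathrm{mod}}$, that is, if
$$f(x) \le f(x_{\mathrm{mod}}) \quad\text{for all } x \in (x_{\mathrm{mod}} - \varepsilon,\, x_{\mathrm{mod}} + \varepsilon)$$
for some $\varepsilon > 0$.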

Of course, a probability density function can also have two or more local maxima (bimodal and multimodal distributions). In the case of the uniform distribution in the Examples section above, the probability density function even has infinitely many local maxima.

Median

The median is usually defined via the distribution function or, more specifically, via the quantile function. If a probability density function exists, a median is given by an $m$ for which
$$\int_{-\infty}^{m} f(x)\,\mathrm{d}x = \frac{1}{2} \quad\text{and}\quad \int_{m}^{\infty} f(x)\,\mathrm{d}x = \frac{1}{2}$$

holds. Due to the continuity of the associated distribution function, such an $m$ always exists in this case, but it is generally not unique.
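Since the distribution function is continuous and nondecreasing, a median can be located by bisection on $\int_{-\infty}^{m} f(x)\,\mathrm{d}x = 1/2$. The sketch below is one way to do this under the stated assumption that the density vanishes (or nearly so) outside a known interval $[lo, hi]$; if the median is not unique, bisection simply returns one of the medians. For the exponential distribution with a hypothetical $\lambda = 1$ the result can be compared with $\ln 2 / \lambda$.

```python
import math


def simpson(f, a: float, b: float, n: int = 2_000) -> float:
    """Composite Simpson rule on [a, b] (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def median_from_density(f, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Bisection for m with integral of f over [lo, m] equal to 1/2.

    Assumes f vanishes outside [lo, hi], so the integral from lo plays the
    role of the distribution function F.
    """
    left, right = lo, hi
    while right - left > tol:
        mid = 0.5 * (left + right)
        if simpson(f, lo, mid) < 0.5:
            left = mid
        else:
            right = mid
    return 0.5 * (left + right)


lam = 1.0  # hypothetical parameter of the exponential distribution
m = median_from_density(lambda x: lam * math.exp(-lam * x), 0.0, 40.0 / lam)
print(m, math.log(2) / lam)
```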

Expected value

The expected value of a random variable $X$ with probability density function $f$ is given by
$$\operatorname{E}(X) = \int_{-\infty}^{\infty} x\, f(x)\,\mathrm{d}x,$$

provided the integral exists.

Variance and standard deviation

If $X$ is a random variable with probability density function $f$ and
$$\mu = \operatorname{E}(X)$$
denotes the expected value of the random variable, then the variance of the random variable is given by
$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,\mathrm{d}x.$$

Alternatively, the displacement law also applies:
$$\operatorname{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x)\,\mathrm{d}x - \mu^2.$$

Here, too, the statements only hold if all integrals involved exist. The standard deviation can then be calculated directly as the square root of the variance.
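The sketch below computes expected value, variance (once by definition, once by the displacement law) and standard deviation numerically for a continuous uniform distribution on a hypothetical interval $[2, 5]$, where the closed forms are $(a+b)/2 = 3.5$ and $(b-a)^2/12 = 0.75$. Because the density vanishes outside $[a,b]$, integrating over $[a,b]$ suffices; the integrands are polynomials of degree at most 2 there, which Simpson's rule integrates exactly up to rounding.

```python
import math

a, b = 2.0, 5.0  # hypothetical interval of a continuous uniform distribution


def f(x: float) -> float:
    """Uniform density on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def simpson(func, lo: float, hi: float, n: int = 1_000) -> float:
    """Composite Simpson rule on [lo, hi] (n is made even)."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


mu = simpson(lambda x: x * f(x), a, b)
var_definition = simpson(lambda x: (x - mu) ** 2 * f(x), a, b)
var_displacement = simpson(lambda x: x * x * f(x), a, b) - mu**2
std = math.sqrt(var_definition)

print(mu, var_definition, var_displacement, std)
```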

Higher moments, skewness and kurtosis

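Using the rule for nonlinear transformations given above, higher moments can also be calculated directly. For the $k$-th moment of a random variable $X$ with probability density function $f$:
$$\operatorname{E}(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,\mathrm{d}x$$

and for the $k$-th absolute moment
$$\operatorname{E}(|X|^k) = \int_{-\infty}^{\infty} |x|^k f(x)\,\mathrm{d}x.$$

If $\mu$ denotes the expected value of $X$, the central moments are
$$\operatorname{E}\big((X - \mu)^k\big) = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,\mathrm{d}x$$

and the absolute central moments
$$\operatorname{E}\big(|X - \mu|^k\big) = \int_{-\infty}^{\infty} |x - \mu|^k f(x)\,\mathrm{d}x.$$

The skewness and kurtosis of the distribution can be determined directly via the central moments; see the corresponding main articles.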
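As an example, the sketch below computes central moments of the exponential distribution numerically and derives skewness $\operatorname{E}((X-\mu)^3)/\sigma^3$ and (non-excess) kurtosis $\operatorname{E}((X-\mu)^4)/\sigma^4$; for the exponential distribution these are known to be 2 and 9, independent of $\lambda$. The parameter value and the truncation point of the integrals are assumptions of the sketch.

```python
import math

lam = 1.0  # hypothetical parameter; skewness and kurtosis do not depend on it


def f(x: float) -> float:
    """Exponential density with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def simpson(func, a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson rule on [a, b] (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


upper = 60.0 / lam  # truncation point; the neglected tail is negligible here


def moment(k: int) -> float:
    """k-th moment E(X**k)."""
    return simpson(lambda x: x**k * f(x), 0.0, upper)


def central_moment(k: int, mu: float) -> float:
    """k-th central moment E((X - mu)**k)."""
    return simpson(lambda x: (x - mu) ** k * f(x), 0.0, upper)


mu = moment(1)
sigma = math.sqrt(central_moment(2, mu))
skewness = central_moment(3, mu) / sigma**3
kurtosis = central_moment(4, mu) / sigma**4
print(round(skewness, 6), round(kurtosis, 6))
```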

Example

Again consider the probability density function of the exponential distribution with parameter $\lambda > 0$, that is
$$f_\lambda(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0,\\ 0 & \text{for } x < 0.\end{cases}$$

The exponential distribution always has the mode $x_{\mathrm{mod}} = 0$: on the interval $(-\infty, 0)$ the probability density function is constantly equal to zero, and on the interval $[0, \infty)$ it is strictly monotonically decreasing, so there is a local maximum at the point 0. From the monotonicity it follows directly that this is the only local maximum, so the mode is uniquely determined.

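To determine the median, one forms (since the probability density function vanishes to the left of zero)
$$\int_{-\infty}^{m} f_\lambda(x)\,\mathrm{d}x = \int_0^m \lambda e^{-\lambda x}\,\mathrm{d}x = 1 - e^{-\lambda m} = \frac{1}{2}.$$

A short calculation gives
$$m = \frac{\ln 2}{\lambda}.$$

This $m$ also satisfies the second of the two equations in the Median section above and is therefore a median.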

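The expected value is obtained with the help of partial integration:
$$\operatorname{E}(X) = \int_0^\infty x \lambda e^{-\lambda x}\,\mathrm{d}x = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\,\mathrm{d}x = \frac{1}{\lambda}.$$

Similarly, the variance can be determined by applying partial integration twice.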
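One possible way to carry out the variance calculation is sketched below. It performs one partial integration on the second moment and then reuses the expected value $\operatorname{E}(X) = 1/\lambda$, which itself requires a partial integration; the displacement law then gives the variance.

$$\operatorname{E}(X^2) = \int_0^\infty x^2 \lambda e^{-\lambda x}\,\mathrm{d}x = \left[-x^2 e^{-\lambda x}\right]_0^\infty + \int_0^\infty 2x\, e^{-\lambda x}\,\mathrm{d}x = \frac{2}{\lambda}\int_0^\infty x \lambda e^{-\lambda x}\,\mathrm{d}x = \frac{2}{\lambda}\operatorname{E}(X) = \frac{2}{\lambda^2}$$

$$\operatorname{Var}(X) = \operatorname{E}(X^2) - \operatorname{E}(X)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$

The standard deviation of the exponential distribution is therefore $1/\lambda$.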

Further examples

A density function is given by for as well as for and for , because it is nonnegative everywhere and the following holds:


The following applies to:

The distribution function can be written as

If $X$ is a random variable with this density, then for example


For the expected value of $X$, this gives


Multi-dimensional random variables

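Probability densities can also be defined for multi-dimensional random variables, i.e. for random vectors. If $X$ is an $\mathbb{R}^n$-valued random variable, then a function $f\colon \mathbb{R}^n \to [0, \infty)$ is called a probability density (with respect to the Lebesgue measure) of the random variable $X$ if
$$P(X \in A) = \int_A f(x)\,\mathrm{d}x$$
holds for all Borel sets $A \in \mathcal{B}(\mathbb{R}^n)$.

In particular, it then follows for $n$-dimensional intervals $[a_1, b_1] \times \dots \times [a_n, b_n]$ with real numbers $a_i \le b_i$ that
$$P(a_1 \le X_1 \le b_1, \dots, a_n \le X_n \le b_n) = \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} f(x_1, \dots, x_n)\,\mathrm{d}x_1 \cdots \mathrm{d}x_n.$$

The concept of the distribution function can also be extended to multi-dimensional random variables. In the notation, the vector $x = (x_1, \dots, x_n)$ and the symbol $\le$ are to be read component-wise. Here $F$ is a mapping from $\mathbb{R}^n$ into the interval $[0,1]$, and
$$F(x) = P(X \le x) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(t_1, \dots, t_n)\,\mathrm{d}t_1 \cdots \mathrm{d}t_n.$$

If $F$ is $n$ times continuously differentiable, one obtains a probability density by partial differentiation:
$$f(x_1, \dots, x_n) = \frac{\partial^n F(x_1, \dots, x_n)}{\partial x_1 \cdots \partial x_n}.$$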

The densities of the component variables can be calculated as the densities of the marginal distributions by integration over the other variables.

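The following also holds: if $X = (X_1, \dots, X_n)$ is an $\mathbb{R}^n$-valued random variable with a density, the following statements are equivalent:

  • $X$ has a density of the form $f(x_1, \dots, x_n) = f_1(x_1) \cdots f_n(x_n)$, where $f_i$ is the real probability density of $X_i$.
  • The random variables $X_1, \dots, X_n$ are independent.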
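The sketch below illustrates the product form for two independent exponentially distributed components with hypothetical rates 1 and 2. It checks that the probability of a rectangle computed from the joint density (two-dimensional midpoint rule) equals the product of the one-dimensional probabilities, and that integrating the joint density over the second variable recovers the marginal density of the first. The rectangle and the evaluation point are arbitrary choices.

```python
import math

lam1, lam2 = 1.0, 2.0  # hypothetical rates: X1 ~ Exp(lam1), X2 ~ Exp(lam2), independent


def f1(x: float) -> float:
    return lam1 * math.exp(-lam1 * x) if x >= 0 else 0.0


def f2(y: float) -> float:
    return lam2 * math.exp(-lam2 * y) if y >= 0 else 0.0


def f_joint(x: float, y: float) -> float:
    """Joint density in product form, which corresponds to independence of X1 and X2."""
    return f1(x) * f2(y)


def midpoint_2d(f, ax: float, bx: float, ay: float, by: float, n: int = 400) -> float:
    """Two-dimensional midpoint rule on the rectangle [ax, bx] x [ay, by]."""
    hx, hy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            total += f(x, ay + (j + 0.5) * hy)
    return total * hx * hy


def simpson(f, a: float, b: float, n: int = 10_000) -> float:
    """Composite Simpson rule on [a, b] (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# P(0.5 <= X1 <= 1.5, 0.2 <= X2 <= 1.0) from the joint density ...
p_joint = midpoint_2d(f_joint, 0.5, 1.5, 0.2, 1.0)
# ... equals the product of the one-dimensional probabilities.
p_product = (math.exp(-lam1 * 0.5) - math.exp(-lam1 * 1.5)) * (
    math.exp(-lam2 * 0.2) - math.exp(-lam2 * 1.0)
)
# Marginal density of X1 at x = 1: integrate the joint density over the second variable.
marginal_at_1 = simpson(lambda y: f_joint(1.0, y), 0.0, 40.0)

print(p_joint, p_product)
print(marginal_at_1, f1(1.0))
```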

Estimating a probability density based on discrete data

Frequency density

Data that are recorded discretely but are actually continuous (for example body height in centimeters) can be represented as a frequency density. The histogram obtained in this way is a piecewise constant estimate of the density function. Alternatively, the density function can be estimated by a continuous function, for example with so-called kernel density estimators. The kernel used for this should correspond to the expected measurement error.
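Both estimators can be written in a few lines. The sketch below is a hand-rolled illustration, not a library API: it generates hypothetical body heights (normally distributed, rounded to whole centimeters, seeded for reproducibility), builds a frequency-density histogram (relative frequency divided by bin width) and a Gaussian kernel density estimate. The bin width of 2 cm and the kernel width of 1 cm are assumptions standing in for the choices discussed above; both estimates integrate to approximately 1.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical data: 1000 body heights in cm, recorded to whole centimetres.
heights = [round(rng.gauss(175.0, 7.0)) for _ in range(1_000)]


def histogram_density(data, lo: float, width: float, bins: int):
    """Frequency density per bin: relative frequency divided by bin width."""
    counts = [0] * bins
    for x in data:
        k = int((x - lo) // width)
        if 0 <= k < bins:
            counts[k] += 1
    return [c / (len(data) * width) for c in counts]


def gaussian_kde(data, x: float, bandwidth: float) -> float:
    """Kernel density estimate at x with a Gaussian kernel of standard deviation bandwidth."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)


width = 2.0  # hypothetical bin width in cm
lo = min(heights) - 0.5
bins = int((max(heights) + 0.5 - lo) // width) + 1
density = histogram_density(heights, lo, width, bins)
hist_area = sum(d * width for d in density)  # piecewise constant estimate integrates to 1

bandwidth = 1.0  # hypothetical kernel width, standing in for the expected measurement error
step = 0.1
grid_lo = min(heights) - 10.0
n_grid = int(round((max(heights) - min(heights) + 20.0) / step))
kde_area = step * sum(
    gaussian_kde(heights, grid_lo + (i + 0.5) * step, bandwidth) for i in range(n_grid)
)
print(hist_area, kde_area)
```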

Limit transition

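Let $X_\Delta$ be an approximating discrete random variable with the values $x_i$ and the probabilities $p_i$. The transition from an approximating discrete random variable to a continuous random variable $X$ can be modeled by a probability histogram. To do this, the range of values of the random variable $X$ is divided into equal intervals $[x_i - \tfrac{\Delta x}{2},\, x_i + \tfrac{\Delta x}{2})$. These intervals of length $\Delta x$ with class centers $x_i$ are used to approximate the density function $f$ by the probability histogram, which consists of rectangles of area $p_i = f(x_i)\,\Delta x$ located above the class centers. For small $\Delta x$, $X_\Delta$ can be understood as an approximation of the continuous random variable $X$. As the interval lengths decrease, the approximation of $X$ by $X_\Delta$ improves. The limit $\Delta x \to 0$ over all intervals leads, in the case of the variance, to
$$\operatorname{Var}(X_\Delta) = \sum_i (x_i - \mu)^2 f(x_i)\,\Delta x \;\longrightarrow\; \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,\mathrm{d}x = \operatorname{Var}(X)$$

and in the case of the expected value to
$$\operatorname{E}(X_\Delta) = \sum_i x_i f(x_i)\,\Delta x \;\longrightarrow\; \int_{-\infty}^{\infty} x f(x)\,\mathrm{d}x = \operatorname{E}(X).$$

This approximation leads to the definition of the variance for continuous random variables.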
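The convergence can be observed numerically. The sketch below builds the probability histogram with class centers $x_i$ and rectangle areas $p_i = f(x_i)\,\Delta x$ for an exponential distribution with hypothetical $\lambda = 1$ (where $\operatorname{E}(X) = \operatorname{Var}(X) = 1$), truncating the range of values at $40/\lambda$, and computes mean and variance of the discrete approximation for shrinking $\Delta x$. As a design choice, the variance of $X_\Delta$ is centered at its own mean.

```python
import math

lam = 1.0  # hypothetical parameter; exact values are E(X) = 1 and Var(X) = 1


def f(x: float) -> float:
    """Exponential density with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


upper = 40.0 / lam  # truncate the range of values; the neglected tail is tiny
results = []
for dx in (1.0, 0.1, 0.01):
    k = int(round(upper / dx))
    centres = [(i + 0.5) * dx for i in range(k)]  # class centres x_i
    probs = [f(x) * dx for x in centres]  # rectangle areas p_i = f(x_i) * dx
    mean = sum(x * p for x, p in zip(centres, probs))  # E(X_dx)
    var = sum((x - mean) ** 2 * p for x, p in zip(centres, probs))  # Var(X_dx)
    results.append((dx, mean, var))
    print(f"dx={dx}: mean={mean:.6f}, variance={var:.6f}")
```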

