Probability density function
A probability density function, often called density function, probability density, distribution density, or simply density, and abbreviated PDF (in German, WDF), is a special real-valued function in stochastics, a branch of mathematics. Probability density functions are used there to construct probability distributions by means of integrals and to investigate and classify probability distributions.
In contrast to probabilities, probability density functions can also take values greater than one. The construction of probability distributions via probability density functions rests on the idea that the area between the probability density function and the x-axis from a point $a$ to a point $b$ equals the probability of obtaining a value between $a$ and $b$. What matters is not the function value of the probability density function but the area under its graph, i.e. the integral.
In a more general context, probability density functions are density functions (in the sense of measure theory) with respect to the Lebesgue measure.
While in the discrete case the probabilities of events can be calculated by adding up the probabilities of the individual elementary events (for example, a fair die shows each number with probability $\tfrac{1}{6}$), this no longer works in the continuous case. For example, two people are hardly ever exactly the same height; their heights agree only to within a hair's breadth or less. In such cases, probability density functions are useful. With their help, the probability of any interval, for example a height between 1.80 m and 1.81 m, can be determined, although the interval contains infinitely many values, each of which has probability $0$.
Probability densities can be defined in two ways: on the one hand as a function from which a probability distribution can be constructed, and on the other hand as a function that is derived from a probability distribution. So the difference is the direction of the approach.
For the construction of probability measures
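Let $f\colon \mathbb{R} \to \mathbb{R}$ be a real function for which the following holds:
- $f$ is nonnegative, that is, $f(x) \geq 0$ for all $x \in \mathbb{R}$.
- $f$ is integrable.
- $f$ is normalized in the sense that
$$\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1.$$
Then $f$ is called a probability density function, and
$$P([a,b]) := \int_a^b f(x)\,\mathrm{d}x$$
defines a probability distribution on the real numbers.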
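As a rough numerical illustration of these conditions, the following sketch checks non-negativity on a grid, approximates the normalization integral, and evaluates one interval probability. The helper `integrate`, the example function `triangle_density`, and the truncation of the real line to [-1, 3] are assumptions made for this illustration only.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def triangle_density(x):
    """Hypothetical example density: a triangle on [0, 2] with peak at x = 1."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


# Non-negativity, checked on a finite grid only (not a proof).
grid = [-1.0 + 4.0 * i / 1000 for i in range(1001)]
nonnegative = all(triangle_density(x) >= 0.0 for x in grid)

# Normalization: the density vanishes outside [0, 2],
# so integrating over [-1, 3] stands in for the whole real line.
total = integrate(triangle_density, -1.0, 3.0)

# The distribution defined by the density: P([a, b]) is the area over [a, b].
p_half_to_one = integrate(triangle_density, 0.5, 1.0)

print(nonnegative, round(total, 6), round(p_half_to_one, 6))
```

For this example density, the probability of [0.5, 1] can also be computed by hand as 1/2 - 1/8 = 3/8, which the numerical value should reproduce closely.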
Derived from probability measures
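Let a probability distribution $P$ or a real-valued random variable $X$ be given.
If there exists a real function $f$ such that for all $a \in \mathbb{R}$
$$P((-\infty, a]) = \int_{-\infty}^{a} f(x)\,\mathrm{d}x \quad\text{or}\quad P(X \leq a) = \int_{-\infty}^{a} f(x)\,\mathrm{d}x$$
holds, then $f$ is called the probability density function of $P$ or of $X$.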
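A probability distribution that can be defined using a probability density function is the exponential distribution. It has the probability density function
$$f_\lambda(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0. \end{cases}$$
Here $\lambda > 0$ is a real parameter. In particular, for parameters $\lambda > 1$ the probability density function exceeds the value $1$ at the point $x = 0$, as described in the introduction. That $f_\lambda$ really is a probability density function follows from the elementary integration rules for the exponential function,
$$\int_{-\infty}^{\infty} f_\lambda(x)\,\mathrm{d}x = \int_0^{\infty} \lambda e^{-\lambda x}\,\mathrm{d}x = \left[-e^{-\lambda x}\right]_0^{\infty} = 1;$$
nonnegativity and integrability of the exponential function are clear.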
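A minimal sketch of this observation, assuming the rate 3 as an arbitrary example value larger than 1: the density value at 0 exceeds 1, while the area under the graph is still approximately 1. The names `exp_density` and `integrate` and the cut-off at x = 20 are choices made for this illustration.

```python
import math


def exp_density(x, lam):
    """Density of the exponential distribution with rate lam > 0."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0


def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


lam = 3.0                               # assumed example rate, larger than 1
value_at_zero = exp_density(0.0, lam)   # equals lam, so it lies above 1
# The tail beyond x = 20 has mass exp(-60) and is ignored here.
area = integrate(lambda x: exp_density(x, lam), 0.0, 20.0)

print(value_at_zero, round(area, 6))
```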
A probability distribution from which a probability density function can be derived is the continuous uniform distribution over the interval $[0,1]$. It is defined by
- $P([a,b]) = b - a$ for $a \leq b$ and $a, b \in [0,1]$.
Outside the interval, all events are given probability zero. We are now looking for a function $f$ for which
$$\int_a^b f(x)\,\mathrm{d}x = b - a$$
holds if $a, b \in [0,1]$. The function
$$f(x) = 1$$
fulfills this. It is then extended by zero outside the interval so that one can integrate over arbitrary subsets of the real numbers without any problems. A probability density function of the continuous uniform distribution would therefore be
$$f(x) = \begin{cases} 1 & 0 \leq x \leq 1 \\ 0 & \text{otherwise.} \end{cases}$$
Equally, the probability density function
$$f(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
would be possible, because the two differ only on a Lebesgue null set and both meet the requirements. Arbitrarily many probability density functions could be generated simply by modifying the value at a single point. This does not change the property of being a probability density function, since the integral ignores such small modifications.
Further examples of probability densities can be found in the list of univariate probability distributions.
Comments on the definition
Strictly speaking, the integral in the definition is a Lebesgue integral with respect to the Lebesgue measure $\lambda$, and it should accordingly be written as $\int f(x)\,\mathrm{d}\lambda(x)$. In most cases the conventional Riemann integral is sufficient, which is why $\int f(x)\,\mathrm{d}x$ is written here. The disadvantage of the Riemann integral at the structural level is that, unlike the Lebesgue integral, it cannot be embedded in a general measure-theoretic framework. For details on the relationship between the Lebesgue and Riemann integrals, see Riemann and Lebesgue integral.
Some authors also distinguish the two approaches above by name. The function that is used to construct probability distributions is called probability density, whereas the function derived from a probability distribution is called distribution density.
Existence and uniqueness
Construction of probability distributions
What is described in the definition really does yield a probability distribution: normalization gives $P(\mathbb{R}) = 1$. That all probabilities are nonnegative follows from the nonnegativity of the function. The σ-additivity follows from the dominated convergence theorem, with the probability density function as majorant and the sequence of functions
$$f_n := \sum_{i=1}^{n} f \chi_{A_i}$$
with pairwise disjoint sets $A_i$.
Here $\chi_A$ denotes the characteristic function of the set $A$.
That the probability distribution is uniquely determined follows from the uniqueness theorem for measures and from the fact that the generator of the Borel σ-algebra, here the system of closed intervals, is closed under intersection.
Derived from a probability distribution
The central statement about the existence of a probability density function for a given probability distribution is the Radon-Nikodým theorem:
- The probability distribution $P$ has a probability density function if and only if it is absolutely continuous with respect to the Lebesgue measure $\lambda$. This means that $\lambda(A) = 0$ must always imply $P(A) = 0$.
There can certainly be more than one such probability density function, but any two differ only on a set of Lebesgue measure 0, so they are identical almost everywhere.
Thus, discrete probability distributions cannot have a probability density function, because for them $P(\{x\}) > 0$ always holds for a suitable element $x$. Such one-point sets always have Lebesgue measure 0, so discrete probability distributions are not absolutely continuous with respect to the Lebesgue measure.
Calculation of probabilities
The probability of an interval $[a,b]$ can be calculated with the probability density $f$ as
$$P(X \in [a,b]) = \int_a^b f(x)\,\mathrm{d}x.$$
This formula also applies to the intervals $(a,b)$, $(a,b]$ and $[a,b)$, because it is in the nature of continuous random variables that the probability of taking any particular value is $0$ (an almost impossible event). Expressed formally:
$$P(X \in [a,b]) = P(X \in (a,b]) = P(X \in [a,b)) = P(X \in (a,b)).$$
For more complex sets, the probability can be determined analogously by integrating over subintervals. In general, the probability takes the form
$$P(X \in A) = \int_A f(x)\,\mathrm{d}x.$$
The σ-additivity of the probability distribution is often helpful. It means: if $A_1, A_2, \dots$ are pairwise disjoint intervals and
$$A = \bigcup_{i=1}^{\infty} A_i$$
is the union of all these intervals, then
$$P(X \in A) = \sum_{i=1}^{\infty} P(X \in A_i).$$
The intervals are of the form $(a_i, b_i)$, $(a_i, b_i]$, $[a_i, b_i)$ or $[a_i, b_i]$. The same holds for finitely many intervals. To calculate the probability of a union of disjoint intervals, one can therefore first calculate the probability of each individual interval and then add up these probabilities.
Example: time between calls to a call center
Experience shows that the time between two calls in a call center is approximately exponentially distributed with a parameter $\lambda$ and therefore has the probability density function
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0, \end{cases}$$
see also the Examples section and the Poisson process article. The x-axis carries an arbitrary time unit (hours, minutes, seconds). The parameter $\lambda$ then corresponds to the average number of calls per time unit.
The probability that the next call arrives between one and two time units after the previous one is then
$$P(1 \leq X \leq 2) = \int_1^2 \lambda e^{-\lambda x}\,\mathrm{d}x = e^{-\lambda} - e^{-2\lambda}.$$
Suppose a call center employee needs five time units for a break. The probability that she does not miss a call equals the probability that the next call comes in at time five or later. It is therefore
$$P(X \geq 5) = \int_5^{\infty} \lambda e^{-\lambda x}\,\mathrm{d}x = e^{-5\lambda}.$$
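The two probabilities of the call center example can be evaluated for a concrete rate. The following sketch assumes, purely for illustration, an average of 0.5 calls per time unit; the function names `prob_between` and `prob_at_least` are hypothetical helpers that use the antiderivative of the exponential density.

```python
import math


def prob_between(a, b, lam):
    """P(a <= X <= b) for an exponential waiting time X and 0 <= a <= b."""
    return math.exp(-lam * a) - math.exp(-lam * b)


def prob_at_least(t, lam):
    """P(X >= t) for an exponential waiting time X and t >= 0."""
    return math.exp(-lam * t)


lam = 0.5  # assumed: on average half a call per time unit

p_one_to_two = prob_between(1.0, 2.0, lam)   # next call after 1 to 2 time units
p_no_call_in_break = prob_at_least(5.0, lam)  # no call during a break of length 5

print(round(p_one_to_two, 4), round(p_no_call_in_break, 4))
```

With this assumed rate the employee would get through the break without a missed call only about 8 percent of the time; a smaller rate makes an undisturbed break more likely.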
Relationship between distribution function and density function
The distribution function $F$ of a random variable $X$ or a probability distribution $P$ with probability density function $f$ is formed as an integral over the density function:
$$F(x) = \int_{-\infty}^{x} f(t)\,\mathrm{d}t.$$
This follows directly from the definition of the distribution function. The distribution functions of random variables or probability distributions that have a probability density function are therefore always continuous.
If the distribution function $F$ is differentiable, its derivative is a density function of the distribution:
$$f(x) = F'(x) = \frac{\mathrm{d}F(x)}{\mathrm{d}x}.$$
This relationship still holds when $F$ is continuous and fails to be differentiable at no more than countably many points; the values assigned to $f$ at these points are irrelevant.
In general, a density function exists if and only if the distribution function $F$ is absolutely continuous. This condition implies, among other things, that $F$ is continuous and almost everywhere has a derivative that agrees with the density.
It should be noted, however, that there are distributions such as the Cantor distribution that have a continuous, almost everywhere differentiable distribution function but nevertheless no probability density. Distribution functions are always differentiable almost everywhere, but the resulting derivative in general captures only the absolutely continuous part of the distribution.
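Both directions of this relationship can be checked numerically for a smooth example. The sketch below assumes an exponential distribution with rate 2; it integrates the density to recover the distribution function and differentiates the distribution function by a central difference to recover the density. The helper names and the evaluation point 0.7 are arbitrary choices for this illustration.

```python
import math

LAM = 2.0  # assumed example rate


def density(x):
    """Exponential density with rate LAM."""
    return LAM * math.exp(-LAM * x) if x >= 0.0 else 0.0


def cdf(x):
    """Closed-form distribution function of the same distribution."""
    return 1.0 - math.exp(-LAM * x) if x >= 0.0 else 0.0


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def derivative(func, x, h=1e-5):
    """Central finite-difference approximation of the derivative at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x0 = 0.7
cdf_from_density = integrate(density, 0.0, x0)   # density vanishes left of 0
density_from_cdf = derivative(cdf, x0)

print(round(cdf_from_density, 6), round(cdf(x0), 6))
print(round(density_from_cdf, 6), round(density(x0), 6))
```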
Densities on subintervals
The probability density of a random variable $X$ that only takes values in a subinterval $I$ of the real numbers can be chosen so that it has the value $0$ outside the interval. An example is the exponential distribution with $I = [0, \infty)$. Alternatively, the probability density can be viewed as a function $f\colon I \to \mathbb{R}$, i.e. as the density of the distribution on $I$ with respect to the Lebesgue measure.
For a nonlinear transformation $Y = g(X)$ of the random variable $X$, the expected value of $Y$ is given by
$$\operatorname{E}(Y) = \operatorname{E}(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,\mathrm{d}x.$$
It is therefore not necessary to calculate the probability density function of $Y$ itself.
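As a small illustration of this rule, the following sketch computes the expected value of the square of an exponentially distributed random variable directly from the density of the original variable, without deriving the density of the transformed variable. The rate 1, the transformation `g`, and the cut-off at x = 50 are assumptions for this example; the exact value of the integral of x² e^(-x) over [0, ∞) is 2.

```python
import math


def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def density(x):
    """Exponential density with rate 1 (assumed example)."""
    return math.exp(-x) if x >= 0.0 else 0.0


def g(x):
    """Example nonlinear transformation."""
    return x * x


# Expected value of g(X): integrate g(x) * f(x); the tail beyond 50 is negligible.
expected_g = integrate(lambda x: g(x) * density(x), 0.0, 50.0)

print(round(expected_g, 6))
```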
Convolution and sum of random variables
For probability distributions with probability density functions, the convolution (of probability distributions) can be traced back to the convolution (of functions) of the corresponding probability density functions. If $P$ and $Q$ are probability distributions with probability density functions $f_P$ and $f_Q$, then
$$f_{P * Q} = f_P * f_Q.$$
Here $P * Q$ denotes the convolution of $P$ and $Q$, and $f * g$ denotes the convolution of the functions $f$ and $g$. The probability density function of the convolution of two probability distributions is therefore exactly the convolution of the probability density functions of the probability distributions.
This property carries over directly to sums of stochastically independent random variables. If $X$ and $Y$ are two stochastically independent random variables with probability density functions $f_X$ and $f_Y$, then
$$f_{X+Y} = f_X * f_Y.$$
The probability density function of the sum is thus the convolution of the probability density functions of the individual random variables.
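A numerical sketch of this statement, assuming two independent random variables that are uniformly distributed on [0, 1]: the convolution of their densities should agree with the triangular density on [0, 2], which is the known density of their sum. The function names and the evaluation points are chosen for this illustration only.

```python
def uniform_density(x):
    """Density of the continuous uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def convolve_at(f, g, z, lo, hi, n=10_000):
    """Approximate (f * g)(z), the integral of f(x) g(z - x) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += f(x) * g(z - x)
    return total * h


def triangular_density(z):
    """Known density of the sum of two independent uniform [0, 1] variables."""
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0


points = [0.25, 0.5, 1.0, 1.5, 1.75]
# f vanishes outside [0, 1], so integrating over [0, 1] is sufficient.
numeric = [convolve_at(uniform_density, uniform_density, z, 0.0, 1.0) for z in points]
exact = [triangular_density(z) for z in points]

for z, num, ex in zip(points, numeric, exact):
    print(z, round(num, 4), ex)
```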
Determination of key figures using probability density functions
Many of the typical key figures of a random variable or a probability distribution can be derived directly from the probability density functions if they exist.
The mode of a probability distribution or random variable is defined directly via the probability density function. A point $x_{\text{mod}}$ is called a mode if the probability density function $f$ has a local maximum at $x_{\text{mod}}$. That means that
- $f(x) \leq f(x_{\text{mod}})$ for all $x \in (x_{\text{mod}} - \varepsilon, x_{\text{mod}} + \varepsilon)$
for some $\varepsilon > 0$.
Of course, a probability density function can also have two or more local maxima (bimodal and multimodal distributions). In the case of the uniform distribution in the example section above, the probability density function even has infinitely many local maxima.
The median is usually defined via the distribution function or, more specifically, via the quantile function. If a probability density function $f$ exists, a median is given by any $m$ for which
$$\int_{-\infty}^{m} f(x)\,\mathrm{d}x = \frac{1}{2} \quad\text{and}\quad \int_{m}^{\infty} f(x)\,\mathrm{d}x = \frac{1}{2}$$
holds. Because the associated distribution function is continuous, such an $m$ always exists in this case, but it is in general not unique.
The expected value of a random variable $X$ with probability density function $f$ is given by
$$\operatorname{E}(X) = \int_{-\infty}^{\infty} x f(x)\,\mathrm{d}x,$$
if the integral exists.
Variance and standard deviation
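If a random variable $X$ with probability density function $f$ is given and
$$\mu = \int_{-\infty}^{\infty} x f(x)\,\mathrm{d}x$$
denotes the expected value of the random variable, then the variance of the random variable is given by
$$\operatorname{Var}(X) = \operatorname{E}\left((X-\mu)^2\right) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,\mathrm{d}x.$$
Alternatively, the shift theorem (the computational formula for the variance) applies:
$$\operatorname{Var}(X) = \operatorname{E}(X^2) - \left(\operatorname{E}(X)\right)^2 = \int_{-\infty}^{\infty} x^2 f(x)\,\mathrm{d}x - \mu^2.$$
Here, too, the statements only apply if all integrals involved exist. The standard deviation can then be calculated directly as the square root of the variance.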
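The two ways of computing the variance can be compared numerically. The sketch below assumes the example density 3x² on [0, 1], for which the expected value is 3/4 and the variance is 3/5 - 9/16 = 3/80; the helper `integrate` is a plain midpoint rule introduced for this illustration.

```python
import math


def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def density(x):
    """Assumed example density f(x) = 3 x^2 on [0, 1]."""
    return 3.0 * x * x if 0.0 <= x <= 1.0 else 0.0


# Expected value: integral of x * f(x).
mean = integrate(lambda x: x * density(x), 0.0, 1.0)

# Variance from the definition: integral of (x - mean)^2 * f(x).
var_definition = integrate(lambda x: (x - mean) ** 2 * density(x), 0.0, 1.0)

# Variance from the shift theorem: E(X^2) - (E(X))^2.
second_moment = integrate(lambda x: x * x * density(x), 0.0, 1.0)
var_shift = second_moment - mean ** 2

std_dev = math.sqrt(var_definition)

print(round(mean, 6), round(var_definition, 6), round(var_shift, 6), round(std_dev, 6))
```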
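Higher moments, skewness and kurtosis
Using the rule for nonlinear transformations given above, higher moments can also be calculated directly. Thus the $k$-th moment of a random variable $X$ with probability density function $f$ is
$$m_k = \int_{-\infty}^{\infty} x^k f(x)\,\mathrm{d}x$$
and the $k$-th absolute moment is
$$M_k = \int_{-\infty}^{\infty} |x|^k f(x)\,\mathrm{d}x.$$
If $\mu$ denotes the expected value of $X$, the central moments are
$$\mu_k = \int_{-\infty}^{\infty} (x-\mu)^k f(x)\,\mathrm{d}x$$
and the absolute central moments are
$$\bar{\mu}_k = \int_{-\infty}^{\infty} |x-\mu|^k f(x)\,\mathrm{d}x.$$
The skewness and kurtosis of the distribution can be determined directly from the central moments; see the corresponding main articles.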
Again let the probability density function of the exponential distribution with parameter $\lambda > 0$ be given, that is,
$$f_\lambda(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0. \end{cases}$$
The exponential distribution always has the mode $0$: on the interval $(-\infty, 0)$ the probability density function is constantly zero, and on the interval $[0, \infty)$ it is strictly monotonically decreasing, so there is a local maximum at the point 0. Monotonicity implies directly that it is the only local maximum, so the mode is uniquely determined.
To determine the median one forms (since the probability density function vanishes to the left of zero)
$$\int_0^m \lambda e^{-\lambda x}\,\mathrm{d}x = 1 - e^{-\lambda m} = \frac{1}{2}.$$
A short calculation gives
$$m = \frac{\ln 2}{\lambda}.$$
This also satisfies the second of the two equations in the Median section above and is therefore a median.
The expected value is obtained with the help of integration by parts:
$$\operatorname{E}(X) = \int_0^{\infty} x \lambda e^{-\lambda x}\,\mathrm{d}x = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,\mathrm{d}x = \frac{1}{\lambda}.$$
Similarly, the variance can be determined by applying integration by parts twice.
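The key figures of this example can be cross-checked numerically. The sketch below assumes the rate 2 and finds the median by bisection on the integral of the density, then approximates the expected value and the variance by numerical integration. The known closed forms for the exponential distribution are ln(2)/λ for the median, 1/λ for the expected value, and 1/λ² for the variance; the helper names and cut-offs are choices made for this illustration.

```python
import math

LAM = 2.0  # assumed example rate


def density(x):
    """Exponential density with rate LAM."""
    return LAM * math.exp(-LAM * x) if x >= 0.0 else 0.0


def integrate(f, a, b, n=2_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def median_by_bisection(lo=0.0, hi=20.0, steps=60):
    """Find m such that the integral of the density over [0, m] is 1/2."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if integrate(density, 0.0, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


median = median_by_bisection()
# The tail beyond x = 40 is negligible for this rate.
mean = integrate(lambda x: x * density(x), 0.0, 40.0, n=200_000)
variance = integrate(lambda x: (x - mean) ** 2 * density(x), 0.0, 40.0, n=200_000)

print(round(median, 5), round(mean, 5), round(variance, 5))
```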
A density function is given by for as well as for and for , because it is completely non-negative and it applies
The following applies to:
The distribution function can be written as
If $X$ is a random variable with this density, then for example
For the expectation of results
Multi-dimensional random variables
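Probability densities can also be defined for multi-dimensional random variables, i.e. for random vectors. If $X$ is an $\mathbb{R}^n$-valued random variable, then a function $f\colon \mathbb{R}^n \to [0,\infty)$ is called the probability density (with respect to the Lebesgue measure) of the random variable $X$ if
$$P(X \in A) = \int_A f(x)\,\mathrm{d}^n x$$
holds for all Borel sets $A \subseteq \mathbb{R}^n$.
In particular, for $n$-dimensional intervals $I = [a_1,b_1] \times \dots \times [a_n,b_n]$ with real numbers $a_1 < b_1, \dots, a_n < b_n$ it then follows that
$$P(X \in I) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1,\dots,x_n)\,\mathrm{d}x_n \cdots \mathrm{d}x_1.$$
The concept of the distribution function can also be extended to multi-dimensional random variables. In the notation $F(x) = P(X \leq x)$, the vector $x$ and the symbol $\leq$ are to be read component-wise. Here $F$ is a mapping of $\mathbb{R}^n$ into the interval [0,1], and
$$F(x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1,\dots,t_n)\,\mathrm{d}t_n \cdots \mathrm{d}t_1.$$
If $F$ is $n$ times continuously differentiable, one obtains a probability density by partial differentiation:
$$f(x) = \frac{\partial^n F}{\partial x_1 \cdots \partial x_n}(x).$$
The densities of the component variables $X_i$ can be calculated as the densities of the marginal distributions by integrating over the other variables.
The following also holds: if $X = (X_1, \dots, X_n)$ is an $\mathbb{R}^n$-valued random variable with a density, the following are equivalent:
- $X$ has a density of the form $f(x_1,\dots,x_n) = f_1(x_1) \cdots f_n(x_n)$, where $f_i$ is the real probability density of $X_i$.
- The random variables $X_1, \dots, X_n$ are independent.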
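A two-dimensional sketch of these statements, assuming two independent exponentially distributed components with rates 1 and 2: the probability of a rectangle computed from the product density should match the product of the one-dimensional probabilities, and integrating out the second variable should return the density of the first component. All function names, rates, and integration limits are assumptions for this illustration.

```python
import math


def exp_density(x, lam):
    """Density of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0.0 else 0.0


def joint_density(x, y):
    """Product density of two independent components (assumed rates 1 and 2)."""
    return exp_density(x, 1.0) * exp_density(y, 2.0)


def integrate_1d(f, a, b, n=100_000):
    """Midpoint rule in one dimension."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def integrate_2d(f, ax, bx, ay, by, n=400):
    """Midpoint rule on an n-by-n grid over the rectangle [ax, bx] x [ay, by]."""
    hx = (bx - ax) / n
    hy = (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            y = ay + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy


# Probability of the rectangle [0, 1] x [0, 2] from the joint density.
p_box = integrate_2d(joint_density, 0.0, 1.0, 0.0, 2.0)

# Product of the one-dimensional interval probabilities (closed form).
p_product = (1.0 - math.exp(-1.0)) * (1.0 - math.exp(-4.0))

# Marginal density of the first component at x = 0.5: integrate out y.
marginal_at_half = integrate_1d(lambda y: joint_density(0.5, y), 0.0, 30.0)

print(round(p_box, 5), round(p_product, 5))
print(round(marginal_at_half, 5), round(exp_density(0.5, 1.0), 5))
```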
Estimating a probability density based on discrete data
Data that are recorded discretely but are actually continuous (for example, body height in centimeters) can be represented as a frequency density. The histogram obtained in this way is a piecewise constant estimate of the density function. Alternatively, the density function can be estimated by a continuous function, for example with so-called kernel density estimators. The kernel used for this should correspond to the expected measurement error.
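Both estimators can be sketched in a few lines. The example below generates synthetic body heights rounded to whole centimeters (an assumption, not real data), builds a frequency-density histogram, and evaluates a kernel density estimator with a Gaussian kernel. The bandwidth of 3 cm, the number of bins, and all function names are illustrative choices.

```python
import math
import random

random.seed(0)
# Synthetic body heights, rounded to whole centimeters (assumed example data).
heights = [round(random.gauss(175.0, 8.0)) for _ in range(500)]


def histogram_density(data, lo, hi, bins):
    """Piecewise constant estimate: relative frequency divided by bin width."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        index = min(int((x - lo) / width), bins - 1)  # assumes lo <= x <= hi
        counts[index] += 1
    return [c / (len(data) * width) for c in counts], width


def gaussian_kde(data, bandwidth):
    """Kernel density estimator with a Gaussian kernel of the given bandwidth."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def estimate(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return estimate


lo, hi = min(heights), max(heights) + 1
hist, width = histogram_density(heights, lo, hi, bins=10)
hist_area = sum(hist) * width           # total area of the histogram bars

kde = gaussian_kde(heights, bandwidth=3.0)
# Midpoint sum of the kernel estimate over a range that covers the data generously.
a, b, n = lo - 20.0, hi + 20.0, 400
step = (b - a) / n
kde_area = step * sum(kde(a + (i + 0.5) * step) for i in range(n))

print(round(hist_area, 6), round(kde_area, 6))
```

Both estimates are themselves densities: the histogram bars and the kernel estimate each enclose a total area of approximately one.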
Let $X_d$ be an approximating discrete random variable with the values $x_i$ and the probabilities $p_i$. The passage to the limit from an approximating discrete random variable $X_d$ to a continuous random variable $X$ can be modeled by a probability histogram. To do this, the range of values of the random variable $X$ is divided into intervals $[c_{i-1}, c_i]$ of equal size. These intervals with the length $\Delta x_i$ and the corresponding class centers $x_i$ are used to approximate the density function by the probability histogram, which consists of rectangles with the area $p_i = f(x_i)\,\Delta x_i$ located above the class centers. For small $\Delta x_i$, $X_d$ can be understood as an approximation of the continuous random variable $X$. If the interval lengths are reduced, the approximation of $X$ by $X_d$ improves. The passage to the limit $\Delta x_i \to 0$ for all intervals leads in the case of the variance to
$$\sigma^2 = \sum_i (x_i - \mu)^2 f(x_i)\,\Delta x_i \;\to\; \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,\mathrm{d}x$$
and in the case of the expected value to
$$\mu = \sum_i x_i f(x_i)\,\Delta x_i \;\to\; \int_{-\infty}^{\infty} x f(x)\,\mathrm{d}x.$$
This approximation yields the definition of the variance for continuous random variables.
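The limiting process can be observed numerically. The sketch below assumes the example density 2x on [0, 1], whose expected value is 2/3 and whose variance is 1/18, and computes the mean and variance of the discrete approximation for increasingly fine class widths. The function name `discrete_mean_and_variance` and the chosen class counts are illustrative.

```python
def density(x):
    """Assumed example density f(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def discrete_mean_and_variance(k):
    """Mean and variance of the discrete approximation with k equal classes."""
    dx = 1.0 / k
    centers = [(i + 0.5) * dx for i in range(k)]     # class centers x_i
    probs = [density(x) * dx for x in centers]       # p_i = f(x_i) * dx
    mean = sum(x * p for x, p in zip(centers, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(centers, probs))
    return mean, var


results = {k: discrete_mean_and_variance(k) for k in (4, 16, 64, 256)}
for k, (mean, var) in results.items():
    print(k, round(mean, 6), round(var, 6))
```

As the number of classes grows, the printed values approach 2/3 and 1/18, the values of the corresponding integrals.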