# Probability generating function

A probability-generating function (also called a generating function for short) is a special real function in probability theory. Every discrete probability distribution on the natural numbers, and every random variable with values in the natural numbers, can be assigned a probability-generating function. Conversely, the probability distribution, or the distribution of the random variable, can be uniquely reconstructed from its probability-generating function.

Because of this one-to-one correspondence, probability-generating functions make it possible to translate properties of distributions and operations on random variables into properties of and operations on functions. For example, the derivatives of the probability-generating function are related to the expected value, the variance, and other moments of the random variable. Likewise, adding stochastically independent random variables, or convolving their probability distributions, corresponds to multiplying the corresponding probability-generating functions. This simplification of important operations makes it possible, for example, to study complex stochastic objects such as the Bienaymé-Galton-Watson process.

## Definition

The probability-generating function can be specified in two ways: via a probability distribution, or via the distribution of a random variable. Both approaches are equivalent in the sense that every probability distribution can be understood as the distribution of a random variable, and the distribution of a random variable is itself a probability distribution. For both definitions we set $0^0 := 1$. Here $\mathbb{N}_0$ denotes the set of natural numbers including 0.

### For probability distributions

If $P$ is a probability distribution on $(\mathbb{N}_0, \mathcal{P}(\mathbb{N}_0))$ with probability mass function $f_P(k) = P(\{k\})$, then the function

$$m_P \colon [0,1] \to [0,1]$$

defined by

$$m_P(t) = \sum_{k=0}^{\infty} f_P(k)\, t^k$$

is called the probability-generating function of $P$, or of $f_P$.

### For random variables

For a random variable $X$ with values in $\mathbb{N}_0$, the probability-generating function

$$m_X \colon [0,1] \to [0,1]$$

of $X$, or of its distribution $P_X$, is defined as

$$m_X(t) := m_{P \circ X^{-1}}(t) = \sum_{k=0}^{\infty} t^k\, P[X=k].$$

Thus the probability-generating function of a random variable is precisely the probability-generating function of its distribution. Alternatively, the probability-generating function of a random variable can also be defined via the expected value as

$$m_X(t) := \operatorname{E}\left[t^X\right].$$
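Both definitions can be illustrated numerically for a distribution with finite support. A minimal Python sketch (the helper name `pgf` and the example pmf are choices made here, not part of any library):

```python
from fractions import Fraction

def pgf(pmf, t):
    """Evaluate m(t) = sum_k pmf[k] * t**k for a finite-support pmf,
    using the convention 0**0 = 1 (as Python's ** already does)."""
    return sum(p * t**k for k, p in enumerate(pmf))

# A distribution on {0, 1, 2} with P(X=0) = 1/2, P(X=1) = 1/3, P(X=2) = 1/6.
pmf = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

print(pgf(pmf, 1))   # m(1) = sum of all probabilities = 1
print(pgf(pmf, 0))   # m(0) = P(X = 0) = 1/2
```

Exact rational arithmetic via `Fraction` makes the two identities $m(1) = 1$ and $m(0) = P(X=0)$ visible without rounding noise.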

## Elementary examples

Let $X$ be a Bernoulli-distributed random variable, that is, $X \sim \operatorname{Ber}(p)$. Then $P(X=0) = 1-p$ and $P(X=1) = p$. Purely formally, one regards $X$ as a random variable with values in all of $\mathbb{N}_0$ and sets $P(X=n) = 0$ for $n \geq 2$. Then

$$m_X(t) = \sum_{k=0}^{\infty} t^k\, P[X=k] = 1-p+pt.$$

If the random variable $Y$ is binomially distributed with parameters $n$ and $p$, that is, $Y \sim \operatorname{Bin}_{n,p}$, then for $k \leq n$

$$P(Y=k) = \binom{n}{k} p^k (1-p)^{n-k}$$

and $P(Y=k) = 0$ for $k > n$. The probability-generating function is then

$$m_Y(t) = \sum_{k=0}^{n} \binom{n}{k} (pt)^k (1-p)^{n-k} = (pt+1-p)^n.$$

This follows by the binomial theorem.
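The binomial-theorem step can be checked numerically. A small sketch (the helper names `binom_pgf_sum` and `binom_pgf_closed` are illustrative):

```python
from math import comb

def binom_pgf_sum(n, p, t):
    """pgf of Bin(n, p) evaluated as the defining series sum_k P(Y=k) t^k."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * t**k for k in range(n + 1))

def binom_pgf_closed(n, p, t):
    """Closed form (pt + 1 - p)^n obtained from the binomial theorem."""
    return (p * t + 1 - p)**n

n, p = 7, 0.3
for t in (0.0, 0.25, 0.5, 1.0):
    assert abs(binom_pgf_sum(n, p, t) - binom_pgf_closed(n, p, t)) < 1e-12
```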

## Properties

### Properties as a function

The probability-generating function is a power series with radius of convergence at least 1, so it converges for all $t \in [0,1]$. This follows from the fact that all coefficients of the power series are nonnegative and sum to 1, whence $\sum_{k=0}^{\infty} \left| t^k P[X=k] \right| \leq 1$ for $t \in [-1,1]$. Probability-generating functions therefore inherit all properties of power series on the interval under consideration: they are continuous on $[0,1]$ and infinitely differentiable on $[0,1)$.

Since each monomial $t^k$ is convex and monotonically increasing on $[0,1]$, and these properties are preserved under conical combinations, the probability-generating function is itself convex and monotonically increasing.

### Reversibility

The probability-generating function uniquely determines the distribution of $X$:

If $X$ and $Y$ are $\mathbb{N}_0$-valued random variables with $m_X(t) = m_Y(t)$ for all $t \in [0,c]$ for some $c > 0$, then $P[X=k] = P[Y=k]$ for all $k \in \mathbb{N}_0$.

By the Taylor formula, it then holds for all $k \in \mathbb{N}_0$ that

$$P[X=k] = \frac{m_X^{(k)}(0)}{k!} = \frac{m_Y^{(k)}(0)}{k!} = P[Y=k].$$

This relationship shows that $m_X$ "generates" the probabilities $P[X=k]$, and that the probability mass function can be reconstructed from the probability-generating function.
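For a pgf with finite support (a polynomial), this reconstruction can be carried out explicitly by repeated differentiation. A minimal sketch in Python, with illustrative helper names:

```python
from math import factorial

def derivative(coeffs):
    """Differentiate a polynomial given by its coefficients [a_0, a_1, ...]."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def recover_pmf(coeffs):
    """Recover P[X=k] = m^(k)(0) / k! from a polynomial pgf."""
    probs, d = [], list(coeffs)
    for k in range(len(coeffs)):
        probs.append(d[0] / factorial(k))   # k-th derivative at 0, over k!
        d = derivative(d) if len(d) > 1 else [0]
    return probs

# pgf of Bin(2, 1/2): (t/2 + 1/2)^2 = 1/4 + t/2 + t^2/4.
print(recover_pmf([0.25, 0.5, 0.25]))   # [0.25, 0.5, 0.25]
```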

### Convolution and sums of random variables

If $X$ and $Y$ are independent $\mathbb{N}_0$-valued random variables, then the probability-generating function of $X+Y$ satisfies

$$m_{X+Y}(t) = \operatorname{E}(t^{X+Y}) = \operatorname{E}(t^X \cdot t^Y) = \operatorname{E}(t^X) \cdot \operatorname{E}(t^Y) = m_X(t) \cdot m_Y(t),$$

since with $X$ and $Y$, the random variables $t^X$ and $t^Y$ are also independent. This generalizes directly to finite sums of independent random variables: if $X_1, \ldots, X_n$ are independent $\mathbb{N}_0$-valued random variables, then for $S_n = \sum_{i=1}^{n} X_i$

$$m_{S_n}(t) = \prod_{i=1}^{n} m_{X_i}(t).$$

For the probability-generating function of the convolution $P * Q$ of probability measures $P, Q$, it then follows directly that

$$m_{P*Q}(t) = m_P(t)\, m_Q(t).$$

#### Example

Let $X_1, X_2$ be independent Bernoulli-distributed random variables with the same parameter $p$. Then, as is well known, their sum is binomially distributed with parameters $2$ and $p$, so $X_1 + X_2 \sim \operatorname{Bin}_{2,p}$. With the probability-generating functions of the Bernoulli distribution and the binomial distribution derived above in the section Elementary examples, it follows that

$$m_{X_1}(t) \cdot m_{X_2}(t) = (1-p+pt)^2 = m_{\operatorname{Bin}_{2,p}}(t) = m_{X_1+X_2}(t).$$
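Since the pgf of a finite-support distribution is a polynomial in coefficient form, multiplying pgfs is literally polynomial multiplication, i.e. convolution of the coefficient sequences. A sketch (exact arithmetic via `fractions` avoids rounding; the names are illustrative):

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials in coefficient form; the coefficient lists
    are convolved, exactly as pmfs are convolved under addition."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p = Fraction(3, 10)
bernoulli = [1 - p, p]                  # pgf coefficients of Ber(3/10)
bin2 = poly_mul(bernoulli, bernoulli)   # pgf coefficients of X1 + X2

# Matches the pmf of Bin(2, 3/10): (7/10)^2, 2*(7/10)*(3/10), (3/10)^2.
assert bin2 == [Fraction(49, 100), Fraction(42, 100), Fraction(9, 100)]
```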

### Moment generation

For an $\mathbb{N}_0$-valued random variable $X$ and $k \in \mathbb{N}_0$,

$$\operatorname{E}\left[\binom{X}{k}\right] = \frac{\lim_{t \uparrow 1} m_X^{(k)}(t)}{k!}$$

or, equivalently,

$$\operatorname{E}\left[X(X-1) \dotsm (X-k+1)\right] = \lim_{t \uparrow 1} m_X^{(k)}(t).$$

Both sides of the two equations are finite if and only if $\operatorname{E}\left[X^k\right]$ is finite.

In particular, the expected value and the variance of an $\mathbb{N}_0$-valued random variable can be determined from its probability-generating function:

$$\operatorname{E}[X] = \lim_{t \uparrow 1} m_X'(t),$$
$$\operatorname{Var}[X] = \operatorname{E}[X(X-1)] + \operatorname{E}[X] - \operatorname{E}[X]^2 = \lim_{t \uparrow 1} \left( m_X''(t) + m_X'(t) - m_X'(t)^2 \right).$$

Taking the left-hand limit is necessary here, since differentiability of a power series at the boundary of its radius of convergence is not guaranteed.

#### Example

Let $X$ be a binomially distributed random variable, so $X \sim \operatorname{Bin}_{n,p}$. Then

$$m_X(t) = (pt+1-p)^n, \quad m_X'(t) = np(pt+1-p)^{n-1} \quad \text{and} \quad m_X''(t) = n(n-1)p^2(pt+1-p)^{n-2}.$$

Both derivatives are polynomials and can therefore be evaluated at $t = 1$ without difficulty, so the left-hand limit need not be considered. We obtain

$$m_X'(1) = np \quad \text{and} \quad m_X''(1) = n(n-1)p^2.$$

With the results above, it follows that

$$\operatorname{E}(X) = m_X'(1) = np, \quad \operatorname{Var}(X) = m_X''(1) + m_X'(1) - m_X'(1)^2 = np(1-p).$$
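These moment formulas can be checked numerically: for a finite-support pmf, $m'(1) = \sum_k k\,p_k$ and $m''(1) = \sum_k k(k-1)\,p_k$. A sketch with illustrative helper names:

```python
from math import comb

def pgf_derivs_at_1(pmf):
    """m'(1) = sum_k k*p_k and m''(1) = sum_k k(k-1)*p_k for finite support
    (the left-hand limits are plain evaluations here, since m is a polynomial)."""
    m1 = sum(k * p for k, p in enumerate(pmf))
    m2 = sum(k * (k - 1) * p for k, p in enumerate(pmf))
    return m1, m2

n, p = 10, 0.25
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

m1, m2 = pgf_derivs_at_1(pmf)
mean = m1                      # E[X]
var = m2 + m1 - m1**2          # Var[X]

assert abs(mean - n * p) < 1e-9            # np = 2.5
assert abs(var - n * p * (1 - p)) < 1e-9   # np(1-p) = 1.875
```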

### Linear transformation of random variables

For $a, b \in \mathbb{N}_0$, a linear transformation of the random variable affects the probability-generating function as follows:

$$m_{aX+b}(t) = t^b\, m_X(t^a).$$

#### Example

If $X$ is a Bernoulli-distributed random variable, that is, $X \sim \operatorname{Ber}(p)$, and $a, b \in \mathbb{N}$, then the random variable $Y = aX + b$ has a two-point distribution on $\{b, a+b\}$. Its probability-generating function is

$$m_Y(t) = m_{aX+b}(t) = t^b\, m_X(t^a) = t^b \cdot (1-p+pt^a) = (1-p)t^b + pt^{a+b}.$$
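The transformation rule can be verified directly: for $Y = aX + b$ the mass at $k$ moves to $ak + b$. A sketch (helper names and parameter values are illustrative):

```python
def pgf(pmf, t):
    return sum(p * t**k for k, p in enumerate(pmf))

def pmf_of_affine(pmf, a, b):
    """pmf of Y = aX + b for a, b in N_0: the mass at k moves to a*k + b."""
    out = [0.0] * (a * (len(pmf) - 1) + b + 1)
    for k, p in enumerate(pmf):
        out[a * k + b] += p
    return out

p = 0.4
bern = [1 - p, p]                      # Ber(p)
shifted = pmf_of_affine(bern, 3, 2)    # two-point distribution on {2, 5}

# m_{aX+b}(t) = t^b * m_X(t^a):
for t in (0.0, 0.3, 0.8, 1.0):
    assert abs(pgf(shifted, t) - t**2 * pgf(bern, t**3)) < 1e-12
```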

### Convergence

Pointwise convergence of probability-generating functions is directly related to convergence in distribution:

Let $X, X_1, X_2, X_3, \dots$ be random variables with associated probability-generating functions $m, m_1, m_2, m_3, \dots$. Then $X_n$ converges in distribution to $X$ if and only if $m_n$ converges pointwise to $m$ for all $t \in [0, \varepsilon)$ for some $\varepsilon \in (0,1)$.

The statement applies analogously to the probability-generating functions of probability distributions and weak convergence.
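A classical instance is the Poisson limit theorem: the pgfs of $\operatorname{Bin}_{n, \lambda/n}$ converge pointwise to the Poisson pgf $e^{\lambda(t-1)}$, reflecting convergence in distribution. A numerical sketch (parameter values chosen here purely for illustration):

```python
from math import exp

def binom_pgf(n, p, t):
    return (p * t + 1 - p)**n

lam, t = 2.0, 0.5
poisson_pgf = exp(lam * (t - 1))   # the limiting pgf e^{lambda(t-1)}

# m_n(t) = (1 + lam*(t-1)/n)^n -> e^{lam(t-1)} as n grows:
errors = [abs(binom_pgf(n, lam / n, t) - poisson_pgf) for n in (10, 100, 1000)]
assert errors[0] > errors[1] > errors[2]   # the error shrinks ...
assert errors[2] < 1e-3                    # ... towards zero
```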

### Probability-generating functions of random sums

Using probability-generating functions, sums with a random number of summands can be computed easily. Let $(X_i)_{i \in \mathbb{N}}$ be independent, identically distributed random variables with values in $\mathbb{N}_0$, and let $T$ be a further random variable with the same range, independent of the $X_i$. Then the random variable

$$Z = \sum_{i=1}^{T} X_i$$

has the probability-generating function

$$m_Z(t) = m_T(m_{X_1}(t)).$$

This property is used, for example, in the analysis of the Galton-Watson process. By the above rules for computing the expected value, the chain rule then yields

$$\operatorname{E}(Z) = \operatorname{E}(T) \cdot \operatorname{E}(X_1),$$

which is Wald's equation.

For the variance,

$$\operatorname{Var}(Z) = \operatorname{Var}(T)\operatorname{E}(X_1)^2 + \operatorname{E}(T)\operatorname{Var}(X_1).$$

This is precisely the Blackwell-Girshick equation. It likewise follows from the above rules for determining the variance together with the product rule.
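The composition formula and Wald's equation can be checked exactly for finite-support distributions. A sketch in which $T$ takes values in $\{0, 1, 2\}$ and the summands are Bernoulli (all names and parameter values are illustrative):

```python
def pgf(pmf, t):
    return sum(pk * t**k for k, pk in enumerate(pmf))

def mean(pmf):
    return sum(k * pk for k, pk in enumerate(pmf))

pmf_T = [0.2, 0.5, 0.3]    # P(T=0), P(T=1), P(T=2)
p = 0.4
pmf_X = [1 - p, p]         # each summand is Ber(p)

# pmf of Z = X_1 + ... + X_T by conditioning on T:
# P(Z=j) = sum_n P(T=n) * P(X_1 + ... + X_n = j).
convs = {0: [1.0], 1: pmf_X, 2: [(1 - p)**2, 2 * p * (1 - p), p**2]}
pmf_Z = [0.0, 0.0, 0.0]
for n, pt in enumerate(pmf_T):
    for j, px in enumerate(convs[n]):
        pmf_Z[j] += pt * px

# m_Z(t) = m_T(m_X(t)):
for t in (0.0, 0.5, 1.0):
    assert abs(pgf(pmf_Z, t) - pgf(pmf_T, pgf(pmf_X, t))) < 1e-12

# Wald's equation: E[Z] = E[T] * E[X_1].
assert abs(mean(pmf_Z) - mean(pmf_T) * mean(pmf_X)) < 1e-12
```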

## Multivariate probability generating function

If $X = (X_1, \dots, X_k)$ is a $k$-dimensional random vector with values in $\mathbb{N}_0^k$, then the probability-generating function of $X$ is defined as

$$m_X(t) := m_X(t_1, \dots, t_k) = \operatorname{E}\left( \prod_{i=1}^{k} t_i^{X_i} \right) = \sum_{x_1, \ldots, x_k = 0}^{\infty} f_P(x_1, \ldots, x_k)\, t_1^{x_1} \dotsm t_k^{x_k}$$

with $f_P(x_1, \ldots, x_k) = P(X_1 = x_1, \dotsc, X_k = x_k)$.

### Expected value, variance, and covariance

Analogously to the one-dimensional case,

$$\operatorname{E}(X_i) = \frac{\partial m_X}{\partial t_i}(1, \dots, 1) \quad \forall i \in \{1, \dots, k\}$$

as well as

$$\operatorname{Var}(X_i) = \frac{\partial^2 m_X}{\partial t_i^2}(1, \dots, 1) + \frac{\partial m_X}{\partial t_i}(1, \dots, 1)\left(1 - \frac{\partial m_X}{\partial t_i}(1, \dots, 1)\right) \quad \forall i \in \{1, \dots, k\}$$

and

$$\operatorname{Cov}(X_i, X_j) = \frac{\partial^2 m_X}{\partial t_i\, \partial t_j}(1, \dots, 1) - \frac{\partial m_X}{\partial t_i}(1, \dots, 1) \cdot \frac{\partial m_X}{\partial t_j}(1, \dots, 1) \quad \forall i, j \in \{1, \dots, k\}, \; i \neq j.$$
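For the multinomial distribution, $m_X(t) = \left(\sum_i p_i t_i\right)^n$, the covariance formula yields $\operatorname{Cov}(X_i, X_j) = -n p_i p_j$ for $i \neq j$. A sketch approximating the partial derivatives by central finite differences (the step size, parameter values, and helper names are choices made here):

```python
def multinomial_pgf(n, probs, ts):
    """pgf of the multinomial distribution: (p_1 t_1 + ... + p_k t_k)^n."""
    return sum(p * t for p, t in zip(probs, ts))**n

n, probs, h = 6, [0.2, 0.3, 0.5], 1e-4

def partial(i):
    """Central difference for dm/dt_i at (1, ..., 1)."""
    up = [1.0] * len(probs); up[i] += h
    dn = [1.0] * len(probs); dn[i] -= h
    return (multinomial_pgf(n, probs, up) - multinomial_pgf(n, probs, dn)) / (2 * h)

def partial2(i, j):
    """Central difference for d^2 m / (dt_i dt_j) at (1, ..., 1), i != j."""
    def f(di, dj):
        ts = [1.0] * len(probs); ts[i] += di; ts[j] += dj
        return multinomial_pgf(n, probs, ts)
    return (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)

cov = partial2(0, 1) - partial(0) * partial(1)
assert abs(partial(0) - n * probs[0]) < 1e-6       # E[X_1] = n p_1 = 1.2
assert abs(cov + n * probs[0] * probs[1]) < 1e-5   # Cov = -n p_1 p_2 = -0.36
```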

## Examples

The table lists the probability-generating functions of some common discrete distributions. Probability-generating functions of distributions not listed here can be found in the respective article on the distribution.

| Distribution | Probability-generating function $m_X(t)$ |
| --- | --- |
| Bernoulli distribution $\operatorname{Ber}(p)$ | $m_X(t) = 1-p+pt$ |
| Two-point distribution on $\{a,b\}$ | $m_X(t) = (1-p)t^a + pt^b$ |
| Binomial distribution $B(n,p)$ | $m_X(t) = (1-p+pt)^n$ |
| Geometric distribution $G(p)$ | $m_X(t) = \frac{p}{1-(1-p)t}$ |
| Negative binomial distribution $NB(r,p)$ | $m_X(t) = \left(\frac{p}{1-(1-p)t}\right)^r$ |
| Discrete uniform distribution on $\{1, \dotsc, n\}$ | $m_X(t) = \sum_{k=1}^{n} \frac{1}{n} t^k = \frac{t^{n+1}-t}{n(t-1)}$ |
| Logarithmic distribution | $m_X(t) = \frac{\ln(1-pt)}{\ln(1-p)}$ |
| Poisson distribution $P_\lambda$ | $m_X(t) = \mathrm{e}^{\lambda(t-1)}$ |
| Generalized binomial distribution $GB(p)$ | $m_X(t) = \prod_{j=1}^{n} (1-p_j+p_j t)$ |
| **Multivariate distributions** | |
| Multinomial distribution | $m_X(t) = \left(\sum_{i=1}^{k} p_i t_i\right)^n$ |

In particular, the probability-generating function of the binomial distribution is the $n$-fold product of the probability-generating function of the Bernoulli distribution, since a binomially distributed random variable is precisely the sum of $n$ independent Bernoulli-distributed random variables. The same relationship holds between the geometric distribution and the negative binomial distribution.

## Connection with other generating functions

The probability-generating function of a random variable $X$ with probability mass function $p$ is a special case of a generating function with coefficients $a_i = p(i)$ for $i \in \mathbb{N}_0$. In addition to the probability-generating function, stochastics uses three further generating functions, which are defined not only for discrete distributions. The moment-generating function is defined as $M_X(t) := \operatorname{E}\left(e^{tX}\right)$; accordingly, $m_X\left(e^t\right) = M_X(t)$. The characteristic function is defined as $\varphi_X(t) := \operatorname{E}\left(e^{itX}\right)$; therefore $m_X\left(e^{it}\right) = \varphi_X(t)$.
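The relation $m_X(e^t) = M_X(t)$ can be illustrated with the Poisson distribution, whose pgf series converges for every real argument, so evaluating it at $e^t \geq 1$ is legitimate. A sketch (the truncation length and helper names are choices made here):

```python
from math import exp, factorial

lam = 1.5

def poisson_pgf(s, terms=60):
    """m_X(s) = sum_k e^{-lam} * lam^k / k! * s^k, truncated after `terms`
    summands; for the Poisson distribution the series converges for all real s."""
    return sum(exp(-lam) * lam**k / factorial(k) * s**k for k in range(terms))

def poisson_mgf(t):
    """Closed form of the moment-generating function: M_X(t) = e^{lam(e^t - 1)}."""
    return exp(lam * (exp(t) - 1))

# m_X(e^t) = M_X(t):
for t in (-1.0, 0.0, 0.5):
    assert abs(poisson_pgf(exp(t)) - poisson_mgf(t)) < 1e-9
```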

There is also the cumulant-generating function, defined as the logarithm of the moment-generating function; the term cumulant is derived from it.

## Literature

• Klaus D. Schmidt: Measure and Probability. Springer, Berlin/Heidelberg 2009, ISBN 978-3-540-89729-3, p. 370 ff.
• Achim Klenke: Probability Theory. 3rd edition. Springer-Verlag, Berlin/Heidelberg 2013, ISBN 978-3-642-36017-6.
• Ulrich Krengel: Introduction to Probability Theory and Statistics. For studies, professional practice and teaching. 8th edition. Vieweg, Wiesbaden 2005, ISBN 3-8348-0063-5.
• Hans-Otto Georgii: Stochastics. Introduction to probability theory and statistics. 4th edition. Walter de Gruyter, Berlin 2009, ISBN 978-3-11-021526-7.
• Christian Hesse: Applied Probability Theory. 1st edition. Vieweg, Wiesbaden 2003, ISBN 3-528-03183-2.
