# Convergence in distribution

Convergence in distribution, sometimes also called convergence in law, is a concept of convergence from stochastics. Along with convergence in the p-th mean, convergence in probability and almost sure convergence, it is one of the probabilistic convergence concepts, and it is weaker than all of these.

In contrast to the other convergence concepts in stochastics, convergence in distribution is not a convergence of the random variables themselves but of their distributions, which are probability distributions. Convergence in distribution therefore essentially corresponds to weak convergence (in the sense of measure theory), applied to probability distributions and formulated in terms of random variables.

Convergence in distribution is used, for example, in the formulation of the central limit theorems, the best known of which is the central limit theorem of Lindeberg-Lévy.

## Definition for real-valued random variables

Let $X, X_1, X_2, X_3, \dots$ be real random variables with associated distribution functions $F, F_1, F_2, F_3, \dots$.

Then the sequence $X_1, X_2, X_3, \dots$ converges in distribution to $X$ if one of the following two equivalent conditions is met:

• The distribution functions $F_1, F_2, F_3, \dots$ converge weakly to the distribution function $F$. This means that
$$\lim_{n \to \infty} F_n(t) = F(t)$$
for every $t \in \mathbb{R}$ at which $F$ is continuous.
• It holds that
$$\lim_{n \to \infty} \operatorname{E}(f(X_n)) = \operatorname{E}(f(X))$$
for all real-valued continuous bounded functions $f$.
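As a numerical illustration of the two equivalent conditions, the following minimal Python sketch assumes a hypothetical example that is not taken from the text: $X_n \sim \mathcal{N}(0, 1 + \tfrac{1}{n})$ and $X \sim \mathcal{N}(0, 1)$. It evaluates $F_n(t)$ with `math.erf` and approximates $\operatorname{E}(\cos(X_n))$ with a midpoint rule. For a centered normal variable with variance $\sigma^2$, the exact value $\operatorname{E}(\cos(X_n)) = e^{-\sigma^2/2}$ serves as a check. The names `sigma_n`, `normal_cdf` and `expect_f` are illustrative.

```python
import math

def normal_cdf(t: float, sigma: float) -> float:
    """Distribution function of N(0, sigma^2) at t, via the error function."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

def expect_f(f, sigma: float, lo: float = -12.0, hi: float = 12.0, steps: int = 20000) -> float:
    """Approximate E f(Y) for Y ~ N(0, sigma^2) with the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        density = math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        total += f(x) * density
    return total * h

def sigma_n(n: int) -> float:
    # Hypothetical example: X_n ~ N(0, 1 + 1/n), limit X ~ N(0, 1).
    return math.sqrt(1.0 + 1.0 / n)

# Condition 1: F_n(t) -> F(t); here F is continuous, so every t is a continuity point.
for n in (1, 10, 1000):
    print(n, normal_cdf(0.5, sigma_n(n)), normal_cdf(0.5, 1.0))

# Condition 2: E cos(X_n) -> E cos(X) = exp(-1/2) for the bounded continuous f = cos.
for n in (1, 10, 1000):
    print(n, expect_f(math.cos, sigma_n(n)), math.exp(-0.5))
```

Both printed columns approach their limits as $n$ grows. This is consistent with the equivalence of the two conditions, although a finite computation can of course only illustrate it, not prove it.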

## Comments on the definition

In contrast to the other types of convergence in stochastics, the convergence in distribution is not a convergence of random variables, but of measures. Strictly speaking, one would have to say that the distributions of the random variables converge and not the random variables in distribution. It should also be noted that in the definition all random variables can be defined on different probability spaces.

The literature uses a variety of notations for convergence in distribution, including $X_n \stackrel{w}{\rightarrow} X$, $X_n \stackrel{\mathcal{D}}{\rightarrow} X$, $X_n \stackrel{d}{\rightarrow} X$ or $X_n \stackrel{\mathcal{L}}{\rightarrow} X$, and sometimes $X_n \stackrel{n \to \infty}{\Longrightarrow} P_X$. The "w" stands for weak convergence, the "D" and "d" for convergence in distribution, and the "L" for law. The notation $X_n \stackrel{\mathcal{L}}{\rightarrow} X$ should not be confused with the notation $X_n \stackrel{\mathcal{L}^1}{\rightarrow} X$ for convergence in the (first) mean.

## Motivation of the definition

Intuitively, one would say that a sequence of probability measures $(P_n)_{n \in \mathbb{N}}$ converges to $P$ if

$$\lim_{n \to \infty} P_n(A) = P(A)$$

holds for every set $A$ in the σ-algebra under consideration. However, consider the sequence of Dirac measures $P_n = \delta_{\tfrac{1}{n}}$ at the points $\tfrac{1}{n}$. Intuitively, this sequence converges to the Dirac measure $\delta_0$ at 0. In the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, however, the above requirement is violated. For example, for the set $A = (-\infty, 0]$ one has $P_n(A) = 0$ for all $n$, while $\delta_0(A) = 1$. To avoid such contradictions, one defines a sequence of measures $(\mu_n)_{n \in \mathbb{N}}$ to be convergent to $\mu$ if

$$\lim_{n \to \infty} \int_X f \, \mathrm{d}\mu_n = \int_X f \, \mathrm{d}\mu$$

for every $f$ from a certain function class (continuous, bounded, etc.). Applying this definition to probability measures (or distributions of random variables) and continuous bounded functions yields convergence in distribution in the general case.

It is the Helly-Bray theorem that links this convergence (called weak convergence in measure theory) with the weak convergence of distribution functions. It thereby provides a more tangible characterization of convergence in distribution via the convergence of the distribution functions. For didactic reasons, however, this characterization is usually given first.
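The Dirac example from the motivation can be checked directly. This minimal Python sketch represents a Dirac measure $\delta_a$ only through the two operations needed here. The first is $\delta_a(A)$ for a set $A$ given by an indicator function. The second is $\int f \, \mathrm{d}\delta_a = f(a)$. The test function $f = \cos$ is an arbitrary bounded continuous choice, and the helper names are illustrative.

```python
import math

def dirac_measure(a: float, indicator) -> float:
    """delta_a(A) for the Dirac measure at a, with A given by its indicator function."""
    return 1.0 if indicator(a) else 0.0

def dirac_integral(a: float, f) -> float:
    """Integral of f with respect to delta_a, which equals f(a)."""
    return f(a)

def in_A(x: float) -> bool:
    # Indicator of the set A = (-inf, 0].
    return x <= 0.0

f = math.cos  # a bounded continuous test function

for n in (1, 10, 1000):
    a_n = 1.0 / n
    # P_n(A) stays 0 for every n, while the integrals f(1/n) approach f(0).
    print(n, dirac_measure(a_n, in_A), dirac_integral(a_n, f))

# Limit measure delta_0: delta_0(A) = 1 and the integral equals f(0) = 1.
print("delta_0", dirac_measure(0.0, in_A), dirac_integral(0.0, f))
```

The set-wise values never approach $\delta_0(A) = 1$. The integrals against the bounded continuous $f$ do converge, which is why the definition is phrased via integrals.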

## Example

Consider a sequence of random variables $X_n$ that are Dirac-distributed at the points $-\tfrac{1}{n}$. Then each of the distribution functions has the form

$$F_{X_n}(x) = \begin{cases} 0 & \text{if } x < -\tfrac{1}{n} \\ 1 & \text{if } x \geq -\tfrac{1}{n} \end{cases}.$$

The sequence of these distribution functions converges pointwise to the distribution function

$$F_X(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases}.$$

This holds because the distribution functions agree for all $x \geq 0$ (they all take the value 1). In addition, for each $\epsilon < 0$ there is an $N(\epsilon)$ such that $F_{X_n}(\epsilon) = 0$ for all $n > N(\epsilon)$. The function $F_X$, however, is the distribution function of a Dirac distribution at 0. The sequence of distributions of the random variables $X_n$ therefore converges in distribution to the Dirac distribution at 0.
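The pointwise convergence argument can be traced numerically. The Python sketch below evaluates $F_{X_n}$ for the Dirac distributions at $-\tfrac{1}{n}$. It also computes one admissible threshold, $N(\epsilon) = \lfloor -1/\epsilon \rfloor$ for $\epsilon < 0$. This explicit formula is an assumption made for illustration, since the text only asserts that such an $N(\epsilon)$ exists. The helper names are hypothetical.

```python
import math

def dirac_cdf(x: float, a: float) -> float:
    """Distribution function of the Dirac distribution at the point a."""
    return 1.0 if x >= a else 0.0

def F_X(x: float) -> float:
    # Limit distribution function: Dirac distribution at 0.
    return dirac_cdf(x, 0.0)

def N_of_eps(eps: float) -> int:
    """For eps < 0, every n > N_of_eps(eps) satisfies -1/n > eps, hence F_{X_n}(eps) = 0."""
    return math.floor(-1.0 / eps)

# X_n is Dirac-distributed at -1/n; compare F_{X_n}(x) for growing n with F_X(x).
for x in (-0.5, -0.07, 0.0, 0.3):
    values = [dirac_cdf(x, -1.0 / n) for n in (1, 10, 100, 1000)]
    print(x, values, "limit:", F_X(x))
```

The choice $N(\epsilon) = \lfloor -1/\epsilon \rfloor$ works because $n > \lfloor -1/\epsilon \rfloor$ implies $n > -1/\epsilon$, i.e. $-\tfrac{1}{n} > \epsilon$.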

Conversely, consider a sequence of random variables $X_n^*$ that are Dirac-distributed at the points $\tfrac{1}{n}$. These have the distribution functions

$$F_{X_n^*}(x) = \begin{cases} 0 & \text{if } x < \tfrac{1}{n} \\ 1 & \text{if } x \geq \tfrac{1}{n} \end{cases}.$$

By an argument analogous to the one above, one shows that this sequence of distribution functions converges pointwise to

$$F^*(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ 1 & \text{if } x > 0 \end{cases}.$$

This pointwise limit function is not a distribution function, since it is not right-continuous at 0. However, the sequence $F_{X_n^*}(x)$ converges to the distribution function $F_X$ described above at every continuity point of $F_X$, that is, for every $x \neq 0$. Thus the $X_n^*$ also converge in distribution to the Dirac measure at 0.

Therefore, when checking for convergence in distribution, pointwise convergence alone is not decisive. One must also consider whether the limit function can be modified into a distribution function that meets the requirements at the continuity points.
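For the Dirac distributions at $+\tfrac{1}{n}$, the following Python sketch (with hypothetical helper names) shows that the pointwise limit $F^*$ differs from $F_X$ only at $x = 0$. This point is not a continuity point of $F_X$, so the mismatch does not prevent convergence in distribution.

```python
def dirac_cdf(x: float, a: float) -> float:
    """Distribution function of the Dirac distribution at the point a."""
    return 1.0 if x >= a else 0.0

def F_n_star(x: float, n: int) -> float:
    # Distribution function of X_n^*, Dirac-distributed at 1/n.
    return dirac_cdf(x, 1.0 / n)

def F_star(x: float) -> float:
    # Pointwise limit of F_n_star; not right-continuous at 0.
    return 1.0 if x > 0.0 else 0.0

def F_X(x: float) -> float:
    # Distribution function of the Dirac distribution at 0.
    return dirac_cdf(x, 0.0)

for x in (-0.2, 0.0, 0.001, 0.5):
    # Compare a late term of the sequence with F^* and with F_X.
    print(x, F_n_star(x, 10**6), F_star(x), F_X(x))
```

At $x = 0$, every $F_{X_n^*}(0)$ equals 0 while $F_X(0) = 1$. At the continuity points $x \neq 0$, a sufficiently late term already agrees with $F_X$.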

## Properties

• The Portmanteau theorem gives equivalent characterizations of convergence in distribution.
• If the distribution function $F$ of the limit $X$ is continuous, then convergence in distribution is equivalent to uniform convergence of the distribution functions $F_n$ to $F$.
• Since this concept of convergence is defined only via the distributions of the random variables, the random variables need not be defined on the same probability space.
• If the $X_n$ converge in distribution to $X$, then the characteristic functions $\varphi_{X_n}(t)$ converge pointwise to $\varphi_X(t)$ for all $t \in \mathbb{R}$. For the converse (Lévy's continuity theorem), one assumes that the $\varphi_{X_n}$ converge pointwise to a function $\varphi$ that is continuous at zero. Then $\varphi$ is the characteristic function of a random variable $X$, and the $X_n$ converge in distribution to $X$.
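The statement about characteristic functions can be illustrated with a standard example that is not taken from the text. Let $X_n \sim \operatorname{Bin}(n, \lambda/n)$ and $X \sim \operatorname{Poi}(\lambda)$. The characteristic functions $(1 - p + p e^{it})^n$ with $p = \lambda/n$ converge pointwise to $\exp(\lambda(e^{it} - 1))$. Since the limit is a characteristic function (and hence continuous at 0), the $X_n$ converge in distribution to $X$. The Python sketch uses `cmath` and an arbitrarily chosen $\lambda = 2$.

```python
import cmath

LAM = 2.0  # hypothetical rate parameter lambda

def phi_binomial(t: float, n: int, p: float) -> complex:
    """Characteristic function of Binomial(n, p): (1 - p + p e^{it})^n."""
    return (1 - p + p * cmath.exp(1j * t)) ** n

def phi_poisson(t: float, lam: float) -> complex:
    """Characteristic function of Poisson(lam): exp(lam (e^{it} - 1))."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

# Pointwise distance between the characteristic functions for growing n.
for t in (0.0, 0.5, 1.0, 3.0):
    errors = [abs(phi_binomial(t, n, LAM / n) - phi_poisson(t, LAM)) for n in (10, 100, 10000)]
    print(t, errors)
```

For each fixed $t$, the printed distances shrink as $n$ grows. This is the pointwise convergence required above.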

## Relationship to other convergence concepts of stochastics

In general, the following implications hold between the convergence concepts of probability theory:

$$\begin{matrix} \text{almost sure} \\ \text{convergence} \end{matrix} \implies \begin{matrix} \text{convergence in} \\ \text{probability} \end{matrix} \implies \begin{matrix} \text{convergence in} \\ \text{distribution} \end{matrix}$$

and

$$\begin{matrix} \text{convergence in} \\ \text{p-th mean} \end{matrix} \implies \begin{matrix} \text{convergence in} \\ \text{probability} \end{matrix} \implies \begin{matrix} \text{convergence in} \\ \text{distribution} \end{matrix}.$$

Convergence in distribution is therefore the weakest of these convergence concepts. Its relationships to the other types of convergence are detailed in the sections below.

### Convergence in probability

By Slutsky's theorem, convergence in probability implies convergence in distribution. The converse is generally false. For example, let the random variable $X$ be Bernoulli-distributed with parameters $p = q = \tfrac{1}{2}$, that is,

$$P(X = 1) = P(X = 0) = \frac{1}{2},$$

and set $X_n = 1 - X$ for all $n \in \mathbb{N}$. Then the $X_n$ converge in distribution to $X$, since they all have the same distribution as $X$. However, $|X_n - X| = 1$ always holds, so the random variables cannot converge in probability to $X$. There are nonetheless criteria under which convergence in distribution implies convergence in probability. For example, suppose all random variables $X_n$ are defined on the same probability space and converge in distribution to a random variable $X$ that is almost surely constant. Then the $X_n$ also converge in probability to $X$.
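The counterexample can be spelled out on a two-point sample space. This Python sketch uses exact `fractions.Fraction` probabilities and assumes that the outcome $\omega \in \{0, 1\}$ is itself the value of $X$. The helpers `law` and `prob_far_apart` are illustrative names.

```python
from fractions import Fraction

# Two-point sample space for X ~ Bernoulli(1/2): outcome omega in {0, 1}, each with probability 1/2.
OMEGA = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def X(omega: int) -> int:
    # X is the outcome itself.
    return omega

def X_n(omega: int, n: int) -> int:
    # X_n = 1 - X for every n (the index n does not matter here).
    return 1 - X(omega)

def law(rv) -> dict:
    """Distribution of a random variable on OMEGA as a dict {value: probability}."""
    dist = {}
    for omega, p in OMEGA.items():
        value = rv(omega)
        dist[value] = dist.get(value, Fraction(0)) + p
    return dist

def prob_far_apart(n: int, eps: float) -> Fraction:
    """P(|X_n - X| > eps), computed exactly."""
    return sum((p for omega, p in OMEGA.items() if abs(X_n(omega, n) - X(omega)) > eps), Fraction(0))

print(law(X), law(lambda omega: X_n(omega, 5)))          # identical distributions
print([prob_far_apart(n, 0.5) for n in (1, 10, 100)])    # stays 1 for every n
```

The distributions coincide for every $n$, while $P(|X_n - X| > \tfrac{1}{2})$ never drops below 1.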

### Almost sure convergence

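The Skorokhod representation theorem describes the conditions under which almost sure convergence can be inferred from convergence in distribution. The inference applies to suitably chosen random variables that have the same distributions and are defined on a common probability space.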
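For real-valued random variables, one standard way to obtain such a representation is the quantile coupling. On the probability space $[0, 1)$ with Lebesgue measure and $U(u) = u$, set $Y_n = F_n^{-1}(U)$ and $Y = F^{-1}(U)$, where $F^{-1}$ denotes the (generalized) inverse. Then $Y_n$ has the distribution of $X_n$ and, if $X_n$ converges in distribution to $X$, $Y_n \to Y$ almost surely. The Python sketch below assumes a hypothetical exponential example, $X_n \sim \operatorname{Exp}(1 + \tfrac{1}{n})$ and $X \sim \operatorname{Exp}(1)$. Their quantile function is $F^{-1}(u) = -\ln(1 - u)/\text{rate}$.

```python
import math

def quantile_exp(u: float, rate: float) -> float:
    """Quantile function F^{-1}(u) of the exponential distribution with the given rate."""
    return -math.log(1.0 - u) / rate

def rate_n(n: int) -> float:
    # Hypothetical example: X_n ~ Exp(1 + 1/n) converges in distribution to X ~ Exp(1).
    return 1.0 + 1.0 / n

# Common probability space: [0, 1) with Lebesgue measure, U(u) = u.
# Y_n(u) = F_n^{-1}(u) has the law of X_n, Y(u) = F^{-1}(u) has the law of X,
# and Y_n(u) -> Y(u) for every fixed u, i.e. almost surely.
for u in (0.1, 0.5, 0.9, 0.999):
    ys = [quantile_exp(u, rate_n(n)) for n in (1, 10, 1000)]
    print(u, ys, "limit:", quantile_exp(u, 1.0))
```

In this example $Y_n(u) = Y(u)/(1 + \tfrac{1}{n})$, so the convergence for every fixed $u$ can also be read off directly.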

## General definition

In general, convergence in distribution can be defined as follows. Let $X$ be a random variable and $(X_n)_{n \in \mathbb{N}}$ a sequence of random variables with values in a metric space $E$.

Then the $X_n$ converge in distribution to $X$ if and only if their distributions $P_{X_n}$ converge weakly, in the sense of measure theory, to the distribution $P_X$ of $X$. This means that for all continuous bounded functions $f\colon E \to \mathbb{R}$,

$$\lim_{n \to \infty} \operatorname{E}(f(X_n)) = \operatorname{E}(f(X)).$$
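The general definition covers, for example, the metric space $E = \mathbb{R}^2$ with the Euclidean metric. The following Python sketch assumes a hypothetical $X_n$ that is uniformly distributed on the $n \times n$ grid points $(i/n, j/n)$, $1 \le i, j \le n$, and an $X$ that is uniformly distributed on $[0, 1]^2$. For the bounded continuous function $f(x, y) = \cos(x + y)$, the expectations $\operatorname{E}(f(X_n))$ are Riemann sums that approach $\operatorname{E}(f(X)) = 2\cos(1) - \cos(2) - 1$.

```python
import math

def f(x: float, y: float) -> float:
    # A bounded continuous test function on E = R^2 (hypothetical choice).
    return math.cos(x + y)

def expect_grid(n: int) -> float:
    """E f(X_n) for X_n uniform on the n*n grid points (i/n, j/n), 1 <= i, j <= n."""
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            total += f(i / n, j / n)
    return total / (n * n)

# X is uniform on the unit square [0, 1]^2; E f(X) in closed form.
limit = 2 * math.cos(1.0) - math.cos(2.0) - 1.0
for n in (5, 50, 400):
    print(n, expect_grid(n), limit)
```

A single test function only illustrates the definition. Convergence in distribution requires the limit relation for every bounded continuous $f$.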

## Relation to weak convergence

Weak convergence of finite measures is defined as follows: a sequence $(\mu_n)_{n \in \mathbb{N}}$ of finite measures on a metric space $(\Omega, d)$, equipped with the Borel σ-algebra, converges weakly to $\mu$ if

$$\lim_{n \to \infty} \int_\Omega f \, \mathrm{d}\mu_n = \int_\Omega f \, \mathrm{d}\mu$$

for all bounded continuous functions $f$ from $\Omega$ to $\mathbb{R}$. The total mass of the underlying space $\Omega$ is preserved under weak limits, since the function $f \equiv 1$ is continuous and bounded. Thus, weak limits of sequences of probability measures are again probability measures. It is therefore reasonable to define weak convergence only for sequences of probability measures, which some authors do.

If one transfers this definition for a sequence of probability measures $(P_n)_{n \in \mathbb{N}}$ to random variables, one obtains

$$\lim_{n \to \infty} \int_\Omega f(X) \, \mathrm{d}P_n = \int_\Omega f(X) \, \mathrm{d}P,$$

which in stochastic notation corresponds to the definition given above:

$$\lim_{n \to \infty} \operatorname{E}(f(X_n)) = \operatorname{E}(f(X)).$$

Convergence in distribution is therefore a special case of weak convergence in the sense of measure theory, formulated for distributions of random variables and via the expected value.

Convergence in distribution is thus also an example of the functional-analytic concept of weak-* convergence. For details, see Weak convergence (measure theory), section Classification.

## Generalization

A modification of convergence in distribution for random variables with values in infinite-dimensional spaces is fdd convergence (convergence of the finite-dimensional distributions). It requires convergence in distribution of all finite-dimensional marginal distributions.

## References

1. Elstrodt: Measure and Integration Theory. 2009, p. 381.
2. Kusolitsch: Measure and probability theory. 2014, p. 287.
3. Meintrup, Schäffler: Stochastics. 2005, p. 174.