In probability theory and statistics, the conditional expected value describes the expected value of a random variable under the condition that additional information about the outcome of the underlying random experiment is available. The condition can be, for example, that it is known whether a certain event has occurred or which value another random variable has taken. Abstractly, the additional information can be modeled as a sub-σ-algebra of the underlying σ-algebra of events.
Abstract conditional expected values, and conditional probabilities as a special case of them, generalize the elementary concept of conditional probability.
Conditional expected values play an important role in modern stochastics, for example in the study of stochastic processes, and are used, among other things, in the definition of martingales.
Interpretation
Forming the conditional expected value amounts, to a certain extent, to smoothing a random variable over a sub-σ-algebra. σ-algebras model available information. A smoothed version of the random variable that is already measurable with respect to a sub-σ-algebra therefore contains less information about the outcome of a random experiment. Forming the conditional expectation thus reduces the level of detail. The conditional expectation reduces the information about a random variable to a random variable that is simpler in terms of measurability, just as, in the extreme case, the expected value of a random variable reduces the information to a single number.
History
The concept, some aspects of which are very old (Laplace already calculated conditional densities), was formalized by Andrei Kolmogorov in 1933 using the Radon-Nikodym theorem. In works by Paul Halmos (1950) and Joseph L. Doob (1953), conditional expectations were given the form common today, as conditioning on sub-σ-algebras of abstract probability spaces.
Introduction
Given an event $B$ with $P(B) > 0$, the conditional probability
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
indicates how likely the event $A$ is if one knows that the event $B$ has occurred. Correspondingly, the conditional expected value
$$\operatorname{E}(Y \mid B) = \frac{\operatorname{E}(1_B \cdot Y)}{P(B)}$$
indicates which value one expects on average for the random variable $Y$ if one knows that the event $B$ has occurred. Here $1_B$ is the indicator function of $B$, i.e. the random variable that takes the value $1$ when $B$ occurs and the value $0$ when it does not.
Example: Let $Y$ be the number rolled with a fair die and $B$ the event of rolling a 5 or 6. Then
$$\operatorname{E}(Y \mid B) = \frac{P(Y = 5) \cdot 5 + P(Y = 6) \cdot 6}{P(B)} = \frac{11/6}{2/6} = 5.5.$$
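The elementary formula $\operatorname{E}(Y \mid B) = \operatorname{E}(1_B Y)/P(B)$ can be checked with exact rational arithmetic. The following Python sketch assumes a finite-valued $Y$, given as a list of values and a list of probabilities. In this example the event $B$ is described through the value of $Y$ itself. The helper name `conditional_expectation_given_event` is hypothetical and not part of any library.

```python
from fractions import Fraction

def conditional_expectation_given_event(values, probs, in_event):
    """Elementary E(Y | B) = E(1_B * Y) / P(B) for a finite-valued Y.

    values, probs: possible values of Y and their probabilities
    in_event: predicate telling whether the value y belongs to the event B
    """
    p_b = sum(p for y, p in zip(values, probs) if in_event(y))
    if p_b == 0:
        raise ValueError("the elementary definition requires P(B) > 0")
    # E(1_B * Y): only values inside B contribute to the sum
    e_indicator_y = sum(y * p for y, p in zip(values, probs) if in_event(y))
    return e_indicator_y / p_b

faces = range(1, 7)            # pips of a fair die
probs = [Fraction(1, 6)] * 6   # uniform probabilities
result = conditional_expectation_given_event(faces, probs, lambda y: y >= 5)
print(result)  # 11/2, i.e. 5.5
```

Using `Fraction` instead of floats keeps the result exactly equal to $11/2$.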
However, this elementary concept of conditional probabilities and expected values is often insufficient. What one needs instead are conditional probabilities and conditional expected values of the following forms:
(a) $P(A \mid X = x)$ or $\operatorname{E}(Y \mid X = x)$, if one knows that a random variable $X$ has taken the value $x$,
(b) $P(A \mid X)$ or $\operatorname{E}(Y \mid X)$, if one regards the value found in (a), which depends on $x$, as a random variable by inserting $X$ for $x$,
(c) $P(A \mid \mathcal{B})$ or $\operatorname{E}(Y \mid \mathcal{B})$, if for every event in a σ-algebra $\mathcal{B}$ one knows whether it has occurred or not.
In contrast to (a), the expressions in (b) and (c) are themselves random variables, since they still depend on the random variable $X$ or on the realization of the events in $\mathcal{B}$.
$\operatorname{E}(Y \mid B)$ is usually read as “the expected value of $Y$ under the condition $B$”. $\operatorname{E}(Y \mid X)$ and $\operatorname{E}(Y \mid \mathcal{B})$ are read as “the expected value of $Y$ given $X$” and “the expected value of $Y$ given $\mathcal{B}$”.
The variants of conditional probabilities and expected values listed above are all related to one another. In fact, it suffices to define only one of them, because all the others can be derived from it:
 Conditional probabilities and conditional expected values carry the same information. Conditional expected values can, just like ordinary expected values, be calculated as sums or integrals from conditional probabilities. Conversely, the conditional probability of an event is simply the conditional expected value of the indicator function of the event: $P(A \mid \dots) = \operatorname{E}(1_A \mid \dots)$.
 The variants in (a) and (b) are equivalent. The random variable $P(A \mid X)$ has the value $P(A \mid X)(\omega) = P(A \mid X = X(\omega))$ for the outcome $\omega$. In other words, one obtains the value $P(A \mid X = x)$ for $P(A \mid X)$ when one observes the value $x$ for $X$. Conversely, given $P(A \mid X)$, one can always find an expression $P(A \mid X = x)$ depending on $x$ such that this relationship holds. The same applies to conditional expected values.
 The variants in (b) and (c) are also equivalent. One can choose $\mathcal{B}$ as the set of all events of the form $\{X \in E\}$ (the σ-algebra $\sigma(X)$ generated by $X$), and conversely $X$ as the family $(1_B)_{B \in \mathcal{B}}$.
Discrete case
Here we consider the case that $P(X = x) > 0$ holds for all values $x$ of $X$. This case is particularly easy to handle because the elementary definition applies without restriction:
$$P(A \mid X = x) = \frac{P(A \cap \{X = x\})}{P(X = x)}$$
The function $P(\,\cdot \mid X = x)$ (where $\cdot$ denotes the argument) has all the properties of a probability measure; it is a so-called regular conditional probability. The conditional distribution $P(Y \in \cdot \mid X = x)$ of a random variable $Y$ is therefore also a perfectly ordinary probability distribution. The expected value of this distribution is the conditional expected value of $Y$, given $X = x$:
$$\operatorname{E}(Y \mid X = x).$$
If $Y$ is also discrete, then
$$\operatorname{E}(Y \mid X = x) = \sum_y y\,P(Y = y \mid X = x) = \sum_y y\,\frac{P(X = x, Y = y)}{P(X = x)},$$
where the sum runs over all $y$ in the range of $Y$.
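For a finite joint distribution, both the conditional distribution and the sum above can be evaluated directly. The sketch below assumes that the joint probabilities are stored in a dictionary keyed by pairs $(x, y)$. This representation, the function names and the small probability table are illustrative choices only.

```python
from fractions import Fraction

def marginal_x(joint, x):
    """P(X = x), obtained by summing the joint probabilities over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_distribution(joint, x):
    """y -> P(Y = y | X = x); requires P(X = x) > 0 as in the discrete case."""
    p_x = marginal_x(joint, x)
    if p_x == 0:
        raise ValueError("P(X = x) must be positive")
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

def conditional_expectation(joint, x):
    """E(Y | X = x) = sum over y of y * P(X = x, Y = y) / P(X = x)."""
    return sum(y * q for y, q in conditional_distribution(joint, x).items())

# Illustrative joint distribution of (X, Y); the probabilities sum to 1.
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (1, 2): Fraction(3, 8),
}
print(conditional_expectation(joint, 0))  # 1/2
print(conditional_expectation(joint, 1))  # 7/4
```

The values of `conditional_distribution(joint, x)` sum to 1. This reflects the statement above that the conditional distribution is an ordinary probability distribution.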
Example
Let $X$ and $Y$ be the numbers of pips in two independent throws of a fair die and $Z = X + Y$ the sum of the pips. The distribution of $Z$ is given by $P(Z = z) = \frac{6 - |7 - z|}{36}$, $z = 2, \dots, 12$. Suppose we know the result $X$ of the first throw, e.g. that we rolled the value $4$. Then we obtain the conditional distribution
$$P(Z = z \mid X = 4) = \frac{P(X = 4, Y = z - 4)}{P(X = 4)} = \begin{cases} 1/6 & \text{if } z = 5, \dots, 10, \\ 0 & \text{otherwise.} \end{cases}$$
The expected value of this distribution, the conditional expectation of $Z$ given $X = 4$, is
$$\operatorname{E}(Z \mid X = 4) = \tfrac{1}{6}(5 + 6 + \dots + 10) = 7.5.$$
More generally, for any value $x$ of $X$,
$$\operatorname{E}(Z \mid X = x) = \tfrac{1}{6}\bigl((x + 1) + \dots + (x + 6)\bigr) = x + 3.5.$$
If we substitute the value of $X$ for $x$, we obtain the conditional expectation of $Z$ given $X$:
$$\operatorname{E}(Z \mid X) = X + 3.5.$$
This expression is a random variable. If the outcome $\omega$ has occurred, then $X$ has the value $X(\omega)$ and $\operatorname{E}(Z \mid X)$ has the value
$$\operatorname{E}(Z \mid X)(\omega) = \operatorname{E}(Z \mid X = X(\omega)) = X(\omega) + 3.5.$$
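The two-dice example can be checked by enumerating all 36 equally likely outcomes. The following Python sketch computes $P(Z = z \mid X = x)$ and $\operatorname{E}(Z \mid X = x)$ from the elementary definition. It then evaluates $\operatorname{E}(Z \mid X)$ outcome by outcome, as a random variable. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Outcomes omega = (first throw, second throw), each with probability 1/36.
OMEGA = list(product(range(1, 7), repeat=2))
P_OMEGA = Fraction(1, 36)

def cond_dist_Z_given_X(x):
    """z -> P(Z = z | X = x) for Z = X + Y."""
    p_x = P_OMEGA * sum(1 for a, _ in OMEGA if a == x)
    dist = {}
    for a, b in OMEGA:
        if a == x:
            dist[a + b] = dist.get(a + b, 0) + P_OMEGA / p_x
    return dist

def cond_exp_Z_given_X(x):
    """E(Z | X = x) as the mean of the conditional distribution."""
    return sum(z * q for z, q in cond_dist_Z_given_X(x).items())

# E(Z | X) as a random variable: one value per outcome omega.
cond_exp_as_rv = {w: cond_exp_Z_given_X(w[0]) for w in OMEGA}

print(cond_exp_Z_given_X(4))                           # 15/2, i.e. 7.5
print(cond_exp_as_rv[(4, 1)], cond_exp_as_rv[(4, 6)])  # 15/2 15/2
```

All outcomes with the same first throw receive the same value. This is exactly what it means for $\operatorname{E}(Z \mid X)$ to depend on $\omega$ only through $X(\omega)$.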
Law of total probability
The probability of an event $A$ can be calculated by decomposing it according to the values $x$ of $X$:
$$P(A) = \sum_x P(X = x)\,P(A \mid X = x).$$
More generally, for every event $B = \{X \in E\}$ in the σ-algebra $\sigma(X)$, the formula
$$P(B \cap A) = \sum_{x \in E} P(X = x)\,P(A \mid X = x)$$
holds.
Using the transformation formula for the image measure, one obtains the equivalent formulation
$$P(B \cap A) = \int_B P(A \mid X)\,dP.$$
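Both forms of the decomposition can be checked on the two-dice example. Here $X$ is the first throw and $Y$ the second. In the following sketch, the event $A = \{X + Y \ge 9\}$ and the set $E = \{5, 6\}$ are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product(range(1, 7), repeat=2))  # (X, Y) for two fair dice

def prob(event):
    """P(event) on the uniform space of 36 outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def in_A(w):
    """Illustrative event A: the sum of the pips is at least 9."""
    return w[0] + w[1] >= 9

def p_X(x):
    return prob(lambda w: w[0] == x)

def p_A_given_X(x):
    """Elementary P(A | X = x); P(X = x) = 1/6 > 0 for every face."""
    return prob(lambda w: in_A(w) and w[0] == x) / p_X(x)

# Law of total probability: P(A) = sum_x P(X = x) P(A | X = x)
total = sum(p_X(x) * p_A_given_X(x) for x in range(1, 7))

# Generalization for B = {X in E}: P(B and A) = sum_{x in E} P(X = x) P(A | X = x)
E = {5, 6}
lhs = prob(lambda w: w[0] in E and in_A(w))
rhs = sum(p_X(x) * p_A_given_X(x) for x in E)

print(prob(in_A), total)  # 5/18 5/18
print(lhs, rhs)           # 7/36 7/36
```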
General case
In the general case, the definition is far less intuitive than in the discrete case, because one can no longer assume that the events on which one conditions have a probability $> 0$.
An example
We consider two independent standard normally distributed random variables $X$ and $Y$. Without much thought, one can state the conditional expected value, given $X$, of the random variable $Z = 2X + Y - 3$. This is the mean value that one expects for the expression $2X + Y - 3$ if one knows $X$:
$$\operatorname{E}(Z \mid X) = 2X - 3 \quad \text{or} \quad \operatorname{E}(Z \mid X = x) = 2x - 3.$$
As before, $\operatorname{E}(Z \mid X)$ is itself a random variable, and only the σ-algebra $\sigma(X)$ generated by $X$ is decisive for its value. If, for example, $X' = 2X$, then $\sigma(X') = \sigma(X)$, and we also obtain $\operatorname{E}(Z \mid X') = \operatorname{E}(X' + Y - 3 \mid X') = X' - 3 = 2X - 3$.
The problem arises from the following consideration. The equations above assume that $Y$ is standard normally distributed for each individual value of $X$. One could, however, also assume that $Y$ takes the constant value $2$ in the case $X = 0$ and is standard normally distributed only in the other cases. Since the event $X = 0$ has probability $0$, $X$ and $Y$ would still be independent and standard normally distributed overall. But one would then obtain $\operatorname{E}(Z \mid X = 0) = -1$ instead of $\operatorname{E}(Z \mid X = 0) = -3$. This shows that the conditional expected value is not uniquely determined. It only makes sense to define the conditional expected value for all values of $X$ simultaneously, since it can be changed arbitrarily for individual values.
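Since $\{X = x\}$ has probability $0$, a simulation can only approximate $\operatorname{E}(Z \mid X = x)$. It does so by conditioning on a small window $\{|X - x| < h\}$ of positive probability. The following Python sketch does this for $Z = 2X + Y - 3$. The window width `h`, the sample size and the seed are arbitrary assumptions, and the estimate agrees with $2x - 3$ only up to Monte Carlo error. Changing $Y$ on the null event $\{X = 0\}$, as described above, would practically never be seen by such a simulation. This illustrates why conditional expectations are only determined almost surely.

```python
import random

def estimate_cond_exp(x, h=0.05, n=200_000, seed=1):
    """Monte Carlo estimate of E(Z | |X - x| < h) for Z = 2X + Y - 3,
    with X and Y independent standard normal random variables."""
    rng = random.Random(seed)  # seeded for reproducibility
    total, hits = 0.0, 0
    for _ in range(n):
        x_val = rng.gauss(0.0, 1.0)
        y_val = rng.gauss(0.0, 1.0)
        if abs(x_val - x) < h:  # keep only samples in the window around x
            total += 2 * x_val + y_val - 3
            hits += 1
    if hits == 0:
        raise ValueError("no samples fell into the window; increase n or h")
    return total / hits

for x in (-1.0, 0.0, 1.0):
    print(x, round(estimate_cond_exp(x), 2), 2 * x - 3)
```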
Kolmogorov's approach
Since the elementary definition cannot be transferred to the general case, the question arises which properties one wants to retain and which one is prepared to give up. The approach generally used today goes back to Kolmogorov (1933) and has proven particularly useful in the theory of stochastic processes. It requires only two properties:
(1) $P(A \mid X)$ should be a measurable function of $X$. Applied to the σ-algebra $\mathcal{B} = \sigma(X)$, this means that $P(A \mid \mathcal{B})$ should be a $\mathcal{B}$-measurable random variable.
(2) In analogy to the law of total probability, the equation
$$\int_B P(A \mid \mathcal{B})\,dP = P(B \cap A)$$
should hold for every $B \in \mathcal{B}$.
Among other things, it is not required
 that conditional probabilities are uniquely determined,
 that $P(\,\cdot \mid \mathcal{B})$ is always a probability measure,
 that the property $P(X = x \mid X = x) = 1$ holds.
For conditional expected values, (2) takes the form
$$\int_B \operatorname{E}(X \mid \mathcal{B})\,dP = \int_B X\,dP$$
for all sets $B \in \mathcal{B}$ for which the integrals are defined. With indicator functions, this equation can be written as
$$\operatorname{E}\bigl(1_B \operatorname{E}(X \mid \mathcal{B})\bigr) = \operatorname{E}(1_B X).$$
In this form the equation is used in the following definition.
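The defining equation can be checked exhaustively on a finite example. The following Python sketch reuses the two-dice setting with $Z = X + Y$. For a candidate $W$, it tests $\operatorname{E}(1_B W) = \operatorname{E}(1_B Z)$ for all $2^6 = 64$ events $B = \{X \in S\}$ of $\sigma(X)$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

OMEGA = list(product(range(1, 7), repeat=2))  # (X, Y): two fair dice
P_OMEGA = Fraction(1, 36)

def Z(w):
    return w[0] + w[1]

def expect(f):
    """E(f) on the uniform space of 36 outcomes."""
    return sum(f(w) * P_OMEGA for w in OMEGA)

def satisfies_integral_condition(candidate):
    """True if E(1_B * candidate) == E(1_B * Z) for every B = {X in S}."""
    for r in range(7):
        for S in combinations(range(1, 7), r):
            def in_B(w):
                return 1 if w[0] in S else 0
            if expect(lambda w: in_B(w) * candidate(w)) != expect(lambda w: in_B(w) * Z(w)):
                return False
    return True

print(satisfies_integral_condition(lambda w: w[0] + Fraction(7, 2)))  # True
print(satisfies_integral_condition(lambda w: w[0] + 3))               # False
print(satisfies_integral_condition(Z))                                # True
```

The last line shows why the measurability requirement (1) is needed as well. $Z$ itself passes the integral test, but it is not $\sigma(X)$-measurable. Only the candidate $X + 3.5$, which depends on $\omega$ only through $X$, qualifies as $\operatorname{E}(Z \mid X)$.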
Formal definition
Figure: Smoothing property. Here $P$ is the uniform distribution on $[0,1]$ and $\mathcal{B}$ is the σ-algebra generated by the intervals with endpoints 0, ¼, ½, ¾, 1. $\mathcal{C}$ is the σ-algebra generated by the intervals with endpoints 0, ½, 1. Forming the conditional expected value smooths the random variable within the ranges described by the σ-algebras.
Let a probability space $(\Omega, \mathcal{A}, P)$ and a sub-σ-algebra $\mathcal{B} \subset \mathcal{A}$ be given.
(1) Let $X$ be a random variable whose expected value exists. The conditional expectation of $X$, given $\mathcal{B}$, is a random variable $Z$ that satisfies the following two conditions:
 $Z$ is $\mathcal{B}$-measurable, and
 $\operatorname{E}(1_B Z) = \operatorname{E}(1_B X)$ holds for all $B \in \mathcal{B}$.
Two conditional expectations of $X$ given $\mathcal{B}$ are called “versions of the conditional expectation”. The set of all outcomes (i.e. all elements of $\Omega$) at which two such versions differ is a null set contained in $\mathcal{B}$. This justifies the uniform notation $\operatorname{E}(X \mid \mathcal{B})$ for a conditional expectation $Z$ of $X$ given $\mathcal{B}$.
The notation $\operatorname{E}(X \mid X_1, \dots, X_n)$ denotes the conditional expectation of $X$ given the σ-algebra $\mathcal{B} = \sigma(Y)$ generated by the random variable $Y = (X_1, \dots, X_n)$.
(2) The conditional probability of an event $A \in \mathcal{A}$, given $\mathcal{B}$, is defined as the random variable
$$P(A \mid \mathcal{B}) = \operatorname{E}(1_A \mid \mathcal{B}),$$
i.e. as the conditional expectation of the indicator function of $A$.
The conditional probabilities $P(A \mid \mathcal{B})$ of different events $A \in \mathcal{A}$ are thus defined without reference to one another and are not uniquely determined. For this reason, $P(\,\cdot \mid \mathcal{B})(\omega)$ need not, in general, be a probability measure. Sometimes, however, the conditional probabilities $P(A \mid \mathcal{B})$, $A \in \mathcal{A}$, can be combined into a stochastic kernel $\pi$ from $(\Omega, \mathcal{B})$ to $(\Omega, \mathcal{A})$ with
$$P(A \mid \mathcal{B})(\omega) = \pi(\omega; A) \quad \text{for all } \omega \in \Omega,\ A \in \mathcal{A}.$$
In this case one speaks of a regular conditional probability. A concrete version of the conditional expectation is then given by the integral
$$\operatorname{E}(X \mid \mathcal{B})(\omega) = \int \pi(\omega; d\omega')\,X(\omega').$$
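The smoothing shown in the figure can be imitated on a finite space. The following Python sketch assumes that the uniform distribution on $[0, 1]$ is discretized by 16 equally likely midpoints. It uses the illustrative random variable $X(\omega) = \omega^2$. Because the points are equally likely, a version of the conditional expectation given a σ-algebra generated by finitely many intervals is the plain average of $X$ over each generating interval. With unequal weights, a weighted average would be needed. Because $\mathcal{C} \subset \mathcal{B}$, smoothing first over $\mathcal{B}$ and then over $\mathcal{C}$ gives the same result as smoothing directly over $\mathcal{C}$. This is the tower property listed under the calculation rules.

```python
from fractions import Fraction

N = 16  # assumption: 16 equally likely points stand in for the uniform distribution
OMEGA = [Fraction(2 * i + 1, 2 * N) for i in range(N)]  # midpoints of [i/N, (i+1)/N)

def atoms(endpoints):
    """Atoms of the sigma-algebra generated by the intervals between consecutive endpoints."""
    return [[w for w in OMEGA if a <= w < b] for a, b in zip(endpoints, endpoints[1:])]

def smooth(values, atom_list):
    """Conditional expectation given the sigma-algebra generated by atom_list.

    With equally likely points, the value on each atom is the plain average
    of `values` over that atom.
    """
    result = {}
    for atom in atom_list:
        average = sum(values[w] for w in atom) / len(atom)
        for w in atom:
            result[w] = average
    return result

B_ATOMS = atoms([0, Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), 1])
C_ATOMS = atoms([0, Fraction(1, 2), 1])

x = {w: w * w for w in OMEGA}      # illustrative random variable X(w) = w^2
e_B = smooth(x, B_ATOMS)           # E(X | B): constant on each quarter
e_C = smooth(x, C_ATOMS)           # E(X | C): constant on each half
e_B_then_C = smooth(e_B, C_ATOMS)  # E(E(X | B) | C)

print(sorted(set(e_B.values())))
print(e_B_then_C == e_C)  # True
```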
Factorization: The conditional expectation $\operatorname{E}(X \mid X_1, \dots, X_n)$ is defined as a random variable, i.e. as a function of $\omega$. It can also be represented as a function of $X_1, \dots, X_n$: there is a measurable function $f$ such that
$$\operatorname{E}(X \mid X_1, \dots, X_n)(\omega) = f\bigl(X_1(\omega), \dots, X_n(\omega)\bigr) \quad \text{for all } \omega \in \Omega.$$
With this, one can formally define expected values conditional on individual values:
$$\operatorname{E}(X \mid X_1 = x_1, \dots, X_n = x_n) = f(x_1, \dots, x_n).$$
When using such expressions, special care must be taken because of the lack of uniqueness in the general case.
Existence: Conditional expectations exist in general for integrable random variables (random variables with finite expected value), and so do conditional probabilities in particular. This follows from the Radon-Nikodym theorem. The definition states precisely that $\operatorname{E}(X \mid \mathcal{B})$ is a density of the signed measure $\nu(B) = \operatorname{E}(1_B X)$ with respect to the measure $\mu(B) = P(B)$, both defined on the measurable space $(\Omega, \mathcal{B})$. The definition can be generalized slightly further, so that cases such as $\operatorname{E}(X \mid |X|) = 0$ for a Cauchy-distributed random variable $X$ are also covered.
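The step from the definition to the Radon-Nikodym theorem can be spelled out briefly. The following derivation sketch assumes $\operatorname{E}|X| < \infty$ and only checks the hypotheses of the theorem on $(\Omega, \mathcal{B})$:

```latex
\begin{align*}
  &\text{finite:} && |\nu(B)| = |\operatorname{E}(1_B X)| \le \operatorname{E}|X| < \infty,\\
  &\text{absolutely continuous:} && P(B) = 0 \implies |\nu(B)| \le \operatorname{E}(1_B\,|X|) = 0,\\
  &\text{density:} && Z = \frac{d\nu}{d\mu} \text{ is $\mathcal{B}$-measurable with }
     \int_B Z\,dP = \operatorname{E}(1_B X) \text{ for all } B \in \mathcal{B}.
\end{align*}
```

The σ-additivity of $\nu$ follows from dominated convergence. The last line is exactly the pair of conditions in the formal definition, and the almost-sure uniqueness of the density corresponds to the uniqueness of versions.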
Regular conditional probabilities, also in factorized form, exist in Polish spaces with the Borel σ-algebra. More generally: if $Z$ is any random variable with values in a Polish space, then a version of the distribution $P(Z \in \cdot \mid X_1, \dots, X_n)$ exists in the form of a stochastic kernel $\pi$:
$$P(Z \in \cdot \mid X_1, \dots, X_n)(\omega) = \pi\bigl(X_1(\omega), \dots, X_n(\omega);\, \cdot\,\bigr) \quad \text{for all } \omega \in \Omega.$$
Special cases
(1) The trivial σ-algebra $\mathcal{B} = \{\varnothing, \Omega\}$ yields ordinary expected values and probabilities:
$$\operatorname{E}(X \mid \mathcal{B})(\omega) = \operatorname{E}(X) \quad \text{for all } \omega \in \Omega,$$
$$P(A \mid \mathcal{B})(\omega) = P(A) \quad \text{for all } \omega \in \Omega.$$
Correspondingly, $\operatorname{E}(X \mid Y)(\omega) = \operatorname{E}(X)$ and $P(A \mid Y)(\omega) = P(A)$ hold for all $\omega \in \Omega$ when conditioning on a constant random variable $Y$.
(2) Simple σ-algebras: Let $B \in \mathcal{B}$ with $P(B) > 0$, and suppose $B$ has no subsets in $\mathcal{B}$ other than itself and the empty set. Then the value of $P(A \mid \mathcal{B})$ on $B$ agrees with the conventional conditional probability:
$$P(A \mid \mathcal{B})(\omega) = \frac{P(A \cap B)}{P(B)} \quad \text{for all } \omega \in B.$$
This shows that the calculations listed above are consistent with the general definition in the discrete case.
(3) Computing with densities: Let $f_{X,Y} \colon (a, b) \times (c, d) \to (0, \infty)$ be a bounded density function of the joint distribution of the random variables $X, Y$. Then
$$f_{X \mid Y}(x, y) = \frac{f_{X,Y}(x, y)}{\int_a^b f_{X,Y}(u, y)\,du}$$
is a density of a regular conditional distribution $P(X \in \cdot \mid Y)$ in factorized form, and the conditional expectation satisfies
$$\operatorname{E}(X \mid Y) = \int_a^b x \, f_{X \mid Y}(x, Y)\,dx.$$
(4) Regular conditional distributions can also be given in the following cases:
 if $X$ is independent of $\mathcal{B}$, in the form $P(X \in \cdot \mid \mathcal{B}) = P(X \in \cdot)$,
 if $X$ is $\mathcal{B}$-measurable, in the form $P(X \in \cdot \mid \mathcal{B}) = \delta_X$ (Dirac measure),
 for the pair $(X, Y)$, if $X$ is $\mathcal{B}$-measurable, in the form $P((X, Y) \in \cdot \mid \mathcal{B}) = P((x, Y) \in \cdot \mid \mathcal{B})\big|_{x = X}$, provided that a regular conditional distribution of $Y$ is used to compute the expression on the right-hand side.
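Case (3) can be illustrated numerically. The following Python sketch assumes the illustrative density $f_{X,Y}(x, y) = x + y$ on $(0,1) \times (0,1)$, which is bounded and positive there. It evaluates the integrals with a simple midpoint rule. For this density the formula gives $\operatorname{E}(X \mid Y = y) = \frac{1/3 + y/2}{1/2 + y} = \frac{2 + 3y}{3 + 6y}$, and the code compares this closed form against the numerical value.

```python
def f_xy(x, y):
    """Illustrative joint density on (0, 1) x (0, 1): f(x, y) = x + y."""
    return x + y

def midpoint_integral(g, a, b, n=10_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def cond_expectation(y, a=0.0, b=1.0):
    """E(X | Y = y) = integral of x * f_{X|Y}(x, y) dx over (a, b).

    The normalizing integral of f_{X,Y}(u, y) du does not depend on x,
    so it is computed once and divided out at the end.
    """
    normalizer = midpoint_integral(lambda u: f_xy(u, y), a, b)
    return midpoint_integral(lambda x: x * f_xy(x, y), a, b) / normalizer

def closed_form(y):
    return (2 + 3 * y) / (3 + 6 * y)

for y in (0.1, 0.5, 0.9):
    print(y, cond_expectation(y), closed_form(y))
```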
Calculation rules
All of the following statements hold only almost surely ($P$-almost everywhere), insofar as they contain conditional expected values. Instead of a σ-algebra $\mathcal{B}$, one can also condition on a random variable, meaning the σ-algebra it generates.
 Extracting independent factors:
 If $X$ is independent of $\mathcal{B}$, then $\operatorname{E}(X \mid \mathcal{B}) = \operatorname{E}(X)$.
 If $X$ is independent of $\mathcal{B}$ and $Y$ jointly (i.e. of the σ-algebra generated by $\mathcal{B}$ and $Y$), then $\operatorname{E}(XY \mid \mathcal{B}) = \operatorname{E}(X)\,\operatorname{E}(Y \mid \mathcal{B})$.
 If $X, Y$ are independent, $\mathcal{A}, \mathcal{B}$ are independent, $X$ is independent of $\mathcal{B}$, and $Y$ is independent of $\mathcal{A}$, then $\operatorname{E}(\operatorname{E}(XY \mid \mathcal{A}) \mid \mathcal{B}) = \operatorname{E}(X)\,\operatorname{E}(Y) = \operatorname{E}(\operatorname{E}(XY \mid \mathcal{B}) \mid \mathcal{A})$.
 Extracting known factors:
 If $X$ is $\mathcal{B}$-measurable, then $\operatorname{E}(X \mid \mathcal{B}) = X$.
 If $X$ is $\mathcal{B}$-measurable, then $\operatorname{E}(XY \mid \mathcal{B}) = X\,\operatorname{E}(Y \mid \mathcal{B})$.
 Total expectation: $\operatorname{E}(\operatorname{E}(X \mid \mathcal{B})) = \operatorname{E}(X)$.
 Tower property: For sub-σ-algebras $\mathcal{B}_1 \subset \mathcal{B}_2 \subset \mathcal{A}$, $\operatorname{E}(\operatorname{E}(X \mid \mathcal{B}_2) \mid \mathcal{B}_1) = \operatorname{E}(X \mid \mathcal{B}_1)$ $\bigl(= \operatorname{E}(\operatorname{E}(X \mid \mathcal{B}_1) \mid \mathcal{B}_2)\bigr)$.
 Linearity: $\operatorname{E}(X_1 + X_2 \mid \mathcal{B}) = \operatorname{E}(X_1 \mid \mathcal{B}) + \operatorname{E}(X_2 \mid \mathcal{B})$ and $\operatorname{E}(aX \mid \mathcal{B}) = a\,\operatorname{E}(X \mid \mathcal{B})$ for $a \in \mathbb{R}$.
 Monotonicity: $X_1 \le X_2$ implies $\operatorname{E}(X_1 \mid \mathcal{B}) \le \operatorname{E}(X_2 \mid \mathcal{B})$.
 Monotone convergence: $X_n \uparrow X$ and $\operatorname{E}(X_1 \mid \mathcal{B}) > -\infty$ imply $\operatorname{E}(X_n \mid \mathcal{B}) \uparrow \operatorname{E}(X \mid \mathcal{B})$.
 Dominated convergence: $X_n \to X$ and $|X_n| \le Y$ with $\operatorname{E}(Y \mid \mathcal{B}) < \infty$ imply $\operatorname{E}(X_n \mid \mathcal{B}) \to \operatorname{E}(X \mid \mathcal{B})$.
 Fatou's lemma: $\operatorname{E}(\inf_n X_n \mid \mathcal{B}) > -\infty$ implies $\operatorname{E}(\liminf_{n \to \infty} X_n \mid \mathcal{B}) \le \liminf_{n \to \infty} \operatorname{E}(X_n \mid \mathcal{B})$.
 Jensen's inequality: If $f \colon \mathbb{R} \to \mathbb{R}$ is a convex function, then $f(\operatorname{E}(X \mid \mathcal{B})) \le \operatorname{E}(f(X) \mid \mathcal{B})$.
 Conditional expectations as projections: The preceding properties (in particular extracting known factors and the tower property) imply the following for $X \in L^2(P)$ and every $\mathcal{B}$-measurable $Y \in L^2(P)$:
$$\operatorname{E}\bigl(Y\,(X - \operatorname{E}(X \mid \mathcal{B}))\bigr) = 0.$$
In the sense of the scalar product of $L^2(P)$, this means that the conditional expectation $\operatorname{E}(X \mid \mathcal{B})$ is the orthogonal projection of $X$ onto the subspace of $\mathcal{B}$-measurable functions. In other words, $\operatorname{E}(X \mid \mathcal{B})$ is the best approximation of $X$ in the mean-square sense by a $\mathcal{B}$-measurable function. Using this approach, the definition of the conditional expectation and the proof of its existence can also be based on the theory of Hilbert spaces and the projection theorem.
 Conditional variance: With the help of conditional expected values, the conditional variance $\operatorname{Var}(X \mid Y) = \operatorname{E}\bigl((X - \operatorname{E}(X \mid Y))^2 \mid Y\bigr)$ can be defined, analogously to the variance as the mean squared deviation from the expected value. The shift formula
$$\operatorname{Var}(X \mid Y) = \operatorname{E}(X^2 \mid Y) - \bigl(\operatorname{E}(X \mid Y)\bigr)^2$$
holds, as does the so-called variance decomposition
$$\operatorname{Var}(X) = \operatorname{E}(\operatorname{Var}(X \mid Y)) + \operatorname{Var}(\operatorname{E}(X \mid Y)).$$
 Martingale convergence: Let $X$ be a random variable with finite expected value. Then $\operatorname{E}(X \mid \mathcal{B}_n) \to \operatorname{E}(X \mid \mathcal{B})$ holds in either of two cases. In the first, $\mathcal{B}_1 \subset \mathcal{B}_2 \subset \dotsb$ is an increasing sequence of sub-σ-algebras and $\mathcal{B} = \sigma\bigl(\bigcup_{n=1}^{\infty} \mathcal{B}_n\bigr)$. In the second, $\mathcal{B}_1 \supset \mathcal{B}_2 \supset \dotsb$ is a decreasing sequence of sub-σ-algebras and $\mathcal{B} = \bigcap_{n=1}^{\infty} \mathcal{B}_n$.
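Several of these rules can be verified exactly on the two-dice example, where $X$ is the first throw, $Y$ the second and $Z = X + Y$. The following Python sketch assumes a finite space with equally likely outcomes. On such a space, $\operatorname{E}(\,\cdot \mid g)$ for a function $g$ can be computed by averaging over the level sets of $g$. The helper names are hypothetical. The sketch checks:

- extraction of independent and known factors,
- total expectation,
- the orthogonality relation,
- Jensen's inequality for $f(t) = t^2$,
- the variance decomposition.

```python
from fractions import Fraction
from itertools import product

OMEGA = list(product(range(1, 7), repeat=2))  # two independent fair dice
P_OMEGA = Fraction(1, 36)

def X(w):
    return w[0]

def Y(w):
    return w[1]

def Z(w):
    return w[0] + w[1]

def expect(f):
    return sum(f(w) * P_OMEGA for w in OMEGA)

def cond_exp(f, g):
    """E(f | g): average of f over each level set {g = c} (equally likely outcomes)."""
    level_sets = {}
    for w in OMEGA:
        level_sets.setdefault(g(w), []).append(w)
    averages = {c: Fraction(sum(f(w) for w in ws)) / len(ws) for c, ws in level_sets.items()}
    return lambda w: averages[g(w)]

e_Z = cond_exp(Z, X)                       # E(Z | X)
e_Z2 = cond_exp(lambda w: Z(w) ** 2, X)    # E(Z^2 | X)
e_Y = cond_exp(Y, X)                       # E(Y | X)
e_XZ = cond_exp(lambda w: X(w) * Z(w), X)  # E(XZ | X)

# Independent factor: E(Y | X) = E(Y); known factor: E(XZ | X) = X E(Z | X)
independent_ok = all(e_Y(w) == expect(Y) for w in OMEGA)
known_ok = all(e_XZ(w) == X(w) * e_Z(w) for w in OMEGA)

# Total expectation; orthogonality of Z - E(Z | X) to the sigma(X)-measurable X^2
total_ok = expect(e_Z) == expect(Z)
orthogonal_ok = expect(lambda w: X(w) ** 2 * (Z(w) - e_Z(w))) == 0

# Jensen with f(t) = t^2: E(Z | X)^2 <= E(Z^2 | X)
jensen_ok = all(e_Z(w) ** 2 <= e_Z2(w) for w in OMEGA)

# Variance decomposition: Var(Z) = E(Var(Z | X)) + Var(E(Z | X))
var_Z = expect(lambda w: Z(w) ** 2) - expect(Z) ** 2
mean_cond_var = expect(lambda w: e_Z2(w) - e_Z(w) ** 2)
var_cond_mean = expect(lambda w: e_Z(w) ** 2) - expect(e_Z) ** 2

print(independent_ok, known_ok, total_ok, orthogonal_ok, jensen_ok)  # True True True True True
print(var_Z, mean_cond_var + var_cond_mean)                          # 35/6 35/6
```

Averaging over level sets is only valid because all outcomes are equally likely. On a general finite space, the averages would have to be weighted by the outcome probabilities.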
Further examples
(1) We consider the example from the discrete case above. Let $X$ and $Y$ be the numbers of pips in two independent throws of a fair die and $Z = X + Y$ the sum of the pips. The calculation rules simplify the calculation of the conditional expectation of $Z$ given $X$. First,
$$\operatorname{E}(Z \mid X) = \operatorname{E}(X + Y \mid X) = \operatorname{E}(X \mid X) + \operatorname{E}(Y \mid X).$$
$X$ is a measurable function of $X$, and $Y$ is independent of $X$. Therefore $\operatorname{E}(X \mid X) = X$ and $\operatorname{E}(Y \mid X) = \operatorname{E}(Y)$. So we obtain
$$\operatorname{E}(Z \mid X) = X + \operatorname{E}(Y) = X + 3.5.$$
(2) Let $X$ and $Y$ be independent and Poisson distributed with parameters $\lambda$ and $\mu$. Then the conditional distribution of $X$, given $X + Y = n$, is a binomial distribution with parameters $n$ and $p = \frac{\lambda}{\lambda + \mu}$, that is
$$P(X = k \mid X + Y = n) = \begin{cases} \binom{n}{k} p^k (1 - p)^{n - k} & \text{if } k = 0, \dots, n, \\ 0 & \text{otherwise.} \end{cases}$$
Hence $\operatorname{E}(X \mid X + Y = n) = np = \frac{\lambda n}{\lambda + \mu}$ and therefore $\operatorname{E}(X \mid X + Y) = \frac{\lambda}{\lambda + \mu}(X + Y)$.
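The binomial form of the conditional distribution can be checked numerically from the elementary definition $P(X = k \mid X + Y = n) = P(X = k)\,P(Y = n - k)/P(X + Y = n)$. In the following Python sketch the parameters $\lambda = 2$, $\mu = 3$ and $n = 7$ are arbitrary illustrative values.

```python
from math import comb, exp, factorial, isclose

def poisson_pmf(k, rate):
    return exp(-rate) * rate ** k / factorial(k)

def cond_pmf(k, n, lam, mu):
    """P(X = k | X + Y = n) for independent X ~ Poisson(lam), Y ~ Poisson(mu)."""
    if not 0 <= k <= n:
        return 0.0
    # P(X + Y = n), summing over all ways to split n between X and Y
    p_sum = sum(poisson_pmf(j, lam) * poisson_pmf(n - j, mu) for j in range(n + 1))
    return poisson_pmf(k, lam) * poisson_pmf(n - k, mu) / p_sum

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0

lam, mu, n = 2.0, 3.0, 7
p = lam / (lam + mu)
matches = all(isclose(cond_pmf(k, n, lam, mu), binomial_pmf(k, n, p)) for k in range(-1, n + 2))
cond_mean = sum(k * cond_pmf(k, n, lam, mu) for k in range(n + 1))
print(matches, cond_mean, n * p)
```

The conditional mean agrees with $np = \lambda n/(\lambda + \mu)$ up to floating-point rounding.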
References and comments

Olav Kallenberg: Foundations of Modern Probability. 2nd edition. Springer, New York 2002, ISBN 0387953132, p. 573.

Very generally, one can for example set, almost everywhere, $\operatorname{E}(Y \mid \dots) = \lim_{n \to \infty} \frac{1}{2^n} \sum_{k=1}^{\infty} P\bigl(Y \ge \tfrac{k}{2^n} \mid \dots\bigr) - \lim_{n \to \infty} \frac{1}{2^n} \sum_{k=1}^{\infty} P\bigl(Y \le -\tfrac{k}{2^n} \mid \dots\bigr)$.

This factorization as a measurable function is always possible. It is generally not unique unless $X$ is surjective.

The mathematical formulation rests on the following abstraction of the notion “known”. If the realization of a random variable or of events is known, not every dependent quantity is automatically known as well, but only every measurably dependent one. More precisely, only those quantities are known whose generated σ-algebra is a subset of the σ-algebra generated by the known quantities. In this sense, σ-algebras are suitable for describing available information: the σ-algebra $\sigma(X)$ consists of the events whose realization is known in principle once the value of $X$ is known. The set $\mathcal{B}$ is generally assumed to be a σ-algebra.

A. Kolmogoroff: Basic concepts of the calculation of probability. Springer, Berlin 1933. In the introduction to the book, the theory of conditional probabilities and expectations is named as a major innovation. To define the conditional probability with respect to a random variable $u$, Kolmogorov (p. 42) uses the equation $\mathsf{P}_{\{u \subset A\}}(B) = \mathsf{E}_{\{u \subset A\}} \mathsf{P}_u(B)$, i.e. $P(B \mid \{u \in A\}) = \operatorname{E}(P(B \mid u) \mid \{u \in A\})$. This equation is to hold for every choice of $A$ with $P(u \in A) > 0$, and the elementary definition is used for conditioning on $\{u \in A\}$. In the subsequent proof of existence and uniqueness, Kolmogorov multiplies the equation by $P(u \in A)$. He shows that the left-hand side then agrees with $P(B \cap \{u \in A\})$ and the right-hand side with $\int_{\{u \in A\}} P(B \mid u)\,dP$, which corresponds to the expressions given above. He then, however, continues to work on the level of the image space of $u$. The procedure for conditional expectations is similar.