# Conditional probability

Conditional probability (also called conditioned probability) is the probability of an event $A$ occurring under the condition that the occurrence of another event $B$ is already known. It is written as $P(A \mid B)$. The vertical line is read as "under the condition" and is to be understood as follows: if the event $B$ has occurred, the possible outcomes are restricted to those in $B$. This also changes the probability; this new probability for the event $A$ is $P(A \mid B)$. The conditional probability can therefore be interpreted as a reassessment of the probability of $A$ once the information is available that the event $B$ has already occurred. Sometimes the notation $P_B(A)$ is also used, but it can have other meanings as well.

For a generalized, abstract notion of conditional probabilities, see conditional expectation .

## Motivation and Definition

Sometimes one would like to investigate how strong the statistical influence of one quantity is on another. For example, one may want to know whether smoking ($R$) is carcinogenic ($K$). The logical implication $R \Rightarrow K$ would require that the conclusion hold for all instances, i.e. that every smoker gets cancer. A single smoker who does not get cancer would falsify the statement "smoking causes cancer with logical certainty" or "every smoker gets cancer". Nevertheless, although there are smokers without cancer, there is a statistical connection between these two events: the probability of developing cancer is increased among smokers. This probability is the conditional probability $P(K \mid R)$ that someone gets cancer, given that they are a smoker.

The probability that someone smokes, under the condition that they have cancer, can also be examined stochastically. In probability theory it should be noted that the notion of a condition is not linked to any causal or temporal relationship. The conditional probability is a measure of how strong the statistical influence of $R$ on $K$ is. It can be viewed as a stochastic measure of how likely the conclusion $R \Rightarrow K$ is. However, like all statistical quantities, it says nothing about any possible causality of the connection.

With this motivation one arrives at the following definition:

If $A$ and $B$ are arbitrary events and $P(B) > 0$, then the conditional probability of $A$ given $B$ (also: the probability of $A$ under the condition $B$), written $P(A \mid B)$ (with a vertical line between $A$ and $B$), is defined by:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Here $P(A \cap B)$ is the probability that $A$ and $B$ occur together. $P(A \cap B)$ is called the joint probability or intersection probability, and $A \cap B$ denotes the set-theoretic intersection of the events $A$ and $B$.
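Under a uniform (Laplace) model the definition reduces to counting outcomes, so it can be sketched in a few lines of Python (the helper name `conditional_probability` and the die example are illustrative, not part of the article):

```python
from fractions import Fraction

def conditional_probability(A, B, omega):
    """P(A | B) = P(A ∩ B) / P(B) for a uniform (Laplace) sample space omega."""
    if not B:
        raise ValueError("P(B) must be positive")
    # Under a uniform distribution, probabilities reduce to counting outcomes,
    # so the common factor 1/|omega| cancels in the quotient.
    return Fraction(len(A & B), len(B))

omega = {1, 2, 3, 4, 5, 6}  # fair die
A = {1, 2, 3}               # "at most a three"
B = {2, 4, 6}               # "an even number"
print(conditional_probability(A, B, omega))  # → 1/3
```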

## Multiplication theorem

The decision tree illustrates $P(A \cap B) = P(A \mid B)\, P(B)$.

By transforming the definition formula, the multiplication theorem for two events is created:

$$P(A \cap B) = P(A \mid B) \cdot P(B).$$

If one generalizes the above expression of the multiplication theorem, which is valid for two events, one obtains the general multiplication theorem. Consider the case with $n$ random events $A_1, A_2, \dotsc, A_n$.

$$
\begin{aligned}
P\left(\bigcap_{k=1}^{n} A_k\right) &= P(A_1) \prod_{k=2}^{n} P\left(A_k \,\middle|\, \bigcap_{j=1}^{k-1} A_j\right) \\
P(A_1 \cap A_2 \cap \dotsb \cap A_n) &= P(A_1) \cdot \frac{P(A_1 \cap A_2)}{P(A_1)} \cdot \frac{P(A_1 \cap A_2 \cap A_3)}{P(A_1 \cap A_2)} \dotsm \frac{P(A_1 \cap \dotsb \cap A_n)}{P(A_1 \cap \dotsb \cap A_{n-1})} \\
&= P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 \cap A_2) \dotsm P(A_n \mid A_1 \cap \dotsb \cap A_{n-1})
\end{aligned}
$$

Calculating with a decision tree is particularly clear here, since the diagram implicitly takes the conditions into account: the given data are easy to insert and lead sequentially through the correct calculation.
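As a worked instance of the general multiplication theorem, consider drawing three aces in a row from a 32-card deck without replacement (a hypothetical example, not from the article):

```python
from fractions import Fraction

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) · P(A2 | A1) · P(A3 | A1 ∩ A2).
# A 32-card deck contains 4 aces; each factor conditions on the cards
# already drawn.
p = Fraction(4, 32) * Fraction(3, 31) * Fraction(2, 30)
print(p)  # → 1/1240
```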

## Law of total probability

If only the conditional probabilities and the probabilities of the conditioning event are known, the total probability of $A$ results from

$$P(A) = P(A \mid B) \cdot P(B) + P(A \mid B^{\mathrm{c}}) \cdot P(B^{\mathrm{c}}),$$

where $B^{\mathrm{c}}$ denotes the complementary event to $B$.

There is also a generalization here. Given are events $B_1, B_2, \dotsc$ with $P(B_j) > 0$ for all $j$ that form a partition of the sample space $\Omega$, i.e. they are pairwise disjoint and $\bigcup_{j=1}^{\infty} B_j = \Omega$. Then:

$$P(A) = \sum_{j=1}^{\infty} P(A \mid B_j) \cdot P(B_j).$$
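A small numerical sketch of the law of total probability, with hypothetical numbers: two machines produce 60 % and 40 % of all parts, with defect rates of 1 % and 3 % respectively.

```python
from fractions import Fraction

P_B  = [Fraction(60, 100), Fraction(40, 100)]  # P(B_j): the partition (machine 1, machine 2)
P_AB = [Fraction(1, 100),  Fraction(3, 100)]   # P(A | B_j), where A = "part is defective"

# Total probability: sum over the partition of P(A | B_j) · P(B_j)
P_A = sum(pa * pb for pa, pb in zip(P_AB, P_B))
print(P_A)  # → 9/500, i.e. 1.8 %
```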

## Stochastic independence

The events $A$ and $B$ are stochastically independent if and only if:

$$P(A \cap B) = P(A) \cdot P(B),$$

$$P(A \mid B) = \frac{P(A) \cdot P(B)}{P(B)} = P(A) \quad \text{or} \quad P(A \mid B) = P(A \mid B^{\mathrm{c}}).$$

In other words: regardless of whether the event $B$ occurred or not, the probability of the event $A$ is always the same.
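This characterization can be checked by counting, e.g. for the die events $A = \{1,2\}$ and $B = \{1,3,5\}$ (an illustrative choice, assuming a fair die):

```python
from fractions import Fraction

def prob(E, omega):
    """Probability of event E under a uniform distribution on omega."""
    return Fraction(len(E), len(omega))

omega = {1, 2, 3, 4, 5, 6}
A = {1, 2}       # P(A) = 1/3
B = {1, 3, 5}    # "odd number", P(B) = 1/2
# A and B are independent iff P(A ∩ B) == P(A) · P(B); here 1/6 == 1/3 · 1/2
print(prob(A & B, omega) == prob(A, omega) * prob(B, omega))  # → True
```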

## Bayes' theorem

For the relationship between $P(A \mid B)$ and $P(B \mid A)$, Bayes' theorem follows directly from the definition and the multiplication theorem:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \cap A)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)}.$$

Here $P(B)$ in the denominator can be computed using the law of total probability.
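A short numerical sketch of Bayes' theorem, with the denominator computed by the law of total probability (the diagnostic-test numbers below are hypothetical):

```python
from fractions import Fraction

# Hypothetical test: prevalence 1 %, sensitivity 99 %, false-positive rate 5 %.
P_D      = Fraction(1, 100)    # P(D): person has the disease
P_pos_D  = Fraction(99, 100)   # P(+ | D)
P_pos_nD = Fraction(5, 100)    # P(+ | not D)

# Denominator via the law of total probability over the partition {D, not D}
P_pos = P_pos_D * P_D + P_pos_nD * (1 - P_D)

# Bayes' theorem: P(D | +) = P(+ | D) · P(D) / P(+)
P_D_pos = P_pos_D * P_D / P_pos
print(P_D_pos)  # → 1/6
```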

## Continuous random variable

For two random variables $X$, $Y$ with joint density $f_{X,Y}$, a density $f_Y$ of $Y$ is given by

$$f_Y(y) = \int f_{X,Y}(x, y)\, dx.$$

If $f_Y(y) > 0$, one can define a conditional density $f_X(\,\cdot \mid Y = y)$ of $X$, given (or assuming) the event $\{Y = y\}$, by

$$f_X(x \mid Y = y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$

Instead of $f_X(x \mid Y = y)$, one also writes $f_{X \mid Y}(x, y)$ for the conditional density. The latter notation should not be understood as the density of a random variable $X \mid Y$.

A joint density of $X$ and $Y$ is then recovered from the formula

$$f_{X,Y}(x, y) = f_Y(y)\, f_X(x \mid Y = y).$$

From this a form of the law of total probability can be derived:

$$f_X(x) = \int f_{X,Y}(x, y)\, dy = \int f_Y(y)\, f_X(x \mid Y = y)\, dy.$$

This process is known as marginalization .
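Marginalization can be sketched numerically; here for the assumed textbook density $f_{X,Y}(x, y) = x + y$ on the unit square, whose marginal is $f_X(x) = x + \tfrac{1}{2}$ (this example density is not from the article):

```python
import numpy as np

# Joint density f_{X,Y}(x, y) = x + y on [0, 1]², marginal f_X(x) = x + 1/2.
def f_XY(x, y):
    return x + y

y = np.linspace(0.0, 1.0, 10001)
x0 = 0.3
vals = f_XY(x0, y)

# Trapezoidal rule for f_X(x0) = ∫ f_{X,Y}(x0, y) dy; exact value is 0.8
marginal = ((vals[:-1] + vals[1:]) / 2 * np.diff(y)).sum()
print(round(marginal, 4))  # → 0.8
```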

Note that densities which yield the same integral values represent the same probability distribution; densities are therefore not uniquely determined. An admissible choice for $f_{X,Y}$, $f_X$ and $f_Y$ is any measurable function which yields the correct probabilities $P(X \in A, Y \in B)$, $P(X \in A)$ or $P(Y \in B)$ in the integral, for arbitrary $A$ and $B$. The function $f_X(\cdot \mid Y = \cdot)$ must satisfy

$$P(X \in A, Y \in B) = \int_B f_Y(y) \int_A f_X(x \mid Y = y)\, dx\, dy.$$

The formulas given above therefore hold only when the various densities are chosen appropriately.

## Examples

Depending on the degree of overlap of two events $A$ and $B$, i.e. the size of the intersection $A \cap B$, the occurrence of the event $B$ can increase or decrease the probability that the event $A$ has occurred, up to 1 ($A$ has almost certainly occurred) or down to 0 ($A$ has almost certainly not occurred).

### Examples with dice

The examples below always refer to throws of a fair standard die. The notation $A = \{1,2,3\}$ denotes the event $A$ that a one, a two or a three was rolled.

#### Special cases

• $A \cap B = \emptyset$
$A$ and $B$ have no common elements. If $B$ occurs, $A$ can no longer occur, and vice versa.
Example:
Event $A = \{1,2,3\}$, event $B = \{4,5,6\}$. If $B$ occurs (i.e. a four, a five or a six is rolled), $A$ is certainly no longer possible.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0}{P(B)} = 0.$$
• $A \cap B = A$
The event $A$ is a subset of the event $B$.
Example:
Event $A = \{1,2\}$, event $B = \{1,2,3\}$.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)} = \frac{2/6}{3/6} = \frac{2}{3}.$$
The probability of $A$ (here a priori $P(A) = \tfrac{1}{3}$) increases in inverse proportion to the probability of $B$ (here $P(B) = \tfrac{1}{2}$, so the probability doubles).
In this case, knowledge of the absolute probabilities $P(A)$ and $P(B)$ suffices to calculate the conditional probability of $A$ under the condition $B$. The exact size of the intersection $A \cap B$ need not be known.
• $A \cap B = B$
The event $A$ is a superset of the event $B$, i.e. the event $B$ is a subset of the event $A$.
Example: $A = \{1,2,3\}$, $B = \{1,2\}$. If $B$ has occurred, $A$ must therefore also have occurred.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1.$$

#### General case

More generally, in a Laplace experiment one needs the number of elements of the intersection $A \cap B$ in order to calculate the conditional probability of $A$ under the condition $B$.

The event $A = \{4,5,6\}$ of rolling at least a four has the a priori probability $P(A) = \tfrac{1}{2}$.

If it is now known that an even number was rolled, i.e. that the event $B = \{2,4,6\}$ has occurred, then because $A \cap B = \{4,6\}$ the conditional probability of $A$ under the condition $B$ is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/6}{3/6} = \frac{2}{3}.$$

The conditional probability in this case is higher than the initial probability.

If instead an odd number was rolled, i.e. the event $B = \{1,3,5\}$ has occurred, then because $A \cap B = \{5\}$ the conditional probability of $A$ under the condition $B$ equals

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{3/6} = \frac{1}{3}.$$

The conditional probability in this case is smaller than the a priori probability.

The event $A = \{1,2,3,4\}$ has the a priori probability $P(A) = \tfrac{2}{3}$. Once it is known that the event $B = \{3,4,5,6\}$ has occurred, the probability of $A$ changes, because $A \cap B = \{3,4\}$, to

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/6}{4/6} = \frac{1}{2}.$$

In this example too, the occurrence of the event $B$ makes the event $A$ less likely: the probability that the event $A$ has occurred with this throw has become smaller than the a priori probability, because the event $B$ has in any case occurred with this throw.
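The dice calculations above reduce to counting favourable outcomes; a minimal sketch (the helper name `cond_prob` is illustrative):

```python
from fractions import Fraction

def cond_prob(A, B):
    """P(A | B) for a fair die: count outcomes in the intersection."""
    return Fraction(len(A & B), len(B))

A = {4, 5, 6}                                 # "at least a four"
print(cond_prob(A, {2, 4, 6}))                # even number rolled → 2/3
print(cond_prob(A, {1, 3, 5}))                # odd number rolled  → 1/3
print(cond_prob({1, 2, 3, 4}, {3, 4, 5, 6}))  # → 1/2
```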

### Throwing machine

*Conditional probability illustrated as partial areas.*

An illustrative example allows conditional probabilities to be understood directly via set diagrams. Consider a throwing machine that randomly throws objects (e.g. balls, darts) onto a certain surface $M$ (e.g. a wall) so that every location on the wall is hit with the same probability. The function $F$ assigns to the surface $M$, or to a certain partial area $A$ of the wall (e.g. any region marked with a pen), its area $F(M)$ or $F(A)$. The probability $P(A)$ that the projectile strikes in $A$ is then proportional to the ratio of the partial area to the total area, i.e. $P(A) = F(A)/F(M)$.

Now assume additionally that the projectile has hit within another partial area $B$ that overlaps the partial area $A$. The probability $P(B)$ that the projectile strikes in $B$ is $P(B) = F(B)/F(M)$. The conditional probability $P(A \mid B)$ that the projectile simultaneously also hits within the overlapping partial area $A$, under the additional condition $B$, is proportional to the area of that part of $A$ that also lies in $B$, i.e. the area of the intersection, $F(A \cap B)$. Conversely, for an equally large intersection $A \cap B$: the larger $F(B)$ is assumed to be, the less likely it is that a projectile hitting in $B$ also hits in $A \cap B$. So $P(A \mid B)$ is inversely proportional to $P(B)$.

Thus the probability of an impact in $A$, given an additionally assumed impact in $B$, results as the conditional probability $P(A \mid B) = F(A \cap B)/F(B) = P(A \cap B)/P(B)$, in accordance with the definition.
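The throwing machine lends itself to a Monte Carlo sketch: uniform hits on the unit square stand in for the wall $M$, and two overlapping strips for $A$ and $B$ (the regions below are hypothetical choices). The relative frequency of hits in $A$ among hits in $B$ then approximates $P(A \mid B) = F(A \cap B)/F(B) = 0.2/0.6 \approx 0.33$:

```python
import random

random.seed(1)

# Hypothetical regions on the unit-square "wall":
def in_A(x, y): return x < 0.6        # left strip, area 0.6
def in_B(x, y): return 0.4 < x        # right strip, area 0.6; overlap area 0.2

hits_B = hits_AB = 0
for _ in range(100_000):
    x, y = random.random(), random.random()   # uniform hit on the wall
    if in_B(x, y):
        hits_B += 1
        if in_A(x, y):
            hits_AB += 1

# Relative frequency estimate of P(A | B); true value is 0.2 / 0.6 = 1/3
print(hits_AB / hits_B)
```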

### Further examples

• For example, the conditional probability $P(\text{the ground is wet} \mid \text{it is raining})$ is usually high: assuming that it is raining at a given moment, one should expect the ground to get wet. Conditional probability thus asks how likely one event is when another is already known. In this example, one knows that it is raining and asks how likely it is that the ground is wet. Obviously the conditional probability differs from the unconditional one.
• The probability that someone who speaks French is French is neither the same as the probability that someone who is French also speaks French, nor do the two probabilities add up to 100%.
• People v. Collins (1968): In this Californian criminal case, a defendant was wrongly convicted of bank robbery partly because, according to witness statements, the perpetrator, like the defendant, wore both a beard and a mustache, which was considered rare. Whoever wears a beard, however, very often also wears a mustache; the court did not base its judgment on the conditional probabilities, as would have been correct.
• Sports draws: In 2013, two German and two Spanish teams reached the semi-finals of the Champions League. The probability that a purely German and a purely Spanish semi-final would be drawn in this constellation is one third, not fifty percent. Sought is the probability that the second German (Spanish) club is drawn as the second team, given that a German (Spanish) club was drawn first from the pot. If a German (Spanish) club was drawn first, only one of the three teams remaining in the pot is also German (Spanish). Therefore the probability sought is $\tfrac{1}{3}$. This can also be seen from the fact that six pairings are possible in this case: the option of a purely German (Spanish) semi-final pairing is opposed by two other options.
This simple case can also be solved elementarily without conditional probability: each of the four teams is paired with one of the other three teams with equal probability, and only one of these three teams comes from the same country. So the probability sought is again $\tfrac{1}{3}$.
• In medicine, the cause (causality) or etiology of a disease can often be given only with a limited (conditional) probability.
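The sports-draw example above can also be verified by exhaustively enumerating all draw orders (the team names are placeholders):

```python
from itertools import permutations
from fractions import Fraction

# Two German teams (G1, G2) and two Spanish teams (S1, S2) are drawn in
# random order; the first two drawn form one pairing, the last two the other.
teams = ["G1", "G2", "S1", "S2"]
same_country = total = 0
for order in permutations(teams):
    pairs = [set(order[:2]), set(order[2:])]
    total += 1
    if {"G1", "G2"} in pairs:  # then {"S1", "S2"} is necessarily the other pair
        same_country += 1

print(Fraction(same_country, total))  # → 1/3
```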