Birthday paradox

from Wikipedia, the free encyclopedia

The birthday paradox, sometimes referred to as the birthday problem, is an example of how certain probabilities (and also coincidences ) are intuitively often incorrectly estimated:

"If there are at least 23 people in a room, then the chance that two or more of these people will have birthdays on the same day (regardless of the year of birth) is greater than 50%."

The wrong estimate of the probability occurs because the birthday paradox asks how likely it is that any two people from a group will have a birthday on any one and the same day of the year. The problem is often mistakenly interpreted as "how likely it is that a certain person in a group will have a birthday on a certain day of the year" (e.g. matching the birthday of another, additional person), and this probability is real significantly smaller.

The paradox is often attributed to Richard von Mises , e.g. B. by Persi Diaconis and Frederick Mosteller . According to Donald E. Knuth , this origin is not certain: the birthday paradox was informally discussed among mathematicians as early as the 1930s, but an exact originator cannot be determined.

Containment

Question: What is the probability that in 23 people at least two of them will have their birthdays on the same day of the year ?

The answer is perplexing to most and is therefore perceived as paradoxical . Most people estimate the probability incorrectly by a power of ten. It is not (as is usually estimated) between 1% and 5%, but rather over 50%, in 50 people even over 97%.

In contrast to this is the likelihood that someone at a certain his birthday day (without consideration of the vintage): If, for example, takes one of the 23 people and calls for someone with exactly this has a birthday on the same day. If the birthday of one of the people present determines the specific day, another 231 people (i.e. a total of 254 people) are necessary to achieve a probability of 50% (see binomial distribution ).

The reason for this big difference is that people can be made into different pairs; the number of possible pairs therefore rises faster and faster with the number of people in the group - when the th person joins the number of pairs increases . The condition for the event in question is already met if one of these couples has a birthday on the same day. Since the probability of having a birthday on the same day is the same for every couple and the number of couples increases with increasing number of people, the probability that two people in the group have birthdays on the same day also increases with increasing number Group size increases faster and faster.

Unevenly distributed birthdays

In reality, not all due dates are equally likely. B. more children are born in summer than in winter. This slightly increases the likelihood that two people will have birthdays on the same day. However, simulations show that even for real data, the probability that two people have their birthdays on the same day still exceeds 50% for 23 people. Even taking the leap day neglected in the derivation into account does not change this.

Importance in cryptography

This effect is important for cryptographic hash functions that are supposed to produce a unique check value from a text. It is much easier to find two random texts that have the same check value than to find another text for a given text that has the same check value (see collision attack ).

Mathematical derivations

In the following, February 29 is neglected and it is assumed that the birthdays of the people are independent, identically distributed random variables from the discrete uniform distribution on the 365-element set . This assumption is not fulfilled, for example, if there are twins among the people present .

In the urn model , this assumption corresponds to a drawing of balls with replacement from an urn, the 365 balls labeled “1. January "," 2. January "etc. to" 31. December ”contains.

Probability that at least two people have birthdays on the same day

The number of all possible combinations is for people , with all cases being equally likely. For example, there are possible cases of birthday combinations for two people .

= Probability of at least two birthdays = probability that at least one birthday with
your coincides

Of these possible cases include

only different birthdays. For the first person, the birthday can be freely chosen, for the second there are 364 days on which the first is not the birthday, etc.

This results in the probability of using Laplace's formula

that all people have their birthdays on different days.

The probability of at least one double birthday in the course of a year is therefore:

For results:

According to the drawer principle (neglecting February 29th) the probability is 1 for everyone , so there are definitely two people with the same birthday. If February 29th is not neglected as a birthday, then this only applies from .

An approximation

The expression for P can be further transformed:

This can be approximated well with the Stirling formula

which can easily be evaluated with a calculator.

In a group of 23 people you have to make various comparisons to get a complete overview of whether there are common birthdays, and if so, how many.

Probability for a given day

Another question arises if you are not looking for random matches of birthdays, but for a match with a fixed selected day in the year.

If you ignore February 29th, as before, the probability of a person having a birthday on such a certain day is 1/365 ≈ 0.27%.

The probability for the opposite, i.e. the probability of not having a birthday on a certain day , is thus

If there are two people, the probability that neither of them has a birthday on the previously selected day is the same (as before, we assume that the birthdays of the people are independent).

Having at least one hit (at least one person out of two has their birthday on a certain day) is again the opposite probability:

Continuing in this way for larger numbers of people one obtains: The probability that at least one person of the people present has a birthday on a certain day is

This can be used to calculate how many people you need to achieve a certain probability that at least one person will have a birthday on a certain day:

For a probability of 50% you need

People. As with the previous problem, 253 people have to make 253 comparisons with the specific date in order to have a complete overview of the situation.

Finally, in the event that one of the people present has a birthday, the probability that at least one of the other people has a birthday on the same day is calculated as

In contrast to the probability that at least two people have birthday on one day (see above), there is no one for which a reliable statement can be made: For every number of people there is the possibility that the selected day does not appear as a birthday (that The drawer principle is not applicable). For all true

Probability that exactly two people have birthdays on the same day

With this problem, the specific event is: "2 people have their birthday on the same day, everyone else on different days." There are 365 possibilities for the day of the double birthday. The two people can be selected in ways. The remaining people are distributed one after the other over the remaining 364 days in such a way that there is no further multiple occupancy. There are ways to do that . After that there are still days of the year left on which nobody has a birthday. Overall, favorable cases are obtained for the occurrence of the event . The probability that the event will occur is because all 365 days of the year are assumed to be equally likely.

The probabilities represent a sequence of numbers depending on , which increases strictly monotonously to . There the probability is around 38.6%. After that, the sequence is strictly monotonous. Ab is the probability 0, since the event can no longer occur in these cases because there are multiple birthdays or several double birthdays.

Exemplary explanation of the occurrence of the apparent paradox

As with many problems in combinatorics and probability, the exact context or the course of the experiment is also important here. Let us think of the following experiments. To make things easier, a year always has exactly 365 days.

A specific person on a specific day

Peter's birthday is on January 19th. Peter has 365 friends who each have their birthdays on a different day . The probability that one of his friends will celebrate his birthday on January 19th is . With two friends this probability is already . With every further friend the probability increases by until finally with 365 friends the probability is.

Any person on any day

Let's change the experiment so that the specific birthday (here: January 19th) of a specific person (here: Peter) is not required. This time it was Peter's birthday and that of his friends on any day. In this experiment we ask about the probability that any person in a room will have a birthday together on any day. To this end, we will initially only determine the probability in a rough calculation. One by one, we will bring Peter's friends to the experiment. The probability that his friend Ulf will celebrate his birthday on the same day is . For another friend called Rainer, the approximated probability is already . The probability increases by because Rainer could have a birthday together with Peter or Ulf. For the next person called Robert, the estimated probability is accordingly already . The probability increases rapidly compared to the previous experiment. The apparent paradox arises from the fact that with each additional person the potential opportunities for possible joint birthdays increase.

However, these are approximate values. So far, the possibility has not been taken into account that some people in this group of people might have their birthdays together. If Rainer is involved in the experiment, the probability does not increase by , because we have to take into account that Ulf and Peter may already be celebrating their birthdays together. The probability we are looking for is therefore somewhat smaller than . With Robert, the probability is again a little below the value of , since some of the people present (Peter, Ulf and Rainer) may already have birthdays together.

Related questions

In the game of memory , the pairs under cards (consisting of pairs) must be revealed. At the start of the game, all cards are face down, and as long as different cards are revealed, players only have a chance to find a pair at random. Therefore the question arises - similar to the birthday paradox - how many cards one has to reveal in order to get at least one pair with a certain probability (e.g. 50%).

The number of different motifs here corresponds to the number of days in the year (365) in the birthday paradox. Memory is usually played with 32 pairs, but there are also other variants, so it makes sense to keep the number variable.

If one sets the probability of only revealing different cards by revealing cards, then the following applies:

The result is for When 10 cards are revealed, the probability is greater than 50% to get at least one pair ( ). For , the limit is 12 cards. With a hypothetical memory with 183 pairs you have to uncover 23 cards, with 365 pairs 32 cards are necessary.

This result has important practical implications for the game, as players would lose interest if it took too long to reveal the first pair.

See also

  • The collecting picture problem addresses a similar question. This is - transferred to the observation of birthdays in a group of people - about how many people have to be selected so that each day of the year appears as a birthday for one of the people.
  • The goat problem is often cited as an example of how the human mind is prone to fallacies when it comes to determining probabilities.
  • The Lincoln-Kennedy mystery is also a phenomenon that has to do with the consistency of biographical data.

Web links

Wiktionary: birthday paradox  - explanations of meanings, word origins, synonyms, translations

Individual evidence

  1. Richard von Mises: About division and occupation probabilities. Revue de la Faculté de Sciences de l'Université d'Istanbul NS, 4. 1938–39, pp. 145–163.
  2. ^ P. Diaconis, F. Mosteller: Methods for Studying Coincidences. In: Journal of the American Statistical Association. 84, 4, pp. 853-861.
  3. ^ Donald E. Knuth: The Art of Computer Programming . Vol. 3: Sorting and Searching. Second Edition, ISBN 0-201-89685-0 . P. 513.
  4. Emma Hawe, Alison Macfarlane and John Bithell: Daily and seasonal variation in live births, stillbirths and infant mortality in England and Wales, 1979-96. In: Health Statistics Quarterly. 9 Spring 2001 (PDF; 180 kB), p. 7: "There was a clear seasonal pattern in the number of daily live births throughout the entire period, with lower numbers of births in the winter than the summer months."
  5. D. Bloom (1973): A birthday problem. American Mathematical Monthly, Vol. 80, pp. 1141–1142 contains evidence using Lagrange multipliers that birthdays that are not evenly spaced increases the likelihood that two people will have birthdays on the same day.
  6. Stefan Kirchner in de.sci.mathematik , November 3, 2005.
  7. Hugo doorman in de.sci.mathematik , January 22 of 2005.
  8. ↑ It is no coincidence that at 183 (≈ 365/2) you get the same number as for the birthday paradox: The product representation for the probability shows (at least for the first factors) a great deal of similarity.