# Poisson distribution

Probability function of the Poisson distribution for $\lambda = 1$, $5$ and $9$

The Poisson distribution (named after the mathematician Siméon Denis Poisson) is a probability distribution that can be used to model the number of events that occur independently of one another at a constant mean rate in a fixed time interval or spatial region. It is a univariate discrete probability distribution that arises as a frequently occurring limit of the binomial distribution for a large number of trials. It can also be derived axiomatically from fundamental process properties.

The increments of a Poisson process are Poisson-distributed random variables. Extensions of the Poisson distribution, such as the generalized Poisson distribution and the mixed Poisson distribution, are used mainly in actuarial mathematics.

## Definition

The Poisson distribution $P_{\lambda}$ is a discrete probability distribution. It is determined by a real parameter $\lambda > 0$, which is both the expected value and the variance of the distribution. It assigns to the natural numbers $k = 0, 1, 2, \dotsc$ the probabilities

${\displaystyle P_{\lambda}(k)=\frac{\lambda^{k}}{k!}\,\mathrm{e}^{-\lambda},}$
${\displaystyle P_{\lambda}(k+1)=\frac{\lambda}{k+1}\,P_{\lambda}(k),}$

where $\mathrm{e}$ denotes Euler's number and $k!$ the factorial of $k$. The parameter $\lambda$ describes the number of events expected during an observation. The Poisson distribution then gives the probability of observing exactly $k$ events in an individual case when the mean event rate $\lambda$ is known.
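As a minimal sketch of this definition, the probability function and its recursion can be evaluated directly with the standard library (the parameter $\lambda = 5$ is an arbitrary example value):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P_lambda(k) = lambda^k / k! * e^(-lambda)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

# The probabilities over all k sum to one (truncated here at k = 100,
# where the tail is negligible for lambda = 5).
total = sum(poisson_pmf(k, 5.0) for k in range(100))

# The recursion P(k+1) = lambda/(k+1) * P(k) reproduces the direct formula.
p_next = 5.0 / (3 + 1) * poisson_pmf(3, 5.0)
```

The recursion avoids recomputing powers and factorials when several consecutive probabilities are needed.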

## properties

### Distribution function

The distribution function $F_{\lambda}$ of the Poisson distribution is

${\displaystyle F_{\lambda}(n)=\sum_{k=0}^{n}P_{\lambda}(k)=\mathrm{e}^{-\lambda}\sum_{k=0}^{n}\frac{\lambda^{k}}{k!}=Q(n+1,\lambda)=p}$

and gives the probability $p$ of finding at most $n$ events where $\lambda$ events are expected on average. Here $Q(a, x)$ denotes the regularized upper incomplete gamma function.
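For small $n$, this distribution function can be evaluated by direct summation of the probability function (a minimal stdlib sketch; for large arguments one would instead use a library routine for the regularized gamma function):

```python
import math

def poisson_pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

def poisson_cdf(n, lam):
    """F_lambda(n) = sum_{k=0}^{n} P_lambda(k)."""
    return sum(poisson_pmf(k, lam) for k in range(n + 1))
```

For example, with $\lambda = 2$, `poisson_cdf(3, 2.0)` gives about 0.857, the probability of observing at most three events.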

### Expected value, variance, moments

If the random variable $X$ is Poisson distributed, $X \sim \mathcal{P}(\lambda)$, then $\lambda$ is both its expected value and its variance, since

${\displaystyle \operatorname{E}(X)=\sum_{k=0}^{\infty}k\frac{\lambda^{k}}{k!}\,\mathrm{e}^{-\lambda}=\sum_{k=1}^{\infty}k\frac{\lambda^{k}}{k!}\,\mathrm{e}^{-\lambda}=\lambda\,\mathrm{e}^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!}=\lambda\,\mathrm{e}^{-\lambda}\underbrace{\sum_{j=0}^{\infty}\frac{\lambda^{j}}{j!}}_{\mathrm{e}^{\lambda}}=\lambda}$

and likewise

${\displaystyle {\begin{aligned}\operatorname{E}\left(X^{2}\right)&=\sum_{k=0}^{\infty}k^{2}\frac{\lambda^{k}}{k!}\,\mathrm{e}^{-\lambda}=\mathrm{e}^{-\lambda}\sum_{k=1}^{\infty}k\frac{\lambda^{k}}{(k-1)!}=\mathrm{e}^{-\lambda}\left(\sum_{k=1}^{\infty}(k-1)\frac{\lambda^{k}}{(k-1)!}+\sum_{k=1}^{\infty}\frac{\lambda^{k}}{(k-1)!}\right)\\&=\mathrm{e}^{-\lambda}\sum_{k=2}^{\infty}\frac{\lambda^{k}}{(k-2)!}+\mathrm{e}^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k}}{(k-1)!}=\lambda^{2}\,\mathrm{e}^{-\lambda}\sum_{k=2}^{\infty}\frac{\lambda^{k-2}}{(k-2)!}+\lambda\,\mathrm{e}^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!}=\lambda^{2}+\lambda.\end{aligned}}}$

By the shift theorem it now follows:

${\displaystyle \operatorname{Var}(X)=\operatorname{E}\left(X^{2}\right)-(\operatorname{E}(X))^{2}=\lambda^{2}+\lambda-\lambda^{2}=\lambda.}$

The third central moment is likewise $\operatorname{E}\left(\left(X-\operatorname{E}(X)\right)^{3}\right)=\lambda$.
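These three identities can be checked numerically by truncated summation over the probability function (a sketch; `lam = 4.2` is an arbitrary example value, and the sum is truncated where the tail is negligible):

```python
import math

lam = 4.2  # arbitrary example parameter

def pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

ks = range(100)  # truncation; the tail beyond k = 100 is negligible here
mean = sum(k * pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * pmf(k, lam) for k in ks)
third = sum((k - mean) ** 3 * pmf(k, lam) for k in ks)
# mean, var and third should all be approximately equal to lam
```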

### Median

It seems reasonable to assume that the median $n_{\text{median}}$ is close to $\lambda$. An exact formula does not exist; the most accurate known bounds are

${\displaystyle \lambda-\ln 2\leq n_{\text{median}}<\lambda+\frac{1}{3}.}$

### Coefficient of variation

The coefficient of variation follows immediately from the expected value and the variance:

${\displaystyle \operatorname{VarK}(X)=\frac{\sqrt{\operatorname{Var}(X)}}{\operatorname{E}(X)}=\frac{1}{\sqrt{\lambda}}.}$

### Skewness and kurtosis

The skewness is

${\displaystyle \operatorname{v}(X)=\frac{1}{\sqrt{\lambda}}.}$

The kurtosis can likewise be given in closed form as

${\displaystyle \beta_{2}=3+\frac{1}{\lambda}}$

and the excess kurtosis as

${\displaystyle \gamma=\frac{1}{\lambda}.}$

### Higher moments

The $k$-th moment can be written as a polynomial of degree $k$ in $\lambda$; it is the $k$-th complete Bell polynomial $B_{k}$ evaluated at the $k$ points $\lambda$:

${\displaystyle m_{k}=B_{k}(\lambda,\dots,\lambda).}$

### Cumulants

The cumulant-generating function of the Poisson distribution is

${\displaystyle g_{X}(t)=\lambda\left(\mathrm{e}^{t}-1\right).}$

This means that all cumulants are equal: $\kappa_{i}=\lambda$.

### Characteristic function

The characteristic function has the form

${\displaystyle \phi_{X}(s)=\sum_{k=0}^{\infty}\mathrm{e}^{iks}\frac{\lambda^{k}}{k!}\,\mathrm{e}^{-\lambda}=\mathrm{e}^{-\lambda}\sum_{k=0}^{\infty}\frac{\left(\lambda\,\mathrm{e}^{is}\right)^{k}}{k!}=\mathrm{e}^{-\lambda}\,\mathrm{e}^{\lambda\,\mathrm{e}^{is}}=\mathrm{e}^{\lambda\left(\mathrm{e}^{is}-1\right)}.}$

### Probability generating function

For the probability generating function one obtains

${\displaystyle m_{X}(s)=\mathrm{e}^{\lambda(s-1)}.}$

### Moment generating function

The moment generating function of the Poisson distribution is

${\displaystyle M_{X}(s)=\mathrm{e}^{\lambda\left(\mathrm{e}^{s}-1\right)}.}$

### Reproductivity

The Poisson distribution is reproductive; that is, the sum $X_{1}+X_{2}+\dotsb+X_{n}$ of stochastically independent Poisson-distributed random variables $X_{1}, X_{2}, \dotsc, X_{n}$ with parameters $\lambda_{1}, \lambda_{2}, \dotsc, \lambda_{n}$ is again Poisson-distributed, with parameter $\lambda_{1}+\lambda_{2}+\dotsb+\lambda_{n}$. For the convolution, the following applies:

${\displaystyle \operatorname{Poi}(\lambda_{1})*\operatorname{Poi}(\lambda_{2})=\operatorname{Poi}(\lambda_{1}+\lambda_{2}).}$

The Poisson distributions thus form a convolution semigroup. This result follows directly from the characteristic function of the Poisson distribution and the fact that the characteristic function of a sum of independent random variables is the product of their characteristic functions.

The Poisson distribution is therefore also infinitely divisible. According to a theorem of the Soviet mathematician Dmitri Abramovich Raikov, the converse also holds: if a Poisson-distributed random variable $X$ is the sum of two independent random variables $X_{1}$ and $X_{2}$, then the summands $X_{1}$ and $X_{2}$ are also Poisson-distributed. A Poisson-distributed random variable can only be decomposed into Poisson-distributed independent summands. This theorem is an analogue of Cramér's theorem for the normal distribution.
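The convolution identity can be verified numerically: convolving the probability functions of $\operatorname{Poi}(1.5)$ and $\operatorname{Poi}(2.5)$ reproduces the probability function of $\operatorname{Poi}(4)$ (a sketch with arbitrary example parameters):

```python
import math

def pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

def convolve_at(k, lam1, lam2):
    """(Poi(lam1) * Poi(lam2))(k) = sum_j P_lam1(j) * P_lam2(k - j)."""
    return sum(pmf(j, lam1) * pmf(k - j, lam2) for j in range(k + 1))
```

The agreement is exact up to floating-point error, by the binomial theorem applied inside the sum.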

### Thinning

Stochastic experiments often occur in which the events are actually Poisson distributed, but counting takes place only if an additional condition is met. For example, the number of eggs an insect lays might be Poisson distributed, but from each egg a larva hatches only with a certain probability. An observer of this Poisson-distributed random variable with parameter $\lambda$ thus counts each event only with a probability $p < 1$ (independently of one another).

Alternatively, an error in the counting could mean that an event is not registered. If there are originally $n$ events, only $r$ events are counted according to the binomial distribution $B_{n,p}(r)$. In this case the true value $n$ is unknown and varies between the measured value $r$ (all events present were seen) and infinity (there were more events than were seen). The probability of a measured value $r$ can then be found as the product of the probability of a successful measurement $B_{n,p}(r)$ and the original Poisson distribution $P_{\lambda}(n)$, summed over all possible values $n$:

${\displaystyle \sum_{n=r}^{\infty}B_{n,p}(r)\,P_{\lambda}(n)=P_{p\lambda}(r).}$

The values $r$ found with detection probability $p$ are thus again Poisson distributed. The detection probability $p$ reduces the parameter $\lambda$ of the original Poisson distribution to $p\lambda$. This is known as thinning of the Poisson distribution.
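The thinning identity can be checked by truncated summation (a sketch; $\lambda = 3$ and $p = 0.4$ are arbitrary example values):

```python
import math

def pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

def thinned(r, lam, p, nmax=100):
    """sum_{n >= r} B_{n,p}(r) * P_lambda(n), truncated at nmax."""
    return sum(math.comb(n, r) * p ** r * (1 - p) ** (n - r) * pmf(n, lam)
               for n in range(r, nmax))

# thinned(r, lam, p) should agree with P_{p*lambda}(r)
```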

### Calculation

The probabilities $P_{\lambda}(k)$ can be calculated recursively: first one determines $P_{\lambda}(0)=\mathrm{e}^{-\lambda}$, then one after the other $P_{\lambda}(k)=\frac{\lambda}{k}\,P_{\lambda}(k-1)$ for $k = 1, 2, 3, \dotsc$. As $k$ increases, the probabilities grow as long as $k < \lambda$ and shrink once $k > \lambda$. The mode, i.e. the value with the greatest probability, is $k_{\text{mode}}=\lfloor\lambda\rfloor$ if $\lambda$ is not an integer; otherwise there are two neighboring modes, $k_{\text{mode}}=\lambda$ and $k_{\text{mode}}=\lambda-1$ (see diagram at the top right).

If the calculation of $\frac{\lambda^{k}}{k!}\,\mathrm{e}^{-\lambda}$ causes problems because the values of $\lambda$ and $k$ are too large, the following approximation obtained with Stirling's formula can help:

${\displaystyle \frac{\lambda^{k}}{k!}\,\mathrm{e}^{-\lambda}\approx\frac{\mathrm{e}^{k(1+\ln(\lambda/k))-\lambda}}{\sqrt{2\pi(k+1/6)}}.}$

Poisson-distributed random numbers are usually generated using the inversion method.
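A minimal sketch of the inversion method, building the distribution function step by step from the recursion $P_{\lambda}(k)=\frac{\lambda}{k}P_{\lambda}(k-1)$ (suitable only for moderate $\lambda$, since it starts from $\mathrm{e}^{-\lambda}$):

```python
import math
import random

def poisson_sample(lam, rng):
    """Return the smallest k with F_lambda(k) >= u for a uniform u."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)   # P_lambda(0)
    cdf = p
    while cdf < u and k < 1000:  # loop cap as a numerical safety measure
        k += 1
        p *= lam / k     # recursion P(k) = lam/k * P(k-1)
        cdf += p
    return k

rng = random.Random(42)
samples = [poisson_sample(3.0, rng) for _ in range(20000)]
```

The sample mean and sample variance should both come out close to $\lambda = 3$.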

## Parameter estimation

### Maximum likelihood estimator

From a sample of $N$ observations $n_{i}\in\{0,1,2,\dotsc\}$ for $i=1,\dotsc,N$, the parameter $\lambda$ of the Poisson population is to be estimated. The maximum likelihood estimator is given by the arithmetic mean

${\displaystyle \hat{\lambda}=\frac{1}{N}\sum_{i=1}^{N}n_{i}.}$

The maximum likelihood estimator is an unbiased, efficient, and sufficient estimator for the parameter $\lambda$.
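Since the estimator is just the sample mean, the implementation is a one-liner (a minimal sketch with made-up example counts):

```python
def poisson_mle(observations):
    """Maximum likelihood estimate: the arithmetic mean of the counts."""
    return sum(observations) / len(observations)

counts = [2, 3, 1, 0, 4, 2]  # hypothetical observed counts
lam_hat = poisson_mle(counts)
```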

### Confidence interval

The confidence interval for $\lambda$ is obtained from the relationship between the Poisson and chi-square distributions. If a sample value $n$ is available, then a confidence interval for $\lambda$ at the confidence level $1-\alpha$ is given by

${\displaystyle \tfrac{1}{2}\chi^{2}(\alpha/2;\,2n)\leq\lambda\leq\tfrac{1}{2}\chi^{2}(1-\alpha/2;\,2n+2),}$

where $\chi^{2}(p; i)$ denotes the quantile function of the chi-square distribution with $i$ degrees of freedom.
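Because the degrees of freedom $2n$ and $2n+2$ are even, the chi-square distribution function has a closed form as a finite Poisson sum, so the quantiles can be obtained by bisection with the standard library alone (a sketch; in practice one would use a scientific library's chi-square quantile function):

```python
import math

def chi2_cdf_even(x, dof):
    """CDF of chi-square with even dof = 2m:
    F(x) = 1 - e^(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!."""
    m = dof // 2
    s, term = 0.0, 1.0
    for k in range(m):
        s += term
        term *= (x / 2) / (k + 1)
    return 1.0 - math.exp(-x / 2) * s

def chi2_ppf_even(p, dof):
    """Quantile by bisection (adequate for a sketch)."""
    lo, hi = 0.0, 10.0 * dof + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def poisson_confidence_interval(n, alpha=0.05):
    lower = 0.0 if n == 0 else 0.5 * chi2_ppf_even(alpha / 2, 2 * n)
    upper = 0.5 * chi2_ppf_even(1 - alpha / 2, 2 * n + 2)
    return lower, upper
```

For an observed count of $n = 10$ this gives approximately $(4.8,\,18.4)$ at the 95 % level.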

### Forecast interval

The task of the forecast interval is to predict, before a sample is drawn, a region in which the realization of an estimation function lies with high probability. The number $n_{\text{up}}$ of Poisson-distributed events that will not be exceeded with a given probability $p<1$ can be calculated by inverting the distribution function:

${\displaystyle n_{\text{up}}=F_{\lambda}^{-1}(p).}$

This can again be expressed by the regularized gamma function, $Q(n+1,\lambda)=p$. An elementary form of the inverse of the distribution function $F_{\lambda}$ or of the gamma function is not known. In this case a two-column table of values $(n, F_{\lambda}(n)=p)$ does a good job; it can easily be calculated using the sum given in the Distribution function section above and shows which probabilities $p$ are assigned to which values of $n$.
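Numerically, inverting the distribution function amounts to finding the smallest $n$ with $F_{\lambda}(n)\geq p$, which the pmf recursion makes cheap (a minimal sketch):

```python
import math

def n_up(lam, p):
    """Smallest n with F_lambda(n) >= p (generalized inverse of the CDF)."""
    k = 0
    term = math.exp(-lam)  # P_lambda(0)
    cdf = term
    while cdf < p:
        k += 1
        term *= lam / k    # recursion P(k) = lam/k * P(k-1)
        cdf += term
    return k
```

For example, with $\lambda = 5$ at most 9 events occur with at least 95 % probability.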

## Relationship to other distributions

### Relationship to the binomial distribution

Just like the binomial distribution, the Poisson distribution predicts the expected result of a series of Bernoulli experiments. The latter are random experiments that know only two possible outcomes (for example "success" and "failure"), i.e. they have a dichotomous event space. If the temporal or spatial observation interval is subdivided ever more finely, the number of trials increases ($n\to\infty$). The progressive subdivision causes a decrease in the probability of success ($p\to 0$) such that the product $n\cdot p$ converges towards a finite limit $\lambda$. Accordingly, the binomial distribution approaches the mathematically somewhat simpler Poisson distribution.

The Poisson distribution can thus be derived from the binomial distribution. It is the limit distribution of the binomial distribution for very small proportions of the features of interest and a very large sample size: $n\rightarrow\infty$ and $p\rightarrow 0$, under the side condition that the product $np=\lambda$ has a value that is neither zero nor infinite. $\lambda$ is then the expected value both of all binomial distributions considered in the limit and of the resulting Poisson distribution.
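The limit can be illustrated numerically: for fixed $np=\lambda$, the binomial probabilities approach the Poisson probabilities as $n$ grows (a sketch with $\lambda = 2$ as an arbitrary example):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

lam = 2.0
errors = {}
for n in (10, 100, 10000):
    p = lam / n
    # largest pointwise difference over the first few k
    errors[n] = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
                    for k in range(11))
```

The maximum difference shrinks roughly like $1/n$.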

Both the Poisson distribution and the binomial distribution are special cases of the Panjer distribution .

### Relationship to the generalized binomial distribution

The generalized binomial distribution can also be approximated for large samples and small success probabilities using the Poisson approximation.

### Relationship to normal distribution

The Poisson probabilities for $\lambda = 30$, approximated by a normal distribution density

The Poisson distribution $P_{\lambda}$ has a strongly asymmetrical shape for small values of $\lambda$. For larger $\lambda$, $P_{\lambda}$ becomes more symmetrical and, from about $\lambda=30$, resembles a Gaussian normal distribution with $\mu=\lambda$ and $\sigma^{2}=\lambda$:

${\displaystyle P_{\lambda}(k)\approx\frac{1}{\sqrt{2\pi\lambda}}\exp\left(-\frac{(k-\lambda)^{2}}{2\lambda}\right).}$
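A quick numerical comparison of the two sides of this approximation at $\lambda = 30$ (a sketch):

```python
import math

lam = 30.0

def poisson_pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

def normal_density(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# near the mean, the two sides agree to a few parts in a thousand
diff_at_mean = abs(poisson_pmf(30, lam) - normal_density(30.0, lam, lam))
```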

### Relationship to the Erlang distribution

• In a Poisson process, the random number of events in a specified interval obeys the Poisson distribution $P_{\lambda}$. The random distance (in space or time) until the occurrence of the $n$-th event, as well as the distance between the events $m$ and $m+n$, is $\operatorname{Erl}(g,n)$ Erlang-distributed. It is also said that the Poisson distribution and the Erlang distribution are conjugate distributions. For $n=1$ this Erlang distribution reduces to an exponential distribution, $\operatorname{Erl}(g,1)=\operatorname{Exp}(g)$, where $g$ denotes the number of expected events per unit interval. $g\,\mathrm{e}^{-gx}$ is then the density of the distance $x$ that will elapse before the next event, as well as of the distance between two successive events.
• The following applies to the distribution functions of the Erlang distribution and the Poisson distribution:
${\displaystyle F_{\text{Erlang}}(n+1)+F_{\text{Poisson}}(n)=1.}$

### Relationship to the chi-square distribution

The distribution functions $F_{\lambda}$ of the Poisson distribution and $F_{m}$ of the chi-square distribution with $m$ degrees of freedom are related in the following way:

The probability of finding $n$ or more events in an interval within which $\lambda$ events are expected on average equals the probability that $\chi_{2n}^{2}\leq 2\lambda$. That is,

${\displaystyle 1-F_{\lambda}(n-1)=F_{2n}(2\lambda).}$

This follows from $1-Q(n,\lambda)=P(n,\lambda)$, with $P$ and $Q$ as regularized gamma functions.

### Relationship to the Skellam distribution

In contrast, the difference $X_{1}-X_{2}$ of two stochastically independent Poisson-distributed random variables $X_{1}$ and $X_{2}$ with parameters $\lambda_{1}$ and $\lambda_{2}$ is not again Poisson-distributed, but Skellam-distributed. The following applies:

${\displaystyle P_{\lambda_{1},\lambda_{2}}(X_{1}-X_{2}=k)=\mathrm{e}^{-(\lambda_{1}+\lambda_{2})}\left(\frac{\lambda_{1}}{\lambda_{2}}\right)^{k/2}I_{k}\left(2\sqrt{\lambda_{1}\lambda_{2}}\right),}$

where $I_{k}(z)$ denotes the modified Bessel function.

## Further Poisson distributions

Some other distributions are sometimes called "Poisson" and are generalizations of the Poisson distribution described here:

### Free Poisson distribution

In free probability theory there is a free analogue of the Poisson distribution, the free Poisson distribution. In analogy to a corresponding limit theorem for the Poisson distribution, it is defined as the limit of the iterated free convolution $\left(\left(1-\frac{\lambda}{N}\right)\delta_{0}+\frac{\lambda}{N}\delta_{\alpha}\right)^{\boxplus N}$ for $N\to\infty$.

### Two-dimensional Poisson distribution

The two-dimensional Poisson distribution, also known as the bivariate Poisson distribution, is defined by

${\displaystyle P(X_{1}=k_{1},X_{2}=k_{2})=\exp\left(-\lambda_{1}-\lambda_{2}-\lambda_{3}\right)\frac{\lambda_{1}^{k_{1}}}{k_{1}!}\,\frac{\lambda_{2}^{k_{2}}}{k_{2}!}\sum_{k=0}^{\min(k_{1},k_{2})}\binom{k_{1}}{k}\binom{k_{2}}{k}\,k!\left(\frac{\lambda_{3}}{\lambda_{1}\lambda_{2}}\right)^{k}}$

The marginal distributions are Poisson distributed with the parameters $\lambda_{1}$ and $\lambda_{2}$, and $\operatorname{Cov}(X_{1},X_{2})=\lambda_{3}$ holds. The difference is Skellam-distributed with the parameters $\lambda_{1}$ and $\lambda_{2}$.

This means that it is relatively easy to introduce dependencies between Poisson-distributed random variables $X_{1}, X_{2}$ if one knows or can estimate the mean values of the marginal distributions and the covariance. One can then generate the bivariate Poisson distribution by defining three independent Poisson-distributed random variables $Y_{1}, Y_{2}, Y_{3}$ with parameters $\lambda_{1}, \lambda_{2}, \lambda_{3}$ and then setting $X_{1}=Y_{1}+Y_{3}$, $X_{2}=Y_{2}+Y_{3}$.
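The sampling recipe above can be sketched with a simple inversion sampler; the shared summand $Y_{3}$ produces a covariance of $\lambda_{3}$ between the components (the parameter values are arbitrary examples):

```python
import math
import random

def poisson_sample(lam, rng):
    """Inversion sampling via the pmf recursion (moderate lam only)."""
    u = rng.random()
    k, p = 0, math.exp(-lam)
    cdf = p
    while cdf < u and k < 1000:  # loop cap as a numerical safety measure
        k += 1
        p *= lam / k
        cdf += p
    return k

def bivariate_poisson_sample(l1, l2, l3, rng):
    y1 = poisson_sample(l1, rng)
    y2 = poisson_sample(l2, rng)
    y3 = poisson_sample(l3, rng)  # shared component induces the dependence
    return y1 + y3, y2 + y3

rng = random.Random(7)
pairs = [bivariate_poisson_sample(1.0, 2.0, 0.5, rng) for _ in range(50000)]
```

The sample covariance of the pairs should come out close to $\lambda_{3} = 0.5$.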

The multivariate Poisson distribution can be defined analogously.

## Application examples

### "Rare" events

The classic example comes from Ladislaus von Bortkewitsch, who, by examining the number of deaths from horse kicks in the individual cavalry units of the Prussian army per year, was able to show that these numbers are well described by a Poisson distribution.

In general, the following conditions must apply to the individual counting events (in the example, the individual deaths from horse kicks) so that their number is Poisson distributed:

1. Single events: The probability that two events will occur in a short period of time is negligible.
2. Proportionality: The probability of observing an event in a short period of time is proportional to the length of the period.
3. Homogeneity: The probability of observing an event in a short period of time is independent of the location of the period.
4. Independence: The probability of observing an event in a short period of time is independent of the probability of an event in other non-overlapping periods.

Alternatively, these conditions can be explained by the fact that the waiting time between two events is exponentially distributed. Since the exponential distribution is memoryless, the events occur effectively at random and independently of one another.

It must be checked in each individual case whether the conditions are met, but typical examples are:

• Number of printing errors on a book page
• Number of incoming calls per hour in a switchboard
• Number of radioactive decays of a substance in a given time interval (provided that the decay rate does not decrease noticeably, i.e. the measurement time is short compared to the half-life )
• Number of lightning strikes per hectare and year
• Number of vaccination damages per year
• Number of bomb hits per city district during the bombing of London

According to the Palm–Khintchine theorem, even general renewal processes converge under relatively mild conditions to a Poisson process; that is, here too the Poisson distribution results for the number of events. The conditions given above can therefore be weakened considerably.

### Customer arrivals

In queuing systems , customers or orders arrive in the system to be served. In queuing theory , the different models are described in Kendall notation . In particular, the number of customers arriving in a certain time interval is often modeled with a Poisson distribution (abbreviated to M for exponentially distributed inter-arrival times). This modeling is very attractive, since this assumption often results in simple analytical solutions.

Often this assumption can be justified approximately; an example illustrates what it means. Suppose a department store is entered by a customer on average every 10 seconds on a Saturday. If the new arrivals are counted every minute, then on average $\lambda = 6$ people would be expected to enter the store per minute. The choice of the interval length is up to the observer. Choosing an hour as the observation interval gives $\lambda = 6\cdot 60 = 360$; an interval of 1 second gives $\lambda = 1/10 = 0.1$. The relative fluctuation in the number of customers, $\sqrt{\lambda}/\lambda = 1/\sqrt{\lambda}$, decreases as the interval, and consequently $\lambda$, increases. In principle, the longer interval allows a more precise observation through the longer averaging, but involves more effort and cannot capture changes in conditions within the interval (e.g. the arrival of a bus with tourists willing to shop).

A Poisson distribution could exist under the following boundary conditions:

1. Customers have to arrive individually. In reality, however, groups of people often arrive together.
2. The probability that a customer will arrive could be proportional to the length of the observation period.
3. There are certainly rush hours with increased customer traffic and lulls throughout the day.
4. The customer arrivals in different time periods are not necessarily independent. For example, if the department store is overcrowded, customers could be put off.

In this example, the Poisson distribution assumption is difficult to justify, so there are queuing models with, for example, group arrivals, finite queues, or other arrival distributions to model this arrival process more realistically. Fortunately, some key metrics, such as the average number of customers in the system given by Little's law, do not depend on the specific distribution; that is, even if assumptions are violated, the same result applies.

### Balls and bins model

In counting combinatorics, a standard task is to distribute $N$ balls among $n$ compartments and to count how many possibilities there are. If the balls are distributed among the compartments at random, the number of balls in a fixed compartment follows a binomial distribution with $p=1/n$. One application is, for example, the distribution of raisins in a cake, with the aim that every piece contains a minimum number of raisins.

Grains of rice randomly scattered on the ground.

The picture on the right shows a section of a floor with $n=49$ square tiles, on which grains of rice were randomly scattered. The tiles contain $k=0,\dotsc,5$ grains of rice each, and in total there are $N=66$ grains of rice in the examined section. The probabilities can be determined directly via the binomial distribution, but the requirements of the Poisson approximation are also met.

The comparison between the experiment and the calculated Poisson distribution $P(X=k)$ with $\lambda=N/n=66/49\approx 1.35$ grains of rice per square intuitively shows good agreement. Statistically, one could check the fit with a goodness-of-fit test.

Distribution of the example, counted (blue) and according to Poisson (red)
| $k$ | counted | $P(X=k)\cdot 49$ |
|----:|--------:|-----------------:|
| 0 | 15 | 12.7 |
| 1 | 15 | 17.2 |
| 2 | 11 | 11.6 |
| 3 | 5 | 5.2 |
| 4 | 1 | 1.7 |
| 5 | 2 | 0.5 |

The probability that a given field is left empty is around 26 %:

${\displaystyle P(X=0)=\frac{1.35^{0}}{0!}\,\mathrm{e}^{-1.35}\approx 0.26.}$
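The expected counts in the table can be recomputed directly from the Poisson probabilities with $\lambda = 66/49$ (a sketch of the calculation above):

```python
import math

n_tiles = 49
n_grains = 66
lam = n_grains / n_tiles  # about 1.35 grains of rice per tile

def pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

# expected number of tiles containing exactly k grains
expected = {k: n_tiles * pmf(k, lam) for k in range(6)}
p_empty = pmf(0, lam)  # probability that a given tile stays empty
```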

### Sports results

In many sports, a competition is about achieving more counting events than the opponent within a certain period of time. The physicist Metin Tolan has examined the applicability of the Poisson distribution in sports extensively in his book on soccer.

The (temporal) constancy of the event probability, a sufficient prerequisite for the application of Poisson statistics (see above under Poisson assumptions), is generally at most approximately satisfied in sports results. But if one is only interested in the pure count, e.g. the number of goals a team scores, a Poisson distribution results even with a time-dependent goal rate. More difficult to justify is the frequently made assumption that the goal counts of the two teams are independent. If this assumption cannot be sufficiently justified statistically, e.g. by hypothesis or goodness-of-fit tests for agreement of the data with the Poisson distribution, one can, for example, switch to the bivariate Poisson distribution and introduce a dependency by estimating the covariance.

Tolan argues that the number of goals a team scores in a soccer game can, to a good approximation, be assumed to be Poisson-distributed. In his approach, however, he only takes into account the average number of goals per game and team; for example, he does not consider the strength of the opposing team. He has also shown that over 70% of the variance in the distribution of points in the Bundesliga can be explained by chance. From a stochastic point of view, this also shows why football is exciting.

For the 2015 cup final, Tolan would have estimated, on the basis of the preceding Bundesliga season, 2.12 goals for VfL Wolfsburg and 1.38 goals for Borussia Dortmund. Andreas Heuer goes one step further and defines the strength of a team as the average goal difference of a team when playing against an average opponent on a neutral pitch. Using the data from the preceding Bundesliga season, one would have estimated an average goal difference of 1 for VfL Wolfsburg and 0.15 for Borussia Dortmund. To arrive at a game prognosis, according to Heuer one also has to consider the average number of goals per game. For these two teams that would be 2.92, and Heuer would estimate 1.885 goals for VfL Wolfsburg and 1.035 goals for Borussia Dortmund. For seasonal forecasts, Heuer additionally takes into account other parameters in his complete model, such as home strength, market value, or the performance of the teams in the previous season. In the event, the final ended with 3 goals for Wolfsburg and one goal for Dortmund.
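Under the (questionable, as discussed above) assumption of two independent Poisson-distributed goal counts, win, draw, and loss probabilities follow by summing over all scorelines. The rates below are Heuer's estimates quoted in the text; the truncation bound is an implementation choice:

```python
import math

def pmf(k, lam):
    return lam ** k / math.factorial(k) * math.exp(-lam)

lam_wolfsburg = 1.885  # estimated goal rates quoted in the text (Heuer)
lam_dortmund = 1.035

KMAX = 25  # truncation; higher scores are vanishingly unlikely
p_win = sum(pmf(i, lam_wolfsburg) * pmf(j, lam_dortmund)
            for i in range(KMAX) for j in range(i))
p_draw = sum(pmf(i, lam_wolfsburg) * pmf(i, lam_dortmund)
             for i in range(KMAX))
p_loss = 1.0 - p_win - p_draw
```

With the higher goal rate, Wolfsburg comes out as the favorite, consistent with the actual 3:1 result.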

### Two thirds law in roulette

The Poisson distribution gives a good estimate of how many different numbers will be hit in 37 roulette games: each number is hit on average $\lambda = 37/37 = 1$ time, so a given number is missed entirely with probability approximately $\mathrm{e}^{-1}\approx 0.37$; accordingly, about two thirds of the numbers are hit at least once.

## Literature

Wikibooks: Poisson distribution (for beginners)  - learning and teaching materials

## References

1. Adell, Jodra: The median of the Poisson distribution. In: Metrika, 61, 2005, pp. 337-346, doi:10.1007/s001840400350.
2. A. Papoulis: Poisson Process and Shot Noise. In: Probability, Random Variables, and Stochastic Processes. 2nd ed. McGraw-Hill, New York 1984, pp. 554-576.
3. J. G. Skellam: The frequency distribution of the difference between two Poisson variates belonging to different populations. In: Journal of the Royal Statistical Society, Series A, 109 (3), 1946, p. 296, JSTOR 2981372.
4. Kazutomu Kawamura: The structure of bivariate Poisson distribution. In: Kodai Mathematical Seminar Reports, Volume 25, Number 2, 1973, pp. 246-256, doi:10.2996/kmj/1138846776.
5. Kazutomu Kawamura: The structure of multivariate Poisson distribution. In: Kodai Mathematical Seminar Reports, Volume 25, Number 2, 1973, pp. 333-345, doi:10.2996/kmj/1138036064.
6. Ladislaus von Bortkewitsch: The law of small numbers. Leipzig 1898 (archive.org).
7. Poisson distribution (Memento from September 20, 2015 in the Internet Archive), Humboldt University Berlin.
8. R. D. Clarke: An application of the Poisson distribution. In: Journal of the Institute of Actuaries, Volume 73, Number 3, 1946, p. 481, doi:10.1017/S0020268100035435.
9. Donald Gross, Carl M. Harris: Fundamentals of Queueing Theory. Wiley & Sons, New York 1994.
10. Rolf Schassberger: Queues. Springer Verlag, Vienna 1973, ISBN 3-211-81074-9.
11. Metin Tolan: Sometimes the Better Wins: The Physics of the Soccer Game. Piper, 2011.
12. Alessandro Birolini: Reliability Engineering. Springer, 2014, especially A7.8.2.
13. Holger Dambeck: Is football a game of chance? In: Spektrum der Wissenschaft, June 2010, pp. 68-70.
14. Andreas Heuer: The perfect tip. Wiley-VCH, 2012.