Maximum a posteriori estimate

The maximum a posteriori method (MAP) is an estimation method in mathematical statistics, more precisely a special Bayes estimator. The method estimates an unknown parameter by the mode of its a posteriori distribution. There is thus a certain similarity to the maximum likelihood method.

Description

The following situation is given: $\vartheta$ is an unknown parameter of a population that is to be estimated on the basis of observations $x$. Furthermore, let $f(x \mid \vartheta)$ be the sampling distribution of $x$, i.e. the probability of $x$ when the (true) parameter of the population has the value $\vartheta$.

The function

$$\vartheta \mapsto f(x \mid \vartheta)$$

is known as the likelihood function, and the estimate

$$\hat{\vartheta}_{\mathrm{ML}}(x) = \arg\max_{\vartheta} f(x \mid \vartheta)$$

is known as the maximum likelihood estimator of $\vartheta$.

Suppose now that an a priori distribution $g$ of $\vartheta$ is also available; $\vartheta$ can then be viewed as a random variable, as is common in Bayesian statistics. The a posteriori distribution of $\vartheta$ can be obtained using Bayes' theorem:

$$\vartheta \mapsto f(\vartheta \mid x) = \frac{f(x \mid \vartheta)\, g(\vartheta)}{\int f(x \mid \vartheta')\, g(\vartheta')\, \mathrm{d}\vartheta'}.$$

The maximum a posteriori method now uses the mode of this a posteriori distribution as an estimate for $\vartheta$. Since the integral in the denominator does not depend on $\vartheta$, it does not need to be taken into account when maximizing:

$$\hat{\vartheta}_{\mathrm{MAP}}(x) = \arg\max_{\vartheta} \frac{f(x \mid \vartheta)\, g(\vartheta)}{\int f(x \mid \vartheta')\, g(\vartheta')\, \mathrm{d}\vartheta'} = \arg\max_{\vartheta} f(x \mid \vartheta)\, g(\vartheta).$$

The MAP estimator of $\vartheta$ is identical to the maximum likelihood (ML) estimator if a non-informative a priori distribution (e.g. a uniform distribution) is used.
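
As a numerical illustration (a sketch, not part of the article's sources), the following Python snippet maximizes $f(x \mid \vartheta)\, g(\vartheta)$ on a grid for normally distributed observations with known standard deviation; the data and the prior parameters mu0 and tau are assumptions chosen only for this example. With a flat prior, the grid maximum coincides with the ML estimate (the sample mean), as stated above.

    import numpy as np
    from scipy import stats

    # Illustrative data: 20 observations from a normal distribution with known sigma.
    rng = np.random.default_rng(0)
    sigma = 1.0
    x = rng.normal(loc=0.8, scale=sigma, size=20)

    # Assumed prior g(theta) on the unknown mean: normal with mean mu0 and standard deviation tau.
    mu0, tau = 0.0, 0.5

    theta_grid = np.linspace(-2.0, 3.0, 5001)

    def log_unnormalized_posterior(theta, log_prior):
        # log f(x | theta) + log g(theta); the normalizing integral is constant in theta and can be ignored.
        log_lik = stats.norm.logpdf(x[:, None], loc=theta, scale=sigma).sum(axis=0)
        return log_lik + log_prior(theta)

    map_informative = theta_grid[np.argmax(log_unnormalized_posterior(
        theta_grid, lambda t: stats.norm.logpdf(t, loc=mu0, scale=tau)))]
    map_flat = theta_grid[np.argmax(log_unnormalized_posterior(
        theta_grid, lambda t: np.zeros_like(t)))]  # constant (flat) prior

    print("ML estimate (sample mean):", x.mean())
    print("MAP with flat prior:     ", map_flat)         # matches the ML estimate up to grid resolution
    print("MAP with N(0, 0.5) prior:", map_informative)  # pulled toward the prior mean 0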

Comparison with other Bayesian (point) estimation methods

In the literature, the MAP estimator is described as the counterpart in Bayesian statistics to the ML estimator.

However, the MAP estimate does not play the same role in Bayesian statistics as the ML estimator does in frequentist statistics:

  • Bayesian statisticians usually express the (a posteriori) information about an unknown parameter by a probability distribution rather than by a point estimate.
  • The expected value of the posterior distribution is superior to the MAP estimator if, as is customary in Bayesian statistics, the posterior variance of an estimator is used as a measure of quality.
  • In many cases, the median is also a better estimator than the MAP estimator.

In Bayesian decision theory, estimators other than the MAP estimator are optimal for the most common loss functions:

  • In the case of a quadratic loss function, the expected value of the posterior distribution is the optimal estimator.
  • If the absolute value of the estimation error is used as the loss function (i.e. $L(\vartheta, a) = |\vartheta - a|$, where $a$ denotes an arbitrary estimate), the median of the a posteriori distribution is the optimal estimator, as the sketch below illustrates.
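
The following Monte Carlo sketch (illustrative only; the Beta(2, 8) posterior is an arbitrary choice, not taken from the cited sources) checks both statements numerically: the candidate value minimizing the estimated quadratic risk is close to the posterior mean, and the value minimizing the estimated absolute risk is close to the posterior median.

    import numpy as np

    # Draws from an arbitrary posterior distribution; Beta(2, 8) is chosen purely for illustration.
    rng = np.random.default_rng(1)
    theta = rng.beta(2.0, 8.0, size=200_000)

    candidates = np.linspace(0.0, 1.0, 1001)
    quadratic_risk = np.array([np.mean((theta - a) ** 2) for a in candidates])
    absolute_risk = np.array([np.mean(np.abs(theta - a)) for a in candidates])

    print("posterior mean:            ", theta.mean())
    print("argmin of E[(theta - a)^2]:", candidates[np.argmin(quadratic_risk)])  # ~ posterior mean
    print("posterior median:          ", np.median(theta))
    print("argmin of E[|theta - a|]:  ", candidates[np.argmin(absolute_risk)])   # ~ posterior median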

Comparison with the maximum likelihood method

In contrast to the ML method, the MAP method takes prior knowledge into account in the form of a priori probabilities. These a priori probabilities, combined with the sample, yield the a posteriori probability according to Bayes' theorem. The MAP method uses the most probable parameter value under the a posteriori distribution, while the ML method uses the parameter value with the highest likelihood (i.e. without prior knowledge). However, the use of an a priori distribution is unacceptable to a strictly frequentist statistician; therefore, the ML method is used instead of the MAP method in classical statistics.

The maximum likelihood estimator can be viewed as a special case of a maximum a posteriori estimator in which the a priori distribution is non-informative (for bounded parameter ranges, for example a uniform distribution). Conversely, every maximum a posteriori estimator for the sampling distribution $f(x \mid \vartheta)$ and a priori distribution $g(\vartheta)$ is a maximum likelihood estimator for a modified sampling distribution $\tilde{f}(x \mid \vartheta)$ with

$$\tilde{f}(x \mid \vartheta) \propto f(x \mid \vartheta)\, g(\vartheta).$$

Both procedures can simulate each other and are "equally powerful" in this sense.

Example

An urn contains red and black balls. By drawing with replacement, the (true) proportion $p$ of red balls in the urn is to be determined. The number of red balls drawn can then be described by a binomial distribution with sample size $N = 10$ and unknown parameter $p$, i.e. a $B(10, p)$ distribution. In the following it is assumed that 7 red balls were drawn in such an experiment.

Maximum likelihood estimation

With the ML method, the proportion of red balls is estimated as $7/10$, i.e. at 70%.

Non-informative a priori distribution

The Beta(1, 1) distribution (equivalent to the continuous uniform distribution on the interval $[0, 1]$) can be used as a non-informative a priori distribution for the parameter of a binomially distributed random variable. This prior treats all possible values of $p$ as equally probable.

The a posteriori distribution is then the Beta(8, 4) distribution, whose mode is 0.7. The MAP method therefore also estimates the proportion of red balls at 70%. The expected value of the Beta(8, 4) distribution is $8/12 = 2/3$; using the a posteriori expected value as the estimator, the proportion of red balls would therefore be estimated at 66.67%.
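
These values can be reproduced with a few lines of Python (a sketch using scipy.stats; the posterior parameters follow from the conjugacy of the beta prior and the binomial likelihood):

    from scipy import stats

    # Beta(1, 1) prior combined with 7 red out of N = 10 draws gives a
    # Beta(1 + 7, 1 + 3) = Beta(8, 4) posterior.
    a, b = 8, 4
    posterior = stats.beta(a, b)

    map_estimate = (a - 1) / (a + b - 2)  # mode of a Beta(a, b) distribution for a, b > 1
    print(map_estimate)      # 0.7      -> MAP estimate, identical to the ML estimate here
    print(posterior.mean())  # 0.666... -> a posteriori expected value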

Assuming that the prior knowledge (equal probability for all values of $p$) correctly describes the distribution of the true value over many such urns, the a posteriori expected value minimizes the mean squared deviation of the estimate from the respective true value.

Informative a priori distribution

Now it is assumed that certain prior knowledge about the proportion of red balls is available, which can be expressed by a Beta(5, 5) distribution. This corresponds, for example, to the prior knowledge that 4 out of 8 previously drawn balls were red.

The a posteriori distribution in this case is the Beta(12, 8) distribution, whose mode is $11/18 \approx 0.611$. With the MAP method, the proportion of red balls is therefore estimated at about 61.1%. In this case the MAP estimate lies between the mode of the a priori distribution (0.5) and the maximum likelihood estimate (0.7).

The expected value of the a posteriori distribution is $12/20 = 0.6$; using the a posteriori expected value as the estimator, the proportion of red balls would therefore be estimated at 60%.
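
Again as a sketch, the same computation for the informative prior:

    from scipy import stats

    # Beta(5, 5) prior (4 red out of 8 earlier draws) combined with 7 red out of 10 new draws
    # gives a Beta(5 + 7, 5 + 3) = Beta(12, 8) posterior.
    a, b = 12, 8
    posterior = stats.beta(a, b)

    print((a - 1) / (a + b - 2))  # mode = 11/18 ≈ 0.611 -> MAP estimate
    print(posterior.mean())       # mean = 12/20 = 0.6   -> a posteriori expected value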

Literature

  • Bernhard Rüger: Inductive Statistics. Introduction for economists and social scientists . R. Oldenbourg Verlag, Munich Vienna 1988. ISBN 3-486-20535-8
  • James O. Berger: Statistical decision theory and Bayesian analysis . Springer Series in Statistics, Springer-Verlag, New York Berlin Heidelberg 1985. ISBN 0-387-96098-8

References

  1. Bernhard Rüger: Inductive Statistics. Introduction for economists and social scientists , p. 161f
  2. James O. Berger: Statistical decision theory and Bayesian analysis , p. 133
  3. James O. Berger: Statistical decision theory and Bayesian analysis , p. 136
  4. James O. Berger: Statistical decision theory and Bayesian analysis , p. 134
  5. James O. Berger: Statistical decision theory and Bayesian analysis , pp. 161f.