Likelihood function


The likelihood function (often simply the likelihood), occasionally also called the plausibility function (German: Mutmaßlichkeitsfunktion), is a special real-valued function in mathematical statistics that is obtained from a probability density function or a counting density by treating a parameter of the density as a variable. Its main use is the construction of estimators via the maximum likelihood method. In addition, further functions such as the log-likelihood function and the score function are derived from it; these serve, for example, as auxiliary functions in the maximum likelihood method or in the construction of optimality criteria in estimation theory.

The concept originated with Ronald Aylmer Fisher in the 1920s, who believed it to be a self-contained framework for statistical modelling and inference. George Barnard and Allan Birnbaum later led a school of thought that advocated the likelihood principle, which postulates that all information relevant for statistical inference is contained in the likelihood function.

Definition

A probability density function or a counting density

$f \colon X \to [0, \infty)$

is given, which also depends on one or more parameters $\vartheta$ from a parameter set $\Theta$; one therefore writes $f(x \mid \vartheta)$. Then the function

$L \colon \Theta \to [0, \infty)$,

defined by

$L(\vartheta \mid x) := f(x \mid \vartheta)$,

is called the likelihood function. The density function thus becomes the likelihood function by understanding the parameter $\vartheta$ as a variable and treating the variable $x$ as a parameter. If a specific $\tilde x$ is fixed, $L(\vartheta \mid \tilde x)$ is also called the likelihood function for the observation value $\tilde x$. In the case of a counting density, $L(\vartheta \mid \tilde x)$ gives the probability of $\tilde x$ under the parameter $\vartheta$.
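
As an illustrative sketch in Python (using a hypothetical exponential density and a hypothetical observation value, neither taken from the examples below), the following shows how fixing the observation and varying the parameter turns a density into a likelihood function:

    import math

    def density(x, lam):
        # exponential density f(x | lambda) = lambda * exp(-lambda * x) for x >= 0
        return lam * math.exp(-lam * x) if x >= 0 else 0.0

    x_tilde = 2.0  # hypothetical fixed observation value

    def likelihood(lam):
        # L(lambda | x_tilde): the same formula, now read as a function of lambda
        return density(x_tilde, lam)

    for lam in (0.25, 0.5, 1.0, 2.0):
        print(lam, likelihood(lam))  # largest value at lam = 1 / x_tilde = 0.5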

Examples

Probability density

If one considers $n$ independently and identically normally distributed random variables $X_1, \dots, X_n$ with unknown expected value $\mu$ and unknown variance $\sigma^2$, the probability density function factorizes due to the independence assumption:

$f(x_1, \dots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)$.

The parameter is thus $\vartheta = (\mu, \sigma^2)$ and comes from the parameter set $\Theta = \mathbb{R} \times (0, \infty)$. Hence the likelihood function is

$L(\mu, \sigma^2 \mid x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)$;

it therefore agrees with the density function, with the difference that $\mu$ and $\sigma^2$ are the variables, while $x_1, \dots, x_n$ are treated as parameters. If one fixes an observation $\tilde x = (\tilde x_1, \dots, \tilde x_n)$, then the likelihood function for the observation value $\tilde x$ is

$L(\mu, \sigma^2 \mid \tilde x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(\tilde x_i - \mu)^2}{2\sigma^2} \right)$.
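
A minimal Python sketch of this likelihood, assuming the hypothetical observation values $\tilde x_1 = 1$ and $\tilde x_2 = 2$; it evaluates $L$ pointwise and illustrates that the sample mean and the (biased) sample variance yield the largest value:

    import math

    x = [1.0, 2.0]  # hypothetical observation values

    def normal_likelihood(mu, sigma2, data):
        # L(mu, sigma^2 | x_1, ..., x_n): product of normal densities
        L = 1.0
        for xi in data:
            L *= math.exp(-(xi - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        return L

    mu_hat = sum(x) / len(x)                               # sample mean, 1.5
    s2_hat = sum((xi - mu_hat) ** 2 for xi in x) / len(x)  # biased sample variance, 0.25
    print(normal_likelihood(mu_hat, s2_hat, x))  # maximal value
    print(normal_likelihood(0.0, 1.0, x))        # smaller at any other parameter point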

Counting density

If $X$ is a binomially distributed random variable with parameter $p \in [0, 1]$ and fixed $n$, i.e.

$X \sim \operatorname{Bin}(n, p)$,

then it has the counting density

$f(k \mid p) = \binom{n}{k} p^{k} (1-p)^{n-k}$

for $k = 0, 1, \dots, n$. Hence the likelihood function is of the form

$L(p \mid k) = \binom{n}{k} p^{k} (1-p)^{n-k}$

with $\vartheta = p$ and $\Theta = [0, 1]$. The likelihood function for an observation value $\tilde k$ is then given by

$L(p \mid \tilde k) = \binom{n}{\tilde k} p^{\tilde k} (1-p)^{n-\tilde k}$.
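
A minimal Python sketch, assuming the hypothetical values $n = 10$ and $\tilde k = 3$, which evaluates this likelihood on a grid of values for $p$:

    from math import comb

    n, k = 10, 3  # hypothetical: fixed n, observed count k

    def binomial_likelihood(p):
        # L(p | k) = C(n, k) * p**k * (1 - p)**(n - k)
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(p, binomial_likelihood(p))  # maximal near p = k / n = 0.3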

Use

The likelihood function is mainly used in the maximum likelihood method, an intuitively accessible method for estimating an unknown parameter $\vartheta$. Given an observation result $\tilde x$, it is assumed that $\tilde x$ is a "typical" observation in the sense that it is very likely to obtain such a result. The probability of obtaining $\tilde x$ depends on the probability density function and thus also on $\vartheta$. Therefore, the estimate for the unknown parameter is chosen as that parameter $\hat\vartheta$ for which the probability of observing $\tilde x$ is maximal. To this end, one considers the likelihood function for the observation value $\tilde x$ and searches for $\hat\vartheta$ such that

$L(\hat\vartheta \mid \tilde x) = \max_{\vartheta \in \Theta} L(\vartheta \mid \tilde x)$.

This corresponds to determining a maximum point of the likelihood function, which is usually found by setting the derivative to zero:

$\frac{\partial}{\partial \vartheta} L(\vartheta \mid \tilde x) = 0$.

If this equation is difficult to solve, the log-likelihood function can be used as an aid.
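
A minimal Python sketch of this idea, continuing the binomial example with the hypothetical values $n = 10$ and $\tilde k = 3$; instead of solving the derivative equation analytically, the maximum point is located by a simple grid search:

    from math import comb

    n, k = 10, 3  # hypothetical observation, as in the binomial example

    def L(p):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    grid = [i / 1000 for i in range(1, 1000)]  # candidate parameters in (0, 1)
    p_hat = max(grid, key=L)                   # argmax of the likelihood function
    print(p_hat)  # approximately k / n = 0.3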

Derived terms

Log-likelihood function

Definition

The log-likelihood function (also called the logarithmic plausibility function) is defined as the (natural) logarithm of the likelihood function, i.e.

$\ell(\vartheta \mid x) := \ln L(\vartheta \mid x)$.

Other notations for it are also in use in the literature.

Examples

Building on the two examples above, the log-likelihood function in the case of the independently and identically normally distributed random variables is

$\ell(\mu, \sigma^2 \mid x_1, \dots, x_n) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$.

In the case of the binomial distribution, the log-likelihood function is

$\ell(p \mid k) = \ln \binom{n}{k} + k \ln p + (n-k) \ln(1-p)$.

Both follow from the calculation rules for the logarithm (see logarithmic laws).
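
A minimal Python sketch, again with the hypothetical binomial values $n = 10$ and $\tilde k = 3$, checking that the formula above agrees with $\ln L$:

    from math import comb, log

    n, k = 10, 3  # hypothetical values, as above

    def L(p):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    def ell(p):
        # log C(n, k) + k log p + (n - k) log(1 - p)
        return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

    p = 0.3
    print(ell(p), log(L(p)))  # both values agree up to rounding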

Properties

Since the logarithm is a strictly monotonically increasing function, every minimum of the log-likelihood function is also a minimum of the likelihood function. Likewise, every maximum of the log-likelihood function is also a maximum of the likelihood function.

In addition, the log-likelihood function is additive for independently and identically distributed random variables: if $X_1, \dots, X_n$ are independently and identically distributed random variables with density $f$, and each single observation has log-likelihood function $\ell_i(\vartheta \mid x_i)$, then the joint log-likelihood function is

$\ell(\vartheta \mid x_1, \dots, x_n) = \sum_{i=1}^{n} \ell_i(\vartheta \mid x_i)$.

This follows directly from the fact that the joint density of $X_1, \dots, X_n$ is formed as a product, together with the rules of the logarithm.
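
A minimal Python sketch of this additivity, assuming hypothetical normal observations and an arbitrary parameter point; the logarithm of the product density equals the sum of the single-observation log-likelihoods:

    import math

    x = [1.0, 2.0, 4.0]    # hypothetical observations
    mu, sigma2 = 2.0, 1.5  # an arbitrary parameter point

    def ell_single(xi):
        # log-likelihood contribution of one observation under N(mu, sigma2)
        return -0.5 * math.log(2 * math.pi * sigma2) - (xi - mu) ** 2 / (2 * sigma2)

    joint_density = math.prod(
        math.exp(-(xi - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        for xi in x
    )
    print(math.log(joint_density), sum(ell_single(xi) for xi in x))  # equal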

Use

Since the log-likelihood function has the same maximum points as the likelihood function, it is a common tool for solving the equation

$\frac{\partial}{\partial \vartheta} L(\vartheta \mid \tilde x) = 0$,

which arises in the maximum likelihood method. Instead of this equation, one then solves the equation

$\frac{\partial}{\partial \vartheta} \ell(\vartheta \mid \tilde x) = 0$.

In particular, the additivity of the log-likelihood function for independently and identically distributed random variables makes this equation easier to solve in many cases.
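
As a worked instance in the binomial example above, this equation can be solved in closed form:

$\frac{\partial}{\partial p} \ell(p \mid k) = \frac{k}{p} - \frac{n-k}{1-p} = 0 \quad\Longleftrightarrow\quad k(1-p) = (n-k)p \quad\Longleftrightarrow\quad \hat p = \frac{k}{n}$.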

Score function

Definition

In single-parameter models, the score function is defined as the first derivative of the log-likelihood function:

$s(\vartheta \mid x) := \frac{\partial}{\partial \vartheta} \ell(\vartheta \mid x) = \frac{\frac{\partial}{\partial \vartheta} L(\vartheta \mid x)}{L(\vartheta \mid x)}$.

It is thus the logarithmic derivative of the likelihood function. The score function indicates the slope of the log-likelihood function at the respective point and need not always exist. It also appears in the Fisher information.

Example

For the binomial distribution it was already shown above that the likelihood function has the form

$L(p \mid k) = \binom{n}{k} p^{k} (1-p)^{n-k}$,

and therefore

$\ell(p \mid k) = \ln \binom{n}{k} + k \ln p + (n-k) \ln(1-p)$.

Differentiating this function with respect to $p$, the first term drops out as a constant, and the differentiation rules for the logarithm (see derivative and integral) yield

$s(p \mid k) = \frac{k}{p} - \frac{n-k}{1-p}$

for the score function.
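
A minimal Python sketch, with the hypothetical values $n = 10$ and $\tilde k = 3$, comparing this analytic score to a central finite-difference derivative of the log-likelihood:

    from math import comb, log

    n, k = 10, 3  # hypothetical values, as above

    def ell(p):
        return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

    def score(p):
        # analytic score: k / p - (n - k) / (1 - p)
        return k / p - (n - k) / (1 - p)

    p, h = 0.4, 1e-6
    numeric = (ell(p + h) - ell(p - h)) / (2 * h)  # central finite difference
    print(score(p), numeric)  # approximately equal; the score vanishes at p = k / n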

Distribution

The score function is asymptotically normally distributed with expected value zero and variance equal to the expected value of the Fisher information $I(\vartheta)$ (also known as the expected Fisher information):

$s(\vartheta \mid X) \stackrel{a}{\sim} \mathcal{N}\bigl(0, \operatorname{E}(I(\vartheta))\bigr)$, or equivalently $\frac{s(\vartheta \mid X)}{\sqrt{\operatorname{E}(I(\vartheta))}} \stackrel{a}{\sim} \mathcal{N}(0, 1)$.
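
A minimal Python sketch of this behaviour, simulating the score at the true parameter for many binomial samples (hypothetical values $n = 50$ and $p = 0.3$); the sample mean of the scores is close to zero, and their sample variance is close to the Fisher information $n / (p(1-p))$ of the binomial model:

    import random

    random.seed(0)
    n, p = 50, 0.3
    fisher_info = n / (p * (1 - p))  # Fisher information of Bin(n, p) with respect to p

    scores = []
    for _ in range(10_000):
        k = sum(random.random() < p for _ in range(n))  # one Bin(n, p) draw
        scores.append(k / p - (n - k) / (1 - p))        # score evaluated at the true p

    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    print(mean, var, fisher_info)  # mean near 0, variance near the Fisher information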

Pseudo-likelihood function

For the solution of the maximum likelihood problem, only finding the maximum of the likelihood function is important. This is one of the reasons why the maximum likelihood method often works even when its requirements are not met. One speaks of a pseudo-likelihood function in the following cases:

  • the distributional requirements for the maximum likelihood method are not met: the likelihood function is then called a pseudo-likelihood function; or
  • the actual likelihood function or log-likelihood function is too difficult to maximize and is replaced, for example, by a smoothed version; this pseudo-likelihood function is then maximized.

The kernel of the likelihood function

The kernel of the likelihood function (also kernel of the plausibility function) is obtained from the likelihood function by neglecting all multiplicative constants. Usually, $L$ denotes both the likelihood function and its kernel. Using the log-likelihood function is often numerically sensible: multiplicative constants in $L$ turn into additive constants in $\ell$, which in turn can often be ignored. A log-likelihood function without additive constants is called the kernel of the log-likelihood function. Here too, $\ell$ usually denotes both the log-likelihood function and its kernel. For example, the kernel of the log-likelihood function of a normal distribution with unknown expected value $\mu$ and known variance $\sigma^2$ is

$\ell(\mu \mid x_1, \dots, x_n) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$.
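
A minimal Python sketch, assuming hypothetical normal observations with known variance, showing that the full log-likelihood and its kernel differ only by an additive constant and therefore share the same maximum point:

    import math

    x = [1.0, 2.0, 4.0]  # hypothetical observations
    sigma2 = 1.0         # known variance
    n = len(x)

    def ell_full(mu):
        # full log-likelihood of i.i.d. N(mu, sigma2) data
        return sum(
            -0.5 * math.log(2 * math.pi * sigma2) - (xi - mu) ** 2 / (2 * sigma2)
            for xi in x
        )

    def ell_kernel(mu):
        # kernel: only the mu-dependent part is kept
        return -sum((xi - mu) ** 2 for xi in x) / (2 * sigma2)

    for mu in (2.0, 7 / 3, 3.0):
        print(mu, ell_full(mu) - ell_kernel(mu))  # constant -n/2 * log(2 pi sigma2)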
