The Fisher information (named after the statistician Ronald Fisher) is a quantity from mathematical statistics that can be defined for a family of probability densities and provides statements about the best achievable quality of parameter estimates in this model.
Definition
Let a one-parameter statistical standard model $(X, \mathcal{A}, (P_\vartheta)_{\vartheta \in \Theta})$ be given, that is,

- $\Theta \subset \mathbb{R}$ holds,
- the $P_\vartheta$ all have a density function $f(\cdot, \vartheta)$ with respect to a fixed σ-finite measure $\mu$, that is, they form a dominated distribution class.

Furthermore, let $\Theta$ be an open set, and let the score function

$$S_\vartheta(x) := \frac{\partial}{\partial \vartheta} \ln f(x, \vartheta) = \frac{\frac{\partial}{\partial \vartheta} f(x, \vartheta)}{f(x, \vartheta)}$$

exist and be finite. Then the Fisher information of the model is defined either as
$$I(\vartheta) := \operatorname{Var}_\vartheta(S_\vartheta)$$

or as

$$I(\vartheta) := \operatorname{E}_\vartheta\left(S_\vartheta^2\right).$$
Here the variance is taken with respect to the probability distribution $P_\vartheta$. Under the regularity condition

$$\int \frac{\partial}{\partial \vartheta} f(x, \vartheta) \,\mathrm{d}\mu(x) = \frac{\partial}{\partial \vartheta} \int f(x, \vartheta) \,\mathrm{d}\mu(x)$$

the two definitions coincide. If in addition the regularity condition

$$\int \frac{\partial^2}{\partial \vartheta^2} f(x, \vartheta) \,\mathrm{d}\mu(x) = \frac{\partial^2}{\partial \vartheta^2} \int f(x, \vartheta) \,\mathrm{d}\mu(x)$$

holds, then the Fisher information is given by

$$I(\vartheta) = -\operatorname{E}_\vartheta\left( \frac{\partial^2}{\partial \vartheta^2} \ln f(X, \vartheta) \right).$$
Comments on the definition
The following points should be noted regarding the definition:
- The fact that the model is one-parameter does not mean that the probability distributions live on a one-dimensional base space. One-parameter only means that the distributions are determined by a one-dimensional parameter; no requirements are placed on the dimension of the base space.
- In most cases the measure $\mu$ with respect to which the density functions are defined is either the Lebesgue measure $\lambda$ or the counting measure. In the case of the counting measure the density functions are probability mass functions, and the integral is accordingly replaced by a sum. In the case of the Lebesgue measure the integral is a Lebesgue integral, but in most cases it can be replaced by the classical Riemann integral; one then writes $\mathrm{d}x$ instead of $\mathrm{d}\lambda(x)$.
- Sufficient for the existence of the score function is, for example, that $f(x, \vartheta)$ is strictly positive on $X \times \Theta$ and continuously differentiable with respect to $\vartheta$.
- The first regularity condition holds, for example, by definition in regular statistical models. Usually the interchangeability of integration and differentiation is established with the classical theorems of analysis.
- Under the first regularity condition the score function is centered, that is, $\operatorname{E}_\vartheta(S_\vartheta) = 0$ holds. The equivalence of the first two definitions of the Fisher information then follows from the shift formula for the variance.
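The equivalence of these characterizations can also be checked symbolically for a concrete family. The following sketch (an illustration only; it assumes the SymPy library and uses a normal distribution with unknown mean $\vartheta$ and known variance $v$, anticipating the example further below) computes $\operatorname{Var}_\vartheta(S_\vartheta)$, $\operatorname{E}_\vartheta(S_\vartheta^2)$ and $-\operatorname{E}_\vartheta\bigl(\tfrac{\partial^2}{\partial\vartheta^2}\ln f\bigr)$ and obtains $1/v$ in all three cases.

```python
# Plausibility check (illustrative sketch, assumes SymPy is installed):
# for the family N(theta, v) with known v, the three characterizations of the
# Fisher information from the definition coincide and equal 1/v.
import sympy as sp

x, theta = sp.symbols('x theta', real=True)
v = sp.symbols('v', positive=True)

f = sp.exp(-(x - theta)**2 / (2 * v)) / sp.sqrt(2 * sp.pi * v)  # density w.r.t. Lebesgue measure
score = sp.diff(sp.log(f), theta)                               # score function S_theta(x)

E_score   = sp.integrate(score * f, (x, -sp.oo, sp.oo))         # = 0, the score is centered
E_score2  = sp.integrate(score**2 * f, (x, -sp.oo, sp.oo))      # E[S^2]
var_score = sp.simplify(E_score2 - E_score**2)                  # Var(S)
neg_hess  = -sp.integrate(sp.diff(sp.log(f), theta, 2) * f, (x, -sp.oo, sp.oo))

print(var_score, sp.simplify(E_score2), sp.simplify(neg_hess))  # 1/v  1/v  1/v
```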
Examples
Discrete base space: Poisson distribution
As a statistical model, let the base space $X = \mathbb{N}_0$ be given, equipped with the σ-algebra $\mathcal{A} = \mathcal{P}(X)$, the power set. For $\lambda \in (0, \infty)$ let $P_\lambda$ be the Poisson distribution. Accordingly, the density function, here with respect to the counting measure, is given by

$$f(x, \lambda) = \mathrm{e}^{-\lambda} \frac{\lambda^x}{x!}.$$
This yields the score function

$$S_\lambda(x) = \frac{\partial}{\partial \lambda} \ln f(x, \lambda) = \frac{\partial}{\partial \lambda} \left( x \ln(\lambda) - \ln(x!) - \lambda \right) = \frac{x}{\lambda} - 1.$$
By the rules for the variance under linear transformations, the Fisher information is thus

$$I(\lambda) = \operatorname{Var}_\lambda(S_\lambda) = \frac{1}{\lambda^2} \operatorname{Var}_\lambda(X) = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.$$
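The value $I(\lambda) = 1/\lambda$ can also be checked by simulation. The following sketch (an illustration only; it assumes NumPy, and the choice $\lambda = 3$ is arbitrary) estimates the variance of the score $S_\lambda(X) = X/\lambda - 1$ from Poisson samples.

```python
# Monte Carlo check (illustrative sketch, assumes NumPy; lambda = 3 is arbitrary):
# the empirical variance of the score X/lambda - 1 should be close to 1/lambda.
import numpy as np

rng = np.random.default_rng(0)
lam = 3.0
x = rng.poisson(lam, size=1_000_000)   # samples from Poi(lambda)
score = x / lam - 1.0                  # score function evaluated at the samples
print(score.var(), 1 / lam)            # both approximately 0.333
```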
Continuous base space: exponential distribution
This time $X = (0, \infty)$ and $\mathcal{A} = \mathcal{B}((0, \infty))$ are chosen as the statistical model. The $P_\lambda$ are exponentially distributed with parameter $\lambda \in (0, \infty)$. Thus they have the density function (with respect to the Lebesgue measure)

$$f(x, \lambda) = \lambda \mathrm{e}^{-\lambda x}.$$
Hence the score function is

$$S_\lambda(x) = \frac{\partial}{\partial \lambda} \ln f(x, \lambda) = \frac{\partial}{\partial \lambda} \left( \ln(\lambda) - \lambda x \right) = \frac{1}{\lambda} - x,$$
and hence the Fisher information is

$$I(\lambda) = \operatorname{Var}_\lambda(S_\lambda) = \frac{1}{\lambda^2}.$$
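The same value is obtained from the alternative representation $I(\lambda) = -\operatorname{E}_\lambda\bigl(\tfrac{\partial^2}{\partial\lambda^2}\ln f(X,\lambda)\bigr)$ of the definition section; a short symbolic check (an illustration only, assuming SymPy):

```python
# Symbolic check via the second-derivative formula (illustrative sketch, assumes SymPy):
# I(lambda) = -E[ d^2/d lambda^2  ln f(X, lambda) ] = 1/lambda^2 for Exp(lambda).
import sympy as sp

x = sp.symbols('x', positive=True)
lam = sp.symbols('lam', positive=True)

f = lam * sp.exp(-lam * x)                      # density of the exponential distribution
d2_logf = sp.diff(sp.log(f), lam, 2)            # equals -1/lam**2, independent of x
I = -sp.integrate(d2_logf * f, (x, 0, sp.oo))   # expectation under Exp(lambda)
print(sp.simplify(I))                           # 1/lam**2
```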
Fisher information of an exponential family
If $P_\vartheta$ is given by a one-parameter exponential family, that is, it has the density function

$$f(x, \vartheta) = h(x) A(\vartheta) \exp\left( \eta(\vartheta) T(x) \right),$$
then the score function is given by

$$S_\vartheta(x) = \eta'(\vartheta) T(x) + \frac{A'(\vartheta)}{A(\vartheta)}.$$
From this it follows for the Fisher information

$$I(\vartheta) = \operatorname{Var}_\vartheta(S_\vartheta) = \eta'(\vartheta)^2 \operatorname{Var}_\vartheta(T(X)).$$
If the exponential family is given in the natural parameterization $\eta(\vartheta) = \vartheta$, then this simplifies to

$$S_\vartheta(x) = T(x) + \frac{A'(\vartheta)}{A(\vartheta)} \quad \text{and} \quad I(\vartheta) = \operatorname{Var}_\vartheta(T(X)).$$

In this case the Fisher information is the variance of the canonical statistic $T$.
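As a consistency check with the example above: the exponential distribution is of this form with $h(x) = 1$, $A(\lambda) = \lambda$, $\eta(\lambda) = -\lambda$ and $T(x) = x$, so the formula yields $I(\lambda) = \eta'(\lambda)^2 \operatorname{Var}_\lambda(X) = \operatorname{Var}_\lambda(X) = 1/\lambda^2$, in agreement with the direct calculation.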
Properties and uses
Additivity
Under the first regularity condition, the Fisher information is additive in the case of independent and identically distributed random variables; that is, for the Fisher information $\mathcal{I}^{(n)}$ of a sample of independent and identically distributed random variables $X_1, \dotsc, X_n$, each with Fisher information $\mathcal{I}$, it holds that

$$\mathcal{I}^{(n)}(\vartheta) = n \, \mathcal{I}(\vartheta).$$
This property follows directly from the Bienaymé equation.
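For example, a sample of $n$ independent observations that are exponentially distributed with parameter $\lambda$ has, by the calculation above, the Fisher information $\mathcal{I}^{(n)}(\lambda) = n/\lambda^2$.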
Sufficiency
Furthermore, for a sufficient statistic $T$, the Fisher information with respect to $f_\vartheta(X)$ is the same as that with respect to $g_\vartheta(T(X))$, where $f_\vartheta(x) = h(x) g_\vartheta(T(x))$ holds.
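For example, for $n$ independent Poisson-distributed observations with parameter $\lambda$, the statistic $T(X) = X_1 + \dotsb + X_n$ is sufficient and itself Poisson-distributed with parameter $n\lambda$; its Fisher information $\operatorname{Var}_\lambda(T/\lambda - n) = n\lambda/\lambda^2 = n/\lambda$ coincides with the Fisher information $n \, \mathcal{I}(\lambda) = n/\lambda$ of the full sample.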
Use
The Fisher information is used in particular in the Cramér-Rao inequality, where, provided the regularity condition mentioned above holds, its reciprocal provides a lower bound for the variance of an estimator of $\vartheta$: If $T(X)$ is an unbiased estimator for the unknown parameter $\vartheta$, then

$$\operatorname{Var}_\vartheta(T(X)) \geq \mathcal{I}(\vartheta)^{-1}$$

holds.
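For instance, in the Poisson model above with $n$ independent observations, the sample mean $\bar{X}$ is an unbiased estimator of $\lambda$ with $\operatorname{Var}_\lambda(\bar{X}) = \lambda/n = \bigl(n \, \mathcal{I}(\lambda)\bigr)^{-1}$, so it attains the Cramér-Rao bound.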
Extensions to higher dimensions
If the model depends on several parameters $\vartheta_i$, $i = 1, \dotsc, k$, the Fisher information can be defined as a symmetric matrix $\mathcal{I}(\vartheta) = (\mathcal{I}_{ij}(\vartheta))_{i,j = 1, \dotsc, k}$ with entries

$$\mathcal{I}_{ij}(\vartheta) = \operatorname{E}_\vartheta\left[ \frac{\partial}{\partial \vartheta_i} \log f_\vartheta(X) \cdot \frac{\partial}{\partial \vartheta_j} \log f_\vartheta(X) \right].$$

It is called the Fisher information matrix. Its properties essentially carry over from the one-parameter case. Under the regularity condition, $\mathcal{I}(\vartheta)$ is the covariance matrix of the score function.
Example: normal distribution
If $X$ is normally distributed with the expected value $\vartheta$ as parameter and known variance $v > 0$, then

$$f_\vartheta(x) = \frac{1}{\sqrt{2 \pi v}} \mathrm{e}^{-\frac{(x - \vartheta)^2}{2v}}.$$

It follows that

$$S_\vartheta(x) = \frac{\partial}{\partial \vartheta} \ln f_\vartheta(x) = \frac{x - \vartheta}{v},$$

so

$$\mathcal{I}(\vartheta) = \operatorname{Var}_\vartheta(S_\vartheta) = \frac{\operatorname{Var}_\vartheta(X)}{v^2} = \frac{1}{v}.$$
If, on the other hand, one regards both the expected value $\mu$ and the variance $v$ as unknown parameters, the result is

$$\mathcal{I}(\mu, v) = \begin{pmatrix} \dfrac{1}{v} & 0 \\ 0 & \dfrac{1}{2v^2} \end{pmatrix}$$

as the Fisher information matrix.
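This matrix can also be estimated numerically as the second moment matrix of the score vector. The following sketch (an illustration only; it assumes NumPy, and the values $\mu = 1$, $v = 2$ are arbitrary) compares a Monte Carlo estimate of $\operatorname{E}\bigl[\nabla \log f_{\mu,v}(X) \, \nabla \log f_{\mu,v}(X)^{\top}\bigr]$ with the matrix above.

```python
# Monte Carlo check of the Fisher information matrix (illustrative sketch, assumes NumPy;
# mu = 1 and v = 2 are arbitrary). The gradient of log f(x) with respect to (mu, v) is
# ( (x - mu)/v , (x - mu)^2/(2 v^2) - 1/(2 v) ).
import numpy as np

rng = np.random.default_rng(0)
mu, v = 1.0, 2.0
x = rng.normal(mu, np.sqrt(v), size=1_000_000)

score = np.column_stack([
    (x - mu) / v,                               # d/d mu  log f
    (x - mu)**2 / (2 * v**2) - 1 / (2 * v),     # d/d v   log f
])
I_hat = score.T @ score / len(x)                # estimate of E[grad grad^T]
print(I_hat)    # approximately [[1/v, 0], [0, 1/(2 v^2)]] = [[0.5, 0], [0, 0.125]]
```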
Literature
- Hans-Otto Georgii: Stochastics. Introduction to Probability Theory and Statistics. 4th edition. Walter de Gruyter, Berlin 2009, ISBN 978-3-11-021526-7, doi:10.1515/9783110215274.
- Ludger Rüschendorf: Mathematical Statistics. Springer, Berlin/Heidelberg 2014, ISBN 978-3-642-41996-6, doi:10.1007/978-3-642-41997-3.
- Claudia Czado, Thorsten Schmidt: Mathematical Statistics. Springer, Berlin/Heidelberg 2011, ISBN 978-3-642-17260-1, doi:10.1007/978-3-642-17261-8.
- Helmut Pruscha: Lectures on Mathematical Statistics. B. G. Teubner, Stuttgart 2000, ISBN 3-519-02393-8, Section V.1.
References
1. Georgii: Stochastics. 2009, p. 210.
2. Czado, Schmidt: Mathematical Statistics. 2011, p. 116.