Sufficient statistics

In mathematical statistics, a sufficient statistic (also called an exhaustive statistic) is a statistic that contains all relevant information about the unknown parameter that is present in the random sample. From a measure-theoretic point of view, sufficiency with respect to a model is a possible property of measurable functions that map the sample space into an arbitrary measurable space. Such mappings are characterized as sufficient (also: exhaustive) by the fact that they transform a high-dimensional data vector into a simpler form without losing essential information about the underlying probability distribution. The counterpart of sufficiency is distribution-freeness (ancillarity); it corresponds to an uninformative transformation.

Put intuitively, precisely those statistics are sufficient that contain all the information about the parameters of the model to be estimated that is contained in the sample.

Sufficiency is one of the classic reduction principles of mathematical statistics, along with unbiasedness and equivariance/invariance. Sufficiency derives much of its importance from the Rao–Blackwell theorem. It follows from this theorem that estimators which are "optimal" with respect to the mean squared error, or corresponding generalizations of it, can always be found among the estimators that depend on the data only through a sufficient statistic.
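For illustration (a standard statement of the Rao–Blackwell theorem, added here as a worked reference and not taken from the article itself): if $S$ is any estimator of $\vartheta$ with finite variance and $T$ is a sufficient statistic, then the conditioned estimator $S' = \mathbb{E}[S \mid T]$ is never worse in mean squared error,

$$\mathbb{E}_\vartheta\!\left[(S' - \vartheta)^2\right] \le \mathbb{E}_\vartheta\!\left[(S - \vartheta)^2\right] \quad \text{for all } \vartheta \in \Theta.$$

Because $T$ is sufficient, the conditional expectation $\mathbb{E}[S \mid T]$ does not depend on $\vartheta$ and is therefore itself a usable estimator.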

Definition

Formally, let $(\mathcal{X}, \mathcal{A})$ be the sample space, $(\mathcal{Y}, \mathcal{B})$ an arbitrary measurable space, and $T\colon \mathcal{X} \to \mathcal{Y}$ a measurable mapping between the two spaces. Furthermore, let $X$ be a random variable on the sample space whose distribution belongs to a family of probability measures $\mathcal{P} = \{P_\vartheta : \vartheta \in \Theta\}$. Then $T$ is called sufficient for the family $\mathcal{P}$ if the conditional distribution of $X$ given $T(X) = t$ does not depend on $\vartheta$.

More generally, the sufficiency of a statistic can be defined by means of the sufficiency of σ-algebras: a statistic $T$ is called sufficient (or exhaustive) if the σ-algebra $\sigma(T)$ it generates is a sufficient σ-algebra.
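For orientation, a commonly used formulation of this notion (standard in the literature, not spelled out in this article): a sub-σ-algebra $\mathcal{C} \subseteq \mathcal{A}$ is sufficient for $\mathcal{P} = \{P_\vartheta : \vartheta \in \Theta\}$ if for every event $A \in \mathcal{A}$ a single $\mathcal{C}$-measurable function can be chosen that is a version of the conditional probability for all parameters simultaneously,

$$P_\vartheta(A \mid \mathcal{C}) = \mathbb{E}_\vartheta[\mathbf{1}_A \mid \mathcal{C}] \quad \text{independently of } \vartheta \in \Theta.$$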

Example: binomial distribution

A simple example of a sufficient statistic is obtained when examining independent, identically Bernoulli-distributed random variables $X_1, \dots, X_n$ with success probability $\vartheta$; the underlying model is therefore a Bernoulli process. In this case the counting density of the random vector $X = (X_1, \dots, X_n)$ is given by

$$f_\vartheta(x_1, \dots, x_n) = \vartheta^{\sum_{i=1}^n x_i} (1-\vartheta)^{\,n - \sum_{i=1}^n x_i},$$

where the $x_i$ are either 0 or 1. Note that the counting measure is finite (in particular σ-finite) and, because these densities exist, dominates the class $\{P_\vartheta : \vartheta \in (0,1)\}$. Therefore one recognizes from the Neyman characterization that $T(X) = \sum_{i=1}^n X_i$ is sufficient for $\vartheta$.

With the help of the definition, one shows the sufficiency of $T(X) = \sum_{i=1}^n X_i$ by computing the conditional distribution of $X$ given $T$. Using conditional probabilities, one obtains for every data vector $(x_1, \dots, x_n) \in \{0,1\}^n$ with $\sum_{i=1}^n x_i = t$:

$$P_\vartheta(X_1 = x_1, \dots, X_n = x_n \mid T = t) = \frac{P_\vartheta(X_1 = x_1, \dots, X_n = x_n)}{P_\vartheta(T = t)} = \frac{\vartheta^{t} (1-\vartheta)^{n-t}}{\binom{n}{t}\,\vartheta^{t} (1-\vartheta)^{n-t}} = \binom{n}{t}^{-1}.$$

This conditional distribution is independent of $\vartheta$, so $T$ is sufficient.

Heuristically speaking, instead of knowing the entire data vector, it suffices to know the number of successes $T(X) = \sum_{i=1}^n X_i$ in the experiment in order to obtain all the information about the unknown parameter $\vartheta$ that the sample contains.
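The independence of this conditional distribution from $\vartheta$ can also be checked empirically. The following sketch (a minimal illustration in Python; the helper name, sample size and parameter values are chosen freely and are not part of the article) simulates Bernoulli vectors for two different values of $\vartheta$ and compares the empirical conditional distribution of the data vector given a fixed number of successes:

    import numpy as np
    from collections import Counter

    def conditional_distribution(theta, n=4, t=2, samples=200_000, seed=0):
        """Empirical distribution of the data vector X given sum(X) = t."""
        rng = np.random.default_rng(seed)
        x = (rng.random((samples, n)) < theta).astype(int)   # i.i.d. Bernoulli(theta) rows
        hits = x[x.sum(axis=1) == t]                          # keep rows with exactly t successes
        counts = Counter(tuple(map(int, row)) for row in hits)
        total = sum(counts.values())
        return {vec: round(c / total, 3) for vec, c in sorted(counts.items())}

    # Every vector with t = 2 successes out of n = 4 trials should receive conditional
    # probability close to 1 / C(4, 2) = 1/6, regardless of the value of theta.
    print(conditional_distribution(theta=0.3))
    print(conditional_distribution(theta=0.7))

Both calls print approximately the same uniform distribution over the six vectors with two successes, in line with the computation above.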

Theorems on sufficiency in dominated distribution classes

Halmos-Savage theorem

The Halmos-Savage theorem provides a criterion for sufficiency under the assumption that the distribution class is dominated. If countably many measures of the distribution class can be combined into a single measure that dominates the distribution class, and if every probability measure of the class has a density with respect to this measure that is measurable with respect to a given σ-algebra, then this σ-algebra is a sufficient σ-algebra.
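A common way of constructing such a dominating measure (a standard device, added here only as an illustration) is to take a countable convex combination of members of the class: for suitably chosen $P_1, P_2, \dots \in \mathcal{P}$ one sets

$$P^* = \sum_{i=1}^{\infty} 2^{-i} P_i,$$

which is again a probability measure; the criterion then requires that every $P_\vartheta \in \mathcal{P}$ possess a density $\mathrm{d}P_\vartheta / \mathrm{d}P^*$ that is measurable with respect to the σ-algebra in question.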

Neyman criterion

Assuming a dominated distribution class $\mathcal{P} = \{P_\vartheta : \vartheta \in \Theta\}$ with densities $f_\vartheta$, a statistic $T$ is sufficient if and only if there exist measurable functions $g_\vartheta$ and $h$ such that the density factorizes as

$$f_\vartheta(x) = g_\vartheta(T(x)) \cdot h(x).$$

This characterization of sufficiency goes back to Jerzy Neyman. In particular, bijective transformations of sufficient statistics are again sufficient. The Neyman criterion is derived from the Halmos-Savage theorem but is easier to handle.
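As a worked illustration of the criterion (this application is not spelled out in the original article): in the Bernoulli model above, the counting density already has the required product form,

$$f_\vartheta(x_1, \dots, x_n) = \underbrace{\vartheta^{\,T(x)} (1-\vartheta)^{\,n - T(x)}}_{g_\vartheta(T(x))} \cdot \underbrace{1}_{h(x)}, \qquad T(x) = \sum_{i=1}^n x_i,$$

so the factorization shows at a glance that the number of successes is sufficient, without computing any conditional distributions.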

Further notions of sufficiency

Minimal sufficiency

Minimal sufficiency is a stronger requirement than sufficiency and is likewise defined both for statistics and for σ-algebras. It addresses the question of the greatest possible data compression, i.e. of the smallest possible sufficient σ-algebra.
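A practical criterion that is standard in the literature (not stated in this article): in a dominated model with densities $f_\vartheta$, a sufficient statistic $T$ is minimal sufficient if for all sample points $x, y$

$$T(x) = T(y) \iff \frac{f_\vartheta(x)}{f_\vartheta(y)} \text{ does not depend on } \vartheta.$$

In the Bernoulli example this ratio equals $\vartheta^{\,T(x)-T(y)} (1-\vartheta)^{\,T(y)-T(x)}$, which is constant in $\vartheta$ exactly when $T(x) = T(y)$; the number of successes is therefore even minimal sufficient.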

Strong sufficiency

Strong sufficiency is a modification of the conventional notion of sufficiency that is defined by means of Markov kernels. For Borel spaces, strong sufficiency and sufficiency coincide.

