Entropy estimation


Entropy estimation deals with the various methods for the statistical estimation of the Shannon entropy on the basis of finite samples.

According to its definition, the formal calculation of the Shannon entropy requires knowledge of the probabilities of the underlying message source. In practice, however, these probabilities are usually unknown, and it becomes necessary to estimate the probabilities of the messages from a given finite sample in order to infer the entropy of the population.

Due to the natural statistical fluctuations in finite samples, systematic and unsystematic deviations of the estimates are to be expected. In the usual maximum likelihood estimator of the entropy, the probabilities $p_i$ in the Shannon entropy

$$H = -\sum_{i=1}^{M} p_i \ln p_i,$$

are replaced by the maximum likelihood estimator $\hat p_i = n_i/N$. If, in a total of $N$ observations, the event $i$ occurs with the absolute frequency $n_i$, the use of $\hat p_i$ leads to the maximum likelihood estimator of the entropy that is frequently used in practice,

$$\hat H_{\mathrm{ML}} = -\sum_{i=1}^{M} \frac{n_i}{N} \ln \frac{n_i}{N}.$$

From a statistical point of view, this estimator is particularly suitable when the sample is much larger than the number of possible different events, i.e. when $N \gg M$. Otherwise, the above estimator often leads to a systematic underestimation of the entropy. This error is particularly noticeable when the size of the sample is not very much larger than the number of different messages of the source. In practice, however, it is precisely the latter case that is often of particular interest.
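For illustration, the plug-in estimator can be written in a few lines of Python; the following sketch (function and variable names are purely illustrative) also shows the typical underestimation when $N$ is not much larger than $M$:

```python
import numpy as np

def entropy_ml(counts):
    """Plug-in (maximum likelihood) entropy estimate in nats from absolute frequencies n_i."""
    counts = np.asarray(counts, dtype=float)
    n = counts[counts > 0]              # empty bins contribute nothing
    p_hat = n / n.sum()                 # maximum likelihood probabilities n_i / N
    return float(-np.sum(p_hat * np.log(p_hat)))

# M = 6 equally likely events (true entropy ln 6 ≈ 1.79 nats), but only N = 10
# observations: the estimate is typically well below the true value.
rng = np.random.default_rng(0)
counts = np.bincount(rng.integers(0, 6, size=10), minlength=6)
print(entropy_ml(counts))
```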

"Finite sample" corrections

There are a number of approaches in the literature that attempt to reduce the systematic error successively by means of suitable correction terms. Usually this is done via a Taylor series expansion of the entropy. For corrections up to first order in $1/N$, for example, one obtains the estimator

$$\hat H_{\mathrm{Miller}} = \hat H_{\mathrm{ML}} + \frac{M-1}{2N}.$$

The correction term was first considered by Miller in connection with the analysis of medical data. Further applications, for example in the context of genetic research, were later made by Herzel. The first calculations of correction terms up to second order were published by Harris. It turns out that the second-order correction terms are not independent of the probabilities to be estimated. Moreover, replacing the probabilities in these terms by the maximum likelihood estimator does not lead to any improvement. Harris' result is therefore of limited use for practical purposes.
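As a sketch (in Python, with illustrative names), the first-order correction can be applied directly to the plug-in estimate; here the number of possible events $M$ is assumed to be known:

```python
import numpy as np

def entropy_miller(counts, num_events):
    """Plug-in entropy estimate plus the first-order correction (M - 1) / (2N)."""
    counts = np.asarray(counts, dtype=float)
    n_total = counts.sum()                        # sample size N
    p_hat = counts[counts > 0] / n_total          # maximum likelihood probabilities
    h_ml = -np.sum(p_hat * np.log(p_hat))         # plug-in estimate
    return float(h_ml + (num_events - 1) / (2.0 * n_total))
```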

Higher order corrections

An alternative procedure, in which only observable quantities contribute to the higher-order correction terms, was first proposed by Peter Grassberger. For the probabilities to be estimated, the condition $p_i \ll 1$ is assumed, and the absolute frequencies $n_i$ are regarded as independent, Poisson-distributed random variables. These assumptions are usually fulfilled very well, especially in the cases that are of practical interest. The starting point for the derivation of higher-order corrections is the Rényi entropy of order $q$,

$$H_q = \frac{1}{1-q}\ln\sum_{i=1}^{M} p_i^q.$$

The formal connection with the Shannon entropy results from the limit $q \to 1$, i.e. $H = \lim_{q\to 1} H_q$. It is then natural to look first for unbiased estimators of the individual summands $p_i^q$. For integer values of $q$, such unbiased estimators exist, namely

$$\widehat{p_i^q} = \frac{n_i(n_i-1)\cdots(n_i-q+1)}{N^q},$$

with $\mathrm{E}\bigl[\widehat{p_i^q}\bigr] = p_i^q$ for $q = 1, 2, 3, \ldots$, since under the Poisson assumption the factorial moments satisfy $\mathrm{E}\bigl[n_i(n_i-1)\cdots(n_i-q+1)\bigr] = (Np_i)^q$. For a formal passage to the limit $q \to 1$, an analytic continuation to arbitrary real values of $q$ is necessary. The function proposed by Grassberger for this purpose does not lead to an unbiased estimator of the entropy, but it does yield an asymptotically unbiased entropy estimator,

$$\hat H_\psi = \frac{1}{N}\sum_{i=1}^{M} n_i\left(\ln N - \psi(n_i) - \frac{(-1)^{n_i}}{n_i(n_i+1)}\right),$$

which leads to improvements in practice for finite samples. Here $\psi$ denotes the so-called digamma function. In the practically interesting case of small probabilities $p_i$, the systematic error of this estimator is smaller than that of the estimator with the correction proposed by Miller.
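A minimal numerical sketch of this estimator, written directly from the formula above (the function name is illustrative; SciPy's digamma is used for $\psi$):

```python
import numpy as np
from scipy.special import digamma

def entropy_grassberger_psi(counts):
    """Asymptotically unbiased entropy estimate (in nats) with the digamma-based
    correction described above; empty bins are ignored."""
    counts = np.asarray(counts, dtype=float)
    n_total = counts.sum()                               # sample size N
    n = counts[counts > 0]
    sign = np.where(n.astype(int) % 2 == 0, 1.0, -1.0)   # (-1)**n_i
    terms = n * (np.log(n_total) - digamma(n) - sign / (n * (n + 1.0)))
    return float(terms.sum() / n_total)
```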

Systematic corrections

In a similar way, a parameterized family of general entropy estimators can be specified which extends the above estimators or reproduces them asymptotically. Instead of a Poisson distribution, a binomial distribution is assumed for the absolute frequencies $n_i$. No further restrictions are imposed on the probabilities. As entropy estimator one obtains

$$\hat H_\xi = \frac{1}{N}\sum_{i=1}^{M} n_i\left(\psi(N) - \psi(n_i) - (-1)^{n_i}\int_0^{(1-\xi)/\xi}\frac{t^{\,n_i-1}}{1+t}\,dt\right),$$

where the real parameter $\xi$ parameterizes the different entropy estimators.
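The family can be evaluated numerically as sketched below (Python; the correction integral is computed by quadrature, and all names are illustrative). With $\xi = 1$ the upper integration limit is zero, so the correction term vanishes:

```python
import numpy as np
from scipy.special import digamma
from scipy.integrate import quad

def entropy_xi(counts, xi):
    """Entropy estimate (in nats) from the parameterized family given above."""
    counts = np.asarray(counts, dtype=float)
    n_total = counts.sum()                       # sample size N
    upper = 1.0 / xi - 1.0                       # upper limit of the correction integral
    total = 0.0
    for n in counts[counts > 0]:
        corr, _ = quad(lambda t, n=n: t ** (n - 1.0) / (1.0 + t), 0.0, upper)
        sign = 1.0 if int(n) % 2 == 0 else -1.0  # (-1)**n_i
        total += n * (digamma(n_total) - digamma(n) - sign * corr)
    return float(total / n_total)
```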

Examples

1. If the parameter $\xi$ is chosen such that the correction term vanishes, one obtains the entropy estimator

$$\hat H = \frac{1}{N}\sum_{i=1}^{M} n_i\bigl(\psi(N) - \psi(n_i)\bigr).$$

A similar estimator was also discussed by Wolpert and Wolf within the framework of Bayesian theory. Asymptotically, this estimator corresponds to the estimator with Miller's correction.

2. For a suitable choice of $\xi$, the estimator $\hat H_\xi$ approximates the estimator $\hat H_\psi$. Numerical analyses show that the difference between the two estimators is negligible. The systematic error of $\hat H_\xi$ is smaller than the systematic error of $\hat H_\psi$.

3. A further case corresponds asymptotically to another entropy estimator derived by Grassberger. The latter has a smaller bias than the Miller estimator and than $\hat H_\psi$.

Systematic error (bias)

The systematic error (bias) of an estimator is defined as the expected deviation between the estimator under consideration and the quantity to be estimated. In the case at hand, according to this definition one obtains

$$B_\xi(p_1,\ldots,p_M) = \mathrm{E}\bigl[\hat H_\xi\bigr] - H.$$

This expression depends explicitly on the probabilities $p_i$ and on the parameter $\xi$. For every choice of these quantities there is a characteristic value of the estimation error, which can be determined analytically as

$$\mathrm{E}\bigl[\hat H_\xi\bigr] - H = -\sum_{i=1}^{M} p_i\, B_{1-p_i/\xi}(N,\,0).$$

The function $B_z(a,b)$ on the right-hand side of this formula is an incomplete beta function and belongs to the class of so-called special functions. For the unsystematic (statistical) error, however, no such formula can be derived; it must therefore usually be determined numerically.
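Since the incomplete beta function with second argument $0$ is simply the integral of $t^{N-1}/(1-t)$, the bias expression above can be checked numerically by quadrature, as in the following illustrative sketch:

```python
import numpy as np
from scipy.integrate import quad

def bias_xi(probs, n_total, xi):
    """Systematic error -sum_i p_i * B_{1 - p_i/xi}(N, 0) of the parameterized estimator,
    evaluated by numerical integration of the incomplete beta function.
    Assumes xi >= max(probs), so that every upper limit lies in [0, 1)."""
    bias = 0.0
    for p in np.asarray(probs, dtype=float):
        z = 1.0 - p / xi                          # upper limit of the incomplete beta function
        val, _ = quad(lambda t: t ** (n_total - 1.0) / (1.0 - t), 0.0, z)
        bias -= p * val
    return bias
```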

References

  1. C. E. Shannon, Bell Syst. Tech. J. 27 (1948) 379.
  2. C. E. Shannon and W. Weaver: The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL, 1949.
  3. G. Miller: Note on the bias of information estimates. In: H. Quastler (ed.), Information Theory in Psychology II-B, p. 95. Free Press, Glencoe, IL, 1955.
  4. H. Herzel, Sys. Anal. Mod. Sim. 5 (1988) 435.
  5. B. Harris, Colloquia Math. Soc. János Bolyai, p. 323 (1975).
  6. P. Grassberger, Phys. Lett. A 128 (1988) 369.
  7. T. Schürmann, J. Phys. A: Math. Gen. 37 (2004) L295, arXiv:cond-mat/0403192.
  8. D. H. Wolpert and D. R. Wolf, Phys. Rev. E 52 (1995) 6841.
  9. P. Grassberger (2003), arXiv:physics/0307138.
  10. http://functions.wolfram.com/GammaBetaErf/Beta3/07/01/01/

See also