Sanov's theorem is a result from probability theory, a subfield of stochastics. It is a central statement of the theory of large deviations and has a close connection to information theory. The theorem formalizes the intuition that the total probability of a rare event is dominated by the probability of its most plausible sub-event. It is named after the Russian mathematician Ivan Nikolayevich Sanov (1919–1968).
By the strong law of large numbers, the sample mean X̄_n = (1/n) ∑_{i=1}^n X_i of a sequence of i.i.d. random variables X_1, X_2, … converges almost surely to the expected value μ = E[X_1]. However, this makes no statement about the speed of convergence. Typically the mean will be close to μ, but it cannot be ruled out that it still deviates significantly from the limit value even for an arbitrarily large n. Sanov's theorem quantifies how quickly the probability of such a deviation decreases as n grows. In addition to the asymptotic behavior, one can also ask how likely a given deviation of the mean is for a specific n. For example, in his famous work The Doctrine of Chances, Abraham de Moivre considered a thought experiment about coin tossing: what is the probability that the average of n fair coin tosses deviates substantially from 1/2?
Formally, let X_1, X_2, … be fair coin tosses with values in Σ = {0, 1} and common distribution μ, and let

L_n = (1/n) ∑_{i=1}^n δ_{X_i}

be the empirical distribution of the first n coin tosses, where δ_x denotes the Dirac measure at the position x. Then L_n always lies in the set M_1(Σ) of probability measures on Σ, and L_n converges to μ according to the law of large numbers. In addition, let Γ ⊆ M_1(Σ) be the subset of all distributions whose expected value is at least some threshold a > 1/2. Then the probability that the random measure L_n lies in Γ is exactly the probability P(X̄_n ≥ a).
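This construction can be illustrated with a short simulation. The following sketch (assuming a fair coin encoded as values 0 and 1; all names are illustrative) computes the empirical distribution L_n as the vector of relative frequencies:

```python
import random
from collections import Counter

def empirical_distribution(samples):
    """Empirical measure L_n: the relative frequency of each observed value."""
    n = len(samples)
    return {x: count / n for x, count in Counter(samples).items()}

random.seed(0)
tosses = [random.randint(0, 1) for _ in range(10_000)]  # fair coin, values in {0, 1}
L_n = empirical_distribution(tosses)
print(L_n)  # both weights are close to 1/2, as the law of large numbers predicts
```

The weights of L_n sum to 1 by construction, so L_n is itself a probability measure on {0, 1}.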
Under these assumptions, Sanov's theorem says that

−inf_{ν ∈ int Γ} D(ν‖μ) ≤ liminf_{n→∞} (1/n) log P(L_n ∈ Γ) ≤ limsup_{n→∞} (1/n) log P(L_n ∈ Γ) ≤ −inf_{ν ∈ cl Γ} D(ν‖μ)

holds for every set Γ ⊆ M_1(Σ). Here int Γ denotes the interior and cl Γ the closure of Γ, and D(·‖·) is the Kullback–Leibler divergence. If, in addition, the left- and right-hand sides of the chain of inequalities coincide, then the limit exists and it holds that

lim_{n→∞} (1/n) log P(L_n ∈ Γ) = −inf_{ν ∈ Γ} D(ν‖μ).
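The exponential decay rate can be checked numerically. The following sketch is an illustration, not part of the theorem: the threshold 3/4 and all names are chosen for the example. It compares the exact probability that the mean of n fair coin tosses is at least 3/4 with the rate inf_{ν∈Γ} D(ν‖μ) = D(Bern(3/4)‖Bern(1/2)) predicted by Sanov's theorem, using natural logarithms (nats):

```python
import math

def kl_bernoulli(q, p):
    """Kullback-Leibler divergence D(Bern(q) || Bern(p)) in nats, for 0 < q, p < 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

# The infimum of D(. || Bern(1/2)) over {distributions with mean >= 3/4}
# is attained at Bern(3/4), since the divergence grows as q moves away from 1/2.
rate_predicted = kl_bernoulli(0.75, 0.5)

for n in (100, 1000):
    k0 = math.ceil(0.75 * n)
    # Exact tail probability P(mean of n fair tosses >= 3/4) via the binomial distribution.
    prob = sum(math.comb(n, k) for k in range(k0, n + 1)) / 2 ** n
    rate_observed = -math.log(prob) / n
    print(n, round(rate_observed, 4), round(rate_predicted, 4))
```

The observed rate approaches the predicted one from above; the gap of order (log n)/n corresponds to the subexponential correction factors discussed in the remarks below.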
Remarks
The specific choice of the base of the logarithm is irrelevant, but it must be the same base that is used for the divergence (cf. Shannon (unit)).
From the finiteness of Σ and the continuity of ν ↦ D(ν‖μ) it follows that the infimum over the interior of Γ can indeed be strictly greater than the infimum over the closure.
If Γ is convex, then the minimizing measure ν* = argmin_{ν ∈ Γ} D(ν‖μ) is well defined; it is called the information projection of μ onto Γ.
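As a numerical sketch (the Bernoulli family and the constraint q ≥ 3/4 are assumed purely for illustration), the information projection of a fair coin onto the convex set of coins with bias at least 3/4 can be found by direct minimization of the divergence:

```python
import math

def kl_bernoulli(q, p):
    """D(Bern(q) || Bern(p)) in nats, with the convention 0 * log 0 = 0."""
    result = 0.0
    if q > 0:
        result += q * math.log(q / p)
    if q < 1:
        result += (1 - q) * math.log((1 - q) / (1 - p))
    return result

# Gamma = {Bern(q) : q >= 3/4} is convex; the true distribution is mu = Bern(1/2).
# Grid search for the minimizer of D(. || mu) over Gamma.
grid = [0.75 + i * 0.25 / 10_000 for i in range(10_001)]
projection = min(grid, key=lambda q: kl_bernoulli(q, 0.5))
print(projection)  # the boundary point q = 3/4 minimizes the divergence
```

Since D(Bern(q)‖Bern(1/2)) is increasing in q on [1/2, 1], the minimizer sits on the boundary of Γ, which matches the general picture for the projection of μ onto a convex set not containing μ.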
Up to sublinear additive terms in the exponent (i.e., subexponential factors), the asymptotic relation

P(L_n ∈ Γ) ≈ exp(−n · inf_{ν ∈ Γ} D(ν‖μ))

holds if the divergence is given in nats. One can even show that for every n,

P(L_n ∈ Γ) ≤ (n+1)^{|Σ|} · exp(−n · inf_{ν ∈ Γ} D(ν‖μ)).
The empirical measure cannot assume arbitrary values: it always lies in the finite set M_{1,n}(Σ) of probability measures on Σ whose weights are integer multiples of 1/n. The elements of M_{1,n}(Σ) are called types. The probability of a specific type ν ∈ M_{1,n}(Σ) can be estimated by

(n+1)^{−|Σ|} · exp(−n · D(ν‖μ)) ≤ P(L_n = ν) ≤ exp(−n · D(ν‖μ)).
Overall, the probability P(L_n ∈ Γ) is thus dominated by the type ν* ∈ Γ that has the smallest divergence from the "true" distribution μ. Every other type ν with D(ν‖μ) > D(ν*‖μ) has an exponentially smaller probability.
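The two-sided type bound can be verified exactly for a small binary example. The sketch below (alphabet {0, 1}, so |Σ| = 2, and a fair coin; the parameters are chosen for illustration) enumerates all types with denominator n and checks the sandwich (n+1)^{−|Σ|} e^{−nD(ν‖μ)} ≤ P(L_n = ν) ≤ e^{−nD(ν‖μ)}:

```python
import math

def kl_bernoulli(q, p):
    """D(Bern(q) || Bern(p)) in nats, with the convention 0 * log 0 = 0."""
    result = 0.0
    if q > 0:
        result += q * math.log(q / p)
    if q < 1:
        result += (1 - q) * math.log((1 - q) / (1 - p))
    return result

n, p = 50, 0.5
for k in range(n + 1):
    q = k / n                                              # the type Bern(k/n)
    exact = math.comb(n, k) * p ** k * (1 - p) ** (n - k)  # exact P(L_n = this type)
    d = kl_bernoulli(q, p)
    lower = (n + 1) ** (-2) * math.exp(-n * d)  # (n+1)^(-|Sigma|) with |Sigma| = 2
    upper = math.exp(-n * d)
    assert lower <= exact <= upper * (1 + 1e-9)  # tiny slack for float rounding
print("type bounds hold for all", n + 1, "types")
```

Note that the upper bound is attained (up to rounding) by the extreme types k = 0 and k = n, for which the type class contains a single sequence.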
General case
Sanov's theorem can be generalized considerably; in particular, the finiteness of the underlying set is unnecessary. Let Σ now be an arbitrary Polish space (e.g., ℝ^d), let M_1(Σ) be the set of all probability measures on Σ equipped with the weak topology, and let X_1, X_2, … again be a sequence of i.i.d. Σ-valued random variables with distribution μ ∈ M_1(Σ). Every measure ν that is absolutely continuous with respect to μ has a Radon–Nikodým density f = dν/dμ, and the divergence is then defined by D(ν‖μ) = ∫_Σ f log f dμ; for all other measures one sets D(ν‖μ) = ∞. Note that the empirical measures L_n need not be absolutely continuous with respect to μ.
Under these adapted assumptions, Sanov's theorem again states that for every set Γ ⊆ M_1(Σ),

−inf_{ν ∈ int Γ} D(ν‖μ) ≤ liminf_{n→∞} (1/n) log P(L_n ∈ Γ) ≤ limsup_{n→∞} (1/n) log P(L_n ∈ Γ) ≤ −inf_{ν ∈ cl Γ} D(ν‖μ).
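In the general case the divergence is computed from the Radon–Nikodým density. As a sketch (the two normal distributions are an assumed example; for ν = N(1, 1) and μ = N(0, 1) the closed form D(ν‖μ) = 1/2 nat is known), the integral ∫ f log f dμ = ∫ ν(x) log(ν(x)/μ(x)) dx can be approximated numerically:

```python
import math

def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# D(nu || mu) = integral over R of nu(x) * log(nu(x) / mu(x)) dx,
# approximated by a Riemann sum over [-10, 12]; the tails beyond are negligible.
dx = 1e-3
numeric = sum(
    normal_pdf(x, 1, 1) * math.log(normal_pdf(x, 1, 1) / normal_pdf(x, 0, 1)) * dx
    for x in (-10 + i * dx for i in range(int(22 / dx)))
)
print(round(numeric, 4))  # close to the closed-form value 0.5
```

The closed form used for the comparison follows from D(N(m, 1)‖N(0, 1)) = m²/2 in nats.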
References

Ivan N. Sanov: On the Probability of Large Deviations of Random Variables. In: Institute of Statistics Mimeo Series. No. 192, North Carolina State University, Department of Statistics, 1958 (English; translated by Dana E. A. Quade from the Russian original: О вероятности больших отклонений случайных величин, 1957).

Thomas M. Cover, Joy A. Thomas: Elements of Information Theory. 2nd edition. John Wiley & Sons, Hoboken, NJ 2006, p. 362 f.

Imre Csiszár, František Matúš: Information Projections Revisited. In: IEEE Transactions on Information Theory. Vol. 49, no. 6, 2003, pp. 1474–1490, doi:10.1109/TIT.2003.810633.