# Chebyshev's inequality

Chebyshev's inequality, also known as the Bienaymé–Chebyshev inequality, is an inequality in probability theory, a branch of mathematics. It is named after Irénée-Jules Bienaymé and Pafnuty Lvovich Chebyshev, whose name appears in the literature in various transliterations (e.g. Tschebyscheff, Tschebyschew or Čebyšëv). Chebyshev's inequality bounds the probability that a random variable deviates from its expected value by more than a given threshold in terms of its variance.

## Statement

Let ${\displaystyle X}$ be a random variable with expected value

${\displaystyle \mu :=\operatorname {E} (X)}$

and finite variance

${\displaystyle \sigma ^{2}:=\operatorname {Var} (X)}$.

Then for all real numbers ${\displaystyle k>0}$:

${\displaystyle \operatorname {P} \left(\left|X-\mu \right|\geq k\right)\leq {\frac {\sigma ^{2}}{k^{2}}}}$.

By passing to the complementary event, one obtains

${\displaystyle \operatorname {P} \left(\left|X-\mu \right|<k\right)\geq 1-{\frac {\sigma ^{2}}{k^{2}}}}$.
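As an illustrative sketch (not from the original text; the distribution is an arbitrary choice), the bound can be checked empirically, here for a uniform random variable:

```python
import random

# Empirically check Chebyshev's inequality P(|X - mu| >= k) <= sigma^2 / k^2
# for X ~ U(0, 1), whose mean and variance are known in closed form.
random.seed(0)
n = 100_000
samples = [random.uniform(0.0, 1.0) for _ in range(n)]

mu = 0.5             # expected value of U(0, 1)
sigma2 = 1.0 / 12.0  # variance of U(0, 1)

k = 0.4
empirical = sum(abs(x - mu) >= k for x in samples) / n  # true value is 0.2
bound = sigma2 / k**2                                   # ~ 0.5208

print(f"empirical tail: {empirical:.4f}, Chebyshev bound: {bound:.4f}")
assert empirical <= bound
```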

## Goodness of estimate

The bounds given by Chebyshev's inequality are sharp in the sense that there exist random variables for which equality holds.

This is the case, for example, for a discrete random variable ${\displaystyle X}$ with

${\displaystyle \operatorname {P} \left(X=0\right)=1-p}$

and

${\displaystyle \operatorname {P} \left(X=-a\right)=\operatorname {P} \left(X=a\right)=p/2}$,

where ${\displaystyle a}$ is a strictly positive real number and ${\displaystyle p\in (0,1)}$. Then ${\displaystyle \mu =\operatorname {E} (X)=0}$ and ${\displaystyle \sigma ^{2}=\operatorname {Var} (X)=a^{2}p}$, so the estimate yields

${\displaystyle P(|X-0|\geq k)\leq {\frac {a^{2}p}{k^{2}}}}$,

which holds with equality for ${\displaystyle k=a}$, since then ${\displaystyle P(|X|\geq k)=P(|X|\geq a)=p}$.

In general, however, the estimates are rather weak. For example, for ${\displaystyle k\leq \sigma }$ they are trivial. Nevertheless, the theorem is often useful because it requires no distributional assumptions about the random variables and can therefore be applied to all distributions with finite variance (in particular those that differ greatly from the normal distribution). Moreover, the bounds are easy to compute.
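The sharpness claim can be verified directly for the three-point distribution above; the values of `a` and `p` in this sketch are arbitrary choices:

```python
# X takes value 0 with probability 1-p and values -a, +a with probability p/2
# each; at k = a, Chebyshev's bound is attained with equality.
a, p = 3.0, 0.2  # arbitrary: a > 0, p in (0, 1)

mu = (1 - p) * 0.0 + (p / 2) * (-a) + (p / 2) * a  # = 0
var = (p / 2) * a**2 + (p / 2) * a**2              # = a^2 * p

k = a
tail = p              # P(|X| >= a) = p exactly
bound = var / k**2    # = a^2 p / a^2 = p

assert mu == 0.0
assert abs(tail - bound) < 1e-12  # equality: the bound is sharp here
```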

## Variants

### Deviations expressed by the standard deviation

If the standard deviation ${\displaystyle \sigma }$ is nonzero and ${\displaystyle \lambda }$ is a positive number, setting ${\displaystyle k=\lambda \sigma }$ yields an often-cited variant of Chebyshev's inequality:

${\displaystyle \operatorname {P} \left(\left|X-\mu \right|\geq \lambda \sigma \right)\leq {\frac {1}{\lambda ^{2}}}}$.

This inequality provides a meaningful estimate only for ${\displaystyle \lambda >1}$; for ${\displaystyle 0<\lambda \leq 1}$ it is trivial, because probabilities are always bounded by 1.
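To see how conservative the distribution-free bound is, one can compare it with the exact two-sided tail of a standard normal distribution, a minimal sketch:

```python
import math

# Compare the distribution-free bound 1/lambda^2 with the exact two-sided
# standard normal tail, P(|Z| >= lambda) = erfc(lambda / sqrt(2)).
for lam in (1.0, 2.0, 3.0):
    chebyshev = min(1.0, 1.0 / lam**2)
    normal_tail = math.erfc(lam / math.sqrt(2.0))
    print(f"lambda={lam}: Chebyshev bound {chebyshev:.4f}, "
          f"normal tail {normal_tail:.4f}")
    assert normal_tail <= chebyshev
```

For ${\displaystyle \lambda =2}$, for example, Chebyshev gives 0.25 while the exact normal tail is about 0.0455.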

### Generalization to higher moments

Chebyshev's inequality can be generalized to higher moments. The generalized inequality is often itself simply called Chebyshev's inequality (English *Chebyshev's inequality*), while in the context of probability theory it is sometimes called the Markov inequality (English *Markov's inequality*). Some authors also refer to the generalized inequality as the Chebyshev–Markov inequality.

The generalized inequality states that for a measure space ${\displaystyle (\Omega ,\Sigma ,\nu )}$, a measurable function ${\displaystyle f\colon \Omega \to \mathbb {R} _{0}^{+}}$ and ${\displaystyle \varepsilon ,p\in \mathbb {R} ^{+}}$, the inequality

${\displaystyle \nu (\{x\mid f(x)\geq \varepsilon \})\leq {\frac {1}{\varepsilon ^{p}}}\int _{\Omega }f^{p}\,{\rm {d}}\nu }$

holds.

This follows from

${\displaystyle \int _{\Omega }f^{p}\;{\rm {d}}\nu \geq \int _{\{x\mid f(x)\geq \varepsilon \}}f^{p}\;{\rm {d}}\nu \geq \int _{\{x\mid f(x)\geq \varepsilon \}}\varepsilon ^{p}\;{\rm {d}}\nu =\varepsilon ^{p}\nu (\{x\mid f(x)\geq \varepsilon \})}$

The version of the inequality stated above is obtained as a special case by setting ${\displaystyle \nu =P}$, ${\displaystyle f=|X-\mu |}$ and ${\displaystyle p=2}$, because then

${\displaystyle P(|X-\mu |\geq k)=P(|X-\mu |^{2}\geq k^{2})\leq {\frac {1}{k^{2}}}\int _{\Omega }|X-\mu |^{2}\;{\rm {d}}P={\frac {\sigma ^{2}}{k^{2}}}}$.
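The generalized inequality holds in particular for the empirical measure of a finite sample, which allows a quick numerical check (the sample distribution below is an arbitrary choice):

```python
import random

# Check nu({f >= eps}) <= (1/eps^p) * integral of f^p d(nu) where nu is the
# empirical measure of nonnegative samples: the measure of a set is the
# fraction of samples in it, and the integral is a sample average.
random.seed(1)
values = [random.expovariate(1.0) for _ in range(50_000)]  # f(x) >= 0

eps = 2.0
for p in (1, 2, 3):
    measure = sum(v >= eps for v in values) / len(values)
    moment_bound = sum(v**p for v in values) / len(values) / eps**p
    print(f"p={p}: measure {measure:.4f}, moment bound {moment_bound:.4f}")
    assert measure <= moment_bound  # holds exactly, by the inequality above
```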

### Exponential Chebyshev inequality

The fact that the generalization holds for all positive exponents ${\displaystyle p}$ simultaneously can be used to prove the so-called exponential Chebyshev inequality. Let ${\displaystyle X\sim P}$ be a real random variable distributed according to ${\displaystyle P}$, and let ${\displaystyle a\in \mathbb {R} }$ be a real number. In the notation above, we now set ${\displaystyle \nu =P}$, ${\displaystyle \varepsilon =\mathrm {e} ^{a}}$ and ${\displaystyle f(x)=\mathrm {e} ^{x}}$ and obtain

${\displaystyle P(X\geq a)=P(\mathrm {e} ^{X}\geq \varepsilon )\leq \inf _{p\in \mathbb {R} ^{+}}{\frac {1}{\varepsilon ^{p}}}\int _{\mathbb {R} }\mathrm {e} ^{px}\,\mathrm {d} P=\inf _{p\in \mathbb {R} ^{+}}{\frac {E(\mathrm {e} ^{pX})}{\mathrm {e} ^{pa}}}.}$

The numerator is the moment-generating function ${\displaystyle M_{X}(p)=E(\mathrm {e} ^{pX})}$ of ${\displaystyle X}$. Applying the exponential Chebyshev inequality to a sum of independent and identically distributed random variables is the decisive step in the proof of the Chernoff inequality.
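For a standard normal variable the moment-generating function is known in closed form, ${\displaystyle M_{X}(p)=\mathrm {e} ^{p^{2}/2}}$, so the infimum can be approximated by a simple grid search, a sketch:

```python
import math

# Exponential Chebyshev bound for X ~ N(0, 1): minimize M_X(p) / e^{pa}
# = exp(p^2/2 - p*a) over p > 0.  The exact minimizer is p = a, giving
# the bound exp(-a^2 / 2); the grid search below recovers it.
a = 2.0
bound = min(math.exp(p * p / 2.0 - p * a)
            for p in (i / 1000.0 for i in range(1, 10_000)))

exact_tail = 0.5 * math.erfc(a / math.sqrt(2.0))  # P(X >= a)
print(f"Chernoff-type bound: {bound:.4f}, exact tail: {exact_tail:.4f}")
assert exact_tail <= bound
```

For ${\displaystyle a=2}$ this gives the bound ${\displaystyle \mathrm {e} ^{-2}\approx 0.135}$, already much better than the plain Chebyshev bound ${\displaystyle 1/4}$ for this deviation.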

## History

In most textbooks the inequality bears only the name of Pafnuty Lvovich Chebyshev. He published his proof for discrete random variables in 1867 simultaneously in St. Petersburg and in Paris, the latter in Joseph Liouville's journal Journal de Mathématiques Pures et Appliquées. A more general proof, however, had already been given by Irénée-Jules Bienaymé in 1853 in the paper Considérations à l'appui de la découverte de Laplace sur la loi de probabilité dans la méthode des moindres carrés. That paper was even reprinted in the same journal shortly before Chebyshev's publication in Liouville's journal. In a later publication, Chebyshev acknowledged Bienaymé's earlier publication.

## Applications

• Chebyshev's inequality enters into the proofs of the Borel–Cantelli lemma and the weak law of large numbers.
• The generalization to higher moments can be used to show that ${\displaystyle L^{p}}$-convergence of a sequence of functions implies convergence in measure.
• For the median ${\displaystyle m}$, the bound ${\displaystyle \left|\mu -m\right|\leq \sigma }$ holds.

## Examples

### Example 1

For example, suppose the length of Wikipedia articles has an expected value of 1000 characters with a standard deviation of 200 characters. From Chebyshev's inequality one can then deduce that with a probability of at least 75% a Wikipedia article is between 600 and 1400 characters long (${\displaystyle k=400,~\mu =1000,~\sigma =200}$).

The probability value is calculated in the following way:

${\displaystyle \operatorname {P} \left(\left|X-1000\right|<400\right)\geq 1-{\frac {200^{2}}{400^{2}}}=0.75=75\,\%}$
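The same computation as a tiny sketch:

```python
# The computation from Example 1; mu, sigma and k are taken from the text.
mu, sigma, k = 1000, 200, 400
lower_bound = 1 - sigma**2 / k**2  # P(|X - mu| < k) is at least this value
print(lower_bound)  # 0.75
```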

### Example 2

Another consequence of the theorem is that for every probability distribution with mean ${\displaystyle \mu }$ and finite standard deviation ${\displaystyle \sigma }$, at least half of the values lie in the interval ${\displaystyle (\mu -{\sqrt {2}}\sigma ,\mu +{\sqrt {2}}\sigma )}$ (${\displaystyle k^{2}=2\sigma ^{2}}$).

### Example 3

Suppose a random event occurs in a trial with probability ${\displaystyle p}$. The trial is repeated ${\displaystyle n}$ times, and the event occurs ${\displaystyle k}$ times. Then ${\displaystyle k}$ is binomially distributed with expected value ${\displaystyle np}$ and variance ${\displaystyle np(1-p)}$; the relative frequency ${\displaystyle {\tfrac {k}{n}}}$ therefore has expected value ${\displaystyle p}$ and variance ${\displaystyle {\tfrac {p(1-p)}{n}}}$. For the deviation of the relative frequency from its expected value, Chebyshev's inequality yields

${\displaystyle \operatorname {P} \left(\left|{\frac {k}{n}}-p\right|\geq \epsilon \right)\leq {\frac {p(1-p)}{\epsilon ^{2}n}}\leq {\frac {1}{4\epsilon ^{2}n}}}$,

where for the second estimate the relation ${\displaystyle {\sqrt {p(1-p)}}\leq {\tfrac {1}{2}}}$, which follows directly from the inequality of the arithmetic and geometric means, was used.

This formula is a special case of the weak law of large numbers, which shows the stochastic convergence of the relative frequencies to the expected value.

Chebyshev's inequality provides only a rough estimate for this example; the Chernoff inequality yields a quantitative improvement.
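A small simulation sketch (with arbitrarily chosen parameters) illustrates how loose the bound from this example typically is:

```python
import random

# Simulate the relative frequency k/n of an event with probability p over
# n trials, and compare the probability of a deviation of at least eps
# with the distribution-free bound 1 / (4 * eps^2 * n) from the text.
random.seed(2)
p, n, eps, runs = 0.3, 200, 0.1, 20_000  # arbitrary parameter choices

deviations = 0
for _ in range(runs):
    k = sum(random.random() < p for _ in range(n))
    if abs(k / n - p) >= eps:
        deviations += 1

empirical = deviations / runs
bound = 1.0 / (4.0 * eps**2 * n)  # = 0.125 for these parameters
print(f"empirical: {empirical:.4f}, Chebyshev bound: {bound:.4f}")
assert empirical <= bound
```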

## Proof sketch

Most authors present Chebyshev's inequality as a special case of the Markov inequality

${\displaystyle P\left(Y\geq k\right)\leq {\frac {\operatorname {E} \left(h(Y)\right)}{h(k)}}.}$

with ${\displaystyle Y=|X-\mu |}$ and the function ${\displaystyle h(x)=x^{2}}$.

How the Markov inequality can be derived by school-level means from an immediately transparent comparison of areas, and this version of Chebyshev's inequality then deduced from it, can be found in Wirths, for example. For a direct proof, one defines

${\displaystyle A_{k}=\{\omega \in \Omega \mid |X-\mu |\geq k\}}$.

If ${\displaystyle \mathbf {1} _{A}}$ denotes the indicator function of the set ${\displaystyle A}$, then for all ${\displaystyle \omega }$ the inequality

${\displaystyle |X(\omega )-\mu |^{2}\geq k^{2}\mathbf {1} _{A_{k}}(\omega )}$

holds.

Indeed, if ${\displaystyle \omega \notin A_{k}}$, the right-hand side is zero and the inequality is satisfied. If ${\displaystyle \omega \in A_{k}}$, then by the definition of the sets ${\displaystyle A_{k}}$ the left-hand side is at least ${\displaystyle k^{2}}$, and the inequality is again satisfied. By the monotonicity of the expected value and its elementary calculation rules, the definition of the variance then yields

${\displaystyle \sigma ^{2}=\operatorname {Var} (X)=\operatorname {E} (|X-\mu |^{2})\geq \operatorname {E} (k^{2}\mathbf {1} _{A_{k}})=k^{2}P(A_{k})=k^{2}P(|X-\mu |\geq k)}$.

Dividing by ${\displaystyle k^{2}}$ yields the inequality.
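The two steps of the direct proof, the pointwise indicator inequality and taking expectations, can be checked numerically on a sample (a sketch; the normal distribution is an arbitrary choice):

```python
import random

# Verify the pointwise inequality |X - mu|^2 >= k^2 * 1_{A_k} from the
# proof for every sampled omega, and the variance bound it implies for
# the empirical distribution of the sample.
random.seed(3)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
mu, k = 0.0, 1.5  # true mean of N(0, 1) and an arbitrary threshold

for x in samples:
    indicator = 1.0 if abs(x - mu) >= k else 0.0
    assert (x - mu)**2 >= k**2 * indicator  # holds for every single omega

second_moment = sum((x - mu)**2 for x in samples) / len(samples)
tail = sum(abs(x - mu) >= k for x in samples) / len(samples)
assert second_moment >= k**2 * tail  # empirical Chebyshev inequality
```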

## Literature


1. Norbert Henze: Stochastics for Beginners. An introduction to the fascinating world of chance. 10th edition. Springer Spektrum, Wiesbaden 2013, ISBN 978-3-658-03076-6, p. 165, doi:10.1007/978-3-658-03077-3.
2. Hans-Otto Georgii: Stochastics. Introduction to probability theory and statistics. 4th edition. Walter de Gruyter, Berlin 2009, ISBN 978-3-11-021526-7, p. 112, doi:10.1515/9783110215274.
3. Robert B. Ash: Real Analysis and Probability. 1972, pp. 84–85, 227.
4. A. N. Širjaev: Probability. 1988, p. 572.
5. R. G. Laha, V. K. Rohatgi: Probability Theory. 1979, p. 33.
6. Heinz Bauer: Measure and Integration Theory. 1992, p. 128.
7. Matthias Löwe: Large Deviations. (PDF; 418 KB) Westfälische Wilhelms-Universität Münster, Institute for Mathematical Stochastics, p. 4.
8. Chebyshev, Pafnutii Lvovich. In: Michiel Hazewinkel (ed.): Encyclopaedia of Mathematics. Springer-Verlag, Berlin 2002, ISBN 978-1-55608-010-4 (English, online).
9. V. V. Sazonov: Bienaymé, Irenée-Jules. In: Michiel Hazewinkel (ed.): Encyclopaedia of Mathematics. Springer-Verlag, Berlin 2002, ISBN 978-1-55608-010-4 (English, online).
10. Heinz Bauer: Probability Theory. 2002, p. 69 ff.
11. Achim Klenke: Probability Theory. 3rd edition. Springer-Verlag, Berlin/Heidelberg 2013, ISBN 978-3-642-36017-6, p. 110, doi:10.1007/978-3-642-36018-3.
12. Hans-Otto Georgii: Stochastics. Introduction to probability theory and statistics. 4th edition. Walter de Gruyter, Berlin 2009, ISBN 978-3-11-021526-7, p. 122, doi:10.1515/9783110215274.
13. Klaus D. Schmidt: Measure and Probability. 2nd, revised edition. Springer-Verlag, Heidelberg 2011, ISBN 978-3-642-21025-9, p. 288, doi:10.1007/978-3-642-21026-6.
14. H. Wirths: The expected value – sketches of concept development from grade 8 to 13. In: Mathematik in der Schule 1995, issue 6, pp. 330–343.
15. Ehrhard Behrends: Elementary Stochastics. A learning book, co-developed by students. Springer Spektrum, Wiesbaden 2013, ISBN 978-3-8348-1939-0, pp. 229–230, doi:10.1007/978-3-8348-2331-1.