F-distribution

The F-distribution or Fisher distribution, also Fisher–Snedecor distribution (after Ronald Aylmer Fisher and George W. Snedecor), is a continuous probability distribution. An F-distributed random variable arises as the quotient of two chi-square distributed random variables, each divided by its respective number of degrees of freedom. The F-distribution has two independent degrees of freedom as parameters and thus forms a two-parameter distribution family.

The F-distribution is often used in a test procedure (the F-test) to decide whether the difference between two sample variances is due to random variation or whether it points to different populations. In analysis of variance, an F-statistic is likewise used to test for significant differences between populations (groups).

Definition

Density function of the F-distribution for selected degrees of freedom ${\displaystyle m}$ and ${\displaystyle n}$
Distribution function of the F-distribution for selected degrees of freedom ${\displaystyle m}$ and ${\displaystyle n}$

A continuous random variable follows the F-distribution ${\displaystyle F(m,n)}$, with ${\displaystyle m}$ degrees of freedom in the numerator and ${\displaystyle n}$ degrees of freedom in the denominator, if it has the probability density

${\displaystyle f(x\mid m,n)=m^{\frac {m}{2}}n^{\frac {n}{2}}\cdot {\frac {\Gamma ({\frac {m+n}{2}})}{\Gamma ({\frac {m}{2}})\Gamma ({\frac {n}{2}})}}\cdot {\frac {x^{{\frac {m}{2}}-1}}{(mx+n)^{\frac {m+n}{2}}}},\quad x>0.}$

Here ${\displaystyle \Gamma (x)}$ denotes the gamma function at the point ${\displaystyle x}$.
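As a numerical sanity check, the density formula can be compared against SciPy's F-distribution (a sketch; the helper name `f_pdf` and the chosen parameters are ours):

```python
import numpy as np
from scipy.stats import f
from scipy.special import gamma

def f_pdf(x, m, n):
    """Density of F(m, n), transcribing the formula above."""
    return (m**(m/2) * n**(n/2)
            * gamma((m + n)/2) / (gamma(m/2) * gamma(n/2))
            * x**(m/2 - 1) / (m*x + n)**((m + n)/2))

m, n = 5, 8
x = np.linspace(0.1, 4.0, 40)
# Maximum deviation from scipy.stats.f.pdf over a grid of points
max_err = np.max(np.abs(f_pdf(x, m, n) - f.pdf(x, m, n)))
```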

Historically, the following definition forms the origin of the F-distribution, as the distribution of the quantity

${\displaystyle F_{m,n}={\frac {\chi _{m}^{2}/m}{\chi _{n}^{2}/n}},}$

where ${\displaystyle \chi _{m}^{2}}$ and ${\displaystyle \chi _{n}^{2}}$ are independent chi-square distributed random variables with ${\displaystyle m}$ and ${\displaystyle n}$ degrees of freedom, respectively.
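This historical definition can be illustrated by simulation (a sketch; sample size, seed and degrees of freedom are arbitrary choices of ours):

```python
from scipy.stats import chi2, f, kstest
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 10
size = 100_000
# Quotient of two independent chi-square variables, each divided by its df
samples = (chi2.rvs(m, size=size, random_state=rng) / m) / (
           chi2.rvs(n, size=size, random_state=rng) / n)
# Kolmogorov-Smirnov distance to the F(m, n) distribution should be small
ks_stat = kstest(samples, f(m, n).cdf).statistic
```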

Properties

Expected value

The expected value exists only for ${\displaystyle n>2}$ and then has the value

${\displaystyle \operatorname {E} (F_{m,n})={\frac {n}{n-2}}}$.

Variance

The variance is only defined for ${\displaystyle n>4}$ and then reads

${\displaystyle \operatorname {Var} (F_{m,n})={\frac {2n^{2}(m+n-2)}{m(n-2)^{2}(n-4)}}}$.
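Both moment formulas can be checked against SciPy (a sketch; the degrees of freedom are example values of ours):

```python
from scipy.stats import f

m, n = 4, 9
# SciPy's analytic mean and variance of F(m, n)
mean, var = f.stats(m, n, moments='mv')
mean_formula = n / (n - 2)
var_formula = 2*n**2*(m + n - 2) / (m*(n - 2)**2*(n - 4))
```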

Distribution function

The values of the distribution function are usually determined numerically and presented in a table. A complete tabulation for all degrees of freedom is generally not necessary, so most distribution tables give the quantiles ${\displaystyle P(X\leq x)=F(x|m;n)}$ for selected degrees of freedom and probabilities. The following relationship is also used here:

${\displaystyle F^{-1}(p;m;n)={\frac {1}{F^{-1}(1-p;n;m)}},}$

where ${\displaystyle F^{-1}(p;m;n)}$ denotes the ${\displaystyle p}$-quantile of the F-distribution with ${\displaystyle m}$ and ${\displaystyle n}$ degrees of freedom.
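The reciprocal relationship between the quantiles with swapped degrees of freedom can be verified numerically (a sketch; parameter values are ours):

```python
from scipy.stats import f

m, n, p = 5, 12, 0.95
q_upper = f.ppf(p, m, n)        # p-quantile of F(m, n)
q_swapped = f.ppf(1 - p, n, m)  # (1-p)-quantile with swapped degrees of freedom
# The two quantiles should be reciprocals of each other
diff = abs(q_upper - 1/q_swapped)
```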

The distribution function of the F-distribution can be expressed in closed form as

${\displaystyle F(x|m;n)=I\left({\frac {m\cdot x}{m\cdot x+n}},{\frac {m}{2}},{\frac {n}{2}}\right),}$

where ${\displaystyle I(z,a,b)={\frac {1}{B(a,b)}}\cdot \int _{0}^{z}t^{a-1}(1-t)^{b-1}\,\mathrm {d} t}$ is the regularized incomplete beta function.
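This closed form maps directly onto SciPy's `betainc`, whose argument order is `betainc(a, b, z)` for ${\displaystyle I(z,a,b)}$ (a sketch; the evaluation point is an arbitrary choice of ours):

```python
from scipy.stats import f
from scipy.special import betainc

m, n, x = 3, 7, 1.5
# Closed-form CDF via the regularized incomplete beta function
cdf_closed = betainc(m/2, n/2, m*x/(m*x + n))
cdf_scipy = f.cdf(x, m, n)
```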

Maximum

For ${\displaystyle m>2}$, the density ${\displaystyle f}$ attains its maximum at the point

${\displaystyle x_{\mathrm {max} }={\frac {n(m-2)}{m(n+2)}}}$.
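The mode formula can be checked by numerically maximizing the density (a sketch; the parameters and the search interval are choices of ours):

```python
from scipy.optimize import minimize_scalar
from scipy.stats import f

m, n = 6, 10
x_max = n*(m - 2) / (m*(n + 2))   # mode according to the formula above
# Numerically maximize the density; the minimizer of -pdf is the mode
res = minimize_scalar(lambda x: -f.pdf(x, m, n),
                      bounds=(1e-9, 10.0), method='bounded')
```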

Entropy

The entropy of the F-distribution (expressed in nats) is

${\displaystyle H(X)=\ln \left({\frac {n}{m}}\cdot {\frac {\Gamma \left({\frac {m}{2}}\right)\Gamma \left({\frac {n}{2}}\right)}{\Gamma \left({\frac {m}{2}}+{\frac {n}{2}}\right)}}\right)+\left(1-{\frac {m}{2}}\right)\psi \left({\frac {m}{2}}\right)-\left(1+{\frac {n}{2}}\right)\psi \left({\frac {n}{2}}\right)+{\frac {m+n}{2}}\psi \left({\frac {m+n}{2}}\right),}$

where ${\displaystyle \psi }$ denotes the digamma function.
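The entropy formula can be compared with SciPy's numerically computed entropy (a sketch; `f_entropy` is our helper name, the log-gamma form is used for numerical stability, and the parameters are example values):

```python
import numpy as np
from scipy.stats import f
from scipy.special import gammaln, digamma

def f_entropy(m, n):
    """Entropy of F(m, n) in nats, transcribing the formula above."""
    return (np.log(n/m) + gammaln(m/2) + gammaln(n/2) - gammaln((m + n)/2)
            + (1 - m/2)*digamma(m/2)
            - (1 + n/2)*digamma(n/2)
            + (m + n)/2*digamma((m + n)/2))

m, n = 5, 8
err = abs(f_entropy(m, n) - f.entropy(m, n))
```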

Relationships with other distributions

In the following, the symbol ${\displaystyle \sim }$ means "is distributed as".

Relationship to the beta distribution

The random variable

${\displaystyle Y={\frac {{\frac {m}{n}}F_{m,n}}{1+{\frac {m}{n}}F_{m,n}}}}$

is beta-distributed with parameters ${\displaystyle m/2}$ and ${\displaystyle n/2}$, i.e. ${\displaystyle Y\sim \operatorname {Beta} (m/2,n/2)}$, and the following applies:

${\displaystyle Y\sim {\frac {\chi _{m}^{2}}{\chi _{m}^{2}+\chi _{n}^{2}}},}$

where ${\displaystyle \chi _{m}^{2}}$ and ${\displaystyle \chi _{n}^{2}}$ are independent chi-square distributed random variables with ${\displaystyle m}$ and ${\displaystyle n}$ degrees of freedom, respectively.
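The beta relation is equivalent to the CDF identity ${\displaystyle F(x|m;n)=\operatorname {Beta} _{m/2,n/2}{\text{-CDF}}\left({\tfrac {mx}{mx+n}}\right)}$, which can be checked directly, alongside a simulation (a sketch; sample size, seed and parameters are ours):

```python
import numpy as np
from scipy.stats import beta, f, kstest

rng = np.random.default_rng(1)
m, n = 7, 9
# Transform F-distributed samples; the result should be Beta(m/2, n/2)
F_samples = f.rvs(m, n, size=100_000, random_state=rng)
Y = (m/n*F_samples) / (1 + m/n*F_samples)
ks_stat = kstest(Y, beta(m/2, n/2).cdf).statistic
# Exact CDF identity at a single point
x = 1.3
cdf_diff = abs(f.cdf(x, m, n) - beta.cdf(m*x/(m*x + n), m/2, n/2))
```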

Relationship to the chi-square distribution

From the independent chi-square distributed random variables ${\displaystyle \chi _{m}^{2}}$ and ${\displaystyle \chi _{n}^{2}}$ with ${\displaystyle m}$ and ${\displaystyle n}$ degrees of freedom, respectively, one can construct the quotient

${\displaystyle F_{m,n}={\frac {\chi _{m}^{2}/m}{\chi _{n}^{2}/n}}.}$

This random variable is ${\displaystyle F(m,n)}$-distributed.

Relationship to the non-central F-distribution

For independent random variables ${\displaystyle X\sim \chi ^{2}(\delta ,m)}$ and ${\displaystyle Y\sim \chi ^{2}(n)}$, the quotient

${\displaystyle Z={\frac {X/m}{Y/n}}}$

is distributed according to the non-central F-distribution with non-centrality parameter ${\displaystyle \delta }$, i.e. ${\displaystyle Z\sim F(\delta ,m,n)}$. Here ${\displaystyle \chi ^{2}(\delta ,\,m)}$ denotes a non-central chi-square distribution with non-centrality parameter ${\displaystyle \delta }$ and ${\displaystyle m}$ degrees of freedom. For ${\displaystyle \delta =0}$, the central F-distribution ${\displaystyle F(m,\,n)}$ results.

The density of the non-central F-distribution is

${\displaystyle g(z|m,n,\delta )=f(z|m,n)\cdot e^{-\delta /2}{}_{1}{\mathcal {F}}_{1}\left({\frac {m+n}{2}},{\frac {m}{2}},{\frac {m\cdot z\cdot \delta }{2(m\cdot z+n)}}\right).}$

The function ${\displaystyle {}_{1}{\mathcal {F}}_{1}(a,b,x)}$ is a special hypergeometric function, also called Kummer's function, and ${\displaystyle f(x|m,n)}$ denotes the density of the central F-distribution given above.

The expected value and variance of the non-central F-distribution are given by

${\displaystyle {\frac {n(1+\delta /m)}{n-2}}}$ for ${\displaystyle n>2}$

and

${\displaystyle {\frac {2n^{2}(m(1+\delta /m)^{2}+(n-2)(1+2\delta /m))}{m(n-2)^{2}(n-4)}}}$ for ${\displaystyle n>4.}$

For ${\displaystyle \delta \to 0}$, both reduce to the formulas of the central F-distribution.
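The non-central moment formulas can be compared with SciPy's `ncf` distribution, which is parameterized as `ncf(dfn, dfd, nc)` (a sketch; the parameter values are example choices of ours):

```python
from scipy.stats import ncf

m, n, delta = 5, 12, 3.0
# Mean and variance of the non-central F-distribution from SciPy
mean, var = ncf.stats(m, n, delta, moments='mv')
mean_formula = n*(1 + delta/m) / (n - 2)
var_formula = (2*n**2*(m*(1 + delta/m)**2 + (n - 2)*(1 + 2*delta/m))
               / (m*(n - 2)**2*(n - 4)))
```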

Relationship to the normal distribution

If ${\displaystyle X_{1},X_{2},\dotsc ,X_{m},Y_{1},Y_{2},\dotsc ,Y_{n}}$ are independent normally distributed random variables with the parameters

${\displaystyle \operatorname {E} (X_{i})=\mu ,\quad \operatorname {Var} (X_{i})=\sigma ^{2}}$
${\displaystyle \operatorname {E} (Y_{j})=\nu ,\quad \operatorname {Var} (Y_{j})=\tau ^{2},}$

then the respective sample variances ${\displaystyle S_{X}^{2}}$ and ${\displaystyle S_{Y}^{2}}$ are independent, and the following applies:

${\displaystyle {\frac {S_{X}^{2}}{\sigma ^{2}}}\sim \chi _{m-1}^{2}/(m-1)}$
${\displaystyle {\frac {S_{Y}^{2}}{\tau ^{2}}}\sim \chi _{n-1}^{2}/(n-1)}$

Therefore the random variable

${\displaystyle F={\frac {S_{X}^{2}/\sigma ^{2}}{S_{Y}^{2}/\tau ^{2}}}}$

follows an F-distribution with ${\displaystyle m-1}$ degrees of freedom in the numerator and ${\displaystyle n-1}$ degrees of freedom in the denominator.
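This relationship, which underlies the F-test, can be illustrated by simulation (a sketch; the sample sizes, means, variances and seed are arbitrary choices of ours):

```python
import numpy as np
from scipy.stats import f, kstest

rng = np.random.default_rng(2)
m, n = 8, 12
sigma2, tau2 = 2.0, 5.0          # example population variances
reps = 20_000
X = rng.normal(1.0, np.sqrt(sigma2), size=(reps, m))
Y = rng.normal(-3.0, np.sqrt(tau2), size=(reps, n))
# Ratio of scaled sample variances should follow F(m-1, n-1)
ratio = (X.var(axis=1, ddof=1)/sigma2) / (Y.var(axis=1, ddof=1)/tau2)
ks_stat = kstest(ratio, f(m - 1, n - 1).cdf).statistic
```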

Relationship to Student's t-distribution

If ${\displaystyle X\sim t_{n}}$ (Student's t-distribution), then ${\displaystyle X^{2}\sim F(1,n)}$.

The square of a t-distributed random variable with ${\displaystyle n}$ degrees of freedom thus follows an F-distribution with ${\displaystyle m=1}$ and ${\displaystyle n}$ degrees of freedom.
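Since ${\displaystyle P(T^{2}\leq x)=P(-{\sqrt {x}}\leq T\leq {\sqrt {x}})=2F_{t}({\sqrt {x}})-1}$, the relation can be checked as a CDF identity (a sketch; the evaluation point is ours):

```python
from scipy.stats import t, f

n, x = 9, 2.5
# P(T^2 <= x) expressed through the t-distribution CDF
p_t_squared = 2*t.cdf(x**0.5, n) - 1
# Same probability from the F(1, n) distribution
p_f = f.cdf(x, 1, n)
```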

Derivation of the density

The probability density of the F-distribution can be derived (cf. the derivation of the density of Student's t-distribution) from the joint density of the two independent random variables ${\displaystyle \chi _{m}^{2}}$ and ${\displaystyle \chi _{n}^{2}}$, both of which are chi-square distributed:

${\displaystyle g_{\chi _{m}^{2},\chi _{n}^{2}}(x,y)=\left({\frac {1}{2^{\frac {m}{2}}\Gamma ({\tfrac {m}{2}})}}x^{{\frac {m}{2}}-1}\exp \left\{-{\frac {x}{2}}\right\}\right)\cdot \left({\frac {1}{2^{\frac {n}{2}}\Gamma ({\tfrac {n}{2}})}}y^{{\frac {n}{2}}-1}\exp \left\{-{\frac {y}{2}}\right\}\right)}$.

With the transformation

${\displaystyle f={\frac {x/m}{y/n}},\quad v=y}$

we obtain the joint density of ${\displaystyle F={\frac {\chi _{m}^{2}/m}{\chi _{n}^{2}/n}}}$ and ${\displaystyle \chi _{n}^{2}}$, where ${\displaystyle f\geq 0}$ and ${\displaystyle v\geq 0}$ hold.

The Jacobian determinant of this transformation is:

${\displaystyle \det {\frac {\partial (x,y)}{\partial (f,v)}}={\begin{vmatrix}{\frac {m}{n}}v&0\\\Diamond &1\end{vmatrix}}={\frac {m}{n}}v}$

The value of ${\displaystyle \Diamond }$ does not matter, because it is multiplied by 0 in the calculation of the determinant. Thus the new density function reads

${\displaystyle g_{F,\chi _{n}^{2}}(f,v)={\frac {1}{2^{\frac {m}{2}}\Gamma ({\frac {m}{2}})}}\left(fv\,{\frac {m}{n}}\right)^{{\frac {m}{2}}-1}e^{-{\frac {1}{2}}(fv\,{\frac {m}{n}})}\cdot {\frac {1}{2^{\frac {n}{2}}\Gamma ({\frac {n}{2}})}}v^{{\frac {n}{2}}-1}e^{-{\frac {1}{2}}v}\cdot {\frac {m}{n}}v.}$

We now seek the marginal distribution ${\displaystyle g_{m,\,n}(f)}$ as the integral over the variable ${\displaystyle v}$, which is not of interest:

${\displaystyle g_{m,n}(f)=\int \limits _{0}^{\infty }g_{F,\chi _{n}^{2}}(f,v)\,dv={\frac {({\frac {m}{n}})^{\frac {m}{2}}f^{{\frac {m}{2}}-1}}{2^{\frac {m+n}{2}}\Gamma ({\frac {m}{2}})\Gamma ({\frac {n}{2}})}}\int \limits _{0}^{\infty }v^{{\frac {m+n}{2}}-1}e^{-{\frac {v}{2}}(1+{\frac {m}{n}}f)}\,dv=m^{\frac {m}{2}}n^{\frac {n}{2}}\cdot {\frac {\Gamma ({\frac {m}{2}}+{\frac {n}{2}})}{\Gamma ({\frac {m}{2}})\Gamma ({\frac {n}{2}})}}\cdot {\frac {f^{{\frac {m}{2}}-1}}{(mf+n)^{\frac {m+n}{2}}}}.}$

Quantile functions

The ${\displaystyle p}$-quantile ${\displaystyle x_{p}}$ of the F-distribution is the solution of the equation ${\displaystyle p=F(x_{p}|m,\,n)}$ and is therefore in principle to be computed via the inverse function. Specifically, the following holds:

${\displaystyle x_{p}={\frac {nI^{-1}(p,{\frac {m}{2}},{\frac {n}{2}})}{m(1-I^{-1}(p,{\frac {m}{2}},{\frac {n}{2}}))}}}$

with ${\displaystyle I^{-1}}$ as the inverse of the regularized incomplete beta function. The value ${\displaystyle x_{p}}$ can be found in the F-distribution table under the coordinates ${\displaystyle p}$, ${\displaystyle m}$ and ${\displaystyle n}$, or in the quantile table of the Fisher distribution.

For some values of ${\displaystyle m}$ and ${\displaystyle n}$, the quantile functions ${\displaystyle x_{p}(m,\,n)}$ can be computed explicitly. The beta integral ${\displaystyle I({\tfrac {mx}{mx+n}},{\tfrac {m}{2}},{\tfrac {n}{2}})}$ is solvable with invertible functions for a few indices ${\displaystyle m,n=1,2,\dotsc }$:

${\displaystyle {\begin{array}{c|c|c|c|c}m\downarrow ,\,n\rightarrow &1&2&3&4\\\hline 1&\tan({\frac {\pi }{2}}p)^{2}&{\frac {2p^{2}}{1-p^{2}}}&?&{\frac {4}{2\cos({\frac {2\arcsin(p)}{3}})-1}}-4\\\hline 2&{\frac {1}{2}}({\frac {1}{(1-p)^{2}}}-1)&{\frac {p}{1-p}}&{\frac {3}{2}}({\frac {1}{(1-p)^{2/3}}}-1)&{\frac {2}{\sqrt {1-p}}}-2\\\hline 3&?&{\frac {2p^{2/3}}{3-3p^{2/3}}}&?&?\\\hline 4&{\frac {1}{(4\sin({\frac {\arcsin(1-p)}{3}}))^{2}}}-{\frac {1}{4}}&{\frac {\sqrt {p}}{2(1-{\sqrt {p}})}}&?&{\frac {1}{{\frac {1}{2}}+\sin({\frac {\arcsin(1-2p)}{3}})}}-1\\\end{array}}}$

The general expressions for higher indices can be read off from the complete second row and second column. One finds:

${\displaystyle x_{p}(2,\,n)={\frac {n}{2}}\left({\frac {1}{(1-p)^{2/n}}}-1\right)}$
${\displaystyle x_{p}(m,\,2)={\frac {2}{m}}\left({\frac {p^{2/m}}{1-p^{2/m}}}\right)}$
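The general quantile formula via ${\displaystyle I^{-1}}$ and the closed form for ${\displaystyle m=2}$ can both be verified with SciPy's `betaincinv` (a sketch; the parameter values are example choices of ours):

```python
from scipy.stats import f
from scipy.special import betaincinv

m, n, p = 2, 10, 0.9
# Inverse of the regularized incomplete beta function, z = I^{-1}(p, m/2, n/2)
z = betaincinv(m/2, n/2, p)
x_p = n*z / (m*(1 - z))                  # quantile via the general formula
x_p_scipy = f.ppf(p, m, n)               # SciPy's quantile function
closed_form = n/2*((1 - p)**(-2/n) - 1)  # closed form x_p(2, n) from above
```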
