Gaussian test

In mathematical statistics, the Gauss test or Z test is a group of hypothesis tests whose test statistic is standard normally distributed under the null hypothesis. The test is named after Carl Friedrich Gauß.

The Gauss test uses sample means to test hypotheses about the expected values of the populations from which the samples are drawn.

The Gaussian test follows a procedure similar to that of the t-test. The most important difference lies in the requirements for applying the two tests: while the t-test works with the empirical standard deviations of the samples, the Gaussian test requires the standard deviations of the populations to be known. Furthermore, the Gauss test uses the standard normal distribution as the distribution of its test statistic, whereas the t-test uses the t-distribution. The Gauss test is therefore only of limited suitability for small samples.

Mathematical basics

If $X_1, X_2, \dots, X_n$ are independent, normally distributed random variables with expected value $\mu_X$ and standard deviation $\sigma_X$, then their arithmetic mean

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

is normally distributed with expected value $\mu_X$ and standard error $\sigma_X / \sqrt{n}$.

The sampling function

$$Z = \frac{\bar{X} - \mu_0}{\sigma_X} \sqrt{n}$$

is then standard normally distributed under the null hypothesis $\mu_X = \mu_0$ and is used as the test statistic.

The test statistic can be written as

$$Z = \frac{\bar{X} - \mu_X}{\sigma_X} \sqrt{n} + \frac{\mu_X - \mu_0}{\sigma_X} \sqrt{n} = Z_0 + \frac{\mu_X - \mu_0}{\sigma_X} \sqrt{n},$$

that is, as a standard normally distributed random variable $Z_0 = \frac{\bar{X} - \mu_X}{\sigma_X} \sqrt{n}$ plus a constant that measures, in standardized form, the distance between the true and the hypothesized expected value.
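The decomposition can be checked by simulation. The following Python sketch is illustrative only: the helper `simulate_z`, the seed and the parameter values (n = 22, μ_X = 15.5, μ_0 = 15, σ_X = 2) are assumptions chosen for the demonstration. It draws repeated normal samples whose true expected value differs from μ_0 and compares the empirical mean and standard deviation of Z with the shift term and with 1.

```python
import math
import random
import statistics


def simulate_z(n, mu_x, mu_0, sigma_x, reps, seed=0):
    """Hypothetical helper: draw `reps` samples of size n from N(mu_x, sigma_x^2)
    and return the statistics Z = (mean - mu_0) / sigma_x * sqrt(n)."""
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu_x, sigma_x) for _ in range(n))
        zs.append((x_bar - mu_0) / sigma_x * math.sqrt(n))
    return zs


# Illustrative parameters (assumed, not taken from the text above).
n, mu_x, mu_0, sigma_x = 22, 15.5, 15.0, 2.0

# Constant term of the decomposition: (mu_x - mu_0) / sigma_x * sqrt(n).
shift = (mu_x - mu_0) / sigma_x * math.sqrt(n)

zs = simulate_z(n, mu_x, mu_0, sigma_x, reps=20000)
# The empirical mean of Z should be close to `shift`, its standard deviation close to 1.
print(f"shift={shift:.3f}  mean(Z)={statistics.fmean(zs):.3f}  sd(Z)={statistics.stdev(zs):.3f}")
```

Setting `mu_x` equal to `mu_0` makes the shift zero, and the simulated values then follow the null distribution N(0, 1).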

If, in addition, $Y_1, Y_2, \dots, Y_m$ are independent, normally distributed random variables with expected value $\mu_Y$, standard deviation $\sigma_Y$ and arithmetic mean

$$\bar{Y} = \frac{1}{m} \sum_{i=1}^{m} Y_i$$

that are also independent of the sample $X_1, \dots, X_n$, then $\bar{X} - \bar{Y}$ is normally distributed with expected value $\mu_X - \mu_Y$ and standard deviation $\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}$.

The sampling function

$$Z = \frac{(\bar{X} - \bar{Y}) - \delta}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}}$$

is then standard normally distributed under the null hypothesis $\mu_X - \mu_Y = \delta$ and is used as the test statistic.
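A short simulation can illustrate the two-sample statement: if $\bar{X} - \bar{Y}$ has the standard deviation given above, the standardized statistic should have mean close to 0 and standard deviation close to 1 under the null hypothesis. The helper `simulate_two_sample_z`, the seed and the parameters (deliberately unequal sample sizes and standard deviations) are assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_two_sample_z(n, m, mu_x, mu_y, sigma_x, sigma_y, delta, reps, seed=1):
    """Hypothetical helper: simulate the two-sample statistic
    Z = ((mean_x - mean_y) - delta) / sqrt(sigma_x^2/n + sigma_y^2/m)."""
    rng = random.Random(seed)
    se = math.sqrt(sigma_x**2 / n + sigma_y**2 / m)  # standard deviation of mean_x - mean_y
    zs = []
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu_x, sigma_x) for _ in range(n))
        y_bar = statistics.fmean(rng.gauss(mu_y, sigma_y) for _ in range(m))
        zs.append(((x_bar - y_bar) - delta) / se)
    return zs


# Assumed parameters; the null hypothesis mu_x - mu_y = delta holds (11 - 10 = 1).
zs = simulate_two_sample_z(n=15, m=25, mu_x=11.0, mu_y=10.0,
                           sigma_x=2.0, sigma_y=3.0, delta=1.0, reps=20000)
print(f"mean(Z)={statistics.fmean(zs):.3f}  sd(Z)={statistics.stdev(zs):.3f}")
```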

One-sample Gaussian test

Application

The one-sample Gaussian test uses the arithmetic mean of a sample to check whether the expected value of the associated population differs from (or is smaller or larger than) a specified value.

The sample $x_1, x_2, \dots, x_n$ consists of realizations of independent random variables and comes from a normally distributed population with unknown expected value $\mu$ and known standard deviation $\sigma$.

One of the following is tested:

two-sided test: $H_0\colon \mu = \mu_0$ against $H_1\colon \mu \neq \mu_0$
right-sided test: $H_0\colon \mu \leq \mu_0$ against $H_1\colon \mu > \mu_0$
left-sided test: $H_0\colon \mu \geq \mu_0$ against $H_1\colon \mu < \mu_0$

The value of $\mu_0$ is specified by the user.

Calculation of the test statistic

From the sample mean $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, the test statistic is calculated as

$$z = \sqrt{n} \cdot \frac{\bar{x} - \mu_0}{\sigma}.$$
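As a minimal sketch, the one-sample statistic translates directly into Python. The function name `one_sample_z` and the four data values are hypothetical, chosen so the result is easy to check by hand ($\bar{x} = 10.15$, so $z = 2 \cdot 0.15 / 0.5 = 0.6$).

```python
import math


def one_sample_z(sample, mu_0, sigma):
    """Test statistic z = sqrt(n) * (x_bar - mu_0) / sigma of the one-sample Gauss test.

    `sigma` is the known population standard deviation; it is not estimated
    from the sample (that would lead to the t-test instead).
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    x_bar = sum(sample) / n
    return math.sqrt(n) * (x_bar - mu_0) / sigma


# Hypothetical data: four measurements, mu_0 = 10, known sigma = 0.5.
z = one_sample_z([10.2, 9.8, 10.5, 10.1], mu_0=10.0, sigma=0.5)
print(round(z, 3))  # 0.6
```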

Two-sample Gaussian test for independent samples

Application

The two-sample Gaussian test for independent samples uses the arithmetic means of the samples to check whether the expected values of the associated populations differ.

The samples $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_m$ each consist of independent observations, should be mutually independent, and should come from normally distributed populations with unknown expected values $\mu_X$ and $\mu_Y$ and known standard deviations $\sigma_X$ and $\sigma_Y$, respectively.

One of the following is tested:

two-sided test: $H_0\colon \mu_X - \mu_Y = \mu_0$ against $H_1\colon \mu_X - \mu_Y \neq \mu_0$
right-sided test: $H_0\colon \mu_X - \mu_Y \leq \mu_0$ against $H_1\colon \mu_X - \mu_Y > \mu_0$
left-sided test: $H_0\colon \mu_X - \mu_Y \geq \mu_0$ against $H_1\colon \mu_X - \mu_Y < \mu_0$

The value of $\mu_0$ is specified by the user.

Calculation of the test statistic

From the sample means $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$, the test statistic is calculated as

$$z = \frac{\bar{x} - \bar{y} - \mu_0}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}}.$$
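A minimal Python sketch of this formula follows; the function name `two_sample_z` and the tiny data set are hypothetical. With $\bar{x} = 6$, $\bar{y} = 4$ and $\sigma_X = \sigma_Y = 1$, the statistic is $2 / \sqrt{1/3 + 1/2} \approx 2.191$.

```python
import math


def two_sample_z(x, y, sigma_x, sigma_y, mu_0=0.0):
    """Test statistic of the two-sample Gauss test for independent samples:
    z = (x_bar - y_bar - mu_0) / sqrt(sigma_x^2/n + sigma_y^2/m),
    with known population standard deviations sigma_x and sigma_y."""
    n, m = len(x), len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / m
    se = math.sqrt(sigma_x**2 / n + sigma_y**2 / m)  # standard deviation of x_bar - y_bar
    return (x_bar - y_bar - mu_0) / se


# Hypothetical data: x_bar = 6, y_bar = 4, sigma_x = sigma_y = 1, mu_0 = 0.
z = two_sample_z([5, 7, 6], [4, 4], sigma_x=1.0, sigma_y=1.0)
print(round(z, 3))  # 2.191
```

The samples may have different sizes n and m; only the known standard deviations enter the denominator.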

Two-sample Gaussian test for dependent (paired) samples

Application

For the two-sample Gaussian test for dependent samples, pairs of measured values $(x_i, y_i)$ must be available, as found, for example, in before-and-after measurements. The pair differences are used to check whether the expected value of the associated population of differences differs from (or is smaller or larger than) a specified value.

The differences $d_i = x_i - y_i$ should come from a normally distributed population with unknown expected value $\mu$ and known standard deviation $\sigma$.

One of the following is tested:

two-sided test: $H_0\colon \mu = \mu_0$ against $H_1\colon \mu \neq \mu_0$
right-sided test: $H_0\colon \mu \leq \mu_0$ against $H_1\colon \mu > \mu_0$
left-sided test: $H_0\colon \mu \geq \mu_0$ against $H_1\colon \mu < \mu_0$

The value of $\mu_0$ is specified by the user. In most applications, the alternative of inequality ($H_1\colon \mu \neq \mu_0$) is tested, and then $\mu_0 = 0$.

Calculation of the test statistic

The differences $d_i$ form a new sample with arithmetic mean $\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$. The one-sample Gaussian test can therefore be applied to the sample of differences, which gives the test statistic

$$z = \sqrt{n} \cdot \frac{\bar{d} - \mu_0}{\sigma}.$$
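The reduction to the one-sample test can be sketched in Python as follows. The function name `paired_z` and the four hypothetical pairs are illustrative assumptions; their differences are 1, 0, 2, 1, so $\bar{d} = 1$ and, with $\sigma = 1$ and $\mu_0 = 0$, $z = \sqrt{4} \cdot 1 = 2$.

```python
import math


def paired_z(x, y, sigma, mu_0=0.0):
    """Gauss test for paired samples: apply the one-sample statistic to the
    differences d_i = x_i - y_i, whose known standard deviation is `sigma`."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    return math.sqrt(n) * (d_bar - mu_0) / sigma


# Hypothetical before/after pairs: differences 1, 0, 2, 1, so d_bar = 1.
z = paired_z([5, 6, 7, 8], [4, 6, 5, 7], sigma=1.0)
print(z)  # 2.0
```

Note that `sigma` here is the standard deviation of the differences, not of the individual measurements.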

Decision on the hypotheses

In all three Gaussian tests, the general criteria for hypothesis tests are used to decide whether to accept or reject the hypotheses. Since the test statistic $Z$ is standard normally distributed under the null hypothesis, the following rules are obtained.

$H_0$ is rejected (i.e. $H_1$ is accepted) at the significance level $\alpha$ if

for the two-sided test: $|z| > u(1 - \alpha/2)$, where $u(1 - \alpha/2)$ is the $(1 - \alpha/2)$-quantile of the standard normal distribution,
for the right-sided test: $z > u(1 - \alpha)$,
for the left-sided test: $z < u(\alpha)$.
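The three decision rules can be sketched with the standard library's `statistics.NormalDist`, whose `inv_cdf` method gives the quantile $u(p)$ of the standard normal distribution. The function name `reject_h0` and the labels `"two-sided"`, `"right"` and `"left"` are hypothetical choices for this illustration.

```python
from statistics import NormalDist


def reject_h0(z, alpha=0.05, alternative="two-sided"):
    """Decision rule of the Gauss test: True means H0 is rejected at level alpha.

    alternative: "two-sided", "right" (H1: mu > mu_0) or "left" (H1: mu < mu_0).
    """
    u = NormalDist().inv_cdf  # quantile function u(p) of the standard normal distribution
    if alternative == "two-sided":
        return abs(z) > u(1 - alpha / 2)
    if alternative == "right":
        return z > u(1 - alpha)
    if alternative == "left":
        return z < u(alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


print(reject_h0(2.0), reject_h0(1.9), reject_h0(1.9, alternative="right"))  # True False True
```

The value 1.9 illustrates that the same statistic can lead to different decisions: it lies below $u(0.975) \approx 1.960$ but above $u(0.95) \approx 1.645$.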

Gaussian test for non-normally distributed random variables

For large sample sizes (as a rule of thumb, more than 30), the normality assumption can be dropped because of the central limit theorem. If the requirements of the Gauss test concerning the expected values and standard deviations of the random variables involved are met, the sums needed to calculate z are assumed to be approximately normally distributed, so that the Gauss test delivers correct results to a good approximation.
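The effect of the central limit theorem can be explored with a simulation. In the sketch below, the exponential distribution with rate 1 (so that expected value and standard deviation both equal 1), the sample size n = 50, the seed and the number of repetitions are assumptions made for illustration; `empirical_size` is a hypothetical helper. If the normal approximation is adequate, the fraction of rejections of the true null hypothesis should be close to the nominal level α = 0.05.

```python
import math
import random
from statistics import NormalDist


def empirical_size(n, reps, alpha=0.05, seed=2):
    """Fraction of two-sided Gauss tests that reject the (true) null hypothesis
    mu = 1 for exponential data with rate 1, for which mu = sigma = 1."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # u(1 - alpha/2)
    mu_0, sigma = 1.0, 1.0
    rejections = 0
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = math.sqrt(n) * (x_bar - mu_0) / sigma
        if abs(z) > crit:
            rejections += 1
    return rejections / reps


rate = empirical_size(n=50, reps=10000)
print(f"empirical size for n=50: {rate:.3f} (nominal 0.05)")
```

Rerunning with a small n, such as 5, shows how the approximation can deteriorate for skewed data.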

Example

To a very good approximation, a certain blood parameter B is normally distributed in the population with $\sigma = 2$. A group of chemically related drugs is known to be able to shift the distribution of the blood parameter, i.e. they may change the expected value (while the shape of the distribution is preserved).

For a drug P from this group, it is to be checked whether such a change actually occurs. Random independent samples of size n = 22 give the following measurements of B:

without administration of P ($x_i$): 12 13 10 12 14 11 14 18 15 13 15 13 11 17 11 12 13 14 15 13 14 13
with administration of P ($y_i$):    13 14 13 17 13 16 16 19 17 15 17 15 15 20 15 15 14 15 13 15 16 15

Various hypotheses are to be tested with these measurements. The significance level is $\alpha = 0.05$ in each case; the associated u-values are then (all values below are rounded):

$u(1 - \alpha/2) = u(0.975) = 1.960$
$u(1 - \alpha) = u(0.95) = 1.645$
$u(\alpha) = u(0.05) = -1.645$

For the means, one obtains $\bar{x} = 13.32$ and $\bar{y} = 15.36$.
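The two means can be reproduced from the table of measurements with a few lines of Python; the list names `x` and `y` simply mirror the notation $x_i$ and $y_i$.

```python
# Measurements of B from the table of the example.
x = [12, 13, 10, 12, 14, 11, 14, 18, 15, 13, 15, 13, 11, 17, 11, 12, 13, 14, 15, 13, 14, 13]  # without P
y = [13, 14, 13, 17, 13, 16, 16, 19, 17, 15, 17, 15, 15, 20, 15, 15, 14, 15, 13, 15, 16, 15]  # with P

x_bar = sum(x) / len(x)   # 293 / 22
y_bar = sum(y) / len(y)   # 338 / 22
print(round(x_bar, 2), round(y_bar, 2))  # 13.32 15.36
```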

1st hypothesis: The mean value of B after administration of P is above 15.

Procedure: right-sided one-sample Gaussian test

$$H_0\colon \mu \leq \mu_0 = 15$$

against

$$H_1\colon \mu > 15$$

$$z = \sqrt{22} \cdot \frac{15.36 - 15}{2} = 0.84 < 1.645$$

Decision: $H_0$ is retained. It could not be shown that administration of P leads to an average B value above 15.
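The first test can be recomputed directly from the measurements with P. This sketch assumes, as the example does, that the known standard deviation σ = 2 also holds after administration of P; it uses the standard library's `statistics.NormalDist` for the quantile $u(1 - \alpha)$.

```python
import math
from statistics import NormalDist

y = [13, 14, 13, 17, 13, 16, 16, 19, 17, 15, 17, 15, 15, 20, 15, 15, 14, 15, 13, 15, 16, 15]  # with P
n, sigma, mu_0, alpha = len(y), 2.0, 15.0, 0.05

y_bar = sum(y) / n
z = math.sqrt(n) * (y_bar - mu_0) / sigma
reject = z > NormalDist().inv_cdf(1 - alpha)  # right-sided rule z > u(1 - alpha)
print(f"z = {z:.3f}, reject H0: {reject}")  # z = 0.853, reject H0: False
```

With the unrounded mean $338/22 \approx 15.364$, the statistic is about 0.85 rather than 0.84; the decision is unchanged.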

2nd hypothesis: On average, the values of B differ between the populations with and without administration of P.

Procedure: two-sided two-sample Gaussian test for independent samples

$$H_0\colon \mu_X - \mu_Y = \mu_0 = 0$$

against

$$H_1\colon \mu_X \neq \mu_Y$$

$$|z| = \sqrt{22} \cdot \frac{|13.32 - 15.36|}{2 \cdot \sqrt{2}} = 3.38 > 1.960$$

Decision: $H_0$ is rejected in favor of $H_1$. With an error probability of at most 0.05, it has been shown that the average B values differ depending on whether or not P is administered.
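The second test can be recomputed from both rows of the table. Here σ_X = σ_Y = 2 is the known standard deviation from the example, and `statistics.NormalDist` supplies $u(1 - \alpha/2)$; the variable names are chosen for this sketch only.

```python
import math
from statistics import NormalDist

x = [12, 13, 10, 12, 14, 11, 14, 18, 15, 13, 15, 13, 11, 17, 11, 12, 13, 14, 15, 13, 14, 13]  # without P
y = [13, 14, 13, 17, 13, 16, 16, 19, 17, 15, 17, 15, 15, 20, 15, 15, 14, 15, 13, 15, 16, 15]  # with P
sigma_x = sigma_y = 2.0   # known standard deviation from the example
mu_0, alpha = 0.0, 0.05

n, m = len(x), len(y)
se = math.sqrt(sigma_x**2 / n + sigma_y**2 / m)   # equals 2 * sqrt(2) / sqrt(22) here
z = (sum(x) / n - sum(y) / m - mu_0) / se
reject = abs(z) > NormalDist().inv_cdf(1 - alpha / 2)  # two-sided rule |z| > u(1 - alpha/2)
print(f"z = {z:.3f}, reject H0: {reject}")  # z = -3.392, reject H0: True
```

Without intermediate rounding of the means, $|z| \approx 3.39$ instead of 3.38.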

Now consider an experiment with dependent samples. Extensive before-and-after studies have also found that the change in the B values caused by administering the drugs concerned is normally distributed, with $\sigma = 1.6$. Suppose now that the two values in each column of the table of measurements were obtained as a pair in a before-and-after test ($x_i$ before, $y_i$ after administration of P).

3rd hypothesis: On average, the values of B after administration of P are more than 1.25 above the values before administration of P.

Procedure: left-sided two-sample Gaussian test for dependent samples

$$H_0\colon \mu \geq \mu_0 = -1.25$$

against

$$H_1\colon \mu < -1.25$$

$$\bar{d} = \bar{x} - \bar{y} = -2.045$$

$$z = \sqrt{22} \cdot \frac{-2.045 + 1.25}{1.6} = -2.33 < -1.645$$

Decision: $H_0$ is rejected in favor of $H_1$. With an error probability of at most 0.05, it has been shown that in before-and-after examinations the B values after administration of P are on average more than 1.25 above the B values before administration of P.
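The paired test can be recomputed from the column-wise differences $d_i = x_i - y_i$, with σ = 1.6 as the known standard deviation of the differences. As in the example, this sketch treats $x_i$ as the value before and $y_i$ as the value after administration of P.

```python
import math
from statistics import NormalDist

x = [12, 13, 10, 12, 14, 11, 14, 18, 15, 13, 15, 13, 11, 17, 11, 12, 13, 14, 15, 13, 14, 13]  # before P
y = [13, 14, 13, 17, 13, 16, 16, 19, 17, 15, 17, 15, 15, 20, 15, 15, 14, 15, 13, 15, 16, 15]  # after P
sigma, mu_0, alpha = 1.6, -1.25, 0.05

d = [xi - yi for xi, yi in zip(x, y)]   # pair differences d_i = x_i - y_i
n = len(d)
d_bar = sum(d) / n                      # -45 / 22, about -2.045
z = math.sqrt(n) * (d_bar - mu_0) / sigma
reject = z < NormalDist().inv_cdf(alpha)  # left-sided rule z < u(alpha)
print(f"d_bar = {d_bar:.3f}, z = {z:.3f}, reject H0: {reject}")  # d_bar = -2.045, z = -2.332, reject H0: True
```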

