Post hoc test

Post-hoc tests are significance tests from mathematical statistics. One-way analysis of variance, the Kruskal-Wallis test, and the median test only establish that significant differences exist within a group of means. Post-hoc tests use pairwise comparisons of means to show which means differ significantly from one another, or use group-wise comparisons to show which group means do not differ significantly.

Overview of the post-hoc tests

Post-hoc tests differ in several respects, for example whether the sample sizes are the same in all groups (balanced case) or not (unbalanced case), and whether the variance is the same in all groups (homogeneity of variance) or not (heterogeneity of variance). Homogeneity of variance can be checked with the Levene test.

Test	Comparison of	Homogeneity of variance	Sample sizes
Least significant difference	Mean pairs	No	Unequal
Bonferroni test for least significant difference	Mean pairs	Yes	Unequal
Šidák	Mean pairs	No
Tamhane $T_2$	Mean pairs	No
Games-Howell	Mean pairs	No
Dunnett's $T_3$	Mean pairs	No	With small sample sizes
Dunnett's $C$	Mean pairs	No	With large sample sizes
Ryan-Einot-Gabriel-Welsch	Spanned means	Yes
Duncan	Spanned means	Yes	Equal
Tukey b	Spanned means	Yes
Student-Newman-Keuls	Spanned means	Yes	Equal
Tukey	Spanned means	Yes	Equal
Hochberg	Spanned means	Yes
Gabriel	Spanned means	Yes
Scheffé	Mean pairs	Yes	Unequal

The tests can be partly ordered by how conservative they are:

(conservative) Duncan > Scheffé > Tukey > Newman-Keuls > least significant difference (not conservative).

Requirements and notation

It is assumed that the alternative hypothesis was accepted in the comparison of means across $m$ groups at significance level $\alpha$. That is, at least two group means differ. The hypotheses for all of the following tests are

* for the pairwise tests:	$H_0: \mu_i = \mu_j$ vs. $H_1: \mu_i \neq \mu_j$, and
* for the spanned ordered means:	$H_0: \mu_{(i)} = \mu_{(i+p-1)}$ vs. $H_1: \mu_{(i)} \neq \mu_{(i+p-1)}$.

Furthermore, let $n_i$ be the number of observations in group $i$ and $n = n_1 + \dots + n_m$ the total number of observations. The tests are divided into tests for the balanced case ($r = n_1 = \dots = n_m$) and tests for the unbalanced case (the group sample sizes may differ).

Tests for the unbalanced case

Least significant difference test

In the least significant difference test (LSD test for short), the test statistic is

$$T = \frac{\overline{X}_i - \overline{X}_j}{S \sqrt{\tfrac{1}{n_i} + \tfrac{1}{n_j}}} \sim t_{n-m}$$

with

$$S^2 = \frac{1}{n-m} \sum_{j=1}^{m} (n_j - 1) S_j^2,$$

where $S_j^2$ is the sample variance of group $j$.

The least significant difference test is based on the two-sample t-test, but the variance is estimated from all groups.
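As a minimal sketch of the formulas above, the following Python snippet computes the pooled variance $S^2$ and the statistic $T$ from raw observations. The function names and the toy data are illustrative assumptions, not part of the original example; the resulting $T$ would be compared with a quantile of the $t$ distribution with $n-m$ degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_variance(groups):
    """S^2 = sum_j (n_j - 1) * S_j^2 / (n - m), pooled over all m groups."""
    n = sum(len(g) for g in groups)
    m = len(groups)
    return sum((len(g) - 1) * variance(g) for g in groups) / (n - m)


def lsd_statistic(groups, i, j):
    """Return (T, degrees of freedom) for the comparison of groups i and j."""
    s = math.sqrt(pooled_variance(groups))
    n_i, n_j = len(groups[i]), len(groups[j])
    t = (mean(groups[i]) - mean(groups[j])) / (s * math.sqrt(1 / n_i + 1 / n_j))
    df = sum(len(g) for g in groups) - len(groups)
    return t, df


# Hypothetical toy data: three groups of unequal size (unbalanced case).
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0], [5.0, 7.0]]
t, df = lsd_statistic(groups, 0, 1)
print(f"T = {t:.4f} with {df} degrees of freedom")
```

Note that the variance of the third group enters $S^2$ even though only the first two groups are compared; this is the difference from an ordinary two-sample t-test.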

Bonferroni test for least significant difference

In the Bonferroni test for the least significant difference, the test statistic is identical to that of the least significant difference test. However, the significance level is corrected using the Bonferroni method. If the analysis of variance is carried out at significance level $\alpha$, then the corrected significance level $\alpha^*$ is used for the pairwise comparisons of means:

$$\alpha^* = \frac{2}{m(m-1)} \alpha.$$

The critical values for the corrected significance level can be found in special tables or determined using the approximation

$$t_{n-m;\,1-\alpha^*/2} \approx \frac{z_{1-\alpha^*/2}}{1 - \tfrac{z_{1-\alpha^*/2}^2 + 1}{4(n-m)}},$$

where $z_{1-\alpha^*/2}$ is the $(1-\alpha^*/2)$ quantile of the standard normal distribution.

The test should only be used if $m$ is not too large; otherwise the corrected significance level becomes too small and the non-rejection regions of the t-tests overlap. For example, if $m = 5$ and $\alpha = 5\,\%$, then $\alpha^* = 0.5\,\%$.
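The correction and the quantile approximation can be tried out with a short Python sketch. It assumes that the same quantile level is used for the normal quantile and for the $t$ quantile it approximates; the function names are illustrative and not taken from any statistics package.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha, m):
    """Corrected level alpha* = 2 / (m (m - 1)) * alpha for m groups."""
    return 2.0 / (m * (m - 1)) * alpha


def approx_t_quantile(prob, df):
    """Approximate the prob-quantile of the t distribution with df degrees
    of freedom from the standard normal quantile of the same level."""
    z = NormalDist().inv_cdf(prob)
    return z / (1.0 - (z * z + 1.0) / (4.0 * df))


alpha_star = bonferroni_alpha(0.05, 5)          # m = 5 groups, alpha = 5 %
crit = approx_t_quantile(1 - alpha_star / 2, df=20)
print(f"alpha* = {alpha_star:.4f}, approximate critical value = {crit:.3f}")
```

With $m = 5$ there are ten pairwise comparisons, so each comparison is tested at one tenth of the original level, which is why the critical value grows quickly with $m$.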

Scheffé test

Strictly speaking, the Scheffé test requires homogeneity of variance in the groups, but it is robust to violations of this requirement.

Simple Scheffé test

The simple Scheffé test checks $H_0: \mu_i = \mu_j$ vs. $H_1: \mu_i \neq \mu_j$ using the test statistic

$$F = \frac{\tfrac{1}{m-1} (\overline{X}_i - \overline{X}_j)^2}{S^2 \left(\tfrac{1}{n_i} + \tfrac{1}{n_j}\right)} \sim F_{m-1,\,n-m}.$$

The simple Scheffé test is the special case of the general Scheffé test for a linear contrast of two means.
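A small Python sketch of the simple Scheffé statistic, working from summary statistics. The function name and the numbers are hypothetical; `s2` stands for the pooled variance $S^2$ over all $m$ groups.

```python
def simple_scheffe_f(mean_i, mean_j, n_i, n_j, s2, m):
    """F = (mean_i - mean_j)^2 / (m - 1) / (S^2 * (1/n_i + 1/n_j))."""
    return (mean_i - mean_j) ** 2 / (m - 1) / (s2 * (1 / n_i + 1 / n_j))


# Hypothetical summary statistics: m = 3 groups, pooled variance S^2 = 4.
f = simple_scheffe_f(mean_i=2.0, mean_j=5.0, n_i=3, n_j=4, s2=4.0, m=3)
print(f"F = {f:.4f}")
```

Comparing the two formulas shows that this $F$ equals the squared least significant difference statistic divided by $m-1$, so the two tests differ only in the reference distribution and the scaling.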

Linear contrast

A linear contrast of the group means is defined as

$$\Lambda = c_1 \mu_1 + \dots + c_m \mu_m$$

with

$$c_1 + \dots + c_m = 0.$$

For the simple Scheffé test, the coefficients of the linear contrast are

$$c_k = \begin{cases} 1 & k = i \\ -1 & k = j \\ 0 & \text{otherwise} \end{cases}.$$

Two contrasts $\Lambda^{(1)}$ and $\Lambda^{(2)}$ are called orthogonal if

$$c_1^{(1)} c_1^{(2)} + \dots + c_m^{(1)} c_m^{(2)} = 0.$$
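The two conditions can be checked mechanically. The following Python sketch uses illustrative helper names and 0-based group indices, both of which are assumptions made for the example.

```python
def is_contrast(c, tol=1e-12):
    """A coefficient vector is a contrast if its entries sum to zero."""
    return abs(sum(c)) <= tol


def are_orthogonal(c1, c2, tol=1e-12):
    """Two contrasts are orthogonal if the sum of coefficient products is zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) <= tol


def pairwise_contrast(m, i, j):
    """Coefficients of the simple Scheffé contrast: +1 at i, -1 at j, 0 otherwise."""
    c = [0.0] * m
    c[i], c[j] = 1.0, -1.0
    return c


c_12 = pairwise_contrast(4, 0, 1)   # mu_1 - mu_2
c_34 = pairwise_contrast(4, 2, 3)   # mu_3 - mu_4
c_13 = pairwise_contrast(4, 0, 2)   # mu_1 - mu_3

print(are_orthogonal(c_12, c_34))   # the two contrasts share no group
print(are_orthogonal(c_12, c_13))   # both contrasts involve group 1
```

Pairwise contrasts that share a group are never orthogonal under this definition, since the shared coefficient contributes a nonzero product.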

General Scheffé test

For the general Scheffé test, the hypotheses are $H_0: \Lambda = 0$ for all (orthogonal) contrasts vs. $H_1: \Lambda \neq 0$ for at least one contrast. The test statistic is

$$F = \frac{\frac{1}{m-1} \left(\sum_{j=1}^{m} c_j \overline{X}_j\right)^2}{S^2 \left(\sum_{j=1}^{m} \frac{c_j^2}{n_j}\right)} \sim F_{m-1,\,n-m}.$$

The idea is based on the variance decomposition of the estimated contrast $L = c_1 \overline{X}_1 + \dots + c_m \overline{X}_m$:

$$\frac{\operatorname{Var}(L)}{\operatorname{Var}(L)} = \frac{\operatorname{E}(L^2) - (\operatorname{E}(L))^2}{\operatorname{Var}(L)} = \frac{\operatorname{E}(L^2)}{\operatorname{Var}(L)},$$

since $\operatorname{E}(L) = 0$ holds under the null hypothesis. Replacing $\operatorname{E}(L^2)$ by $L^2$ and $\operatorname{Var}(L) = \sigma^2 \sum_{j} c_j^2 / n_j$ by its estimate $S^2 \sum_{j} c_j^2 / n_j$, and dividing by $m-1$, gives the test statistic.
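A hedged Python sketch of the general statistic for an arbitrary contrast follows. The function name and the summary statistics are hypothetical; `s2` again denotes the pooled variance $S^2$.

```python
def scheffe_f(c, means, sizes, s2):
    """General Scheffé statistic for the contrast with coefficients c."""
    m = len(means)
    estimate = sum(cj * xj for cj, xj in zip(c, means))     # L
    scale = sum(cj * cj / nj for cj, nj in zip(c, sizes))   # sum of c_j^2 / n_j
    return estimate ** 2 / (m - 1) / (s2 * scale)


# Hypothetical summary statistics for m = 3 groups.
means = [2.0, 5.0, 6.0]
sizes = [3, 4, 2]
s2 = 4.0

f_pair = scheffe_f([1.0, -1.0, 0.0], means, sizes, s2)   # mu_1 - mu_2
f_comb = scheffe_f([1.0, 1.0, -2.0], means, sizes, s2)   # mu_1 + mu_2 - 2 mu_3
print(f"pairwise: {f_pair:.4f}, combined: {f_comb:.4f}")
```

With the coefficients $(1, -1, 0)$ the function reduces to the simple Scheffé statistic for the first two groups, which illustrates the special-case relationship between the two tests.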

Tests for the balanced case

These tests are intended for the balanced case; that is, the sample size $r$ is the same in every group. SPSS also performs the tests if the group sample sizes are unequal; $r$ is then calculated as the harmonic mean of the sample sizes.

The test statistic is the same for all of the following tests:

$$Q = \frac{|\overline{X}_i - \overline{X}_j|}{S / \sqrt{r}}.$$

The critical values $q(\alpha, p, f)$ are only available in tabular form (mostly for $\alpha = 5\,\%$ or $\alpha = 10\,\%$). Here $p-2$ further means lie between the ordered means $i$ and $j$, so the comparison spans $p$ means, and $f$ denotes the degrees of freedom of $S^2$.
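The statistic $Q$ and the harmonic-mean replacement for $r$ can be sketched in Python as follows. The function names and numbers are illustrative assumptions; the critical value $q(\alpha, p, f)$ itself is not computed here and would still be taken from a table.

```python
import math
from statistics import harmonic_mean


def effective_group_size(sizes):
    """r for the Q statistic: the common size in the balanced case,
    otherwise the harmonic mean of the group sizes (as described for SPSS)."""
    return harmonic_mean(sizes)


def q_statistic(mean_i, mean_j, s2, r):
    """Q = |mean_i - mean_j| / (S / sqrt(r)), with S^2 the pooled variance."""
    return abs(mean_i - mean_j) / (math.sqrt(s2) / math.sqrt(r))


r_balanced = effective_group_size([5, 5, 5])      # balanced: r = 5
r_unbalanced = effective_group_size([3, 4, 2])    # unbalanced: harmonic mean
q = q_statistic(mean_i=2.0, mean_j=5.0, s2=4.0, r=4)
print(r_balanced, round(r_unbalanced, 4), q)
```

The harmonic mean is pulled towards the smallest group, so a single small group reduces $r$ and therefore $Q$ for every comparison.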

Tukey test

In the Tukey test, the critical values are

$$q(\alpha, m, n-m),$$

that is, there is no Bonferroni correction and the number of spanned means is not taken into account.

Student-Newman-Keuls test

In the Student-Newman-Keuls test, the critical values are

$$q(\alpha, p, n-m),$$

that is, there is no Bonferroni correction and the number of spanned means is taken into account.

Duncan's test

In the Duncan test, the critical values are

$$q(1-(1-\alpha)^{p-1}, p, n-m),$$

that is, a Bonferroni correction takes place and the number of spanned means is taken into account.

When using the Duncan test, note that it only carries out group-wise comparisons, so unambiguous statements about significance are not always possible.
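The Tukey, Student-Newman-Keuls and Duncan tests share the statistic $Q$ and differ only in the arguments passed to the critical value $q$. The following Python sketch, with a hypothetical function name and test labels, returns those arguments so they can be looked up in a table of the studentized range; it does not compute $q$ itself.

```python
def critical_value_arguments(test, alpha, p, m, n):
    """Return (level, number of means, degrees of freedom) for q(., ., .)."""
    f = n - m
    if test == "tukey":
        return (alpha, m, f)                           # always all m means
    if test == "snk":
        return (alpha, p, f)                           # only the p spanned means
    if test == "duncan":
        return (1 - (1 - alpha) ** (p - 1), p, f)      # level depends on p
    raise ValueError(f"unknown test: {test}")


# Hypothetical setting: m = 5 groups, n = 50 observations, p = 3 spanned means.
for name in ("tukey", "snk", "duncan"):
    print(name, critical_value_arguments(name, alpha=0.05, p=3, m=5, n=50))
```

For $p = 2$ the Duncan level equals $\alpha$, so for adjacent ordered means the Duncan and Student-Newman-Keuls tests use the same critical value.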

Example

Rent burden ratio in %
State	Number	Median	Mean	Std. dev.
Saxony	1356	19.0	22.3	12.5
Brandenburg	803	19.0	23.4	13.2
Mecklenburg-Western Pomerania	491	20.0	22.1	10.3
Thuringia	744	21.0	24.0	13.3
Berlin	998	22.0	24.4	11.9
Baden-Württemberg	3246	22.0	24.8	14.2
Bavaria	3954	22.0	25.4	14.2
North Rhine-Westphalia	5266	23.0	25.8	13.8
Hesse	1904	23.0	26.3	14.3
Saxony-Anhalt	801	23.0	26.6	14.3
Rhineland-Palatinate	1276	24.0	26.1	13.5
Lower Saxony	2374	24.0	27.9	15.7
Hamburg	528	24.5	29.3	18.9
Schleswig-Holstein	890	25.0	27.9	14.8
Saarland	312	26.0	26.7	11.9
Bremen	194	27.0	29.2	15.8
Germany	9527	22.0	25.5	14.0

For the rent burden ratio (the ratio of gross rent to household income), taken from the CAMPUS file of the 2002 microcensus of the Federal Statistical Office, both the nonparametric median test and the parametric one-way analysis of variance (one-way ANOVA) yield highly significant differences in the medians and means of the federal states. In other words, the federal states differ in their mean rent expenditure (relative to income).

Since the Levene test rejects the null hypothesis of homogeneity of variance and the numbers of observations differ considerably between the groups, only the following tests remain for identifying the differences:

Least significant difference
Bonferroni test for least significant difference
Scheffé

Since the Scheffé test in SPSS both performs pairwise comparisons and outputs homogeneous subgroups, we look at its results.

Pairwise comparisons

The pairwise comparisons provide information about significant differences between the means of the individual groups. In the present example, the following are output for each pair of federal states:

the difference $\overline{x}_i - \overline{x}_j$,
the standard error,
the p-value (column: Significance), which leads to rejection of the equality of the means if it falls below the specified significance level, and
a 95% confidence interval for the difference in means. If the confidence interval does not contain zero, the null hypothesis is rejected at the 5% significance level.

Among the comparisons involving Schleswig-Holstein, only the difference from Saxony is significant at the 5% level (p-value 2.1%); none of the others is.

Group comparisons

The group-wise comparison allows detailed statements about the homogeneity of the group means, but only limited statements about significant differences between the groups.

In the present example an iterative procedure is carried out to find homogeneous subgroups, i.e. groups in which the null hypothesis of equality of means is not rejected. For this purpose, the observed means are sorted by size, $\overline{x}_{(1)} \leq \dots \leq \overline{x}_{(16)}$, and a series of tests is carried out.

Spanned means	Tested null hypotheses $H_0$
16				$\mu_{(1)} = \dots = \mu_{(16)}$
15			$\mu_{(1)} = \dots = \mu_{(15)}$		$\mu_{(2)} = \dots = \mu_{(16)}$
14		$\mu_{(1)} = \dots = \mu_{(14)}$		$\mu_{(2)} = \dots = \mu_{(15)}$		$\mu_{(3)} = \dots = \mu_{(16)}$
13	$\mu_{(1)} = \dots = \mu_{(13)}$		$\mu_{(2)} = \dots = \mu_{(14)}$		$\mu_{(3)} = \dots = \mu_{(15)}$		$\mu_{(4)} = \dots = \mu_{(16)}$
...	In general, further tests are carried out with fewer and fewer groups
Legend:	$H_0$ not rejected (green)		$H_0$ contained in a previously non-rejected $H_0$ (yellow)			$H_0$ rejected (red)

In the first step, the null hypothesis $H_0: \mu_{(1)} = \dots = \mu_{(16)}$ is tested and rejected; we already know that the means differ. Then, in the next step,

the state with the largest mean is removed and the null hypothesis $H_0: \mu_{(1)} = \dots = \mu_{(15)}$ is tested, and
the state with the smallest mean is removed and the null hypothesis $H_0: \mu_{(2)} = \dots = \mu_{(16)}$ is tested.

Both tests therefore involve groups of only 15 federal states. If the null hypothesis is rejected in one of these tests (red in the table), the state with the largest mean and, separately, the state with the smallest mean are removed from that group and the tests are repeated. This builds up a sequence of null hypotheses to be tested with an ever decreasing number of means.

The procedure stops for a given group if

either the null hypothesis cannot be rejected in the test (green in the table), or
the null hypothesis under consideration is already part of a null hypothesis that has not been rejected (yellow in the table), or
only one state is left.

SPSS outputs the "green" subgroups.

In the example there are two homogeneous subgroups with 14 federal states each; for these, the null hypothesis of equality of the means could not be rejected. Bremen and Hamburg are excluded from homogeneous subgroup 1, and Saxony and Mecklenburg-Western Pomerania from homogeneous subgroup 2. Statements about which means of which federal states differ significantly cannot be made in this case.
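The search for homogeneous subgroups described above can be sketched in Python as a step-down loop over ranges of ordered means. This is a simplified illustration, not the SPSS implementation: the function name is an assumption, indices are 0-based, and the test of each null hypothesis is passed in as a function `rejects(lo, hi)`. In the toy usage, a fixed threshold on the range of the means serves as a hypothetical stand-in for the actual test.

```python
def homogeneous_subgroups(num_means, rejects):
    """Find homogeneous subgroups among means sorted in ascending order.

    rejects(lo, hi) must return True if H0: mu_(lo) = ... = mu_(hi)
    is rejected (0-based, inclusive indices into the sorted means).
    Returns the non-rejected ("green") ranges as (lo, hi) tuples.
    """
    accepted = []
    # p = number of spanned means, from all means down to pairs;
    # a single remaining mean is not tested.
    for p in range(num_means, 1, -1):
        for lo in range(0, num_means - p + 1):
            hi = lo + p - 1
            # "Yellow": already part of a null hypothesis that was not rejected.
            if any(a <= lo and hi <= b for a, b in accepted):
                continue
            # "Green" if not rejected; otherwise "red" and the search
            # continues with the shorter ranges inside this one.
            if not rejects(lo, hi):
                accepted.append((lo, hi))
    return accepted


# Hypothetical sorted means and a stand-in rejection rule.
means = [1.0, 2.0, 3.0, 4.0]
subgroups = homogeneous_subgroups(
    len(means), lambda lo, hi: means[hi] - means[lo] > 2.5
)
print(subgroups)
```

In this toy case the full range is rejected while both ranges of three means are not, giving two overlapping subgroups that each omit one extreme mean. This mirrors the structure of the example, where each subgroup omits the two states at one end of the ordering.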

References

Ajit C. Tamhane: Multiple comparisons in model I one-way ANOVA with unequal variances. In: Communications in Statistics - Theory and Methods. Vol. 6, no. 1, 1977, pp. 15-32, doi:10.1080/03610927708827466.
Werner Timischl: Applied Statistics. An introduction for biologists and medical professionals. 3rd edition, 2013, p. 373.

Literature

Bernd Rönz: Lecture notes: Computational Statistics I. Humboldt University of Berlin, Chair of Statistics, Berlin 2001.
