McNemar test

The McNemar test is a statistical test for related (paired) samples in which a dichotomous characteristic is observed, as can occur, for example, in a four-field (2×2) table. Samples are related when there is a connection between the observations, for example in a before-and-after comparison of patients in medical statistics.

Mathematical formulation

	Sample 2
Sample 1	0	1	Total
0	$a$	$b$	$a+b$
1	$c$	$d$	$c+d$
Total	$a+c$	$b+d$	$n$
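As a minimal sketch of how the four cells of the table above could be tallied from raw paired observations, the following Python snippet counts the pairs. The function name `paired_table` and the sample data are hypothetical and serve only as illustration; rows are assumed to follow sample 1 and columns sample 2.

```python
from collections import Counter

def paired_table(sample1, sample2):
    """Tally paired 0/1 observations into the cells a, b, c, d.

    Rows follow sample 1 and columns follow sample 2:
    a = (0, 0), b = (0, 1), c = (1, 0), d = (1, 1).
    """
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    counts = Counter(zip(sample1, sample2))
    a, b = counts[(0, 0)], counts[(0, 1)]
    c, d = counts[(1, 0)], counts[(1, 1)]
    return a, b, c, d

# Hypothetical paired data: one entry per subject in each list.
before = [0, 0, 1, 1, 1, 0, 1, 0]
after = [0, 1, 0, 1, 0, 0, 0, 0]
a, b, c, d = paired_table(before, after)
print(a, b, c, d)  # 3 1 3 1
```

Only the discordant cells b and c enter the test; a and d merely contribute to the total n.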

The McNemar test checks whether a change has occurred in a related sample. If there were no change, then $a + b \approx a + c$ or, equivalently, $c + d \approx b + d$ should hold. For the probabilities $p_{\bullet}$ of the occurrence of $(0,0)$ etc., this yields the following mathematical formulation of the hypotheses:

$$H_{0}: p_{a} + p_{c} = p_{a} + p_{b}$$

$$H_{1}: p_{a} + p_{c} \neq p_{a} + p_{b}$$

or the equivalent hypotheses

$$H_{0}: p_{c} = p_{b}$$

$$H_{1}: p_{c} \neq p_{b}$$

Exact test

For the exact test, the observations "bottom left" and "top right" in the contingency table are viewed as random draws with the two possible outcomes "bottom left" and "top right". If $\pi$ is the probability that an observation lands "bottom left", the hypotheses of the McNemar test translate into the hypotheses of a binomial test:

$$H_{0}: \pi = 0.5$$

$$H_{1}: \pi \neq 0.5$$

Under the null hypothesis, the test statistic $B$: "number of observations at the top right" is then binomially distributed with $B(b + c;\, 0.5)$ (analogously for $C$).

SPSS, for example, uses the exact test when the McNemar test is called with $b + c < 25$.
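The exact test can be sketched in a few lines of Python using only the standard library. The function name `mcnemar_exact_p` is an assumption made for this illustration, and the two-sided p-value is obtained by doubling the smaller tail, which is valid here because the binomial distribution with probability 0.5 is symmetric. The counts b = 5 and c = 16 are taken from the car-free Sunday example further below.

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of change
    k = min(b, c)
    # Lower tail P(X <= k) for X ~ B(n, 0.5); doubling uses the symmetry at 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p_exact = mcnemar_exact_p(5, 16)
print(round(p_exact, 4))  # 0.0266
```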

χ² test statistic

McNemar (1947) used a $\chi^{2}$ test to solve the test problem. If the null hypothesis holds, the expected cell frequencies are each $\tfrac{b + c}{2}$, so the test statistic is

$${\hat{X}}^{'2} = \frac{(b - \tfrac{b + c}{2})^{2}}{\tfrac{b + c}{2}} + \frac{(c - \tfrac{b + c}{2})^{2}}{\tfrac{b + c}{2}} = \frac{(b - c)^{2}}{b + c}.$$

This test statistic is approximately $\chi^{2}$-distributed with one degree of freedom.
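A minimal sketch of the uncorrected statistic and its approximate p-value, using only the Python standard library. The function names are assumptions made for this illustration. The tail probability relies on the fact that a $\chi^{2}$ variable with one degree of freedom is the square of a standard normal variable, so its upper tail can be written with the complementary error function.

```python
from math import erfc, sqrt

def mcnemar_chi2(b, c):
    """Uncorrected McNemar statistic (b - c)^2 / (b + c)."""
    if b + c == 0:
        raise ValueError("no discordant pairs: statistic is undefined")
    return (b - c) ** 2 / (b + c)

def chi2_sf_1df(x):
    """Upper tail P(X > x) of a chi-squared variable with 1 degree of freedom."""
    # For 1 degree of freedom, X = Z^2 with Z standard normal, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(x / 2))

stat = mcnemar_chi2(5, 16)  # counts from the car-free Sunday example
print(round(stat, 4), chi2_sf_1df(stat) < 0.05)  # 5.7619 True
```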

Yates correction

Since the frequencies are discrete, the test statistic $X^{'2}$ is also discrete. Since the $\chi^{2}$ distribution is continuous, there is an approximation error. To reduce this approximation error, Yates proposed a general continuity correction. This results in the following test statistic:

$${\hat{X}}^{2} = \frac{(|b - c| - 0.5)^{2}}{b + c}.$$

The subtrahend 0.5 is the so-called Yates correction. Assuming a symmetric distribution of the two variables or samples under test, reducing the absolute deviation $|b - c|$ by 0.5 improves the approximation of the calculated $\chi^{2}$-distributed test statistic to the results of Fisher's exact test.

The correction is needed especially for smaller samples ($b + c < 30$) and can be omitted for larger samples.

Edwards correction

The Yates correction was originally developed for 2×2 contingency tables. In the McNemar test, however, a 2×1 table is effectively considered (the cells b and c), and it can be shown that the test statistic above with the subtrahend 0.5 corrects too little: reducing each of the two deviations $|b - \tfrac{b + c}{2}|$ and $|c - \tfrac{b + c}{2}|$ by 0.5 corresponds to reducing $|b - c|$ by 1. This is why Edwards' correction is often used:

$${\hat{X}}^{*2} = \frac{(|b - c| - 1)^{2}}{b + c}.$$

In SPSS and R, for example, the McNemar test with continuity correction uses the Edwards correction. The size of the subtrahend in the continuity correction matters only for small sample sizes.
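The two continuity corrections differ only in the subtrahend, which the following Python sketch makes explicit. The function name `mcnemar_corrected` is hypothetical, and clamping the corrected deviation at zero (so that a correction larger than $|b - c|$ cannot inflate the statistic) is a choice made for this sketch, not something prescribed by the text above. The counts are those of the car-free Sunday example further below.

```python
def mcnemar_corrected(b, c, subtrahend):
    """McNemar statistic (|b - c| - subtrahend)^2 / (b + c).

    subtrahend = 0.5 gives the Yates form, subtrahend = 1 the Edwards form.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs: statistic is undefined")
    # Assumption of this sketch: never let the correction overshoot zero.
    deviation = max(abs(b - c) - subtrahend, 0.0)
    return deviation ** 2 / (b + c)

yates = mcnemar_corrected(5, 16, 0.5)
edwards = mcnemar_corrected(5, 16, 1.0)
print(round(yates, 4), round(edwards, 4))  # 5.25 4.7619
```

The Edwards form always yields the smaller statistic of the two and is therefore the more conservative choice.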

Procedure

	Sample 1 positive	Sample 1 negative
Sample 2 positive	a	b
Sample 2 negative	c	d

To test whether the frequencies in the two samples differ significantly, the difference between the two cells in which the samples gave different results (b and c in the example) is related to the sum of these two values. The test statistic determined in this way is compared with the quantile of the $\chi^{2}$ distribution for 1 degree of freedom at the chosen confidence level (usually a 95% confidence level, i.e. a 5% significance level). The exact formula is:

$${\hat{\mathrm{X}}}^{2} = \frac{(|b - c| - 0.5)^{2}}{b + c}$$

If the calculated test statistic is equal to or greater than the critical value of the $\chi^{2}$ distribution (3.84 for 1 degree of freedom and the 95% quantile), one can assume that there is a statistically significant difference between the two samples, i.e. that one result (positive or negative) occurs so frequently in one of the samples that a purely random difference can be ruled out with high confidence. At the 5% significance level, a difference at least this large would arise in at most about 5% of cases if the null hypothesis were true.

The test itself does not say whether this significance means an improvement or a deterioration, because the McNemar test is carried out only as a two-sided test: it checks whether there are changes, not whether frequencies have increased or decreased. However, the direction of the change can easily be read from the data, depending on whether the higher frequency occurs in cell b or in cell c.

For continuous data, or discrete data with too many categories, a median split (dichotomization at the median) is often used so that the data can be examined with the McNemar test.
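A median split could look like the following Python sketch. The function name `median_split`, the sample values, and in particular the use of the pooled median of both measurements as the cut point are assumptions made for this illustration; other cut points are possible.

```python
from statistics import median

def median_split(before, after):
    """Code paired measurements as 0/1 relative to the pooled median."""
    cut = median(list(before) + list(after))
    # Values strictly above the cut point are coded 1, all others 0.
    return [int(x > cut) for x in before], [int(x > cut) for x in after]

# Hypothetical paired measurements, e.g. a score before and after a treatment.
before = [12.0, 15.5, 9.0, 20.0, 14.0, 18.5]
after = [10.0, 13.0, 9.5, 16.0, 15.0, 12.5]
codes_before, codes_after = median_split(before, after)
print(codes_before)  # [0, 1, 0, 1, 1, 1]
print(codes_after)   # [0, 0, 0, 1, 1, 0]
```

The resulting 0/1 codes can then be tallied into the four-field table, of which only the discordant pairs enter the test.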

Example

Smokers

The aim is to investigate whether an anti-smoking campaign can successfully reduce the number of smokers. To do this, smoking status is first recorded in a random sample before and after the campaign. In the table above, sample 1 is the measurement before the campaign and sample 2 the measurement after it. To test whether the number of smokers has changed significantly, only the "changers" are of interest, i.e. the people whose smoking behavior changed between the two measurements. These frequencies are found in table cells b and c. If the campaign had no influence on smoking habits, then, owing to random or disruptive influences, there should be about as many smokers who become non-smokers as non-smokers who become smokers. It is precisely this basic idea that the McNemar test checks (see formula above).

A significant test statistic alone, however, does not allow the direct conclusion that the number of smokers has decreased, since, as noted above, the test is two-sided: the McNemar test only says that a change has taken place, not in which direction. That is, even if the campaign had significantly increased the number of smokers, the McNemar test would indicate a difference. To avoid such misinterpretation, one has to look more closely at the observed values of b and c. In this case, b would have to be clearly smaller than c, since c counts the smokers who have become non-smokers.

Car-free Sunday

	Opinion after car-free Sunday		Total
Opinion before car-free Sunday	For	Against
For	8	5	13
Against	16	11	27
Total	24	16	40

Before a car-free Sunday, 40 people were asked whether they are for or against a car-free Sunday. After the car-free Sunday, the same people are interviewed again (= related sample). The aim is to examine whether the experience of a car-free Sunday has caused a significant change in opinion. The 8 and 11 respondents whose opinion has not changed provide no information about possible changes of opinion. It is tested whether the changes from for to against and from against to for are balanced or not:

$$H_{0}: p_{\text{for} \rightarrow \text{against}} = p_{\text{against} \rightarrow \text{for}}$$

vs.

$$H_{1}: p_{\text{for} \rightarrow \text{against}} \neq p_{\text{against} \rightarrow \text{for}}$$

With $b = 5$ and $c = 16$, the following test values result:

$$v = \frac{(|5 - 16| - 0.5)^{2}}{5 + 16} = 5.2500$$

and

$$v^{*} = \frac{(|5 - 16| - 1)^{2}}{5 + 16} = 4.7619.$$

For a significance level of $\alpha = 5\,\%$, the critical value is $\chi_{1;0.95}^{2} = 3.84$. Since both test values, $v$ and $v^{*}$, are greater than the critical value, the null hypothesis is rejected in both cases. That is, there is a significant change in attitudes.

In the exact test, $B$: "number of changed opinions from for to against" is binomially distributed under the null hypothesis above, i.e. it follows a $B(n = b + c;\, p = 0.5)$ distribution (analogously for $C$). The critical values are 6 and 15, i.e. if $b$ or $c$ lies in the interval $[6; 15]$, the null hypothesis cannot be rejected. Since $b = 5$ lies outside this interval, the null hypothesis is rejected with the exact test as well.

Procedure	Calculated $p$-value
Exact test	0.0266
Edwards continuity correction with $-1$	0.0291
Yates continuity correction with $-0.5$	0.0219
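The critical values 6 and 15 of the exact test can be reproduced with a short Python sketch. The function name `exact_acceptance_interval` is hypothetical, and the sketch assumes that the significance level is split evenly between the two tails of the $B(n;\, 0.5)$ distribution.

```python
from math import comb

def exact_acceptance_interval(n, alpha=0.05):
    """Return (lo, hi) such that H0 is not rejected when lo <= B <= hi,
    for B ~ B(n, 0.5) and a two-sided test with alpha / 2 in each tail."""
    total = 2 ** n
    cumulative = 0
    lo = 0
    for k in range(n + 1):
        cumulative += comb(n, k)
        # lo is the smallest k whose lower tail P(B <= k) exceeds alpha / 2.
        if cumulative / total > alpha / 2:
            lo = k
            break
    # By symmetry of B(n, 0.5), the upper bound mirrors the lower one.
    return lo, n - lo

print(exact_acceptance_interval(5 + 16))  # (6, 15)
```

With b = 5 the observed count falls just outside this interval, which matches the exact p-value of 0.0266 in the table above.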

Literature

Christel Weiß: Basic knowledge of medical statistics. 3rd edition. Springer, Berlin 2005, ISBN 3-540-24072-1.

References

Quinn McNemar: Note on the sampling error of the difference between correlated proportions or percentages. In: Psychometrika. Vol. 12, no. 2, June 18, 1947, pp. 153-157, doi:10.1007/BF02295996, PMID 20254758.
F. Yates: Contingency tables involving small numbers and the χ² test. In: Journal of the Royal Statistical Society. 1, 1934, pp. 217-235, (Supplement), doi:10.2307/2983604, JSTOR 2983604.
F. Yates: Tests of significance for 2 × 2 contingency tables. In: Journal of the Royal Statistical Society. 147, 1984, pp. 426-463, (Series A), doi:10.2307/2981577, JSTOR i349611.
Catalina Stefanescu, Vance W. Berger, Scott Hershberger: Yates's continuity correction. In: B. Everitt, D. Howell (Eds.): The Encyclopedia of Behavioral Statistics. John Wiley & Sons, 2005 (london.edu [PDF]).
Allen L. Edwards: Note on the correction for continuity in testing the significance of the difference between correlated proportions. In: Psychometrika. Vol. 13, no. 3, 1948, pp. 185-187, doi:10.1007/BF02289261.
