Equivalence test

from Wikipedia, the free encyclopedia

Equivalence tests are a variation of hypothesis tests used to draw statistical conclusions from observed data. In an equivalence test, the null hypothesis is an effect large enough to be considered interesting, as specified by an equivalence limit. The alternative hypothesis is any effect less extreme than the equivalence limit. The observed data are compared statistically with the equivalence limits. If the test shows that the observed data would be surprising under the assumption that the true effect is at least as extreme as the equivalence limits, a Neyman-Pearson approach to statistical inference can be used to reject effect sizes larger than the equivalence limits with a pre-specified Type I error rate.
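Stated formally, with a lower equivalence limit $-\Delta_L$, an upper limit $\Delta_U$ (the notation of the TOST procedure) and $\Delta$ denoting the true effect, the hypotheses of a two-sided equivalence test can be sketched in the usual interval form:

```latex
\begin{aligned}
H_0 &: \Delta \le -\Delta_L \quad \text{or} \quad \Delta \ge \Delta_U \\
H_1 &: -\Delta_L < \Delta < \Delta_U
\end{aligned}
```

Rejecting $H_0$ at level $\alpha$ therefore supports the conclusion that the true effect is less extreme than both limits.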

Equivalence tests originate from the field of pharmacokinetics and drug development. One application is to show that a new drug that is cheaper than the available alternatives works as well as an existing drug. In essence, an equivalence test consists of calculating a confidence interval around an observed effect size and rejecting effects more extreme than the equivalence limit if the confidence interval does not overlap the equivalence limit. For two-sided tests, both an upper and a lower equivalence limit are specified. In non-inferiority studies, where the aim is to test the hypothesis that a new treatment is no worse than existing treatments, only a lower equivalence limit is pre-specified.
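The confidence-interval decision rule described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the interval has already been computed: for a two-sided equivalence test at level α this is conventionally the 100(1 − 2α)% interval (for example 90% for α = 0.05). For non-inferiority, the effect is assumed to be oriented so that larger values are better.

```python
def is_equivalent(ci_low: float, ci_high: float, lower: float, upper: float) -> bool:
    """Two-sided equivalence: the whole interval must lie strictly inside (lower, upper)."""
    return lower < ci_low and ci_high < upper


def is_non_inferior(ci_low: float, lower: float) -> bool:
    """Non-inferiority: only a lower limit is pre-specified, so only ci_low is checked."""
    return lower < ci_low


# Hypothetical 90% interval for a mean difference, with limits of -0.5 and 0.5.
print(is_equivalent(-0.2, 0.3, -0.5, 0.5))  # True
print(is_non_inferior(-0.2, -0.5))          # True
```

If the interval crosses either limit (for example `is_equivalent(-0.2, 0.6, -0.5, 0.5)`), equivalence cannot be concluded.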

Mean differences (black squares) and 90% confidence intervals (horizontal lines) with equivalence limits ΔL = −0.5 and ΔU = 0.5 for the four combinations of results that are statistically equivalent or not and statistically different from zero or not. Pattern A is statistically equivalent, pattern B is statistically different from 0, pattern C is practically insignificant (both statistically different from 0 and statistically equivalent), and pattern D is inconclusive (neither statistically different from 0 nor equivalent).

Equivalence tests can be performed in addition to null-hypothesis significance tests. This can prevent the common misinterpretation of p-values larger than the alpha level as support for the absence of a true effect. In addition, equivalence tests can identify effects that are statistically significant but practically insignificant, that is, effects that are statistically different from zero but also statistically smaller than any effect size considered worthwhile (see first figure).
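Running both tests side by side yields the four outcomes shown in the first figure. The helper below is hypothetical and assumes α = 0.05: it uses the 90% interval (as in the figure) for the equivalence decision and, as an additional assumption, the 95% interval for the two-sided test against zero.

```python
def classify(ci90, ci95, lower, upper):
    """Return the figure's pattern letter for one effect estimate.

    ci90, ci95: (low, high) confidence intervals; lower, upper: equivalence limits.
    """
    equivalent = lower < ci90[0] and ci90[1] < upper
    different_from_zero = ci95[0] > 0 or ci95[1] < 0
    if equivalent and different_from_zero:
        return "C"  # statistically significant but practically insignificant
    if equivalent:
        return "A"  # statistically equivalent
    if different_from_zero:
        return "B"  # statistically different from 0
    return "D"  # inconclusive


print(classify((0.1, 0.3), (0.05, 0.35), -0.5, 0.5))  # C
```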

TOST procedure

A very simple approach to equivalence testing is the two one-sided tests (TOST) procedure. In the TOST procedure, an upper ($\Delta_U$) and a lower ($-\Delta_L$) equivalence limit are specified based on the smallest effect size of interest (e.g. a positive or negative difference of $d = 0.3$). Two composite null hypotheses are tested: $H_{01}: \Delta \le -\Delta_L$ and $H_{02}: \Delta \ge \Delta_U$. If both one-sided null hypotheses are rejected, it can be concluded that $-\Delta_L < \Delta < \Delta_U$, that is, the observed effect lies within the equivalence limits and is statistically smaller than any effect considered worthwhile, and hence practically equivalent. Alternatives to the TOST procedure have also been developed. A recent modification of TOST extends the approach to repeated measurements and multivariate assessments.
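A minimal sketch of TOST for paired differences (or a single sample) follows. To stay within the Python standard library, the Student t distribution function is obtained by integrating the t density numerically with Simpson's rule; this is an assumption made for self-containedness, and in practice a statistics library's t distribution would be used. The names `t_cdf` and `tost_paired` and the example data are hypothetical; `lower` and `upper` stand for the limits $-\Delta_L$ and $\Delta_U$.

```python
import math
import statistics


def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """Student t CDF, integrating the density with Simpson's rule (numerical sketch)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def tost_paired(diffs, lower, upper, alpha=0.05):
    """TOST on paired differences; returns (TOST p-value, equivalence concluded?)."""
    n = len(diffs)
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    df = n - 1
    # H01: Delta <= lower is rejected when the mean lies far above the lower limit.
    p_lower = 1.0 - t_cdf((mean - lower) / se, df)
    # H02: Delta >= upper is rejected when the mean lies far below the upper limit.
    p_upper = t_cdf((mean - upper) / se, df)
    # Both tests must reject, so the overall p-value is the larger one (clamped to [0, 1]).
    p = min(1.0, max(0.0, p_lower, p_upper))
    return p, p < alpha


diffs = [0.10, -0.05, 0.02, 0.08, -0.03, 0.04, 0.00, 0.06]  # hypothetical data
p, equivalent = tost_paired(diffs, lower=-0.5, upper=0.5)
print(f"TOST p = {p:.2e}, equivalent: {equivalent}")
```

The TOST p-value is the larger of the two one-sided p-values. At α = 0.05 this decision agrees with checking whether the 90% confidence interval of the mean difference lies inside the equivalence limits.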

Comparison between t-test and equivalence test

The equivalence test can be "induced" from the t-test for comparison purposes. Suppose a t-test is carried out at significance level $\alpha_{\text{t-test}}$ and achieves a power of $1-\beta_{\text{t-test}}$ for an effect size $d_r$. Both tests then lead to the same conclusion if the parameters coincide as $\Delta = d_r$, $\alpha_{\text{equiv}} = \beta_{\text{t-test}}$ and $\beta_{\text{equiv}} = \alpha_{\text{t-test}}$; that is, the roles of the type I and type II errors are swapped between the t-test and the equivalence test. To guarantee this correspondence for the t-test, either the sample size must be planned correctly, or a corrected test must be obtained by adjusting the significance level $\alpha_{\text{t-test}}$. Both approaches have practical problems: sample size planning rests on unverifiable assumptions about the standard deviation, and adjusting $\alpha_{\text{t-test}}$ (the so-called revised t-test) raises numerical problems. These restrictions do not arise when the equivalence test is used.
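The mapping of parameters can be illustrated with a short sketch. It assumes a one-sided one-sample t-test and, for simplicity, uses the normal approximation to its power, $1-\beta_{\text{t-test}} \approx \Phi(d_r\sqrt{n} - z_{1-\alpha_{\text{t-test}}})$, instead of the exact noncentral t distribution. The function `induced_equivalence_params` is hypothetical and is not the implementation of Siebert and Ellenberger.

```python
from statistics import NormalDist


def induced_equivalence_params(n: int, alpha_t: float, d_r: float) -> dict:
    """Parameters of the equivalence test induced by a planned one-sided t-test.

    The t-test's type II error is approximated with the normal distribution.
    """
    std_normal = NormalDist()
    power_t = std_normal.cdf(d_r * n ** 0.5 - std_normal.inv_cdf(1 - alpha_t))
    beta_t = 1 - power_t
    # The roles of the two error types are swapped between the tests.
    return {"delta": d_r, "alpha_equiv": beta_t, "beta_equiv": alpha_t}


params = induced_equivalence_params(n=25, alpha_t=0.05, d_r=0.5)
print(params)  # alpha_equiv is about 0.2 for this hypothetical design
```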

The second figure compares the equivalence test and the t-test when sample size planning is affected by a discrepancy between the a priori standard deviation and the standard deviation observed in the sample. Using an equivalence test instead of a t-test ensures that $\alpha_{\text{equiv}}$ (or, equivalently, $\beta_{\text{t-test}}$) is controlled, which the t-test does not guarantee. In particular, the type II error of the t-test can then be of almost any size. Conversely, the t-test can turn out to be stricter than planned for $d_r$, which can lead to unintended disadvantages (e.g. for a device manufacturer). This makes the equivalence test the safer choice.
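The sensitivity to a misjudged standard deviation can be explored with the same normal approximation. In this hypothetical sketch, the relevant deviation is fixed in raw units (0.5) and the design is planned for a standard deviation of 1.0, while the true standard deviation varies; the function name and all numbers are illustrative assumptions.

```python
from statistics import NormalDist


def t_test_beta(n: int, alpha_t: float, delta_raw: float, sigma: float) -> float:
    """Approximate type II error of a one-sided one-sample t-test (normal approximation)."""
    std_normal = NormalDist()
    d = delta_raw / sigma  # standardized effect under the true standard deviation
    return 1 - std_normal.cdf(d * n ** 0.5 - std_normal.inv_cdf(1 - alpha_t))


# Planned for sigma = 1.0; compare with smaller and larger true standard deviations.
for sigma in (0.5, 1.0, 2.0, 4.0):
    print(f"sigma = {sigma}: beta_t = {t_test_beta(25, 0.05, 0.5, sigma):.3f}")
```

If the true standard deviation exceeds the planned one, the t-test's type II error grows well beyond its planned value; if it is smaller, the t-test becomes more powerful, and hence stricter, than planned for $d_r$. An equivalence test instead fixes $\alpha_{\text{equiv}}$ directly.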

Probability of passing the t-test (a) or the equivalence test (b), depending on the actual error ?, cf.


References

  1. Walter W. Hauck, Sharon Anderson: A new statistical procedure for testing equivalence in two-group comparative bioavailability trials. In: Journal of Pharmacokinetics and Biopharmaceutics. 12, No. 1, February 1, 1984, ISSN 0090-466X, pp. 83–91. doi:10.1007/BF01063612. PMID 6747820.
  2. James L. Rogers, Kenneth I. Howard, John T. Vessey: Using significance tests to evaluate equivalence between two experimental groups. In: Psychological Bulletin. 113, No. 3, 1993, pp. 553–565. doi:10.1037/0033-2909.113.3.553.
  3. Daniël Lakens: Equivalence tests. In: Social Psychological and Personality Science. 8, No. 4, May 5, 2017, pp. 355–362. doi:10.1177/1948550617697177. PMID 28736600.
  4. Donald J. Schuirmann: A comparison of the Two One-Sided Tests Procedure and the Power Approach for assessing the equivalence of average bioavailability. In: Journal of Pharmacokinetics and Biopharmaceutics. 15, No. 6, December 1, 1987, ISSN 0090-466X, pp. 657–680. doi:10.1007/BF01068419.
  5. Michael A. Seaman, Ronald C. Serlin: Equivalence confidence intervals for two-group comparisons of means. In: Psychological Methods. 3, No. 4, 1998, pp. 403–411. doi:10.1037/1082-989x.3.4.403.
  6. Stefan Wellek: Testing statistical hypotheses of equivalence and noninferiority. Chapman and Hall/CRC, 2010, ISBN 978-1439808184.
  7. Evangeline M. Rose, Thomas Mathew, Derek A. Coss, Bernard Lohr, Kevin E. Omland: A new statistical method to test equivalence: an application in male and female eastern bluebird song. In: Animal Behaviour. 145, 2018, ISSN 0003-3472, pp. 77–85. doi:10.1016/j.anbehav.2018.09.004.
  8. Michael Siebert, David Ellenberger: Validation of automatic passenger counting: introducing the t-test-induced equivalence test. In: Transportation. April 10, 2019, ISSN 0049-4488. doi:10.1007/s11116-019-09991-9.