# Selectivity of a test

The selectivity of a test, also called the quality, power (English *power*: might, performance, strength), test strength, or test severity (severity for short), describes the decision-making ability of a statistical test in test theory, a branch of mathematical statistics. In the context of assessing a binary classifier, the selectivity of a test is also referred to as sensitivity. Just like the level of a test, the selectivity of a test is a term derived from the quality function (selectivity function).

The power of a test indicates its ability to detect differences (effects) when they actually exist. More precisely, the selectivity is the probability with which a statistical test correctly rejects the null hypothesis $H_0$ ("there is no difference") when the alternative hypothesis $H_1$ ("there is a difference") is true. If the null hypothesis represents the absence of a specific disease ("not sick"), the alternative hypothesis the presence of the disease ("sick"), and the rejection of the null hypothesis a positive diagnostic test, then the selectivity of the test is equivalent to the sensitivity of the test (the probability that a sick person will have a positive test result). This fact provides a bridge between test theory and the theory of diagnostic testing.

The selectivity of the test can therefore be interpreted as its "power to reject". High selectivity speaks against, low selectivity for, the null hypothesis $H_0$. One tries to determine the rejection region $A$ in such a way that the probability of rejecting a "false null hypothesis" $H_0$, i.e. of maintaining the alternative hypothesis $H_1$ given that $H_1$ is true, is as large as possible: $\operatorname{Pr}(T \in A \mid H_1) = 1-\beta$. In order to calculate the selectivity of a test, the alternative hypothesis $H_1$ must be specified in the form of a concrete point hypothesis.

It forms the complement to the type II error probability $\beta$, i.e. the probability of wrongly deciding in favor of the null hypothesis $H_0$ when the alternative $H_1$ is valid. The selectivity itself is the probability $1-\beta$ of avoiding such an error.

## Description

For a type II error probability $\beta$, the corresponding selectivity is $1-\beta$. For example, if experiment E has a power of $0.7$ and experiment F has a power of $0.95$, then experiment E is more likely to produce a type II error than experiment F, and experiment F is more reliable than experiment E because of its lower type II error probability. Equivalently, the selectivity of a test can be viewed as the probability that a statistical test correctly rejects the null hypothesis $H_0$ ("there is no difference") when the alternative hypothesis $H_1$ ("there is a difference") is true, i.e.

$\text{Selectivity} = \Pr\left(\text{reject } H_0 \mid H_1 \text{ is true}\right) = 1 - \beta$.

It can thus be seen as the ability of a test to detect a specific effect when that specific effect is actually present. If $H_1$ is not a point hypothesis but only the negation of $H_0$ (for example, for an unobservable population parameter $\mu$ one would simply have the negation $H_1\colon \mu \neq 0$ of $H_0\colon \mu = 0$), then the selectivity of the test cannot be calculated unless the probabilities for all possible values of the parameter that violate the null hypothesis are known. One therefore generally refers to the selectivity of a test against a specific alternative hypothesis (point hypothesis).

As the selectivity $1-\beta$ increases, the probability of a type II error $\beta$ decreases. A related concept is the type I error probability $\alpha$. The smaller $\beta$ is for a given type I error probability $\alpha$, the more sharply the test separates $H_0$ and $H_1$. A test is called selective if, for a given $\alpha$, it has a relatively high selectivity compared with other possible tests. If $H_0$ is true, the maximum power of a test equals $\alpha$.

Selectivity analyses (power analyses) can be used to calculate the minimum sample size required to detect an effect of a certain size (effect size) with sufficient probability. Example: "How many times do I have to flip a coin to conclude that it has been tampered with to some extent?"
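The coin-flip question can be answered by such a power analysis. Below is a minimal Python sketch using the standard normal approximation for a two-sided test of a proportion; the alternative heads probabilities, the significance level $\alpha = 0.05$, and the target power of $0.8$ are illustrative assumptions, not values from the text:

```python
from math import ceil, sqrt
from statistics import NormalDist

def coin_sample_size(p1: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Minimum number of flips needed to detect a coin whose heads
    probability is p1 instead of 0.5, via the two-sided normal
    approximation for a one-sample proportion test."""
    p0 = 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for power 1 - beta
    n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1)))
         / (p1 - p0)) ** 2
    return ceil(n)

# A mild bias (0.6 instead of 0.5) needs roughly 200 flips at 80 % power;
# a strong bias (0.7) needs far fewer.
print(coin_sample_size(0.6), coin_sample_size(0.7))
```

The calculation illustrates the general pattern discussed below: the smaller the effect to be detected, the larger the sample required for the same selectivity.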

In the context of assessing a binary classifier , the selectivity of a test is also referred to as sensitivity .

## Decision table

| Decision | Reality: $H_0$ is true | Reality: $H_1$ is true |
| --- | --- | --- |
| … for $H_0$ | Correct decision (specificity), probability: $1-\alpha$ | Type II error, probability: $\beta$ |
| … for $H_1$ | Type I error, probability: $\alpha$ | Correct decision (selectivity of the test), probability: $1-\beta$ |
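The correspondence between this decision table and the confusion matrix of a binary classifier can be made concrete: the sensitivity is exactly the probability $1-\beta$ of correctly rejecting $H_0$ for a sick patient, and the specificity the probability $1-\alpha$ of correctly retaining $H_0$ for a healthy one. A minimal sketch with made-up illustrative counts:

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Selectivity 1 - beta of a diagnostic test: the share of truly
    sick cases (H1 true) in which H0 is correctly rejected."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives: int, false_positives: int) -> float:
    """Probability 1 - alpha of correctly keeping H0 when H0 is true."""
    return true_negatives / (true_negatives + false_positives)

# Illustrative counts: of 100 sick patients the test flags 85,
# of 100 healthy patients it wrongly flags 5.
print(sensitivity(85, 15), specificity(95, 5))
```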

## Choice of the β-error level

*Figure: Influence of the sample size on the quality function (selectivity) of a one-sided (here left-sided) test.*
*Figure: Influence of the sample size on the quality function (selectivity) of a two-sided test.*

For studies of the effectiveness of medical treatments, Cohen (1969: 56) suggests a value for $\beta$ that is four times as high as that for the significance level $\alpha$. So if $\alpha = 5\,\%$, the error level $\beta$ should be 20 %. If the probability $\beta$ of a type II error in an investigation lies below this 20 % limit, the selectivity $1-\beta$ is thus greater than 80 %.

It should be borne in mind that the $\beta$ error generally cannot be controlled directly at a given, fixed significance level $\alpha$. In many asymptotic or nonparametric tests the $\beta$ error is simply not predictable, or only simulation studies exist. In contrast, with some tests, for example the t-test, the $\beta$ error can be controlled if the statistical evaluation is preceded by sample-size planning.

An equivalence test (induced from the parameters of the t-test) can be used to control the (t-test) $\beta$ error independently of sample-size planning. In this case the (t-test) significance level $\alpha$ is variable.

## Determining factors of selectivity

There are several ways to increase the power of a test. The selectivity $1-\beta$ increases:

• with increasing difference $(\mu_0 - \mu_1)$ (this means: a large difference between two subpopulations is less often overlooked than a small difference)
• with decreasing feature spread $\sigma$
• with increasing significance level $\alpha$ (if $\beta$ is not fixed)
• with increasing sample size $n$, since the standard error $\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}$ then becomes smaller: smaller effects can be detected with a larger sample size
• for one-sided tests compared to two-sided tests: for the two-sided test, a sample size approximately $25\,\%$ larger is needed to achieve the same selectivity as for the one-sided test
• through the use of the best, i.e. most selective (English *most powerful*), test
• by reducing the scatter in the data, e.g. through the use of filters or the choice of homogeneous subgroups (stratification)
• by increasing the sensitivity of the measuring process (strengthening the effects, e.g. through higher dosages)
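For a one-sided z-test (a simplifying stand-in for the t-test with known variance), these determinants can be read off directly from the power formula $1-\beta = \Phi\!\left(\frac{\Delta\mu}{\sigma}\sqrt{n} - z_{1-\alpha}\right)$. A minimal Python sketch; all numerical values are illustrative assumptions:

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(mu_diff: float, sigma: float, n: int,
                    alpha: float = 0.05) -> float:
    """Power 1 - beta of a one-sided z-test for a mean difference mu_diff,
    feature spread sigma, sample size n, and significance level alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # The standard error sigma / sqrt(n) shrinks as n grows.
    return NormalDist().cdf(mu_diff * sqrt(n) / sigma - z_alpha)

# Power grows with the effect, with n, with alpha, and with smaller sigma:
p_base = power_one_sided(0.5, 1.0, 25)
assert power_one_sided(0.8, 1.0, 25) > p_base        # larger difference
assert power_one_sided(0.5, 1.0, 50) > p_base        # larger sample
assert power_one_sided(0.5, 0.5, 25) > p_base        # smaller spread
assert power_one_sided(0.5, 1.0, 25, 0.10) > p_base  # larger alpha
```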

The type of statistical test is also important for the selectivity or power: if the distributional assumption holds, parametric tests such as the t-test always have, at the same sample size, a higher selectivity than nonparametric tests such as the Wilcoxon signed-rank test. However, if the assumed and the true distribution differ from one another, for example if the data actually follow a Laplace distribution while a normal distribution was assumed, nonparametric methods can also have a considerably higher selectivity than their parametric counterparts.
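This contrast can be illustrated by a small Monte Carlo simulation. The sketch below is deliberately simplified: the Wilcoxon signed-rank test is replaced by the even simpler (and also distribution-free) sign test, the t-test by a z-test with known unit variance, and the shift of 0.3, sample size 50, and simulation count are arbitrary illustrative choices:

```python
import random
from math import comb, log, sqrt
from statistics import NormalDist, mean

def laplace(mu: float, scale: float) -> float:
    """Draw from a Laplace distribution via inverse transform sampling."""
    u = random.random() - 0.5
    return mu - scale * (1 if u >= 0 else -1) * log(1 - 2 * abs(u))

def empirical_power(draw, n: int = 50, sims: int = 3000) -> tuple:
    """Monte Carlo power of a one-sided mean (z) test and a sign test
    for H0: mu = 0 against mu > 0 at alpha = 0.05."""
    z_crit = NormalDist().inv_cdf(0.95)
    # Smallest k with P(Bin(n, 0.5) >= k) <= 0.05: sign-test critical value.
    k_crit = min(k for k in range(n + 1)
                 if sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n <= 0.05)
    z_hits = sign_hits = 0
    for _ in range(sims):
        x = [draw() for _ in range(n)]
        if mean(x) * sqrt(n) >= z_crit:        # assumes unit variance
            z_hits += 1
        if sum(v > 0 for v in x) >= k_crit:    # distribution-free
            sign_hits += 1
    return z_hits / sims, sign_hits / sims

random.seed(1)
# Same variance (1) and same shift (0.3) under both distributions.
z_nrm, sign_nrm = empirical_power(lambda: random.gauss(0.3, 1.0))
z_lap, sign_lap = empirical_power(lambda: laplace(0.3, 1 / sqrt(2)))
```

Under normal data the mean-based test has the higher empirical power; under Laplace data with the same variance the distribution-free sign test comes out ahead, mirroring the statement above.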

## Opposite notation

In some sources, which can cause confusion, the exact opposite notation is used for the type II error and the selectivity: there, the probability of committing a type II error is denoted by the value $1-\beta$, whereas the selectivity is denoted by $\beta$.

## Web links

Wiktionary: Power – explanations of meaning, word origin, synonyms, translations

## References

1. Ludwig Fahrmeir, Rita Künstler, Iris Pigeot, Gerhard Tutz: Statistics. The way to data analysis. 8th, revised and expanded edition. Springer Spektrum, Berlin/Heidelberg 2016, ISBN 978-3-662-50371-3, p. 393.
2. Otfried Beyer, Horst Hackel: Probability calculation and mathematical statistics. 1976, p. 154.
3. Ludwig von Auer: Econometrics. An introduction. 6th, revised and updated edition. Springer, 2013, ISBN 978-3-642-40209-8, p. 128.
4. Otfried Beyer, Horst Hackel: Probability calculation and mathematical statistics. 1976, p. 154.
5. This is true because $1-\beta = 1 - \frac{f_{\text{n}}}{f_{\text{n}} + r_{\text{p}}} = \frac{f_{\text{n}} + r_{\text{p}}}{f_{\text{n}} + r_{\text{p}}} - \frac{f_{\text{n}}}{f_{\text{n}} + r_{\text{p}}} = \frac{r_{\text{p}}}{f_{\text{n}} + r_{\text{p}}} = \text{Sensitivity}$. For the meaning of the notation, see the truth matrix: right and wrong classifications.
6. Frederick J. Dorey: Statistics in Brief: Statistical Power: What Is It and When Should It Be Used? (2011), pp. 619–620.
7. Ludwig von Auer: Econometrics. An introduction. 6th, revised and updated edition. Springer, 2013, ISBN 978-3-642-40209-8, p. 128.
8. Lothar Sachs, Jürgen Hedderich: Applied Statistics: Collection of Methods with R. 8th, revised and expanded edition. Springer Spektrum, Berlin/Heidelberg 2018, ISBN 978-3-662-56657-2, p. 461.
9. J. Bortz: Statistics for social scientists. Springer, Berlin 1999, ISBN 3-540-21271-X.
10. Erwin Kreyszig: Statistical methods and their applications. 7th edition. Göttingen 1998, pp. 209 ff.