Test theory (statistics)

From Wikipedia, the free encyclopedia

In addition to estimation theory, test theory is a central part of mathematical statistics and deals with the construction and investigation of statistical tests. Such tests attempt to answer questions such as:

  • Does a new drug really work better than the older, well-studied preparation?
  • Is climate change anthropogenic or not?
  • Will building a factory in a new location pay off within ten years or not?

On the one hand, the modeling and construction of a test play a role; on the other hand, so does the question of which quality criteria a test should satisfy and whether a test meeting them even exists.

For the following explanations, note that in test situations there is an asymmetry between the decisions to be made for or against a hypothesis. In the drug test mentioned above, a decision in favor of the new drug, even though it is worse than the existing one, would have significantly more dramatic consequences (severe harm to patients, high costs for possible compensation claims, wasted launch costs, loss of image, ...) than the opposite wrong decision (a missed market opportunity). This asymmetry is reflected in the modeling: a type I error should be avoided if possible, that is, its probability should be bounded. This motivates the following definitions.

Basic concepts

Null hypothesis and alternative

A (not necessarily parametric) statistical model $(X, \mathcal{A}, (P_\theta)_{\theta \in \Theta})$ is given. Here $X$ contains the values that the data can take, the σ-algebra $\mathcal{A}$ describes which subsets of $X$ are assigned a probability, and $(P_\theta)_{\theta \in \Theta}$ is a family of probability measures. The index set $\Theta$ is decomposed into two disjoint sets $\Theta_0$ and $\Theta_1$, so that $\Theta = \Theta_0 \cup \Theta_1$ with $\Theta_0 \cap \Theta_1 = \emptyset$. Here $\Theta_0$ is called the null hypothesis and $\Theta_1$ the alternative.

The central question of test theory is now: Suppose the data follow some $P_\theta$ with unknown $\theta \in \Theta$, and data $x \in X$ are given. How can one make the best possible statement about whether $\theta \in \Theta_0$ or $\theta \in \Theta_1$ holds?
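
As a concrete illustration (an assumed example, not part of the original text), the question of whether a coin is fair can be phrased in this framework. With $n$ tosses and the number of heads as data, the model and the hypothesis split read:

```latex
X = \{0, 1, \dots, n\}, \qquad
\mathcal{A} = \mathcal{P}(X), \qquad
P_\theta = \operatorname{Bin}(n, \theta), \qquad
\Theta = [0, 1],

\Theta_0 = \{\tfrac{1}{2}\} \ \text{(the coin is fair)}, \qquad
\Theta_1 = [0, 1] \setminus \{\tfrac{1}{2}\} \ \text{(the coin is biased)}.
```

The observed number of heads $x$ is then the basis for deciding between $\Theta_0$ and $\Theta_1$.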

It should be noted that the roles of the null hypothesis and the alternative can also be interchanged if the question posed changes.

Statistical test

The statistical test formalizes the decision to be made. Here, 0 stands for "acceptance of the null hypothesis" and 1 for "acceptance of the alternative"; values between 0 and 1 correspond to the probability of choosing the alternative. Mathematically, a test is a measurable function

$\varphi: X \to [0, 1]$

which delivers the decision $\varphi(x)$ when the data $x$ are available. One then speaks of a test of $\Theta_0$ against $\Theta_1$. The set

$\{x \in X : \varphi(x) = 1\}$

is called the rejection region of the test; it contains exactly those data for which the alternative is chosen.

A test is called a non-randomized test if $\varphi(x) \in \{0, 1\}$ for all $x \in X$; otherwise it is called a randomized test. Non-randomized tests therefore always deliver an unambiguous decision.

Type I and type II errors

Given a test $\varphi$, one can make a mistake in two ways. A type I error is the decision for $\Theta_1$ although $\theta \in \Theta_0$ holds. In the notation of conditional probability,

$P(\varphi = 1 \mid \Theta_0)$

is then the probability of a type I error. Similarly, one speaks of a type II error if one decides in favor of $\Theta_0$ although $\theta \in \Theta_1$ holds. The probability of a type II error is thus

$P(\varphi = 0 \mid \Theta_1).$

Power function, level, and power

For a given test $\varphi$, the function

$\beta_\varphi: \Theta \to [0, 1], \quad \beta_\varphi(\theta) = E_\theta[\varphi]$

is called the power function (quality function) of the test. Here $E_\theta$ denotes the expected value with respect to the probability measure $P_\theta$.

If an $\alpha \in (0, 1)$ is given such that

$\beta_\varphi(\theta) \le \alpha$ for all $\theta \in \Theta_0$,

then $\varphi$ is called a test of level $\alpha$. If even

$\sup_{\theta \in \Theta_0} \beta_\varphi(\theta) = \alpha$

holds, then $\alpha$ is called the effective level of the test. The effective level of the test is thus the smallest upper bound for the probability of a type I error.

For $\theta \in \Theta_1$, the value $\beta_\varphi(\theta)$ is called the power of the test at the point $\theta$. It corresponds to the probability of not committing a type II error when the parameter $\theta$ is present.
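
For the one-sided Gauss test (an assumed example, not from the article) the power function can be written in closed form: with $\varphi = 1\{x > c\}$ and $X \sim N(\theta, 1)$ one has $\beta_\varphi(\theta) = 1 - \Phi(c - \theta)$, where $\Phi$ is the standard normal distribution function.

```python
import math

# Assumed example: power function of the test 1{x > C} in the
# model X ~ N(theta, 1), i.e. beta(theta) = 1 - Phi(C - theta).
C = 1.6449  # threshold chosen so the effective level is about 0.05


def phi_cdf(z: float) -> float:
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power(theta: float) -> float:
    """beta(theta) = P_theta(X > C) = 1 - Phi(C - theta)."""
    return 1.0 - phi_cdf(C - theta)


print(round(power(0.0), 3))  # value on the null hypothesis: the level, 0.05
print(round(power(2.0), 3))  # power at theta = 2, about 0.64
```

Since the power function is increasing here, its supremum over $\Theta_0 = \{\theta \le 0\}$ is attained at $\theta = 0$, so the effective level equals $\beta_\varphi(0)$.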

Optimality terms for tests

Various optimality concepts of differing strength can be formulated for tests. The stronger the optimality concept, the more restrictive the conditions under which an optimal test exists. In addition, reduction principles are often formulated (see below) in order to have to search for optimal tests only within a smaller class of tests.

Uniformly most powerful tests

A uniformly most powerful test is a test at a given level whose power at every point of the alternative is at least as large as that of every other test at the same level. With a uniformly most powerful test, the probability of a type II error is therefore never larger than with any other such test.

The central existence statement for uniformly most powerful tests is the Neyman-Pearson lemma. It states that the Neyman-Pearson test, which compares two simple hypotheses by means of their likelihood ratio, is a uniformly most powerful test. Under suitable conditions (e.g. in the case of monotone likelihood ratios), this result can be extended to more general test problems.
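
A Neyman-Pearson test for two simple hypotheses can be sketched as follows (the concrete densities and level are an assumed example): test $H_0: X \sim N(0, 1)$ against $H_1: X \sim N(1, 1)$, rejecting when the likelihood ratio $p_1(x)/p_0(x)$ exceeds a constant $k$.

```python
import math

# Assumed example: Neyman-Pearson test of N(0,1) against N(1,1).
# The likelihood ratio is exp(x - 1/2), which is increasing in x,
# so thresholding the ratio is equivalent to thresholding x itself.


def likelihood_ratio(x: float) -> float:
    """p1(x) / p0(x) for the two normal densities (constants cancel)."""
    p0 = math.exp(-x * x / 2.0)
    p1 = math.exp(-((x - 1.0) ** 2) / 2.0)
    return p1 / p0


def np_test(x: float, k: float) -> int:
    """Reject H0 (return 1) iff the likelihood ratio exceeds k."""
    return 1 if likelihood_ratio(x) > k else 0


# For level 0.05, choose k so that P0(ratio > k) = 0.05; since the
# ratio equals exp(x - 1/2), this means k = exp(c - 1/2) with c the
# 95% standard normal quantile (about 1.6449).
k = math.exp(1.6449 - 0.5)
print(np_test(2.0, k), np_test(1.0, k))  # prints: 1 0
```

The monotonicity of the likelihood ratio is exactly what the extension to monotone likelihood ratio families exploits: the same one-sided threshold test is then uniformly most powerful against every alternative on one side.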

Maximin tests

Maximin tests are tests whose worst-case probability of a type II error is at least as small as that of every other test at the given level. The great advantage of maximin tests is that they exist under much more general conditions than uniformly most powerful tests.

Most stringent tests

Most stringent tests are tests for which the maximum deviation of the power from the envelope power function (the pointwise supremum over the powers of all tests at the given level) is smaller than for every other test at that level. Like maximin tests, most stringent tests exist even under weak conditions.

Reduction principles

Reduction principles are procedures that make it possible to search for optimal elements within smaller classes of tests. An important reduction principle is the restriction to unbiased tests. These are tests at a given level whose power on the alternative never falls below the level. Unbiased tests are thus always at least as good as the "naive" test that makes a purely random decision. Similar tests are an important tool for finding uniformly most powerful unbiased tests; for these, the power function takes exactly the value of the level on the transition from the null hypothesis to the alternative.

Test theory as a decision problem

Many optimality concepts and reduction principles of test theory can be placed within the framework of decision theory as a statistical decision problem and compared with one another.

As in test theory, the basis of the statistical decision problem is a statistical model $(X, \mathcal{A}, (P_\theta)_{\theta \in \Theta})$ together with a decision space, which in test theory is always $D = \{0, 1\}$. The decision functions are then precisely the statistical tests, with the randomized tests corresponding to the randomized decision functions and the non-randomized tests corresponding to the non-randomized decision functions.

A typical choice for the loss function is the Neyman-Pearson loss function which, weighting type I and type II errors equally, yields the risk function

$R(\theta, \varphi) = \begin{cases} E_\theta[\varphi] & \text{if } \theta \in \Theta_0 \\ 1 - E_\theta[\varphi] & \text{if } \theta \in \Theta_1 \end{cases}$

for a statistical test $\varphi$. Here $E_\theta[\varphi]$ and $1 - E_\theta[\varphi]$ denote the probabilities of a type I and a type II error, respectively, when $\theta$ is the true parameter.
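
This risk function can be evaluated directly for a concrete test. The sketch below uses an assumed example (not from the article): the one-sided Gauss test $\varphi = 1\{x > 1.6449\}$ in the model $X \sim N(\theta, 1)$ with $\Theta_0 = \{\theta \le 0\}$ and $\Theta_1 = \{\theta > 0\}$.

```python
import math

# Assumed example: Neyman-Pearson risk of the test 1{x > C} for
# X ~ N(theta, 1). On the null hypothesis the risk is the type I
# error probability E_theta[phi]; on the alternative it is the
# type II error probability 1 - E_theta[phi].
C = 1.6449


def phi_cdf(z: float) -> float:
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def risk(theta: float) -> float:
    power = 1.0 - phi_cdf(C - theta)  # E_theta[phi]
    return power if theta <= 0 else 1.0 - power


print(round(risk(0.0), 3))  # risk on the null: type I probability, 0.05
print(round(risk(2.0), 3))  # risk on the alternative: type II probability
```

Plotting this risk over $\theta$ makes the trade-off visible: pushing the threshold up lowers the risk on the null hypothesis but raises it on the alternative.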

If one restricts the set of tests to the tests at level $\alpha$ and uses the above risk function, then the maximin tests are exactly the minimax decision functions.
