Behrens-Fisher problem

The Behrens-Fisher problem is a problem of mathematical statistics: testing whether the expected values of two normal distributions are equal when their variances are unknown and not assumed to be equal. Its exact solutions have been shown to have undesirable properties, which is why approximations are preferred.

Sought is a non-randomized similar test of the null hypothesis of equal expected values, $\mu_1 = \mu_2$, of two normally distributed populations whose variances $\sigma_1^2$ and $\sigma_2^2$ are unknown and are not assumed to be equal. Similarity of the test means that the null hypothesis, if it is valid, is rejected with exactly the probability $\alpha$, the given level of significance, no matter how large and how different the unknown variances $\sigma_1^2$ and $\sigma_2^2$ are. For reasons of test power, the following "Behrens-Fisher" test statistic is used:

$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means and $s_1$ and $s_2$ are the standard deviations of the two samples; $n_1$ and $n_2$ denote their respective sample sizes.
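As an illustration (not part of the original article), the statistic can be computed from two samples as follows; the arrays `x` and `y` are hypothetical data, and the use of NumPy is an assumption:

```python
import numpy as np

def behrens_fisher_statistic(x, y):
    """Behrens-Fisher test statistic T for two independent samples.

    Uses the sample means, the unbiased sample variances (ddof=1)
    and the sample sizes, as defined in the text above.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    s1_sq, s2_sq = x.var(ddof=1), y.var(ddof=1)
    return (x.mean() - y.mean()) / np.sqrt(s1_sq / n1 + s2_sq / n2)

# Hypothetical example data
x = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
y = [4.2, 4.9, 4.4, 4.7]
print(behrens_fisher_statistic(x, y))
```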

The Behrens-Fisher problem generalizes the t-test for two independent samples, which assumes that the variances of the two populations are equal.
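For comparison, the classical two-sample t-statistic with pooled variance (a standard formula added here for context; $s_p$ is the usual pooled standard deviation, not a symbol from the original article) is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\,\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1-1)\,s_1^2 + (n_2-1)\,s_2^2}{n_1 + n_2 - 2},$$

which under $\sigma_1^2 = \sigma_2^2$ follows Student's t-distribution with exactly $n_1 + n_2 - 2$ degrees of freedom when the null hypothesis holds.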

Origin

In 1935 Ronald Fisher introduced "fiducial inference" to solve this problem. He referred to an earlier work by Walter-Ulrich Behrens from 1929. Behrens and Fisher proposed determining the distribution of the test statistic $T$ defined above.

Fisher approximated this distribution by ignoring the randomness of the relative sizes of the sample standard deviations. As a result, the resulting test did not have the desired property of rejecting the null hypothesis with probability exactly $\alpha$ whenever it is true. This sparked the controversy that became known as the Behrens-Fisher problem.

Non-existence of a desirable solution

Linnik (1968, Theorem 8.3.1) showed that there is no continuous function, depending only on the quotient of the empirical variances of the sample means, $\dfrac{s_1^2/n_1}{s_2^2/n_2}$ (and, of course, on constants such as $n_1$, $n_2$ and the significance level $\alpha$), that describes the boundary between the acceptance and rejection regions of the Behrens-Fisher test statistic mentioned above. The boundary between the acceptance and rejection regions of any exact solution of the Behrens-Fisher problem is necessarily discontinuous in this quotient. Even more: an exact solution requires that the rejection region of the Behrens-Fisher test statistic contain neighborhoods of certain points, which is an intolerable property (Linnik, 1968). That Linnik relates the variance quotient to $s_1^2$ and $s_2^2$ rather than to $s_1^2/n_1$ and $s_2^2/n_2$ is not essential, since either formulation describes the problem in an equivalent manner.
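Read in symbols (my notation; $h$ is a hypothetical name for the boundary function), the result says that no continuous function $h$ of the quotient

$$Q = \frac{s_1^2/n_1}{s_2^2/n_2}$$

exists such that the test "reject $H_0\colon \mu_1 = \mu_2$ whenever $|T| > h(Q)$" has rejection probability exactly $\alpha$ under the null hypothesis for all values of the unknown variances $\sigma_1^2$ and $\sigma_2^2$.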

Best approximation using a non-convergent series approach

One work that Linnik (1968) never mentions is that of B. L. Welch (1947). Two decades earlier, Welch (1947), who, like Fisher, worked at University College London, proposed an approach to the exact solution of the Behrens-Fisher problem that would describe the boundary between the acceptance and rejection regions of the test statistic as a continuous function of the variance quotient. Welch (1947) gives this boundary for a given level of significance, initially for the empirical mean difference $\bar{x}_1 - \bar{x}_2$ as a function of the empirical variances $s_1^2$ and $s_2^2$, in the form of a partial differential equation of infinite order. He also describes a method for approximating the solution as precisely as desired using three Taylor expansions. The series expansion of this function shows that it can be factored into a product of the estimated standard deviation of the mean difference, $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, and a function that depends only on the variance quotient (and constants). The boundary standardized to the test statistic $T$ therefore depends, as desired, only on the variance quotient. If Welch's series approach converged uniformly, the resulting function would be infinitely differentiable and hence continuous, which would contradict Linnik's proof that no such function exists. It follows that Welch's approach cannot converge uniformly. Graphical representations of the function, developed to different orders, for very small as well as somewhat larger $n_1$ and $n_2$, make this conclusion appear quite credible, although for not too small $n_1$ and $n_2$ the results with regard to the smoothness of the function and the accuracy of the numerically computed type I error probabilities are remarkable. Aspin's (1948) development of Welch's series approach up to the fourth power of the reciprocal degrees of freedom provides by far the most accurate approximation, unless $n_1$ and $n_2$ are much smaller than usual. The resulting Welch-Aspin test is described in detail (in German) in Bachmaier (2000).
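In symbols, the factorization described above can be sketched as follows (my notation; $c_\alpha$ and $h$ are hypothetical names for the critical boundary and the quotient-dependent factor):

$$c_\alpha(s_1^2, s_2^2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\;\cdot\; h(Q), \qquad Q = \frac{s_1^2/n_1}{s_2^2/n_2},$$

so that $H_0$ is rejected when $|\bar{x}_1 - \bar{x}_2| > c_\alpha(s_1^2, s_2^2)$, i.e. when $|T| > h(Q)$.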

The approximation in the so-called Welch test

There are several approximate approaches to solving the Behrens-Fisher problem. One of the most widely used approximations (for example in Microsoft Excel ) also comes from Welch. The test based on this Welch approximation is also known as the Welch test .

The variance of the mean difference $\bar{x}_1 - \bar{x}_2$ is $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$. Welch (1938) approximated the distribution of its estimator, $\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}$, by a Pearson type III curve (a scaled chi-square distribution) whose first two moments (expectation and variance) agree with those of the estimator. This leads to the following number of degrees of freedom (df), which is generally not an integer:

$$\nu = \frac{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^{2}}{\dfrac{(\sigma_1^2/n_1)^2}{n_1 - 1} + \dfrac{(\sigma_2^2/n_2)^2}{n_2 - 1}}$$
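These degrees of freedom can be obtained by the usual moment-matching argument; a brief sketch (standard reasoning, not spelled out in the original text): for normal samples, $(n_i - 1)S_i^2/\sigma_i^2 \sim \chi^2_{n_i - 1}$, so with $S = S_1^2/n_1 + S_2^2/n_2$,

$$\operatorname{E}(S) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \qquad \operatorname{Var}(S) = \frac{2\,(\sigma_1^2/n_1)^2}{n_1 - 1} + \frac{2\,(\sigma_2^2/n_2)^2}{n_2 - 1}.$$

A scaled chi-square variable $g\,\chi^2_{\nu}$ has expectation $g\nu$ and variance $2g^2\nu$, so matching both moments yields $\nu = 2\,\operatorname{E}(S)^2 / \operatorname{Var}(S)$, which is exactly the formula above.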

If the null hypothesis of equal expected values, $\mu_1 = \mu_2$, is valid, the distribution of the Behrens-Fisher test statistic mentioned at the beginning, which depends slightly on the quotient of the standard deviations, $\sigma_1/\sigma_2$, could be approximated by Student's t-distribution with these degrees of freedom. However, $\nu$ still contains the variances of the populations, which are unknown. In the end, the following estimate of the degrees of freedom, obtained simply by replacing the population variances with the sample variances, has prevailed:

$$\hat{\nu} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

This estimate, however, turns the number of degrees of freedom into a random variable, and there is no t-distribution with a random number of degrees of freedom. This does not prevent one from comparing the test statistic with the corresponding quantiles of the t-distribution with the estimated degrees of freedom $\hat{\nu}$. In this way, an infinitely differentiable function of the empirical variances arises as the boundary between the acceptance and rejection regions of the test statistic $T$.
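A minimal sketch of this procedure in Python, assuming NumPy and SciPy are available; SciPy's `ttest_ind` with `equal_var=False` implements the same Welch approximation, so it is used here only as a cross-check:

```python
import numpy as np
from scipy import stats

def welch_test(x, y, alpha=0.05):
    """Two-sided Welch test: statistic T, estimated df, and rejection decision."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    v1, v2 = x.var(ddof=1) / n1, y.var(ddof=1) / n2   # s_i^2 / n_i
    t = (x.mean() - y.mean()) / np.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    reject = abs(t) > stats.t.ppf(1 - alpha / 2, df)
    return t, df, reject

# Hypothetical example data
x = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
y = [4.2, 4.9, 4.4, 4.7]
t, df, reject = welch_test(x, y)
print(t, df, reject)
print(stats.ttest_ind(x, y, equal_var=False))  # same t statistic, matching p-value
```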

This method does not hold the level of significance exactly, but it does not miss it by much. Only if the population variances $\sigma_1^2$ and $\sigma_2^2$ are identical, or, in the case of rather small sample sizes, can at least be assumed to be nearly identical, is Student's usual t-test the better choice.
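To illustrate the statement about the significance level, here is a hedged Monte Carlo sketch (sample sizes, variances and the number of replications are arbitrary choices, not taken from the article); it estimates the empirical type I error rates of the Welch test and of Student's t-test when the variances differ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, reps = 0.05, 20_000
n1, n2 = 10, 25            # arbitrary sample sizes
sigma1, sigma2 = 1.0, 3.0  # arbitrary, deliberately unequal standard deviations

reject_welch = reject_student = 0
for _ in range(reps):
    x = rng.normal(0.0, sigma1, n1)   # H0 is true: both means are 0
    y = rng.normal(0.0, sigma2, n2)
    if stats.ttest_ind(x, y, equal_var=False).pvalue < alpha:
        reject_welch += 1
    if stats.ttest_ind(x, y, equal_var=True).pvalue < alpha:
        reject_student += 1

print("Welch  :", reject_welch / reps)    # expected to be close to alpha
print("Student:", reject_student / reps)  # may deviate noticeably from alpha
```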

Literature