Testing general linear hypotheses

In test theory, the testing of general linear hypotheses (also called testing linear hypotheses, general linear hypothesis testing, or multiple hypothesis testing) is the generalization of test problems in regression models. In contrast to the t-test, this test procedure allows several null hypotheses concerning a group of parameters to be tested in linear single-equation models. Multiple hypothesis tests comprise, on the one hand, the F-test for the multiple regression model, which is characterized by the fact that the test statistic of the hypothesis test follows an F-distribution under the null hypothesis, and, on the other hand, the t-test for the multiple regression model. An ordinary t-test, by contrast, tests only a single equation.

Starting position

Since many variables of interest do not depend on just one independent variable, we consider a dependent variable that is to be explained by several independent variables. For example, the total production of an economy depends on its capital input, its labor input and its land area. Such a multiple dependency comes much closer to reality, and one abandons the assumption of simple linear regression, in which the variable of interest depends on only one variable. To model such a multiple dependency, we take as a starting point a typical multiple linear regression model or, more precisely, a classical linear model including the assumption of normally distributed errors, with given data for $T$ statistical units. It should be noted that, in addition to the dimension of the independent variables, we also integrate a time dimension, which results in a linear system of equations that can also be represented in matrix notation. The relationship between the dependent variable $y$ and the independent variables $x_1, x_2, \dots, x_k$ can be represented as follows

$$y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \dots + \beta_k x_{tk} + \varepsilon_t, \qquad t = 1, \dots, T.$$

In vector-matrix notation this reads

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{T1} & \cdots & x_{Tk} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{pmatrix},$$

or in compact notation

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$
Here $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_k)^\top$ represents the vector of unknown parameters (the regression coefficients) that must be estimated from the data. It is also assumed that the error terms are zero on average, $\operatorname{E}(\boldsymbol{\varepsilon}) = \mathbf{0}$, which means that the model can be assumed to be correct on average. The data matrix $\mathbf{X}$ is assumed to have full (column) rank, that is, $\operatorname{rank}(\mathbf{X}) = k + 1$. Furthermore, the covariance matrix of the errors is assumed to be $\operatorname{Cov}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}_T$. Finally, the Gauss–Markov assumptions are taken to hold, so that the above model can be estimated unbiasedly and efficiently by the method of least squares.
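
As a concrete illustration (the specific variables and symbols here are chosen only as an example and are not taken from the sources), the production relationship mentioned above fits this scheme when a Cobb–Douglas-type production function is written in logarithmic form, with capital input $K_t$, labor input $L_t$ and land area $A_t$ as regressors:

$$\ln y_t = \beta_0 + \beta_1 \ln K_t + \beta_2 \ln L_t + \beta_3 \ln A_t + \varepsilon_t, \qquad t = 1, \dots, T,$$

so that $k = 3$ and each coefficient can be interpreted as the elasticity of total production with respect to the corresponding input.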

General linear hypothesis

The most general null hypothesis comprises a number $q$ of linear restrictions on the coefficients. One can formulate this general linear hypothesis, i.e. the hypothesis whose rejection one is interested in, as

$$H_0\colon \mathbf{R}\boldsymbol{\beta} = \mathbf{r}$$

respectively

$$H_0\colon \begin{pmatrix} \mathbf{r}_1^\top \\ \mathbf{r}_2^\top \\ \vdots \\ \mathbf{r}_q^\top \end{pmatrix} \boldsymbol{\beta} = \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_q \end{pmatrix},$$

with the $(q \times (k+1))$ hypothesis matrix $\mathbf{R}$, the $(q \times 1)$ vector of restrictions $\mathbf{r}$, the vector of regression coefficients $\boldsymbol{\beta}$, and the $(1 \times (k+1))$ row vectors $\mathbf{r}_j^\top$ of $\mathbf{R}$. The test problem is then

$$H_0\colon \mathbf{R}\boldsymbol{\beta} = \mathbf{r} \qquad \text{versus} \qquad H_1\colon \mathbf{R}\boldsymbol{\beta} \neq \mathbf{r},$$

where it is assumed that $\operatorname{rank}(\mathbf{R}) = q \leq k + 1$ holds, i.e. that the $q$ restrictions are linearly independent. As usual, the vector of regression coefficients is assumed to be estimated by the method of least squares. The least squares estimator $\hat{\boldsymbol{\beta}}$ has the following efficiency property:

$$\operatorname{Var}(\mathbf{a}^\top \hat{\boldsymbol{\beta}}) \leq \operatorname{Var}(\mathbf{a}^\top \tilde{\boldsymbol{\beta}}) \quad \text{for every vector } \mathbf{a} \text{ and every linear unbiased estimator } \tilde{\boldsymbol{\beta}}.$$

This means that for any parameter constellation the least squares estimator is at least as good as any other linear unbiased estimator. This result is also known as the Gauss–Markov theorem.
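
For illustration (this concrete example is added here and is not taken from the sources): in a model with $k = 3$ regressors, the two restrictions $\beta_1 = \beta_2$ and $\beta_3 = 0$, i.e. $q = 2$, correspond to the hypothesis matrix and restriction vector

$$\mathbf{R} = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad \mathbf{r} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

so that $H_0\colon \mathbf{R}\boldsymbol{\beta} = \mathbf{r}$ expresses both restrictions jointly.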

t-test for the multiple regression model

Single equation model

In many cases one is only interested in testing a single hypothesis, e.g. a single linear combination of the regression coefficients. For example, one might be interested in testing whether the sum of the regression coefficients equals a certain value $c$, i.e.

$$H_0\colon \beta_1 + \beta_2 + \dots + \beta_k = c,$$

or in vector notation

$$H_0\colon \mathbf{a}^\top \boldsymbol{\beta} = c \qquad \text{with} \qquad \mathbf{a}^\top = (0, 1, 1, \dots, 1).$$

For example, if one were to test whether the sum of the coefficients equals $1$, i.e. $\beta_1 + \beta_2 + \dots + \beta_k = 1$, then from an economic point of view this would represent a test for constant returns to scale. First of all, the test statistic for this test must be constructed. One is therefore interested in the parameters of the distribution of the individual linear combination $\mathbf{a}^\top \hat{\boldsymbol{\beta}}$. For its distribution one obtains

$$\mathbf{a}^\top \hat{\boldsymbol{\beta}} \sim \mathcal{N}\!\left(\mathbf{a}^\top \boldsymbol{\beta},\; \sigma^2\, \mathbf{a}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{a}\right),$$

where $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$ represents the least squares estimator. Standardizing to the standard normal distribution yields the pivotal quantity

$$z = \frac{\mathbf{a}^\top \hat{\boldsymbol{\beta}} - \mathbf{a}^\top \boldsymbol{\beta}}{\sigma \sqrt{\mathbf{a}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{a}}} \sim \mathcal{N}(0, 1),$$

which is enclosed by the limits of the central fluctuation interval with probability $1 - \alpha$, i.e.

$$P\!\left(-z_{1-\alpha/2} \leq \frac{\mathbf{a}^\top \hat{\boldsymbol{\beta}} - \mathbf{a}^\top \boldsymbol{\beta}}{\sigma \sqrt{\mathbf{a}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{a}}} \leq z_{1-\alpha/2}\right) = 1 - \alpha,$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution. The problem with this expression is that the variance of the disturbances, $\sigma^2$, is usually unknown. If the unknown parameter is replaced by the unbiased estimator of the disturbance variance, $\hat{\sigma}^2 = \hat{\boldsymbol{\varepsilon}}^\top \hat{\boldsymbol{\varepsilon}} / (T - k - 1)$, the following distribution results for the pivotal quantity

$$t = \frac{\mathbf{a}^\top \hat{\boldsymbol{\beta}} - \mathbf{a}^\top \boldsymbol{\beta}}{\hat{\sigma} \sqrt{\mathbf{a}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{a}}} \sim t(T - k - 1).$$

The pivotal quantity is now $t$-distributed with $T - k - 1$ degrees of freedom instead of standard normally distributed. This results in the following probability statement for testing the single equation

$$P\!\left(-t_{1-\alpha/2}(T-k-1) \leq \frac{\mathbf{a}^\top \hat{\boldsymbol{\beta}} - \mathbf{a}^\top \boldsymbol{\beta}}{\hat{\sigma} \sqrt{\mathbf{a}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{a}}} \leq t_{1-\alpha/2}(T-k-1)\right) = 1 - \alpha$$

and thus the following confidence interval for $\mathbf{a}^\top \boldsymbol{\beta}$

$$\left[\mathbf{a}^\top \hat{\boldsymbol{\beta}} - t_{1-\alpha/2}(T-k-1)\, \hat{\sigma} \sqrt{\mathbf{a}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{a}}\;,\;\; \mathbf{a}^\top \hat{\boldsymbol{\beta}} + t_{1-\alpha/2}(T-k-1)\, \hat{\sigma} \sqrt{\mathbf{a}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{a}}\right].$$

Single-equation hypotheses can thus be handled not only with the F-test for the multiple regression model but alternatively also with a t-test.
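
For a single restriction, $q = 1$, the two approaches lead to the same test decision (a standard fact, noted here for clarity): the square of a $t(T-k-1)$-distributed quantity follows an $F(1, T-k-1)$-distribution, so that under $H_0\colon \mathbf{a}^\top \boldsymbol{\beta} = c$

$$t^2 = \frac{\left(\mathbf{a}^\top \hat{\boldsymbol{\beta}} - c\right)^2}{\hat{\sigma}^2\, \mathbf{a}^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{a}} \sim F(1,\, T - k - 1).$$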

F-test for the multiple regression model

Construction of the test statistic

To construct the test statistic, one uses the following result, which is easy to verify with the help of the unbiasedness of the least squares estimator and the calculation rules for covariance matrices:

$$\mathbf{R}\hat{\boldsymbol{\beta}} \sim \mathcal{N}\!\left(\mathbf{R}\boldsymbol{\beta},\; \sigma^2\, \mathbf{R}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{R}^\top\right),$$

i.e. in the present classical model $\mathbf{R}\hat{\boldsymbol{\beta}}$ follows a normal distribution with expected value $\mathbf{R}\boldsymbol{\beta}$ (equal to $\mathbf{r}$ under the null hypothesis) and covariance matrix $\sigma^2\, \mathbf{R}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{R}^\top$.

It can be shown that the weighted hypothesis sum of squares, divided by $\sigma^2$,

$$\frac{(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})^\top \left[\mathbf{R}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{R}^\top\right]^{-1} (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})}{\sigma^2}$$

follows a chi-square distribution with $q$ degrees of freedom under the null hypothesis. It measures how far the estimated value $\mathbf{R}\hat{\boldsymbol{\beta}}$ deviates from the null hypothesis value $\mathbf{r}$. The underlying sum of squared deviations $(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})^\top (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})$ is analogous to the residual sum of squares. This sum of squared deviations is weighted with the inverse of the covariance matrix, because deviations of a given size carry less evidence against the null hypothesis when the corresponding variances are large. Another important result needed to construct the test statistic is

$$\frac{\hat{\boldsymbol{\varepsilon}}^\top \hat{\boldsymbol{\varepsilon}}}{\sigma^2} = \frac{(T - k - 1)\, \hat{\sigma}^2}{\sigma^2} \sim \chi^2(T - k - 1).$$

Using the stochastic independence of these two quadratic forms, the test statistic now results as

$$F = \frac{(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})^\top \left[\mathbf{R}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{R}^\top\right]^{-1} (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r}) / q}{\hat{\boldsymbol{\varepsilon}}^\top \hat{\boldsymbol{\varepsilon}} / (T - k - 1)} \sim F(q,\, T - k - 1).$$

From this result it can be seen that the test statistic can alternatively be expressed as the quotient of the "mean hypothesis square"

$$MSH = \frac{(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})^\top \left[\mathbf{R}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{R}^\top\right]^{-1} (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})}{q}$$

and the "mean residual square"

$$MSR = \frac{\hat{\boldsymbol{\varepsilon}}^\top \hat{\boldsymbol{\varepsilon}}}{T - k - 1} = \hat{\sigma}^2,$$

i.e. it can be represented as

$$F = \frac{MSH}{MSR} \sim F(q,\, T - k - 1),$$

where $T - k - 1$ is the rank of the residual-generating matrix and $q$ is the rank of the hypothesis matrix. Dividing the sums of squares by $q$ (or $T - k - 1$) yields mean squared deviations. This is sensible, since larger deviations are to be expected for more hypotheses (or more observations). This test statistic provides the framework and basis for testing general linear hypotheses and for interval estimators for the unknown vector $\boldsymbol{\beta}$. As usual, the test statistic is sensitive to the test problem, i.e. if the deviation $\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r}$ is large relative to the error variance, this speaks against the null hypothesis $H_0\colon \mathbf{R}\boldsymbol{\beta} = \mathbf{r}$.
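
An important special case, which follows directly from the general formulation (it is added here as an example), is the test of overall significance, in which all slope coefficients are tested jointly against zero. With $\mathbf{R} = (\mathbf{0} \mid \mathbf{I}_k)$, $\mathbf{r} = \mathbf{0}$ and hence $q = k$, the null hypothesis reads

$$H_0\colon \beta_1 = \beta_2 = \dots = \beta_k = 0,$$

and the general F statistic above reduces to the familiar overall F-test of the regression.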

To finally carry out the test, one uses the corresponding quantile of the F-distribution. The null hypothesis is rejected if

$$F > F_{1-\alpha}(q,\, T - k - 1),$$

i.e. if the F statistic is greater than the critical value $F_{1-\alpha}(q, T - k - 1)$. The critical value can be read off from a quantile table of the F-distribution.
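
As a purely illustrative calculation (all numbers are assumed for this example and do not come from the sources): with $T = 25$ observations, $k = 2$ regressors and $q = 2$ restrictions, so that $T - k - 1 = 22$, suppose the weighted hypothesis sum of squares equals $4.2$ and the residual sum of squares equals $44$. Then

$$F = \frac{4.2 / 2}{44 / 22} = \frac{2.1}{2} = 1.05,$$

which is smaller than the critical value $F_{0.95}(2,\, 22) \approx 3.44$, so the null hypothesis is not rejected at the 5 % level.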

Multiple testing problem

The problem with multiple testing is that with each individual test at a significance level of 5 %, there is a 5 % probability of wrongly detecting the difference one is looking for (a Type I error, or alpha error), and these error probabilities accumulate when several null hypotheses are tested. The false discovery rate plays an important role in this context; for a test procedure it is defined as the expected proportion of incorrectly rejected null hypotheses among all rejected null hypotheses.
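
Written as a formula (the notation is introduced here for clarity): if $V$ denotes the number of incorrectly rejected null hypotheses and $R$ the total number of rejected null hypotheses, the false discovery rate is

$$\mathrm{FDR} = \operatorname{E}\!\left[\frac{V}{\max(R,\, 1)}\right],$$

where the maximum in the denominator ensures that the ratio is defined even when no null hypothesis is rejected.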
