# Regression with stochastic regressors

Regression with stochastic regressors comprises statistical analysis techniques for detecting how a statistical quantity depends on other variables, called regressors, when those regressors are themselves random. In classical regression models (e.g. simple linear regression, multiple linear regression) it is generally assumed that the regressors are non-random and often even controllable quantities. In many practical cases, especially in econometric models, this assumption cannot be maintained, and one has to work with random, i.e. stochastic, regressors. Of particular interest is how stochastic regressors affect the properties of the estimators (e.g. least squares estimators) and of significance tests. In short, the properties known from classical regression models are retained (at least approximately) as long as the stochastic regressors are uncorrelated with the disturbance terms (exogeneity). If they are correlated (endogeneity), other methods are generally required.

## Examples

### First-order autoregressive process (AR(1))

The first-order autoregressive process is a simple model of time series analysis and has the form

$Y_t = \beta_0 + \beta_1 Y_{t-1} + \varepsilon_t, \quad t = 1, \dots, n$,

where $\varepsilon_t$ is white noise. The regressor at time $t$ is the random regressand $Y_{t-1}$ from time $t-1$.
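A minimal simulation sketch of this model, assuming NumPy, Gaussian white noise and illustrative parameter values (none of which come from the text). The helper names `simulate_ar1` and `fit_ar1_ols` are hypothetical. It shows that the regressor column is simply the lagged series and is therefore random.

```python
import numpy as np

def simulate_ar1(beta0, beta1, n, sigma=1.0, seed=0):
    """Simulate Y_t = beta0 + beta1 * Y_{t-1} + eps_t for t = 1..n; returns Y_0..Y_n."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n + 1)  # Gaussian white noise (an assumption)
    y = np.empty(n + 1)
    # Start near the stationary mean beta0 / (1 - beta1); requires |beta1| < 1.
    y[0] = beta0 / (1.0 - beta1) + eps[0]
    for t in range(1, n + 1):
        y[t] = beta0 + beta1 * y[t - 1] + eps[t]
    return y

def fit_ar1_ols(y):
    """Least squares regression of Y_t on a constant and the lagged value Y_{t-1}."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # stochastic regressor Y_{t-1}
    b, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return b

y = simulate_ar1(beta0=1.0, beta1=0.5, n=5000)
b0_hat, b1_hat = fit_ar1_ols(y)
print("estimated intercept and slope:", b0_hat, b1_hat)
```

With a long series the estimates should lie close to the chosen values of $\beta_0$ and $\beta_1$.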

### Errors-in-variables model

In the simplest case, a simple linear regression model is given:

$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, \dots, n$,

but $x_i$ can only be observed with a random error $u_i$, i.e. one then has the stochastic regressor $z_i = x_i + u_i$. Such models are called errors-in-variables models.

### Simultaneous equations

As an example, consider the Keynesian consumption function with two simultaneous equations:

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad X_i = Y_i + I_i, \quad i = 1, \dots, n$.

Here $Y_i$ denotes consumption, $X_i$ income and $I_i$ investment. Substituting the first equation into the second yields

$X_i = \frac{1}{1 - \beta_1} (\beta_0 + I_i + \varepsilon_i)$,

i.e. $X_i$ is random because it depends on $\varepsilon_i$.
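The dependence of income on the disturbance can be made visible with a small simulation. The following is only a sketch: the parameter values, the normal distributions and the independence of investment from the disturbance are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1, sigma = 10.0, 0.6, 2.0   # illustrative values, not from the text
sd_invest = 3.0                        # assumed standard deviation of investment

invest = rng.normal(20.0, sd_invest, size=n)  # I_i, drawn independently of eps_i
eps = rng.normal(0.0, sigma, size=n)          # disturbance eps_i

# Reduced form for income: X_i = (beta0 + I_i + eps_i) / (1 - beta1)
income = (beta0 + invest + eps) / (1.0 - beta1)
# Consumption function: Y_i = beta0 + beta1 * X_i + eps_i
consumption = beta0 + beta1 * income + eps

# Sample covariance of regressor and disturbance; under these
# assumptions the population value is sigma^2 / (1 - beta1).
cov_x_eps = np.cov(income, eps)[0, 1]

# Least squares slope of consumption on income, and its large-sample
# limit under the assumptions above.
slope_ols = np.cov(income, consumption)[0, 1] / np.var(income, ddof=1)
plim_ols = beta1 + (1.0 - beta1) * sigma**2 / (sd_invest**2 + sigma**2)

print("Cov(X, eps):", cov_x_eps)
print("OLS slope:", slope_ols, "large-sample limit:", plim_ols, "true beta1:", beta1)
```

In this setup the least squares slope settles above the chosen $\beta_1$ no matter how large the sample is, because $X_i$ and $\varepsilon_i$ move together.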

## General case

We consider a multiple linear regression model in vector-matrix form

$Y = X\beta + \varepsilon$.

Here $Y$ is the $n$-dimensional random vector of regressands, $X$ the $(n \times r)$ matrix of regressors, $\beta$ the $r$-dimensional parameter vector to be estimated, and $\varepsilon$ the $n$-dimensional random vector of disturbances with $\operatorname{E}\varepsilon = 0$ and $\operatorname{Cov}\varepsilon = \sigma^2 I_n$. It is assumed that the data matrix $X$ has full rank with probability 1, i.e. $\operatorname{P}\left[\operatorname{Rank}(X) = r\right] = 1$. The least squares estimator for $\beta$ has the form

$b = (X^T X)^{-1} X^T Y = \beta + (X^T X)^{-1} X^T \varepsilon$.

Since one can write $b = \beta + A\varepsilon$ with $A = (X^T X)^{-1} X^T$, $b$ is a linear function of the disturbances, which makes $b$ a linear estimator.
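The estimator and its decomposition into $\beta$ plus a linear function of the disturbances can be checked numerically. This is a minimal sketch assuming NumPy; the dimensions, the parameter vector and the normally distributed data are made up for illustration. In a simulation $\beta$ and $\varepsilon$ are known, which is what makes the comparison possible.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 200, 3
beta = np.array([1.0, -2.0, 0.5])   # illustrative parameter vector

# Data matrix with an intercept column and r - 1 random regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, r - 1))])
eps = rng.normal(0.0, 1.0, size=n)
Y = X @ beta + eps

# Full column rank is needed for (X^T X)^{-1} to exist
assert np.linalg.matrix_rank(X) == r

# b = (X^T X)^{-1} X^T Y, computed by solving the normal equations
# instead of forming the inverse explicitly
b = np.linalg.solve(X.T @ X, X.T @ Y)

# A = (X^T X)^{-1} X^T, so that b = beta + A eps
A = np.linalg.solve(X.T @ X, X.T)
b_from_decomposition = beta + A @ eps

print(b)
print(np.allclose(b, b_from_decomposition))
```

Solving the normal equations is used here because it mirrors the formula. For ill-conditioned data, `np.linalg.lstsq` is usually the more robust choice.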

### Non-random regressors

In this standard case, it is known that

• $b$ is the best linear unbiased estimator (BLUE) with $\operatorname{Cov} b = \sigma^2 (X^T X)^{-1}$.
• If the average square of the observed values of the explanatory variables remains finite even as the sample size goes to infinity, i.e. $\lim_{n \to \infty} \frac{1}{n} X^T X = Q$ with positive definite $Q$, then $b$ is consistent for $\beta$.
• If the disturbance $\varepsilon$ is normally distributed, then $b$ is also normally distributed, and $t$- or $F$-distributed test statistics can be formed.
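The unbiasedness and the covariance formula for a fixed design can be illustrated with a Monte Carlo sketch. The design points, parameter values and number of replications below are assumptions for illustration only. The data matrix is held fixed while new disturbances are drawn in every replication.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, reps = 50, 1.5, 20_000
beta = np.array([2.0, 0.7])            # illustrative parameters

# Non-random design: the same X is reused in every replication
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
A = np.linalg.solve(X.T @ X, X.T)      # A = (X^T X)^{-1} X^T

# Draw all disturbances at once; row k holds the sample of replication k
eps = rng.normal(0.0, sigma, size=(reps, n))
Y = X @ beta + eps                     # broadcasting adds X beta to each row
B = Y @ A.T                            # row k is the estimate b of replication k

mean_b = B.mean(axis=0)                            # should be close to beta
cov_b_mc = np.cov(B, rowvar=False)                 # empirical covariance of b
cov_b_theory = sigma**2 * np.linalg.inv(X.T @ X)   # sigma^2 (X^T X)^{-1}

print(mean_b)
print(cov_b_mc)
print(cov_b_theory)
```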

### Exogeneity of the regressors

This means that the regressors $X$ are stochastic but uncorrelated with the disturbance term $\varepsilon$. In the errors-in-variables example above, one has exogeneity when $u_i$ and $\varepsilon_i$ are uncorrelated. Then:

• $b$ is still BLUE with $\operatorname{Cov} b = \sigma^2 \operatorname{E}\left[(X^T X)^{-1}\right]$.
• If $\frac{1}{n} X^T X$ converges in probability to a positive definite matrix $Q$, then $b$ is consistent for $\beta$.
• If $\varepsilon$ is normally distributed, then $b$ is asymptotically normally distributed. The classical test statistics can be used for large $n$.
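The covariance formula in the first item can be traced with a short derivation. It is only a sketch, and it rests on assumptions that are stronger than mere uncorrelatedness and are not stated above: $\operatorname{E}[\varepsilon \mid X] = 0$ and $\operatorname{Cov}[\varepsilon \mid X] = \sigma^2 I_n$. Conditioning on $X$ and writing $A = (X^T X)^{-1} X^T$ gives

$\operatorname{E}[b \mid X] = \beta + A \operatorname{E}[\varepsilon \mid X] = \beta$,

$\operatorname{Cov}[b \mid X] = A \operatorname{Cov}[\varepsilon \mid X] A^T = \sigma^2 (X^T X)^{-1} X^T X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$.

By the law of total covariance,

$\operatorname{Cov} b = \operatorname{E}\left[\operatorname{Cov}[b \mid X]\right] + \operatorname{Cov}\left[\operatorname{E}[b \mid X]\right] = \sigma^2 \operatorname{E}\left[(X^T X)^{-1}\right] + 0$,

so the fixed-design expression $\sigma^2 (X^T X)^{-1}$ is replaced by its expectation over the distribution of $X$.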

### General stochastic regressors

Here $X$ and $\varepsilon$ are correlated, as for example in the Keynesian consumption function. Then $b$ is biased and no longer consistent for $\beta$. The classical test statistics cannot be used, and in principle other methods must be chosen.

For time series models, when an ARMA model more general than the AR(1) example above is present, there are special, partly recursive least-squares methods that generally lead to non-linear least-squares estimators.

Under the heading of simultaneous equations one finds the method of instrumental variables, which includes, for example, the two-stage least squares (2SLS) estimator and the generalized method of moments (GMM) estimator.
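As an illustration of the instrumental variables idea, the following sketch applies two-stage least squares to simulated data from the Keynesian consumption function, using investment as the instrument. It assumes NumPy, illustrative parameter values, and investment drawn independently of the disturbance. The function name `two_stage_least_squares` is hypothetical and covers only the case of one endogenous regressor and one instrument.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
beta0, beta1, sigma = 10.0, 0.6, 2.0          # illustrative values

invest = rng.normal(20.0, 3.0, size=n)        # instrument, assumed independent of eps
eps = rng.normal(0.0, sigma, size=n)
income = (beta0 + invest + eps) / (1.0 - beta1)   # endogenous regressor X_i
consumption = beta0 + beta1 * income + eps        # regressand Y_i

def two_stage_least_squares(y, x, z):
    """2SLS with intercept for one endogenous regressor x and one instrument z."""
    Z = np.column_stack([np.ones_like(z), z])
    X = np.column_stack([np.ones_like(x), x])
    # Stage 1: project the regressors onto the space spanned by the instruments
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: regress y on the fitted regressors
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

b_2sls = two_stage_least_squares(consumption, income, invest)

# Ordinary least squares on the same data, for comparison
X_ols = np.column_stack([np.ones(n), income])
b_ols = np.linalg.lstsq(X_ols, consumption, rcond=None)[0]

print("2SLS (intercept, slope):", b_2sls)
print("OLS  (intercept, slope):", b_ols)
```

Under these assumptions the 2SLS slope should lie close to the chosen $\beta_1$, while the ordinary least squares slope stays away from it even in large samples.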

## References

1. Schneeweiß, H.: Ökonometrie, Physica-Verlag 1990 (4th edition), Chapter 7 (3rd edition 1978)
2. Schönfeld, P.: Methods of Econometrics, Volume II, Stochastic Regressors and Simultaneous Equations, Vahlen (Munich) 1971
3. Greene, W. H.: Econometric Analysis, Prentice Hall 2002 (5th edition), page 42
4. Greene, W. H.: Econometric Analysis, Prentice Hall 2002 (5th edition), Theorem 4.3
5. Verbeek, M.: A Guide to Modern Econometrics, Wiley 2004 (2nd edition), page 34
6. Brockwell, P. J. and Davis, R. A.: Time Series: Theory and Methods, Springer 1991 (2nd edition)
7. Greene, W. H.: Econometric Analysis, Prentice Hall 2002 (5th edition), Chapter 15