In statistics, the prediction matrix is a symmetric and idempotent matrix and hence a projection matrix. The prediction matrix is sometimes called the hat matrix or roof matrix because it maps $y$ onto $\hat{y}$ (it "puts the hat on $y$"). Accordingly, it is denoted by either $P$ or $H$. The term "prediction matrix" was coined by Hoaglin & Welsch (1978) and Chatterjee & Hadi (1986) and stems from the fact that applying the matrix to the observed values $y$ generates the predicted values (the $\hat{y}$ values). Another matrix that is important in statistics is the residual matrix $Q = I - P$, which is defined through the prediction matrix and is also a projection matrix.
The regression hyperplane estimated using the least squares method is given by the sample regression function $\hat{y} = Xb$, where $b = (X^\top X)^{-1}X^\top y$ is the least squares estimator. The prediction matrix $P = X(X^\top X)^{-1}X^\top$ is the matrix of the orthogonal projection onto the column space of $X$ and has rank at most $p$ ($p$ is the number of parameters of the regression model). If $X$ has full column rank, then $\operatorname{rank}(P) = p$. Since $P$ is a projection matrix, $P^2 = P$ holds. The idempotency and symmetry properties ($P = P^2$ and $P = P^\top$) imply that $P$ is an orthogonal projector onto the column space of $X$. The direction of projection results from the matrix $Q = I - P$, whose columns are perpendicular to the column space of $X$. The matrix $P$ is called the prediction matrix because the predicted values $\hat{y}$ are obtained by multiplying the vector $y$ by this matrix on the left. This can be shown by inserting the least squares estimator as follows:
$\hat{y} = Xb = X(X^\top X)^{-1}X^\top y = Py$.
The predicted values (the $\hat{y}$ values) can thus be understood as a function of the observed $y$ values. Numerous statistical results can also be represented with the prediction matrix. For example, the residual vector can be represented by means of the prediction matrix as $\hat{\varepsilon} = y - \hat{y} = (I - P)y = Qy$. The (non-trivial) covariance matrix of the residual vector is $\operatorname{Cov}(\hat{\varepsilon}) = \sigma^2(I - P)$ and plays a role in the analysis of leverage values.
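As a concrete illustration, the following NumPy sketch (the random design matrix and all variable names are ours, purely for illustration) builds the prediction matrix for a small regression problem and checks that it reproduces the fitted values and the residual decomposition described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
# Illustrative design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # prediction (hat) matrix
Q = np.eye(n) - P                      # residual matrix
y_hat = P @ y                          # predicted values  y^ = P y
residuals = Q @ y                      # residual vector   (I - P) y

# The same fitted values come from the least squares estimator b
b, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(y_hat, X @ b)
assert np.allclose(y, y_hat + residuals)
```

Note that $P$ depends only on the design matrix $X$, not on $y$: once $X$ is fixed, the same matrix maps any response vector to its fitted values.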
Properties
Idempotence
The prediction matrix is idempotent. This can be interpreted to mean that "applying the regression twice leads to the same result". The idempotency property of the prediction matrix can be shown as follows:

$P^2 = X(X^\top X)^{-1}X^\top X(X^\top X)^{-1}X^\top = X(X^\top X)^{-1}X^\top = P$.
Symmetry
The prediction matrix is symmetric. The symmetry property of the prediction matrix can be shown as follows:

$P^\top = \left(X(X^\top X)^{-1}X^\top\right)^\top = X\left((X^\top X)^\top\right)^{-1}X^\top = X(X^\top X)^{-1}X^\top = P$.
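Both properties are easy to confirm numerically. A minimal sketch (the design matrix here is random and purely illustrative; any full-column-rank $X$ works):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))              # illustrative full-column-rank design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T     # prediction matrix

assert np.allclose(P @ P, P)             # idempotency: projecting twice changes nothing
assert np.allclose(P, P.T)               # symmetry
```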
Leverage values
The diagonal elements of the prediction matrix can be interpreted as leverage values and play a major role in regression diagnostics. They are given by

$h_{ii} = x_i^\top(X^\top X)^{-1}x_i$,

where $x_i^\top$ denotes the $i$-th row of the design matrix $X$.
These leverage values are used in calculating Cook's distance and can be used to identify influential observations. It holds that $1/n \le h_{ii} \le 1/r$, where $r$ represents the number of rows of the design matrix that are identical to $x_i^\top$. If all rows are different, then $1/n \le h_{ii} \le 1$ applies.
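The following sketch illustrates this numerically for a simple regression with one deliberately outlying $x$ value (the data and variable names are ours; the Cook's distance formula used is the textbook form $D_i = \hat{\varepsilon}_i^2 h_{ii} / \bigl(p\,s^2(1-h_{ii})^2\bigr)$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
X[-1, 1] = 8.0                              # one far-out x value -> high leverage
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.1, size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(P)                              # leverage values h_ii
e = y - P @ y                               # residual vector
s2 = e @ e / (n - p)                        # residual variance estimate

# Cook's distance from leverages and residuals
D = e**2 * h / (p * s2 * (1 - h) ** 2)

assert np.all(h >= 1.0 / n - 1e-12) and np.all(h <= 1.0)  # bounds (all rows differ)
assert np.isclose(h.sum(), p)               # trace(P) = rank(P) = p
```

The last assertion reflects a further standard property: the leverage values sum to $p$, since the trace of a projection matrix equals its rank.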
References
Samprit Chatterjee, Ali S. Hadi: Influential observations, high leverage points, and outliers in linear regression. In: Statistical Science, 1 (3), 1986, pp. 379–393, doi:10.1214/ss/1177013622, JSTOR 2245477.
Ludwig Fahrmeir, Thomas Kneib, Stefan Lang, Brian Marx: Regression: Models, Methods and Applications. Springer Science & Business Media, 2013, ISBN 978-3-642-34332-2, p. 108.