Projection matrix (statistics)

In statistics, a projection matrix is a symmetric and idempotent matrix. All eigenvalues of a projection matrix are either 0 or 1, and its rank and trace are identical. The only nonsingular projection matrix is the identity matrix; all other projection matrices are singular. The most important projection matrices in statistics are the prediction matrix $\boldsymbol{P}$ and the residual-generating matrix (or residual matrix) $\boldsymbol{Q} = \boldsymbol{I} - \boldsymbol{P}$. They are examples of an orthogonal projection in the sense of linear algebra, where, for a given projection matrix $\boldsymbol{P}$, every vector $y$ of a vector space with a scalar product can be uniquely decomposed as $y = \boldsymbol{P} y + (\boldsymbol{I} - \boldsymbol{P}) y$. Another important projection matrix in statistics is the centering matrix.
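As a concrete illustration, the following sketch checks these properties numerically for the centering matrix. The formula $\boldsymbol{C} = \boldsymbol{I} - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the usual definition of the centering matrix and is not spelled out in the text above; the sample vector, the rank tolerance and all variable names are arbitrary choices made for illustration.

```python
import numpy as np

n = 5
ones = np.ones((n, 1))

# Centering matrix C = I - (1/n) 1 1^T (standard definition, assumed here).
C = np.eye(n) - ones @ ones.T / n

# Arbitrary example vector.
y = np.array([2.0, 4.0, 4.0, 5.0, 10.0])

# Defining properties of a projection matrix: symmetric and idempotent.
is_symmetric = np.allclose(C, C.T)
is_idempotent = np.allclose(C @ C, C)

# Decomposition y = (I - C) y + C y: a constant mean part plus a centered part.
mean_part = (np.eye(n) - C) @ y
centered_part = C @ y
decomposes = np.allclose(mean_part + centered_part, y)

# Rank and trace coincide; an explicit tolerance guards against rounding noise.
rank_C = np.linalg.matrix_rank(C, tol=1e-8)
trace_C = np.trace(C)

print(is_symmetric, is_idempotent, decomposes, rank_C, round(trace_C, 10))
```

Here $\boldsymbol{C}$ plays the role of $\boldsymbol{I} - \boldsymbol{P}$ for the projection $\boldsymbol{P} = \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$ onto the constant vectors.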

Starting position

As a starting point, consider a typical multiple linear regression model with given data $\{y_i, x_{ik}\}_{i = 1, \dots, n,\; k = 1, \dots, K}$ for $n$ statistical units and $K$ regressors. The relationship between the dependent variable and the independent variables can be written as

$$y_i = \beta_0 + x_{i1} \beta_1 + x_{i2} \beta_2 + \ldots + x_{iK} \beta_K + \varepsilon_i = \mathbf{x}_i^\top \boldsymbol{\beta} + \varepsilon_i, \quad i = 1, 2, \dotsc, n.$$

In matrix notation, this becomes

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}_{(n \times 1)} \quad = \quad \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1K} \\ 1 & x_{21} & x_{22} & \cdots & x_{2K} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nK} \end{pmatrix}_{(n \times p)} \quad \cdot \quad \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}_{(p \times 1)} \quad + \quad \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}_{(n \times 1)}$$

with $p = K + 1$. In compact notation,

$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$

Here $\boldsymbol{\beta}$ represents a vector of unknown parameters (known as regression coefficients) that must be estimated from the data. It is also assumed that the error terms have zero mean, $\mathbb{E}[\boldsymbol{\varepsilon}] = \mathbf{0}$, which means that the model is assumed to be correct on average.
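The model in matrix form can be sketched with simulated data as follows. The sample size, the coefficient values, the noise scale and the random seed are hypothetical choices for illustration only; the design matrix is built with a leading column of ones for the intercept $\beta_0$.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

n, K = 50, 2   # n statistical units, K regressors (illustrative values)
p = K + 1      # number of columns of X, including the intercept column

x = rng.normal(size=(n, K))            # regressor values x_ik
X = np.column_stack([np.ones(n), x])   # data matrix of shape (n, p)

beta = np.array([1.0, 2.0, -0.5])      # hypothetical "true" coefficients
eps = rng.normal(scale=0.3, size=n)    # error terms with mean zero

# Compact notation: y = X beta + eps
y = X @ beta + eps

print(X.shape, y.shape)
```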

Prediction matrix

One of the most important projection matrices in statistics is the prediction matrix. The prediction matrix is defined as follows

$$\boldsymbol{P} \equiv \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \quad \text{with} \quad \boldsymbol{P} \in \mathbb{R}^{n \times n},$$

where $\mathbf{X}$ is the data matrix, assumed to have full column rank $p$ so that the inverse exists. The diagonal elements of the prediction matrix $\boldsymbol{P}$ are denoted $p_{ii}$ and can be interpreted as leverage values.
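A minimal numerical sketch of this definition, assuming a simulated data matrix with full column rank (sizes, seed and variable names are arbitrary). The product is formed with `np.linalg.solve` instead of an explicit inverse, which is a common numerical choice rather than part of the definition, and the fitted values are compared against an ordinary least-squares fit from `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

n, K = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # shape (n, K + 1)
y = rng.normal(size=n)                                       # arbitrary response

# P = X (X^T X)^{-1} X^T, computed by solving (X^T X) Z = X^T for Z.
P = X @ np.linalg.solve(X.T @ X, X.T)

# Multiplying y by P gives the fitted values of the least-squares regression.
y_hat = P @ y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Leverage values are the diagonal elements p_ii.
leverage = np.diag(P)

print(P.shape, np.allclose(y_hat, X @ beta_hat), leverage.sum())
```

For large $n$ the full $n \times n$ matrix is rarely formed in practice; it is built here only because the example is small.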

Residual-generating matrix

The residual-generating matrix (also called the residual maker matrix or simply the residual matrix) is defined as follows

$$\boldsymbol{Q} = \left(\mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top\right) = \left(\mathbf{I} - \boldsymbol{P}\right),$$

where $\boldsymbol{P}$ is the prediction matrix. The name residual-generating matrix comes from the fact that multiplying this projection matrix by the vector $\mathbf{y}$ yields the residual vector $\hat{\boldsymbol{\varepsilon}}$. This can be expressed compactly in terms of the prediction matrix as follows

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \boldsymbol{P} \mathbf{y} = \left(\mathbf{I} - \boldsymbol{P}\right) \mathbf{y} = \boldsymbol{Q} \mathbf{y}.$$
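The identity $\hat{\boldsymbol{\varepsilon}} = \boldsymbol{Q}\mathbf{y}$ can be checked numerically. The sketch below uses simulated data (coefficients, noise level and seed are hypothetical) and compares the residuals obtained from $\boldsymbol{Q}$ with those of a least-squares fit computed independently via `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed

n = 15
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(scale=0.1, size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)  # prediction matrix
Q = np.eye(n) - P                      # residual-generating matrix

# Residuals obtained by applying Q to y ...
residuals_via_Q = Q @ y

# ... and residuals from an explicit least-squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals_direct = y - X @ beta_hat

print(np.allclose(residuals_via_Q, residuals_direct))
```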

In linear models, the rank and trace of a projection matrix are identical. The following applies to the rank of the residual-generating matrix

$$\begin{aligned} \operatorname{rank}(\boldsymbol{Q}) &= \operatorname{tr}(\boldsymbol{Q}) \\ &= \operatorname{tr}(\mathbf{I} - \boldsymbol{P}) \\ &= \sum\nolimits_{i=1}^{n} (1 - p_{ii}) \\ &= n - \sum\nolimits_{i=1}^{n} p_{ii} \\ &= n - \operatorname{tr}(\boldsymbol{P}) \\ &= n - \operatorname{rank}(\boldsymbol{P}) \\ &= n - p \\ &= n - (K + 1) \end{aligned}$$
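The chain of equalities above can be reproduced numerically. This is a sketch on a simulated data matrix with full column rank; the sizes and seed are arbitrary, and the explicit tolerance passed to `np.linalg.matrix_rank` is an assumption chosen to absorb floating-point noise in singular values that are zero in exact arithmetic.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

n, K = 12, 3
p = K + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])

P = X @ np.linalg.solve(X.T @ X, X.T)
Q = np.eye(n) - P

# Numerical rank with an explicit tolerance.
rank_P = np.linalg.matrix_rank(P, tol=1e-8)
rank_Q = np.linalg.matrix_rank(Q, tol=1e-8)

# The trace is the sum of the diagonal elements p_ii and 1 - p_ii respectively.
trace_P = np.trace(P)
trace_Q = np.trace(Q)

print(rank_P, round(trace_P, 10), rank_Q, round(trace_Q, 10))
```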

Idempotence

The idempotence of the residual-generating matrix can be shown as follows

$$\begin{aligned} \boldsymbol{Q}^2 &= \boldsymbol{Q} \cdot \boldsymbol{Q} \\ &= \left(\mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top\right) \left(\mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top\right) \\ &= \mathbf{I} \mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{I} + \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \\ &= \mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top + \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \\ &= \mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \\ &= \left(\mathbf{I} - \boldsymbol{P}\right) \\ &= \boldsymbol{Q} \qquad \Box \end{aligned}$$
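The same result can be reached in fewer steps. The following is a sketch that first establishes the idempotence of the prediction matrix $\boldsymbol{P}$ and then expands $(\mathbf{I} - \boldsymbol{P})^2$:

```latex
\begin{aligned}
\boldsymbol{P}^2
  &= \mathbf{X}\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{X}^\top
     \mathbf{X}\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{X}^\top
   = \mathbf{X}\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{X}^\top
   = \boldsymbol{P}, \\
\boldsymbol{Q}^2
  &= \left(\mathbf{I} - \boldsymbol{P}\right)^2
   = \mathbf{I} - 2\boldsymbol{P} + \boldsymbol{P}^2
   = \mathbf{I} - 2\boldsymbol{P} + \boldsymbol{P}
   = \mathbf{I} - \boldsymbol{P}
   = \boldsymbol{Q}.
\end{aligned}
```

The first line uses only the cancellation $\left(\mathbf{X}^\top\mathbf{X}\right)^{-1}\mathbf{X}^\top\mathbf{X} = \mathbf{I}$.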

Symmetry

The symmetry of the residual-generating matrix follows directly from the symmetry of the prediction matrix and can be shown as follows, using the fact that the inverse of the symmetric matrix $\mathbf{X}^\top \mathbf{X}$ is itself symmetric

$$\begin{aligned} \boldsymbol{Q}^\top &= \left(\mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top\right)^\top \\ &= \mathbf{I}^\top - \left(\left(\mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1}\right) \left(\mathbf{X}^\top\right)\right)^\top \\ &= \mathbf{I} - \left(\mathbf{X}^\top\right)^\top \left(\mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1}\right)^\top \\ &= \mathbf{I} - \mathbf{X} \left(\left(\mathbf{X}^\top \mathbf{X}\right)^{-1}\right)^\top \mathbf{X}^\top \\ &= \mathbf{I} - \mathbf{X} \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \\ &= \left(\mathbf{I} - \boldsymbol{P}\right) \\ &= \boldsymbol{Q} \qquad \Box \end{aligned}$$

Other properties

The projection matrix has a wealth of useful algebraic properties. In the language of linear algebra, the projection matrix is an orthogonal projection onto the column space of the data matrix $\mathbf{X}$. Further properties of the projection matrices are summarized below:

$\mathbf{u} = (\mathbf{I} - \mathbf{P}) \mathbf{y}$, and $\mathbf{u} = \mathbf{y} - \mathbf{P} \mathbf{y} \perp \mathbf{X}$
$\mathbf{X}$ is invariant under $\mathbf{P}$: $\mathbf{PX} = \mathbf{X}$, and consequently $\left(\mathbf{I} - \mathbf{P}\right) \mathbf{X} = \mathbf{0}$
$\left(\mathbf{I} - \mathbf{P}\right) \mathbf{P} = \mathbf{P} \left(\mathbf{I} - \mathbf{P}\right) = \mathbf{0}$ ("Applying the regression to the residuals yields $\hat{y} = 0$")
$\mathbf{P}$ is unique for a given subspace
All eigenvalues of a projection matrix are either 0 or 1
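The listed properties, except uniqueness, lend themselves to a quick numerical check. The sketch below uses a simulated data matrix and response (sizes, seed and the dictionary keys are arbitrary illustrative choices) and tests each identity up to floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed

n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

I = np.eye(n)
P = X @ np.linalg.solve(X.T @ X, X.T)
u = (I - P) @ y  # residual vector

checks = {
    # u is orthogonal to every column of X
    "u_orthogonal_to_X": np.allclose(X.T @ u, 0.0),
    # X is invariant under P, hence (I - P) X = 0
    "PX_equals_X": np.allclose(P @ X, X),
    "QX_is_zero": np.allclose((I - P) @ X, 0.0),
    # (I - P) P = P (I - P) = 0
    "QP_is_zero": np.allclose((I - P) @ P, 0.0) and np.allclose(P @ (I - P), 0.0),
}

# Eigenvalues of the symmetrized P, returned in ascending order by eigvalsh:
# n - p zeros followed by p ones.
eigvals = np.linalg.eigvalsh((P + P.T) / 2)
checks["eigenvalues_0_or_1"] = np.allclose(
    eigvals, np.concatenate([np.zeros(n - p), np.ones(p)])
)

for name, ok in checks.items():
    print(name, ok)
```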

Applications

Estimation of the variance parameter using least squares estimation

The residual sum of squares, abbreviated SQR (from the German Summe der Quadrate der Restabweichungen; in English, sum of squared residuals, abbreviated SSR), is given in matrix notation by

$$SQR := \hat{\boldsymbol{\varepsilon}}^\top \hat{\boldsymbol{\varepsilon}} = \mathbf{y}^\top (\mathbf{I} - \mathbf{P})^\top (\mathbf{I} - \mathbf{P}) \mathbf{y} = \mathbf{y}^\top \boldsymbol{Q} \boldsymbol{Q} \mathbf{y} = \mathbf{y}^\top \boldsymbol{Q} \mathbf{y},$$

where the last two steps use the symmetry and the idempotence of $\boldsymbol{Q}$.

This can also be written as

$$SQR = \hat{\boldsymbol{\varepsilon}}^\top \hat{\boldsymbol{\varepsilon}} = \| \mathbf{y} - \hat{\mathbf{y}} \|_2^2 = \sum\limits_{i=1}^{n} (y_i - \hat{y}_i)^2.$$

An unbiased estimate of the variance of the disturbance variables is the "mean residual square":

$$\hat{\sigma}^2 = \frac{SQR}{n - p} = \frac{\sum\nolimits_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - p}.$$

With the help of the residual-generating matrix, the estimate of the variance of the error terms can also be written as

$$\hat{\sigma}^2 = \frac{\mathbf{y}^\top \boldsymbol{Q} \mathbf{y}}{n - p} = \frac{\mathbf{y}^\top \boldsymbol{Q} \mathbf{y}}{\operatorname{rank}(\boldsymbol{Q})}.$$
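A short sketch of this estimator on simulated data. The true error standard deviation `sigma`, the coefficients, the sample size and the seed are hypothetical values chosen for illustration; the code computes the residual sum of squares both as the quadratic form $\mathbf{y}^\top \boldsymbol{Q} \mathbf{y}$ and as a plain sum of squared residuals, then divides by $n - p$.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed

n, K = 200, 2
p = K + 1
sigma = 0.5  # hypothetical true standard deviation of the error terms

X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])
beta = np.array([1.0, -1.0, 0.5])  # hypothetical coefficients
y = X @ beta + rng.normal(scale=sigma, size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
Q = np.eye(n) - P

# Residual sum of squares, written in two equivalent ways.
ssr_quadratic_form = y @ Q @ y
ssr_sum = np.sum((y - P @ y) ** 2)

# Mean residual square: divide by n - p, the rank of Q.
sigma2_hat = ssr_quadratic_form / (n - p)

print(np.isclose(ssr_quadratic_form, ssr_sum), sigma2_hat)
```

With these settings the estimate should land in the neighbourhood of `sigma ** 2`, although any single simulated sample will deviate from it.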

