In statistics, the prediction matrix is a symmetric and idempotent matrix and hence a projection matrix. The prediction matrix is sometimes called the hat matrix or roof matrix because it maps $y$ onto $\hat{y}$ (it "puts the hat on $y$"). Accordingly, it is denoted by either $P$ or $H$. The term "prediction matrix" was coined by Hoaglin & Welsch (1978) and Chatterjee & Hadi (1986) and stems from the fact that applying the matrix to the observed values $y$ generates the predicted values (the $\hat{y}$ values). Another matrix that is important in statistics is the residual matrix $Q = I - P$, which is defined through the prediction matrix and is also a projection matrix.
The regression hyperplane estimated using the least squares method is given by the sample regression function $\hat{y} = Xb$, where $b = (X^\top X)^{-1}X^\top y$ is the least squares estimator. The prediction matrix $P = X(X^\top X)^{-1}X^\top$ is the matrix of the orthogonal projection onto the column space of $X$ and has rank at most $p$ ($p$ is the number of parameters of the regression model). If $X$ has full column rank, then $\operatorname{rank}(P) = p$. Since $P$ is a projection matrix, $P^2 = P$ holds. The idempotency and symmetry properties ($P = P^2$ and $P = P^\top$) imply that $P$ is an orthogonal projector onto the column space of $X$. The direction of projection results from the matrix $Q = I - P$, whose columns are perpendicular to the column space of $X$. The matrix $P$ is called the prediction matrix because the predicted values $\hat{y}$ are obtained by multiplying the vector $y$ by this matrix on the left. This can be shown by inserting the least squares estimator as follows:
$\hat{y} = Xb = X(X^\top X)^{-1}X^\top y = Py$.
The predicted values (the $\hat{y}$ values) can thus be understood as a function of the observed $y$ values. Numerous statistical results can also be represented with the prediction matrix. For example, the residual vector can be represented by means of the prediction matrix as $\hat{\varepsilon} = y - \hat{y} = (I - P)y = Qy$. The (non-trivial) covariance matrix of the residual vector is $\operatorname{Cov}(\hat{\varepsilon}) = \sigma^2(I - P)$ and plays a role in the analysis of leverage values.
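As a concrete illustration, the following NumPy sketch (the random design matrix and all variable names are ours, purely for illustration) builds the prediction matrix for a small regression problem and checks that it reproduces the fitted values and the residual decomposition described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3
# Illustrative design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # prediction (hat) matrix
Q = np.eye(n) - P                      # residual matrix
y_hat = P @ y                          # predicted values  y^ = P y
residuals = Q @ y                      # residual vector   (I - P) y

# The same fitted values come from the least squares estimator b
b, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(y_hat, X @ b)
assert np.allclose(y, y_hat + residuals)
```

Note that $P$ depends only on the design matrix $X$, not on $y$: once $X$ is fixed, the same matrix maps any response vector to its fitted values.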
Properties
Idempotence
The prediction matrix is idempotent. This can be interpreted to mean that "applying the regression twice leads to the same result". The idempotency property of the prediction matrix can be shown as follows:

$P^2 = X(X^\top X)^{-1}X^\top X(X^\top X)^{-1}X^\top = X(X^\top X)^{-1}X^\top = P$.
Symmetry
The prediction matrix is symmetric. The symmetry property of the prediction matrix can be shown as follows:

$P^\top = \left(X(X^\top X)^{-1}X^\top\right)^\top = X\left((X^\top X)^\top\right)^{-1}X^\top = X(X^\top X)^{-1}X^\top = P$.
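Both properties are easy to confirm numerically. A minimal sketch (the design matrix here is random and purely illustrative; any full-column-rank $X$ works):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))              # illustrative full-column-rank design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T     # prediction matrix

assert np.allclose(P @ P, P)             # idempotency: projecting twice changes nothing
assert np.allclose(P, P.T)               # symmetry
```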
Leverage values
The diagonal elements of the prediction matrix can be interpreted as leverage values and play a major role in regression diagnostics. They are given by

$h_{ii} = x_i^\top(X^\top X)^{-1}x_i$,

where $x_i^\top$ denotes the $i$-th row of the design matrix $X$.
These leverage values are used in calculating Cook's distance and can be used to identify influential observations. It holds that $1/n \le h_{ii} \le 1/r$, where $r$ represents the number of rows of the design matrix that are identical to $x_i^\top$. If all rows are different, then $1/n \le h_{ii} \le 1$ applies.
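The following sketch illustrates this numerically for a simple regression with one deliberately outlying $x$ value (the data and variable names are ours; the Cook's distance formula used is the textbook form $D_i = \hat{\varepsilon}_i^2 h_{ii} / \bigl(p\,s^2(1-h_{ii})^2\bigr)$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
X[-1, 1] = 8.0                              # one far-out x value -> high leverage
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.1, size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(P)                              # leverage values h_ii
e = y - P @ y                               # residual vector
s2 = e @ e / (n - p)                        # residual variance estimate

# Cook's distance from leverages and residuals
D = e**2 * h / (p * s2 * (1 - h) ** 2)

assert np.all(h >= 1.0 / n - 1e-12) and np.all(h <= 1.0)  # bounds (all rows differ)
assert np.isclose(h.sum(), p)               # trace(P) = rank(P) = p
```

The last assertion reflects a further standard property: the leverage values sum to $p$, since the trace of a projection matrix equals its rank.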
References
Samprit Chatterjee, Ali S. Hadi: Influential observations, high leverage points, and outliers in linear regression. In: Statistical Science, 1 (3), 1986, pp. 379–393, doi:10.1214/ss/1177013622, JSTOR 2245477.
Ludwig Fahrmeir, Thomas Kneib, Stefan Lang, Brian Marx: Regression: Models, Methods and Applications. Springer Science & Business Media, 2013, ISBN 978-3-642-34332-2, p. 108.