PRESS statistic


The PRESS statistic (predicted residual sum of squares, also called predictive residual sum of squares) is a measure of how well a particular model fits a sample that was not taken into account when the model was estimated. The main difference from an ordinary residual sum of squares is that only measured and estimated values that are "new" to the model are used to calculate the PRESS statistic. That is, the model is first estimated using a training data set. New observations (the test data set) are then added, and predictions for them are made with the "trained" model.

PRESS is sometimes described as the result of, or a form of, leave-one-out cross-validation, and the term is sometimes used synonymously with that kind of cross-validation. The PRESS concept can, however, also be applied to other kinds of prediction.
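As a rough illustration of the leave-one-out reading of PRESS, the sketch below refits a simple straight-line model once per observation, each time leaving that observation out and predicting it from the rest. The data values and the helper names `fit_line` and `loo_press` are hypothetical and chosen only for this example. The model is assumed to be an ordinary least squares line, which the text above does not prescribe.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def loo_press(x, y):
    """Leave-one-out PRESS: each point is predicted by a model fitted without it."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        a, b = fit_line(x_rest, y_rest)
        total += (y[i] - (a + b * x[i])) ** 2
    return total


# Hypothetical sample (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary residual sum of squares of the model fitted to all points.
a_full, b_full = fit_line(x, y)
rss_full = sum((yi - (a_full + b_full * xi)) ** 2 for xi, yi in zip(x, y))

press_loo = loo_press(x, y)
print(f"RSS = {rss_full:.4f}, leave-one-out PRESS = {press_loo:.4f}")
```

For a least squares fit, the leave-one-out PRESS is never smaller than the ordinary residual sum of squares, because every point is predicted by a model that has not seen it.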

Calculation

The PRESS statistic is calculated as follows:

$$\mathrm{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

This is a sum of squares, where $y_i$ stands for the new observed values and $\hat{y}_i$ for their predicted values. To make the above-mentioned difference from the ordinary residual sum of squares (RSS for short) explicit, the formula can also be written differently:

$$\mathrm{PRESS} = \sum_{i=1}^{n} (y_{i,\text{new}} - \hat{y}_{i,\text{new}})^2$$

The purpose of this is to make it clear that the values were predicted for an external data set. The difference from the ordinary residual sum of squares lies only in which data are considered, not in the calculation rule.
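The point that RSS and PRESS share one calculation rule and differ only in the data can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation: the data values and the helper names `fit_line` and `sum_of_squares` are hypothetical, and the model is assumed to be a simple least squares line.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_of_squares(y, y_hat):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


# Hypothetical training data, used to estimate the model.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical new observations, not used in the fit.
x_new = [6.0, 7.0, 8.0]
y_new = [12.3, 13.8, 16.4]

a, b = fit_line(x_train, y_train)

# Same function, different data: training data give RSS, new data give PRESS.
rss = sum_of_squares(y_train, [a + b * xi for xi in x_train])
press = sum_of_squares(y_new, [a + b * xi for xi in x_new])
print(f"RSS = {rss:.4f}, PRESS = {press:.4f}")
```

Both quantities come from the same `sum_of_squares` call. Only the observations passed in decide whether the result is an RSS or a PRESS value.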

Use

The MSEP (mean squared error of prediction) and the RMSEP (root mean squared error of prediction) can be derived from the PRESS statistic. These are all measures for assessing the predictive ability of models (e.g. in principal component regression). However, since PRESS does not take the size of the data set into account, it is only suitable for comparing models that were evaluated on the same number of observations.
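A minimal sketch of these derived measures follows, assuming the common definitions MSEP = PRESS / n and RMSEP = sqrt(MSEP), where n is the number of predicted observations. The function names and the PRESS values are hypothetical and serve only to show why raw PRESS values from data sets of different size are not directly comparable.

```python
import math


def msep(press, n):
    """Mean squared error of prediction: PRESS divided by the number of predictions."""
    if n <= 0:
        raise ValueError("n must be positive")
    return press / n


def rmsep(press, n):
    """Root mean squared error of prediction, in the units of the response."""
    return math.sqrt(msep(press, n))


# Hypothetical results: model A was evaluated on 30 new observations,
# model B on 100 new observations.
press_a, n_a = 12.0, 30
press_b, n_b = 20.0, 100

# B has the larger PRESS but the smaller error per observation.
print(f"A: PRESS = {press_a}, MSEP = {msep(press_a, n_a):.3f}, RMSEP = {rmsep(press_a, n_a):.3f}")
print(f"B: PRESS = {press_b}, MSEP = {msep(press_b, n_b):.3f}, RMSEP = {rmsep(press_b, n_b):.3f}")
```

Dividing by n removes the dependence on sample size, and taking the square root puts the result back on the scale of the measured quantity.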

PRESS is also used in partial least squares estimation (PLS for short) for cross-validation (verification) of samples.

The PRESS statistic can also indicate overfitting in regression analysis. Models that contain too many parameters tend to have small residuals on the observations that were used to fit the model (low RSS), but relatively large residuals on new observations (high PRESS).
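The pattern of low RSS and high PRESS can be illustrated with an extreme case. The sketch below compares a two-parameter straight line with a polynomial that has as many parameters as there are training points, so that it passes through every one of them. The data are hypothetical, and the interpolating polynomial (evaluated in Lagrange form) is chosen here only as a convenient example of an over-parameterized model.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def interpolate(x_train, y_train, x0):
    """Value at x0 of the polynomial through all training points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x_train, y_train)):
        term = yi
        for j, xj in enumerate(x_train):
            if j != i:
                term *= (x0 - xj) / (xi - xj)
        total += term
    return total


def sum_of_squares(y, y_hat):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))


# Hypothetical data: roughly linear with noise.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1]
x_new = [1.5, 2.5, 3.5, 4.5]
y_new = [3.1, 4.9, 7.1, 8.9]

a, b = fit_line(x_train, y_train)

# Two parameters: small but non-zero residuals on the training data.
rss_line = sum_of_squares(y_train, [a + b * xi for xi in x_train])
press_line = sum_of_squares(y_new, [a + b * xi for xi in x_new])

# Five parameters for five points: the training data are reproduced exactly.
rss_poly = sum_of_squares(y_train, [interpolate(x_train, y_train, xi) for xi in x_train])
press_poly = sum_of_squares(y_new, [interpolate(x_train, y_train, xi) for xi in x_new])

print(f"line:       RSS = {rss_line:.4f}, PRESS = {press_line:.4f}")
print(f"polynomial: RSS = {rss_poly:.4f}, PRESS = {press_poly:.4f}")
```

On these data the polynomial has the lower RSS but the higher PRESS, which is the signature of overfitting described above.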
