Early stopping


Early stopping denotes a regularization technique used to avoid overfitting in iterative machine learning methods.

Background

[Figure: Blue: training error; red: generalization error. Both errors decrease during the first iteration steps; from a certain point on, the generalization error increases again.]

When training a machine learning model, one seeks model parameters that minimize a defined error between the true and the predicted labels. The goal is to determine parameters that generalize as well as possible, meaning that the model not only performs well on the limited training data set but also has a small error on previously unseen data. This error is known as the generalization error. A model with a low training error but a comparatively high generalization error is called overfitted. Overfitting is made possible by an excessive number of parameters, which allows the model to (partially) memorize the training data.

In iterative training methods it can often be observed that both the training error and the generalization error decrease during the first steps, but from a certain point on the generalization error increases again while the training error continues to decrease.

Regularization through early stopping

Instead of, for example, reducing the number of parameters or adding a penalty term to the error function, early stopping halts training as soon as a significant deterioration (or no significant improvement) in generalization performance is detected over a predefined number of steps. The training algorithm then returns the model parameters with the best generalization performance observed up to that point.
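As an illustration, the following is a minimal sketch of such a patience-based stopping rule in Python. The callbacks train_one_epoch and validation_loss and the parameter names are assumptions for this example, not part of any particular library.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    """Train until the validation loss stops improving for `patience` epochs.

    `train_one_epoch` and `validation_loss` are assumed callbacks: the first
    updates the model with one pass over the training data, the second
    evaluates the model on held-out validation data.
    """
    best_loss = float("inf")
    best_model = copy.deepcopy(model)   # parameters with the best validation loss so far
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = validation_loss(model)

        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                   # stop early: no improvement for `patience` epochs

    return best_model, best_loss
```

The key design choice is keeping a copy of the best parameters seen so far, so that the returned model corresponds to the point of lowest validation error rather than to the final iteration.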

Early stopping can be viewed as an efficient form of hyperparameter optimization in which the number of training steps is treated as a hyperparameter.

Since the generalization error cannot be determined exactly for data drawn from an unknown probability distribution, in practice it is often approximated by an error computed on validation data. Ideally, the training and validation data do not overlap. A cross-validation procedure, for example, can be used to split the data set into training and validation data.
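As a sketch of how validation data can be set aside in practice, the example below uses scikit-learn's train_test_split on synthetic placeholder data; the array names and the 80/20 split ratio are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data standing in for a real data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))    # 1000 examples, 10 features
y = rng.integers(0, 2, size=1000)  # binary labels

# Disjoint 80/20 holdout split: the error measured on (X_val, y_val)
# approximates the generalization error during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0
)
```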

References

  1. Ian Goodfellow, Yoshua Bengio, Aaron Courville: Deep Learning. MIT Press, 2016, chapter 7.8 Early Stopping (deeplearningbook.org).