Probably Approximately Correct Learning

From Wikipedia, the free encyclopedia

Probably approximately correct learning (PAC learning) is a framework for machine learning that was introduced by Leslie Valiant in his paper A Theory of the Learnable.[1]

In this framework, the learner receives examples that are classified according to an unknown target function. The goal of training is to find, with high probability, a good approximation of this function. The learner is expected to be able to learn the concept for any approximation rate, any probability of success, and any distribution of the examples.

Definition

The PAC framework allows a precise mathematical analysis of learning processes. Let H be the finite hypothesis space, let ε be the desired accuracy of the classifier generated by the learning process on unseen data, and let δ be the probability that the learning process fails to generate such a classifier, where 0 < ε < 1/2 and 0 < δ < 1/2. For a consistent learning process, m training examples are sufficient to learn a classifier meeting the requirements ε and δ. In other words, m training examples suffice to learn a PAC-learnable problem with probability 1 − δ in such a way that an error rate of at most ε is obtained on new data. The runtime up to the output of the classifier must be polynomial in 1/ε and 1/δ. For m, the following holds:

    m ≥ (1/ε)(ln |H| + ln(1/δ)).
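
The bound can be evaluated directly. As a minimal sketch, the following Python snippet (the function name and the example values for |H|, ε and δ are chosen here purely for illustration) computes a sufficient number of training examples for a consistent learner over a finite hypothesis space:

    import math

    def pac_sample_bound(h_size, epsilon, delta):
        # Sufficient number of examples so that, with probability at least
        # 1 - delta, a consistent learner over a finite hypothesis space of
        # size h_size outputs a hypothesis with true error at most epsilon.
        return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

    # Example: 2^10 hypotheses, epsilon = 0.05, delta = 0.01
    print(pac_sample_bound(2**10, 0.05, 0.01))  # 231

Note that the required number of examples grows only logarithmically in the size of the hypothesis space and in 1/δ, but linearly in 1/ε.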

Derivation

The estimate for m is closely related to the version space. By definition, a consistent learning process outputs a hypothesis from the version space. Every hypothesis in the version space is consistent with the training data, but can make mistakes on unseen data. Let h_1, …, h_k ∈ H be the hypotheses whose true error is greater than ε. Such a hypothesis is consistent with a single random example with probability at most (1 − ε), and with m examples with probability at most (1 − ε)^m. If at least one such hypothesis is consistent with all m examples, it is part of the version space and could be output as the hypothesis by a consistent learning process. The probability that such a hypothesis is contained in the version space is therefore bounded above by k·(1 − ε)^m ≤ |H|·(1 − ε)^m. An estimate depending on the number of training examples is needed; since (1 − ε) ≤ e^(−ε), this probability is at most |H|·e^(−εm). According to the above requirement, in at least 1 − δ of all cases no hypothesis with a true error greater than ε should be contained in the version space, i.e. |H|·e^(−εm) ≤ δ. Solving this for m yields

    m ≥ (1/ε)(ln |H| + ln(1/δ)).
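
As a concrete check of the derivation, the following Python snippet (the values for |H|, ε and δ are again chosen only for illustration) verifies that at the sample size m given by the bound, the failure probability |H|·e^(−εm) indeed drops below δ, and that this exponential bound dominates the tighter quantity |H|·(1 − ε)^m:

    import math

    H_size, epsilon, delta = 2**10, 0.05, 0.01   # illustrative values
    m = math.ceil((math.log(H_size) + math.log(1 / delta)) / epsilon)

    # Probability that some hypothesis with true error > epsilon is still
    # consistent with all m examples, and its looser exponential upper bound.
    tight_bound = H_size * (1 - epsilon) ** m
    loose_bound = H_size * math.exp(-epsilon * m)

    print(m)                           # 231
    print(tight_bound <= loose_bound)  # True, since (1 - eps) <= e^(-eps)
    print(loose_bound <= delta)        # True: |H| * e^(-eps*m) <= delta holds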

The estimate for the number of required examples m is usually quite loose, and in practice fewer examples suffice. The model has been extended to deal with noise, i.e. incorrectly classified examples.

References

  1. ^ L. G. Valiant: A Theory of the Learnable. In: Communications of the ACM. Volume 27 (11), 1984, pp. 1134–1142. [1] (PDF; 806 kB)

Literature

  • M. Kearns, U. Vazirani: An Introduction to Computational Learning Theory. MIT Press, 1994, ISBN 0-262-11193-4.
  • Tom M. Mitchell: Machine Learning. McGraw-Hill Education, 1997, ISBN 0-07-115467-1.