Nearest neighbor classification

from Wikipedia, the free encyclopedia
K nearest neighbors in a two-dimensional point set with k = 1 (dark blue) and k = 5 (light blue). The radius of the circles is not fixed.

The closest neighbor classification is a nonparametric method for estimating probability density functions . The resulting k-Nearest-Neighbor-Algorithm ( KNN , in English "k-nearest-neighbor algorithm") is a classification method in which a class is assigned taking its closest neighbors into account . The part of the learning consists of simply saving the training examples, which is also known as lazy learning ("sluggish learning"). Data normalization can increase the accuracy of this algorithm.

k nearest neighbor algorithm

The classification of an object (often described by a feature vector ) takes place in the simplest case by majority decision. The k next already classified objects of participate in the majority decision . Many distance measures are conceivable ( Euclidean distance , Manhattan metric , etc.). is assigned to the class that has the greatest number of objects of these neighbors. For two classes a tie in the majority decision can be prevented by an odd one.

Voronoi diagram with seven support points

For a small one, there is a risk that noise in the training data will worsen the classification results. A Voronoi diagram results for . If the choice is too large, there is a risk of including points with a large margin in the classification decision. This risk is particularly great if the training data are not evenly distributed or if only a few examples are available. If the training data is not evenly distributed, a weighted distance function can be used, which assigns a higher weight to closer points than to points further away. A practical problem is also the storage and computation effort of the algorithm in the case of high-dimensional spaces and a lot of training data.

See also


Individual evidence

  1. S. Madeh Piryonesi, Tamer E. El-Diraby: Role of Data Analytics Infrastructure Asset Management: Overcoming Data Size and Quality problem . In: Journal of Transportation Engineering, Part B: Pavements . tape 146 , no. 2 , June 2020, ISSN  2573-5438 , p. 04020022 , doi : 10.1061 / JPEODX.0000175 ( [accessed August 19, 2020]).
  2. ^ Hastie, Trevor .: The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations , Tibshirani, Robert., Friedman, JH (Jerome H.), Springer, New York 2001, ISBN 0- 387-95284-5 , OCLC 46809224 .