# Nearest neighbor classification

k-nearest neighbors in a two-dimensional point set with k = 1 (dark blue) and k = 5 (light blue). The radius of the circles is not fixed.

Nearest neighbor classification is a nonparametric method for estimating probability density functions. The resulting $k$-nearest-neighbor algorithm (KNN) is a classification method in which an object is assigned to a class taking its $k$ closest neighbors into account. The learning part consists simply of storing the training examples, which is also known as lazy learning. Normalizing the data can increase the accuracy of this algorithm.
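Because KNN compares raw distances, a feature with a large numeric range can dominate all others. One common normalization is min-max rescaling of each feature to $[0, 1]$. A minimal sketch (the helper name `min_max_normalize` is illustrative, not from the original text; it assumes every feature has a nonzero range):

```python
def min_max_normalize(points):
    """Rescale each feature to [0, 1] so no single feature dominates
    the distance computation. Assumes each feature's range is nonzero."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    return [tuple((p[d] - lo[d]) / (hi[d] - lo[d]) for d in range(dims))
            for p in points]

# Second feature spans 100..300, first only 0..10; after rescaling
# both contribute comparably to Euclidean distances.
print(min_max_normalize([(0, 100), (5, 200), (10, 300)]))
# → [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```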

## k nearest neighbor algorithm

The classification of an object $x \in \mathbb{R}^n$ (often described by a feature vector) takes place in the simplest case by majority vote. The $k$ already classified objects closest to $x$ participate in the majority vote. Many distance measures are conceivable (Euclidean distance, Manhattan metric, etc.). $x$ is assigned to the class that has the greatest number of objects among these $k$ neighbors. For two classes, a tie in the majority vote can be prevented by choosing an odd $k$.
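The majority-vote rule above can be sketched in a few lines. This is a minimal illustration using Euclidean distance, not an optimized implementation; the function name `knn_classify` and the toy data are assumptions for the example:

```python
import math
from collections import Counter

def knn_classify(train, query, k=5):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; points are equal-length
    tuples. For two classes, choose an odd k to avoid ties.
    """
    # Sort training points by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Tiny 2-D example with two classes:
train = [((0, 0), "A"), ((1, 0), "A"), ((0, 1), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify(train, (1, 1), k=3))  # → A
```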

Voronoi diagram with seven support points

For a small $k$, there is a risk that noise in the training data worsens the classification results. For $k = 1$, the decision boundary corresponds to a Voronoi diagram. If $k$ is chosen too large, there is a risk of including points far from $x$ in the classification decision. This risk is particularly great if the training data are not evenly distributed or if only few examples are available. If the training data are not evenly distributed, a weighted distance function can be used, which assigns a higher weight to closer points than to points farther away. A practical problem is also the storage and computation cost of the algorithm for high-dimensional spaces and large amounts of training data.
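The distance-weighted vote mentioned above can be sketched as follows. The weight $1/(d + \varepsilon)$ used here is one common choice, not the only one; the function name `weighted_knn_classify` is an assumption for the example:

```python
import math
from collections import defaultdict

def weighted_knn_classify(train, query, k=5, eps=1e-9):
    """Each of the k nearest neighbors votes with weight 1/(distance + eps),
    so closer points count more than distant ones."""
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    scores = defaultdict(float)
    for point, label in neighbors:
        scores[label] += 1.0 / (math.dist(point, query) + eps)
    return max(scores, key=scores.get)

# Unevenly distributed data: three "B" points are far away, one "A" is close.
# A plain majority vote with k = 4 would pick "B"; the weighted vote picks "A",
# because the single nearby point outweighs the distant majority.
train = [((0, 0), "A"), ((4, 4), "B"), ((4, 5), "B"), ((5, 4), "B")]
print(weighted_knn_classify(train, (0.5, 0.5), k=4))  # → A
```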