Nearest neighbor classification
The closest neighbor classification is a nonparametric method for estimating probability density functions . The resulting k-Nearest-Neighbor-Algorithm ( KNN , in English "k-nearest-neighbor algorithm") is a classification method in which a class is assigned taking its closest neighbors into account . The part of the learning consists of simply saving the training examples, which is also known as lazy learning ("sluggish learning"). Data normalization can increase the accuracy of this algorithm.
k nearest neighbor algorithm
The classification of an object (often described by a feature vector ) takes place in the simplest case by majority decision. The k next already classified objects of participate in the majority decision . Many distance measures are conceivable ( Euclidean distance , Manhattan metric , etc.). is assigned to the class that has the greatest number of objects of these neighbors. For two classes a tie in the majority decision can be prevented by an odd one.
For a small one, there is a risk that noise in the training data will worsen the classification results. A Voronoi diagram results for . If the choice is too large, there is a risk of including points with a large margin in the classification decision. This risk is particularly great if the training data are not evenly distributed or if only a few examples are available. If the training data is not evenly distributed, a weighted distance function can be used, which assigns a higher weight to closer points than to points further away. A practical problem is also the storage and computation effort of the algorithm in the case of high-dimensional spaces and a lot of training data.
See also
- Pattern recognition
- Feature space
- Scikit-learn a free machine learning software library for the Python programming language
- OpenCV a free program library with algorithms for image processing and machine vision
literature
- Wolfgang Ertel : Basic Course Artificial Intelligence: A practice-oriented introduction . 3rd edition, Springer Vieweg 2013. ISBN 978-3-8348-1677-1
- Thomas A. Runkler: Data Analytics Models and Algorithms for Intelligent Data Analysis . Springer Vieweg 2012. ISBN 978-3-8348-2588-9
Individual evidence
- ↑ S. Madeh Piryonesi, Tamer E. El-Diraby: Role of Data Analytics Infrastructure Asset Management: Overcoming Data Size and Quality problem . In: Journal of Transportation Engineering, Part B: Pavements . tape 146 , no. 2 , June 2020, ISSN 2573-5438 , p. 04020022 , doi : 10.1061 / JPEODX.0000175 ( ascelibrary.org [accessed August 19, 2020]).
- ^ Hastie, Trevor .: The elements of statistical learning: data mining, inference, and prediction: with 200 full-color illustrations , Tibshirani, Robert., Friedman, JH (Jerome H.), Springer, New York 2001, ISBN 0- 387-95284-5 , OCLC 46809224 .