Jenks-Caspall algorithm
The Jenks-Caspall algorithm is a statistical method for automatically classifying values on the basis of so-called natural breaks (natural discontinuities in the data). That is, it attempts to minimize the differences within each class and to maximize the differences between the classes. The method was developed by George Frederick Jenks (1916–1996) and Fred Caspall in the 1960s.
The sum of the absolute deviations from the class means is minimized in two main steps, each of which shifts values between classes:
- Reiterative cycling: The values at the edge of each class are compared with the mean of their own class and the mean of the next higher class. If a value is closer to the mean of the neighboring class, it is moved to that class. This is repeated until no further improvement is possible.
- Forced cycling: Values are moved at random to an adjacent class. The classification is then optimized iteratively again and checked to see whether the whole procedure has brought an improvement, that is, whether the sum of the deviations from the class means has decreased. If not, the values are shifted back.
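The two steps can be sketched in Python as follows. This is a minimal illustration and not the authors' original procedure: the function names, the `attempts` and `seed` parameters, and the choice to test edge values in both directions are assumptions made for this example. To guarantee termination, the sketch also accepts a shift only when the total absolute deviation actually decreases. Classes are assumed to be non-empty lists of numbers in ascending class order.

```python
import random
from statistics import mean


def total_abs_deviation(classes):
    """Sum of absolute deviations of all values from their class mean."""
    total = 0
    for c in classes:
        m = mean(c)
        total += sum(abs(v - m) for v in c)
    return total


def reiterative_cycling(classes):
    """Move edge values to a neighboring class until nothing improves."""
    classes = [sorted(c) for c in classes]
    best = total_abs_deviation(classes)
    improved = True
    while improved:
        improved = False
        for i in range(len(classes) - 1):
            # Try the top value of class i, then the bottom value of class i + 1.
            for src, dst, take_last in ((i, i + 1, True), (i + 1, i, False)):
                if len(classes[src]) < 2:
                    continue  # never empty a class
                value = classes[src][-1] if take_last else classes[src][0]
                own, other = mean(classes[src]), mean(classes[dst])
                if abs(value - other) >= abs(value - own):
                    continue  # not closer to the neighboring class mean
                trial = [list(c) for c in classes]
                if take_last:
                    trial[dst].insert(0, trial[src].pop())
                else:
                    trial[dst].append(trial[src].pop(0))
                score = total_abs_deviation(trial)
                if score < best:  # assumption: keep only strict improvements
                    classes, best, improved = trial, score, True
    return classes


def forced_cycling(classes, attempts=100, seed=0):
    """Randomly push an edge value across a break, re-optimize, keep if better."""
    rng = random.Random(seed)
    best = reiterative_cycling(classes)
    best_score = total_abs_deviation(best)
    if len(best) < 2:
        return best
    for _ in range(attempts):
        trial = [list(c) for c in best]
        i = rng.randrange(len(trial) - 1)
        if rng.random() < 0.5 and len(trial[i]) > 1:
            trial[i + 1].insert(0, trial[i].pop())
        elif len(trial[i + 1]) > 1:
            trial[i].append(trial[i + 1].pop(0))
        else:
            continue
        trial = reiterative_cycling(trial)
        score = total_abs_deviation(trial)
        if score < best_score:  # otherwise the forced move is discarded
            best, best_score = trial, score
    return best


start = [[1, 2], [3, 10, 11], [12, 20, 21, 22]]
print(reiterative_cycling(start))  # [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
```

In this example the initial classification has a total absolute deviation of 24.5, and two shifts (the value 3 downward, then the value 12 downward) reduce it to 6. Discarding a forced move that does not help corresponds to shifting the values back, as described above.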
The algorithm is used, for example, in geography to classify raster data in geographic information systems. According to Jenks and Caspall, it does not guarantee an optimal solution to the natural breaks problem, but at the time of publication it was the best method they could find.
Literature
- George F. Jenks and Fred C. Caspall: "Error on Choroplethic Maps: Definition, Measurement, Reduction". In: Annals of the Association of American Geographers. Vol. 61, 1971, pp. 217–244, doi:10.1111/j.1467-8306.1971.tb00779.x, JSTOR 2562442.