Jaccard coefficient
The Jaccard coefficient or Jaccard index, named after the Swiss botanist Paul Jaccard (1868–1944), is a measure of the similarity of sets.
History
Jaccard introduced the coefficient in his 1902 publication Lois de distribution florale dans la zone alpine (p. 72), where he called it "coefficient de communauté florale".
The Jaccard coefficient has since become established in mathematics and is used as a similarity measure for sets, vectors and, more generally, objects. In particular, it is used in automated text recognition and interpretation.
Definition
To calculate the Jaccard coefficient of two sets $A$ and $B$, divide the number of elements they have in common (the intersection) by the size of their union:
- $J(A,B) = \dfrac{|A \cap B|}{|A \cup B|}$.
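The definition translates directly into set operations. The following is a minimal Python sketch; the function name `jaccard` is hypothetical, and the value returned for two empty sets (where the quotient is 0/0 and thus undefined) is an assumed convention, not part of the definition.

```python
def jaccard(a: set, b: set) -> float:
    """Return the Jaccard coefficient |A ∩ B| / |A ∪ B| of two finite sets."""
    union = a | b
    if not union:
        # Assumption: 0/0 is undefined for two empty sets; returning 1.0
        # (treating them as identical) is a common convention only.
        return 1.0
    return len(a & b) / len(union)


print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 common elements / 4 in the union = 0.5
```

Python's `&` and `|` operators on `set` compute the intersection and union, so the code mirrors the formula term by term.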
For any two finite sets $A$ and $B$ that are not both empty, the following holds:
- $0 \le J(A,B) \le 1$.
The closer the Jaccard coefficient is to 1, the greater the similarity of the sets. The minimum value of the Jaccard coefficient is 0.
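The bounds follow directly from the definition, because the intersection of two sets is always contained in their union (assuming finite sets with $A \cup B \neq \emptyset$):

$$A \cap B \subseteq A \cup B \;\Rightarrow\; 0 \le |A \cap B| \le |A \cup B| \;\Rightarrow\; 0 \le J(A,B) \le 1$$

The upper value $J(A,B) = 1$ is reached exactly when $A \cap B = A \cup B$, that is, when $A = B$; the lower value $J(A,B) = 0$ is reached exactly when $A \cap B = \emptyset$.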
Example
The two sets and have the Jaccard coefficient
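The sets of the original example are not preserved in this text. Purely as an illustration, with two arbitrarily chosen sets (an assumption, not the original example):

$$A = \{1, 2, 3, 4\}, \quad B = \{3, 4, 5\}$$

$$A \cap B = \{3, 4\}, \quad A \cup B = \{1, 2, 3, 4, 5\}, \quad J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{5} = 0.4$$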
Jaccard metric
The Jaccard metric can be derived from the Jaccard coefficient. This metric is calculated using the formula
- $d_J(A,B) = 1 - J(A,B)$.
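Written out in terms of the sets:
- $d_J(A,B) = \dfrac{|A \cup B| - |A \cap B|}{|A \cup B|} = \dfrac{|A \mathbin{\triangle} B|}{|A \cup B|}$, where $A \mathbin{\triangle} B$ denotes the symmetric difference.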
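The distance can be computed from the symmetric difference, and for small universes its metric property can be spot-checked by brute force. The sketch below is illustrative only: the helper names `jaccard_distance`, `all_subsets` and `triangle_inequality_holds` are hypothetical, the distance of 0 for two empty sets is an assumed convention, and an exhaustive check over one small universe is a sanity test, not a proof that the Jaccard distance is a metric.

```python
from itertools import chain, combinations


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance |A Δ B| / |A ∪ B|, which equals 1 - J(A, B)."""
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets are at distance 0
    return len(a ^ b) / len(union)  # ^ is Python's symmetric difference


def all_subsets(universe: set) -> list:
    """All 2**len(universe) subsets of a small finite set."""
    items = list(universe)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def triangle_inequality_holds(universe: set, eps: float = 1e-12) -> bool:
    """Check d(x, z) <= d(x, y) + d(y, z) for every triple of subsets."""
    sets = all_subsets(universe)
    return all(
        jaccard_distance(x, z) <= jaccard_distance(x, y) + jaccard_distance(y, z) + eps
        for x in sets for y in sets for z in sets
    )


print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # |{1, 4}| / |{1, 2, 3, 4}| = 0.5
print(triangle_inequality_holds({0, 1, 2, 3}))  # True
```

The small tolerance `eps` only guards against floating-point rounding when comparing sums of fractions.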
Applications
In text mining, and in duplicate detection in particular, the Jaccard similarity is a well-known measure of the similarity of two items. The two strings are first decomposed into tokens (e.g. split at spaces, or into n-grams of a given length n). The resulting sets of tokens are then compared as described above to compute their similarity.
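A minimal sketch of this procedure, assuming whitespace splitting for word tokens and contiguous character n-grams; the function names are hypothetical, and the choice of n = 2 in the usage lines is only for illustration, not a value taken from the text.

```python
def word_tokens(text: str) -> set[str]:
    """Split at whitespace into a set of word tokens."""
    return set(text.split())


def char_ngrams(text: str, n: int) -> set[str]:
    """All contiguous character n-grams; empty if text is shorter than n."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient of two token sets (assumed: 1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


s1, s2 = "the quick brown fox", "the quick red fox"
print(jaccard(word_tokens(s1), word_tokens(s2)))  # 3 shared words / 5 distinct = 0.6
print(jaccard(char_ngrams("night", 2), char_ngrams("nacht", 2)))  # 1 shared bigram / 7
```

Because tokens are collected into sets, repeated words or n-grams count only once; methods that need frequencies would use a multiset variant instead.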
References
- Paul Jaccard: Lois de distribution florale dans la zone alpine, Bulletin de la Société Vaudoise des Sciences Naturelles, Volume 38 (1902), p. 72. Retrieved November 23, 2018.
- Similarity measures for vectors, Fraunhofer. Retrieved November 23, 2018.
- Jaccard coefficient, in: Hans Friedrich Eckey, Reinhold Kosfeld, Martina Rengers: Multivariate Statistics, Betriebswirtschaftlicher Verlag Dr. Th. Gabler GmbH, Wiesbaden 2002, ISBN 3-409-11969-8, p. 219. Retrieved November 23, 2018.
- Jaccard coefficient, seo-suedwes. Retrieved November 23, 2018.
- Bing Liu: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. 2nd edition. Springer-Verlag, Berlin/Heidelberg 2011, ISBN 978-3-642-19459-7, pp. 231 f.