Fuzzy search

from Wikipedia, the free encyclopedia
Wikimedia search. Did you mean: andré emotions

The fuzzy search , including fuzzy search or fuzzy string search called in including computer science , a class of string matching algorithms , ie those that a particular string (english string ) looking at a longer string or a text or should find.

Typical of the "fuzzy" (English fuzzy ) search method is that not the exact string must be taken as the search criteria, but also similar strings are to be found. A well-known measure for calculating this similarity is the so-called Levenshtein distance ; it indicates how many operations - delete, insert and replace - of letters in words are necessary to derive one string from the other: the fewer operations are required, the more similar the two strings are. Another possibility is based on so-called N-grams , by means of which it is calculated via certain probabilities which letter or character string combination could follow another. Another approach is not based directly on the graphic representation of a word, but searches for character strings that sound the same : the phonetic search . A well-known method for the English language that indexes words according to their sound is the Soundex algorithm.

Both approaches make it possible to find strings you are looking for, for example if the exact spelling of a name or phrase is not known, inflected forms of a word are to be found or error-tolerant search results are to be accepted. The fuzzy search is used, for example, in databases , search engines or computer linguistic applications.

Practical examples

When searching in databases, fault-tolerant search tools can compensate for typing and spelling errors using string matching algorithms. Similarities between the entered search term and the entries in the database are determined even without stored word lists. Hits can be sorted according to relevance and proximity to the search term. The search for the term "Leven ss ton" for example, would also entries on "Leven sh find ton". If synonym lists are stored, the fuzzy search will also find terms such as “television set” for the term “television”.

The use of complex string matching methods, such as the Levenshtein algorithm, is usually associated with an enormous amount of computation and often leads to a long delay in the case of large amounts of data.

For example, in bioinformatics, gene sequences are compared with gene databases. A well-known collection of algorithms for this is Blast .

Individual evidence

  1. Jan Petzold: High-performance browsing of larger amounts of data with JavaScript. In: heise Developer. Heise Medien GmbH & Co. KG, March 13, 2015, accessed on April 29, 2020 : “Inaccurate search (fuzzy): The search result also contains elements that are similar to the search term. It is mostly used in autocomplete fields or as a search suggestion (e.g. "developer" also finds "developer"). "
  2. Fuzzy search. In: IT-Wissen.info. DATACOM Buchverlag GmbH, accessed on April 29, 2020 : “The fuzzy search is a fuzzy or fault-tolerant search. In contrast to an exact search, the fuzzy search also shows results if the search term was entered incorrectly or if the exact spelling of the search term is not known. "
  3. Sonja Franz: How search algorithms work: The Levenshtein distance. In: interger_net. integer_net GmbH, February 22, 2018, accessed on April 29, 2020 : “The Levenshtein distance describes the minimum number of changes that are necessary to generate the second string from the first string. Changes are considered to be the addition, removal and replacement of characters. It is named after the Russian mathematician Vladimir Levenshtein (1935–2017). "
  4. Tillmann Wegst: String Similarity. In: Tillmann Wegst // software development. January 22, 2006, accessed April 29, 2020 : “N-gram matches are another and very common measure. An N-gram is a sequence of neighboring elements, in the case of strings, that is, consecutive characters. "N" denotes the length of the sequences used; Trigrams, i.e. N-grams with length 3. To determine the similarity, two strings are broken down into the N-grams they contain, and the size of the intersection of the two N-gram sets provides the value for the similarity of the strings. "
  5. ^ Rob Gravelle: MySQL Fuzzy Text Searching Using the SOUNDEX Function. In: DATABASE Journal. QuinStreet Inc., March 9, 2015, accessed on April 29, 2020 (English): “Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. SOUNDEX codes from different strings can be compared to see how similar the strings sound when spoken. "
  6. ^ Ernst August Wieden: Fuzzy Search in Databases - Conceptual Foundations, System Design and Prototype Implementation. (pdf) In: Diploma thesis. Technical University Hamburg-Harburg - Software Systems Work Area, August 2002, accessed on April 30, 2020 : “This work pursues the aim of making the query mechanism of databases more intelligent; this especially with regard to the use of fuzzy logic methods, which up to now have mainly been used in control engineering. "
  7. Stefan Krempl: Fuzzy Logic conquers the Internet. In: heise online. Heise Medien GmbH & Co. KG, May 4, 2005, accessed on April 30, 2020 : "" In the coming years, the Internet will be the most important application area for fuzzy logic, "explained Lotfi Zadeh in a lecture at the TU Berlin. The retired computer science professor at the University of California at Berkeley assumes that the sometimes quite "stupid" search engines will turn into helpful servants who will quickly find a suitable answer to all possible questions. To do this, they would first have to "understand" people's natural language, which is best done with fuzzy logic. "
  8. Benno Nieswand: Levenshtein algorithm - how does the Levenshtein algorithm work ...? In: Levenshtein. Accessed on April 29, 2020 : "Although sophisticated improvements have been made with regard to the calculation of large matrices, there is no alternative to the usually quite large calculation effort."

See also