Stemming

Stemming (also stem reduction or normal-form reduction) is a method used in information retrieval and computational linguistics to reduce the different morphological variants of a word to their common word stem, e.g. the declined forms of a noun to word, or the conjugated forms seen and saw to see.

History

In 1968, Julie Beth Lovins published the first known stemming algorithm. This algorithm had a great influence on the further development of stemming algorithms. A later stemmer was published by Martin Porter in 1980. This stemmer became the de facto standard for stemming English-language texts. Porter received the Tony Kent Strix Award in 2000 for his work in the field of stemming algorithms and information retrieval.

Many implementations of the Porter stemmer algorithm were written and distributed free of charge, but many of them contained small bugs, so these stemmers never reached their full potential. To eliminate this source of error, Porter published an official implementation of the algorithm around the year 2000. In the following years he extended this work with Snowball, a framework for writing stemming algorithms. He also created an improved stemmer for English, along with stemmers for other languages.

Stemming process

Different stemming algorithms exist for different languages. Developing a stemmer is an experimental science: an algorithm cannot be formally verified, but must instead be tested on text corpora and in practice.

Examples:
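
As a rough illustration of suffix stripping, the following minimal sketch removes a few English endings. The rule list SUFFIXES, the function name toy_stem and the min_stem_len parameter are assumptions made for this example; this is not the Lovins or Porter algorithm, both of which use far more elaborate rules and conditions.

```python
# Toy suffix-stripping stemmer (illustrative assumption, NOT Lovins or Porter).
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical rule list, tried in this order


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied


for w in ["words", "walking", "walked", "boxes", "saw"]:
    print(w, "->", toy_stem(w))
```

Regular variants such as words, walking and walked are reduced to word and walk, while the irregular form saw is left unchanged. Mapping saw to see requires more than stripping suffixes, which is one reason practical stemmers have to be tested on real text.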

An alternative that is much simpler but less precise is to search for partial strings, e.g. with the star operator (*). This is also known as truncation.
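
Truncation can be sketched as a plain prefix match. The function name truncation_search and the sample vocabulary below are hypothetical and serve only to show why the approach is less precise than stemming.

```python
def truncation_search(pattern: str, vocabulary: list[str]) -> list[str]:
    """Match a right-truncated pattern such as 'cat*' against a word list."""
    if pattern.endswith("*"):
        prefix = pattern[:-1].lower()
        # Every word that starts with the prefix counts as a hit.
        return [w for w in vocabulary if w.lower().startswith(prefix)]
    # Without the star operator, only exact matches count.
    return [w for w in vocabulary if w.lower() == pattern.lower()]


vocabulary = ["cat", "cats", "catalog", "dog"]
print(truncation_search("cat*", vocabulary))
```

The query cat* finds cat and cats, but also the unrelated word catalog, because a prefix match knows nothing about word structure.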

Remarks

Searching with regular expressions, for example, would be too slow for large collections of data such as those handled by search engines. Instead, the texts are indexed once, with stemming applied, so that they can be searched quickly later.
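
The idea of indexing once and searching quickly afterwards can be sketched with a small inverted index. All names here (normalize, build_index, search) are assumptions for this example, and normalize is only a placeholder that strips a plural "s"; a real system would call a proper stemmer at that point.

```python
from collections import defaultdict


def normalize(token: str) -> str:
    """Placeholder for a real stemmer (assumption): lowercase, drop a plural 's'."""
    token = token.lower()
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Map each normalized token to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Normalize the query the same way as the text, then do one dictionary lookup."""
    return index.get(normalize(query), set())


documents = {
    1: "stemmers reduce words",
    2: "a word index",
    3: "regular expressions scan text",
}
index = build_index(documents)  # built once, up front
print(search(index, "Words"))   # finds documents containing "word" or "words"
```

Because the query and the indexed text pass through the same normalization, a lookup for Words finds documents that contain word or words without scanning the text again.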

In some languages, the decomposition and composition of words (ran away → run away) also play an important role.

References

  1. Julie Beth Lovins: Development of a stemming algorithm. In: Mechanical Translation and Computational Linguistics. Vol. 11, No. 2, June 1968, pp. 22-31.
  2. Martin Porter: An algorithm for suffix stripping. In: Program. Vol. 14, No. 3, July 1980, pp. 130-137.
  3. Official implementation of the Porter stemmer algorithm