Successor Variety

Successor variety is a method from linguistics for determining the morphemes of a word by locating its morpheme boundaries. The method goes back to Zellig S. Harris. Hagen Langer modified the procedure and, in a test, found that only 7.24% of the segmentations were incorrect.

Among other applications, successor variety is used in information retrieval to perform stemming (reducing words to their stems) when preprocessing documents.

Procedure

To determine morpheme boundaries via successor variety, one needs a set of words that contains the word to be segmented. The word is processed letter by letter: for each prefix, one counts the distinct letters that can follow that prefix in some word of the set. This count, the successor variety of the prefix, tends to decrease as the prefix grows, until a morpheme boundary is reached, where it rises sharply.
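The counting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `successor_varieties`, the small English word list `CORPUS`, and the choice to ignore letter case are all assumptions made for this example.

```python
def successor_varieties(word, corpus):
    """For each prefix of `word`, collect the distinct letters that
    follow that prefix in some word of `corpus` (case-insensitive)."""
    word = word.lower()
    vocab = {w.lower() for w in corpus}
    result = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        # A word contributes a successor only if it extends the prefix.
        successors = {w[i] for w in vocab if len(w) > i and w.startswith(prefix)}
        result.append((prefix, successors))
    return result


# Hypothetical toy word list; a real application needs a far larger one.
CORPUS = ["read", "reads", "reader", "reading", "red", "rope", "ripe"]

for prefix, letters in successor_varieties("reading", CORPUS):
    print(prefix, sorted(letters), len(letters))
# counts per prefix: 3, 2, 1, 3, 1, 1, 0
```

The count drops from 3 to 1 over "r", "re", "rea" and jumps back to 3 at "read", which is the kind of rise the procedure treats as a sign of a morpheme boundary.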

Example

In practice, the word set must be considerably larger. The example uses German words, since the letter counts depend on the original spellings. Let the word set be {holen, gehen, haben, hassen, Haustier, Hausaufgabe, Hilfe, heiter, Haushalt} (in English: get, go, have, hate, pet, homework, help, cheerful, household).

The word to be segmented is: Hausaufgabe (homework)

H letters: {o, a, i, e} number: 4

Ha letters: {b, s, u} number: 3

Hau letters: {s} number: 1

Haus letters: {t, a, h} number: 3

Hausa letters: {u} number: 1

Hausau letters: {f} number: 1

Hausauf letters: {g} number: 1

Hausaufg letters: {a} number: 1

Hausaufga letters: {b} number: 1

Hausaufgab letters: {e} number: 1

Hausaufgabe letters: {} number: 0

The morpheme boundary lies after "Haus", since the number of possible successor letters rises there from 1 to 3.
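The boundary decision in this example can be reproduced with a short Python sketch. It uses the German word set that the letter counts above correspond to, with "Hausaufgabe" (homework) as the word to segment. The names `WORDS`, `successor_counts` and `split_at_increases` are hypothetical, and the rule "cut wherever the count is higher than for the previous prefix" is a deliberate simplification of the sharp rise described in the procedure, not necessarily the criterion used by Harris or Langer.

```python
# German word set assumed for the example (case is ignored when matching).
WORDS = ["holen", "gehen", "haben", "hassen", "Haustier",
         "Hausaufgabe", "Hilfe", "heiter", "Haushalt"]


def successor_counts(word, corpus):
    """Number of distinct successor letters for each prefix of `word`."""
    word = word.lower()
    vocab = {w.lower() for w in corpus}
    counts = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        letters = {w[i] for w in vocab if len(w) > i and w.startswith(prefix)}
        counts.append(len(letters))
    return counts


def split_at_increases(word, corpus):
    """Cut `word` after every prefix whose count exceeds the previous one."""
    counts = successor_counts(word, corpus)
    # counts[i] belongs to the prefix of length i + 1.
    cuts = [i + 1 for i in range(1, len(counts)) if counts[i] > counts[i - 1]]
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces


print(successor_counts("Hausaufgabe", WORDS))
# [4, 3, 1, 3, 1, 1, 1, 1, 1, 1, 0]
print(split_at_increases("Hausaufgabe", WORDS))
# ['Haus', 'aufgabe']
```

With a word set this small, any increase is treated as a boundary. On a realistic corpus, a threshold or a comparison with neighbouring counts would likely be needed to avoid spurious cuts.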

Literature

  • Zellig S. Harris: From phoneme to morpheme. In: Language 31, 1955, pp. 190–222. (Also in: Zellig S. Harris: Papers in Structural and Transformational Linguistics. Reidel, Dordrecht 1970, pp. 32–67.)
  • Zellig S. Harris: Morpheme Boundaries within Words: Report on a Computer Test. In: Transformations and Discourse Analysis Papers 73, Dordrecht 1967. (Also in: Zellig S. Harris: Papers in Structural and Transformational Linguistics. Reidel, Dordrecht 1970, pp. 68–77.)
  • Ursula Klenk, Hagen Langer: Morphological Segmentation Without a Lexicon. In: Literary and Linguistic Computing, Volume 4, Number 4, 1989, pp. 247–253.
  • Hagen Langer: An automatic morph segmentation method for German word forms. Diss. Phil. Göttingen 1991.

References

  1. Langer 1991, page 81.