Law of Distribution of Morph Lengths

from Wikipedia, the free encyclopedia

The length of a morph can be defined in different ways: as the number of letters, sounds, or phonemes.

Morph lengths in a small German text corpus

As an example of how morph lengths are distributed in a small German text corpus, the following table presents the data for 20 texts from Lichtenberg's Sudelbuch H, which together comprise 5618 morphs:

Phonemes per morph Number of morphs Percentage
1 1277 22.73
2 2106 37.49
3 1304 23.21
4 654 11.64
5 222 3.95
6 42 0.75
7 7 0.12
8 4 0.07
9 2 0.04

The average morph length in this case is ML = 2.40.
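The mean can be checked directly from the frequency table. The following is a minimal sketch in Python; the variable names are illustrative, and the counts are transcribed from the Lichtenberg table, reading the number of morphs for lengths 7 and 8 as 7 and 4.

```python
# Phonemes per morph -> number of morphs (Lichtenberg, Sudelbuch H)
counts = {1: 1277, 2: 2106, 3: 1304, 4: 654, 5: 222, 6: 42, 7: 7, 8: 4, 9: 2}

# Total number of morphs in the corpus
total = sum(counts.values())

# Weighted mean: each length x contributes x * n(x) phonemes
mean_length = sum(x * n for x, n in counts.items()) / total

# Share of each length class in percent, rounded as in the table
percentages = {x: round(100 * n / total, 2) for x, n in counts.items()}

print(total, round(mean_length, 2))  # 5618 2.4
```

The total of 5618 and the rounded mean of 2.40 agree with the figures given in the text.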

For comparison, two other small text corpora can be cited:

Class of text Number of morphs Mean length (phonemes per morph)
Pestalozzi, fables 5841 2.33
Press releases 3286 2.52

The average morph length is ML = 2.33 for Pestalozzi and 2.52 for press releases.

Example of the regular distribution of morph lengths in individual texts

If one examines how often morphs of different lengths occur in a series of individual texts, one finds that their frequencies follow a language law. Studies on lexicons are still pending; however, it is to be expected that different distributions will describe morph lengths in texts and in lexicons. In principle, it is the same language law that quantitative linguistics developed specifically for the frequency distribution of word lengths (law of the distribution of word lengths; theory: Wimmer et al.).

An example of a morph length distribution (measured as the number of phonemes per morph) in a short press release:

x n(x) NP(x)
1 28 26.02
2 42 44.86
3 31 31.07
4 17 13.47
5 3 5.58

(Here x is the number of phonemes per morph; n(x) is the number of morphs of length x observed in this text; NP(x) is the number of morphs with x phonemes calculated by fitting the hyper-Poisson distribution to the observed data. Result: the hyper-Poisson distribution is a good model for this text, with the test criterion P = 0.30, where P is considered good if it is greater than or equal to 0.05. For detailed explanations, please refer to the literature given.)
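The test criterion can be approximately reproduced from the tabulated values. The following Python sketch computes a Pearson chi-square statistic from n(x) and NP(x). It assumes that the hyper-Poisson fit estimated two parameters, so that the degrees of freedom are 5 classes minus 2 parameters minus 1, which gives 2; the source does not state this explicitly. For 2 degrees of freedom the chi-square tail probability has the closed form exp(-χ²/2), so no statistics library is needed.

```python
import math

# Observed counts n(x) and fitted counts NP(x) for x = 1..5 (press release table)
observed = [28, 42, 31, 17, 3]
expected = [26.02, 44.86, 31.07, 13.47, 5.58]

# Pearson chi-square statistic: sum of (n - NP)^2 / NP over all classes
chi2 = sum((n - e) ** 2 / e for n, e in zip(observed, expected))

# Assumption (not stated in the source): two fitted parameters,
# hence df = classes - parameters - 1 = 5 - 2 - 1 = 2
df = len(observed) - 2 - 1


def chi2_tail_df2(x):
    """Upper tail probability of the chi-square distribution, valid for df = 2 only."""
    return math.exp(-x / 2)


if df != 2:
    raise ValueError("the closed form used here only holds for df = 2")

p_value = chi2_tail_df2(chi2)
print(round(chi2, 2), round(p_value, 2))
```

Under these assumptions the probability comes out close to 0.3, in line with the reported P = 0.30; small differences are to be expected because the tabulated NP(x) values are rounded.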

The morph length distribution of this text is quite typical for German: the most common morphs are those consisting of 2 or 3 phonemes; both the one-phoneme morphs and the longer ones are almost always rarer.

Overall, investigations into morph lengths are not very numerous. It has at least been shown that the hyper-Poisson distribution is a good model for morph lengths in 42 German prose texts. Other models are possible for other languages and other types of text. Creutz (2003), for example, shows that different distributions have to be used for the Finnish lexicon, depending on whether one counts morph types or morph tokens. So far, however, nothing speaks against the general hypothesis that linguistic units of any kind are distributed in texts or dictionaries according to certain laws.
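The article names the hyper-Poisson distribution without giving its formula. The sketch below assumes the 1-displaced form commonly used for length distributions in quantitative linguistics, P(x) = a^(x-1) / (b^(x-1) · 1F1(1; b; a)) for x = 1, 2, 3, ..., where b^(k) = b(b+1)...(b+k-1) is the ascending factorial and 1F1 is the confluent hypergeometric function. The function name, the truncation length, and the parameter values are hypothetical and are not taken from the cited studies.

```python
import math


def hyperpoisson_pmf(a, b, max_x=60):
    """Probabilities P(1)..P(max_x) of the 1-displaced hyper-Poisson distribution.

    Index 0 of the returned list corresponds to x = 1.
    """
    # Unnormalised terms a^k / b^(k) for k = x - 1, built up recursively
    terms = [1.0]
    for k in range(1, max_x):
        terms.append(terms[-1] * a / (b + k - 1))
    # The sum of the terms is the truncated series of 1F1(1; b; a)
    norm = sum(terms)
    return [t / norm for t in terms]


# Hypothetical parameter values, chosen only for illustration
probs = hyperpoisson_pmf(a=1.2, b=0.7)

# Expected counts NP(x) for a text with N = 121 morphs and lengths 1..5
expected_counts = [121 * p for p in probs[:5]]

# Sanity check: for b = 1 the distribution reduces to a displaced Poisson
poisson_case = hyperpoisson_pmf(a=1.5, b=1.0)
```

In an actual study the parameters a and b would be estimated from the observed frequencies, for example by minimising the chi-square statistic; that fitting step is not shown here.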

Literature

  • Karl-Heinz Best: Morph lengths in fables by Pestalozzi. In: Göttinger Contributions to Linguistics 3, 2000, pages 19–30.
  • Karl-Heinz Best: Morph length. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): Quantitative Linguistik - Quantitative Linguistics. An International Handbook. de Gruyter, Berlin / New York 2005, ISBN 3-11-015578-8, pages 255–260.
  • Karl-Heinz Best: How many morphs do words contain in German press releases? In: Glottometrics 13, 2006, pages 47–58 (PDF full text).
  • Karl-Heinz Best: Syllable, word and morph lengths in Lichtenberg. In: Glottometrics 21, 2011, pages 1–13 (PDF full text).
  • Emmerich Kelih, Peter Zörnig: Models of morph lengths: Discrete and continuous approaches. In: Glottometrics 24, 2012, pages 70–78 (PDF full text).
  • Ioan-Iovitz Popescu, Karl-Heinz Best, Gabriel Altmann: Unified Modeling of Length in Language. RAM-Verlag, Lüdenscheid 2014, ISBN 978-3-942303-26-2 (chapter "Morph length", pages 11–13).
  • Regina Pustet, Gabriel Altmann: Morpheme Length Distribution in Lakota. In: Journal of Quantitative Linguistics 12, 2005, pages 53–63.

References

  1. Karl-Heinz Best: Syllable, word and morph lengths in Lichtenberg. In: Glottometrics 21, 2011, pages 1–13; for morph lengths see pages 8–11 (PDF full text). The table results from adding up the morph lengths of the individual texts.
  2. Karl-Heinz Best: Morph lengths in fables by Pestalozzi. In: Göttinger Contributions to Linguistics 3, 2000, pages 19–30.
  3. Karl-Heinz Best: On the length of morphs in German texts. In: Karl-Heinz Best (ed.): Frequency distributions in texts. Peust & Gutschmidt, Göttingen 2001, pages 1–14.
  4. Gejza Wimmer, Gabriel Altmann: The Theory of Word Length Distribution: Some Results and Generalizations. In: Peter Schmidt (ed.): Glottometrika 15. Wissenschaftlicher Verlag Trier, Trier 1996, pages 112–133; Gejza Wimmer, Reinhard Köhler, Rüdiger Grotjahn, Gabriel Altmann: Towards a Theory of Word Length Distribution. In: Journal of Quantitative Linguistics 1, 1994, pages 98–106.
  5. Karl-Heinz Best: On the length of morphs in German texts. In: Karl-Heinz Best (ed.): Frequency distributions in texts. Peust & Gutschmidt, Göttingen 2001, pages 1–14, here page 9.
  6. Archived copy (Memento of the original from October 15, 2013 in the Internet Archive), lql.uni-trier.de.
  7. Mathias Creutz: Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency. In: 41st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference. Vol. 3, 2003, pages 280–287.