Law of the distribution of sentence lengths

In linguistics, the law of the distribution of sentence lengths concerns how often sentences of different complexity are used in texts. A particularly simple criterion for sentence complexity is length, which can be defined in various ways: as the number of letters, syllables, words, clauses, etc. per sentence. Examining how often sentences of different lengths occur in texts shows that their frequencies follow a language law. In principle it is the same law that also governs the frequency distribution of word lengths (law of the distribution of word lengths; theory: Wimmer et al.).
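As an illustration of what such a frequency count looks like, the following minimal Python sketch measures sentence length in words and tabulates how often each length occurs. The splitting rule (sentences end at ".", "!" or "?"), the function names and the sample text are assumptions made for this example only; real studies use more careful definitions of sentence and word.

```python
import re
from collections import Counter


def sentence_lengths(text):
    """Return the number of words in each sentence of `text`.

    Assumption: a sentence ends at '.', '!' or '?'; words are
    whitespace-separated tokens.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def length_distribution(lengths):
    """Map each sentence length to the number of sentences of that length."""
    return dict(sorted(Counter(lengths).items()))


# Hypothetical sample text, not taken from any of the cited studies.
sample = "The cat sat. It was warm. The dog slept by the door. Nobody came."
lengths = sentence_lengths(sample)
dist = length_distribution(lengths)
print(lengths)
print(dist)
```

The resulting dictionary is the observed frequency distribution to which a theoretical model is then fitted.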

Use of different definitions for sentence complexity

Wilhelm Fucks measured sentence length by the number of syllables per sentence, grouped the sentences into classes of 1–5 syllables, 6–10 syllables and so on, and named the Pólya distribution as a suitable model for the style of many authors. If sentence length is measured by the number of clauses per sentence, the Hyperpoisson distribution has proved to be a good model for around 500 German texts. Other models are often more suitable for other criteria for sentence length, other languages, other types of text, etc. For example, if the number of words per sentence is chosen as the criterion, the negative binomial distribution can be used as a model for German texts.
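The Hyperpoisson distribution mentioned above can be sketched in a few lines of Python. The sketch assumes the 1-displaced form commonly used for lengths that start at 1, P(x) = a^(x-1) / (b(b+1)...(b+x-2) * F), where F is the sum of all such terms; the function name and the parameter values in the usage lines are hypothetical and are not fitted to any text.

```python
def hyperpoisson_pmf(x, a, b, tol=1e-15, max_terms=10_000):
    """Probability P(X = x) of the 1-displaced Hyperpoisson distribution.

    Assumptions: x = 1, 2, ...; a > 0 and b > 0 are the two parameters.
    The normalising constant is obtained by summing the series
    sum_k a^k / (b (b+1) ... (b+k-1)) until its terms become negligible.
    """
    if x < 1:
        return 0.0
    term = 1.0      # a^k / (b)_k for k = 0
    norm = 0.0      # running sum of the series
    target = None   # the term that belongs to the requested x
    for k in range(max_terms):
        if k == x - 1:
            target = term
        norm += term
        term *= a / (b + k)
        if k >= x and term < tol * norm:
            break
    return target / norm


# Hypothetical parameter values, chosen only to show the call.
a, b = 1.5, 2.0
for x in range(1, 6):
    print(x, round(hyperpoisson_pmf(x, a, b), 4))
```

With b = 1 the formula reduces to a displaced Poisson distribution, which gives a simple way to check an implementation.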

If, on the other hand, the length of syntactic constructions is measured by the number of their terminal nodes, these lengths also obey distribution laws.

Instead of sentence length, sentence depth can also be chosen as a measure of complexity. Sentence depth can be defined, for example, by the number of rules that are needed in a generative syntax to generate a sentence; equivalently, by the number of nodes in the tree graph for such a sentence. In this case too, corresponding distribution laws apply.

In summary, the investigations into sentence lengths to date support the hypothesis advocated by quantitative linguistics that the language system and language use behave according to certain, theoretically justifiable language laws.

An example

The following table gives an example of a distribution of sentences of different lengths (measured as the number of clauses) in a short German prose text. The observed data are taken from the study by Niehaus (1997); the fit of the Hyperpoisson distribution was recalculated.

x    n(x)    NP(x)
1    73      68.40
2    30      36.44
3    21      19.14
4    9       9.92
5    6       5.07
6    3       2.56
7    1       1.27
8    1       1.20

(Here x is the number of clauses per sentence, starting with x = 1; n(x) is the number of sentences with x clauses observed in this text; NP(x) is the number of sentences with x clauses that is expected when the Hyperpoisson distribution is fitted to the observed data. Result: the Hyperpoisson distribution is a good model for this text, with the test criterion P = 0.84; P is considered good if it is greater than or equal to 0.05. For more detailed explanations, please refer to the literature given.)
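The test criterion can be retraced with a short Python sketch that computes the chi-square statistic from the observed and expected values in the table and converts it into a probability P. Two points are assumptions of this sketch, not statements from the source: that P is the upper-tail probability of a chi-square test, and that the degrees of freedom are 8 classes minus 2 estimated parameters minus 1, with no pooling of small classes. The function names are illustrative.

```python
import math

# Values from the table: observed n(x) and expected NP(x) for x = 1..8.
observed = [73, 30, 21, 9, 6, 3, 1, 1]
expected = [68.40, 36.44, 19.14, 9.92, 5.07, 2.56, 1.27, 1.20]


def chi_square(obs, exp):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def chi2_sf_odd(x, df):
    """Upper-tail probability of the chi-square distribution for odd df.

    Uses the closed form that holds for odd degrees of freedom:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1)).
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd df >= 1")
    total = 0.0
    term = 1.0  # x^j / (1*3*...*(2j+1)) for j = 0
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


# Assumption: two fitted parameters (a, b) and no pooled classes.
df = len(observed) - 2 - 1
chi2 = chi_square(observed, expected)
p_value = chi2_sf_odd(chi2, df)
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p_value:.2f}")
```

Under these assumptions the sketch gives a chi-square value of about 2.05 and a P of roughly 0.84, in line with the reported test criterion. In practice, classes with very small expected frequencies are often pooled, which would change the degrees of freedom.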

Literature

  • Gabriel Altmann: Distribution of sentence lengths. In: Klaus-Peter Schulz (ed.): Glottometrika 9. Brockmeyer, Bochum 1988, pages 147–169. ISBN 3-88339-648-6.
  • Gabriel Altmann: Repetitions in Texts. Brockmeyer, Bochum 1988. ISBN 3-88339-663-X.
  • Karl-Heinz Best: Sentence lengths in German: distributions, mean values, language change. In: Göttinger Beiträge zur Sprachwissenschaft 7, 2001, pages 7–31.
  • Karl-Heinz Best: How many words are there in German? A contribution to the Sherman-Altmann laws. In: Karl-Heinz Best (ed.): Frequency distributions in texts. Peust & Gutschmidt, Göttingen 2001, pages 167–201. ISBN 3-933043-08-5.
  • Karl-Heinz Best: Sentence length. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): Quantitative Linguistik / Quantitative Linguistics. An International Handbook. de Gruyter, Berlin / New York 2005, pages 298–304. ISBN 3-11-015578-8.
  • Anja Kaßel, Eleanor Livesey: Studies on sentence length frequency in English: using the example of texts from the press and journalism, literature (fiction). In: Glottometrics 1, 2001, pages 27–51. (PDF full text)
  • Emmerich Kelih: Investigations into sentence length in Russian and Slovenian prose texts. Volume 1 & Volume 2. Diploma thesis, Graz 2002.
  • Emmerich Kelih, Peter Grzybek: Sentence lengths: definitions, frequencies, models. In: A. Mehler (ed.): Quantitative methods in computational linguistics and language technology. [= Special issue of: LDV Forum. Journal for Computational Linguistics and Language Technology 2004.]
  • Ioan-Iovitz Popescu, Karl-Heinz Best, Gabriel Altmann: Unified Modeling of Length in Language. RAM-Verlag, Lüdenscheid 2014. ISBN 978-3-942303-26-2. (Chapter "Sentence length", pages 94–107.)
  • Martin Wittek: On the development of sentence length in contemporary German. In: Karl-Heinz Best (ed.): Frequency distributions in texts. Peust & Gutschmidt, Göttingen 2001, pages 219–247. ISBN 3-933043-08-5.

Individual evidence

  1. Gejza Wimmer, Gabriel Altmann: The Theory of Word Length Distribution: Some Results and Generalizations. In: Peter Schmidt (ed.): Glottometrika 15. Issues in General Linguistic Theory and the Theory of Word Length. Wissenschaftlicher Verlag Trier, Trier 1996, pages 112–133, ISBN 3-88476-228-1; Gejza Wimmer, Reinhard Köhler, Rüdiger Grotjahn, Gabriel Altmann: Towards a Theory of Word Length Distribution. In: Journal of Quantitative Linguistics 1, 1994, pages 98–106 (archived copy in the Internet Archive, memento of April 13, 2014).
  2. Wilhelm Fucks: According to all the rules of art. Deutsche Verlags-Anstalt, Stuttgart 1968, pages 84–88.
  3. For the suitability of the Hyperpoisson distribution in comparison to other distributions when using this criterion, see Best 2005, page 301.
  4. See for example: Emmerich Kelih, Peter Grzybek: Frequencies of sentence lengths: on the factor of the interval size as an influencing variable (using the example of Slovenian texts). In: Glottometrics 8, 2005, pages 23–41. (PDF full text)
  5. See also: Best 2001, How many words ..., pages 198f.
  6. Archived copy (memento of the original from January 22, 2016, in the Internet Archive), lql.uni-trier.de.
  7. Archived copy (memento of the original from January 21, 2017, in the Internet Archive), lql.uni-trier.de.
  8. The text in question is: Gert Prokop: The mouse in the window. In: Gert Prokop: The mouse in the window. Bedtime stories. Benziger, Zurich / Cologne 1982, pages 7–18, ISBN 3-545-31111-2.
  9. Brigitta Niehaus: Investigation of the sentence length frequency in German. In: Karl-Heinz Best (ed.): Glottometrika 16. The Distribution of Word and Sentence Length. Wissenschaftlicher Verlag Trier, Trier 1997, pages 213–275, data on page 240. ISBN 3-88476-276-1.
