Phrase length

from Wikipedia, the free encyclopedia

Phrase length is the number of linguistic units that make up a phrase. The term phrase itself is not clear-cut, because it is understood differently in different theoretical frameworks (traditional grammar, generative grammar, and others). In many cases a phrase corresponds to a constituent part of the sentence.

Determining phrase length

Phrase length can be measured as the number of any smaller unit (letters, sounds, morphs, syllables and others). So far, phrase length has been given as the number of letters or phonemes, or as the number of words.

Significance of phrase length

Phrase length has so far been used in both quantitative stylistics and quantitative linguistics.

Significance of phrase length in stylistics

In quantitative stylistics, text types can be distinguished by the criterion of phrase length. Using Russian as an example, Hoffmann gives the following overview of the phrase length of subject groups (the word group that makes up the subject of a sentence) in two different text groups:

Phrase length   Scientific prose (%)   Artistic prose (%)
0               11.4                   3.8
1               27.3                   73.7
2               24.1                   13.9
3               12.4                   4.3
4               8.4                    1.7
≥5              16.4                   2.6

Phrase length is measured here as the number of words per phrase. Subject groups in scientific prose clearly tend to be longer; the same applies to the predicate group.
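The tendency can be made concrete with a few lines of Python. The sketch below reads the table values as percentages (each column sums to 100, although the source does not label the unit) and adds up the share of subject groups with two or more words. The function name is illustrative.

```python
# Values from Hoffmann's table, read as percentages (assumption: the source
# gives no unit, but each column sums to 100).
lengths = ["0", "1", "2", "3", "4", ">=5"]
scientific = [11.4, 27.3, 24.1, 12.4, 8.4, 16.4]
artistic = [3.8, 73.7, 13.9, 4.3, 1.7, 2.6]


def share_from(shares, start_index):
    """Sum the shares of all length classes from start_index onward."""
    return round(sum(shares[start_index:]), 1)


# Index 2 corresponds to phrase length 2, so this is the share of
# subject groups with two or more words.
print("scientific prose:", share_from(scientific, 2))  # 61.3
print("artistic prose:  ", share_from(artistic, 2))    # 22.5
```

On this reading, 61.3% of subject groups in scientific prose contain two or more words, against 22.5% in artistic prose.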

Significance of phrase length in linguistics

In quantitative linguistics, texts have been examined to determine which phrase lengths occur in them and how often. The aim of these investigations was to show that the distribution of phrase lengths follows a law of language. For the phrase lengths in business journals, it could be shown that they follow the 1-shifted Hyperpoisson distribution:

x   n(x)   NP(x)
1   383    382.70
2   701    700.45
3   427    412.56
4   121    144.79
5   28     36.19
6   19     7.02
7   6      1.28

In the table, x is the number of autosemantic words (words with independent lexical meaning) per phrase, n(x) is the number of phrases of length x observed in the corpus, and NP(x) is the number of phrases of length x expected when the 1-shifted Hyperpoisson distribution is fitted to the observed data. The discrepancy coefficient C = 0.0036 indicates that the 1-shifted Hyperpoisson distribution is a good model for these data; a fit is rated as good if C ≤ 0.01. An analysis of a short text from Lichtenberg's Sudelbücher gave an equally good result. For more detailed explanations, see the literature cited.
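The figures in the table can be checked with a short Python sketch. It assumes the standard form of the 1-shifted Hyperpoisson distribution, P_x = a^(x-1) / (b(b+1)...(b+x-2) · 1F1(1; b; a)) for x = 1, 2, ..., and the usual definition of the discrepancy coefficient, C = χ²/N. Neither formula nor the fitted parameters are stated in the text, so the sketch recovers a and b from the ratios of the first three NP(x) values, treats the last class as the tail x ≥ 7, and pools the classes x ≥ 5 when computing C. All of these are assumptions, and the function names are illustrative.

```python
# Data from the table: x = autosemantic words per phrase (x = 1..7).
observed = [383, 701, 427, 121, 28, 19, 6]                      # n(x)
expected = [382.70, 700.45, 412.56, 144.79, 36.19, 7.02, 1.28]  # NP(x)


def shifted_hyperpoisson_pmf(x, a, b, terms=200):
    """P(X = x), x = 1, 2, ..., for the 1-shifted Hyperpoisson distribution."""
    # t_j = a^j / (b (b+1) ... (b+j-1)); the sum of all t_j equals 1F1(1; b; a).
    series, t = [], 1.0
    for j in range(terms):
        series.append(t)
        t *= a / (b + j)
    return series[x - 1] / sum(series)


def recover_parameters(freqs):
    """Solve P_2/P_1 = a/b and P_3/P_2 = a/(b+1) for a and b.

    Assumption: the first three fitted frequencies are exact up to rounding.
    """
    r1 = freqs[1] / freqs[0]
    r2 = freqs[2] / freqs[1]
    b = r2 / (r1 - r2)
    return r1 * b, b


def discrepancy_coefficient(obs, exp):
    """C = chi-square / N, with N the total number of observed phrases."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    return chi2 / sum(obs)


def pool_tail(values, start):
    """Merge all classes from index `start` onward into a single class."""
    return values[:start] + [sum(values[start:])]


a, b = recover_parameters(expected)
n_total = sum(observed)
model = [n_total * shifted_hyperpoisson_pmf(x, a, b) for x in range(1, 7)]
model.append(n_total - sum(model))  # assumed: last class holds the tail x >= 7

# Assumption: classes x = 5, 6, 7 are pooled before the test.
c_pooled = discrepancy_coefficient(pool_tail(observed, 4), pool_tail(expected, 4))
c_unpooled = discrepancy_coefficient(observed, expected)

print(f"a = {a:.3f}, b = {b:.3f}")
print([round(m, 2) for m in model])
print(f"C (classes x >= 5 pooled) = {c_pooled:.4f}")
print(f"C (no pooling)            = {c_unpooled:.4f}")
```

Under these assumptions the recovered parameters (a ≈ 0.87, b ≈ 0.47) reproduce the NP(x) column to within rounding, and the pooled C rounds to the reported 0.0036. Without pooling, C comes out at roughly 0.026, so some grouping of the sparse classes was presumably applied.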

In a more recent study, Wang Hua successfully fitted a different model to noun-phrase data from a very large corpus of British English (over 100,000 NPs).

References

  1. Lothar Hoffmann: Communication means technical language. An introduction. Second, completely revised edition. Narr, Tübingen 1985, p. 194. ISBN 3-87808-771-3.
  2. Hoffmann 1985, p. 194.
  3. Hoffmann 1985, p. 195.
  4. Karl-Heinz Best: Distribution of phrase and sub-clause lengths in German technical language. In: Naukovyj Visnyk Černivec'koho Universytetu: Herman'ska filolohija. Vypusk 319-320, 2006, pp. 113–120, example p. 116. The data for the tests presented here and for a number of other tests are taken from Schefe's study: Peter Schefe: Statistical syntactic analysis of technical languages with the help of electronic computing systems using the example of medical, business and literary language in German. Kümmerle, Göppingen 1975. ISBN 3-87452-293-8. (Extended and revised version of the dissertation.)
  5. Karl-Heinz Best: Quantitative Linguistics. An approximation. 3rd, heavily revised and expanded edition. Peust & Gutschmidt, Göttingen 2006, p. 51. ISBN 3-933043-17-4.
  6. Wang Hua: Length and complexity of NPs in Written English. In: Glottometrics 24, 2012, pp. 79–88; distribution of the lengths of the NPs: p. 82f.