Text coverage

from Wikipedia, the free encyclopedia

Text coverage (also: degree of text coverage) is the share of the vocabulary of a text or text corpus that is accounted for by an individual word or a defined group of words.
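
As a ratio, the text coverage of a word group is the number of running words in the text that belong to the group, divided by the total number of running words. The following Python sketch computes this ratio. The function names and the simple tokenizer, which treats any run of letters as a word and ignores case, are illustrative assumptions rather than a standard procedure.

```python
import re


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: a "word" is any run of letters, compared in lowercase.
    return re.findall(r"[^\W\d_]+", text.lower())


def text_coverage(tokens: list[str], word_group: set[str]) -> float:
    """Share of running words (tokens) that belong to word_group."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in word_group)
    return covered / len(tokens)


tokens = tokenize("The cat sat on the mat and the dog sat on the rug.")
print(text_coverage(tokens, {"the"}))               # a single word
print(text_coverage(tokens, {"the", "on", "sat"}))  # a defined group of words
```

In the sample sentence, "the" alone accounts for 4 of the 13 running words, and the group "the", "on", "sat" for 8 of them.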

Significance

In language teaching, text coverage is the decisive criterion for establishing the basic vocabulary of a language. The basic vocabulary is the part of the vocabulary whose knowledge allows a reader to understand a very high proportion of the words in any text. Pfeffer divides the basic vocabulary into a "basic level", which contains the 1,285 most frequent words, and an "intermediate level" that builds on it. Pfeffer's analyses of 22 texts from very different subject areas show that the basic-level vocabulary covers more than 60% of the words in every text, and the basic and intermediate levels together cover more than 80%, in many cases more than 90%. A more recent study states: "The few studies that have been published for German indicate that the first 2,000 words achieve a text coverage of 80 percent."
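
Because a basic vocabulary is usually drawn from the most frequent words, its coverage can be estimated by ranking word forms by frequency in a reference corpus and measuring how much of another text the N most frequent forms cover. The Python sketch below does this. The toy corpus, the tokenizer and all function names are assumptions for illustration, and the procedure is not a reconstruction of how Pfeffer's or any other published word list was compiled; in particular, it counts word forms rather than lexemes.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: a "word" is any run of letters, compared in lowercase.
    return re.findall(r"[^\W\d_]+", text.lower())


def top_n_words(reference_tokens: list[str], n: int) -> set[str]:
    """The n most frequent word forms; ties keep the order of first occurrence."""
    return {word for word, _ in Counter(reference_tokens).most_common(n)}


def coverage_by_top_n(reference_tokens: list[str], text_tokens: list[str],
                      sizes: list[int]) -> dict[int, float]:
    """Coverage of text_tokens by the top-n words of the reference corpus, per n."""
    result = {}
    for n in sizes:
        vocabulary = top_n_words(reference_tokens, n)
        covered = sum(1 for token in text_tokens if token in vocabulary)
        result[n] = covered / len(text_tokens) if text_tokens else 0.0
    return result


reference = tokenize("the cat and the dog and the bird")
text = tokenize("the bird and the fish")
print(coverage_by_top_n(reference, text, [1, 2, 3]))  # {1: 0.4, 2: 0.6, 3: 0.6}
```

With a realistic reference corpus, sizes such as 1,285 or 2,000 could be passed in `sizes`. Since each top-N list contains every shorter one, coverage can only stay the same or rise as N grows.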

For Chinese, it is reported that, according to estimates by Chinese researchers, 3,000 words achieve a text coverage of about 86% of "ordinary texts", 5,000 words about 91%, and 8,000 words 95%.

Text coverage also has implications for stylistics. Kempgen, for example, notes that text coverage is much higher in colloquial language than in literary language. A relatively low coverage by the basic vocabulary indicates a linguistically demanding text.

Problems

To calculate text coverage, one must first decide what counts as a word (lexeme) or as a word form. For example, the inflected forms of the noun "man" can be counted together as one word or each counted separately. How to treat homonyms and polysemous words must also be decided, to name just a few of the issues. Rosengren addresses this by listing text coverage separately for so-called running words (every occurrence in the text) and for different words (each distinct word counted once).
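
The difference between counting running words and different words, and the effect of merging inflected forms, can be illustrated with a small Python sketch. The hand-made mapping from word forms to lemmas stands in for a real lemmatizer and is purely hypothetical, as are the function and variable names.

```python
from typing import Optional


def coverage_running_and_different(tokens: list[str], word_group: set[str],
                                   lemma_of: Optional[dict[str, str]] = None
                                   ) -> tuple[float, float]:
    """Return (coverage of running words, coverage of different words).

    lemma_of is a hypothetical word-form -> lemma mapping; forms missing
    from it are left unchanged.
    """
    if lemma_of is not None:
        tokens = [lemma_of.get(token, token) for token in tokens]
    if not tokens:
        return 0.0, 0.0
    # Running words: every occurrence counts.
    running = sum(1 for token in tokens if token in word_group) / len(tokens)
    # Different words: each distinct word counts once.
    different = set(tokens)
    different_cov = len(different & word_group) / len(different)
    return running, different_cov


tokens = "the man saw the men".split()
group = {"the", "man"}
print(coverage_running_and_different(tokens, group))  # forms counted separately
print(coverage_running_and_different(tokens, group, {"men": "man", "saw": "see"}))
```

Counting "man" and "men" separately gives 3 of 5 running words and 2 of 4 different words; merging them into one lemma raises the figures to 4 of 5 and 2 of 3. Coverage values are therefore only comparable when they were computed under the same counting conventions.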

References

  1. J. Alan Pfeffer: Basic German. Elaboration and evaluation of three German corpora. Narr, Tübingen 1975, ISBN 3-87808-627-X, pp. 10 ff.
  2. Pfeffer, pp. 85–125.
  3. Winnerlöv 2012, p. 28.
  4. Cornelia Schindelin: The quantitative study of the Chinese language and writing. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): Quantitative Linguistik / Quantitative Linguistics. An International Handbook. de Gruyter, Berlin / New York 2005, ISBN 3-11-015578-8, pp. 947–970; on text coverage, pp. 952 f.
  5. Sebastian Kempgen: Russian language statistics. Systematic overview and bibliography. Otto Sagner, Munich 1995, ISBN 3-87690-617-2, p. 51.
  6. Pfeffer, p. 20.
  7. Inger Rosengren: A frequency dictionary of the German newspaper language. Die Welt, Süddeutsche Zeitung. Vol. 1. CWK Gleerup, Lund 1972. On "running words" and "different words": pp. XVIII f.; their text coverage: pp. XXXVIII f.