Quantitative text analysis

from Wikipedia, the free encyclopedia

Quantitative text analysis is an approach for uncovering latent structures and processes in texts. Here, "latent" refers to phenomena that cannot be read directly off the surface of a text, that is, off its grammatical or lexical forms.

Task of quantitative text analysis

Statistical evaluations often rely on the largest possible amounts of data (text corpora, dictionaries) in order to achieve the most accurate results possible. In quantitative text analysis, by contrast, the uniformity or cohesion of individual texts is brought to the fore. Large amounts of data may have the advantage of being representative, but the individuality of a self-contained text does not come into view.

The aim of quantitative text analysis is to uncover the latent phenomena on which a single text is based. According to Altmann (1988, 3), it is concerned with

  • the "characterization of texts with the help of parameters",
  • the "comparison of texts based on their characteristics and the subsequent classification of the texts",
  • the "investigation of laws that control the construction of texts", as well as the development of a "theory of texts" based on this.

An example

Take the study of parts of speech in a text as an example. A contribution by Ziegler, Best & Altmann examines, first, how frequently the individual parts of speech occur in a given text and whether the resulting frequency distribution follows a distribution law. A second aspect is how the individual parts of speech accumulate over the course of the text until, by its end, the frequency distribution just mentioned has been reached. It turns out that each part of speech grows in its own specific way, different from every other. A third aspect is the mutual interaction of parts of speech. Parts of speech alone thus already involve several latent phenomena of a text.
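The first two aspects can be illustrated with a minimal Python sketch. It assumes the text has already been tagged with parts of speech; the tag names and the toy tag sequence are hypothetical and are not taken from the cited study. `pos_frequencies` gives the rank-ordered frequency distribution of the tags, and `pos_growth` records, after each token, how many times each part of speech has occurred so far, so each list traces how that part of speech accumulates over the text.

```python
from collections import Counter


def pos_frequencies(tags):
    """Rank-ordered frequency distribution of part-of-speech tags."""
    # most_common() sorts by count; ties keep the order of first appearance.
    return Counter(tags).most_common()


def pos_growth(tags):
    """For each tag, its cumulative count after every token position."""
    growth = {tag: [] for tag in dict.fromkeys(tags)}  # keep first-seen order
    counts = Counter()
    for tag in tags:
        counts[tag] += 1
        for t in growth:
            growth[t].append(counts[t])
    return growth


# Hypothetical, hand-tagged toy sentence; the tag set is an assumption.
tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN", "PUNCT"]

print(pos_frequencies(tags))
for tag, curve in pos_growth(tags).items():
    print(tag, curve)
```

The last value of each growth curve equals that tag's total frequency, which is the sense in which the accumulation ends in the frequency distribution. Testing whether that distribution follows a particular distribution law would be a further step that this sketch does not attempt.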

Perspective

Parts of speech are only a small part of the linguistic phenomena of a text. Units and classes of units can be recorded statistically on all linguistic levels of a text (syllable types, sentence types, types, tokens, the first occurrence of words and much more). The next level of investigation addresses how the results for one phenomenon relate to those for another. Relevant experience has already been gained with the concept of linguistic synergetics, and it can also be applied at the text level: not only individual phenomena but also their interactions are subject to laws.
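Two of the units mentioned above, types and tokens, together with the first occurrence of words, can be recorded with a short Python sketch. It makes simplifying assumptions that a real analysis would need to revisit: words are split on whitespace and compared case-insensitively, and the sample sentence is a hypothetical toy text. `vocabulary_growth` returns the number of distinct types seen after each token, and `first_occurrences` returns the 1-based position at which each type first appears.

```python
def vocabulary_growth(tokens):
    """Number of distinct word types seen after each token (type-token curve)."""
    seen = set()
    curve = []
    for token in tokens:
        seen.add(token.lower())  # assumption: word types are case-insensitive
        curve.append(len(seen))
    return curve


def first_occurrences(tokens):
    """1-based token position at which each word type first appears."""
    first = {}
    for position, token in enumerate(tokens, start=1):
        first.setdefault(token.lower(), position)  # keep only the first position
    return first


# Hypothetical toy text; whitespace splitting stands in for real tokenization.
tokens = "The cat saw the dog and the dog ran".split()
curve = vocabulary_growth(tokens)
ttr = curve[-1] / len(tokens)  # type-token ratio of the whole text

print(curve)
print(first_occurrences(tokens))
print(round(ttr, 3))
```

Relating the type curve to token position is a simple case of relating the results for one unit to those for another; whether such a curve obeys a particular law is the kind of question described above and is not tested here.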

Literature

  • Vivien Altmann, Gabriel Altmann: Instructions for quantitative text analysis. Methods and Applications. RAM-Verlag, Lüdenscheid 2008. ISBN 978-3-9802659-5-9.
  • Reinhard Köhler: On linguistic synergetics: structure and dynamics of the lexicon. Brockmeyer, Bochum 1986. ISBN 3-88339-538-2.
  • C. George Sandulescu et al.: Quantifying Joyce's Finnegans Wake. In: Glottometrics 30, 2015, pp. 45–72 (PDF full text).

References

  1. Gabriel Altmann: Repetitions in Texts. Brockmeyer, Bochum 1988. ISBN 3-88339-663-X. Quotations on p. 3.
  2. Among many examples: Otto A. Rottmann: Word length in the Baltic languages – are they of the same type as the word lengths in the Slavic languages? In: Glottometrics 6, 2003, pp. 52–60 (PDF full text); Otto Rottmann: On Word Length in German and Polish. In: Glottometrics 42, 2018, pp. 13–20 (PDF full text).
  3. Arne Ziegler, Karl-Heinz Best, Gabriel Altmann: A contribution to text spectra. In: Glottometrics 1, 2001, pp. 97–108 (PDF full text).
  4. Karl-Heinz Best: On the interaction of parts of speech in texts. In: Papiere zur Linguistik 58, 1998, pp. 83–95.