Quantitative Lexicology

Quantitative lexicology studies the vocabulary (the lexicon) of languages with respect to its structure and its changes, with the aim of discovering the laws that bring about the state a given vocabulary is in at any given time. The primary goal is to develop a theory of the lexicon consisting of interrelated linguistic laws that allow observed states and processes to be explained scientifically. In this respect, quantitative lexicology is an important subdiscipline of quantitative linguistics.

Some aspects of quantitative lexicology

A pioneering study was presented by Köhler (1986). In this thesis, based on the analysis of a corpus of around 13,000 lemmas, he developed the basic features of linguistic synergetics, whose main objective is to work out how linguistic laws interact. For example, the length of a lexeme is influenced by its frequency and by the size of the lexicon of the language concerned, and length in turn controls the number of meanings of that lexeme. What Köhler worked out on the basis of a German corpus could be confirmed for Polish and Japanese. This approach shows how a theory of language can be reached from the standpoint of quantitative linguistics, and thus also of quantitative lexicology.

Further topics include the phonological and morphological structure of words, the meaning of words, their dialectal distribution, and processes of vocabulary growth and loss.

An example

One example of the regularities that govern the structure of the vocabulary is word length. If one examines how average word length depends on word frequency, it can be shown that, for a sample from a frequency dictionary of German, this relationship can be represented adequately by the formula [formula not preserved]. This holds both when word length is measured by the number of phonemes and when it is measured by the number of syllables per word. If the number of morphs per word is chosen instead, a further factor must be added to the formula.

The study covered the 8,000 most frequent words of a frequency dictionary of German. Mean word lengths were determined from a random sample for the 1,000 most frequent words (x = 1), then for the second most frequent 1,000 words (x = 2), and so on. As an example, here is the result for the part of the investigation in which word length was measured by the number of syllables:

x    n(x)    NP(x)
1    1.95    1.92
2    2.10    2.23
3    2.55    2.41
4    2.55    2.53
5    2.60    2.62
6    2.50    2.68
7    2.90    2.72
8    2.70    2.75

x stands for the first, second, third, and so on up to the eighth thousand of the words; n(x) is the observed average word length (in syllables) of the corresponding thousand; NP(x) is the calculated word length obtained when the formula is fitted to the observed values. a, b and c are the parameters of the formula. The result is significant, with a coefficient of determination D = 0.85: D must be 0.80 or greater to indicate a significant result, and this condition is met. At least for this sample, the data support the hypothesis that frequency influences word length: the more frequent words are, the shorter they are on average.
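
The reported goodness of fit can be checked approximately against the table. The following is a minimal sketch, assuming that D is the usual coefficient of determination, 1 - SS_res / SS_tot (the text does not spell out the definition); the function name is illustrative only.

```python
# Values copied from the table above (syllables per word, by frequency class).
observed = [1.95, 2.10, 2.55, 2.55, 2.60, 2.50, 2.90, 2.70]  # n(x)
fitted = [1.92, 2.23, 2.41, 2.53, 2.62, 2.68, 2.72, 2.75]    # NP(x)


def coefficient_of_determination(obs, fit):
    """Return 1 - SS_res / SS_tot (assumed definition of D)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(obs, fit))  # unexplained variation
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)        # total variation
    return 1 - ss_res / ss_tot


d = coefficient_of_determination(observed, fitted)
print(f"D = {d:.2f}")
```

Under this assumption the rounded table values give a D of roughly 0.84, close to the reported 0.85 and above the 0.80 threshold; the small difference is plausibly due to rounding in the table.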

Distribution of different forms or functions

In many cases it is illuminating to survey how frequently the different forms or functions of lexemes occur.

Frequency of occurrence of habere in a corpus of the Serbo-Croatian language

An example: if one considers not only all existential clauses but also all possessive clauses with habere in the corpus of the Serbo-Croatian language, the affirmative form ima ("there is") appears almost as often as the negated form nema. In the affirmative form ima, on average every fifth use is existential; in the negated form nema, however, every second use is (see diagram). This means that almost three quarters of the existential uses of habere in the corpus are realized in the negated form.
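
The "almost three quarters" figure follows from the two proportions. A minimal sketch of the arithmetic, assuming for illustration that ima and nema occur exactly equally often (the text only says "almost as often"); the counts of 100 are hypothetical, not corpus data.

```python
# Hypothetical equal totals; the source only states "almost as often".
ima_total = 100
nema_total = 100

existential_ima = ima_total / 5    # every fifth use of ima is existential
existential_nema = nema_total / 2  # every second use of nema is existential

# Share of all existential uses that appear in the negated form.
negated_share = existential_nema / (existential_ima + existential_nema)
print(f"negated share of existential uses: {negated_share:.0%}")
```

With equal totals the share is 5/7, or about 71 percent, which matches the "almost three quarters" stated in the text.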

Literature

  • Gabriel Altmann, Dariusch Bagheri, Hans Goebl, Reinhard Köhler, Claudia Prün: Introduction to quantitative lexicology. Peust & Gutschmidt, Göttingen 2002, ISBN 3-933043-09-3, pp. 94-133.
  • Rolf Hammerl: Investigations into the structure of the lexicon: construction of a lexical basic model. Scientific publishing house Trier, Trier 1991, ISBN 3-88476-005-X.
  • Volodymir Kaliuščenko, Reinhard Köhler, Viktor Levickij (eds.): Problemy typolohičnoi ta kvantytatyvnoi leksikologii - Problems of Typological and Quantitative Lexicology. Ruta, Černivci 2007, ISBN 978-966-568-897-6. (Articles in German, English, Russian and Ukrainian)
  • Reinhard Köhler: On linguistic synergetics: structure and dynamics of the lexicon. Brockmeyer, Bochum 1986, ISBN 3-88339-538-2.
  • Reinhard Köhler: Properties of lexical units and systems. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): Quantitative Linguistik - Quantitative Linguistics. An International Handbook. de Gruyter, Berlin / New York 2005, ISBN 3-11-015578-8, pp. 305-312.
  • Juhan Tuldava: Problems and methods of quantitative-systemic lexicology. Scientific publishing house Trier, Trier 1998, ISBN 3-88476-314-8.

References

  1. The corpus is an extract from the so-called LIMAS corpus; R. Köhler: On linguistic synergetics. 1986, p. 95.
  2. R. Köhler: On linguistic synergetics. 1986, p. 74.
  3. R. Hammerl: Investigations into the structure of the lexicon. 1991.
  4. Haruko Sanada: Investigations in Japanese Historical Lexicology. Revised edition. Peust & Gutschmidt, Göttingen 2008, ISBN 978-3-933043-12-2, especially pp. 113-141.
  5. G. Altmann, D. Bagheri and others: Introduction to quantitative lexicology. 2002.
  6. Karl-Heinz Best: Frequency and length of words. In: V. Kaliuščenko and others: Problemy typolohičnoi ta kvantytatyvnoi leksikologii. 2007, pp. 83-90, table p. 87.
  7. Snježana Kordić: Words in the border area between lexicon and grammar in Serbo-Croatian (= Lincom Studies in Slavic Linguistics. Volume 18). Lincom Europa, Munich 2001, ISBN 3-89586-954-6, p. 206.