Quantitative Linguistics

from Wikipedia, the free encyclopedia

Quantitative linguistics (also: statistical linguistics) is a subdiscipline of mathematical linguistics and thus of general linguistics and of linguistics as a whole. Its subjects are language acquisition, language change, and the use and structure of languages. It examines languages, their units, and their structures using combinatorics, probability theory, and difference and differential equations, and it tests the results with statistical methods. Its task is to establish linguistic laws, with the aim of developing an exact scientific theory of language in the form of a system of interconnected linguistic laws. Linguistic synergetics is devoted to researching and formulating such a network of interacting linguistic laws.

Quantitative linguistics builds on the results of language statistics, which can be understood either as the statistics of languages or as the statistics of any linguistic objects, without necessarily carrying any further theoretical claims. Corpus linguistics and computational linguistics also provide an important basis.

On the history of quantitative linguistics

A comprehensive history of quantitative linguistics cannot yet be written, as considerable research is still needed, although surveys exist for some areas. Some aspects can nevertheless be named:

Quantitative linguistics dates back to ancient Greece and India. One line of tradition is the application of combinatorics to linguistic objects; another is based on elementary statistical surveys, which are known under the keywords colometry and stichometry.

A thematically broader and more continuous development began in the 19th century, with work on, among other things, the periodization of an author's works, sound and letter statistics as preliminary work for the development of shorthand systems and as a basis for language comparison, the varying structure of verses, and the dependence of sound duration on word length. The investigations into sound duration, together with ideas about the interaction of other language properties, are the first concepts leading to the development of linguistic laws in the 20th century, the best known of which is Zipf's law. In the 20th century a number of further topics were added: identification of anonymous authors, action quotient, language structure, the law of language change, type-token relation, development of children's language skills, dynamic aspects of text structure, and others.

Another essential aspect of the development of quantitative linguistics in the 20th century is the founding of the international society IQLA (International Quantitative Linguistics Association) in 1994. A series of international conferences began in 1991 (First International Conference on Quantitative Linguistics [= QUALICO], Trier 1991; the second, Moscow 1994; etc.). Important publication venues are the journals Journal of Quantitative Linguistics (1994ff.), Göttingen Contributions to Linguistics (1998–2009), Glottometrics (2001ff.), Glottotheory (2008ff.), and Mathematical Linguistics (2015ff.); there are also the book series Quantitative Linguistics (1978ff.) and Studies in Quantitative Linguistics (2008ff.).

Language Laws in Quantitative Linguistics

In quantitative linguistics, a law is understood to be a law hypothesis that is derived (deduced) from theoretical assumptions, is formulated mathematically, is interrelated with other laws, and has been sufficiently tested against concrete data without being refuted. A law must hold for all languages in which the corresponding boundary conditions are met.

“In addition, it can be stated that these properties of linguistic elements and their interrelationships are subject to generally applicable laws that can be formulated strictly mathematically in the way we know it from the natural sciences. It should be noted that these are stochastic laws; they are not fulfilled in every individual case (this is neither necessary nor possible), but rather they prescribe the probabilities with which certain events will occur or certain conditions will occur as a whole. It is easy to find counterexamples for all of the above examples which, however, do not violate the relevant laws as individual cases, since deviations from the statistical average are not only permissible, but even necessary and, in turn, quantitatively precisely determined. Basically, this situation does not differ from that in the natural sciences, in which the old deterministic ideas have long since become obsolete and have also been replaced by statistical-probabilistic models.”

- Reinhard Köhler

Some language laws

A number of linguistic laws have been proposed, including:

Recent studies show that the distributions of linguistic units of different lengths follow, in the vast majority of cases, a single model, the Zipf-Alexeev function.

Other linguistic units are subject to this law as well: this concerns clause lengths, phrase lengths, the lengths of cola, verses, partial sentences, and so-called hrebs, as well as the lengths of speech acts. The same applies to the distribution of sounds of different duration (sound duration).

  • Martin's law: If one looks up in a lexicon which word is used to define a given word, and then asks in turn how that defining word is itself defined, and so on, one arrives at increasingly general defining words. Doing this for many words yields a hierarchy of ever fewer and ever more general words. Lawful relationships hold between these hierarchical levels.
  • Menzerath's law, in linguistics also Menzerath-Altmann law: The larger a unit is, i.e. the more components it consists of, the smaller these components are.
  • Rank-frequency laws: These concern a number of different language phenomena. If, for example, one determines for a large text corpus which word is the most frequent, which the second most frequent, which the third most frequent, and so on, and lists the words with their frequencies in that order, the result is a ranking. Various mathematical models have been proposed in the literature for the ranking as a whole. The procedure can in principle be applied to any linguistic units. A few examples:
    • Letter, sound, or phoneme frequencies: The units in question are ranked according to the frequency with which they appear in texts or in the lexicon (letter frequency).
    • Word associations: One investigates which associations test subjects produce, and how often, in response to a given stimulus word.
    • Word frequencies: Words in a text corpus are sorted by frequency and assigned ranks (frequency class).
  • Law of language change: The growth of a language's vocabulary, the spread of borrowings or foreign words, changes in the inflection system, and many other processes of language change follow a law known in linguistics as Piotrowski's law, which corresponds to growth laws (or models) in other sciences. It is of the type of the logistic model or logistic law (see logistic equation). This type of law also fits processes in language acquisition, so it can be understood as a law of language acquisition as well.
  • Text block law: If a text is divided into blocks of equal length, it can be shown that the frequencies with which linguistic units, for example particular letters or words, occur in these blocks follow a lawful distribution.
  • Zipf's law, better: Zipf's laws: The term mainly refers to the observation that the product of rank and frequency, for example of the words in a frequency dictionary, is approximately constant. It is better to speak of Zipf's laws, because this is not the only linguistic law that Zipf proposed.
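
The ranking procedure behind the rank-frequency laws can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from the literature listed here: the function name rank_frequency, the tokenization rule (runs of letters, lower-cased), and the sample sentence are assumptions made for the example. The last column is the product of rank and frequency, which Zipf's law expects to be roughly constant in large corpora; a toy text this short cannot show that.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    # Assumption: a "word" is any run of letters; case is ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Sort by descending frequency; ties are broken alphabetically
    # so that the ranking is reproducible.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


# Hypothetical sample text, chosen only to keep the output small.
sample = "the cat saw the dog and the dog saw a bird"
table = rank_frequency(sample)
for row in table:
    print(row)
```

With a real corpus in place of the sample sentence, the same table can serve as input for fitting any of the mathematical models proposed for rankings.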

From a slightly different perspective, one can also ask which regularities are to be expected for a particular type of linguistic unit. Altmann worked this out using compound words as an example. This leads to a number of law hypotheses, some of which still await testing. One result is that shorter words are more likely than longer ones to take part in the formation of derivatives or compounds. The polysemy of words also affects the degree to which they take part in the formation of new words.

Linguistic synergetics

So far, the discussion has concerned linguistic laws that describe the distribution and change of linguistic entities. Equally significant are laws that describe the interaction of different entities and can be captured in control loops. Two interactions involving word length may serve as illustrations. First, word frequency has a negative effect on word length: the more frequent words are, the shorter they are. Second, the longer words are, the fewer different meanings they have. A total of ten such interactions can be found in Best. Such interactions can also be demonstrated at other levels of language.
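
The first of these interactions, between word frequency and word length, can be inspected with a short Python sketch. It is a rough illustration under stated assumptions, not an established procedure: the function name mean_length_by_frequency is hypothetical, word length is measured in letters (other units, such as syllables, are equally possible), and the sample sentence was constructed for the example.

```python
import re
from collections import Counter, defaultdict


def mean_length_by_frequency(text):
    """Map each observed word frequency to the mean length of the words having it."""
    # Assumption: a "word" is any run of letters; length is counted in letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    lengths = defaultdict(list)
    for word, freq in counts.items():
        lengths[freq].append(len(word))
    # Highest frequency first, so a falling frequency can be read against length.
    return {freq: sum(ls) / len(ls)
            for freq, ls in sorted(lengths.items(), reverse=True)}


# Hypothetical sample text; a real study would use a large corpus.
sample = "it is a fact that it is a fact and it is interesting"
result = mean_length_by_frequency(sample)
print(result)  # {3: 2.0, 2: 2.5, 1: 6.0}
```

In this constructed sample the mean length rises as the frequency falls. Whether and how strongly that holds for real data is exactly what the law hypothesis has to be tested against.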

Style research

The study of literary as well as non-literary styles can make use of language statistics, but it can also direct its research at the particular form that linguistic laws take in certain styles. In such cases, quantitative linguistics supports stylistics in its efforts to gain knowledge that is as objective as possible and to explain stylistic phenomena, at least in part, by reference to linguistic laws. One of the basic assumptions of quantitative linguistics is that, for example, word length distributions in different types of text may follow different distribution models, or at least show different parameter values. When these efforts are aimed primarily at literary texts, they fall under quantitative stylistics (stylometry), a subdiscipline of quantitative literary studies.
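
As a minimal sketch of how word length distributions of two texts might be compared, the following Python example computes the relative frequency of each word length. The function name word_length_distribution, the use of letters as the unit of length, and both sample texts are assumptions made for illustration; fitting a distribution model and estimating its parameters are not shown.

```python
import re
from collections import Counter


def word_length_distribution(text):
    """Return {word length: relative frequency}, sorted by length."""
    # Assumption: a "word" is any run of letters; length is counted in letters.
    lengths = [len(word) for word in re.findall(r"[^\W\d_]+", text)]
    total = len(lengths)
    counts = Counter(lengths)
    return {length: counts[length] / total for length in sorted(counts)}


# Two hypothetical samples standing in for different types of text.
letter = "I am so glad to see you"
report = "Statistical models describe linguistic distributions"

letter_dist = word_length_distribution(letter)
report_dist = word_length_distribution(report)
print(letter_dist)
print(report_dist)
```

The two samples yield clearly different distributions, but texts this short say nothing about text types in general; the comparison becomes meaningful only with sufficiently large samples and a fitted model.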

Research problems

Several books are devoted to describing open research problems. The volumes published so far describe hundreds of possible research projects, together with suitable procedures and the relevant literature.

  • Udo Strauss, Fengxiang Fan, Gabriel Altmann: Problems in Quantitative Linguistics 1. RAM-Verlag, Lüdenscheid 2008, ISBN 978-3-9802659-4-2.
  • Reinhard Köhler, Gabriel Altmann: Problems in Quantitative Linguistics 2. RAM-Verlag, Lüdenscheid 2009, ISBN 978-3-9802659-7-3.
  • Radek Čech, Gabriel Altmann: Problems in Quantitative Linguistics 3. Dedicated to Reinhard Köhler on the occasion of his 60th birthday. RAM-Verlag, Lüdenscheid 2011, ISBN 978-3-942303-08-8.
  • Reinhard Köhler, Gabriel Altmann: Problems in Quantitative Linguistics 4. RAM-Verlag, Lüdenscheid 2014, ISBN 978-3-942303-22-4.
  • Gabriel Altmann: Problems in Quantitative Linguistics 5. RAM-Verlag, Lüdenscheid 2015, ISBN 978-3-942303-33-0.
  • Emmerich Kelih, Gabriel Altmann: Problems in Quantitative Linguistics 6. RAM-Verlag, Lüdenscheid 2018, ISBN 978-3-942303-57-6.

Bibliographies

  • G. Billmeier, D. Krallmann: Bibliography for statistical linguistics. Buske, Hamburg 1969. (Research report 69/3 of the Institute for Communication Research and Phonetics, University of Bonn)
  • Pierre Guiraud: Bibliographie critique de la statistique linguistique. Éditions Spectrum, Utrecht / Anvers 1954.
  • Reinhard Köhler with the assistance of Christiane Hoffmann: Bibliography of Quantitative Linguistics. Benjamin, Amsterdam / Philadelphia 1995, ISBN 90-272-3751-4.

Literature

(More, especially more specific, literature in the articles on the individual laws and on linguistic synergetics.)

  • Gabriel Altmann: Language Theory and Mathematical Models. In: Christian-Albrechts-Universität Kiel, SAIS [= Seminar for General and Indo-European Linguistics] Working reports. H. 8, 1985, pp. 1-13.
  • Gabriel Altmann, Dariusch Bagheri, Hans Goebl, Reinhard Köhler, Claudia Prün: Introduction to quantitative lexicology. Peust & Gutschmidt, Göttingen 2002, ISBN 3-933043-09-3.
  • Vivien Altmann, Gabriel Altmann: Instructions for quantitative text analysis. Methods and Applications. RAM-Verlag, Lüdenscheid 2008, ISBN 978-3-9802659-5-9.
  • Karl-Heinz Best: Quantitative Linguistics: A Plea. In: Gabriel Altmann, Viktor Levickij, Valentina Perebyinis (eds.): Problemy kvantytatyvnoi linhvistyky / Problems of Quantitative Linguistics: zbirnyk naukovych prac (pp. 76-88). Ruta, Cernivci 2005, ISBN 966-568-783-2.
  • Karl-Heinz Best: Quantitative Linguistics. An approximation. 3rd, heavily revised and expanded edition. Peust & Gutschmidt, Göttingen 2006, ISBN 3-933043-17-4.
  • Karl-Heinz Best, Otto Rottmann: Quantitative Linguistics, an Invitation. RAM-Verlag, Lüdenscheid 2017, ISBN 978-3-942303-51-4.
  • Gustav Herdan: Quantitative Linguistics. Butterworth, London 1964.
  • Emmerich Kelih: History of the application of quantitative methods in Russian linguistics and literary studies. Kovač, Hamburg 2008, ISBN 978-3-8300-3575-6. (Also dissertation, Graz 2007. Detailed account of the contribution of Russian linguistics and literary studies from the middle of the 19th century onward, which is particularly important for the development of quantitative / statistical linguistics and literary studies.)
  • Sebastian Kempgen: Russian language statistics. Systematic overview and bibliography. Verlag Otto Sagner, Munich 1995, ISBN 3-87690-617-2.
  • Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): Quantitative Linguistik - Quantitative Linguistics. An International Handbook. de Gruyter, Berlin / New York 2005, ISBN 3-11-015578-8.
  • Reinhard Köhler, Gabriel Altmann: Aims and Methods of Quantitative Linguistics. In: Gabriel Altmann, Viktor Levickij, Valentina Perebyinis (eds.): Problemy kvantytatyvnoi linhvistyky / Problems of Quantitative Linguistics: zbirnyk naukovych prac (pp. 12-41). Ruta, Cernivci 2005, ISBN 966-568-783-2.
  • Haitao Liu, Wei Huang: Quantitative Linguistics: State of the Art, Theories and Methods. In: Journal of Zhejiang University (Humanities and Social Science) 43 (2), 2012, pp. 178–192. (In Chinese.)
  • Stephen Ullmann: Panchronic statistical laws. In: idem: Basic features of semantics. The meaning from a linguistic point of view. de Gruyter, Berlin 1967, pp. 267-272.
