Language statistics
Language statistics can be understood in two ways: on the one hand as the statistics of languages (in this sense also known in German as Sprachenstatistik), and on the other hand as any kind of statistical investigation of linguistic properties or objects and their changes. In the second sense it is also known as statistical linguistics or linguostatistics. Quantitative linguistics pursues even broader goals, namely to develop a theory of language, to which Menzerath's law and Zipf's law, for example, belong.
Tasks and goals of language statistics
In many cases, language-statistical surveys serve practical or scientific purposes. For example, if one knows how often letters occur in a language, one can decipher encrypted texts. Stylistic studies (quantitative stylistics) aim to characterize the language used by individual authors, text classes, epochs, or different areas of communication (such as the style of the press and journalism, or everyday language). Applying statistical methods to literary texts amounts, following a suggestion by Fucks (1968: 77, 88), to quantitative literary studies. David Crystal (1993: 67) outlines the possibilities of statistics in the study of style. Language statistics can also be of great help in determining how difficult a text is: readability indices are used to measure the readability, i.e. the degree of difficulty, of texts. The vocabulary of languages is likewise a subject of language statistics in a variety of ways under the heading of lexicostatistics. Among other things, this involves the compilation of frequency dictionaries and the rates of loss to which vocabulary is subject (glottochronology). Surveying word frequencies is also an essential prerequisite for compiling basic vocabularies and thus for language teaching. Content analysis uses quantitative methods (quantitative content analysis) to find out which topics receive how much attention.
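The two most basic counts mentioned above, letter frequencies and a frequency-ranked word list of the kind found in a frequency dictionary, can be illustrated with a minimal Python sketch. The function names and the tokenization rule (a word is a maximal run of letters, case ignored) are assumptions made for this illustration, not a method taken from the works cited here.

```python
import re
from collections import Counter


def relative_letter_frequencies(text: str) -> dict[str, float]:
    """Return the share of each letter among all letters in `text`.

    Case is ignored; digits, whitespace and punctuation are not counted.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    return {letter: n / total for letter, n in counts.items()}


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first, ties in alphabetical order.

    Assumption: a "word" is a maximal run of letters (no digits or underscores).
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Run on a sufficiently large text sample, the first function gives the kind of letter profile that frequency-based decipherment relies on, while the second gives the raw material for a frequency dictionary. Real studies would additionally have to decide how to treat inflected forms, compounds and spelling variants.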
Stichometry served very practical purposes in antiquity: text lengths were measured, for example, to provide a basis for paying scribes.
So there are many uses for language statistics. Often, however, it is simply a matter of satisfying curiosity: one wants to know how often something occurs or how its frequency changes over time, without directly pursuing any further goal. In this context one can point to the recurring question of how extensive the vocabulary of German or of other languages is, and how extensive the vocabulary of particular authors is.
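The question of how extensive an author's vocabulary is can be approximated, very roughly, by counting running words (tokens) and distinct word forms (types) in a text. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, a word is taken to be a maximal run of letters, case is ignored, and inflected forms are counted as separate types because no lemmatization is performed.

```python
import re


def vocabulary_size(text: str) -> tuple[int, int]:
    """Return (number of tokens, number of types) for `text`.

    Assumptions: words are maximal runs of letters, compared case-insensitively;
    inflected forms such as "go" and "goes" count as different types.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return len(tokens), len(set(tokens))


if __name__ == "__main__":
    n_tokens, n_types = vocabulary_size("The cat saw the other cat.")
    print(f"{n_tokens} tokens, {n_types} types")
```

Because the number of types depends on the length of the text, counts obtained this way are only comparable between samples of similar size.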
History
Frequency studies of linguistic phenomena go back to Indian and Greek antiquity, where, among other things, combinatorial considerations on the formation of linguistic units were made, a tradition that persisted for many centuries. One of the questions that Leibniz discussed in this context was how many words can be formed from an alphabet with a given number of letters. Word statistics followed later, and after them sound statistics and much more. Alongside this work focused on investigating language and languages, work in the service of neighboring disciplines followed. In the 19th century, for example, sound statistics were compiled repeatedly from the 1830s onward in order to optimize shorthand. A variety of approaches are used in literary studies, for example stylometric studies to identify anonymous authors. Work on the aesthetic quality of literary works should also be mentioned: the psychologist Karl Groos published “The Acoustic Phenomena in Schiller's Lyric Poetry” (1910), a work in which he presented a language-statistical study.
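The combinatorial question mentioned above, how many words can be formed from an alphabet with a given number of letters, can be made concrete with a small Python sketch. It is not a reconstruction of Leibniz's own calculation. It assumes that letters may be repeated and that every string of letters counts as a possible word, so an alphabet of n letters yields n^k strings of length k. The function names are chosen for this illustration.

```python
def words_of_length(alphabet_size: int, length: int) -> int:
    """Number of strings of exactly `length` letters over the alphabet.

    Assumption: letters may repeat and every string counts as a "word".
    """
    if alphabet_size < 0 or length < 0:
        raise ValueError("alphabet_size and length must be non-negative")
    return alphabet_size ** length


def words_up_to_length(alphabet_size: int, max_length: int) -> int:
    """Number of non-empty strings with at most `max_length` letters."""
    return sum(words_of_length(alphabet_size, k) for k in range(1, max_length + 1))


if __name__ == "__main__":
    # Example: all strings of one to three letters over a 26-letter alphabet.
    print(words_up_to_length(26, 3))
```

Only a small fraction of these strings are actual words in any language, which is one reason why later work turned from pure combinatorics to counting what really occurs in texts.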
Language statistics of German
Various overviews of German language statistics (letter and sound frequencies, grammar, vocabulary) can be found in Meier (1967). The corresponding works by Braun (1998) and Sommerfeldt (1988) contain a great deal of data on German and its developmental tendencies. König (2005) and Duden. The German spelling (2017) also compile some data. On many topics (frequencies of morph, sentence, syllable and word lengths; changes in frequencies in the individual and in language history; frequency of parts of speech and many other aspects), studies in quantitative linguistics also contain statistical data, especially on German but also on a number of other languages.
Statistical basics
The statistical foundations can be found in the handbooks of the many scientific disciplines (psychology, sociology, economics, ...) that also rely on statistics in their research. There are also works written specifically for linguists or at least focused on linguistic topics. Altmann (1995), von Essen (1979), Hoffmann & Piotrowski (1979), Nikitopoulos (1973), Schlobinski (1996) and Wimmer & Altmann (1999) provide statistical (and in part epistemological) foundations for linguists with differing requirements.
See also
- Letter frequency
- Child Language Statistics
- List of the most common surnames in Germany
- List of the most common words in the German language
- Name statistics
- Punctuation marks
- Writing statistics
- Text coverage
- Word spectrum
Literature
- Pavel M. Alekseev, V. M. Kalinin, Rajmund G. Piotrowski: Language statistics: with numerous tables and schemes in the text, translated by a collective under the direction of Lothar Hoffmann. Fink, Munich / Berlin 1973 / Akademie-Verlag, Berlin 1973.
- Gabriel Altmann: Statistics for Linguists. Wissenschaftlicher Verlag Trier, Trier 1995, ISBN 3-88476-176-5.
- Karl-Heinz Best: Quantitative Linguistics. An approximation. 3rd, heavily revised and expanded edition. Peust & Gutschmidt, Göttingen 2006, ISBN 3-933043-17-4.
- Peter Braun: Trends in contemporary German. Language varieties. 4th edition. Kohlhammer, Stuttgart / Berlin / Cologne 1998, ISBN 3-17-015415-X, p. 103. (The book contains statistical information on many linguistic features of German.)
- David Crystal: The Cambridge Encyclopedia of Language. Translation and editing of the German edition by Stefan Röhrich, Ariane Böckler and Manfred Jansen. Campus Verlag, Frankfurt / New York 1993, ISBN 3-593-34824-1. Chapter: The statistical structure of language, pp. 86-87.
- Otto von Essen: General and applied phonetics. 5th, revised and expanded edition. Akademie-Verlag, Berlin 1979.
- Wilhelm Fucks: According to all the rules of the art. Diagnoses about literature, music, visual arts - the works, their authors and creators. Deutsche Verlags-Anstalt, Stuttgart 1968.
- Lothar Hoffmann, Rajmund G. Piotrowski: Contributions to language statistics. VEB Verlag Enzyklopädie, Leipzig 1979.
- Emmerich Kelih: History of the application of quantitative methods in Russian linguistics and literary studies. Kovač, Hamburg 2008, ISBN 978-3-8300-3575-6. (Also dissertation, Graz 2007. Detailed account of the contribution of Russian linguistics and literary studies from the middle of the 19th century onward, which is particularly important for the development of quantitative / statistical linguistics and literary studies.)
- Sebastian Kempgen: Russian language statistics. Systematic overview and bibliography. Sagner, Munich 1995, ISBN 3-87690-617-2.
- Reinhard Köhler: Bibliography of Quantitative Linguistics. John Benjamins, Amsterdam 1995, ISBN 90-272-3751-4.
- Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): Quantitative Linguistik / Quantitative Linguistics. An International Handbook. de Gruyter, Berlin / New York 2005, ISBN 3-11-015578-8.
- Helmut Kreuzer, Rul Gunzenhäuser (eds.): Mathematics and Poetry. Attempts on the question of an exact literary study. Nymphenburger, Munich 1965, 1967, 1969, 4th, revised edition 1971, ISBN 3-485-03303-0.
- Helmut Meier: German language statistics. 2nd, enlarged and improved edition. Olms, Hildesheim 1967, 1978, ISBN 3-487-00735-5. (1st edition 1964)
- Charles Muller: Introduction to Language Statistics. Hueber, Munich 1972.
- Pantelis Nikitopoulos: Language Statistics. In: Hans Peter Althaus, Helmut Henne, Herbert Ernst Wiegand (eds.): Lexicon of German linguistics. 2nd, completely revised and enlarged edition. Niemeyer, Tübingen 1980, ISBN 3-484-10392-2, pp. 792-797.
- Pantelis Nikitopoulos: Statistics for Linguists. A methodical contribution. Narr, Tübingen 1973.
- Peter Schlobinski: Empirical Linguistics. Westdeutscher Verlag, Opladen 1996, ISBN 3-531-22174-4.
- Gejza Wimmer, Gabriel Altmann: Thesaurus of univariate discrete probability distributions. Stamm, Essen 1999, ISBN 3-87773-025-6.
Journal
- Statistical Methods in Linguistics (SMIL), Språkförlaget Scriptor, Stockholm, 1961–1978.
Web links
- Bibliographies and further information on the Göttingen Quantitative Linguistics project
- Graz project QuanTA (quantitative text analysis)
- Dedicated degree program in Quantitative Linguistics
- Sound frequencies according to Kaeding 1897
References
- The term language statistics in the sense of statistics of languages is used, among others, by: Emil Brix: The colloquial languages in Old Austria between agitation and assimilation. The language statistics in the Cisleithanian population censuses 1880 to 1910. Böhlau, Vienna 1982, ISBN 3-205-08745-3. (Also dissertation, Vienna 1979)
- Klaus Merten: Content analysis. Introduction to theory, method and practice. 2nd, improved edition. Westdeutscher Verlag, Opladen 1993, ISBN 3-531-11442-5.
- Gero von Wilpert: Specialized Dictionary of Literature (= Kröner's pocket edition. Volume 231). 8th, improved and enlarged edition. Kröner, Stuttgart 2001, ISBN 3-520-23108-5.
- Karl-Heinz Best: Quantitative Linguistics. An approximation. 3rd, heavily revised and expanded edition. Peust & Gutschmidt, Göttingen 2006, ISBN 3-933043-17-4, Chapter: Statistical considerations on vocabulary, p. 13ff.
- N. L. Biggs: The roots of combinatorics. In: Historia Mathematica. 6, 1979, pp. 109-136.
- On Leibniz, Dissertatio de arte combinatoria (1666), see: Karl-Heinz Best: Gottfried Wilhelm Leibniz (1646–1716). In: Glottometrics. 9, 2005, pp. 79–82; Eberhard Knobloch: G. W. Leibniz's mathematical studies on combinatorics. Presented and annotated on the basis of almost exclusively handwritten notes. Franz Steiner Verlag, Wiesbaden 1973.
- Karl-Heinz Best: Quantitative Linguistics. An approximation. 3rd, heavily revised and expanded edition. Peust & Gutschmidt, Göttingen 2006, ISBN 3-933043-17-4, Chapter: Development of Quantitative Linguistics (QL), p. 7ff.
- Peter Braun: Trends in contemporary German. Language varieties. 4th edition. Kohlhammer, Stuttgart / Berlin / Cologne 1998, ISBN 3-17-015415-X, passim.
- Karl-Ernst Sommerfeldt (ed.): Development tendencies in contemporary German. VEB Bibliographisches Institut, Leipzig 1988, ISBN 3-323-00169-9, pp. 193-243.
- Werner König: dtv-Atlas German language. 15th, revised and updated edition. dtv, Munich 2005, ISBN 3-423-03025-9, pp. 114–119.
- Duden. The German spelling. 27th, completely revised and expanded edition. Dudenverlag, Berlin 2017, ISBN 978-3-411-04017-9, Chapter: Language in Numbers, pp. 148-159.
- See http://wwwuser.gwdg.de/~kbest/einfueh.htm