Word length

from Wikipedia, the free encyclopedia

Word length is determined by the number of smaller units a word consists of. It can be defined as the number of letters, sounds, phonemes, morphs, syllables or morae. Another option is to measure the time a speaker takes to pronounce a word; word length is then obtained as word duration. Before word length or word duration can be measured, however, it must be decided what exactly counts as a word, a problem that is by no means trivial.

Shortest words - longest words

One question that has met with widespread interest is which word is the longest or the shortest, either in a particular language or in general. In most cases, words are listed with the number of letters they consist of. The question about the shortest word is very easy to answer, since a word cannot be shorter than one letter, sound or phoneme. Examples are the Latin imperative “ī” (= go), the Polish preposition “w”, the Spanish words “y” (= and) and “o” (= or), or the German interjection “o”, as in O du lieber Augustin; so there is not just one shortest word.

The situation is different with the question of the longest word, to which one can contribute some observations and considerations, but which ultimately cannot be answered. A few references to German may demonstrate this:

Jean Paul (1820) already dealt with the phenomenon of long words and invented a particularly long one himself: Wortbandwurmstockabtreibmittellehrbuchstempelkostenersatzberechnung (roughly: word tapeworm expulsion remedy textbook stamp cost reimbursement calculation), a word with 67 letters.

In response to a prize competition held by the Gesellschaft für deutsche Sprache (Society for the German Language), the “Grundstücksverkehrsgenehmigungszuständigkeitsübertragungsverordnung” (ordinance on the transfer of responsibility for real estate transaction permits) was cited, a word consisting of 67 letters. This is an official legal term (GrundVZÜV, in force from 2003, repealed in 2007). The “Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz” (law on the transfer of beef labeling monitoring tasks) of Mecklenburg-Western Pomerania (RkReÜAÜG, in force from 1999, repealed in 2013), which consists of 63 letters, is also officially documented. The Duden distinguishes such formations, which occur repeatedly in text corpora and mostly come from legal and administrative language, from “individual, creative ad hoc formations” that usually occur only once, such as “Steuerentlastungsberatungsvorgesprächskoalitionsrundenvereinbarungen” (68 letters) or “Schauspielerbetreuungsflugbuchungsstatisterieleitungsgastspielorganisationsspezialist” (85 letters). The corpus linguist Rainer Perkuhn of the Institut für Deutsche Sprache names as the record holder a compound that translates as “psychological self-experience family constellation body tantra personality development seminars”. The Guinness Book of Records lists “Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft”. None of these words marks the upper limit of word length in German, because a further element can always be added without violating the rules of German word formation; for example, an element meaning “foundation” can be appended to “Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft”. The resulting word may never be used, but it is possible. This shows that one has to distinguish between the longest attested words and the longest possible words.
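The letter counts quoted for such compounds can be checked mechanically. The following is a minimal sketch that uses the German originals of the two statute names cited above; the helper name letter_count is an assumption made for this illustration. Normalizing to NFC first ensures that an umlaut such as ü counts as one letter even if the text stores it as a base letter plus a combining mark.

```python
import unicodedata

def letter_count(word: str) -> int:
    """Count the letters of a word (hypothetical helper for illustration).

    NFC normalization merges a base letter and a combining diaeresis
    into a single code point, so that 'ü' is counted once.
    """
    normalized = unicodedata.normalize("NFC", word)
    return sum(1 for ch in normalized if ch.isalpha())

# German originals of the two statute names (GrundVZÜV and RkReÜAÜG)
words = [
    "Grundstücksverkehrsgenehmigungszuständigkeitsübertragungsverordnung",
    "Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz",
]

for w in words:
    print(letter_count(w), w)
```

Counting letters is only one of the possible definitions of word length; the same words would yield different figures if syllables or morphs were counted instead.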

The list of very long words can be extended further if the technical terminology of chemistry and medicine is included. The “Rote Liste” (“Red List”) includes the designation “(6R,7R)-7-[(Z)-2-(2-Amino-1,3-thiazol-4-yl)-2-(methoxyimino)acetamido]-3-(6-hydroxy-2-methyl-5-oxo-2,5-dihydro-1,2,4-triazin-3-ylsulfanylmethyl)-8-oxo-5-thia-1-azabicyclo[4.2.0]oct-2-en-2-carboxylic acid” for an antibiotic (short name: “Ceftriaxone”). This word is not an isolated case: because the number of compounds in organic chemistry is almost unlimited, IUPAC has introduced a systematic nomenclature that allows molecules of any size, including newly synthesized ones, to be named exactly and identified internationally, since substances, pharmaceutical ones in particular, are often listed under several synonyms.

To assess these (and other possible) examples correctly, one must remember that German is a language in which long compound words are easily formed. There are, however, certainly languages that match German in their capacity for forming long words, for example polysynthetic languages.

Word lengths in an alphabetical dictionary of German

The question of how many syllables (or morphs) a word can consist of is also of interest. Evaluations of dictionaries give an impression of this. The following data come from Menzerath's analysis of Viëtor's pronunciation dictionary, which contains 20,453 headwords:

Syllables per word   Frequency in the dictionary   Percentage of the dictionary
1                    2245                          11.00
2                    6396                          31.27
3                    6979                          34.12
4                    3640                          17.80
5                    920                           4.50
6                    214                           1.05
7                    42                            0.21
8                    11                            0.05
9                    6                             0.03

From these values it can be calculated that words in this dictionary have an average of 2.78 syllables. A frequency dictionary yields a shorter average length, since shorter words are generally used more often than longer ones. This is shown in the following section.
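The average can be reproduced as a weighted mean of the syllable counts. The sketch below uses the frequencies from the table above; the names dictionary_counts and mean_word_length are assumptions made for this illustration.

```python
# Syllables per word -> number of headwords
# (Menzerath's counts for Viëtor's pronunciation dictionary, as tabulated above)
dictionary_counts = {1: 2245, 2: 6396, 3: 6979, 4: 3640, 5: 920,
                     6: 214, 7: 42, 8: 11, 9: 6}

def mean_word_length(counts):
    """Weighted mean: total number of syllables divided by total number of words."""
    total_words = sum(counts.values())
    total_syllables = sum(length * n for length, n in counts.items())
    return total_syllables / total_words

print(sum(dictionary_counts.values()))                 # 20453 headwords
print(round(mean_word_length(dictionary_counts), 2))   # 2.78
```

Each headword counts once here, regardless of how often it is used in running text.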

Word lengths in a frequency dictionary of German

At the end of the 19th century, a German-language text corpus of 10,906,235 running words was counted under the direction of Friedrich Wilhelm Kaeding. The following overview is sorted by word length:

Syllables per word   Frequency in the text corpus   Percentage of the text corpus
1                    5,426,326                      49.75
2                    3,156,448                      28.94
3                    1,410,494                      12.93
4                    646,971                        5.93
5                    187,738                        1.72
6                    54,436                         0.50
7                    16,993                         0.16
8                    5,038                          0.05
9                    1,225                          0.01
10                   461                            0.00
11                   59                             0.00
12                   35                             0.00
13                   8                              0.00
14                   2                              0.00
15                   1                              0.00

The table is based on data taken from Zipf (1935, reprint 1968, page 23). From them the average word length (measured by the number of syllables per word) can be calculated as 1.83.
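A short sketch shows how strongly short words dominate running text. It computes the mean word length and the cumulative share of words with at most a given number of syllables from Kaeding's counts as tabulated above; the names kaeding_counts and cumulative_share are assumptions made for this illustration.

```python
# Syllables per word -> number of running words in Kaeding's corpus (as tabulated above)
kaeding_counts = {
    1: 5_426_326, 2: 3_156_448, 3: 1_410_494, 4: 646_971, 5: 187_738,
    6: 54_436, 7: 16_993, 8: 5_038, 9: 1_225, 10: 461,
    11: 59, 12: 35, 13: 8, 14: 2, 15: 1,
}

total_words = sum(kaeding_counts.values())
mean_length = sum(k * n for k, n in kaeding_counts.items()) / total_words

def cumulative_share(counts, max_syllables):
    """Percentage of running words that have at most max_syllables syllables."""
    covered = sum(n for k, n in counts.items() if k <= max_syllables)
    return 100 * covered / sum(counts.values())

print(total_words)                                      # 10906235 running words
print(round(mean_length, 2))                            # 1.83
print(round(cumulative_share(kaeding_counts, 2), 1))    # 78.7
```

Words of one or two syllables thus account for more than three quarters of all running words in this corpus.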

The longest words in Viëtor have nine syllables, the longest in Kaeding up to 15. Longer words can only be found in the technical language of chemistry and medicine mentioned above.

Comparison of word lengths in different languages

Fucks gives the mean word length (syllables per word) of literary authors for 11 languages:

Language     Mean word length of literary works (syllables per word)
English      1.4
French       1.6
German       1.7
Esperanto    1.9
Italian      2.0
Greek        2.1
Japanese     2.1
Hungarian    2.2
Russian      2.2
Latin        2.4
Turkish      2.5

In Kaeding's frequency dictionary, the average word length for German was 1.83 syllables per word, whereas Fucks gives 1.7. The difference arises because Fucks cites data on literary texts only.

Average word length in different text groups

If one wants to characterize a language, a style or a text by its word lengths, the question arises of how word length should be determined. Is word length examined using the headwords of a lexicon or the words of a running text? Which unit is counted per word? Does it matter which lexicon or which type of text is evaluated? To anticipate the answer: the averages differ depending on how these questions are decided.

As an example, some average values for word lengths in German are given, determined by the number of syllables per word; the data come from Best (2006). The average number of syllables per word in German texts was calculated as follows:

Text class                     Mean word length (syllables per word)
                               lower        upper
Press releases                 1.81         2.29
Specialist texts               2.04         2.32
Spoken language                1.52         1.66
Reading and textbook texts     1.32         1.88
SMS texts                      -            1.51
20th-century letters           -            1.68
20th-century epic and prose    -            1.70
Poems by Erich Fried           -            1.60

Explanation: The observed lower and upper average values ​​can be given for the first four text classes. These text classes are interesting because they show how much the values ​​can fluctuate within a text class. In the other cases, only one value can be given at the moment; then the maximum value is identical to the one average value. Further details are given in the cited work, on the one hand on groups of texts within a text class, on the other hand on the development of word length over the centuries.

Of course, the values given depend on the selection of the texts evaluated. The table gives an idea of how much these averages can fluctuate. A similar picture would result if word length were determined by a unit other than the number of syllables per word.

Word length distribution and word length in interaction with other linguistic variables

Quantitative linguistics has investigated the laws governing word lengths in several ways.

  • The law of the distribution of word lengths has been researched best. It states that the frequencies with which words of different lengths occur in texts or in dictionaries of various kinds follow very specific, theoretically justifiable distributions: the length of words is a function of their frequency. In many languages, including German, this statement can be simplified to: the more frequent words are, the shorter they are. In some languages such as Finnish or Latin, this statement does not hold from the monosyllabic words onwards, but only from the two- or three-syllable words. For further details on word length distributions, see the separate article on the law of the distribution of word lengths.
  • In texts and in dictionaries, word lengths are interrelated in a number of ways with other linguistic variables. Köhler's control loop represents some of these interrelationships in the form of a simple model; they can be integrated into a more complex model.
    • There is an important regularity between the length of words and the length of their parts: the longer a word is, that is, the more smaller units (immediate constituents) it consists of, the smaller these constituents themselves are. This law of language is known as Menzerath's law (also: Menzerath-Altmann law). A study of German started from the hypothesis “The longer the word, the shorter its morphs” and, based on the evaluation of an entire dictionary, was able to show that this hypothesis holds.
    • If word length is examined in terms of the number of syllables, a corresponding hypothesis can be formulated: “The longer the word, the shorter its syllables”. An investigation of word lengths in German and Italian supports this hypothesis. It also holds when word length is related to the duration of the syllables in different speaking styles.
    • There is also a relationship between word length and the frequency with which words take part in the formation of compounds: shorter words are much more productive in this respect than longer ones.
    • Word length also affects the duration of the sounds that make up the words: the longer the words are, the shorter their sounds are spoken. This law of language goes back to the 19th century and is one of the oldest known laws. It is again a special case of Menzerath's law. Tests of this law using the duration of vowels in Hungarian support the hypothesis.
    • The interplay of word length and polysemy also follows a law: the longer a word, the lower its polysemy, at least in Chinese.
    • There is also a lawful relationship between the age of words and their length: the older words are, the shorter they are on average. Miyayima finds: "Clearly older words are shorter and used more frequently." Sanada-Yogo comes to the same conclusion.
    • Another relationship exists between the age of words and their polysemy: the older words are, the more different meanings they have on average.
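The hypothesis “The longer the word, the shorter its syllables” can be made concrete with a small computation. The sketch below groups words by their number of syllables and computes the mean syllable length in letters for each group. The word list is a hand-picked, hand-syllabified toy sample and the function name is an assumption; the example only illustrates the computation and is not evidence for the law, which the studies cited test on whole dictionaries and on measured durations.

```python
from collections import defaultdict

# Hand-syllabified German words: a hypothetical toy sample chosen for illustration only
sample = [
    ["Haus"], ["Strumpf"],
    ["Mut", "ter"], ["Le", "ben"],
    ["Ar", "bei", "ter"], ["Ba", "na", "ne"],
    ["Fa", "mi", "li", "e"], ["A", "me", "ri", "ka"],
]

def mean_syllable_length_by_word_length(words):
    """Map word length (in syllables) to mean syllable length (in letters)."""
    letters = defaultdict(int)
    syllables = defaultdict(int)
    for word in words:
        n = len(word)                          # word length in syllables
        letters[n] += sum(len(s) for s in word)
        syllables[n] += n
    return {n: letters[n] / syllables[n] for n in sorted(letters)}

for n, mean_len in mean_syllable_length_by_word_length(sample).items():
    print(f"{n} syllable(s): mean syllable length {mean_len:.2f} letters")
```

In this toy sample the mean syllable length falls as the number of syllables rises, which is the direction Menzerath's law predicts.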

Word length in language typology

In the attempt to characterize language types numerically, Greenberg, Altmann & Lehfeldt and many others developed measures within language typology that allow languages to be compared morphologically. Among the morphological properties measured for this purpose is word complexity, one of 10 indices. It relates the number of words in a text to the number of morphemes: the so-called “synthesis index” is S = number of morphemes / number of words or, conversely, S = number of words / number of morphemes. The synthesis index is a measure of the average word length in the languages examined. Altmann & Lehfeldt also show how the 10 indices can be used to classify languages and how these indices interact with each other. Wilhelm Fucks demonstrates the connection between entropy and word length using 11 languages as examples.
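As a minimal sketch, the synthesis index can be computed from a text whose words have already been segmented into morphemes. The segmentation below is a hypothetical hand analysis of a short English sentence, and the function name synthesis_index is an assumption made for this illustration.

```python
# Each word is given as a list of its morphemes (hypothetical hand segmentation)
segmented_text = [
    ["the"],
    ["farm", "er"],
    ["kill", "s"],
    ["the"],
    ["duck", "ling"],
]

def synthesis_index(words, morphemes_per_word=True):
    """Return morphemes / words, or the inverse ratio words / morphemes."""
    n_words = len(words)
    n_morphemes = sum(len(word) for word in words)
    if morphemes_per_word:
        return n_morphemes / n_words
    return n_words / n_morphemes

print(synthesis_index(segmented_text))                            # 1.6
print(synthesis_index(segmented_text, morphemes_per_word=False))  # 0.625
```

In the first variant a higher value means longer, more synthetic words; in the inverse variant the scale is reversed, so it must always be stated which of the two definitions is used.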

Development of word length

Word lengths are indicators of the development of language, both the development of the individual's ability to speak and the development of the language.

  • Development of language acquisition by individuals: It can be shown that school-age children make systematic linguistic progress, which is also reflected in a progressive increase in the word length of their utterances. This development follows the same law as that of the language itself, the Piotrowski law, which can therefore also be understood as a law of language acquisition.
  • Development in language: The average word length also changes with the change in a language. For German it can be shown that word lengths initially decrease up to around the time of early New High German and then increase again. This process also follows the Piotrowski law. The following table shows the development of word length, measured by the number of syllables per word, in German epic and prose from the 8th to the 20th century as a reversible process:
t       Century       Syllables per word
                      observed    calculated
2.5     8th-11th      1.72        1.72
4.5     12th          1.66        1.63
5.5     13th          1.49        1.53
6.5     14th          1.49        1.46
7.5     15th          1.45        1.45
8.5     16th          1.51        1.53
9.5     17th          1.66        1.63
10.5    18th          1.69        1.70
11.5    19th          1.71        1.72
12.5    20th          1.70        1.73

(Explanation: t numbers the periods consecutively by century for the calculation; the first period, covering the 8th to 11th centuries, is represented by the middle of the 10th century, t = 2.5. Fitting Piotrowski's law in its form for reversible language change to the observed data yields the calculated values given. The fit has a coefficient of determination of C = 0.94, where C is considered good if it is greater than or equal to 0.80. For further explanations, see the literature cited.)
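The goodness of fit can be approximated from the table itself. The sketch below assumes that C is the usual coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares; the source does not spell out the formula, and the function name is an assumption made for this illustration. Since the tabulated values are rounded to two decimals, the result can only be expected to come close to the reported 0.94.

```python
# Observed and calculated syllables per word, 8th-11th to 20th century (from the table above)
observed   = [1.72, 1.66, 1.49, 1.49, 1.45, 1.51, 1.66, 1.69, 1.71, 1.70]
calculated = [1.72, 1.63, 1.53, 1.46, 1.45, 1.53, 1.63, 1.70, 1.72, 1.73]

def coefficient_of_determination(obs, calc):
    """C = 1 - SS_res / SS_tot (assumed definition, see the lead-in)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - c) ** 2 for o, c in zip(obs, calc))   # unexplained variation
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)          # total variation
    return 1 - ss_res / ss_tot

c_value = coefficient_of_determination(observed, calculated)
print(f"C = {c_value:.3f}")
print("good fit" if c_value >= 0.80 else "poor fit")
```

The rounded table values give a coefficient in the mid-0.9 range, well above the threshold of 0.80 named in the explanation.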

The development from the 8th to the 11th century deserves a separate study, for which additional data would be required. The general trend of a decrease followed, from the 16th century onwards, by an increase in word length was also found for German poems from around 1000 to 1970 (with generally shorter word lengths). In German letters, word length increased between the 16th and 18th centuries and then decreased again. Even though more data would be desirable throughout, there are indications that the German language shows a general trend which can nevertheless take different forms in different text classes.

Readability

When determining how difficult a text is for the reader, readability plays an important role. Readability refers to the linguistic (grammatical and lexical) properties of a text and is one part of what makes a text comprehensible. For a long time, research has addressed the question of whether the readability of a text can be measured. A wide range of readability indices have been developed, and word length very often enters them as an essential component. Best (2006) develops an argument for why criteria as simple as word and sentence length can be valid indicators of a text's readability.
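To illustrate how word length enters such an index, the following sketch implements the Flesch Reading Ease formula. This index was developed for English and is not named in the text above; it is used here only as a well-known example. The function name and the sample counts are assumptions made for this illustration.

```python
def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Flesch Reading Ease for English text; higher scores mean easier text.

    The score falls as mean sentence length (words per sentence) and
    mean word length (syllables per word) rise.
    """
    mean_sentence_length = n_words / n_sentences
    mean_word_length = n_syllables / n_words
    return 206.835 - 1.015 * mean_sentence_length - 84.6 * mean_word_length

# Hypothetical counts: the same 100 words in 5 sentences, with shorter vs. longer words
print(flesch_reading_ease(100, 5, 140))   # 1.4 syllables per word
print(flesch_reading_ease(100, 5, 200))   # 2.0 syllables per word
```

With sentence length held constant, raising the mean word length from 1.4 to 2.0 syllables lowers the score by roughly 50 points, which shows how heavily word length is weighted in this particular index.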

Consequences for language practice have been drawn from the findings of readability research. Wolf Schneider, for example, gives advice on avoiding unnecessarily long words. A numerical characterization of texts as “very easy”, “easy”, “simple”, “normal”, “demanding”, “difficult” or “very difficult” was developed by Mihm; an overview for German texts with a comparison to English can be found in Groeben.

Stylistic aspects

Word length has a number of stylistic aspects: linguistic, literary and psychological.

One aspect concerns the choice of proper names. Jean Paul remarks that he "baptized insignificant people monosyllabically: Wutz, Stuss", which distinguishes them from the "bad or apparently unimportant". Sigmund Freud explains: “It is well known that with monosyllabic family names there is a particular tendency to include the first name.” Wilfried Seibicke shows a clear tendency to give girls a longer (one-part) first name than boys. The tendency to give more than one first name, which is around 10% higher for girls, points in the same direction.

Wilhelm Fucks, who advocates quantitative literary studies, regards word and sentence lengths as numerically measurable style characteristics that can be used to distinguish the styles of groups of authors.

Literature

  • Karl-Heinz Best: Quantitative Linguistics. An approximation. 3rd, heavily revised and supplemented edition. Peust & Gutschmidt, Göttingen 2006, ISBN 3-933043-17-4. Pages 129–132 of the book contain a brief overview of the relationships between word lengths and other linguistic quantities.

References

  1. Rainer Perkuhn: The longest German word? A fictional conversation with a real background. In: Sprachreport, Volume 26, Issue 2, Mannheim 2010, pp. 2–6. http://pub.ids-mannheim.de/laufend/sprachreport/pdf/sr10-2a.pdf
  2. Jean Paul: About the German double words; a grammatical examination in twelve old letters and twelve new postscripts. In: Jean Paul: Complete Works. Division II, Volume 3, edited by Norbert Miller. Zweitausendeins, Frankfurt 1996 (reprint of the Hanser-Verlag edition of 1963), pp. 9–108, example p. 67.
  3. Solutions to earlier prize questions: word monsters (prize question in issue 1/2008). Gesellschaft für deutsche Sprache (GfdS), archived from the original on March 30, 2015; retrieved April 28, 2016.
  4. http://dipbt.bundestag.de/extrakt/ba/WP16/93/9377.html
  5. Steffen Trumpf, dpa: Decision in the Schwerin state parliament: Germany's longest word has had its day. Spiegel Online, June 3, 2013, accessed June 3, 2013.
  6. The longest words in the Duden corpus. Duden, accessed April 28, 2016.
  7. What is missing. May 29, 1997, die tageszeitung, archive, taz.de. Accessed April 28, 2016.
  8. In Shakespeare's "Richard III." Rudolf K. Rath breaks his word on October 23, 2002, Neue Zürcher Zeitung, accessed April 28, 2016.
  9. Rainer Perkuhn: The longest German word? A fictional conversation with a real background. In: Sprachreport, Volume 26, Issue 2, Mannheim 2010, pp. 2–6, example p. 6. http://pub.ids-mannheim.de/laufend/sprachreport/pdf/sr10-2a.pdf
  10. Further examples in: Karl-Heinz Best: Our vocabulary. Language statistical studies. In: Karin M. Eichhoff-Cyrus, Rudolf Hoberg (eds.): The German language at the turn of the millennium. Dudenverlag, Mannheim / Leipzig / Vienna / Zurich 2000, pp. 35–52, example p. 42. ISBN 3-411-70601-5.
  11. Rote Liste online (online.rote-liste.de), accessed on March 27, 2015.
  12. Wilhelm Viëtor: German pronunciation dictionary. 3rd revised edition, edited by Ernst A. Meyer. O. R. Reisland, Leipzig 1921.
  13. Paul Menzerath: The architecture of the German vocabulary. Dümmler, Bonn 1954.
  14. Best 2006, p. 42.
  15. George Kingsley Zipf: The Psycho-Biology of Language. An Introduction to Dynamic Philology. The MIT Press, Cambridge, Massachusetts 1968, p. 23. First printed in 1935. Zipf also mentions that Kaeding corrected the total number of words to 10,910,777 without giving the distribution over the different word lengths. The above calculation has been slightly corrected and supplemented. The same data as in Zipf can be found in: David Crystal: The Cambridge Encyclopedia of Language. Campus, Frankfurt / New York 1993, p. 87. ISBN 3-593-34824-1.
  16. Wilhelm Fucks: According to all the rules of art. Diagnoses about literature, music, visual arts - the works, their authors and creators. Deutsche Verlags-Anstalt, Stuttgart 1968, p. 80.
  17. Karl-Heinz Best: Word lengths in German. In: Göttinger Beiträge zur Sprachwissenschaft 13, 2006, pp. 23–49; only the observed values of the word lengths are given here. All the data compiled in the table are based on texts from the 20th century.
  18. Anikó Vettermann, Karl-Heinz Best: Word lengths in Finnish. In: Suomalais-ugrilaisen seuran aikakauskirja / Journal de la Société Finno-Ougrienne 87, 1997, pp. 249–262.
  19. Winfred Röttger: The Distribution of Word Length in Ciceronian Letters. In: Journal of Quantitative Linguistics 3, 1996, pp. 68–72; Andrew Wilson: Word Length Distributions in Classical Latin Verse. In: The Prague Bulletin of Mathematical Linguistics 75, 2001, pp. 69–84.
  20. An overview of studies on German and other languages is given in the preface to: Karl-Heinz Best (Ed.): Frequency distributions in texts. Peust & Gutschmidt Verlag, Göttingen 2001, pp. V–XVII, especially pp. VIII–XI. ISBN 3-933043-08-5; and in Karl-Heinz Best: Word length. In: Reinhard Köhler, Gabriel Altmann, Rajmund G. Piotrowski (eds.): Quantitative Linguistik - Quantitative Linguistics. An International Handbook. de Gruyter, Berlin / New York 2005, pp. 260–273. ISBN 3-11-015578-8.
  21. See Linguistic synergetics, section “An elementary concept”.
  22. Best 2006, p. 129.
  23. Rainer Gerlach: To review Menzerath's law. In: Werner Lehfeldt, Udo Strauss (eds.): Glottometrika 4. Brockmeyer, Bochum 1982, pp. 95–102. ISBN 3-88339-250-2.
  24. Laila Asleh, Karl-Heinz Best: To review the Menzerath-Altmann law using the example of German (and Italian) words. In: Göttinger Beiträge zur Sprachwissenschaft 10/11, 2004/05, pp. 9–19.
  25. Christopher Michels: The relationship between word length and compounding activity in English. In: Glottometrics 32, 2015, pp. 88–98 (PDF full text).
  26. Eduard Sievers: Fundamentals of phonetic physiology as an introduction to the study of phonetics in the Indo-European languages. Breitkopf & Härtel, Leipzig 1876. The decisive quotation can be found on p. 122. In a more general form, supported by measurements on Spanish, this law was formulated in: Paul Menzerath, Joseph M. de Oleza: Spanish sound duration. An experimental study. de Gruyter, Berlin / Leipzig 1928, p. 70.
  27. Karl-Heinz Best: Laws of sound duration. In: Glottotheory 1, 2008, pp. 1–9; especially pp. 5–7.
  28. Lu Wang: Word length in Chinese. In: Reinhard Köhler, Gabriel Altmann (eds.): Issues in Quantitative Linguistics 3. Dedicated to Karl-Heinz Best on the occasion of his 70th birthday. RAM-Verlag, Lüdenscheid 2013, pp. 39–53. ISBN 978-3-942303-12-5.
  29. Tatsuo Miyayima: Relationships in the Length, Age and Frequency of Classical Japanese Words. In: Burghard Rieger (Ed.): Glottometrika 13. Brockmeyer, Bochum 1992, pp. 219–229, quotation: p. 228. ISBN 3-8196-0036-1.
  30. Haruko Sanada-Yogo: Analysis of Japanese Vocabulary by the Theory of Synergetic Linguistics. In: Journal of Quantitative Linguistics 6, No. 3, pp. 239–251, especially pp. 244, 247f.
  31. Haruko Sanada-Yogo: Analysis of Japanese Vocabulary by the Theory of Synergetic Linguistics. In: Journal of Quantitative Linguistics 6, No. 3, pp. 239–251, especially pp. 244, 247f.
  32. Joseph H. Greenberg: A quantitative approach to the morphological typology of languages. In: International Journal of American Linguistics, Volume 26, 1960, pp. 178–194, “synthetic index” p. 185.
  33. Gabriel Altmann, Werner Lehfeldt: Allgemeine Sprachtypologie. Fink, Munich 1973. ISBN 3-7705-0891-2. “Synthetism” or “Analytism” p. 39.
  34. Gabriel Altmann, Werner Lehfeldt: Allgemeine Sprachtypologie. Fink, Munich 1973, p. 41.
  35. Gabriel Altmann, Werner Lehfeldt: Allgemeine Sprachtypologie. Fink, Munich 1973, pp. 44f.
  36. Wilhelm Fucks: According to all the rules of art. Deutsche Verlags-Anstalt, Stuttgart 1968, p. 91.
  37. Karl-Heinz Best: Laws of first language acquisition. In: Glottometrics 12, 2006, pp. 39–54, especially pp. 43f. (PDF full text).
  38. Karl-Heinz Best: Word lengths in German. In: Göttinger Beiträge zur Sprachwissenschaft 13, 2006, pp. 23–49, table on p. 31.
  39. Gabriel Altmann: The Piotrowski law and its generalizations. In: Karl-Heinz Best, Jörg Kohlhase (eds.): Exact language change research. Theoretical contributions, statistical analyses and work reports (= Göttinger Schriften zur Sprach- und Literaturwissenschaft. Vol. 2). edition herodot, Göttingen 1983, ISBN 3-88694-024-1, pp. 54–90, on reversible language change: pp. 78ff.
  40. Karl-Heinz Best: Word lengths in German. In: Göttinger Beiträge zur Sprachwissenschaft 13, 2006, pp. 23–49, poems: pp. 26f.
  41. Karl-Heinz Best: Word lengths in German. In: Göttinger Beiträge zur Sprachwissenschaft 13, 2006, pp. 23–49, letters: p. 33.
  42. Norbert Groeben: Reader Psychology: Text Comprehension, Text Comprehensibility. Aschendorff Verlag, Münster 2002, pp. 175–183. ISBN 3-402-04298-3.
  43. Karl-Heinz Best: Are word and sentence length useful criteria for the readability of texts? In: Sigurd Wichter, Albert Busch (eds.): Knowledge Transfer - Success Control and Feedback from Practice. Lang, Frankfurt/M. et al. 2006, pp. 21–31. ISBN 3-631-53671-2.
  44. Wolf Schneider: German for life. What the school forgot to teach. Rowohlt, Reinbek bei Hamburg 2004, pp. 40–45. ISBN 3-499-19695-6.
  45. A. Mihm: Statistical language criteria for the suitability of reading books. In: Linguistik und Didaktik 4, 1973, pp. 117–127.
  46. Norbert Groeben: Reader Psychology: Text Comprehension, Text Comprehensibility. Aschendorff Verlag, Münster 2002, p. 179.
  47. Jean Paul: Preschool of Aesthetics. In: Jean Paul: Complete Works. Division I, Volume 5. Zweitausendeins, Frankfurt 1996, p. 270 (reprint of the Hanser edition; original: 2nd edition 1813). ISBN 3-86150-152-X.
  48. Sigmund Freud: On the psychopathology of everyday life. Fischer, Frankfurt 1992, p. 37 (first printed in 1904). ISBN 3-596-26079-5.
  49. Wilfried Seibicke: The personal names in German. de Gruyter, Berlin / New York 1982, p. 105. ISBN 3-11-007984-4.
  50. Konrad Kunze: dtv-Atlas Namenkunde. First and last names in the German-speaking area. 5th, revised and corrected edition. Deutscher Taschenbuch Verlag, Munich 1998, p. 49. ISBN 3-423-03266-9.
  51. Wilhelm Fucks: According to all the rules of art. Deutsche Verlags-Anstalt, Stuttgart 1968, p. 33.