# Vocabulary

Vocabulary (also: lexicon) is the set of all words. This can mean:

• the entirety of all words in a language at a given point in time or
• the entirety of all words in a language that a single speaker knows or uses.

Within the second meaning, a further distinction is made:

• receptive vocabulary (or passive vocabulary) - the words that the speaker knows or recognizes. The receptive vocabulary helps in understanding spoken and written texts (comprehension vocabulary). The speaker can retrieve from memory the meaning of a word heard or read, or deduce it, for example, by means of word-formation rules.
• productive vocabulary (or active vocabulary) - the words that the speaker actively uses. The productive vocabulary enables speakers to express themselves clearly. The speaker can retrieve from memory the appropriate word for a given meaning.

## Word inventory of languages

### German vocabulary

The vocabulary of the German standard language comprises approximately 75,000 words; the total size of the German vocabulary is estimated at 300,000 to 500,000 words or lexemes, depending on the source and counting method. The Duden German Universal Dictionary states that the vocabulary of everyday language is estimated at around 500,000 words and the core vocabulary at around 70,000 words. The German Dictionary by Jacob and Wilhelm Grimm (1852–1960) is estimated to have around 350,000 headwords; Wahrig (2008) states in the reprinted foreword to the new 2006 edition that this one-volume dictionary contains over 260,000 headwords. Such figures indicate the minimum size that must be assumed for the German vocabulary.

However, these dictionaries contain only a small proportion of the many specialist vocabularies and are also incomplete in that derivatives and compounds are only partially included and recent coinages are naturally missing. A decisive criterion for the inclusion of words is their frequency of use and currency; words that are composed of simple words and can be understood from their components alone are excluded.

It is therefore clear that the vocabulary must be considerably larger overall; the figure of 500,000 words is hardly exaggerated. If technical terminology is added, several million words can be expected. According to Winter (1986), the technical language of chemistry alone contains around 20 million terms. Against this background, Lewandowski's remark that "the total number of words in German is estimated at 5 to 10 million words" seems, if anything, too low. In a text corpus of 20th-century German comprising 1 billion words, "just under 5 million lexemes (...)" were observed. Since this corpus contains scientific texts but little subject-specific terminology, this corpus-based value clearly underestimates the actual size of the vocabulary; by how much is unclear. Wolfgang Klein, director of the Max Planck Institute for Psycholinguistics and head of the “Digital Dictionary of the German Language”, estimates the German vocabulary at 5.3 million words.

### Vocabulary in other languages

Wolff (1969: 48) explains: "More recent estimates give a figure of 500,000 to 600,000 words for the English vocabulary; German is just below that, French around 300,000 words." One must not conclude from this that French is a word-poor language. The difference is largely due to the different type of word formation: the German word "Kartoffelbrei" / "Erdäpfelpüree" (one new word) corresponds in French to purée de pommes de terre (a phrase consisting of five words).

Twentieth-century dictionaries of the Estonian literary language list around 120,000 words.

### Vocabulary and word forms

The number of words (vocabulary) should not be confused with the number of word forms. In inflecting languages, inflection can produce several times as many word forms from the base forms of many words; in German, for example, considerably more than in English, which is gradually losing its inflection.

The frequency distribution of words and word forms can be described with Zipf's law.
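In its simplest form, Zipf's law says that the frequency of a word form is roughly inversely proportional to its frequency rank, so that the product of rank and frequency stays roughly constant across the list. The following is a minimal sketch for inspecting this in a text. The function name `rank_frequency` and the simple regex tokenizer are assumptions made for illustration. The sketch counts word forms, not lexemes, and a real study would need a large corpus and a proper tokenizer.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Rank word forms by frequency.

    Returns a list of (rank, word_form, frequency, rank * frequency) tuples,
    most frequent first. Ties are broken alphabetically.
    """
    # Assumption: a token is any run of letters/digits, lower-cased.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    # Toy text only; it is far too short to exhibit the law itself.
    sample = "the cat and the dog and the bird"
    for row in rank_frequency(sample):
        print(row)
```

On a sufficiently large corpus, the last column should vary much less than the frequency column if the text follows Zipf's law approximately.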

### Vocabulary expansion and loss

The vocabulary of a language is not a static quantity; rather, it is constantly changing. On the one hand, names are lost for objects that are gradually falling out of use. The term "slide rule", for instance, will probably disappear from everyday language over time, as the function of the device it names has been taken over by pocket calculators and computers. Often objects are also renamed at the expense of the old name, as happened when "electronic brain" was replaced by "computer". On the other hand, new objects constantly have to be named, which is done by means of word formation or the adoption of foreign words. These processes of word loss and gain follow a law of language, Piotrowski's law.

## Composition of vocabulary

Only a small part of the vocabulary found in any dictionary consists of simple words that cannot be broken down further, such as "Bach" (brook), "Hut" (hat) or "Sand" (sand); many others are derivatives such as "sand-ig" (sandy) or compounds such as the German words for "brook course" or "hat brim". The question that arises is whether there is an elementary set of units from which words are built. Three types of unit have to be distinguished here: on the one hand phonetic (sound) units, namely syllables; on the other hand morphs / morphemes, that is, all components of words that have a grammatical function or a meaning; and finally elementary words like the three mentioned, which are at once a syllable, a morph / morpheme and a word.

So how many such units can be expected? A first approximation can be given: Karl Bühler points out that around 2,000 "meaning syllables" were found in a dictionary of 30,000 headwords. It is not entirely clear whether the term "meaning syllable" covers only carriers of meaning or also carriers of a grammatical function. Bühler also mentions that he found 1,200 "syllables" on 30 pages of Goethe's Elective Affinities and reckons with around 4,000 syllables for the entire novel.

Figures from Menzerath point in roughly the same direction: he analyzed a German pronunciation dictionary with 20,453 headwords and found 2,245 monosyllabic words, each of which is at once a syllable and a morph / morpheme.

A further indication can be found in Klein, who explains that word families in German are based on 8,000–9,000 word stems.

One can conclude from this that the German vocabulary can be traced back to a few thousand elementary units.

## Vocabulary of individuals and texts

### Orders of magnitude

Estimates of the vocabulary of individuals vary widely. One reference point is the figure for Goethe's active vocabulary, i.e. the vocabulary attested in his works, which the 3rd volume of the Goethe dictionary puts at around 91,000 words. Since only a few people have produced such a rich body of work, this figure should indicate at least the order of magnitude of the upper limit. It does not yet take passive vocabulary into account, however: Goethe will, after all, have known some words that do not appear in his works.

Roughly speaking, the higher a person's level of education, the larger their vocabulary. A larger vocabulary allows a more differentiated exchange of information. An ordinary person gets by with a few thousand words in everyday life. A person's vocabulary depends on their areas of interest and professional field (technical terminology) as well as on their socialization.

An educated person, such as a scholar or writer, can use tens of thousands of words (active vocabulary) and understand many more on encountering them (passive vocabulary). Relevant studies put the vocabulary of 15-year-olds at around 12,000 words already. Keller & Leuninger even ascribe around 80,000 words to a 17-year-old, "whereby variants such as write, writes, wrote, written, etc. count as only 'one' word." Estimates of the vocabulary size of adult native speakers range from 3,000 to 216,000 words. Vocabulary tests are used to assess differences between individuals, for example sub-tests from intelligence test batteries to record productive vocabulary, or tests such as the Peabody Picture Vocabulary Test to record receptive vocabulary. However, these tests do not show the total number of words available to a person; rather, they reveal differences in performance between people.

Some data on the vocabulary of individual texts or groups of texts in German can be found in Billmeier. According to these, a reader in 1964 had to master over 4,000 words (in the sense of lexemes, i.e. dictionary headwords) to read even an excerpt from the January and February issues of the newspaper Die Welt, one of the less demanding readings in this respect. Erwin Strittmatter's novel Ole Bienkopp requires knowledge of over 18,000 lexemes.

Anyone who has mastered around 1,000 everyday words in a foreign language, that is, can use them correctly in grammatical and semantic terms, will get along well in the country concerned.

### Methodological problems in the measurement

The sometimes very marked differences between the estimates can be explained by the use of different methods. Determining the size of a vocabulary is thus primarily a methodological problem. In principle, two different methods are available for estimation: a qualitative and a quantitative one. The qualitative method examines the kinds of words found, while the quantitative method determines the type-token ratio, i.e. it relates the number of different words (types) to the total number of running words (tokens).

A methodological problem arises when texts of different lengths are to be compared for vocabulary richness ("text" here can also include vocabulary tests designed to measure a subject's vocabulary). The Guiraud index is a commonly used measure of vocabulary richness. It is calculated as follows:

$G = \frac{\text{number of types}}{\sqrt{\text{number of tokens}}}$

The aim of the index is to allow statements about the vocabulary richness of texts of different lengths. If, for example, the results of previously administered C-tests are used as the "text", Guiraud's index can serve as a measure of an individual's vocabulary range. How the index behaves can be seen by assuming that the number of types equals the number of tokens, that is, that no word is repeated in the text. With X denoting this common number, Guiraud's index becomes:

$G = \frac{X}{\sqrt{X}} = \sqrt{X}$

The G value for shorter texts is therefore not only dependent on the richness of the vocabulary, but also on the length of the text.
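The length dependence can be checked with a small computation. The following is a minimal sketch, assuming that a "type" is a distinct lower-cased word form and a "token" is any running word. The function names and the regex tokenizer are illustrative choices, not part of Guiraud's definition.

```python
import math
import re


def tokenize(text):
    """Split a text into lower-cased word-form tokens (simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct types divided by the number of tokens."""
    if not tokens:
        raise ValueError("at least one token is required")
    return len(set(tokens)) / len(tokens)


def guiraud_index(tokens):
    """Guiraud's index G = types / sqrt(tokens)."""
    if not tokens:
        raise ValueError("at least one token is required")
    return len(set(tokens)) / math.sqrt(len(tokens))


if __name__ == "__main__":
    # Two artificial repetition-free "texts" of different length:
    short_text = [f"w{i}" for i in range(4)]
    long_text = [f"w{i}" for i in range(16)]
    print(type_token_ratio(short_text), guiraud_index(short_text))
    print(type_token_ratio(long_text), guiraud_index(long_text))
```

Both artificial texts have a type-token ratio of 1.0, since no word is repeated, yet G is 2.0 for the text of 4 tokens and 4.0 for the text of 16 tokens, in line with G = √X above.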

## Vocabulary acquisition

### Acquisition of mother tongue vocabulary

We humans first learn our mother tongue through imitation; we imitate the language and pronunciation of our social milieu. This is the case not only in childhood but also in adulthood. This means that our vocabulary, like our pronunciation and our dialect, if we have one, adapts to our social milieu.

However, imitation cannot be solely responsible for language acquisition, since children regularly go through a phase of over-generalization at around three years of age, in which they produce verb forms such as "gehte" or "gangte" (incorrect past forms of gehen, "to go", comparable to English "goed" for "went"), i.e. they form analogies.

As far as can be seen so far, language acquisition processes are law-governed and evidently follow the language acquisition law, as has been shown several times.

### Basic vocabulary

The so-called basic vocabulary is of particular importance for planning lessons in the mother tongue as well as for learning foreign languages; this is the vocabulary that is necessary to understand approx. 85% of the texts in a language. Pfeffer puts this basic vocabulary at around 1,285 words.
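How such a coverage figure can be derived from a corpus is sketched below, under the assumption that "understanding 85% of a text" is approximated by covering 85% of its running words (tokens) with the most frequent word forms. This is only one possible reading and not necessarily Pfeffer's method; the function name `words_needed_for_coverage` is hypothetical.

```python
from collections import Counter


def words_needed_for_coverage(tokens, target=0.85):
    """Return how many of the most frequent word forms are needed so that
    together they account for at least `target` of all tokens."""
    if not tokens:
        raise ValueError("at least one token is required")
    if not 0 < target <= 1:
        raise ValueError("target must be in the interval (0, 1]")
    counts = Counter(tokens)
    total = len(tokens)
    covered = 0
    # Walk through the word forms from most to least frequent.
    for needed, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        if covered / total >= target:
            return needed
    return len(counts)


if __name__ == "__main__":
    # Toy corpus: 6 x "the", 3 x "cat", 1 x "sat"
    corpus = ["the"] * 6 + ["cat"] * 3 + ["sat"]
    print(words_needed_for_coverage(corpus, target=0.85))
```

In the toy corpus, "the" alone covers 60% of the tokens and "the" plus "cat" cover 90%, so two word forms suffice for the 85% target.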

## Literature

• Karl-Heinz Best: Our vocabulary. Language statistical studies. In: Rudolf Hoberg, Karin Eichhoff-Cyrus (eds.): The German language at the turn of the millennium. Language culture or language decline? Dudenverlag, Mannheim / Leipzig / Vienna / Zurich 2000, ISBN 3-411-70601-5, pp. 35–52.
• Karl-Heinz Best: Quantitative Linguistics. An approximation. 3rd, heavily revised and expanded edition. Peust & Gutschmidt, Göttingen 2006, ISBN 3-933043-17-4 (especially the sections "How many words does German have?" and "The vocabulary of the individual", pp. 13–21).
• Duden. The large dictionary of the German language. 10 volumes. Dudenverlag, Mannheim / Leipzig / Vienna / Zurich 1999, ISBN 3-411-04743-7 (Volume 1).
• Ulrike Haß-Zumkehr: German dictionaries. De Gruyter, Berlin / New York 2001, ISBN 3-11-014885-4 (especially chapter 17: "How many words does the German language have?", pp. 381–385).
• Wolfgang Klein: About the wealth and poverty of the German vocabulary. In: Wealth and poverty of the German language. First report on the state of the German language. Published by the German Academy for Language and Poetry and the Union of German Academies of Sciences. De Gruyter, Berlin / Boston, MA 2013, ISBN 978-3-11-033462-3, pp. 15–55.
• Elisabeth Knipf-Komlósi, Roberta Rada, Bernáth Csilla: Aspects of the German vocabulary. Bölcsész Konzorcium, Budapest 2006, ISBN 963-9704-33-4.
• Birgit Wolf: Language in the GDR - A dictionary. De Gruyter, Berlin / New York, NY 2000, ISBN 978-3-11-016427-5.
• Dieter Wolff: Statistical studies on the vocabulary of English newspapers. Dissertation, Saarland University, Saarbrücken 1969.
• Friedrich Wolff, Otto Wittstock: Latin and Greek in the German vocabulary. VMA, Wiesbaden 1999, ISBN 3-928127-63-2.

