The basic vocabulary (also: basic word stock, vocabulary of use, minimum vocabulary) can be defined as the set of words of a language that is needed to understand about 85% of any text in that language at a particular stage of its development. It is followed by the so-called advanced vocabulary, which is needed to cope with larger proportions of texts and can be designed differently depending on requirements.
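
The coverage-based definition can be made operational. Given word frequencies from a corpus, one counts how many of the most frequent words are needed before their tokens reach a chosen share (here 85%) of all running words. The following is a minimal Python sketch. It assumes whitespace-tokenized text and counts word forms rather than lemmas. The function name `words_needed_for_coverage` and the toy sentence are illustrative only.

```python
from collections import Counter


def words_needed_for_coverage(counts: Counter, threshold: float = 0.85) -> int:
    """Smallest number of most frequent word types whose tokens make up
    at least `threshold` of all running tokens."""
    total = sum(counts.values())
    covered = 0
    # most_common() yields (word, count) pairs sorted by descending count
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= threshold:
            return n
    return len(counts)


# Hypothetical toy corpus, for illustration only
tokens = "the cat sat on the mat and the dog sat on the rug".split()
print(words_needed_for_coverage(Counter(tokens), 0.85))
```

On a real corpus the same loop runs over tens of thousands of types. The threshold is a parameter because the 85% in the definition is a convention, not a fixed constant.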

Basic German vocabulary

As in all other natural languages, a comparatively small number of German words yields a high degree of text coverage. Alan Pfeffer identified 1,285 words of present-day German that make between 85.9% and 92.2% of a text understandable, depending on the text type. Theodor Lewandowski states that the 1,000 most frequent words suffice to understand about 80% of German texts. With 2,000 words about 90% can be read, and with 4,000 words about 95%. The total vocabulary is difficult to quantify, but it is many times larger than these figures.
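
Coverage figures such as these measure the share of running words (tokens) in a text that belong to a given word list. The minimal Python sketch below assumes a lower-cased, pre-tokenized text. The word list and sample sentence are hypothetical.

```python
from __future__ import annotations


def text_coverage(tokens: list[str], vocabulary: set[str]) -> float:
    """Share of running tokens in `tokens` that belong to `vocabulary`."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in vocabulary)
    return known / len(tokens)


# Hypothetical mini word list and sample sentence (lower-cased, pre-tokenized)
basic_words = {"der", "die", "das", "und", "ist", "ein", "in"}
sample = "der hund und die katze sind in dem garten".split()
print(f"{text_coverage(sample, basic_words):.1%}")
```

The sketch matches surface forms, so "dem" (a form of der) and "sind" (a form of sein) count as unknown. A lemma-based word list only works as intended if each word form is first mapped to its lemma.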

Methodologically, the basic vocabulary is determined with the help of frequency lists, among other tools. For this purpose, corpora are compiled that are meant to reflect the proportions of different text types, genres, topics, etc. in actual language use as faithfully as possible. A central problem here is striking the right balance between written and spoken language.
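
One simple way to balance sub-corpora of different sizes is sketched below. It normalizes each sub-corpus to occurrences per million tokens and then combines them with weights that express the intended share of, say, spoken and written language. This is an illustrative Python sketch, not the procedure of any particular study. The sub-corpora, the 50/50 weights and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def per_million(counts: Counter) -> dict[str, float]:
    """Normalize raw token counts to occurrences per million tokens."""
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def balanced_frequencies(
    subcorpora: dict[str, list[str]], weights: dict[str, float]
) -> dict[str, float]:
    """Combine per-million frequencies of several sub-corpora.

    `weights` gives the share each sub-corpus (e.g. spoken vs. written)
    should have in the combined list; they are assumed to sum to 1.
    The result is sorted from most to least frequent word.
    """
    combined: dict[str, float] = {}
    for name, tokens in subcorpora.items():
        for word, freq in per_million(Counter(tokens)).items():
            combined[word] = combined.get(word, 0.0) + weights[name] * freq
    return dict(sorted(combined.items(), key=lambda item: item[1], reverse=True))


# Hypothetical toy sub-corpora and weights, for illustration only
subcorpora = {
    "spoken": "ich und du und ich".split(),
    "written": "der text und der satz".split(),
}
frequency_list = balanced_frequencies(subcorpora, {"spoken": 0.5, "written": 0.5})
for word, freq in list(frequency_list.items())[:3]:
    print(f"{word:<6} {freq:>10,.0f} per million")
```

Normalizing before weighting keeps a large written sub-corpus from swamping a smaller spoken one. Choosing the weights is exactly the balancing problem described above.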

For German, the most recent large-scale frequency study of this kind was carried out by Randall L. Jones and Erwin Tschirner and published in 2006. Its core vocabulary comprises 4,034 lemmas. At the top, der, die, das (combined at rank 1) occur 115,983 times per million words. The end of the list is formed by a group of almost a hundred words, including rubbish, saying and undoubtedly, each with a frequency of 16 per million words. This striking gap between the high- and low-frequency ranges also explains the high text coverage described above. The distribution can be described by Zipf's law.
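
Zipf's law relates a word's frequency f to its rank r. The block below assumes the simplest form, with exponent s = 1. That value is an idealization, not one fitted to the Jones/Tschirner list. Under this assumption, the share of running text covered by the N most frequent of V word types is a ratio of harmonic numbers, which grows only logarithmically with N:

```latex
f(r) = \frac{C}{r^{s}}, \qquad s \approx 1

\operatorname{cov}(N) = \frac{\sum_{r=1}^{N} f(r)}{\sum_{r=1}^{V} f(r)}
  = \frac{H_N}{H_V} \quad (s = 1),
\qquad H_N = \sum_{r=1}^{N} \frac{1}{r} \approx \ln N + \gamma
```

Here γ ≈ 0.5772 is the Euler-Mascheroni constant. Under this idealization, each doubling of N adds roughly the same amount of coverage, about ln 2 / H_V. The first few hundred words therefore contribute far more than the long tail. Real frequency lists deviate from the ideal curve, so the exponent has to be fitted to the data.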

Basic vocabulary research explains this observation mainly by the predominance of function words in the high-frequency range. These words carry no lexical (autosemantic) meaning of their own and therefore occur in every type of spoken and written text. As a result, the ten most frequent lemmas in Jones/Tschirner alone already yield a text coverage of almost 28%.

Lemma (English gloss)   Per million words    Share
der, die, das (the)               115,983   11.60%
und (and)                          28,445    2.84%
sein (to be)                       24,513    2.45%
in (in)                            23,930    2.39%
ein (a)                            23,608    2.36%
zu (to)                            14,615    1.46%
haben (to have)                    13,423    1.34%
ich (I)                            11,201    1.12%
werden (to become)                 11,016    1.10%
sie (she, they)                    10,245    1.02%
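
The "almost 28%" follows directly from the per-million counts for the ten most frequent lemmas. They add up to 276,979 occurrences per million words, or about 27.7% of running text. The short Python sketch below accumulates them rank by rank. Lemmas are given in their German citation form, and the counts are copied from the Jones/Tschirner figures cited in this section.

```python
# Occurrences per million words for the ten most frequent lemmas (Jones/Tschirner)
top_ten = {
    "der, die, das": 115_983,
    "und": 28_445,
    "sein": 24_513,
    "in": 23_930,
    "ein": 23_608,
    "zu": 14_615,
    "haben": 13_423,
    "ich": 11_201,
    "werden": 11_016,
    "sie": 10_245,
}

running_total = 0
for rank, (lemma, per_million) in enumerate(top_ten.items(), start=1):
    running_total += per_million
    # Cumulative share of running words covered by ranks 1..rank
    print(f"{rank:>2}  {lemma:<15} {running_total / 1_000_000:6.2%}")

coverage = running_total / 1_000_000
```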

In this context, "text coverage" must not be equated with "text comprehension". The boundaries between an individual's basic and general vocabulary are fluid, as are those between basic and advanced vocabulary in foreign-language teaching. Nevertheless, content words (autosemantics) such as nouns also show a certain stability, especially in the high-frequency range. A comparative frequency study from 2013, which evaluated texts from the periods 1905–1914, 1948–1957 and 1995–2004, revealed shifts within various word classes. Of the twenty most frequent nouns, thirteen occurred in every period: year, lord, time, woman, person, day, life, man, child, eye, world, question and part.

Importance of basic vocabulary

Research into basic vocabulary is important for teaching both the mother tongue and foreign languages. It shows which parts of the vocabulary are most needed to achieve the best possible text comprehension with as little learning effort as possible. The problem, however, is that the most frequent words also tend to have many different meanings.

Basic vocabulary in various communication areas

A basic vocabulary can also be determined separately for the individual communication domains of a language community, for example for particular specialist fields or sociolects. Such approaches are useful for teaching particular technical languages and for questions in the sociology of language. For technical languages, the figures are of a similar order to those given above for the standard language: the 1,100–1,200 most frequent words allow one to understand "on average 80–90% of every text".

The basic vocabulary plays a particularly important role in discussions of methodological concepts for spelling instruction. Here, however, the limits of the approach also become clear. Various studies have shown that only 100–300 (function) words stand out as particularly frequent. Beyond that, the frequency curve for content words flattens out considerably; that is, the 700th word is only slightly more frequent than the 900th. In addition, a word's share of a text corpus says nothing about its range of use, for example when a few writers use a word very often.
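
The difference between frequency and range can be made concrete by counting two things for each word: its total number of occurrences, and the number of different writers who use it at all. The Python sketch below does this. The writers and texts are hypothetical toy data.

```python
from __future__ import annotations

from collections import Counter


def frequency_and_range(texts_by_writer: dict[str, list[str]]) -> dict[str, tuple[int, int]]:
    """Map each word to (total frequency, number of writers who use it)."""
    frequency: Counter = Counter()
    writer_range: Counter = Counter()
    for tokens in texts_by_writer.values():
        frequency.update(tokens)
        writer_range.update(set(tokens))  # each writer counts at most once per word
    return {word: (frequency[word], writer_range[word]) for word in frequency}


# Hypothetical toy data: one writer repeats "dinosaur" many times
texts = {
    "writer_a": ["dinosaur"] * 6 + ["the", "is"],
    "writer_b": ["the", "house", "is", "big"],
    "writer_c": ["the", "dog", "is", "small"],
}
stats = frequency_and_range(texts)
print(stats["dinosaur"], stats["the"])
```

Here "dinosaur" is more frequent than "the" in the pooled corpus, yet only one of the three writers uses it. A ranking by raw frequency alone would overrate it.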

It has therefore been suggested to provide only a common core vocabulary of around 250 words. This would be supplemented by 250 class words tied to the topics being taught and by 250 individually important words, which could equally well serve as model words for typical spelling patterns. Such interest-based spelling learning is also more effective. Moreover, figures such as “The 100 most frequent words cover two thirds of running text” are misleading. For many words, the difficulties lie in the forms derived from them (compare, for example, “saw” or “sees” with “see”). These forms often have to be learned separately, so an apparently small basic vocabulary of 750 words quickly turns into more than 1,000 word forms to be learned. Spelling instruction therefore has to cover far more ground, and practising the most frequent words can make only a limited contribution.
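
The arithmetic behind this proposal can be sketched in Python. The code merges a core list, a class list and an individual list, removes duplicates, and expands each lemma into the word forms a learner has to spell. All lists and the inflection table are hypothetical and tiny. Real lists would hold around 250 entries each.

```python
from __future__ import annotations

# Hypothetical word lists; real lists would hold about 250 entries each
core_words = ["see", "go", "house"]
class_words = ["planet", "house"]       # tied to the current class topic
individual_words = ["dinosaur"]         # chosen by one learner

# Hypothetical inflection table: lemma -> word forms a speller must master
forms = {
    "see": ["see", "sees", "saw", "seen", "seeing"],
    "go": ["go", "goes", "went", "gone", "going"],
    "house": ["house", "houses"],
    "planet": ["planet", "planets"],
    "dinosaur": ["dinosaur", "dinosaurs"],
}

# Combine the three lists, dropping duplicates but keeping their order
lemmas = list(dict.fromkeys(core_words + class_words + individual_words))
# Lemmas missing from the table fall back to their citation form alone
word_forms = {form for lemma in lemmas for form in forms.get(lemma, [lemma])}

print(len(lemmas), "lemmas ->", len(word_forms), "word forms")
```

Even in this toy case, 5 lemmas expand to 16 word forms. Irregular forms such as "saw" or "went" cannot be derived from a spelling rule and must be practised on their own.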

Word lists in glottochronology

For its questions in historical linguistics, glottochronology developed lists of basic words that were meant to be as independent of cultural influences as possible and therefore as historically stable as possible. The degree of relationship between languages was to be determined from the rates at which these words are lost and replaced. The approach can essentially be regarded as a failure.
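
The classic glottochronological calculation (after Swadesh and Lees) assumes that each item on the basic word list is retained with a constant probability r per millennium, independently in both daughter languages. If c is the share of list items on which two related languages still have cognates, the estimated time since their split is t = ln c / (2 ln r). The Python sketch below implements this formula. The default retention rate of 0.805 per millennium is the value commonly quoted for the 200-item list and is an assumption here. The constant-rate assumption is one of the premises that later research rejected.

```python
import math


def separation_time(cognate_share: float, retention_rate: float = 0.805) -> float:
    """Estimated millennia since two languages split (glottochronology).

    cognate_share: fraction c of basic-list items that are still cognate.
    retention_rate: assumed share r of items retained per millennium.
    """
    if not 0 < cognate_share <= 1:
        raise ValueError("cognate_share must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    # Both languages lose items independently, hence the factor 2
    return math.log(cognate_share) / (2 * math.log(retention_rate))


print(f"{separation_time(0.6):.2f} millennia")
```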

