Digital Dictionary of the German Language

from Wikipedia, the free encyclopedia

The Digital Dictionary of the German Language (Digitales Wörterbuch der deutschen Sprache, DWDS), also known as the word information system for the German language in past and present, is a project of the Berlin-Brandenburg Academy of Sciences and Humanities. Its aim is to create a digital dictionary system based on very large electronic text corpora.

It builds on the six-volume Dictionary of Contemporary German (Wörterbuch der deutschen Gegenwartssprache, WDG) and links it with the project's own text and dictionary resources. It provides users with the current spelling, pronunciation as audio files, and a wide range of information on the form, usage and meaning of its headwords.

Components

The current version of the DWDS word information system links four types of lexical information: the dictionary entries of the WDG; synonyms, hyponyms and hypernyms automatically extracted from the WDG; text examples from the DWDS core corpus; and statistical co-occurrence information from the core corpus (so-called collocations, which indicate how frequently words occur near one another).

Dictionary

The Dictionary of Contemporary German (WDG) was compiled in (East) Berlin at the German Academy of Sciences (renamed the Academy of Sciences of the GDR on October 7, 1972) between 1952 and 1977 under the direction of Ruth Klappenbach. The WDG comprises over 4,500 pages and contains 60,000 headwords, or 121,000 if compounds are included. From February 2002 to March 2004, the WDG was digitized, structured and prepared for research under the leadership of the Berlin-Brandenburg Academy of Sciences and Humanities. The text corpus was compiled and expanded between 2000 and 2003 with the support of the German Research Foundation (DFG) and has been available online as a reference resource since March 2003.

Text corpora

The DWDS text corpora are continuously being expanded. As of May 2018, they comprised 13 billion running words (tokens) and consist of two large sub-corpora: the core corpus and the supplementary corpus.

  • The DWDS core corpus comprises around 100 million words. It is spread evenly across the entire 20th century and balanced by text type, drawing on four types of text: fiction (28.42%), newspapers (27.36%), scientific and specialist texts (23.15%) and practical texts (21.05%). Because the transcribed spoken-language texts could not be balanced evenly across the century, they are available as a separate corpus among the special corpora. The DWDS core corpus is described as the first reference corpus of 20th-century German and as at least equal in quality to the British National Corpus (BNC), previously regarded as the standard.
  • The DWDS has concluded usage agreements for copyrighted texts with more than 20 publishers and numerous public and private rights holders. These agreements make it possible, for example, to search works by Thomas and Heinrich Mann, Martin Walser, Heinrich Böll, Jürgen Habermas and Victor Klemperer online.
  • The supplementary corpus comprises over 1.5 billion words in around 3.5 million documents. It prioritizes size and currency over balance and consists mainly of newspaper sources from 1980 to 2006. All sources are bibliographically documented, and care was taken over content and quality during preparation.

Paradigmatic relations

Over 65,000 synonyms, hypernyms and hyponyms were extracted from the WDG's definitions using automatic analysis programs. Besides serving as a synonym dictionary and thesaurus, this information means the WDG can not only be browsed electronically but also navigated "semantically". For example, the headword insect leads directly to its synonyms, and equally to all of its hyponyms, such as ant, flea, locust, beetle or water strider.
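To illustrate the kind of semantic navigation described above, the following minimal Python sketch stores synonym and hyponym relations as a small in-memory graph and returns every headword reachable in one step. It is a hypothetical toy model: the class `Thesaurus` and its methods are invented for illustration, the data are a handful of German headwords chosen for the example (not DWDS data), and the sketch says nothing about how the DWDS actually extracts or stores these relations.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical toy store of paradigmatic relations between headwords."""

    def __init__(self):
        self.synonyms = defaultdict(set)   # headword -> synonyms
        self.hyponyms = defaultdict(set)   # broader term -> narrower terms
        self.hypernyms = defaultdict(set)  # narrower term -> broader terms

    def add_synonym(self, a, b):
        # Synonymy is symmetric, so record both directions.
        self.synonyms[a].add(b)
        self.synonyms[b].add(a)

    def add_hyponym(self, broader, narrower):
        # Hyponymy and hypernymy are inverse relations; record both so that
        # navigation works downwards (to narrower terms) and upwards.
        self.hyponyms[broader].add(narrower)
        self.hypernyms[narrower].add(broader)

    def neighbours(self, headword):
        """Return all headwords one 'semantic' step away, sorted."""
        # .get avoids creating empty entries for unknown headwords.
        return {
            "synonyms": sorted(self.synonyms.get(headword, ())),
            "hypernyms": sorted(self.hypernyms.get(headword, ())),
            "hyponyms": sorted(self.hyponyms.get(headword, ())),
        }


# Illustrative data only: German headwords, not taken from the DWDS.
thesaurus = Thesaurus()
thesaurus.add_synonym("Insekt", "Kerbtier")
for word in ["Ameise", "Floh", "Heuschrecke", "Käfer", "Wasserläufer"]:
    thesaurus.add_hyponym("Insekt", word)

if __name__ == "__main__":
    print(thesaurus.neighbours("Insekt"))
    print(thesaurus.neighbours("Floh"))
```

Storing both directions of each relation is what lets a reader move from insect down to ant and back up again. A real lexical system would also have to distinguish word senses, which this sketch ignores.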

Collocations

The collocations computed from the core corpus are displayed graphically. They are based on two statistical association measures, mutual information and t-score:

Collocation graph for "target"
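The two association measures named above can be computed from four counts: the observed co-occurrence count O of two words x and y, their individual frequencies f(x) and f(y), and the corpus size N. The sketch below uses the common textbook formulations, pointwise mutual information log2(O / E) and t-score (O - E) / sqrt(O), where E = f(x) * f(y) / N is the count expected if the two words were independent. Whether the DWDS uses exactly these variants, which co-occurrence window it counts and what thresholds it applies are assumptions not stated here; the function names and the counts are hypothetical.

```python
import math


def expected_count(f_x, f_y, n):
    """Co-occurrence count expected if x and y were independent: f(x) * f(y) / N."""
    return f_x * f_y / n


def mutual_information(o, f_x, f_y, n):
    """Pointwise mutual information: log2(O / E). Requires o > 0."""
    return math.log2(o / expected_count(f_x, f_y, n))


def t_score(o, f_x, f_y, n):
    """t-score approximation: (O - E) / sqrt(O). Requires o > 0."""
    return (o - expected_count(f_x, f_y, n)) / math.sqrt(o)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the DWDS corpus.
    n, f_x, f_y, o = 1_000_000, 1_000, 2_000, 50
    print(expected_count(f_x, f_y, n))                   # 2.0
    print(round(mutual_information(o, f_x, f_y, n), 2))  # 4.64
    print(round(t_score(o, f_x, f_y, n), 2))             # 6.79
```

With these hypothetical counts the expected count is E = 2, so MI = log2(25) ≈ 4.64 and t ≈ 48 / sqrt(50) ≈ 6.79. As a rule of thumb, mutual information tends to favour rare, highly specific word pairs, while the t-score favours frequent ones, which is one reason to consider both.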

Publicly searchable corpora

The DWDS corpora can be searched free of charge. However, because of the usage agreements with rights holders, many texts are accessible only after prior registration. More than 10,000 users are registered with the DWDS word information system.

  • DWDS core corpus
  • Der Tagesspiegel corpus (1996–2005)
  • Berliner Zeitung corpus (1946–1993), created as part of the GDR press portal project
  • Berliner Zeitung corpus (1994–2005)
  • Corpus of Jewish periodicals of the 19th and 20th centuries (in cooperation with the DFG-funded Compact Memory project), totalling 25 million words
  • GDR corpus (9 million words). It includes texts from 1949 to 1990 that were published in the GDR, or written by GDR authors and published in the Federal Republic. The GDR corpus is being expanded in collaboration with the Humboldt University of Berlin.
  • Neues Deutschland corpus (1946–1990)
  • Die ZEIT corpus (1946–2016), limited to texts available in digital form online
  • Spoken language corpus: transcripts from across the 20th century totalling around 2.5 million words. These include collections of speeches by, among others, Kaiser Wilhelm II, Hitler, Ulbricht and Honecker; radio speeches from 1929 to 1944 (around 80 hours of audio material, transcribed in cooperation with the German Broadcasting Archive); excerpts from Austrian parliamentary and Bundestag proceedings; and extracts from the television programme Das Literarische Quartett.

Revision of period-typical entries

The core of the DWDS dictionary is the Dictionary of Contemporary German. The DWDS project group revised approximately 2,600 of the WDG's 90,000 entries that contained GDR-typical content or wording. A group of lexicographers rephrased the definitions and constructed usage examples in more neutral language or, where they illustrate genuinely GDR-specific usage, labelled them accordingly. The revision affected approximately 2,500 further entries or readings.
