Kali (text corpus)

from Wikipedia, the free encyclopedia

Kali (also KALI) is an acronym of "Korpusarbeit Linguistik" ("corpus work in linguistics") and denotes a diachronic text corpus of the German language that serves as a basis for linguistic studies of grammaticalization. The corpus was built up from 2003 onward at the German Seminar of the University of Hanover under the direction of Gabriele Diewald.

Scope

The corpus currently spans eight centuries. Substantial parts of the text selection and preparation, as well as of the linguistic annotation and glossing, have already been completed. As of October 2008, the corpus comprised 25 sources from the Old High German and Middle High German periods, most of which remain accessible to the public free of charge.

Objective and procedure

All verbs in the corpus texts are lemmatized and annotated with morphological information. Both the synchronic form of the respective language period and its New High German equivalent are recorded. Lemmatization is based on the standard dictionaries: Rudolf Schützeichel's dictionary for Old High German and Matthias Lexer's dictionary (published 1872–1878) for Middle High German.
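
The record described above can be pictured as a small data structure. The following is only a minimal sketch: the class name, the field names and the example entry are illustrative assumptions and are not taken from Kali's actual data model or from the corpus itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerbAnnotation:
    """One annotated verb token (hypothetical layout, not Kali's real schema)."""
    form: str                  # form as attested in the source text
    stage: str                 # language period, e.g. "OHG" or "MHG"
    stage_lemma: str           # lemma in the synchronic form of that period
    nhg_lemma: Optional[str]   # New High German equivalent, if there is one
    person: str                # morphological information
    number: str
    tense: str
    mood: str


def gloss(a: VerbAnnotation) -> str:
    """Render an annotation as a compact one-line gloss."""
    nhg = a.nhg_lemma if a.nhg_lemma is not None else "?"
    return (f"{a.form} [{a.stage}] lemma={a.stage_lemma} nhg={nhg} "
            f"{a.person}.{a.number}.{a.tense}.{a.mood}")


# Illustrative entry: Middle High German "was", 3rd person singular
# preterite indicative of "wesen" (New High German "sein").
example = VerbAnnotation(
    form="was", stage="MHG", stage_lemma="wesen", nhg_lemma="sein",
    person="3", number="sg", tense="pret", mood="ind",
)
print(gloss(example))
```

Keeping the period lemma and the New High German lemma as separate fields mirrors the statement that both forms are recorded for each verb.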

Integration into current research projects

The corpus was initially used as a material basis for diachronic empirical investigations within the research project "Evidentiality Markers in German", but it also serves as a data basis for other research projects, for example the European cooperation project "Grammaticalization and (inter)subjectification".

Current research projects focus on the verbal categories of German, especially evidentiality and modality, across several language periods. Accordingly, verbs are given priority in the corpus. By around 2008, all verb forms in the corpus had been morphologically annotated and lemmatized. An extension of the annotation and lemmatization to other parts of speech is planned.

Technical basics

A classic LAMP stack forms the technical basis for Kali. The content management system was developed from scratch for the needs of Kali users and offers intuitive, web-based tools for lemmatizing and annotating the corpus. The prepared data supports linguistic research through text output as well as through sophisticated search functions and hyperlemmatization.
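
How a hyperlemma-based search might work can be sketched as follows. This is a minimal illustration under stated assumptions: it takes the New High German equivalent as the hyperlemma, and the record layout, function names and sample entries are hypothetical and do not reproduce Kali's actual implementation or data.

```python
from collections import defaultdict

# Hypothetical records: (attested form, language period, period lemma, hyperlemma).
# The hyperlemma is assumed here to be the New High German equivalent.
RECORDS = [
    ("was", "OHG", "wesan", "sein"),
    ("was", "MHG", "wesen", "sein"),
    ("sprach", "MHG", "sprechen", "sprechen"),
]


def build_hyperlemma_index(records):
    """Group attested forms from all periods under their shared hyperlemma."""
    index = defaultdict(list)
    for form, stage, lemma, hyperlemma in records:
        index[hyperlemma].append((form, stage, lemma))
    return dict(index)


def search(index, hyperlemma):
    """Return every attestation filed under the given hyperlemma."""
    return index.get(hyperlemma, [])


index = build_hyperlemma_index(RECORDS)
for form, stage, lemma in search(index, "sein"):
    print(f"{stage}: {form} (lemma {lemma})")
```

A single query for one hyperlemma then returns attestations from several language periods, even though the period-specific lemmas differ.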

Literature

  • Matthias Lexer: Middle High German Concise Dictionary. Also a supplement and alphabetical index to the Middle High German dictionary by Benecke-Müller-Zarncke. Hirzel, Leipzig 1872–1878 (online at woerterbuchnetz.de).
  • Matthias Lexer: Middle High German Pocket Dictionary. In the last edition. 2nd reprint of the 3rd edition from 1885. Hirzel, Stuttgart 1992.
  • Rudolf Schützeichel: Old High German Dictionary. 6th, revised and expanded edition with glosses. Niemeyer, Tübingen 2006 (online at saw-leipzig.de).

Web links

  • The former official website (www.kali.uni-hannover.de) is no longer available.
  • Kali corpus at the LINDAT/CLARIAH-CZ project, 2020 (English; Czech Ministry of Education).
  • Kali corpus at the Open Language Archives Community (OLAC), April 26, 2020 (English).
