Concordance (text science)

from Wikipedia, the free encyclopedia

In the text sciences, a concordance (from Latin concordare, "to agree") is traditionally an alphabetical list of the key words and phrases used in a written work. The term comes from biblical studies; today it also plays an essential role in literary studies and related disciplines and, more recently, in corpus linguistics, a subfield of linguistics. In the latter, a concordance can also cover words and phrases taken from spoken texts.

Today, concordances are usually electronically generated hit lists that result from a search for a word or phrase, or indeed for any definable character string. A concordance usually also gives the immediate linguistic environment of the search term, the so-called context, for example the entire sentence in which the word occurs.

The terms register and index, or index verborum ("index of words"), are used as synonyms for concordance. In corpus and computational linguistics, the term Key Word in Context and its abbreviation KWIC have also become established for the search term as it is displayed in a concordance.


From a printed Vedic verbal concordance

Before modern data processing, creating concordances was laborious, time-consuming, and therefore costly. Concordances were consequently created only for works of particular interest, such as religious texts (Bible, Koran, Rigveda) or the works of great writers (e.g. William Shakespeare). As early as the Middle Ages, collections of passages were compiled for the Bible, and also for canonical texts, under the Latin term concordantia.

Typologically, two variants are distinguished to this day:

  • Verbal concordances list all occurring words and idioms in alphabetical order and indicate the text passage for each.
  • Real concordances provide a content-based compilation of all passages relating to a specific thought or object.
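
The verbal variant can be sketched in a few lines of Python. This is only an illustration, not the method of any particular concordance: the function name `verbal_concordance`, the passage labels, and the sample sentences are all made up for the example, and words are split naively on non-word characters.

```python
import re
from collections import defaultdict

def verbal_concordance(passages):
    """Map each word form to the sorted list of passage labels where it occurs.

    `passages` is a dict of {label: text}; both are hypothetical sample data.
    """
    index = defaultdict(set)
    for label, text in passages.items():
        # Naive tokenization: lowercase, split on anything that is not a word character.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(label)
    # Headwords in alphabetical order, as in a printed verbal concordance.
    return {word: sorted(labels) for word, labels in sorted(index.items())}

# Made-up sample passages with made-up labels.
passages = {
    "1.1": "The river runs to the sea",
    "1.2": "The sea is deep",
}
concordance = verbal_concordance(passages)
for word, refs in concordance.items():
    print(f"{word}: {', '.join(refs)}")
```

A real concordance, by contrast, groups passages by subject matter and therefore cannot be derived from word forms alone.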

Since the most important literary works are now available in digitized form, concordances are usually created using software, which enables convenient searches for words and phrases, similar to a search engine on the World Wide Web (full-text search). A large number of products designed for different purposes already exist: for the Bible, for example, a whole range of Bible programs, and for corpus linguistics, for example, WordSmith. Some of this software can also be used online. Full-text search in many other digitized texts, such as dictionaries, encyclopedias, and literary collections, follows the same principle. In corpus linguistics, whose rise was made possible only by modern digital technology, concordances are obtained either from text corpora compiled specifically for a research project or from ready-made text collections, which are often also accessible online.

Early electronically created concordances were published in printed form, for example those for the Greek poets Hesiod and Homer from 1977. These already had the form of presentation that is common in computer concordances today: each hit occupies a single line, with the search term in the middle. In today's electronic concordances, however, the amount of text displayed around the search term can often be changed (for example, any number of lines, a whole sentence, or a whole paragraph).
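
The one-line layout with the search term in the middle can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `kwic`, the `width` parameter (number of context characters on each side), and the sample text are invented for the example and do not reproduce any specific program.

```python
import re

def kwic(text, term, width=30):
    """Return one line per hit, with the search term aligned in a middle column."""
    # Collapse line breaks and repeated spaces so every hit fits on one line.
    text = " ".join(text.split())
    lines = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the search term always starts
        # at the same column; trailing padding is not needed.
        lines.append(f"{left:>{width}} {match.group(0)} {right}".rstrip())
    return lines

# Made-up sample text.
sample = "The council met. The council voted. A new council was elected."
hits = kwic(sample, "council", width=10)
for line in hits:
    print(line)
```

Changing `width` corresponds to widening or narrowing the displayed context; showing a whole sentence or paragraph instead would require splitting the text at sentence or paragraph boundaries first.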

Differences between traditionally and electronically created concordances

Electronically produced concordances overcome several limitations of earlier concordances. In particular, the focus on single words has changed:

  • Traditionally, the individual words to be searched for in a corpus were selected by the compiler on the basis of a judgment about their content; today this is an unfamiliar point of view.
  • Individual words were reduced from the form in which they occur to their base form, i.e. conjugated and declined word forms were traced back to their respective base form. Today the concrete word forms are also in focus.
  • The starting point of a search need not be a single word; an entire text can be defined as the search criterion. Single words are always embedded in word chains, and it can be checked automatically whether a word in a particular chain also occurs elsewhere in the corpus. The scope thus extends from grammatical constructions to stock phrases, formulas, quotations, and plagiarism.

Concordances created using modern technology differ in several ways from the conventional model of a printed concordance:

  • Completeness: Electronically produced concordances always list the occurrences in a corpus in full, unless otherwise intended.
  • Search criterion: Conventional concordances are based on the search for content-related terms, while concordances from digital corpora list individual word forms (although a term and its word form can be identical in writing). Whether and how a bridge to the conventional concordance (inclusion of meanings) can and should be built (keyword: lemmatization) is still a methodological problem in philology.
  • Search options: In contrast to the classic search method, digital corpora offer very flexible search options. For example, one can search not only for individual word forms but also for any word strings (which need not be phrases in the grammatical sense) and for combinations of words that need not appear as a contiguous chain in the text corpus.
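
The two search options named last, contiguous word strings and combinations of words that are not adjacent, can be sketched as follows. The helper names `tokenize`, `find_sequence`, and `find_cooccurrence`, the `window` parameter, and the sample sentence are assumptions made for this illustration; actual concordance programs offer their own query syntax.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (naive tokenization)."""
    return re.findall(r"\w+", text.lower())

def find_sequence(tokens, sequence):
    """Return the start positions where the word sequence occurs contiguously."""
    n = len(sequence)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == sequence]

def find_cooccurrence(tokens, first, second, window=5):
    """Return (i, j) position pairs where `second` follows `first`
    within at most `window` tokens, not necessarily adjacent."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == first:
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                if tokens[j] == second:
                    hits.append((i, j))
    return hits

# Made-up sample text.
tokens = tokenize(
    "The committee made a decision. "
    "She made, after long thought, a difficult decision."
)
sequence_hits = find_sequence(tokens, ["made", "a", "decision"])
pair_hits = find_cooccurrence(tokens, "made", "decision", window=6)
print(sequence_hits, pair_hits)
```

The contiguous search finds only the first sentence, while the windowed search also finds the second, where other words intervene between "made" and "decision".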

Concordance software

From an electronically produced concordance of the word “Nationalrat” in the stenographic minutes of meetings of the Austrian National Council

A number of different software products exist for language analysis. In addition to so-called taggers (for annotating corpora) and parsers (for their syntactic analysis), there are a number of concordancers for generating concordances from a corpus. These differ considerably in design, each being tailored to the requirements of a particular field of research.

The CoMOn concordance program, for example, which primarily serves philological needs, allows a complete individual text of up to several thousand characters to be checked for its relationship to the surrounding corpus; hits that differ from the specified search text to a certain extent can also be output as concordances. The program also automatically recognizes up to what length word strings match. Programs such as WordSmith Tools or AntConc, on the other hand, which are mainly used in linguistics, offer not only the generation of concordances but also a number of other functions such as collocation analysis, lemmatization, keyword extraction, or the output of statistical data on the corpus such as the type-token ratio.
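
As an illustration of the last of these statistics, the type-token ratio is the number of distinct word forms (types) divided by the total number of running words (tokens). The sketch below is not taken from any of the programs named; the function name `type_token_ratio` and the naive tokenization are assumptions made for the example.

```python
import re

def type_token_ratio(text):
    """Return distinct word forms divided by total word count (0.0 for empty text)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Made-up sample: 5 tokens, 4 types ("the" occurs twice).
ratio = type_token_ratio("The cat saw the dog")
print(ratio)
```

Because the ratio tends to fall as a text grows longer, values are usually only comparable between texts of similar length.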


Depending on the scientific discipline, the research question, the type of software used, and the design of the concordance (selected text corpus, individual term or phrase, extent of the linguistic context considered), concordances can be used in different ways. By listing the occurrences of a particular term, it can be determined, for example,

  • in biblical studies (by means of Bible concordances), at which places a certain term appears in different editions of the Bible, from which conclusions can be drawn about the practice of translation into different languages
  • in literary studies, how often, in which works, and in which contexts a certain writer uses the search term, which is considered part of the language typical of that person (idiolect)
  • in lexicography, in which different meanings a certain word appears in a certain language, from which the range of meanings of a term can be derived and, also of importance for historical linguistics, phenomena of language change over time can be described
  • in general linguistics, in which inflected forms or with which other words a certain word is generally used, which can be used to demonstrate, for example, different usage in written and spoken language
  • in language teaching research, how often a certain expression is used grammatically correctly or incorrectly when learning a foreign language, which influences, for example, the design of teaching materials

Using modern technology, concordances can also be created for any defined group of several characters or words (n-grams). This makes it possible to determine whether, and in which texts, particular word combinations (collocations) occur preferentially. Idiomatic expressions, formulaic language use, quotations, allusions, and the like can thus be recognized, which is of specific interest in the relevant scientific fields.
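
Counting word n-grams can be sketched as follows. The function name `ngram_counts`, the parameter `n`, and the sample phrase are assumptions for this illustration; raw frequency is used here as the simplest possible indicator, whereas collocation analysis in practice usually relies on statistical association measures.

```python
import re
from collections import Counter

def ngram_counts(text, n=2):
    """Count all contiguous sequences of n word tokens in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Made-up sample phrase in which one bigram repeats.
counts = ngram_counts("to be or not to be", n=2)
for ngram, freq in counts.most_common():
    print(" ".join(ngram), freq)
```

N-grams that recur more often than expected are candidates for formulaic expressions or quotations and can then be inspected in context with a concordance.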

References

  1. Joseph R. Tebben: Hesiod Concordance. A Computer Concordance to Hesiod, Olms, Hildesheim 1977, ISBN 3-487-06268-2
  2. Joseph R. Tebben: Homer Concordance. A Computer Concordance to the Homeric Hymns, Olms, Hildesheim 1977, ISBN 3-487-06270-4
