A thesaurus (ancient Greek θησαυρός thesaurós, 'treasure, treasure house'; Latin thesaurus, whence also the German word Tresor, 'safe') or word network is, in documentation science, a controlled vocabulary whose terms are linked by relations, at the least by approximate synonymy. The term is also used for linguistic thesauri and for scholarly collections of the vocabulary of a language.
A thesaurus is a model that tries to describe and represent a subject area precisely. It consists of a systematically arranged collection of terms that are thematically related to one another. The thesaurus is a controlled vocabulary, also called an attribute value range, for the attribute being described. It mainly manages synonyms, but also generic terms (broader terms) and sub-terms (narrower terms). Antonyms (opposite terms), however, are often not listed.
Example: image (synonyms: picture, likeness, portrait; generic term: representation; sub-terms: mirror image, painting); carpenter (synonym: joiner; generic terms: craftsman, timber profession; sub-terms: cabinet maker, construction joiner)
In its general sense, the word originally referred to a "store of knowledge" such as a dictionary or an encyclopedia. In 1572 the five-volume Thesaurus Graecae Linguae by Henricus Stephanus (Henri Estienne) appeared, the most comprehensive dictionary of its time; it is also mentioned in the diaries of Samuel Pepys (December 1661). Roget's Thesaurus of English Words and Phrases, published in 1852 by Peter Mark Roget and particularly influential in the English-speaking world, shifted the meaning of the term toward the linguistic thesaurus.
In information retrieval, the term was first used by Hans Peter Luhn in 1957, during a decade in which various indexing systems were being developed. The first thesauri to be used for cataloging were the DuPont system (1959) and the Thesaurus of ASTIA Descriptors (1960). A uniform format for thesauri was presented in 1967 with the Thesaurus of Engineering and Scientific Terms (TEST). The rules for constructing thesauri, which had been worked out from the start, gradually grew into general standards that define the form of the classic thesaurus for documentation. These include UNESCO's Guidelines for the Establishment and Development of Monolingual Thesauri, drafted by Derek Austin and Dale, whose content was incorporated into ISO standard 2788 (1986).
In documentation science, the thesaurus has proven to be a suitable aid for subject indexing and for finding documents. The relations between the individual terms help to locate suitable terms both during indexing (assignment of keywords) and during searching. In contrast to a linguistic thesaurus, a thesaurus for documentation contains a controlled vocabulary, i.e. one unique designation (descriptor) for each concept. Different spellings (e.g. German Photo / Foto), synonyms or quasi-synonyms, abbreviations, translations and so on that are treated as equivalent are related to one another by equivalence relations. Concepts are also linked by association relations and hierarchical relations.
The thesaurus serves as a documentation language for indexing, storing and retrieving documents. Its relations make it possible to find suitable terms during both indexing and searching. Thesauri also help at search time because a query can be expanded automatically to include synonyms and sub-terms.
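The automatic query expansion described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the dictionaries `SYNONYMS` and `NARROWER`, the function name `expand_query`, and the sample terms "automobile" and "lorry" are hypothetical and chosen only for the example.

```python
# Hypothetical toy thesaurus (all entries are illustrative assumptions).
SYNONYMS = {            # descriptor -> equivalent, non-preferred terms
    "car": {"automobile"},
    "truck": {"lorry"},
}
NARROWER = {            # descriptor -> direct sub-terms
    "vehicle": {"car", "truck"},
}


def expand_query(term):
    """Return the term together with its synonyms and all sub-terms.

    Sub-terms are followed transitively, and the synonyms of every
    sub-term are included as well.
    """
    expanded = set()
    todo = [term]
    while todo:
        current = todo.pop()
        if current in expanded:
            continue  # already handled; also guards against cycles
        expanded.add(current)
        expanded.update(SYNONYMS.get(current, set()))
        todo.extend(NARROWER.get(current, set()))
    return expanded
```

A search for "vehicle" would then also match documents indexed under "car" or "truck" and their synonyms, while a search for a term unknown to the thesaurus is passed through unchanged.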
A thesaurus can thus also serve more generally to clarify terms, and in the best case it functions as an authority file. In contrast to a monohierarchical table or database, a thesaurus can have a polyhierarchical structure, i.e. a sub-term can have several generic terms.
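A polyhierarchy can be represented by letting each term map to a set of generic terms rather than to a single parent. The following Python sketch assumes a hypothetical mapping `BROADER` filled with the carpenter example; the names `all_generic_terms` and `is_polyhierarchical` are illustrative and not taken from any standard.

```python
# Hypothetical data: each term maps to the set of its direct generic terms.
BROADER = {
    "carpenter": {"craftsman", "timber profession"},  # two generic terms
    "cabinet maker": {"carpenter"},
    "construction joiner": {"carpenter"},
}


def all_generic_terms(term):
    """Collect every generic term reachable from `term`, directly or indirectly."""
    found = set()
    todo = list(BROADER.get(term, ()))
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(BROADER.get(current, ()))
    return found


def is_polyhierarchical(broader):
    """True if at least one term has more than one direct generic term."""
    return any(len(parents) > 1 for parents in broader.values())
```

In a strictly monohierarchical scheme every set in `BROADER` would contain at most one element, and the structure would reduce to a tree.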
The thesaurus standards DIN 1463-1 and its international equivalent ISO 2788 provide for the following types of relations and associated abbreviations:
|DIN 1463-1 abbreviation|DIN 1463-1 name|ISO 2788 abbreviation|ISO 2788 name|
|BF|Benutzt für (used for)|UF|Used for|
|BS|Benutze Synonym (use synonym)|USE / SYN|Use synonym|
|OB|Oberbegriff (generic term)|BT|Broader term|
|UB|Unterbegriff (sub-term)|NT|Narrower term|
|VB|Verwandter Begriff (related term)|RT|Related term|
|SB|Spitzenbegriff (top term)|TT|Top term|
As a rule, one element of an equivalence relation, i.e. one designation, is chosen as the preferred designation. The non-preferred designations carry a reference to their equivalent preferred term. For example:
- Vehicle refers to the sub-terms truck and car.
- Automobile, a non-preferred term, refers to the preferred term car and, through an association relation ("see also"), to truck.
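The reference from a non-preferred designation to its preferred term, together with a "see also" association, can be sketched as a pair of lookups. The Python below is a minimal illustration under stated assumptions: the names `USE`, `RELATED`, `preferred_term`, `used_for` and `see_also` are hypothetical, as are the non-preferred sample terms "automobile" and "lorry".

```python
# Hypothetical equivalence references: non-preferred term -> preferred term.
USE = {
    "automobile": "car",
    "lorry": "truck",
}
# Hypothetical association relations ("see also") between preferred terms.
RELATED = {
    "car": {"truck"},
}


def preferred_term(term):
    """Resolve a designation to its preferred term (descriptor).

    Terms without a USE reference are treated as already preferred.
    """
    return USE.get(term, term)


def used_for(use):
    """Invert the USE references: preferred term -> set of non-preferred terms."""
    inverse = {}
    for non_preferred, preferred in use.items():
        inverse.setdefault(preferred, set()).add(non_preferred)
    return inverse


def see_also(term):
    """Return the terms associated with the preferred form of `term`."""
    return RELATED.get(preferred_term(term), set())
```

Keeping only the USE direction and deriving the inverse on demand avoids the two directions drifting apart; a real system might store both for speed.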
Thesaurus as a compilation
Different forms of thesauri
In the past, a thesaurus was understood to be a scholarly collection of the entire vocabulary of a language. Well-known examples include the Thesaurus Linguae Graecae and the Thesaurus Linguae Latinae. Strictly speaking, these works are dictionaries.
The first thesauri used in electronic data processing (EDP) were likewise simple dictionaries that compared the words entered with their stored entries and gave the user feedback. Initially this feedback served only to detect simple spelling errors and had to be requested through an explicit search; later the check ran in the background, which is today's standard. The databases required were originally built from word collections that had been converted into a data format by hand. For commercial programs the manufacturer at first extended them continuously and delivered them to customers with updates. Once users could add their own word entries, it became possible to collect new entries on large, quasi-collaborative, user-based platforms: the database on a server grew rapidly for a time as the individual working copies of many users' thesauri were fed back. Here, too, manual review was necessary to keep misspelled words that were submitted frequently from being accepted. Because the vocabulary of each language is limited, however, almost complete data sets that reproduce the language exhaustively are now available for most languages. Today, new entries correspond only to the natural growth of the language concerned.
At the same time, electronic thesauri have developed into increasingly complex programs that can also check grammar and style rules and offer synonyms. At the edges of their scope, modern thesauri now also provide translation aids and allow texts to be checked automatically, with the user selecting from numerous options beforehand.
A special form of thesaurus provides input aids for logographic scripts such as Chinese on a western computer keyboard. Because of their large number, the characters cannot be accommodated on a keyboard of practical size, so the thesaurus suggests characters to the user, who can then accept or reject them.
There are numerous methods for entering Japanese or Chinese characters that convert syllables or abbreviations into characters by means of thesaurus-like database entries. However, none of these methods has so far established itself as a standard, because these writing systems are very complex and the meaning of a character often depends on context.
The effort required to learn these thesaurus-based programs is very high, and native speakers usually settle on a single software solution with which they can reach acceptably high writing speeds, although these lag far behind those achieved with the Latin alphabet. Typing in the Latin alphabet is much faster, although practiced readers read logographic scripts faster than Latin script. A uniform thesaurus for logographic scripts faces obstacles of tradition, concept and syntax.
In a linguistic thesaurus, it is not concepts but words with similar and related meanings that are linked by references. This type of lexico-semantically organized reference work can be used, among other things, as an aid to formulation. Such reference works exist in printed and in electronic form, the latter mostly as background resources for word processing programs.
- European Thesaurus for International Relations and Area Studies
- Eurovoc Thesaurus of the European Union
- Getty Thesaurus of Geographic Names
- INFODATA thesaurus
- Medical Subject Headings (MeSH)
- OpenThesaurus - project to create a German-language linguistic thesaurus
- Thesaurus Linguae Latinae - dictionary of the entire Latin language from its beginnings to about AD 600
- Thesaurus Linguae Graecae - Project for the digital recording of all Greek literature from ancient times to modern times
- UNESCO thesaurus
- Terminological database
- Simple Knowledge Organization System
- Ontology (computer science)
- Form of reference