Indexing

From Wikipedia, the free encyclopedia

Indexing (possibly also the anglicism tagging; a variant term is used in Austria and Bavaria), also called keywording (German: Verstichwortung), is the assignment of descriptors to a document in information retrieval in order to make the subject matter it contains accessible. A distinction is made between controlled indexing, which uses a thesaurus, a subject heading list, or the notations of a classification, and free indexing or free keywording, in which the descriptors are not prescribed. In community indexing (also called social tagging or collaborative tagging), which is carried out with social software, one speaks of tagging instead of indexing and of tags instead of descriptors.
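The difference between controlled and free indexing can be sketched in a few lines. This assumes a tiny, hypothetical thesaurus that maps entry terms, including synonyms, to preferred descriptors. Controlled indexing keeps only the terms the vocabulary knows and normalizes synonyms. Free indexing keeps the terms as they were found.

```python
# Hypothetical mini-thesaurus: entry terms (including synonyms) -> preferred
# descriptor. Real thesauri also record broader/narrower terms; omitted here.
THESAURUS = {
    "car": "Motor vehicles",
    "automobile": "Motor vehicles",
    "motor vehicle": "Motor vehicles",
    "bicycle": "Bicycles",
    "bike": "Bicycles",
}

def controlled_indexing(candidates):
    """Map candidate terms to preferred descriptors; drop terms not in the vocabulary."""
    descriptors = []
    for term in candidates:
        descriptor = THESAURUS.get(term.lower())
        if descriptor is not None and descriptor not in descriptors:
            descriptors.append(descriptor)
    return descriptors

def free_indexing(candidates):
    """Use the candidate terms themselves as descriptors (no prescribed vocabulary)."""
    descriptors = []
    for term in candidates:
        normalized = term.lower()
        if normalized not in descriptors:
            descriptors.append(normalized)
    return descriptors

terms = ["Automobile", "car", "bike", "traffic jam"]
print(controlled_indexing(terms))  # ['Motor vehicles', 'Bicycles']
print(free_indexing(terms))        # ['automobile', 'car', 'bike', 'traffic jam']
```

The synonyms "Automobile" and "car" collapse into one descriptor under controlled indexing. "traffic jam" is dropped because the vocabulary does not contain it.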

Methods

Indexing types and methods can be distinguished according to various criteria:

Manual indexing

Manual indexing, also called intellectual indexing, is a method of indexing documents in which an indexer assigns subject headings to a document surrogate. It is carried out by specialists using terminology lists, similar sets of rules, and a controlled vocabulary. It allows individual formulations to be analyzed linguistically and synonyms to be mapped to a common descriptor. Its disadvantages are that it is time-consuming, slow, and expensive, that its quality depends on staff working consistently, and that the predefined descriptor vocabulary is static. In addition, users must know the indexing vocabulary in order to search for documents.

Automatic indexing

A common method of automatic indexing is full-text indexing, in which every word of a text except stop words is included in the index. Search engines often apply this type of indexing to the pages collected by their web crawlers. Words can also be reduced to a common word stem by stemming.
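Full-text indexing into an inverted index (word stem → documents containing it) can be sketched as follows. The stop-word list is a tiny illustrative sample. `stem()` is a deliberately naive, hypothetical suffix stripper standing in for a real stemming algorithm such as the Porter stemmer. Neither is taken from any particular search engine.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list; real systems use much longer lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to"}

# Naive suffix stripping (hypothetical stand-in for a real stemmer).
# Suffixes are tried in order, and a stem must keep at least 3 letters.
SUFFIXES = ("ing", "es", "ed", "s")

def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Full-text index: stem -> set of document ids, skipping stop words."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in STOP_WORDS:
                index[stem(token)].add(doc_id)
    return index

docs = {
    1: "Indexing the documents of a library",
    2: "A document is indexed in the catalog",
}
index = build_index(docs)
print(sorted(index["index"]))     # [1, 2]
print(sorted(index["document"]))  # [1, 2]
```

Because of stemming, a query for "index" finds both "Indexing" and "indexed". The stop words "the", "of" and "a" never enter the index.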

Statistical indexing methods select words by their frequency: only words that occur in the text with a certain frequency are included in the index. A simple term-weighting method is tf-idf, which combines term frequency with inverse document frequency. It determines how often a term occurs in a document and relates this value to the number of documents in which the term occurs. The result can be read directly as the value, or weight, of the term as a descriptor. A term's weight is higher the more often it occurs in the document being indexed and the fewer documents in the collection contain it.

Frequency does indicate significance. The word "term", for example, appears often in this article because it is important to the subject. On its own, however, "term" is too broad to be a good descriptor. Frequency alone therefore cannot tell whether a word is a good or a bad descriptor. Only in combination with the weighting described above does it yield meaningful descriptors.
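A minimal tf-idf sketch follows. It assumes the common variant w(t, d) = tf(t, d) · log(N / df(t)), where tf is the raw count of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. Other tf and idf variants exist. The toy documents are hypothetical, pre-tokenized word lists.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df)."""
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    return {
        doc_id: {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        for doc_id, counts in term_counts.items()
    }

docs = {
    "d1": ["term", "term", "term", "weight", "descriptor"],
    "d2": ["term", "stemming"],
    "d3": ["term", "crawler"],
}
weights = tf_idf(docs)
print(weights["d1"]["term"])                  # 0.0
print(round(weights["d1"]["descriptor"], 3))  # 1.099
```

This mirrors the point about "term" made above. It is the most frequent word in d1, but it occurs in every document, so its inverse document frequency, and with it its weight, is zero. "descriptor" occurs only once but in a single document, so it scores higher.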

Computational linguistics makes more sophisticated automatic methods possible. If the institution's own terminology system (thesaurus, classification, etc.) is implemented, the results in some cases no longer differ significantly from those of intellectual indexing. Compared with human indexing, indexing consistency increases. Automation also makes it possible to reprocess the entire document collection with reasonable effort after the terminology system has been revised or the process otherwise improved.

In library catalogs in particular, automatic indexing is called Verstichwortung (keyword indexing). It is applied even within the multi-part subject heading strings of a syntactic indexing that qualified staff assigned during manual subject indexing (subject catalog), and the keyword catalog is built from it. The automatic extraction of keywords from a full text, for example to create an index, is also called Verstichwortung.
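One way to picture how keyword catalog entries can be derived from manually assigned subject heading strings is sketched below. It assumes the parts of each string are separated by " / " and that every non-stop word becomes a keyword entry pointing back to the full string and its record. The records, separator, and stop-word list are hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog records, each with a manually assigned multi-part
# subject heading string (parts separated by " / ").
RECORDS = {
    "r1": "Libraries / Cataloging / Automation",
    "r2": "Museums / Images / Cataloging",
    "r3": "History of art / Images",
}
STOP_WORDS = {"and", "of", "the"}

def keyword_catalog(records):
    """Build keyword -> sorted list of (subject string, record id) entries."""
    catalog = defaultdict(list)
    for rec_id, subject_string in records.items():
        for part in subject_string.split(" / "):
            for word in part.split():
                keyword = word.lower()
                if keyword not in STOP_WORDS:
                    catalog[keyword].append((subject_string, rec_id))
    return {kw: sorted(entries) for kw, entries in sorted(catalog.items())}

catalog = keyword_catalog(RECORDS)
print(catalog["cataloging"])
# [('Libraries / Cataloging / Automation', 'r1'), ('Museums / Images / Cataloging', 'r2')]
```

Every word inside a subject string becomes an access point, so "Cataloging" finds both records even though it is the second part of one string and the last part of the other.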

Computerized indexing

In computer-aided or semi-automatic indexing, descriptors are suggested automatically and selected manually. The computer performs the indexing, and people prepare or review its work or interact with it during the process.
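The suggest-then-select workflow can be sketched as follows. It assumes the computer has already scored candidate terms, for example with tf-idf, and models the indexer's decision as an `accept` callback. The function names, scores, and rejected set are hypothetical.

```python
from typing import Callable

def suggest_descriptors(scores: dict[str, float], k: int = 3) -> list[str]:
    """Machine step: return the k highest-scoring candidates (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

def semi_automatic_indexing(
    scores: dict[str, float], accept: Callable[[str], bool], k: int = 3
) -> list[str]:
    """The machine suggests descriptors, and a human (the accept callback) decides."""
    return [term for term in suggest_descriptors(scores, k) if accept(term)]

scores = {"thesaurus": 2.1, "stemming": 1.7, "term": 0.0, "crawler": 1.2}
# Stand-in for an indexer who rejects some of the suggestions.
rejected_by_indexer = {"crawler"}
chosen = semi_automatic_indexing(scores, lambda t: t not in rejected_by_indexer)
print(chosen)  # ['thesaurus', 'stemming']
```

In a real system the callback would be an interactive review screen. Keeping it as a function makes it clear that the final selection stays with a person.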

Keywording of images

Many museums use the Iconclass classification to index the content of images, and the subject headings authority file (Schlagwortnormdatei) is increasingly used in the museum sector as well. Many picture agencies and picture archives use the IPTC-IIM standard and the rules it contains for categories and keywords, but in-house keyword lists also play a major role. In addition, various methods allow images to be retrieved by similarity search and relevance feedback.
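Similarity search with relevance feedback can be sketched using Rocchio's method, one well-known technique. The article does not say which methods are used in practice. Each image is represented by a feature vector, and results are ranked by cosine similarity to a query vector. The query is then moved toward images the user marks as relevant and away from those marked non-relevant. The three-dimensional vectors, image names, and the weights alpha, beta, and gamma are illustrative assumptions. Real systems use high-dimensional features such as color histograms or learned embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mean(vectors, dim):
    """Component-wise mean; a zero vector if no vectors were marked."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant images and away from non-relevant ones."""
    dim = len(query)
    rel, non = mean(relevant, dim), mean(nonrelevant, dim)
    # Negative components are clipped to 0, a common choice for non-negative features.
    return [max(0.0, alpha * q + beta * r - gamma * n) for q, r, n in zip(query, rel, non)]

def similarity_search(query, images, k=3):
    """Return the names of the k images most similar to the query."""
    ranked = sorted(images, key=lambda name: cosine(query, images[name]), reverse=True)
    return ranked[:k]

images = {  # hypothetical feature vectors
    "sunset": [0.9, 0.1, 0.0],
    "beach": [0.6, 0.2, 0.7],
    "forest": [0.1, 0.9, 0.1],
    "sea": [0.2, 0.1, 0.9],
}
query = [0.5, 0.1, 0.5]
print(similarity_search(query, images))  # ['beach', 'sea', 'sunset']

# The user marks "sunset" as relevant and "sea" as non-relevant.
new_query = rocchio(query, [images["sunset"]], [images["sea"]])
print(similarity_search(new_query, images))  # ['sunset', 'beach', 'sea']
```

One round of feedback moves "sunset" from third place to first. Repeating the loop refines the query further.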
