Subject indexing

Subject indexing (also called subject cataloging or content analysis) refers, in library and documentation science, to the description of bibliographic and archival resources by their content. A resource is described, either intellectually or automatically, on the basis of what it is about. In contrast, formal indexing, also known as cataloging, records an object according to formal rules. It uses only data that can be determined directly, e.g. the title of a work.

Differentiation from formal indexing

The content of a resource can be, for example:

The statement made by a text
The content of a film
The description of a musical work

Formal indexing, in contrast, would cover only the title, author, composer, and so on.

Both are documentation activities that produce metadata. The aim of subject indexing is to make relevant resources easier and faster to find by providing informational added value. Information scientists call this "improving retrieval".

Subject indexing places significantly higher demands on those who carry it out, so many institutions entrust the task to specialists trained in the subject (or at least in a related one). In libraries these are, for example, subject librarians; in documentation centers, scientific documentalists.

Methods

Various documentation languages and systems are used for subject indexing. A fundamental distinction can be made between classification methods and methods of verbal subject indexing. Verbal subject indexing can in turn be divided into keyword indexing and free verbal indexing.

Subject classification

Classifications describe a content area by means of identifiers (notations) that span subject areas. Two methods can be distinguished:

Hierarchical classification divides knowledge into groups and subgroups (subject areas), from which the appropriate class is selected. Such a classification assigns a document to exactly one subject class. Examples are the Dewey Decimal Classification (DDC) and the Regensburger Verbundklassifikation (RVK).
Faceted classification places different subject aspects side by side on an equal footing. Its great advantage is that the structure does not have to be planned in advance: subclasses and intersection classes (facets) can be created afterwards (post-coordinate classification), and new class keys can be defined from the resulting facets. Faceted classification can therefore also be used for very complex or novel subject areas. A well-known example is the Colon Classification (CC), in whose original version the colon was the only separator.
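The difference between the two methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notations, labels, and facet codes below are toy values made up for the example and are not taken from the DDC, the RVK, or the Colon Classification, and the names `class_path` and `facet_notation` are hypothetical.

```python
# Hypothetical hierarchical scheme: notation -> (label, parent notation).
HIERARCHY = {
    "500": ("Natural sciences", None),
    "510": ("Mathematics", "500"),
    "516": ("Geometry", "510"),
}


def class_path(notation: str) -> list[str]:
    """Return the labels from the top-level class down to the given class."""
    path = []
    current = notation
    while current is not None:
        label, parent = HIERARCHY[current]
        path.append(label)
        current = parent
    return list(reversed(path))


# Hypothetical facets: independent aspects that are combined after the fact.
FACETS = {
    "subject": {"G": "Geometry", "H": "History"},
    "place": {"4": "Europe", "5": "Asia"},
    "period": {"M": "Medieval", "N": "Modern"},
}


def facet_notation(**chosen: str) -> str:
    """Join one code per chosen facet with a colon, in the order of FACETS."""
    parts = []
    for facet, codes in FACETS.items():
        if facet in chosen:
            code = chosen[facet]
            if code not in codes:
                raise ValueError(f"unknown {facet} code: {code}")
            parts.append(code)
    return ":".join(parts)


print(class_path("516"))
print(facet_notation(subject="H", place="4", period="M"))
```

In the hierarchical case a document receives a single notation whose position in the tree is fixed in advance. In the faceted case any combination of codes yields a valid notation, so a new intersection class needs no change to the scheme.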

Keywording / indexing

Keyword indexing can be done freely or with a controlled vocabulary. Examples of controlled vocabularies are keyword lists such as the Schlagwortnormdatei, the Library of Congress Subject Headings, or a thesaurus. From this vocabulary the indexer selects the appropriate keywords. This assignment of keywords is called keywording. Depending on the documentation language, the indexer is supported by relationships between the individual keywords that the documentation language makes explicit.
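A minimal sketch of keywording against a controlled vocabulary might look as follows. The tiny thesaurus, its field names (`synonyms`, `broader`), and the functions `assign_keywords` and `broader_chain` are assumptions made for this example; real vocabularies such as the Library of Congress Subject Headings are far richer.

```python
# Hypothetical mini-thesaurus: preferred term -> synonyms and broader term.
THESAURUS = {
    "Automobiles": {"synonyms": {"car", "cars", "motor car"}, "broader": "Vehicles"},
    "Vehicles": {"synonyms": {"vehicle"}, "broader": None},
}


def build_lookup(thesaurus):
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    lookup = {}
    for preferred, entry in thesaurus.items():
        lookup[preferred.lower()] = preferred
        for synonym in entry["synonyms"]:
            lookup[synonym.lower()] = preferred
    return lookup


def assign_keywords(candidates, thesaurus):
    """Split candidate terms into accepted preferred terms and rejected terms."""
    lookup = build_lookup(thesaurus)
    accepted, rejected = [], []
    for term in candidates:
        preferred = lookup.get(term.strip().lower())
        if preferred is None:
            rejected.append(term)
        elif preferred not in accepted:
            accepted.append(preferred)
    return accepted, rejected


def broader_chain(term, thesaurus):
    """Follow the 'broader' relation upwards from a preferred term."""
    chain = []
    current = thesaurus[term]["broader"]
    while current is not None:
        chain.append(current)
        current = thesaurus[current]["broader"]
    return chain


accepted, rejected = assign_keywords(["Car", "motor car", "bicycle"], THESAURUS)
print(accepted, rejected, broader_chain("Automobiles", THESAURUS))
```

Mapping synonyms to one preferred term is what makes the vocabulary "controlled": different wordings end up under the same keyword, and the broader-term relation is one example of a connection between keywords that can guide the indexer.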

Another form is indexing with freely assigned keywords, in which essential words from the text are recorded. A relatively new variant is the use of so-called tags in open Internet systems (collaborative tagging). What is new here is above all that free keywords are assigned not by a single person but by everyone who participates in the system, so that a large number of aspects can be covered.
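How the tags of many participants combine into a description of one resource can be sketched as a simple count. The function name `aggregate_tags` and the rule of counting each tag once per user are assumptions for this example, not a description of any particular tagging system.

```python
from collections import Counter


def aggregate_tags(assignments):
    """Count how many distinct users assigned each tag to a resource.

    `assignments` is an iterable of (user, tag) pairs. Tags are normalized
    to lowercase, and each tag counts at most once per user.
    """
    unique = {(user, tag.strip().lower()) for user, tag in assignments}
    return Counter(tag for _, tag in unique)


assignments = [
    ("ann", "Python"),
    ("bob", "python"),
    ("ann", "python"),
    ("cy", "tutorial"),
]
print(aggregate_tags(assignments).most_common())
```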

In addition, a growing number of automatic methods extract catchwords and keywords. It is disputed whether full-text indexing, as performed by search engines, can also be counted among the means of subject indexing. To weight documents, search engines use various algorithms that determine how relevant a document is to a given keyword. These algorithms are, however, undermined by search engine optimization techniques.
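One classic way to weight a document against a keyword is TF-IDF (term frequency times inverse document frequency). The sketch below is a textbook illustration only; it is not claimed to be what any actual search engine uses, and the names `tokenize` and `tf_idf_scores` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_scores(documents, keyword):
    """Return one TF-IDF score per document for a single keyword."""
    keyword = keyword.lower()
    token_lists = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does the keyword occur?
    doc_freq = sum(1 for tokens in token_lists if keyword in tokens)
    if doc_freq == 0:
        return [0.0] * len(documents)
    idf = math.log(len(documents) / doc_freq)
    scores = []
    for tokens in token_lists:
        # Term frequency: share of the document's tokens equal to the keyword.
        tf = Counter(tokens)[keyword] / len(tokens) if tokens else 0.0
        scores.append(tf * idf)
    return scores


docs = ["library catalog library", "search engine index", "library search"]
print(tf_idf_scores(docs, "library"))
```

A keyword that is frequent in one document but rare across the collection scores highest. Because the score depends only on word counts, it is easy to see how stuffing a page with a keyword could distort such a ranking.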

Text summary and content excerpt

Another approach is a condensed textual representation of the content. Examples of free verbal indexing, all of them forms of summary, are:

Presentations
Tables of Contents
Summaries (such as the abstract of scientific papers)
Annotations (title extensions)
Excerpts (reproduction of selected text passages)
Reviews
Indexes (registers)

Metadata

Integrated forms of collecting metadata, such as catalog enrichment, can also be part of subject indexing. In catalog enrichment, the entries of electronic library catalogs are supplemented with tables of contents, links to reviews, or title pages.

Metadata also includes methods for collecting all cross-references and references to other documents, and for making accessible the cross-references that point from other records to the document. In print this is the bibliography; in online media these are hyperlinks and backlinks.

Complex combination of methods

Keyword indexing yields only syntactic expressions that do not describe the context in which the expressions occur in the document. The links between keywords found in thesauri are intended only to help the indexer and the searcher choose the right keywords. Free verbal indexing can give the searcher an impression of the content, but it improves the search result only to a limited extent, namely when the summary is itself indexed.

Automatic procedures for text summarization already exist here as well. Ontologies were developed so that the semantic statements of a text can be encoded in a machine-readable way. With the help of ontologies, statements about content can be made searchable. Because subject indexing with ontologies, and the creation of the ontologies themselves, is very time-consuming, they have hardly been used so far. In the context of the Semantic Web, however, this technology is to be used more intensively, so it can be expected to gain in importance.

Expert systems could also be regarded as a form of indexing, although they are not widespread because of their relative complexity.

History

Early means of indexing in libraries were the classification schemes of the (old) subject catalogs (Realkataloge), in which literature was listed according to content criteria.

By the beginning of the 20th century at the latest, the exponentially growing amount of published information had become so large that systematic indexing of content was indispensable. For this purpose, abstracting journals were created first. As the number of publications grew, not every document could be abstracted, and keyword indexing was introduced as a much more compact form of subject indexing.

Since the beginning of the 21st century, collaborative tagging has emerged as a new form of subject indexing that is often contrasted with conventional methods.

