Inverted file

from Wikipedia, the free encyclopedia

In information retrieval, inverted files are required as the basis for many kinds of search queries, for example searches with Boolean operators and truncation.

For this purpose, an index is created for a document collection (for example, a literature database) that maps each searchable term to the documents in which it occurs. The inverted file for a (search) term refers to all documents linked to that term. To this end, the inverted file contains information such as the document numbers or their addresses in the database, as well as an indication of how often the term occurs in the database as a whole (or the number of documents in which it occurs at least once). For retrieving and weighting search results, information about the position at which the term occurs in the document (which word, sentence, or paragraph) is also useful. If left truncation is to be supported, each term must additionally be stored in reversed form.
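The structure described above can be illustrated with a minimal Python sketch. It is not a reference implementation: the function names (`tokenize`, `build_index`, `postings`, `prefix_matches`, `suffix_matches`), the sample documents, and the simple whitespace tokenizer are all assumptions made for illustration. The sketch stores word positions per document, answers Boolean queries with set operations, and handles left truncation by searching a sorted list of reversed terms.

```python
from bisect import bisect_left
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation (a simplification)."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w]


def build_index(docs):
    """Map each term to {document number: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def postings(index, term):
    """Return the set of document numbers that contain the term."""
    return set(index.get(term, {}))


def prefix_matches(sorted_terms, prefix):
    """Right truncation: all terms in a sorted list that start with prefix."""
    matches = []
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


def suffix_matches(sorted_reversed_terms, suffix):
    """Left truncation: a suffix search becomes a prefix search on reversed terms."""
    hits = prefix_matches(sorted_reversed_terms, suffix[::-1])
    return [t[::-1] for t in hits]


# Hypothetical sample collection, keyed by document number.
docs = {
    1: "Inverted files speed up retrieval.",
    2: "Boolean retrieval combines terms with operators.",
    3: "Punch cards preceded digital files.",
}
index = build_index(docs)

# Boolean operators map onto set operations over the postings.
both = postings(index, "files") & postings(index, "retrieval")        # AND
either = postings(index, "files") | postings(index, "retrieval")      # OR
only_files = postings(index, "files") - postings(index, "retrieval")  # AND NOT

# Number of documents containing a term, and its positions in document 2.
doc_frequency = len(index["retrieval"])
positions_in_doc2 = index["retrieval"][2]

# Sorted term lists for truncated searches; the reversed list serves left truncation.
terms = sorted(index)
reversed_terms = sorted(t[::-1] for t in index)
starts_with_p = prefix_matches(terms, "p")
ends_with_ed = suffix_matches(reversed_terms, "ed")
```

Only the index is consulted in each query; the document texts are read once, when the index is built. Keeping a second sorted list of reversed terms is one straightforward way to support left truncation, at the cost of storing every term twice.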

The advantage of this system is fast access to documents, since only the index (and not the documents themselves) has to be searched. The index is a good foundation for building retrieval systems: the available search options can be fully exploited, and the search interface can be designed relatively freely. The disadvantages are the considerable effort required to create such an index and the large amount of storage space it needs. In addition, the index must be updated every time new documents are added.

The principle of inverted files is based on a system developed by Herman Hollerith, who in 1890 was the first to use punch cards to evaluate a census in the USA.

The technical implementation is realized by means of an index structure.
