Relevance feedback

Relevance feedback is a method used in information retrieval, a subdiscipline of computer science and information science. It describes how the results of a text-based search (e.g. in search engines) can be improved step by step.

Motivation

One problem for today's search engines is that search queries are often far too short: on average they barely exceed two words, which leads to a large number of ambiguous queries. Another problem is imprecision (fuzziness) in the indexing of document contents. Relevance feedback is particularly well motivated in multimedia information retrieval, where a media gap between query and document often has to be bridged, for example when a textual query is applied to image data. Relevance feedback helps to mitigate these problems because the system can form increasingly accurate hypotheses about the requirements the user places on relevant documents.

Basic idea

The idea is to use the relevance of documents that have already been found to search for similar documents. Hence the term relevance feedback: information about the results of previous searches is fed back into the system. Relevance feedback builds on existing retrieval models (the probabilistic model or the vector space model).

Procedure

1. Based on an initial search query Q, the retrieval system retrieves a first set of documents from the document collection.
2. The user marks particularly relevant documents (positive feedback) and, optionally, irrelevant documents (negative feedback) in the result set.
3. Based on this information, the retrieval system computes
   - in the vector space model, a new search query Q' whose vector is more similar to the relevant documents and less similar to the irrelevant documents than Q;
   - in the probabilistic model, new conditional probabilities that capture the relationship between the occurrence of terms in the document index and the relevance judgments.
4. The retrieval system runs the search again, either with the new query Q' (vector space model) or with Q but using the new probability estimates (probabilistic model), and retrieves a new set of documents that should better match the user's interests.
5. The new documents are presented to the user.
6. The user can give further feedback (return to step 2).
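In the vector space model, one widely used way to compute Q' is the Rocchio algorithm, which moves the query vector toward the centroid of the relevant documents and away from the centroid of the irrelevant ones. The procedure above does not prescribe a specific formula, so the following is only a minimal sketch assuming Rocchio's update Q' = αQ + β·centroid(relevant) − γ·centroid(irrelevant), dense term-weight vectors, illustrative weights α = 1, β = 0.75, γ = 0.15, and negative weights clipped to zero. The function names `centroid` and `rocchio` are hypothetical.

```python
from typing import Sequence

Vector = list[float]


def centroid(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean of the given vectors (zero vector if none are given)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(
    query: Vector,
    relevant: Sequence[Vector],
    irrelevant: Sequence[Vector],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """Return a modified query Q' moved toward relevant and away from irrelevant documents.

    alpha, beta and gamma are illustrative defaults (an assumption), not fixed by the method.
    """
    dim = len(query)
    rel_c = centroid(relevant, dim)
    irr_c = centroid(irrelevant, dim)
    new_query = [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel_c, irr_c)]
    # Negative term weights are clipped to zero in this sketch.
    return [max(0.0, w) for w in new_query]


# Vocabulary: ["jaguar", "car", "cat"]; the user wants cars, not animals.
q_new = rocchio([1.0, 0.0, 0.0], relevant=[[1.0, 1.0, 0.0]], irrelevant=[[1.0, 0.0, 1.0]])
print(q_new)  # approximately [1.6, 0.75, 0.0]
```

The term "car" gains weight from the relevant document, while "cat", which occurs only in the irrelevant document, is pushed down (and clipped at zero). The larger β is relative to γ, the more the update trusts positive over negative feedback.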

This will improve the search result step by step.
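For the probabilistic model, the new conditional probabilities that the procedure computes can be illustrated with the binary independence model. From the documents the user judged relevant, the system estimates for each term t the probability p that t occurs in a relevant document and the probability u that it occurs in a non-relevant one, then turns them into a term weight; the original query Q is kept and only these weights change. This is a minimal sketch assuming that model, add-½ smoothing, and that every document not marked relevant counts as non-relevant. The names `feedback_term_weight` and `score` are hypothetical.

```python
import math


def feedback_term_weight(n_docs: int, doc_freq: int, n_rel: int, rel_with_term: int) -> float:
    """Binary independence model term weight re-estimated from relevance feedback.

    n_docs:        total number of documents N in the collection
    doc_freq:      number of documents containing the term
    n_rel:         number of documents the user marked as relevant
    rel_with_term: number of those relevant documents that contain the term
    """
    # p: estimated probability that the term occurs in a relevant document
    p = (rel_with_term + 0.5) / (n_rel + 1)
    # u: estimated probability that the term occurs in a non-relevant document
    u = (doc_freq - rel_with_term + 0.5) / (n_docs - n_rel + 1)
    # Log odds ratio: positive if the term is more typical of relevant documents
    return math.log(p / (1 - p)) + math.log((1 - u) / u)


def score(doc_terms: set[str], query_terms: set[str], weights: dict[str, float]) -> float:
    """Retrieval status value: sum of the weights of query terms present in the document."""
    return sum(weights.get(t, 0.0) for t in query_terms & doc_terms)


# 1000 documents, 50 contain the term; 8 of the 10 documents marked relevant contain it.
print(feedback_term_weight(1000, 50, 10, 8))
```

A term found in most of the relevant documents receives a high weight, while one that never appears in them is down-weighted, so re-running the same query Q reorders the results.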

Disadvantage

A disadvantage of relevance feedback is the effort that the repeated relevance assessments require of the user.

Blind relevance feedback

Blind relevance feedback (also known as pseudo relevance feedback) eliminates this disadvantage of manual relevance feedback but has other drawbacks. The relevance of the result documents is not marked manually by the user but determined automatically (hence the name "blind"), typically by assuming that the top-ranked documents of the initial result list are relevant. The search query is then automatically extended by query expansion, and a new result list is generated with the expanded query. Because no manual relevance judgments are involved, however, the results are often too imprecise for the user.
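Blind relevance feedback can be sketched as follows: rank the documents, treat the top k as relevant without asking the user, and expand the query with the terms that occur most often in them. This is a minimal sketch under several assumptions: documents are lists of tokens, the toy `rank` function scores by counting query-term occurrences, and expansion terms are chosen by raw frequency. The names `rank` and `pseudo_relevance_feedback` and the parameters `k` and `n_terms` are hypothetical.

```python
from collections import Counter


def rank(query: list[str], docs: list[list[str]]) -> list[int]:
    """Rank document indices by how many query-term occurrences they contain (toy scoring)."""
    scores = [sum(doc.count(t) for t in set(query)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def pseudo_relevance_feedback(
    query: list[str], docs: list[list[str]], k: int = 2, n_terms: int = 1
) -> list[str]:
    """Expand the query with frequent terms from the top-k documents, assumed relevant."""
    top_docs = rank(query, docs)[:k]
    # Count candidate expansion terms that are not already in the query.
    counts = Counter(t for i in top_docs for t in docs[i] if t not in query)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query + expansion


docs = [
    ["jaguar", "car", "engine", "speed"],
    ["jaguar", "car", "engine", "dealer"],
    ["jaguar", "cat", "jungle"],
    ["engine", "repair", "manual"],
]
expanded = pseudo_relevance_feedback(["jaguar", "car"], docs, k=2, n_terms=1)
print(expanded)  # ['jaguar', 'car', 'engine']
print(rank(expanded, docs))  # new result list computed with the expanded query
```

Because nobody checks the top k documents, an irrelevant document among them feeds misleading terms into the expansion (often called query drift), which is one source of the imprecision described above.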
