Top-N request

from Wikipedia, the free encyclopedia

A Top-N query is a query to an information system that does not provide a complete result, but only the N best (Top-N) or N first (First-N) results. First-N inquiries are suitable, for example, for browsing larger information stocks in order to get an overview of the nature of the data records. A classic application of First-N queries are queries to a search engine .

First-N inquiries

Limiting the query to a limited number of results enables optimization. To do this, however, the information system must support a type of STOP operator. When processing inquiries, this operator should be used as early as possible to avoid unnecessary data transfer and processing time.

An effective implementation of such an operator is only possible in databases, where internal query plans are created and optimized for queries formulated in a query language (for example SQL ). The STOP operator contains the desired maximum number N of data records (scan stop) and, if necessary, a sorting direction for the data records (sort stop).

If sorting is not necessary or the data records are already correctly sorted, processing can be terminated after N data records. Otherwise the stop operator reads in all data records and forwards the best N with the help of a priority queue .

There are various strategies for placing the STOP operator in the query plan. With a conservative strategy, it is placed as early as possible so that no data that may be needed later is intercepted. An aggressive strategy enables greater optimization by placing it earlier in the inquiry plan. However, the number N should be chosen large enough and a suitable RESTART operator added to continue the query if not enough results are returned.

Top N inquiries

To determine the best N results, a form of ranking is necessary in which all results are sorted on the basis of an evaluation criterion. The main problem is usually the determination of the evaluation criterion with which individual data sets can be compared.

Top N queries with fuzzy attributes

If attributes (properties) of the objects to be searched for are specified in a fuzzy manner in a search (for example "Search for a round, red object"), the answer consists of a fuzzy set (see fuzzy logic ) of objects, each of which is assigned a rating between 0 and 1 is. For example, the attributes of multimedia objects cannot always be specified exactly.

In the case of several fuzzy attributes, a fuzzy logic standard is calculated accordingly in order to obtain an overall rating. Ronald Fagin has proposed an algorithm for such queries that delivers an optimal result without having to look at all the attributes of all objects directly. It is assumed that individual attributes can be accessed directly if necessary (random access) and that the objects are sorted for each attribute (sorted access). The algorithm works in three phases:

  1. Sorted Access: The best objects are queried successively according to the ranking lists of the individual attributes, i.e. first the best object of each individual attribute, then the second best etc. The process is continued until N objects have appeared in all lists or no objects are left.
  2. Random Access: After the first phase, N objects with their complete attributes and further objects with some attributes are known. For the latter, the missing attributes are completed by direct access.
  3. Calculation and sorting: Since all the attributes of the objects determined in the first phase are now known, the overall rating can be calculated according to which the objects are sorted. The first N objects of this sorting are the result of the query.

See also

literature

  • Michael J. Carey, Donald Kossmann :, On Saying "Enough Already" in SQL . In: Proceedings of the ACM SIGMOD Conference on Management of Data. 1997
  • Ronald Fagin: Fuzzy Queries in Multimedia Database Systems . Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 1998