Range minimum query

from Wikipedia, the free encyclopedia
Figure 1: Range minimum request for an integer array

Range Minimum Queries ( RMQs ) address the problem of answering a query for the smallest element within a specified range of an array in computer science . An efficient answer is relevant to other problems in computer science, such as B. for text indexing and compression, flow graphs etc.

definition

Let be an array of length with elements of a universe of total order , then return (with and ) the position of the smallest element within the specified interval. Figure 1 illustrates the request for an example sequence.

Trivial machining

There are two trivial solutions for solving problems, which are either space or time inefficient.

  • Linear scanning: By scanning of a query, for each linear request time in worst-case and a space of achieved.
  • Pre-calculation of a lookup table: A table is naively pre-calculated which stores the answers to all possible range minimum requests. A constant query time is thus achieved, with combinations requiring space.

It is assumed, however, that it is a static array and that the inquiries are made on-line, whereby a clever pre-calculation and the construction of a suitable data structure can significantly reduce the inquiry time. Here, an almost constant query time with linear space consumption is aimed for.

Efficient construction

The precomputed RMQ data structures can be divided into two categories: (a) array indexing and (b) array coding. In case (a) the precalculated structure needs access to the array in order to answer RMQs. In case (b) it is not necessary to access to answer an RMQ. The solutions presented belong to the category of array indexing.

In the following, it is shown step by step how the given problem can be reduced to a complexity of (linear space requirement / pre-calculation time, constant query time) by means of clever pre-calculation .

Linearithmic space requirement

Figure 2: Two partial inquiries for the RMQ

In MA Bender et al. (2005) describe one way to achieve this. For this purpose, the results are precalculated for all possible starting positions and lengths of the shape . Because of this, the actual RMQ is broken down into a maximum of two partial queries of equal length, the lengths of which are powers of two and possibly overlap. The greatest possible power of two is chosen, where is. Due to the precalculation, the result can be determined over the interval of in constant time. Figure 2 illustrates the procedure and shows an example of how the RMQ (interval = red line) is broken down into two RMQs (gray lines). Be the table with the pre-calculated answers. The algorithm comprises a total of three steps to determine the result of an RMQ:

At this point it can be seen that the specified solution belongs to category (a) array indexing, because the smaller of the two minimums of the partial requests must be determined and two accesses to are required for this.

analysis

Figure 3: Pre-calculation

Based on the precalculation, an RMQ can be answered in. The additional data structure contains a maximum of precalculated minima for each position within the array , because in the worst-case scenario there are powers of two from the first array position to the end. This reduces the space requirement from to . The precalculation requires a running time of steps by means of dynamic programming , whereby the following recurrence is solved: (see Figure 3) in order to calculate the corresponding table . The time required for the pre-calculation is in .

Linear space requirement

Figure 4: Block formation and storage of the minima

The solution presented comes from Johannes Fischer and Heun (2011). The space requirement is reduced to, with the input array being broken down into blocks of length . OBdA is . The block formation is shown visually in Figure 4, with the example broken down into four blocks. The RMQ can now be used in max. three parts are disassembled:

  • A request that spans entire blocks ( ).
  • Two queries, each relating to a part of a block [ in-block ] ( , ).

Due to the processing of the max. three partial inquiries you get the respective minimum over the spanned area. In the last step, the specific values ​​are loaded and compared. The position of the smallest of the max. three values ​​are output as a result. At this point, too, it becomes evident that the specified solution is to be assigned to category (a) array indexing.

Request for complete blocks

The RMQ, which terminates at block boundaries ( i.e. comprises complete blocks ), can be answered using the construction from Section 3.1 in two overlapping queries. Save the minimum of each block in a list (Figure 4), and be . The known construction is applied to this, which results in a space requirement of

results. In summary, it can be said that for this case an RMQ can be calculated in constant time, the auxiliary data structure linearly requiring a lot of space.

Inquiries about block parts

Figure 5: Example of Cartesian trees

Each block is a Cartesian tree assigned. A Cartesian tree is constructed according to the following recursive rule and can be seen in Figure 5 as an example.

  1. Root: position of the minimum in
  2. Left child: Cartesian tree from
  3. Right child: Cartesian tree from

Lemma: If the Cartesian trees of two arrays have the same structure, then applies . The interested reader can read the proof of the lemma in Johannes Fischer and Heun (2011, p.472).

Due to the lemma, there is a reduction for in-block RMQs, since not each block is precalculated individually, but only the answers for all possible Cartesian trees over elements are sufficient, because each block can be broken down into one in linear time (Johannes Fischer and Heun 2011) Map the Cartesian tree, as the tree construction is amortized in . The number of trees corresponds to the Catalan number . The bit patterns of the trees serve as an index for the precalculated in-block table . A Cartesian tree is clearly represented by its level-order unary degree sequence (LOUDS), which is described by Jacobson (1989) and requires bits. Overall, this results in a space requirement of .

As has been shown, the problem can be reduced to linear space requirements and constant request time by breaking down the original RMQ into three sub-requests. The minimum over completely covered blocks is calculated with the help of the scheme from section 3.1, for in-block queries blocks are mapped onto Cartesian trees, the results of which, fully precalculated, do not require much more space for input than linear.

Applications

Two use cases for RMQs are presented below. Further applications are e.g. B. in Johannes Fischer and Heun (2007, p.3) to read.

Lowest Common Ancestor

Figure 6: Reduction from LCA to RMQ

An LCA request for a tree and two nodes delivers either (or ), if this is on the path from the root to (or ), or the node at which the common path to and ends.

As was first described by Gabow, Bentley, and Tarjan (1984), the LCA problem can be reduced linearly to the RMQ problem (and vice versa, see Bender and Farach-Colton (2000)). This is relevant, since LCA queries can thus also be solved in constant time and linear space requirements with regard to the input ( ), as shown for example in J. Fischer and Heun (2007).

The reduction includes a Euler tour (or an in-order Baumtraversal ) in time over the LCA-tree for imaging in an array , wherein for each element in a corresponding Traversalnummer in is stored by the descent to be decremented (or incremented during the ascent ), see Figure 6. The array linearly requires a lot of space because the number of edges of the original tree is limited by its nodes. The previously described precalculation for RMQs can now be carried out for the array . To solve the LCA problem, the following applies :, where denotes the position of the input nodes within .

Longest common prefix

In text indexing, RMQs can be used to find LCPs (Longest Common Prefix) in the pattern search (or Longest Common Suffix by RMQ on the flipped text), whereby the LCP calculates a common prefix of the suffixes that begin in at position and .

For this purpose, the LCP array of the suffix array is used and an inverse suffix array is calculated, which supplies the starting position in of the -th suffix in the suffix array . Based on these two structures the length of the LCP can now be calculated using the following rule in constant time: .

Individual evidence

  1. a b c d e Fischer, J. and V. Heun: Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays . In: SIAM J. Comput. 40 (apr) . 2011, pp. 465-492. doi : 10.1137 / 090779759 .
  2. ^ Gál, A. and Miltersen, P .: The Cell Probe Complexity of Succinct Data Structures . In: Automata, Languages ​​and Programming . 2003, pp. 190-190. doi : 10.1007 / 3-540-45061-0_28 .
  3. ^ Bender, MA, M. Farach-Colton, G. Pemmasani, S. Skiena, and P. Sumazin: Lowest common ancestors in trees and directed acyclic graphs . In: Journal of Algorithms . 57, No. 2, November 2005, pp. 75-94. ISSN  0196-6774 . doi : 10.1016 / j.jalgor.2005.08.001 .
  4. ^ Jacobson, G .: Space-efficient static trees and graphs . In: Foundations of Computer Science, 1989., 30th Annual Symposium . 1989, pp. 549-554. doi : 10.1109 / SFCS.1989.63533 .
  5. ^ A b Fischer, J. and V. Heun: A new succinct representation of RMQ information and improvements in the enhanced suffix array . In: Combinatorics, Algorithms, Probabilistic and Experimental Methodologies . 2007, pp. 459-470. doi : 10.1007 / 978-3-540-74450-4_41 .
  6. Gabow, HN, JL Bentley, and RE Tarjan: Scaling and related techniques for geometry problems . In: Proceedings of the sixteenth annual ACM symposium on Theory of Computing . 1984, pp. 135-143. doi : 10.1145 / 800057.808675 .
  7. ^ Bender, MA and M. Farach-Colton: The LCA Problem Revisited . In: LATIN 2000: Theoretical Informatics . 2000, pp. 88-94. doi : 10.1007 / 10719839_9 .
  8. ^ Fischer, J. and V. Heun: Theoretical and practical improvements on the RMQ problem, with applications to LCA and LCE . In: Combinatorial Pattern Matching . 2006, pp. 36-48. doi : 10.1007 / 11780441_5 .