R tree

An example of an R-tree

3-dimensional R-tree with cube-shaped sides, visualized by ELKI

An R-tree ( English R-tree ) is an in database systems dynamic used multi-dimensional (spatial) index structure . Similar to a B-tree , this is a balanced index structure.

Usage

An R-tree allows a quick search in multidimensional, extended objects. In addition to simple points, this also includes polygons in two-dimensional space and geometric bodies in three-dimensional space. However, high-dimensional objects such as those found in mathematics or bioinformatics are also typical . In this context, high-dimensional means that the number of independent search criteria is in the range of 100–1,000,000. However, empirical values say that an R-tree only works well up to around 10 dimensions. However, this depends on the specific data, and there are R-tree variants that contain optimizations.

An R-tree can efficiently answer range queries (i.e., rectangular or interval queries). The evaluation of next-neighbor queries is also efficiently possible for geometric distances that can be limited with rectangles. An R-tree is dynamic; that is, it can be updated efficiently when changes are made.

A typical area of application for an R-tree index are geodatabases . Here z. B. street data, border areas or building data are managed. These are saved as polygons in the database. The R-tree later allows the efficient location of specific polygons based on the location coordinates . Since rectangular queries (e.g. map sections) are typical here, an R-tree is well suited for this.

The R-tree has fewer advantages for subspace queries (queries in which not all dimensions are specified - very large query areas arise here) or for non-geometric distances (which cannot be translated into rectangular areas). Other index structures are more suitable here.

realization

The principle of an R-tree as a spatial index structure was introduced by Antonin Guttman. Similar to a B + tree , leaf nodes of an R tree contain the spatial data to be indexed. In contrast to this, however, it does not contain any separating keys, but instead saves rectangular data regions ( minimally surrounding rectangles ) in the index nodes , which include all the data regions below in the subtree. Data points, rectangular areas or polygons can be stored in the data pages. The latter are often approximated by a minimally surrounding rectangle.

The storage of a rectangle is realized by specifying intervals for all its dimensions. With the help of an R-tree, point or abstention queries such as “is point P contained in figure F?”, “Is figure F1 contained in figure F2?” Or “which figures are contained in figure F?” Can be answered. The approximation used (minimally surrounding rectangles) in the leaf nodes can lead to false hits, which must be resolved by checking the query on the specific objects. In the case of polygons, the calculation on the rectangles is called a “filter step”, and the test with the exact polygon is called the “refinement step”. The actual index structure only provides candidates here.

R * tree

A popular R-tree variation is the R * tree by Norbert Beckmann, Hans-Peter Kriegel , Ralf Schneider and Bernhard Seeger. This variant tries to minimize the overlapping of rectangular regions by means of a further developed split strategy. As a result, fewer subtrees usually need to be searched for a search query, and the queries to the tree are therefore more efficient. In addition, if a page overflows, elements can be re- inserted into the tree , which can avoid a split (and the tree's height that may increase as a result). As a result, a higher degree of filling is achieved and thereby also improved efficiency. The resulting tree is always an R-tree, too, and the query strategy is unchanged.

R + tree

In the R-tree and R * -tree, the search spaces could overlap. This is not allowed in the R + tree (the search spaces are disjoint here). Enclosing rectangles are "separated" at the search area boundaries if necessary. Objects can also be "cut up" so that part of the object is in one search area and another part of the object is in the other search area. In such cases, objects have to be stored several times in the tree. R + trees can be easily precalculated for static data (especially for point data, where such a slicing is not necessary). However, when changes are made, it becomes more difficult to ensure disjointness.

Problems with R-trees

Like any index structure, R-trees represent a compromise. The management and updating of the tree requires additional computations, and additional memory is required for the index pages. Conversely, the tree can often find the data records you are looking for more quickly (with suitable queries and data). If you accept increased memory consumption and increased computing time for changes, you can often achieve better performance in search queries. An R + tree will usually require more memory than an R or R * tree. It delivers better search performance, but it takes more effort to make changes.

R * -trees are a popular choice because they do not require additional memory (compared to an R-tree; in contrast to the R + -tree, the objects are saved multiple times if necessary), and their implementation and computation costs are not significantly higher than with an R. -Tree. More precisely, an R-tree in memory is also always an R * -tree, they only differ in the insert and split strategy.

R-trees are sequence-dependent, an important optimization measure are therefore “bulk loads”, in which the tree is not built up by inserting it element by element, but rather an attempt is made to construct an improved tree directly. To this end, the elements are often sorted first and then the tree is constructed from the bottom up . If an R-tree is filled element by element in an unfavorable order, it can have a (geometrically) unbalanced structure. An optimized R-tree has approximately the same depth, but its regions overlap less and therefore cover the data space more evenly.

In higher dimensions the so-called curse of dimensionality occurs. In the case of the R tree, this means that the resulting rectangular regions overlap to ever larger parts, which means that the search has to process ever larger parts of the tree. This reduces the performance of the index structure. The X-tree by Stefan Berchtold, Daniel A. Keim and Hans-Peter Kriegel represents an important variation of the R-tree concept for higher dimensionalities. An important measure here are so-called “supernodes”, which are enlarged pages from X-tree can be used to avoid unfavorable splits (with a high overlap). In extreme cases, it can degenerate into a linear file, which can be desirable.

While an R-Tree is designed as a dynamic index structure, systematic insert or delete operations can degrade its performance. In such cases it can make sense to rebuild the tree from time to time using a bulk load in order to optimize it. There are also incremental optimization strategies (such as the re-inserts in the R * tree) with which a database system can improve the index during idle times.

Availability

A reference implementation of the R * -tree is available in the software package ELKI of the chair of Professor Hans-Peter Kriegel , including implementations of an M-tree and other comparison methods.

An R * tree can also be found, for example, in the database systems SQLite (for 2–5 dimensions) and Oracle (for 2–4 dimensions).

With PostgreSQL , the index type “rtree” can be used, there is an alternative implementation called “rtree_gist” via the generalized search tree interface.

Web links

Splitting Rules (English)
R-Tree Demonstration ( Java applet )

Individual evidence

^ A. Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching . In: Proc. ACM SIGMOD International Conference on Management of Data . 1984, doi : 10.1145 / 602259.602266 ( PDF ).
^ N. Beckmann, Hans-Peter Kriegel , R. Schneider, B. Seeger: The R * -Tree: An Efficient and Robust Access Method for Points and Rectangles . In: Proc. 1990 ACM SIGMOD International Conference on Management of Data . 1990, p. 322–331 , doi : 10.1145 / 93597.98741 ( PDF ).
^ S. Berchtold, DA Keim, Hans-Peter Kriegel : The X-Tree: An Index Structure for High-Dimensional Data . In: VLDB Conference . 1996, p. 28-39 ( PS ).

[1] A. Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching . In: Proc. ACM SIGMOD International Conference on Management of Data . 1984, doi : 10.1145 / 602259.602266 ( PDF ).

[2] N. Beckmann, Hans-Peter Kriegel , R. Schneider, B. Seeger: The R * -Tree: An Efficient and Robust Access Method for Points and Rectangles . In: Proc. 1990 ACM SIGMOD International Conference on Management of Data . 1990, p. 322–331 , doi : 10.1145 / 93597.98741 ( PDF ).

[3] S. Berchtold, DA Keim, Hans-Peter Kriegel : The X-Tree: An Index Structure for High-Dimensional Data . In: VLDB Conference . 1996, p. 28-39 ( PS ).