B + tree

from Wikipedia, the free encyclopedia

The B + tree is a data or index structure used in databases and file systems . It is an extension of the B-tree . In a B + tree, the actual data elements are only stored in the leaf nodes , while the inner nodes only contain keys. The keys in the directory pages are also known as separators.

A simple example of a B + tree that links data values ​​1-7. The red knots allow quick in-order traversal.

The B + tree is sometimes referred to as the B * tree for historical reasons . However, B * -tree also denotes a B-tree variant with a minimum filling level of 2/3 thanks to an improved split strategy.

The aim of this procedure is to improve the access times to the data elements. To do this, you have to reduce the tree height , which means that the degree of branching of the tree must increase. Since the maximum memory usage of a node is limited, moving the data to the leaves gives you more space for keys or branches in the inner nodes. This is especially true when storing complex objects that use significantly more memory than the keys or that do not have a fixed size. The reduced tree height also implies fewer internal nodes. This makes it easier to keep these in the main memory, which increases the performance in random access.

Another goal can be to improve the range search operation , in which all data is sequentially traversed in a certain key interval. If the data is only stored in the leaves, the next data record in the sequence does not have to be searched from the root again. So only the first key has to be searched for a complete run of the data, a large part of the tree is not read. In order to find successors and predecessors of a leaf node efficiently (i.e. in constant time), the leaves must be linked together in a doubly linked list . This feature is often included in the definition of the B + tree.

The main advantage of an external search tree (data only in the leaves) is the possibility of using secondary indices . They provide a further search tree - which can be sorted according to other criteria - on the same data.

structure

Each node consists of search keys and pointers. Sheet successors and (optionally) sheet predecessors are saved in each sheet. The remaining pointers in the leaf nodes each point to the data elements that are represented by the search key.

It is also possible to assign different sizes to middle nodes and root / leaf nodes. This is called a tree of the type , where the size of the central node and the size of the root and leaf nodes denote. The following applies .

Example:

  • All nodes have a maximum of 3 search key values ​​and a maximum of 4 pointers
  • The root has at least one value, i.e. 2 pointers
  • Inner nodes have at least 1 search key and 2 pointers
  • Leaves have at least 2 search keys and 3 pointers

regulate

  • Root: at least two pointers occupied
  • Middle node: minimum fill level pointer
  • Leaves: minimum fill level pointer (point to data blocks, pointers to neighboring nodes are not counted)

The following applies to middle and root nodes: The pointer that starts on the left below a search key leads to a node whose largest search key value is less than the search key value. The pointer on the right under a search key leads to a node whose smallest search key value is greater than or equal to the search key value.

Operations

Search

Example based on the graphic above:

  • Search for value "9": Start in the root node, value is greater than or equal to 5, so go right along the pointer, search through leaf, not found, search unsuccessful.
  • Search for value "2": Start in the root node, value is less than 3, so go left along the pointer and search through leaf, found, search successful.

Insert

There are basically two options for inserting new search keys:

  1. There is room in the sheet concerned.
  2. There is no place.

In the first case, the value can simply be entered in the sheet (see Search) .

In the second case, the value is added to the “virtual” sheet. Now there is an overfilling of the sheet. So it has to be shared. It should be noted that if the number of search keys in a sheet is odd, the left sub-sheet has one more search key value than the right (example: n = 4, insert 1, 3 values ​​on the left, 2 values ​​on the right). This can possibly cause a chain reaction that has to be propagated up the tree, since the middle and root nodes have to be adjusted. In this chain reaction, if a parent node is overcrowded, the middle element is promoted one level up. This repeats itself until there is enough space or the B + tree has to be expanded by one level (depth).

Clear

The value is searched for in the tree. If it is found, then the value is removed. Any changes must be propagated through the tree. It should be noted that underfilled nodes must be merged with others or keys of sibling nodes must be redistributed. There are quite different techniques for merging. It should be most common to merge the knot with its left neighbor (if there is no left neighbor, then the right one) and, if necessary, to divide it again according to the above rule in case of overcrowding.

advantages

  • The B + tree is always balanced
  • higher degree of branching than the B-tree, and thus a smaller directory size and depth
  • Ideally suited for area inquiries (in the database system), as all sheets are always sorted and linked by pointers so that they can be iterated easily
  • Reference keys do not necessarily correspond to a real key and therefore only have to be deleted in some cases when deleting a corresponding node.

variants

An important variant of the B + tree allows the use of keys and data with variable lengths. To do this, the concept of filling level has to be redefined, since larger keys naturally require more storage space than small keys. The goal of the tree remains to keep all pages at least half full, and the corresponding balancing operations continue to be carried out. In the case of deletion processes, however, the minimum filling level can be fallen below if a separator that is too large prevents the corresponding balancing operation. Only the minimum filling level can then be guaranteed for directory pages without complicated restructuring measures.

By using a varint coding, the degree of branching of a B + tree can usually be increased significantly if the implementation allows variable lengths.

With keys of variable length, a B + tree can also be used as a trie (prefix B + tree).

See also

literature

  • Alfons Kemper, André Eickler: Database systems. Oldenbourg Verlag, Munich 2009, ISBN 978-3-486-59018-0 , p. 218 (old edition)
  • Alfons Kemper, André Eickler: Database systems. Oldenbourg Verlag, Munich 2015, ISBN 978-3-11-044375-2 , p. 228 (10th edition)

Web links