Search tree

In computer science , a search tree is an abstract data structure in which the set of elements to be searched for is displayed in a tree structure . As an associative array , or a hash table , he realizes a finite function ( English : map ), in which a search key (from the finite set of definitions ) a data value (from the set of values is obtained). In the absence of a set of values, the tree implements an indicator function , i.e. it corresponds to a finite set ( English : set ).

For reasons of efficiency, most of the set has a total quasi-order , which can also be a total order . A left-right orientation in the tree corresponds to this order in the following way: "to the left" of a node there are no larger elements and "to the right" no smaller elements. This enables a principle called “divide and rule” in the form of “binary search” , which allows search times that are logarithmic in the number of search keys.

There are search trees in both static (unchangeable) and dynamic (changeable by inserting and deleting elements) characteristics, for which the tree structure is particularly suitable.

Operations

The characteristic operation is searching . Most of the other operations, such as insert , delete and traverse, are inherited from the underlying tree structure .

The search operation returns an element with a matching key or, if the key does not exist, the NULL element or another element that is closest in terms of total order .

The maximum effort for searching, i.e. H. the maximum number of required comparisons is proportional to the tree height in the present total order . A tree is called balanced if care is taken that the number of elements is always logarithmic . Without such precautions, the search tree can degenerate up to the unfavorable case that the search effort becomes proportional to . ${\ displaystyle h}$ ${\ displaystyle h}$ ${\ displaystyle n}$ ${\ displaystyle n}$

Special search trees

Binary search tree

A binary search tree is a node-based data structure in which each node contains a key and a maximum of two subtrees , the left and the right. All keys of the left subtree are smaller than the key of the node and those of the right subtree are larger. Each subtree is a binary search tree. Many variants have been developed to counteract degeneration.

The time complexity for the search in a binary search tree is in the worst case the height of the tree , which can be as small as for a tree with elements . ${\ displaystyle n}$ ${\ displaystyle {\ mathcal {O}} (\ log \ n)}$

The following search trees are not binary trees:

B-tree

B-trees are generalizations of binary search trees because they can have a variable number of subtrees at each node . While child nodes have a predefined area, they don't necessarily get populated with data, which means that B-trees can potentially waste some space. The advantage is that B-trees do not have to be rebalanced as often as other self-balancing trees.

Because of the variable range of their node length, B-trees are optimized for systems that read large blocks of data. They are also widely used in databases .

The time complexity for searching a B-tree is . ${\ displaystyle {\ mathcal {O}} (\ log \ n)}$

(a, b) tree

An (a, b) tree is a search tree in which all leaves have the same depth. Every node has at least one and at most children , while the root has at least 2 and at most children. ${\ displaystyle b}$ ${\ displaystyle b}$

${\ displaystyle a}$ and can be determined using the following formula: ${\ displaystyle b}$

{\ displaystyle 2 \ leq a \ leq {\ frac {(b + 1)} {2}}}

The time complexity for searching for an (a, b) -tree is . ${\ displaystyle {\ mathcal {O}} (\ log \ n)}$

Ternary search tree

A ternary search tree is a tree that can have 3 nodes : Each node stores a single character and the tree itself is arranged in the same way as a binary search tree , with the exception of a possible third node.

When searching a ternary search tree, a character string is passed to test whether a path contains it.

The time complexity for searching in a balanced ternary search tree is . ${\ displaystyle {\ mathcal {O}} (\ log \ n)}$

Other tree-based search structures

Although complexity theoretic information is to be understood asymptotically , it can best be transferred to practice if the entire data structure is in a uniform medium, e.g. B. the main memory (main memory) should be accommodated.

If, on the other hand, access to external media plays a role, further considerations come into play. In the context of the search trees, reference is made to the articles:

Storage complexity

The memory consumption of a search tree is - in addition to the user data - generally linear in the number of elements, i.e. in . ${\ displaystyle n}$ ${\ displaystyle {\ mathcal {O}} (n)}$

Runtime complexity

In the runtime , a distinction operations to search, insertion and deletion. In the case of the latter two, the runtime for positioning is not included in the table , as this is not only done via the search, but e.g. B. can also be done by traversing . Depending on the type of search tree, asymptotic mean maximum ( worst case ) runtimes are compared, the maximum being omitted if it matches the mean. ${\ displaystyle <}$

Runtime complexities
Search tree	Search (= tree height)	Traversing to the neighboring node	Insert	Clear
AVL tree	${\ displaystyle {\ mathcal {O}} (\ log n)}$	${\ displaystyle {\ mathcal {O}} (1) \! <\! {\ mathcal {O}} (\ log n)}$	${\ displaystyle {\ mathcal {O}} (1) \! <\! {\ mathcal {O}} (\ log n)}$	${\ displaystyle {\ mathcal {O}} (1) \! <\! {\ mathcal {O}} (\ log n)}$
Red-black tree	${\ displaystyle {\ mathcal {O}} (\ log n)}$	${\ displaystyle {\ mathcal {O}} (1) \! <\! {\ mathcal {O}} (\ log n)}$	${\ displaystyle {\ mathcal {O}} (1) \! <\! {\ mathcal {O}} (\ log n)}$	${\ displaystyle {\ mathcal {O}} (1) \! <\! {\ mathcal {O}} (\ log n)}$
2-3-4 tree	${\ displaystyle {\ mathcal {O}} (\ log n)}$		${\ displaystyle {\ mathcal {O}} (\ log n)}$	${\ displaystyle {\ mathcal {O}} (\ log n)}$
B-tree	${\ displaystyle {\ mathcal {O}} (\ log n)}$		${\ displaystyle {\ mathcal {O}} (\ log n)}$	${\ displaystyle {\ mathcal {O}} (\ log n)}$
Splay tree	${\ displaystyle {\ mathcal {O}} (\ log n)}$ ¹ ${\ displaystyle \! <\! {\ mathcal {O}} (n)}$	${\ displaystyle {\ mathcal {O}} (1) \! <\! {\ mathcal {O}} (n)}$	${\ displaystyle {\ mathcal {O}} (\ log n)}$ ¹ ${\ displaystyle \! <\! {\ mathcal {O}} (n)}$	${\ displaystyle {\ mathcal {O}} (\ log n)}$ ¹ ${\ displaystyle \! <\! {\ mathcal {O}} (n)}$
binary search tree	${\ displaystyle {\ mathcal {O}} (\ log n)}$ ² ${\ displaystyle \! <\! {\ mathcal {O}} (n)}$	${\ displaystyle {\ mathcal {O}} (1) \! <\! {\ mathcal {O}} (n)}$	${\ displaystyle {\ mathcal {O}} (1)}$	${\ displaystyle {\ mathcal {O}} (\ log n)}$ ² ${\ displaystyle \! <\! {\ mathcal {O}} (n)}$
¹ Depending on the access pattern of the application, splay trees can also have sub-logistic mean runtimes . ² Among the unbalanced binary search trees, there are trees that cannot be guaranteed. However, the information applies on average across all possible tree shapes. ${\ displaystyle {\ mathcal {O}} (\ log n)}$ ${\ displaystyle {\ mathcal {O}} (\ log n)}$

In addition to the aforementioned operations , others can be considered:

Access to special items, such as B. the smallest element
Merging search trees

Runtime measurements of some search algorithms based on realistic examples can be found at Ben Pfaff .

literature

Alexander Reinefeld: Spielbaum search procedure. (= Artificial Intelligence sub-series. Volume 200). Springer, 1989, ISBN 3-540-50742-6 .
A. Banzhaf, U. Boes, M. Kramer: Control of recognition processes by tree search methods. In: H. Burkhardt, KH Höhne, B. Neumann (ed.): Pattern recognition. (= Computer Science Reports. Volume 219). Springer, Berlin / Heidelberg 1989, ISBN 3-540-51748-0 .

Web links

Ben Pfaff: Performance Analysis of BSTs in System Software. (PDF; 316 kB). Stanford University , 2004. (English)

Individual evidence

^ Toal, Ray. "(a, b) Trees"

[1] Toal, Ray. "(a, b) Trees"