Weighted binary search tree

A binary search tree with 2 nodes and weight information (red)

In computer science, a weighted binary search tree is a variant of the abstract data structure binary search tree in which each node is assigned a weight (access probability) in addition to its key and other data. (For completeness, weights are also assigned to the intervals between neighboring keys.)

The goal is to minimize the weighted path length of the tree.

The weight is linked to the key, so allowing multiple objects ("duplicates") with the same key does not make sense.

If the weights are not known at all, or if they are practically equal, height-balanced trees are a good choice. One example is the AVL tree, which can be viewed as optimized for the weighted path length with unit weights.

Examples

If the tree is static, i.e. insert and delete operations are irrelevant, the Bellman algorithm can be used to construct an optimal weighted binary search tree. The resulting tree remains efficient even if the weights are known only roughly.
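A static construction of this kind can be sketched with the standard dynamic program over key ranges. This is an illustration under assumptions, not necessarily the cited algorithm in every detail: the function name `optimal_bst` and the parameter names `p` (weights of the keys in sorted order) and `q` (weights of the gaps between neighboring keys, including the two outer gaps) are hypothetical, and the cubic-time formulation is chosen for readability rather than speed.

```python
from typing import List, Sequence, Tuple


def optimal_bst(p: Sequence[float], q: Sequence[float]) -> Tuple[float, List[List[int]]]:
    """Return (minimal weighted path length sum, root table).

    p[k-1] is the weight of key x_k (k = 1..n); q[j] is the weight of the
    gap between x_j and x_{j+1} (j = 0..n). These names are assumptions.
    """
    n = len(p)
    if len(q) != n + 1:
        raise ValueError("q must have exactly one more entry than p")

    # w[i][j]: total weight of keys x_{i+1}..x_j and gaps q_i..q_j.
    # c[i][j]: minimal cost of a search tree over exactly those keys and gaps.
    # root[i][j]: index r such that x_r is the root of one such optimal tree.
    w = [[0] * (n + 1) for _ in range(n + 1)]
    c = [[0] * (n + 1) for _ in range(n + 1)]
    root = [[0] * (n + 1) for _ in range(n + 1)]

    for i in range(n + 1):
        w[i][i] = q[i]  # empty key range: a single leaf, which costs 0 at depth 0

    for size in range(1, n + 1):
        for i in range(n - size + 1):
            j = i + size
            w[i][j] = w[i][j - 1] + p[j - 1] + q[j]
            # Try every key x_r in the range as root. Hanging both subtrees
            # below the root pushes everything one level down, which adds
            # the total weight w[i][j] to the cost.
            best_r = min(range(i + 1, j + 1), key=lambda r: c[i][r - 1] + c[r][j])
            root[i][j] = best_r
            c[i][j] = w[i][j] + c[i][best_r - 1] + c[best_r][j]

    return c[0][n], root
```

Called with the integer frequencies `p = [1, 3, 3, 0]` and `q = [4, 0, 0, 3, 10]`, the sketch returns a cost of 48 with `x_3` as root. Divided by the total weight 24, this gives 2, the value stated for the example access distribution in the last section of this article.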

Geometric distribution

For the geometric weight distribution $p_i = (1-q)\,q^i$ for $i = 0, 1, \ldots$ with $0 < q < 1$, we have $\sum_{i \geq 0} p_i = 1$. Let a binary tree be built recursively as follows: the key with the greatest remaining weight becomes one of the two children and thereby the root of the next subtree. Since no key is left on its side, the other child remains empty. Such a binary tree has the constant weighted path length $\sum_{i \geq 0} (i+1)\,p_i = 1/(1-q)$, although it corresponds to a linear list. If the order of the keys fits this binary tree exactly (so that it is a binary search tree), then it is optimal for $q < \tfrac{1}{2}$: the root of each subtree then carries more than half of that subtree's weight, so demoting it would worsen the mean value. In that case there are very rare search queries that even the optimal weighted binary search tree answers only in linear time.
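The closed form for the list-shaped tree can be checked numerically. The following is a minimal sketch: the function name `chain_weighted_path_length` and the truncation parameter `terms` are assumptions made for illustration, and the infinite sum is cut off after finitely many keys.

```python
def chain_weighted_path_length(q: float, terms: int = 2000) -> float:
    """Sum (i + 1) * p_i for p_i = (1 - q) * q**i over the first `terms` keys.

    Key i sits at depth i in the list-shaped tree, so finding it takes
    i + 1 comparisons.
    """
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    return sum((i + 1) * (1 - q) * q**i for i in range(terms))


for q in (0.25, 0.5, 0.9):
    # The truncated sum and the closed form 1 / (1 - q) agree closely.
    print(q, chain_weighted_path_length(q), 1 / (1 - q))
```

The truncation error shrinks geometrically with `terms`, so 2000 terms are ample for the sample values of `q` used here.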

Natural vocabularies

In English, the probability of occurrence of the $i$-th most common word is approximately

$$\alpha_i \approx i^{-1.12} \Big/ \sum_{k \geq 1} k^{-1.12}.$$

The weighted path length of an optimal binary search tree for all English words is approximately $10.2$.

Dynamic weights

If insert or delete operations matter, the weights must in principle be maintained as well. In the extreme case this applies even to searches, since every search changes at least the access statistics.

Mehlhorn describes "almost optimal binary search trees".

In splay trees, despite a completely different approach, the most frequently accessed nodes also end up near the root.

Access distribution and weighted path length

Fig. 4: (Optimal) binary search tree with weights (red).

Let $X := \{x_1 < x_2 < \ldots < x_n\}$ be a set of keys from a totally (quasi-)ordered reservoir $R$ of keys, and let $p_i$ and $q_j$ be the frequencies with which an element $x \in R$ is accessed, where $x \in \overline{x_i}$ for $1 \leq i \leq n$ and $x_j < x < x_{j+1}$ for $0 \leq j \leq n$, respectively. (Here $x_0 := -\infty$ and $x_{n+1} := +\infty$ are additional elements not belonging to $R$, with the usual meaning.) The $(2n+1)$-tuple

$$\mathfrak{z} := \left( \begin{smallmatrix} & p_1 && p_2 && \cdots && p_n & \\ q_0 && q_1 && q_2 && \cdots && q_n \end{smallmatrix} \right)$$

is called an access distribution for the set $X$ if all $p_i, q_j \geq 0$. $\mathfrak{z}$ becomes an access probability distribution if $\textstyle \sum p_i + \sum q_j = 1$.

Now let $T$ be a search tree for the set $X$ with access distribution $\mathfrak{z}$, and let $a_i^T$ be the depth of the (inner) node $x_i$ and $b_j^T$ the depth of the leaf $(x_j, x_{j+1})$ (see Fig. 4; binary search tree terminology in Fig. 1B). Consider the search for an element $x \in R$. If $x = x_i$, we compare $x$ with $a_i^T + 1$ elements in the tree. If $x_j < x < x_{j+1}$, we compare $x$ with $b_j^T$ elements in the tree. Thus

$$S_{\mathfrak{z}}^T := \sum_{i=1}^{n} p_i \,(a_i^T + 1) + \sum_{j=0}^{n} q_j \, b_j^T$$

is the weighted path length sum of the tree $T$ (with access distribution $\mathfrak{z}$); if $\mathfrak{z}$ is a probability distribution, then $S_{\mathfrak{z}}^T$ is the weighted path length, the weighted search depth, or the mean number of comparisons required. Fig. 4 shows a search tree $T$ which, with a value of $S_{\mathfrak{z}}^T = 2$, is optimal for the access distribution $\mathfrak{z} := \tfrac{1}{24} \left( \begin{smallmatrix} & 1 && 3 && 3 && 0 & \\ 4 && 0 && 0 && 3 && 10 \end{smallmatrix} \right)$.
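The definition of the weighted path length sum can be evaluated directly for a concrete tree. The sketch below makes several assumptions: a tree is encoded as nested `(left, right)` tuples with `None` for a leaf, keys and gaps are identified by their in-order position, and the function name `weighted_path_length` is hypothetical. The tree shape in the example is likewise an assumption, chosen because it attains the stated value; it is not claimed to be the exact tree drawn in Fig. 4.

```python
from fractions import Fraction
from typing import Optional, Sequence, Tuple

# A tree is None (a leaf, i.e. an unsuccessful search) or a pair (left, right).
Tree = Optional[Tuple["Tree", "Tree"]]


def weighted_path_length(tree: Tree, p: Sequence[Fraction], q: Sequence[Fraction]) -> Fraction:
    """Return sum p_i * (a_i + 1) + sum q_j * b_j for the given tree."""
    total = Fraction(0)
    next_key = 0  # in-order position of the next inner node (0-based index into p)
    next_gap = 0  # in-order position of the next leaf (index into q)

    def visit(node: Tree, depth: int) -> None:
        nonlocal total, next_key, next_gap
        if node is None:
            total += q[next_gap] * depth  # leaf at depth b_j costs b_j comparisons
            next_gap += 1
            return
        left, right = node
        visit(left, depth + 1)
        total += p[next_key] * (depth + 1)  # node at depth a_i costs a_i + 1
        next_key += 1
        visit(right, depth + 1)

    visit(tree, 0)
    if next_key != len(p) or next_gap != len(q):
        raise ValueError("tree shape does not match the number of weights")
    return total


# Example access distribution from the text (all weights divided by 24).
p = [Fraction(k, 24) for k in (1, 3, 3, 0)]
q = [Fraction(k, 24) for k in (4, 0, 0, 3, 10)]

# Assumed shape: x3 is the root, x1 its left child with right child x2,
# and x4 the right child of the root.
x2 = (None, None)
x1 = (None, x2)
x4 = (None, None)
x3 = (x1, x4)

print(weighted_path_length(x3, p, q))
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared with the value 2 given in the text without rounding concerns.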

Literature

Kurt Mehlhorn: Datenstrukturen und effiziente Algorithmen. Teubner, Stuttgart 1988, ISBN 3-519-12255-3.

References

↑ After Mehlhorn, op. cit., p. 147.