AVL tree

Fig. 1: AVL tree with balance factors (green)

AVL tree - space and time complexities
(n = number of nodes in the tree; navigation to the element not counted)
Space: O(n)
Operation      on average  worst case
Search         O(log n)    O(log n)
Cross step     O(1)        O(log n)
Min, max       O(log n)    O(log n)
Insert         O(1)        O(log n)
Delete         O(1)        O(log n)
Concatenate    –           O(log n)
Split          –           O(log n)

The AVL tree is a data structure in computer science. It is a binary search tree with the additional property that at every node the heights of the two child subtrees differ by at most one. This property lets its height grow only logarithmically with the number of keys and makes it a balanced binary search tree. The maximum (and average) number of steps (comparisons) required to determine the presence or absence of a key is directly related to the height. Furthermore, the maximum effort for the operations that insert and remove a key is proportional to the height of the tree and thus also logarithmic in the number of keys; the average effort is even constant if positioning on the target element is not counted.

Many operations, especially the navigation operations, can be taken over directly from binary search trees. For the modifying operations, however, the AVL criterion must be observed, which always involves small adjustments that can range up to height corrections by means of so-called rotations.

The AVL tree is named after the Soviet mathematicians Georgy Maximovich Adelson-Velsky and Evgenii Mikhailovich Landis, who presented the data structure in 1962. This makes the AVL tree the oldest data structure for balanced trees.

Motivation

Search trees - and thus also the AVL trees - are solutions to the so-called "dictionary problem". A basic explanation can be found in the section Motivation in the article Binary Search Tree .

What the search trees have in common is that they support dynamics, meaning that inserting and/or deleting elements is an important operation. Since these operations run the risk of losing the balance of the tree, various balance criteria for binary trees have been developed. For most of them, the effort for searching, inserting and deleting is logarithmic, at least on average, albeit with different constant factors.

With the AVL tree, the balance is defined by the height: it is a height-balanced binary search tree.

The AVL criterion

The balance factor of a node X (resp. of the subtree rooted at X) in a binary tree is defined as the height difference

    BF(X) := H(R) − H(L),

where H(T) is the height of the tree T, and L and R are the left and right child trees of X. A node with BF(X) = 0 is referred to as balanced (of equal height), one with BF(X) < 0 as left-heavy, and one with BF(X) > 0 as right-heavy.

A binary (search) tree is an AVL tree if and only if the AVL condition

    |BF(X)| ≤ 1

is observed at every node X.
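
Since the article does not fix a concrete node layout, the following minimal sketch, in the pseudo-C style of the code snippets below, is one possible reading: the names node, BF, childL and childR match those snippets, while height() and is_AVL() are helpers added here only for illustration.

 struct node {            // node of an AVL tree
     node* parent;        // null for the root
     node* childL;
     node* childR;
     int   BF;            // balance factor, in {-1, 0, +1}
     int   key;
 };

 // Height of a subtree, the empty tree having height 0;
 // only for checking - the AVL operations never recompute heights.
 int height(node* X) {
     if (X == null) return 0;
     int hL = height(X->childL), hR = height(X->childR);
     return 1 + (hL > hR ? hL : hR);
 }

 // Verifies the AVL condition |BF(X)| <= 1 (and the BF fields) at every node.
 bool is_AVL(node* X) {
     return X == null
         || (   height(X->childR) - height(X->childL) == X->BF
             && X->BF >= -1 && X->BF <= +1
             && is_AVL(X->childL) && is_AVL(X->childR));
 }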

Properties

This definition limits the height h of an AVL tree with n nodes (keys) by the inequality

    ⌊log₂ n⌋ + 1 ≤ h < c·log₂(n+2) + b

with c := 1/log₂ Φ ≈ 1.44 and b := (c/2)·log₂ 5 − 2 ≈ −0.328, where Φ := (1+√5)/2 ≈ 1.618 is the golden ratio.

The lower bound is attained by the complete binary tree (in which all levels are complete except possibly the lowest); the upper one by the Fibonacci tree, which for a given height is the AVL tree with the smallest number of nodes - i.e. has the greatest height for the same number of nodes - and is therefore the worst balanced (in terms of height). Averaged over all random insertions into an AVL tree of a given size, the height lies closer to the lower than to the upper end of this interval.
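
A worked example: for n = 1,000,000 keys the inequality yields ⌊log₂ 10⁶⌋ + 1 = 20 as the minimum height, and c·log₂(10⁶ + 2) + b ≈ 1.44·19.93 − 0.328 ≈ 28.4, i.e. at most 28, as the maximum height.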

Data structure

When operations for access and administration are added to the AVL tree data structure, they are regarded as belonging to it only if they maintain the AVL property. In this way, the data structure is expanded into an abstract data type (ADT). For search trees, the English characterization self-balancing tree describes this as well.

Operations

The most important operations on search trees - and thus on the AVL tree - are: searching on the one hand, and inserting and deleting on the other. Since all of these operations have logarithmic complexity in the worst case, the AVL tree belongs to the height-balanced binary search trees.

Navigate

The navigation operations - the various search operations, traversing and iterating, searching for the first or last element, and the like - leave the tree unchanged and in principle work on any binary search tree. The information given there on complexity also applies to AVL trees, with the specification that the height of the AVL tree is logarithmically related to the number of nodes.

The search (English: find, search, lookup or locate) for an element by means of its key is the most important of the navigation operations. The height balancing of the AVL tree optimizes for this operation. It enables so-called direct access (in contrast to the sequential access of traversal). It is usually employed as a preceding operation for both inserting and deleting.

Searching requires a total quasi-order on the set of keys that is most flexibly provided by a comparison function .
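
For example, a three-way comparison function for integer keys might look as follows, in the pseudo-C style of the snippets below (a sketch; any convention establishing the total quasiorder can be substituted):

 // returns < 0, 0, > 0 according to a < b, a equivalent to b, a > b
 int cmp(int a, int b) {
     return (a > b) - (a < b);
 }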

The mean runtime behavior of searching in a binary search tree is captured by the path length sum, which, divided by the number of nodes, defines the mean search depth. In the AVL tree this is proportional to the optimal search depth, with a proportionality factor only slightly greater than 1.

Insert

It is assumed that navigation to the insertion point has already taken place. An insertion point is an external leaf: it lies at the far left, at the far right, or between two internal (key-bearing) nodes, so it can always be specified by an (internal) node together with a direction (left or right). The leftmost or rightmost node is always a (half) leaf, just as at least one of two in-order neighbors is a (half) leaf. Such a node - together with the corresponding direction - is called the immediate insertion point. An unsuccessful search operation also returns it in this form.

For insertion (English: insert or add), the node with the new key is attached at the insertion point as a leaf - in other words: the external leaf becomes an internal one. The height of the subtree consisting of this leaf increases from 0 to 1. There are two different approaches to repairing the tree above the insertion point:

  1. One walks back up the path from the parent node, possibly as far as the root ("retracing" in the English literature). For this, a stack (with recursive programming, usually the program's call stack) must have been filled beforehand with the nodes on the path.
  2. During the descent, one notes the last (deepest) unbalanced (left- or right-heavy) node ("top-down strategy"). The final repair is then a second descent from this node. The comparisons can be repeated for this section, or the comparison results can have been saved beforehand in a stack (one bit per entry).

Here we show the "simpler" variant, which corresponds to the sequence of events in recursive programming. When ascending to the parent node (and later with every further ascent), there are 3 possibilities for the new (temporary) balance factor, according to the 3 possible values of the node's original balance factor:

  1. If the balance factor becomes 0, then the ascent came from the child tree that was previously lower, and the height of the node does not change - with the result that all balance factors above can remain as they are, and the AVL criterion is met for the whole tree.
  2. If the balance factor becomes ±1 (it must have been 0 beforehand), the height of the subtree increases by 1, and the checking of the balance factors above must continue.
  3. If the balance factor becomes ±2 at some level, the subtree must be rebalanced (see rebalancing below). After that, within an insert operation, the subtree has the same height as before - with the result that all balance factors above can remain as they are, and the AVL criterion for the entire tree is already met.

Inserting a new node into an AVL tree with n nodes costs logarithmic effort in the worst case (with or without searching), for example when every level up to the root has to be checked. However, since the probability of this decreases exponentially from level to level going upward, the pure modification effort (changing balance factors and rotations) is constant on average. It can even be shown that the modification effort is amortized constant in a pure insertion scenario.

Code snippets  

When the new node Z is inserted as a child of X, the height of this subtree Z increases from 0 to 1.

Loop invariant at insertion

The height of the subtree Z increases by 1. It is in AVL form.

 // Loop from the leaf up, possibly to the root:
 for (X = Z->parent; X != null; X = Z->parent) {
     if (Z == X->childR) { // The right subtree is the one that grows higher.
         if (X->BF <= 0) {    // X is not right-heavy
             if (X->BF < 0) { // X is left-heavy
                 X->BF = 0; // Z's height increase absorbed at X
                 break; // leave the loop
             }
             X->BF = +1;
             Z = X; // Height(Z) increases by 1
             continue;
         }
         else {              // X is right-heavy
             // ===> X->BF temporarily == +2
             // ===> rebalancing required
             G = X->parent; // save X->parent before rotation
             if (Z->BF < 0)                 // right-left situation   (see Figure 3)
                 N = rotate_RightLeft(X,Z); // double rotation: Right(Z) then Left(X)
             else                           // right-right situation  (see Figure 2)
                 N = rotate_Left(X,Z);      // single rotation Left(X)
         }
     }
     else { // Z == X->childL: the left subtree grows higher
         if (X->BF >= 0) {    // X is not left-heavy
             if (X->BF > 0) { // X is right-heavy
                 X->BF = 0; // Z's height increase absorbed at X
                 break; // leave the loop
             }
             X->BF = -1;
             Z = X; // Height(Z) increases by 1
             continue;
         }
         else {               // X is left-heavy
             // ===> X->BF temporarily == -2
             // ===> rebalancing required
             G = X->parent; // save X->parent before rotation
             if (Z->BF > 0)                 // left-right situation
                 N = rotate_LeftRight(X,Z); // double rotation: Left(Z) then Right(X)
             else                           // left-left situation
                 N = rotate_Right(X,Z);     // single rotation Right(X)
         }
     }
     // After the rotation: adapt the links N<->G;
     // N is the new root of the rebalanced subtree,
     // its height does not change: Height(N) == old Height(X)
     N->parent = G;
     if (G != null) {
         if (X == G->childL)
             G->childL = N;
         else
             G->childR = N;
     }
     else
         tree->root = N; // N is the new root of the whole tree
     break;

     // There is no fall-through here, only break; or continue;
 }
 // If the loop is not terminated by a break;,
 //   the whole tree grows higher by 1.
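
The loop above assumes that the new leaf Z has already been attached. A hedged sketch of that preceding step - descending to the immediate insertion point and attaching Z - in the same pseudo-C style (the tree type and the use of calloc are assumptions):

 node* avl_attach(tree* T, int key) {
     node* Z = calloc(1, sizeof(node)); // BF == 0, both children == null
     Z->key = key;
     if (T->root == null) {
         T->root = Z;
         return Z;
     }
     node* X = T->root;
     for (;;) { // descend to the immediate insertion point
         if (key < X->key) {
             if (X->childL == null) { X->childL = Z; break; }
             X = X->childL;
         } else {
             if (X->childR == null) { X->childR = Z; break; }
             X = X->childR;
         }
     }
     Z->parent = X;
     return Z; // now run the rebalancing loop above with this Z
 }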

Delete (remove)

One can navigate to the node to be deleted by means of a search, but also by means of a cross step. When deleting (English: delete or remove), more cases have to be distinguished than when inserting (see binary search tree). The cases in which the node to be deleted is a (half) leaf are simple. But if it has two children, the two subtrees that become detached must be hung up again. To do this, one chooses one of the in-order neighbors, i.e. either the rightmost node of the left child tree or the leftmost node of the right child tree, as the substitute parent on which to hang the two subtrees again. The descent required for this takes at most as many steps as the height, and on average exactly one. The replacement node is spliced into the hierarchy of the tree at the position of the deleted node, but must itself be removed from its subtree of origin - this is easy because it is a (half) leaf.
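
A sketch of the replacement step just described, for the variant that takes the leftmost node of the right child tree (the helper name is an assumption; the descent costs at most as many steps as the height):

 // Find the in-order successor of a node D that has two children:
 // the leftmost node of D's right child tree, always a (half) leaf.
 node* inorder_successor(node* D) {
     node* R = D->childR;
     while (R->childL != null)
         R = R->childL;
     return R; // to be unlinked from its place of origin
               // and spliced into D's position
 }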

If a leaf, that is, a subtree consisting of one node, is removed, its height decreases from 1 to 0, and if a leaf moves into the place of a half-leaf, the latter's height decreases from 2 to 1. All changes in height must be reflected at least in the AVL balance factors. When ascending to the parent node (and later with every further ascent), there are 3 possibilities for the new (temporary) balance factor, according to the 3 possible values of the node's original balance factor:

  1. If the balance factor becomes ±1 (it was previously 0), then the height of the subtree does not change - with the result that the balance factors above can remain as they are, and the AVL criterion is met for the entire tree.
  2. If the balance factor becomes 0, the height of the subtree decreases by 1, and the checking of the balance factors above must continue.
  3. If the balance factor becomes ±2, the subtree must be rebalanced (see rebalancing below). Afterwards, within a delete operation, the new subtree may have a height 1 lower than before - with the result that, further above, checks and if necessary rebalancings must continue.
    However, it also happens that the height is the same as before the deletion, so that the balance factors above can remain as they are, and the AVL criterion for the entire tree is already met.

In the worst case, the effort for deleting is logarithmic (with or without searching); on average, however - for example, if finding the deletion candidate does not have to be counted because it can be done otherwise - it is constant, since the probability of having to check the balance at the next higher level decreases exponentially going upward.

Code snippets  

When a node N is deleted as a child of X, the corresponding subtree of X decreases in height from 1 to 0, or from 2 to 1 if N still had an (inner) child.

Loop invariant upon deletion

The height of the subtree N is reduced by 1. It is in AVL form.

 // Loop from the leaf up, possibly to the root:
 for (X = N->parent; X != null; X = G) {
     G = X->parent; // save X->parent before a possible rotation
     if (N == X->childL) { // The left subtree is the one that shrinks.
         if (X->BF <= 0) {    // X is not right-heavy
             if (X->BF < 0) { // X is left-heavy
                 X->BF = 0;
                 N = X;       // Height(N) decreases by 1
                 continue;
             } // X->BF == 0
             X->BF = +1;      // N's height decrease absorbed at X
             break; // leave the loop
         }
         else { // X is right-heavy
             // ===> X->BF temporarily == +2
             // ===> rebalancing required
             Z = X->childR; // the sibling of N (higher by 2)
             b = Z->BF;
             if (b < 0)                     // right-left situation   (see Figure 3)
                 N = rotate_RightLeft(X,Z); // double rotation: Right(Z) then Left(X)
             else                           // right-right situation  (see Figure 2)
                 N = rotate_Left(X,Z);      // single rotation Left(X)
         }
     }
     else { // (N == X->childR): the right subtree shrinks
         if (X->BF >= 0) {    // X is not left-heavy
             if (X->BF > 0) { // X is right-heavy
                 X->BF = 0;
                 N = X;       // Height(N) decreases by 1
                 continue;
             } // X->BF == 0
             X->BF = -1;      // N's height decrease absorbed at X
             break; // leave the loop
         }
         else { // X is left-heavy
             // ===> X->BF temporarily == -2
             // ===> rebalancing required
             Z = X->childL; // the sibling of N (higher by 2)
             b = Z->BF;
             if (b > 0)                     // left-right situation
                 N = rotate_LeftRight(X,Z); // double rotation: Left(Z) then Right(X)
             else                           // left-left situation
                 N = rotate_Right(X,Z);     // single rotation Right(X)
         }
     }
     // After the rotation: adapt the links N<->G;
     // N is the new root of the rebalanced subtree
     N->parent = G;
     if (G != null) {
         if (X == G->childL)
             G->childL = N;
         else
             G->childR = N;
     }
     else
         tree->root = N; // N is the new root of the whole tree
 
     if (b == 0) // the case kept pale in Figure 2
         break;  // the height does not change: leave the loop.
 
     // Height(N) decreases by 1 (== old Height(X) - 1)
 }
 // If the loop is terminated via (N->parent == null),
 //   the height of the whole tree decreases by 1.

Rebalancing

Fig. 2: Single rotation Left(X)

If an operation results in a height difference of more than 1 between two sibling subtrees, the AVL criterion is violated at their parent node. A corresponding correction is called "rebalancing". So-called rotations are the suitable tool for this.

For insertions and deletions, where the temporary height difference never exceeds 2, two variants suffice: single and double rotation. A single rotation accomplishes the rebalancing when the inner child of Z, the sibling that is higher by 2 (Z in Figures 2 and 3) - that is, the child whose direction is opposite to that of Z (t23 in Figure 2 resp. Y in Figure 3) - is not higher than its sibling, the outer child (t4 in both figures). This case is referred to in the literature as a right-right (or, mirrored, left-left) situation, since X is right-heavy and Z is not left-heavy, i.e. the two balance factors are +2 and +1 (when deleting, also 0); in short: the imbalance has the same direction twice in a row. The other case, in which the inner child is higher, is covered by a double rotation - called a right-left (or left-right) situation in the literature, since X is right-heavy and Z is left-heavy, i.e. the two balance factors are +2 and −1; in short: the imbalance changes direction.

The keys only move “vertically” (within the vertical grid) during rotation. The in-order order of the nodes, which represents the sorting order (“horizontal”), is therefore completely retained.

A rotation involves only a constant number of link changes at a constant number of nodes.

Single rotation

In the original it is called Малое вращение (small rotation).

Figure 2 above shows a right-right situation. In the upper half, the two subtrees Z and t1 of the node X have a height difference of +2. The inner child t23 of the node Z, which is higher by 2, is not higher than its sibling t4.
This can occur after an insertion into the subtree t4 or after a deletion from the subtree t1. The case, kept pale in Figure 2, in which t23 has the same height as t4, occurs only with deletion.

The rebalancing (shown in the lower half of the figure) is achieved by a single rotation to the left. The 3 links to be adapted are drawn in the figure with stronger emphasis; the balance factors change at both nodes X and Z.

After an insertion, the height of the new subtree Z is the same as that of X before the operation. The same holds after a deletion if Z was not right-heavy. With a right-heavy Z, however, the height decreases by 1, and the checking of the balance factors above must continue.

The mirrored, left-left situation is handled by a single rotation to the right.

Code example rotate_Left
Input: X = root of the subtree to be rotated to the left
       Z = its right child, not left-heavy, higher than its sibling t1 by 2
Return value: the new root of the rebalanced subtree
 node* rotate_Left(node* X, node* Z) {
     // Z is higher than its sibling by 2
     t23 = Z->childL; // the inner child of Z
     X->childR = t23;
     if (t23 != null)
         t23->parent = X;

     Z->childL = X;
     X->parent = Z;

     // Does not occur with an insertion:
     if (Z->BF == 0) { // t23 was of the same height as t4
         X->BF = +1;   // t23 now higher
         Z->BF = -1;   // t4 now lower than X
     } else
     // end: not with an insertion
     {
         X->BF = 0;
         Z->BF = 0;
     }

     return Z; // return the new root of the rebalanced subtree
 }

Double rotation

Fig. 3: Double rotation RightLeft(X, Z) = Right(Z) + Left(X)

In the original it is called Большое вращение (big rotation).

The situation shown in Figure 3 differs from that in Figure 2 in that the middle child tree (the one with root Y), that is, the inner child of the node Z, which is higher by 2, is higher than its sibling t4: a right-left situation, since X is right-heavy and Z is left-heavy.
This can happen after the insertion of the node Y or after an insertion into one of the subtrees t2 or t3, or after a deletion from the subtree t1. The case in which the subtrees t2 and t3 have the same height while the subtree Y is higher than t4 occurs only with deletion.

The double rotation, whose result is shown in the lower third of the figure, is a right rotation through Z (against the left-heaviness of Z, middle third) followed by a left rotation through X (against the right-heaviness of X). It raises the higher (and inner) child Y of Z by two levels.

The 5 links to be adjusted are drawn in the figure with stronger emphasis; the balance factors change at all 3 nodes X, Y and Z.

After an insertion, the height of the new subtree Y is the same as that of X before the operation. After a deletion, however, the height decreases by 1, with the result that above it the checking of the balance factors must continue.

In a left-right situation, the mirrored version, i.e. a left rotation followed by a right rotation, is required.

Code example rotate_RightLeft
Input: X = root of the subtree to be rotated
       Z = its right child, left-heavy, higher than its sibling t1 by 2
Return value: the new root of the rebalanced subtree
 node* rotate_RightLeft(node* X, node* Z) {
     // Z is higher than its sibling by 2
     Y = Z->childL; // the inner child of Z
     // Y is higher than its sibling by 1
     t3 = Y->childR;
     Z->childL = t3;
     if (t3 != null)
         t3->parent = Z;
     Y->childR = Z;
     Z->parent = Y;
     t2 = Y->childL;
     X->childR = t2;
     if (t2 != null)
         t2->parent = X;
     Y->childL = X;
     X->parent = Y;

     if (Y->BF > 0) {      // t3 was higher
         X->BF = -1;       // t1 now higher
         Z->BF = 0;
     } else
     if (Y->BF == 0) {     // t2 and t3 of equal height
         X->BF = 0;
         Z->BF = 0;
     } else
     {                     // t2 was higher
         X->BF = 0;
         Z->BF = +1;       // t4 now higher
     }
     Y->BF = 0;

     return Y; // return the new root of the rebalanced subtree
 }

Example

The AVL tree in Figure 1 is rebalanced

  • after deleting the node »G« by the two single rotations Right(»F«), later Left(»J«)
  • after inserting a node »T« by the double rotation LeftRight(»V«, »S«)

Operations with whole AVL trees

The following two operations have entire AVL trees as input and output operands. They are not part of the standard set, and they are missing in some implementations. However, it will be shown here that they, too, can be carried out with logarithmic effort.

Concatenate

Fig. 4: Concatenation of 2 AVL trees

Two AVL trees can be concatenated (linked) with logarithmic effort (English: concat or just cat). For the sort order, of course, all keys of the first tree must precede all keys of the second.

Algorithm

One climbs down along the right edge of the first tree (see Figure 4 - the gray arrows show the way through the graph) and along the left edge of the second tree down to the leaves, and notes the nodes on these paths together with their heights.
Without loss of generality, let the first tree be the higher one (as in the figure). From the second tree, its first element H is removed in the AVL manner. It then plays the role of a "link". The height of the second tree is now h (possibly 0). Now one looks for a node G of height h+1 or h on the right flank of the first tree (one of the two must exist; if both exist, the higher one is preferred). One makes G a child of the link H, and the second tree the second child of H, which results in an AVL-conforming balance factor of 0 or −1 for H. Then H is attached as a child to F, the former parent of G, in G's place. The difference in height between the new H and E is between 0 and +2 (if E has height h−1, then G has height h, and the new H has height h+1). After an initial adjustment with a possible (double) rotation, the balance factors must be checked going upward, as with an insertion, and corrected if necessary.

The height of the tree can increase by 1.
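
A hedged sketch of the concatenation, in the common recursive "join" formulation: all keys of TL precede the key of the link H, which precedes all keys of TR; H is the first element removed beforehand from the second tree, as described above. For brevity, the sketch recomputes heights with the height() helper from the sketch in section "The AVL criterion"; a logarithmic implementation derives them from the balance factors while descending the flank. rebalance() stands for the repair step of the insertion section (rotation plus balance-factor update) and is assumed here.

 node* join(node* TL, node* H, node* TR) {
     if (height(TL) > height(TR) + 1) {   // descend the right flank of TL
         TL->childR = join(TL->childR, H, TR);
         TL->childR->parent = TL;
         return rebalance(TL);            // as after an insertion
     }
     if (height(TR) > height(TL) + 1) {   // mirrored: left flank of TR
         TR->childL = join(TL, H, TR->childL);
         TR->childL->parent = TR;
         return rebalance(TR);
     }
     // heights differ by at most 1: the link H becomes the root
     H->childL = TL; if (TL != null) TL->parent = H;
     H->childR = TR; if (TR != null) TR->parent = H;
     H->BF = height(TR) - height(TL);
     return H;
 }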

Split

Fig. 5: Splitting a binary tree.
Thick red dashed: the dividing line, specified by (node, direction).
Legend: child-parent relationships to be dissolved, between I_n and H_{n+1}; paths that do not cross the dividing line (from H_n to I_n =: A); paths that meet the dividing line twice (from I_n to H_{n+2}).


Somewhat more complicated than concatenation is the splitting (English: split) of an AVL tree into two separate AVL trees at an external node, that is, at a position between two internal nodes (keys), which is given as a pair (node, direction) together with the path from the node to the root. To its left is the left partition with the smaller keys, and to its right the right partition with the larger ones. The dividing line defined in this way (thick red dashed in Figure 5) cuts the edges of the tree along the path from the node to the root, so that a forest of "snippets" results on each side.

Each of the two forests defined in this way can be combined into an AVL tree with logarithmic effort, in such a way that the result of a subsequent concatenation of these two AVL trees is equivalent to the original tree with respect to the entries and their order.

Algorithm

The snippets are trees that are in AVL form except for a possible node ("stub") (H in Figures 5 and 6) at their upper edge with an out-degree of 1 (not necessarily the root of the snippet). It acts as the link when concatenating with the next lower snippet on the same side.

In the following, the indexing of the nodes is chosen so that on one side of the dividing line there are only even indices and on the other only odd ones. The indices grow with the ascent toward the root.

One performs complete induction, ascending alternately on the two sides of the dividing line.

Induction start (Figure 5):

If the given node (the one defining the dividing line) has a child in the given direction, then this child is on the other side of the dividing line, is in AVL form, and is called I_1. The given node (now a stub) is called H_2. H_2 is to be concatenated with a snippet I_0, which in this case is empty.

If the given node has no child in the given direction, then let H_2 be its lowest ancestor in that direction (i.e. on the other side of the dividing line). Let I_1 be H_2's child on the side of the given node - it is in AVL form. H_2 is a stub that is to be concatenated with a snippet I_0, which in this case is empty.

So I_0 (the empty tree) is the lowest snippet on one side and I_1 on the other. Both are in AVL form. The stub H_2 is the original parent of I_1 on the opposite side of the dividing line and the lowest ancestor of I_0 on the same side.

Fig. 6: An induction step when splitting an AVL tree

Induction step (see Figures 5 and 6):

In the n-th induction step, the lowest snippet is called I_n. Although the side changes with each induction step, for simplicity of the explanation it is assumed that I_n, like I in Figure 6, lies to the left of the dividing line.

According to the induction hypothesis, I is an AVL tree. Its lowest ancestor on the left is the stub H (=: H_{n+2}). (H's right child is on the other side of the dividing line, is already in AVL form, and bears the name I_{n+1}.) The original height of H is h (Figure 6 above). Severed parent edges (dashed black with arrow) and remaining stubs can be recognized, when climbing the path, by the change in child direction. The (original) heights must be tracked precisely by means of the balance factors. The subtree of H's single child D is concatenated with the snippet I, with H acting as the link. In this concatenation, the nodes D, F, H and I in Figure 6 play the same roles as the nodes of the same names in Figure 4. The height of D can increase by 1.

If H was the root, the algorithm ends with D, which is in AVL form, as the new root of the left tree. If H was a left child, D becomes the new lowest snippet on the left, is named I_{n+2}, and induction step n is finished.

If H was a right child, then D is made the right child of its former grandparent C, in H's place, so that it becomes the sibling of its former uncle B. The height difference to the latter is at most 3. The AVL balance of C can be established by rotations, whereby the height of C can decrease by 1. (The procedure for a height difference of 3 is similar to that for a height difference of 2: if the inner child of the higher sibling is lower than the outer child, the AVL balance can be achieved with a single rotation; if it is higher, with a double rotation. If both children are of equal height, it depends on the balance factor of the inner child: if it is balanced, one double rotation suffices; if it is outer-heavy, two rotations; if it is inner-heavy, three rotations may be needed.)

One then looks on the left side, in ascending order, for the first ancestor A (=: I_{n+2}) in the path that is a left child (or the root). Its parent H_{n+3} is the lowest ancestor of I_{n+1} on the right side. On the left, on the levels up to A, the balance factors must be checked as after a deletion and adjusted if necessary, whereby the height can decrease by 1. Thus I_n is integrated into the snippet A in such a way that A (or another node that has moved into its place as root in the event of a rotation) is again an AVL tree - and constitutes the lowest snippet I_{n+2} for induction step n+2 on the left side.

For the following induction step n+1, however, the sides are switched: right and left are swapped, and the roles of I_n, H_{n+2} and I_{n+2} pass to the nodes I_{n+1}, H_{n+3} and I_{n+3}.

The edges visited within an induction step (gray arrows in Figure 6) concern only the levels between the roots of two snippets lying one above the other on the same side (I_{n+2} above I_n, and I_{n+3} above I_{n+1}). The induction steps on the left and on the right side interlock like a zipper: the ascent on the right is followed, when concatenating, by a descent and a re-ascent on the left side, followed by a further ascent on the left, which is followed by a descent and a re-ascent on the right side, and so on. Each level is visited at most three times, each time with constant effort. So the total effort for splitting is proportional to the total height.

Application examples

The bulk deletion of all keys in a contiguous range (interval) can be done by splitting twice and concatenating once or, if the smallest or the largest element belongs to the range, by splitting once. In a similar way, an interval can be extracted from an AVL tree as an AVL tree with logarithmic effort.

A bulk insertion can be carried out by splitting once and concatenating twice, provided the set to be inserted is prepared as an AVL tree and its keys fit into an interval that does not occur in the target tree.
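
An API-level sketch of the two bulk operations just described; the signatures of split, concat and free_tree are assumptions:

 // Bulk deletion of the interval [lo, hi]: two splits, one concatenation.
 void delete_range(tree* T, int lo, int hi) {
     tree left, middle, right;
     split(T, lo, &left, &middle);            // keys < lo | keys >= lo
     split(&middle, hi + 1, &middle, &right); // keys <= hi | keys > hi
     free_tree(&middle);                      // exactly the keys in [lo, hi]
     concat(&left, &right, T);
 }

 // Bulk insertion of a prepared AVL tree S whose keys fall into a
 // gap of T just below lo: one split, two concatenations.
 void insert_range(tree* T, tree* S, int lo) {
     tree left, right;
     split(T, lo, &left, &right);
     concat(&left, S, &left);
     concat(&left, &right, T);
 }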

Implementation

In addition to what a node of a binary search tree requires, the balance factor with its 3 values must be accommodated in each node: that makes 2 bits.

In particular, if the processor prefers or enforces correctly aligned pointers , the balance bits can be absorbed by a pointer field in the node so that they do not require an extra memory word.
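
A sketch of this packing, assuming nodes aligned to at least 4 bytes, so that the two lowest bits of a child pointer are always 0 and thus free (the field name childL_tagged and the helpers are assumptions):

 typedef unsigned long word; // wide enough for a pointer (cf. uintptr_t)

 node* childL(node* X) {     // strip the 2 balance bits
     return (node*)((word)X->childL_tagged & ~(word)3);
 }
 int BF(node* X) {           // tag values 0, 1, 2 encode -1, 0, +1
     return (int)((word)X->childL_tagged & (word)3) - 1;
 }
 void setBF(node* X, int bf) {
     X->childL_tagged =
         (node*)(((word)X->childL_tagged & ~(word)3) | (word)(bf + 1));
 }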

Otherwise, the same recommendations apply to the implementation of AVL trees as to binary search trees in general. The special features of the AVL cursor are discussed explicitly below.

Cursor

When searching, a pair (node, direction) is produced which is suitable for specifying the insertion point when inserting. When deleting, the node to be deleted is identified by the node component, and the direction component can indicate where the cursor should move after the deletion. When traversing, the node indicates the starting point and the direction the desired direction of navigation, in order to arrive at such a pair again as the result. In this way, all important operations generate and/or consume a construct which (in analogy to databases, for example) is called a cursor.

The size of the cursor depends on whether the AVL nodes contain a pointer to the parent or not.

  1. Parent pointer present: A pair (node, direction) constitutes a fully-fledged cursor.
    A cursor becomes invalid after an operation (one that does not maintain the cursor) if and only if the node of this cursor is deleted.
    The percentage surcharge in storage requirements definitely buys a percentage saving in runtime, since the way back to root and head is always secured.
  2. Pointer to the parent node not present ("cursor with stack"): In addition to the pair (node, direction), the path from the node to root and head must be kept in the cursor. The length of the cursor thus corresponds to the maximum height of the tree.
    For all operations, access to the parent node via the stack in the cursor is slightly more expensive than via a parent pointer. If the path in the cursor is to be kept valid even after a modifying operation (e.g. for sequential insertions or deletions), an additional percentage surcharge (constant on average, logarithmic in the worst case) is added. However, this can only be done for one cursor, the input cursor.
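
The two variants as structs - a sketch with illustrative field names; MAX_HEIGHT is taken from Tab. 4 in the remarks (86 suffices for 64-bit addressing):

 enum direction { LEFT, RIGHT };

 struct cursor {            // variant 1: nodes carry parent pointers
     node*          nd;
     enum direction dir;
 };

 #define MAX_HEIGHT 86

 struct cursor_with_stack { // variant 2: no parent pointers in the nodes
     node*          nd;
     enum direction dir;
     node*          path[MAX_HEIGHT]; // ancestors of nd up to the root
     int            depth;
 };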

In his technical report AVL dags, G. Myers describes a procedure by which several versions of the same AVL tree can be arranged in a sequence and interwoven in such a way that a logarithmic time and memory requirement per version is not exceeded. The AVL tree is then a so-called persistent data structure.

Separation of navigating from modifying operations

The introduction of the cursor allows the modularization of the navigating and the modifying operations. This exposes the on-average sublogarithmic (read: constant) cost of the latter, since a climb all the way up to the root is required only in exceptional cases. In applications with a strong sequential component, this can have a positive effect on the runtime.

Parallelization

With iterative programming, the checking and repair loop can also be run through in the opposite direction, that is, "anticipating" from head and root down to the leaf. This is particularly interesting if the tree is accessed concurrently to a high degree. For example, in a "find and insert" scenario, the search operation would remember the lowest (last) height-unequal node on the path, lock the tree from there on, and keep the comparison results. To complete the insertion, the locked ancestor (possibly after a rebalancing) would have to be set to balanced, and all of its descendants down to the leaf would have to be set to the balance factors corresponding to the directions taken. The advantage: all nodes outside the locked subtree can be consistently inspected, and even changed, by the other processor cores in parallel with the ongoing insertion.

Comparison with red-black tree

The set of AVL trees is a proper subset of the set of red-black trees, for every binary tree that fulfills the AVL criterion can be colored in a way that fulfills the red-black criterion.

Proof
4 small AVL trees in red-black coloring.
At the split node, both colors are possible.

Assertion: If the AVL tree has even height h, then it can be colored with black depth h/2 with a black root; if h is odd, then with black depth (h+1)/2 and a black root, or with (h−1)/2 and a red root. (The NIL nodes are not counted here.)

Proof: The assertion is correct for small heights (see the figure). For larger even h, there is a child tree of odd height h−1, which by the induction hypothesis can be colored with a red root and black depth (h−2)/2. If the other child tree has the same height, the same applies to it. If it is lower, then its height h−2 is even; it can therefore be colored with the same black depth (h−2)/2 (with a black root). After the new root has been colored black, the black depth in the composite tree is (h−2)/2 + 1 = h/2 on all paths. For larger odd h, one of the child trees has the even height h−1 and can be colored with a black root and black depth (h−1)/2. If the other child tree is lower, then its height h−2 is odd, and it can likewise be colored with the same black depth (h−1)/2 and a black root. If the new root is colored black, the black depth becomes (h−1)/2 + 1 = (h+1)/2. However, since both child trees have black roots, the new root can also be colored red, in which case a black depth of (h−1)/2 results. ∎
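
The coloring from the proof can be written as a small sketch (the color field and the height() helper from the earlier sketch are assumptions): a node becomes red exactly when its height is odd and its parent sits directly above it with even height; in particular, the root becomes black.

 enum color { RED, BLACK };

 void color_red_black(node* X) {
     if (X == null) return;
     int h = height(X);
     if (h % 2 == 1 && X->parent != null && height(X->parent) == h + 1)
         X->color = RED;   // odd height directly under an even-height parent
     else
         X->color = BLACK;
     color_red_black(X->childL);
     color_red_black(X->childR);
 }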

No AVL tree

But there are red-black trees that do not meet the AVL criterion. The figure opposite shows, for example, a red-black tree with 6 (inner) nodes and external path length sum 21, while 20 is the largest external path length sum among AVL trees (and at the same time the smallest possible among all binary trees) of this size. Consequently, the worst-case height of the AVL tree is also smaller than that of the red-black tree, asymptotically by the factor c/2 ≈ 0.72, with c ≈ 1.44 as above and the factor 2 from the height proof of the red-black tree. In general, AVL trees are regarded as better balanced and their search time behavior as more favorable.

The storage space requirements are practically identical: 1 bit for the color versus 2 or 1 bit(s) for the balance factor.

The space requirements and the runtime behavior for the operations listed are identical on average and in the worst case. This also applies to the amortized constant modification cost when inserting. The red-black tree offers amortized constant modification cost when deleting as well, whereas the AVL tree is there only constant on average. Applications of binary search trees that contain only sequences of insertions and deletions - without any intervening search operations - would miss the purpose of the search tree. If one disregards this type of application, the overall behavior of a mixture of searches and modifications is, for both data structures, amortized, on average, and in the worst case, logarithmic.

Realistic application situations with performance data and comparisons - also with other search algorithms and kinds of data structures - can be found in Ben Pfaff. In 79 measurements, his results show, among other things, the very great similarity of AVL trees (AVL) and red-black trees (RB), with runtime ratios AVL⁄RB between 0.677 and 1.077, a median of ≈ 0.947 and a geometric mean of ≈ 0.910.

Applications

As a basic tool of computer science, binary search trees have a large field of application, in which they are also highly interchangeable. Specific examples and selection criteria can be found in Binary search tree #Applications.

See also

Literature

Web links

Commons : AVL Trees  - collection of images, videos and audio files

References

  1. Thomas H. Cormen et al.: Introduction to Algorithms. 2nd edition, MIT Press, Cambridge (Massachusetts) 2001, p. 296.
  2. GM Adelson-Velsky , EM Landis : Один алгоритм организации информации . In: Doklady Akademii Nauk SSSR . 146, 1962, pp. 263-266. (Russian)
    English translation by Myron J. Ricci: An algorithm for the organization of information. ( Memento of February 1, 2014 in the Internet Archive ) (PDF) In: Soviet Mathematics 3, 1962, pp. 1259–1263.
  3. In the English literature mostly “balance factor”, as in Donald E. Knuth: The Art of Computer Programming. Volume 3, Sorting and Searching, 2nd Edition, Addison-Wesley, 1998, p. 459; “Height balance” at Kurt Mehlhorn: data structures and efficient algorithms . Teubner, Stuttgart 1988, p. 295.
  4. ^ Donald E. Knuth: The Art of Computer Programming. Volume 3, Sorting and Searching, 2nd edition, Addison-Wesley, 1998, p. 460.
  5. Ben Pfaff Performance Analysis of BSTs in System Software p. 2
  6. See Donald Knuth: The Art of Computer Programming p. 460.
  7. Donald Knuth takes this route in The Art of Computer Programming, p. 462.
  8. so Ben Pfaff in An Introduction to Binary Search Trees and Balanced Trees § 34.
  9. a b Dinesh P. Mehta, Sartaj Sahni (Ed.) Handbook of Data Structures and Applications 10.4.2
  10. ^ According to Robert Sedgewick, Kevin Wayne: Algorithms. Fourth Edition. (PDF) Pearson Education, 2011, ISBN 978-0-321-57351-3, p. 410 (English, accessed on March 25, 2018), the proposal was made in 1962 by T. Hibbard. In An Introduction to Binary Search Trees and Balanced Trees p. 39, Ben Pfaff always chooses the latter - if available. But one can also take the former, or the possibly higher one of the two subtrees.
  11. The figures correspond to Donald Knuth The Art of Computer Programming p. 461 Case1 (single rotation) and Case2 (double rotation).
  12. ^ A b Donald E. Knuth: The Art of Computer Programming , Volume 3, Sorting and Searching, 2nd edition, Addison-Wesley, 1998, p. 474
  13. Costin S, who implements AVL trees with concat and split operations in his AvlTree.cs project, requires the full absolute height to be recorded in each node. But - as shown - one can get by with the balance factors without increasing the effort.
  14. Ben Pfaff gives an object with very similar functionality the name "traverser" and offers a standard and a traverser variant for searching, inserting and deleting. ( An Introduction to Binary Search Trees and Balanced Trees p. 15 and Performance Analysis of BSTs in System Software p. 3)
  15. ^ Eugene W. Myers : AVL dags . In: Technical Report, Department of Computer Science, U. of Arizona . 82-9, 1982. Quoted from: Neil Sarnak, Robert E. Tarjan : Planar Point Location Using Persistent Search Trees Archived from the original on October 10, 2015. In: Communications of the ACM . 29, No. 7, 1986, pp. 669-679. doi : 10.1145 / 6138.6151 . Retrieved December 25, 2014.
  16. "Top-Down-Strategy" at Kurt Mehlhorn: data structures and efficient algorithms . Teubner, Stuttgart 1988, pp. 197-198.
  17. ^ Paul E. Black: AVL tree. In: Dictionary of Algorithms and Data Structures. National Institute of Standards and Technology , April 13, 2015, accessed July 2, 2016 .
  18. ^ Kurt Mehlhorn , Peter Sanders : Algorithms and Data Structures. The Basic Toolbox . Springer, Berlin / Heidelberg 2008, ISBN 978-3-540-77977-3 , doi : 10.1007 / 978-3-540-77978-0 . P. 165.
  19. ^ Ben Pfaff: Performance Analysis of BSTs in System Software. Stanford University 2004.

Remarks

  1. Since the maximal subtree rooted at a node - "its" subtree - can be assigned to the node in a one-to-one manner, properties such as height, balance, parent, child and sibling relationships etc. are, for brevity, attributed to a node and to its subtree interchangeably. Where the relationship to the parent plays a role, the left (resp. right) child tree is referred to specifically. Conversely, for example when a subtree is relocated, its root functions as the "handle".
  2. In principle, you can also use the mirrored difference. The text follows Knuth and many other authors.
  3. The recursion equation for the height is H(X) = 1 + max(H(L), H(R)).
  4. All data relevant for inserting and deleting can be computed from the balance factors (≤ 2 bits per node); knowledge of the absolute heights is not required.
  5. According to the definition in Binary search tree #Fig. 1A, in which the node levels are counted and the tree consisting only of the root has height 1.
  6. If the number n of nodes of the Fibonacci tree goes to infinity, then its mean search depth behaves asymptotically like c·log₂ n with c ≈ 1.04, so it deviates only by about 4 percent from the ideal value of the complete binary tree. (However, in terms of search depth, the Fibonacci trees are not necessarily the worst balanced: e.g. there is an AVL tree of height 5 with 20 nodes (left subtree complete of height 4, right a Fibonacci tree of height 3) and external path length sum 97, compared with 96 for the Fibonacci tree of height 6 with the same number of nodes. These AVL trees are harder to characterize than the Fibonacci trees.)
  7. a b Let q be the number of nodes that are not leaves and whose two child trees have the same height, and p the number of nodes with child trees of unequal height; then p/(p+q) is the proportion of height-unequal nodes. (Only the Fibonacci trees and the other thinnest AVL trees consist purely of height-unequal nodes, i.e. reach the maximum of this proportion, whereas only the complete binary trees reach its minimum. The mean value over all possible shapes of AVL trees lies in between.) The following situation is assumed recursively: an insertion has caused the height of the subtree in which it took place to increase by 1. It is also assumed that it is a right child tree (Figures 2 and 3 match this; the mirrored alternative of a left child tree is left to the reader and leads to the same values). When checking the situation at the parent node of this subtree, the following cases must be distinguished:

    BF before insertion | frequency (relative to p+q) | provisional BF | rebalancing required | BF afterwards | subtree becomes higher by | level above to check
    −1 (left higher)    | p/2 | 0  | no  | 0  | 0 | no
     0 (equal height)   | q   | +1 | no  | +1 | 1 | yes
    +1 (right higher)   | p/2 | +2 | yes | 0  | 0 | no
    Tab. 2: After an insertion: the new balance factors (BF) depending on the old ones

    The probability of having to check the next higher level after an insertion is thus Q := q/(p+q). Assuming that these probabilities are the same for all levels, the probability that at least i levels have to be checked is Q^(i−1), and that exactly i levels have to be checked is Q^(i−1)·(1−Q). The average number of levels to be checked thus sums to

        Σ_{i≥1} i·Q^(i−1)·(1−Q) = 1/(1−Q).

    This function in Q is monotonic, from 3 (for Q = 2/3 and complete binary trees) down to 1 (for Q = 0 and Fibonacci trees). After an insertion (with correction of the balance factor), the average number of levels still to be checked is thus between 3 and 1.

  8. As with insertion, the tree can also be repaired from top to bottom ("top-down") when deleting. For the sake of simplicity, only the repair from the bottom up is explained here. This is also how it works with recursive programming.
  9. a b The same assumptions are made about the proportion of height-unequal nodes as for insertion.
    The following situation is assumed recursively: a deletion has caused the height of the subtree in which it took place to decrease by 1. Furthermore, without loss of generality, that subtree is a left child tree. When checking the situation at the parent node of this subtree, the following cases must be distinguished:

    BF before deletion | frequency (relative to p+q) | provisional BF | rebalancing required | BF afterwards | subtree becomes lower by | level above to check
    −1 (left higher)   | p/2 | 0  | no  | 0    | 1 | yes
     0 (equal height)  | q   | +1 | no  | +1   | 0 | no
    +1 (right higher)  | p/2 | +2 | yes | +1 ¹ | 0 | no
                       |     |    |     | 0 ²  | 1 | yes
    ¹ The penultimate line relates to the case of a single rotation with children t23 and t4 of equal height.
    ² The last line relates to rotations with children of the higher node Z of unequal height (see Figures 2 and 3), i.e. a double rotation (left child Y higher) resp. a single rotation with the right child t4 higher.
    Tab. 3: After a deletion: the new balance factors (BF) depending on the old ones

    In the event of a deletion, the next higher level has to be checked with a probability that again depends only on the proportions of p and q. Assuming that this probability is the same for all levels, the mean number of levels to be checked again sums up as a geometric series, as with insertion. This function grows monotonically from its value for the complete binary trees to its value for the Fibonacci trees. After a deletion (with correction of the balance factor), on average less than approximately one additional level has to be checked.

  10. Note that the first partial rotation of a double rotation can at first worsen the situation in the affected subtree Y; only the second, whose axis of rotation lies outside Y, puts things right. As far as the balance factors are concerned, neither the first nor the second partial rotation corresponds to an AVL single rotation (see above).
  11. a b 1 bit, if it is recorded in the children. If the bit is given the meaning "height jump above", then the height along a path can be computed without knowing the child direction. Nevertheless: for the modification operations it is cleverer to have the 2 balance bits next to each other in one byte.
  12. As a data structure accommodated in homogeneous working memory (main memory), the size of an AVL tree is limited, and with it the height of the tree and the lengths of the paths from leaf to root. Main memories with 32-bit and 64-bit addresses of 8-bit bytes are common.

    Pointer width in bits | maximum number of addressable bytes | minimum node length (2 pointers + 1 byte + user data¹) | maximum number of nodes | number of nodes of the next larger Fibonacci tree | ... and its height
    32 | 4294967296           | 2·4+1+3 = 12 | 3.6·10⁸  | 433494436           | 41
    64 | 18446744073709551616 | 2·8+1+7 = 24 | 7.7·10¹⁷ | 1100087778366101930 | 86
    ¹ including the key. With more user data and less main memory, the maximum number of nodes and the heights decrease accordingly.
    Tab. 4: Maximum height depending on the addressing width

    Conclusion: with AVL trees it is justifiable to provide cursors with stacks so large that an overflow cannot occur.

  13. dag stands for "directed acyclic graph".
  14. In fact, this requires a chain of lock(child) followed by unlock(parent) steps down to the first height-unequal node, which initially remains locked until a new pair of lock(height-unequal node) and unlock(ancestor) takes over. (lock(node) actually means lock(subtree). And: in the event of a potential rebalancing, the lock must even be held one level higher.) Once the insertion point has been reached, the correction and unlocking loop can be started - for maximum parallelism again as a child-parent chain.
    If all concurrent players comply with this directionality of the locking protocol, no circular dependency, and therefore no deadlock, can arise.
  15. In a complete binary tree, a k-fold loop consisting of the insertion of an additional node and the deletion of that same node requires, each time, the adjustment of balance factors all the way up to the root, and thus an effort in Θ(k·log n), i.e. amortized Θ(log n) per modification.