Bottom-up heapsort

from Wikipedia, the free encyclopedia

BottomUp Heapsort is a sorting algorithm that may a. It was introduced by Ingo Wegener in 1990 and works better than Quicksort on average , if comparative operations are weighted sufficiently. It is a variant of Heapsort that is particularly suitable for sorting very large amounts of data if (compared to the necessary swapping operations) a relatively high effort is required per comparison operation.

However, this variant was already analyzed in 1986 by Svante Carlsson, who finally found another variant that even has a worst-case runtime of only n log n + O ( n log log n ) comparisons. This bottom-up heapsort was discovered by Robert W Floyd (during a revision of the original heapsort by Williams), but he was unable to prove the runtime behavior of this variant.

In the worst case, it only needs 1.5 n log n + 2 n key comparisons to sort a sequence of n keys. On average, BottomUp heapsort only needs n log n + O ( n ) key comparisons.

Sorting principle

When sorting with normal heapsort, two comparisons are made for each level when an element is lowered:

  1. Which of the two successor nodes of the element to be lowered is larger?
  2. Is the successor node now determined larger than the element to be lowered?

Since half of a binary tree consists of leaves and, in addition, former leaves with already rather low values ​​are lowered when sorting, it is likely that an element will have to be lowered to or near the leaf level. It can therefore be worthwhile to forego the second comparison and, if there is any suspicion, to lower it to the leaf level.

In a second step, a backward check is made to determine how far the element has to be lifted again. In the best case (very large fields with only a few duplicates ), almost half of the comparisons required can be saved with moderate additional effort.

BottomUp Heapsort is less suitable for sorting smaller fields with a simple numerical comparison operation and when a large number of elements in the field are equivalent (then the assumption that they usually have to be lowered close to the leaf level is not correct).

algorithm

Specifically, the heapsort algorithm is changed as follows with regard to lowering:

First the path in which the root element is to be sunk is determined. This is done by determining the largest child in each case (path of maximum children). Then this particular path is traversed from bottom to top (from the leaf towards the root). Each node visited is compared to determine whether it is larger than the root element to be lowered. If so, the root element is copied to the position of the node visited last and the rest of the path is moved up one level beforehand.

Alternatively, the shift can be carried out from the start on suspicion down to the sheet level and later - if necessary - undone. Where copies can be made relatively cheaply (e.g. because only one pointer is copied), this can be advantageous overall.

example

Heap : [9, 23, 24, 20, 18, 14, 17, 13, 15, 11, 10, 5, 7, 3, 2]

Tree structure:

            9
       /         \
      23         24
    /   \       /  \
   20   18     14  17
  / \   / \   / \  / \
 13 15 11 10  5  7 3  2

The element 9 must be lowered because it is smaller than at least one successor node. First, the path of the maximum children (starting from the root) is determined. The result is 9 - 24 - 17 - 3 . The algorithm now runs through this path from bottom to top, i.e. 3 -> 17 -> 24 -> 9 . Now the path starting from the leaf node (3) is traversed until a node is found that is greater than 9 . The run ends at 17 . Now all nodes from 17 to the successor node of the root (= 17 -> 24) are moved up one level and node 9 is moved to the position of 17 . As a result, 17 and 24 as successor nodes and 9 as root nodes change their place.

Heap : [24, 23, 17, 20, 18, 14, 9, 13, 15, 11, 10, 5, 7, 3, 2]

Tree structure:

           24           -------|                      24
       /         \             |                 /         \
      23         17            9                23         17
    /   \       /  \           |               /   \      /   \
   20   18     14      <-------|              20   18    14    9
  / \   / \   / \  / \                       / \   / \   / \  / \
 13 15 11 10  5  7 3  2                     13 15 11 10 5  7  3  2

implementation

Simple example implementation in C99 :

 int heapsort_bu( int * data, int n ) // zu sortierendes Feld und seine Länge
 {
   int val, parent, child;
   int root= n >> 1;                  // erstes Blatt im Baum
   int count= 0;                      // Zähler für Anzahl der Vergleiche

   for ( ; ; )
   {
     if ( root ) {                    // Teil 1: Konstruktion des Heaps
       parent= --root;
       val= data[root];               // zu versickernder Wert
     }
     else
     if ( --n ) {                     // Teil 2: eigentliche Sortierung
       val= data[n];                  // zu versickernder Wert vom Heap-Ende
       data[n]= data[0];              // Spitze des Heaps hinter den Heap in
       parent= 0;                     //   den sortierten Bereich verschieben
     }
     else                             // Heap ist leer; Sortierung beendet
       break;

     while ( (child= (parent + 1) << 1) < n )  // zweites (!) Kind;
     {                                         // Abbruch am Ende des Heaps
       if ( ++count, data[child-1] > data[child] )  // größeres Kind wählen
         --child;

       data[parent]= data[child];     // größeres Kind nach oben rücken
       parent= child;                 // in der Ebene darunter weitersuchen
     }

     if ( child == n )                // ein einzelnes Kind am Heap-Ende
     {                                //   ist übersprungen worden
       if ( ++count, data[--child] >= val ) {  // größer als der zu versick-
         data[parent]= data[child];   //   ernde Wert, also noch nach oben
         data[child]= val;            // versickerten Wert eintragen
         continue;
       }

       child= parent;                 // 1 Ebene nach oben zurück
     }
     else
     {
       if ( ++count, data[parent] >= val ) {  // das Blatt ist größer als der
         data[parent]= val;           //   zu versickernde Wert, der damit
         continue;                    //   direkt eingetragen werden kann
       }

       child= (parent - 1) >> 1;      // 2 Ebenen nach oben zurück
     }

     while ( child != root )          // maximal zum Ausgangspunkt zurück
     {
       parent= (child - 1) >> 1;      // den Vergleichswert haben wir bereits
                                      //   nach oben verschoben
       if ( ++count, data[parent] >= val )  // größer als der zu versickernde
         break;                             //   Wert, also Position gefunden

       data[child]= data[parent];     // Rückverschiebung nötig
       child= parent;                 // 1 Ebene nach oben zurück
     }

     data[child]= val;                // versickerten Wert eintragen
   }

   return count;
 }

For comparison purposes, the function returns the number of comparisons made.

literature

  • JWJ Williams: Algorithm 232 - Heapsort . In: Communications of the ACM , 1964, 7 (6), pp. 347-348.
  • Robert W. Floyd: Algorithm 245 - Treesort 3 . In: Communications of the ACM , 1964, 7 (12), p. 701.
  • S. Carlsson: HEAPS . Doctoral dissertation, Lund Univ., Sweden 1986.
  • Svante Carlsson: Average-case results on heapsort . In: BIT , 27, 1987, no.1, pp. 2-17.
  • Ingo Wegener: BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small) . 15th International Symposium on Mathematical Foundations of Computer Science (MFCS '90) Banská Bystrica, 1990. In: Theoret. Comput. Sci. , 118, 1993, no. 1, pp. 81-98.