Binary heap

from Wikipedia, the free encyclopedia
Binary min heap

A binary heap is a data structure used in computer science for the efficient sorting of elements. The asymptotically optimal sorting method Heapsort uses a binary heap as the central data structure. The binary heap is also used to implement a priority queue in which the element with the highest priority can be efficiently queried and removed. The priority of the elements is imprinted on them by keys . There must therefore be a total order over the set of keys, for example the less than relation (<) over therepresents whole numbers .

In the literature, the addition binary is often omitted, so that depending on the context, the heap is meant as an archetypal (binary) heap structure (implemented in an array) or the class of all heap data structures . Furthermore, when using the order relation or a heap is referred to as a min heap or max heap . Since the operations in the min and max heap only differ in the order relation used, the binary heap and the operations on it are defined below using the min heap as an example.

Definition of min-heap

A binary tree is a min-heap when each node with the following applies:

Where denotes the parent node of .

This property is known as the (min) heap property.

So a heap is a partially ordered tree . There is an order between child and parent nodes, but the child nodes are not ordered from one another.

Operations

A binary heap can be constructed efficiently in linear time , where the number of elements denotes from the input. The following operations work on a heap and have a worst-case runtime of :

  • insert - inserts a new element into the heap
  • remove - removes an element
  • extractMin - extracts the element with the smallest key
  • decreaseKey - decreases the key value of an element

The getMin operation returns the smallest element in the heap and requires constant computing effort for this.

structure

A binary heap consists of a binary tree in which all layers except the last one must be completely filled. The last layer of the tree must be filled left-justified. This structure guarantees that the tree is balanced .

In addition, the binary tree must meet the heap condition: using the example of the min heap (see figure), the key of each child of a node must be greater than or equal to the key of the node itself. The heap property guarantees that the node with the smallest key is always at the root .

Often a binary heap is not explicitly constructed with pointers, but is mapped using an array. If you want to store a binary heap in an array , the root of the tree is stored in the first position in the array. With array indexing starting with , the two successors of a node in the -th position are stored in the -th and -th position, according to the Kekule numbering . Similarly, the successors of the node with index are stored in the -th and -th position if the array indexing starts with 0.

Example of the implicit representation of a binary tree in an array. The arrows drawn are implicitly given by the functions for calculating the child or parent indices.

Algorithms

When analyzing the algorithms, the heap size or the number of elements in the heap array is referred to.

Heapify

The function heapify forms the basis of several heap operations . If necessary, it restores the heap condition of a binary tree, provided that the left and right subtree meet the heap condition. To do this, heapify swaps child and parent nodes recursively in descending order until the heap property is no longer violated.

In other words, when one of the two child nodes rises to the parent level currently being considered, the parent node subsequently drops recursively until it has reached the level of its weight. Because of the successive swapping (or “shifting”) of a node down into a subtree, the heapify operation is also referred to as the sift-down ( “seven” ) or sift-down phase.

In pseudocode :

void heapify(Array H, int i)
  assert(isheap(H, left(i)) && isheap(H, right(i)))
  int min = i
  if (left(i) < H.size && H.key(left(i)) < H.key(min))
    min = left(i)
  if (right(i) < H.size && H.key(right(i)) < H.key(min))
    min = right(i)
  if (min != i) {
    H.swap(i, min)
    heapify(H, min)
  }

The auxiliary functions left and right calculate the index of the left or right child node in the heap array, key abstracts from the access to this array and swap swaps two elements in the array in which the heap is stored. The function only traverses the tree in depth, so its running time is in. Since the height of the tree log depends on the number of its elements, requires the function heapify in the worst case logarithmic term of respect. The size of the heap, so . heapify only needs a constant number of additional memory cells because it is tail recursive . This means that the recursion can be replaced manually or automatically by a loop without a stack:

void heapify(Array H, int a)
  int i = a
  do {
    assert(isheap(H, left(i)) && isheap(H, right(i))
    int min = i
    if (left(i) < H.size && H.key(left(i)) < H.key(min))
      min = left(i)
    if (right(i) < H.size && H.key(right(i)) < H.key(min))
      min = right(i)
    if (min == i)
      break
    H.swap(i, min)
    i = min
  } while(true)

Build

The build function constructs a heap from an array by iteratively calling the heapify function . It starts with the leaves, which by definition fulfill the heap property, and works its way gradually to the root. This way of working is also known as bottom-up. As pseudocode:

void build(Array a)
  if (a.empty)
    return
  for (int i = a.size/2-1; i>=0; --i)
    heapify(a, i)

The structure of the binary heap, which is stored in an array, shows that only leaves, i.e. 1-element heaps, are stored from position on. This means that the lowest level of the binary tree begins. Since the tree elements are stored in the array in level order , the iteration runs successively from the last to the first element of a level.

The build runtime is in , which is not directly obvious. Because heapify is in and is called several times. The linear running time results from the following equation:

where iterates over the tree height and calculates the number of subtrees at level or the number of children of all nodes of the height .

Decrease key

If the key of a heap element is reduced, the heap property may have to be restored in the previous node. The heap property in the child subtrees of is not changed by the decrease-key operation. decrease-key works like build bottom-up:

void decrease(Array H, int i, newkey)
  assert(isheap(H, 0))
  assert(H.key(i) >= newkey)
  H.key(i) = newkey
  while (i > 0 && H.key(i) < H.key(parent(i)))
    H.swap(i, parent(i))
    i = parent(i)

Insert

An element is inserted using insert by expanding the heap array by a new element at the end with the value , whereupon the decrease function is called with the key to be inserted, so that the possibly violated heap property is restored:

void insert(Array H, newkey)
  assert(isheap(H))
  int i = H.size
  H.resize(i+1)
  H.key(i) = inf
  decrease(H, i, newkey)

The term is accordingly

Remove

The removal of an element using remove is done by removing the element to be removed from its position i in the heap and replacing the element at the end of the heap. Then the heap property at position i must be restored. It can happen that the element at position i is larger than its parent node, as well as that it is smaller. Accordingly, it has to be moved down using heapify or up using decrease .

The decrease case may not be obvious. Why should the previously last element of the array be smaller than an element that was deleted much higher up in the tree? This is because a heap only represents partial and not total order. If you look at the first common ancestor of the element to be deleted and the last element, it may be that there are only very small elements in one subtree with the last element, whereas the other subtree with the element to be deleted contains larger elements. Here is an example of such a situation. The element with index 5 is to be deleted:

h1 = [0, 1, 2 , 2, 1, 2 , 2, 2, 2, 1 ] // is a correct heap
-> [0, 1, 2 , 2, 1, 1 , 2, 2, 2, 2 ] // after swapping with the last element
-> [0, 1, 2 , 2, 1, 1 , 2, 2, 2] // after removing the now last element

After removing the last element, element 1 now at index 5, together with its parent node 2, violates the heap property. Applying heapify would not change the array because the function only pushes elements down. However, index 5 is already a leaf node. Instead we have to slide the element up at position 5. This is done with the decrease function .

Element remove(Array H, unsigned int i)
  assert(isheap(H))
  assert(i<H.size)
  removedItem = H[i]
  int lastIdx = H.size-1
  H.swap(i,lastIdx)
  H.resize(lastIdx)
/*  Im Fall i == lastIdx wurde die Heap-Eigenschaft durch das Entfernen
    des Elements beibehalten, und i befindet sich nun ''out of bounds'',
    weil das Array verkleinert wurde. Ein Aufruf von heapify oder
    decrease würde zum Absturz führen. */
  if ( i != lastIdx ) {
    if ( i == 0 || H.key(i) > H.key(parent(i)) ) {
      heapify(H, i)
    } else {
      decrease(H, i, lastIdx) // decrease macht nichts, wenn H.key(i) == H.key(parent(i))
    }
  }
  return removedItem

Extract Min

Returning and removing an element using extractMin is similar to remove . Because of the heap condition, the minimum element is at the root of the tree and thus represents the element to be removed:

Element extractMin(Array H)
  assert(isheap(H))
  assert(H.size > 0)
  return remove(H, 0)

The running time is accordingly:

Get-Min

The getMin returns the smallest element in a heap. From the heap property it follows directly that the smallest element is always in the first array position, i.e. the root of the binary tree. In the pseudo-code:

Element getMin(Array H)
  assert(isheap(H))
  assert(H.size > 0)
  return H.key(0)

The running time is correspondingly constant:

See also

literature