Quicksort

A random permutation of integer values is sorted with quicksort. The blue lines show the value of the pivot element, marked in red, in the respective recursion step.

Quicksort (from English quick and to sort) is a fast, recursive, non-stable sorting algorithm that works according to the divide-and-conquer principle (Latin divide et impera). It was developed in its basic form by C. Antony R. Hoare around 1960 and has since been improved by many researchers. The algorithm has the advantage of a very short inner loop (which greatly increases the execution speed) and of requiring no additional storage space apart from the space needed for the recursion on the call stack.

On average, the quicksort algorithm performs O(n log n) comparisons. In the worst case, O(n²) comparisons are made, but this case rarely occurs in practice.

The numbers from 1 to n in random order are sorted with quicksort.

Principle

First, the list to be sorted is divided into two sub-lists (“left” and “right” sub-list). To do this, Quicksort selects a so-called pivot element from the list. All elements that are smaller than the pivot element are placed in the sub-list on the left, and all elements that are larger in the sub-list on the right. The elements that are the same as the pivot element can be distributed over the partial lists as required. After the division, the elements in the list on the left are less than or equal to the elements in the list on the right.

Each sub-list must then be sorted in order to complete the sorting. To do this, the quicksort algorithm is applied to the left and to the right sub-list. Each sub-list is in turn split into two sub-lists, and quicksort is applied to each of them, and so on. These self-calls are known as recursion. If a sub-list of length one or zero occurs, it is already sorted and the recursion terminates.

The positions of the elements that are equal to the pivot element depend on the division algorithm used; they can be distributed to the sub-lists arbitrarily. Since the relative order of equal elements can change, Quicksort is generally not stable (stable in-place variants do exist, however).

The procedure must ensure that each of the sub-lists is at least one element shorter than the entire list; only then is the recursion guaranteed to end after a finite number of steps. This can be achieved, for example, by placing the element originally selected as the pivot at a position between the sub-lists, so that it does not belong to either sub-list.
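
As an illustration, the principle can be sketched in Python. The following simple variant is only a sketch (the function name quicksort_einfach is chosen here for illustration): it is not the in-place variant described below, it takes the last element as the pivot, and it puts elements equal to the pivot into the left sub-list:

 def quicksort_einfach(liste):
     # Eine Liste der Länge null oder eins ist bereits sortiert (Rekursionsabbruch)
     if len(liste) <= 1:
         return liste
     pivot = liste[-1]                               # letztes Element als Pivotelement
     links = [x for x in liste[:-1] if x <= pivot]   # Elemente <= Pivot
     rechts = [x for x in liste[:-1] if x > pivot]   # Elemente > Pivot
     # Pivot zwischen die sortierten Teillisten setzen
     return quicksort_einfach(links) + [pivot] + quicksort_einfach(rechts)

For example, quicksort_einfach(list("einbeispiel")) returns the letters in sorted order.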

Pseudocode

The division is implemented as an in-place algorithm: the elements are not copied into additional memory but only swapped within the list. The process used for this is known as dividing or partitioning. Afterwards the two sub-lists are in the correct position relative to each other; as soon as the sub-lists have been sorted, the sorting of the entire list is finished.

The following pseudocode illustrates how the algorithm works, where daten is the list to be sorted with n elements. With each call of quicksort(), links specifies the index of the first element of the sub-list and rechts that of the last. On the first call (topmost recursion level), links = 0 and rechts = n-1. The list passed in is divided recursively until it contains no more than one value.

 funktion quicksort(links, rechts)
     falls links < rechts dann
         teiler:= teile(links, rechts)
         quicksort(links, teiler-1)
         quicksort(teiler+1, rechts)
     ende
 ende

The following implementation of the function teile divides the array in such a way that the pivot element ends up in its final position, with all smaller elements before it and all larger elements after it:

 funktion teile(links, rechts)
     i:= links
     // Starte mit j links vom Pivotelement
     j:= rechts - 1
     pivot:= daten[rechts]
     wiederhole solange i < j // solange i an j nicht vorbeigelaufen ist
         // Suche von links ein Element, welches größer oder gleich dem Pivotelement ist
         wiederhole solange i < rechts und daten[i] < pivot
             i:= i + 1
         ende
         // Suche von rechts ein Element, welches kleiner als das Pivotelement ist
         wiederhole solange j > links und daten[j] ≥ pivot
             j:= j - 1
         ende
         falls i < j dann
             tausche daten[i] mit daten[j]
         ende
     ende
     // Tausche Pivotelement (daten[rechts]) mit neuer endgültiger Position (daten[i])
     // und gib die neue Position des Pivotelements zurück, beende Durchlauf
     falls daten[i] > pivot dann
         tausche daten[i] mit daten[rechts]
     ende
     antworte i
 ende

After the pivot element has been selected, the algorithm first searches, starting from the beginning of the list (index i), for an element that is greater than or equal to the pivot element. Correspondingly, starting from the end of the list, it searches for an element that is smaller than the pivot element (index j). The two elements are then swapped and thereby end up in the correct sub-list. The two searches continue from the positions reached until the lower and the upper search meet; there lies the boundary between the two sub-lists.
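
Translated into Python, the two pseudocode functions might look as follows; this is only a sketch that keeps the German identifiers of the pseudocode, with daten being a Python list that is sorted in place:

 def teile(daten, links, rechts):
     i = links
     j = rechts - 1                # starte mit j links vom Pivotelement
     pivot = daten[rechts]
     while i < j:                  # solange i an j nicht vorbeigelaufen ist
         # suche von links ein Element >= Pivot
         while i < rechts and daten[i] < pivot:
             i += 1
         # suche von rechts ein Element < Pivot
         while j > links and daten[j] >= pivot:
             j -= 1
         if i < j:
             daten[i], daten[j] = daten[j], daten[i]
     # Pivotelement an seine endgültige Position tauschen
     if daten[i] > pivot:
         daten[i], daten[rechts] = daten[rechts], daten[i]
     return i                      # Position des Pivotelements

 def quicksort(daten, links, rechts):
     if links < rechts:
         teiler = teile(daten, links, rechts)
         quicksort(daten, links, teiler - 1)
         quicksort(daten, teiler + 1, rechts)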

Examples

teile(links, rechts)

The letter sequence “einbeispiel” should be sorted alphabetically.

Initial situation after initializing i and j; the element on the right (l) is the pivot element:

  e i n b e i s p i e l
  ^                 ^
  i                 j

After the first search, the inner loops have stopped with i on an element ≥ l and j on an element < l:

  e i n b e i s p i e l
      ^             ^
      i             j

After swapping the elements at i and j:

  e i e b e i s p i n l
      ^             ^
      i             j

After the next search and swap:

  e i e b e i i p s n l
              ^   ^
              i   j

After another search, the indices ran past each other:

  e i e b e i i p s n l
              ^ ^
              j i

After swapping daten[i] with the pivot, i is the separation point for the sub-lists: at position i is the pivot element, to its left there are only elements ≤ pivot, and to its right only elements > pivot:

  e i e b e i i l s n p
                ^
                i
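
With the Python sketch of teile given above, this partitioning step can be reproduced directly (the expected output follows from the walkthrough):

 daten = list("einbeispiel")
 position = teile(daten, 0, len(daten) - 1)   # teile() aus der Python-Skizze oben
 print("".join(daten), position)              # Ausgabe: eiebeiilsnp 7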

Complete example of alphabetical sorting

In this example the quicksort algorithm is to sort the sequence of letters "Quicksort". First, the rightmost element is chosen as the pivot element P. Then the counter g for "greater" runs from left to right and the counter k for "smaller" runs from right to left,

 Quicksort
^       ^^
g       kP

until g encounters an element which is larger than the pivot element and until k encounters an element which is smaller than the pivot element.

 Quicksort
  ^     ^^
  g     kP

These two found elements r and u are then swapped in the following step.

 Qricksout
  ^     ^^
  g     kP

In the following step, the indices k and g move on in the same directions as before, g searching for an element larger than the pivot element and k for an element smaller than it.

 Qricksout
       ^^^
       kgP

Now k and g have passed each other; this is the termination condition for the division step. The pivot element is now swapped with the element indicated by g.

 Qricksotu
       ^^^
       kPg

The following two statements now apply: "To the left of the pivot element, all elements are less than or equal to the pivot element. To the right of the pivot element, all elements are greater than or equal to the pivot element."

   links|:|rechts
 Qrickso|t|u
       ^|^|^
       k|P|g

The pivot element now "divides" the data at its position into two parts, left and right. The algorithm must now process the left and the right part in the same way as before; this results in the recursion. The right part (the letter u) consists of only a single element and is therefore sorted by definition, so the left part is dealt with next. The rightmost element is again the pivot element, and the counters are set accordingly.

 Qrickso|t|u
^     ^^
g     kP

The Q is larger than o and the k is smaller than the o.

 Qrickso|t|u
 ^   ^ ^
 g   k P

So the Q and the k are swapped.

 kricQso|t|u
 ^   ^ ^
 g   k P

Indices g and k continue ...

 kricQso|t|u
  ^ ^  ^
  g k  P

The r and c are swapped.

 kcirQso|t|u
  ^ ^  ^
  g k  P

In the next step the indices ran past each other again ...

 kcirQso|t|u
   ^^  ^
   kg P

... and the pivot element (letter o) is swapped with the larger element (letter r).

 kcioQsr|t|u
   ^^  ^
   kP  g

Now there is again a left and a right part.

links:rechts
 kci|o|Qsr  |t|u
   ^|^|  ^  | |
   k|P|  g  | |

First the left part is dealt with.

 kci|o| Qsr|t|u
^ ^^| |^ ^^| |
g kP| |g kP| |
 cki|o| Qsr|t|u
 ^^^| |^ ^^| |
 gkP| |g kP| |

Letters c and k are swapped.

 cki|o| Qsr|t|u
 ^^^| |^ ^^| |
 kgP| |g kP| |

The indices have passed each other, and the element at index g is swapped with the element at index P.

 cik|o| Qsr|t|u
 ^^^| |^ ^^| |
 kPg| |g kP| |

The newly created left and right parts each consist of only a single element and are considered sorted.

 cik|o| Qsr|t|u
    | | ^^^| |
    | | kgP| |

In the former right part (the letters Qsr) the indices run past each other, and the element at g is swapped with the pivot element.

 cik|o| Qrs|t|u
    | | ^^^| |
    | | kPg| |

All characters are now sorted.

 cik|o| Qrs|t|u

Result:

 cikoQrstu
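
Note that the walkthrough compares the letters alphabetically regardless of case, so the capital Q ends up between o and r. Under that assumption the expected result can be checked, for example, with Python's built-in sort and a case-insensitive key:

 print("".join(sorted("Quicksort", key=str.lower)))   # Ausgabe: cikoQrstu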

Properties

Running time

The running time of the algorithm essentially depends on the choice of pivot element.

In the worst case, the pivot element is always chosen such that it is the largest or the smallest element of the list. This can happen, for example, if the element at the end of the list is always selected as the pivot element and the list to be sorted is already sorted. The list to be examined is then only one element smaller in each recursion step, and the time complexity is described by O(n²). The number of comparisons in this case is n·(n−1)/2.

In the best case, the pivot element is always chosen so that the two resulting sub-lists are approximately the same size. In this case the asymptotic running time of the algorithm is O(n log n). This time complexity also holds for the average case: on average, the longer sub-list at a recursive call contains only a constant fraction of the elements, so the depth of the recursion is in O(log n). In the best case, the number of comparisons is about n·log₂ n.
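
The best-case comparison count can be read off from the following recurrence; this is a sketch that assumes the partition step compares each of the remaining n − 1 elements with the pivot and always splits the list into two halves of equal size:

 C(n) = (n - 1) + 2 \cdot C(n/2), \qquad C(1) = 0 \quad\Longrightarrow\quad C(n) \approx n \log_2 n

Expanding the recurrence shows roughly n comparisons on each of the about log₂ n levels of the recursion.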

Various approaches have been described in the literature for the choice of the pivot element. The probability of the occurrence of the worst case varies with these.

One possible approach is always to choose the first, the last, or the middle element of the list; however, this naive choice is relatively unfavorable. Another possibility is to determine the median of these three elements and use it as the pivot element. A further approach is to select a random element as the pivot element. With this randomized quicksort, the probability that the pivot element is chosen in each division step in such a way that the worst-case runtime results is extremely low; one can assume that it practically never occurs.
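
As an illustration, a median-of-three pivot selection might look like this in Python; the helper name median_of_three is chosen here for the sketch, and the selected pivot can afterwards be swapped to the right edge so that a partition function like teile above can be used unchanged:

 def median_of_three(daten, links, rechts):
     # Median aus erstem, mittlerem und letztem Element bestimmen
     mitte = (links + rechts) // 2
     kandidaten = [(daten[links], links), (daten[mitte], mitte), (daten[rechts], rechts)]
     kandidaten.sort()          # nur drei Elemente, konstanter Aufwand
     return kandidaten[1][1]    # Index des mittleren Wertes

 # Vor dem Teilen das gewählte Pivotelement an den rechten Rand tauschen:
 # p = median_of_three(daten, links, rechts)
 # daten[p], daten[rechts] = daten[rechts], daten[p]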

There are algorithms, for example Heapsort, whose running time is bounded by O(n log n) even in the worst case. In practice, however, Quicksort is still used, since the worst case is assumed to occur only very rarely, and on average Quicksort is faster than Heapsort because its innermost loop contains only a few very simple operations. However, this is still a subject of research, and some analyses and simulations favor Heapsort variants, both for information-theoretic and for implementation reasons.

Today, Quicksort is the preferred sorting algorithm for a wide range of practical applications because it is fast and, when recursion is available, easy to implement. It is included in many standard libraries. With Introsort, however, an alternative is now available that guarantees an upper bound of O(n log n) even for the worst case, with a comparable average running time.

If the median-of-medians algorithm, which determines the median of an array in linear time, is used for the choice of the pivot element, then a total worst-case running time of O(n log n) can be guaranteed for Quicksort.

Storage space

Quicksort is an in-place method: it only swaps the elements of the list to be sorted within the list and does not copy them to additional storage space. However, it requires additional space on the call stack for each recursion level.

As with the running time, the memory consumption depends on the choice of the pivot element and on the input data. In the best and average case, the recursion depth is in O(log n), so only a stack of size O(log n) is required.

In the worst case, the number of recursions is only limited by the list length n, which requires a stack of size O(n). With long lists this can lead to a stack overflow. There are various modifications of the algorithm that solve this problem or at least reduce the probability of its occurrence:

  • Removal of the tail recursion (see pseudocode below)
  • a more balanced choice of pivot element (e.g. median of three)
  • Choosing a random pivot element, which avoids systematic problems that could otherwise result from certain pre-sorting of the elements in conjunction with the pivot selection method
  • Avoidance of partial lists with fewer than two elements (together with removing the pivot element from the partial lists, this limits the maximum recursion depth)

The following pseudocode replaces the tail recursion (the sorting of the second sub-list) with an iteration, in such a way that the shorter sub-list is sorted recursively and the longer one iteratively. The recursion depth is then no greater than log₂(n):

 funktion quicksort(links, rechts)
     solange rechts > links wiederhole
         teiler:= teile (links, rechts)
         falls rechts-teiler > teiler-links
            quicksort(links, teiler - 1) // kleinere Teilliste rekursiv ..
            links:= teiler + 1 // .. und größere iterativ sortieren
         sonst
            quicksort(teiler + 1, rechts)
            rechts:= teiler - 1
         ende
     ende
 ende
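
Translated into Python, this tail-recursion removal might look as follows; the sketch reuses the teile function from the earlier Python sketch, and the function name is chosen here only for illustration:

 def quicksort_flacher_stapel(daten, links, rechts):
     while links < rechts:
         teiler = teile(daten, links, rechts)
         if rechts - teiler > teiler - links:
             quicksort_flacher_stapel(daten, links, teiler - 1)   # kleinere Teilliste rekursiv ...
             links = teiler + 1                                   # ... und größere iterativ sortieren
         else:
             quicksort_flacher_stapel(daten, teiler + 1, rechts)
             rechts = teiler - 1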

Another way to avoid the additional linear storage space is to always sort the smaller of the two subsequences first, recursively (the other is sorted afterwards, also recursively). The number of subsequences that have not yet been sorted is then bounded by log₂ n. For each not yet sorted subsequence, two index bounds are stored, each of which requires O(log n) bits. Overall, this variant of Quicksort therefore requires O((log n)²) additional storage space in the logarithmic cost measure.

Variants

QuickSort for linked lists

In the QuickSort variant for linked lists shown below, the first element of the sequence to be divided is selected as the pivot element. The pointer Zeiger2 is advanced until it encounters an element that is smaller than the pivot; the elements that Zeiger2 has passed over are therefore greater than or equal to the pivot. Swapping the first of these larger elements with the element at Zeiger2 ensures that the elements in question end up in the correct section. Zeiger1 marks the current end of the section of elements that are smaller than the pivot. When Zeiger2 has reached the right end of the sequence to be divided, the pivot element is finally swapped to the correct position between the partial sequences.

//Links, Rechts sind hier Zeiger
QuickSort(Links,Rechts);
  falls Links<>Rechts dann
    //Initialisierung der (lokalen) Zeiger
    //Zeiger0 wird nur als Vorgänger von Zeiger1 benötigt
    Zeiger0:=Links
    Zeiger1:=Links
    Zeiger2:=Links
    Pivot:=Links.Zahl
    wiederhole
      Zeiger2:=Zeiger2.Nachfolger;
      falls Zeiger2.Zahl<Pivot dann
        Zeiger0:=Zeiger1;
        Zeiger1:=Zeiger1.Nachfolger;
        tausche(Zeiger1,Zeiger2)
    solange bis Zeiger2=Rechts;
    tausche(Links,Zeiger1);
    falls Zeiger1<>Rechts dann
      Zeiger1:=Zeiger1.Nachfolger;
    QuickSort(Links,Zeiger0);
    QuickSort(Zeiger1,Rechts);
ende
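
A Python sketch of this linked-list variant could look as follows; the node class Knoten and the helper names are assumptions of the sketch, and, like the pseudocode, it swaps the stored values rather than relinking the nodes:

 class Knoten:
     def __init__(self, zahl, nachfolger=None):
         self.zahl = zahl
         self.nachfolger = nachfolger

 def quicksort_liste(links, rechts):
     if links is not rechts:
         zeiger0 = links          # Vorgänger von zeiger1
         zeiger1 = links          # Ende des Abschnitts der Elemente < Pivot
         zeiger2 = links
         pivot = links.zahl       # erstes Element ist das Pivotelement
         while zeiger2 is not rechts:
             zeiger2 = zeiger2.nachfolger
             if zeiger2.zahl < pivot:
                 zeiger0 = zeiger1
                 zeiger1 = zeiger1.nachfolger
                 zeiger1.zahl, zeiger2.zahl = zeiger2.zahl, zeiger1.zahl
         links.zahl, zeiger1.zahl = zeiger1.zahl, links.zahl   # Pivot an endgültige Position
         if zeiger1 is not rechts:
             zeiger1 = zeiger1.nachfolger
         quicksort_liste(links, zeiger0)
         quicksort_liste(zeiger1, rechts)

 # Beispiel: Liste 4 -> 1 -> 3 -> 2 aufbauen und sortieren
 ende = Knoten(2)
 kopf = Knoten(4, Knoten(1, Knoten(3, ende)))
 quicksort_liste(kopf, ende)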

The disadvantage of this procedure is that a largely sorted sequence or many equal key values lead to worst-case-like behavior. For this reason, sorting algorithms such as mergesort, whose time complexity is in O(n log n) even in the worst case, are often chosen for linked lists. Other dynamic data structures such as balanced trees (e.g. B-trees, AVL trees) distribute the cost of sorting over the insert operations, so that subsequent sorting is not necessary.

Iterative quicksort

Quicksort can also be implemented non-recursively using a small stack or array . Here is a simple variant with random selection of the pivot element:

funktion quicksort_iterativ(links, rechts)
   zufall:= random() // zufälliger Startwert
   wiederhole // äußere Schleife
      solange links < rechts wiederhole
         pivot:= daten[random(links, rechts)] // Auswahl eines zufälligen Elements, das zwischen links und rechts liegt
         push (rechts) // rechten Teil später sortieren
         mitte:= links
         wiederhole // innere Schleife, teile Feld
            solange daten[mitte] < pivot wiederhole // suche falsch einsortiertes Element von links
               mitte:= mitte + 1
            ende
            solange daten[rechts] > pivot wiederhole // suche falsch einsortiertes Element von rechts
               rechts:= rechts - 1
            ende
            falls mitte >= rechts dann Abbruch innere Schleife
            tausche daten[mitte] mit daten[rechts]
         ende // Feld geteilt, linken Teil weitersortieren
      ende
      falls stapel leer dann Abbruch äußere Schleife // noch ein rechter Teil übrig?
      links:= rechts + 1
      pop (rechts) // jetzt rechten Teil sortieren
   ende
ende

The instruction push puts an element on the stack, from which pop fetches it again. The random choice of the pivot element makes the average case highly probable. The size of the required stack memory is, with sufficient certainty, less than 2·log₂(n). If a limited stack nevertheless overflows, the sorting of the remaining part can simply be restarted from the beginning.

The efficiency of Quicksort lies in the fact that it swaps elements over large distances. The shorter the list to be sorted, the less efficiently Quicksort works, since it approaches a complexity of O(n²). However, the list broken down into sub-lists by Quicksort has the property that the distance between an element and its sorted position is bounded. Insertion sort sorts such a list in linear time, so in practice Quicksort is often aborted below a defined sub-list length in order to finish the sorting with insertion sort.
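
A sketch of such a hybrid in Python; the cutoff value of 16 elements is an assumption (the optimal value depends on the implementation), and here insertion sort is applied directly to each short sub-list, while a single insertion-sort pass over the whole list at the end would work just as well:

 def insertion_sort(daten, links, rechts):
     # sortiert daten[links..rechts] durch Einfügen
     for i in range(links + 1, rechts + 1):
         wert = daten[i]
         j = i - 1
         while j >= links and daten[j] > wert:
             daten[j + 1] = daten[j]
             j -= 1
         daten[j + 1] = wert

 SCHWELLE = 16   # angenommene Mindestlänge für Quicksort

 def quicksort_hybrid(daten, links, rechts):
     if rechts - links + 1 <= SCHWELLE:
         insertion_sort(daten, links, rechts)     # kurze Teilliste direkt einsortieren
     else:
         teiler = teile(daten, links, rechts)     # teile() wie oben
         quicksort_hybrid(daten, links, teiler - 1)
         quicksort_hybrid(daten, teiler + 1, rechts)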

QuickSort in functional languages

In functional languages such as Haskell or Erlang, QuickSort can be implemented very concisely thanks to their more powerful list processing:

quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (e:es) = quicksort [x | x <- es, x <= e] ++ [e] ++ quicksort [x | x <- es, x > e]

Here ++ is the concatenation operator for lists; the pattern (e:es) separates the first element e from the rest of the list es, and [x | x <- es, x <= e] yields all elements of es that are less than or equal to the pivot element e.

Parallel quicksort

If several processors are available, it is also possible to parallelize Quicksort. This means that better runtimes can be achieved under certain circumstances.
