# Sorting process

In computer science, a sorting algorithm is an algorithm used to sort a tuple (generally an array). The prerequisite is that a strict weak order ("less than or equal to") is defined on the set of elements, e.g. the lexicographical order of strings or the numerical order of numbers.

There are different sorting methods, which differ in efficiency with regard to time complexity (number of operations required) and space complexity (storage space required in addition to the input array). The complexity of an algorithm is usually given in Landau notation (see expressions such as $\Theta(n \cdot \log(n))$ below). For some sorting methods, the time complexity depends on the initial arrangement of the values in the array; a distinction is then made between the best case (the most favourable "presorting"), the average case (typical case) and the worst case (the values are arranged "maximally unfavourably"). Often, additional factors have to be taken into account that influence time or space complexity, for example slow access to external data, the limited size of main memory, or the like.

A distinction is also made between stable and unstable sorting methods. Stable sorting methods never change the relative order of elements that are equivalent with respect to the sort order, while unstable methods do not guarantee this. For example, if a company's employee list is sorted by surname and then sorted again by age (in years), a stable sorting method retains the surname order among employees of the same age.
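The employee example can be demonstrated with Python's built-in `sorted`, which is guaranteed to be stable (the employee data below are made up for illustration):

```python
# Hypothetical employee records: (surname, age in years).
employees = [("Schmidt", 35), ("Adler", 35), ("Meier", 28), ("Braun", 35)]

# Sort by surname first, then by age. Because sorted() is stable,
# the alphabetical surname order survives the second sort among
# employees of equal age.
by_surname = sorted(employees, key=lambda e: e[0])
by_age = sorted(by_surname, key=lambda e: e[1])

print(by_age)
# → [('Meier', 28), ('Adler', 35), ('Braun', 35), ('Schmidt', 35)]
```

Note that the three 35-year-olds remain in alphabetical order after the sort by age.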

A distinction is also made between sorting methods that work in-place (also in situ), i.e. whose additional storage requirement is independent of the number of elements to be sorted (constant and usually small), and those for which it is dependent (out-of-place or ex situ).

A distinction is also made between natural sorting methods, which run faster on presorted data than on unsorted data, and those that do not. Algorithms whose control flow depends on the data are called adaptive; accordingly, sorting methods that do not depend on the input data are called non-adaptive. Non-adaptive algorithms are therefore particularly interesting for hardware implementations.

Manual sorting (e.g. of index cards) and electro-mechanical sorting methods (e.g. for punched cards) mostly correspond to one of the software-based sorting methods described here, or to mixed forms.

## Comparison-based sorting

General-purpose methods are based on pairwise comparison of the elements to be sorted: is one element "smaller" than, "larger" than, or "equal" to the other? The complexity analysis assumes that the effort required to compare two elements is constant.
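As a minimal sketch of such a method, insertion sort uses nothing but pairwise `<` comparisons and element moves:

```python
def insertion_sort(a):
    """Sort the list a in place using only pairwise '<' comparisons."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift elements larger than key one position to the right.
        while j >= 0 and key < a[j]:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))
# → [1, 2, 3, 4, 5, 6]
```

On presorted input the inner loop never runs, giving the best case $\Theta(n)$ listed in the table below; on reversed input every element is shifted, giving $\Theta(n^2)$.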

| Sorting method | Best case | Average case | Worst case | Stable | Additional memory |
|---|---|---|---|---|---|
| Binary tree sort (height-balanced) | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | Yes [1] | $\Theta(n)$ |
| Binary tree sort | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | $\Theta(n^2)$ | Yes [1] | $\Theta(n)$ |
| Bubblesort | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | – |
| Combsort | $\Theta(n \cdot \log(n))$ | $\mathcal{O}(n^2)$ | $\mathcal{O}(n^2)$ | No | – |
| Gnomesort | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | – |
| Heapsort | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | No | – |
| Insertion sort | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | – |
| Introsort | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | No | – |
| Merge insertion sort | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | Yes | – |
| Mergesort | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | Yes | $\Theta(n)$ for common array implementations; in-place on linked lists; in-place array variants exist, but then the time complexity is $\Theta(n \cdot \log(n)^2)$ |
| Natural mergesort | $\Theta(n)$ | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | Yes | – |
| Quicksort | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | $\Theta(n^2)$ | No | – |
| Selection sort | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | No | – |
| Shakersort (cocktail sort) | $\Theta(n)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | Yes | – |
| Shellsort | $\mathcal{O}(n \cdot \log(n)^2)$ [2] | $\mathcal{O}(n \cdot \log(n)^2)$ [2] | $\mathcal{O}(n \cdot \log(n)^2)$ [2] | No | – |
| Smoothsort | $\Theta(n)$ | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | No | – |
| Stoogesort | $\Omega(n^{2.7})$ | $\Omega(n^{2.7})$ | $\Omega(n^{2.7})$ | No | – |
| Swap sort | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ | – | – |
| Timsort | $\Theta(n)$ | $\Theta(n \cdot \log(n))$ | $\Theta(n \cdot \log(n))$ | Yes | – |
| Bogosort | $\Theta(n)$ | $\mathcal{O}(n \cdot n!)$ [3] | $\mathcal{O}(n \cdot n!)$ [3] | No | – |
| Slowsort | $\Omega\left(n^{\frac{\log(n)}{2+\varepsilon}}\right)$ [4] | $\Omega\left(n^{\frac{\log(n)}{2+\varepsilon}}\right)$ [4] | $\Omega\left(n^{\frac{\log(n)}{2+\varepsilon}}\right)$ [4] | No | – |

1. For the stable variant, see the remark in the article Binary Tree Sort.
2. For the gap sequence with the best known worst-case behaviour.
3. Expected running time.
4. For any $\varepsilon > 0$; see Slowsort.

## Non-comparison-based sorting

In sorting methods that are not based on comparisons, the objects to be sorted are never compared with one another for "smaller", "larger" or "equal". With suitably conditioned input, such methods can achieve a running time that grows only linearly with the number of items to be sorted. For large numbers of records, these algorithms are superior to the comparison-based methods, provided they can be used at all (given their additional memory requirement). However, they can only be used for numeric data types (or for data types that can be mapped with reasonable effort to numeric values in the same order). It is implicitly assumed that the length of the key is bounded, so that a key can be used in constant time. Removing the dependence of the time complexity on the number of elements is bought with a dependence on an additional variable (usually the key length or the number of possible key values), and often also with considerable additional memory requirements.
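A minimal sketch of such a method is counting sort for integer keys known to lie in the range $0$ to $k-1$: it never compares two elements with each other and runs in $\mathcal{O}(n + k)$ time with $\mathcal{O}(n + k)$ additional memory.

```python
def counting_sort(a, k):
    """Stable counting sort for integers in range(k).

    O(n + k) time; O(n + k) additional memory (counts plus output)."""
    counts = [0] * k
    for x in a:
        counts[x] += 1
    # Prefix sums: counts[v] becomes the index of the first output
    # slot for value v.
    total = 0
    for v in range(k):
        counts[v], total = total, total + counts[v]
    out = [None] * len(a)
    for x in a:  # a left-to-right pass keeps the sort stable
        out[counts[x]] = x
        counts[x] += 1
    return out

print(counting_sort([3, 1, 2, 1, 0], 4))
# → [0, 1, 1, 2, 3]
```

The dependence on $k$ makes the trade-off visible: for a huge key range, the counting array dominates both time and memory.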

| Sorting method | Time | Stable | Additional memory |
|---|---|---|---|
| Bucketsort | $\mathcal{O}(n + k)$ | Yes | $\mathcal{O}(n + k)$ |
| Countingsort | $\mathcal{O}(n + k)$ | Yes | $\mathcal{O}(n + k)$ |
| LSD radix sort | $\mathcal{O}(n \cdot l)$ | Yes | $\mathcal{O}(n)$ |
| MSD radix sort | $\mathcal{O}(n \cdot l)$ | No | $\mathcal{O}(1)$, in-place |
| Flashsort | $\mathcal{O}(n)$ to $\mathcal{O}(n^2)$ | No | $\mathcal{O}(1)$ |

Here $n$ denotes the number of elements, $k$ the number of possible values, and $l$ the number of digits of the longest key.

## Sorting by relationships

If one can no longer sort according to properties, but only according to pairwise relationships, then one speaks of a topological sorting. This is the case, for example, when tasks have to be completed but some tasks must necessarily be performed before others, while for other pairs the order does not matter.

There are algorithms for topological sorting whose running time depends on the number of relationships. Topological sorting is not possible if there are mutual (cyclic) dependencies. A topological sorting need not be unique.
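One such algorithm is Kahn's method, sketched here under the assumption that the relationships are given as pairs `(a, b)` meaning "a must come before b". It runs in time proportional to the number of tasks plus the number of relationships, and it detects cyclic dependencies:

```python
from collections import deque

def topological_sort(tasks, edges):
    """Kahn's algorithm. edges contains pairs (a, b): a must come before b.

    Returns one valid order, or None if the dependencies are cyclic."""
    indegree = {t: 0 for t in tasks}
    successors = {t: [] for t in tasks}
    for a, b in edges:
        successors[a].append(b)
        indegree[b] += 1
    # Start with all tasks that have no prerequisites.
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    # If some tasks were never released, the dependencies contain a cycle.
    return order if len(order) == len(tasks) else None

print(topological_sort(["wash", "dry", "fold"],
                       [("wash", "dry"), ("dry", "fold")]))
# → ['wash', 'dry', 'fold']
```

The non-uniqueness mentioned above corresponds to the freedom of choice when several tasks are in the `ready` queue at the same time.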

When the relationships are complete, i.e. a dependency is given for every pair of objects, topological sorting becomes ordinary sorting.

## Indirect sorting

In cases where rearranging the data itself is expensive, indirect sorting can be used. It needs additional memory proportional to the number of elements, $\Theta(n)$ (e.g. a pointer to each element, or its index number in the base array). This auxiliary array is then sorted and thus represents a sorted index (according to the comparison criterion). If the actual data must then also be brought into the correct order, additional effort is required.

If access to the elements themselves is also "expensive", those data components that enter into the sort key are sometimes included in the index as well. This then requires additional storage space.
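A sketch of indirect sorting in Python (the record layout is made up for illustration): instead of moving the records, only an array of their indices is sorted.

```python
# Hypothetical large records; moving them around would be expensive.
records = [
    {"key": 30, "payload": "record A"},
    {"key": 10, "payload": "record B"},
    {"key": 20, "payload": "record C"},
]

# Sort an index array instead: order[i] is the position in 'records'
# of the i-th smallest record. Needs Theta(n) additional memory.
order = sorted(range(len(records)), key=lambda i: records[i]["key"])
print(order)  # → [1, 2, 0]

# Access the records in sorted order via the index; the data stay put.
keys_in_order = [records[i]["key"] for i in order]
print(keys_in_order)  # → [10, 20, 30]
```

Rearranging `records` itself according to `order` would be the additional effort mentioned above.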

## Proof of the lower bound for comparison-based sorting

It can be proven that a comparison-based sorting method cannot be faster than $\Omega(n \cdot \log(n))$:

Let $B$ be the decision tree for the number sequence $X = (x_1, \ldots, x_n)$. Since all permutations of $X$ could be the result of the sorting algorithm, the decision tree $B$ must have at least $n!$ leaves. Since a minimum number of steps is sought, there are no unnecessary comparisons in the decision tree.

In a decision tree with $n!$ leaves, the maximum and mean depths of a leaf are at least $\log(n!)$. Since we are looking for a lower bound, we can estimate $n!$ downwards using $n! \geq \left(\frac{n}{2}\right)^{n/2}$. This yields $\log(n!) \geq \frac{n}{2} \cdot \log\left(\frac{n}{2}\right) = \Omega(n \cdot \log(n))$.

It remains to be shown that in a binary tree with $k$ leaves, the maximum and mean depth of a leaf is at least $\log(k)$. Assume a binary tree $B$ for which the above statement does not hold. Let $T_1$ and $T_2$ be the subtrees of the binary tree $B$ with $k > 2$ leaves. For the number $k_1$ of leaves in $T_1$ and $k_2$ of leaves in $T_2$, it is now evident that $k_1 < k$ and $k_2 < k$, and $k_1 + k_2 = k$. For the mean depth of a leaf, measured from the root of $B$, the following holds:

$$
\begin{aligned}
\text{mean depth}(B) &= \frac{k_1}{k} \cdot (\text{mean depth}(T_1) + 1) + \frac{k_2}{k} \cdot (\text{mean depth}(T_2) + 1) \\
&\geq \frac{k_1}{k} \cdot (\log(k_1) + 1) + \frac{k_2}{k} \cdot (\log(k_2) + 1) \\
&= \frac{1}{k} \cdot \bigl(k_1 \cdot \log(2 k_1) + k_2 \cdot \log(2 k_2)\bigr)
\end{aligned}
$$

Under the constraint $k_1 + k_2 = k$, this function attains its minimum at $k_1 = k_2 = \frac{k}{2}$. Substituting into the formula above yields:

$$\text{mean depth}(B) \geq \frac{1}{k} \cdot \left(\frac{k}{2} \cdot \log(k) + \frac{k}{2} \cdot \log(k)\right) = \log(k).$$

This contradicts the assumption, which proves the above statement.
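The estimate $n! \geq \left(\frac{n}{2}\right)^{n/2}$ used above can be checked numerically; the following sketch verifies the resulting bound on $\log(n!)$ for a range of $n$, using base-2 logarithms:

```python
import math

# Check log2(n!) >= (n/2) * log2(n/2), the lower-bound estimate
# used in the proof above, for n = 2 .. 199.
for n in range(2, 200):
    lhs = math.log2(math.factorial(n))
    rhs = (n / 2) * math.log2(n / 2)
    assert lhs >= rhs, n

print("estimate holds for n = 2 .. 199")
```

This is only a spot check for small $n$, of course; the proof itself covers all $n$.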

## Literature

• Donald E. Knuth: Sorting and Searching. In: The Art of Computer Programming. 2nd edition. Volume 3. Addison-Wesley, Boston 2003, ISBN 0-201-89685-0.
• Niklaus Wirth: Algorithms and Data Structures. 5th edition. Teubner Verlag, Stuttgart/Leipzig 1999, ISBN 3-519-22250-7.
• Robert Sedgewick: Algorithms in Java, Parts 1–4. 3rd edition. Addison-Wesley, Boston 2002, ISBN 0-201-36120-5.
• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein: Algorithms - An Introduction. 3rd edition. Oldenbourg Verlag, Munich 2010, ISBN 978-3-486-59002-9 (German translation of Introduction to Algorithms by Paul Molitor).
• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein: Introduction to Algorithms. 3rd edition. The MIT Press, Cambridge MA/London 2009, ISBN 978-0-262-03384-8.
• Thomas Ottmann, Peter Widmayer: Algorithms and Data Structures. 3rd edition. Spektrum Verlag, Heidelberg/Berlin/Oxford 1996, ISBN 3-8274-0110-0.
• Anany Levitin: Introduction to the Design and Analysis of Algorithms. 2nd edition. Pearson Addison-Wesley, Boston 2007, ISBN 978-0-321-36413-5.