Natal variety

Radix sort (from Latin radix 'root', 'base') or also distribution location (from English distribution 'distribution'), or in German Fachverteilen , is a linear sorting process based on counting sort or bucket sort . The sorting process has a linear runtime , provided that the maximum length of the keys to be sorted is known in advance . The procedure of a punch card sorter corresponds to a radix sort.

In all the variants presented here, the first digit of the key is the one with the highest rank. A distinction method, the first step in this most significant digit (MSD) (engl. Most significant digit begins, then to portions containing less than two elements by a process step will be skipped in the next step); and procedures that start at the least significant digit (LSD) and work their way up to the most significant digit .

There are also stable out-of-place variants and in-place variants, which, however, are not stable.

requirements

With Radixsort, the keys of the data to be sorted consist of characters from a finite alphabet . In addition, there must be a total quasi-order between the characters of the alphabet, usually it is even a total order . The keys thus resemble numbers in a place value system with the power of the alphabet as a base (or radix). This means that unsigned numbers (e.g. unsigned int) can be used directly, while with signed numbers (e.g. signed int) the most significant bit is interpreted inverted. Floating point numbers (e.g. float) can also be converted from the IEEE format into a suitable form.

A second prerequisite is that the length of the key is limited by a constant known in advance , then the runtime behavior is linear in the number of elements. ${\ displaystyle l}$

Procedure (using lists)

Radixsort consists of several steps, and each step has two phases. The partitioning phase is used to divide the data into compartments, while the data from these compartments is collected again in the collection phase . Both phases are carried out once for each digit of the key to be sorted. The number of steps is equal to the (maximum) number of digits.

Partitioning phase: In this phase, the data is divided into the available subjects, with one subject being available for each character of the underlying alphabet. The compartment in which the key being viewed is placed is determined by the character at the point being viewed. For example, the number 352 is placed in compartment 3 when the third digit is being viewed from behind (and when 10 is the base (radix) of the number).

Collection phase: After the data has been divided into compartments in phase 1, the data is collected again and placed on a stack. The procedure here is that all the data from the subject with the lowest value is collected first, whereby the order of the elements in it must not be changed. Then the elements of the next higher compartment are collected and added to the elements that have already been collected. This continues until all compartments have been emptied again.

These two phases are now repeated for each digit in the key, starting with the last digit (LSD ( least significant digit ) radix sort). The same number of compartments is required for each step. And in the last step, the first digit is used for splitting. After the last collection phase, the data are sorted in ascending order.

Alternatively, the digits of the key can also be processed from the most significant digit (MSD ) radix sort. Subdivisions must be created for each subject at each step. However, sub-subjects with fewer than two elements do not require any further subdivision.

Procedure (via counting sort)

Another approach is also possible which also requires two phases. This variant saves the management of variable lists and requires two arrays.

Counting phase

First, the filling levels of the compartments are determined by counting. The first array for local counting locations results in a histogram over the alphabet. Then the histogram is converted to its (left-hand) integral (the first element always being zero).

Collection phase

Now each data element is moved once to its final position using the histogram integral (final for this round). For this purpose, the respective counter must be incremented after each access to the histogram integral, since it serves as a pointer to the writing position. The second array is used to temporarily store the data elements and has the same memory requirements as the original data array.

Procedure (in-place)

Improve Comment : The functionality is incompletely explained. en wiki

Another variant without additional memory (in-place) consists only of the collection phase, since here each compartment is only 1 bit wide. There is no counting of the two bit counters, since the two bit values are filled simultaneously from the two boundaries of the array and the two writing positions converge. Since this process is not stable, it must be started with the most significant digit ( MSB ).

running time

The runtime of the algorithm can be estimated by (see Landau symbols ) , where the length of a key and the number of elements to be sorted denotes. Since is constant for primitive data types such as integers or floating point numbers , Radixsort has a runtime for these cases that increases linearly proportionally with the number of elements to be sorted, which makes it better than other, comparison-based sorting methods such as Quicksort . For variable-length data types such as lists or strings, however , Quicksort or Introsort may be useful . U. the better choice. However, the effort for each comparison of character strings is also linearly dependent on their (average) length, since a breakdown into the maximum register width (e.g. 64 bits) must also take place here. Strictly speaking, the effort for comparison-based sorting methods is also proportional to , whereby here is effectively smaller. ${\ displaystyle {\ mathcal {O}} (l \ cdot n)}$ ${\ displaystyle l}$ ${\ displaystyle n}$ ${\ displaystyle l}$ ${\ displaystyle {\ mathcal {O}} (l)}$ ${\ displaystyle l}$

Specialty

The radix sort can be implemented either stably (i.e. duplicate keys appear after sorting in the same order as in the original set) or in-place (i.e. no additional memory required).

example

The following numbers should be ordered:

124, 523, 483, 128, 923, 584

First, the last digit is sorted (LSD radix sort).

The partitioning phase begins:

|0| |1| |2| |3| |4| |5| |6| |7| |8| |9|
             |   |               |
            523 124             128
            483 584
            923

For the stability of the procedure , it is important that the order of the elements in the compartments |3|and is |4|observed.

This is followed by the collecting phase (collecting items from top to bottom, from left to right):

523, 483, 923, 124, 584, 128

Now the numbers are sorted according to the next digit (second digit from back to front).

Renewed partitioning phase:

|0| |1| |2| |3| |4| |5| |6| |7| |8| |9|
         |                       |
        523                     483
        923                     584
        124
        128

For the procedure to work, it is important that the displayed order of the elements in subjects |2|and |8|is followed. The same applies to the later steps.

Now a second collecting phase:

523, 923, 124, 128, 483, 584

And now it is sorted according to the first digit.

The final partitioning phase:

|0| |1| |2| |3| |4| |5| |6| |7| |8| |9|
     |           |   |               |
    124         483 523             923
    128             584

The last collecting phase follows:

124, 128, 483, 523, 584, 923

The numbers are now sorted in ascending order.

implementation

The following implementation in C ++ uses bit keys for partitioning. You iterate over all 32 bits of an unsigned integer uint32_tand check individually into which of the two compartments you want to partition.

#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

using namespace std;

template <typename ForwardIt>
void radix_sort(ForwardIt begin, ForwardIt end) {
    // Falls Container leer ist
    if (begin == end)
        return;

    // Partitionierung
    array<vector<uint32_t>, 2> partition;

    // Schleife über alle 32 Bits
    for (uint32_t bit = 0; bit < 32; ++bit) {
        // Bit ermitteln und im Segment abbilden
        for (ForwardIt iterator = begin; iterator != end; ++iterator)
            partition[(*iterator >> bit) & 1].push_back(*iterator);

        // Änderungen aus jedem Segment übernehmen
        copy(partition[0].begin(), partition[0].end(), begin);
        copy(partition[1].begin(), partition[1].end(),
            begin + partition[0].size());

        partition[0].clear();
        partition[1].clear();
    }
}

In another implementation, the largest element in the container is determined first. After that, analogous to the digit number system for base 10, all digit values are partitioned until the largest number has been reached. The algorithmic principle works on any basis greater than 1.

#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

using namespace std;

// Zweierpotenzen sind schneller (z.B. 256)
constexpr int BASE = 10;

template <typename ForwardIt>
void radix_sort(ForwardIt begin, ForwardIt end) {
    // Falls Container leer ist
    if (begin == end)
        return;

    // Partitionierung
    array<vector<uint32_t>, BASE> partition;
    uint32_t maximum = *max_element(begin, end);

    // Solange höchstwertigste Ziffer noch nicht erreicht
    for (uint32_t factor = BASE; factor <= maximum; factor *= BASE) {
        // Ziffer ermitteln und im Segment abbilden
        for (ForwardIt iterator = begin; iterator != end; ++iterator)
            partition[(*iterator / factor) % BASE].push_back(*iterator);

        ForwardIt copy = begin;

        // Änderungen aus jedem Segment übernehmen
        for (auto& segment: partition) {
            for (uint32_t element: segment) {
                *copy = element;
                ++copy;
            }

            segment.clear();
        }
    }
}

With the following mainfunction, both implementations can be used to sort containers such as arrays of unsigned integers that offer a forward iterator.

#include <cstdint>
#include <iostream>
#include <vector>

using namespace std;

int main() {
    vector<uint32_t> range = { 523, 923, 124, 128, 483, 584 };
    radix_sort(range.begin(), range.end());

    for (auto element: range)
        cout << element << endl;

    return 0;
}

literature

Donald Knuth . The Art of Computer Programming , Volume 3: Sorting and Searching , Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0 . Section 5.2.5: Sorting by Distribution, pp. 168-179.
Robert Sedgewick : Algorithms in C ++ , third edition, 1998, p. 424-427.
VJ Duvanenko: In-Place Hybrid Binary-Radix Sort , Dr. Dobb's Journal, October 1, 2009 .

Web links