Counting location

Counting sort (of English count "count") is a stable sorting process , which a given sequence of elements with linear time ( problem complexity sorted) when the sort key natural numbers from a limited interval of possible values are (or can be mapped to it). The algorithm does not work based on comparison , but counts the occurrences of the key values - it then works based on addresses. Compared to comparative sorting with the best possible complexity , there is an advantage if the interval length is very small compared to the number of elements to be sorted . ${\ displaystyle n}$ ${\ displaystyle \ textstyle O (n + k)}$ ${\ displaystyle k}$ ${\ displaystyle \ textstyle O (n \ log n)}$ ${\ displaystyle k}$ ${\ displaystyle n}$

Well-suited example: Sorting all residents of Bavaria (n = 12.85 million) according to their age (in completed years: k = 0..150).

Bad / impossible to implement example: Sorting the pupils of a primary school (n = 300) according to their surname (character string with a maximum length of 40 from all possible letters, umlauts and special characters; k ≈ 35 ⁴⁰ ).

algorithm

Countingsort assumes that the key values of the input to be sorted are in a limited interval (or can simply be mapped to it). The algorithm counts how often each of these values occurs in the input. It saves these numbers in an additional array with fields. With the help of this array, the target position in the output is then calculated for each key value. ${\ displaystyle \ {0, \ ldots, (k-1) \}}$ ${\ displaystyle k}$

Input: Array with for all and the right interval limit . ${\ displaystyle A [1..n]}$ ${\ displaystyle A [j] \ in \ {0, \ ldots, (k-1) \}}$ ${\ displaystyle j = 1, \ ldots, n}$ ${\ displaystyle (k-1)}$

Output: Field with contents of in sorted order. ${\ displaystyle B [1..n]}$ ${\ displaystyle A}$

Additional memory: During the sorting, the algorithm needs an auxiliary array . ${\ displaystyle C [0 .. (k-1)]}$

Specification of the algorithm in pseudocode :

countingsort(A,k)
{
  C = array(0,k-1)

  // Initialisiere das Array C mit Nullen
  for (m=0; m<k; m++)
    C[m] = 0
  // end for

  // Schreibe in C[m], wie häufig der Wert m in A vorkommt
  for (j=1; j<=A.size; j++)
    C[key(A[j])] += 1
  // end for

  // Adressrechnung
  for (m=1; m<k; m++)
    C[m] += C[m-1]
  // end for

  B = array(1, A.size)

  // Kopiere auf jeweilige Zieladresse in B (mit Dekrementierung)
  for (j=A.size; j>0; j--) {
    B[ C[ key(A[j]) ] ] = A[j]
    C[key(A[j])] -= 1
  }

  return B
}

The key function returns the sort key of the array element of the input. If an array element only consists of the sort key and does not contain any other components, then is . The auxiliary array is initialized in the first for loop . The second for loop iterates over the input and increments the assigned array element for each sort key . thus contains the number of times the key value in occurs. In the third for loop the counters are accumulated in, so that after the end of the loop the last position for an element with a key value in the output array is designated. This means that it no longer contains the “number of 'm elements' in A”, but the “address of the 'm section' for B” (but the end address, not the start address). The last for loop runs through the input from right to left (due to the end addresses in ), and sorts the respective entry: If the loop is in iteration , then has the key value (= ). This must therefore be sorted at address . The target address is then decremented so that the next element in , which has the key , is sorted one position ahead. As a result of the jump-free iteration, the relative order of several elements of the input with the same sort key is not changed in the output either. H. Countingsort sorts stable . ${\ displaystyle {\ text {key}} (x) = x}$
${\ displaystyle C}$
${\ displaystyle C [{\ text {key}} (A [j])]}$ ${\ displaystyle C [m]}$ ${\ displaystyle m}$ ${\ displaystyle A}$
${\ displaystyle C}$ ${\ displaystyle C [x]}$ ${\ displaystyle x}$ ${\ displaystyle C [m]}$
${\ displaystyle C}$ ${\ displaystyle j}$ ${\ displaystyle A [j]}$ ${\ displaystyle m_ {j}}$ ${\ displaystyle {\ text {key}} (A [j])}$ ${\ displaystyle C [m_ {j}]}$ ${\ displaystyle C [m_ {j}]}$ ${\ displaystyle A}$ ${\ displaystyle m_ {j}}$

Simplified algorithm

If the input is a simple number array, then Countingsort can be represented without an accumulation phase:

countingsort(A, k)
{
  C = array(0, (k-1))
  for (m=0; m<k; m=m+1)
    C[m] = 0
  // end for

  for (j=1; j<=A.size; j=j+1)
    C[A[j]] = C[A[j]] + 1
  // end for

  i=0
  B = array(1, A.size)
  for (m=0; m<k; m=m+1) {
    while (C[m]>0) {
      B[i] = m
      C[m] = C[m]-1
      i = i+1
    }
  }

  return B
}

This makes use of the fact that it is then not necessary to copy the contents from A to B - it is sufficient to fill the sections of B with as many integers as is noted in the respective . ${\ displaystyle m}$ ${\ displaystyle m}$ ${\ displaystyle C [m]}$

Function example

Execution of counting sort on an input field with elements from with auxiliary field and sorted output in field . ${\ displaystyle A [1..8]}$ ${\ displaystyle \ {0, \ ldots, 5 \}}$ ${\ displaystyle C}$ ${\ displaystyle B}$

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
0	0	0	0	0	0

1	2	3	4th	5	6th	7th	8th

Representation one below the other: output array , auxiliary array , the length of which depends on the definition range of the array. The elements are inserted sorted into the bottom array. The above figure shows the given sequence of numbers, whereby the first loop of the algorithm has already been processed by only initializing the array with 0. The second loop increments its position in the array by one for each digit. ${\ displaystyle A}$ ${\ displaystyle C}$
${\ displaystyle C}$

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
2	0	2	3	0	1

1	2	3	4th	5	6th	7th	8th

The third loop sums up the array so that its content indicates the position up to which a value appears in the sorted array. Two identical consecutive numbers mean that the last of the two numbers in the sequence does not appear at all, i.e. it was previously a 0 in this position. In Array So now stand instead of numbers , as often contain a value in the array A, instead, end addresses (or end-indices ): ${\ displaystyle C}$ ${\ displaystyle C}$ ${\ displaystyle C}$

The last value '5' from array A must be at index 8 in target array B;
the value '4' did not appear in array A and therefore has the same end address as the value '3';
the last '3' from array A must go to index 7 in target array B - then target address C ['3'] is counted down by 1 so that the next '3' from array A receives target index 6 and thus to B [ 6] is copied;
the last '2' from array A must be at index 4 (ie B [4]), further '2's from A must be before index 4;
'1' does not appear in A, so it has the same “last destination address” as the value '0';
the last '0' from array A has the destination address C ['0'], i.e. 2. As soon as it has been sorted into array B (i.e. after B [2]), C ['0'] must be counted down by 1, so that the next '0' found in array A is copied to index C ['0'] = 1, ie into field B [1].

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
2	2	4th	7th	7th	8th

1	2	3	4th	5	6th	7th	8th

In the last loop, the values from are successively transferred to the array , precisely at the point in the target array that the auxiliary array specifies for the corresponding number. Before the loop, this is always the last place where the number will appear. After each number has been transferred, the value in is also decremented. The next identical number is therefore inserted one position earlier in the target array. The following are the 8 steps. ${\ displaystyle A}$ ${\ displaystyle B}$ ${\ displaystyle C}$ ${\ displaystyle C [number]}$

The first 7 steps in detail

Step 1

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
2	2	4th	6th	7th	8th

1	2	3	4th	5	6th	7th	8th
						3

2nd step

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
1	2	4th	6th	7th	8th

1	2	3	4th	5	6th	7th	8th
	0					3

3rd step

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
1	2	4th	5	7th	8th

1	2	3	4th	5	6th	7th	8th
	0				3	3

4th step

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
1	2	3	5	7th	8th

1	2	3	4th	5	6th	7th	8th
	0		2		3	3

5th step

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
0	2	3	5	7th	8th

1	2	3	4th	5	6th	7th	8th
0	0		2		3	3

6th step

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
0	2	3	4th	7th	8th

1	2	3	4th	5	6th	7th	8th
0	0		2	3	3	3

7th step

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
0	2	3	4th	7th	7th

1	2	3	4th	5	6th	7th	8th
0	0		2	3	3	3	5

8th step

1	2	3	4th	5	6th	7th	8th
2	5	3	0	2	3	0	3

0	1	2	3	4th	5
0	2	2	4th	7th	7th

1	2	3	4th	5	6th	7th	8th
0	0	2	2	3	3	3	5

complexity

Runtime analysis

The runtime of the function depends on (number of elements in the input array) and (the size of the number interval). The for loops are run through either times or times. The time complexity of Countingsort is thus . ${\ displaystyle n}$ ${\ displaystyle k}$ ${\ displaystyle n}$ ${\ displaystyle k}$ ${\ displaystyle {\ mathcal {O}} (n + k)}$

Storage space requirements

In addition to the input and output, which each require memory fields ( out-of-place ), a temporary array is created to store the frequencies of the numerical values. This requires elements of storage space. The space complexity of Countingsort is therefore in . ${\ displaystyle n}$ ${\ displaystyle k}$ ${\ displaystyle {\ mathcal {O}} (n + k)}$

literature

Thomas H. Cormen, Charles E. Leiserson , Ronald L. Rivest , Clifford Stein: Introduction to Algorithms . 2nd Edition. MIT Press, Cambridge (Massachusetts) 2001, ISBN 0-262-03293-7 , pp. 168-170 .
Donald E. Knuth : The Art of Computer Programming : Volume 3 Sorting and Searching . 2nd Edition. Addison-Wesley, Reading MA 1997, ISBN 0-201-89685-0 , pp. 78 .
W. Feurweise: Algorithm 23: Math Sort . In: Communications of the ACM . tape 3 , no. 11 , 1960, pp. 601 .
HH Seward: MIT Digital Computer Laboratory Report R-232 . 1954, p. 25–28 (master's thesis).

Web links

Wikibooks: Countingsort - implementations in the algorithm collection