Prefix sum

In computer science , the prefix sum of a sequence of numbers a ₀ , a ₁ , a ₂ , ... is the number sequence s ₀ , s ₁ , s ₂ , ... of its partial sums :

s ₀ = a ₀

s ₁ = a ₀ + a ₁

s ₂ = a ₀ + a ₁ + a ₂

...

For example, the prefix sum of the natural numbers is the sequence of the triangular numbers :

Input sequence	1	2	3	4th	5	6th	...
Prefix sum	1	3	6th	10	15th	21st	...

The prefix sum can be calculated sequentially with a simple loop by using the formula

s _i = s _{i −1} + a _i

for i > 0 each sum value is added up successively. The prefix sum forms the basis for algorithms such as the counting sort . You can take a general summation as basic operation associative binary operation use, which, for example, for polynomial interpolation or to manipulate strings used and in the functional programming on higher-order functions can be applied; in this case it is also called a scan . Since the prefix sum can also be calculated efficiently on multi-core processor systems and computer clusters with the fork-join model , it plays an important theoretical and practical role when considering parallel algorithms , both as a test problem to be solved and as a subroutine of important parallel algorithms.

Mathematically, the calculation of the prefix sum can be generalized from finite to infinite sequences. It then represents a series . The prefix summation is a linear operator on a vector space of finite or infinite sequences. Its inverse is a difference operator .

Prefix operations ( scans ) of higher order functions

The actual prefix sum is based on the binary operation of addition . In functional programming , the prefix sum can be generalized to any binary associative operation, i.e. it can represent a prefix operation instead of a prefix sum . (However, the term “prefix sum” is often used for this.) The prefix operation resulting from this generalization is a higher order function and is also called a scan . For example, the sequence of faculties ( a _k ) can be calculated as the prefix product by replacing the addition with the multiplication:

Input sequence	1	2	3	4th	5	6th	...
Prefix product	1	2	6th	24	120	720	...

The scan operation is therefore similar to the reduce operation , however, Scan returns the entire sequence of partial operations, while Reduce only returns the value of the last partial operation as the result.

There are two flavors of scan in Haskell , namely and . The procedural MPI libraries offer an operation to compute a scan operation between networked processor units. The C ++ programming language has a function in its standard library which, despite its name, performs a scan operation with any binary operation. The Java programming language (from version 1.8) offers a method in several variants in the standard package , which expects a binary operator in addition to the array to be processed. The prefix sum of an array is calculated using the following statement and stored in the input array : scanlscanl1MPI_Scanpartial_sumjava.util.ArraysparallelPrefixa[k]

Arrays.parallelPrefix(a, (x,y) -> x + y);

The binary operator is specified here as a lambda expression , i.e. an anonymous function with two parameters. According to the example above, the factorial can be used as a prefix product

Arrays.parallelPrefix(a, (x,y) -> x * y);

programmed.

Parallel calculation

Sequence of calculation steps for a parallel prefix sum with 16 input entries

Using the fork-join model, a prefix sum can be efficiently calculated in parallel using the following steps.

Calculate the sums of the successive pair entries in which the first entry has an even index: z ₀ = a ₀ + a ₁ , z ₁ = a ₂ + a ₃ ,….
Recursively compute the prefix sum w ₀ , w ₁ , w ₂ ,… of the sequence z ₀ , z ₁ , z ₂ ,….
Express each term of the result sequence s ₀ , s ₁ , s ₂ ,… as the sum of at most two terms of this intermediate sequence: y ₀ = a ₀ , y ₁ = z ₀ , y ₂ = z ₀ + a ₂ , y ₃ = w ₀ etc. After the first value, each successive number y _{i is} either a copy of the value with half the index in the sequence w , or it is the previous value plus a value in the original sequence a .

If the input sequence has n entries, the recursion continues to a depth of O (log n ), which also represents the limitation of the parallel running time. The number of steps of the algorithm is O ( n ) and can be implemented on a PRAM with O ( n / log n ) processors by simply assigning several indices to a processor in rounds in which there are more entries than processors.

Parallel algorithms for prefix sums can be generalized to other prefix operations (scans) of associative binary operations . They run efficiently on parallel hardware such as multi-core processors , GPUs or computer clusters . Many parallel implementations use two passes to do this: the first pass calculates the partial prefix sums on each processor unit; the prefix sum of these partial sums is then calculated and sent back for the second pass to the individual processor units, which then continue to calculate with the known prefix as the initial value. Asymptotically approximated, this method requires about two read and one write operations per entry.

Applications

Countingsort is an integer sequence sorting algorithm that uses the prefix sum of a histogram of key frequencies to determine the position of each key in the sorted output sequence . The algorithm has linear time complexity O (n) for integer key values that are less than the number of entries, and also linear memory complexity . Countingsort, in turn, can be one of two possible subroutines for the radix sort .

List ranking is the problem of transforming a linked list into an array with the same sequence of elements. It can be done by calculating a prefix sum of the sequence 1, 1, 1, ... and assigning each list element to its prefix sum value as an array index. By combining list ranking, prefix sums and Euler circles , many important problems in tree structures can be efficiently solved using parallel algorithms.

Parallel prefix sum algorithms can be used for the design of adding units, i.e. Boolean switching networks that add two n -digit binary numbers . Here, a sequence of carry bits can be calculated as a prefix operation of the conjunction of the sequence of bit pairs; each bit of the result number is then an XOR of the two input bits and the associated carry bit . This enables a parallel adder for n -digit binary numbers with O ( n ) logic gates and O (log n ) calculation steps to be implemented. ${\ displaystyle f (x, y) = x \ wedge y}$

The Gray code of an entire binary number n , simply by calculating the XOR of n and n / 2 (i.e., the right shift of n by one bit) are formed. The inverse operation, i.e. the decoding of a Gray code x into a binary number, can be expressed as the prefix sum of the bits of x modulo 2, i.e. as a prefix operation with the associative XOR function .

{\ displaystyle f (x_ {1}, x_ {2}) = x_ {1} \ oplus x_ {2}}

Parallel prefix products (with multiplication as an associative operation) can be used as a component of fast parallel algorithms for polynomial interpolation . In particular, they can be used to calculate coefficients according to the divided difference scheme in Newton's algorithm . Similarly, Hermite interpolation or interpolation of functions can be efficiently calculated in parallel using the Vandermonde matrices .

Web links

Eric W. Weisstein : Cumulative Sum . In: MathWorld (English).

Individual evidence

^ ^A ^b Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein: 8.2 Counting Sort . In: Introduction to Algorithms , 2nd Edition, MIT Press, McGraw-Hill, 2001, ISBN 0-262-03293-7 , pp. 168-170.
↑ Guy Blelloch: Prefix Sums and Their Applications (Lecture Notes) . Carnegie Mellon University, 2011.
^ Paul Callahan, S. Rao Kosaraju: A Decomposition of Multi-Dimensional Point Sets with Applications to k-Nearest-Neighbors and n-Body Potential Fields . In: Journal of the ACM . 42, No. 1, 1995.
↑ ^a ^b A. de Vries (2014): Functional programming and streams in Java (PDF) Introductory lecture script for business informatics, FH Südwestfalen Hagen , § 3
↑ ^a ^b ^c ^d ^e R. E. Ladner, MJ Fischer: Parallel Prefix Computation . In: Journal of the ACM . 27, No. 4, 1980, pp. 831-838. doi : 10.1145 / 322217.322232 .
^ ^A ^b ^c Robert E. Tarjan, Uzi Vishkin: An efficient parallel biconnectivity algorithm . In: SIAM Journal on Computing . 14, No. 4, 1985, pp. 862-874. doi : 10.1137 / 0214061 .
↑ S. Lakshmivarahan, SK Dhall: Parallelism in the prefix problem . Oxford University Press , 1994, ISBN 0-19-508849-2 .
↑ Standard Prelude . In: Simon Peyton Jones (Ed.): Haskell 98 Language and Libraries: The Revised Report 2002.
↑ Java SE 8 API (2014)
↑ ^a ^b Yu. Ofman : Об алгоритмической сложности дискретных функций . In: Doklady Akademii Nauk SSSR . 145, No. 1, 1962, pp. 48-51.
↑ ^a ^b V. M. Khrapchenko: Asymptotic Estimation of Addition Time of a Parallel Adder . In: Problemy Kibernet. . 19, 1967, pp. 107-122.
↑ Shubhabrata Sengupta, Mark Harris, Yao Zhang, John D. Owens: Scan primitives for GPU computing . In: Proceedings of the 22nd ACM SIGGRAPH / EUROGRAPHICS Symposium on Graphics Hardware 2007, pp. 97-106.
^ Henry S. Warren: Hacker's Delight . Addison-Wesley, 2003, ISBN 978-0-201-91465-8 , p. 236.
↑ O. Eğecioğlu, E. Gallopoulos, C. Koç: A parallel method for fast and practical high-order Newton interpolation . In: BIT Computer Science and Numerical Mathematics . 30, No. 2, 1990, pp. 268-288. doi : 10.1007 / BF02017348 .
↑ O. Eğecioğlu, E. Gallopoulos, C. Koç: Fast computation of divided differences and parallel interpolation Hermite . In: Journal of Complexity . 5, 1989, pp. 417-437. doi : 10.1016 / 0885-064X (89) 90018-6 .

[clrs-1] A ^b Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein: 8.2 Counting Sort . In: Introduction to Algorithms , 2nd Edition, MIT Press, McGraw-Hill, 2001, ISBN 0-262-03293-7 , pp. 168-170.

[blelloch11-2] Guy Blelloch: Prefix Sums and Their Applications (Lecture Notes) . Carnegie Mellon University, 2011.

[callahan1995-3] Paul Callahan, S. Rao Kosaraju: A Decomposition of Multi-Dimensional Point Sets with Applications to k-Nearest-Neighbors and n-Body Potential Fields . In: Journal of the ACM . 42, No. 1, 1995.

[fork-join-4] A. de Vries (2014): Functional programming and streams in Java (PDF) Introductory lecture script for business informatics, FH Südwestfalen Hagen , § 3

[lf80-5] R. E. Ladner, MJ Fischer: Parallel Prefix Computation . In: Journal of the ACM . 27, No. 4, 1980, pp. 831-838. doi : 10.1145 / 322217.322232 .

[tv85-6] A ^b ^c Robert E. Tarjan, Uzi Vishkin: An efficient parallel biconnectivity algorithm . In: SIAM Journal on Computing . 14, No. 4, 1985, pp. 862-874. doi : 10.1137 / 0214061 .

[Prefix.book.92-7] S. Lakshmivarahan, SK Dhall: Parallelism in the prefix problem . Oxford University Press , 1994, ISBN 0-19-508849-2 .

[8] Standard Prelude . In: Simon Peyton Jones (Ed.): Haskell 98 Language and Libraries: The Revised Report 2002.

[java-9] Java SE 8 API (2014)

[offman-10] Yu. Ofman : Об алгоритмической сложности дискретных функций . In: Doklady Akademii Nauk SSSR . 145, No. 1, 1962, pp. 48-51.

[krapchenko-11] V. M. Khrapchenko: Asymptotic Estimation of Addition Time of a Parallel Adder . In: Problemy Kibernet. . 19, 1967, pp. 107-122.

[12] Shubhabrata Sengupta, Mark Harris, Yao Zhang, John D. Owens: Scan primitives for GPU computing . In: Proceedings of the 22nd ACM SIGGRAPH / EUROGRAPHICS Symposium on Graphics Hardware 2007, pp. 97-106.

[13] Henry S. Warren: Hacker's Delight . Addison-Wesley, 2003, ISBN 978-0-201-91465-8 , p. 236.

[14] O. Eğecioğlu, E. Gallopoulos, C. Koç: A parallel method for fast and practical high-order Newton interpolation . In: BIT Computer Science and Numerical Mathematics . 30, No. 2, 1990, pp. 268-288. doi : 10.1007 / BF02017348 .

[15] O. Eğecioğlu, E. Gallopoulos, C. Koç: Fast computation of divided differences and parallel interpolation Hermite . In: Journal of Complexity . 5, 1989, pp. 417-437. doi : 10.1016 / 0885-064X (89) 90018-6 .