External memory algorithm


An external memory algorithm (also called an out-of-core algorithm) is an algorithm that is optimized to efficiently process amounts of data that exceed the capacity of the available main memory. Access to mass storage media such as hard disks or network storage, however, is several orders of magnitude slower than ALU operations or accesses to higher levels of the storage hierarchy. The number of I/O operations on slow mass storage devices is therefore decisive for the performance of external memory algorithms.

Visualization of the parallel disk model: the internal memory holds M data elements, the external memory is unlimited. Data transfer between the two always takes place in blocks of size B.

Analysis in the parallel disk model (PDM)

The parallel disk model (PDM) is often used to analyze external memory algorithms. It models the most important properties of magnetic hard disks and of systems with multiple hard disks connected in parallel, while remaining simple enough for meaningful analysis. Only batch problems and single-processor systems are considered here; for online problems and systems with any number of processors, see Vitter (2006).

Definition

The model consists of an internal memory that holds M data elements and an unlimited external memory spread across D hard disks. The main performance indicator is the I/O complexity: the number of accesses to the external memory needed to solve a problem on N data elements. With each access to the external memory, a block of B data elements can be loaded into the internal memory from each of the D hard disks; analogously, blocks can be written from the internal to the external memory. Furthermore, M < N and 1 ≤ DB ≤ M/2 are assumed.
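A minimal Python sketch of these parameters and constraints may help fix the notation; the class name and the concrete values at the end are purely illustrative assumptions, not part of the model itself.

```python
from dataclasses import dataclass


@dataclass
class PDMParameters:
    """Illustrative container for the parameters of the parallel disk model."""
    N: int  # number of data elements in the problem instance
    M: int  # number of data elements that fit into the internal memory
    B: int  # number of data elements per block
    D: int  # number of hard disks

    def __post_init__(self):
        # The problem is assumed not to fit into internal memory ...
        assert self.M < self.N, "the PDM assumes M < N"
        # ... and one parallel I/O step moves at most half of the internal memory.
        assert 1 <= self.D * self.B <= self.M // 2, "the PDM assumes 1 <= D*B <= M/2"


# Hypothetical parameter values, chosen only for illustration
params = PDMParameters(N=10**9, M=10**6, B=10**3, D=4)
```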

Example of data striping: two files A and B distributed across the hard disks D1 to D3 using striping.

In the case of D hard disks, the input data of the problem must be distributed across the disks in a striped format for efficient processing (see the adjacent figure for an example), and the output of the algorithm should follow the same format. This allows N consecutive data elements to be written to or read from the external memory with the optimal number of Θ(N/(DB)) I/Os.
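As a sketch of the idea behind striping, the following hypothetical helper assigns consecutive blocks of a file round-robin to the D disks, so that any D consecutive blocks can be moved in one parallel I/O step; the function name and output format are assumptions made for this example.

```python
def striped_location(block_index: int, D: int) -> tuple[int, int]:
    """Return (disk, stripe) of a logical block under round-robin striping.

    Block i of a file is placed on disk i mod D in stripe i // D, so that any
    D consecutive blocks can be transferred with a single parallel I/O.
    """
    return block_index % D, block_index // D


# The first six blocks of a file striped over D = 3 disks (D1, D2, D3)
D = 3
for i in range(6):
    disk, stripe = striped_location(i, D)
    print(f"block {i} -> disk D{disk + 1}, stripe {stripe}")
```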

The formulas for the resulting number of I/Os can often be simplified if, instead of the numbers of data elements used above, the corresponding numbers of blocks are used. This yields the derived quantities n = N/B and m = M/B.

Fundamental operations

The I/O complexity of many algorithms is essentially determined by the performance of some fundamental operations:

  1. Scanning (also streaming or touching) – reading or writing N consecutive data elements
  2. Sorting of N data elements (comparison-based)

Bounds on the I/O complexity of these operations are given in the following table:

Bounds on the I/O complexity of fundamental operations

  Operation | I/O bound, D = 1                        | I/O bound, D ≥ 1
  Scan(N)   | Θ(N/B) = Θ(n)                           | Θ(N/(DB)) = Θ(n/D)
  Sort(N)   | Θ((N/B)·log_{M/B}(N/B)) = Θ(n·log_m n)  | Θ((N/(DB))·log_{M/B}(N/B)) = Θ((n/D)·log_m n)
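For illustration only, the bounds from the table can be evaluated for a set of hypothetical parameters; the concrete numbers below are assumptions chosen for this sketch, not values from the article.

```python
import math

# Hypothetical PDM parameters (assumptions, not from the article)
N, M, B, D = 10**9, 10**6, 10**3, 4

n, m = N // B, M // B               # derived quantities: input size and memory size in blocks
scan_ios = n // D                   # Scan(N) = Theta(N/(DB)) parallel I/Os
passes = math.log(n, m)             # log_{M/B}(N/B), the number of passes over the data
sort_ios = round(scan_ios * passes) # Sort(N) = Theta((N/(DB)) * log_{M/B}(N/B))

print(f"n = {n}, m = {m}")
print(f"Scan(N): about {scan_ios:,} parallel I/Os")
print(f"Sort(N): about {passes:.1f} passes, roughly {sort_ios:,} parallel I/Os")
```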

Examples

Merge Sort

External memory merge sort serves as a simple example of an I/O-optimal external memory sorting algorithm. The algorithm works in two phases.

The first phase, called run formation, receives an unsorted sequence of N elements in the external memory as input and produces a permutation of this sequence as output, partitioned into sorted subsequences of length M (the so-called runs). Each of these subsequences is generated by reading M/B blocks of the input into the internal memory, sorting them there, and then writing them back to the external memory.
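A minimal sketch of the run formation phase in Python is shown below. The record layout (one 64-bit integer per element), the value of M, and the file naming scheme are all assumptions made for this example, not part of the algorithm's description.

```python
import os
import struct

RECORD = struct.Struct("<q")   # assumed record layout: one 64-bit integer per element
M = 1_000_000                  # elements that fit into internal memory (hypothetical value)


def run_formation(input_path: str, run_dir: str) -> list[str]:
    """Split the input file into sorted runs of at most M elements each."""
    os.makedirs(run_dir, exist_ok=True)
    run_paths: list[str] = []
    with open(input_path, "rb") as f:
        while True:
            chunk = f.read(M * RECORD.size)              # read up to M elements (M/B blocks)
            if not chunk:
                break
            elements = [x for (x,) in RECORD.iter_unpack(chunk)]
            elements.sort()                              # sort one run entirely in internal memory
            run_path = os.path.join(run_dir, f"run_{len(run_paths)}.bin")
            with open(run_path, "wb") as out:            # write the sorted run back to external memory
                out.write(b"".join(RECORD.pack(x) for x in elements))
            run_paths.append(run_path)
    return run_paths
```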

In the second phase of the algorithm, the existing runs are merged recursively until only one completely sorted run remains. For this purpose, on each recursion level about M/B runs at a time are streamed block by block through the internal memory and merged into one sorted run. All elements are read and written once per level, which corresponds to Θ(N/(DB)) I/Os. After Θ(log_{M/B}(N/M)) merge phases only one sorted run is left, the result.
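A sketch of a single merge pass follows, using Python's heapq.merge as the multi-way merge; the fixed read size stands in for the block-wise streaming described above, and the record layout matches the assumption made in the run formation sketch. A complete implementation would repeat this pass, merging at most about M/B runs at a time, until only a single run remains.

```python
import heapq
import struct

RECORD = struct.Struct("<q")   # assumed record layout, matching the run formation sketch


def stream_run(path: str):
    """Stream the elements of one sorted run from external memory."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(64 * 1024)        # stand-in for reading one block at a time
            if not chunk:
                return
            yield from (x for (x,) in RECORD.iter_unpack(chunk))


def merge_runs(run_paths: list[str], output_path: str) -> None:
    """Merge several sorted runs into a single sorted run."""
    streams = [stream_run(p) for p in run_paths]
    with open(output_path, "wb") as out:
        for x in heapq.merge(*streams):      # multi-way merge of the streamed runs
            out.write(RECORD.pack(x))
```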

Overall, the algorithm needs O((N/(DB))·log_{M/B}(N/B)) I/Os, which matches the Sort(N) bound from the table above; it is therefore I/O-optimal.

Motivation

Typically, the runtime of algorithms is analyzed in models of computation without a memory hierarchy. In these models, a memory access incurs the same constant cost as the execution of an arithmetic instruction. Furthermore, the cost of a memory access is independent of the address that is accessed and of previous accesses.

These assumptions are simplifications and do not correspond to reality. On the one hand, access times between two levels of the memory hierarchy typically differ by orders of magnitude. On the other hand, caches and the way hard disks work mean that accessing several consecutive addresses is usually faster than accessing random addresses (see also locality of reference).

The difference in access times between main memory and mass storage is particularly large (see also memory hierarchy). This also applies to SSDs used as mass storage.

Application

There are various libraries and tools for implementing external memory algorithms. A comprehensive overview can be found in Vitter (2006), starting on page 141.

History

According to Vitter, work on external memory algorithms began at Stanford as early as 1956 with H. B. Demuth's dissertation Electronic Data Sorting. Donald Knuth also dealt extensively with sorting data on magnetic tapes and, in part, on hard disks in his monograph The Art of Computer Programming, Volume 3: Sorting and Searching, published in 1973. Around the same time, Robert W. Floyd presented, in his work Permuting Information in Idealized Two-Level Storage, a model with great similarity to the PDM. In 1988, Aggarwal and Vitter extended Floyd's model in The Input/Output Complexity of Sorting and Related Problems to include the possibility of parallel block transfers. In 1994, Vitter and Shriver introduced the parallel disk model, a more realistic version of Aggarwal and Vitter's model.

References

  1. Jeffrey Scott Vitter: Algorithms and Data Structures for External Memory. In: Foundations and Trends in Theoretical Computer Science. Volume 2, No. 4, 2006, ISSN 1551-305X, pp. 305–474, doi:10.1561/0400000014.
  2. Deepak Ajwani, Itay Malinger, Ulrich Meyer, Sivan Toledo: Characterizing the Performance of Flash Memory Storage Devices and Its Impact on Algorithm Design. In: Experimental Algorithms. Volume 5038. Springer, Berlin, Heidelberg 2008, ISBN 978-3-540-68548-7, pp. 208–219, doi:10.1007/978-3-540-68552-4_16.
  3. Howard B. Demuth: Electronic Data Sorting. Department of Electrical Engineering, Stanford University, 1956, OCLC 25124024.
  4. Donald Ervin Knuth: The Art of Computer Programming. Vol. 3: Sorting and Searching. Addison-Wesley, 1973, ISBN 0-201-89685-0.
  5. Robert W. Floyd: Permuting Information in Idealized Two-Level Storage. In: Complexity of Computer Computations. Springer US, Boston, MA 1972, ISBN 1-4684-2003-8, pp. 105–109, doi:10.1007/978-1-4684-2001-2_10.
  6. Alok Aggarwal, Jeffrey S. Vitter: The Input/Output Complexity of Sorting and Related Problems. In: Communications of the ACM. Volume 31, No. 9, August 1, 1988, pp. 1116–1127, doi:10.1145/48529.48535.
  7. J. S. Vitter, E. A. M. Shriver: Algorithms for Parallel Memory, I: Two-Level Memories. In: Algorithmica. Volume 12, No. 2–3, 1994, ISSN 0178-4617, pp. 110–147, doi:10.1007/BF01185207.