External memory minimum spanning tree

from Wikipedia, the free encyclopedia

In computer science, an external minimum spanning tree denotes a minimum spanning tree (MST) computed for a graph that resides in secondary memory. This is necessary for graphs whose node and edge sets are too large to fit into main memory at the same time. A crawl of the social network Twitter from 2010 serves as an application example: if the edge weights are neglected and each edge is stored as a pair of node indices, the graph occupies about 9.55 gigabytes. Assuming that a desktop PC typically has 8 GB of main memory, this is not sufficient to compute a minimum spanning tree entirely internally.

The external memory model is used as the model of computation for external algorithms. It consists of a main memory of size M and a secondary memory; data blocks of size B can be transferred between the two. Reading (writing) a block from (to) secondary memory into (from) main memory is called an I/O (input/output). The number of I/Os is the key quantity in this model and is to be minimized. In contrast to models such as the random access machine, the focus here is not on asymptotic runtime but on the number of I/Os: due to the significantly higher access time of secondary memory compared to main memory, the I/Os often become the bottleneck of the system.
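As a toy illustration of counting I/Os rather than runtime, the following Python sketch (not from the article) counts block transfers for one linear scan and for the sorting bound sort(N) = (N/B) · log_{M/B}(N/B); the concrete item counts are made up for illustration.

```python
import math

def scan_ios(n, b):
    """I/Os for one linear scan over n items in blocks of b items."""
    return math.ceil(n / b)

def sort_ios(n, m, b):
    """The sorting bound sort(n) ~ (n/b) * log_{m/b}(n/b) block transfers.
    The logarithm is computed exactly by repeated multiplication to avoid
    floating-point rounding."""
    blocks = math.ceil(n / b)
    fan_in = m // b          # merge fan-in of an external merge sort
    passes = 0
    covered = 1              # run length (in blocks) after `passes` passes
    while covered < blocks:
        covered *= fan_in
        passes += 1
    return blocks * max(1, passes)

# toy numbers: 10^9 items, main memory holds 10^6 items, 10^3 items per block
print(scan_ios(10**9, 10**3))          # one scan: 1000000 block transfers
print(sort_ios(10**9, 10**6, 10**3))   # sorting bound: 2000000 block transfers
```

Note that the scan touches each block once, while sorting needs a small number of full passes; both are far below the 10^9 accesses a RAM-model analysis would count.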

Semi-external calculation

A restriction of the definition from the introduction are semi-external algorithms for computing minimum spanning trees. These algorithms process undirected graphs of which only the nodes fit into main memory. Formally, this is given by the relation

c · M ≥ |V|

where c ≥ 1 is a constant.

Kruskal's algorithm is easy to implement as a semi-external variant. First, the edges are sorted externally by increasing weight, which can be achieved with O(sort(|E|)) I/Os (the sorting barrier), where sort(N) = (N/B) · log_{M/B}(N/B). The union-find structure can then be managed entirely in main memory, resulting in a total of O(sort(|E|)) I/Os.
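A minimal sketch of the semi-external Kruskal variant, assuming the edges arrive as a stream already sorted by weight (in practice produced by an external merge sort); only the union-find structure over the nodes is held in RAM. All names are illustrative, not from a real library.

```python
def semi_external_kruskal(num_nodes, sorted_edges):
    """Kruskal where only the union-find over the nodes is kept in RAM.

    `sorted_edges` is an iterator of (weight, u, v) tuples in increasing
    weight order; it is consumed as a stream, never held in memory at once.
    """
    parent = list(range(num_nodes))  # union-find fits in main memory

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted_edges:
        ru, rv = find(u), find(v)
        if ru != rv:                 # edge connects two components
            parent[ru] = rv
            mst.append((u, v, w))
            if len(mst) == num_nodes - 1:
                break                # MST complete, stop reading the stream
    return mst

edges = sorted([(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3)])  # (weight, u, v)
print(semi_external_kruskal(4, iter(edges)))  # → [(1, 2, 1), (2, 3, 2), (0, 2, 3)]
```

The early break matters in the external setting: once |V| − 1 edges are chosen, the rest of the sorted edge stream need not be read at all.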

External calculation

If the set of nodes is also too large for main memory, a minimum spanning tree can be computed using fully external algorithms.

Prim's algorithm

Prim's algorithm can be executed as an external algorithm using an external priority queue. The queue stores edges of the graph, with their weights as priorities. The algorithm iteratively selects the edge of lowest weight and adds it to the spanning tree built so far; the lightest edge is obtained by an extractMin() operation on the queue. The algorithm is described in pseudocode below (note that each undirected edge of G is treated as two directed edges).

c: weight function on the edges
s ∈ V(G): arbitrary start node
Adj: adjacent nodes of a node
Inc: incident edges of a node
external_prim(G,c,s):
01  T ← ({s}, ∅)  // T = (V,E): nodes and edges of the MST to be computed
02  Q ← empty external priority queue
03  for all e ∈ Inc(s)
04      Q.insert(e, c(e))
05  while Q is not empty
06      (u,v) ← Q.extractMin()
07      if v is not part of T
08          V(T) ← V(T) ∪ {v}
09          E(T) ← E(T) ∪ {(u,v)}
10          for all e ∈ Inc(v) \ {(v,u)}  // (u,v) has already been added
11              Q.insert(e, c(e))
12  return T

It is assumed that the edge weights are pairwise distinct. This makes it easy to check in line 7 whether the node v is already contained in the spanning tree: one only has to test whether the next edge of minimal weight equals (v,u) (the reversal of the edge (u,v) from line 6), which requires just one additional extractMin() operation on the queue Q. This works because every node inserts its incident edges into Q; if v is already part of the spanning tree, the edge (v,u) is therefore also in Q. Moreover, this edge must be the immediate successor of (u,v) in the queue, since by assumption no two edges have the same weight.
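The membership test via the immediate successor can be sketched in Python, with the standard heapq module standing in for the external priority queue (an assumption of this sketch; a real implementation would use an external queue):

```python
import heapq

def external_prim(adj, s):
    """Prim's algorithm phrased around a priority queue of directed edges.

    `adj` maps each node to a list of (neighbour, weight) pairs; all edge
    weights are assumed pairwise distinct.  The test "is v already in T?"
    is done exactly as in the text: by comparing against the immediate
    successor in the queue, without any in-memory membership set.
    """
    q = []
    for v, w in adj[s]:
        heapq.heappush(q, (w, s, v))
    mst = []
    while q:
        w, u, v = heapq.heappop(q)
        # if v is already in the tree, the reversed edge (v, u) with the
        # same weight must be the immediate successor in the queue
        if q and q[0] == (w, v, u):
            heapq.heappop(q)     # discard the duplicate and skip the edge
            continue
        mst.append((u, v, w))
        for x, wx in adj[v]:
            if x != u:           # (u, v) has already been added
                heapq.heappush(q, (wx, v, x))
    return mst

adj = {0: [(1, 1), (2, 4)], 1: [(0, 1), (2, 2)], 2: [(0, 4), (1, 2)]}
print(external_prim(adj, 0))  # → [(0, 1, 1), (1, 2, 2)]
```

Because weights are distinct, the two directed copies of an edge whose endpoints are both in T are always adjacent in the queue order, so a single extra pop suffices.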

The overall I/O complexity is O(|V| + sort(|E|)) I/Os. First, the adjacent nodes of each node are read out, which requires O(|V| + |E|/B) I/Os when the graph is stored as adjacency lists. The algorithm further processes all edges of G and performs at most two operations on the queue per edge. Assuming that the queue supports operations with amortized O((1/B) · log_{M/B}(|E|/B)) I/Os, the overall complexity is O(|V| + sort(|E|)) I/Os.

Edge reduction

Another approach is to iteratively identify edges of the minimum spanning tree so that they can be removed from the graph. This is continued until the resulting graph is sufficiently dense; once it is, the complexity of computing the minimum spanning tree on the remaining graph is no longer significant.

(Figure: example of an edge contraction.)

Specifically, one wants to reduce the number of nodes to O(|E|/B) in order to then run Prim's algorithm on the resulting graph, whose complexity thereby simplifies to O(sort(|E|)) I/Os.

The reduction comes from Borůvka's algorithm, which is why the following sequence of steps is also called a Borůvka phase: for each node, the lightest incident edge is selected; these edges must be part of the minimum spanning tree. Then each such edge is contracted, so that the number of nodes is at least halved. During contraction new edges can arise, and each of them has to store a reference to its original edge. The latter is necessary in order to construct, at the end, a minimum spanning tree of the original graph.
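A single Borůvka phase can be sketched as follows. This in-memory Python version only mirrors the described steps (the external version realizes the same selection, contraction, and relabeling with sorting and scanning passes); all names are illustrative.

```python
def boruvka_phase(num_nodes, edges):
    """One Boruvka phase: select the lightest incident edge per node, then
    contract those edges.  `edges` is a list of (w, u, v) with pairwise
    distinct weights w.  Returns the selected MST edges and the contracted
    edge list, where each surviving edge keeps a reference to the original
    edge it came from.
    """
    # lightest incident edge per node
    lightest = {}
    for w, u, v in edges:
        for x in (u, v):
            if x not in lightest or w < lightest[x][0]:
                lightest[x] = (w, u, v)

    # union-find to merge the endpoints of all selected edges
    parent = list(range(num_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    selected = set(lightest.values())   # edges that enter the MST
    for w, u, v in selected:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    # relabel surviving edges to the contracted node ids, dropping
    # self-loops; each keeps its original endpoints as a back-reference
    contracted = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            contracted.append((w, ru, rv, (u, v)))
    return list(selected), contracted

sel, rest = boruvka_phase(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3)])
print(sorted(sel), rest)  # → [(1, 0, 1), (2, 1, 2), (3, 2, 3)] []
```

In the toy graph a single phase already contracts everything into one node, so all surviving edges become self-loops; in general the back-references in `contracted` are what later allow mapping the MST of the reduced graph back to original edges.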

Overall, O(log(|V| · B / |E|)) such phases are necessary. A single phase can be implemented with O(sort(|E|)) I/Os, resulting in a total of O(sort(|E|) · max(1, log(|V| · B / |E|))) I/Os. The maximum in the complexity arises from the fact that at least one Borůvka phase is always necessary.

Randomized sweep

Instead of the deterministic method, the computation can also be randomized. The idea is to reduce the number of nodes until a semi-external algorithm can be run on the remaining graph. The reduction is again implemented by edge contraction, as described above: a random node is selected iteratively, and only its lightest incident edge is contracted, after that edge has been added to the minimum spanning tree.

A problem arises if the random selection is implemented naively: each node costs one I/O, since the nodes are accessed in random order. This can be countered by instead creating a random permutation of the node indices and processing the nodes in decreasing order of their newly assigned indices (= sweeping).

Here, too, an external priority queue is used, which stores the edges of the graph. The higher node index of an edge serves as its priority; if two indices are equal, the edge with the lower weight has the higher priority. This makes it possible to efficiently determine the lightest edge incident to a node. Because the edges are ordered by node index and only the lightest edge of a node is considered in each step, the edge contraction can be implemented easily: when the lightest edge (u,v) of node u is contracted, every other edge (u,w) is reinserted into the queue in the next iteration as the edge (v,w) with the same weight.
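The sweep can be sketched in Python, again with heapq standing in for the external priority queue; the relinking of a contracted node's remaining edges follows the rule just described. This is an illustrative sketch under the assumption of pairwise distinct weights, not the engineered external implementation.

```python
import heapq, random

def randomized_sweep_mst(num_nodes, edges, stop_at):
    """Node reduction by randomized sweeping.

    Nodes are relabelled by a random permutation and processed in
    decreasing index order; each processed node is contracted into the
    other endpoint of its lightest incident edge, which joins the MST.
    Stops once only `stop_at` nodes remain (a semi-external algorithm
    would then take over) and returns the chosen original MST edges.
    """
    perm = list(range(num_nodes))
    random.shuffle(perm)                 # perm[v] = new index of node v

    q = []
    for w, u, v in edges:
        a, b = perm[u], perm[v]
        # priority: higher endpoint index first (we sweep downwards),
        # lighter edge first among edges of the same node
        heapq.heappush(q, (-max(a, b), w, min(a, b), (u, v)))

    mst, remaining = [], num_nodes
    while q and remaining > stop_at:
        neg_u, w, v, orig = heapq.heappop(q)
        mst.append(orig)                 # lightest edge of node u joins the MST
        remaining -= 1                   # u is contracted into v
        # relink u's other edges (u, x) as (v, x) with the same weight
        while q and q[0][0] == neg_u:
            _, wx, x, orig_x = heapq.heappop(q)
            if x != v:                   # edges to v would become self-loops
                heapq.heappush(q, (-max(v, x), wx, min(v, x), orig_x))
    return mst

random.seed(42)
mst = randomized_sweep_mst(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3)], stop_at=1)
print(sorted(mst))  # → [(0, 1), (1, 2), (2, 3)]
```

Since the weights are distinct, the MST is unique, so the returned edge set does not depend on the sampled permutation; only the order in which the edges are found does.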

The expected number of edges considered until the number of nodes is reduced to O(M) is O(|E| · log(|V|/M)). This means that a total of O(|E| · log(|V|/M)) operations are carried out on the queue, and a total of O(sort(|E|) · log(|V|/M)) I/Os results.

Literature

  • Otakar Borůvka: O jistém problému minimálním. Pp. 37–58.
  • Alok Aggarwal, Jeffrey Vitter: The Input/Output Complexity of Sorting and Related Problems. ACM, New York 1988, pp. 1116–1127.
