Union-find structure

from Wikipedia, the free encyclopedia

A union-find data structure manages a partition of a set. The abstract data type consists of the three operations {Union, Find, Make-Set}. Union-find structures are used to manage decompositions into disjoint sets. Each set of the decomposition is assigned a canonical element, which serves as the name of the set. Union combines two such sets, Find(x) determines the canonical element of the set containing x, and Make-Set creates an elementary set {x} with canonical element x.

Applications

Union-find structures are a good way to manage the connected components of graphs. Kruskal's algorithm, for example, requires such a data structure in order to be implemented efficiently.
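As an illustration, a minimal sketch of Kruskal's algorithm built on a union-find structure; the parent-array representation and the helper name `find` are assumptions for this example, not part of the article:

```python
def find(parent, x):
    # Follow parent pointers up to the root (the canonical element).
    while parent[x] != x:
        x = parent[x]
    return x

def kruskal(n, edges):
    """n: number of vertices 0..n-1; edges: list of (weight, u, v)."""
    parent = list(range(n))          # Make-Set for every vertex
    mst = []
    for w, u, v in sorted(edges):    # consider edges by ascending weight
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:                 # u, v lie in different components
            parent[ru] = rv          # Union merges the two components
            mst.append((u, v, w))    # edge belongs to the spanning tree
    return mst
```

Each accepted edge merges two components, so exactly n − 1 edges are kept for a connected graph.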

Definition

A finite set V is partitioned into the disjoint classes Z₁, Z₂, …, Z_k:

V = Z₁ ∪ Z₂ ∪ … ∪ Z_k, with Z_i ∩ Z_j = ∅ for i ≠ j.

A representative z_i is chosen for each class Z_i. The associated union-find structure efficiently supports the following operations:

  • Init(V): Initializes the structure and forms a separate class {x}, with x as representative, for each x ∈ V.
  • Union(r, s): Combines the two classes that belong to the representatives r and s, and determines the representative of the merged class.
  • Find(x): Returns the unique representative of the class that x belongs to.

Implementation

A trivial implementation stores the association between the elements of V and their representatives in an array. In practice, however, forests of trees are used to obtain shorter running times. The representatives are stored in the roots of the trees, the other elements of the respective class in the nodes below.
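The trivial array variant can be sketched as follows; the class name and method names are assumptions for this example:

```python
class ArrayUnionFind:
    """Trivial implementation: rep[i] stores the representative of element i."""

    def __init__(self, n):
        self.rep = list(range(n))    # each element starts in its own class

    def find(self, x):
        return self.rep[x]           # a single lookup: O(1)

    def union(self, r, s):
        # Relabel every element of r's class with representative s.
        # The full scan makes this O(n).
        for i, v in enumerate(self.rep):
            if v == r:
                self.rep[i] = s
```

The O(1) Find comes at the price of an O(n) Union, which motivates the tree representation described next.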

  • Union(r, s): Hangs the root of the shorter tree as a new child under the root of the taller tree (weighted union). If r thereby becomes a child of s, r and s are swapped.
  • Find(x): Follows the path from the node x up to the root of its tree and returns the root as the result.
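A minimal sketch of the tree representation without any heuristics (the weighting is added in the next section); names are assumptions for this example:

```python
class TreeUnionFind:
    """Tree representation: parent[x] is x's parent; roots point to themselves."""

    def __init__(self, n):
        self.parent = list(range(n))   # every element starts as a root

    def find(self, x):
        while self.parent[x] != x:     # climb the path up to the root
            x = self.parent[x]
        return x

    def union(self, r, s):
        # r and s are assumed to be representatives (roots).
        self.parent[r] = s             # hang r's tree under s
```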

Heuristics

To speed up the Find and Union operations, there are two heuristics: union-by-size and path compression.

Union-By-Size / Union-By-Rank

In the operation Union(r, s), the smaller tree is hung under the larger tree. This prevents individual subtrees from degenerating into lists, as can happen in the naive implementation (where r is always the root of the new tree). The depth of a subtree T therefore cannot exceed log₂ |T|. With this heuristic, Find's worst-case runtime is reduced from O(n) to O(log n). For an efficient implementation, the number of elements of the subtree is stored at each root.
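Union-by-size can be sketched as follows; the class and field names are assumptions for this example:

```python
class SizedUnionFind:
    """Tree representation with the union-by-size heuristic."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n            # subtree sizes, valid at the roots

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, r, s):
        r, s = self.find(r), self.find(s)
        if r == s:
            return                     # already in the same class
        if self.size[r] > self.size[s]:
            r, s = s, r                # ensure s is the larger root
        self.parent[r] = s             # smaller tree goes under the larger
        self.size[s] += self.size[r]   # keep the size at the new root
```

Because the subtree containing any node at least doubles each time its depth grows, no path can be longer than log₂ n.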

Path compression

In order to speed up later Find(x) calls, one tries to shorten the paths from the visited nodes to the root.

Maximum shortening (path compression)

After executing Find(x), all nodes on the path from x to the root are attached directly under the root. Compared with the two methods below, this yields the greatest cost advantage for subsequent Find calls on nodes in the same subtree. Each node on the path between the root and x must be visited twice, but this does not affect the asymptotic running time.
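The two-pass scheme can be sketched as follows, assuming the parent-array representation from above; the function name is an assumption:

```python
def find_compress(parent, x):
    """Find with full path compression on a parent array."""
    root = x
    while parent[root] != root:        # first pass: locate the root
        root = parent[root]
    while parent[x] != root:           # second pass: flatten the path
        parent[x], x = root, parent[x]  # reattach x, then step to old parent
    return root
```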

Splitting method

During the traversal, each node is made to point to its grandparent (if it has one); a path is thereby split into two paths of half the length.

Halving method

During the traversal, every second node is made to point to its grandparent.

Both of these methods have the same amortized cost as maximum shortening (attaching the nodes directly under the root). All compression methods speed up future Find(x) operations.
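The two one-pass variants can be sketched as follows, again assuming the parent-array representation; the function names are assumptions:

```python
def find_split(parent, x):
    """Find with path splitting: every visited node points to its grandparent."""
    while parent[x] != x:
        # Repoint x to its grandparent, then step to x's old parent.
        parent[x], x = parent[parent[x]], parent[x]
    return x

def find_halve(parent, x):
    """Find with path halving: every second node points to its grandparent."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # repoint to the grandparent ...
        x = parent[x]                  # ... and jump straight to it
    return x
```

Unlike full compression, both run in a single pass, which is why they are attractive in practice.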

Running times

Union-find data structures allow the above operations to be performed with the following runtimes (n is the number of elements):

Trivial implementation with an array A (A[i] = x: node i is in set x), worst case: Find O(1), Union O(n).

Implementation with trees:

  • without heuristics: Find O(n), Union O(1)
  • with union-by-size, worst case: Find O(log n), Union O(1)
  • with union-by-size and path compression, worst case: Find O(log n), Union O(1)
  • with union-by-size and path compression, for a sequence of m Find and n − 1 Union operations: O(m · α(m, n)) in total

Here α is the inverse of the Ackermann function. It grows very slowly and is less than 5 for all "practical" inputs. In 1979, Robert Tarjan showed that for a sequence of m Find and n − 1 Union operations, every data structure in the pointer-machine model needs a runtime of Ω(m · α(m, n)). Fredman and Saks showed in 1989 that this lower bound also holds in the more general cell-probe model. It follows that the data structure with union-by-size and path compression has an asymptotically optimal runtime, and that there cannot be a union-find data structure that executes both Find and Union in amortized O(1).

References

  1. Robert E. Tarjan: A class of algorithms which require nonlinear time to maintain disjoint sets. In: Journal of Computer and System Sciences. 18, No. 2, 1979, pp. 110–127. doi:10.1016/0022-0000(79)90042-4.
  2. M. Fredman, M. Saks: The cell probe complexity of dynamic data structures. In: Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, 1989, pp. 345–354. doi:10.1145/73007.73040.