Hopcroft and Karp's algorithm

The algorithm by Hopcroft and Karp (developed in 1973 by John E. Hopcroft and Richard M. Karp ) is used in graph theory to determine a maximum matching in a bipartite graph . It starts from the matching, which does not contain any edges, and constructs alternating paths between nodes that are not yet paired . Each such path provides an enlargement (augmentation) of the matching by an edge .

Augmenting Paths

If there is a matching for a graph with a set of edges , we consider connected subgraphs that are trees , i.e. do not contain a cycle and that consist of ${\ displaystyle E}$ ${\ displaystyle M \ subset E}$

an unpaired node as the root ,
paired nodes that can be reached from the root within the tree on alternating paths with an even number of edges. Alternating means that the edges of the path alternately belong and do not belong. So these nodes are at a straight distance from the root. ${\ displaystyle M}$ ${\ displaystyle M}$
all nodes and edges along the alternating paths . This also adds nodes that have an uneven distance from the root.

A union of such trees that have no common nodes is called a forest .

If nodes and from two different trees of the forest, each at a straight distance from their roots, are connected by the edge , then this edge cannot belong, because the nodes are already paired by another edge within the tree unless it's the root that is unpaired anyway. The path with a set of edges from the root of one tree over to the root of the other tree is then an alternating path with unpaired start and end nodes. Such a path is - augmenting path called for is a matching that contains an edge over . ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle xy \ in E}$ ${\ displaystyle M}$ ${\ displaystyle P}$ ${\ displaystyle xy}$ ${\ displaystyle M}$ ${\ displaystyle (M \ setminus P) \ cup (P \ setminus M)}$ ${\ displaystyle M}$

Conversely, a matching that contains more edges than yields a subgraph with a set of edges in which all paths alternate between and , and of which at least paths must be augmenting without common nodes. is a maximum matching if and only if there is no -augmenting path. ${\ displaystyle M '}$ ${\ displaystyle M}$ ${\ displaystyle (M \ setminus M ') \ cup (M' \ setminus M)}$ ${\ displaystyle M}$ ${\ displaystyle M '}$ ${\ displaystyle | M '| - | M |}$ ${\ displaystyle M}$ ${\ displaystyle M}$ ${\ displaystyle M}$

Hungarian forests

When defining the forests under consideration, it was not previously assumed that a bipartite graph is available. In a bipartite graph , however, more applies: There the nodes are evenly spaced from their roots in or , depending on where the roots are. If there are no two knots in the forest and with, as in the last section, and the forest can no longer be enlarged while maintaining the properties mentioned, it is called a Hungarian forest . Because of the bipartism, it can then be shown that the matching is a maximum matching if and only if there is a Hungarian forest to it. ${\ displaystyle (U \ cup V, E)}$ ${\ displaystyle U}$ ${\ displaystyle V}$ ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle xy \ in E}$

algorithm

The following algorithm is a precursor to the Hopcroft and Karp algorithm. To a bipartite graph with matching he constructs a forest with the properties mentioned, the ${\ displaystyle M}$

either is a Hungarian forest

or - provides an augmenting path . ${\ displaystyle M}$

Start with the forest, which contains all unpaired nodes as roots, but no edges .
Find an edge from a node in the forest that is straight from its root to a node that does not belong to the forest or is straight from its root. If there is no longer such a node, the forest is a Hungarian forest. End the algorithm. ${\ displaystyle x}$ ${\ displaystyle y}$
If is even distance from its root, there is an even length path from an unpaired node to . Return the augmenting path from via to and exit the algorithm. ${\ displaystyle y}$ ${\ displaystyle z}$ ${\ displaystyle y}$ ${\ displaystyle M}$ ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle z}$
If it does not belong to the forest, it is paired, for example : Add the nodes and and the edges and to the forest and go back to step 2. ${\ displaystyle y}$ ${\ displaystyle y}$ ${\ displaystyle yz \ in M}$ ${\ displaystyle y}$ ${\ displaystyle z}$ ${\ displaystyle xy}$ ${\ displaystyle yz}$

At the beginning the algorithm is also executed. If it ends with an augmenting path in step 3 , it is replaced by and the algorithm is carried out again. The case that the algorithm ends with a Hungarian forest in step 2, with a maximum matching then , must occur after the algorithm has been run at the latest , because in the other case the matching is increased by two nodes. The runtime for a single execution of the algorithm is proportional to the number of edges , the total runtime for repeated execution is proportional to the product of the number of edges and nodes. ${\ displaystyle M = \ emptyset}$ ${\ displaystyle M}$ ${\ displaystyle P}$ ${\ displaystyle M}$ ${\ displaystyle (M \ setminus P) \ cup (P \ setminus M)}$ ${\ displaystyle M}$ ${\ displaystyle (| U | + | V |) / 2}$ ${\ displaystyle | E |}$

example

The bipartite graph in the example has 10 nodes and 10 edges. In the left part, all nodes are free before the first phase , the matching is empty. All augmenting paths are reduced to a single edge between a node in and a node in . For example, a breadth-first search selects the edges by selecting for each node in the free nodes in with the smallest index. This set of paths is a maximum, they all have the same length 1, the resulting matching has size 4 and there remain two free nodes, and . ${\ displaystyle U}$ ${\ displaystyle V}$ ${\ displaystyle (u_ {0}, v_ {0}), (u_ {1}, v_ {4}), (u_ {2}, v_ {2}), (u_ {4}, v_ {1}) }$ ${\ displaystyle U}$ ${\ displaystyle V}$ ${\ displaystyle u_ {3}}$ ${\ displaystyle v_ {3}}$

The second phase is about finding the paths with minimal increase in length starting from the only free node of or from the only free node of . The specified path alternates between black edges outside the matching and red edges inside the matching. It's length 5. We can see that there is no length 3 path, but there is a length 7 path, viz . The matching, which results from the symmetrical difference of the previous matching with the path of length 5, results in the example of the matching of size 5. It is maximum because there are no more free nodes. ${\ displaystyle U}$ ${\ displaystyle V}$ ${\ displaystyle (u_ {3}, v_ {0}, u_ {0}, v_ {1}, u_ {4}, v_ {3})}$ ${\ displaystyle (u_ {3}, v_ {4}, u_ {1}, v_ {0}, u_ {0}, v_ {1}, u_ {4}, v_ {3})}$

Simultaneous augmentation of multiple paths

The total running time of the algorithm can be reduced if more than one - augmenting paths are considered simultaneously. Let it be the length of the shortest augmenting path. We consider -augmenting node-disjoint paths of length to which no further -augmenting path of length can be added node-disjoint. Then it can be shown that ${\ displaystyle M}$ ${\ displaystyle l (M)}$ ${\ displaystyle M}$ ${\ displaystyle M}$ ${\ displaystyle P_ {1}, \ ldots, P_ {m}}$ ${\ displaystyle l (M)}$ ${\ displaystyle M}$ ${\ displaystyle l (M)}$

{\ displaystyle l ((M \ setminus (P_ {1} \ cup \ ldots \ cup P_ {m})) \ cup ((P_ {1} \ cup \ ldots \ cup P_ {m}) \ setminus M)) > l (M)}

The algorithm mentioned can be extended to the algorithm by Hopcroft and Karp in such a way that it not only returns an augmenting path, but also considers a set of augmenting paths as just seen. To do this, steps 2 and 3 must be carried out as a breadth-first search , whereby the constructed paths in the forest are only extended when no new paths of the previous length can be found. As soon as a path leads to an unpaired node, i.e. is an augmenting path, paths of even greater length no longer need to be considered.

If a maximum matching is and , the algorithm extended in this way delivers after iterations a matching with and node- disjoint augmenting paths whose length is at least . Because none of the nodes is contained in two of these paths, it must be, so the maximum matching must be achieved after further runs at the latest . The total runtime of the Hopcroft and Karp algorithm is therefore proportional to the product of the number of edges and the square root of the number of nodes. ${\ displaystyle M '}$ ${\ displaystyle n = | U | + | V |}$ ${\ displaystyle {\ sqrt {n}}}$ ${\ displaystyle P}$ ${\ displaystyle l (M) \ geq {\ sqrt {n}}}$ ${\ displaystyle | M '| - | M |}$ ${\ displaystyle M}$ ${\ displaystyle l (M) \ geq {\ sqrt {n}}}$ ${\ displaystyle n}$ ${\ displaystyle | M '| - | M | \ leq {\ sqrt {n}}}$ ${\ displaystyle {\ sqrt {n}}}$

literature

. Hubertus Th Jongen: optimization B . Script for the lecture. Verlag der Augustinus-Buchhandlung, Aachen, ISBN 3-925038-19-1

Web links

An interactive applet for learning, trying out and demonstrating the algorithm