Nested sets

The term nested sets refers to a model for mapping a tree with the help of sets that are nested within one another. The "is-child-of" relationship is mapped to an "is-subset-of" relationship. The model was originally introduced in the book SQL for Smarties by Joe Celko . It is mainly used in the context of database applications. This technique is also known as Modified Preorder Tree Traversal (MPTT). With the help of nested sets, one buys the possibility of being able to read out subtrees or the path to the root with constant effort at the price of having to work with more complex operations when changing the tree.

Classic tree structure

The classic approach to implementing a tree structure in a database is the adjacency list model . A reference to its parent node is stored for each node. The root has no reference to a parent node (the field is NULL); a leaf is a node that no other node references.

Tree structure as nested sets

In the nested sets model , the tree structure is transformed into nested sets. Each node corresponds to a subset; each subset is a subset of all of its parent sets. These nested sets can be represented in a database by determining two values for each subset that determine the limits of this subset. These values are often referred to as left and right . The value on the left is always smaller than on the right . Both values of a subset are larger than the value on the left of its parent set and smaller than its value on the right . The following graphic shows a tree that has been transformed into nested sets. The green numbers on the edges of an amount are the values described above, the left at the left edge and the right at the right edge.

example

Operations

Insert node

The subset for the first node receives the values 1 for the left and 2 for the right . If a further subset for a new node is inserted below an existing node, the new parent set is increased by 2 on the right . So that this is possible, every value on the left or right in the entire tree that is greater than the original value on the right of the new parent set must also be increased by 2 beforehand. The values right minus 2 and right minus 1 of the parent set are then assigned to the newly inserted subset as left and right . It should be noted that, unlike the conventional adjacency list model, the order of the children of a node is retained. The procedure just described is equivalent to always appending new nodes to the end of the list of children of the parent node. It is also conceivable to insert a new subset at any point between the other subsets of the parent set. The values to the left and right of the children following the newly inserted subset must then also be increased.

Transform a tree structure in nested sets

An already existing tree structure can be converted into nested sets by going through it in depth . The edges are counted, starting with 1. Every time a node is entered, the value of the counter is assigned to it as left and the counter is incremented. If the node is exited after all of its children have been processed, it is assigned the value of the counter as right and the counter is incremented again.

Remove subtree

To remove a complete subtree, the values to the left and right of the root of the subtree are first determined. This can then be deleted together with its subsets by deleting all nodes whose values for left and right lie within these values or match them. Finally, all left values and all right values must be reduced by the width of the subtree that are larger than the right value of the node to be removed; the width is the difference between the right and left, its root plus 1.

evaluation

Number of all nodes in a subtree

Half the width of a subtree corresponds to the number of nodes in this subtree including the root. Thus, the number of nodes in the entire tree can be determined by dividing the value on the right minus the value on the left of the root by two and rounding up.

Path to a node

In contrast to the adjacency list model, the path to a node does not have to be determined recursively with the nested sets model. It is sufficient to select all subsets whose left values are smaller and whose right values are greater than those of the node under consideration. If these subsets are sorted according to their left value, their order corresponds to the path from the root to the node under consideration.

All nodes of a subtree

All nodes below a given node can be determined by selecting all subsets whose values on the left and right are within the values on the left and right of the node being edited. The adjacency list model would also have to be used recursively.

All leaves of a subtree

The query for a complete subtree can easily be modified so that only leaves, i.e. nodes without their own children, are selected. For this purpose, right = left + 1 is used as an additional criterion for selection .

Depth first search

In order to enumerate all nodes of the tree according to a depth-first search , a simple SELECT with sorting is sufficient. And that in both variants preorder (parent before child) and postorder (child before parent node). Likewise, the sequence of the child nodes run through can be easily reversed due to the symmetrical meaning of left and right by slightly varying the sorting condition.

SELECT * FROM tree ORDER BY r;  /* DFS, postorder */
SELECT * FROM tree ORDER BY l;  /* DFS, preorder */
SELECT * FROM tree ORDER BY l DESC;  /* DFS, reverse postorder (smallest child last) */
SELECT * FROM tree ORDER BY r DESC;  /* DFS, reverse preorder (smallest child last) */

The additional determination of the node depth during the run is possible with the help of a self-join. Here, all tuples are selected from two nodes in which the values to the left and right of the first node lie between the values to the left and right of the second node. The depth of each node can be determined by counting how often a tuple appears with this node as the first node. The result is sorted according to the value to the left of the contained nodes. For example, this query could look like this in SQL :

 SELECT (COUNT(parent.id)-1) AS depth, node.id
 FROM tree AS node, tree AS parent
 WHERE node.l BETWEEN parent.l and parent.r
 GROUP BY node.id ORDER BY node.l;

With large nested sets , such a self-join would only be realizable in terms of runtime by creating additional indices on the left and right . For this reason, a mixed model of adjacency lists and nested sets is often used in practice, i.e. the reference to the parent node and the node depth are also stored for each node. This negligible additional effort when writing provides efficient evaluation options for many comparable questions.

Advantages and disadvantages

The complexity of reading is constant in most cases. With the adjacency list model, the effort is linearly dependent on the depth of the processed node. In exchange, changing the tree in the nested sets model is much more time-consuming than in the adjacency list model. It is therefore better suited for structures that are not modified often but read very often.
The representation as quantities fits better into the quantity-oriented world of databases. With the query language SQL, it is easier to operate on sets than on hierarchical structures.
Querying the direct children of a node is difficult.
The order of the children of a node is retained.

Web links

Lots of trees - nested sets by Arne Klempert
Trees in SQL by Christoph Reeg
The nested sets model - trees with SQL from php-resource.de
Adjecency List vs. Nested Set: Postgresql by Explain Extended
Adjecency List vs. Nested Set: SQL Server from Explain Extended (English)
Adjecency List vs. Nested Set: ORACLE by Explain Extended (English)
Adjecency List vs. Nested Set: MySQL by Explain Extended