Trie

from Wikipedia, the free encyclopedia
Trie storing the strings Java , Rad , Edge , Rau , Raum, and Rose

A trie or prefix tree is a data structure that is used in computer science to search for character strings . It is a special search tree for the simultaneous storage of several character strings. Tries contain a type of data compression , since common prefixes of the character strings are only saved once.

A trie is built up from a set of arbitrary strings. Each outgoing edge of a node within a triangle is provided with a single character, so that a path starting at the root and ending with a leaf in the triangle represents one of the character strings from which the tree was constructed.

Tries are used in the area of information retrieval . There they are used to index texts in order to efficiently answer certain queries to the text.

Compact tries or Patricia tries (a special variant of compact tries) are variants of the tries that are optimized in terms of storage space consumption. All nodes from which only one edge starts are grouped together with their respective successors.

The term trie was suggested by Edward Fredkin based on the term information re trie val . This author pronounces it like the English term tree [ 'triː ]. Another common pronunciation is like the English term try [ 'traɪ ], which verbally distinguishes the trie from the data structure tree . This second variant has meanwhile established itself.

definition

Let be a set of strings over the alphabet with size = . A trie over is a tree of shape , where is the set of nodes and the set of edges, that has the following properties:

  • has a label from the alphabet ,
  • all outgoing edges of the node have a different label ,
  • there is a node , so that a prefix of the concatenation of the labels of the path is starting with the root up to the node (these nodes are specially marked in the trie, e.g. by setting a bit),
  • Leaves there is a string so that the path from the root to the leaf is spelled exactly .

An example of a trie over = {"Java", "Rad", "Edge", "Rau", "Raum", "Rose"} can be seen in the picture, whereby the double-bordered nodes represent the character strings . In particular, note that the word "rough" is a prefix of the word "space"; H. a string from can be the prefix of another.

Applications

With the help of tries, different queries can be made to a given set of different character strings . Example inquiries could be:

  • Existence inquiries of the type “ Does the pattern contain ? "
  • Prefix queries of the type “ Which strings in begin with the pattern ? "
  • Successor and predecessor inquiries such as “ Which character strings in are lexicographic successors or predecessors of the pattern ? "

A possible use of Tries could, for example, be the implementation of search queries within a contacts or phone book app for smartphones. The Tries can be used to search for people by name (existence inquiries). Likewise, when entering a name, contacts can be displayed whose names begin with the letters previously entered (prefix queries). In addition, contacts can be found who are in the phone book after or in front of the requested person (successor and predecessor inquiries).

Implementation types

To answer inquiries to the trie , a path to a node that corresponds to the requested pattern is searched for, starting with the root , according to the top-down principle . The runtimes in which these queries can be carried out, as well as the space required by the tries, therefore depend heavily on how the storage of the outgoing connections is implemented. Some of the possible types of implementation are presented below:

  1. A simple variant is to save all outgoing edges per node in a list . This results in a running time of . The space requirement of this solution is , where the total length of all strings is denoted in.
  2. In the next variant, the outgoing edges are kept in a sorted array for each node instead of in a list . By using the binary search to find the successor edge, a running time of is achieved. The space requirement corresponds to that of variant 1 and is therefore .
  3. The outgoing edges are stored in an array of sizes for each node . This achieves a running time of . Here, however, the space requirement of the Tries increases .
  4. A hash table is used to store the outgoing edges . This can be created per node or globally for the entire trie. An expected running time of is achieved with both variants . The disadvantage of this variant, however, is that it cannot answer any previous or successor queries (since hash tables are per se unsorted). This variant achieves a space requirement of .

Comparison of runtime and space requirements

variant running time Space requirement
1.
2.
3.
4th expected

See also

literature

Web links

Commons : Trie  - collection of images, videos and audio files

Individual evidence

  1. Paul E. Black: trie . In: Dictionary of Algorithms and Data Structures . National Institute of Standards and Technology . November 16, 2009. Archived from the original on May 19, 2010. Retrieved on December 7, 2014.
  2. Donald Knuth : 6.3: Digital Searching . In: The Art of Computer Programming Volume 3: Sorting and Searching , 2nd. Edition, Addison-Wesley, 1997, ISBN 0-201-89685-0 , p. 492.
  3. a b c Johannes Fischer: Lecture script "Text Indexing and Information Retrieval" . Winter semester 2014/2015. Retrieved November 28, 2014.