Data structure

In computer science and software technology , a data structure is an object that is used to store and organize data . It is a structure because the data is arranged and linked in a certain way in order to enable efficient access to it and its management .

Data structures are not only characterized by the data they contain, but above all by the operations on this data that enable and implement access and management.

Explanation

The determination ( definition ) of data structures is generally carried out through an exact description ( specification ) of the data management and the operations required for this. This specific specification defines the general behavior of the operations and thus abstracts from the specific implementation of the data structure.

If the focus of consideration is shifted to the concrete implementation of the operations, an abstract data type is often used instead of the term data structure . The transition from the data structure to an abstract data type is not clearly defined, but depends solely on the perspective. With a view to the often changing memory requirements of many data structures, we also speak of dynamic data types that are technically based on dynamic memory management .

In addition to their basic form, most data structures have many specializations that have been specifically specified to fulfill a certain task. For example, B-trees, as a specialization of the tree data structure, are particularly well suited for database implementations .

With many algorithms , the resource requirements , i.e. both the required runtime and the storage space requirement, depend on the use of suitable data structures.

Basic data structures

The following data structures have usually been developed and optimized for classic imperative programming . Other programming paradigms such as functional programming may well require different data structures.

record

Data sets (also called 'tuples' or English records ) are the simplest data structures. They usually embody values that contain other values in a fixed number and sequence. Data records are usually identified by one or more of the elements they contain, often called a data field .

Array

The array (also field ) is the simplest data structure used. Several variables of the same basic data type are saved here. Access to the individual elements is possible via an index. From a technical point of view, this corresponds to the value that is added to the start address of the array in the memory in order to obtain the address of the object. The only operations required are indexed store and indexed read , which can directly access any element of the array. In the one-dimensional case, the array is often referred to as a vector and in the two-dimensional case as a table or matrix . Arrays are by no means restricted to two dimensions, but can be used in any number of dimensions. Because of its simplicity and fundamental importance, most programming languages offer a concrete implementation of this data structure as a composite data type array in the basic language scope.

The associative array is a special case , in which the stored data is not accessed via a numeric index but via a key . One possible way to implement an associative array is with the hash table .

Chained list

The linked list is a data structure for the dynamic storage of any number of objects. Each list element of a linked list contains as a special feature a reference to the next element, whereby the totality of the objects becomes a chain of objects. The operations belonging to a list are relatively unspecified. They are often used in more complex data structures and their elements are usually accessed directly instead of via operations for reasons of efficiency. The lists offered in program libraries are usually much more complex in terms of their underlying data structure and often do not represent any real lists at all, but appear to the outside world as such. Lists are always linear structures.

Stack storage

In a stack (engl. Stack ) can be stored objects from any number, however, the stored objects can be read again, only in reverse order. This corresponds to the LIFO principle. For the definition and thus the specification of the stack, it is irrelevant which objects are stored in it. At least the operations belong to a stack

push to put an object on the stack and
pop to read the last saved object and remove it from the stack.
( top or peek to read the top item without deleting it)

The top operation is not mandatory, but is often implemented to replace pop / push , as it is often interesting to "test" the top element. A stack is usually implemented as a list, but it can also be a vector.

Queue

In a queue (engl. Queue ) can be stored objects from any number, however, the stored objects can be read again only in the same order in which they were stored. This corresponds to the FIFO principle. For the definition and thus the specification of the queue, it is irrelevant which objects are stored in it. At least the operations belong to a queue

enqueue to put an object in the queue and
dequeue to read the object that was saved first and remove it from the queue.

A queue is usually implemented as a linked list, but it can also use an array internally; in this case the number of elements is limited.

Priority queue

A specialization of the queue is the priority queue , which is also known as the priority queue . Called Priority Queue . This deviates from the FIFO principle. The execution of the enqueue operation, which in this case is also called the insert operation, sorts the object into the priority queue according to a given priority that each object carries. The dequeue operation always returns the object with the highest priority. Priority queues are mostly implemented with heaps .

graph

As a data structure, a graph makes it possible to overcome the unidirectionality of the link. The operations here are also inserting , deleting and finding an object. The best known representations of graphs in the computer are the adjacency matrix , the incidence matrix and the adjacency list .

tree

Trees are special forms of graphs in graph theory . Usually only out-trees are used as the data structure . Starting from the root, several objects of the same type can be linked with one another, so that the linear structure of the list is broken up and a branch takes place. Since trees are one of the most commonly used data structures in computer science, there are many specializations.

In binary trees, for example, the number of children is at most two, and in height-balanced trees it also applies that the heights of the left and right subtree at each node do not differ too much.

In the case of ordered trees, especially search trees , the elements are stored in the tree structure in an orderly manner so that elements can be found quickly in the tree. A further distinction is made here between binary search trees with AVL trees (including Fibonacci trees ) as a balanced version and B trees and a variant, the B * trees . Specializations of B-trees are 2-3-4-trees , which are often implemented as red-black trees .

Geometric tree structures such as the R-tree and its variants are not sorted, but “nested” . Only those subtrees that overlap the requested area are searched here.

Although trees are multi-dimensional in their structure, the interlinking of the objects is often unidirectional. The concatenation of the stored objects begins at the root of the tree and from there in the direction of the nodes of the tree.

Heap

The heap combines the data structure of a tree with the operations of a priority queue. In addition to the minimally necessary operations such as insert , remove and extractMin , the heap often has other operations such as merge or changeKey . A min-heap or a max-heap is used depending on the order of priority in the priority queue. Further specializations of the heap are the binary heap , the binomial heap and the Fibonacci heap . Heaps are mostly built using trees.

Treap

The Treap combines the properties of trees ( "trees" ) and heaps.

Hash table

The hash table or hash table is a special index structure in which the memory position can be calculated directly. Hash tables compete with tree structures, which, in contrast to hash tables, can reproduce all index values in one order, but require greater administrative effort to provide the index. When using a hash table to search through data volumes, one speaks of the hash method . A distributed hash table can be used for very large amounts of data .

Storage and computing requirements of data structures

surgery	Array	Dynamic array	Linked list	Balanced tree	Hash table
Insertion / deletion at the beginning	N / A	Θ ( n )	Θ (1)	Θ ( log n )	Θ ( 1 ) to Θ ( n )
Insertion / deletion at the end	N / A	Θ (1)	Θ (1)
Insertion / deletion in the middle	N / A	Θ ( n )	search + Θ (1)
search	Θ ( n )	Θ ( n )	Θ ( n )	Θ ( log n )	Θ ( 1 ) to Θ ( n )
Additional storage space compared to array	0	0, Θ ( n )	Θ ( n )	Θ ( n )	Θ ( n )
Access to any / random item	Θ (1)	Θ (1)	Θ ( n )	Θ ( n )	Θ ( n )

literature

Ellis Horowitz, Sartaj Sahni, Susan Anderson-Freed: Fundamentals of data structures in C . International Thomson Publishing, Bonn 1994, ISBN 3-929821-00-1 .
Chris Okasaki: Purely Functional Data Structures . Cambridge University Press, Cambridge 1999, ISBN 0-521-66350-4
Thomas Ottmann, Peter Widmayer: Algorithms and Data Structures . 4th edition. Spektrum Akademischer Verlag, Heidelberg 2002, ISBN 3-8274-1029-0 .
Hanan Samet: Foundations of Multidimensional and Metric Data Structures. Elsevier, Amsterdam 2006, ISBN 0-12-369446-9
Niklaus Wirth : Algorithms and data structures in Pascal . 5th edition. Teubner, Stuttgart 2000, ISBN 3-519-22250-7

Individual evidence

^ Dan Schmidt The Perl Journal 1999: Building a Better Hash

↑ Assuming that the linked list remembers a pointer of the data end position, otherwise the end of the list must first be determined with the time required Θ ( n ).

↑ Gerald Kruse. CS 240 Lecture Notes ( page no longer available , search in web archives ) Info: The link was automatically marked as defective. Please check the link according to the instructions and then remove this notice. : Linked Lists Plus: Complexity Trade-offs ( page no longer available , search in web archives ) Info: The link was automatically marked as defective. Please check the link according to the instructions and then remove this notice. . Juniata College. Spring 2008.@1@ 2Template: Dead Link / www.juniata.edu @1@ 2Template: Dead Link / www.juniata.edu

↑ Assuming that a buffer is reserved for extensions.

[1] Dan Schmidt The Perl Journal 1999: Building a Better Hash

[2] Assuming that the linked list remembers a pointer of the data end position, otherwise the end of the list must first be determined with the time required Θ ( n ).