Database operator

from Wikipedia, the free encyclopedia

A database operator is part of a database query that is responsible for executing a single sub-step of the query. A database creates an evaluation plan for an inquiry , the execution of which delivers the requested result.

Schematically, an evaluation plan is a tree that contains the database operators as nodes . The stored data tables are located on the leaves of the tree. From there they are passed up from operator to operator until the query result is available at the root . The tree is also known as the operator tree .

Processing a request

In the case of relational databases, a query to a database is typically sent to the database management system as an SQL query . Here the query is broken down by a parser into the relational operations that are necessary to answer the query.

Logical and physical operators

One differentiates between the

  • Logical operators
  • Physical operators

While the logical operators represent the mathematical operations of relational algebra , behind the physical operators there are executable algorithms that implement the logical operator.

Relational algebra defines the following logical operators with which all other operators can be formed:

Each logical operator can be executed by very different physical operators. The database management system can decide at runtime which implementation is best in a specific case. There are also derived operators that can be derived from the logical operators through nesting. A specialized, derived operator is the join operator , which performs selection and cross product one after the other in order to implement this important operation as efficiently as possible. In addition, the difference operator and the division operator are further derived operators.

optimization

As a rule, the same query can be calculated in very different ways, which, depending on the query and the data available, can be very different in speed. This means that both execution plans have the same result in terms of quantity and are therefore mathematically equivalent . Modern database management systems therefore have complex query optimizers that select the most efficient one from the set of all possible execution plans.

Logical optimization

First, a logical optimization is carried out here. It is investigated whether a query can be mathematically simplified without influencing the result. That is, the operator tree is transformed into an equivalent tree . Typically, multiple selection operators on the same data source are eliminated here or selection operators that always lead to a reduction in effort are moved down the tree as far as possible.

Physical optimization

The next step is physical optimization. Now the optimizer selects the best possible algorithm for an operation. In doing so, it takes into account the cardinality of a data source, i.e. the number of elements that have to be processed, as well as existing indices on a relation. There are algorithms that are only very fast if there is an index, such as the index join .

ONC

Each operator is a node in an operator tree. Therefore it has one or two data sources and exactly one data output with which data can be passed on to another operator. The operators are typically implemented as iterators with an ONC interface . ONC stands for Open, Next, Close. The operator is therefore initially opened with Open . Then you can use Next to iterate over the elements that the operator delivers. With each iteration step, the operator requests as many elements from his data sources as he needs to calculate a new result element of the result relation. Finally, to release system resources, the operator can be closed using Close .

example

Operator tree of the example query

Assume that there are two relations person and address with the following attributes:

person.id
person.first name
person.name
person.date of birth
address.id
address.person_id
address.strasse
address.location

Then the request for all people from Marburg would look like this:

select p.name from person p , anschrift a  where a.ort = Marburg AND  p.id = a.personen_id;

The image on the right shows how the query is represented as an operator tree. The tool SELECT2OBaum offers the possibility to convert SELECT queries interactively into operator trees.

Others

It is noteworthy that the implementation of the database operators, which can still be explained today by the large differences in speed when accessing main storage space or hard disk space, was implemented again through operations on tree-like data structures, usually B-trees or B * -trees . To the extent that such speed bottlenecks are eliminated, less high-performance or problem-like data structures can also be used. The speed bottle necks in the storage organization are similar to the bottle necks according to von Neumann , see also storage hierarchy .

literature

Web links

  • XXL in Java - database library for teaching and research purposes

Individual evidence

  1. SELECT2OBaum at fh-koeln.de