Calculus (database)

For the theoretical analysis and semantically precise definition of query languages for databases are calculus expressions used, especially the Tupelkalkül ( English tuple calculus ) and the area of calculus (also Domain calculus , English domain calculus ). Because of their similarity, they are considered together here. But there are other calculus-like query languages.

Details

It is assumed that data is stored in a database in a number of tuples (relations) of the same type, i. H. the components of the tuples in a tuple set all have the same data type, and thus the same operations that can be applied to them. Since the calculi “only” represent a theoretical foundation, data types are not considered further here, but it is assumed that the usual operations apply.

The queries that are specified with the calculi are then constructed as sets . This construction follows a certain form: With the tuple calculus sets are constructed from tuples, with the range calculus sets are constructed from the so-called "areas", which are essentially free variables that can take values from the value range of the tuple components. In general, however, it can be defined where the data come from, how they are to be linked and which additional conditions should apply.

Calculations provide a very powerful query language, which is not desirable in the area of databases. It is Z. For example, it is not a problem to specify an infinite set of results using calculus expressions, which violates a design principle of query languages. For general database theory, it is therefore of interest to distinguish between safe calculus expressions , which correspond in power to relational algebra . Calculus expressions are still strictly relationally complete .

For SQL there is a formal definition using the tuple calculus. QBE is based on the domain calculus , QUEL is a relatively direct implementation of parts of the tuple calculus.

General form

A lot can be constructed in mathematics by specifying what should apply to each element. The simple form for this is {x | Formula} , d. H. I choose all x for which the formula applies, e.g. B. to get the natural numbers between 2 and 8. It was stated here, firstly, where the data come from (natural numbers) and secondly, what should apply to them (the formula). x is a free variable whose range of values was determined by the formula. Instead of a simple free variable, a formula can also be specified as the result, e.g. B. x² to get the square numbers of the numbers between 2 and 8. The general form of this set construction is {f (X) | g (Y)} , where f and g are the respective formulas and X and Y are sets of free variables. ${\ displaystyle \ {x \ mid x \ in \ mathbb {N} \ \ land \ {2 <x <8 \}}}$

For the database area, the value ranges are not determined arbitrarily, but they come from the mentioned tuple sets (usually relations, but also entities or object sets). The results can be expressed either as sets of variables or as tuples; H. the set is constructed as “all free variables for which ...” or “all tuples for which ...”. The former leads to the domain calculus; the second to the tuple calculus.

Calculus formulas

The set of possible formulas is also limited to a certain form, which is very similar for the two calculi. Terms are converted into atomic formulas, which in turn are combined into formulas. Ranges of values of free variables are specified by database predicates , which determine where the data comes from (range calculus : CUSTOMER (x, y, z) , tuple calculus: CUSTOMER (k) ).

Terms: variables, constants, function applications
atomic formulas: application of operations of the data types, e.g. B. <,> on the value range of the natural numbers and database predicates
Formulas: Combination of atomic formulas using operators of Boolean logic, ∧, ∨, ¬, ∀, ∃

The calculi can be extended in such a way that it is comparatively easy to formalize database languages such as SQL with them . To do this, instead of a set semantic, a multi-set semantic is introduced and the additional option of using formulas - especially the aggregate functions - in the results area. Furthermore, it is conceivable to again allow queries for database predicates so that the calculations are completed .

Area calculus

The form is {x ₁ ,…, x _n | φ (x ₁ ,…, x _n )} , where the x _{i are} the result as tuple sets and φ is a formula. These formulas are structured as indicated above. Not all variables have to be assigned in database predicates. Instead, '_' can be written, which means that they are each different, unnamed free variables.

Examples

Relationships with the relationship schemes CUSTOMER (customer no., Kname, address, location), ORDER (order no., Customer no., Goods no., Quantity), GOODS (goods no., Wname, wprice) are given.

Places with customers: {x | CUSTOMER (_, _, _, x)}

All customers from Bremen: {x, y, z | CUSTOMER (x, y, z, w) ∧ w = 'Bremen'} or briefly {x, y, z | CUSTOMER (x, y, z, 'Bremen')}

Customers who have ordered something: {x, y, z | CUSTOMER (x, y, z, _) ∧ ORDER (_, x, _, _)}

Goods that were not ordered: {x, y | GOODS (x, y, _) ∧ ¬ ORDER (_, _, x, _)}

Tuple calculus

The form is {t | φ (t)} . t is a tuple variable, φ a formula as given above. Database predicates are written as R (t) or t∈R , individual elements of the tuple (attributes) are accessed using dot notation with specification of the name from the schema, i.e. tA , or using the index t [i] .

Examples

With the above scheme,

Places where there are customers: {t.ort | CUSTOMER (t)}

All customers from Bremen: {t | CUSTOMER (t) ∧ t.ort = 'Bremen'}

Customers with orders: {t | CUSTOMER (t) ∧ ∃s (ORDER (s) ∧ s.kdnr = t.kdnr)}

Goods without order: {t | GOODS (t) ∧ ¬∃s (ORDER (s) ∧ s.warennr = t.warennr)}

As you can see, the join conditions are explicitly specified in the tuple calculus, whereas a join in the range calculus is formed by identically named variables.

literature

Andreas Heuer, Gunter Saake: Databases: Concepts and Languages , MITP Verlag, ISBN 3-8266-0619-1 , p. 297 ff.