Key (database)

A key used in a relational database to the tuples (records) a relative to identify (table) clear to them numbers . A key is then a group of columns that is selected in such a way that each table row has a unique combination of values above the values of this column group.

introduction

In the theory of relational databases, one or more key candidates are required per entity , which by definition must be unique. From these key candidates, one candidate is selected as the primary key and implemented as such when the entity is converted as a database table. Notwithstanding this convention, there are also database systems that allow table definitions without a primary key being defined. Such tables therefore also allow duplicate data records and are therefore not relational entities by definition.

Super key ⊇ key candidates, from these the primary key is selected

In relational databases one differentiates between the key terms

Super key (sometimes also called upper key): Set of attributes (fields) in a relation (table) that uniquely identify the tuples (rows) in this relation, ie always contain different values for tuples selected in pairs (one also says “are unique”). A trivial super key, for example, would be the set of all attributes of a relation in common . (Trivial because a relation is a set of tuples. The elements of sets must be unique, so there cannot be two identical tuples in a relation.)
Key candidate (also called candidate key or alternative key ): A minimal subset of the attributes of a super key, which enables the identification of the tuple (key candidates ⊆ super key).
Primary key: The selected key candidate that will be used to map the relations. The values of this key are used as foreign keys in referencing tables .

Formal definition

Let a certain relational scheme R (the table framework, i.e. all columns) be given. A subset S of the attributes (of the columns) of the schema R is called a key if:

Uniqueness: R cannot contain two different tuples where the values of S are the same.
The aim is to ensure that no (possible) expression of R can contain two different tuples in which the values of S are the same: No technically legal, possible filling of the table may result in two (technically different) lines leading to the same key value.
Definition: Some database systems allow null values, provided that this does not violate the uniqueness.
The aim should be that all entries in the table actually define the attributes from S; none of the entries should be NULL .
Minimalism: For a key to be a key candidate, no real subset of S must already meet the uniqueness condition.

Examples

Literature (a)
ISBN	author	Book title	...
0001	Hans	V	...
0002	Lutz	W.	...
0003	Peter	W.	...
0004	Peter	X	...
0005	Ralf	Y	...
...	...	...	...

Customer (b)
Surname	birthday	place of residence	...
Heinz Hoffmann	08/01/1966	North, BBS	...
Alf Appel	11/08/1957	Mömlingen	...
Sebastian Sunshine	08/04/1979	Hamburg	...
Klaus Kleber	04/15/1970	Frankfurt	...
Barbara Bachmann	10/17/1940	Kirchheim	...
...	...	...	...

IsBossOf (c)
direct supervisor (ID)	Employee (ID)
002	104
030	512
115	519
234	993
234	670
...	...

a: Here the key is a single attribute. The ISBN is very suitable for this, because no two books have the same ISBN. Books can very well have the same title or come from the same author. Note: The ISBN ( International Standard Book Number ) is shown here only symbolically as a sequential number, an ISBN is actually more complicated.
b: Here a combination of two attributes is used as a key. The developer of the database assumes that there are no customers with the same name and their birthday on the same day. If there are customers in this example who have the same name and have their birthday on the same day, then the part of the attributes selected here cannot be used as a key.
c: Here only all attributes of the relation come into question as keys. The personnel number shows which employee of a company is the superior of which other employee. Note: The data records of this relation only contain left-unambiguous tuples (1: n), because for technical-content-related reasons employees usually only have one direct superior. In principle, of course, tuples of relations that are relation types can contain all possible n: m assignments.

Key candidate

A candidate key (engl. Candidate key ) is a minimum set of attributes that uniquely identifies the tuple (records) a relation. The formal definition is: is a relation over the set of attributes , then: is accurate then a candidate key of R if: . ${\ displaystyle R (A)}$ ${\ displaystyle A: = \ {A_ {1}, ..., A_ {n} \}}$ ${\ displaystyle \ alpha \ subseteq A}$ ${\ displaystyle \ alpha {\ dot {\ rightarrow}} A}$

The term full functional dependency - represented by - is used here. Here A is fully functionally dependent, which means: ${\ displaystyle \ alpha {\ dot {\ rightarrow}} A}$ ${\ displaystyle \ alpha}$

If two tuples have the same values in the key attributes (), they also have the same values in all other attributes (A). And: ${\ displaystyle \ alpha}$
If you remove an attribute , property 1 no longer applies. ${\ displaystyle \ alpha}$

In contrast to the super key, only those attribute sets are considered here that can no longer be reduced without losing their key properties; they are also said to be minimally identifying . The following key candidates result for the example relations in the introduction:

a: {ISBN}, {author, book title}
b: {Name, birthday}
c: {Superior, subordinate}

From the list of super keys, those that are minimal were selected. Occasionally, the term is candidate key is used, which is a literal translation of the English technical term candidate key is.

Primary key and alternative key

In order to be able to clearly identify the tuples (= rows) in a relation (= table), a primary key is specified for the relation - one of the key candidates. The primary key is usually selected in such a way that it is as small as possible, that is to say comprises as few attributes as possible or has a data type that is as simple as possible. It should be stable over time, so its values should not change during the entire life cycle of the tables concerned, as this would also entail changes to the associated foreign key values (which is possible in principle through so-called cascading, but is often complex).

In addition, the selected primary key must actually allow the real objects that are represented by the tuples of the relation to be clearly identified. For example, if you select the combination {name, birthday} as the primary key, you also specify that there may not be two people with the same name who have their birthday on the same day (uniqueness, uniqueness). The introduction of surrogate keys (artificial keys, e.g. a sequence number) will avoid this problem in any case. The following primary keys are suitable for the example relations from the introduction (are key candidates ):

a: {ISBN}
b: {Name, birthday}
c: {Superior, subordinate}

Provided that no surrogate keys are to be introduced, the decision in examples b) and c) is irrelevant, because there is only one key candidate; consequently this must also be used as the primary key. In example a), {ISBN} is chosen as the primary key, because this is the smallest key (in contrast to {author, book title, ...} it has only one attribute), and it also accurately reproduces reality. If a primary key consists of several attributes, one also speaks of a combined ( also: composite) primary key or a compound key . By selecting the primary key, all other key candidates of the relation automatically become alternative keys . In our example relations these would be:

a: {Author, book title}
b: no
c: no

The purpose of alternative keys is to enforce uniqueness for all key candidates in the table, not just for the primary key. (This means that alternative keys are in principle suitable to be used in a different relation as a reference to a foreign key).

Secondary key

Secondary keys are groups of attributes that are often used to describe single and multiple tuples (search term). For example, the postcode in an address table can be used as a secondary key.

Secondary keys are implemented in the database by secondary indices (simply “indices” in everyday language). A secondary index is an optional, additional search structure of a database that makes tuples easier to find by avoiding searching through the entire database (similar to the index of a book, through which terms can be specifically found).

Secondary keys do not necessarily have to be unique (only the alternative keys, which are automatically also secondary keys, are unique). But foreign keys (also not necessarily unique) are secondary keys, because they are used to describe data records (sort, group, etc.).

Deputy key

It is possible that all key candidates of a relation consist of several attributes, or that all key candidates only insufficiently reflect the actual situation. One such case is b) of our examples. If you want to identify a person in the Customer table, you must always enter their name and birthday at the same time. It is therefore often desirable to introduce an additional attribute that serves as the primary key: This is called an alternate key (Engl. Surrogate key ). For example b), a business identification number such as “customer number” or a sequential number would be appropriate.

Foreign key

A primary key of one relation can become the foreign key of another

A foreign key is an attribute or an attribute combination of a relation that refers to a primary key (or key candidate) of another relation or the same relation.

It serves as a reference between two relations, i.e. That is, it shows which tuples of the relations are related to one another in terms of content. Examples of foreign keys are the two attributes “superior” and “subordinate” from the example relation c) of the introduction: The “personnel number” of an employee is specified here. But there is little that can be done with such a number in everyday life; much more important are name, department, employment and similar information. Therefore, there will most likely exist another relation that contains attributes such as {personnel number, name, department, employment, ...}. This relation will also most likely have the primary key {personnel number}; it is therefore advisable to use the personnel number as a foreign key.

definition

Let R, S be relations and the attribute set α the primary key of R. If a compatible attribute set β from S is to be a foreign key with respect to α, then the values of β must be a subset of the values of the primary key α in R. (see referential integrity )

An attribute set is compatible with another if the value ranges of the attributes involved are the same, i.e. dom (α) = dom (β) .

Foreign keys and relationship types

In the database world, a distinction is made between different types of relationships between two relations R and S. The term "relation" is - for better understanding - to be equated with the table. In the case of relational databases, a distinction is made between the following types of relationships:

1: 1 relationship , each data record from R is assigned a maximum of 1 data record from S; a maximum of 1 data record from R is assigned to each data record from S
1: n relationship , no data record, one data record or several data records from S are assigned to each data record from R; a maximum of 1 data record from R is assigned to each data record from S
n: m relationship , each data record from R can be assigned one data record or several data records from S; Several data records from R can be assigned to each data record from S.

Cases 1 and 2 are implemented with S containing the primary key from R as a foreign key. In the case of the 1: 1 relationship, this also becomes the primary key. For the n: m relationship, as in example c) above, you need a separate relationship that contains the primary keys of both relationships as foreign keys. Both sets of attributes together are the primary key of this "linkage relation".

Note: The actual so-called cardinalities of these three relationship types are 1: 1 ⇒ [0,1]: [0,1], 1: n ⇒ [0,1]: [0, *] and n: m ⇒ [0, * ]: [0, *]. The "*" sign stands for "any number".

Other uses of the term

The following terms are not keys in the sense of relational databases:

Search key : An attribute or an attribute combination of a relation that serves as a search criterion. A search key does not necessarily have to be an identifying key. Several data records can therefore also qualify using the same key value.

Keys can also be differentiated as natural or artificial keys according to the category of their derivation.

Talking key (also called natural key ): A key candidate that is naturally present in the tuple. Such a key also has a meaning in the real world, such as B. "Chassis number" for police-registered vehicles. In the case of speaking keys, it should be noted that the key domain can break if the field domain is not chosen carefully. For example, due to the growth in new vehicle registrations, five-digit vehicle numbers can at some point be too small, which requires a corresponding reorganization of the key designations. If an attempt is made to encode speaking group assignments in the key, key breaks are very likely because the number ranges are not used consecutively. In addition, such a practice also violates the requirement of normalization, which is why a group assignment should be made via an attribute field or even an N: M assignment table. Natural keys are often useful, for example to sort the data records in the chronology of their creation, or are required by the client. For example, the account numbers follow the specifications of the chart of accounts . When designing the key domain, criteria such as planned cardinality, legibility and handling by the user must be sufficiently taken into account.
Substitute key ( surrogate key ): An artificially generated attribute that did not appear in the tuple before, which identifies the tuples of the relation, which is often used as the primary key. Trivial example: Consecutive document number. Surrogate keys are used, for example, in OLAP technology, where very broad, composite keys are converted to a more compact, artificial surrogate key. In heterogeneous application systems, the surrogate key for a specific entity is assigned by the leading system identified for this purpose . If data records are passed on to second systems via an interface, then in addition to the key attribute, which is used in a movement table, for example, the associated entity with the master data must also be passed to the second system.

literature

Andreas Heuer, Gunter Saake: Databases. Concepts and languages. MITP Verlag, ISBN 3-8266-0619-1
A. Eickler, A. Kemper: Database systems. Oldenbourg Verlag, ISBN 3-486-27392-2