Join (SQL)
A SQL - Join (German: Verbund) forms from the records of two tables of a relational database a result table whose records contain attributes of both tables according to a specified join condition. It is the implementation of the concept of the composite of relational algebra in the query language SQL. The ISO standard for SQL describes the following types of joins:
- The Cartesian product
CROSS JOIN
, - the inner compound in the form of the natural compound
NATURAL JOIN
and other variants, as well - the outer compound in the forms
LEFT OUTER JOIN
,RIGHT OUTER JOIN
andFULL OUTER JOIN
.
As a special case, a table can also be joined with itself; this is known as a self join .
Example tables
The tables Mitarbeiter
and Abteilung
are used as examples to demonstrate the various forms of association . The identifying MId
name of an employee , as well as that of AbtId
the department in which he is employed, is recorded. The department has the attributes AbtId
that identify the department and its name. A network links information from the two tables.
MId | Surname | Dept. |
---|---|---|
M1 | Müller | 31 |
M2 | Schmidt | 32 |
M3 | Müller | 32 |
M4 | Meyer | ZERO |
Dept. | Abbot Name |
---|---|
31 | sale |
32 | technology |
33 | marketing |
The example tables have the following special features:
- The employee named "Meyer" is not assigned to any department. The value
NULL
asAbtId
in SQL means that this value is unknown . - The "Marketing" department has no assigned employees.
Using these two tables as an example, the various forms of association can be illustrated.
Cartesian product (CROSS JOIN)
The CROSS JOIN
two tables form the Cartesian product of the data records in the two tables. Each data record in the first table is linked to every other record in the second table. If the two tables have attributes with the same name, they are supplemented by placing the table name in front.
Example of CROSS JOIN
s:
SELECT *
FROM Mitarbeiter CROSS JOIN Abteilung
The explicit naming of the Cartesian product was introduced with the SQL standard SQL-92 . In the 1989 SQL standard, the same result can be obtained with the following statement:
SELECT *
FROM Mitarbeiter, Abteilung
The result has 4 × 3 records because there are 4 employees and 3 departments; it consists of all possible combinations of the data records in the two tables. Since this AbtId
occurs in both tables, the two attributes are made unique in the result table by placing the table name in front.
MId | Surname | Employee.AbtId | Department.AbtId | Abbot Name |
---|---|---|---|---|
M1 | Müller | 31 | 31 | sale |
M1 | Müller | 31 | 32 | technology |
M1 | Müller | 31 | 33 | marketing |
M2 | Schmidt | 32 | 31 | sale |
M2 | Schmidt | 32 | 32 | technology |
M2 | Schmidt | 32 | 33 | marketing |
M3 | Müller | 32 | 31 | sale |
M3 | Müller | 32 | 32 | technology |
M3 | Müller | 32 | 33 | marketing |
M4 | Meyer | ZERO | 31 | sale |
M4 | Meyer | ZERO | 32 | technology |
M4 | Meyer | ZERO | 33 | marketing |
Inner bond
The result of the inner join of two tables is the combination of the data records of the tables involved that meet the join condition. As a rule, the join condition requires the equality of values of certain attributes in the two tables. However, it can also contain other comparison operators than equality.
NATURAL JOIN
The basic form of the inner bond is the natural bond NATURAL JOIN
.
In the case of a natural join, the data records of the two tables involved are linked with one another, the values of which match the attributes of the same name . So only the " suitable " combinations are formed and not all possible combinations as with the Cartesian product .
Example for the NATURAL JOIN
:
SELECT *
FROM Mitarbeiter NATURAL JOIN Abteilung
In our example, the two tables have a common attribute, the AbtID
. In the natural network, each employee is assigned the department whose department AbtId
is recorded with him. Since the value of AbtId
in both tables is identical in each line , the attribute only appears once in the results table:
MId | Surname | AbtID | Abbot Name |
---|---|---|---|
M1 | Müller | 31 | sale |
M2 | Schmidt | 32 | technology |
M3 | Müller | 32 | technology |
Since the employee named "Meyer" is not assigned to any department, "M4" does not appear in the result. The “Marketing” department does not appear either because no employee is assigned to it.
JOIN ... USING ...
With the natural join, the combinations are formed by comparing the values of all attributes of the same name in the two tables. Often, however, you want to explicitly specify which of the attributes should be used to link the data records. There is a form for thatJOIN ... USING ...
In our example:
SELECT *
FROM Mitarbeiter JOIN Abteilung USING (AbtID)
In this example, the result is identical to that of the natural bond.
This form of association is NATURAL JOIN
preferable to the variant because the explicit specification of the attributes ensures that no undesired combinations are formed when the tables involved are expanded. In our example, if Mitarbeiter
the new attribute Ort
for the employee's place of residence is introduced into the table Abteilung
and an attribute named Ort
for the location of the department is also added to the table , then NATURAL JOIN
the attributes AbtId
and Ort
the creation of the linked data records are compared, which does not correspond to the intention to assign employees to their departments.
JOIN ... ON ...
It is also possible to create a join between tables in which the names of the attributes to be compared do not match or in which an operator other than =
is to be used.
For the example tables, a statement with this form of join is
SELECT *
FROM Mitarbeiter JOIN Abteilung ON Mitarbeiter.AbtId = Abteilung.AbtId;
Result:
MId | Surname | Employee.AbtId | Department.AbtId | Abbot Name |
---|---|---|---|---|
M1 | Müller | 31 | 31 | sale |
M2 | Schmidt | 32 | 32 | technology |
M3 | Müller | 32 | 32 | technology |
You can put in front of the two forms JOIN ... USING ...
and JOIN ... ON ...
also the optional keyword INNER
to underline that it is an inner compound.
In the SQL standard of 1989 one would formulate the above statement as follows:
SELECT *
FROM Mitarbeiter, Abteilung
WHERE Mitarbeiter.AbtId = Abteilung.AbtId
In the first form of the statement, the compound is specified explicitly , while in the second variant, a Cartesian product is initially formed and the compound is achieved through the subsequent restriction with the condition Mitarbeiter.AbtId = Abteilung.AbtId
.
The forms of the inner union in the previous examples are also referred to in the literature as equijoin (German: Gleichverbund ) because the equality operator is used in the condition for the union. In the variant, it is JOIN ... ON ...
also possible to specify any condition for linking the data records of the tables involved, e.g. B. with the comparison operator ≤
. In this case the join is called a theta join .
Outer join (OUTER JOIN)
The employee with the MId
"M4" does not appear in any of the forms of the internal structure of the example tables , because no department is assigned to him. And the “Marketing” department does not appear because it has no employees.
The forms of the outer join include data records in the join for which the values in the two tables do not correspond. The outer network must therefore always be used when unknown or missing information is involved.
If, in the example of employees and departments, all employees and their departments are to be output, including those who are not assigned to any department, then an external network is required.
LEFT OUTER JOIN
The result from tables T 1 and T 2 contains all data records in table T 1 on the left of the keyword , even if there is no corresponding data record in table T 2 on the right . The missing values from T 2 are filled with NULL. The keyword is not required in any of the examples, but it can be written to draw the reader's attention to the outer compound.
T1 LEFT OUTER JOIN T2
JOIN
OUTER
In the example of employees and departments
SELECT *
FROM Mitarbeiter LEFT OUTER JOIN Abteilung USING (AbtId)
MId | Surname | Dept. | Abbot Name |
---|---|---|---|
M1 | Müller | 31 | sale |
M2 | Schmidt | 32 | technology |
M3 | Müller | 32 | technology |
M4 | Meyer | ZERO | ZERO |
The result now also contains the employee with the MId
"M4" and the attributes from the linked table Department are NULL.
RIGHT OUTER JOIN
One RIGHT OUTER JOIN
forms the inner link between the two tables and supplements it with one data record each for data records in the table on the right, for which there is no correspondence in the table on the left.
In the example of employees and departments
SELECT *
FROM Mitarbeiter RIGHT OUTER JOIN Abteilung USING (AbtId)
MId | Surname | Dept. | Abbot Name |
---|---|---|---|
M1 | Müller | 31 | sale |
M2 | Schmidt | 32 | technology |
M3 | Müller | 32 | technology |
ZERO | ZERO | 33 | marketing |
The result now contains a data record for the "Marketing" department to which no employee is assigned, which is why the attributes MId
and are Name
NULL.
Another example in which the external network is required: All departments are to be output with the number of their employees. Since AbtId
no data record at all would be output with the internal link for the department with 33, the instruction must be formulated with the external link:
SELECT AbtName, count(MId) as Mitarbeiterzahl
FROM Mitarbeiter RIGHT OUTER JOIN Abteilung USING (AbtId)
GROUP BY AbtId, AbtName
Result:
Abbot Name | Number of employees |
---|---|
sale | 1 |
technology | 2 |
marketing | 0 |
FULL OUTER JOIN
One FULL OUTER JOIN
is the union of the results of LEFT
and of RIGHT OUTER JOIN
s.
The example for employees and departments:
SELECT *
FROM Mitarbeiter FULL OUTER JOIN Abteilung USING (AbtId)
MId | Surname | Dept. | Abbot Name |
---|---|---|---|
M1 | Müller | 31 | sale |
M2 | Schmidt | 32 | technology |
M3 | Müller | 32 | technology |
M4 | Meyer | ZERO | ZERO |
ZERO | ZERO | 33 | marketing |
Further variants of the syntax of the outer compound are contained in the syntax diagram below.
Self join
A self join is a join of a table with itself. This means that records in the table must be compared with other records in the same table. In order to be able to compare values of the data records of the same table in SQL , you have to assign explicit names for two tuple variables in the statement, both of which can run through the data records of the table.
As an example, it should be checked whether there are two employees with the same name but different MId in the Employees table. In the following self join the tuple variables MA
and are defined MB
for the table “Employees” in order to be able to carry out the comparison.
SELECT MA.MId, MA.Name
FROM Mitarbeiter MA CROSS JOIN Mitarbeiter MB
WHERE MA.MId <> MB.MId AND MA.Name = MB.Name
The result is:
MId | Surname |
---|---|
M1 | Müller |
M3 | Müller |
In fact, when processing a SELECT statement, SQL creates a tuple variable for every table, which normally has the same name as the table itself. If the statement is, for example select * from Mitarbeiter where Mitarbeiter.MId = 'M1'
, a tuple variable is used, which is called like the table Mitarbeiter
and accepts every data record in the table as its content can. The filter condition MId = 'M1'
is then checked by running the tuple variable through the table. It is possible to name the tuple variable explicitly, for example select * from Mitarbeiter M where M.MId = 'M1'
. This is used in the self join to have two tuple variables for one and the same table, whereby their data records can be compared with one another.
Syntax diagram
The following syntax diagram shows all forms of join in SQL.
The combination of two "table references" is shown in the syntax diagram. A “table reference” can itself be an instruction for a compound, i. H. not only joins with two, but with several tables can be formed.
Let us imagine in our example that there is a third table Adresse
that uses the attribute to assign AdrId
the employee his address. Then the following statement results in the assignment of the address and the department to the respective employee:
SELECT *
FROM Mitarbeiter JOIN Adresse using (AdrId)
JOIN Abteilung using (AbtId)
The inner compound is (apart from the order of the attributes in the result) both commutative and associative . The outer join is not commutative, and in general not associative either. If different forms of compound are used in a statement with several tables, it is advisable to use parentheses for clarity.
Product specific
The various database management systems partly deviate from the SQL standard or have other variants for the formulation of the external network. The following links point to the join documentation for some of the popular products:
-
IBM Db2
- IBM Db2
NATURAL JOIN
does not support the variant .
- IBM Db2
-
Microsoft SQL Server
- The SQL dialect for Microsoft SQL Server is called Transact-SQL .
- Transact SQL does not support the forms
NATURAL JOIN
andJOIN ... USING ...
joins, so there is only the variantON
with which, however, all tasks can be managed.
-
MySQL
- MySQL supports all forms of join according to SQL-92.
- MySQL has
STRAIGHT JOIN
a special variant of the join. Typically, the query optimizer of a database management system decides how the federation is actually carried out. WithSTRAIGHT JOIN
the MySQL optimizer the order in which the join should be executed is given. - MySQL does
FULL [OUTER] JOIN
n't support that . This form of the outer compound can be createdLEFT/RIGHT OUTER JOIN
together with the operator by a construction fromUNION
.
-
Oracle
- Oracle also has a proprietary syntax for the outer join, only in 2001 with version 9 the syntax of SQL-92 for the outer join was introduced.
- Today, Oracle recommends the SQL standard syntax for the outer join.
-
PostgreSQL
- PostgreSQL supports all forms of join according to SQL-92. (The PostgreSQL documentation includes a tutorial on joining .)
-
SQLite
-
SQLite only supports the
LEFT OUTER JOIN
. The two other forms of the outer compound can be generated by theLEFT OUTER JOIN
together with the operatorUNION
.
-
SQLite only supports the
See also
literature
- CJ Date with Hugh Darwen: A Guide to the SQL Standard . 4th edition. Addison-Wesley, 1997.
- Wolfgang Panny with Alfred Taudes: Introduction to the core language of SQL-99 . Springer, 2000.
- Gunter Saake, Kai-Uwe Sattler and Andreas Heuer: Databases: Concepts and Languages . 4th edition. mitp, 2010, p. 221 f. and 339 f .
- Alfons Kemper and André Eckler: Database Systems: An Introduction . 6th edition. Oldenbourg, 2006, p. 125 f .
Remarks
-
↑ The syntax of SQL92 emphasizes
CROSS JOIN
the intention that the Cartesian product is to be formed through the keywords . -
↑ There are many articles on the Internet about the network, in which it is illustrated by Venn diagrams for set operations. In these articles, the inner join is often formed from two tables that have the same schema and only in this special case is the natural join actually identical to the intersection of the two tables. In general, however, this is not the case. In our example, you can admire the intersection of the two tables
Mitarbeiter
andAbteilung
not form because the records of the two tables have different structures and are not comparable. The visualization of the compound operators using Venn diagrams is generally incorrect and therefore misleading.
Individual evidence
- ↑ CJDate with Hugh Darwen: A Guide to the SQL Standard, Fourth Edition, Addison-Wesley, 1997, pp. 135ff
- ↑ a b The various types of compound are described in the SQL standard SQL: 2011 in Part 2 Foundation Section 7.7.
- ^ Gunther Saake, Kai-Uwe Sattler and Andreas Heuer: Databases: Concepts and Languages . mitp, S. 339 .
- ^ Gunther Saake, Kai-Uwe Sattler and Andreas Heuer: Databases: Concepts and Languages . mitp, S. 220 f .