Join (SQL)

from Wikipedia, the free encyclopedia

A SQL - Join (German: Verbund) forms from the records of two tables of a relational database a result table whose records contain attributes of both tables according to a specified join condition. It is the implementation of the concept of the composite of relational algebra in the query language SQL. The ISO standard for SQL describes the following types of joins:

  • The Cartesian product CROSS JOIN,
  • the inner compound in the form of the natural compound NATURAL JOINand other variants, as well
  • the outer compound in the forms LEFT OUTER JOIN, RIGHT OUTER JOINand FULL OUTER JOIN.

As a special case, a table can also be joined with itself; this is known as a self join .

Example tables

The tables Mitarbeiterand Abteilungare used as examples to demonstrate the various forms of association . The identifying MIdname of an employee , as well as that of AbtIdthe department in which he is employed, is recorded. The department has the attributes AbtIdthat identify the department and its name. A network links information from the two tables.

Employee
MId Surname Dept.
M1 Müller 31
M2 Schmidt 32
M3 Müller 32
M4 Meyer ZERO
Department
Dept. Abbot Name
31 sale
32 technology
33 marketing

The example tables have the following special features:

  • The employee named "Meyer" is not assigned to any department. The value NULLas AbtIdin SQL means that this value is unknown .
  • The "Marketing" department has no assigned employees.

Using these two tables as an example, the various forms of association can be illustrated.

Cartesian product (CROSS JOIN)

The CROSS JOINtwo tables form the Cartesian product of the data records in the two tables. Each data record in the first table is linked to every other record in the second table. If the two tables have attributes with the same name, they are supplemented by placing the table name in front.

Example of CROSS JOINs:

SELECT *
FROM Mitarbeiter CROSS JOIN Abteilung

The explicit naming of the Cartesian product was introduced with the SQL standard SQL-92 . In the 1989 SQL standard, the same result can be obtained with the following statement:

SELECT *
FROM Mitarbeiter, Abteilung

The result has 4 × 3 records because there are 4 employees and 3 departments; it consists of all possible combinations of the data records in the two tables. Since this AbtIdoccurs in both tables, the two attributes are made unique in the result table by placing the table name in front.

MId Surname Employee.AbtId Department.AbtId Abbot Name
M1 Müller 31 31 sale
M1 Müller 31 32 technology
M1 Müller 31 33 marketing
M2 Schmidt 32 31 sale
M2 Schmidt 32 32 technology
M2 Schmidt 32 33 marketing
M3 Müller 32 31 sale
M3 Müller 32 32 technology
M3 Müller 32 33 marketing
M4 Meyer ZERO 31 sale
M4 Meyer ZERO 32 technology
M4 Meyer ZERO 33 marketing

Inner bond

The result of the inner join of two tables is the combination of the data records of the tables involved that meet the join condition. As a rule, the join condition requires the equality of values ​​of certain attributes in the two tables. However, it can also contain other comparison operators than equality.

NATURAL JOIN

The basic form of the inner bond is the natural bond NATURAL JOIN.

In the case of a natural join, the data records of the two tables involved are linked with one another, the values ​​of which match the attributes of the same name . So only the " suitable " combinations are formed and not all possible combinations as with the Cartesian product .

Example for the NATURAL JOIN:

SELECT *
FROM Mitarbeiter NATURAL JOIN Abteilung

In our example, the two tables have a common attribute, the AbtID. In the natural network, each employee is assigned the department whose department AbtIdis recorded with him. Since the value of AbtIdin both tables is identical in each line , the attribute only appears once in the results table:

MId Surname AbtID Abbot Name
M1 Müller 31 sale
M2 Schmidt 32 technology
M3 Müller 32 technology

Since the employee named "Meyer" is not assigned to any department, "M4" does not appear in the result. The “Marketing” department does not appear either because no employee is assigned to it.

JOIN ... USING ...

With the natural join, the combinations are formed by comparing the values ​​of all attributes of the same name in the two tables. Often, however, you want to explicitly specify which of the attributes should be used to link the data records. There is a form for thatJOIN ... USING ...

In our example:

SELECT *
FROM Mitarbeiter JOIN Abteilung USING (AbtID)

In this example, the result is identical to that of the natural bond.

This form of association is NATURAL JOINpreferable to the variant because the explicit specification of the attributes ensures that no undesired combinations are formed when the tables involved are expanded. In our example, if Mitarbeiterthe new attribute Ortfor the employee's place of residence is introduced into the table Abteilungand an attribute named Ortfor the location of the department is also added to the table , then NATURAL JOINthe attributes AbtId and Ort the creation of the linked data records are compared, which does not correspond to the intention to assign employees to their departments.

JOIN ... ON ...

It is also possible to create a join between tables in which the names of the attributes to be compared do not match or in which an operator other than =is to be used.

For the example tables, a statement with this form of join is

SELECT *
FROM Mitarbeiter JOIN Abteilung ON Mitarbeiter.AbtId = Abteilung.AbtId;

Result:

MId Surname Employee.AbtId Department.AbtId Abbot Name
M1 Müller 31 31 sale
M2 Schmidt 32 32 technology
M3 Müller 32 32 technology

You can put in front of the two forms JOIN ... USING ...and JOIN ... ON ...also the optional keyword INNERto underline that it is an inner compound.

In the SQL standard of 1989 one would formulate the above statement as follows:

SELECT *
FROM Mitarbeiter, Abteilung
WHERE Mitarbeiter.AbtId = Abteilung.AbtId

In the first form of the statement, the compound is specified explicitly , while in the second variant, a Cartesian product is initially formed and the compound is achieved through the subsequent restriction with the condition Mitarbeiter.AbtId = Abteilung.AbtId.

The forms of the inner union in the previous examples are also referred to in the literature as equijoin (German: Gleichverbund ) because the equality operator is used in the condition for the union. In the variant, it is JOIN ... ON ...also possible to specify any condition for linking the data records of the tables involved, e.g. B. with the comparison operator . In this case the join is called a theta join .

Outer join (OUTER JOIN)

The employee with the MId"M4" does not appear in any of the forms of the internal structure of the example tables , because no department is assigned to him. And the “Marketing” department does not appear because it has no employees.

The forms of the outer join include data records in the join for which the values ​​in the two tables do not correspond. The outer network must therefore always be used when unknown or missing information is involved.

If, in the example of employees and departments, all employees and their departments are to be output, including those who are not assigned to any department, then an external network is required.

LEFT OUTER JOIN

The result from tables T 1 and T 2 contains all data records in table T 1 on the left of the keyword , even if there is no corresponding data record in table T 2 on the right . The missing values ​​from T 2 are filled with NULL. The keyword is not required in any of the examples, but it can be written to draw the reader's attention to the outer compound. T1 LEFT OUTER JOIN T2 JOINOUTER

In the example of employees and departments

SELECT *
FROM Mitarbeiter LEFT OUTER JOIN Abteilung USING (AbtId)
MId Surname Dept. Abbot Name
M1 Müller 31 sale
M2 Schmidt 32 technology
M3 Müller 32 technology
M4 Meyer ZERO ZERO

The result now also contains the employee with the MId"M4" and the attributes from the linked table Department are NULL.

RIGHT OUTER JOIN

One RIGHT OUTER JOIN forms the inner link between the two tables and supplements it with one data record each for data records in the table on the right, for which there is no correspondence in the table on the left.

In the example of employees and departments

SELECT *
FROM Mitarbeiter RIGHT OUTER JOIN Abteilung USING (AbtId)
MId Surname Dept. Abbot Name
M1 Müller 31 sale
M2 Schmidt 32 technology
M3 Müller 32 technology
ZERO ZERO 33 marketing

The result now contains a data record for the "Marketing" department to which no employee is assigned, which is why the attributes MIdand are NameNULL.

Another example in which the external network is required: All departments are to be output with the number of their employees. Since AbtIdno data record at all would be output with the internal link for the department with 33, the instruction must be formulated with the external link:

SELECT AbtName, count(MId) as Mitarbeiterzahl
FROM Mitarbeiter RIGHT OUTER JOIN Abteilung USING (AbtId)
GROUP BY AbtId, AbtName

Result:

Abbot Name Number of employees
sale 1
technology 2
marketing 0

FULL OUTER JOIN

One FULL OUTER JOINis the union of the results of LEFTand of RIGHT OUTER JOINs.

The example for employees and departments:

SELECT *
FROM Mitarbeiter FULL OUTER JOIN Abteilung USING (AbtId)
MId Surname Dept. Abbot Name
M1 Müller 31 sale
M2 Schmidt 32 technology
M3 Müller 32 technology
M4 Meyer ZERO ZERO
ZERO ZERO 33 marketing

Further variants of the syntax of the outer compound are contained in the syntax diagram below.

Self join

A self join is a join of a table with itself. This means that records in the table must be compared with other records in the same table. In order to be able to compare values ​​of the data records of the same table in SQL , you have to assign explicit names for two tuple variables in the statement, both of which can run through the data records of the table.


As an example, it should be checked whether there are two employees with the same name but different MId in the Employees table. In the following self join the tuple variables MAand are defined MBfor the table “Employees” in order to be able to carry out the comparison.

SELECT MA.MId, MA.Name
FROM Mitarbeiter MA CROSS JOIN Mitarbeiter MB
WHERE MA.MId <> MB.MId AND MA.Name = MB.Name

The result is:

MId Surname
M1 Müller
M3 Müller

In fact, when processing a SELECT statement, SQL creates a tuple variable for every table, which normally has the same name as the table itself. If the statement is, for example select * from Mitarbeiter where Mitarbeiter.MId = 'M1', a tuple variable is used, which is called like the table Mitarbeiterand accepts every data record in the table as its content can. The filter condition MId = 'M1'is then checked by running the tuple variable through the table. It is possible to name the tuple variable explicitly, for example select * from Mitarbeiter M where M.MId = 'M1'. This is used in the self join to have two tuple variables for one and the same table, whereby their data records can be compared with one another.

Syntax diagram

The following syntax diagram shows all forms of join in SQL.

Syntax of the variants of the compound in SQL

The combination of two "table references" is shown in the syntax diagram. A “table reference” can itself be an instruction for a compound, i. H. not only joins with two, but with several tables can be formed.

Let us imagine in our example that there is a third table Adressethat uses the attribute to assign AdrIdthe employee his address. Then the following statement results in the assignment of the address and the department to the respective employee:

SELECT *
FROM Mitarbeiter JOIN Adresse using (AdrId)
                 JOIN Abteilung using (AbtId)

The inner compound is (apart from the order of the attributes in the result) both commutative and associative . The outer join is not commutative, and in general not associative either. If different forms of compound are used in a statement with several tables, it is advisable to use parentheses for clarity.

Product specific

The various database management systems partly deviate from the SQL standard or have other variants for the formulation of the external network. The following links point to the join documentation for some of the popular products:

  • IBM Db2
    • IBM Db2NATURAL JOIN does not support the variant .
  • Microsoft SQL Server
    • The SQL dialect for Microsoft SQL Server is called Transact-SQL .
    • Transact SQL does not support the forms NATURAL JOINand JOIN ... USING ...joins, so there is only the variant ONwith which, however, all tasks can be managed.
  • MySQL
    • MySQL supports all forms of join according to SQL-92.
    • MySQL has STRAIGHT JOINa special variant of the join. Typically, the query optimizer of a database management system decides how the federation is actually carried out. With STRAIGHT JOINthe MySQL optimizer the order in which the join should be executed is given.
    • MySQL does FULL [OUTER] JOINn't support that . This form of the outer compound can be created LEFT/RIGHT OUTER JOINtogether with the operator by a construction from UNION.
  • Oracle
    • Oracle also has a proprietary syntax for the outer join, only in 2001 with version 9 the syntax of SQL-92 for the outer join was introduced.
    • Today, Oracle recommends the SQL standard syntax for the outer join.
  • PostgreSQL
  • SQLite
    • SQLite only supports the LEFT OUTER JOIN. The two other forms of the outer compound can be generated by the LEFT OUTER JOINtogether with the operator UNION.

See also

literature

  • CJ Date with Hugh Darwen: A Guide to the SQL Standard . 4th edition. Addison-Wesley, 1997.
  • Wolfgang Panny with Alfred Taudes: Introduction to the core language of SQL-99 . Springer, 2000.
  • Gunter Saake, Kai-Uwe Sattler and Andreas Heuer: Databases: Concepts and Languages . 4th edition. mitp, 2010, p. 221 f. and 339 f .
  • Alfons Kemper and André Eckler: Database Systems: An Introduction . 6th edition. Oldenbourg, 2006, p. 125 f .

Remarks

  1. The syntax of SQL92 emphasizes CROSS JOINthe intention that the Cartesian product is to be formed through the keywords .
  2. There are many articles on the Internet about the network, in which it is illustrated by Venn diagrams for set operations. In these articles, the inner join is often formed from two tables that have the same schema and only in this special case is the natural join actually identical to the intersection of the two tables. In general, however, this is not the case. In our example, you can admire the intersection of the two tables Mitarbeiterand Abteilungnot form because the records of the two tables have different structures and are not comparable. The visualization of the compound operators using Venn diagrams is generally incorrect and therefore misleading.

Individual evidence

  1. CJDate with Hugh Darwen: A Guide to the SQL Standard, Fourth Edition, Addison-Wesley, 1997, pp. 135ff
  2. a b The various types of compound are described in the SQL standard SQL: 2011 in Part 2 Foundation Section 7.7.
  3. ^ Gunther Saake, Kai-Uwe Sattler and Andreas Heuer: Databases: Concepts and Languages . mitp, S. 339 .
  4. ^ Gunther Saake, Kai-Uwe Sattler and Andreas Heuer: Databases: Concepts and Languages . mitp, S. 220 f .