Relational database

from Wikipedia, the free encyclopedia

A relational database is a digital database that is used for electronic data management in computer systems and is based on a table-based relational database model . The basis of the concept of relational databases is the relation. It is a mathematical description of a table and is a well-defined term in a mathematical sense ; see database relation . Operations on these relations are determined by relational algebra .

The associated database management system is referred to as a relational database management system or RDBMS (Relational Database Management System). To query and manipulate the data, the database language SQL (Structured Query Language) is mainly used, the theoretical basis of which is relational algebra.

The relational database model was proposed by Edgar F. Codd for the first time in 1970 and is still an established standard for databases despite some criticisms .

Basic concepts

Relational Database Terms

A relational database can be thought of as a collection of tables (the relations) in which data records are stored. Each row ( tuple ) in a table is a data record ( record ). Each tuple consists of a series of attribute values ​​( attributes = properties), the columns of the table. The relation scheme defines the number and type of attributes for a relation. The picture illustrates the relation  R with attributes A 1 to A n in the columns.

For example, a book in a library is described by the record (book ID, author, publisher, year of publication, title, date of inclusion) . A data record must be clearly identifiable. This is done via one or more keys ( English key ). In this case, Book ID contains the keys. A key must never change. It relates to the record and not to the position in the table.

Example of a relation "book":
Book ID author publishing company Publishing year title date
1 Hans prolific writer Sample publisher 2007 We learn SQL 13/01/2007
2 J. Gutenberg Gutenberg and Co. 1452 Printing made easy 01/01/1452
3 GI Caesar Handwriting publisher −44 My life with Asterix 16.03. − 44
5 Galileo Galilei Inquisition International 1640 Eppur si muove 1641
6th Charles Darwin Vatican Publishing House 1860 Adam and Eve 1862

Relationships between tables

Links can also be used to express the relationships between tables. A library database could be implemented with three tables as follows:

Book table , which contains a row for each book:

  • Each line consists of the columns of the table (attributes): book ID, author, publisher, year of publication, title, date of recording .
  • The book ID serves as the key , as it uniquely identifies each book.

User table , which contains the data of all registered library users :

  • The attributes would be for example: user ID, first name, last name.
Relation "user"
User ID First name Surname
10 Hans Frequent readers
11 Jens Middle reader
12 Eric Little reader
"Borrowed" relation
User ID Book ID
10 1
10 2
10 3
12 5
12 6th

You also need a third table, Borrowed , which contains information about the availability of the book. It would contain the attributes User ID and Book ID . Each line of this Borrowed table assigns a book ID to a user ID.

The entry (10,3) would mean that the user with ID 10 (“Hans Vielleser”) has borrowed the book with ID 3 (“My life with Asterix”). The same user also borrowed the book “Printing Made Easy”, which is evidenced by the table entry (10,2) . The set of attributes (user ID, book ID) is used as the key here . At the same time, the user ID connects each entry in the Borrowed table with an entry in the Users table , and the book ID connects each borrowed entry with an entry in the Book table . Therefore, these attributes mean in this context, foreign key (Engl. Foreign key ). Tables without a foreign key are called flat tables .

The term relation used here does not describe the relationship between entities (as in the entity relationship model ), but the relationship between the attributes and the name of the relation. In the above example, Hans is the first name (attribute) of the user (relation name) . In addition, relation is generally used as a synonym for table in relational databases (mostly arising from the entity type in the ERM).

Demarcation

In addition to the relational database model, there are various alternative concepts that allow data to be managed in other structures. These concepts are often of little importance or have not yet caught on. Nevertheless, they offer a simpler connection to the data to be managed for certain applications. So-called NoSQL has become increasingly popular in recent years .

Older approaches

In the 1960s and 1970s, hierarchical database systems and network database systems were used for corporate data processing . With these, the data or table structure is defined in the design phase and cannot be varied during the query. They are still used today in special cases.

Object-oriented databases

With the advent of object-oriented programming languages , object databases were increasingly offered. This means that objects from OO languages such as Java can be kept directly in the database - it is then no longer necessary to map the objects to the relational table structure, the object-relational mapping . This approach has advantages over relational design if you want to save complex data objects that are difficult to map onto the flat relational table structures. Object databases, however, still have disadvantages compared to relational databases when processing large amounts of data. This is caused, for example, by access paths to objects over several types of path (e.g. inheritance and association). This leads to exponential complexity in lock management write operations and thus poor performance. The performance problems were taken up in the object-relational databases, in which only the constructs from object-oriented databases with lower complexity (e.g. ) were adopted.

Object relational databases

Some vendors add object-oriented properties to their relational databases and then call them object-relational databases . However, these are not intended for the direct mapping of objects of the programming language - they only use the concept of inheritance when defining and querying tables with similar field structures and thus simplify their handling. The SQL-99 standard was expanded to include object-relational language elements.

Semi-structured databases

The semi-structured databases are newer concepts . They differ from conventional database models in that they do not have a fixed schema. The database is structured hierarchically, tree-like and each database unit ( English entity ) of the same type can have different sets of attributes .

Typical representatives of this type are XML databases , which manage the data as XML fragments. The XML data are hierarchically organized and can contain any structures as long as they are well-formed according to the XML definition. The data can be queried via XQuery or XPath . Today, proprietary language extensions are used for manipulation. The disadvantage of current XML databases is their lower performance compared to relational systems.

Semi-structured databases can also be implemented with relational DB via extensions or server programming, but the relation model is no longer used.

Relational Database Theory

The foundations of the relational database theory were laid by Edgar F. Codd in the 1960s and 1970s and described in his paper A Relational Model of Data for Large Shared Data Banks . In theory, all operations are based on relational algebra.

Understanding relational algebra

The relational algebra is an algebraic model that describes data as stored can be queried and manipulated. The main operations from which all further operations can be derived are the following:

All queries that are made to a relational database using SQL are mapped to these operators by the database management system, i.e. translated. In practice there are other operators, such as the join operator , which, however, also only represents a combination of cross product, selection and projection.

Limitations of Relational Algebra

Relational algebra does not support the computation of recursive queries ( transitive hulls ). This means, for example, that it is not possible to calculate all ancestors of a person in one query if these are stored in a Person relation and are connected to the respective ancestor in person via an AncestorVon relation . The ancestors can only be determined through a series of queries.

With the introduction of SQL-99, however, an extended relational algebra was also introduced, which allows an operation to compute the transitive envelope.

Database schema and modeling

An ER diagram in Chen notation

An important part of a relational database is its schema . The schema defines what data is stored in the database and how this data relates to one another. The process of creating a schema is called data modeling .

The entity relationship model is also used to model relational databases . It is used to design a conceptual schema that can be implemented using a database management system (DBMS). This step is referred to as logical design or data model mapping and results in a database schema in the implementation data model of the DBMS.

An important step in the modeling process is normalization . This is intended to reduce redundancies and prevent anomalies in order to simplify the maintenance of a database and to ensure the consistency of the data. Edgar F. Codd has proposed four normal forms that have since been used in the relational database design and have been supplemented by more.

Criticism of the relational database model

segmentation
In the relational representation, an object is stored segmented on many different relationships. The application objects are usually complex , that is, they themselves consist of objects or lists of objects. Since the relational model only knows sets of tuples that consist of values , complex application objects have to be restored from the individual relations by means of numerous joins when a query is made by the DBMS . This can lead to confusing queries, which have to be checked with every structural change to the application object for any adjustments required. The use of joins, which must be supported by well-fitting database indices , makes object access more complex than z. B. in an object database , both in terms of resource requirements and development effort.
Artificial key attributes
In some cases, artificial keys must be used to uniquely identify tuples . This is used e.g. For example, to reduce the size of the key if it is to be used as a foreign key, or to implement belongs-to-relationships. Attributes are included in the relation that have nothing to do with the abstract description of an application object, but are "only" administrative information.
External programming interface
Since in many relational databases only data manipulation languages ​​of limited power are available, interfaces to more powerful programming languages ​​are usually necessary. This connection may lead to an inconvenient handling, e.g. B. if the set-oriented SQL is to be processed in the set-oriented C ++ , see Object-relational impedance mismatch .
However, there are also relational databases with powerful programming languages, such as PL / SQL in Oracle or PL / pgSQL in PostgreSQL or T / SQL in Microsoft SQL Server ; In both database systems, the respective data manipulation languages ​​allow the integration of other programming languages. PL / SQL enables e.g. B. the use of Java programs or C ++ programs within PL / SQL programs. PL / pgSQL in turn enables server programming with other languages ​​such as PHP , Tcl or Python .
Object properties and behavior often cannot be mapped
The typical application behavior of an object cannot be described in the relational database. This description can therefore only be made outside of the database in application software. If several applications use the same database, this can lead to a redundant implementation.

The collective term NoSQL denotes non-relational database models that intend to solve problems such as those mentioned by alternative approaches.

Comparison of basic terms

Relational database Relations model Entity Relationship Model (ERM) Unified Modeling Language (UML)
Range of values ​​(domain, domain)
Header Relation type / format Entity type Class, object type
column attribute attribute attribute
table relation Entity set (-set) Object set, instance set, class
Column heading (s) Foreign key functional relationship association
row Tuple entity Object , instance
cell Attribute value

See also

literature

  • Edgar F. Codd: The Relational Model for Database Management. Version 2. Addison-Wesley, Reading et al. a. 1990, ISBN 0-201-14192-2 .
  • Alfons Kemper, André Eickler: Database systems. An introduction. R. Oldenbourg Verlag, Munich 2004, ISBN 3-486-27392-2 .
  • Andreas Meier: Relational and Post-Relational Databases . 7th edition. Springer-Verlag, Heidelberg 2010, ISBN 978-3-642-05255-2 ( online ).

Web links

Individual evidence

  1. ^ Edgar F. Codd : A Relational Model of Data for Large Shared Data Banks . In: Communications of the ACM . ACM Press, June 13, 1970, ISSN  0001-0782 , pp. 377-387 .