Entity Relationship Model

The entity-relationship model (ER model or ERM for short) is a model for representing things or objects (the "entities") and the relationships between them (the "relationships"). It is used in semantic data modeling to determine and represent a relevant section of the real world within a given context, for example a project to create an information system. The ER model essentially consists of a graphic (ER diagram, abbreviated ERD) and a description of the elements used in it.

An ER model serves two phases of application development. In the conceptual phase, it facilitates understanding between users and developers. Only the *what* is dealt with here, i.e. the business and factual conditions, not the *how*, e.g. the technology. In the implementation phase, it serves as the basis for designing the database, which is usually relational.

The use of ER models is the de facto standard for data modeling, even though there are different graphical forms of representation for data models.

The ER model was introduced by Peter Chen in his 1976 publication "The Entity-Relationship Model". The descriptive means for generalization and aggregation were introduced in 1977 by John M. Smith and Diane C. P. Smith. Several further developments followed, for example by Wong and Katz in the late 1980s.

Terms

Figure: Simple examples of ERDs based on the Chen notation

Entity-relationship models are based on grouping objects into types. The same applies to their relationships to one another and to the information to be kept about them ("attributes").

Basic components

In discussions, examples and conceptual texts, reference is made to objects and facts in the real world (within the scope under consideration). These are called:

Entity: an individually identifiable object of reality, e.g. the employee Miller or the project 3232.
Relationship: a connection between two or more entities, e.g. "employee Miller leads project 3232".
Attribute: something of interest about an entity (in the given context), e.g. the entry date of the employee Miller.

In the course of modeling, types are formed from similar instances of these facts and are precisely defined and described in the model. These types are:

Entity type: a type for similar entities, e.g. Employee, Project, Book, Author, Publisher.
Relationship type: a type for similar relationships, e.g. "employee leads project".
Attribute: a type for similar properties, e.g. last name, first name and entry date in the entity type Employee. The attribute, or combination of attributes, whose values uniquely identify entities is called the identifying attribute(s). For example, the project number attribute in the entity type Project is an identifying attribute.
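The distinction between entity types, relationship types and identifying attributes can be made concrete with a minimal Python sketch. Only Employee, Project, the project number and "employee leads project" come from the text above. The personnel number, the project title and the helper `is_identifying` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Employee:            # entity type Employee
    personnel_no: int      # identifying attribute (hypothetical)
    last_name: str
    first_name: str
    entry_date: date


@dataclass(frozen=True)
class Project:             # entity type Project
    project_no: int        # identifying attribute, as in the text
    title: str


@dataclass(frozen=True)
class Leads:               # relationship type "employee leads project"
    personnel_no: int      # refers to an Employee entity
    project_no: int        # refers to a Project entity


def is_identifying(entities, attr_names):
    """True if the values of attr_names distinguish all given entities."""
    keys = [tuple(getattr(e, a) for a in attr_names) for e in entities]
    return len(keys) == len(set(keys))


employees = [
    Employee(1, "Miller", "Anna", date(2019, 4, 1)),
    Employee(2, "Miller", "Ben", date(2021, 9, 15)),
]
projects = [Project(3232, "Intranet relaunch")]
leads = [Leads(1, 3232)]  # the employee Miller (no. 1) leads project 3232

print(is_identifying(employees, ["last_name"]))     # False: two Millers
print(is_identifying(employees, ["personnel_no"]))  # True
```

The last name alone is not an identifying attribute here, because two employees share it.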

Special circumstances

ER modeling provides the following constructs for describing and representing particular facts:

Strong entity type: an entity can be identified by the values of one or more attributes of its own entity type. For example, the order number identifies an entity of the entity type Order.
Weak entity type: identifying such an entity requires an attribute value of another, strong entity related to it. For example, a room (weak entity type Room) is identified not only by its room number; a building of the strong entity type Building must also be specified. Extensions of the ER model such as SERM merge the weak entity type and its associated relationship type into a so-called ER type, which makes diagrams more compact.
Cardinality: the cardinality is defined at the level of the relationship type, separately for each entity type involved. It states in how many concrete relationships of this type the entities of that entity type can or must participate. Various notations have been developed to represent cardinality; modeling tools usually support one specific notation.
Reflexive (self-referential) relationship: a relationship between individual entities of one and the same entity type, i.e. a relationship type that connects an entity type with itself. Examples are the tree structure of an organizational chart ("organizational unit is divided into organizational units") and the network structure of a parts list ("part is used in part"). Synonym: recursive relationship.
Degree (complexity) of a relationship type: the number of entity types involved in a relationship type. The usual case is degree 2 (binary relationship type); degree 3 (ternary relationship type) or higher occurs rarely. Ternary and higher-degree relationship types can be roughly reduced to binary ones by introducing a new entity type that corresponds to the original relationship type. Example: "employee looks after supplier (for product group)" becomes a new entity type Supplier Support with relationships to the three original entity types. However, this approach can lose information: some facts can only be represented exactly by relationship types of higher degree.
Relationship attributes: relationship types usually have no attributes, since they only connect the entity types involved. If additional attributes are required, the relationship type can be turned into an independent entity type, as with higher-degree relationships. This new entity type has relationship types to the originally involved entity types. The attribute is then assigned to the new (weak) entity type, e.g. the degree of project participation in the relationship type "employee works on project". Depending on the modeling method, "attributed" relationships can also be formulated, but forming a new entity type is often used instead.
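A reflexive relationship can be made concrete with a small relational sketch using Python's standard `sqlite3` module. The example assumes a hypothetical table `org_unit` whose foreign key `parent_id` points back into the same table. This models "organizational unit is divided into organizational units" as a tree, and a recursive query lists all units below a given one. A network such as the parts list ("part is used in part") would instead need a separate relation with two foreign keys to the part table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
# Reflexive relationship: parent_id refers to a row of the same table.
con.execute("""
    CREATE TABLE org_unit (
        unit_id   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES org_unit(unit_id)  -- NULL for the root
    )""")
con.executemany(
    "INSERT INTO org_unit VALUES (?, ?, ?)",
    [(1, "Company", None),
     (2, "Sales", 1),
     (3, "IT", 1),
     (4, "Sales North", 2)],
)

# Walk the tree below unit 1 with a recursive common table expression.
rows = con.execute("""
    WITH RECURSIVE sub(unit_id, name, depth) AS (
        SELECT unit_id, name, 0 FROM org_unit WHERE unit_id = ?
        UNION ALL
        SELECT o.unit_id, o.name, s.depth + 1
        FROM org_unit o JOIN sub s ON o.parent_id = s.unit_id
    )
    SELECT name, depth FROM sub ORDER BY depth, name""", (1,)).fetchall()
print(rows)  # [('Company', 0), ('IT', 1), ('Sales', 1), ('Sales North', 2)]
```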

Relationships with special semantics

In the ER diagram, the meaning of a relationship type is expressed only by a short text, usually a verb. This text appears in the rhombus or as a label on the edge, and the modeler is free to choose it. Some relationships with special semantics occur relatively frequently in modeling, so special identifiers and graphic symbols have been defined for them. Specialization/generalization and aggregation/decomposition are such additional means of description with special semantics. With these two special relationship types, real-world facts can be modeled and represented more precisely and in line with their actual meaning. Fixed names and special graphic symbols show that these are semantically predefined relationships with special rules.

The entity and relationship types modeled in this way appear mostly in semantic data models. They can be implemented in the database in different ways. One option is separate tables, identical to the model. Another is shared tables with comments or attribute names that characterize the special relationship. This implementation decision, like the determination of the cardinality for these special relationships, is made during database modeling.

Specialization and generalization using an is-a relationship

In specialization, an entity type is recognized and declared as a subset of another entity type. The specialized entity set is distinguished from the higher-level, generalized set by special properties, i.e. attributes and/or relationships that apply only to it. An individual object of the specialized set and the corresponding object of the generalized set are the same individual object. Therefore all properties of the generalized object, in particular its identification, and all its relationships also apply to the specialized object.

Relationship types of the kind "specialization/generalization" are described by is-a / can-be ("is a" / "can be a"). Instead of is-a, a-kind-of ("a kind of") is sometimes used. These are 1:c relationships.

Example of the is-a relationship:

Air travel is a journey

and in another reading direction:

Travel can-be air travel,

with properties such as travel date and travel price (for Travel) and relationships to the entity type Flight (for Air Travel).

The is-a relationship described here holds between identical individual objects. It must not be confused with the is-element-of relationship, i.e. the assignment of one individual object to another, which is sometimes also written is-a. An example is "flight is-a flight", which would be semantically wrong.
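A specialization can be implemented as separate tables, as mentioned earlier in this article. A minimal sketch, assuming hypothetical tables `travel`, `air_travel` and `flight`. The specialized table reuses the primary key of the generalized table, reflecting that an air travel and its travel are the same individual object. Only the air travel carries the relationship to Flight.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
# Generalized entity type: every travel has a date and a price.
con.execute("""
    CREATE TABLE travel (
        travel_id   INTEGER PRIMARY KEY,
        travel_date TEXT NOT NULL,
        price       REAL NOT NULL
    )""")
con.execute("CREATE TABLE flight (flight_no TEXT PRIMARY KEY)")
# Specialized entity type: same key as travel (1:c), plus the relationship
# to Flight that applies only to air travel.
con.execute("""
    CREATE TABLE air_travel (
        travel_id INTEGER PRIMARY KEY REFERENCES travel(travel_id),
        flight_no TEXT NOT NULL REFERENCES flight(flight_no)
    )""")
con.execute("INSERT INTO flight VALUES ('XY123')")
con.executemany("INSERT INTO travel VALUES (?, ?, ?)",
                [(1, "2024-05-01", 420.0), (2, "2024-06-10", 95.0)])
con.execute("INSERT INTO air_travel VALUES (1, 'XY123')")  # travel 1 is an air travel

# Inherited properties are reached through the shared key.
row = con.execute("""
    SELECT t.travel_date, t.price, a.flight_no
    FROM air_travel a JOIN travel t USING (travel_id)""").fetchone()
print(row)  # ('2024-05-01', 420.0, 'XY123')

# Travel "can be" air travel: travel 2 has no specialized row.
plain = con.execute("""
    SELECT COUNT(*) FROM travel
    WHERE travel_id NOT IN (SELECT travel_id FROM air_travel)""").fetchone()[0]
print(plain)  # 1
```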

Several specialized entity types can also be declared for one specialization. Two things must then be determined. First, whether individual objects of the generalized entity type may be missing from all specializations. Second, whether they may occur in only one specialized entity set (alternatively) or in several at the same time. Example: a customer is a private customer or a corporate customer; one of the two relationships must exist.
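These two conditions can be checked mechanically: whether objects may be missing from all specializations (total or partial), and whether they may occur in several specialized sets (disjoint or overlapping). The function `check_specialization` below is a hypothetical helper working on plain Python sets of keys, and the customer keys are invented for the example.

```python
def check_specialization(general, specializations, disjoint, total):
    """Check a specialization of the entity set `general` (a set of keys).

    specializations: dict mapping a name to the set of keys in that subset.
    disjoint: an entity may belong to at most one specialization.
    total:    every entity must belong to at least one specialization.
    Returns a list of violation messages (empty if all constraints hold).
    """
    problems = []
    for name, members in specializations.items():
        # is-a: every specialized entity must also be a generalized entity
        for key in sorted(members - general):
            problems.append(f"{key} in {name} but not in the generalized set")
    for key in sorted(general):
        in_specs = [n for n, m in specializations.items() if key in m]
        if disjoint and len(in_specs) > 1:
            problems.append(f"{key} is in several specializations: {in_specs}")
        if total and not in_specs:
            problems.append(f"{key} is in no specialization")
    return problems


customers = {"C1", "C2", "C3"}
specs = {"private": {"C1", "C3"}, "corporate": {"C2", "C3"}}
result = check_specialization(customers, specs, disjoint=True, total=True)
print(result)  # ["C3 is in several specializations: ['private', 'corporate']"]
```

In the customer example from the text, both `disjoint` and `total` would be set to `True`.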

Generalization and specialization as results of the modeling process

Specializations arise by forming subsets from given entity sets. Generalization works the other way: common properties and relationships that occur in different entity types are combined into a new entity type. For example, customers and suppliers can be combined into business partners, since name, address, bank details, etc. occur for both.

In this example, the resulting generalization relationship type starts from Business Partner and leads to the two entity types Customer and Supplier. The cardinality determines whether the relationship can or must apply to entities of only one of the two entity types or of both.

The distinction between specialization and generalization results only from the order in which entity types were identified during modeling. The result is always a relationship type that is a specialization in one direction and a generalization in the other. Several specializations/generalizations can occur for the same entity type. Example: employees can be specialized into external or internal employees (disjoint) and, in addition, into "senior employees". Specialized entity types can in turn be specialized or generalized further (cascaded).

The visual representation of specializations and generalizations is not provided for in the original ER diagram, but is used in extensions such as SERM.

Aggregation and decomposition using an is-part-of relationship

If several individual objects (e.g. person and hotel) are combined to form an independent individual object (e.g. reservation), this is called aggregation. The superordinate, independent whole is called the aggregate; the parts that make it up are called components. The aggregate and the components are each declared as entity types.

In aggregation/decomposition, a distinction is made between role aggregation and quantity aggregation:

Role aggregation exists when several role-specific components are combined into one aggregate; there is a 1:c relationship.

Example of an is-part-of relationship:

Football team is-part-of football game, and venue is-part-of football game

and in another reading direction:

Football game consists of a football team and a venue.

Quantity aggregation exists when the aggregate is formed by combining individual objects of exactly one component type. There is a 1:cN relationship here.

Example of quantity aggregation:

Football player is-part-of football team

and in another reading direction:

Football team consists of (several, N) football players.
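Both kinds of aggregation can be sketched as tables with Python's `sqlite3` module. The table and column names are assumptions. The role aggregation `game` references exactly one component per role (team and venue). The quantity aggregation is expressed by each `player` row pointing to its `team`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE team  (team_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE venue (venue_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Role aggregation: the game refers to one component per role.
    CREATE TABLE game (
        game_id  INTEGER PRIMARY KEY,
        team_id  INTEGER NOT NULL REFERENCES team(team_id),
        venue_id INTEGER NOT NULL REFERENCES venue(venue_id)
    );
    -- Quantity aggregation: each player belongs to exactly one team ("1"),
    -- while a team may have any number of players ("cN").
    CREATE TABLE player (
        player_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        team_id   INTEGER NOT NULL REFERENCES team(team_id)
    );
    INSERT INTO team   VALUES (1, 'FC Example');
    INSERT INTO venue  VALUES (1, 'Example Stadium');
    INSERT INTO game   VALUES (1, 1, 1);
    INSERT INTO player VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 1);
""")
n = con.execute("SELECT COUNT(*) FROM player WHERE team_id = 1").fetchone()[0]
print(n)  # 3
```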

Contents of the ER model

ER diagrams

The graphical representation of entity and relationship types is called an entity-relationship diagram (ERD) or ER diagram. These types are derived by typing the entities and relationships identified in the given context. The diagram is an overview of all relevant entity types and their relationships, which may form a complex, net-like structure. For very large models, partial models (excerpts from the overall model) are usually shown. Colloquially, ERDs are sometimes simply called "data models". In the broader sense, however, a data model also includes the textual descriptions.

Notation forms in ER diagrams:

There are different forms of representation in use. Entity types are mostly represented as rectangles, relationship types as connecting lines with different line ends or labels that represent the cardinality of the relationships.

Today there are many different notations. They differ in, among other things, clarity, the scope of the graphical language, and support by standards and tools. The following important examples show that the core statements of ER diagrams are almost identical despite all graphical differences.

Of particular (in part historical) importance are, among others:

the Chen notation by Peter Chen, the inventor of ER diagrams, 1976; extended by the representation of attributes in the modified Chen notation (MC notation);
IDEF1X, long the de facto standard for US government agencies;
the Bachman notation of Charles Bachman, a widespread tool diagram language;
the Martin notation (crow's foot notation), a widely used tool diagram language (information engineering);
the (min, max) notation by Jean-Raymond Abrial, 1974;
UML, a standard that even ISO uses as a replacement for ER diagrams in its own standards. Attributes (not visible in the diagram) can be represented as class attributes; relationship attributes, on the other hand, are modeled using association classes.

Each of these notations expresses the following in its own way:

A person is born in one (1) place. A place is the birthplace of any number of (N) people.
Two questions are left open by the Chen notation. First, whether every person must be assigned a birthplace (which might require a place "Unknown"). Second, whether there may be places where, according to the data set, no person was born. The other notations express these conditions with different symbols.

The (min, max) notation differs fundamentally from the other notations in two respects: how the cardinality is determined, and where it is written in the ER diagram. In all other notations, one takes an entity of one entity type and asks how many entities of the other entity type can participate with it. The (min, max) notation defines cardinality differently. For each entity type involved in a relationship type, one asks for the smallest and the largest possible number of relationships in which an entity of that type participates. The resulting (min, max) pair is noted next to the entity type for which the question was asked.

The numerical difference between the (min, max) notation and all other notations only becomes apparent for ternary and higher-degree relationship types. For binary relationship types, the difference shows only as swapped positions of the cardinality information.

Cardinality notations that use n without a (min, max) specification have a semantic deficit. They do not specify whether n includes 0, i.e. whether participation in the relationship is optional. Take a 1:n relationship between employee and project: it remains open whether a project may exist without a managing employee, even temporarily. This must be defined explicitly in words.
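The counting rule of the (min, max) notation can be illustrated on sample data for "person is born in place". The data and the helper `min_max` are hypothetical. Counting observed instances only illustrates the idea: in a real model, (min, max) values are constraints on all permitted states, not statistics of one data set.

```python
from collections import Counter

persons = ["Anna", "Ben", "Cleo"]
places = ["Berlin", "Hamburg", "Unknown"]
born_in = [("Anna", "Berlin"), ("Ben", "Berlin"), ("Cleo", "Hamburg")]


def min_max(entities, pairs, position):
    """For each entity, count the relationship instances it takes part in
    (position 0 = person side, 1 = place side); return (min, max)."""
    counts = Counter(pair[position] for pair in pairs)
    numbers = [counts[e] for e in entities]  # absent entities count as 0
    return min(numbers), max(numbers)


print("Person:", min_max(persons, born_in, 0))  # Person: (1, 1)
print("Place:", min_max(places, born_in, 1))    # Place: (0, 2)
```

At schema level, (1, 1) would be written next to Person and (0, N) next to Place. A Chen diagram instead writes 1 next to Place and N next to Person, which is the swap of positions described above. The minimum 0 also states explicitly that a place may have no person born there, which a plain N leaves open.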

Description of the model components

The ER diagram shows the relevant entity types and their relationships at the type level, while details are recorded in separate descriptions. The documentation has two purposes. It makes the modeled facts understandable and communicable in a uniform and clear way (uniform terms!). It also passes on the information available from a conceptual point of view to the subsequent implementation phases.

Examples of possible content:

For entity types: name, short name, definition, example(s), further explanations, estimated quantities, new or already existing, ...
For relationships: short name, entity types involved, relationship statement 1 ("employee leads project"), relationship statement 2 (in the opposite direction), cardinality, possibly other conditions for the relationship ("only for private individuals"), ...
For attributes: name, short name, definition, example, further explanations, format (e.g. number with 2 decimal places), value range (e.g. from 1 to 99), identifying for the entity (yes / no / partially), ...

In practice, the content is determined by the modeling tools used or by organization-specific rules (e.g. document templates). If the ER model contains objects that already exist in the organization, these are usually reused in their existing form (copies, ...). Conversely, new objects from the ERM are incorporated into the company's central data model after the end of the project.

Use of the ERM in database design

The ER model is often used in database design. Here, a new ER model is created, either by extending the semantic ERM or by using it as a template. This model is then extended until it forms the basis for implementing the database. Mapping the data facts recognized (and modeled) in the real world to a database schema takes place in several steps:

Recognizing entities and combining them into entity types through abstraction (e.g. the colleagues Fritz Maier, Paul Lehmann and many others into the entity type Employee);
Recognizing relationships between objects and combining them into a relationship type. Example: the employee Paul Lehmann heads the project "Improving the working atmosphere", and the employee Fritz Maier heads the project "Increasing efficiency in administration"; these findings lead to the relationship type "employee leads project";
Determining the cardinalities, i.e. the frequency of occurrence (e.g. a project is always managed by exactly one employee; an employee may manage several projects).

These steps can be represented in an ER model according to the examples shown above.

The following steps are also necessary. Their results are often not shown graphically but only added as descriptive text:

Determining and describing in detail the relevant attributes of the individual entity types, such as field length, value ranges, mandatory fields, etc.;
Choosing suitable attributes of an entity type as identifying attribute(s), the so-called key attributes. If necessary, artificial keys must be defined.

Defining further details for implementing relationship types, such as mandatory relationships, foreign keys or relationship tables, and referential integrity.

Generating the schema of a relational database with all its tables and their field definitions, including the respective data types.

Depending on the modeling tools used and the project methodology, a distinction between ERM and database model is not always necessary. Small database projects are one such case. Another is database tasks where the database is designed with end-user database systems (e.g. Microsoft Access) and the documentation, including the ERD, is supported by functions of the same system.

Depending on the tool, model contents for the database design can also be transferred to another tool and processed further there. In this case in particular, the consistency of the two design levels should be ensured.

Transfer to a relational model

The transfer of an entity-relationship model into the relational model is essentially based on the following mappings:

Entity type → relation
Relationship type → foreign key; for an n:m relationship type → additional relation
Attribute → attribute

The exact transfer, which can be automated, takes place in 7 steps:

Strong entity types: for each strong entity type, a relation $R = \lbrace a_{1}, a_{2}, \ldots, a_{n} \rbrace \cup \lbrace k \rbrace$ is created, with $k$ as the primary key and $a_{1}, a_{2}, \ldots, a_{n}$ as the attributes of the entity type.
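A minimal sketch of this step maps a hypothetical entity type Project to an SQLite table via Python's `sqlite3` module. The identifying attribute `project_no` becomes the primary key; the attributes `title` and `start_date` are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# R = {title, start_date} union {project_no}, with project_no as primary key k.
con.execute("""
    CREATE TABLE project (
        project_no INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        start_date TEXT
    )""")
con.execute("INSERT INTO project VALUES (3232, 'Intranet relaunch', '2024-01-15')")

# The primary key enforces identification: a second row with key 3232 fails.
rejected = False
try:
    con.execute("INSERT INTO project VALUES (3232, 'Duplicate', NULL)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```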

Weak entity types: for each weak entity type, a relation $R = \lbrace a_{1}, a_{2}, \ldots, a_{n} \rbrace \cup \lbrace k \rbrace$ is created with the foreign key $k$ and the primary key $\lbrace k \rbrace \cup \lbrace a_{x} \rbrace$. Here $\lbrace a_{x} \rbrace$ identifies the weak entity type and $k$ identifies the strong entity type.
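For the weak entity type Room from the earlier example, a sketch could look as follows; the table and column names (`building_no`, `room_no`, `seats`) are assumptions. The foreign key plays the role of $k$, and `room_no` plays the role of $a_x$.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE building (building_no TEXT PRIMARY KEY, name TEXT)")
# Weak entity type Room: foreign key k = building_no plus partial key
# a_x = room_no together form the primary key.
con.execute("""
    CREATE TABLE room (
        building_no TEXT NOT NULL REFERENCES building(building_no),
        room_no     TEXT NOT NULL,
        seats       INTEGER,
        PRIMARY KEY (building_no, room_no)
    )""")
con.executemany("INSERT INTO building VALUES (?, ?)", [("A", "Main"), ("B", "Annex")])
# The same room number may occur in different buildings ...
con.executemany("INSERT INTO room VALUES (?, ?, ?)", [("A", "101", 30), ("B", "101", 12)])

# ... but a room cannot exist without its (strong) building.
try:
    con.execute("INSERT INTO room VALUES ('Z', '101', 5)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
print(orphan_allowed)  # False
```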

1:1 relationship types: for a 1:1 relationship type between the entity types T and S, one of the two relations is extended by a foreign key referencing the other relation.
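A sketch for a hypothetical 1:1 relationship type "employee drives company car". The foreign key alone would still allow several cars per employee, so this example additionally declares it `UNIQUE`. That constraint is a design choice of the sketch, not part of the rule above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT)")
# company_car is extended by a foreign key to employee; UNIQUE keeps the
# relationship 1:1 (at most one car per employee).
con.execute("""
    CREATE TABLE company_car (
        plate  TEXT PRIMARY KEY,
        emp_no INTEGER UNIQUE REFERENCES employee(emp_no)
    )""")
con.execute("INSERT INTO employee VALUES (1, 'Miller')")
con.execute("INSERT INTO company_car VALUES ('AB-123', 1)")

try:
    con.execute("INSERT INTO company_car VALUES ('CD-456', 1)")
    second_car_ok = True
except sqlite3.IntegrityError:
    second_car_ok = False
print(second_car_ok)  # False
```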

1:N relationship types: for a 1:N relationship type between the entity types T and S, let T be the one participating with cardinality N (or 1 in (min, max) notation). The relation T is extended by the foreign key of relation S.
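Applied to the birthplace example ("a person is born in one place"), the relation for Person receives the foreign key to Place. The names are hypothetical, and `NOT NULL` assumes that every person must have a birthplace.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE place (place_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Person participates with cardinality N, so person gets the foreign key.
con.execute("""
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        birthplace INTEGER NOT NULL REFERENCES place(place_id)
    )""")
con.executemany("INSERT INTO place VALUES (?, ?)", [(1, "Berlin"), (2, "Hamburg")])
con.executemany("INSERT INTO person VALUES (?, ?, ?)", [(1, "Anna", 1), (2, "Ben", 1)])

# Number of persons per place; places without births show 0.
rows = con.execute("""
    SELECT pl.name, COUNT(pe.person_id)
    FROM place pl LEFT JOIN person pe ON pe.birthplace = pl.place_id
    GROUP BY pl.place_id ORDER BY pl.place_id""").fetchall()
print(rows)  # [('Berlin', 2), ('Hamburg', 0)]
```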

N:M relationship types: for each N:M relationship type between the entity types T and S, a new relation $R = \lbrace a_{1}, a_{2}, \ldots, a_{n} \rbrace \cup \lbrace k_{T} \rbrace \cup \lbrace k_{S} \rbrace$ is created. Here $\lbrace a_{1}, a_{2}, \ldots, a_{n} \rbrace$ are the attributes of the relationship type, and $k_{T}$ and $k_{S}$ are the primary keys of the participating relations.
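A sketch for the relationship type "employee works on project" with the relationship attribute "degree of participation", named `share_pct` here as an assumption. Using the pair of foreign keys as primary key is a common choice, but the rule above does not state it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE project  (project_no INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- New relation R = {share_pct} union {k_T} union {k_S}.
    CREATE TABLE works_on (
        emp_no     INTEGER NOT NULL REFERENCES employee(emp_no),
        project_no INTEGER NOT NULL REFERENCES project(project_no),
        share_pct  INTEGER CHECK (share_pct BETWEEN 1 AND 100),
        PRIMARY KEY (emp_no, project_no)
    );
    INSERT INTO employee VALUES (1, 'Miller'), (2, 'Lehmann');
    INSERT INTO project  VALUES (3232, 'Intranet'), (4711, 'Efficiency');
    INSERT INTO works_on VALUES (1, 3232, 60), (1, 4711, 40), (2, 3232, 100);
""")

# Total participation per employee across all projects.
rows = con.execute("""
    SELECT e.name, SUM(w.share_pct)
    FROM employee e JOIN works_on w USING (emp_no)
    GROUP BY e.emp_no ORDER BY e.emp_no""").fetchall()
print(rows)  # [('Miller', 100), ('Lehmann', 100)]
```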

Multivalued attributes: for each multivalued attribute of T, a relation $R = \lbrace k \rbrace \cup \lbrace a_{x} \rbrace$ is created, with $a_{x}$ as the multivalued attribute and $k$ as a foreign key to T.
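A sketch for a hypothetical multivalued attribute "phone number" of Employee. This example declares $\lbrace k \rbrace \cup \lbrace a_x \rbrace$ as the primary key, which prevents duplicate numbers per employee; that choice is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# R = {emp_no} union {phone}: one row per value of the multivalued attribute.
con.execute("""
    CREATE TABLE employee_phone (
        emp_no INTEGER NOT NULL REFERENCES employee(emp_no),
        phone  TEXT NOT NULL,
        PRIMARY KEY (emp_no, phone)
    )""")
con.execute("INSERT INTO employee VALUES (1, 'Miller')")
con.executemany("INSERT INTO employee_phone VALUES (?, ?)",
                [(1, "+49 30 1234"), (1, "+49 170 5678")])

phones = [p for (p,) in con.execute(
    "SELECT phone FROM employee_phone WHERE emp_no = 1 ORDER BY phone")]
print(phones)  # ['+49 170 5678', '+49 30 1234']
```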

n-ary relationship types: for each relationship type of degree $n > 2$, a relation $R = \lbrace k_{1}, k_{2}, \ldots, k_{n} \rbrace \cup \lbrace a_{1}, a_{2}, \ldots, a_{m} \rbrace$ is created. Here $\lbrace k_{1}, k_{2}, \ldots, k_{n} \rbrace$ are foreign keys to the participating entity types, and $\lbrace a_{1}, a_{2}, \ldots, a_{m} \rbrace$ are the attributes of the relationship type. If all participating entity types take part with cardinality $> 1$, the primary key is the set of all foreign keys. Otherwise, the primary key comprises $n-1$ foreign keys, and the foreign keys of entity types with cardinality $> 1$ must always be among them.
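A sketch for the ternary example "employee looks after supplier (for product group)". The cardinalities are assumed. For each combination of supplier and product group, exactly one employee is responsible (cardinality 1), while Supplier and Product group participate with cardinality > 1. By the rule above, the primary key therefore consists of the $n-1 = 2$ foreign keys of Supplier and Product group. The attribute `since` is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE employee      (emp_no      INTEGER PRIMARY KEY);
    CREATE TABLE supplier      (supplier_no INTEGER PRIMARY KEY);
    CREATE TABLE product_group (group_no    INTEGER PRIMARY KEY);
    -- Ternary relationship type; assumed cardinalities: Employee 1,
    -- Supplier > 1, Product group > 1, so the key is the other two FKs.
    CREATE TABLE supplier_support (
        emp_no      INTEGER NOT NULL REFERENCES employee(emp_no),
        supplier_no INTEGER NOT NULL REFERENCES supplier(supplier_no),
        group_no    INTEGER NOT NULL REFERENCES product_group(group_no),
        since       TEXT,  -- attribute of the relationship type
        PRIMARY KEY (supplier_no, group_no)
    );
    INSERT INTO employee      VALUES (1), (2);
    INSERT INTO supplier      VALUES (10);
    INSERT INTO product_group VALUES (100), (200);
    INSERT INTO supplier_support VALUES (1, 10, 100, '2023-01-01'),
                                        (2, 10, 200, '2024-03-01');
""")

# A second employee for the same supplier/product group pair violates the key.
try:
    con.execute("INSERT INTO supplier_support VALUES (2, 10, 100, NULL)")
    duplicate_ok = True
except sqlite3.IntegrityError:
    duplicate_ok = False
print(duplicate_ok)  # False
```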

Literature

Peter Pin-Shan Chen: The Entity-Relationship Model: Toward a Unified View of Data. In: ACM Transactions on Database Systems, Vol. 1, No. 1, March 1976, pp. 9-36, doi:10.1145/320434.320440.
Peter Pin-Shan Chen: Entity-Relationship Modeling: Historical Events, Future Trends, and Lessons Learned. In: M. Broy, E. Denert (Eds.): Software Pioneers: Contributions to Software Engineering. Springer-Verlag, Berlin 2002, ISBN 3-540-43081-4, pp. 296-310.
John Miles Smith, Diane C. P. Smith: Database Abstractions: Aggregation and Generalization. In: ACM Transactions on Database Systems, Vol. 2, No. 2, June 1977, pp. 105-133, doi:10.1145/320544.320546.
John Miles Smith, Diane C. P. Smith: Database Abstractions: Aggregation. In: Communications of the ACM, Vol. 20, No. 6, June 1977, pp. 405-413, doi:10.1145/359605.359620.
Ramez Elmasri, Shamkant B. Navathe: Fundamentals of Database Systems. 5th edition. Addison-Wesley, 2006, ISBN 0-321-36957-2.

Web links

Instructional videos on entity-relationship modeling, Big Data Analytics Group, Saarland University

Instructional video on entity relationship diagrams in business processes, research group wi-mobile, University of Augsburg
