Data modeling

In computer science, data modeling is the process of formally representing the objects relevant in a defined context by means of their attributes and relationships. The main goal is a clear definition and specification of the objects to be managed in an information system, the attributes required for information purposes, and the relationships between these information objects, so that an overview of the data view of the information system can be obtained (see Ferstl / Sinz 2006, p. 131).

The results are data models which, after passing through several modeling stages, ultimately lead to usable databases or data stores.

Data models usually have a much longer lifespan than functions and processes, and thus than software. As the saying goes: "Data are stable, functions are not." Data modeling is also carried out outside application development projects in order to represent certain facts. For example, the data or other circumstances of a specific company area, department, or business process (up to the entire company) can be recorded and documented together with their interrelationships. Such measures can also be used to define uniform terms.

Procedure

Data modeling, as an essential sub-discipline of software development, takes place across different project phases. The activities form a process, i.e. there are goals or purposes, activities, and results which, building on one another, lead via intermediate results to the final results. Following the ANSI-SPARC architecture, and tied to certain project milestones, essentially the following model variants are created:

From the technical draft to the database

Conceptual database schema: Based on the consideration of a section of the real world, the relevant objects with all relevant properties and the relevant relationships between them are collected, analyzed, and described graphically and textually. The basis for this are specifications or statements on the given task (the context), which can be clarified through discussion with the client if necessary.
Logical database schema: The conceptual database schema is mapped onto a logical database schema. The model is expanded to include technical data (e.g. field formats, identifying search terms). The logical database schema obeys the rules of a structure given by the DBMS to be used, e.g. the relational data model, in which all data is stored in tables.
Physical database schema: To implement the data model with a specific database management system (DBMS), all information must be formulated in the DBMS syntax for database generation. In some cases this can be done automatically or semi-automatically using generators.
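As a rough illustration of the three levels, the following Python sketch describes a hypothetical entity type Customer conceptually, maps it to a logical relational description, and generates physical DDL for SQLite. All names (Customer, customer_no, the dictionary layout, the to_ddl helper) are assumptions chosen for illustration and do not come from any particular method or tool.

```python
import sqlite3

# Conceptual level: entity type with business attributes (hypothetical example).
conceptual = {
    "entity": "Customer",
    "attributes": ["customer number", "name", "city"],
    "identifier": "customer number",
}

# Logical level: the same entity type as a relation with field formats and a key.
logical = {
    "table": "customer",
    "columns": [
        ("customer_no", "INTEGER"),
        ("name", "TEXT"),
        ("city", "TEXT"),
    ],
    "primary_key": "customer_no",
}


def to_ddl(schema):
    """Physical level: generate DDL in the syntax of the target DBMS (here SQLite)."""
    cols = []
    for name, sql_type in schema["columns"]:
        suffix = " PRIMARY KEY" if name == schema["primary_key"] else ""
        cols.append(f"{name} {sql_type}{suffix}")
    return f"CREATE TABLE {schema['table']} ({', '.join(cols)})"


ddl = to_ddl(logical)
conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # the generated statement is accepted by the DBMS
print(ddl)
```

The small generator stands in for the automatic or semi-automatic generation mentioned above; real tools cover far more (constraints, storage options, naming rules).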

These three model levels and the associated procedure outline only a basic approach. In detail, the procedure, the (interim) results, and the names of the models are determined by the process models in use, which are often company-specific, and by the modeling methodology and software employed. Examples:

When the later DBMS is used as the modeling tool, the boundaries between the models are fluid; the models gradually develop into the finished database.

For data stores that are not to be managed under a DBMS, only a "copybook" is created (as a kind of substitute for a database schema), with which the data structure definitions can be included in programs and thus used.

Data modeling generally covers only data that serves the subject-matter and content-related purpose of the systems, not data that belongs to the software in the narrower sense, e.g. configuration data or parameter data. The latter are a prerequisite for technical operation and are installed directly in suitable data storage formats.

Activities per data model level (examples):

To explain the procedure for data modeling, some activities that can be focal points at the respective level are listed below as examples. The examples are based on modeling with the entity-relationship method and the use of relational databases.

For the conceptual database schema:

Identifying the relevant information needs (attributes)

In doing so: identifying entity types and relationship types
Mapping the attributes to entity types
Determining possible attribute values and suggesting identifying attributes
Determining the relationship cardinality
Business-level description of the entity types, relationship types, and attributes
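The conceptual activities listed above can be sketched in a few lines of Python. The classes EntityType and RelationshipType, the sample attributes, and the relationship "places" are hypothetical and serve only to show the order of the steps; they are not a prescribed notation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EntityType:
    name: str
    attributes: list = field(default_factory=list)
    identifier: Optional[str] = None  # suggested identifying attribute


@dataclass
class RelationshipType:
    name: str
    source: EntityType
    target: EntityType
    cardinality: str  # e.g. "1:n" or "n:m"


# Step 1: collect the information needs as plain attributes with their owner.
needs = {
    "customer number": "Customer",
    "customer name": "Customer",
    "order number": "Order",
    "order date": "Order",
}

# Step 2: derive entity types and map the attributes to them.
entities = {}
for attribute, owner in needs.items():
    entities.setdefault(owner, EntityType(owner)).attributes.append(attribute)

# Step 3: suggest identifying attributes.
entities["Customer"].identifier = "customer number"
entities["Order"].identifier = "order number"

# Step 4: determine relationship types and their cardinality.
places = RelationshipType("places", entities["Customer"], entities["Order"], "1:n")

for e in entities.values():
    print(e.name, e.attributes, "id:", e.identifier)
print(places.source.name, places.name, places.target.name, places.cardinality)
```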

For the logical database schema:

Methodical review of the business-level modeling results (e.g. through normalization)

In doing so: forming new entity types, e.g. through specialization / generalization

Decision: with which data management system(s) (DBMS, others) should the data be managed?

Transferring the ER model into a relational model
Defining the identifying key
Specifications for the technical implementation of relationships: foreign keys, relationship tables
Specifying extended options for direct access (secondary keys)
Specifications for referential integrity
Extending the database model for history and version management, multi-tenancy, etc.
Supplementing the model with lookup tables, parameter tables, etc.
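A minimal sketch of how an ER model might be transferred into a relational model, using Python's sqlite3 module. The tables customer, product, customer_order, and order_item are assumed examples: a 1:n relationship becomes a foreign key, an n:m relationship becomes a relationship table, and referential integrity is enforced by the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customer (
    customer_no INTEGER PRIMARY KEY,      -- identifying key
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_no  INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
-- 1:n relationship "customer places order" as a foreign key
CREATE TABLE customer_order (
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER NOT NULL REFERENCES customer(customer_no)
);
-- n:m relationship "order contains product" as a relationship table
CREATE TABLE order_item (
    order_no    INTEGER NOT NULL REFERENCES customer_order(order_no),
    product_no  INTEGER NOT NULL REFERENCES product(product_no),
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_no, product_no)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Example Ltd')")
conn.execute("INSERT INTO customer_order VALUES (100, 1)")

# Referential integrity: an order for a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO customer_order VALUES (101, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan order rejected:", rejected)
```

Other DBMSs use different syntax and defaults for the same decisions, which is one reason the choice of data management system appears in the list above.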

For the physical database schema:

Setting optimization options for data access (e.g. through index definitions)
Formulating the scripts / commands for setting up and configuring the database (in the syntax of the DBMS)
Specifications for data backup
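To illustrate an index definition as an access optimization, the following sketch compares the query plan SQLite reports before and after creating an index. The table, the index name idx_customer_city, and the sample data are assumptions; the exact wording of the plan depends on the SQLite version, so no fixed output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [(f"Customer {i}", "Berlin" if i % 2 else "Munich") for i in range(1000)],
)


def plan(sql):
    """Return the detail column of each row SQLite reports for the query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT name FROM customer WHERE city = 'Berlin'"
before = plan(query)  # without an index the table has to be scanned
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
after = plan(query)   # with the index the plan can refer to idx_customer_city

print(before)
print(after)
```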

Methods

Among others, the following data modeling methods exist, some of which are combined with one another:

Bottom-up: Collecting individual attributes, recognizing potential keys, grouping into object types, forming relationships (special form: synthesis algorithm)
Top-down: Recognizing object types, forming relationships, recognizing elementary attributes
Generalization and specialization of object types in the sense of inheritance
Re-engineering of existing schemas
Setting up tables as a relational model and normalizing them
Analysis of existing lists, issues, evaluations, etc.
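One normalization step, starting from an existing flat list, might look as follows. The data and field names are hypothetical; the sketch moves an attribute that depends only on the customer number into its own relation and then checks that the original list can be reconstructed by a join.

```python
# Flat, unnormalized list as it might appear in an existing report (hypothetical data).
flat = [
    {"order_no": 100, "customer_no": 1, "customer_name": "Example Ltd", "product": "Bolt"},
    {"order_no": 101, "customer_no": 1, "customer_name": "Example Ltd", "product": "Nut"},
    {"order_no": 102, "customer_no": 2, "customer_name": "Sample AG", "product": "Bolt"},
]

# customer_name depends only on customer_no, so it moves to its own relation.
customers = {row["customer_no"]: row["customer_name"] for row in flat}

# The order relation keeps only the customer key as a foreign key.
orders = [
    {"order_no": r["order_no"], "customer_no": r["customer_no"], "product": r["product"]}
    for r in flat
]

# Joining both relations again yields the original rows, so no information is lost.
rejoined = [dict(o, customer_name=customers[o["customer_no"]]) for o in orders]

print(customers)
print(orders[0])
print(rejoined == flat)
```

After the split, a customer name is stored once per customer instead of once per order, which removes the redundancy that normalization targets.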

The results of data modeling are data models, for example in the form of an entity-relationship model (ERM), and ultimately operational databases. An ERM consists of an entity-relationship diagram (ERD), for example in UML or IDEF1X notation, and a textual description of the model and its components.

Design patterns: As in other design processes in computer science, design patterns, which are available for a number of subject areas, also play a major role in data modeling. These include historization, multilingual support, and multi-tenancy, but also model fragments such as addresses, organizational structures, and role and rights structures. Complete prefabricated data models, for example for the financial sector, can also serve as a design source. The most common patterns are listed by Fowler, Hay, and Silverston.
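As one possible reading of the historization pattern, the sketch below stores validity periods alongside the data so that past states remain queryable. The table customer_address_history, the columns valid_from and valid_to, and the helper city_on are assumptions for illustration; the cited pattern catalogs describe several variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_address_history (
    customer_no INTEGER NOT NULL,
    city        TEXT NOT NULL,
    valid_from  TEXT NOT NULL,   -- ISO date, inclusive
    valid_to    TEXT,            -- ISO date, exclusive; NULL means still valid
    PRIMARY KEY (customer_no, valid_from)
)""")
conn.executemany(
    "INSERT INTO customer_address_history VALUES (?, ?, ?, ?)",
    [
        (1, "Berlin", "2020-01-01", "2022-07-01"),
        (1, "Munich", "2022-07-01", None),
    ],
)


def city_on(customer_no, day):
    """Return the city that was valid for the customer on the given ISO date."""
    row = conn.execute(
        """SELECT city FROM customer_address_history
           WHERE customer_no = ? AND valid_from <= ?
             AND (valid_to IS NULL OR ? < valid_to)""",
        (customer_no, day, day),
    ).fetchone()
    return row[0] if row else None


print(city_on(1, "2021-05-15"))  # Berlin
print(city_on(1, "2023-01-01"))  # Munich
```

ISO date strings compare correctly as text, which keeps the sketch free of date parsing; a production schema would typically use the date type of the chosen DBMS.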

Metamodeling: An important area for the use of design patterns is metamodeling. Moriarty calls this kind of modeling dynamic modeling. In a metamodel, in contrast to a concrete data model, the data content also forms a relevant part of the data model.
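The idea that data content becomes part of the model can be sketched as follows. Here the entity types and their attributes are themselves stored as data, and instances are checked against that content. The structure of metamodel, the entity types Term and Project, and the function add_instance are hypothetical and only one of many ways to realize a metamodel.

```python
# Metamodel: entity types and their attributes are stored as data, not as fixed tables.
metamodel = {
    "Term": {"label": str, "language": str},
    "Project": {"title": str, "budget": float},
}

instances = []


def add_instance(entity_type, **values):
    """Store an instance after checking it against the metamodel content."""
    spec = metamodel[entity_type]
    unknown = set(values) - set(spec)
    if unknown:
        raise ValueError(f"unknown attributes for {entity_type}: {sorted(unknown)}")
    for name, value in values.items():
        if not isinstance(value, spec[name]):
            raise TypeError(f"{name} must be {spec[name].__name__}")
    instances.append((entity_type, values))


add_instance("Term", label="data model", language="en")

# Extending the model is a data change, not a schema change.
metamodel["Term"]["definition"] = str
add_instance("Term", label="schema", language="en", definition="structure of a database")

print(len(instances))  # 2
```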

Different terms for similar concepts: In the practical use of data modeling, uniform terms are not always used. In part this is justified by the method in use, in part it has grown historically in the respective organizations (and is not always methodologically correct), and in part terms from different modeling levels are mixed up. Examples are:

For model graphics: ER diagram, class diagram, data model, information structure, information map
For entities: entity, object, information object, class, table, row
For relationships: relation, foreign key
For attribute values: property, field, data field, attribute, column

As can be seen from this, in some cases the instance terms (entity, relationship) are used instead of the type terms (entity type, ...), or terms from the database implementation (table, ...) are already used. Different terms are also used when participants from different companies or from different departments (business department, programming) communicate. In the interest of efficient communication and to avoid misunderstandings, correct and uniform terms should be used.

Support through software tools

As with all software development processes, data modeling is carried out using certain tools. In project practice, very different approaches can be observed in this regard, which are outlined in the following examples:

Only standard software for graphics (for ERDs) and for word processing (for the description of components) is used. In practice, only free text is recorded, possibly supported by sample forms; quality assurance can hardly be automated; there is no focus on the specific task; not recommended.
Simple special-purpose applications in which graphic symbols and descriptions are linked. Example: double-clicking on an entity opens its description; designations are identical in graphics and texts; other terms can be referenced via a link.
The application has a metamodel that defines which information details can or must be recorded. The tool checks the possible entries and certain relationships. Example: only existing attributes can be assigned to an entity.
Data dictionary: The developed components are managed as data objects and can be used in several projects. Each project references only excerpts; extensions, changes, and deletions are possible on a project-specific basis, etc.
Further capabilities of data modeling tools can include, for example: version concept, documentation and evaluation functions, multi-user and multi-project capability, multi-tenancy, and an authorization and security concept.

The following examples can be considered particularly highly integrated:

Universal specification tool: The model contents modeled for the data are also used (referenced) by the tools with which functional specifications are created. The use of data constructs can be looked up via a where-used list (field XYZ appears in formula ABC, evaluation CDE, ...).
"Active data dictionary": The model contents are used not only in the project but also in the finished application, e.g. for displaying field names or carrying out plausibility checks.

The degree of integration of the tools can therefore vary widely. It largely determines the quality of the modeling processes, especially their efficiency.
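The two highly integrated variants, the where-used list and the active data dictionary, can be sketched together. The dictionary entries, the specification names, and the check function are hypothetical; the sketch only shows that the same definitions can serve both documentation and the running application.

```python
from collections import defaultdict

# Data dictionary: field definitions managed centrally (hypothetical entries).
dictionary = {
    "net_amount": {"label": "Net amount", "min": 0},
    "tax_rate": {"label": "Tax rate", "min": 0},
}

# Functional specifications reference dictionary fields by name.
specifications = {
    "formula gross_amount": ["net_amount", "tax_rate"],
    "report monthly_sales": ["net_amount"],
}

# Where-used list: invert the references from specification to field.
where_used = defaultdict(list)
for spec_name, fields in specifications.items():
    for field_name in fields:
        where_used[field_name].append(spec_name)


def check(field_name, value):
    """Active use at run time: plausibility check driven by the dictionary."""
    return value >= dictionary[field_name]["min"]


print(where_used["net_amount"])  # ['formula gross_amount', 'report monthly_sales']
print(dictionary["tax_rate"]["label"], check("tax_rate", -1))  # Tax rate False
```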

Examples

Figure 1: Semantic model (ERD) of order management

Figure 2: Database schema for the same application

Examples of data models are:

Product, customer, order, and invoice as "object types" (entity types) in an order processing system to be created or procured by a medium-sized trading company, seen from the point of view of sales. The model of this section of reality can be used to specify the functional requirements for the system.

The metamodel of the thesaurus used in a research area, i.e. the specific terminology with its synonyms, narrower and broader terms, and related terms, as a reference work for researchers working in this area. A topic map, for example, can be used to represent the resulting data model. The metamodel for this thesaurus can be used to create a database (possibly including an IT application) to record the terms mentioned.

The semantic data model for a project management application for order management - as shown in Figure 1.

The database schema as a graphic from the implementation tool MS Access for the same project management application (Figure 2), with implementation-related extensions to or deviations from the semantic model. A separate database model as an intermediate stage was not created here.

It becomes clear that the relevance of the segment of reality is determined by the respective context and the specific purpose.

Literature

Otto K. Ferstl, Elmar J. Sinz: Basics of business informatics. 5th edition. Oldenbourg, Munich 2006, ISBN 3-486-57942-8.
Andreas Gadatsch: Data modeling for beginners. Introduction to entity relationship modeling and the relationship model. Springer Vieweg, Wiesbaden 2017, ISBN 978-3-658-19068-2.
Graeme C. Simsion: Data Modeling Essentials. Morgan Kaufmann, Scottsdale 2005, ISBN 0-12-644551-6.

References

Martin Fowler: Analysis Patterns, ISBN 0-201-89542-0
David Hay: Data Model Patterns, ISBN 0-932633-29-3
Len Silverston: The Data Model Resource Book, ISBN 0-471-38023-7
