Data dictionary

from Wikipedia, the free encyclopedia

A data dictionary - German data dictionary , data catalog or slightly less sharp data directory called - is a catalog of metadata that contains the definitions and presentation rules for all application data of a company and the relationships between the different data objects, so that the database is free of redundancy and uniform structure. It is a use case of a specific data model.

In a relational database , a data dictionary is a set of tables and views that are only read (read-only) during queries (e.g. using the SQL language ). The data dictionary is structured like a database, but does not contain application data but metadata, i.e. data that describes the structure of the application data (and not the content itself). Such a data catalog is usually set up and maintained via an interactive dialog or with the help of a data definition language ( DDL ).

Active, passive and integrated data dictionaries

An active data dictionary reflects the current, detailed status of the data model at all times. Changes to the structure of a database can be made directly in the maintenance interface of the data dictionary, or by other means, for example a command interpreter of a DDL . Regardless of how these changes are made, the currency of an active data dictionary is always guaranteed automatically.

This synchronicity does not exist in a passive data dictionary . Changes to the structure of the database management system (DBMS) must be manually updated in the data dictionary (DD), if this is desired and economically feasible. DD products for modeling and documentation of the conceptual data model in particular suffer from this problem.

Interfaces between humans and software

A data dictionary can be queried via the following interfaces:

  • User interface
    • Data designer, modeler
    • Application programmer
    • End user
    • Database administrator
  • Software and DBMS interface
    • Compiler / precompiler
    • Integrated development environment (IDE) with integration of the data dictionary in terms of CASE
    • Application programs
    • Report / form generators
    • Query Optimizer
    • Integrity Constraint Enforcer

Classification of data dictionaries according to the modeling level

A distinction is made between different modeling levels in the development and maintenance of data models:

  • Conceptual level (usually related to an area of ​​application, in business informatics often also company-wide or even cross-company)
  • Logical level
  • Physical level in which the conceptual / logical data model is mapped and implemented in relation to a specific DBMS.

According to the different levels of data modeling , the data dictionaries can be differentiated according to the support of these model levels. Depending on the level, the data dictionaries differ in terms of type, content and data types of the necessary metadata, but also with regard to their functions and evaluation options.

Data dictionary for conceptual / logical data modeling

A data dictionary for conceptual / logical data modeling includes:

In addition to defining the essential data objects or elements and their relationships, detailed descriptive texts are typically stored at the level of the respective entities, which are linked to one another using hyperlink technology. When an organization sets up a company-wide data model (UwDM), information on the application-related semantics, data type and data representation is collated for each data element. The semantic information defines the exact meaning of a data element and is formulated as running text. The presentation rules determine how data elements are saved (e.g. data type such as integer, text, maximum text length, input format, output formats, permissible value ranges as checking rule, static or dynamic quantity, etc.). This first form is often not included as a standard function in the functional scope of a DBMS. This is why isolated solutions often have to be used here. In relation to the DBMS, however, these represent a passive data dictionary . Changes to the conceptual data model cannot be automatically transferred to the physical data model of the DBMS.

Specifications have been drawn up under ISO / IEC 10027 which are intended to allow a cross-vendor and cross-platform exchange of information resources between different data dictionaries.

A data catalog can also be used as a glossary in which information objects / entities, data elements / attributes and also relationships / relationships are viewed as terms, the definitions of which are stored in the respective description section. The data dictionary can be further developed into complete ontologies or classes or business process models. If in addition to the data structure, the methods for data transformation are also described, one speaks of a repository .

Data dictionary for physical data modeling

A data dictionary for physical data modeling includes precise information on:

  • Tables and data fields
  • Primary and Foreign Key Relationships
  • Constraints , e.g. B. Test information
  • Stored Procedures and Triggers
  • Access legitimations (user names, roles)
  • physical database structure, e.g. B. Memory allocations and indexes
  • Reference and proof of use

This form is available in every DBMS as an active data dictionary, but is not always visible to the application programmer. Where such a data dictionary is not visible, it still forms the database structure as a database schema , but is stored in hidden system tables. Each time the database is accessed, the DBMS system software reads the database schema in order to be able to identify the structure and storage location of the queried data.

Function of the data dictionary in application development

In any case, it makes sense to integrate the metadata from the data dictionary into the integrated development environment (IDE). For the dynamic or generic programming of forms and reports, however, a data dictionary that is well structured and visible for the needs of application programming is a necessary prerequisite.

The functions for conceptual and physical data modeling are often not integrated in a data dictionary. More seriously, changes to the detailed database architecture are not reflected in the conceptual data model. Either time-consuming manual follow-up maintenance is necessary, or the currentness of the documented data model is lost.

A data dictionary that is integrated into the database, the program development environment and the data modeling instruments fulfills a variety of functions:

  • It describes all persistent data of an application area, e.g. B. in the form of a company-wide data model
  • Screen masks can be generated automatically on the basis of the data dictionary data (see generative programming ).
  • The structure of a database table can be read out by a program.
  • Programs can read data types and structures from tables that did not even exist at the time the program was designed. With suitable language support, a statically fixed data definition in the program text becomes obsolete. The data dictionary is thus a central instrument in application development when it comes to decoupling data definition and modeling from program development.

Cross-application data consistency

One of the advantages of a well-defined data dictionary is the consistency of the defined data elements across different tables in a database. For example, different tables can contain the data element TelefonNr ; a data dictionary can ensure that all tables refer to the same data element. In this way, a database-wide consistency and a record of usage for all table fields and data elements can be achieved.

BNF syntax for writing data dictionaries (DD)

- Context: data flow diagram (DFD), ERM, system modeling

A DD can be written in its rough structure in the BNF notation and then consists of several definitions, each in one line. In a definition , the following is written in the form of an assignment:

<Non-atomic term> = <expression>

To the left of the assignment is a non-atomic term. There is a rule to the right of the assignment. A rule consists of a combination of atomic and non-atomic terms. Repetitions are possible. Circular definitions are prohibited, but recursions are allowed.

= Assignment
() Optional
[| ] Selection
{} Repetition

Example:

Customer card = salutation + (title) + first name + surname + {address} + telephone
Address = street + house number + zip code + city
Telephone = area code + {[0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9]}
Area code = 0 + [1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9] + {[0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9]}
Street = string
House number = string
First name = string
Name = string

See also: Backus-Naur-Form (BNF or EBNF ), syntax diagrams

A presentation in this form only allows the meta-terms and their relationships to be shown. In application development, the individual meta-term is usually the carrier of a wide range of additional information such as

  • technical description in everyday language
  • Domain values
  • Field descriptions (short, medium and long text) for the form design
  • Information as to whether the data element may be empty (e.g. NULL)
  • References to check tables

Therefore, the BNF notation can be used to list the entities and show their relationships to one another. For detailed attribution and more detailed documentation, the BNF notation is usefully supported by an actual DD instrument.

See also

literature

  • Ramez Elmasri, Shamkant B. Navathe: Basics of database systems ; Pearson Studium, Munich 2004. ISBN 3-8273-7021-3
  • Thomas Connolly, Carolyn Begg, Anne Strachan: Database systems. A practical guide to design, implementation and management ; Addison-Wesley, Munich 2002. ISBN 3-8273-2013-5

Web links

Individual evidence

  1. Elmasri, p. 625