Schema (computer science)
A schema, data schema, or relation schema (plural schemas or schemata) is a formal description of the structure of data in computer science. Schemas have a special meaning in connection with databases. A database schema comprises all objects that a privileged database user (the schema owner) has created. These objects include tables, views, synonyms, sequences, and so on, and they can be used by other database users provided that the schema owner has granted the corresponding privileges.
Usually the schema itself is defined in a formal language, so that data can be checked automatically for conformance to the schema. A well-known example of such a description language is XML Schema for XML.
Aspects of schemas
In terms of complexity, schemas range from simple attribute lists to complex ontologies. In principle, a schema defines relations as tuples of attributes, and in many cases each attribute can be assigned a data type. Depending on the type of schema, additional relationships and conditions between different relations, as well as further rules, are possible. Data types (for example numbers, strings, date formats) are themselves described by rules as part of a schema, although these rules are usually assumed to be given. In object-oriented modeling, complex data types are composed of simple data types, and one speaks of objects rather than data types.
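The idea of a relation as a tuple of typed attributes can be sketched in Python. This is only an illustration: the `Person` class, its attribute names, and the `conforms` helper are hypothetical and do not come from any particular schema language.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class Person:
    """Hypothetical relation schema: a fixed tuple of typed attributes."""
    first_name: str
    last_name: str
    birth_date: date
    height_cm: int


def conforms(record: dict, schema: type) -> bool:
    """Return True if the record has exactly the schema's attributes,
    each holding a value of the declared data type."""
    expected = {f.name: f.type for f in fields(schema)}
    # The attribute names must match the schema exactly (no extras, none missing).
    if record.keys() != expected.keys():
        return False
    return all(isinstance(record[name], typ) for name, typ in expected.items())
```

A record with a missing attribute, or with a string where an integer is declared, is rejected by `conforms`. Real schema languages add further rules on top of this kind of name and type check.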
Schemas in databases
Schemas play an important role in connection with databases, where the term database schema is commonly used. The schema defines which data can be stored in a database, in which form, and which relationships exist between the data. In relational databases in particular, a schema is an SQL object in which the tables and their attributes, as well as the integrity constraints that ensure consistency, are specified. This includes, in particular, the definition of value ranges for individual attributes and of foreign key relationships, as well as existence and uniqueness conditions. Database systems store the schemas of the databases they manage in a special area, the data dictionary.
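The integrity conditions named above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The tables `city` and `person`, their columns, and the helper `rejected` are assumptions made for this example. SQLite is used only because it ships with Python; other database systems differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE city (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE                          -- existence and uniqueness
);
CREATE TABLE person (
    id        INTEGER PRIMARY KEY,
    last_name TEXT NOT NULL,                           -- existence condition
    age       INTEGER CHECK (age BETWEEN 0 AND 150),   -- value range
    city_id   INTEGER NOT NULL REFERENCES city(id)     -- foreign key relationship
);
""")
conn.execute("INSERT INTO city (id, name) VALUES (1, 'Berlin')")
conn.execute(
    "INSERT INTO person (last_name, age, city_id) VALUES ('Mustermann', 42, 1)"
)


def rejected(sql: str) -> bool:
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# SQLite keeps its schema in the sqlite_master table, which plays the role
# of a (very small) data dictionary here.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

With this schema, `rejected` returns True for an age of 200, for a `city_id` that does not exist in `city`, and for a second city named 'Berlin'. In each case the database itself enforces the rule, not the application.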
The ANSI-SPARC architecture, also known as the three-level architecture, describes the basic structure of a database system using three kinds of schema:
- The external schemas, which formally describe how the database presents itself to individual users, user groups, and applications (application-oriented view).
- The conceptual schema, in which the logic of the subject domain is formally described on the basis of the semantic data model (domain view).
- The internal schema, which formally represents how and where the data is stored in the database (technical view).
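The three levels can be approximated in a small SQLite database. This is a loose sketch rather than a faithful ANSI-SPARC implementation: a table stands in for the conceptual schema, an index for a fragment of the internal schema, and a view for one external schema. All table, column, index, and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: the logical description of the data.
CREATE TABLE person (
    id         INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    salary     INTEGER
);
-- Internal level (approximation): an access path that affects storage and
-- lookup, but not what the data means.
CREATE INDEX person_by_last_name ON person (last_name);
-- External level: what one user group gets to see; salary is hidden.
CREATE VIEW person_directory AS
    SELECT first_name, last_name FROM person;
INSERT INTO person (first_name, last_name, salary)
    VALUES ('Erika', 'Mustermann', 50000);
""")

cursor = conn.execute("SELECT * FROM person_directory")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

Querying the view yields only the `first_name` and `last_name` columns. The index could be dropped or replaced without any change to the table definition or the view, which is the kind of independence between levels that the architecture aims for.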
Examples
Schema of a CSV file
VORNAME; NACHNAME; STRASSE; ORT
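A header line like this can serve as a simple schema when the file is read. The following sketch uses Python's standard `csv` module. The function name `read_persons` and the sample data used in the usage note are made up for illustration, and it is assumed that fields are separated by a semicolon followed by optional spaces, as in the header above.

```python
import csv
import io

# The attribute list from the header line acts as the schema.
EXPECTED_HEADER = ["VORNAME", "NACHNAME", "STRASSE", "ORT"]


def read_persons(text: str) -> list[dict]:
    """Parse semicolon-separated text and reject input that does not match
    the expected attribute list."""
    reader = csv.reader(io.StringIO(text), delimiter=";", skipinitialspace=True)
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        # Every data line must supply exactly one value per attribute.
        if len(row) != len(EXPECTED_HEADER):
            raise ValueError(
                f"line {line_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}"
            )
        rows.append(dict(zip(header, row)))
    return rows
```

For example, `read_persons("VORNAME; NACHNAME; STRASSE; ORT\nErika; Mustermann; Heidestrasse 17; Koeln\n")` returns one dictionary keyed by the four attribute names. Such a schema fixes only the names and order of the attributes, not their data types.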
Schema for XML data in the form of a document type definition (DTD)
<!ELEMENT PERSON (VORNAME, NACHNAME, ADRESSE+)>
<!ELEMENT VORNAME (#PCDATA)>
<!ELEMENT NACHNAME (#PCDATA)>
<!ELEMENT ADRESSE (STRASSE, ORT)>
<!ELEMENT STRASSE (#PCDATA)>
<!ELEMENT ORT (#PCDATA)>
This DTD (not to be confused with XML Schema) specifies that a PERSON element consists of exactly one first name (VORNAME), exactly one last name (NACHNAME), and at least one address (ADRESSE). An address consists of a street (STRASSE) and a city (ORT). First name, last name, street, and city consist of #PCDATA, i.e. parsed character data: simple text that is not broken down further.
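To make the content model concrete, the following sketch checks a document against the same rules by hand, using Python's standard `xml.etree.ElementTree`. It is not a DTD validator: the standard library parser does not validate against DTDs, so the rules are re-implemented in the hypothetical function `valid_person`, and details such as stray text between child elements are ignored.

```python
import xml.etree.ElementTree as ET


def valid_person(xml_text: str) -> bool:
    """Hand-written check mirroring the DTD above (not a general DTD validator)."""
    person = ET.fromstring(xml_text)
    if person.tag != "PERSON":
        return False
    children = list(person)
    tags = [child.tag for child in children]
    # Content model (VORNAME, NACHNAME, ADRESSE+): fixed order, one or more addresses.
    if len(tags) < 3 or tags[:2] != ["VORNAME", "NACHNAME"]:
        return False
    if any(tag != "ADRESSE" for tag in tags[2:]):
        return False
    # VORNAME and NACHNAME are #PCDATA, so they must not contain child elements.
    if any(len(child) != 0 for child in children[:2]):
        return False
    for adresse in children[2:]:
        # Content model (STRASSE, ORT): exactly these two elements, in this order.
        if [child.tag for child in adresse] != ["STRASSE", "ORT"]:
            return False
        if any(len(child) != 0 for child in adresse):
            return False
    return True
```

A PERSON element without any ADRESSE child, or with NACHNAME before VORNAME, is rejected, while a second or third ADRESSE is accepted because of the `+` in the content model.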
Schemas for data in the Semantic Web
An example of an ontology can be found under Web Ontology Language.
Design of schemas
The design of schemas (data modeling) depends heavily on the approach taken. A basic distinction can be made between the entity-relationship model and object-oriented modeling (see data modeling).
Schema heterogeneity
To convert or merge data that are based on different schemas, the schemas themselves must be transformed and integrated. In practice, this is needed above all for data migration and information integration.
The heterogeneity can affect both structure and semantics; structural differences are much easier to bridge. However, the boundary between structural and semantic differences is not always clear.
Typical structural differences concern the order of attributes; name conflicts, that is, different names for the same attribute (synonyms) or the same name for different attributes (homonyms); flat structures (SQL) as opposed to hierarchical structures (XML); the degree of normalization; and different data formats with the same expressiveness.
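Two of these structural differences, synonyms and flat versus hierarchical structure, can be bridged mechanically. The sketch below maps a flat record with the attribute names of the CSV example onto a nested record. The target attribute names, the `RENAMES` table, and the function `to_target` are assumptions chosen for illustration.

```python
# Hypothetical synonym table: source attribute name -> target attribute name.
RENAMES = {
    "VORNAME": "first_name",
    "NACHNAME": "last_name",
    "STRASSE": "street",
    "ORT": "city",
}
# Target attributes that belong inside the nested address structure.
ADDRESS_FIELDS = ("street", "city")


def to_target(flat: dict) -> dict:
    """Map a flat source record to the hierarchical target structure.
    An attribute without an entry in RENAMES raises KeyError on purpose."""
    renamed = {RENAMES[name]: value for name, value in flat.items()}
    # Move the address attributes from the flat record into a nested record.
    address = {name: renamed.pop(name) for name in ADDRESS_FIELDS}
    return {**renamed, "addresses": [address]}
```

Because the mapping works on attribute names rather than positions, the order of attributes in the source record does not matter. Homonyms cannot be resolved this way, since the same name would need different targets depending on its meaning.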
Semantic heterogeneity exists when the individual concepts of the different schemas do not coincide. Instead, one concept includes or overlaps another, and this has to be accepted to a certain extent.
Data types that differ from one another in detail (units of measurement, precision, etc.) form a kind of heterogeneity that lies between the structural and the semantic.