Data type
Formally, a data type (from the English datatype) in computer science is the combination of sets of objects with operations defined on them. By means of a so-called signature, the data type specifies only the names of these object and operation sets. A data type specified in this way does not yet have any semantics.
The much more frequently used, but more specific, meaning of the term data type comes from the field of programming languages and describes the combination of concrete value ranges and the operations defined on them into a unit. Examples are integers or decimal numbers, character strings, or more complex types such as date/time or objects. In the literature, the term "concrete data type" is also used to distinguish these data types from the formal notion. For a discussion of how programming languages deal with data types, see Typing.
The conceptual transition from the formal definition to the concrete data types used in programming languages takes place by successively adding semantics to the formally specified names of the object and operation sets. Specifying the operation set leads to abstract data types or algebraic structures. The concrete data type results from additionally specifying the object set.
Formal definition of a data type through a signature
A signature is a pair (sorts, operations), where sorts are names for object sets and operations are names for operations on these sets. The following example shows this for a simplified version of the well-known (concrete) data type Integer, which is called Simple Integer here:
Simple Integer | | |
---|---|---|---
sorts | int | |
operations | zero: | | -> int
| + : | int x int | -> int
| - : | int x int | -> int
End Simple Integer | | |
This is a signature for an assumed data type Simple Integer, on which only two operations, + and -, are allowed (besides the "producer operation" zero). We name the only sort int. The operation zero is used to create an int element. The operations + and - each take two arguments and each return an element of the sort int. It is important that this is a purely syntactic specification. What an int is, is not defined anywhere. For this purpose, the name of the sort would have to be assigned to a set. In this case, a meaningful assignment would be the set of natural numbers. Nothing more is said about the way the operations work than their arities and their result sorts. Whether the + symbol corresponds to the sum operation is not specified here; this would also be completely impossible, since it is not even known whether the operation works on the natural numbers. Such assignments fall into the field of semantics. A specification extended by semantics could therefore look like this:
Simple Integer | | |
---|---|---|---
/* pure syntax */ | | |
sorts | int | |
operations | zero: | | -> int
| + : | int x int | -> int
| - : | int x int | -> int
/* assignment of semantics */ | | |
sets | int = ℕ | |
functions | zero = 0 | |
| + : | int x int | corresponds to the sum of two numbers from ℕ
| - : | int x int | corresponds to the arithmetic difference between two numbers from ℕ
End Simple Integer | | |
However, this already goes beyond the scope of a signature. Such a specification would instead be referred to as an algebra. In this way, however, the specification comes closer to the programming-language understanding of the term data type, to which most of the rest of this article is devoted.
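The difference between a signature and an algebra can be sketched in Python. This is only an illustration under stated assumptions: the names `SIGNATURE`, `ALGEBRA` and `evaluate` are hypothetical, natural numbers are approximated by Python integers, and the sketch does not deal with the fact that the difference of two natural numbers may leave ℕ.

```python
# Signature: purely syntactic. Each operation name maps to
# (argument sorts, result sort); nothing says what "int" or "+" mean.
SIGNATURE = {
    "zero": ((), "int"),
    "+": (("int", "int"), "int"),
    "-": (("int", "int"), "int"),
}

# Algebra: assigns a function to every operation name
# (the sort "int" is interpreted as Python integers here, an assumption).
ALGEBRA = {
    "zero": lambda: 0,
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
}


def evaluate(term):
    """Evaluate a term given as a nested tuple (operation, arg1, arg2, ...)."""
    op, *args = term
    arg_sorts, _result_sort = SIGNATURE[op]
    # The signature alone is enough to check that the term is well-formed.
    if len(args) != len(arg_sorts):
        raise TypeError(f"{op!r} expects {len(arg_sorts)} argument(s)")
    # Only the algebra gives the term a value.
    return ALGEBRA[op](*(evaluate(a) for a in args))


# zero + (zero - zero) is well-formed under the signature.
result = evaluate(("+", ("zero",), ("-", ("zero",), ("zero",))))
```

Replacing `ALGEBRA` with different functions would give the same signature a different meaning, which is the point of separating syntax from semantics.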
Data types in programming languages
Many programming languages offer their own set of predefined data types whose value ranges follow the same principle in each case, such as whole numbers, floating point numbers or character strings. The actual names of these data types and the precise definitions of the value ranges and the associated operations, however, sometimes differ greatly, since they depend on the programming language used, the platform used and other compiler-dependent factors.
In programming, data types are used to assign concrete semantics to storage areas. These memory areas are called variables or constants. The data types enable a compiler or runtime environment to check the type compatibility of the operations specified by the programmer. Inadmissible operations are sometimes already recognized during compilation, so that, for example, the division of the character string 'HANS' by the number 5, which is meaningless and undefined in common programming languages, is prevented.
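How a runtime environment rejects such an operation can be illustrated in Python, which checks types at run time rather than during compilation. The helper name `try_divide` is hypothetical and serves only this sketch.

```python
def try_divide(left, right):
    """Return left / right, or a message if the operand types do not allow division."""
    try:
        return left / right
    except TypeError as error:
        # The runtime environment detected incompatible operand types.
        return f"rejected: {error}"


ok = try_divide(10, 5)        # both operands are numbers
bad = try_divide("HANS", 5)   # a character string cannot be divided by a number
```

In a statically typed language, the second call would typically already be refused by the compiler.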
A distinction is made between elementary and composite data types. Another classification term is the ordinal data type .
Ordinal data types
Ordinal data types are characterized by the fact that a fixed order relation is defined on them, which assigns a unique ordinal number to each of their values. This defines the order of the values. As a result:
- every value except the first has exactly one direct predecessor and
- every value except the last one has exactly one direct successor.
Whether an elementary data type is also an ordinal data type depends on the definition in the specific programming language. Examples:
- The enumeration type is an ordinal data type in Pascal, since the values are ordered from left to right; successors and predecessors can be determined using standard functions. This is not the case in C.
- Boolean is a special enumeration type with the two values "false" (ordinal value 0) and "true" (ordinal value 1).
- Whole numbers and natural numbers are inherently ordinal data types.
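The predecessor and successor properties listed above can be sketched in Python with an `IntEnum`. The type `Color` and the functions `succ` and `pred` are assumptions made for this illustration; they mimic, but are not, the Pascal standard functions.

```python
from enum import IntEnum


class Color(IntEnum):
    """Hypothetical enumeration; the ordinal number is the member's value."""
    BLACK = 0
    RED = 1
    BLUE = 2
    YELLOW = 3


def succ(value):
    """Return the direct successor; the last value has none."""
    members = list(type(value))          # members in definition order
    index = members.index(value)
    if index == len(members) - 1:
        raise ValueError(f"{value.name} has no successor")
    return members[index + 1]


def pred(value):
    """Return the direct predecessor; the first value has none."""
    members = list(type(value))
    index = members.index(value)
    if index == 0:
        raise ValueError(f"{value.name} has no predecessor")
    return members[index - 1]
```

For example, `succ(Color.BLACK)` yields `Color.RED`, while `succ(Color.YELLOW)` raises an error because the last value has no successor.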
Elementary data types
Elementary data types, also called simple data types or primitive data types, can hold only one value of the corresponding value range at a time. They have a fixed number of values (discreteness) as well as a fixed upper and lower limit (finiteness). Therefore, real numbers can only be represented as floating point numbers with a limited degree of accuracy. For elementary data types, basic operations are defined in a programming language; for numbers, these are the basic arithmetic operations. Depending on the programming language and value range, the data types have different names and are written in upper or lower case (all upper case here for easier overview).
Whole numbers
- Designation: BIGINT, BIN, BIN FIXED, BINARY, BYTE, COMP, INT, INTEGER, LONG, LONG INT, LONGINT, MEDIUMINT, SHORT, SHORTINT, SMALLINT
- Value range: Mostly 32 bits (−2^31 … 2^31 − 1), 8 bits, 16 bits, 64 bits
- Operations: +, -, *, <, >, =, integer division, modulo, bitwise operators
Natural numbers
- Designation: BYTE, CARDINAL, DWORD, NATURAL, UINT, UNSIGNED, UNSIGNED CHAR, UNSIGNED INT, UNSIGNED LONG, UNSIGNED SHORT, WORD
- Value range: Mostly 32 bits (0 … 2^32 − 1), 8 bits, 16 bits, 64 bits
- Operations: +, -, *, <, >, =, integer division, modulo, bitwise operators
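The value ranges quoted for whole and natural numbers follow directly from the number of bits. A minimal Python sketch, assuming two's-complement representation for the signed case (the function names are hypothetical):

```python
def signed_range(bits):
    """Value range of a two's-complement integer with the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits):
    """Value range of an unsigned (natural number) type with the given width."""
    return 0, 2 ** bits - 1


# Ranges for the widths mentioned in the lists above.
ranges = {bits: (signed_range(bits), unsigned_range(bits)) for bits in (8, 16, 32, 64)}
```

Python's own `int` has no fixed width, so the sketch only computes the limits that fixed-width types in other languages would have.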
Fixed-point numbers (decimal numbers)
- Designation: COMP-3, CURRENCY, PACKED DECIMAL, DEC, DECIMAL, MONEY, NUMERIC
- Range of values: Directly dependent on the maximum number of digits, which usually has to be specified; CURRENCY (64 bit): −922337203685477.5808 … 922337203685477.5807
- Operations: +, -, *, <, >, =, integer division, modulo
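The CURRENCY range quoted above matches a 64-bit signed integer interpreted with four decimal places. The following Python sketch works under that assumption; `SCALE` and `currency_from_raw` are hypothetical names introduced for illustration.

```python
from decimal import Decimal

SCALE = 4  # assumed number of decimal places of the 64-bit CURRENCY type


def currency_from_raw(raw):
    """Interpret a signed integer as a fixed-point value with SCALE decimals."""
    return Decimal(raw).scaleb(-SCALE)


lowest = currency_from_raw(-(2 ** 63))
highest = currency_from_raw(2 ** 63 - 1)

# Fixed-point addition is exact integer addition on the raw values:
# raw 19900 stands for 1.9900, raw 100 stands for 0.0100.
total = currency_from_raw(19_900 + 100)
```

Because all arithmetic happens on integers, sums of such amounts carry no rounding error, in contrast to floating point numbers.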
Enumeration types
- Designation: ENUM, SET or implicit
- Range of values: Freely selectable, for example (BLACK, RED, BLUE, YELLOW)
- Operations: <, >, =
Boolean (logical values)
- Designation: BOOL, BOOLEAN, LOGICAL, or (implicitly without an identifier)
- Value range: (TRUE, FALSE) or (≠ 0, = 0) or (= −1, = 0)
- Operations: NOT, AND, XOR, NOR, NAND, OR, =, ≠
Character (single character)
- Designation: CHAR , CHARACTER
- Range of values: All elements of the character set (for example letters)
- Operations: <, >, =, conversion to INTEGER, ...
Floating point numbers
- Designation: DOUBLE, DOUBLE PRECISION, EXTENDED, FLOAT, HALF, LONGREAL, REAL, SINGLE, SHORTREAL
- Range of values: Various definitions (see below)
- Operations: +, -, *, /, <, >, =
Designation | Number of bits n | Range of values from … | … to | Significant digits
---|---|---|---|---
HALF | 16 | 3.1 · 10^−5 | 6.6 · 10^4 | 4
SINGLE, REAL | 32 | 1.5 · 10^−45 | 3.4 · 10^38 | 7-8
REAL | 48 | 2.9 · 10^−39 | 1.7 · 10^38 | 11-12
DOUBLE, REAL | 64 | 5.0 · 10^−324 | 1.7 · 10^308 | 15-16
REAL | 64 | 1.1 · 10^−306 | 1.8 · 10^308 | 15-16
EXTENDED | 80 | 1.9 · 10^−4951 | 1.1 · 10^4932 | 19-20
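The limits of the 64-bit and 32-bit formats can be observed in Python. The sketch assumes that Python's `float` is an IEEE 754 64-bit value, which is the case on common platforms but is not guaranteed by the language itself.

```python
import struct
import sys

# Limits of the 64-bit format (DOUBLE in the table).
largest = sys.float_info.max          # roughly 1.8e308
digits = sys.float_info.dig           # decimal digits that survive a round trip
smallest = 5e-324                     # smallest positive (subnormal) value

# Halving the smallest positive value underflows to zero.
underflow = smallest / 2

# Real numbers are only approximated: 0.1 has no exact binary representation.
inexact_sum = 0.1 + 0.2

# Storing 0.1 in 32 bits (SINGLE) and reading it back loses precision.
as_single = struct.unpack("<f", struct.pack("<f", 0.1))[0]
```

Comparing `inexact_sum` with `0.3`, or `as_single` with `0.1`, shows that equality tests on floating point numbers have to be used with care.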
Bit sets
Bit sets represent a set of several bits. In some programming languages there is a separate data type with separate operators (for example for union or intersection) for bit sets in order to maintain type safety.
Bit sets are not to be confused with enumeration types or arrays, since several elements of the data type (or of the set) can be addressed at the same time. In many programming languages, whole-number data types are used to represent bit sets, so that numbers and bit sets are assignment compatible, although arithmetic operators make no sense for bit sets, nor set operators for whole numbers.
- Designation: SET, BITSET
- Range of values: {} for an empty set, {i} for a set with the element i, {i, j} for a set with the elements i and j
- Operations: comparison operators, type conversion into an integer or an element of a character set, set operators
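Representing bit sets by whole numbers, as described above, can be sketched in Python with the bitwise operators. The helper names `make_set`, `contains` and `members` are hypothetical.

```python
def make_set(*elements):
    """Build a bit set: bit i is 1 exactly when i is an element."""
    bits = 0
    for i in elements:
        bits |= 1 << i
    return bits


def contains(bits, i):
    """Test whether element i is in the bit set."""
    return (bits & (1 << i)) != 0


def members(bits):
    """List the elements of the bit set in ascending order."""
    return [i for i in range(bits.bit_length()) if contains(bits, i)]


a = make_set(1, 3)        # {1, 3}
b = make_set(3, 4)        # {3, 4}
union = a | b             # bitwise OR acts as set union
intersection = a & b      # bitwise AND acts as set intersection
```

Since `a` is an ordinary integer here, nothing prevents writing `a + 1`, which illustrates the loss of type safety mentioned in the text.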
Pointer types / dynamic data types
Pointers are a special case: in many programming languages their actual value range remains anonymous, since they are "only" references to other data types. Depending on the referenced type, pointers to certain elements have their own names, such as pointers to files, printers or pipes.
Object-oriented programming languages store the data type referenced by the pointer (for example in the case of instance variables) together with the address to which the pointer refers, so that assignment compatibility can be checked not only for the type of the pointer itself but also for the referenced content. This is then even possible at runtime and is also necessary for some applications (for example with polymorphism).
Pointer
- Designation: ACCESS, POINTER, IntPtr or just an asterisk (*)
- Range of values: Address of the basic type (often anonymous)
- Operations: reference, dereference, in some languages: +, -, *, /
Constant null pointer
- Designation: NULL, VOID, None, NIL, Nothing
- Range of values: none
- Operations: =
- Meaning: This pointer is different from all pointers to objects.
Procedure types
Some programming languages, such as Oberon, provide procedure types for pointer variables that can point to different procedures with identical formal parameter lists.
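The idea of a procedure type can be approximated in Python with a `Callable` type hint. This is a sketch, not Oberon's mechanism: the alias `BinaryIntOp` and the functions below are hypothetical, and Python itself does not enforce the annotation at run time (a static type checker could).

```python
from typing import Callable

# Hypothetical procedure type: any procedure taking two ints and returning an int.
BinaryIntOp = Callable[[int, int], int]


def add(a: int, b: int) -> int:
    return a + b


def multiply(a: int, b: int) -> int:
    return a * b


def apply(op: BinaryIntOp, a: int, b: int) -> int:
    """Call whichever procedure the variable currently refers to."""
    return op(a, b)


op: BinaryIntOp = add          # the variable refers to one procedure ...
first = apply(op, 3, 4)
op = multiply                  # ... and later to another with the same parameter list
second = apply(op, 3, 4)
```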
Compound data types
Compound data types are data constructs that consist of simpler data types. Since they can in theory become arbitrarily complex, they are often already counted among the data structures. Most programming languages have the following in common:
- Sequence (tuple), table; array (the term "field" is ambiguous!)
  - Designation: ARRAY, (implicit definition with [n] or (n) without identifier)
  - Value range: Mapping of a finite set (index set) to the value range of a basic type (element type). The index set must be ordinal. Applying multiple indices creates a multidimensional arrangement.
  - Operations: <, >, =, assignment with assignment compatibility
  - Example: type 3D-Vektor is ARRAY(1..3) of INTEGER;
- Fixed-length character strings (basically, character strings are themselves only a sequence of the character type. However, since they are predefined in many programming languages, they are listed separately here.)
  - Designation: Array of CHAR, CHAR(n), CHAR[n]
  - Range of values: All possible strings
  - Operations: String functions (substring, concatenation [composition]), <, >, =
- Variable-length character string. The length can be determined implicitly using a metacharacter as a string end character (ASCII \0), explicitly using a variable, or using a standard function. Often provided as an abstract data type in a standard library.
- Binary character string of variable length. The length can be determined using a variable or a standard function.
- Compound, record, structure, area
  - Designation: RECORD, STRUCT, CLASS (extended meaning), (implicit definition via level numbers)
  - Range of values: A compound contains a series of different components which can have different data types. Any type is permitted as a component type. In some object-oriented programming languages (for example Oberon), compounds can also have type-specific procedures (methods) that describe the behavior of their components.
  - Operations: comparison (only equality or inequality), assignment with or without assignment compatibility (highly dependent on the programming language)
  - Example: type Prüfung is RECORD (Fach: STRING, Schueler: STRING, Punkte: INTEGER, Lehrer: STRING, Termin: DATUM)
- In many programming languages there are ways of interpreting the memory area of a compound in several different ways. This is called a variant record or UNION. In most cases, however, type safety is then lost.
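The array and record examples from the list above can be approximated in Python. This is a sketch under assumptions: `Vector3D` and `Pruefung` are hypothetical counterparts of the two type declarations, Python tuples are indexed from 0 rather than from 1, and all sample values are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

# Array with three INTEGER elements, modelled as a fixed-size tuple.
Vector3D = Tuple[int, int, int]


@dataclass
class Pruefung:
    """Record whose components have different types (names follow the record example)."""
    fach: str
    schueler: str
    punkte: int
    lehrer: str
    termin: date


v: Vector3D = (1, 0, -2)

# Two records with identical components (made-up sample data).
p = Pruefung("Mathematik", "A. Example", 12, "B. Example", date(2024, 1, 15))
q = Pruefung("Mathematik", "A. Example", 12, "B. Example", date(2024, 1, 15))

# Comparison of records is component-wise equality.
same_content = p == q
same_object = p is q
```

The generated `==` of a dataclass compares the components one by one, which corresponds to the equality comparison listed for compounds.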
Additional individual format information
When data types are used in the source code of a program, additional individual format specifications are often attached to a selected data type. For example, a date (or, more generally, a point in time) can be stored as an integral elementary data type, to which information on the form of processing or representation is added. The date is then stored, for example, in milliseconds since January 1, 1970 00:00 and can, based on this, be converted into certain other forms (such as 'DD.MM.YYYY' or 'MM.DD hh:ss'). Alternatively, a date could of course also be represented as a compound (for example, of three numbers for day, month and year).
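Both representations of a date mentioned above can be sketched in Python. The function names are hypothetical, and the sketch assumes that the millisecond count refers to UTC.

```python
from datetime import datetime, timedelta, timezone

# Reference point: January 1, 1970 00:00 (assumed to be UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_millis(ms):
    """Turn an integer count of milliseconds into a point in time."""
    return EPOCH + timedelta(milliseconds=ms)


def format_date(ms):
    """Render the stored integer in the form 'DD.MM.YYYY'."""
    return from_millis(ms).strftime("%d.%m.%Y")


def as_composite(ms):
    """Alternative representation: a compound of day, month and year."""
    moment = from_millis(ms)
    return (moment.day, moment.month, moment.year)
```

For instance, `format_date(0)` gives `'01.01.1970'`, and `as_composite(0)` gives `(1, 1, 1970)`; the stored integer is the same in both cases, only the representation differs.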
Functions as first-class values
In many contemporary programming languages, regular function values, function literals and anonymous functions are available in addition to function pointers. These were developed on the basis of the lambda calculus and implemented in LISP as early as 1958 (albeit with a faulty, dynamic binding). A correct, i.e. static, binding was specified, for example, for Algol 68. That functions are to this day sometimes not regarded as values is due to the fact that this concept is only now beginning to spread outside computer science.
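Function values with static (lexical) binding can be illustrated in Python. The names `make_adder`, `add_two` and `square` are hypothetical.

```python
def make_adder(n):
    """Return a new function; n is bound statically to this call's argument."""
    return lambda x: x + n


add_two = make_adder(2)

# A different variable that happens to be called n; with static binding
# it has no influence on add_two.
n = 100

square = lambda x: x * x          # function literal (anonymous function)

# Functions are ordinary values: they can be stored in a list and passed around.
functions = [add_two, square]
results = [f(5) for f in functions]
```

With dynamic binding, `add_two(5)` would pick up the later `n = 100`; with static binding it keeps using the value 2 from the point of definition.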
Universal data type
A universal data type is the type of the values in a programming language that supports untyped variables. It is usually the discriminated union of the types of the values that occur (elementary, compound, functions, etc.). The universal data type characteristically occurs in general-purpose scripting languages. Examples of the use of universal data types in languages of other kinds are the lambda calculus, in which functions are the only values, and Prolog, in which the data are given by the Herbrand structure.
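Python can serve as an illustration of untyped variables: one variable may hold values of any type, and each value carries a tag that identifies its type at run time. The function `describe` and its category names are assumptions made for this sketch.

```python
def describe(value):
    """Inspect the run-time type tag that discriminates the union."""
    if isinstance(value, bool):       # checked first: bool is a subtype of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (list, tuple)):
        return "compound"
    if callable(value):
        return "function"
    return "other"


kinds = []
x = 42                # the same variable x holds ...
kinds.append(describe(x))
x = "HANS"            # ... a character string,
kinds.append(describe(x))
x = (1, 2, 3)         # ... a compound value,
kinds.append(describe(x))
x = describe          # ... and finally a function.
kinds.append(describe(x))
```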
Abstract data types
Definition: An abstract data type (ADT) is a collection of data in variables, linked to the definition of all operations that access them.
Since access (read or write) is only possible via the specified operations, the data is encapsulated from the outside. Each ADT contains a data type or a data structure.
Object-oriented programming languages support the creation of ADTs through their class concept, since data and operations are linked here and the data can be protected. Some modular programming languages such as Ada or Modula-2 also specifically support the creation of abstract data types.
From a domain point of view, an abstract data type defines a range of values with a domain-specific meaning and its specific characteristics. The data type 'customer number' may be of the elementary type 'whole numbers', but it differs from it through a defined length and, for example, a check digit in the last position. It forms a subset of all whole numbers of the defined length. Complex, interdependent data can also be combined into an ADT. A common example is the representation of time periods: a start date and an end date (both of data type 'Date') are linked via an integrity condition, so that the permissible range of values for the end date is tied to further conditions. Ultimately, an ADT is an arbitrarily complex range of values that is tied to static and/or dynamic values and to assigned rules for determining the value.
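The time-period example can be sketched as a Python class. The class `Period` and its methods are hypothetical; the leading underscore marks the data as private by convention only, since Python does not enforce encapsulation.

```python
from datetime import date


class Period:
    """Hypothetical ADT: start and end date linked by an integrity condition."""

    def __init__(self, start, end):
        self._check(start, end)
        self._start = start
        self._end = end

    @staticmethod
    def _check(start, end):
        # Integrity condition: the end date must not precede the start date.
        if end < start:
            raise ValueError("end date must not precede start date")

    @property
    def start(self):
        return self._start

    @property
    def end(self):
        return self._end

    def extend_to(self, new_end):
        """Change the end date; the integrity condition is checked again."""
        self._check(self._start, new_end)
        self._end = new_end

    def days(self):
        """Length of the period in days."""
        return (self._end - self._start).days
```

Because the data can only be changed through `extend_to`, the permissible values for the end date always remain tied to the start date.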
Anonymous data types
Some programming languages and the XML structure definition language XML Schema support the concept of the anonymous data type. This is a data type for which no name is defined.