Data (German Daten), as the plural of datum, denotes facts, points in time, or calendar dates; as a plural noun in common usage, it denotes numerical values obtained through observations, measurements, and the like, as well as statements or formulable findings based on them.
While data in colloquial language are givens, facts, and events, in technical language data are signs that represent information. Various fields, such as computer science, mathematics, economic theory, neuroscience, and the biosciences, use different but mostly similar definitions. There is currently no uniform definition.
Etymology and usage
Daten, or formerly Data, are properly plural forms of Datum, a loanword that goes back to Latin datum, 'given' (perfect passive participle of Latin dare, 'to give'), or, used as a noun, 'that which is given'. More important documents customarily opened with the formula "datum ..." ("given (on) ...") followed by a time and possibly a place, whereby their content became "the given". The plural Daten for Datum follows the pattern of other words of Latin origin, such as Studien (from Studium) or Individuen (from Individuum).
Since in German the meaning of Datum has narrowed in general usage to the calendar date, the form Daten is often avoided as a plural in the sense of points in time, and other expressions for calendar dates or appointments are used instead. Conversely, for the singular of Daten in the broader sense of a given measurement, statement, or character (string), words such as "value", "indication", or "data element" are used. In this broader sense, Daten is therefore a plurale tantum.
Data as distinct from information
Although these two expressions are often used interchangeably in colloquial language, information theory distinguishes them fundamentally from one another. For details and examples, see Information.
German law uses the term data in various places but does not define it. The term is used, for example, in data protection law (Art. 4 No. 1 GDPR) and in criminal law in the offense of "spying on data" (StGB); "data" in this sense are "only those that are stored or transmitted electronically, magnetically or otherwise in a way that is not directly perceptible." This criminal-law definition of data is based on the technical view of data as machine-readable coded characters that are bound to a storage or transmission medium. The semantic dimension of data as a carrier of information must be distinguished from this, and the distinction also has legal significance. The question of legal protection for the information content of data leads into the fields of intellectual property (copyright, industrial property rights) or data protection. Unauthorized changes to the coding on a data carrier, by contrast, are to be regarded as interference with the ownership of the data carrier and are therefore relevant under property law and possibly also criminal law.
Ownership of data: Ownership of data analogous to ownership of things (§ ff. BGB) is not recognized under current German law. Because property law is aimed at the exclusive assignment of a clearly identifiable thing that cannot be multiplied at will, it does not fit the character of data as a non-rival good that can be reproduced at will at almost no cost. Current law does, however, recognize ownership of data carriers. It has not yet been conclusively settled to what extent ownership of a data carrier or of a data-producing device extends to the data stored on or produced by it.
Austrian core criminal law has known the concept of data since the introduction of the StGB provision on data corruption. Over time, further offenses were added, so that today fraudulent misuse of data processing (StGB), data falsification (StGB), disruption of the functioning of a computer system (StGB), and various other offenses under the StGB are also punishable.
A differentiated description of the term is also found in the Data Protection Act 2000 (DSG). It distinguishes between personal and non-personal data; only the former are protected by the DSG.
The determination of the facts is also referred to as the determination of circumstances.
According to the definition in the now superseded standard DIN 44300 No. 19, data (as of 1985) were "structures made up of characters or continuous functions that represent information based on known or assumed agreements, primarily for the purpose of processing and as its result."
According to the terminology of the current international standard ISO/IEC 2382-1 for information technology (since 1993), data are "a reinterpretable representation of information in a formalized manner, suitable for communication, interpretation, or processing".
In computer science and data processing, data are commonly understood as a (machine-)readable and processable, usually digital, representation of information. The content is usually first encoded in characters or character strings whose structure follows strict rules, the so-called syntax. To extract the information from data again, the data must be interpreted in a context of meaning. For example, depending on the context, a sequence of digits such as "123456" can represent a telephone number, an account number, or the number of new vehicle registrations in a certain period. The character sequence "123456" or "11110001001000000" as such can only be recognized as a sequence of digits; its concrete meaning becomes clear only in the appropriate context (see semantics).
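The role of context can be illustrated with a minimal Python sketch. The three interpretations are taken from the example above; the function names and the pairwise grouping used for the telephone number are illustrative assumptions, not part of any standard.

```python
# The same digit sequence, read under three assumed contexts.
raw = "123456"

def as_phone_number(digits: str) -> str:
    # Hypothetical formatting: group the digits in pairs.
    return " ".join(digits[i:i + 2] for i in range(0, len(digits), 2))

def as_count(digits: str) -> int:
    # Numeric interpretation, e.g. a number of vehicle registrations.
    return int(digits)

def as_binary(digits: str) -> str:
    # The same numeric value written with the binary alphabet {0, 1}.
    return format(int(digits), "b")

print(as_phone_number(raw))  # 12 34 56
print(as_count(raw) + 1)     # 123457
print(as_binary(raw))        # 11110001001000000
```

Arithmetic is meaningful only for the count; adding 1 to a telephone number would be syntactically possible but semantically pointless, which is exactly what the surrounding context decides.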
Data are stored on data storage devices such as hard drives, DVDs, flash memory, or magnetic tapes, and formerly also on punch cards, for example. These data carriers are considered hardware, whereas the data contained on or in them are to be understood as something intangible.
The way in which data are represented is called coding; the set of possible characters is called the code alphabet (e.g. UTF-8). Data can be coded differently, i.e. written in different codes, and still represent the same information. In today's digital technology, coding in binary form has established itself almost exclusively. A bit is the smallest unit of information. Besides binary code, alphabets with more than two symbols can also be used.
- Conventional memory cells know only the states "on" and "off", which are interpreted as "1" and "0" and thus as the basic values of the binary system.
- Memory cells holding more than one bit per cell are found in flash memory, e.g. the MLC or TLC memory cell.
- Memory cells for superposed quantum states, so-called qubits, are still at the research stage.
Categorization of data
- Structured data: The data (for example in databases or files) have a uniform structure.
- Semi-structured data (e.g. Extensible Markup Language (XML))
- Unstructured data (e.g. documents, arbitrary text, graphics)
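As a rough illustration of the three categories above, the following Python sketch holds the same fact in each form. The customer name, the field names, and the XML element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Structured: fixed fields, every record has the same shape.
structured = {"name": "Erika Mustermann", "city": "Berlin"}

# Semi-structured: XML carries its own markup; elements may vary per record.
semi_structured = "<customer><name>Erika Mustermann</name><city>Berlin</city></customer>"

# Unstructured: free text; the individual fields are not marked up.
unstructured = "Erika Mustermann lives in Berlin."

city_from_dict = structured["city"]
city_from_xml = ET.fromstring(semi_structured).findtext("city")
# Without further analysis, only a substring test is possible on free text.
city_in_text = "Berlin" in unstructured
```

The less structure the data carry, the more interpretation the reading program has to supply itself.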
By degree of persistence, a distinction is made between:
- Transient data (volatile, temporary) versus persistent data (more permanent)
- Input data and output data or data to be saved versus saved data
Further terms for data types:
- Application data are data processed for business or functional purposes, in contrast to technical data (such as installation data, program code, executable files, etc.). Application data can be divided into master data, transaction data, and inventory data; see also master data.
- Near-time data are copies of current data that are slightly less up to date than the original data (real-time data).
- Backup data are data that have been copied for security reasons and can be accessed when required.
- Original versus derived data: Original data are data that are captured for the first time and are unique. Sums, copies, or other constructs can be formed (derived) from them.
- Serial data (also called sequential data): The data are not managed by a database management system (DBMS) but are stored and processed in a standard file format of the operating system. As a rule, direct access is not possible; the data must then be written or read in sequence.
- Historical data: The data stock at certain points in time (e.g. the status before changes or at the beginning of the year) can be saved separately and used later in certain functions (e.g. screen display).
Forms of processing of data
The data operations involved in storing data follow the "CRUD" principle, which distinguishes the initial creation of data (create), reading (read), changing (update), and deleting (delete). The subject of such operations is usually a particular group of data (such as a customer address or an order) that was formed, for example, according to the rules of data modeling. These operations are triggered by computer programs, i.e. specified by corresponding commands contained in them (as part of an implemented algorithm). On the one hand, the operations are themselves input/output commands with respect to the data stock; on the other, they are partly also associated with input and output by the user of the program.
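The four operations can be sketched with Python's built-in sqlite3 module and a throwaway in-memory database. The table name, column names, and sample values are assumptions chosen for the example; real applications will use their own data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create: record the data for the first time.
cur = conn.execute(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    ("Erika Mustermann", "Berlin"),
)
customer_id = cur.lastrowid

# Read: fetch the stored record.
row = conn.execute(
    "SELECT name, city FROM customer WHERE id = ?", (customer_id,)
).fetchone()

# Update: change one field of the record.
conn.execute("UPDATE customer SET city = ? WHERE id = ?", ("Hamburg", customer_id))
updated = conn.execute(
    "SELECT city FROM customer WHERE id = ?", (customer_id,)
).fetchone()[0]

# Delete: remove the record again.
conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
conn.close()
```

Each SQL statement here is one of the "commands contained in the program" mentioned above; the customer row is the group of data they operate on.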
The purpose of storing data is usually their later use. Simple reproduction (e.g. in the form of displays or lists) can be distinguished from evaluation, in which the data enter into various logical, mathematical, or presentational processes (e.g. summation, averaging, forming differences, data comparison, or graphical diagrams).
Data import (file import) and data export (file export) are special forms of data processing and common methods for exchanging data between different systems. Data conversion may also be necessary if the source and target systems use different data formats or file formats.
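A conversion step between export and import might look like the following sketch. The source format (semicolon-separated text with decimal commas), the target format (JSON), and all field names are hypothetical and serve only to show the principle.

```python
import csv
import io
import json

# Hypothetical export from a source system: semicolon-separated, decimal comma.
exported = "id;name;amount\n1;Erika Mustermann;12,50\n2;Max Mustermann;7,25\n"

def convert(csv_text: str) -> str:
    """Convert the assumed CSV export into a JSON document for the target system."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    records = []
    for row in reader:
        records.append({
            "id": int(row["id"]),
            "name": row["name"],
            # The target format expects a decimal point instead of a comma.
            "amount": float(row["amount"].replace(",", ".")),
        })
    return json.dumps(records, ensure_ascii=False)

# Import on the target side: parse the converted document.
imported = json.loads(convert(exported))
print(imported)
```

The information (two customers and their amounts) is unchanged; only the data format differs between the two systems.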
Observation and impact levels for data
The term "data" occurs in different, interrelated levels of impact and observation. These are essentially:
- Data management: General framework conditions for working with the data are specified and applied during operation, for example: Who is considered the owner of the data? Where and how are the data created and used? Who is allowed to access them (data security)? Further topics are rules and measures for data protection and data backup, company-wide models and naming conventions, and deployment concepts for data tools.
- Data design: Alongside the functionality of the programs, data play a central role in software development, especially within projects. Using the procedures and tools available in each case, details of the data architecture are determined, e.g.: What data does the software know? How are they related to each other? Do they already exist? Are they managed and stored in databases or in files? Is a field mandatory or optional? Which data types and data structures are to be formed? Which values can an attribute take on?
- Technical implementation:
  - When a database system is used for storage, the results of the design are laid down in a database model as the basis for processing and managing the data that the database is to contain.
  - During programming, the program code is created, whose commands are used to process data. So-called declarations define data structures with their individual data fields in such a way that they can hold the data and that the commands generated during translation correspond to the field properties (position, length, data format, etc.).
- Actual (production) data: This is where the data are actually stored and used by the programs.
Data in programming
Data are primarily the source and destination of processing in computer programs. This requires, in the program, i.e. in its source code, declarations and commands appropriate to the processing purpose. Depending on the programming language, these can differ considerably in syntax and in concept (semantics). Important data-related terms are listed here (each with synonyms, similar terms, and examples):
- Data inventory: This is where data are stored, created, changed, or deleted by a program and/or read from (see also CRUD). Similar: database, file; Example: customer addresses
- Data record: Groups the information or values that relate to one object (entity). Similar terms: tuple, compound, dataset, recordset; Example: address of a specific customer
- Data field: A single, elementary item of information belonging to a data record. Data used only internally in the program (for example total fields or the VAT rate in percent) are also defined and processed in data fields. Similar: variable, constant, field; Example: postcode of the place of residence
- Data structure: Combination of several data fields into a group. Typical forms: composite (data group), array/table, stack; Example: the customer's telephone number(s), consisting of country code, area code, phone number, and extension number if applicable
- Data type: Classification for data fields and structures, for example text, numeric/floating point, array. It determines which commands (methods, functions) can be applied to the data fields. Similar: data format; Example: postal code is a numeric field
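The terms in the list above can be mapped onto declarations in a concrete language. The following Python sketch is one possible mapping; the class and field names are invented, and other languages express the same concepts with different syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhoneNumber:          # data structure: several fields grouped together
    country_code: str
    area_code: str
    number: str
    extension: str = ""     # optional field, empty if not applicable

@dataclass
class Customer:             # data record: values belonging to one entity
    name: str               # data field with a text data type
    postcode: int           # data field with a numeric data type, as in the example above
    phones: List[PhoneNumber] = field(default_factory=list)

# Data inventory: here simply a list held in memory.
customers: List[Customer] = []
customers.append(
    Customer("Erika Mustermann", 12345, [PhoneNumber("+49", "30", "1234567")])
)

# The data type determines which operations can be applied to a field.
first = customers[0]
upper_name = first.name.upper()   # a text operation
next_code = first.postcode + 1    # an arithmetic operation
```

In a real system the data inventory would typically be a file or database rather than an in-memory list.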
Since the turn of the millennium, the proportion of digital data is said to have exceeded that of analog recordings. In 2011, around 1.8 zettabytes (1.8 × 10^21 bytes = 1.8 trillion gigabytes) of digital data were created or copied. The total volume has grown by a factor of five in the last five years and is currently growing by 10^18 bytes every day. If one wanted to burn the entire amount of data onto DVDs, the resulting stack of DVDs would stretch from the earth to the moon and back again.
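The unit conversion in the paragraph above can be checked with a few lines of Python. Decimal (SI) prefixes are assumed, and the figures are those quoted in the text, not independent measurements.

```python
ZETTABYTE = 10**21  # bytes, SI prefix
EXABYTE = 10**18
GIGABYTE = 10**9

# 1.8 zettabytes, written as exact integer arithmetic.
created_2011 = 18 * ZETTABYTE // 10
in_gigabytes = created_2011 // GIGABYTE
print(f"{in_gigabytes:,} GB")  # 1,800,000,000,000 GB

# At the quoted growth of 10^18 bytes per day, one zettabyte is added in:
days_per_zettabyte = ZETTABYTE // EXABYTE
print(days_per_zettabyte)  # 1000
```

1,800,000,000,000 gigabytes is 1.8 trillion on the short scale, matching the figure given above.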
Global data traffic is expected to multiply in the next few years, as is the proportion of "dark information", which means that more and more information is exchanged between machines. For 2020, “the amount of data that is created, copied and consumed will be around 40 zettabytes - and thus 50 times as high as three years ago”.
Business and Economics
In business administration and economics, data are understood to be given economic variables that usually cannot be influenced by the decision-maker. Both disciplines thus take the etymological origin of the word (Latin datum, 'the given') very seriously. The environmental influences on decisions are divided into endogenous factors, such as the internal acceptance of company decisions or the susceptibility to failure in carrying out service processes, and exogenous factors. The latter include natural data (such as climate and weather) and societal data (such as laws, collective agreements, and the action parameters of competitors, suppliers, buyers, or institutions) that are not to be understood as a reaction to the decision-maker's own action parameters. If the decision-maker makes no attempt to influence them, societal conditions are data parameters just as natural conditions are. In particular, they are the framework conditions set by a company's external environment (market, state, central bank, supervisory authorities, foreign countries) which, at least in the short term, cannot be influenced either directly or indirectly by the company's own decisions. The decision-making framework therefore treats the decision environment as an unchangeable datum.
Company data that a company collects in the course of its activities within a financial year serve as an essential basis for decision-making. Only a small part of them, drawn from accounting, reaches the interested public under disclosure requirements, through publication in the annual financial statements or in quarterly reports.
Data in the general sense
- Contents of lexicons and books
- The temperature displayed on a thermometer
- The annual rings of a tree or similar biological (measurable) characteristics
- The (measured) speed of a passing vehicle
- Answers to the questions in the questionnaires of surveys and censuses
- Results of experiments in science, technical facts
- Press archives from newspaper publishers
- The content of documents (e.g. letters, notes, minutes, etc.)
Data in computer science
- Bits and bytes, for example:
  - when stored on data carriers such as hard drives, USB sticks, or DVDs
  - when transmitted via the Internet or to a mobile phone
  - as character strings or texts in text files
  - as binary files (e.g. machine code, database content, digital photos, sound recordings, or videos)
- Big data
- Data science
- Data mining
- Data stealing
- Digital data
- Information quality
- soft data