Data stream

from Wikipedia, the free encyclopedia

In computer science, a data stream (also written datastream) is a continuous flow of data records whose end usually cannot be foreseen in advance; each record is processed as soon as it arrives. The individual records are of an arbitrary but fixed type. The number of records per unit of time (the data rate) can vary and may become so large that the limited resources available are no longer sufficient for processing, so that the recipient has to react accordingly, for example by discarding records. In contrast to other data sources, data streams can only be processed continuously, record by record. In particular, unlike data structures with random access (such as arrays), they usually allow only sequential access to the individual records.
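The reaction to an excessive data rate can be sketched as a bounded buffer that discards records it cannot hold. This is only a minimal illustration under stated assumptions: the names `sensor_readings` and `DroppingBuffer` are hypothetical, and the rule that the consumer gets a turn after every third arrival merely simulates a recipient that is slower than the source.

```python
from collections import deque


def sensor_readings(n):
    """Hypothetical source: yields n records one at a time, like a stream."""
    for i in range(n):
        yield i


class DroppingBuffer:
    """Bounded buffer that discards incoming records while it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0

    def offer(self, record):
        """Store the record if there is room; otherwise count it as dropped."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(record)
        return True

    def poll(self):
        """Remove and return the oldest buffered record, or None if empty."""
        return self.items.popleft() if self.items else None


buffer = DroppingBuffer(capacity=2)
processed = []
for record in sensor_readings(6):
    buffer.offer(record)
    # Assumption for the simulation: the consumer is three times slower
    # than the source, so it handles one record per three arrivals.
    if record % 3 == 2:
        processed.append(buffer.poll())

print(processed, buffer.dropped)
```

Dropping the newest record is only one possible policy; dropping the oldest, or sampling, would be equally valid depending on the application.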

Data streams are often used for interprocess communication (communication between processes on a computer) and for the transmission of data over networks, especially for streaming media. They are versatile building blocks within the pipes-and-filters programming paradigm, which is a common tool in Unix shells. Examples of data streams are weather data and audio and video streams (streaming media). The continuous transmission of data over a network is also known as streaming.
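The pipes-and-filters idea can be imitated inside a single program with generators, where each filter consumes one stream and produces another. The following is a sketch, not a reproduction of any shell mechanism; the filter names `parse`, `above` and `to_fahrenheit` and the sample temperature values are assumptions made for illustration.

```python
def parse(lines):
    """Filter 1: turn non-empty text lines into floating-point values."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def above(values, threshold):
    """Filter 2: pass on only the values greater than the threshold."""
    for value in values:
        if value > threshold:
            yield value


def to_fahrenheit(values):
    """Filter 3: convert degrees Celsius to degrees Fahrenheit."""
    for celsius in values:
        yield celsius * 9 / 5 + 32


# Hypothetical input: temperature readings in Celsius, one per line.
raw = ["18.5\n", "\n", "25.0\n", "30.0\n"]

# Filters are chained like `parse | above 20 | to_fahrenheit` in a shell;
# no record is computed until the final consumer asks for it.
pipeline = to_fahrenheit(above(parse(raw), 20.0))
result = list(pipeline)
print(result)  # [77.0, 86.0]
```

Because every stage is lazy, the same pipeline would also work on a source that never ends, as long as the consumer reads record by record instead of calling `list`.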

Apart from its meaning in connection with “streaming”, the term “data stream” is also used more generally for “electronically coded data at the transmission stage”. In this sense the aspect of continuous processing is unimportant; what is emphasized is that the transmission is not yet complete. Examples are uploads and downloads, data sent during electronic data exchange, and databases for import or export in SAP.

Data streams vs. static data

Non-flowing, i.e. static, data are usually stored in a structured manner, often as tuples of values in the relations of a database. They are finite and have no temporal order. The data in a data stream, by contrast, arrive in chronological order and can be practically unbounded. While data in relations can be selectively updated and deleted, a data stream only allows new data to be appended, because individual elements cannot be reached by random access. Special data stream algorithms can, however, select individual tuples of a stream based on their properties and, if necessary, turn them into a new data stream. The (reversible) transformation of structured data into a stream-like sequence is known as serialization.
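Serialization and the selection of tuples from a stream can be illustrated together. The sketch below assumes JSON with one record per line as the wire format, which is just one of many possible encodings; the record fields `station` and `temp_c` are invented for the example.

```python
import json

# Structured (static) data: tuples of values, here as dictionaries.
records = [
    {"station": "A", "temp_c": 18.5},
    {"station": "B", "temp_c": 25.0},
]


def serialize(recs):
    """Turn structured records into a sequence of text lines."""
    for rec in recs:
        yield json.dumps(rec, sort_keys=True) + "\n"


def deserialize(lines):
    """Reverse transformation: rebuild one record per incoming line."""
    for line in lines:
        yield json.loads(line)


wire = list(serialize(records))
restored = list(deserialize(wire))

# Selecting tuples by a property yields a new, smaller stream.
warm = [rec for rec in deserialize(wire) if rec["temp_c"] > 20]

print(wire[0], end="")
print(restored == records, warm)
```

The round trip returning the original records is what makes the transformation reversible.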

History

The concept of data streams in data processing can be traced back, among other things, to the pipes that Douglas McIlroy proposed for linking macros. They were implemented in 1964 as "communication files" in the Dartmouth Time-Sharing System and integrated into the Unix operating system in 1972. A pipe is a data connection between two processes that works on the FIFO (first in, first out) principle. The principle of streams can now be found in most modern programming languages.
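A pipe between two processes can be tried out with Python's standard `subprocess` module. This is a minimal sketch: the child program and the sample input are made up for the example, and the child is simply a second Python interpreter that upper-cases whatever arrives on its standard input.

```python
import subprocess
import sys

# Hypothetical child process: reads its input stream line by line and
# writes each line back in upper case.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)

# The parent sends two lines through a pipe to the child's stdin and
# reads the child's stdout through a second pipe.
proc = subprocess.run(
    [sys.executable, "-c", child_code],
    input="first\nsecond\n",
    capture_output=True,
    text=True,
    check=True,
)

lines = proc.stdout.splitlines()
print(lines)  # ['FIRST', 'SECOND']
```

The lines come back in the order in which they were sent, which is the FIFO behaviour of a pipe.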

Processing of data streams

Processing of data streams in a DSMS

Most data streams are processed by programs tailored to a specific application. For example, audio and video streams can be played with dedicated playback programs. For the general management of arbitrary data streams, so-called data stream management systems (DSMS) have been developed in computer science since the beginning of the 21st century. These systems, still a relatively new area of research, are comparable to conventional database management systems (DBMS) for static data. An example of such a DSMS is the Stanford Stream Data Manager. Within that project, the Continuous Query Language (CQL) was developed as a query language that extends SQL.

Typical problems when processing data streams are large volumes of data arriving in a short time and the limited resources available for processing them: not all incoming data can be buffered, and only a section of the data is known at any time. As a result, only certain algorithms are applicable. The time available for evaluation is often limited as well, since time-critical applications expect quick results. Systems that deliver a result within a guaranteed period of time are referred to as real-time systems.
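One family of algorithms that copes with these constraints works on a sliding window: only the most recent records are kept, so memory use stays constant however long the stream runs. The sketch below computes a moving average in a single pass; the function name `sliding_mean` and the window size of three are illustrative choices, not part of any particular system.

```python
from collections import deque


def sliding_mean(stream, window):
    """Yield the mean of the last `window` records after each arrival.

    Only `window` records are held in memory at any time, and each
    record is looked at exactly once.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # the oldest record is evicted automatically
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]  # subtract the record about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)


means = list(sliding_mean([1, 2, 3, 4, 5], window=3))
print(means)  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

The function is itself a generator, so its results again form a data stream that a further processing step could consume.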

Since the incoming data streams are practically unbounded, the results computed from them are often data streams themselves. A distinction is therefore made between incoming data streams (ingoing stream, instream or downstream) and outgoing data streams (outgoing stream, upstream).

References

  1. Federal Standard 1037C: data stream