Data flow architecture

A data flow architecture is an alternative computer architecture to the so-called Von Neumann architecture, according to which the vast majority of today's computers are implemented. A computer implemented according to the data flow architecture is called a data flow computer. Data flow computers try to exploit the possibilities of parallel processing by executing many threads concurrently. Implementations of this architecture were experimental multiprocessor computers; machines of this type never achieved commercial success. Disadvantageous properties of the data flow architecture prompted the development of hybrid computers that combine the advantages of the data flow architecture and the Von Neumann architecture; through this development, many concepts typical of the data flow architecture "survived" and can be found in most computers today.

Motivation for developing the data flow architecture

A Von Neumann computer executes a sequential process in a linear address space; the degree of concurrency that can be exploited is quite low. In contrast, a data flow computer generally processes several threads of machine instructions that describe a so-called data flow graph; such graphs as a rule exhibit a high degree of concurrency.

Multiprocessor systems in the Von Neumann context have two central problems: the latency of memory access, i.e. the time that passes between a memory request and the response of the memory system, and the problem of synchronization, that is, the necessity of executing the machine instructions in an order that respects their data dependencies. The data flow model was developed as an alternative; according to its developers, computers that implement it should be better able to deal with these problems.

Differences from the Von Neumann architecture

The principle of data flow was developed by Jack Dennis and James Rumbaugh in the early 1970s. The earliest architectural design, by Dennis and David Misunas, dates from 1975 and was implemented in 1979 in the Distributed Data Processor (DDP) from Texas Instruments. The first implementation of a data flow computer was the DDM1 (Data Driven Machine 1), built in 1977 by Alan L. Davis and Robert S. Barton at the University of Utah in collaboration with Burroughs.

A data flow computer manages without two central features of the Von Neumann architecture: it requires neither a program counter nor a continuously updated central memory, which becomes a bottleneck when parallelism is exploited. In contrast to the Von Neumann architecture, computed results that serve as input values for further calculations are not stored; they exist only as temporary "messages" between the execution units.

All microprocessors that work according to the Von Neumann principle follow a strictly sequential processing model. The instruction sequence follows the Von Neumann cycle: after the current instruction has been loaded, the address of the following instruction is determined implicitly, and that instruction is loaded and executed in turn. Each instruction is loaded under the control of a program counter, which implies the order-preserving loading of the individual machine instructions.

The dynamic scheduling of superscalar processors, implemented by all modern processors in Von Neumann computers, also aims to execute machine instructions in the order of the data flow. The sequentially read instruction stream is temporarily rearranged during processing in the pipeline while preserving the semantics: the result of processing the instruction stream must ultimately be equivalent to its strictly sequential processing.

With the Von Neumann principle, the machine instructions are always loaded in program order, and the results must be written back only after the respective instruction has completed. For example, the result of the instruction following a branch may only be committed once the branch condition has been evaluated and it is clear whether that instruction should have been executed at all; the computation of that result can, however, be carried out before the branch condition is evaluated. The Von Neumann principle therefore requires the machine instructions to be processed taking control flow dependencies into account.

The processing of the program of a data flow computer is determined by data dependencies; that is, the control flow is identical to the data flow of the individual instructions. Every machine instruction can be loaded, executed, and its result saved as soon as its operands are available; the underlying model is therefore inherently asynchronous. As a consequence, some challenges of the Von Neumann architecture do not arise here: branch prediction is unnecessary, since there are no jumps in the processed data flow graphs, and the order-preserving execution of the program's load and store instructions is not required.

Compilers in the Von Neumann context analyze their source code for data dependencies in order to arrange the instructions of the generated binary code correctly in sequential order; which data dependencies the compiler has determined is, however, not recorded in the binary code. A data flow compiler, on the other hand, records the determined data dependencies in its binary code. Each dependency is given a unique marking to distinguish it from the others; this enables the executing data flow machine to process differently marked code segments in parallel.
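
A hypothetical sketch of such an encoding (not the format of any real machine): each instruction carries a unique marking, and operands name either a program input or the marking of the producing instruction instead of a register or memory address.

   # Hypothetical encoding of a data flow binary: each instruction
   # carries a unique marking; operands refer to program inputs
   # ("in:...") or to the marking of the producing instruction.
   program = [
       ("i1", "add", ("in:v", "in:w")),  # t = v + w
       ("i2", "sub", ("in:u", "i1")),    # x = u - t, consumes i1's result
       ("i3", "mul", ("in:u", "i1")),    # y = u * t, also consumes i1's result
   ]
   # i2 and i3 depend only on i1 and on program inputs, not on each
   # other, so the machine may execute them in parallel.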

Data flow computer

Basics

Data flow languages

To write programs for a data flow computer, a programming language is required whose compiled machine program does not need a program counter for its processing and which makes it easy to describe concurrent processes.

Some examples of such languages are:

  • Id (Irvine data flow language)
  • Lucid
  • Lustre
  • SISAL

Data flow programming languages are declarative; they are related to functional programming languages: their programs are nothing more than functions that map input values to output values. Both follow the so-called single-assignment principle: a variable cannot be assigned a value more than once in the same function, which does justice to the actual mathematical meaning of a variable.
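
A minimal sketch of the single-assignment principle, written here in Python for illustration (real data flow languages such as SISAL enforce the rule in their syntax):

   # Every name is bound exactly once, so each name denotes a fixed
   # value, as in mathematics, and the data dependencies are explicit.
   def example(u, v, w):
       t = v + w   # bound once and never reassigned
       x = u - t   # depends only on u and t
       y = u * t   # independent of x, so it may be computed concurrently
       return x, y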

Data flow graphs

Programs written in a data flow programming language can be modeled with the help of a so-called data flow graph or translated, with the help of a compiler, into machine instructions that describe such a graph. Data flow graphs usually describe concurrent processes that a data flow computer can compute using multiple threads.

A data flow graph is a directed graph whose nodes represent instructions; the edges between the nodes describe the underlying data dependencies. Input values for the instructions are propagated along the edges in the form of data packets called tokens. Two central characteristics of a data flow graph are functionality, meaning that the evaluation of the graph is equivalent to the evaluation of a mathematical function, and compositionality, meaning that graphs can be combined as desired to form a new graph.

The processing of the data flow graph is controlled by means of the directed edges of its data dependencies. If there are enough tokens on the input edges of a node, it can fire; that is, some of the tokens on the input edges are consumed and new tokens are produced on the output edges, which in turn can enable subsequent nodes to fire.
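
A minimal Python sketch of this firing rule, under the static assumption of at most one token per edge (the types and names are illustrative):

   from dataclasses import dataclass
   from typing import Callable

   @dataclass
   class Node:
       op: Callable        # the instruction, e.g. operator.add
       inputs: list[str]   # names of the input edges
       outputs: list[str]  # names of the output edges

   def can_fire(node: Node, tokens: dict) -> bool:
       # A node is enabled when every input edge carries a token.
       return all(edge in tokens for edge in node.inputs)

   def fire(node: Node, tokens: dict) -> None:
       args = [tokens.pop(edge) for edge in node.inputs]  # consume inputs
       result = node.op(*args)
       for edge in node.outputs:                          # produce outputs
           tokens[edge] = result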

Simple data flow graph

Consider a data flow graph that corresponds to the following program:

 Input: u, v, w;

   x = u - (v + w);
   y = u * (v + w);

 Output: x, y;

This is not a sequential program. When the graph is processed, tokens carrying values travel through the graph; the function of the node DUP is to duplicate the token on its input edge onto both of its output edges. Initially, the nodes ADD and DUP can fire, either one at a time or both simultaneously; the firing takes place here in the form of asynchronous concurrency. When both nodes have fired, their two successor nodes MUL and SUB become able to fire. The order of execution may influence the runtime, but it has no influence on the result; the processing of the program is therefore deterministic.
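
A small Python simulation of this graph illustrates the determinism: enabled nodes are fired in a random order, yet the result is always the same (the edge names and the encoding are illustrative):

   import random

   def run(u, v, w):
       # one entry per node: (input edges, output edges, operation)
       nodes = {
           "ADD": (["v", "w"],   ["t1", "t2"], lambda a, b: a + b),
           "DUP": (["u"],        ["u1", "u2"], lambda a: a),
           "SUB": (["u1", "t1"], ["x"],        lambda a, b: a - b),
           "MUL": (["u2", "t2"], ["y"],        lambda a, b: a * b),
       }
       tokens = {"u": u, "v": v, "w": w}   # initial tokens on the input edges
       while nodes:
           enabled = [n for n, (ins, _, _) in nodes.items()
                      if all(e in tokens for e in ins)]
           name = random.choice(enabled)   # fire in an arbitrary order
           ins, outs, op = nodes.pop(name)
           result = op(*[tokens.pop(e) for e in ins])
           for e in outs:                  # the result fans out to all
               tokens[e] = result          # output edges of the node
       return tokens["x"], tokens["y"]

   print(run(5, 1, 2))   # always (2, 15): x = 5 - (1+2), y = 5 * (1+2)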

Conceptually different kinds of data flow graphs are conceivable; such graphs differ in their behavior and in their expressiveness. The following sections present some architectural variants of data flow computers, which reflect the chronological course of their development and are partly based on different types of data flow graph.

Static data flow computers

The architecture that processes graphs with at most one token per edge, the static architecture, was essentially developed by Jack Dennis. The main advantage of this model is that it is quite easy to identify nodes that are able to fire. An undesirable effect of this model is that successive iterations of a loop can only partially overlap, which reduces the achievable degree of concurrency. Another disadvantage of the model is the lack of supporting programming constructs in modern programming languages. Nevertheless, several computers of this architecture were built and formed the basis for subsequent generations of data flow computers; one example is the Distributed Data Processor (DDP) from Texas Instruments mentioned above.

Dynamic data flow computers

The performance of a data flow computer can be increased significantly if loop iterations and subroutine calls can be processed in parallel. To achieve this, every loop iteration and every subroutine call should be able to execute in a separate instance of the corresponding subgraph. Since replicating such a subgraph is very complex in practice, only one instance of the graph is actually kept in memory; instead, each token is provided with a tag that identifies its process context. The firing rule is changed so that a node fires precisely when a sufficient number of tokens with identical tags are available on its input edges. Data flow architectures that implement this method are also called dynamic data flow architectures. The first experiments with the dynamic data flow principle were undertaken in the late 1970s.

The advantage of this architecture is increased performance, since several tokens can now be propagated on each edge. One problem with this architecture is the efficient synchronization of operand pairs. This raises the so-called token matching problem: it must be determined whether two tokens carry the same tag, i.e. whether they belong to the same process context. This comparison requires an associative memory access. Because associative memory was very expensive, a hardware-implemented hash scheme was typically used instead, which proved ineffective in many respects. Examples of implementations of this architecture are the Manchester Dataflow Machine and the MIT Tagged-Token Dataflow Architecture.
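
A minimal sketch of such token matching for a two-operand node, assuming one matching store per node (the names and encoding are illustrative):

   waiting = []   # waiting tokens as (tag, value) pairs

   def token_arrives(tag, value, op):
       for i, (t, v) in enumerate(waiting):   # associative search over all
           if t == tag:                       # waiting tokens: the costly
               del waiting[i]                 # part on real hardware
               return op(v, value)            # partner found: the node fires
       waiting.append((tag, value))           # no partner yet: token waits
       return None

   print(token_arrives(7, 3, lambda a, b: a + b))   # None, operand waits
   print(token_arrives(7, 4, lambda a, b: a + b))   # 7, the node fires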

Dynamic data flow computers with explicit token storage

In order to solve the token matching problem without needing expensive associative memory, the concept of the explicit token store was developed. The basic idea is to dynamically allocate a so-called "activation frame" in the token memory for each loop iteration, which provides a storage location for the token that arrives first at the matching unit. The address of the storage location can be calculated by the compiler. When the second token arrives at the matching unit, the first token is read from memory again and the comparison is carried out; with this technique only two memory accesses are necessary, instead of having to search part of the token memory for a matching token as before. However, this procedure limits the number of loop and subroutine instances that can be executed simultaneously. Experiments with this architecture were made in the late 1980s; an example of an implementation is the Monsoon data flow computer.
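
A minimal sketch of the explicit token store, contrasting with the associative matching above (slot numbers and names are illustrative):

   frame = {}   # activation frame allocated per loop or subroutine instance

   def token_arrives(slot, value, op):
       # The compiler has assigned this instruction instance a fixed slot.
       if slot not in frame:
           frame[slot] = value        # first operand: store it (access 1)
           return None
       partner = frame.pop(slot)      # second operand: read partner (access 2)
       return op(partner, value)      # both operands present: the node fires

   print(token_arrives(0, 3, lambda a, b: a + b))   # None, stored in slot 0
   print(token_arrives(0, 4, lambda a, b: a + b))   # 7, the node fires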

Advantages and disadvantages of data flow computers

Advantages

  • A data flow computer can execute many threads on one or more processors; its performance increases almost linearly with the number of its processors
  • Programmers do not have to deal with explicit concurrency when programming; data flow computers use implicit concurrency, which is discovered by the compiler at compile time
  • For a long time, data flow computers were superior to those of the Von Neumann architecture in dealing with memory latency and synchronization

Disadvantages

  • No use of registers; result forwarding instead: because the temporal order in which the individual instructions of the data flow program are executed is not fixed in advance, registers cannot be used as intermediate storage for operands.
  • Poor utilization of caches: the locality of reference exploited by the Von Neumann architecture when accessing data and instruction caches, which is very pronounced due to the sequential processing model, is hardly present in the data flow architecture, yet it has a decisive influence on the performance of a cache. High-speed processors require caches in order to keep their execution units busy.
  • Performance degradation due to duplication operations
  • Poor performance when processing a single thread: within a single thread, successive operations have to wait until the preceding operation has finished. A data flow computer has its strength in executing many threads, which allows it to fill and utilize its pipelines with many independent operations
  • Relatively wide machine instructions: the machine instructions of the Monsoon data flow computer, for example, were 144 bits wide
  • Overhead due to the token matching problem

Motivation for multithreading

Data flow computers have poor performance when they run a single sequential thread. The result of an operation is computed in the last stage of the pipeline, and the resulting data conflicts would mean that only one instruction of the thread is processed in the pipeline at a time, so the pipeline would obviously be of no use. However, data flow programs generally consist of many concurrent threads, so the pipeline can be continuously filled with data-independent instructions belonging to different process contexts; no pipeline conflicts arise and the pipeline can be fully utilized. In particular, this also makes it possible to bridge the long latencies caused, for example, by load and store operations; in the meantime, data-independent instructions from other threads are simply fed into the pipeline. The multithreading of today's processors, such as Hyper-Threading in Intel's Pentium 4, is based on the same principle.
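
A toy sketch of this interleaving principle (the instruction names are illustrative): issuing instructions round-robin from several independent threads keeps the pipeline filled even while one thread waits on a long-latency operation.

   from itertools import chain, zip_longest

   thread_a = ["A:load", "A:add", "A:store"]   # dependent instruction chain
   thread_b = ["B:mul", "B:sub", "B:store"]    # independent of thread A

   # Round-robin issue: the slots between A's dependent instructions
   # are filled with B's data-independent instructions.
   pipeline = [i for i in chain(*zip_longest(thread_a, thread_b)) if i]
   print(pipeline)   # ['A:load', 'B:mul', 'A:add', 'B:sub', 'A:store', 'B:store']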

Hybrid architectures

Data flow computers were mainly built in the 1980s, but due to the bottleneck in memory access they did not prove competitive with Von Neumann supercomputers. The disadvantages of the data flow architecture led to the development of hybrid architectures that use the advantages of both the data flow architecture and the Von Neumann architecture; the two should not be seen as orthogonal computer paradigms. The spectrum of such hybrids is relatively broad: it ranges from simple extensions of a Von Neumann processor with a few additional instructions to specialized data flow systems that employ a variety of techniques developed for Von Neumann computers.

Some important concepts of data flow computers "survived" thanks to this convergence and can be found in the vast majority of today's processors, such as the out-of-order execution of machine instructions by the dynamic scheduling of superscalar processors, and multithreading according to the data flow principle.

Literature

  • Arvind, R. Nikhil: Executing a program on the MIT tagged-token dataflow architecture. In: IEEE Transactions on Computers. 1990 (cs.wisc.edu, PDF; 1.91 MB).
  • G. Papadopoulos, D. Culler: Monsoon: an explicit token-store architecture. In: International Symposium on Computer Architecture (ISCA). 1990 (describes the architecture of the Monsoon data flow computer).
  • J. Šilc, B. Robič, T. Ungerer: Asynchrony in parallel computing: From data flow to multithreading. Technical report CSD, Computer Systems Department, Jožef Stefan Institute, Ljubljana, Slovenia, 1997 (lists all important data flow computers, describes their historical development, and explains why today's multithreaded processors are descendants of data flow computers).
  • James Rumbaugh: A Parallel Asynchronous Computer Architecture For Data Flow Programs. MIT-LCS-TR-150, 1975 (dissertation).

References

  1. J. Šilc, B. Robič, T. Ungerer: Asynchrony in parallel computing: From data flow to multithreading. Technical report CSD, Computer Systems Department, Jožef Stefan Institute, Ljubljana, Slovenia, 1997.
  2. Theo Ungerer: Datenflußrechner. Teubner-Verlag, 1993.
  3. J. Dennis, D. Misunas: A preliminary architecture for a basic data flow processor. In: Proceedings of the 2nd Annual Symposium on Computer Architecture. Houston, January 1975.
  4. Edwin D. Reilly: Milestones in Computer Science and Information Technology. Greenwood Press, 2003, article "Dataflow Machine".