Von Neumann architecture

from Wikipedia, the free encyclopedia
Model calculator (1958) with Von Neumann architecture in the Dresden Technical Collections

The Von Neumann architecture (VNA) is a reference model for computers in which a shared memory holds both program instructions and data. According to Flynn's classification, Von Neumann systems belong to the class of SISD (single instruction, single data) architectures, in contrast to parallel processing.

The Von Neumann architecture forms the basis of operation for most computers known today. It is named after the Austro-Hungarian mathematician John von Neumann, who later worked in the USA and whose main work on the subject was published in 1945. It is sometimes also called the Princeton architecture (after Princeton University).

A competing architecture that is often presented in teaching is the Harvard architecture.

Development

Von Neumann described the concept in 1945 in the initially unpublished paper "First Draft of a Report on the EDVAC", written as part of the construction of the EDVAC computing machine. It was revolutionary at the time, because previously developed computers were tied to a fixed program that was either hard-wired or had to be read in via punched cards. With the Von Neumann architecture it became possible to change programs very quickly without changing the hardware, or to run different programs in quick succession.

Many of the ideas of the Von Neumann architecture had already been worked out by Konrad Zuse in 1936, documented in two patents in 1937 and, for the most part, implemented mechanically in the Z1 machine in 1938. In 1941 Konrad Zuse, in collaboration with Helmut Schreyer, built the Zuse Z3, the first working digital computer in the world. However, it is unlikely that von Neumann knew of Zuse's work when he presented his architecture in 1945.

Most computers in use today are based on the basic principle of the Von Neumann architecture, i.e. their properties correspond to those of a VNA. However, this typically no longer means that they are internally structured like a simple VNA with its few function groups. Over time, many computer architectures originally conceived as simple VNAs, e.g. the x86 architecture, have differentiated beyond that model and become far more complex. This was done to achieve performance gains without breaking with the easily manageable VNA model, i.e. to remain compatible with it from a software point of view and so continue to benefit from its advantages.

With the trend towards a growing number of parallel processing units (multicore) and buses (e.g. HyperTransport), this compatibility is becoming increasingly complex and difficult to implement. It is therefore to be expected that in the foreseeable future a paradigm shift to a different, parallel architecture model will be necessary in order to achieve further performance increases in computer architectures. The first harbingers include, for example, the emergence of NUMA computing, in which memory is no longer regarded as having “uniform” properties.

Concept

The Von Neumann architecture is a circuit concept for implementing universal computers (Von Neumann computers, VNR). It implements all the components of a Turing machine. However, its systematic division into the corresponding function groups enables the use of specialized binary switching circuits and thus a more efficient structuring of the operations.

In principle, everything that can be calculated with a Turing machine can also be calculated on a machine with Von Neumann architecture, and vice versa. The same applies to all high-level programming languages that are mapped onto the binary representation by a compiler or interpreter. Although they simplify the handling of operations, they do not extend the semantics specified by the Turing machine. This is evident from the fact that the translation from a high-level programming language into the binary representation is itself carried out by a binary program without user interaction.

Components

Components of a Von Neumann computer
Schematic structure of a Von Neumann computer with the associated bus system

A Von Neumann computer is based on the following components that are still used in computers today:

  • ALU (arithmetic logic unit) - the arithmetic unit, rarely also called the central unit or processor; performs arithmetic and logical operations. (The terms central processing unit and processor are generally used with different meanings.)
  • Control unit - interprets the instructions of a program and accordingly interconnects the data source, the data sink and the necessary ALU components; the control unit also regulates the instruction sequence.
  • Bus - the bus system, used for communication between the individual components (control bus, address bus, data bus).
  • Memory - the storage unit (RAM / working memory); stores both programs and data, which are accessible to the arithmetic unit.
  • I/O unit - the input/output unit; controls the input and output of data to the user (keyboard, screen) or to other systems (interfaces).

Program Sequence

These components process program instructions according to the following rules.

  • Principles of the stored program:
    • Instructions are loaded, and control signals are sent to the other functional units.
    • Instructions are stored in RAM with a linear (one-dimensional) address space.
    • An instruction address register, called the instruction counter or program counter, points to the next instruction to be executed.
    • Instructions can be modified like data.
  • Principles of sequential program execution (see also Von Neumann cycle):
    • Instructions are read from a memory cell and then executed.
    • The content of the program counter is then normally increased by one.
    • There are one or more jump instructions that change the content of the program counter by a value other than +1.
    • There are one or more branch instructions which, depending on the value of a decision bit, either increase the program counter by one or execute a jump.
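
The rules above can be illustrated with a minimal sketch of a stored-program machine. The instruction set (LOAD, ADD, SUB, STORE, JUMP, JZ, HALT), the single accumulator and the memory layout are hypothetical choices made for this illustration only; they are not taken from the EDVAC report or from any real processor. Instructions and data share one linear memory, and a program counter selects the next instruction.

```python
# Toy stored-program machine. The instruction set is a hypothetical
# illustration, not that of any real machine.
# Instruction cells hold (opcode, operand) tuples; data cells hold ints.

def run(memory, max_steps=1000):
    """Execute the program stored in `memory` and return the final memory."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # accumulator: the only working register in this sketch
    for _ in range(max_steps):
        opcode, operand = memory[pc]  # fetch from the shared memory
        pc += 1                       # normal case: counter increases by one
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "SUB":
            acc -= memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc     # may target any cell, code or data
        elif opcode == "JUMP":
            pc = operand              # unconditional jump
        elif opcode == "JZ":
            if acc == 0:              # branch on a decision value
                pc = operand
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode!r}")
    raise RuntimeError("step limit reached without HALT")

# Program: add the numbers 5, 4, 3, 2, 1 by counting down.
program = [
    ("LOAD", 9),    # 0: acc = counter
    ("JZ", 8),      # 1: finished when the counter reaches zero
    ("ADD", 10),    # 2: acc = counter + total
    ("STORE", 10),  # 3: total = acc
    ("LOAD", 9),    # 4: acc = counter
    ("SUB", 11),    # 5: acc = counter - 1
    ("STORE", 9),   # 6: counter = acc
    ("JUMP", 0),    # 7: back to the start of the loop
    ("HALT", 0),    # 8: stop
    5,              # 9: counter (data)
    0,              # 10: total (data)
    1,              # 11: constant one (data)
]

result = run(list(program))
print(result[10])  # prints 15
```

Because STORE can address any cell, a program in this sketch could also overwrite its own instructions, which corresponds to the rule that instructions can be modified like data.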

Properties

Advantages

The strictly sequential execution of a Von Neumann architecture is its decisive advantage over other, parallel architectures (e.g. computer networks, Harvard architecture) and the reason for the unbroken popularity of this architecture. From the programmer's point of view, a simple, deterministic program sequence is guaranteed; race conditions and data incoherence are ruled out by the single bus via which the CPU accesses both data and program.

Von Neumann bottleneck

The Von Neumann bottleneck describes the reduction in processor performance caused by competing data and instruction accesses over a common bus. More broadly, the term also describes the concept responsible for this issue, that of “only one thing at a time” (original: one-word-at-a-time thinking), i.e. the explicit, forced sequentialism imposed by the single bus through which all actions take place.

The term "Von Neumann bottleneck" was coined by John W. Backus, who introduced it in 1977 in his Turing Award lecture:

“Surely there must be a less primitive way of making big changes in the store than by pushing vast numbers of words back and forth through the von Neumann bottleneck. Not only is this tube a literal bottleneck for the data traffic of a problem, but, more importantly, it is an intellectual bottleneck that has kept us tied to one word-at-a-time thinking instead of encouraging us to think in terms of the larger conceptual units of the task at hand. Thus programming is basically planning and detailing the enormous traffic of words through the von Neumann bottleneck, and much of that traffic concerns not significant data itself, but where to find it.”

With the advent of separate caches for data and instructions, the Von Neumann bottleneck has become an academic problem. In modern processors, the decoupling of memory and arithmetic units via several cache levels has progressed so far that numerous instruction decoders and arithmetic units share the main memory without major loss of performance.

Von Neumann

In each memory cycle, the Von Neumann architecture allows either an instruction word to be read, a data word to be read, or a data word to be written. Instruction fetches thus compete with data reads and writes.

          CPU core
             ^
             v
          RAM-Ctrl

Harvard

The standard Harvard architecture allows an instruction word to be read at the same time as a data word is read or written. This permits a certain degree of parallelism in instruction processing. However, instructions consisting of several words, as well as read-modify-write accesses to data, prevent instructions from being processed within a single memory cycle. Instructions without data memory access are not accelerated compared with a Von Neumann architecture.

          CPU core
          ^      ^
          |      v
      RAM-Ctrl RAM-Ctrl

A classic standard Harvard architecture with strict separation of instruction and data buses is unusual except in special cases. Only finished programs held in non-volatile memory can be executed; loading programs at run time, or compiling programs and then executing them, is not possible.
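
The difference between the two bus arrangements can be sketched with a deliberately simplified cycle count. The model below is an assumption made for illustration: every memory access takes one cycle, a shared bus serializes all accesses, and separate buses let the instruction fetches and data accesses of one instruction overlap fully. Pipelining, caches and wait states are ignored, and the names `Instr`, `von_neumann_cycles` and `harvard_cycles` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    code_words: int     # instruction words that must be fetched
    data_accesses: int  # data reads or writes the instruction performs

def von_neumann_cycles(trace):
    # One shared bus: each fetch and each data access needs its own cycle.
    return sum(i.code_words + i.data_accesses for i in trace)

def harvard_cycles(trace):
    # Separate buses: fetches and data accesses of one instruction overlap,
    # so the longer of the two sequences determines the cycle count.
    return sum(max(i.code_words, i.data_accesses) for i in trace)

trace = [
    Instr(code_words=1, data_accesses=0),  # register-only operation
    Instr(code_words=1, data_accesses=1),  # simple load or store
    Instr(code_words=2, data_accesses=1),  # two-word instruction
    Instr(code_words=1, data_accesses=2),  # read-modify-write
]

print(von_neumann_cycles(trace), harvard_cycles(trace))  # prints 9 6
```

In this model the register-only instruction costs one cycle on both machines, while the two-word and read-modify-write instructions still need two cycles on the Harvard machine, which matches the limitations described above.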

Super Harvard

Super Harvard architectures are often found in DSPs that have two or four bus systems. Examples are the Motorola 56001 and the Texas Instruments TMS320.

           CPU core
       ^      ^      ^
       |      v      v
 RAM-Ctrl RAM-Ctrl RAM-Ctrl

It is also common to relax the separation of the bus systems, so that each bus can deliver both code and data. Collisions then reduce performance. In addition to instruction processing by the CPU core, memory accesses by other units such as DMA controllers and video controllers are common.

        CPU core + DMA-Ctrl
       ^      ^      ^      ^
       v      v      v      v
RAM-Ctrl RAM-Ctrl RAM-Ctrl RAM-Ctrl

1993

          CPU core
          two ALUs
       ^          ^     
       |          v     
      L1I        L1D     
       |          |      
       +-----+----+      
      RAM-Controller                                  

1997

          CPU core
       multiple ALUs
       ^         ^ |
       |         | v   
      L1I        L1D     
       |          |      
       +-----+----+ 
             L2
             |     
       RAM-Controller                                  

2008

     CPU core 1             CPU core 2             CPU core 3     ...     CPU core N
    multiple ALUs          multiple ALUs          multiple ALUs          multiple ALUs
  ^        ^ ^ |         ^        ^ ^ |         ^        ^ ^ |         ^        ^ ^ |
  |        | | v         |        | | v         |        | | v         |        | | v
 L1I        L1D         L1I        L1D         L1I        L1D         L1I        L1D 
  |          |           |          |           |          |           |          |  
  +----L2----+           +----L2----+           +----L2----+           +----L2----+  
       |                      |                      |                      |        
  +----L3---------------------L3---------------------L3---------------------L3-----+ 
  |                                                                                |
  +--------------------------------------+-----------------------------------------+
                                    RAM-Controller                                  

Dual socket server system

                                        Socket 1                                                                                        Socket 2
     CPU core 1             CPU core 2             CPU core 3     ...     CPU core N                 CPU core 1             CPU core 2             CPU core 3     ...     CPU core N
    multiple ALUs          multiple ALUs          multiple ALUs          multiple ALUs               multiple ALUs          multiple ALUs          multiple ALUs          multiple ALUs
  ^        ^ ^ |         ^        ^ ^ |         ^        ^ ^ |         ^        ^ ^ |             ^        ^ ^ |         ^        ^ ^ |         ^        ^ ^ |         ^        ^ ^ |          
  |        | | v         |        | | v         |        | | v         |        | | v             |        | | v         |        | | v         |        | | v         |        | | v          
 L1I        L1D         L1I        L1D         L1I        L1D         L1I        L1D             L1I        L1D         L1I        L1D         L1I        L1D         L1I        L1D           
  |          |           |          |           |          |           |          |               |          |           |          |           |          |           |          |        
  +----L2----+           +----L2----+           +----L2----+           +----L2----+               +----L2----+           +----L2----+           +----L2----+           +----L2----+           
       |                      |                      |                      |                          |                      |                      |                      |
  +----L3---------------------L3---------------------L3---------------------L3-----+              +----L3---------------------L3---------------------L3---------------------L3-----+
  |                                                                                +--------------+                                                                                |
  +--------------------------------------+-----------------------------------------+              +---------------------------------------+----------------------------------------+
                                    RAM-Controller                                                                                   RAM-Controller

Memory wall

Since a Von Neumann architecture, in contrast to the Harvard architecture, uses only one common bus for data and instructions, the two must share the maximum transferable data volume. In early computers, the CPU was the slowest unit in the computer, i.e. the time needed to provide the data was only a small proportion of the total processing time of an arithmetic operation. For some time, however, CPU processing speed has grown significantly faster than the data transfer rates of buses or memory, which exacerbates the effect of the Von Neumann bottleneck. The term “memory wall” describes this growing imbalance between the speed of the CPU and that of the memory outside the CPU chip.

From 1986 to 2000, CPU speeds grew by 55% annually, while memory transfer speeds increased by only 10% per year. As a result of this trend, memory latency has become the bottleneck in computing power. As a first measure, data registers were introduced early on. Today, a three-level cache takes up about half of the chip area in high-performance processors and handles the vast majority of load and store instructions without the main memory initially being involved.
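
To give a sense of scale, the growth figures can be compounded over the period. The sketch below assumes that both percentages are constant annual rates applied over the 14 years from 1986 to 2000; this is an interpretation of the figures, not a measured result, and the function name `compound_growth` is hypothetical.

```python
def compound_growth(annual_rate, years):
    """Total growth factor after `years` at a constant annual rate."""
    return (1.0 + annual_rate) ** years

years = 2000 - 1986                  # 14 years, assumed period of compounding
cpu = compound_growth(0.55, years)   # CPU speed factor
mem = compound_growth(0.10, years)   # memory transfer speed factor
gap = cpu / mem                      # how far the CPU has pulled ahead

print(f"CPU x{cpu:.0f}, memory x{mem:.1f}, gap x{gap:.0f}")
```

Under these assumptions the CPU becomes several hundred times faster while memory improves less than fourfold, so the relative gap grows by roughly two orders of magnitude.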

Compared to Harvard architecture

One of the most important competing architectures is the Harvard architecture, which physically separates instruction and data memories and accesses them via separate buses, i.e. independently and in parallel. The advantage of this architecture is that instructions and data can be loaded or written at the same time, so the Von Neumann bottleneck can be avoided.

The physical separation of data and program makes it easy to implement separate access rights and memory protection. If memory that is read-only during operation is used for the program code, overwriting it is impossible, even by malicious code. A disadvantage, however, is that unneeded data memory cannot be used as program memory (and vice versa), which results in increased memory fragmentation.
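
The memory disadvantage can be made concrete with a small sketch. The sizes are arbitrary illustrative numbers in memory words, and the function names `fits_unified` and `fits_split` are hypothetical; the point is only that a fixed split can reject a workload that a shared memory of the same total size would accept.

```python
def fits_unified(code_size, data_size, total):
    # Von Neumann style: code and data draw from one shared pool.
    return code_size + data_size <= total

def fits_split(code_size, data_size, code_capacity, data_capacity):
    # Harvard style: each kind of content is limited to its own memory.
    return code_size <= code_capacity and data_size <= data_capacity

# Assumed example: 64 words in total, either shared or split 32 / 32.
code_size, data_size = 8, 48

print(fits_unified(code_size, data_size, total=64))                        # prints True
print(fits_split(code_size, data_size, code_capacity=32, data_capacity=32))  # prints False
```

Here 24 words of program memory stay unused in the split arrangement, yet the data still does not fit.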

See also

  • Johnny simulator, a software implementation of a Von Neumann computer with ten predefined commands

Footnotes

  1. John von Neumann: First Draft of a Report on the EDVAC. In: IEEE Annals of the History of Computing. Vol. 15, Issue 4, 1993, pp. 27-75, doi:10.1109/85.238389.
  2. John Backus: Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs. In: Communications of the ACM. Vol. 21, No. 8, August 1978, p. 615.
  3. William A. Wulf, Sally A. McKee: Hitting the Memory Wall: Implications of the Obvious. In: Computer Architecture News. Vol. 23, Issue 1, March 1995, pp. 20-24, doi:10.1145/216585.216588.