Tomasulo algorithm

from Wikipedia, the free encyclopedia

The Tomasulo algorithm is an algorithm for implementing dynamic scheduling in processors . It was developed by Robert Tomasulo at IBM - originally for the floating point unit of the 360/91 .

classification

In order to increase the speed with which a processor executes instructions to be executed at a constant clock frequency , the execution of instructions is divided into several steps. As soon as an instruction has passed through a level, the next instruction can already enter this level, so that the processor always works on several instructions at the same time. This process is called pipelining , and the stations through which the commands pass are called stages.

If parts of the pipeline or the entire pipeline occur several times, one speaks of superscalarity . Since several instructions are in the pipeline at the same time, problems can arise due to dependencies between the instructions to be executed. A naive solution is to wait before processing the next commands. A smarter way to get around this is dynamic scheduling . The Tomasulo algorithm represents a concrete implementation of this method. B. scoreboarding .

strategy

The goal of the Tomasulo algorithm is to continue executing commands even when there are data dependencies . On the one hand, it handles read-after-write hazards (RAW) in that the processor keeps track of when an operand is available. On the other hand, it prevents write-after-write (WAW) and write-after-read hazards (WAR) by temporarily storing relevant register contents when decoding a command in so-called reservation stations and thus protecting them from premature overwriting.

Processor structure

Tomasulo's floating point unit

A processor that implements the Tomasulo algorithm contains the following components, among others:

  • Functional Units (FU): The functional units are processor components that perform logical / arithmetic calculations. There are usually several of these; they differ in the type of operations they can perform (floating point, integer, load / store etc.). In modern processors, five is a typical number for the number of drives.
  • Reservation Stations (RS): Two to eight reservation stations are assigned to each FU. These implement register renaming and are treated like temporary registers . A reservation station contains the opcode of the operation to be carried out , two fields for the values ​​of the operands and two fields for the addresses of the RSs that calculate the operands if they are not yet available at the current time or are not yet valid.
  • Register set : The register set contains a field for the address of an RS in addition to a field for the stored value for each register. An RS is entered here if it is still calculating the value of the register.
  • Common Data Bus : All FUs and RSs are interconnected by a common result bus. After a calculation has been completed, an FI places the address of the processed RS and the result on the bus. The RSs monitor the bus in order to directly accept the value of operands that are still required.

functionality

Each command to be executed passes through three stations.

  1. Issue : The command at the current position in the Operation Queue is decoded and entered in a suitable reservation station according to the operation to be carried out. Operands are taken directly from the register file if they are valid. This process is known as register renaming . If an operand is not yet available, the address of the RS that is currently calculating the value is entered instead. If no suitable RS is free, the command remains in the operation queue and the assignment is tried again in the next cycle.
  2. Execute : As soon as all operands are available in the reservation station, the operation is passed on to the FI and executed. Otherwise, the Common Data Bus is monitored for incoming values ​​and missing operands are accepted if the address of the source RS matches the required address.
  3. Write result : As soon as the result of the operation has been calculated, it is placed on the common data bus together with the address of the RS executed and thus visible to the RS, which are waiting for the result.

Other features

In addition to the above logic, the Tomasulo algorithm recognizes overlapping write commands on one and the same register and only executes the last one to update the register.

Individual evidence

  1. ^ John Hennessy, David Patterson: Computer Architecture. A Quantitative Approach., 4th Edition, Morgan Kaufmann Publishers, ISBN 978-0-12-370490-0 (English), p. 92

Web links