Tomasulo algorithm
The Tomasulo algorithm is an algorithm for implementing dynamic scheduling in processors . It was developed by Robert Tomasulo at IBM - originally for the floating point unit of the 360/91 .
classification
In order to increase the speed with which a processor executes instructions to be executed at a constant clock frequency , the execution of instructions is divided into several steps. As soon as an instruction has passed through a level, the next instruction can already enter this level, so that the processor always works on several instructions at the same time. This process is called pipelining , and the stations through which the commands pass are called stages.
If parts of the pipeline or the entire pipeline occur several times, one speaks of superscalarity . Since several instructions are in the pipeline at the same time, problems can arise due to dependencies between the instructions to be executed. A naive solution is to wait before processing the next commands. A smarter way to get around this is dynamic scheduling . The Tomasulo algorithm represents a concrete implementation of this method. B. scoreboarding .
strategy
The goal of the Tomasulo algorithm is to continue executing commands even when there are data dependencies . On the one hand, it handles read-after-write hazards (RAW) in that the processor keeps track of when an operand is available. On the other hand, it prevents write-after-write (WAW) and write-after-read hazards (WAR) by temporarily storing relevant register contents when decoding a command in so-called reservation stations and thus protecting them from premature overwriting.
Processor structure
A processor that implements the Tomasulo algorithm contains the following components, among others:
- Functional Units (FU): The functional units are processor components that perform logical / arithmetic calculations. There are usually several of these; they differ in the type of operations they can perform (floating point, integer, load / store etc.). In modern processors, five is a typical number for the number of drives.
- Reservation Stations (RS): Two to eight reservation stations are assigned to each FU. These implement register renaming and are treated like temporary registers . A reservation station contains the opcode of the operation to be carried out , two fields for the values of the operands and two fields for the addresses of the RSs that calculate the operands if they are not yet available at the current time or are not yet valid.
- Register set : The register set contains a field for the address of an RS in addition to a field for the stored value for each register. An RS is entered here if it is still calculating the value of the register.
- Common Data Bus : All FUs and RSs are interconnected by a common result bus. After a calculation has been completed, an FI places the address of the processed RS and the result on the bus. The RSs monitor the bus in order to directly accept the value of operands that are still required.
functionality
Each command to be executed passes through three stations.
- Issue : The command at the current position in the Operation Queue is decoded and entered in a suitable reservation station according to the operation to be carried out. Operands are taken directly from the register file if they are valid. This process is known as register renaming . If an operand is not yet available, the address of the RS that is currently calculating the value is entered instead. If no suitable RS is free, the command remains in the operation queue and the assignment is tried again in the next cycle.
- Execute : As soon as all operands are available in the reservation station, the operation is passed on to the FI and executed. Otherwise, the Common Data Bus is monitored for incoming values and missing operands are accepted if the address of the source RS matches the required address.
- Write result : As soon as the result of the operation has been calculated, it is placed on the common data bus together with the address of the RS executed and thus visible to the RS, which are waiting for the result.
Other features
In addition to the above logic, the Tomasulo algorithm recognizes overlapping write commands on one and the same register and only executes the last one to update the register.
Individual evidence
- ^ John Hennessy, David Patterson: Computer Architecture. A Quantitative Approach., 4th Edition, Morgan Kaufmann Publishers, ISBN 978-0-12-370490-0 (English), p. 92
Web links
- Original article: An Efficient Algorithm for Exploiting Multiple Arithmetic Units
- Introduction to computer architecture by Holger Kreißl: super scalarity
- WebHASE: Tomasulo's Algorithm: HASE Java applet simulation of the Tomasulo's Algorithm , Institute for Computing Systems Architecture, Edinburgh University.
- Dynamic Scheduling - Tomasulo