Simultaneous multithreading

from Wikipedia, the free encyclopedia

The term simultaneous multithreading (SMT) describes the ability of a microprocessor to execute several threads simultaneously using separate pipelines and/or additional register sets. SMT is therefore a form of hardware-based multithreading.

The currently best-known form of SMT is Intel's Hyper-Threading Technology (HTT), found in the Pentium 4, Xeon, Atom, Core i and later processors. Processors from other manufacturers also support SMT, for example IBM's Cell and its POWER processors from POWER5 and POWER6 onward, as well as AMD's processor families based on the Zen architecture, Ryzen and EPYC.

SMT was developed in the 1990s by Hank Levy and Susan Eggers, among others. Eggers received the Eckert-Mauchly Award for this work in 2018.[1] In the IEEE Computer Society's tribute to Eggers, SMT was named the most significant contribution to computer architecture of the past 30 years.

Functionality

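The aim of SMT is to make even better use of the processor resources that a pipelined architecture already provides several times over than pipelining alone can. A pipeline processes instructions from only one thread, so it can only parallelize those instructions within that thread that are independent of one another. SMT additionally draws on instructions from other threads; since each thread has its own register set, these instructions have no register dependencies on the first thread and can fill units that would otherwise sit idle.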
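To make "independent instructions" concrete, the following Python sketch finds the read-after-write (RAW) register dependencies within a single instruction stream, which are what limit how far a pipeline can overlap one thread's instructions. The tuple encoding of instructions and the function name `raw_dependencies` are hypothetical choices for illustration, and memory dependencies through load/store addresses are deliberately ignored. The instructions are Thread 2 of the dual SMT example in this article.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (assembly text, destination register or None, source registers).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]

# Thread 2 of the dual SMT example; ST writes memory, so it has no destination register.
THREAD2: List[Instr] = [
    ("LD R1,adr0", "R1", ()),
    ("OR R1,R1,0xF0", "R1", ("R1",)),
    ("LD R2,adr1", "R2", ()),
    ("ADD R3,R1,R2", "R3", ("R1", "R2")),
    ("ST R3,adr2", None, ("R3",)),
]


def raw_dependencies(instrs: List[Instr]) -> List[Tuple[int, int]]:
    """Return (producer, consumer) index pairs for read-after-write register dependencies."""
    last_writer = {}  # register name -> index of the most recent instruction writing it
    deps = []
    for i, (_, dest, srcs) in enumerate(instrs):
        for reg in dict.fromkeys(srcs):  # de-duplicate source registers, keep order
            if reg in last_writer:
                deps.append((last_writer[reg], i))
        if dest is not None:
            # Updated after the sources are read, so "OR R1,R1,..." depends on the older R1.
            last_writer[dest] = i
    return deps


for producer, consumer in raw_dependencies(THREAD2):
    print(f"{THREAD2[consumer][0]} depends on {THREAD2[producer][0]}")
```

In this thread only the two loads have no producer, so they could overlap; every other instruction must wait for an earlier result, which is exactly the gap SMT tries to fill with another thread's work.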

Example of dual SMT

The example assumes the following pipeline stages and two threads (IF = instruction fetch, ID = instruction decode, OF = operand fetch, EX = execution, WB = write back):

Pipeline stages: IF ID OF EX WB

Thread 1:
        LD  R6,adr4
        ADD R4,R6,1
        BEQ R4,R6,j1
        BR  j2
j1:     ADD R4,R4,1
j2:     ST  R4,adr6

Thread 2:
        LD  R1,adr0
        OR  R1,R1,0xF0
        LD  R2,adr1
        ADD R3,R1,R2
        ST  R3,adr2

Further assumptions:
Out-of-order issue and out-of-order completion.
Two instructions are issued per cycle.
If there is a data dependency, the first EX phase of the dependent instruction coincides with the WB phase of the instruction it depends on.

Execution units and their EX phases:
2 integer units: EX1, EX2
1 jump unit: EX1, EX2
1 store unit: EX1, EX2
1 load unit: EX1, EX2, EX3, EX4, EX5
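The dependency rule above can be sketched in Python as a function that computes the cycle of each stage for one instruction. This is a simplified model under stated assumptions: five EX phases for loads (as in the load rows of the timing table below) and two for the other units, IF/ID/OF taking one cycle each, and a wait for an operand modelled as a gap between OF and EX1. Issue width, unit conflicts and branches are ignored. The names `EX_PHASES` and `stage_cycles` and the chosen fetch cycles are hypothetical, not read off the table.

```python
from typing import Dict, Optional

# Assumed EX phase counts: five for loads, two for integer, jump and store instructions.
EX_PHASES = {"LD": 5, "ADD": 2, "OR": 2, "BEQ": 2, "BR": 2, "ST": 2}


def stage_cycles(op: str, fetch: int, producer_wb: Optional[int] = None) -> Dict[str, int]:
    """Cycle number of each pipeline stage for one instruction.

    fetch       -- cycle of the IF stage
    producer_wb -- WB cycle of the instruction this one depends on (None if independent);
                   per the stated rule, the dependent's EX1 may coincide with that WB.
    """
    cycles = {"IF": fetch, "ID": fetch + 1, "OF": fetch + 2}
    ex1 = fetch + 3  # earliest EX1 without any dependency
    if producer_wb is not None:
        ex1 = max(ex1, producer_wb)  # wait until the producer's write-back cycle
    for k in range(EX_PHASES[op]):
        cycles[f"EX{k + 1}"] = ex1 + k
    cycles["WB"] = ex1 + EX_PHASES[op]
    return cycles


ld = stage_cycles("LD", fetch=1)                          # LD  R6,adr4
add = stage_cycles("ADD", fetch=2, producer_wb=ld["WB"])  # ADD R4,R6,1 needs R6
print(ld)
print(add)
```

Here the load writes back in cycle 9, so the dependent ADD cannot start EX1 before cycle 9 even though it was fetched one cycle after the load.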

For unconditional jumps, the target instruction can be fetched (IF) once the jump has passed its ID phase. For conditional jumps that are taken, the jump must complete its EX2 phase before the target instruction can be fetched. If a conditional jump is not taken, fetching of the following instruction can continue in parallel with the EX2 phase.
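These three fetch rules can be expressed as a small Python function. It assumes that "after the ID phase" means the cycle immediately following ID, and that "in parallel with EX2" means the same cycle as EX2; the function name `next_fetch_cycle` and its parameters are hypothetical.

```python
def next_fetch_cycle(kind: str, taken: bool, id_cycle: int, ex2_cycle: int) -> int:
    """Earliest IF cycle of the instruction that follows a jump in the same thread.

    kind      -- "unconditional" or "conditional"
    taken     -- whether a conditional jump is taken (ignored for unconditional jumps)
    id_cycle  -- cycle of the jump's ID phase
    ex2_cycle -- cycle of the jump's EX2 phase
    """
    if kind == "unconditional":
        return id_cycle + 1       # target is known once ID has finished
    if kind == "conditional":
        if taken:
            return ex2_cycle + 1  # target is known only after EX2
        return ex2_cycle          # fall-through fetch runs in parallel with EX2
    raise ValueError(f"unknown jump kind: {kind!r}")


print(next_fetch_cycle("unconditional", True, id_cycle=3, ex2_cycle=6))  # 4
print(next_fetch_cycle("conditional", True, id_cycle=3, ex2_cycle=6))    # 7
print(next_fetch_cycle("conditional", False, id_cycle=3, ex2_cycle=6))   # 6
```

The cycles a thread loses while waiting on a jump are slots that, under SMT, the other thread can use.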

Cycle            1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
LD  R6,adr4      IF ID OF EX1 EX2 EX3 EX4 EX5 WB
LD  R1,adr0      IF ID OF EX1 EX2 EX3 EX4 EX5 WB
LD  R2,adr1      IF ID OF EX1 EX2 EX3 EX4 EX5 WB
ADD R4,R6,1      IF ID OF EX1 EX2 WB
OR  R1,R1,0xF0   IF ID OF EX1 EX2 WB
BEQ R4,R6,j1     IF ID OF EX1 EX2 WB
ADD R3,R1,R2     IF ID OF EX1 EX2 WB
BR  j2           IF ID OF EX1 EX2 WB
ST  R3,adr2      IF ID OF EX1 EX2 WB
j1: ADD R4,R4,1  IF ID OF EX1 EX2 WB   (no jump to j1)
j2: ST  R4,adr6  IF ID OF EX1 EX2 WB

Application areas

Simultaneous multithreading is an inexpensive, albeit considerably less powerful, alternative to multi-core processors. However, the performance of an SMT processor can only be exploited effectively if the work has been split into several tasks that can largely be executed in parallel, whether by the operating system, the programmer or the compiler. Many modern applications have been designed this way for several years.

Differentiation

Simultaneous multithreading can therefore be placed between the pipeline architecture and the multi-core architecture.

Differentiation from pipeline architecture / superscalarity

SMT differs from the pipeline architecture in that it allows multiple threads to run at the same time. Not only are the processor's execution units, such as the ALU and FPU, replicated, but so are the register set and the instruction decoding. To the operating system, an SMT CPU usually appears as several independent processors.

The pipeline architecture executes instructions of the same program in parallel where possible; where dependencies prevent this, they are executed one after another. SMT executes the instructions of two or more threads (from one or more programs) in parallel where possible; where this is not possible, the threads take turns. This can be described as "multi-threaded superscalarity".

Both concepts aim to make better use of a CPU's various units by parallelizing instruction processing, so that programs run faster without raising the clock frequency or adding execution units. The degree of parallelism achieved with SMT is higher than or equal to that of the pipeline architecture, never lower.
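The "never lower" claim can be illustrated with a toy issue model in Python. Everything in it is a simplifying assumption rather than a description of a real processor: a fixed issue width, in-order issue within each thread, explicit per-instruction latencies, and a fixed priority in which thread 0 always picks its slots first. Because of that priority, thread 0 is scheduled exactly as if it ran alone, so additional threads can only fill slots that thread 0 leaves empty. The names `simulate`, `utilization` and `CHAIN` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical toy model: an instruction is (latency in cycles, indices of earlier
# instructions in the same thread whose results it needs).
Instr = Tuple[int, Tuple[int, ...]]


def simulate(threads: List[List[Instr]], width: int) -> Tuple[int, int]:
    """Issue instructions from all threads, at most `width` per cycle.

    Each thread issues in program order; thread 0 picks first, then thread 1, etc.
    Returns (cycles used, instructions issued).
    """
    if width < 1:
        raise ValueError("issue width must be at least 1")
    for prog in threads:
        for i, (_, deps) in enumerate(prog):
            if any(d < 0 or d >= i for d in deps):
                raise ValueError("dependencies must refer to earlier instructions")
    issued_at: List[List[Optional[int]]] = [[None] * len(t) for t in threads]
    next_idx = [0] * len(threads)
    total = sum(len(t) for t in threads)
    done = cycle = 0
    while done < total:
        slots = width
        for t, prog in enumerate(threads):
            while slots > 0 and next_idx[t] < len(prog):
                i = next_idx[t]
                ready = all(
                    issued_at[t][d] is not None and issued_at[t][d] + prog[d][0] <= cycle
                    for d in prog[i][1]
                )
                if not ready:
                    break  # this thread stalls; later threads may use the free slots
                issued_at[t][i] = cycle
                next_idx[t] += 1
                slots -= 1
                done += 1
        cycle += 1
    return cycle, total


def utilization(cycles: int, issued: int, width: int) -> float:
    """Fraction of issue slots that were actually filled."""
    return issued / (cycles * width)


# A chain of four dependent single-cycle instructions.
CHAIN: List[Instr] = [(1, ()), (1, (0,)), (1, (1,)), (1, (2,))]

alone = simulate([CHAIN], width=2)
smt = simulate([CHAIN, CHAIN], width=2)
print(alone, utilization(*alone, width=2))  # (4, 4) 0.5
print(smt, utilization(*smt, width=2))      # (4, 8) 1.0
```

With a single dependent chain only one of the two issue slots per cycle is used; a second, independent chain from another thread fills the remaining slot without delaying the first.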

Differentiation from multi-core architecture

SMT differs from the multi-core architecture in that the processors an SMT CPU reports to the system are not independent processors. With SMT, the virtual processors share the same execution units (ALU/FPU), whereas in a multi-core processor each core has its own execution units.

Both a two-thread SMT processor and a dual-core processor appear to the system as two processors. However, a dual-core processor really consists of two independent processors, each with its own execution resources and correspondingly full performance, whereas an SMT processor is a single processor with two or more hardware threads.
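On Linux, the difference between logical processors (what the system reports) and physical cores can be seen in `/proc/cpuinfo`, where x86 systems list a `physical id` and a `core id` for each logical processor. The following Python sketch counts both from text in that format; the function name `count_cpus` and the sample data (one socket, two cores, two hardware threads per core) are hypothetical, and the field layout is an assumption that may differ on other architectures.

```python
from typing import Tuple


def count_cpus(cpuinfo: str) -> Tuple[int, int]:
    """Return (logical processors, physical cores) from /proc/cpuinfo-style text.

    A physical core is identified by its ("physical id", "core id") pair; if those
    fields are missing, each logical processor is counted as its own core.
    """
    logical = 0
    cores = set()
    for block in cpuinfo.strip().split("\n\n"):  # one block per logical processor
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            cores.add((fields.get("physical id"), fields.get("core id", fields["processor"])))
    return logical, len(cores)


# Hypothetical excerpt: one socket, two cores, two hardware threads per core.
SAMPLE = """processor\t: 0
physical id\t: 0
core id\t: 0

processor\t: 1
physical id\t: 0
core id\t: 1

processor\t: 2
physical id\t: 0
core id\t: 0

processor\t: 3
physical id\t: 0
core id\t: 1
"""

print(count_cpus(SAMPLE))  # (4, 2)
```

On a real Linux machine the same function could be applied to the contents of `/proc/cpuinfo`; Python's `os.cpu_count()` likewise reports logical processors, so it counts each SMT hardware thread separately.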

References

  1. "Susan Eggers First Woman to Receive Highly Prestigious Computer Architecture Award", PR Newswire, June 5, 2018.