Simultaneous multithreading
Simultaneous multithreading (SMT) is the ability of a microprocessor to execute several threads simultaneously using separate pipelines and/or additional register sets. SMT is thus a form of hardware-based multithreading.
The best-known form of SMT is Intel's Hyper-Threading Technology (HTT), used in the Pentium 4, Xeon, Atom, and Core i series and newer processors. Processors from other manufacturers also implement SMT, for example Cell, IBM's Power processors (POWER5, POWER6, and later), and AMD's Ryzen and EPYC series from the Zen architecture onward.
SMT was developed in the 1990s by Hank Levy and Susan Eggers, among others. Eggers received the Eckert-Mauchly Award for this work in 2018. In its citation for Eggers, the IEEE Computer Society called SMT the most significant contribution to computer architecture in the last 30 years.
Functionality
The aim of SMT is to make better use of the processor resources that a pipeline architecture already provides in multiple copies than the pipeline architecture can on its own. A pipeline architecture only processes the instructions of a single thread, so it can only execute in parallel those instructions within that thread that are independent of one another. SMT fills the execution units that would otherwise sit idle with instructions from other threads.
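The effect described above can be illustrated with a deliberately simplified sketch. It assumes a toy machine that issues at most `width` instructions per cycle, issues each thread in order, and makes every result available one cycle after issue; the function name `issue_cycles` and the boolean encoding of dependencies are hypothetical and chosen only for this illustration, not taken from any real processor.

```python
def issue_cycles(threads, width=2):
    """Count the cycles a toy processor needs to issue all instructions.

    Each thread is a list of booleans: True means the instruction depends
    on the instruction directly before it in the same thread.
    Assumptions (toy model): in-order issue per thread, results are
    available one cycle after issue, no structural hazards.
    """
    next_idx = [0] * len(threads)
    cycles = 0
    while any(n < len(t) for n, t in zip(next_idx, threads)):
        cycles += 1
        free_slots = width
        issued_now = [0] * len(threads)
        progress = True
        while free_slots > 0 and progress:
            progress = False
            for i, thread in enumerate(threads):
                if free_slots == 0:
                    break
                if next_idx[i] >= len(thread):
                    continue  # this thread has finished
                # A dependent instruction must wait if its predecessor
                # was issued in this very cycle.
                if thread[next_idx[i]] and issued_now[i] > 0:
                    continue
                next_idx[i] += 1
                issued_now[i] += 1
                free_slots -= 1
                progress = True
    return cycles


# Two threads, each a chain of four mutually dependent instructions.
chain = [False, True, True, True]
one_after_another = issue_cycles([chain]) + issue_cycles([chain])
together = issue_cycles([chain, chain])
print(one_after_another, together)  # prints "8 4"
```

In this model a single dependent chain can fill only one of the two issue slots per cycle, whereas two such chains from different threads fill both. A thread of fully independent instructions already uses both slots on its own, so SMT adds nothing in that case.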
Example of dual SMT
The following pipeline stages and two threads are given (IF = instruction fetch, ID = instruction decode, OF = operand fetch, EX = execution, WB = write back):
IF | ID | OF | EX | WB
Thread 1:
LD R6,adr4
ADD R4,R6,1
BEQ R4,R6,j1
BR j2
j1: ADD R4,R4,1
j2: ST R4,adr6
Thread 2:
LD R1,adr0
OR R1,R1,0xF0
LD R2,adr1
ADD R3,R1,R2
ST R3,adr2
The following assumptions apply:
- Out-of-order issue and out-of-order completion
- 2 instructions per cycle
- If there is a data dependency, the first EX phase of the dependent instruction coincides with the WB phase of the preceding instruction.
- 2 integer units (EX1, EX2)
- 1 jump unit (EX1, EX2)
- 1 store unit (EX1, EX2)
- 1 load unit (EX1, EX2, EX3, EX4)
For an unconditional jump, the target instruction can be fetched (IF) as soon as the jump has completed its ID phase. A conditional jump that is taken must complete its EX2 phase before the target instruction can be fetched. If the jump is not taken, fetching of the next instruction can resume in parallel with the EX2 phase.
Pipeline occupancy over clock cycles 1 to 18 (stage sequence per instruction):
Label | Instruction | Stages | Note
 | LD R6,adr4 | IF ID OF EX1 EX2 EX3 EX4 EX5 WB |
 | LD R1,adr0 | IF ID OF EX1 EX2 EX3 EX4 EX5 WB |
 | LD R2,adr1 | IF ID OF EX1 EX2 EX3 EX4 EX5 WB |
 | ADD R4,R6,1 | IF ID OF EX1 EX2 WB |
 | OR R1,R1,0xF0 | IF ID OF EX1 EX2 WB |
 | BEQ R4,R6,j1 | IF ID OF EX1 EX2 WB |
 | ADD R3,R1,R2 | IF ID OF EX1 EX2 WB |
 | BR j2 | IF ID OF EX1 EX2 WB |
 | ST R3,adr2 | IF ID OF EX1 EX2 WB |
j1: | ADD R4,R4,1 | IF ID OF EX1 EX2 WB | No jump to j1
j2: | ST R4,adr6 | IF ID OF EX1 EX2 WB |
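The data dependencies that constrain the second thread of the example can be made explicit with a short sketch. It assumes that the first operand of LD, OR, and ADD is the destination register and that ST reads its first operand; the tuple encoding and the names `THREAD_2`, `raw_dependencies`, and `independent` are hypothetical helpers for illustration only.

```python
# Each entry: (instruction text, destination register or None, source registers).
# Assumption: the first operand is the destination, except for ST, which reads it.
THREAD_2 = [
    ("LD R1,adr0",    "R1", []),
    ("OR R1,R1,0xF0", "R1", ["R1"]),
    ("LD R2,adr1",    "R2", []),
    ("ADD R3,R1,R2",  "R3", ["R1", "R2"]),
    ("ST R3,adr2",    None, ["R3"]),
]


def raw_dependencies(thread):
    """Return read-after-write dependencies as (producer, consumer, register)."""
    last_writer = {}  # register name -> index of the latest instruction writing it
    deps = []
    for idx, (_text, dest, sources) in enumerate(thread):
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], idx, reg))
        if dest is not None:
            last_writer[dest] = idx
    return deps


def independent(thread):
    """Indices of instructions that do not wait for any earlier instruction."""
    consumers = {consumer for _, consumer, _ in raw_dependencies(thread)}
    return [i for i in range(len(thread)) if i not in consumers]


for producer, consumer, reg in raw_dependencies(THREAD_2):
    print(f"{THREAD_2[consumer][0]} waits for {THREAD_2[producer][0]} ({reg})")
print(independent(THREAD_2))  # prints [0, 2]
```

Under these assumptions only the two loads are free of dependencies, which is consistent with LD R2,adr1 appearing ahead of the dependent instructions in the table above. Every other instruction of the thread has to wait for a result, and those are the gaps that instructions from the first thread can fill.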
Application areas
Simultaneous multithreading is an inexpensive, albeit much lower-performance, alternative to multicore processors. However, the performance of an SMT processor can only be exploited effectively if several tasks are available to be processed in parallel, that is, if the operating system, the programmer, or the compiler has arranged the work so that it can largely be carried out in parallel. This has been the case for many modern applications for several years.
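What "arranged by the programmer" can look like is sketched below: the work is split into one chunk per logical processor and the chunks are handed to a pool of workers. The helper names `chunked` and `parallel_sum_of_squares` are hypothetical. A thread pool is used only to keep the sketch short; in standard CPython, CPU-bound Python code in threads is serialized by the global interpreter lock, so a real speedup would require a process pool or native code.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunked(n, parts):
    """Split range(n) into `parts` contiguous ranges of nearly equal size."""
    step, extra = divmod(n, parts)
    ranges = []
    start = 0
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        ranges.append(range(start, end))
        start = end
    return ranges


def parallel_sum_of_squares(n, workers=None):
    """Sum i*i for i in range(n), one chunk per worker."""
    # os.cpu_count() reports logical processors, so SMT threads are included.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(lambda r: sum(i * i for i in r), chunked(n, workers))
        return sum(partial_sums)


print(parallel_sum_of_squares(1000, workers=4))
```

The point of the sketch is the structure: independent chunks with no shared state, so that the hardware threads an SMT processor offers can each be given useful work.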
Distinctions
Simultaneous multithreading can thus be placed between pipeline architecture and multi-core architecture.
Differentiation from pipeline architecture / superscalarity
SMT differs from the pipeline architecture in that it allows multiple threads to run at the same time. Not only are the processor's data processing units, such as the ALU and FPU, replicated, but also the register set and the instruction decoding. To the system, an SMT CPU usually appears as several independent processors.
The pipeline architecture executes instructions of the same program in parallel where possible. Where dependencies make this impossible, they are executed sequentially. SMT executes the instructions of two or more threads (from one or more programs) in parallel where possible. Where it is not possible, the threads take turns. (This can be referred to as "multi-threaded superscalarity".)
Both concepts try to make better use of a CPU's various units by parallelizing instruction processing, and thus to run programs faster without raising the clock frequency or the number of execution units. The degree of parallelism achieved with SMT is equal to or higher than that of the pipeline architecture alone, but never lower.
Differentiation from multi-core architecture
SMT differs from multi-core architecture in that the processors an SMT CPU reports to the system are not independent processors. With SMT, the virtual processors share access to the same data processing units (ALU/FPU), whereas in a multi-core processor each core has its own data processing units.
Both a two-thread SMT processor and a dual-core processor appear to the system as two processors. However, a dual-core processor really consists of two independent and correspondingly fast processors, while an SMT processor is a single processor with two or more hardware threads.
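Because both kinds of processor look the same at first glance, software sometimes needs to find out which logical processors share a core. The following sketch assumes a Linux system, where the kernel exposes this information in the sysfs files `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; the function names `parse_cpu_list` and `physical_cores` are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,4" or "0-3,8" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def physical_cores(sysfs=Path("/sys/devices/system/cpu")):
    """Group logical CPUs into cores; each tuple holds the hardware threads of one core.

    Assumption: Linux sysfs layout. On other systems the result is empty.
    """
    cores = set()
    for entry in sysfs.glob("cpu[0-9]*/topology/thread_siblings_list"):
        cores.add(tuple(parse_cpu_list(entry.read_text())))
    return sorted(cores)


if __name__ == "__main__":
    for core in physical_cores():
        kind = "SMT" if len(core) > 1 else "no SMT"
        print(core, kind)
```

On a two-thread SMT processor each tuple would contain two logical CPU numbers, while on a dual-core processor without SMT there would be two tuples with one number each.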
Processors with SMT
- Intel x86
  - Intel Pentium 4 (Hyper-Threading)
  - Intel Xeon (Hyper-Threading)
  - Intel Atom (Hyper-Threading)
  - from the Nehalem microarchitecture: Intel Core i series
    - Intel Core i3 (except 8th generation)
    - Intel Core i5 (except quad-core models of the i5)
    - Intel Core i7
    - Intel Core i9
- Power architecture, IBM and partners
- SPARC architecture, originally Sun
  - Sun UltraSPARC T1
  - Sun UltraSPARC T2
  - Sun Rock
- AMD x86
- XMOS
See also
- Hardware multithreading
- Hyper-threading
- Concurrency
- Parallelization
- Pipeline (processor)
- Multicore processor
References
- Susan Eggers First Woman to Receive Highly Prestigious Computer Architecture Award, PR Newswire, June 5, 2018