Processors with hardware multithreading (also called multithreaded processors) can run several processes in parallel on each processor core, i.e. execute several programs or program parts at the same time. The benefit is better utilization of the execution units of each CPU and a faster response of the overall system to external events, since more tasks can be processed in parallel.
The main cost of hardware multithreading comes from providing several register files. In the first Intel Xeons with Hyper-Threading (HT), this accounted for about 5 percent of the processor area.
In contrast to execution on several processors (in which processes generally only have to share main memory, I/O and the L3 cache, and possibly also power consumption and heat dissipation), processes running under hardware multithreading have to share considerably more resources, including the L1 cache, L2 cache, µOp cache, branch predictor, instruction fetcher, instruction decoder, physical register file, reservation station, load and store buffers, TLBs, arithmetic logic units and floating-point units. Since the execution units of a processor core are seldom fully utilized, hardware multithreading yields, according to Intel, up to 30 percent more computing power than a processor without it. In practice, the gain is rarely more than 15 percent. With memory-intensive applications, cache thrashing can even make the gain negative.
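The percentages above describe the combined throughput of a core, not the speed of an individual thread. The following Python sketch illustrates the arithmetic under the simplifying assumption that the threads share the core's throughput equally; the function name and the negative sample value are hypothetical and only serve as an illustration.

```python
def per_thread_speed(total_gain, threads_per_core=2):
    """Speed of each hardware thread relative to running alone on the core.

    total_gain: combined throughput gain of the core, e.g. 0.30 for +30 %.
    Assumption (not from the article): all threads get an equal share.
    """
    return (1.0 + total_gain) / threads_per_core


# 0.30 and 0.15 are the figures quoted in the text; -0.05 is a made-up
# value standing in for a cache-thrashing workload with a negative gain.
for gain in (0.30, 0.15, -0.05):
    share = per_thread_speed(gain)
    print(f"gain {gain:+.0%}: each thread runs at {share:.1%} of its solo speed")
```

With a 30 percent gain, each of two threads therefore runs at about 65 percent of the speed it would reach alone, even though the core as a whole completes more work.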
There is no switching between the threads; the instructions of the threads are executed simultaneously. This avoids the dead times (10-30 ms) caused by operating-system context switches. For this purpose, each processor core has two architectural states, each including a register set, stack, program counter and MMU.
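On Linux, each architectural state of a core appears to the operating system as a separate logical CPU. The sketch below groups logical CPUs by the core they belong to by reading the sysfs file `thread_siblings_list`. It assumes a Linux system that exposes CPU topology under `/sys/devices/system/cpu`; the helper names are hypothetical, and the function returns an empty result on other systems.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1,8-9" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def hardware_threads_per_core(sysfs_root="/sys/devices/system/cpu"):
    """Map each group of sibling logical CPUs (one core) to its thread count.

    Assumes the Linux sysfs topology layout; returns {} if it is missing.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    groups = set()
    for path in root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return {group: len(group) for group in groups}


if __name__ == "__main__":
    for group, count in sorted(hardware_threads_per_core().items()):
        print(f"core with logical CPUs {group}: {count} hardware thread(s)")
```

On a core with two architectural states, each group would be expected to contain two logical CPUs.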
Processors that process a large number of threads per core simultaneously are called barrel processors. Examples are the XMOS XCore XS1-L1 with 8 threads per core, the CDC 6000 with 10 threads per core and the Cray Tera MTA with 128 threads per core.
Benefits of multithreading
The benefit of multithreading is a better utilization of the resources of the CPU.
Modern processors, all of which are pipelined, already try to increase utilization through techniques such as out-of-order execution, but studies have shown that many parts of the pipeline can be utilized even better with, for example, simultaneous multithreading. One reason for this is pipeline hazards, which can stall the pipeline for a short time.
Processors capable of multithreading therefore process several threads quasi-simultaneously. This can be done in different ways:
- Simple multithreading: The processor switches between the different threads every cycle.
- Simultaneous multithreading (SMT): Areas of the register file are reserved for a small processor context on which a thread is executed. Threads executed in parallel thus share the same execution units. This results in better utilization, especially on superscalar processors.
- Core MultiThreading (CMT): The processor has several arithmetic logic units (ALUs) that share a floating-point unit (FPU). This means that several threads can be executed simultaneously.
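The effect of switching threads every cycle can be illustrated with a toy simulation. The following Python sketch models a hypothetical single-issue core in which every instruction may force its thread to wait a number of stall cycles (standing in for a pipeline hazard) before the next one can issue. Each cycle, the core picks the next ready thread in round-robin order. The model, the function name and the sample instruction streams are assumptions for illustration, not a description of any real processor.

```python
def run_interleaved(threads):
    """Simulate a single-issue core that picks a ready thread every cycle.

    threads: list of instruction streams; each stream lists, per instruction,
             how many stall cycles the thread must wait after issuing it.
    Returns (cycles_until_last_issue, instructions_issued).
    """
    next_instr = [0] * len(threads)   # next instruction index per thread
    ready_at = [0] * len(threads)     # first cycle a thread may issue again
    total = sum(len(stream) for stream in threads)
    issued = 0
    cycle = 0
    last = -1                         # thread that issued most recently
    while issued < total:
        # Round-robin search, starting after the thread that issued last.
        for offset in range(1, len(threads) + 1):
            t = (last + offset) % len(threads)
            if next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                stall = threads[t][next_instr[t]]
                next_instr[t] += 1
                ready_at[t] = cycle + 1 + stall
                issued += 1
                last = t
                break
        # If no thread was ready, this cycle is an idle pipeline slot.
        cycle += 1
    return cycle, issued


stream = [0, 2, 0, 2]                 # every second instruction stalls 2 cycles
alone, _ = run_interleaved([stream])
both, _ = run_interleaved([stream, stream])
print("one thread alone:", alone, "cycles")
print("two threads interleaved:", both, "cycles")
print("two threads one after the other:", 2 * alone, "cycles")
```

In this toy model, the stall cycles of one thread are filled with instructions from the other, so two interleaved copies of the sample stream finish in 9 cycles instead of the 12 cycles needed when they run one after the other.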
Multithreaded processors for embedded applications
In embedded systems, multithreaded processors offer additional possibilities beyond the pure performance gain through the explicit use of multithreading. Such processors offer programmable algorithms for controlling the individual program threads (called contexts here). For example, a context can deterministically occupy a fixed proportion of the clock cycles and thus of the processor performance. The contexts can also compete for computing time under priority control.
Contexts can be put into a wait state by appropriate instructions and woken up by hardware events. This enables the system to react very quickly since, in contrast to classic hardware interrupts, the context switch incurs no overhead.
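The combination of fixed clock shares, priority control and event-driven wake-up can be sketched as a small simulation. The Python example below assumes a hypothetical scheduler with a repeating slot table: a slot either belongs to one context or is free, and free or unused slots go to the ready context with the highest priority. The slot-table design, the context names and the `wake_at` parameter are illustrative assumptions, not the mechanism of any particular processor named in this article.

```python
from collections import Counter


def schedule(slot_table, priorities, cycles, wake_at=None):
    """Return the context that executes in each clock cycle.

    slot_table: repeating list; an entry names the context owning the slot,
                or None for a slot that is assigned by priority.
    priorities: dict mapping context name to priority (larger wins).
    wake_at:    dict mapping a waiting context to the cycle in which a
                hardware event wakes it; contexts not listed are ready.
    """
    wake_at = wake_at or {}
    trace = []
    for cycle in range(cycles):
        ready = [c for c in priorities if wake_at.get(c, 0) <= cycle]
        owner = slot_table[cycle % len(slot_table)]
        if owner in ready:
            trace.append(owner)            # deterministic share of the clock
        elif ready:
            trace.append(max(ready, key=lambda c: priorities[c]))
        else:
            trace.append(None)             # nothing ready: idle cycle
    return trace


table = ["io", None, None, None]           # "io" owns every fourth cycle
prio = {"main": 2, "net": 1, "io": 0}

print(Counter(schedule(table, prio, 8)))
print(schedule(table, prio, 8, wake_at={"io": 4}))
```

In the second call, the `io` context waits until a (simulated) hardware event in cycle 4 and executes in that same cycle, because its slot is reserved and no register state has to be saved or restored first.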
Contexts that occupy only a small part of the clock cycles can run cyclically all the time without noticeably affecting system performance. This can be used, for example, to generate or decode fast signals.
Thus, processors capable of multithreading make it possible to dispense with dedicated hardware, additional processors or digital signal processors at little extra cost on the processor chip.
Since significantly more than two threads make sense in these applications, multithreaded embedded processors do not implement multiple pipelines; instead, the first pipeline stage decides which context is executed next.
Examples of multithreaded processors for embedded applications:
- Ubicom 3K (8 threads)
- Ubicom 5K family (10 threads)
- MIPS-34K family (5 threads)
- Innovasic fido 1100 (one main processor with 5 threads, 4 additional I/O processors)
- XMOS (up to 4 cores with 8 threads each)
See also
- Parallel computer
- Parallel programming
- Simultaneous multithreading
- Hyper-Threading, a specific implementation of hardware multithreading in x86 processors
- Hyper-Threading Technology Architecture and Microarchitecture (Memento of the original from September 23, 2015 in the Internet Archive) (Intel)
- The basics of hardware multithreading based on Intel Hyperthreading (Servermeile Technet - The Knowledge Database)
- Hyper-Threading Technology Architecture and Microarchitecture (Memento of the original from September 23, 2015 in the Internet Archive) (Intel White Paper)
- Performance Insights to Intel® Hyper-Threading Technology (Intel)
- IBM Knowledge Center. Retrieved February 11, 2020 (American English).