Terascale processor

Intel Tera-Scale is an Intel research project to develop a microprocessor with hundreds of cores. By analogy with multicore architectures, such an architecture is called "manycore".

The terascale processor is organized into units called tiles, most of which perform general computing tasks. The processor has approximately 100 million transistors, with each tile containing approximately 1.2 million. It was introduced in 2007.

Structure of the tiles

Each tile has a processing engine (PE) and a crossbar switch. The processing engine performs the arithmetic tasks using two floating-point multiply-accumulate (FMAC) units and a floating-point unit. In addition, the processing engine has 5 kB of local memory. The crossbar switch handles communication with the neighboring tiles.
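
The neighbor-to-neighbor communication described above can be illustrated with a small sketch. It assumes the tiles form a rectangular grid and that a message travels hop by hop, first along the row and then along the column. The grid dimensions, the function names, and the routing order are illustrative assumptions, not details taken from Intel's design.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (row, column) position of a tile in the assumed grid


def neighbors(tile: Tile, rows: int, cols: int) -> List[Tile]:
    """Return the tiles directly north, south, west and east of `tile`."""
    r, c = tile
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    # Tiles on the edge of the grid have fewer than four neighbors.
    return [(nr, nc) for nr, nc in candidates if 0 <= nr < rows and 0 <= nc < cols]


def route(src: Tile, dst: Tile) -> List[Tile]:
    """Hop-by-hop path from src to dst: along the row first, then the column.

    This dimension-ordered scheme is an assumption chosen for simplicity.
    """
    path = [src]
    r, c = src
    while c != dst[1]:
        c += 1 if dst[1] > c else -1
        path.append((r, c))
    while r != dst[0]:
        r += 1 if dst[0] > r else -1
        path.append((r, c))
    return path


if __name__ == "__main__":
    ROWS, COLS = 8, 10  # hypothetical grid size
    print(neighbors((0, 0), ROWS, COLS))
    print(route((0, 0), (2, 1)))
```

Every hop in the returned path goes from one tile to a direct neighbor, so each crossbar switch only needs links to adjacent tiles.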

[Figure: Basic circuit diagram of the processing engine]

Specialized tiles in the terascale

A few additional tiles are optimized for special tasks such as high-definition video processing, encryption, digital signal processing, physics acceleration, or 3D computer graphics. Within their respective task areas, these specialized tiles work more efficiently, that is, faster and with less energy, than non-specialized tiles.

Memory structure

One problem with the Terascale design is that the high number of cores makes the connection to memory very difficult: the data connection has to be shared, and access to memory has to be coordinated. Intel uses a hierarchical cache for this purpose. Each core has its own L1 cache of 16 kB to 64 kB. An L2 cache of 256 kB to 1 MB is shared by a small group of cores. The L3 cache is available to all core groups within the processor.
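
The sharing pattern of this hierarchy (private L1, L2 per core group, one L3 for all groups) can be modeled with a toy lookup. This is only a sketch: the latencies, the group size, and the fill policy are hypothetical values chosen for illustration, and cache capacity and eviction are ignored.

```python
from typing import List, Set, Tuple


class CacheLevel:
    """A toy cache level: a set of resident addresses with a fixed access cost."""

    def __init__(self, name: str, latency: int) -> None:
        self.name = name
        self.latency = latency          # hypothetical cost in cycles
        self.lines: Set[int] = set()    # capacity and eviction are not modeled


def build_hierarchy(num_cores: int, cores_per_group: int) -> List[List[CacheLevel]]:
    """Return, for each core, its lookup path [private L1, group L2, global L3]."""
    num_groups = (num_cores + cores_per_group - 1) // cores_per_group
    l3 = CacheLevel("L3", 40)
    l2s = [CacheLevel("L2", 12) for _ in range(num_groups)]
    return [
        [CacheLevel("L1", 3), l2s[core // cores_per_group], l3]
        for core in range(num_cores)
    ]


def lookup(address: int, path: List[CacheLevel], memory_latency: int) -> Tuple[str, int]:
    """Search the levels from nearest to farthest; return (hit location, total cost)."""
    cost = 0
    for i, level in enumerate(path):
        cost += level.latency
        if address in level.lines:
            # Copy the line into the nearer levels so the next access is cheaper.
            for nearer in path[:i]:
                nearer.lines.add(address)
            return level.name, cost
    # Missed everywhere: fetch from memory and fill every level on the path.
    for level in path:
        level.lines.add(address)
    return "memory", cost + memory_latency


if __name__ == "__main__":
    paths = build_hierarchy(num_cores=4, cores_per_group=2)
    for core in (0, 1, 2, 0):
        print(core, lookup(0x100, paths[core], memory_latency=200))
```

In this model, once core 0 has fetched an address, core 1 (same group) finds it in the shared L2, while core 2 (another group) finds it only in L3.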

In addition, the Terascale design uses an L4 cache made of DRAM, which is located not on the processor die but on a die of its own. The L4 die is either placed next to the processor in a multi-chip package (MCP) or stacked on top of it. Programs are also given a quality-of-service (QoS) priority so that memory can be reserved for important applications. A resource monitor determines dynamically how much memory an application may use, which allows the operating system to move applications into the most suitable cache units.
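
The article does not describe the policy the resource monitor applies. As a purely hypothetical illustration of QoS-based reservation, the sketch below serves requests strictly in priority order until the cache capacity is used up. The function name, the application names, and all sizes are assumptions.

```python
from typing import Dict, List, Tuple

# (application name, priority, requested cache in kB); higher priority is served first
Request = Tuple[str, int, int]


def allocate_cache(capacity_kb: int, requests: List[Request]) -> Dict[str, int]:
    """Grant cache to applications in descending priority order.

    Each application receives what it asked for, or whatever is left,
    whichever is smaller. This strict-priority policy is an assumption,
    not the documented behavior of Intel's resource monitor.
    """
    remaining = capacity_kb
    grants: Dict[str, int] = {}
    for name, _priority, wanted in sorted(requests, key=lambda r: r[1], reverse=True):
        granted = min(wanted, remaining)
        grants[name] = granted
        remaining -= granted
    return grants


if __name__ == "__main__":
    demo = [("video", 3, 512), ("background", 1, 512), ("crypto", 2, 256)]
    print(allocate_cache(1024, demo))
```

Because the monitor would run repeatedly, calling such a function again with updated requests is one way to picture the dynamic adjustment mentioned above.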

Speed

The terascale processor achieves more than one teraflops, that is, more than 10^12 floating-point operations per second. This is comparable to the ASCI Red supercomputer from 1996, which was built from 10,000 Pentium Pro processors clocked at 200 MHz and consumed a total of 500 kilowatts of electrical power.

Clock frequency   Core voltage   Power consumption   Data throughput   Computing power
(GHz)             (V)            (W)                 (Tbit/s)          (TFLOPS)
3.16              0.95            62                 1.62              1.01
5.1               1.2            175                 2.61              1.63
5.7               1.35           265                 2.92              1.81
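
The figures in the table can be cross-checked with a little arithmetic. The sketch below computes the energy efficiency at each operating point and compares the measured performance with a simple peak estimate. The tile count of 80 is an assumption that does not appear in this article, as is the convention that one FMAC unit completes two floating-point operations (a multiply and an add) per cycle; the two FMAC units per tile come from the tile description.

```python
# (clock in GHz, core voltage in V, power in W, throughput in Tbit/s, performance in TFLOPS)
MEASUREMENTS = [
    (3.16, 0.95, 62, 1.62, 1.01),
    (5.1, 1.2, 175, 2.61, 1.63),
    (5.7, 1.35, 265, 2.92, 1.81),
]

ASSUMED_TILES = 80   # assumption: not stated in the article
FMACS_PER_TILE = 2   # from the tile description
FLOPS_PER_FMAC = 2   # assumption: one multiply plus one add per cycle


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS per watt."""
    return tflops * 1000.0 / watts


def peak_tflops(clock_ghz: float, tiles: int = ASSUMED_TILES) -> float:
    """Peak performance if every FMAC unit completes its operations each cycle."""
    gflops = clock_ghz * tiles * FMACS_PER_TILE * FLOPS_PER_FMAC
    return gflops / 1000.0


if __name__ == "__main__":
    for clock, _volts, watts, _tbits, tflops in MEASUREMENTS:
        print(
            f"{clock:.2f} GHz: {gflops_per_watt(tflops, watts):.1f} GFLOPS/W, "
            f"peak estimate {peak_tflops(clock):.2f} TFLOPS, table {tflops:.2f} TFLOPS"
        )
```

Under these assumptions the peak estimate stays within about 0.02 TFLOPS of the table values, and the efficiency falls from roughly 16.3 GFLOPS per watt at 3.16 GHz to roughly 6.8 at 5.7 GHz, so the higher clock rates are bought with disproportionately more power.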

References

  1. J. Held, J. Bautista, S. Koehl: From a Few Cores to Many: A Tera-scale Computing Research Overview. (PDF) Intel, 2006.