Intel Nehalem microarchitecture

Nehalem (microarchitecture)
Manufacturer: Intel
Manufacturing process: 45 nm (Nehalem), 32 nm (Westmere)
Sockets: Socket 1156 (2 memory channels), Socket 1366 (3 memory channels)
L1 cache: 32 + 32 kB per core
L2 cache: 256 kB per core
L3 cache: 3/4/6/8/12 MB
Predecessors: Intel Core Solo (Yonah), Intel Core Duo (Yonah), Intel Core 2 (Allendale, Conroe, Merom, Kentsfield, Penryn, Wolfdale, Yorkfield)
Successors: Sandy Bridge (tock), Ivy Bridge (tick)
Block diagram: Nehalem micro-architecture

Nehalem is a microarchitecture developed by Intel. It is based in part on the Intel Core microarchitecture and replaced it in 2010. Processors based on the Nehalem architecture are the first Intel processors with an integrated memory controller. The first version of the Nehalem architecture was launched in November 2008 as a high-end desktop CPU (Bloomfield), sold as the Core i7 for Socket 1366 (X58).

It was replaced in 2011 by the Intel Sandy Bridge microarchitecture.

The name of this microarchitecture comes from Nehalem, a small coastal town in Oregon.

Innovations in the Nehalem architecture

A major innovation in this architecture is that the front-side bus (FSB), which connected processor and chipset in previous models, has given way to a point-to-point connection called QuickPath Interconnect (QPI), which is designed for high throughput and scalability, similar to AMD's switch to HyperTransport five years earlier. Another important innovation, likewise similar to AMD, is the connection of main memory via an integrated memory controller. This direct connection allows the processor to access memory with significantly lower latency. Together, these two measures eliminate the FSB bottleneck of the earlier Core processors. However, they also make new sockets necessary.

Simultaneous multithreading (SMT) is also implemented, which was already used in Pentium 4 processors under the name Hyper-Threading. The new implementation yields significantly more performance, partly because the resources available per thread have increased thanks to the four-wide superscalar design, in contrast to the three-wide superscalar design of the Pentium 4. With Intel's SMT, a four-core processor can process a maximum of eight threads simultaneously. For desktop applications, however, the benefit on a quad-core was questionable, since only software specially optimized for this purpose can keep so many performance-relevant threads running at the same time. Until then, even "normal" quad-cores had suffered from being exploited by very few programs and, owing to their lower clock rate per core, had in some cases been slower in applications than dual-core models. In servers, on the other hand, multiple processors, whether real or virtual, tend to be more useful, since several requests often have to be processed in parallel.

Unlike its predecessors, the Nehalem architecture has a three-level cache hierarchy similar to that of the AMD Phenom: in addition to its own L1 cache, each core has its own 256 KiB L2 cache, while all cores share a common L3 cache of up to 8 MiB. The L3 cache is inclusive, i.e. it always contains all data stored in the L1 or L2 caches. This simplifies the cache coherence protocol and reduces snooping traffic. In effect, Nehalem offers less cache per core than the Core 2, which most recently had up to 6 MiB for every two cores, but the benefit of such large caches is questionable: versions of the Core 2 with reduced caches often lost only minimal performance. In contrast to the predecessor processors, the L1 and L2 caches no longer consist of ordinary 6T SRAM cells but of 8T SRAM cells, which Intel hopes will save energy.
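Why an inclusive L3 cache reduces snooping traffic can be illustrated with a small toy model in Python. It is only a sketch under simplifying assumptions: the class `InclusiveL3`, its methods, and the idea of tracking a set of possible holder cores per cache line are hypothetical names and structures chosen for illustration, not a description of Intel's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class InclusiveL3:
    """Toy model: maps a cache-line address to the cores that may hold it."""
    lines: dict[int, set[int]] = field(default_factory=dict)

    def fill(self, addr: int, core: int) -> None:
        # Inclusion: a line loaded into a core's L1/L2 is also kept in the L3.
        self.lines.setdefault(addr, set()).add(core)

    def evict(self, addr: int) -> set[int]:
        # Evicting a line from the L3 requires invalidating the private copies
        # so that inclusion still holds; return the cores that must do so.
        return self.lines.pop(addr, set())

    def cores_to_snoop(self, addr: int, requester: int) -> set[int]:
        # L3 miss: inclusion guarantees that no private cache holds the line,
        # so no other core has to be snooped at all.
        if addr not in self.lines:
            return set()
        # L3 hit: only cores recorded as possible holders need a snoop.
        return self.lines[addr] - {requester}
```

In this model, a request that misses in the L3 produces no snoops, whereas a non-inclusive design would have to ask every other core whether it holds the line.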

The Power Control Unit (PCU), a kind of coprocessor for the processor's power management, and new power-gate circuits are intended to optimize the energy budget. On the one hand, this is meant to keep power consumption to a minimum in every load situation; on the other hand, it implements the so-called turbo mode, in which the processor is automatically clocked slightly higher when the load is concentrated on only a few threads, provided the processor's power budget allows it. Specifically: if two physical cores are unused and the TDP is not exceeded, the cores in use are clocked at least one multiplier step higher. If only one core is working, the clock increase of that core is even greater. The inactive cores are clocked down.

Other enhancements include SSE4.2, a further stage of the Streaming SIMD Extensions, and the fact that all four processor cores are housed on the same die.

Bloomfield

Underside of a Bloomfield processor (Core i7-940)

Bloomfield is the first Intel desktop processor with the memory controller integrated directly on the chip, as AMD has done since its K8-based processors. The processor has three memory channels, through which three identical memory modules can be used in parallel, similar to the dual-channel mode that was usual until then; it is also possible to use only two of the channels. The only supported memory type is DDR3 RAM, officially up to DDR3-1066; DDR2 RAM can no longer be used as an alternative. Communication with the northbridge takes place via the QuickPath Interconnect (QPI), which, depending on the model, provides a high-bandwidth point-to-point connection with 9.6–12.8 GB/s in each direction, similar to the HyperTransport used by AMD. QPI thus replaces the front-side bus, which in its basic form dates back to the early days of the x86 family. All processors with the Bloomfield core support Turbo Boost technology: if a single core is loaded, the multiplier of this core is increased by two steps while the other cores are clocked down; if two to four cores are loaded, their multipliers are increased by at most one step, provided that the workload does not already demand the processor's maximum TDP without this clock increase.
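The effect of the third memory channel can be estimated with a short calculation. The following Python sketch computes the theoretical peak memory bandwidth; it assumes the usual 64-bit (8-byte) data width per DDR3 channel, which is not stated in the text, and the function name is hypothetical.

```python
def peak_memory_bandwidth_gb_s(transfers_mt_s: float, channels: int,
                               bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    transfers_mt_s: data rate in megatransfers per second (1066 for DDR3-1066)
    channels:       number of populated memory channels
    bus_bytes:      bytes per transfer and channel (assumed: 8, i.e. 64 bits)
    """
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9


# DDR3-1066 with three channels versus two channels
triple = peak_memory_bandwidth_gb_s(1066, channels=3)
dual = peak_memory_bandwidth_gb_s(1066, channels=2)
print(f"triple-channel: {triple:.1f} GB/s, dual-channel: {dual:.1f} GB/s")
```

These are upper bounds only; the bandwidth achieved in practice is lower.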

Chipset for Bloomfield-based processors

Because of the processors' performance class, Intel offers only one expensive chipset variant, named "Intel X58". The chipset, developed under the code name Tylersburg, still has a classic design with northbridge and southbridge, in contrast to the later, cheaper processor series based on the Nehalem architecture.

There are no chipsets from other manufacturers for Bloomfield processors. Nvidia had announced in a press release that the SLI capability for platforms with the Core i7 processors would be realized via an additional chip on the X58. Ultimately, however, Intel and Nvidia were able to agree that Nvidia, like ATI, would offer a license for integrating multi-GPU technology into the X58 chipsets.

Errata of the Bloomfield-based processors

In its "Specification Update" dated November 12, 2008, Intel describes an erratum for the Core i7 concerning the translation lookaside buffer, which had already appeared in earlier processor series from Intel and AMD. Even before the market launch, Intel asked motherboard manufacturers to work around the error with a BIOS update. Intel assumes that the manufacturers have complied with this request.

Lynnfield

Underside of a Lynnfield processor (Core i5-750)

Lynnfield is a quad-core processor with a feature size of 45 nm that can execute up to eight threads simultaneously using simultaneous multithreading (deactivated on some models). As with Bloomfield, the memory controller sits on the processor die. In contrast to Bloomfield, Lynnfield's memory controller addresses the DDR3 memory in dual-channel mode. Also carried over from the Bloomfield series is the integrated Turbo Boost technology, which raises the clock frequencies of the individual cores by different amounts depending on their load without exceeding the TDP; the maximum clock increase is more pronounced than on Bloomfield. If a single core is used, the multiplier of this core is increased by five steps, and if two cores are used, by four steps, while inactive cores are clocked down; if three or four cores are used, their multipliers are increased by at most one step, provided that the core load does not already demand the processor's maximum TDP without this clock increase. Lynnfield also has the EIST power-saving function, which lowers the processor's clock rate and operating voltage when it is idle.

In contrast to Bloomfield, not only the memory controller but the entire northbridge, including the PCI Express controller, was integrated into the processor; communication with it continues to take place via the QuickPath Interconnect, now within the processor. Accordingly, only a southbridge is installed on the motherboard, connected via a Direct Media Interface (DMI), as is usual with Intel. All Lynnfield CPUs can be used with chipsets of the Intel 5 series. However, since the processor uses Socket 1156, it cannot be operated on the X58 motherboards intended for Bloomfield.
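The Turbo Boost rules described for Bloomfield and Lynnfield can be summarized as a lookup table. The following Python sketch uses the maximum multiplier steps given in this article; the names `TURBO_STEPS` and `max_turbo_multiplier` are hypothetical, the TDP check is reduced to a single boolean, and individual processor models may allow fewer steps than the maxima listed here.

```python
# Maximum multiplier steps by number of loaded cores, as described in the text.
TURBO_STEPS = {
    "Bloomfield": {1: 2, 2: 1, 3: 1, 4: 1},
    "Lynnfield": {1: 5, 2: 4, 3: 1, 4: 1},
}


def max_turbo_multiplier(core_name: str, base_multiplier: int,
                         active_cores: int, tdp_headroom: bool = True) -> int:
    """Return the highest multiplier the loaded cores may reach.

    tdp_headroom is a simplification: the real decision depends on power
    draw, current, and temperature, which are not modelled here.
    """
    if not tdp_headroom:
        # No headroom: the cores stay at their nominal multiplier.
        return base_multiplier
    return base_multiplier + TURBO_STEPS[core_name][active_cores]


# Example with a hypothetical base multiplier of 20
print(max_turbo_multiplier("Lynnfield", 20, active_cores=1))
print(max_turbo_multiplier("Bloomfield", 20, active_cores=3))
```

A table keeps the per-core-count rules in one place, so another core type could be added without changing the function.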

Intel abandoned the completion of the dual-core offshoot Havendale at an early stage in favor of Clarkdale.

Clarksfield

Clarksfield is identical in core design to Lynnfield. However, the CPU die is packaged in a chip package for the Socket 988. In addition, all processors are sold with a low TDP, as they are intended for the mobile market segment. The maximum possible clock rate of a core in turbo mode is even higher in relation to the standard clock rate than is the case with the desktop models based on Lynnfield. Ultimately, however, the utilization of this clock margin depends crucially on the cooling, since the Turbo Boost technology is also based on the temperature.

Westmere

Under the name "Westmere" (originally Nehalem-C), Intel has manufactured the Nehalem microarchitecture since the end of 2009 with a feature size reduced to 32 nm. The first chips of this type are the dual-core processors called Clarkdale. In contrast to the quad-core processors of the Nehalem architecture, these processors do not have the memory controller integrated on the CPU die; instead, it is housed in the CPU package but on a separate chip. The two chips do not communicate via an FSB again but via QPI, which, however, does not yield better memory access latencies than the FSB connection did. The new Westmere dual-cores therefore gain a speed advantage over dual-cores of the Core architecture only through simultaneous multithreading (SMT). With SMT disabled, the performance per clock cycle is similar to that of the older Core architecture. With SMT, the new dual-cores with their four-wide superscalar core design behave, in benchmarks with multithreaded software, similarly to triple-core models with a three-wide superscalar core design.

In the first half of 2010, six-core and four-core processors based on Gulftown were also presented. These Westmere CPUs do not differ in architecture from Bloomfield CPUs; only production has been switched to the 32 nm process. This makes additional cores and more cache possible within the same TDP limits. In addition, as with the dual-core Westmere CPUs, seven instructions have been added, six of which are used for AES encryption: these are AES-NI and CLMUL. The larger L3 cache also has slightly higher latencies than its predecessor.
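Whether a given processor exposes the Westmere instruction additions can be checked from software. The following Python sketch parses text in the format of the Linux file `/proc/cpuinfo`, where AES-NI is reported as the flag `aes` and the carry-less multiplication instruction as `pclmulqdq`. The function name is hypothetical, and the approach is limited to Linux-style input.

```python
def westmere_crypto_flags(cpuinfo_text: str) -> dict[str, bool]:
    """Report whether the 'aes' and 'pclmulqdq' CPU flags are listed.

    cpuinfo_text is expected in the format of Linux /proc/cpuinfo; only the
    first 'flags' line is evaluated, since all cores normally report the same.
    """
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Format: "flags<TAB>: fpu vme ... aes ..."
            _, _, value = line.partition(":")
            flags.update(value.split())
            break
    return {"aes": "aes" in flags, "pclmulqdq": "pclmulqdq" in flags}


sample = "processor\t: 0\nflags\t\t: fpu sse4_2 aes pclmulqdq\n"
print(westmere_crypto_flags(sample))
```

On a Linux system, the function could be called with the contents of `/proc/cpuinfo`, for example `westmere_crypto_flags(open("/proc/cpuinfo").read())`.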

Clarkdale

Block diagram: Clarkdale / Arrandale CPU

The CPU cores of Clarkdale are technically largely the same as those of Lynnfield but are manufactured in a 32 nm process. In contrast to Lynnfield, Clarkdale has only two cores, and the memory controller is not directly connected to the cores; it is located on the same die as the GPU core and is connected via QPI, which results in significantly higher latencies for the processor's memory accesses than on Lynnfield. The chip with the GPU core and the memory controller is manufactured in the 45 nm process and is housed in the same package as the CPU in all Clarkdale processors.

Because of the reduced feature size, Intel assigns the Clarkdale processors to a new processor generation with the code name "Westmere". As part of the Westmere generation, Clarkdale has seven new instructions (disabled on some models), six of which are dedicated to AES encryption.

Like Lynnfield, the Clarkdale processors are operated in socket 1156 and can be used on chipsets of the Intel 5 series. However, the integrated graphics unit cannot be used on mainboards with the P55 chipset, but only with the newer chipsets (H55, H57, Q57). The image signals from the integrated GPU are transmitted to the chipset via the "Flexible Display Interface" (FDI), whereby FDI is based on the DisplayPort standard.

Arrandale

Arrandale is technically the same as Clarkdale but has been optimized for use in notebooks. It is therefore intended for the PGA988 and BGA1288 sockets and has a reduced TDP, which is achieved in part through reduced clock rates and voltages. Starting with the Arrandale processors, notebooks also became WiDi-capable (Intel Wireless Display), provided a WLAN adapter from the Centrino series is also present.

Gulftown

On March 11, 2010, the first Core i7 processors based on Gulftown appeared. The first model was the i7-980X Extreme Edition with a clock frequency of 3.33 GHz. The Gulftown processors are Intel's first native six-core (hexa-core) processors. Since Gulftown is manufactured in the 32 nm process, Intel assigns it, like Clarkdale, to the Westmere generation. In terms of performance, Gulftown is positioned above Bloomfield; it is built for Socket 1366 and can be operated with the X58 chipset. Gulftown supports Hyper-Threading and also provides special functions for accelerated encryption in the form of the AES New Instructions (AES-NI).

Models

Desktop

Clarkdale

Dual-core processor

  • L1 cache: 32 + 32 KiB per core (data + instructions)
  • L2 cache: 256 KiB per core at processor clock
  • L3 cache: 4096 KiB at QPI clock
  • MMX, SSE, SSE2, SSE3, SSSE3, SSE4.2, Intel 64, EIST, XD-Bit, IVT, SMT. Activated only on Core i5 models: AES instructions, TXT. Exception: the Core i5-661 lacks TXT and Intel VT-d.
  • Dual-channel DDR3 memory controller, PCIe 2.0 controller and GPU connected via QPI
  • Socket 1156, Direct Media Interface (DMI) and Flexible Display Interface (FDI)
  • Operating voltage (VCore): 0.65–1.4 V
  • Power dissipation (TDP): 73–87 W
  • Release date: January 4, 2010
  • Manufacturing technology: 32 nm (45 nm for the GPU core with memory controller and PCIe controller)
  • Die size: 81 mm² with 383 million transistors for the CPU, and 114 mm² with 177 million transistors for the GPU core
  • Clock rates: 2.8–3.6 GHz
  • Models: Intel Core i3-530 to Intel Core i5-680

Lynnfield

Quad-core processor

  • L1 cache: 32 + 32 KiB per core (data + instructions)
  • L2 cache: 256 KiB per core at processor clock
  • L3 cache: 8192 KiB at QPI clock
  • MMX, SSE, SSE2, SSE3, SSSE3, SSE4.2, Intel 64, EIST, XD-Bit, IVT. Activated only on Core i7: SMT, TXT
  • Integrated dual-channel DDR3 memory controller; PCIe 2.0 controller with 16 lanes connected via internal QPI at 2.13–2.4 GHz (17.07–19.2 GB/s)
  • Socket 1156, DMI at 2.5 GT/s (full duplex, max. 10 Gbit/s per direction, 2 GB/s in total)
  • Operating voltage (VCore): 0.65–1.4 V
  • Power dissipation (TDP): 82–95 W
  • Release date: September 8, 2009
  • Manufacturing technology: 45 nm
  • Die size: 296 mm² with 774 million transistors
  • Clock rates: 2.66–3.06 GHz
  • Models: Intel Core i5-750 to Intel Core i7-880
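The DMI figures in the list above can be related to each other with a short calculation. The following Python sketch assumes a link of four lanes with 8b/10b line coding (8 payload bits per 10 transmitted bits); neither assumption is stated in the text, and the function name is hypothetical.

```python
def dmi_bandwidth(gt_per_s: float = 2.5, lanes: int = 4,
                  payload_bits: int = 8, line_bits: int = 10):
    """Return (raw Gbit/s per direction, usable GB/s per direction, usable GB/s total).

    Assumptions not taken from the text: four lanes and 8b/10b line coding.
    """
    raw_gbit = gt_per_s * lanes                           # bits on the wire
    usable_gbyte = raw_gbit * payload_bits / line_bits / 8  # payload bytes
    return raw_gbit, usable_gbyte, 2 * usable_gbyte       # full duplex doubles the total


raw, per_direction, total = dmi_bandwidth()
print(f"{raw:.0f} Gbit/s raw, {per_direction:.0f} GB/s per direction, {total:.0f} GB/s total")
```

Under these assumptions, 2.5 GT/s corresponds to 10 Gbit/s raw per direction and 2 GB/s of payload across both directions.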

Bloomfield

Quad-core processor

  • L1 cache: 32 + 32 KiB per core (data + instructions)
  • L2 cache: 256 KiB per core at processor clock
  • L3 cache: 8192 KiB at QPI clock
  • MMX, SSE, SSE2, SSE3, SSSE3, SSE4.2, Intel 64, EIST, XD-Bit, IVT, SMT
  • Integrated triple-channel DDR3 memory controller: support up to DDR3-1066
  • Socket 1366, QuickPath Interconnect at 2.4–3.2 GHz (9.6–12.8 GB/s in each direction, or 19.2–25.6 GB/s in total)
  • Operating voltage (VCore): 0.8–1.375 V
  • Power dissipation (TDP): 130 W
  • Release date: November 18, 2008
  • Manufacturing technology: 45 nm
  • Die size: 263 mm² with 731 million transistors
  • Clock rates: 2.66–3.33 GHz
  • Models: Intel Core i7-920 to Intel Core i7-975 Extreme Edition
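The QPI bandwidth figures in the list above follow from the link clock. The following Python sketch reproduces them under the assumption that QPI transfers data on both clock edges and carries 16 bits (2 bytes) of payload per transfer and direction; these parameters are not stated in the text, and the function name is hypothetical.

```python
def qpi_bandwidth_gb_s(clock_ghz: float, payload_bytes: int = 2,
                       transfers_per_clock: int = 2) -> tuple[float, float]:
    """Return (GB/s per direction, GB/s in total) for one QPI link.

    Assumptions not taken from the text: two transfers per clock cycle
    and 2 bytes of payload per transfer.
    """
    per_direction = clock_ghz * transfers_per_clock * payload_bytes
    return per_direction, 2 * per_direction


for clock in (2.4, 3.2):
    one_way, both_ways = qpi_bandwidth_gb_s(clock)
    print(f"{clock} GHz: {one_way:.1f} GB/s per direction, {both_ways:.1f} GB/s total")
```

With clocks of 2.4 and 3.2 GHz, this yields the 9.6–12.8 GB/s per direction and 19.2–25.6 GB/s in total quoted for Bloomfield.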

Gulftown

Six-core processor (Hexa-Core)

Mobile

Clarksfield

Quad-core processor

  • L1 cache: 32 + 32 KiB per core (data + instructions)
  • L2 cache: 256 KiB per core at processor clock
  • L3 cache: 6144–8192 KiB, partially deactivated on some models
  • MMX, SSE, SSE2, SSE3, SSSE3, SSE4.2, Intel 64, EIST, XD-Bit, IVT, SMT, TXT
  • Integrated dual-channel DDR3 memory controller; PCIe 2.0 controller connected via internal QPI
  • Socket PGA988, DMI at 2.5 GT/s (full duplex, max. 10 Gbit/s per direction, 2 GB/s in total)
  • Operating voltage (VCore): 0.65–1.4 V
  • Power dissipation (TDP): 45–55 W
  • Release date: September 23, 2009
  • Manufacturing technology: 45 nm
  • Die size: 296 mm² with 774 million transistors
  • Clock rates: 1.6–2.13 GHz
  • Models: Intel Core i7-720QM to i7-940XM Extreme Edition

Arrandale

Dual-core processor

  • L1 cache: 32 + 32 KiB per core (data + instructions)
  • L2 cache: 256 KiB per core at processor clock
  • L3 cache: 4096 KiB (not fully activated on some models)
  • MMX, SSE, SSE2, SSE3, SSSE3, SSE4.2, Intel 64, EIST, XD-Bit, IVT, SMT. From the Core i5-520M upward, AES instructions and TXT are also activated.
  • Dual-channel DDR3 memory controller, PCIe 2.0 controller and GPU connected via QPI
  • Socket PGA988, Direct Media Interface (DMI) and Flexible Display Interface (FDI)
  • Socket BGA1288, Direct Media Interface (DMI) and Flexible Display Interface (FDI)
  • Operating voltage (VCore): not specified
  • Power dissipation (TDP): 18–37 W
  • Release date: January 4, 2010
  • Manufacturing technology: 32 nm (45 nm for the GPU core with memory controller and PCIe controller)
  • Die size: 81 mm² with 383 million transistors for the CPU, and an additional 114 mm² with 177 million transistors for the uncore area
  • Clock rates: 1.06–2.80 GHz
  • Models: Intel Core i3-330M to Intel Core i7-640M
