AMD bulldozer

from Wikipedia, the free encyclopedia

Bulldozer is a microarchitecture developed by AMD for x86 processors with 64-bit expansion and the successor to AMD K10 . The first Bulldozer-based processor models were presented under the brand name AMD FX in October 2011. The most important architectural feature is the so-called " Core Multithreading " (CMT), but some elements have also been taken from the AMD K10 architecture. The Bulldozer architecture including the optimization of the Piledriver is being replaced by the Steamroller architecture .

Architectural features

The completely newly developed Bulldozer is AMD's biggest change in microarchitecture since the introduction of AMD64 in 2003. In contrast to the AMD K10, Bulldozer is based on modules. A module has two 128-bit floating point units (FPUs), which can be combined to form a 256-bit floating point unit if required. The FPU has two integer clusters each with two ALUs and two AGUs ("address generation units"). For each module there is an L2 cache shared by all units of the module . Operating systems recognize a module as two logical processor cores . A Bulldozer The hosts up to four modules. This means that a maximum of eight threads can be processed at the same time. The largest bulldozer offshoot (Interlagos) consists of two silicon wafers and, with its 16 threads, does not quite come close to the then largest Intel Xeon (Haswell-EP), which with max. 18 cores and Hyper-Threading comes to 36 threads.


Block diagram of a bulldozer module

The clustered integer core architecture introduced by AMD at Bulldozer was originally developed by DEC and first presented in 1996 with the RISC CPU Alpha 21264 .

The module is a compromise between real dual-core, where all functional units of the processor core are available to each thread, and a single-core with simultaneous multithreading (SMT) . The concept saves space compared to the usual dual core. A module is divided into various single and double existing units, which also share some resources. It has two integer units and a 256-bit floating point unit, which can be split into two 128-bit FPUs if necessary. The fetch and decode units are also simply present and distribute the load between the respective units. One module has a 2 MB shared L2 cache, a 16 kB 4-way L1 data cache per integer cluster and a 64 kB 2-way L1 instruction cache. The two independent integer clusters are each equipped with two ALUs and two AGUs, which allows a maximum of four arithmetic and storage operations per module and cycle. Each module has two symmetrical 128-bit FMAC floating point pipelines which, if necessary, can be converted into a 256-bit wide unit and thus used for an FMA command . Unlike the Multiply Add command, FMA only rounds the result after the end of the complete calculation. All modules of a CPU share the possibly existing L3 cache and the dual-channel interface.

Instruction set extensions

With the Bulldozer micro-architecture, AMD supports various instructions such as Intel's AVX (" Advanced Vector Extensions "), SSE4.1, SSE4.2 , AES , CLMUL , as well as instructions developed by AMD (XOP, FMA4 ). The 3DNow! disappears for the first time in this generation.


AMD Bulldozer block diagram (8 core CPU)

Bulldozer-based microprocessors were initially only introduced to the market by AMD in 2011 in the “Enthusiast” series (as AMD FX ) and in the server sector (as AMD Opteron ). For use in servers, CPUs with two dies are sold under an Integrated Heatspreader (IHS) with the code name Interlagos (up to 16 threads) on the G34 socket, as well as CPUs with one die under the IHS with the code name Valencia (4 to 8 threads) on socket C32. Unlike the consumer versions, these are designed as LGA CPUs. All previous CPUs based on Bulldozer, including the current Piledriver revision, are manufactured at GlobalFoundries using the 32 nanometer SOI - HKMG process. A module of the Orochi die, which forms the basis for CPUs of the types Zambezi (FX series) and Valencia (Opteron series), contains approx. 213 million transistors on an area of ​​30.9 mm².


Piledriver is the name of the first revision of AMD's implementation of the clustered integer core architecture. It was presented in 2012 and should find its way into all areas of application: in the server segment, it continues to be the Opteron, in the APU segment under the code name Trinity and as a replacement for the first generation of FX CPUs.

In addition to improvements in jump prediction and the utilization of the pipelines , the following new features have been introduced:

  • Support of FMA3 , which was only introduced by Intel with the Haswell architecture
  • Instructions for bit (mask) manipulation: BMI1 (Intel-compatible) and TBM (AMD-specific)
  • Half precision floating point support: F16C
  • Revised and faster L2 cache
  • New clock mesh (only with Trinity version)
  • Twice the size of Level 1 TLB for data (64 instead of 32 entries)

In addition to Trinity , the Richland processor stepping introduced in 2013 is also based on Piledriver CPU cores.

Individual evidence

  1. The Bulldozer Architecture, article on the official start of the processors. In:
  2. Archived copy ( memento of the original dated August 29, 2014 in the Internet Archive ) Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice. @1@ 2Template: Webachiv / IABot /
  3. Archive link ( Memento of the original from October 20, 2014 in the Internet Archive ) Info: The archive link was inserted automatically and has not yet been checked. Please check the original and archive link according to the instructions and then remove this notice.  @1@ 2Template: Webachiv / IABot /

Web links