Intel Itanium

Production: 2001 to 2002
Manufacturer: Intel
Processor clock: 733 MHz to 800 MHz
FSB clock: 133 MHz
L3 cache size: 2 MiB to 4 MiB
Manufacturing process: 180 nm
Instruction set: IA-64, IA-32 (emulation)
Microarchitecture: Itanium
Socket: PAC418 "Slot M"
Processor core name: Merced

The Intel Itanium is a 64-bit microprocessor that was jointly developed by Hewlett-Packard and Intel and first came onto the market in 2001. The development goal was a high-performance architecture for the "post-RISC era" using a modified VLIW design. The Itanium's native instruction set is IA-64. Instructions of the older x86 processors can only be executed in a (very slow) firmware emulation mode. In addition, there are extensions that ease the migration of software developed for processors of the PA-RISC family. The successor is the Itanium 2.

Design

The post-RISC architecture of the Itanium design is called Explicitly Parallel Instruction Computing (EPIC) and is a variant of the VLIW architectures. The special feature of EPIC is that the CPU can load selected instructions in groups and execute them simultaneously, practically as if there were several completely independent CPUs. Bundling instructions so that they can be executed in parallel is a non-trivial task that the compiler has to solve as well as possible. The compiler and its optimization capabilities are therefore particularly important: the design shifts some of the complexity away from the CPU and into the compiler. Furthermore, like RISC processors, the CPU uses only a small number of instructions that can be executed very quickly. Like most modern CPUs, the Itanium has several parallel functional units, which is a prerequisite for EPIC. In the way instructions are loaded and passed to the functional units, the Itanium departs from the RISC philosophy through its explicitly parallel approach.

In a traditional superscalar design, complex decoding logic examines each instruction before it passes through the pipeline. This is called dynamic scheduling. The logic checks which instructions can be executed in parallel on different units. The instruction sequences A = B + C and D = E + F do not influence each other and can therefore be executed in parallel.

However, predicting which instructions can be executed at the same time is often complicated. The arguments of one instruction may depend on the result of another, but only if some other condition is also true. A slight modification of the above example leads to exactly this case: A = B + C; IF A == 5 THEN D = E + F. Here the two calculations are still independent of one another, but the second needs the result of the first in order to know whether it should be carried out at all.
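The two examples above can be sketched as a simple dependency check. This is only an illustration of the idea, not the decoding logic of any real processor; the Instr type and the independent function are hypothetical names, and the condition A == 5 is modeled by treating A as an additional source of the guarded instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction: a destination and the set of values it reads."""
    dest: str
    srcs: frozenset


def independent(first, second):
    """Return True if 'second' may run in parallel with 'first'."""
    raw = first.dest in second.srcs   # second reads what first writes
    war = second.dest in first.srcs   # second overwrites what first reads
    waw = first.dest == second.dest   # both write the same destination
    return not (raw or war or waw)


# A = B + C and D = E + F
add1 = Instr("A", frozenset({"B", "C"}))
add2 = Instr("D", frozenset({"E", "F"}))

# IF A == 5 THEN D = E + F: the guard makes A an extra input (assumption).
guarded = Instr("D", frozenset({"E", "F", "A"}))

print(independent(add1, add2))     # the plain additions do not conflict
print(independent(add1, guarded))  # the guarded addition must wait for A
```

The first pair can be issued together, while the guarded variant has to wait for A, which is exactly the situation in which a processor must either stall or guess.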

In these cases, a CPU employing dynamic scheduling attempts to predict the likely outcome of the condition using various methods. Modern CPUs achieve hit rates of around 90%. In the remaining 10% of cases, it is not only necessary to wait for the result of the first calculation, but also to flush and refill the entire pre-sorted pipeline. As a result, around 20% of the theoretical maximum computing power of the processor is lost.
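How a 10% miss rate can cost roughly a fifth of peak throughput can be illustrated with a rough back-of-the-envelope model. Only the 90% hit rate comes from the text; the share of branches among all instructions (one in five) and the misprediction penalty (12 cycles) are assumed values chosen for illustration, and fraction_lost is a hypothetical helper.

```python
def fraction_lost(branch_fraction, miss_rate, penalty_cycles, base_cpi=1.0):
    """Share of peak throughput lost to branch mispredictions.

    Each instruction costs base_cpi cycles; a mispredicted branch adds
    penalty_cycles on top while the pipeline is flushed and refilled.
    """
    effective_cpi = base_cpi + branch_fraction * miss_rate * penalty_cycles
    return 1.0 - base_cpi / effective_cpi


# miss_rate = 0.1 follows from the 90% hit rate in the text;
# branch_fraction and penalty_cycles are assumptions for this sketch.
lost = fraction_lost(branch_fraction=0.2, miss_rate=0.1, penalty_cycles=12)
print(f"{lost:.1%} of peak throughput lost")
```

Under these assumptions the loss comes out at a little under 20%; other branch densities or pipeline depths shift the figure accordingly.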

The Itanium approaches the problem very differently: it uses static scheduling, that is, it relies on the compiler for branch prediction. Although the compiler has a more complete overview of the program, it does not know the specific runtime conditions (use cases and parameters that are only determined at runtime). This runtime information, which is unknown to the compiler, can however be supplied through profile-guided optimization based on defined test runs. The results include, for example, how often each branch is taken (GCC offers the options -fprofile-arcs and -fbranch-probabilities for this purpose) and which functions are hot spots. The compiler can use this information to make decisions during translation of the program code that would otherwise have to be made on the chip at runtime. As soon as the compiler knows which paths are taken, it bundles instructions that can be executed in parallel into one larger instruction. This long instruction is written into the translated program, hence the name VLIW (Very Long Instruction Word).
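The bundling step can be sketched as a greedy, in-order packer. This is a simplified illustration under stated assumptions, not the algorithm of any actual IA-64 compiler: the bundle function, its width parameter and the sample program are hypothetical, and details of real IA-64 encoding such as template fields and stop bits are ignored.

```python
def bundle(instrs, width=3):
    """Pack (dest, sources) pairs into bundles of mutually independent instructions.

    A new bundle is started whenever the next instruction conflicts with one
    already in the current bundle, or the bundle has reached 'width' entries.
    """
    bundles, current = [], []
    for dest, srcs in instrs:
        conflict = any(
            d in srcs        # reads a value written in this bundle
            or dest == d     # writes the same destination
            or dest in s     # overwrites a value read in this bundle
            for d, s in current
        )
        if conflict or len(current) == width:
            bundles.append(current)
            current = []
        current.append((dest, srcs))
    if current:
        bundles.append(current)
    return bundles


program = [
    ("A", {"B", "C"}),   # A = B + C
    ("D", {"E", "F"}),   # D = E + F
    ("G", {"A", "D"}),   # G = A + D, needs both earlier results
    ("H", {"B", "E"}),   # H = B + E
]

for group in bundle(program):
    print([dest for dest, _ in group])
```

In this example the first two additions share a bundle, and G opens a new one because it needs A and D. The processor then only has to split each bundle and hand its parts to the functional units.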

Moving the problem of effective parallelization to the compiler has several advantages. First of all, the compiler can spend significantly more time examining the code. The chip does not have this advantage because it has to work as quickly as possible. Second, the prediction logic is quite complex, and the new approach can greatly reduce that complexity. The processor no longer has to examine the code, but only breaks down the VLIW instructions into smaller units, which it passes on to its functional units. The compiler can therefore get as much parallelism as possible from the program, and the processor can then make the most of it according to its capabilities (the number of parallel functional units).

The disadvantage of parallelization by the compiler is that the runtime behavior of a program does not necessarily follow from its source code. The compiler can therefore also make "wrong" decisions, theoretically more often than comparable logic on the CPU. The CPU has the advantage, for example, that it can remember within certain limits which branch was taken and how often, which the compiler cannot do without test runs. The Itanium design therefore relies heavily on the quality of the compiler. Hardware complexity in the microprocessor is exchanged for software complexity in the compiler.

Programs can be examined during execution by a so-called profiler, which collects data on the runtime behavior of the application. This information can be fed back into the compilation process (feedback-directed compilation or profile-guided optimization) in order to achieve better optimization. This technique is not new and has been used on other processors. The difficulty lies in using representative data. For synthetic benchmarks that regularly use the same data, profiler-based optimization is easy to apply and pays off.
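Why representative data matters can be shown with a toy model of profile-based static prediction. Everything here is hypothetical and chosen for illustration: the branch condition, the input lists and the function names do not correspond to any real profiler or compiler interface.

```python
from collections import Counter


def branch_taken(x):
    """Hypothetical branch condition in the profiled program."""
    return x >= 100


def profile(training_inputs):
    """Run the 'test runs' and return the static prediction (True = taken)."""
    counts = Counter(branch_taken(x) for x in training_inputs)
    return counts[True] >= counts[False]


def accuracy(prediction, inputs):
    """Share of inputs for which the fixed prediction is correct."""
    hits = sum(branch_taken(x) == prediction for x in inputs)
    return hits / len(inputs)


production = [5, 20, 150, 7, 3, 42, 8, 300, 1, 9]  # mostly small values
representative = [4, 10, 200, 6]                   # similar mix to production
skewed = [500, 250, 120, 2]                        # unlike production

print(accuracy(profile(representative), production))
print(accuracy(profile(skewed), production))
```

With the representative training set the fixed prediction is right for most production inputs; with the skewed set it is wrong for most of them, and nothing at runtime corrects it.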

Implementation

The development of the Itanium series began in 1994 and was based on Hewlett-Packard's basic research into VLIW technology. The result was a completely new, uncompromising VLIW processor, which however was not suitable for practical use (and was not intended for it). After Intel became involved in the development, several functions needed for commercialization were added to this "clean" processor, in particular the ability to execute IA-32 (x86) instructions. HP contributed features to ease migration from its in-house HP-PA architecture.

The Itanium was originally supposed to appear in 1997, but the schedule slipped several times until the first version, code-named Merced, was shipped in 2001. Clock speeds of 733 and 800 MHz and cache sizes of 2 or 4 MiB were offered; prices ranged from 1,200 to 4,000 US dollars. However, the performance of the new processor was disappointing: in IA-64 mode it was only marginally faster than an x86 processor with the same clock speed, and when it had to execute x86 code, performance dropped to about one eighth of that of a comparable x86 processor because of the emulation used. Intel then claimed that the first Itanium versions were not a "real" release.

The biggest (but not the only) problem with the Itanium is the high latency of its L3 cache, which greatly reduces the cache bandwidth that can actually be used. Intel was forced to integrate the L3 cache on the die in the next version. At the same time, the latencies of the primary and secondary caches were reduced to below those of IBM's Power4 processor, which had the lowest latencies at that time. In addition, the Itanium's front-side bus was expanded from 266 MHz at 64 bits to 400 MHz at 128 bits, so that system bandwidth tripled.
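The tripling of bandwidth follows directly from the bus figures. The short calculation below is a sketch that assumes the quoted 266 MHz and 400 MHz are effective transfer rates (transfers per second) and that peak bandwidth is simply rate times bus width; bus_bandwidth is a hypothetical helper.

```python
def bus_bandwidth(transfers_per_second, width_bits):
    """Peak bandwidth in bytes per second for a bus of the given width."""
    return transfers_per_second * width_bits // 8


old = bus_bandwidth(266_000_000, 64)    # 266 MHz at 64 bits
new = bus_bandwidth(400_000_000, 128)   # 400 MHz at 128 bits
ratio = new / old

print(f"old: {old / 1e9:.3f} GB/s")
print(f"new: {new / 1e9:.3f} GB/s")
print(f"ratio: {ratio:.2f}")
```

Doubling the bus width contributes a factor of two, and the higher transfer rate contributes the remaining factor of about 1.5.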

These problems were fixed or at least mitigated with the successor.

Problems

Shortly after the official introduction of the name on October 4, 1999, the nickname Itanic was coined. It alludes to the Titanic and thus compares the new processor with the fast steamer that was considered "unsinkable" yet collided with an iceberg and sank on its maiden voyage.

The Intel Itanium had two major problems to contend with from the start. The first was self-inflicted; the second was somewhat more surprising.

  • The first was the result of a serious and foreseeable wrong decision at Intel: not to offer native hardware execution of x86-32 code and instead to emulate it, albeit with some hardware assistance through suitable instructions (legacy drop). It was hoped, in vain, that all important programs would quickly be ported to the Itanium platform, but this happened only very slowly or not at all. Software that was mostly still available only as x86-32 code ran very slowly on Itanium computers. The emulation reached the speed of a 100 MHz Pentium at a time when, at a fraction of the price, the AMD Athlon XP ran at 1600 MHz, the Pentium III Tualatin at 1400 MHz and the Pentium 4 Willamette at 2000 MHz. Although various efforts were made to speed up x86 code execution, the Itanium generally remained too slow for this purpose. The relevance of this capability is debatable, since most customers do not buy Itanium systems to run x86 code. On the other hand, given the software available, Itanium systems could only be used as servers and not as general-purpose PC workstations. Intel planned to replace the emulation unit for x86 code with a JIT compiler, inspired by Digital's FX!32 for the Alpha processor, in the hope of achieving faster execution and reduced chip complexity. In practice, however, the ground had been scorched for the Itanium fairly quickly.
  • The second problem was the progress in CPU development in the late 1990s and early 2000s, fueled partly by the race between Intel and AMD and partly by the technological advances of the time. During the concept phase and the first implementations of the Itanium, conventional CPUs had gained so much in clock frequency (a factor of 20) and in efficiency (a factor of 2 to 5) within a few years that they had almost reached Itanium's target territory by the time it arrived there after several delays. In particular, the instruction set of a CPU became decoupled from the way code is actually executed, which reduced the basic concept of the Itanium to absurdity. In the end, conventional CPUs could even adapt themselves better to the given software (see out-of-order execution, register renaming, SIMD, speculative execution, branch prediction and prefetching) than the Itanium with its rigid compile-time optimization, for which everything about the target system had to be known, including main-memory access times.

Shifting hardware complexity into the compiler creates the problem, as already indicated, that for optimal performance software would have to be profiled and compiled on each target system with a compiler optimized for that system. This is impossible with closed-source software and expensive with open-source software. It can take months or years for complex application software to be converted to new compilers, successfully tested, delivered and finally put to use. With superscalar processors, by contrast, users usually benefit directly from hardware improvements. In both cases, this does not apply to improvements through new processor instructions, which can only be exploited by changing the software.

Sales forecasts: The sales figures targeted for 2000 were revised downwards over six years and never came close to being achieved.

The Itanium, designed as a new high-performance CPU, was already close to a dead horse when it arrived. However, it took Intel more than ten years to admit this. Development continued half-heartedly for over ten years, until 2012. The main development effort went into the then booming market for x86-64 CPUs, which is where most of the money was being made.

This process could possibly have been accelerated if the manufacturer had made suitable optimizing compilers, built with special knowledge of its own architecture, available freely and promptly. Programs distributed as source code and compiled on customer systems would have benefited in particular.

As a result of the Itanium developments, HP's Alpha processor and the PA-RISC architecture were to be phased out (support for these platforms was to be guaranteed for another five years from 2007); SGI has since discontinued its MIPS-based workstations in favor of Itanium.

Oracle Corporation announced in March 2011 that it would no longer support Itanium chips. HP was taken by surprise by this step and sued Oracle, arguing that contracts with Oracle provided for long-term support of the Itanium platform. HP prevailed in court; accordingly, Oracle must continue to offer software for Itanium.

Model data

Merced

  • Revisions C0, C1 and C2
  • L1 cache: 16 + 16 KiB (data + instructions)
  • L2 cache: 96 KiB on-die
  • L3 cache: 2 and 4 MiB, running at processor clock
  • IA-64, IA-32 emulation: MMX, SSE
  • PAC418
  • 64-bit bus with 133 MHz DDR (FSB266)
  • Operating voltage (VCore):
  • Power consumption (TDP): 114 W (2 MiB L3 cache) and 130 W (4 MiB L3 cache)
  • Release date: June 2001
  • Manufacturing technology: 180 nm
  • Die size: 300 mm² with 325 million transistors (300 million of which for the L3 cache)
  • Clock rates:
    • 733 MHz with 2 or 4 MiB L3 cache
    • 800 MHz with 2 or 4 MiB L3 cache
