x87

from Wikipedia, the free encyclopedia

x87 (also numeric processor extension, NPX) denotes a subset of the instruction set of the x86 architecture for floating-point calculations. It is the oldest instruction set extension for this architecture. Its instructions are not required to create working programs, but they implement common numerical tasks in hardware, which can then be performed much faster (10 to 30 times). Before processors could execute x87 instructions, compilers or programmers had to call slow software library routines to perform such floating-point operations. In many inexpensive embedded systems this is still often necessary. Alternatively, systems without a floating-point unit use fixed-point arithmetic, which can be implemented efficiently with integer arithmetic units.
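
The following C sketch is a rough illustration of the fixed-point alternative mentioned above. It stores numbers in an assumed Q16.16 format (16 integer bits and 16 fraction bits in a 32-bit integer) and uses only integer operations. The type and helper names (q16_16, q_mul and so on) are hypothetical and chosen for this example. Overflow handling and rounding are deliberately omitted.

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fraction bits. */
typedef int32_t q16_16;

#define Q16_ONE (1 << 16) /* the value 1.0 in Q16.16 */

static q16_16 q_from_int(int32_t n) { return (q16_16)(n * Q16_ONE); }

/* Addition needs no adjustment: it is plain integer addition. */
static q16_16 q_add(q16_16 a, q16_16 b) { return a + b; }

/* Multiplication: widen to 64 bits, then remove the extra 16 fraction bits.
   Integer division truncates toward zero. */
static q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * b) / Q16_ONE);
}

/* Division: pre-scale the dividend so the quotient keeps 16 fraction bits. */
static q16_16 q_div(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * Q16_ONE) / b);
}

/* Conversion to double, used only for displaying results. */
static double q_to_double(q16_16 a) { return (double)a / Q16_ONE; }
```

For example, q_mul(q_from_int(3), Q16_ONE / 2) yields 98304, the Q16.16 encoding of 1.5. Dividing by Q16_ONE instead of shifting right keeps the behavior well defined for negative values, at the cost of truncating toward zero.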

Up to the Intel 80386 and the i486SX, the x87 instructions were implemented by a separate coprocessor. This coprocessor had to be purchased separately and inserted into a socket provided on the motherboard. Compared to software emulation, floating-point calculation on an 80x87 FPU was 75 to 100 times faster.

Starting with the (more expensive) i486DX, later x86 processor generations usually had the FPU integrated into the main processor. The term x87 is still used for the subset of the instruction set that was originally executed by the x87 coprocessors. Since the introduction of SSE2, x87 units have lost much of their former importance. They remain important, however, for calculations that require a 64-bit mantissa, which the 80-bit x87 registers provide.
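
To make the 64-bit mantissa concrete, this C sketch places an IEEE double into the field layout of the x87 80-bit extended format. That format has one sign bit, a 15-bit exponent with bias 16383, and a 64-bit significand whose top bit is an explicit integer bit. The struct ext80 and the function double_to_ext80 are illustrative names, not part of any library. The sketch assumes that double is IEEE 754 binary64, and it ignores byte order, subnormals, infinities and NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an x87 80-bit extended-precision value (memory byte order ignored). */
typedef struct {
    uint64_t significand;   /* bit 63 = explicit integer bit, bits 62..0 = fraction */
    uint16_t sign_exponent; /* bit 15 = sign, bits 14..0 = exponent biased by 16383 */
} ext80;

/* Convert a normal (or zero) IEEE double into the 80-bit field layout.
   Subnormals, infinities and NaNs are not handled in this sketch. */
static ext80 double_to_ext80(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits); /* assumes double is IEEE 754 binary64 */
    uint16_t sign = (uint16_t)(bits >> 63);
    uint16_t exp  = (uint16_t)((bits >> 52) & 0x7FF);
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);

    ext80 r = {0, (uint16_t)(sign << 15)};
    if (exp == 0 && frac == 0)
        return r; /* signed zero */

    /* Rebias the exponent (1023 -> 16383), make the hidden integer bit explicit,
       and move the 52 fraction bits to the top of the 63-bit fraction field. */
    r.sign_exponent |= (uint16_t)(exp - 1023 + 16383);
    r.significand = (UINT64_C(1) << 63) | (frac << 11);
    return r;
}
```

A double carries only 53 significant bits, so after this conversion the lowest 11 bits of the significand are zero. An x87 register can use those bits to hold extra precision for intermediate results.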

Implementation

The x87 family does not use directly addressable registers like the main registers of the x86 processors; instead, the x87 registers form an eight-level stack that runs from st0 to st7. x87 instructions work by pushing values onto the stack, using them in calculations, and popping them off again. The x87 coprocessor therefore works much like pocket calculators designed for reverse Polish notation. However, binary (two-operand) operations such as FADD, FMUL and FCOM can either implicitly use st0 and st1, or use st0 together with another register or a memory operand. st0 can therefore serve as an accumulator (a register that is both a destination and an operand), and it can be swapped with any other stack register using the FXCH st(x) instruction. The x87 stack can thus be used as seven freely addressable registers plus an accumulator. This is particularly useful on superscalar x86 processors (such as the Pentium from 1993), where these exchange instructions are optimized so that they do not delay subsequent FPU instructions. To achieve this, each FXCH is executed not by the FPU unit that handles the subsequent floating-point operations but by a different execution unit.
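
The stack behavior described above can be modeled in a few lines of C. This is a simplified, hypothetical model. The eight registers are indexed relative to a TOP pointer, so st(i) is physical register (TOP + i) mod 8. The function names mirror the instructions FLD, FSTP, FADDP, FMULP and FXCH. However, the model stores double values instead of 80-bit ones and ignores the tag word, exceptions and stack overflow.

```c
/* Simplified, hypothetical model of the x87 register stack. */
typedef struct {
    double reg[8]; /* real hardware holds 80-bit values */
    int top;       /* index of the physical register that is currently st(0) */
} x87_model;

/* st(i) maps to physical register (top + i) mod 8. */
#define ST(f, i) ((f)->reg[((f)->top + (i)) & 7])

/* FLD: decrement TOP (mod 8), then store the value in the new st(0). */
static void fld(x87_model *f, double v) { f->top = (f->top + 7) & 7; ST(f, 0) = v; }

/* FSTP: read st(0), then pop by incrementing TOP (mod 8). */
static double fstp(x87_model *f) {
    double v = ST(f, 0);
    f->top = (f->top + 1) & 7;
    return v;
}

/* FADDP / FMULP without explicit operands: st(1) = st(1) op st(0), then pop. */
static void faddp(x87_model *f) { ST(f, 1) = ST(f, 1) + ST(f, 0); f->top = (f->top + 1) & 7; }
static void fmulp(x87_model *f) { ST(f, 1) = ST(f, 1) * ST(f, 0); f->top = (f->top + 1) & 7; }

/* FXCH st(i): exchange st(0) with st(i), making st(i) the new accumulator. */
static void fxch(x87_model *f, int i) {
    double t = ST(f, 0);
    ST(f, 0) = ST(f, i);
    ST(f, i) = t;
}
```

Evaluating (2 + 3) × 4 in reverse Polish order works as follows: push 2 and 3, apply FADDP, push 4, apply FMULP, then FSTP. FSTP returns 20 and leaves TOP where it started.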

The MMX extension of the x86 architecture, introduced with the Pentium MMX, uses the same physical registers as the floating-point unit. This simplified the market launch of MMX. Because no additional registers had to be saved on a task switch, operating systems needed no adjustments for MMX. It is up to the application program to switch the processor from x87 to MMX mode and back again. However, these mode switches are comparatively slow, so Intel and AMD took a different approach with later instruction set extensions (SSE and its successors).

IEEE compatibility

The x87 instructions are compatible with the IEEE 754 standard. The floating-point unit can process numbers in single precision (32 bits, float or real in most languages), double precision (64 bits, double) or full 80-bit extended precision (long double or extended). However, the processor uses the full 80 bits internally to maintain accuracy over many calculations. As a result, rounding does not follow the strict 32- and 64-bit IEEE 754 formats exactly unless a special precision mode is selected via the FPU control word. A sequence of arithmetic operations can therefore produce slightly different results than the strict IEEE 754 formats would.

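Differences in the result of a calculation chain can also arise solely from enabling optimization during compilation. An optimized build of a program may therefore deliver a (usually slightly) different result than a non-optimized build of the kind often used for debugging. The optimized code tends to keep intermediate results in the 80-bit registers, whereas the unoptimized code stores them to memory in 32- or 64-bit format after each step.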
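
The effect of keeping intermediate results in a wider format can be shown portably in C. Here float plays the role of the strict format and double plays the role of the wider x87 register. This is an analogy, chosen so that the result does not depend on which FPU the compiler targets. The function names are hypothetical. With a = 2^24 and b = 1, the strict version rounds a + b back to 2^24 before subtracting, while the wider version keeps the exact sum.

```c
/* Strict: the intermediate sum is rounded to float, much like a value spilled
   to memory in a narrower format. volatile forces the store to a float. */
static float sum_diff_strict(float a, float b) {
    volatile float s = a + b;
    return s - a;
}

/* Wide: the intermediate sum stays in a wider format (double here, standing in
   for an 80-bit x87 register) and is rounded to float only at the end. */
static float sum_diff_wide(float a, float b) {
    double s = (double)a + b;
    return (float)(s - a);
}
```

sum_diff_strict(16777216.0f, 1.0f) returns 0.0f, because 2^24 + 1 is not representable in float and rounds to 2^24. sum_diff_wide with the same arguments returns 1.0f.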

x87 coprocessors from Intel

8087

The 8087 was Intel's first math coprocessor for 16-bit processors (the 8231 was older but was designed for the 8-bit 8080); it was built to be used with the 8088 and 8086.

Processor core (die) of an early Intel 80287

80287

The 80287 (i287) was the math coprocessor for the Intel 80286 series. Intel and its competitors later introduced the 80287XL, which was actually an 80387SX with a pinout compatible with the 80287. The 80287XL included a 3:2 clock multiplier. On motherboards that clocked the coprocessor at only two-thirds of the CPU clock rate, this let the floating-point unit run at full CPU speed.
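
The clock relationship can be written as a short equation. Here f_CPU is the CPU clock and f_in is the clock the motherboard supplies to the coprocessor, assumed to be two-thirds of f_CPU as described above:

$$
f_{\text{FPU}} = \tfrac{3}{2}\, f_{\text{in}} = \tfrac{3}{2} \cdot \tfrac{2}{3}\, f_{\text{CPU}} = f_{\text{CPU}}
$$

Take a hypothetical 12 MHz CPU as an example. The socket would supply 8 MHz, and the 3:2 multiplier would bring the 80287XL's floating-point unit back to 12 MHz.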

The 80287 and 80287XL also worked with the 80386 and were the only coprocessors available for the 80386 until the 80387 was introduced in 1987. They could also be used with the Cyrix Cx486SLC. However, the 80387 was preferred for both processors for performance reasons and because of its more capable instruction set.

The following models of the 80287 were produced:

  • i80287-3 (6 MHz)
  • i80287-6 (6 MHz)
  • i80287-8 (8 MHz)
  • i80287-10 (10 MHz)
  • i80287-12 (12.5 MHz)
  • i80287XL (12.5 MHz, 387SX core)
  • i80287XLT (12.5 MHz, laptop version)

Processor core (die) of an Intel 80387DX 16-33

80387

The 80387 (387 or i387) was Intel's first coprocessor to be fully compliant with the IEEE 754 standard. When it was introduced in 1987, a full two years after the 80386, the i387 was significantly faster than the 80287 and had much improved trigonometric functions. Both the set of functions (FSIN, FCOS and FSINCOS were added) and the permitted argument ranges were expanded. FPATAN accepts any arguments for arctan(a/b) instead of requiring |a| ≤ |b|, and FPTAN accepts any arguments instead of requiring |x| ≤ π/4.

The i387 was manufactured in 1.5 µm CMOS III technology; its die measured 7 mm × 7.5 mm.

Versions

i387 micro-architecture with 16-bit barrel shifter and CORDIC unit

Three other versions of the i387 were later made:

i387DX

The i387DX was introduced in 1989 and was compatible only with the 386DX processor. It was produced in 1.0 µm CHMOS IV technology; its die measured 5.5 mm × 5.5 mm.

i387SX

The i387 was compatible only with the standard 80386, which had a 32-bit processor bus. The later, lower-cost i386SX had a narrower 16-bit data bus and could not interface with the 32-bit bus of the i387. The i386SX therefore required its own coprocessor variant, the i387SX, which was compatible with the SX's narrower bus.

Like the i387DX, the i387SX was manufactured in 1.0 µm CHMOS IV technology.

i387SL Mobile

This variant was designed specifically for the i386SL processor and was also produced in CHMOS IV technology. It was launched in 1992 and, like the i386SL, had integrated power management.

The i387DX and the i387SX could be operated with a clock that is asynchronous to the system clock (×0.8 to ×1.25).

80487

The i487 was sold as an FPU coprocessor for the i486SX, but it was essentially a complete i486DX chip. When installed in an i486SX system, the i487 disabled the main processor and took over all CPU operations. In theory, such a computer could keep working even with the i486SX removed. In practice, however, a pin on the i487 prevented it from being used as a full-fledged i486.

References

  1. 8087 Math Coprocessor (PDF). Intel, October 1989, p. 3. Accessed October 4, 2018.
  2. Steve Farrer: High Speed Numerics with the 80186/80188 and 8087. Intel Corporation, Application Note 258, 1986 (intel.com; PDF, 270 kB).
  3. David Monniaux: The pitfalls of verifying floating-point computations. To appear in ACM TOPLAS.
  4. GCC Bugzilla, bug 323: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323