Pentium FDIV bug


The FDIV bug is a hardware error in Intel's Pentium processor. It became known in November 1994, one and a half years after the market launch, and causes inaccurate results for floating-point divisions with certain, relatively few pairs of values. No other flaw in a CPU design has caused so much turmoil among users and professionals. As a result, many manufacturers now publish the hardware errors they discover, and many users have become aware that complex hardware, like software, typically contains numerous errors.

The term FDIV bug is derived from the name of a floating-point division instruction that is frequently used on x86 processors. Contrary to what one might assume, the error does not affect the FDIV instruction exclusively; all instructions that use the faulty division unit are affected. Specifically, these are: FDIV, FDIVP, FDIVRP, FIDIV, FIDIVR, FPREM, FPREM1, FPTAN and FPATAN.

Discovery of the bug

Prof. Thomas Ray Nicely of Lynchburg College is recognized as the discoverer, and it is thanks to him that the error became known to the general public. He discovered it while trying to calculate Brun's constant precisely. After several months of tracking down the error, he informed selected technical authors and journalists on October 30, 1994, thereby setting things in motion. Nicely summarized the detailed circumstances and the timeline of the discovery in an FAQ.

By its own account, Intel had already discovered the error before it became publicly known. Various sources name June, others August 1994, as the time of discovery. In his FAQ, Nicely says that Intel discovered the error in May while working on the floating-point unit of the Pentium's successor, the P6 (the later Pentium Pro), which was still in development at the time. Robert Colwell, chief architect of the Pentium Pro, confirmed this account in his book on the P6 development project. According to Colwell, the error was discovered about three months before Nicely's publication by the P6 floating-point architect Patrice Roussel during a comparative validation of the RTL simulation of the future P6 floating-point unit, and was passed on to the Pentium development team. Intel was therefore able to deliver a corrected Pentium version in sample quantities as early as October.

Size, frequency and impact of the error

The relative size of the error, when it occurred, was significantly less than one per thousand: at least the first 12 bits were always correct. The error also occurred rarely: of the 2.28 · 10^47 possible pairs of numerator and denominator, around 3 · 10^37 were affected. In particular, the reciprocals of all single-precision numbers were correct. Noticeable effects could occur in applications that required high accuracy and whose input data supported it, such as forecasting astronomical events, or in applications where poor conditioning of the problem amplified calculation errors.
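The two counts quoted above can be turned into a rough per-division probability. The following is a minimal sketch in Python that uses only those two figures. It assumes operand pairs are drawn uniformly at random, which is an assumption made here for illustration and does not describe real workloads; the function name `failure_probability` is hypothetical.

```python
from fractions import Fraction

# Approximate counts quoted in the text above.
TOTAL_PAIRS = Fraction(228, 100) * 10**47   # possible numerator/denominator pairs
AFFECTED_PAIRS = 3 * 10**37                 # pairs that give an inaccurate result

def failure_probability(affected=AFFECTED_PAIRS, total=TOTAL_PAIRS):
    """Chance that one division with a uniformly random operand pair is affected.

    Uniform sampling of operands is an illustrative assumption only.
    """
    return Fraction(affected) / Fraction(total)

p = failure_probability()
print(f"probability per random division: {float(p):.3e}")
print(f"roughly one in {float(1 / p):.2e} random divisions")
```

Under that assumption the result is on the order of one affected division in several billion. How often a particular user would actually hit the error depends on how many divisions are performed and on which operands, which is why the estimates discussed below differ so widely.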

Whether and to what extent normal users could be affected by the error was controversial in the weeks after the discovery. At that time, many users were still working with CPUs that had no floating-point unit, so common standard software made no use of a floating-point unit even when one was present.

Intel initially claimed that the error would statistically occur only once every 27,000 years for a normal user and that it was relevant only when generating prime numbers or performing other sophisticated calculations. The trade press in particular countered with other estimates. In its January 1995 issue, the German trade magazine c't determined an average frequency of one error every 60 hours in applications with a high share of floating-point operations, but at the same time admitted that “the number of those really affected should not actually be that large”.

IBM intervened with a publicity-grabbing halt to deliveries of computers with Pentium CPUs and calculated that the error could statistically occur as often as once every six hours. IBM's reactions and claims were not without controversy at the time: with its PowerPC CPU and RS/6000 workstations, IBM was one of Intel's fiercest competitors in the high-end sector, so many saw IBM's move as a strategic market maneuver.

Nicely defended Intel against such dramatic portrayals. In his FAQ, he took the position that a white paper circulated by Intel in the meantime, which contained a statistical analysis of the error, was much closer to reality than IBM's account.

Cause of the error

The CPU uses the SRT division method, which relies on lookup tables. In the faulty CPUs, five table entries were incorrect: the cells contained the value 0 instead of the value 2. Because the individual cells are read with a very uneven probability distribution, the error occurs only rarely, and only with certain combinations of numbers.
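To illustrate the digit-by-digit idea behind SRT division, here is a minimal radix-4 sketch in Python. It is not Intel's implementation: the function name `srt_radix4_divide` is hypothetical, exact rational arithmetic stands in for the hardware datapath, and the quotient digit is chosen by rounding rather than by the lookup table described above. The sketch only shows the recurrence in which each step selects a digit from -2 to 2 and keeps the partial remainder within a fixed bound.

```python
from fractions import Fraction

def srt_radix4_divide(x, d, steps=16):
    """Approximate x / d with a radix-4 digit recurrence (digits -2..2).

    Assumes 1 <= x < 2 and 1 <= d < 2, like normalized significands.
    Returns (quotient_estimate, digits).
    """
    x, d = Fraction(x), Fraction(d)
    if not (1 <= x < 2 and 1 <= d < 2):
        raise ValueError("operands must lie in [1, 2)")

    bound = Fraction(2, 3) * d      # invariant: |remainder| <= (2/3) * d
    w = x / 4                       # initial partial remainder, inside the bound
    quotient = Fraction(0)
    digits = []
    for j in range(1, steps + 1):
        shifted = 4 * w
        # Digit selection: nearest integer to shifted / d, clamped to -2..2.
        # Simplification: real hardware reads this digit from a lookup table.
        q = max(-2, min(2, round(shifted / d)))
        w = shifted - q * d
        assert abs(w) <= bound, "remainder left the convergent range"
        quotient += Fraction(q, 4**j)
        digits.append(q)
    return 4 * quotient, digits     # undo the initial scaling by 1/4

# Example operands: a pair from this article, scaled into [1, 2).
x = Fraction(4195835, 2**22)
d = Fraction(3145727, 2**21)
approx, digits = srt_radix4_divide(x, d)
print(float(approx), float(x / d))
```

In hardware, the rounding step is replaced by a table lookup on a few leading bits of the remainder and the divisor. The bounded-remainder invariant checked inside the loop is what a wrong table entry violates: selecting 0 where 2 is required leaves a remainder that the following digits, limited to at most 2, cannot bring back into range.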

Reactions to the bug

After discovering the bug, Intel quietly fixed it and, probably sometime in late summer or early autumn, began gradually converting production of the various Pentium variants to the corrected versions. Nevertheless, affected CPUs were still being delivered until late 1994, for a long time without the users' knowledge.

Critics therefore accused Intel of first trying to cover up the error and then trivializing it. After the bug became known, Intel claimed that most users would never encounter it. It was in this context that the aforementioned “statistical occurrence once every 27,000 years for a normal end user” is said to have been mentioned. This assessment triggered outraged reactions from users and the trade press.

Intel initially announced that it would exchange CPUs only for users who could demonstrate that they were affected by the error. Many users then demanded that Intel replace all affected CPUs, and the trade press did not let up after this announcement either. As the pressure grew and the company faced serious damage to its image, Intel finally gave in on December 20 and announced a comprehensive exchange program for all affected CPUs.

Intel also reaped a good deal of ridicule for the error. Jokes such as "How many Intel employees does it take to change a lightbulb? 1.9999983256" or "You mean 2.00000000 + 2.000000000 doesn't equal 3.999998456?" abounded at the time.

Intel learned lessons from the incident. Andy Grove apologized to the press for the anger his conduct had caused. A telephone hotline was set up specifically for exchanging the faulty CPUs. In total, Intel set aside $475 million for the incident, more than half of its profit for the fourth quarter of 1994. In the end, around one million faulty processors were exchanged. Nicely condemned the exchange as a waste of resources.

In 1995, Intel began publishing all errors discovered in its own CPUs. Previously, obtaining this information had required signing a confidentiality agreement. From then on, anyone could find out about errors in Intel products in so-called specification updates.

In the specification updates of affected Pentium CPUs, the effect of the FDIV bug is described as "Slight Precision Loss for Floating-point Divides on Specific Operand Pairs". Depending on the type of Pentium CPU, the FDIV bug is listed in these specification updates as Erratum 20 (for the P5 Pentium) or Erratum 23 (for the P54C Pentium).

Affected Pentium Versions

Pentium 66 (SX837) with FDIV bug.

The error is present in all Pentium CPUs produced up to the beginning of autumn 1994, and some units made later still exhibit it. Since only Pentium CPUs up to and including 100 MHz were manufactured until the beginning of 1995, all faster variants are unaffected.

But that does not mean that all Pentium CPUs with clock frequencies of 100 MHz and less have the problem. Since Intel exchanged the CPUs exclusively for those with the same clock frequency as part of the exchange program and continued to sell error-corrected versions of these CPUs, a large number of Pentium CPUs with clock frequencies between 60 and 100 MHz came into circulation that did not have the error. The presence of the FDIV bug cannot be concluded from the clock frequency alone.

Until the fall of 1994, however, the majority of Pentium production consisted of models of the first Pentium type P5, which were only available with clock frequencies of 60 and 66 MHz. The more advanced P54C type, which was initially only available with 90 and 100 MHz, was expensive and comparatively rare. This is one of the reasons why the P5 models manufactured in 1994 account for the largest part of all CPUs affected by the FDIV bug.

Intel did not present the 75 MHz Pentium for Socket 5 (SPGA type) until October 10, 1994, when the corrected B5 stepping of the P54C was already in production. Among the Socket 5 versions, therefore, only the 90 and 100 MHz types, which had already been on sale before then, are affected, and no affected 75 MHz types should be found in desktop PCs and servers. With the Pentium for mobile use, however, only the 75 MHz version is affected. Of course, this does not rule out the possibility that some manufacturers installed Socket 5 types with 90 and 100 MHz in notebooks.

The following Pentium versions are affected:

Processor type | Family | Model | Stepping (core stepping) | sSpec
Pentium 60/66 (P5) | 5 | 1 | 3 (B1), 5 (C1) | Q0352 Q0353 Q0394 Q0395 Q0399 Q0400 Q0412 Q0413 Q0466 Q0467 SX753 SX754 SX835 SX837 SZ949 SZ950
Pentium 90/100 (P54C) | 5 | 2 | 1 (B1), 2 (B3) | Q0542 Q0543 Q0563 Q0587 Q0611 Q0612 Q0613 Q0614 Q0628 Q0677 SX874 SX879 SX885 SX886 SX909 SX910 SX921 SX922 SX923 SX942 SX943 SX944 SX960 SZ951
mobile Pentium 75 (P54C) | 5 | 2 | 1 (B1), 2 (B3) | Q0601 Q0606 SX951

If the CPU has been removed from the system, the easiest way to identify it is the so-called sSpec, a code of usually five letters and digits that is printed on the CPU package. The affected sSpecs are listed in the table above.

It is of course not possible to identify a faulty CPU by its label while it is installed in a running system. However, there are still ways to find out whether the CPU is affected. If /proc/cpuinfo on a Linux system contains an entry of the form fdiv_bug: yes, the CPU is affected; likewise if the arithmetic operations listed below produce the wrong result in the Windows calculator:

Arithmetic operation | 4195835/3145727 | 5505001/294911 | 8391667/1572863
Wrong result | 1.33373907 | 18.66600093 | 5.3349560642
Correct result | 1.3338204x | 18.666652x | 5.3352816x
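Both checks described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not an official test: the function names are hypothetical, the `fdiv_bug` field name is taken from the text above, the relative tolerance of 1e-6 is an assumed threshold, and the division check is only meaningful if the interpreter actually performs the division on the CPU's floating-point unit.

```python
def cpuinfo_reports_fdiv_bug(cpuinfo_text):
    """Return True/False from an 'fdiv_bug' line, or None if there is no such line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "fdiv_bug":
            return value.strip().lower() == "yes"
    return None

def division_residual(x, y):
    """x - (x / y) * y, which stays close to zero when the division is accurate."""
    return x - (x / y) * y

def division_looks_flawed(x, y, tolerance=1e-6):
    """Flag a division whose residual exceeds an assumed relative tolerance."""
    return abs(division_residual(x, y)) > tolerance * abs(x)

# Operand pairs from the table above.
TEST_PAIRS = [
    (4195835.0, 3145727.0),
    (5505001.0, 294911.0),
    (8391667.0, 1572863.0),
]

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("cpuinfo fdiv_bug:", cpuinfo_reports_fdiv_bug(f.read()))
    except OSError:
        print("cpuinfo fdiv_bug: /proc/cpuinfo not available")
    for x, y in TEST_PAIRS:
        flawed = division_looks_flawed(x, y)
        print(f"{x:.0f}/{y:.0f} = {x / y:.8f}  looks flawed: {flawed}")
```

The wrong results in the table deviate from the correct ones by a relative error of several parts in 100,000, so the assumed tolerance separates them clearly from ordinary rounding error. A return value of None from the first function only means that the system does not report the field at all.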

References

  1. http://www5.in.tum.de/lehre/seminare/semsoft/unterlagen_02/pentiumbug/website/
  2. 60- and 66-MHz Pentium Processor Specification Update (archived February 24, 2008 in the Internet Archive)
  3. Pentium Processor Specification Update (archived February 24, 2008 in the Internet Archive)
  4. Thomas Ray Nicely's FAQ on the FDIV bug (archived July 31, 2019 in the Internet Archive)
  5. Robert P. Colwell, "The Pentium Chronicles", ISBN 0471736171, p. 156
  6. Intel's white paper with analysis of the FDIV bug
  7. Tim Jackson, Inside Intel, Hoffmann and Campe, 1998
  8. Information from Intel on the FDIV Replacement Program
  9. Common jokes about the FDIV bug (archived January 24, 2001 in the Internet Archive)
