IEEE 754-1985

The '''[[IEEE]] Standard for Binary Floating-Point Arithmetic''' ('''IEEE 754''') is the most widely used standard for [[floating point|floating-point]] computation, and is followed by many [[Central processing unit|CPU]] and [[floating point unit|FPU]] implementations. The standard defines formats for representing floating-point numbers (including [[−0 (number)|negative zero]] and [[denormal number]]s) and special values ([[infinity|infinities]] and [[NaN]]s), together with a set of ''floating-point operations'' that operate on these values. It also specifies four rounding modes and five exceptions, including when the exceptions occur and what happens when they do.

IEEE 754 specifies four formats for representing floating-point values: single precision (32-bit), double precision (64-bit), single-extended precision (&ge; 43-bit, not commonly used) and double-extended precision (&ge; 79-bit, usually implemented with 80 bits). Only the 32-bit single-precision format is required by the standard; the others are optional. Many languages specify that IEEE formats and arithmetic be implemented, although sometimes it is optional. For example, the [[C programming language]], which pre-dated IEEE 754, now allows but does not require IEEE arithmetic (the C <tt>float</tt> is typically used for IEEE single precision and <tt>double</tt> for IEEE double precision).

The full title of the standard is '''IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)''', and it is also known as '''IEC 60559:1989, Binary floating-point arithmetic for microprocessor systems''' (originally the reference number was IEC 559:1989).[http://www.opengroup.org/onlinepubs/009695399/frontmatter/refdocs.html]

== Anatomy of a floating-point number ==

Following is a description of the standard's formats for floating-point numbers.

=== Bit conventions used in this article ===

[[Bit]]s within a [[word (computer science)|word]] of width W are indexed by [[integer]]s in the range 0 to W&minus;1 inclusive. The bit with index 0 is drawn on the right. The lowest indexed bit is usually the least significant.

=== General layout ===

Binary floating-point numbers are stored in a [[signed number representations#Sign-and-magnitude|sign-magnitude]] form as follows:

::[[Image:General_floating_point.PNG]]

where the [[most significant bit]] is the [[sign bit]], ''exponent'' is the [[exponent bias|biased]] exponent, and ''[[mantissa]]'' is the [[significand]] with its most significant bit omitted (only the fraction bits are stored).

====Exponent biasing====

The exponent is biased by <math>2^{e-1} - 1</math>, where ''e'' is the number of bits in the exponent field. Biasing is done because exponents have to be [[Negative and non-negative numbers#Computing|signed values]] in order to be able to represent both tiny and huge values, but [[two's complement]], the usual representation for signed values, would make [[#Comparing floating-point numbers|comparison]] harder. To solve this, the exponent is biased before being stored, by adjusting its value to put it within an unsigned range suitable for comparison.

For example, to represent a number whose exponent is 17, the stored ''exponent'' field is <math>17 + 2^{e-1} - 1</math>.
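
A minimal C sketch of this arithmetic (the <tt>bias</tt> helper below is purely illustrative, not part of the standard):

 #include <stdio.h>
 
 /* Bias for an exponent field of width e bits: 2^(e-1) - 1 */
 static int bias(int e)
 {
     return (1 << (e - 1)) - 1;
 }
 
 int main(void)
 {
     int e = 8;  /* single precision has an 8-bit exponent field */
     printf("bias   = %d\n", bias(e));       /* 127 */
     printf("stored = %d\n", 17 + bias(e));  /* 144, for a true exponent of 17 */
     return 0;
 }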

====Cases====

The most significant bit of the [[mantissa]] is determined by the value of ''exponent''. If <math>0 < \mbox{exponent} < 2^{e} - 1</math>, the most significant bit of the ''mantissa'' is 1, and the number is said to be ''normalized''. If ''exponent'' is 0, the most significant bit of the ''mantissa'' is 0 and the number is said to be ''denormalized''. Three special cases arise:
# if ''exponent'' is 0 and ''mantissa'' is 0, the number is ±0 (depending on the sign bit)
# if ''exponent'' = <math>2^{e} - 1</math> and ''mantissa'' is 0, the number is ±[[infinity]] (again depending on the sign bit), and
# if ''exponent'' = <math>2^{e} - 1</math> and ''mantissa'' is not 0, the number being represented is [[NaN|not a number (NaN)]].

This can be summarized as:

{| class="wikitable"
|-
! Type
! Exponent
! Mantissa
|-
| Zeroes
| 0
| 0
|-
| Denormalized numbers
| 0
| non zero
|-
| Normalized numbers
| <math>1</math> to <math>2^e-2</math>
| any
|-
| Infinities
| <math>2^e-1</math>
| 0
|-
| NaNs
| <math>2^e-1</math>
| non zero
|-
|}
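
The table translates directly into a classification routine. The following C sketch (the function name is illustrative, and the field widths are hard-coded for the single-precision format described below) extracts the two fields from a 32-bit pattern and returns the class:

 #include <stdint.h>
 
 /* Classify a single-precision bit pattern according to the table above. */
 const char *fp_class(uint32_t bits)
 {
     uint32_t exponent = (bits >> 23) & 0xFFu;  /* 8-bit exponent field */
     uint32_t mantissa = bits & 0x7FFFFFu;      /* 23-bit mantissa field */
 
     if (exponent == 0)
         return mantissa == 0 ? "zero" : "denormalized";
     if (exponent == 0xFFu)  /* 2^8 - 1, all ones */
         return mantissa == 0 ? "infinity" : "NaN";
     return "normalized";
 }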

=== Single-precision 32 bit ===

A [[single precision|single-precision]] binary floating-point number is stored in a 32-bit word:
:[[Image:Float_example.PNG]]

The exponent is biased by <math>2^{8-1} - 1 = 127</math> in this case, so that exponents in the range &minus;126 to +127 are representable. An exponent of &minus;127 would be biased to the value 0, but this value is reserved to encode that the number is denormalized or zero. An exponent of 128 would be biased to the value 255, but this value is reserved to encode an infinity or a NaN.

For normalized numbers, the most common case, ''Exp'' is the biased exponent and
''Fraction'' is the fractional part of the [[significand]]. The number has value v:

v = s &times; 2<sup>e</sup> &times; m

Where

s = +1 (positive numbers) when the sign bit is 0

s = &minus;1 (negative numbers) when the sign bit is 1

e = Exp &minus; 127 (in other words the exponent is stored with 127 added to it, also called "biased with 127")

m = 1.Fraction in binary (that is, the significand is the binary number 1 followed by the radix point followed by the binary bits of Fraction). Therefore, 1 &le; m < 2.

Notes:
# Denormalized numbers are the same except that e = &minus;126 and m is 0.Fraction. (e is not &minus;127: the significand has to be shifted right by one more bit in order to include the leading bit, which is not necessarily 1 in this case; this is balanced by using &minus;126 as the exponent in the calculation.)
# &minus;126 is the smallest exponent for a normalized number
# There are two Zeroes, +0 (S is 0) and [[−0 (number)|&minus;0]] (S is 1)
# There are two Infinities +&infin; (S is 0) and &minus;&infin; (S is 1)
# NaNs may have a sign and a significand, but these have no meaning other than for diagnostics; the first bit of the significand is often used to distinguish ''signaling NaNs'' from ''quiet NaNs''
# NaNs and Infinities have all 1s in the Exp field.
# The smallest non-zero positive and largest non-zero negative numbers (represented by the denormalized value with all 0s in the Exp field and the binary value 1 in the Fraction field) are
#: ±2<sup>&minus;149</sup> ≈ ±1.4012985{{e|−45}}
# The smallest non-zero positive and largest non-zero negative normalized numbers (represented with the binary value 1 in the Exp field and 0 in the Fraction field) are
#: ±2<sup>&minus;126</sup> ≈ ±1.175494351{{e|−38}}
# The largest finite positive and smallest finite negative numbers (represented by the value with 254 in the Exp field and all 1s in the Fraction field) are
#: ±(2<sup>128</sup> − 2<sup>104</sup>) ≈ ±3.4028235{{e|38}}
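
These limits can be checked against the constants that C's <tt>float.h</tt> provides for the single-precision type; a small sketch, where <tt>ldexp(1.0, -149)</tt> computes 2<sup>&minus;149</sup>:

 #include <stdio.h>
 #include <float.h>
 #include <math.h>
 
 int main(void)
 {
     printf("smallest denormal: %.9g\n", ldexp(1.0, -149)); /* 1.40129846e-45 */
     printf("smallest normal:   %.9g\n", (double)FLT_MIN);  /* 1.17549435e-38 */
     printf("largest finite:    %.9g\n", (double)FLT_MAX);  /* 3.40282347e+38 */
     return 0;
 }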

=== An example ===

Let us encode the decimal number &minus;118.625 using the IEEE 754 system.

# First we need to get the sign, the exponent and the fraction. Because it is a negative number, the sign is "1".
# Now, we write the number (without the sign) using [[binary numeral system|binary notation]]. The result is 1110110.101.
# Next, let's move the radix point left, leaving only a 1 at its left: 1110110.101 = 1.110110101 &times; 2<sup>6</sup>. This is a normalized floating point number. The [[significand|mantissa]] is the part at the right of the radix point, filled with 0 on the right until we get all 23 bits. That is 11011010100000000000000.
# The exponent is 6, but we need to convert it to binary and bias it (so the most negative exponent is 0, and all exponents are non-negative binary numbers). For the 32-bit IEEE 754 format, the bias is 127 and so 6 + 127 = 133. In binary, this is written as 10000101.

Putting them all together:

::[[Image:Floating point example.PNG]]
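
Concatenating the three fields (sign 1, exponent 10000101, fraction 11011010100000000000000) yields the 32-bit pattern 0xC2ED4000. A short C sketch can verify this, assuming the host's <tt>float</tt> is IEEE single precision (as is almost universally the case):

 #include <stdio.h>
 #include <string.h>
 #include <stdint.h>
 
 int main(void)
 {
     /* 1 10000101 11011010100000000000000 = 0xC2ED4000 */
     uint32_t bits = 0xC2ED4000u;
     float x;
     memcpy(&x, &bits, sizeof x);  /* reinterpret the bit pattern as a float */
     printf("%f\n", x);            /* prints -118.625000 */
     return 0;
 }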

=== C Source ===
This takes a character array of 4 bytes (32 bits), as read from a file in big-endian order, and converts it to a float using the IEEE standard.
FLT_MAX comes from <float.h> and ldexp from <math.h>; since C89 has no portable way to produce a NaN or an infinity, those cases are approximated by ±FLT_MAX.

 #include <float.h>  /* FLT_MAX */
 #include <math.h>   /* ldexp */
 
 float arrayToFloat(unsigned char data[4])
 {
     int s, e;
     unsigned long src;
     long f;
     float value;
 
     /* Assemble the four big-endian bytes into one 32-bit word. */
     src = ((unsigned long)(data[0] & 0xFF) << 24) +
           ((unsigned long)(data[1] & 0xFF) << 16) +
           ((unsigned long)(data[2] & 0xFF) << 8) +
           ((unsigned long)(data[3] & 0xFF));
 
     s = (src & 0x80000000UL) >> 31;  /* sign bit */
     e = (src & 0x7F800000UL) >> 23;  /* 8-bit biased exponent */
     f = (src & 0x007FFFFFUL);        /* 23-bit fraction */
 
     if (e == 255 && f != 0) {
         /* NaN - not a number; saturate, as C89 has no portable NaN */
         value = FLT_MAX;
     }
     else if (e == 255) {
         /* Infinity, approximated by the largest finite float */
         value = s ? -FLT_MAX : FLT_MAX;
     }
     else if (e > 0) {
         /* Normalized number: restore the implicit leading 1 bit */
         f += 0x00800000L;
         if (s) f = -f;
         value = (float)ldexp(f, e - 127 - 23);
     }
     else if (f != 0) {
         /* Denormalized number: no implicit bit, exponent fixed at -126 */
         if (s) f = -f;
         value = (float)ldexp(f, -126 - 23);
     }
     else {
         /* Zero, preserving the sign bit */
         value = s ? -0.0f : 0.0f;
     }
 
     return value;
 }
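
For example, a hypothetical test harness (not part of the original routine) can feed it the big-endian bytes of the encoding worked out in the previous section:

 #include <stdio.h>
 
 int main(void)
 {
     unsigned char data[4] = { 0xC2, 0xED, 0x40, 0x00 };  /* -118.625 */
     printf("%f\n", arrayToFloat(data));                  /* -118.625000 */
     return 0;
 }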

=== Double-precision 64 bit ===

[[Double precision]] is essentially the same except that the fields are wider:

::[[Image:General_double_precision_float.PNG]]

The mantissa is much larger, while the exponent is only slightly larger. This is because precision is more valued than range, according to the creators of the standard.

NaNs and Infinities are represented with Exp being all 1s (2047).

For normalized numbers the exponent bias is +1023 (so e is Exp &minus; 1023). For denormalized numbers the exponent is &minus;1022 (the minimum exponent for a normalized number&mdash;it is not &minus;1023 because normalized numbers have a leading 1 digit before the binary point and denormalized numbers do not). As before, both infinity and zero are signed.

Notes:
# The smallest non-zero positive and largest non-zero negative numbers (represented by the denormalized value with all 0s in the Exp field and the binary value 1 in the Fraction field) are
#: ±2<sup>&minus;1074</sup> ≈ ±5{{e|−324}}
# The smallest non-zero positive and largest non-zero negative normalized numbers (represented by the value with the binary value 1 in the Exp and 0 in the Fraction field) are
#: ±2<sup>&minus;1022</sup> ≈ ±2.2250738585072014{{e|−308}}
# The largest finite positive and smallest finite negative numbers (represented by the value with 2046 in the Exp field and all 1s in the Fraction field) are
#: ±(2<sup>1024</sup> − 2<sup>971</sup>) ≈ ±1.7976931348623157{{e|308}}
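
As with single precision, these limits can be checked against <tt>float.h</tt>; the sketch below also uses C99's <tt>nextafter</tt> to step from zero to the smallest denormal:

 #include <stdio.h>
 #include <float.h>
 #include <math.h>
 
 int main(void)
 {
     printf("smallest denormal: %.17g\n", nextafter(0.0, 1.0)); /* 4.9406564584124654e-324 */
     printf("smallest normal:   %.17g\n", DBL_MIN);             /* 2.2250738585072014e-308 */
     printf("largest finite:    %.17g\n", DBL_MAX);             /* 1.7976931348623157e+308 */
     return 0;
 }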

=== Comparing floating-point numbers ===

Comparing floating-point numbers is usually best done using floating-point instructions. However, this representation makes comparisons of some subsets of numbers possible on a byte-by-byte basis, if they share the same byte order and the same sign, and NaNs are excluded.

For example, for two positive floating-point numbers a and b, a comparison between a and b (>, <, or ==) gives the same result as the comparison of two signed (or unsigned) binary integers with the same bit patterns and same byte order as a and b. In other words, two positive floating-point numbers (known not to be NaNs) can be compared with a signed (or unsigned) binary integer comparison using the same bits, provided the floating-point numbers use the same byte order. Because the byte order matters, this type of comparison cannot be used in portable code through a union in the [[C programming language]]. This is an example of [[lexicographic ordering]].
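
Within a single machine, comparing whole 32-bit words rather than individual bytes sidesteps the byte-order concern; a minimal sketch (the helper name is illustrative, and <tt>memcpy</tt> is used to avoid the portability problems of a union):

 #include <stdio.h>
 #include <string.h>
 #include <stdint.h>
 
 /* Compare two positive, non-NaN floats through their bit patterns. */
 int float_less(float a, float b)
 {
     uint32_t ua, ub;
     memcpy(&ua, &a, sizeof ua);
     memcpy(&ub, &b, sizeof ub);
     return ua < ub;  /* same result as a < b for such inputs */
 }
 
 int main(void)
 {
     printf("%d\n", float_less(1.5f, 2.25f));  /* 1 */
     printf("%d\n", float_less(2.25f, 1.5f));  /* 0 */
     return 0;
 }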

=== Rounding floating-point numbers ===

The IEEE standard has four different rounding modes.

* '''Unbiased''', which rounds to the nearest value; if the number falls midway, it is rounded to the nearest value with an even (zero) least significant bit. This mode is required to be the default (all four modes are illustrated in the sketch after this list).
* '''Towards zero'''
* '''Towards positive infinity'''
* '''Towards negative infinity'''
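
In C99 these modes can be selected at run time through <tt>fenv.h</tt>, where the implementation supports it; a minimal sketch (the <tt>FENV_ACCESS</tt> pragma and the <tt>FE_*</tt> macros are optional features, so availability varies by compiler):

 #include <stdio.h>
 #include <fenv.h>
 
 #pragma STDC FENV_ACCESS ON
 
 int main(void)
 {
     volatile float x = 1.0f, y = 3.0f;  /* volatile discourages constant folding */
 
     fesetround(FE_TONEAREST);   /* round to nearest, the required default */
     printf("%.10f\n", x / y);
     fesetround(FE_TOWARDZERO);  /* towards zero */
     printf("%.10f\n", x / y);
     fesetround(FE_UPWARD);      /* towards positive infinity */
     printf("%.10f\n", x / y);
     fesetround(FE_DOWNWARD);    /* towards negative infinity */
     printf("%.10f\n", x / y);
     return 0;
 }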

== Recommended functions and predicates ==

* Under some C compilers, copysign(x,y) returns x with the sign of y, so fabs(x) = copysign(x, 1.0). This is one of the few operations which operates on a NaN in a way resembling arithmetic. Note that copysign was not a standard C function until [[C99]].
* &minus;x returns x with the sign reversed. Note that this is different from 0&minus;x in some cases, notably when x is 0: &minus;(0) is &minus;0, but the sign of 0&minus;0 depends on the rounding mode.
* scalb(y, N) returns y &times; 2<sup>N</sup> for integral N, without explicitly computing 2<sup>N</sup>
* logb(x) returns the unbiased exponent of x
* finite(x), a [[predicate]] for "x is a finite value", equivalent to &minus;Inf < x < Inf
* isnan(x), a predicate for "x is a NaN", equivalent to "x ≠ x"
* x <> y, which turns out to have different exception behavior than NOT(x = y).
* Unordered(x, y) is true when "x is unordered with y", i.e., either x or y is a NaN.
* class(x)
* nextafter(x,y) returns the next representable value from x in the direction towards y
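
Most of these appear in C99's <tt>math.h</tt> under similar names (<tt>copysign</tt>, <tt>logb</tt>, <tt>isnan</tt>, <tt>nextafter</tt>, and <tt>scalbn</tt> in place of scalb); a small sketch, assuming a C99 library:

 #include <stdio.h>
 #include <math.h>
 
 int main(void)
 {
     volatile double zero = 0.0;  /* volatile so 0.0/0.0 is evaluated at run time */
 
     printf("%f\n", copysign(3.0, -1.0));        /* -3.000000 */
     printf("%f\n", logb(8.0));                  /* 3.000000, since 8 = 2^3 */
     printf("%f\n", scalbn(1.0, 10));            /* 1024.000000, i.e. 1 x 2^10 */
     printf("%d\n", isnan(zero / zero) != 0);    /* 1: 0/0 produces a NaN */
     printf("%g\n", nextafter(1.0, 2.0) - 1.0);  /* 2^-52, about 2.22045e-16 */
     return 0;
 }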

== References ==

* [http://www.opencores.org/projects.cgi/web/fpu100/fpu_doc.pdf Floating Point Unit] by Jidan Al-Eryani

== Revision of the standard ==

Note that the IEEE 754 standard is [[as_of_2006|currently]] under revision. See: [[IEEE 754r]]

== See also ==

* [[&minus;0 (number)|&minus;0]] (negative zero)
* [[IEEE 754r]] working group to revise IEEE 754-1985.
* [[NaN]] (Not a Number)
* [[minifloat]] for simple examples of properties of IEEE 754 floating point numbers

== External links ==

* [http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html IEEE 754 references]
* [http://www.d6.com/users/checker/pdfs/gdmfp.pdf Let's Get To The (Floating) Point by Chris Hecker]
* [http://docs.sun.com/source/806-3568/ncg_goldberg.html What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg] - a good introduction and explanation.
* [http://www2.hursley.ibm.com/decimal/854mins.html IEEE 854-1987] History and minutes
* [http://www.h-schmidt.net/FloatApplet/IEEE754.html Converter]
* [http://babbage.cs.qc.edu/courses/cs341/IEEE-754.html Another Converter]

[[Category:Computer arithmetic]]
[[Category:IEEE standards]]

[[de:IEEE 754]]
[[es:IEEE punto flotante]]
[[fr:IEEE 754]]
[[ko:IEEE 754]]
[[it:IEEE 754]]
[[hu:IEEE lebegőpontos számformátum]]
[[ja:IEEE754]]
[[pl:IEEE 754]]
