Saturation arithmetic

Saturation arithmetic or saturation arithmetic is arithmetic in which all operations (such as addition or multiplication) take place in a fixed interval between a minimum and a maximum . If the result of an operation is greater than the interval maximum, it is set to this value. A result can never exceed the interval limit, but it also lingers on it. So you can say that the value is saturated at the interval limit . The same applies to the minimum, which cannot be fallen below.

Examples

The interval from −100 to 100 is given as an example. Then the following operations produce the specified results:

60 + 43 = 100
(60 + 43) - 150 = −50
43 - 150 = −100
60 + (43-150) = −40
10 × 11 = 100
99 × 99 = 100
30 × (5 - 1) = 100
30 × 5 - 30 × 1 = 70

As you can see from these examples, associative and distributive laws do not apply in saturation arithmetic. Therefore it is of minor importance in abstract mathematics. However, it has an important function in digital computing systems and algorithms.

Application in the multimedia sector

Image addition with and without saturation arithmetic

Saturation arithmetic is particularly important in the multimedia sector, for example when processing images.

For example, two images should be superimposed in computer graphics . Each pixel is represented as a 4- tuple (R, G, B, A), where R, G, B are the corresponding red, green, and blue components of the color. The corresponding components can assume integer values in the interval [0.255], with a higher value meaning higher color saturation. The A component is the alpha channel, which determines how “opaque” a pixel is. A also lies in the interval [0.255], where 0 means completely transparent and 255 means completely solid and opaque.

In the following example, only the R component should be considered; analogous considerations also apply to the other three.

If two images are placed on top of each other (e.g. in a graphical user interface), the R components of the respective pixels are added. The red component of the first pixel is, for example, 100 and that of the second 50. Then the red component of the “superimposed” pixel increases to 150.

But if the first pixel has an R value of 150 and the second one of 200, the intermediate result is R = 350 ; the modulo 256 expected results 94. The problem is obvious: the combination of the two reds is "less red" than the two original pixels. This is not a meaningful result in this application.

In practice, red cannot become “redder” than “totally red”, and “totally red” is the value 255. One can say that at this value the red channel is saturated (saturated).

The solution to the problem can be achieved with saturation arithmetic: In contrast to modulo arithmetic, the lower or upper interval limit is not exceeded or undershot if there is an overflow (underflow). In saturation arithmetic, the above example therefore applies: 150 + 200 = 255 . And 255 + x = 255 for all x . Likewise, 255 - 300 does not result in 211 (as in modulo arithmetic), but 0. Again, 0 - x = 0 for all x .

Implementation options

software

The following is a short pseudo-code that shows how saturation arithmetic can be mapped in software. The algorithm shown is, however, not suitable for practical use, since it assumes that, in principle, arithmetic is available with sufficient accuracy and that the corresponding result is only then limited to the actual interval.

WertIn := a + b; { eigentliche Berechnung,
          dieses wird nun auf das Intervall (min,max)
          abgebildet. WertAus enthält im Abschluss
          das Ergebnis der Berechnung.}

if (WertIn > max) then
   WertAus := max
else
if (WertIn < min) then
   WertAus := min
else
   WertAus := WertIn

hardware

Only addition is considered as an example here. There are n -bit numbers are added. An “ordinary” n- bit adder and a 2-way n- bit multiplexer are used for this. The carry bit of the adder is fed to the selection input of the multiplexer. The result of the adder is sent to one multiplexer input and the constant 1 with a width of n bits to the other . If the carry bit is set (i.e. an overflow has occurred), the multiplexer switches through constant 1, i.e. it always remains at the upper interval limit. If the carry bit is not set, the result of the adder is switched through.

literature

Uwe Brinkschulte, Theo Ungerer: Microcontrollers and microprocessors . Springer 2007, ISBN 3-540-46801-3 , pp. 24,344ff