Eagle-32

Adler-32 is a simple checksum algorithm developed by Mark Adler based on Fletcher's checksum . Among other things, it is used by the zlib library to detect (accidental transmission) errors in the compressed data stream . In RFC 1950 is algorithm described in detail.

The Adler-32 algorithm is simple and can be quickly calculate than the known cyclic redundancy check (cyclic redundancy check) , but also offers less security in detecting random bit errors .

algorithm

The algorithm determines a 32-bit integer number, which is formed from the sequence of two 16-bit integer numbers s1 and s2 . The parameter s1 is initialized with 1 and the value of the next byte to be checked is added in each step. The parameter s2 is initialized with 0 and the current value of the parameter s1 is added in each step . Both sums are calculated modulo 65,521 (the largest prime number <2 ¹⁶ ).

Example implementation in C :

  /* Beispielcode zur Berechnung der Adler-32-Prüfsumme */

  #include <stdint.h> // fuer uint8_t/uint32_t
  #include <stddef.h> // fuer size_t
  uint32_t adler32(const void *buf, size_t buflength) {

     const uint8_t *buffer = (const uint8_t*)buf;

     uint32_t s1 = 1;
     uint32_t s2 = 0;

     for (size_t n = 0; n < buflength; n++) {
        s1 = (s1 + buffer[n]) % 65521;
        s2 = (s2 + s1) % 65521;
     }

     return (s2 << 16) | s1;
  }

Note example implementation

This example implementation is not optimized for speed, but for clarity and readability. For example, the very slow modulo operation does not need to be carried out for every data byte, but only if there is a threat of an overflow of the variables s1 or s2 . With a bit width of 32 bits (which is not guaranteed when using int , therefore uint32_t according to C99 above ), it is sufficient to carry out the modulo operation every 5552 bytes. With 64-bit (uint64_t) wide s1 and s2, even performing the modulo operation every 380,368,439 bytes would be sufficient, but this does not result in a noticeable increase in speed. For more details see residual class ring .

The high number of jumps in processors with a pipeline reduces the effective use of the quasi-parallel execution of individual instructions. It is therefore advisable to split the individual data into sub-blocks of no more than 5552 bytes, after which a modulo operation is carried out. These blocks should in turn be broken down into 16-byte sub-units that are added together in one loop. This so-called loop unrolling allows a speed increase of around 25-30% on modern processors with sufficiently large data.

Weaknesses of Adler-32

An optimal checksum algorithm generates a checksum that is as evenly distributed as possible over its value range. This is not the case with Adler-32 for short data sequences (<128 bytes), since the value for s1 does not overflow.

Due to the simple arithmetic structure of Adler-32, there are also many collisions, especially with similar input values. For example, if byte n of the input is increased by k, byte n + 1 decreased by 2 × k and byte n + 2 increased by k, both s1 (the sum of all bytes) and s2 (the sum of all intermediate values of s1) remain unchanged. This applies to any position n in the input and all positive or negative values of k, provided the bytes under consideration do not overflow. As a result, the Adler-32 32-bit checksum cannot even reliably secure a 24-bit input.

Only single and double bit errors are reliably detected, specifically by the sums s1 and s2. However, if bursts of three or more bit errors occur, as in the example above, reliable detection is not guaranteed.

For this reason, among other things, the Adler-32 checksum algorithm used in the implementation of the Stream Control Transmission Protocol was replaced by CRC-32 , since here only short data streams are used and the weakness of Adler-32 becomes apparent.

It is also relatively easy to generate a data stream with the same checksum through intentional modification. Therefore, Adler-32 cannot guarantee the integrity of the data. If such security is required, cryptographic hash functions such as SHA must be used.

literature

Jonathan Stone, Checksums and the internet , pp. 125ff, The Adler-32 checksum

Web links

RFC 1950 . (English). - Specification with example C code
ZLib - Implementation of the Adler-32 checksum in adler32.c
SIMD implementation of the Adler-32 checksum in Chrome - adler32_simd.c
RFC 3309 . (English). - Discuss the problems with short messages