Asymmetric Numeral Systems


Asymmetric numeral systems (ANS, also asymmetric number systems) are a family of entropy coding methods developed by Jarosław "Jarek" Duda at the Jagiellonian University. ANS combines the compression ratio of arithmetic coding, which uses an almost exact probability distribution, with a computational cost comparable to that of Huffman coding.

ANS is used, among other things, in the compression algorithms Zstandard and LZFSE, as well as in the compression of the image formats PIK and JPEG XL.

Entropy coding

A sequence of 1000 zeros and ones would take 1000 bits if stored directly. If it is known that the sequence contains only a single one and 999 zeros, it is sufficient to store the position of that one, so that only ⌈log₂ 1000⌉ = 10 bits are required.

According to Stirling's formula, the number of combinations of n symbols containing p·n ones and (1 − p)·n zeros, where p is the probability of a one, is approximately

  (n choose p·n) ≈ 2^(n·h(p))   with   h(p) = −p·log₂ p − (1 − p)·log₂(1 − p).

Therefore, approximately n·h(p) bits are required to store such a sequence, where h(p) is the entropy corresponding to one symbol. In the case of p = 1/2, 1000 bits are still required, but far fewer suffice with an asymmetric probability: for p = 0.11, for example, only about 500 bits are needed.
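As a quick check of these numbers, a minimal Python sketch (not part of the article; the function name binary_entropy is chosen here for illustration) computes the binary entropy h(p) and the resulting bit counts:

    # Sketch: reproduces the bit counts quoted above.
    import math

    def binary_entropy(p: float) -> float:
        """h(p) = -p*log2(p) - (1-p)*log2(1-p), the entropy of one binary symbol."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    n = 1000
    print(math.ceil(math.log2(n)))          # 10 bits: position of the single one
    print(n * binary_entropy(0.5))          # 1000.0 bits in the symmetric case
    print(round(n * binary_entropy(0.11)))  # about 500 bits for p = 0.11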

An entropy coder enables a symbol sequence to be coded with a number of bits per symbol approximately corresponding to the entropy.

Basic concept of ANS

The basic idea is to encode information into a single natural number x. In the usual binary system, a bit of information s ∈ {0, 1} can be added to x using the coding function C(x, s) = 2·x + s, so that x' = 2·x + s. When the coding function is applied, all bits of x are shifted by one position and s is added at the least significant position. The decoding function D(x') = (⌊x'/2⌋, x' mod 2) enables the extraction of the previous number x as well as the added symbol s. By applying the coding function several times, a symbol sequence can be encoded, and it can be decoded again in reverse order by applying the decoding function several times.
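A minimal Python sketch of this binary special case (not from the article; the start state x = 1 is an arbitrary choice that marks the beginning of the number):

    # Sketch of the binary coding function C(x, s) = 2*x + s and its inverse D.
    def encode_bits(bits):
        x = 1                          # arbitrary start state marking the beginning
        for s in bits:
            x = 2 * x + s              # C(x, s): shift left, append the bit
        return x

    def decode_bits(x):
        bits = []
        while x > 1:
            x, s = x // 2, x % 2       # D(x): split off the least significant bit
            bits.append(s)
        return bits[::-1]              # symbols come out in reverse order

    assert decode_bits(encode_bits([1, 0, 1, 1, 0])) == [1, 0, 1, 1, 0]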

The procedure described is optimal when the probability distribution of the two possible symbols is symmetric, i.e. Pr(0) = Pr(1) = 1/2. ANS generalizes this procedure to an arbitrary set of symbols s with an associated, often asymmetric, probability distribution p_s.

After adding the information of a symbol s to x, the new value is approximately x' ≈ x / p_s, where log₂ x corresponds to the number of bits of information in the number x and log₂(1/p_s) to the approximate number of bits contributed by the symbol s.
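For example, an improbable symbol with p_s = 1/4 multiplies the state by about 4 and therefore adds about log₂ 4 = 2 bits, whereas a symbol with p_s = 1/2 adds only about one bit.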

Uniform binary variant (uABS)

The uniform binary variant distributes the two symbols s ∈ {0, 1}, with p = Pr(1) and 1 − p = Pr(0), approximately uniformly over the natural numbers. The coding function C and the decoding function D result as follows:

  C(x, 0) = ⌈(x + 1) / (1 − p)⌉ − 1
  C(x, 1) = ⌊x / p⌋

  D(x) = (s, x_s)   with   s = ⌈(x + 1)·p⌉ − ⌈x·p⌉,   x_s = ⌈x·p⌉ for s = 1   and   x_s = x − ⌈x·p⌉ for s = 0
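A minimal Python sketch of these formulas (not from the cited sources; the start state x = 1 and the value p = 0.3 are arbitrary choices, and floating-point arithmetic is used only for readability, whereas practical implementations typically use fixed-point approximations of p):

    # Sketch of uABS with p = Pr(1).
    import math

    def uabs_encode(x: int, s: int, p: float) -> int:
        if s == 1:
            return math.floor(x / p)
        return math.ceil((x + 1) / (1 - p)) - 1

    def uabs_decode(x: int, p: float):
        s = math.ceil((x + 1) * p) - math.ceil(x * p)   # yields 0 or 1
        if s == 1:
            return s, math.ceil(x * p)
        return s, x - math.ceil(x * p)

    # Round trip; as with all ANS variants, decoding runs in reverse order.
    p, x = 0.3, 1
    symbols = [1, 0, 0, 1, 0]
    for s in symbols:
        x = uabs_encode(x, s, p)
    decoded = []
    while x > 1:
        s, x = uabs_decode(x, p)
        decoded.append(s)
    assert decoded[::-1] == symbols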

Range variant (rANS)

The range variant also uses arithmetic formulas, but in contrast to uABS it allows a larger alphabet. It can be seen as a modification of a place value system in which several consecutive digit values have been combined into ranges.

The probability distribution of the set of symbols is approximated by fractions of the form f_s / 2^n with f_s, n ∈ ℕ and Σ_s f_s = 2^n. In a place value system with base 2^n, the symbol s is assigned the range [c_s, c_s + f_s) of digit values, where c_s = f_0 + f_1 + … + f_{s−1} is the cumulative frequency. The function symbol(y) determines, from the position of a digit y within the base, the symbol whose range contains it. The coding function C and the decoding function D result as follows:

  C(x, s) = ⌊x / f_s⌋ · 2^n + c_s + (x mod f_s)

  D(x) = (s, f_s · ⌊x / 2^n⌋ + (x mod 2^n) − c_s)   with   s = symbol(x mod 2^n)

In the encoder, f_s, c_s and symbol are usually held in tabular form, ideally also precomputed values such as the reciprocals 1/f_s, in order to achieve a better running time.

Since the base 2^n is a power of two, the multiplications and divisions by the base can be replaced by faster bit-wise shifts and the modulo operation by a bit-wise AND. This means that only a single multiplication (by f_s) is required during decoding.
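A minimal Python sketch of these formulas (not from the cited sources; the frequency table f = {a: 8, b: 6, c: 2}, the base 2^n = 16 and the start state are arbitrary choices, and the renormalization that practical coders use to keep x in a fixed interval is omitted). Because the base is a power of two, division and modulo by the base appear as a shift and a bit-wise AND:

    # Sketch of the range variant with an assumed frequency table and base 2^n = 16.
    n = 4
    mask = (1 << n) - 1                       # x mod 2^n as a bit-wise AND
    freq = {'a': 8, 'b': 6, 'c': 2}           # f_s, summing to 2^n
    cum = {'a': 0, 'b': 8, 'c': 14}           # c_s, cumulative frequencies

    def symbol(y: int) -> str:
        """Return the symbol s whose range [c_s, c_s + f_s) contains the digit y."""
        return next(s for s in freq if cum[s] <= y < cum[s] + freq[s])

    def rans_encode(x: int, s: str) -> int:
        # C(x, s) = floor(x / f_s) * 2^n + c_s + (x mod f_s)
        return (x // freq[s] << n) + cum[s] + (x % freq[s])

    def rans_decode(x: int):
        # D(x) = (s, f_s * floor(x / 2^n) + (x mod 2^n) - c_s); one multiplication
        s = symbol(x & mask)
        return s, freq[s] * (x >> n) + (x & mask) - cum[s]

    x = 1 << n                                # start state, at least max(f_s)
    for s in "abacba":
        x = rans_encode(x, s)
    decoded = []
    while x != 1 << n:
        s, x = rans_decode(x)
        decoded.append(s)
    assert "".join(reversed(decoded)) == "abacba"

The start state is chosen to be at least as large as the largest f_s so that every coding step strictly increases x and the decoder can stop exactly when the start state is reached again.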

Tabular variant (tANS)

The tabular variant packs the entire process, for x restricted to a bounded interval, into a table that describes a finite automaton. This makes it possible to dispense with multiplications entirely.
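To illustrate the table idea, a minimal Python sketch (an illustrative assumption, not the construction from the cited sources): the decoding step for every state x in the interval [L, 2L) with L = 2^n = 16 is precomputed using the range-variant layout above, together with the number of bits that must be read back, so that decoding consists only of table lookups and shifts. Real implementations such as FSE in Zstandard use a different symbol spread and larger tables.

    # Sketch of a table-driven (tANS-style) coder over states x in [L, 2L).
    n, L = 4, 16
    freq = {'a': 8, 'b': 6, 'c': 2}                  # f_s, summing to L
    cum = {'a': 0, 'b': 8, 'c': 14}                  # c_s

    # decode_table[x - L] = (symbol, number of bits to read, shifted previous state)
    decode_table = []
    for x in range(L, 2 * L):
        s = next(t for t in freq if cum[t] <= x - L < cum[t] + freq[t])
        x_prev = freq[s] + (x - L) - cum[s]          # lies in [f_s, 2*f_s)
        nb = n - (x_prev.bit_length() - 1)           # bits needed to return to [L, 2L)
        decode_table.append((s, nb, x_prev << nb))

    def encode(symbols):
        x, bits = L, []
        for s in symbols:
            while x >= 2 * freq[s]:                  # renormalize: push low bits
                bits.append(x & 1)
                x >>= 1
            x = L + cum[s] + (x - freq[s])           # coding step, stays in [L, 2L)
        return x, bits

    def decode(x, bits, count):
        out = []
        for _ in range(count):
            s, nb, base = decode_table[x - L]        # pure table lookup
            value = 0
            for _ in range(nb):                      # read back the pushed bits
                value = (value << 1) | bits.pop()
            x = base | value
            out.append(s)
        return out[::-1]                             # decoding runs in reverse order

    x, bits = encode("abacba")
    assert "".join(decode(x, bits, 6)) == "abacba"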

Remarks

As with Huffman coding, changing the probability distribution is relatively expensive with tANS, which is why it is mainly used in static application scenarios.

In contrast to this, rANS is a faster alternative to range coding . It requires multiplications, but is more memory efficient and is suitable for dynamically adapted probability distributions.

Encoding and decoding with ANS run in opposite directions: the decoder retrieves the symbols in the reverse of the order in which they were encoded. So that a stack can be dispensed with during decoding, the encoder in practice often processes the data backwards, allowing the decoder to run forwards.
