Move to front

from Wikipedia, the free encyclopedia

Move to front is a coding method that is well suited to further processing data from the Burrows-Wheeler transformation so that it can then be compressed more effectively.

Format of the input and output data

The input data for Move-to-Front are a finite alphabet and a character string from this alphabet. The alphabet can be, for example, ASCII or Unicode , but also bytes .

The output from Move-to-Front is a sequence of natural numbers, where each of the numbers is less than the length of the alphabet.

functionality

Move-to-front coding:

  1. Write the entire alphabet in a string a .
  2. For each character z of the input:
    1. Output the position of z in a .
    2. Remove z from a and put it back in the front.

This sequence means that characters that occur frequently in the input are relatively far to the front of the alphabet a during encoding . This in turn means that the output contains more small numbers than large ones, which in turn is useful for subsequently compressing the sequence of numbers, for example using Huffman coding .

Move-to-front decoding:

  1. Write the entire alphabet in a string a .
  2. For each number z of the input:
    1. Output the character at position z of a .
    2. Remove this symbol from a and put it back in front.

The decoding works almost exactly like the coding, except that the position at which the alphabet is changed is already known (the number from the input), while it has to be determined during the coding.

example

The string "Mississippi" is to be encoded with the MTF algorithm. The alphabet that is used is (for brevity) "ABCIMPSabcimps".

In the following table, characters are the input characters and the alphabet is the current alphabet. The output is the position of the character in the current alphabet (starting with 0), and Alphabet ' is the new alphabet created by moving the input character to the beginning.

character alphabet output Alphabet'
M. ABCIMPSabcimps 4th MABCIPSabcimps
i MABCIPSabcimps 10 iMABCIPSabcmps
s iMABCIPSabcmps 13 siMABCIPSabcmp
s siMABCIPSabcmp 0 siMABCIPSabcmp
i siMABCIPSabcmp 1 isMABCIPSabcmp
s isMABCIPSabcmp 1 siMABCIPSabcmp
s siMABCIPSabcmp 0 siMABCIPSabcmp
i siMABCIPSabcmp 1 isMABCIPSabcmp
p isMABCIPSabcmp 13 pisMABCIPSabcm
p pisMABCIPSabcm 0 pisMABCIPSabcm
i pisMABCIPSabcm 1 ipsMABCIPSabcm

The result of the coding is the text (4,10,13,0,1,1,0,1,13,0,1). If you omit the rearrangement of the alphabet, you get the text (4,10,13,13,10,13,13,10,12,12,10) for comparison. You can see that after a short "familiarization phase" (here 3 characters: 4,10,13), small numbers are output relatively often, which is good for subsequent compression.

To decode MTF again, you go the opposite way:

The sequence of numbers (4,10,13,0,1,1,0,1,13,0,1) should be decoded using the alphabet "ABCIMPSabcimps". In the following table, position is the number from the sequence of numbers to be decoded and character is the decoded character. The Alphabet and Alphabet columns are exactly the same as in the coding table above.

position alphabet character Alphabet'
4th ABCIMPSabcimps M. MABCIPSabcimps
10 MABCIPSabcimps i iMABCIPSabcmps
13 iMABCIPSabcmps s siMABCIPSabcmp
0 siMABCIPSabcmp s siMABCIPSabcmp
1 siMABCIPSabcmp i isMABCIPSabcmp
1 isMABCIPSabcmp s siMABCIPSabcmp
0 siMABCIPSabcmp s siMABCIPSabcmp
1 siMABCIPSabcmp i isMABCIPSabcmp
13 isMABCIPSabcmp p pisMABCIPSabcm
0 pisMABCIPSabcm p pisMABCIPSabcm
1 pisMABCIPSabcm i ipsMABCIPSabcm

The output of the decoding is "Mississippi" as expected.

literature

  • Pack like never before . In: c't magazine for computer technology . No. 16 . Verlag Heinz Heise, 2000, ISSN  0724-8679 , p. 194 .