Token-based compression

From Wikipedia, the free encyclopedia

Token-based compression is a method for saving storage space in data processing. The pages of a document are represented as a collection of symbols (tokens) that occur in the document, together with position information indicating where each symbol should appear. Each symbol is an image of a part of the document, such as a letter, a word, or a graphic.

When the same character occurs several times in the document, its image is stored only once. Each page of the document then records which symbols appear on it and at which positions.
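The idea of storing each symbol once and referring to it by position can be sketched as follows. This is a minimal illustration, not a description of any particular file format: symbols are plain strings standing in for symbol images, a position is simply an index on the page, and the names `encode_pages` and `decode_pages` are hypothetical.

```python
from typing import Dict, List, Tuple

# (symbol id, position on the page); both are illustrative simplifications
Placement = Tuple[int, int]


def encode_pages(pages: List[List[str]]) -> Tuple[List[str], List[List[Placement]]]:
    """Store each distinct symbol once; pages keep only (id, position) pairs."""
    symbols: List[str] = []      # shared symbol table for the whole document
    index: Dict[str, int] = {}   # symbol -> id in the table
    encoded: List[List[Placement]] = []
    for page in pages:
        placements: List[Placement] = []
        for pos, glyph in enumerate(page):
            if glyph not in index:
                index[glyph] = len(symbols)
                symbols.append(glyph)
            placements.append((index[glyph], pos))
        encoded.append(placements)
    return symbols, encoded


def decode_pages(symbols: List[str], encoded: List[List[Placement]]) -> List[List[str]]:
    """Rebuild every page by placing the referenced symbols at their positions."""
    pages: List[List[str]] = []
    for placements in encoded:
        page = [""] * len(placements)
        for sym_id, pos in placements:
            page[pos] = symbols[sym_id]
        pages.append(page)
    return pages
```

For a two-page document such as `[list("hello"), list("hole")]`, the symbol table holds only the four distinct letters, and the second page adds no new symbols at all.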

In text, frequently recurring keywords are likewise replaced by short abbreviations, the tokens.

The compression ratio achieved with this method is high when the text to be encoded contains many repetitions. Token-based compression is unsuitable for input with few or no repetitions.

 Source text: Print "Hallo"; Print "Hier"
Encoded text: 3F "Hallo"; 3F "Hier"
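The keyword replacement in the example above can be sketched in a few lines. This is only an illustration under stated assumptions: the table contains just the one mapping taken from the example (`Print` to `3F`), tokens are written out as text rather than as single bytes, and the names `KEYWORD_TOKENS`, `tokenize` and `detokenize` are hypothetical. Quoted strings are left untouched so that a keyword inside a string is not replaced.

```python
import re

# Only the mapping shown in the example; a real table would list more keywords.
KEYWORD_TOKENS = {"Print": "3F"}
TOKEN_KEYWORDS = {token: word for word, token in KEYWORD_TOKENS.items()}

# Match a quoted string first (kept as is), otherwise a run of letters/digits.
_WORD_OR_STRING = re.compile(r'"[^"]*"|[0-9A-Za-z]+')


def _substitute(text: str, table: dict) -> str:
    """Replace every word found in `table`; strings and other words are kept."""
    return _WORD_OR_STRING.sub(lambda m: table.get(m.group(0), m.group(0)), text)


def tokenize(source: str) -> str:
    """Replace keywords by their tokens."""
    return _substitute(source, KEYWORD_TOKENS)


def detokenize(encoded: str) -> str:
    """Expand tokens back into keywords."""
    return _substitute(encoded, TOKEN_KEYWORDS)
```

Calling `tokenize('Print "Hallo"; Print "Hier"')` reproduces the encoded line of the example, and `detokenize` restores the source text. The saving grows with the number of keyword occurrences, which matches the remark that the method pays off only for input with many repetitions.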

See also