Base85

from Wikipedia, the free encyclopedia

Base85 is a collective name for several mutually incompatible encoding methods that convert 8-bit binary data into a sequence of printable ASCII characters. What they have in common is that each block of four bytes is encoded as five ASCII characters. This requires an alphabet of at least 85 characters, which gave the method its name. The advantage is a slightly lower encoding overhead of 25%, compared with 33% for the standardized Base64 encoding.

This encoding is most widespread in Adobe's PostScript file format, where this variant is also known as Ascii85.

Basic idea

Four bytes can take on $256^4 = 4{,}294{,}967{,}296$ different values. To encode them with as little overhead as possible, a suitable subset of the printable ASCII characters is chosen that makes five characters sufficient. This requires an alphabet of at least 85 characters, since $85^5 = 4{,}437{,}053{,}125 \ge 4{,}294{,}967{,}296$. (84 characters are not enough, because $84^5 = 4{,}182{,}119{,}424 < 4{,}294{,}967{,}296$.)
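The counting argument above can be checked with a few lines of Python. This sketch searches for the smallest alphabet size k with k^5 ≥ 256^4 and also reproduces the overhead figures from the introduction; `min_alphabet_size` is a hypothetical helper written only for this illustration.

```python
def min_alphabet_size(in_bytes: int = 4, out_chars: int = 5) -> int:
    """Smallest alphabet size k such that out_chars symbols can represent
    every possible value of in_bytes bytes, i.e. k ** out_chars >= 256 ** in_bytes."""
    states = 256 ** in_bytes
    k = 2
    while k ** out_chars < states:
        k += 1
    return k


print(256 ** 4)                 # 4294967296 possible 4-byte values
print(84 ** 5, 85 ** 5)         # 4182119424 4437053125
print(min_alphabet_size())      # 85
print(min_alphabet_size(3, 4))  # 64 (the Base64 case: 3 bytes -> 4 characters)

# Overhead = output characters per input byte, minus 1
print(f"Base85: {5 / 4 - 1:.0%}")  # Base85: 25%
print(f"Base64: {4 / 3 - 1:.0%}")  # Base64: 33%
```

The same search with 3 bytes and 4 characters yields 64, which is why Base64 gets by with a smaller alphabet at the cost of higher overhead.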

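If the four bytes are denoted $b_3, b_2, b_1, b_0$ (most significant first) and the five encoded digits $c_4, c_3, c_2, c_1, c_0$, the following conversion formula results:

$$b_3 \cdot 256^3 + b_2 \cdot 256^2 + b_1 \cdot 256 + b_0 = c_4 \cdot 85^4 + c_3 \cdot 85^3 + c_2 \cdot 85^2 + c_1 \cdot 85 + c_0$$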

In other words, the four bytes are interpreted as a four-digit number in base 256 and converted into a five-digit number in base 85.
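The base conversion can be sketched in Python as follows. This is an illustration under the assumption that the first byte of a block is the most significant one (big-endian order); the function names `block_to_digits` and `digits_to_block` are hypothetical. The digits are still plain numbers from 0 to 84, because mapping them to characters differs between the variants.

```python
def block_to_digits(block: bytes) -> list[int]:
    """Convert a 4-byte block into five base-85 digits, most significant first."""
    if len(block) != 4:
        raise ValueError("expected exactly 4 bytes")
    # Read the bytes as one base-256 number, first byte most significant.
    value = int.from_bytes(block, "big")
    digits = []
    for _ in range(5):
        value, remainder = divmod(value, 85)
        digits.append(remainder)  # collected least significant digit first
    return digits[::-1]           # reverse: most significant digit first


def digits_to_block(digits: list[int]) -> bytes:
    """Inverse of block_to_digits: five base-85 digits back to 4 bytes."""
    value = 0
    for d in digits:
        if not 0 <= d < 85:
            raise ValueError("digit out of range 0..84")
        value = value * 85 + d
    # 85**5 exceeds 2**32, so some digit combinations do not fit into 4 bytes.
    if value > 0xFFFFFFFF:
        raise ValueError("digits do not encode a 32-bit value")
    return value.to_bytes(4, "big")


print(block_to_digits(b"\xff\xff\xff\xff"))  # [82, 23, 54, 12, 0]
```

Because $85^5 > 256^4$, the decoding direction has to reject digit sequences whose value exceeds $2^{32} - 1$.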

Each of these five digits, a value from 0 to 84, is then represented by a printable ASCII character; which characters are used depends on the variant.

PostScript

The Base85 encoding in PostScript adds 33 to each of the five base-85 digits and thus uses the ASCII codes 33 to 117, which correspond to the characters ! to u. The only exception: four consecutive zero bytes are encoded not as !!!!! but as a single z. Depending on the data, this simple form of compression reduces or even offsets the encoding overhead of Base85, especially since longer runs of zero bytes occur quite frequently, for example in raster graphics embedded in PostScript. When encoding, spaces and line breaks may be inserted anywhere, for example to keep lines below a certain maximum length; they are ignored during decoding. Any other character outside the alphabet is an error and stops decoding.
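A minimal Python sketch of this variant is shown below. It is an illustration, not a reference implementation: the `<~ ~>` delimiters used in PostScript and PDF streams are left out, and the set of skipped whitespace characters in `ASCII85_WHITESPACE` is an assumption. The text above only covers complete blocks; for a shorter final block the sketch follows Adobe's convention of padding with zero bytes and writing only n + 1 characters (and, when decoding, padding with u and keeping n - 1 bytes). All function names are hypothetical.

```python
# Characters skipped while decoding (assumption: ASCII space, tab, CR, LF, FF).
ASCII85_WHITESPACE = " \t\r\n\f"


def ascii85_encode(data: bytes) -> str:
    """Ascii85 encoding as in PostScript, without the <~ ~> delimiters."""
    out = []
    for i in range(0, len(data), 4):
        block = data[i:i + 4]
        n = len(block)
        if block == b"\0\0\0\0":
            out.append("z")  # four zero bytes become a single 'z'
            continue
        # A short final block is padded with zero bytes (Adobe convention) ...
        value = int.from_bytes(block.ljust(4, b"\0"), "big")
        chars = []
        for _ in range(5):
            value, remainder = divmod(value, 85)
            chars.append(chr(remainder + 33))  # digit 0..84 -> '!'..'u'
        # ... and only n + 1 of its five characters are written.
        out.append("".join(reversed(chars))[:n + 1])
    return "".join(out)


def _group_to_bytes(digits: list[int]) -> bytes:
    """Five base-85 digit values -> 4 bytes (big-endian)."""
    value = 0
    for d in digits:
        value = value * 85 + d
    if value > 0xFFFFFFFF:
        raise ValueError("group value exceeds 32 bits")
    return value.to_bytes(4, "big")


def ascii85_decode(text: str) -> bytes:
    """Decode Ascii85 text; whitespace is ignored, other stray characters raise."""
    out = bytearray()
    group: list[int] = []  # digit values of the group being collected
    for ch in text:
        if ch in ASCII85_WHITESPACE:
            continue  # spaces and line breaks carry no data
        if ch == "z" and not group:
            out += b"\0\0\0\0"
            continue
        if not "!" <= ch <= "u":  # also rejects a 'z' in the middle of a group
            raise ValueError(f"invalid Ascii85 character {ch!r}")
        group.append(ord(ch) - 33)
        if len(group) == 5:
            out += _group_to_bytes(group)
            group = []
    if group:
        if len(group) == 1:
            raise ValueError("a final group needs at least 2 characters")
        # Pad a short final group with 'u' (84) and keep len(group) - 1 bytes.
        n = len(group)
        out += _group_to_bytes(group + [84] * (5 - n))[:n - 1]
    return bytes(out)
```

In practice, Python's standard library already provides `base64.a85encode` and `base64.a85decode` for this variant (with options such as `adobe=True` for the delimiters); the hand-written version only makes the individual steps visible.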

IPv6 address coding according to RFC 1924

A slightly different encoding for IPv6 addresses was proposed in RFC 1924, which was published on 1 April 1996 as an April Fools' joke. The 128-bit IPv6 address is not split into four blocks of 32 bits each but treated as a single 128-bit number. This number is repeatedly divided by 85, and the remainders are the "digits" of the Base85 encoding.

Each IPv6 address can therefore be encoded as 20 digits in the range 0 to 84, since $85^{20} > 2^{128}$. These digits are mapped to ASCII characters via a lookup table, because the aim was to avoid certain ASCII characters that “could be problematic in certain environments”. The lookup table is as follows:

Value Char   Value Char   Value Char   Value Char   Value Char
    0 0         17 H         34 Y         51 p         68 )
    1 1         18 I         35 Z         52 q         69 *
    2 2         19 J         36 a         53 r         70 +
    3 3         20 K         37 b         54 s         71 -
    4 4         21 L         38 c         55 t         72 ;
    5 5         22 M         39 d         56 u         73 <
    6 6         23 N         40 e         57 v         74 =
    7 7         24 O         41 f         58 w         75 >
    8 8         25 P         42 g         59 x         76 ?
    9 9         26 Q         43 h         60 y         77 @
   10 A         27 R         44 i         61 z         78 ^
   11 B         28 S         45 j         62 !         79 _
   12 C         29 T         46 k         63 #         80 `
   13 D         30 U         47 l         64 $         81 {
   14 E         31 V         48 m         65 %         82 |
   15 F         32 W         49 n         66 &         83 }
   16 G         33 X         50 o         67 (         84 ~

The ASCII characters " ' , . / : [ \ ] as well as the space and the 33 control characters are not used.
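The RFC 1924 scheme can be sketched in Python as below. The alphabet string is transcribed from the lookup table above; `rfc1924_encode` and `rfc1924_decode` are hypothetical names, and the standard `ipaddress` module is used only to convert between the textual address and its 128-bit integer value. Unlike the block-wise variants, the whole address is converted as one number.

```python
import ipaddress

# Digit values 0..84 in order, transcribed from the RFC 1924 table.
RFC1924_ALPHABET = ("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                    "abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")


def rfc1924_encode(address: str) -> str:
    """Encode an IPv6 address as 20 base-85 characters."""
    value = int(ipaddress.IPv6Address(address))  # the address as a 128-bit number
    chars = []
    for _ in range(20):
        value, remainder = divmod(value, 85)     # remainders are the digits
        chars.append(RFC1924_ALPHABET[remainder])
    return "".join(reversed(chars))              # most significant digit first


def rfc1924_decode(text: str) -> str:
    """Decode 20 base-85 characters back to a (compressed) IPv6 address string."""
    if len(text) != 20:
        raise ValueError("expected exactly 20 characters")
    value = 0
    for ch in text:
        value = value * 85 + RFC1924_ALPHABET.index(ch)  # ValueError if ch is not in the table
    # IPv6Address raises AddressValueError (a ValueError) if value >= 2**128.
    return str(ipaddress.IPv6Address(value))
```

For example, `rfc1924_encode("1080:0:0:0:8:800:200C:417A")` yields a 20-character string that `rfc1924_decode` turns back into `1080::8:800:200c:417a`. Python's `base64.b85encode` uses the same character set, but applies it block-wise to 4-byte groups instead of to the whole 128-bit number, so its output differs.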

Z85

Since the Ascii85 encoding used in PostScript and PDF includes characters that cannot be used freely in XML, JSON, or string literals in many programming languages, another encoding format called Z85 was developed for ZeroMQ. It uses the encoding table below and encodes binary data only in complete 4-byte blocks. If the length of the binary data is not a multiple of 4, the application must apply its own padding.

The following printable ASCII characters are not used: " ' , ; \ _ ` | ~

However, Z85 does use the characters &, < and >, which serve as entity and tag delimiters in HTML/XML and therefore cannot be used unescaped in HTML/XML source text.

Value Char   Value Char   Value Char   Value Char   Value Char
    0 0         17 h         34 y         51 P         68 !
    1 1         18 i         35 z         52 Q         69 /
    2 2         19 j         36 A         53 R         70 *
    3 3         20 k         37 B         54 S         71 ?
    4 4         21 l         38 C         55 T         72 &
    5 5         22 m         39 D         56 U         73 <
    6 6         23 n         40 E         57 V         74 >
    7 7         24 o         41 F         58 W         75 (
    8 8         25 p         42 G         59 X         76 )
    9 9         26 q         43 H         60 Y         77 [
   10 a         27 r         44 I         61 Z         78 ]
   11 b         28 s         45 J         62 .         79 {
   12 c         29 t         46 K         63 -         80 }
   13 d         30 u         47 L         64 :         81 @
   14 e         31 v         48 M         65 +         82 %
   15 f         32 w         49 N         66 =         83 $
   16 g         33 x         50 O         67 ^         84 #

Other uses

Despite its slightly lower overhead, Base85 has not caught on outside a few specialized areas. An even more efficient method, Base91, now exists. For the ASCII encoding of binary data in e-mail and Usenet articles, only Base64 encoding as specified by the MIME standard is provided for.
