Base32

from Wikipedia, the free encyclopedia

Base32 is a method for encoding binary data as a string that uses only 32 different ASCII characters (plus a 33rd character as padding at the end of the data). Unlike the related Base64 method, it is suitable for data formats that do not distinguish between uppercase and lowercase letters.

Basic principle

RFC 3548 describes the encoding of arbitrary binary data as follows: five bytes of 8 bits each (40 bits in total) are divided into eight 5-bit groups. Each group corresponds to a number between 0 and 31. These numbers are converted into printable ASCII characters using the conversion table below and then output. If the remaining data at the end does not fill a complete 40-bit block, the block is padded with zero bytes, and the 5-bit groups that consist only of filler bits are encoded as = to tell the decoder how many filler bits were added.
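The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `base32_encode` and the constant `RFC4648_ALPHABET` are chosen here for the example, and the alphabet is the one listed in the coding table of this article.

```python
# Alphabet from the RFC 3548 / RFC 4648 coding table (values 0..31).
RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def base32_encode(data: bytes, alphabet: str = RFC4648_ALPHABET) -> str:
    """Encode bytes block by block (illustrative sketch)."""
    out = []
    for i in range(0, len(data), 5):
        chunk = data[i:i + 5]
        # Pad the last chunk with zero bytes to a full 40-bit block.
        block = int.from_bytes(chunk.ljust(5, b"\x00"), "big")
        # Number of 5-bit groups that contain at least one real data bit.
        used = (len(chunk) * 8 + 4) // 5
        for group in range(8):
            if group < used:
                value = (block >> (35 - 5 * group)) & 0x1F
                out.append(alphabet[value])
            else:
                # Groups made up only of filler bits become '='.
                out.append("=")
    return "".join(out)

print(base32_encode(b"AB"))
```

Passing a different 32-character string as `alphabet` yields the other encodings discussed in this article, since they differ only in the conversion table.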

Coding table

While Base64 is used mainly in machine-to-machine communication, Base32-like encodings are often used where codes are read and typed by humans. Several encodings are in use; they aim to minimize the risk of confusing similar-looking characters by deliberately excluding characters considered ambiguous. For this reason, the mapping from Base32 values to characters is usually specified as a table.

Base32 according to RFC 3548 / RFC 4648

Value  Character   Value  Character   Value  Character   Value  Character
0      A           8      I           16     Q           24     Y
1      B           9      J           17     R           25     Z
2      C           10     K           18     S           26     2
3      D           11     L           19     T           27     3
4      E           12     M           20     U           28     4
5      F           13     N           21     V           29     5
6      G           14     O           22     W           30     6
7      H           15     P           23     X           31     7

The digits 0 and 1 are not used because they are easily confused with the letters O and I in writing.

Base32hex according to RFC 4648

Value  Character   Value  Character   Value  Character   Value  Character
0      0           8      8           16     G           24     O
1      1           9      9           17     H           25     P
2      2           10     A           18     I           26     Q
3      3           11     B           19     J           27     R
4      4           12     C           20     K           28     S
5      5           13     D           21     L           29     T
6      6           14     E           22     M           30     U
7      7           15     F           23     N           31     V

RFC 3548 has been superseded by RFC 4648, which introduces an additional encoding. Similar to the hexadecimal system, it uses the decimal digits for the values 0 to 9. The values 10 to 31 are represented by the letters A to V. As with hexadecimal numbers, the order of the encoded values is preserved under lexicographical sorting.

This encoding is used in DNSSEC, among other places.
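The sort-order property can be checked with a short Python sketch. It derives Base32hex from the standard library's `base64.b32encode` by translating the alphabet; the helper name `base32hex_encode` and the sample keys are assumptions made for this illustration, and the comparison is only meaningful for inputs of equal length.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
BASE32HEX = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
TO_HEX = str.maketrans(STANDARD, BASE32HEX)

def base32hex_encode(data: bytes) -> str:
    """Base32hex via alphabet translation; '=' padding is left unchanged."""
    return base64.b32encode(data).decode("ascii").translate(TO_HEX)

def base32_std_encode(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii")

# Hypothetical sample keys of equal length.
keys = [b"\xd0", b"\x00", b"\x48"]

# Sorting by the Base32hex text gives the same order as sorting the raw bytes.
print(sorted(keys, key=base32hex_encode) == sorted(keys))
# With the standard alphabet the digits 2-7 sort before the letters,
# so the order of the encoded text can differ from the byte order.
print(sorted(keys, key=base32_std_encode) == sorted(keys))
```

Python 3.10 and later also provide `base64.b32hexencode`; the translation table is used here only to keep the sketch independent of the interpreter version.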

Bech32 encoding of Bitcoin addresses

Bech32 coding table
Value  Character   Value  Character   Value  Character   Value  Character
0      q           8      g           16     s           24     c
1      p           9      f           17     3           25     e
2      z           10     2           18     j           26     6
3      r           11     t           19     n           27     m
4      y           12     v           20     5           28     u
5      9           13     d           21     4           29     a
6      x           14     w           22     k           30     7
7      8           15     0           23     h           31     l

Bitcoin addresses are usually given in an encoding called "Base58Check", which allows a relatively compact textual representation but has some disadvantages in practice:

  • The textual representation is compact, but quite inefficient when stored in a QR code.
  • Because both uppercase and lowercase letters are used, addresses are difficult to communicate, for example when passing them on orally.
  • The Base58 encoding is quite computationally intensive and requires 256-bit arithmetic.
  • The checksum was not chosen with carefully considered error-detection or error-correction properties in mind.

The "Bech32" format proposed in Bitcoin Improvement Proposal 0173 (BIP0173) aims to avoid these disadvantages:

  • Base32 is roughly 17% longer than Base58 for the same payload (log 58 / log 32 ≈ 1.17). If addresses are passed on via copy and paste, however, the slightly greater length does not matter.
  • Lowercase letters should be used in the textual representation. In a QR code, however, uppercase letters should be used, because this allows the more compact "alphanumeric mode", which encodes 2 characters in 11 bits.
  • Base32 can be implemented efficiently using 32-bit arithmetic.
  • The checksum algorithm was specifically selected for the desired error detection and correction properties.
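The QR code argument can be made concrete with a rough calculation. The sketch below counts only the data bits for the character payload (mode indicator, length field and error correction are ignored); the function names are chosen for this illustration, and the 6-bit cost of a single leftover character in alphanumeric mode as well as the 42-character example length are assumptions not stated in the text above.

```python
def qr_alphanumeric_bits(n_chars: int) -> int:
    """Payload bits in QR alphanumeric mode: 11 bits per character pair,
    plus 6 bits for one leftover character (assumption, see lead-in)."""
    pairs, rest = divmod(n_chars, 2)
    return pairs * 11 + rest * 6

def qr_byte_bits(n_chars: int) -> int:
    """Payload bits in QR byte mode: 8 bits per character."""
    return n_chars * 8

# Hypothetical address length used only for illustration.
length = 42
print(qr_alphanumeric_bits(length), qr_byte_bits(length))
```

Mixed-case text such as Base58Check cannot use alphanumeric mode, because that mode contains no lowercase letters; an all-uppercase Bech32 string can.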

Bech32 uses a special coding table in which the 5-bit values of visually similar (and therefore most easily confused) characters differ in as few bits as possible, ideally only 1 bit. The checksum algorithm benefits from this because it is best at detecting errors that affect only a small number of bits.
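One way to inspect this property is to compute the number of differing bits between the 5-bit values of two characters. The following sketch uses the Bech32 coding table from this article; the helper name `bit_distance` and the character pairs in the loop are illustrative picks, not a list taken from BIP0173.

```python
# Bech32 coding table as a string: the index of a character is its 5-bit value.
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def bit_distance(a: str, b: str) -> int:
    """Number of bit positions in which the 5-bit values of a and b differ."""
    value_a = BECH32_CHARSET.index(a)
    value_b = BECH32_CHARSET.index(b)
    return bin(value_a ^ value_b).count("1")

# Illustrative pairs of characters that may look alike (assumption).
for a, b in [("q", "p"), ("g", "q"), ("5", "s"), ("z", "2")]:
    print(a, b, bit_distance(a, b))
```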

More coding alphabets

In video games, passwords and level codes are often represented in a modified Base32 encoding. The alphabets used are not standardized. Often only digits and consonants are used so that no recognizable words are generated by accident.

ZRTP uses its own coding table called z-base-32, which has also been optimized to avoid misunderstandings when the code is read aloud (e.g. over the telephone).

Examples

Example coding for a byte with the value 0

Step                        Block 1   Block 2   Block 3   Block 4   Block 5   Block 6   Block 7   Block 8
Integer value               0         -         -         -         -         -         -         -
Represented as 8 bits       00000000  -         -         -         -         -         -         -
Divided into 5-bit groups   00000     000..     -         -         -         -         -         -
Padded with missing zeros   00000     00000     -         -         -         -         -         -
Integer value               0         0         -         -         -         -         -         -
Base32 encoding             A         A         =         =         =         =         =         =

Example coding for the string "AB" (corresponds to the values 65 and 66 in ASCII coding)

Step                        Block 1   Block 2   Block 3   Block 4   Block 5   Block 6   Block 7   Block 8
Integer values              65        66        -         -         -         -         -         -
Represented as 8 bits       01000001  01000010  -         -         -         -         -         -
Divided into 5-bit groups   01000     00101     00001     0....     -         -         -         -
Padded with missing zeros   01000     00101     00001     00000     -         -         -         -
Integer values              8         5         1         0         -         -         -         -
Base32 encoding             I         F         B         A         =         =         =         =
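Both examples can be reproduced with Python's standard library. The helper `five_bit_groups` is a name chosen for this sketch; it mirrors the "divided" and "padded" rows of the tables, while `base64.b32encode` produces the final encoding.

```python
import base64

def five_bit_groups(data: bytes) -> list:
    """Return the zero-padded 5-bit groups of the input as bit strings."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Append zero bits until the length is a multiple of 5.
    bits += "0" * (-len(bits) % 5)
    return [bits[i:i + 5] for i in range(0, len(bits), 5)]

for sample in (b"\x00", b"AB"):
    groups = five_bit_groups(sample)
    values = [int(group, 2) for group in groups]
    encoded = base64.b32encode(sample).decode("ascii")
    print(groups, values, encoded)
```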

References

  1. https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki
  2. philzimmermann.com