Base64

from Wikipedia, the free encyclopedia

Base64 is a method for encoding 8-bit binary data (such as executable programs, ZIP files, or images) as a string that consists only of readable, code-page-independent ASCII characters.

Base64 is used in the Internet standard Multipurpose Internet Mail Extensions (MIME), where it serves to send e-mail attachments. This is necessary to ensure the problem-free transport of arbitrary binary data, since SMTP in its original version was designed only for the transmission of 7-bit ASCII characters. The encoding increases the size of the data stream by 33–36% (33% from the encoding itself, up to a further 3% from the line breaks inserted into the encoded data stream). Base64 is also used, for example, to encode user names and passwords in HTTP basic authentication and to transfer SSH server certificates.

Coding procedure

The encoding uses the characters A–Z, a–z, 0–9, + and /, with = as an additional character at the end. Since these characters also appear in the Extended Binary Coded Decimal Interchange Code (EBCDIC), albeit at different code positions, loss-free data exchange between these platforms is ensured.

Encoding of Base64

For encoding, three bytes of the byte stream (= 24 bits) are divided into four 6-bit blocks. Each of these 6-bit blocks forms a number from 0 to 63. These numbers are converted into printable ASCII characters using the conversion table below and then output. This also explains the name of the algorithm: each character of the encoded data stream corresponds to a number from 0 to 63 (see table). From a mathematical point of view, this is a base-64 place-value system.
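The splitting step described above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic for one complete 3-byte group; the names ALPHABET and encode_group are chosen here for the example and do not come from any standard.

```python
import string

# Base64 alphabet in value order: A-Z (0-25), a-z (26-51), 0-9 (52-61), '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (illustrative helper)."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Join the three bytes into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Cut the 24 bits into four 6-bit values, most significant block first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Look each 6-bit value up in the alphabet.
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Pol"))  # UG9s
```

The three bytes of "Pol" (50, 6F, 6C hex) yield the 6-bit values 20, 6, 61 and 44, which the table maps to U, G, 9 and s.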

Padding: If the total number of input bytes is not divisible by three, the input is padded at the end with padding bytes consisting of zero bits, so that the number of bytes becomes divisible by three. To tell the decoder how many padding bytes were added, the 6-bit blocks that consist entirely of padding bits are encoded as =. Consequently, zero, one, or two = characters can appear at the end of a Base64-encoded file. In other words, as many = characters are appended as padding bytes were added.
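A complete encoder including the padding rule might look like the following sketch. It is not meant as a reference implementation; b64encode_sketch is a name made up for this example, and the loop at the end merely compares it with Python's standard base64 module.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_sketch(data: bytes) -> str:
    """Encode bytes as Base64 with '=' padding (illustrative, unoptimized)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1 or 2 padding bytes are needed
        # Fill the last group with zero bytes so that it is 24 bits long.
        n = int.from_bytes(chunk + b"\x00" * pad, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Blocks made up entirely of padding bits are written as '='.
        if pad:
            chars[-pad:] = ["="] * pad
        out.append("".join(chars))
    return "".join(out)


for sample in (b"", b"a", b"ab", b"abc"):
    assert b64encode_sketch(sample) == base64.b64encode(sample).decode("ascii")
    print(sample, "->", b64encode_sketch(sample))
```

One input byte produces two = characters (YQ==), two input bytes produce one (YWI=), and three input bytes produce none (YWJj).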

Since the number of original bytes can always be clearly determined from the number of Base64 input characters, padding is not used in some contexts and protocols (deviating from the original Base64 definition).
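As a sketch of why the padding is redundant: the number of decoded bytes follows from the character count alone, so a decoder can restore the missing = characters itself. The helper names below are hypothetical; base64.b64decode is the standard-library decoder, which expects padded input.

```python
import base64


def original_length(num_chars: int) -> int:
    """Decoded byte count for an unpadded Base64 string of num_chars characters."""
    if num_chars % 4 == 1:
        # A single leftover character carries only 6 bits, less than one byte.
        raise ValueError("invalid unpadded Base64 length")
    return num_chars * 6 // 8


def decode_unpadded(text: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted."""
    if len(text) % 4 == 1:
        raise ValueError("invalid unpadded Base64 length")
    # Re-append the '=' characters that the sender left out.
    missing = -len(text) % 4
    return base64.b64decode(text + "=" * missing)


print(decode_unpadded("YQ"), original_length(2))   # b'a' 1
print(decode_unpadded("YWI"), original_length(3))  # b'ab' 2
```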

If an input of $n$ bytes is to be encoded, the space required for the Base64-encoded content (without line breaks) is $4 \cdot \lceil n/3 \rceil$ characters. (The brackets around the fraction denote integer division with rounding up.)
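The length formula can be checked with a short sketch. The function name encoded_length is an assumption made for this example; the rounding-up division is written as (n + 2) // 3 so that only integer arithmetic is needed.

```python
import base64


def encoded_length(n: int) -> int:
    """Number of Base64 characters (with padding, without line breaks) for n bytes."""
    return 4 * ((n + 2) // 3)  # (n + 2) // 3 equals n / 3 rounded up


# Compare the formula with the standard-library encoder for a few sizes.
for n in (0, 1, 2, 3, 4, 68, 1000):
    assert encoded_length(n) == len(base64.b64encode(bytes(n)))

print(encoded_length(68))  # 92
```

For large inputs the ratio of output to input approaches 4/3, which is the 33% overhead mentioned in the introduction.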

Very long Base64 strings are often wrapped for display, i.e. a line break is inserted at regular intervals (for example after every 64 characters). Such line breaks are irrelevant for decoding and are ignored.
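The following sketch wraps an encoded string after every 64 characters and decodes it again. The helper wrap_base64 is hypothetical; the decoding step relies on the behaviour of Python's base64.b64decode, which in its default (non-validating) mode discards characters outside the alphabet, including line breaks.

```python
import base64


def wrap_base64(data: bytes, width: int = 64) -> str:
    """Encode data and insert a line break after every `width` characters."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\n".join(lines)


payload = bytes(range(256))
wrapped = wrap_base64(payload)

# The line breaks do not affect the decoded result.
assert base64.b64decode(wrapped) == payload
print(wrapped.count("\n") + 1, "lines")  # 6 lines
```

The standard library's base64.encodebytes performs a similar wrapping with a fixed line length of 76 characters, the limit used by MIME.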

Base64 character set

Dec  Binary  Hex  Char   Dec  Binary  Hex  Char   Dec  Binary  Hex  Char   Dec  Binary  Hex  Char
  0  000000  00   A       16  010000  10   Q       32  100000  20   g       48  110000  30   w
  1  000001  01   B       17  010001  11   R       33  100001  21   h       49  110001  31   x
  2  000010  02   C       18  010010  12   S       34  100010  22   i       50  110010  32   y
  3  000011  03   D       19  010011  13   T       35  100011  23   j       51  110011  33   z
  4  000100  04   E       20  010100  14   U       36  100100  24   k       52  110100  34   0
  5  000101  05   F       21  010101  15   V       37  100101  25   l       53  110101  35   1
  6  000110  06   G       22  010110  16   W       38  100110  26   m       54  110110  36   2
  7  000111  07   H       23  010111  17   X       39  100111  27   n       55  110111  37   3
  8  001000  08   I       24  011000  18   Y       40  101000  28   o       56  111000  38   4
  9  001001  09   J       25  011001  19   Z       41  101001  29   p       57  111001  39   5
 10  001010  0A   K       26  011010  1A   a       42  101010  2A   q       58  111010  3A   6
 11  001011  0B   L       27  011011  1B   b       43  101011  2B   r       59  111011  3B   7
 12  001100  0C   M       28  011100  1C   c       44  101100  2C   s       60  111100  3C   8
 13  001101  0D   N       29  011101  1D   d       45  101101  2D   t       61  111101  3D   9
 14  001110  0E   O       30  011110  1E   e       46  101110  2E   u       62  111110  3E   +
 15  001111  0F   P       31  011111  1F   f       47  101111  2F   v       63  111111  3F   /

The characters +, / and = cannot be used in file names or URLs, because they are reserved for special functions there. For such cases, “base64url” defines a modification that is incompatible with standard Base64: the characters + and / are replaced by - (minus, ASCII 2D hex) and _ (underscore, ASCII 5F hex). The padding character = at the end is percent-encoded as %3d, but can be omitted if the length of the string is known.
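The difference between the two alphabets can be shown with Python's standard base64 module, which offers urlsafe_b64encode and urlsafe_b64decode for the modified character set. The wrapper functions b64url_encode and b64url_decode below are illustrative names, and dropping the padding is one possible convention, not a requirement.

```python
import base64
from urllib.parse import quote

raw = b"\xfb\xff\xfe"  # chosen so that the 6-bit values are 62, 63, 63, 62

print(base64.b64encode(raw).decode("ascii"))          # +//+
print(base64.urlsafe_b64encode(raw).decode("ascii"))  # -__-


def b64url_encode(data: bytes) -> str:
    """base64url without trailing '=' (assumes the receiver restores the padding)."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, re-adding any omitted '=' characters first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


print(b64url_encode(b"a"))                 # YQ
print(b64url_decode(b64url_encode(b"a")))  # b'a'

# If the padding is kept instead, it has to be percent-encoded inside a URL.
print(quote("YQ==", safe=""))              # YQ%3D%3D
```

Percent-encoding is case-insensitive in the hexadecimal digits, so %3D and %3d denote the same character.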

Example

Polyfon zwitschernd aßen Mäxchens Vögel Rüben, Joghurt und Quark

Encoded in UTF-8, this 64-character text is 68 bytes long, since the Eszett (ß) and each of the umlauts occupy two bytes. Conversion to Base64 turns it into a string of 92 characters (shown here with a line break after 64 characters):

UG9seWZvbiB6d2l0c2NoZXJuZCBhw59lbiBNw6R4Y2hlbnMgVsO2Z2VsIFLDvGJl
biwgSm9naHVydCB1bmQgUXVhcms=
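The figures of this example can be reproduced with the standard base64 module, as in the sketch below. The call to unicodedata.normalize is a precaution added here: it makes sure that the umlauts are stored as single precomposed characters, which is what the byte count of 68 assumes.

```python
import base64
import unicodedata

text = unicodedata.normalize(
    "NFC", "Polyfon zwitschernd aßen Mäxchens Vögel Rüben, Joghurt und Quark"
)
raw = text.encode("utf-8")
encoded = base64.b64encode(raw).decode("ascii")

print(len(text), len(raw), len(encoded))  # 64 68 92
print(encoded[:64])
print(encoded[64:])

# Decoding restores the original text exactly.
assert base64.b64decode(encoded).decode("utf-8") == text
```

Since 68 is not divisible by three (68 = 22 · 3 + 2), one padding byte is added and the encoded string ends with a single = character.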

As the example shows, Base64 output is not readable at a glance. This must not be mistaken for effective encryption, however: no key is involved and the information content of the data is unchanged, so anyone who recognizes the string as Base64 can easily reverse the encoding.

Radix-64

The OpenPGP data format defines a variant of Base64 called ASCII Armor. It consists of standardized header and footer lines, which mark the beginning and the end of the data and also tell the human reader what type of data is encoded and which program generated it.

A checksum (CRC-24) is appended to the Base64-encoded data; this slightly modified process is called Radix-64.

-----BEGIN PGP MESSAGE-----
Version: GnuPG v1.4.10 (GNU/Linux)

jA0EAwMCxamDRMfOGV5gyZPnyX1BBPOQAE4BHbh7PfTDInn+94hXmnBr9D8+4x5R
kNNl4E499Me3Fotq8/zvznEycz2h7vJ21SdP5akLhRPd4W1S79LoCvbZYh2x4t6x
Cnqev6S97ys4chOPgz0FePfKQos0I7+rrMSAc9+vXHmUCthFqp7FJJ7/D9bCfmdF
1qkYNhtk/P5uvZ0N2zAUsiScDJA=
=XXuR
-----END PGP MESSAGE-----

The Base64 part in this example begins with jA0E… and ends with …DJA=. It is followed by a line break, an equals sign, and the Base64-encoded CRC-24 checksum of the original message (that is, of the data before Base64 encoding).
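How such a checksum line could be produced is sketched below. The initial value B704CE hex and the generator polynomial 1864CFB hex are not stated in this article; they are quoted here as the CRC-24 parameters given in RFC 4880 and should be checked against that document before being relied on. The function names are chosen for this example.

```python
import base64

# CRC-24 parameters as specified for OpenPGP in RFC 4880 (assumption, see above).
CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB


def crc24(data: bytes) -> int:
    """Bitwise CRC-24 over the raw (not yet Base64-encoded) data."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:  # bit 24 set: reduce by the polynomial
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF


def armor_checksum_line(data: bytes) -> str:
    """'=' followed by the Base64 encoding of the three checksum bytes."""
    checksum = crc24(data).to_bytes(3, "big")
    return "=" + base64.b64encode(checksum).decode("ascii")


print(hex(crc24(b"123456789")))
print(armor_checksum_line(b"123456789"))
```

Because the checksum is always three bytes long, its Base64 form is always exactly four characters without padding, which matches the shape of the line =XXuR in the example above.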

Norms and standards

  • J. Linn: RFC 1421. Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures. February 1993 (replaces RFC 1113; historic).
  • N. Borenstein, N. Freed: RFC 1521. MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies. September 1993. Draft Standard (replaces RFC 1341; updated by RFC 1590; obsolete).
  • S. Josefsson: RFC 3548. The Base16, Base32, and Base64 Data Encodings. July 2003 (obsolete; superseded by RFC 4648).
  • S. Josefsson: RFC 4648. The Base16, Base32, and Base64 Data Encodings. October 2006. Proposed Standard (replaces RFC 3548).
  • J. Callas, L. Donnerhacke, H. Finney, D. Shaw, R. Thayer: RFC 4880. OpenPGP Message Format. November 2007. Proposed Standard (updated by RFC 5581; replaces RFC 1991 and RFC 2440).

References

  1. S. Josefsson: RFC 4648. The Base16, Base32, and Base64 Data Encodings. October 2006. Proposed Standard.