8-bit clean
8-bit clean describes a system that correctly handles all 8 bits of a byte, rather than assuming that only the lower 7 bits carry data.
Early character encodings such as ASCII use only 7 bits per character, which leaves the eighth bit free for parity checking or other purposes.
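As an illustration of the eighth bit carrying parity, the following minimal sketch stores an even-parity bit in bit 7 of a 7-bit ASCII code. The function names are hypothetical, and even parity is assumed here; real links could equally use odd parity or leave the bit unused.

```python
def add_even_parity(code: int) -> int:
    """Return the 7-bit code with an even-parity bit placed in bit 7."""
    if not 0 <= code < 0x80:
        raise ValueError("expected a 7-bit value")
    ones = bin(code).count("1")
    parity = ones % 2  # 1 if the number of set bits is odd
    return code | (parity << 7)


def has_even_parity(byte: int) -> bool:
    """True if the full 8-bit value contains an even number of set bits."""
    return bin(byte).count("1") % 2 == 0


def strip_parity(byte: int) -> int:
    """Recover the 7-bit character code by clearing bit 7."""
    return byte & 0x7F


sent = add_even_parity(ord("C"))      # 'C' has three set bits, so bit 7 is set
flipped = sent ^ 0x01                 # simulate a single-bit transmission error
print(hex(sent), has_even_parity(sent), has_even_parity(flipped))
```

A receiver that treats such a byte as an 8-bit character without first clearing bit 7 would see a different value than the one intended, which is one reason parity and 8-bit character sets do not mix.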
Later code pages such as CP437 and CP850, as well as the ISO-8859 series and UTF-8, are based on ASCII. A byte whose eighth bit is 0 represents the same ASCII character in each of these encodings, while a byte whose eighth bit is 1 has an encoding-specific meaning (umlaut, graphic symbol, part of a multibyte sequence, etc.).
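The difference between the two cases can be checked with a short sketch that decodes the same byte under several encodings. The helper name and the choice of sample bytes are assumptions made for illustration; the codec names are those of Python's standard library.

```python
CODECS = ["ascii", "cp437", "cp850", "iso-8859-1", "utf-8"]


def decode_everywhere(data: bytes) -> dict:
    """Decode data with each codec; None marks bytes the codec rejects."""
    results = {}
    for codec in CODECS:
        try:
            results[codec] = data.decode(codec)
        except UnicodeDecodeError:
            results[codec] = None
    return results


# Eighth bit clear: every ASCII-based encoding agrees on the character.
plain = decode_everywhere(b"A")

# Eighth bit set: the meaning depends on the encoding. In ISO-8859-1 the
# byte 0xE4 is an umlaut; in UTF-8 it only starts a multibyte sequence
# and is invalid on its own; plain ASCII does not define it at all.
high = decode_everywhere(b"\xe4")

print(plain)
print(high)
```

Running this shows one shared result for the first byte and several different results for the second, including different characters in CP437 and CP850.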
Conversely, text written in an encoding that uses the eighth bit, as well as binary data, must be converted before it is processed by a 7-bit system. Otherwise, special characters with the eighth bit set would be misinterpreted and binary data would be corrupted. Such conversion is common in mail systems (SMTP, MIME, uuencode).
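A rough sketch of why this conversion matters: the hypothetical function seven_bit_channel below models a system that is not 8-bit clean by clearing the eighth bit of every byte. Base64, one of the transfer encodings used by MIME, is chosen as the conversion; real 7-bit systems may mangle data in other ways than this simple model.

```python
import base64


def seven_bit_channel(data: bytes) -> bytes:
    """Hypothetical model of a non-clean system: clears bit 7 of each byte."""
    return bytes(b & 0x7F for b in data)


original = b"Gr\xfc\xdfe"  # "Grüße" in ISO-8859-1; two bytes have bit 7 set

# Sent as-is, the two special characters are silently turned into others.
damaged = seven_bit_channel(original)

# Converted first, the data uses only 7-bit characters and survives.
encoded = base64.b64encode(original)
received = seven_bit_channel(encoded)
restored = base64.b64decode(received)

print(damaged)
print(encoded, restored == original)
```

The cost of the conversion is size: Base64 represents every 3 input bytes as 4 output characters.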
It has become common practice to set the eighth bit to zero from the outset in ASCII-coded data, that is, to make sure that the data is 8-bit clean. An explicit conversion into the code pages mentioned is thus unnecessary. The loss of parity information is generally not considered critical today, because error-prone data transmissions are now protected by packet checksums.
Common applications and operating systems have been 8-bit clean since the 1990s, while SMTP continues to process 7-bit data for reasons of backward compatibility.
