ISO/IEC 2022


ISO/IEC 2022, Information technology - Character code structure and extension techniques, is an ISO standard that defines a technique for combining multiple character sets in one encoding, including those of languages whose characters cannot be represented in 7 bits.

The standard was intended to solve the problem of different, mutually incompatible character encodings and to make it possible to encode East Asian writing systems. A string encoded in ISO 2022 can be transported through 7-bit channels without loss, which allows the encoding to be used in mail and Usenet traffic. Escape sequences, mostly three or four bytes long, switch between the available character sets. Depending on its definition, the character set selected by an escape sequence can hold 94 characters, 8,836 characters (a 94 × 94 matrix), or 830,584 characters (a three-dimensional 94 × 94 × 94 matrix).
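The three capacities quoted above follow directly from the number of bytes used per character. The following minimal sketch reproduces the arithmetic; it assumes, as is usual for 94-character sets but not stated in the text above, that each byte takes one of the printable 7-bit positions 0x21 to 0x7E. The name `capacity` is chosen for illustration only.

```python
# Printable 7-bit positions assumed for a 94-character set: 0x21 through 0x7E.
POSITIONS = range(0x21, 0x7F)


def capacity(bytes_per_char: int) -> int:
    """Number of characters addressable with the given bytes per character."""
    return len(POSITIONS) ** bytes_per_char


for n in (1, 2, 3):
    print(n, capacity(n))
# prints:
# 1 94
# 2 8836
# 3 830584
```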

However, ISO/IEC 2022 became established only in East Asian mail traffic; no version was published for Western languages. Instead, Unicode was designed to accomplish this task.

There are three versions of ISO/IEC 2022 for the three East Asian scripts: ISO-2022-JP, ISO-2022-KR, and ISO-2022-CN.

ISO-2022-JP

ISO-2022-JP encodes the Japanese script. It is often used in mail traffic; elsewhere, Shift-JIS or EUC-JP is used instead.

The original version is described in RFC 1468 and contains the following four escape sequences:
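As an illustration, the sketch below encodes a short mixed string with Python's built-in `iso2022_jp` codec and reports which character sets the output switches to. The lookup table holds the four escape sequences given in RFC 1468; the helper `find_designations` is hypothetical and assumes that every escape sequence in the data is exactly three bytes long, which holds for the original ISO-2022-JP but not for its extensions.

```python
ESC = b"\x1b"

# The four escape sequences of the original ISO-2022-JP (per RFC 1468).
DESIGNATIONS = {
    ESC + b"(B": "ASCII",
    ESC + b"(J": "JIS X 0201-1976 (Roman)",
    ESC + b"$@": "JIS X 0208-1978",
    ESC + b"$B": "JIS X 0208-1983",
}


def find_designations(data: bytes) -> list[str]:
    """Return the character sets switched to, in order of appearance."""
    found = []
    i = 0
    while i < len(data):
        if data[i:i + 1] == ESC:
            # Assumption: every escape sequence here is three bytes long.
            found.append(DESIGNATIONS.get(data[i:i + 3], "unknown"))
            i += 3
        else:
            i += 1
    return found


encoded = "abc 日本語".encode("iso2022_jp")
print(encoded)
print(find_designations(encoded))
print(all(byte < 0x80 for byte in encoded))  # every byte fits in 7 bits
```

The encoder switches to a two-byte Japanese set before the kanji and returns to ASCII at the end of the string, so the result stays within the 7-bit range throughout.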

ISO-2022-JP-1 is described in RFC 2237 and adds another escape sequence:

ISO-2022-JP-2 is described in RFC 1554 and adds further escape sequences to support additional languages. It extends ISO-2022-JP-1 by the following escape sequences:
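The effect of the additional escape sequences can be observed with Python's built-in codecs. The sketch below assumes the standard codec names `iso2022_jp` and `iso2022_jp_2`; `can_encode` is a hypothetical helper. Korean text is rejected by the original encoding but accepted by ISO-2022-JP-2, which RFC 1554 extends with, among others, the escape sequence ESC $ ( C for the Korean character set KS C 5601.

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether the given codec can represent the text."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


korean = "한국어"
print(can_encode(korean, "iso2022_jp"))    # original version
print(can_encode(korean, "iso2022_jp_2"))  # multilingual extension

encoded = korean.encode("iso2022_jp_2")
print(encoded)
```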

ISO-2022-JP-3 extends the original version with the following escape sequences:

ISO-2022-JP-2004 extends ISO-2022-JP-3 by the following escape sequence:

ISO-2022-KR

ISO-2022-KR encodes the Korean script and is used alongside EUC-KR on Korean websites. It only contains a single escape sequence:
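A short sketch with Python's built-in `iso2022_kr` codec illustrates this. It assumes the definition in RFC 1557, where the single escape sequence ESC $ ) C announces the Korean character set and the control characters SO (0x0E) and SI (0x0F), which are not escape sequences, switch between Korean and ASCII text. The constant names are chosen for illustration.

```python
DESIGNATOR = b"\x1b$)C"  # the single escape sequence of ISO-2022-KR
SO = b"\x0e"             # shift out: Korean characters follow
SI = b"\x0f"             # shift in: back to ASCII

text = "Korean: 한글"
encoded = text.encode("iso2022_kr")

print(encoded)
print(encoded.count(b"\x1b"))                # number of escape sequences
print(DESIGNATOR in encoded, SO in encoded, SI in encoded)
print(all(byte < 0x80 for byte in encoded))  # every byte fits in 7 bits
```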

ISO-2022-CN

ISO-2022-CN encodes the Chinese script (both simplified and traditional characters) and is described in RFC 1922. It is almost never used; EUC-CN and Big5, and HZ in mail traffic, are found much more frequently. The encoding contains the following escape sequences:

ISO-2022-CN-EXT extends the original encoding with the following escape sequences:
