ISO/IEC 2022
ISO/IEC 2022, Information technology - Character code structure and extension techniques, is an ISO standard that defines a technique for encoding multiple character sets, including those of languages whose characters cannot be represented in a single 7-bit code.
The standard is meant to solve the problem of different, mutually incompatible character encodings and to make East Asian writing systems encodable. A string encoded according to ISO 2022 can be transported through 7-bit channels, which makes the encoding usable in mail and Usenet traffic. Escape sequences, mostly three or four bytes long, switch between character sets. Depending on its definition, a set selected by an escape sequence can hold 94 characters (one byte each), 8,836 characters (two bytes, a 94 × 94 matrix) or 830,584 characters (three bytes, a 94 × 94 × 94 matrix).
However, ISO/IEC 2022 established itself only in East Asian mail traffic; no version was published for Western languages. Unicode was designed to take on that task instead.
There are three versions of ISO/IEC 2022 for the three East Asian scripts: ISO-2022-JP, ISO-2022-KR, and ISO-2022-CN.
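The three capacities quoted above follow from the 94 printable code positions available per byte. A minimal sketch of the arithmetic, where the function name `capacity` is only an illustrative choice:

```python
# Each byte of a graphic character can take one of 94 values (0x21 to 0x7E).
POSITIONS_PER_BYTE = 94


def capacity(bytes_per_char: int) -> int:
    """Number of characters a set with the given byte width can hold."""
    return POSITIONS_PER_BYTE ** bytes_per_char


for width in (1, 2, 3):
    print(width, capacity(width))
```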
ISO-2022-JP
ISO-2022-JP encodes the Japanese script. It is often used in mail traffic; elsewhere, Shift-JIS or EUC-JP is used instead.
The original version is described in RFC 1468 and contains the following four escape sequences:
- ESC ( B switches to ASCII (1 byte)
- ESC ( J switches to JIS-Roman (1 byte)
- ESC $ @ switches to JIS X 0208:1978 (2 bytes)
- ESC $ B switches to JIS X 0208:1983 (2 bytes)
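To make the switching visible, the following sketch encodes a mixed string with Python's built-in `iso2022_jp` codec and splits the result at the four escape sequences listed above. The helper `split_segments` and the sample string are assumptions made for illustration; they are not part of RFC 1468.

```python
import re

# The four escape sequences of the original ISO-2022-JP, mapped to set names.
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS-Roman",
    b"\x1b$@": "JIS X 0208:1978",
    b"\x1b$B": "JIS X 0208:1983",
}
ESCAPE_RE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")


def split_segments(data: bytes) -> list[tuple[str, bytes]]:
    """Split ISO-2022-JP bytes into (character set, payload) runs."""
    segments = []
    charset = "ASCII"  # the text starts out in ASCII
    pos = 0
    for match in ESCAPE_RE.finditer(data):
        if match.start() > pos:
            segments.append((charset, data[pos:match.start()]))
        charset = ESCAPES[match.group()]
        pos = match.end()
    if pos < len(data):
        segments.append((charset, data[pos:]))
    return segments


encoded = "abc 日本語 xyz".encode("iso2022_jp")
print(all(byte < 0x80 for byte in encoded))  # every byte fits in 7 bits
for charset, payload in split_segments(encoded):
    print(charset, payload)
```

Each two-byte run carries two payload bytes per character, so the three kanji occupy six bytes between the two escape sequences.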
ISO-2022-JP-1 is described in RFC 2237 and adds another escape sequence:
- ESC $ ( D switches to JIS X 0212:1990 (2 bytes)
ISO-2022-JP-2 is described in RFC 1554 and adds further escape sequences to support additional languages. It extends ISO-2022-JP-1 by the following escape sequences:
- ESC $ A switches to GB2312-1980 (2 bytes)
- ESC $ ( C switches to KS C 5601-1987 (2 bytes)
- ESC . A switches to ISO 8859-1 (1 byte)
- ESC . F switches to ISO 8859-7 (1 byte)
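As an illustration of the multilingual extension, the sketch below uses Python's built-in `iso2022_jp_2` codec to encode Japanese and Korean text in one string and then lists the multi-byte sets that were switched to. The helper `designated_sets` and the sample text are hypothetical, and the one-byte sets are left out for brevity.

```python
import re

# Multi-byte escape sequences named in the text for ISO-2022-JP-2.
MULTIBYTE = {
    b"\x1b$@": "JIS X 0208:1978",
    b"\x1b$B": "JIS X 0208:1983",
    b"\x1b$(D": "JIS X 0212:1990",
    b"\x1b$A": "GB2312-1980",
    b"\x1b$(C": "KS C 5601-1987",
}
DESIGNATION_RE = re.compile(rb"\x1b\$(?:\([CD]|[@AB])")


def designated_sets(data: bytes) -> list[str]:
    """Return the multi-byte sets switched to, in order of appearance."""
    return [MULTIBYTE[m.group()] for m in DESIGNATION_RE.finditer(data)]


encoded = "日本語 한국어".encode("iso2022_jp_2")
print(designated_sets(encoded))
```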
ISO-2022-JP-3 extends the original version with the following escape sequences:
- ESC ( I switches to JIS X 0201 Katakana (1 byte)
- ESC $ ( O switches to JIS X 0213:2000, plane 1 (2 bytes)
- ESC $ ( P switches to JIS X 0213:2000, plane 2 (2 bytes)
ISO-2022-JP-2004 extends ISO-2022-JP-3 by the following escape sequence:
- ESC $ ( Q switches to JIS X 0213:2004, plane 1 (2 bytes)
ISO-2022-KR
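ISO-2022-KR encodes the Korean script and is used alongside EUC-KR on Korean websites. It is described in RFC 1557 and contains only a single escape sequence, which designates the Korean set; the control characters SO and SI then shift into and out of it:
- ESC $ ) C designates KS C 5601-1987 (2 bytes)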
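The sketch below shows what Python's built-in `iso2022_kr` codec emits for a short string. It follows RFC 1557, in which the set is designated with ESC $ ) C and the control characters SO (0x0E) and SI (0x0F) shift into and out of it. The helper `describe` and the event labels are hypothetical names chosen for this example.

```python
SO, SI = 0x0E, 0x0F          # Shift Out / Shift In control characters
DESIGNATION = b"\x1b$)C"     # designation of KS C 5601 per RFC 1557


def describe(data: bytes) -> list[str]:
    """List the designation and shift events found in ISO-2022-KR bytes."""
    events = []
    i = 0
    while i < len(data):
        if data.startswith(DESIGNATION, i):
            events.append("designate KS C 5601")
            i += len(DESIGNATION)
            continue
        if data[i] == SO:
            events.append("shift out")
        elif data[i] == SI:
            events.append("shift in")
        i += 1  # payload bytes and handled shifts advance by one
    return events


encoded = "한글 abc".encode("iso2022_kr")
print(describe(encoded))
```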
ISO-2022-CN
ISO-2022-CN encodes the Chinese script (both simplified and traditional characters) and is described in RFC 1922. It is almost never used; EUC-CN, Big5 and, in mail traffic, HZ are found much more frequently. The encoding contains the following escape sequences:
- ESC $ ) A designates GB2312-1980 (2 bytes)
- ESC $ ) G designates CNS 11643-1992, plane 1 (2 bytes)
- ESC $ * H designates CNS 11643-1992, plane 2 (2 bytes)
ISO-2022-CN-EXT extends the original character set by the following escape sequences:
- ESC $ ) E designates ISO-IR-165 (2 bytes)
- ESC $ + I designates CNS 11643-1992, plane 3 (2 bytes)
- ESC $ + J designates CNS 11643-1992, plane 4 (2 bytes)
- ESC $ + K designates CNS 11643-1992, plane 5 (2 bytes)
- ESC $ + L designates CNS 11643-1992, plane 6 (2 bytes)
- ESC $ + M designates CNS 11643-1992, plane 7 (2 bytes)
Web links
- ECMA 35 (identical to ISO 2022)