Extended Unix Code
Extended Unix Code (abbreviated EUC) is an 8-bit character encoding used mainly for Chinese, Japanese and Korean. EUC is a collective term for several encodings, each of which can encode up to four different character sets depending on the country. It was originally developed by the Open Software Foundation (OSF), Unix International (UI) and Unix System Laboratories Pacific (USLP) as the standard encoding for UNIX systems. Today it is used less and less, because it has largely been replaced by more common local encodings (Shift-JIS, Big5, etc.) and/or Unicode (UTF-8).
Common features
All EUC codes have some things in common:
- They support up to four different character sets, called code sets in EUC terminology. Code set 0 is always (7-bit) ASCII; code sets 1–3 differ depending on the variant.
- Code set 0 is always encoded directly as a single byte.
- Two special control characters (single-shift characters) switch to code set 2 or code set 3 for the character that follows: SS2 (0x8e) and SS3 (0x8f).
- The non-ASCII range 0xa0–0xff is used for the bytes of multi-byte characters.
Code sets 1 to 3 can be encoded in several ways, depending on the EUC variant. The following encodings are possible:
Code set | Variant 1 | Variant 2 | Variant 3 |
---|---|---|---|
Code set 0 | 1 byte: 0x21–0x7e | | |
Code set 1 | 1 byte: 0xa0–0xff | 2 bytes: 0xa0–0xff, 0xa0–0xff | 3 bytes: 0xa0–0xff, 0xa0–0xff, 0xa0–0xff |
Code set 2 | 2 bytes: 0x8e, 0xa0–0xff | 3 bytes: 0x8e, 0xa0–0xff, 0xa0–0xff | 4 bytes: 0x8e, 0xa0–0xff, 0xa0–0xff, 0xa0–0xff |
Code set 3 | 2 bytes: 0x8f, 0xa0–0xff | 3 bytes: 0x8f, 0xa0–0xff, 0xa0–0xff | 4 bytes: 0x8f, 0xa0–0xff, 0xa0–0xff, 0xa0–0xff |
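The table can be read as a rule for finding the code set and the length of a character from its first byte. The following Python sketch illustrates that rule. The names `code_set_of`, `char_length` and `EUC_JP_WIDTHS` are hypothetical, chosen only for this example. Treating all bytes 0x00–0x7f (including control characters) as code set 0 is an assumption, and the EUC-JP widths are taken from the EUC-JP section of this article.

```python
from typing import Dict

SS2 = 0x8E  # single shift 2: the next character belongs to code set 2
SS3 = 0x8F  # single shift 3: the next character belongs to code set 3


def code_set_of(lead: int) -> int:
    """Return the EUC code set (0-3) selected by the first byte of a character."""
    if lead == SS2:
        return 2
    if lead == SS3:
        return 3
    if 0xA0 <= lead <= 0xFF:
        return 1
    if lead <= 0x7F:  # assumption: control characters count as code set 0
        return 0
    raise ValueError(f"byte 0x{lead:02x} cannot start an EUC character")


def char_length(lead: int, widths: Dict[int, int]) -> int:
    """Return the total byte length of the character that starts with `lead`.

    `widths` maps code sets 1-3 to the number of 0xa0-0xff bytes they use;
    the shift byte of code sets 2 and 3 is added on top.
    """
    code_set = code_set_of(lead)
    if code_set == 0:
        return 1
    shift_bytes = 1 if code_set in (2, 3) else 0
    return shift_bytes + widths[code_set]


# Hypothetical configuration for EUC-JP: code set 1 uses two bytes,
# code set 2 one byte after SS2, code set 3 two bytes after SS3.
EUC_JP_WIDTHS = {1: 2, 2: 1, 3: 2}
```

With this configuration, `char_length(0x8F, EUC_JP_WIDTHS)` gives the three-byte form of code set 3, while an ASCII byte always has length 1.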
EUC-JP
EUC-JP is the variant used in Japan.
Code set 0 is ASCII (more precisely, JIS-Roman) and is encoded directly as one byte in the range 0x21 to 0x7e.
Code set 1 is JIS X 0208:1997 and is encoded in two bytes (variant 2 in the table above).
Code set 2 contains the half-width katakana, which are also encoded in two bytes (variant 1 in the table). The second byte comes only from the range 0xa1 to 0xdf, since there are only 56 katakana (and a handful of special characters). These bytes correspond to the 1-byte encoding of JIS X 0201:1997, with the shift character 0x8e added as a prefix.
Code set 3 is JIS X 0212:1990, encoded in the three-byte variant.
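As an illustration of the four EUC-JP code sets, the following Python sketch splits an EUC-JP byte string into individual characters using only the first byte of each character. The function name `split_euc_jp` is hypothetical. The sketch assumes well-formed input, restricts code set 1 lead bytes to 0xa1–0xfe, and does not validate trail bytes. Python's built-in `euc_jp` codec is used only to produce and decode the sample.

```python
from typing import List


def split_euc_jp(data: bytes) -> List[bytes]:
    """Split EUC-JP bytes into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            n = 1            # code set 0: ASCII / JIS-Roman
        elif lead == 0x8E:
            n = 2            # code set 2: SS2 + half-width katakana byte
        elif lead == 0x8F:
            n = 3            # code set 3: SS3 + two bytes of JIS X 0212
        elif 0xA1 <= lead <= 0xFE:
            n = 2            # code set 1: two bytes of JIS X 0208
        else:
            raise ValueError(f"unexpected lead byte 0x{lead:02x} at offset {i}")
        if i + n > len(data):
            raise ValueError("truncated character at end of input")
        chars.append(data[i:i + n])
        i += n
    return chars


# "A", hiragana A (U+3042) and half-width katakana A (U+FF71)
sample = "A\u3042\uff71".encode("euc_jp")
pieces = split_euc_jp(sample)
decoded = [piece.decode("euc_jp") for piece in pieces]
```

Each element of `pieces` is one complete character, so it can be decoded on its own. The half-width katakana appears as 0x8e followed by its JIS X 0201 byte.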
EUC-KR
EUC-KR is the EUC variant used in Korea. It encodes the KS X 1001 character set and is therefore closely related to ISO-2022-KR.
EUC-CN
EUC-CN is used in China and is equivalent to GB2312. It encodes simplified Chinese characters.
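The equivalence with GB2312 can be observed in Python, assuming CPython's standard codec registry, where `euc_cn` is registered as an alias of the `gb2312` codec. This is a small sketch of that behaviour, not a statement about other implementations.

```python
import codecs

text = "\u4e2d\u6587"  # the two characters of "Chinese (language)"

# Encode once under each name; both names resolve to the same codec.
as_euc_cn = text.encode("euc_cn")
as_gb2312 = text.encode("gb2312")
same_codec = codecs.lookup("euc_cn").name == codecs.lookup("gb2312").name

# Every byte of a two-byte character lies in the non-ASCII range.
all_high = all(byte >= 0xA0 for byte in as_euc_cn)
```

Both calls yield the same four bytes, two per character, all taken from the non-ASCII range described above.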
EUC-TW
Originally developed for Taiwan, EUC-TW is rarely used; Big5 is much more common there. Both encode traditional Chinese characters.