Big5

from Wikipedia, the free encyclopedia

Big5 is a character encoding for traditional Chinese characters. It encodes 13,062 Chinese characters (two of which are encoded twice) and is by far the most widely used character set in the Republic of China (Taiwan). The name Big5 comes from the fact that the standard was developed jointly by the five largest Taiwanese computer manufacturers.

History

Before Big5 existed, various mutually incompatible character sets, such as that of the IBM 5550, were in use in Taiwan. Big5 was introduced in 1984 with the aim of replacing them.

After its introduction, Big5 came into widespread use and was adopted in modified form in Windows as code page 950. CNS 11643 was later introduced to replace Big5, but the attempt failed. As a result, Big5 itself was declared the official standard of Taiwan in 2003.

Besides Taiwan, Big5 is used in Hong Kong and Macau, which also use traditional characters.

Encoding

Big5 encodes each Chinese character as a pair of bytes. The first byte of a pair is called the lead byte and can take values from A1 hex to C6 hex or from C9 hex to F9 hex. The second byte is called the trail byte and can take values from 40 hex to 7E hex or from A1 hex to FE hex. Unofficially, bytes whose top bit is not set (00 hex to 7F hex) are interpreted as ASCII characters. As a result, characters in Big5 text have a variable length of 1 or 2 bytes.
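The byte rules above can be sketched as a small splitter that cuts a byte string into 1-byte ASCII units and 2-byte Big5 units. This is only an illustration: the function names are hypothetical, and the sketch accepts just the lead and trail ranges named in the paragraph, so private-use and vendor-extension codes are rejected.

```python
from typing import List


def is_lead_byte(b: int) -> bool:
    # Lead-byte ranges as given in the text: A1..C6 or C9..F9.
    return 0xA1 <= b <= 0xC6 or 0xC9 <= b <= 0xF9


def is_trail_byte(b: int) -> bool:
    # Trail-byte ranges as given in the text: 40..7E or A1..FE.
    return 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE


def split_big5(data: bytes) -> List[bytes]:
    """Split data into 1-byte (ASCII) and 2-byte (Big5) units."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            # Top bit clear: treated as a single ASCII byte.
            units.append(data[i:i + 1])
            i += 1
        elif (is_lead_byte(b) and i + 1 < len(data)
              and is_trail_byte(data[i + 1])):
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid byte 0x{b:02X} at offset {i}")
    return units


print(split_big5(b"A\xa4\xa4\xa4\xe5"))
```

Because trail bytes from 40 hex to 7E hex overlap the ASCII range, a single byte cannot be classified in isolation; the splitter has to scan from a known character boundary.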

Layout and structure

Big5 is divided into several areas:

  • The range from 8140 hex to A0FE hex is reserved for private use.
  • The range from A140 hex to A3FF hex encodes punctuation marks, the Greek alphabet, and symbols.
  • The range from A440 hex to C67E hex encodes Chinese characters, sorted first by stroke count and then by radical.
  • The range from C6A1 hex to C8FE hex is reserved for private use.
  • The range from C940 hex to F9D5 hex encodes further Chinese characters, also sorted first by stroke count and then by radical.
  • The range from F9D6 hex to FEFE hex is reserved for private use.
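As a rough illustration, the ranges listed above can be put into a lookup table that labels a two-byte code. The table name, the labels, and the function `classify` are hypothetical; the bounds are copied verbatim from the list, and the comparison works on the 16-bit value only, so it does not check whether the trail byte is valid.

```python
from typing import Optional

# (first code, last code, label); bounds copied from the list in the text.
BIG5_RANGES = [
    (0x8140, 0xA0FE, "private use"),
    (0xA140, 0xA3FF, "punctuation, Greek alphabet, symbols"),
    (0xA440, 0xC67E, "Chinese characters (first block)"),
    (0xC6A1, 0xC8FE, "private use"),
    (0xC940, 0xF9D5, "Chinese characters (second block)"),
    (0xF9D6, 0xFEFE, "private use"),
]


def classify(code: int) -> Optional[str]:
    """Return the label of the range containing code, or None."""
    for first, last, label in BIG5_RANGES:
        if first <= code <= last:
            return label
    return None


for sample in (0xA4A4, 0xC940, 0xC6A1, 0x0041):
    print(f"{sample:04X}: {classify(sample)}")
```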

Extensions

Because Big5 lacks many characters that are needed in practice, both companies and government institutions have developed their own extensions to it.

E-Ten

E-Ten added some characters from the IBM 5550 character set for its operating system:

  • The range from A3C0 hex to A3E0 hex contains control characters.
  • The range from C6A1 hex to C875 hex contains circled and parenthesized numbers, radicals, Japanese kana, and the Cyrillic script.
  • The range from F9D6 hex to F9FE hex contains seven additional Chinese characters and box-drawing characters.

Microsoft

Microsoft created code page 950 for Windows. It is practically identical to Big5, but also contains the characters from the range F9D6 hex to F9FE hex of the E-Ten extensions, as well as the euro symbol.
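One way to see where the two encodings differ is to decode the same bytes with both. The sketch below uses Python's standard `big5` and `cp950` codecs; the helper names `try_decode` and `compare` are hypothetical, and what each codec does with bytes from the extension range depends on the mapping tables shipped with the interpreter, so the printed result should be checked rather than assumed.

```python
from typing import Dict, Optional


def try_decode(raw: bytes, codec: str) -> Optional[str]:
    """Decode raw with codec; return None if the bytes are not valid."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def compare(raw: bytes) -> Dict[str, Optional[str]]:
    """Decode the same bytes with both codecs, side by side."""
    return {codec: try_decode(raw, codec) for codec in ("big5", "cp950")}


# A4A4 lies in the first block of Chinese characters;
# F9D6 is the first code of the E-Ten range mentioned in the text.
for sample in (b"\xa4\xa4", b"\xf9\xd6"):
    print(sample.hex().upper(), compare(sample))
```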

HKSCS

Hong Kong also uses Big5. However, because Big5 lacks many of the characters needed to write Cantonese, Hong Kong developed the Hong Kong Supplementary Character Set (HKSCS), which is based on Big5 but contains many additional characters.
