GB2312

from Wikipedia, the free encyclopedia

GB2312 is a coded character set for Simplified Chinese characters, introduced in 1980. It comprises a total of 7,445 characters, 6,763 of which are Chinese characters.

All characters are arranged in a 94 × 94 grid of rows and cells, which provides at most 94 × 94 = 8,836 code positions. The same layout is also used by JIS X 0208 and KS X 1001.

The first area (rows 1 to 9) encodes punctuation marks as well as Greek, Cyrillic, Japanese kana, zhuyin (bopomofo), and pinyin letters. The other two areas contain Chinese characters: rows 16 to 55 hold characters sorted by their pinyin transliteration, and rows 56 to 87 hold characters arranged following the order of the Kangxi dictionary.
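As a small illustration of the row layout described above, the following Python sketch maps a row number to the area it belongs to. The function name `gb2312_area` and the area labels are hypothetical choices for this example, not terms from the standard; rows 10 to 15 and 88 to 94 are treated as unassigned, as in the original standard.

```python
# Hypothetical helper: classify a GB2312 row number (1-94) by area.
ROWS, CELLS = 94, 94  # the 94 x 94 code grid


def gb2312_area(row: int) -> str:
    """Return an illustrative label for the area a GB2312 row belongs to."""
    if not 1 <= row <= ROWS:
        raise ValueError(f"row must be in 1..{ROWS}, got {row}")
    if row <= 9:
        return "symbols"        # punctuation, Greek, Cyrillic, kana, zhuyin, pinyin letters
    if 16 <= row <= 55:
        return "hanzi-pinyin"   # Chinese characters sorted by pinyin
    if 56 <= row <= 87:
        return "hanzi-radical"  # Chinese characters in Kangxi dictionary order
    return "unassigned"         # rows 10-15 and 88-94


if __name__ == "__main__":
    print(ROWS * CELLS)         # 8836 code positions in the grid
    for r in (1, 16, 56, 90):
        print(r, gb2312_area(r))
```

The counts fit the figures above: the 40 pinyin-sorted rows are full except for the last few cells of row 55, leaving 3,755 characters, while the 32 radical-sorted rows hold 32 × 94 = 3,008; together that gives the 6,763 Chinese characters.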

Encoding

The character set itself must be distinguished from its encoding (character encoding scheme), that is, the way its code positions are represented as bytes.
GB2312 is usually used in the EUC-CN encoding, which combines two character sets: US-ASCII as single-byte characters and GB2312 as double-byte characters. To distinguish GB2312 characters from ASCII, 160 (0xA0) is added to their row and cell numbers, so that each byte falls in the range 0xA1 to 0xFE and never collides with ASCII (0x00 to 0x7F). The first byte encodes the row number, the second the cell (column) number.
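The offset arithmetic can be sketched in Python as follows. The helper names `encode_rowcell` and `split_euc_cn` are hypothetical; the example relies only on Python's built-in `gb2312` codec, which implements EUC-CN (`euc-cn` is one of its aliases), to cross-check the result. Character 啊 sits at row 16, cell 1, the first position of the pinyin-sorted area.

```python
# Sketch of EUC-CN: GB2312 row/cell numbers (1-94) are each offset by 0xA0.
OFFSET = 0xA0


def encode_rowcell(row: int, cell: int) -> bytes:
    """Return the two EUC-CN bytes for a GB2312 (row, cell) position."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    return bytes([row + OFFSET, cell + OFFSET])  # each byte lands in 0xA1..0xFE


def split_euc_cn(data: bytes) -> list:
    """Split EUC-CN bytes into ASCII characters and GB2312 (row, cell) pairs."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:  # single-byte US-ASCII
            out.append(chr(b))
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data) and 0xA1 <= data[i + 1] <= 0xFE:
            out.append((b - OFFSET, data[i + 1] - OFFSET))  # first byte = row, second = cell
            i += 2
        else:
            raise ValueError(f"invalid EUC-CN byte 0x{b:02X} at offset {i}")
    return out


if __name__ == "__main__":
    print(encode_rowcell(16, 1))                     # b'\xb0\xa1'
    print(split_euc_cn("A\u554a".encode("gb2312")))  # ['A', (16, 1)]
```

Because lead and trail bytes are both restricted to 0xA1 to 0xFE, a decoder can tell from any single byte whether it belongs to an ASCII character or to a double-byte GB2312 character.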
The 7-bit HZ encoding was also common in e-mail.
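HZ (specified in RFC 1843) keeps every byte below 0x80: "~{" switches into GB mode, "~}" switches back to ASCII, "~~" stands for a literal tilde, and inside GB mode each character is written as its EUC-CN bytes with the high bit cleared (0x21 to 0x7E). The minimal encoder below is a hypothetical sketch under those rules only; it ignores the line-length recommendations and the "~" + newline line-continuation sequence of the RFC. Python ships an `hz` codec, used here just to check the output.

```python
# Minimal, hypothetical HZ encoder (no line-length handling).
def hz_encode(text: str) -> bytes:
    """Encode text as HZ: ASCII passes through, GB2312 runs go inside ~{ ... ~}."""
    out = bytearray()
    gb_mode = False
    for ch in text:
        if ord(ch) < 0x80:
            if gb_mode:  # leave GB mode before any ASCII character
                out += b"~}"
                gb_mode = False
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            # EUC-CN bytes, each 0xA1..0xFE; raises UnicodeEncodeError if ch is not in GB2312
            lead, trail = ch.encode("gb2312")
            if not gb_mode:
                out += b"~{"
                gb_mode = True
            out += bytes([lead & 0x7F, trail & 0x7F])  # clear high bit -> 0x21..0x7E
    if gb_mode:
        out += b"~}"  # always end in ASCII mode
    return bytes(out)


if __name__ == "__main__":
    print(hz_encode("a\u554a~"))  # b'a~{0!~}~~'
```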

Further development

In 1995, GB2312 was extended by the GBK specification. GBK never became an official national standard and therefore has no GB number, but it became widespread through its use on Windows.
In 2000, GB2312 was officially superseded by GB18030, but it is still widely used.
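The extension chain can be observed with Python's built-in `gb2312`, `gbk` and `gb18030` codecs. In this sketch (the helper name `encodings_of` is hypothetical), a character from GB2312 gets the same two bytes in all three encodings, the traditional character 國 (absent from the simplified-only GB2312) first becomes encodable in GBK, and an emoji outside the Basic Multilingual Plane can only be encoded by GB18030, which uses a four-byte sequence for it.

```python
# Compare Python's gb2312, gbk and gb18030 codecs on a few sample characters.
CODECS = ("gb2312", "gbk", "gb18030")


def encodings_of(ch: str) -> dict:
    """Map each codec name to the encoded bytes, or None if it cannot encode ch."""
    result = {}
    for name in CODECS:
        try:
            result[name] = ch.encode(name)
        except UnicodeEncodeError:
            result[name] = None
    return result


if __name__ == "__main__":
    # U+554A (in GB2312), U+570B (traditional form, GBK onward), U+1F600 (emoji)
    for ch in ("\u554a", "\u570b", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encodings_of(ch))
```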

Use on Windows

On Windows, GB2312 in the EUC-CN encoding is available as code page 20936. Under Windows XP, this requires installing the optional "Files for East Asian Languages" support; under Windows 7, it is available by default without any additional component.
However, some parts of Windows incorrectly label code page 936 as GB2312; in reality, code page 936 is an implementation of GBK. For example, the "File Conversion" dialog of Microsoft Word 2003 and Word 2010 offers code page 936 as "Simplified Chinese (GB2312)" and code page 20936 as "Simplified Chinese (GB2312-80)".
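Because data labelled "GB2312" may actually be code page 936 (GBK), a reader sometimes has to check which of the two really fits. The sketch below (function name `classify_gb_bytes` is hypothetical) tries a strict decode first with Python's `gb2312` codec, which implements EUC-CN like code page 20936, and then with `gbk`; in Python, `cp936` is registered as an alias of `gbk`. This only detects bytes that fall outside GB2312; a successful decode does not prove the data was meant as GB2312 rather than some other encoding.

```python
import codecs


def classify_gb_bytes(data: bytes) -> str:
    """Return the narrowest of 'gb2312' or 'gbk' that strictly decodes data, else 'neither'."""
    for name in ("gb2312", "gbk"):
        try:
            data.decode(name)  # strict mode raises on any invalid or unmapped sequence
            return name
        except UnicodeDecodeError:
            continue
    return "neither"


if __name__ == "__main__":
    print(codecs.lookup("cp936").name)                           # gbk
    print(classify_gb_bytes("\u4e2d\u6587".encode("gb2312")))  # gb2312
    print(classify_gb_bytes("\u570b".encode("cp936")))         # gbk
```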

External links

  • GB2312 character table from O'Reilly (PDF; 3.6 MB) or from C. Wittern, Kyoto (PDF; 3.6 MB). Note: rows 10 and 11 of this table contain half-width variants of the ASCII characters (from row 3) and of the special Latin letters used for pinyin (from row 8); these are later additions, not part of the original standard.
  • EUC-CN character table by Ngai Kim Hoong