Character set table

Computers represent text, words and characters as numbers, so an association between numbers and characters has to be established. This assignment is defined by a character set table, which assigns numerical values to the displayable characters and to control characters. Alternative terms for the character set table are code page and character map.
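As a minimal sketch of this assignment, the following Python snippet looks up two numeric values in code page 437 and then maps the characters back to numbers. The choice of code page 437 and the variable names are only illustrative assumptions.

```python
# Two numeric values as they might be stored in a file: 65 and 132.
raw = bytes([0x41, 0x84])

# Decoding looks each value up in the character set table (here: code page 437).
text = raw.decode("cp437")
print(text)

# Encoding goes the other way: characters back to their numeric values.
numbers = list(text.encode("cp437"))
print(numbers)
```

In code page 437 the value 65 stands for "A" and the value 132 for "ä"; another table may assign a different character to 132.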

History

Historical character set tables are often limited to 256 characters, which means that such a table can usually hold only one further alphabet besides the Latin alphabet. The use of these early, simple character set tables created problems. In some character set tables not all characters are adequately documented, or certain entries are used inconsistently. Furthermore, a text can often use only one character set table, which makes it difficult to include characters from other languages in the text.

Unicode was introduced to solve these problems. In contrast to conventional character set tables, Unicode separates the assignment of numbers (so-called code points) to characters from the encoding of those characters as bytes. The various Unicode encoding schemes can in turn be understood as character set tables.

While a character set table defines the assignment of numbers to characters, fonts store the appearance of the characters. To display text on computers, both a character set table and a font are usually necessary.
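The separation between code point and encoding can be illustrated with a short Python sketch. The euro sign is chosen here only as an example character; the variable names are assumptions made for this illustration.

```python
char = "€"

# The code point is the number Unicode assigns to the character.
code_point = ord(char)
print(f"U+{code_point:04X}")

# The same code point is stored as different byte sequences,
# depending on the encoding scheme that is chosen.
utf8 = char.encode("utf-8")
utf16le = char.encode("utf-16-le")
utf16be = char.encode("utf-16-be")
print(utf8.hex(), utf16le.hex(), utf16be.hex())
```

The code point stays the same in all three cases; only the byte representation changes.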

Displaying texts or file names with the wrong character set table produces incorrect characters. In German texts, the umlauts and the Eszett (ß) are typically affected, while the rest of the text remains largely legible. Texts in other writing systems, on the other hand, become illegible when displayed with the wrong character set table (Mojibake).
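The effect can be reproduced in a few lines of Python. This is only a sketch: it assumes a text stored as UTF-8 that is wrongly read with code page 1252, and a Russian text stored in code page 1251 that is wrongly read with code page 1252.

```python
# German text: only the umlaut and the Eszett are damaged.
german = "Größe"
garbled_de = german.encode("utf-8").decode("cp1252")
print(garbled_de)

# Russian text: every letter is replaced, so the result is illegible.
russian = "Привет"
garbled_ru = russian.encode("cp1251").decode("cp1252")
print(garbled_ru)

# As long as no bytes were lost, reversing the wrong step restores the text.
repaired = garbled_de.encode("cp1252").decode("utf-8")
print(repaired)
```

The unaccented letters "G", "r" and "e" survive because both tables agree on the ASCII range, which is why such German text stays readable.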

Examples

IBM PC (OEM) character set tables

These character set tables should only be used for compatibility with existing documents and systems. The use of Unicode is recommended for new systems and texts.

DBCS / MBCS

These code pages make it possible to store East Asian characters, for which the 256 values available with 8 bits are not sufficient. They use double-byte or multi-byte encodings (DBCS / MBCS), in which a character can occupy 16 bits; 16 bits allow up to 65,536 different values.
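A brief Python sketch, using code page 932 as an assumed example, shows how such a code page stores one character in a single byte and another in two bytes.

```python
# A Latin letter still fits into a single byte in code page 932.
ascii_bytes = "A".encode("cp932")

# A Japanese hiragana character needs two bytes.
kana_bytes = "あ".encode("cp932")

print(len(ascii_bytes), ascii_bytes.hex())
print(len(kana_bytes), kana_bytes.hex())
```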

Important character set tables

For efficient processing on computers, character set tables are identified by numbers. The numbering of the character set tables is not standardized, however, so that different computers or operating systems can use different numbers.

Code page number	Meaning	Character encoding
437	The original character set table of the IBM PC	char (8 bit)
720	Arabic alphabet	char (8 bit)
737	Greek alphabet	char (8 bit)
775	Estonian, Lithuanian, and Latvian alphabets	char (8 bit)
819	"Latin-1", corresponds to ISO 8859-1	char (8 bit)
850	"Multilingual (DOS-Latin-1)", Western European languages	char (8 bit)
852	Slavic languages (Latin-2), Central and Eastern European languages	char (8 bit)
855	Cyrillic alphabet	char (8 bit)
857	Turkish alphabet	char (8 bit)
858	"Multilingual" with euro sign	char (8 bit)
860	Latin alphabet with special Portuguese characters	char (8 bit)
861	Icelandic alphabet	char (8 bit)
862	Hebrew alphabet	char (8 bit)
863	Latin alphabet with special French characters	char (8 bit)
864	Arabic alphabet	char (8 bit)
865	Danish and Norwegian; differs from 437 mainly in Ø and ø replacing ¥ and ¢	char (8 bit)
866	Cyrillic alphabet	char (8 bit)
869	Greek alphabet	char (8 bit)
874	Thai alphabet	char (8 bit)
932	Japanese writing systems (DBCS)	Mixed 8 and 16 bit
936	GBK for Simplified Chinese characters (DBCS)	Mixed 8 and 16 bit
949	Hangul / Korean characters (DBCS)	Mixed 8 and 16 bit
950	Traditional Chinese characters (DBCS)	Mixed 8 and 16 bit
1200	UTF-16 LE, little-endian (Unicode)	Tuples of 16-bit words
1201	UTF-16 BE, big-endian (Unicode)	Tuples of 16-bit words
1250	Central and Eastern European languages	char (8 bit)
1251	Cyrillic alphabet	char (8 bit)
1252	Western European languages	char (8 bit)
1253	Greek alphabet	char (8 bit)
1254	Turkish alphabet	char (8 bit)
1255	Hebrew alphabet	char (8 bit)
1256	Arabic alphabet	char (8 bit)
1257	Baltic languages	char (8 bit)
1258	Vietnamese	char (8 bit)
10000	Macintosh Roman	char (8 bit)
10007	Macintosh Cyrillic	char (8 bit)
10029	Macintosh, Central European Languages	char (8 bit)
20127	US-ASCII	char (7 bit)
28591	ISO-8859-1	char (8 bit)
65000	UTF-7 (Unicode)	Tuples of 8-bit words
65001	UTF-8 (Unicode)	Tuples of 8-bit words
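Because the numbering is not standardized, software usually has to translate a code page number into whatever name its own platform uses. The following Python sketch maps the numbers from the table to the codec names of the Python standard library. The helper `codec_for_code_page` and the dictionary `SPECIAL_CASES` are hypothetical names introduced for this illustration, not part of any library.

```python
import codecs

# Code page numbers from the table whose Python codec name is not simply "cp<number>".
SPECIAL_CASES = {
    1200: "utf_16_le",
    1201: "utf_16_be",
    10000: "mac_roman",
    10007: "mac_cyrillic",
    10029: "mac_latin2",
    20127: "ascii",
    28591: "iso8859_1",
    65000: "utf_7",
    65001: "utf_8",
}

def codec_for_code_page(number: int) -> str:
    """Return a Python codec name for a code page number (hypothetical helper)."""
    name = SPECIAL_CASES.get(number, f"cp{number}")
    codecs.lookup(name)  # raises LookupError if Python has no such codec
    return name

# The same byte value means different characters in different code pages.
byte = b"\x84"
in_437 = byte.decode(codec_for_code_page(437))
in_1252 = byte.decode(codec_for_code_page(1252))
print(repr(in_437), repr(in_1252))
```

The call to `codecs.lookup` is a design choice of this sketch: it rejects numbers that Python does not know instead of silently returning an unusable name.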
