Character set
A character set is a collection of elements, called characters, from which character strings can be composed. These elements may include the letters of an alphabet and digits, but also other symbols such as special characters, the characters of the IPA phonetic transcription or of Braille, pictograms of various kinds, and control characters. A character set is less than a character code: a character code must additionally define a numbering of the characters in the set. Strictly speaking, the examples listed below, such as ASCII, are therefore character codes. Information technology needs both the defined character set and the numbering of its characters, that is, a character code. In the age of the Internet, the character codes in use must be internationally standardized to ensure a smooth exchange of data. This pressure has greatly reduced the earlier variety of character codes.
The graphic design of a single character is called a glyph; that of an entire character set is called a typeface or font. The rules for placing punctuation marks are called punctuation.
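The difference between a character set and a character code can be illustrated with a minimal sketch. The set `charset` and the numbering `toy_code` are hypothetical and chosen only for illustration; they do not correspond to any real standard. The last line uses Python's built-in `ord`, which returns the Unicode number of a character.

```python
# A character set: a collection of characters without any numbering.
charset = {"A", "B", "C", "ä", "€"}

# A character code additionally assigns a number to every character.
# Hypothetical toy numbering for illustration only, not a real standard.
toy_code = {ch: n for n, ch in enumerate(sorted(charset))}


def encode(text, code):
    """Turn a string into the list of numbers defined by the code."""
    return [code[ch] for ch in text]


def decode(numbers, code):
    """Turn a list of numbers back into a string."""
    reverse = {n: ch for ch, n in code.items()}
    return "".join(reverse[n] for n in numbers)


# A real character code: the numbers Unicode assigns to the same characters.
unicode_numbers = [ord(ch) for ch in "AB€"]

print(encode("CAB", toy_code))  # numbers under the toy code
print(unicode_numbers)          # numbers under Unicode
```

The same characters receive different numbers under the two codes, which is why sender and receiver have to agree on the code and not only on the set.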
Character sets in scientific formalisms
Typical character sets are:
- Unit symbols for the units of physical quantities
- Formula symbols and math symbols
Character sets for computer systems
The character encodings traditionally known in computer science are ASCII and EBCDIC, although the latter has lost much of its importance. Character sets that include the characters needed internationally, beyond those of English, have increasingly come to the fore, for example the related character sets according to ANSI and, in particular, the internationally recognized Unicode standard.
Name | Introduced | Bits | Code points | Displayable characters | Standards | First use |
---|---|---|---|---|---|---|
ASCII | 1963 | 7 | 128 | 95 | ANSI X3.4-1968 | Teletype ASR-33 |
EBCDIC | 1964 | 8 | 256 | 93 to 192 | | IBM mainframes |
Unicode | 1991 | 21 | 1,114,112 | 120,737 (Unicode 8.0) | ISO 10646 | Xerox, Apple |
The displayable characters are a subset of a character set. Their number is smaller than the total number of code points the character set provides, because part of the code space is used for other purposes, for example for non-printable control characters.
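The ASCII and Unicode figures in the table can be checked with a little arithmetic. The sketch below assumes that "displayable" in the ASCII row means the range from the space (0x20) to the tilde (0x7E), and that the Unicode code space consists of 17 planes of 2^16 code points each; neither detail is spelled out in the text above.

```python
# ASCII: 7 bits give 2**7 code points.
ascii_code_points = 2 ** 7

# Assumption: displayable ASCII characters are 0x20 (space) through 0x7E (~).
ascii_displayable = [chr(n) for n in range(0x20, 0x7F)]

# The remaining code points are control characters: 0x00-0x1F and 0x7F (DEL).
ascii_control = [n for n in range(ascii_code_points) if n < 0x20 or n == 0x7F]

# Unicode: 17 planes of 2**16 code points, from U+0000 to U+10FFFF.
unicode_code_points = 17 * 2 ** 16

# Number of bits needed to represent the highest code point.
unicode_bits = (0x10FFFF).bit_length()

print(ascii_code_points, len(ascii_displayable), len(ascii_control))
print(unicode_code_points, unicode_bits)
```

The 95 displayable and 33 control characters together account for all 128 ASCII code points.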
The part of the computer hardware that makes the characters visible on a screen or on a plotter is called the character generator.
International character sets
- ASCII - One of the oldest computer character sets (1963)
- ISO 646 - Defines national ASCII variants in 7-bit coding (1972)
- ISO/IEC 8859 family - With 15 different character encodings to cover all European languages as well as Arabic, Hebrew, Thai and Turkish (1986)
- Unicode and ISO/IEC 10646 - The international standard on which almost all modern computers are based (1991)
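Why a family of 8-bit encodings is harder to exchange than a single universal code can be sketched with Python's built-in codecs. The choice of ISO 8859-1 and ISO 8859-15 as family members, the byte 0xA4 and the UTF-8 encoding form are assumptions made for this example; the list above does not name them.

```python
# One and the same byte value, interpreted by two members of the
# ISO/IEC 8859 family (example byte chosen for illustration).
byte = b"\xa4"
as_latin1 = byte.decode("iso8859-1")    # ISO 8859-1
as_latin9 = byte.decode("iso8859-15")   # ISO 8859-15

# Unicode gives the euro sign one fixed number, independent of language.
euro_code_point = ord("€")

# UTF-8 is one way to serialize that number as bytes (not covered above).
euro_utf8 = "€".encode("utf-8")

print(as_latin1, as_latin9)
print(hex(euro_code_point), euro_utf8)
```

Within the 8859 family the meaning of a byte depends on which part is assumed, whereas a Unicode code point identifies the character on its own.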
Character sets of computer companies
- EBCDIC, character set developed by IBM (1964)
- Macintosh Roman, MacCyrillic, and other proprietary character sets for Apple Mac computers prior to Mac OS X, which uses Unicode
- PETSCII, the character set of the 8-bit Commodore computers
- Commodore Amiga, character set derived from ISO-8859-1 for the 16-bit Commodore computer ("Amiga")
- Windows and DOS code pages, e.g. Windows-1252, MS-DOS code page 437 and code page 850
- Windows Glyph List 4
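The practical effect of vendor code pages can be seen by decoding one byte under several of them. This is a minimal sketch using Python's built-in codecs `cp1252`, `cp437` and `cp850`; the byte value 0xE4 and the character "ä" are arbitrary examples.

```python
code_pages = ("cp1252", "cp437", "cp850")

# The same byte yields a different character under each code page.
raw = b"\xe4"
decoded = {name: raw.decode(name) for name in code_pages}

# Conversely, the same character is stored as different bytes.
encoded = {name: "ä".encode(name) for name in code_pages}

for name in code_pages:
    print(name, decoded[name], encoded[name])
```

Text written under one code page and read under another therefore shows wrong characters unless the code page is transmitted along with the data.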
National variants
- ARMSCII - Armenian
- Big5 - character set for traditional Chinese characters (Taiwan, overseas Chinese)
- DIN 66003 - German, national variant of ISO 646 (1974)
- EUC (Extended Unix Code) - Several East Asian languages
- GEOSTD - Georgian
- Guojia Biaozhun (GB) - character set for simplified Chinese characters
- HKSCS - A Hong Kong Standard for Cantonese (1999)
- ISCII - Indic scripts
- KOI8-R - Russian
- KOI8-U - Ukrainian
- Shift-JIS, also SJIS - Japanese, designed by Microsoft
- TIS-620 - Thai, similar to ISO 8859-11 (1990)
- TSCII - Tamil
- VISCII - Vietnamese
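How a national 7-bit variant reuses ASCII positions can be sketched for DIN 66003 from the list above. Python ships no codec for it, so the example builds a translation table by hand; the eight substitutions reflect the German variant of ISO 646 as commonly documented, and the names `DIN_66003` and `decode_din66003` are hypothetical helpers for this illustration.

```python
# Positions that DIN 66003 assigns differently from US-ASCII.
# Left: the ASCII character at that position; right: the German replacement.
DIN_66003 = str.maketrans("@[\\]{|}~", "§ÄÖÜäöüß")


def decode_din66003(data):
    """Interpret 7-bit bytes as DIN 66003 text (hypothetical helper)."""
    return data.decode("ascii").translate(DIN_66003)


stored = b"Gr}~e"
print(stored.decode("ascii"))   # read as US-ASCII
print(decode_din66003(stored))  # read as DIN 66003
```

Because the variant stays within 7 bits, the German letters displace brackets, braces and other symbols, which made such text awkward to mix with program source code.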