American Standard Code for Information Interchange

The American Standard Code for Information Interchange (ASCII, alternatively US-ASCII, often pronounced [ˈæski]) is a 7-bit character encoding; it corresponds to the US version of ISO 646 and serves as the basis for later encodings of character sets that use more bits.

The ASCII code was first approved by the American Standards Association (ASA) on June 17, 1963 as the standard ASA X3.4-1963. It was substantially revised in 1967/1968, last updated by the ASA's successor institutions in 1986 (ANSI X3.4-1986), and is still in use today. The character encoding defines 128 characters, consisting of 33 non-printable characters and the following 95 printable characters, starting with the space:

!"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~

The printable characters comprise the Latin alphabet in upper and lower case, the ten Arabic numerals, and a number of punctuation marks and other special characters. The character set largely corresponds to that of a keyboard or typewriter for the English language. In computers and other electronic devices that display text, text is usually stored in ASCII or in a backwards-compatible encoding (ISO 8859, Unicode).

The non-printable control characters include output control characters such as line feed or tab, protocol characters such as end of transmission or acknowledgement, and separators such as the record separator.

Coding

Letters as 7-bit code
ASCII	Dec	Hex	Binary
`A`	65	41	(0) 100 0001
`B`	66	42	(0) 100 0010
`C`	67	43	(0) 100 0011
...	...	...	...
`Z`	90	5A	(0) 101 1010

Each character is assigned a bit pattern of 7 bits. Since each bit can take two values, there are 2⁷ = 128 different bit patterns, which can also be interpreted as the integers 0–127 (hexadecimal 00h–7Fh).
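As a rough illustration of the mapping described above, the following Python sketch prints the decimal, hexadecimal and 7-bit binary value of a character. The helper name `bit_pattern` is made up for this example.

```python
def bit_pattern(ch: str) -> tuple[int, str, str]:
    """Return (decimal, hex, 7-bit binary) for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code, format(code, "02X"), format(code, "07b")


for ch in "ABZ":
    dec, hexa, binary = bit_pattern(ch)
    print(ch, dec, hexa, binary)
# A 65 41 1000001
# B 66 42 1000010
# Z 90 5A 1011010
```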

The eighth bit, which ASCII does not use, can serve for error detection (parity bit) on communication lines or for other control tasks. Today it is almost always used to extend ASCII to an 8-bit code. These extensions are largely compatible with the original ASCII, so that all characters defined in ASCII are encoded with the same bit pattern in the various extensions. The simplest extensions are encodings with language-specific characters that are not part of the basic Latin alphabet (see Extensions below).
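The use of the eighth bit as a parity bit can be sketched as follows. This is a minimal illustration that assumes even parity and places the parity bit in the most significant position; real transmission equipment may use odd parity or other conventions, and the function names are invented for the example.

```python
def add_even_parity(code: int) -> int:
    """Put an even-parity bit into bit 7 of a 7-bit ASCII code."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    ones = bin(code).count("1")
    parity = ones % 2          # 1 if the count of one-bits is odd
    return code | (parity << 7)


def parity_ok(byte: int) -> bool:
    """True if the 8-bit value contains an even number of one-bits."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))      # 'C' = 100 0011 has three one-bits
print(format(sent, "08b"))            # 11000011
print(parity_ok(sent))                # True
print(parity_ok(sent ^ 0b00000100))   # False: a single flipped bit is detected
```

A single parity bit only reveals that an odd number of bits changed; it cannot locate or repair the error.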

Composition

ASCII character table, hexadecimal numbering
Code	…0	…1	…2	…3	…4	…5	…6	…7	…8	…9	…A	…B	…C	…D	…E	…F
0…	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	HT	LF	VT	FF	CR	SO	SI
1…	DLE	DC1	DC2	DC3	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS	RS	US
2…	SP	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3…	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4…	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5…	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6…	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7…	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~	DEL

The first 32 ASCII character codes (00h to 1Fh) are reserved for control characters; see the article on control characters for an explanation of the abbreviations in the table above. These codes do not represent written symbols but serve (or served) to control devices that use ASCII, such as printers. Examples of control characters are the carriage return, used for line breaks, and the bell; their definitions have historical origins.

Code 20h (SP) is the space (also called blank), which is used in text to separate words and is generated on the keyboard by the space bar.

Codes 21h to 7Eh stand for the printable characters, which comprise letters, digits and punctuation marks. The letters are only the upper and lower case letters of the Latin alphabet. Letter variants used in languages other than English, such as the German umlauts, are not included in the ASCII character set. Typographically correct dashes and quotation marks are also missing; the typography is limited to that of a typewriter. The purpose was information interchange, not typesetting.

Code 7Fh (all seven bits set to one) is a special character known as the delete character (DEL). In the past this code was used like a control character to delete a character already punched into punched tape or a punched card by setting all the bits, that is, by punching out all seven positions. This was the only way to erase, because a hole, once punched, cannot be undone. Areas without holes (code 00h, NUL) were found mainly at the beginning and end of a punched tape.

For this reason the original ASCII effectively had only 126 characters, because the bit patterns 0 (0000000) and 127 (1111111) did not correspond to any character. Code 0 was later interpreted in the C programming language as the end of a character string; various graphic symbols have been assigned to code 127.
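The code ranges described in this section can be summarized in a short Python sketch. The function name `classify` and the category labels are chosen for this example only.

```python
def classify(code: int) -> str:
    """Classify a 7-bit code according to the ranges described above."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    if code <= 0x1F:
        return "control"
    if code == 0x20:
        return "space"
    if code <= 0x7E:
        return "printable"
    return "delete"


counts = {}
for code in range(128):
    kind = classify(code)
    counts[kind] = counts.get(kind, 0) + 1
print(counts)
# {'control': 32, 'space': 1, 'printable': 94, 'delete': 1}
```

Counting the space among the printable characters gives the 95 printable characters mentioned in the introduction; the 32 control codes plus DEL give the 33 non-printable ones.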

History

Teletype

An early form of character encoding was Morse code. With the introduction of teleprinters it was displaced from the telegraph networks and replaced by the Baudot code and the Murray code. From the 5-bit Murray code it was only a small step to 7-bit ASCII; ASCII was also first used in certain American teleprinter models, such as the Teletype ASR33.

Dec	Hex	ASCII 1963	ASCII 1965	ASCII today
0–63	00–3F	as in the table under Composition
64	40	`@`	`` ` ``	`@`
65–91	41–5B	as in the table under Composition
92	5C	`\`	`~`	`\`
93	5D	as in the table under Composition
94	5E	`↑`	`^`
95	5F	`←`	`_`
96	60	unassigned	`@`	`` ` ``
97–122	61–7A	unassigned	`a`–`z`
123	7B	unassigned	`{`
124	7C	unassigned	`¬`	`|`
125	7D	unassigned	`}`
126	7E	`ESC`	`|`	`~`
127	7F	as in the table under Composition

The first version, still without lowercase letters and with small deviations from today's ASCII for the control and special characters, was created in 1963.

The second version of the ASCII standard followed in 1965. Although the standard was approved, it was never published and therefore never applied. The reason was that the ASA had been informed that ISO (the International Organization for Standardization) was standardizing a character set that was similar to this standard but conflicted with it in minor ways.

In 1968 the version of the ASCII standard that is still valid today was adopted. This version gave rise to the Caesar cipher ROT47 as an extension of ROT13. While ROT13 rotates only the Latin alphabet by half its length, ROT47 rotates all ASCII characters between 33 (!) and 126 (~).
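The rotation described above can be sketched in a few lines of Python. The range 33 to 126 contains 94 characters, so a shift of 47 is exactly half the range and applying the function twice restores the input. Characters outside the range, such as the space, are left unchanged in this sketch.

```python
def rot47(text: str) -> str:
    """Rotate every character in the range 33..126 by 47 positions."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            code = 33 + (code - 33 + 47) % 94
        out.append(chr(code))
    return "".join(out)


secret = rot47("Hello, World!")
print(secret)          # w6==@[ (@C=5P
print(rot47(secret))   # Hello, World!
```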

Computers

At the beginning of the computer age ASCII became the standard code for characters. Many terminals (for example the VT100) and printers were controlled using ASCII alone.

On mainframes, Latin characters are encoded almost exclusively in EBCDIC, an 8-bit encoding incompatible with ASCII that IBM developed in parallel with ASCII for its System/360; at the time it was a serious competitor. Working with the alphabet is more awkward in EBCDIC because the letters are spread over separate, non-contiguous code ranges. IBM itself used ASCII for internal documents. ASCII was promoted by a 1968 order of President Lyndon B. Johnson requiring its use in government agencies.

Use for other languages

With the International Alphabet No. 5 (IA5), a 7-bit encoding based on ASCII was standardized as ISO 646 in 1963. The reference version (ISO 646-IRV) corresponds to ASCII except for one position. So that letters and special characters of various languages (for example the German umlauts) could be represented, 12 character positions were made available for redefinition (#$@[\]^`{|}~). The different variants cannot be displayed simultaneously. When software was not adapted to the variant used for display, the results were often unintentionally funny: for example, when the Apple II was switched on, "APPLE ÜÄ" appeared instead of "APPLE ][".
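The effect of a national variant can be imitated with a translation table. The sketch below assumes the German variant (DIN 66003), which is not spelled out in the text above beyond the Apple II example; the names `GERMAN_VARIANT` and `show_as_german` are invented for illustration.

```python
# Positions redefined by the German national variant (assumed here: DIN 66003).
GERMAN_VARIANT = str.maketrans({
    "@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü", "~": "ß",
})


def show_as_german(text: str) -> str:
    """Render 7-bit codes the way a device set to the German variant would."""
    return text.translate(GERMAN_VARIANT)


print(show_as_german("APPLE ]["))          # APPLE ÜÄ
print(show_as_german("int a[3] = {0};"))   # int aÄ3Ü = ä0ü;
```

The stored codes are the same in both cases; only the glyph that the output device assigns to each code differs.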

Since these positions include characters used in programming, in particular the various brackets, programming languages were extended for internationalization with substitute combinations (digraphs). Only characters from the invariant part of ISO 646 were used to form them. The combinations are language-specific: in Pascal, (* and *) correspond to the curly brackets ({ }), while C provides <% and %> for them.

Extensions

Use of the remaining 128 positions in the byte

To overcome the incompatibilities between the national 7-bit variants of ASCII, various manufacturers first developed their own ASCII-compatible 8-bit codes (that is, codes that match ASCII in the first 128 positions). For a long time the most widespread was code page 437, which was used on the IBM PC under English-language MS-DOS and is still used in the DOS window of English-language Microsoft Windows. In German installations, the Western European code page 850 has been the default since MS-DOS 3.3.

Eight bits were also used in later standards such as ISO 8859. It has several variants, for example ISO 8859-1 for the Western European languages, which was adopted in Germany as DIN 66303. German-language versions of Windows (except in DOS windows) use the Windows-1252 encoding, which is based on ISO 8859-1; this is why the German umlauts, for example, are displayed incorrectly when text files created under DOS are viewed under Windows.
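The umlaut problem mentioned above can be reproduced with Python's built-in codecs `cp850` and `cp1252`. This is only a sketch: the helper `misread` is a made-up name, and the sample word is arbitrary.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


print("Käse".encode("cp850").hex(" "))       # 4b 84 73 65
print(misread("Käse", "cp850", "cp1252"))    # K„se  (DOS file viewed under Windows)
print(misread("Käse", "cp850", "cp850"))     # Käse  (read with the matching code page)
```

The ASCII letters survive because both code pages agree on the first 128 positions; only the byte above 127 is interpreted differently.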

Beyond 8 bits

Many older programs that used the eighth bit for their own purposes could not cope with these extensions. Over time they have often been adapted to the new requirements.

Even 8-bit codes, in which one byte stands for one character, offered too little room to accommodate all characters of human writing at the same time. This made several different specialized extensions necessary. In addition, there are ASCII-compatible encodings, especially for East Asian languages, that either switch between different code tables or require more than one byte for each non-ASCII character. However, none of these 8-bit extensions is "ASCII", because that term denotes only the uniform 7-bit code.

To meet the requirements of the various languages, Unicode (identical in its character repertoire to ISO 10646) was developed. It uses up to 32 bits per character and could thus distinguish more than four billion different characters, but is restricted to about one million permitted code points. This allows all characters used by humans so far to be represented, provided they have been included in the Unicode standard. UTF-8 is an 8-bit encoding of Unicode that is backwards compatible with ASCII; a character occupies one to four bytes. Seven-bit variants no longer need to be used, but Unicode can also be encoded in seven bits using UTF-7. UTF-8 has become the standard on many operating systems: Apple's macOS and some Linux distributions, for example, use UTF-8 by default, and more than 90% of websites are created in UTF-8.
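The variable length of UTF-8 and its compatibility with ASCII can be observed directly in Python. The helper `utf8_info` is a hypothetical name used only for this sketch, and the sample characters are arbitrary.

```python
def utf8_info(ch: str) -> tuple[int, str]:
    """Return (number of bytes, hex dump) of one character encoded as UTF-8."""
    encoded = ch.encode("utf-8")
    return len(encoded), encoded.hex(" ")


for ch in ["A", "ä", "€", "😀"]:
    length, dump = utf8_info(ch)
    print(f"U+{ord(ch):04X} -> {length} byte(s): {dump}")
# U+0041 -> 1 byte(s): 41
# U+00E4 -> 2 byte(s): c3 a4
# U+20AC -> 3 byte(s): e2 82 ac
# U+1F600 -> 4 byte(s): f0 9f 98 80

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print("ASCII".encode("ascii") == "ASCII".encode("utf-8"))   # True
```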

Formatting marks compared to markup languages

ASCII contains only a few characters that are generally used for formatting or structuring text; they derive from the control commands of teleprinters. These include in particular the line feed, the carriage return, the horizontal tab, the form feed and the vertical tab. In typical ASCII text files, the only non-printable characters are usually the carriage return or the line feed marking the end of a line: DOS and Windows systems usually use both in sequence, older Apple and Commodore computers (except the Amiga) only the carriage return, and Unix-like and Amiga systems only the line feed. The use of further characters for text formatting varies. Today markup languages such as HTML are more commonly used to format text.
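Because of these differing conventions, text moved between systems often has to have its line endings converted. A minimal sketch in Python, with `to_unix_newlines` as an invented helper name:

```python
def to_unix_newlines(text: str) -> str:
    """Convert CR LF (DOS/Windows) and lone CR (older Apple) to LF (Unix)."""
    # Replace the two-character sequence first so it is not turned into two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "DOS line\r\nold Mac line\rUnix line\n"
print(repr(to_unix_newlines(sample)))
# 'DOS line\nold Mac line\nUnix line\n'
```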

Compatible character encodings

Most character encodings are designed so that they use the same codes as ASCII for characters 0 to 127 and use the range above 127 for further characters.
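This compatibility can be checked for a handful of encodings with the codecs that ship with Python. The selection below is an arbitrary sample, not an exhaustive list, and the variable names are chosen for the example.

```python
ascii_bytes = bytes(range(128))
reference = ascii_bytes.decode("ascii")

# A small selection of codecs shipped with Python; not an exhaustive list.
codecs_to_check = ["latin-1", "iso8859-15", "cp437", "cp850",
                   "cp1252", "koi8-r", "utf-8"]

# An encoding is ASCII-compatible in this sense if bytes 0..127 decode
# to the same characters as they do in ASCII.
compatible = {name: ascii_bytes.decode(name) == reference
              for name in codecs_to_check}

for name, ok in compatible.items():
    print(f"{name:12} {'compatible' if ok else 'differs'}")
```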

Fixed length codings (selection)

Each character occupies a fixed number of bytes. In most encodings this is one byte per character (single-byte character set, SBCS for short). For the East Asian scripts two or more bytes per character are needed, so these encodings are no longer ASCII-compatible. The compatible SBCS character sets correspond to the ASCII extensions discussed above:

ISO 8859 with 15 different character encodings covering all European languages, Turkish, Arabic, Hebrew and Thai (see the ISO 8859 table below)
MacRoman, MacCyrillic and other proprietary character sets for Apple Mac computers prior to Mac OS X
DOS code pages (e.g. 437, 850) and Windows code pages (e.g. Windows-1252)
KOI8-R for Russian and KOI8-U for Ukrainian
ARMSCII-8 and ARMSCII-8a for Armenian
GEOSTD for Georgian
ISCII for all Indian languages
TSCII for Tamil

MS-DOS code pages
437	English
708	Arabic (ASMO)
720	Arabic (Microsoft)
737	Greek
775	Baltic
850	Western European
852	Central European
855	Cyrillic
857	Turkish
858	Western European with euro
860	Portuguese
861	Icelandic
862	Hebrew
863	Canadian French
864	Arabic (IBM)
865	Nordic
866	Russian
869	Greek

Windows code pages
0874	Thai
0932	Japanese
0936	Simplified Chinese
0949	Korean
0950	Traditional Chinese
1250	Central European
1251	Cyrillic
1252	Western European
1253	Greek
1254	Turkish
1255	Hebrew
1256	Arabic
1257	Baltic
1258	Vietnamese

ISO 8859
-1	Latin-1 , Western European
-2	Latin-2 , Central European
-3	Latin-3 , Southern European
-4	Latin-4 , Northern European
-5	Cyrillic
-6	Arabic
-7	Greek
-8	Hebrew
-9	Latin-5 , Turkish
-10	Latin-6 , Nordic
-11	Thai
-12	(does not exist)
-13	Latin-7 , Baltic
-14	Latin-8 , Celtic
-15	Latin-9 , Western European
-16	Latin-10 , Southeast European

Variable length codings

In order to encode more characters, the characters 0 to 127 are encoded in one byte, while other characters are encoded in several bytes with values greater than 127:

UTF-8 and GB 18030 for Unicode
ISO 6937 for European languages with Latin script
Big5 for Traditional Chinese (Republic of China (Taiwan), overseas Chinese)
EUC (Extended UNIX Coding) for several East Asian languages
GB (Guojia Biaozhun) for Simplified Chinese (PRC)

ASCII table

In addition to the hexadecimal codes, the following table also shows the decimal and octal codes.

Dec	Hex	Oct	ASCII
0	00	000	`NUL`
1	01	001	`SOH`
2	02	002	`STX`
3	03	003	`ETX`
4	04	004	`EOT`
5	05	005	`ENQ`
6	06	006	`ACK`
7	07	007	`BEL`
8	08	010	`BS`
9	09	011	`HT`
10	0A	012	`LF`
11	0B	013	`VT`
12	0C	014	`FF`
13	0D	015	`CR`
14	0E	016	`SO`
15	0F	017	`SI`
16	10	020	`DLE`
17	11	021	`DC1`
18	12	022	`DC2`
19	13	023	`DC3`
20	14	024	`DC4`
21	15	025	`NAK`
22	16	026	`SYN`
23	17	027	`ETB`
24	18	030	`CAN`
25	19	031	`EM`
26	1A	032	`SUB`
27	1B	033	`ESC`
28	1C	034	`FS`
29	1D	035	`GS`
30	1E	036	`RS`
31	1F	037	`US`

Dec	Hex	Oct	ASCII
32	20	040	`SP`
33	21	041	`!`
34	22	042	`"`
35	23	043	`#`
36	24	044	`$`
37	25	045	`%`
38	26	046	`&`
39	27	047	`'`
40	28	050	`(`
41	29	051	`)`
42	2A	052	`*`
43	2B	053	`+`
44	2C	054	`,`
45	2D	055	`-`
46	2E	056	`.`
47	2F	057	`/`
48	30	060	`0`
49	31	061	`1`
50	32	062	`2`
51	33	063	`3`
52	34	064	`4`
53	35	065	`5`
54	36	066	`6`
55	37	067	`7`
56	38	070	`8`
57	39	071	`9`
58	3A	072	`:`
59	3B	073	`;`
60	3C	074	`<`
61	3D	075	`=`
62	3E	076	`>`
63	3F	077	`?`

Dec	Hex	Oct	ASCII
64	40	100	`@`
65	41	101	`A`
66	42	102	`B`
67	43	103	`C`
68	44	104	`D`
69	45	105	`E`
70	46	106	`F`
71	47	107	`G`
72	48	110	`H`
73	49	111	`I`
74	4A	112	`J`
75	4B	113	`K`
76	4C	114	`L`
77	4D	115	`M`
78	4E	116	`N`
79	4F	117	`O`
80	50	120	`P`
81	51	121	`Q`
82	52	122	`R`
83	53	123	`S`
84	54	124	`T`
85	55	125	`U`
86	56	126	`V`
87	57	127	`W`
88	58	130	`X`
89	59	131	`Y`
90	5A	132	`Z`
91	5B	133	`[`
92	5C	134	`\`
93	5D	135	`]`
94	5E	136	`^`
95	5F	137	`_`

Dec	Hex	Oct	ASCII
96	60	140	`` ` ``
97	61	141	`a`
98	62	142	`b`
99	63	143	`c`
100	64	144	`d`
101	65	145	`e`
102	66	146	`f`
103	67	147	`g`
104	68	150	`h`
105	69	151	`i`
106	6A	152	`j`
107	6B	153	`k`
108	6C	154	`l`
109	6D	155	`m`
110	6E	156	`n`
111	6F	157	`o`
112	70	160	`p`
113	71	161	`q`
114	72	162	`r`
115	73	163	`s`
116	74	164	`t`
117	75	165	`u`
118	76	166	`v`
119	77	167	`w`
120	78	170	`x`
121	79	171	`y`
122	7A	172	`z`
123	7B	173	`{`
124	7C	174	`|`
125	7D	175	`}`
126	7E	176	`~`
127	7F	177	`DEL`

Eponyms

The asteroid (3568) ASCII, discovered in 1936, was named after the character encoding in 1988.

Editions

American Standards Association: American Standard Code for Information Interchange. ASA X3.4-1963. American Standards Association, New York 1963 (PDF, 11 pages; archived May 26, 2016 in the Internet Archive)
American Standards Association: American Standard Code for Information Interchange. ASA X3.4-1965. American Standards Association, New York 1965 (approved but not published)
United States of America Standards Institute: USA Standard Code for Information Interchange. USAS X3.4-1967. United States of America Standards Institute, 1967.
United States of America Standards Institute: USA Standard Code for Information Interchange. USAS X3.4-1968. United States of America Standards Institute, 1968.
American National Standards Institute: American National Standard for Information Systems. ANSI X3.4-1977. 1977.
American National Standards Institute: American National Standard for Information Systems. Coded character sets. 7-bit American National Standard Code for Information Interchange (7-bit ASCII). ANSI X3.4-1986. 1986.
Further revisions:
- ANSI X3.4-1986 (R1992)
- ANSI X3.4-1986 (R1997)
- ANSI INCITS 4-1986 (R2002)
- ANSI INCITS 4-1986 (R2007)
- ANSI INCITS 4-1986 (R2012)

Literature

Jacques André: Caractères numériques: introduction. In: Cahiers GUTenberg. Volume 26, May 1997, ISSN 1257-2217, pp. 5–44 (French).
Yannis Haralambous: Fonts & encodings. From Unicode to advanced typography and everything in between. Translated by P. Scott Horne. O'Reilly, Beijing et al. 2007, ISBN 978-0-596-10242-5 (English).
Peter Karow: Digital Fonts. Presentation and formats. 2nd improved edition. Springer, Berlin et al. 1992, ISBN 3-540-54917-X.
Mai-Linh Thi Truong, Jürgen Siebert, Erik Spiekermann (Eds.): FontBook. Digital Typeface Compendium (= FontBook 4). 4th revised and expanded edition. FSI FontShop International, Berlin 2006, ISBN 3-930023-04-0 (in English).

Web links

RFC 20. ASCII format for Network Interchange. October 16, 1969 (ANSI X3.4-1968; English).
ITU T.50 (09/1992) International Alphabet No. 5 (English)
ISO/IEC 646:1991 (English)
ASA X3.4-1963 (English)
Notes on the control characters (English)
ASCII table with explanations (German)
Conversion from and to decimals, octals, hexadecimal and binary ASCII notation (English)

References

American Standards Association (ed.): American Standard Code for Information Interchange. 1963 (scans).
Fred W. Smith: New American Standard Code for Information Interchange. In: Western Union Technical Review. April 1964, pp. 50–58 (worldpowersystems.com).
United States of America Standards Institute (ed.): USA Standard Code for Information Interchange USAS X3.4-1967. 1967.
American National Standards Institute (ed.): American National Standard for Information Systems - Coded Character Sets - 7-Bit American Standard Code for Information Interchange (7-Bit ASCII) ANSI X3.4-1986. 1986 (unicode.org [PDF; 1.7 MB] ANSI INCITS 4-1986 [R2002]).
ASA / USASI / ANSI + ISO (archived January 16, 2010 in the Internet Archive)
Basics of technical informatics for technical informatics, HAW Hamburg (archived September 28, 2007 in the Internet Archive) (PDF)
w3techs.com
Minor Planet Circ. 12973 (PDF)
