ANSI escape sequence

ANSI escape sequences, or ANSI escape codes, are character strings for screen control (escape sequences) that use ASCII/ANSI character 27 (1B hexadecimal), "Escape", as an introductory control character and are based on the ANSI X3.64 and ECMA-48 standards. This ANSI/ECMA standard defines screen and keyboard control for terminals such as the DEC VT100 (1978). Besides the terminals themselves, ANSI escape sequences are implemented in terminal emulators and in command-line interpreters.

The 2nd edition of ECMA-48 was standardized as ISO 6429 in 1978, and the 4th edition as ISO/IEC 6429 in 1986. The current ECMA-48 standard is the 5th edition of June 1991.

History

In the 1970s, the ASCII standard, which had been established in 1968, was revised by the American National Standards Institute (ANSI). This standard, known as ANSI X3.4-1977, defines only 7 bits; it became known as the ASCII character table and forms the basis of other international character sets. Early work on standardizing an 8-bit character set resulted in ANSI X3.41 and ECMA-35 as early as 1971. Together with the European Computer Manufacturers Association (ECMA), the committees designated "X3L2" at ANSI and "TC 1" at ECMA worked on extending the 8-bit control functions for input and output; among other things, this was intended to expand the capabilities of video output on terminals and, at the same time, to standardize them. The result of this work was ECMA-48 of September 1976 and ANSI X3.64 of 1977. This specification was also submitted to the ISO committee and accepted as ISO 6429 in 1978. The 2nd edition of ANSI X3.64 and ECMA-48, from 1979, is identical to the ISO standard.

The first terminals to implement ANSI X3.64-1977 were the DEC-VT100 terminal from 1978 and the Heathkit H89 from 1979.

Standards

The standard was issued, with minimal deviations, by both ANSI and ECMA and, after submission, also as an ISO/IEC standard. However, only the ECMA standards are freely accessible (free of charge). The ANSI standard was withdrawn in favor of the ISO standard to avoid duplicate standardization.

Year	Designation	ANSI	ECMA	ISO/IEC
1965	7-bit Coded Character Set	USAS X3.4	ECMA-6	ISO/IEC 646
1971	Character Code Structure and Extension Techniques	ANSI X3.41	ECMA-35	ISO/IEC 2022
1974	8-bit Coded Character Set Structure and Rules	?	ECMA-43	ISO/IEC 4873
1979	Control Functions for Coded Character Sets	ANSI X3.64	ECMA-48	ISO/IEC 6429

The standards build on one another and are interwoven: when one of them was revised, revised versions of the others were usually published as well. Nevertheless, incompatible control characters and sequences still exist across different implementations and documents (even standards).

ANSI control characters and control sequences

The ASCII standard according to ECMA-6 (ANSI X3.4) defines the C0 control characters (in the range 0–31 decimal or 00–1F hexadecimal) and is limited to 7 bits. The extension to 8 bits according to ECMA-43 contains the control characters designated C1, in the range 128–159 decimal or 80–9F hexadecimal, for screen and printer control. However, because the space for control characters was limited, additional commands and functions were implemented as control sequences.
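The two ranges can be told apart with simple comparisons on the code position. The following is a minimal sketch; the function name `control_set` is hypothetical, and DEL (7Fh) deliberately falls outside both ranges as they are defined above.

```python
def control_set(code: int) -> str:
    """Return "C0", "C1" or "other" for a character position (0-255).

    Ranges as described above: C0 = 00h-1Fh (7-bit ASCII),
    C1 = 80h-9Fh (available only in 8-bit character sets).
    """
    if 0x00 <= code <= 0x1F:
        return "C0"
    if 0x80 <= code <= 0x9F:
        return "C1"
    return "other"  # graphic characters, DEL (7Fh), 0A0h-0FFh
```

For example, ESC (1Bh) is reported as C0 and the 8-bit CSI (9Bh) as C1.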

While a control character implements a function directly, a control sequence requires several characters; their number varies with the function. The main control characters for this are in the C1 range and are therefore available only on systems with 8-bit character sets. They are standardized in ANSI X3.64 and ECMA-48 and comprise cursor commands, screen commands (erase, attribute and mode commands) and keyboard commands.

Escape sequences

So that 7-bit systems can also use the extended ANSI control characters and control sequences, escape sequences were introduced. Most control characters in the C1 range have an equivalent escape sequence, which is therefore also available on systems with a 7-bit character set, i.e. ASCII. These became known as "ANSI escape sequences" (sometimes also "ANSI escape codes").

ANSI control characters

The following is an excerpt of the C1 control characters of the 8-bit "ANSI" character set that can be reached in the 7-bit ASCII character set through escape sequences (introduced by the C0 control character ESC at ASCII position 27).

7-bit equivalents (C0) of the extended 8-bit control characters (C1)
Character name	ISO	IETF	C1 hex	C1 dec	C1 oct	Escape sequence	C0 hex	C0 dec	C0 oct
Padding Character	PAD	PA	80	128	200	ESC @	1B 40	027 064	33 100
High Octet Preset	HOP	HO	81	129	201	ESC A	1B 41	027 065	33 101
Break Permitted Here	BPH	BH	82	130	202	ESC B	1B 42	027 066	33 102
No Break Here	NBH	NH	83	131	203	ESC C	1B 43	027 067	33 103
Index	IND	IN	84	132	204	ESC D	1B 44	027 068	33 104
Next Line	NEL	NL	85	133	205	ESC E	1B 45	027 069	33 105
Start of Selected Area	SSA	SA	86	134	206	ESC F	1B 46	027 070	33 106
End of Selected Area	ESA	ES	87	135	207	ESC G	1B 47	027 071	33 107
Character Tabulation Set	HTS	HS	88	136	210	ESC H	1B 48	027 072	33 110
Character Tabulation with Justification	HTJ	HJ	89	137	211	ESC I	1B 49	027 073	33 111
Line Tabulation Set	VTS	VS	8A	138	212	ESC J	1B 4A	027 074	33 112
Partial Line Forward	PLD	PD	8B	139	213	ESC K	1B 4B	027 075	33 113
Partial Line Backward	PLU	PU	8C	140	214	ESC L	1B 4C	027 076	33 114
Reverse Line Feed	RI	RI	8D	141	215	ESC M	1B 4D	027 077	33 115
Single Shift 2	SS2	S2	8E	142	216	ESC N	1B 4E	027 078	33 116
Single Shift 3	SS3	S3	8F	143	217	ESC O	1B 4F	027 079	33 117
Device Control String	DCS	DC	90	144	220	ESC P	1B 50	027 080	33 120
Private Use One	PU1	P1	91	145	221	ESC Q	1B 51	027 081	33 121
Private Use Two	PU2	P2	92	146	222	ESC R	1B 52	027 082	33 122
Set Transmit State	STS	TS	93	147	223	ESC S	1B 53	027 083	33 123
Cancel Character	CCH	CC	94	148	224	ESC T	1B 54	027 084	33 124
Message Waiting	MW	MW	95	149	225	ESC U	1B 55	027 085	33 125
Start of Guarded Area	SPA	SG	96	150	226	ESC V	1B 56	027 086	33 126
End of Guarded Area	EPA	EG	97	151	227	ESC W	1B 57	027 087	33 127
Start of String	SOS	SS	98	152	230	ESC X	1B 58	027 088	33 130
Single Graphic Character Introducer	SGCI	GC	99	153	231	ESC Y	1B 59	027 089	33 131
Single Character Introducer	SCI	SC	9A	154	232	ESC Z	1B 5A	027 090	33 132
(DEC VT100 proprietary)	ROI		9A	154	232	ESC %	1B 25	027 037	33 45
Control Sequence Introducer	CSI	CI	9B	155	233	ESC [	1B 5B	027 091	33 133
String Terminator	ST	ST	9C	156	234	ESC \	1B 5C	027 092	33 134
Operating System Command	OSC	OC	9D	157	235	ESC ]	1B 5D	027 093	33 135
Privacy Message	PM	PM	9E	158	236	ESC ^	1B 5E	027 094	33 136
Application Program Command	APC	AC	9F	159	237	ESC _	1B 5F	027 095	33 137

To obtain the escape sequence, subtract 40h (64 decimal, 100 octal) from the C1 control character and place the resulting character after ESC. For example, the control character PAD has the C1 position 80h; subtracting 40h gives 40h, which is the C0 position of the @ character, so the escape sequence is ESC @ (80h − 40h = 40h). The same applies to positions expressed in decimal (128 − 64 = 64, which corresponds to 40h) and in octal (200 − 100 = 100, which corresponds to 40h).
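The same arithmetic can be expressed in a few lines of Python. This is a sketch with hypothetical function names; `to_7bit` assumes single-byte 8-bit data in which 80h to 9Fh are C1 control characters (it must not be applied to UTF-8 text, where those bytes occur inside multi-byte characters). The rule covers the standardized rows of the table; the proprietary ROI row (ESC %) does not follow it.

```python
ESC = 0x1B


def c1_to_escape(c1: int) -> bytes:
    """7-bit escape sequence for a C1 control character (80h-9Fh)."""
    if not 0x80 <= c1 <= 0x9F:
        raise ValueError(f"not a C1 control: {c1:#04x}")
    # 80h..9Fh minus 40h gives 40h..5Fh, i.e. the characters '@'..'_'
    return bytes([ESC, c1 - 0x40])


def escape_to_c1(seq: bytes) -> int:
    """Inverse mapping: ESC followed by 40h..5Fh back to the C1 control."""
    if len(seq) != 2 or seq[0] != ESC or not 0x40 <= seq[1] <= 0x5F:
        raise ValueError(f"not a 7-bit C1 equivalent: {seq!r}")
    return seq[1] + 0x40


def to_7bit(data: bytes) -> bytes:
    """Replace every C1 byte in single-byte 8-bit data by its escape sequence."""
    out = bytearray()
    for b in data:
        out += c1_to_escape(b) if 0x80 <= b <= 0x9F else bytes([b])
    return bytes(out)
```

For instance, `c1_to_escape(0x9B)` yields `b"\x1b["` (CSI as ESC [), and `escape_to_c1(b"\x1b\\")` yields 0x9C (ST).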

The only disadvantage of the escape sequence is that one additional character has to be processed per control function, which could slow down slow terminals, at least in theory and if an ANSI script was very long. According to the specification, all 8-bit-capable devices can also use the 7-bit escape form, which is why escape sequences have prevailed.

Character sets

Most character sets contain the C0 and C1 control characters at the standardized positions. Apart from emulated VT100 terminals, almost only the C0 control characters are used in practice.

These control characters were also adopted in Unicode, so that the control characters defined in ANSI X3.64 and ECMA-48 occupy positions within the first 256 code points. ANSI escape sequences have no function in Unicode itself, but some of their functions have counterparts at other Unicode positions (e.g. a non-breaking space).

Control characters

According to the specification, a C1 control character invoked via an escape sequence has exactly the same function as the single control character. Written as escape sequences, these control characters stay within the 7-bit range of ASCII and are therefore compatible with systems that support only 7 bits or have been switched to that mode.

Control sequences

A control sequence is always introduced by a control character and consists of at least two characters. If its length is variable, the control sequence is terminated by a defined final character or a terminator. A control sequence is treated like a single control character, except that the entire sequence must be read before it can be executed.
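The rule that a control sequence must be read completely before it is executed can be sketched as a small buffering reader. This is a minimal sketch under stated assumptions: the class name `SequenceReader` is hypothetical, only CSI sequences (7-bit ESC [ or 8-bit 9Bh) are recognized, and, following the ECMA-48 structure of control sequences, a character in the range 40h to 7Eh is taken as the final character. Other escape sequences and string controls are not recognized and pass through as text.

```python
from __future__ import annotations

ESC, CSI_8BIT = "\x1b", "\x9b"


class SequenceReader:
    """Buffer input so that a CSI sequence is only returned when complete.

    Hypothetical helper: recognizes only CSI (ESC [ or 9Bh); every other
    character, including ESC not followed by '[', is passed on as text.
    """

    def __init__(self) -> None:
        self.pending = ""  # incomplete sequence carried over to the next feed()

    def feed(self, chunk: str) -> list[str]:
        """Return complete items: plain-text runs and whole CSI sequences."""
        buf = self.pending + chunk
        items: list[str] = []
        text = ""
        i = 0
        while i < len(buf):
            if buf.startswith(ESC + "[", i):
                body = i + 2            # 7-bit form: skip ESC [
            elif buf[i] == CSI_8BIT:
                body = i + 1            # 8-bit form: skip 9Bh
            elif buf[i] == ESC and i + 1 == len(buf):
                break                   # lone ESC at the end may start ESC [
            else:
                text += buf[i]
                i += 1
                continue
            # Look for the final character (40h to 7Eh) that ends the sequence.
            end = body
            while end < len(buf) and not "\x40" <= buf[end] <= "\x7e":
                end += 1
            if end == len(buf):
                break                   # incomplete: wait for more input
            if text:
                items.append(text)
                text = ""
            items.append(buf[i:end + 1])
            i = end + 1
        if text:
            items.append(text)
        self.pending = buf[i:]
        return items
```

If the input arrives as `"ab\x1b[1;"` followed by `"4mcd"`, the first call returns only `["ab"]` and keeps `"\x1b[1;"` pending; the second call returns the whole sequence `"\x1b[1;4m"` and then `"cd"`.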

There are essentially three control characters that initiate a control sequence:

ESC, Escape
SCI, Single Character Introducer (or ROI on VT100 terminals)
CSI, Control Sequence Introducer

Only the control character ESC lies in the ASCII range and is therefore a 7-bit-compatible C0 control character. The C1 control characters SCI (or ROI) and CSI can be replaced by an escape sequence, so that the control sequence remains ASCII-compatible and limited to 7 bits.

The control characters APC, DCS, OSC, PM and SOS also initiate a control sequence, which is terminated by the string terminator ST.
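Reading such a string control can be sketched as follows. The function name `read_string_control` is hypothetical; it accepts both the 7-bit and the 8-bit introducers from the table above, returns the text up to ST (ESC \ or 9Ch) together with the remaining input, and returns None while the terminator has not yet arrived. Nothing is assumed about what the payload means.

```python
from __future__ import annotations

# 7-bit and 8-bit introducers of the string controls, taken from the table above.
STRING_INTRODUCERS = {
    "\x1bP": "DCS", "\x90": "DCS",
    "\x1bX": "SOS", "\x98": "SOS",
    "\x1b]": "OSC", "\x9d": "OSC",
    "\x1b^": "PM",  "\x9e": "PM",
    "\x1b_": "APC", "\x9f": "APC",
}
TERMINATORS = ("\x1b\\", "\x9c")  # ST in 7-bit and 8-bit form


def read_string_control(data: str) -> tuple[str, str, str] | None:
    """Split data that begins with a string control into (name, payload, rest).

    Returns None if the string terminator ST has not been received yet.
    """
    for intro, name in STRING_INTRODUCERS.items():
        if data.startswith(intro):
            body = data[len(intro):]
            break
    else:
        raise ValueError("data does not start with a string control")
    # Find the earliest terminator of either form.
    hits = [(body.find(t), t) for t in TERMINATORS if body.find(t) != -1]
    if not hits:
        return None
    pos, term = min(hits)
    return name, body[:pos], body[pos + len(term):]
```

For example, `read_string_control("\x1b]payload\x1b\\rest")` gives `("OSC", "payload", "rest")`.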

Single character introducer

The control character "Single Character Introducer" (SCI) initiates a control sequence with only one additional character and therefore needs no terminator. However, since the functions of this control character have not been standardized, their meaning differs from system to system.

<ESC>Z<function>

The proprietary function is introduced with the escape sequence ESC Z, followed by the character that selects the function. Since the ECMA/ANSI standard defines no standardized functions, each implementation can define its own proprietary functions.

On DEC's VT100, the same C1 position (154 decimal, 9A hex) is used for the proprietary control character ROI, which, however, is introduced with a different escape sequence: ESC %. Unlike SCI, ROI is variable in length.

Many terminal emulators offer a VT100-compatible mode.

Example:

<ESC>%0K

The control sequence ROI 0 K turns off the keyboard; ROI 1 K switches it on again.

<ESC>%1I

The control sequence ROI 1 I queries the current IP address. The reply has the format ROI ? <IP address> I.

Control Sequence Introducer

The control character "Control Sequence Introducer" (CSI) is the most frequently used control character, since it gives access to a large number of further functions that would not otherwise have fit into the 8-bit code space. In 8-bit mode it is the character 9B hex, but it is mostly sent in 7-bit mode as the escape sequence ESC [, i.e. 1B hex 5B hex.

A CSI control sequence always consists of the introductory control character (or the corresponding escape sequence), a parameter part and a final character, the latter determining the function. Within the parameter part, the semicolon ; serves as the separator. The parameter part is optional; if it is missing, a default parameter usually applies.

<ESC>  [  0  ;  1  ;  4  m
|      |  |           |  |
+--+---+  +-----+-----+  |
   |            |        |
   |            |        final character
   |            parameter part
   control character

In this example, ESC [ is the introductory control character CSI written as an escape sequence; it is followed by the parameters 0;1;4 and then by the character m, which determines the actual function.

If the parameter part is omitted, the control sequence looks like this:

<ESC>[m

This control sequence is equivalent to ESC [ 0 m, since 0 is the default parameter.
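Splitting a complete CSI sequence into its parameter part and final character can be sketched as below. The function name `parse_csi` and its `default` argument are hypothetical; the default value actually depends on the function selected by the final character (0 for m, as described above, while ECMA-48 gives some other functions, such as cursor movement, a default of 1). Private-use parameter forms and intermediate characters are not supported.

```python
from __future__ import annotations


def parse_csi(seq: str, default: int = 0) -> tuple[list[int], str]:
    """Split a complete CSI sequence into (parameters, final character).

    Accepts ESC [ (7-bit) or 9Bh (8-bit) as introducer. Empty parameters
    are replaced by `default`. Private-use forms and intermediate
    characters are not supported.
    """
    if seq.startswith("\x1b["):
        body = seq[2:]
    elif seq.startswith("\x9b"):
        body = seq[1:]
    else:
        raise ValueError("not a CSI sequence")
    if not body:
        raise ValueError("missing final character")
    params_text, final = body[:-1], body[-1]
    # "0;1;4" -> [0, 1, 4]; "" -> [default]; ";4" -> [default, 4]
    params = [int(p) if p else default for p in params_text.split(";")]
    return params, final
```

With this sketch, `parse_csi("\x1b[0;1;4m")` returns `([0, 1, 4], "m")`, and `parse_csi("\x1b[m")` returns `([0], "m")`, matching the equivalence of ESC [ m and ESC [ 0 m.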

Security

Since control sequences can also be used to redefine keys and simulate keyboard input, a file containing ANSI escape sequences can cause damage on a computer. It is enough to display the file with a fully ANSI-capable program that executes the escape sequences it contains unfiltered. This kind of malicious file is also known as an ANSI bomb.
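One common countermeasure is to filter untrusted text before displaying it. The following is a minimal sketch, not an audited defense; the name `sanitize` is hypothetical, and it assumes that removing all escape sequences, string controls, CSI sequences and every C0/C1 control except tab and line feed is acceptable for the output at hand. The CSI pattern follows the ECMA-48 structure (parameter characters 30h to 3Fh, intermediate characters 20h to 2Fh, final character 40h to 7Eh); an unterminated string control is removed up to the end of the text.

```python
import re

_UNSAFE = re.compile(
    r"(?:\x1b[PX\]^_]|[\x90\x98\x9d-\x9f])[\s\S]*?(?:\x1b\\|\x9c|\Z)"  # DCS/SOS/OSC/PM/APC up to ST
    r"|(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"          # CSI sequences
    r"|\x1b[\x20-\x2f]*[\x30-\x7e]?"                                # other escape sequences, lone ESC
    r"|[\x00-\x08\x0b-\x1f\x7f-\x9f]"                               # remaining C0, DEL and C1
)


def sanitize(text: str) -> str:
    """Remove control functions from untrusted text; keep tab and line feed."""
    return _UNSAFE.sub("", text)
```

For example, `sanitize("\x1b[31mred\x1b[0m")` returns `"red"`.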

Implementations

Hardware:

DEC VT100 and its successors (VT102, VT220, VT320, VT420, VT520)
Heathkit H89 and terminal variants (H19; also as Zenith Z19)

Software:

ANSI.SYS in the DOS operating systems for IBM PC compatibles, such as PC DOS, MS-DOS and DR DOS
BBS clients like Kermit or Qmodem
xterm
ANSI.SYS alternatives for DOS: ANSI.COM, NANSI.SYS, NNANSI.COM
OS/2 command line
Amiga console.device
Windows command prompt (console) since Windows 10 version 1511

Web links

ECMA-48 Control Functions for Coded Character Sets in the current 5th edition from June 1991 (English)
Edward Moy, Stephen Gildea, Thomas Dickey: rtfm/Xterm/Escape Sequences. XFree86, 1999; a good reference for ANSI escape sequences in xterm
ANSI Escape sequences (ANSI Escape codes) on ascii-table.com (English)

