ANSI escape sequence

from Wikipedia, the free encyclopedia

ANSI escape sequences, or ANSI escape codes, are character strings for screen control (escape sequences) that use the ASCII/ANSI character 27 (1B hexadecimal), "Escape", as an introductory control character and are based on the ANSI X3.64 and ECMA-48 standards. This ANSI/ECMA standard defines screen and keyboard control on terminals such as the DEC VT100 (1979). In addition to the terminals themselves, ANSI escape sequences are implemented in terminal emulators and in command-line interpreters.

The 2nd edition of ECMA-48 was standardized as ISO 6429 in 1978, and the 4th edition as ISO/IEC 6429 in 1986. The current ECMA-48 standard is the 5th edition from June 1991.

History

In the 1970s, the ASCII standard, which had been established in 1968, was revised by the American National Standards Institute (ANSI). This standard, known as ANSI X3.4-1977, defines only 7-bit codes, became known as the ASCII character table and forms the basis for other international character sets. The first work to standardize an 8-bit character set resulted in ANSI X3.41 and ECMA-35 as early as 1971. The ANSI committee "X3L2" and the committee "TC 1" of the European Computer Manufacturers Association (ECMA) worked together on extending 8-bit input and output control, which was intended, among other things, to expand the possibilities of video output on terminals and at the same time to standardize them. The result of this work was ECMA-48 in September 1976 and ANSI X3.64 in 1977. The specification was also submitted to ISO and accepted as ISO 6429 in 1978. The 2nd edition of ANSI X3.64 and ECMA-48 from 1979 is identical to the ISO standard.

The first terminals to implement ANSI X3.64-1977 were the DEC VT100 from 1978 and the Heathkit H89 from 1979.

Standards

The standard was published, with minimal deviations, by both ANSI and ECMA and, after submission, also as an ISO and IEC standard. Only the ECMA standards, however, are available free of charge. The ANSI standard was withdrawn in favor of the ISO standard in order to avoid double standardization.

Year | Designation | ANSI | ECMA | ISO/IEC
1965 | 7-bit Coded Character Set | USAS X3.4 | ECMA-6 | ISO/IEC 646
1971 | Character Code Structure and Extension Techniques | ANSI X3.41 | ECMA-35 | ISO/IEC 2022
1974 | 8-bit Coded Character Set Structure and Rules | ? | ECMA-43 | ISO/IEC 4873
1979 | Control Functions for Coded Character Sets | ANSI X3.64 | ECMA-48 | ISO/IEC 6429

The standards build on one another and are interwoven: when one of them was revised, revised versions of the others were usually published as well. Nevertheless, incompatible control characters and sequences still exist across implementations and documents, including the standards themselves.

ANSI control characters and control sequences

The ASCII standard according to ECMA-6 (ANSI X3.4) defines the C0 control characters (in the range 0–31 decimal or 00–1F hexadecimal) and is limited to 7 bits. The extension to 8 bits in accordance with ECMA-43 contains the control characters designated as C1, in the range 128–159 decimal or 80–9F hexadecimal, for screen and printer control. However, because the space for control characters was limited, additional commands and functions were implemented using control sequences.
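The two ranges given above can be expressed as a simple check. The following Python sketch is illustrative only; the function name `classify_control` is an assumption made for this example and does not come from any of the standards.

```python
def classify_control(code: int) -> str:
    """Classify a character code (0-255) using the ranges given in the text.

    C0: 00h-1Fh (0-31 decimal), C1: 80h-9Fh (128-159 decimal).
    """
    if 0x00 <= code <= 0x1F:
        return "C0"
    if 0x80 <= code <= 0x9F:
        return "C1"
    return "other"


if __name__ == "__main__":
    # 1Bh is ESC (a C0 character), 9Bh is CSI (a C1 character), 41h is "A".
    for code in (0x1B, 0x9B, 0x41):
        print(f"{code:02X}h -> {classify_control(code)}")
```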

While a control character triggers a function directly, a control sequence requires several characters, and their number varies with the function. The primary control characters are in the C1 area and are therefore only available on systems with 8-bit character sets. They are standardized in ANSI X3.64 and ECMA-48 and comprise cursor commands, screen commands (delete, attribute and mode commands) and keyboard commands.

Escape sequences

So that 7-bit systems can also benefit from the extended ANSI control characters and control sequences, so-called escape sequences were introduced. Most control characters in the C1 area have an equivalent escape sequence, which is therefore also available on systems with a 7-bit character set, that is, the ASCII character set. These became known as "ANSI escape sequences", sometimes also called "ANSI escape codes".

ANSI control characters

The following is an excerpt of the C1 control characters from the 8-bit "ANSI" character set that can be reached from the 7-bit ASCII character set by means of escape sequences (introduced by the C0 control character at ASCII position 27).

7-bit equivalents (C0) of the extended 8-bit control characters (C1)

Character name | ISO | IETF | C1 hex | C1 dec | C1 oct | Escape sequence | C0 hex | C0 dec | C0 oct
Padding character | PAD | PA | 80 | 128 | 200 | ESC @ | 1B 40 | 027 064 | 33 100
High octet preset | HOP | HO | 81 | 129 | 201 | ESC A | 1B 41 | 027 065 | 33 101
Break permitted here | BPH | BH | 82 | 130 | 202 | ESC B | 1B 42 | 027 066 | 33 102
No break here | NBH | NH | 83 | 131 | 203 | ESC C | 1B 43 | 027 067 | 33 103
Index | IND | IN | 84 | 132 | 204 | ESC D | 1B 44 | 027 068 | 33 104
Next line | NEL | NL | 85 | 133 | 205 | ESC E | 1B 45 | 027 069 | 33 105
Start of selected area | SSA | SA | 86 | 134 | 206 | ESC F | 1B 46 | 027 070 | 33 106
End of selected area | ESA | ES | 87 | 135 | 207 | ESC G | 1B 47 | 027 071 | 33 107
Character tabulation set | HTS | HS | 88 | 136 | 210 | ESC H | 1B 48 | 027 072 | 33 110
Character tabulation with justification | HTJ | HJ | 89 | 137 | 211 | ESC I | 1B 49 | 027 073 | 33 111
Line tabulation set | VTS | VS | 8A | 138 | 212 | ESC J | 1B 4A | 027 074 | 33 112
Partial line forward | PLD | PD | 8B | 139 | 213 | ESC K | 1B 4B | 027 075 | 33 113
Partial line backward | PLU | PU | 8C | 140 | 214 | ESC L | 1B 4C | 027 076 | 33 114
Reverse line feed | RI | RI | 8D | 141 | 215 | ESC M | 1B 4D | 027 077 | 33 115
Single shift 2 | SS2 | S2 | 8E | 142 | 216 | ESC N | 1B 4E | 027 078 | 33 116
Single shift 3 | SS3 | S3 | 8F | 143 | 217 | ESC O | 1B 4F | 027 079 | 33 117
Device control string | DCS | DC | 90 | 144 | 220 | ESC P | 1B 50 | 027 080 | 33 120
Private use one | PU1 | P1 | 91 | 145 | 221 | ESC Q | 1B 51 | 027 081 | 33 121
Private use two | PU2 | P2 | 92 | 146 | 222 | ESC R | 1B 52 | 027 082 | 33 122
Set transmit state | STS | TS | 93 | 147 | 223 | ESC S | 1B 53 | 027 083 | 33 123
Cancel character | CCH | CC | 94 | 148 | 224 | ESC T | 1B 54 | 027 084 | 33 124
Message waiting | MW | MW | 95 | 149 | 225 | ESC U | 1B 55 | 027 085 | 33 125
Start of protected area | SPA | SG | 96 | 150 | 226 | ESC V | 1B 56 | 027 086 | 33 126
End of protected area | EPA | EG | 97 | 151 | 227 | ESC W | 1B 57 | 027 087 | 33 127
Start of string | SOS | SS | 98 | 152 | 230 | ESC X | 1B 58 | 027 088 | 33 130
Single graphic character introducer | SGCI | GC | 99 | 153 | 231 | ESC Y | 1B 59 | 027 089 | 33 131
Single character introducer | SCI | SC | 9A | 154 | 232 | ESC Z | 1B 5A | 027 090 | 33 132
 | ROI | | 9A | 154 | 232 | ESC % | 1B 25 | 027 037 | 33 45
Control sequence introducer | CSI | CI | 9B | 155 | 233 | ESC [ | 1B 5B | 027 091 | 33 133
String terminator | ST | ST | 9C | 156 | 234 | ESC \ | 1B 5C | 027 092 | 33 134
Operating system command | OSC | OC | 9D | 157 | 235 | ESC ] | 1B 5D | 027 093 | 33 135
Privacy message | PM | PM | 9E | 158 | 236 | ESC ^ | 1B 5E | 027 094 | 33 136
Application program command | APC | AC | 9F | 159 | 237 | ESC _ | 1B 5F | 027 095 | 33 137

To calculate the escape sequence, 40h (64 decimal, 100 octal) is subtracted from the position of the C1 control character. For example, the control character PAD has the C1 position 80h. Subtracting 40h yields 40h, the ASCII position of the character @, so the escape sequence is ESC @. The same holds for positions expressed in decimal, 128 - 64 = 64, and in octal, 200 - 100 = 100 (both correspond to 40h).

The only disadvantage of the escape sequence is that one additional character has to be processed per control command, which could slow down slow terminals, at least in theory and only if an ANSI script is very long. According to the specification, all 8-bit capable devices can also use the 7-bit escape form, which is why the escape sequences have prevailed.
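The subtraction rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names `c1_to_escape` and `escape_to_c1` are assumptions chosen for this example.

```python
ESC = 0x1B  # the C0 control character "Escape" (27 decimal)


def c1_to_escape(c1: int) -> bytes:
    """Return the two-byte 7-bit escape sequence for a C1 control character."""
    if not 0x80 <= c1 <= 0x9F:
        raise ValueError(f"{c1:#04x} is not in the C1 range 80h-9Fh")
    # Subtract 40h from the C1 position to get the character following ESC.
    return bytes([ESC, c1 - 0x40])


def escape_to_c1(sequence: bytes) -> int:
    """Inverse mapping: ESC followed by a character in 40h-5Fh gives a C1 code."""
    if len(sequence) != 2 or sequence[0] != ESC or not 0x40 <= sequence[1] <= 0x5F:
        raise ValueError("not a two-byte escape sequence for a C1 character")
    return sequence[1] + 0x40


if __name__ == "__main__":
    print(c1_to_escape(0x80))              # PAD, 80h - 40h = 40h, i.e. ESC @
    print(c1_to_escape(0x9B))              # CSI, 9Bh - 40h = 5Bh, i.e. ESC [
    print(hex(escape_to_c1(b"\x1b[")))     # back to 9Bh
```

Note that this rule covers only the regular rows of the table; the ROI entry (ESC %) does not follow it.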

Character sets

Most character sets contain the C0 and C1 control characters in the standardized positions. Apart from emulated VT100 terminals, however, almost only C0 control characters are used.

These control characters were also adopted when Unicode was developed, so the control characters of ANSI X3.64 and ECMA-48 are mapped within the first 256 positions. ANSI escape sequences have no function in Unicode, but some of their functions have been implemented similarly at other Unicode positions (for example, the non-breaking space).

Control characters

According to the specification, a C1 control character invoked via an escape sequence has exactly the same function as the single 8-bit control character. As an escape sequence, it stays within the 7-bit range of ASCII and is therefore compatible with systems that support only 7 bits or that have been switched to 7-bit mode.
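Because both forms have the same function, a byte stream that uses 8-bit C1 control characters can be rewritten in 7-bit form. The Python sketch below illustrates the idea under one important assumption: the input uses a single-byte 8-bit encoding in which the bytes 80h to 9Fh really are C1 control characters. It must not be applied to UTF-8 data, where these byte values occur inside multi-byte characters. The function name `to_7bit` is hypothetical.

```python
def to_7bit(data: bytes) -> bytes:
    """Replace each C1 byte (80h-9Fh) with its two-byte escape sequence.

    Assumes a single-byte 8-bit encoding; not suitable for UTF-8 input.
    """
    out = bytearray()
    for byte in data:
        if 0x80 <= byte <= 0x9F:
            out.append(0x1B)          # ESC
            out.append(byte - 0x40)   # C1 position minus 40h
        else:
            out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    eight_bit = b"\x9b1mhi\x9bm"      # uses the 8-bit CSI character 9Bh
    print(to_7bit(eight_bit))         # the same sequences written as ESC [
```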

Control sequences

A control sequence is always introduced by a control character and consists of at least two characters. If its length is variable, the control sequence is terminated with a defined final character or a terminator. A control sequence is treated like a single control character, with the difference that the entire control sequence must be read before it can be executed.

There are essentially three control characters that initiate a control sequence:

  • ESC, Escape
  • SCI, Single Character Introducer (or ROI on VT100 terminals)
  • CSI, Control Sequence Introducer

Only the control character ESC lies in the ASCII range and is therefore a 7-bit compatible C0 control character. The C1 control characters SCI (or ROI) and CSI can be replaced by an escape sequence, so that the control sequence remains ASCII-compatible and limited to 7 bits.

The control characters APC, DCS, OSC, PM and SOS also initiate a control sequence, which must be terminated with the string terminator ST.
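The structure of such a string sequence (introducer, content, ST) can be sketched as follows. The escape sequences are taken from the table above; the helper name `control_string` and the sample content are assumptions for illustration, and what a terminal does with the content depends entirely on the implementation.

```python
ESC = "\x1b"
ST = ESC + "\\"  # string terminator in 7-bit form: ESC \

# 7-bit escape sequences of the introducers, as listed in the table above.
INTRODUCERS = {
    "APC": ESC + "_",
    "DCS": ESC + "P",
    "OSC": ESC + "]",
    "PM": ESC + "^",
    "SOS": ESC + "X",
}


def control_string(kind: str, content: str) -> str:
    """Wrap content in the chosen introducer and terminate it with ST."""
    if kind not in INTRODUCERS:
        raise ValueError(f"unknown introducer: {kind}")
    if ESC in content:
        # An ESC inside the content could end the string prematurely.
        raise ValueError("content must not contain ESC")
    return INTRODUCERS[kind] + content + ST


if __name__ == "__main__":
    # "demo" is placeholder content, not a command defined by any standard.
    print(repr(control_string("OSC", "demo")))
```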

Single character introducer

The control character "Single Character Introducer" (SCI) initiates a control sequence with only one additional character and therefore does not require a terminator. However, since the functions of this control character have not been standardized, they mean something different on each system.

<ESC>Z<function>

The respective proprietary function is introduced with the escape sequence ESC Z, followed by a defined function character. Since the ECMA/ANSI standard does not define any standardized functions, each implementation can define its own proprietary functions.

On DEC's VT100, the same C1 control character (154 decimal, 9A hex) is used for the proprietary control character ROI, which is, however, introduced with a different escape sequence: ESC %. Unlike SCI, ROI is variable in length.

Many terminal emulations offer a VT100 compatible mode.

Example:

<ESC>%0K

The control sequence ROI 0 K turns off the keyboard. ROI 1 K switches it on again.

<ESC>%1I

The control sequence ROI 1 I queries the current IP address. The reply has the format ROI ? <IP address> I.

Control sequence introducer

The control character "Control Sequence Introducer" (CSI) is the most frequently used control character, as it provides access to a large number of additional functions that would otherwise not have fit into the available 8-bit code space. In 8-bit mode it is the character 9B hex, but it is mostly sent as the escape sequence ESC [ in 7-bit mode, i.e. 1B hex 5B hex.

A CSI control sequence always consists of the introductory control character or the corresponding escape sequence, a parameter part and a final character; the final character determines the function. The semicolon (;) separates the parameters within the parameter part. The parameter part is optional; if it is missing, a default parameter usually applies.

<ESC>  [  0  ;  1  ;  4  m
|      |  |           |  |
++++++++  +++++++++++++  |
    |           |        |
introducer      |   final character
         parameter part

In this example, the escape sequence ESC [ stands for the introductory control character CSI. It is followed by the parameters 0;1;4 and then by the character m, which determines the actual function.
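The three parts of a CSI control sequence can be assembled and taken apart again with a short Python sketch. It is deliberately simplified: it assumes that the parameter part contains only digits and semicolons and that the final character lies in the range @ to ~ (40h to 7Eh, the range ECMA-48 reserves for final characters; this detail is not stated in the text above). The names `build_csi` and `parse_csi` are hypothetical.

```python
import re
from typing import List, Tuple

CSI_7BIT = "\x1b["  # ESC [, the 7-bit form of CSI

# Simplifying assumption: digits and semicolons as parameters,
# one final character in the range @ to ~.
_CSI_PATTERN = re.compile(r"\x1b\[([0-9;]*)([@-~])")


def build_csi(params: List[int], final: str) -> str:
    """Join the parameters with ';' and append the final character."""
    return CSI_7BIT + ";".join(str(p) for p in params) + final


def parse_csi(sequence: str) -> Tuple[List[str], str]:
    """Split a simple CSI sequence into its parameter list and final character."""
    match = _CSI_PATTERN.fullmatch(sequence)
    if match is None:
        raise ValueError("not a simple CSI control sequence")
    raw = match.group(1)
    params = raw.split(";") if raw else []
    return params, match.group(2)


if __name__ == "__main__":
    seq = build_csi([0, 1, 4], "m")
    print(repr(seq))
    print(parse_csi(seq))
```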

If the parameter part is omitted, the control sequence looks like this:

<ESC>[m

This control sequence is equivalent to ESC [ 0 m, since 0 is the default parameter.
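The effect of the default parameter can be illustrated by normalizing the parameter part of an m sequence. In this sketch, which assumes that every missing or empty parameter stands for 0, both spellings yield the same result. The function name `m_parameters` is an assumption made for this example.

```python
def m_parameters(sequence: str) -> list:
    """Return the numeric parameters of an 'ESC [ ... m' sequence.

    Assumption: a missing or empty parameter counts as the default value 0.
    """
    if not (sequence.startswith("\x1b[") and sequence.endswith("m")):
        raise ValueError("not an 'ESC [ ... m' control sequence")
    body = sequence[2:-1]
    # An empty body splits into one empty field, which becomes the default 0.
    return [int(field) if field else 0 for field in body.split(";")]


if __name__ == "__main__":
    print(m_parameters("\x1b[m"))
    print(m_parameters("\x1b[0m"))
    print(m_parameters("\x1b[0;1;4m"))
```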

Security

Because control sequences can also be used to simulate and redefine keyboard input, a file containing ANSI escape sequences can cause damage on a computer. It is enough for the file to be displayed by a fully ANSI-capable program that executes the contained escape sequences unfiltered. This kind of malicious payload is also known as an ANSI bomb.
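One common precaution is to filter untrusted text before it is displayed. The following Python sketch removes CSI sequences, string sequences terminated by ST, other two-character escape sequences and remaining C1 control characters. It is an illustration under simplifying assumptions rather than a complete security filter; the function name `strip_escapes` and the exact pattern are not taken from any standard or library.

```python
import re

_ESCAPES = re.compile(
    r"""
    (?:\x1b\[|\x9b) [0-?]* [ -/]* [@-~]                  # CSI, parameters, final character
  | (?:\x1b[PX\]^_]|[\x90\x98\x9d\x9e\x9f]) .*? (?:\x1b\\|\x9c)  # DCS/SOS/OSC/PM/APC up to ST
  | \x1b [@-Z\\^_%]?                                     # other escape sequences or a lone ESC
  | [\x80-\x9f]                                          # remaining C1 control characters
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_escapes(text: str) -> str:
    """Remove escape sequences and C1 control characters from text.

    Other C0 characters such as newline and tab are left untouched.
    """
    return _ESCAPES.sub("", text)


if __name__ == "__main__":
    untrusted = "\x1b[0;1;4mbold\x1b[m text"
    print(strip_escapes(untrusted))
```

Filtering before display means that a viewer never interprets sequences from the file; whether plain removal is sufficient depends on the terminal and program in use.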

Implementations

Hardware:

  • DEC VT100 and its successors (VT102, VT220, VT320, VT420, VT520)
  • Heathkit H89 and terminal variants (H19; also as Zenith Z19)

Software:
