Escape sequence

from Wikipedia, the free encyclopedia

An escape sequence (named after the escape character; from English to escape) is a combination of characters in computer engineering that does not represent text but is intercepted by the device and triggers a special function. On a display terminal this can be, for example, cursor positioning; on a printer, switching to a different font size or ejecting the page (see ANSI escape sequence).

Escaping refers to the use of a masking character, which likewise forms an escape sequence. Here the effect is reversed: a character that would normally be recognized as part of a special function is interpreted as plain text, without that function.

Functionality

The name derives from the character that usually introduces the sequence: the ESC character (hexadecimal code 1B, decimal 27, in the ASCII character set), which has served as a switch between the normal meaning of characters and special functions since 1968 at the latest.

The program in question, whether an application on a desktop computer or a control program in a peripheral device, recognizes the escape character while processing a character sequence such as a text, breaks out of normal processing, and triggers the special function assigned to the characters that follow. Normal processing then continues. While the text is being edited, for example in a text editor, an escape character remains an uninterpreted, ordinary character and therefore triggers no function. WYSIWYG programs are a special case; they include modern word processors, in which the display corresponds directly to the output with all special functions applied.
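The processing loop described above can be sketched roughly as follows. This is a minimal illustration in Python, not the behaviour of any real device: the one-letter commands in COMMANDS and the function name process are assumptions made up for this example.

```python
ESC = "\x1b"  # the ESC character: hexadecimal 1B, decimal 27

# Hypothetical one-letter commands, invented only for this illustration.
COMMANDS = {"B": "bold on", "b": "bold off"}

def process(stream):
    """Separate a character stream into plain text and triggered functions."""
    text, actions = [], []
    chars = iter(stream)
    for ch in chars:
        if ch == ESC:
            # Break out of normal processing: the next character selects
            # the special function instead of being treated as text.
            code = next(chars, "")
            actions.append(COMMANDS.get(code, "unknown"))
        else:
            text.append(ch)  # normal processing
    return "".join(text), actions
```

Calling process("a\x1bBc\x1bbd") yields the text "acd" together with the two triggered functions, in order; neither the ESC characters nor the command letters appear in the text.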

To represent non-printable control characters in source text during programming, certain sequences of printable characters are given the meaning of a special function by prefixing a (different) specific character that serves as a masking character. In the C programming language, for example, within a string constant, \n stands for a line break, \t for a horizontal tab, and \" for a quotation mark (whereas a plain " is not part of the string constant but marks its end). Such a character sequence is also called an escape sequence, carrying over the old name, although the actual ESC character is no longer involved. Similarly, Microsoft Word uses the character ^ in its "Find and Replace" function, for example ^t for the horizontal tab character.
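The difference between the two characters written in the source text and the single character they stand for can be shown with a short sketch. Python is used here only for illustration; its string literals treat \n, \t and \" the same way as the C string constants described above. The variable names are assumptions of this example.

```python
newline = "\n"             # escape sequence: one line-feed character
literal = r"\n"            # raw string: a backslash followed by the letter n
columns = "a\tb"           # \t becomes one horizontal tab character
quoted = "He said \"hi\""  # \" puts a quotation mark inside the string

print(len(newline), len(literal))
print(quoted)
```

The first print call shows that the escape sequence has length 1, while the raw two-character text has length 2.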

Use for printer control

Escape sequences are still used to control printers. Examples of widely used escape sequence-based printer languages are:

Some printers, by contrast, do not work with escape sequences but, for example, with page description languages such as PostScript, or receive commands via a separate control address on the bus (e.g. on Commodore computers).

Use for terminal control

The ANSI escape sequences, which are based on the escape sequences of the VT100 terminal, are widespread in the terminal domain. They became the general standard as ANSI X3.41-1974 and X3.64-1977, and as ECMA-48 (1976). A sequence consists of the control character Escape followed by a sequence of printable characters. ECMA-48 received its fifth and final revision in 1991 and was also standardized as ISO/IEC 6429.

Examples: ESC c (reset terminal), ESC [ K (delete line from cursor), ESC [ Pn A (Pn is a decimal number; cursor up by Pn lines). This standard became so popular that console drivers such as ANSI.SYS for MS-DOS (or PC-compatible DOS in general) and OS/2, the virtual consoles and terminal windows of most Unix-like operating systems (such as macOS and Linux), and the shell of AmigaOS also support these sequences. The Windows 10 command prompt supports ANSI escape sequences as of version 1511. It is worth noting, however, that virtually all of these consoles and terminals implement only a subset of all defined ANSI escape sequences.
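As a rough sketch, such sequences can be assembled as ordinary strings and written to a terminal. The example assumes the ECMA-48 forms in which the control sequences for erasing and cursor movement begin with ESC followed by "[" (the Control Sequence Introducer); the constant and function names are assumptions of this illustration.

```python
ESC = "\x1b"
CSI = ESC + "["  # Control Sequence Introducer

RESET_TERMINAL = ESC + "c"
ERASE_TO_END_OF_LINE = CSI + "K"

def cursor_up(n):
    """Return the sequence that moves the cursor up by n lines."""
    return f"{CSI}{n}A"

if __name__ == "__main__":
    # On a terminal that supports these sequences, this overwrites
    # the first line instead of printing a third one.
    print("first line")
    print("second line")
    print(cursor_up(2) + "replaced" + ERASE_TO_END_OF_LINE)
```

Whether the output behaves as described depends on the terminal, since most implement only part of the standard.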

In C and related programming languages

In C and C-related programming languages such as C++, C#, Java, awk, Perl and JavaScript, the following escape sequences can be used to insert frequently required control characters into character strings (although not all of these languages support all of the sequences listed here). In C itself, escape sequences are part of the execution character set of the programming language. They are standardized in ANSI C, although some compilers (on certain operating systems) also accept escape sequences that differ from the standard. The names of many control characters date from the time when output was mainly produced on teletypes and printers.

In C and related programming languages, an escape sequence is introduced by the backslash character: \

Escape sequences in C and C++
\a          Acoustic signal (from English alert)
\b          Backspace (from English backspace)
\e or \E    ANSI escape, hexadecimal 0x1B; an escape character for a higher level of interpretation, see above. Not part of ISO C and ISO C++.
\f          Form feed (from English form feed)
\n          Line feed (from English new line)
\r          Carriage return (from English carriage return)
\t          Horizontal tab character (from English horizontal tabulator)
\v          Vertical tab character (from English vertical tabulator)
\xhh..      Direct character selection using the hexadecimal digits hh that follow (x from hexadecimal). Example: \x40 corresponds to the character '@'. If the hexadecimal number formed this way is larger than can be represented in one character, the result is implementation-dependent.
\ooo        Direct character selection using the one to three octal digits ooo that follow. Example: \100 corresponds to the character '@'. The short form with one or two octal digits can only be used if no further octal digit follows. \0 (null character, NUL) is a special case of this rule.
\uhhhh      Unicode character; exactly four hexadecimal digits hhhh must follow. Example: \u20ac for the euro sign U+20AC "€"
\Uhhhhhhhh  Unicode character, in particular one outside plane 0; exactly eight hexadecimal digits must follow. Example: \U0001D49C stands for the Unicode character U+1D49C "𝒜" (MATHEMATICAL SCRIPT CAPITAL A)
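Several of the listed sequences can be checked with a short sketch. Python is used here as a stand-in for C, under the assumption that its string literals behave the same for these cases; two known differences are that Python has no \e and that its \x always takes exactly two hexadecimal digits. The variable names are made up for this illustration.

```python
# Character codes of the single-letter escape sequences.
CONTROL_CODES = {
    "\\a": ord("\a"), "\\b": ord("\b"), "\\f": ord("\f"), "\\n": ord("\n"),
    "\\r": ord("\r"), "\\t": ord("\t"), "\\v": ord("\v"),
}

at_hex = "\x40"           # hexadecimal 40 is decimal 64, the '@' character
at_octal = "\100"         # octal 100 is also decimal 64
euro = "\u20ac"           # four hexadecimal digits
script_a = "\U0001D49C"   # eight hexadecimal digits, outside plane 0

bell = "\07"              # two octal digits: a single character, code 7
nul_then_7 = "\0007"      # three octal digits (NUL), then the digit '7'

print(CONTROL_CODES)
print(at_hex, at_octal, euro, script_a)
```

The last two assignments illustrate the rule about the short octal form: to place the digit 7 after a NUL character, the NUL has to be written with all three octal digits.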

In this form the backslash also serves as a masking character, making it possible to use characters from the basic character set of C, that is, characters that have a meaning and function of their own, without that function. This applies to the backslash itself as well: if some of the so-called graphic characters of C, namely ! " % & / ( ) [ ] { } \ ? = ' # + * ~ - _ . : ; , | < > ^, are to be used as plain text characters, some of them must be preceded by the backslash as a masking character.

Use of the masking character \ in C and C++
\'          The character ', single quotation mark
\"          The character ", double quotation mark
\?          The question mark ?
\\          The character \, backslash
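A brief sketch of masking in practice, again in Python as an illustrative stand-in for C. Python accepts \', \" and \\ in string literals but has no \? sequence, so that one is left out. The example path and the variable names are assumptions chosen for illustration.

```python
single_quote = '\''        # a single quotation mark inside single quotes
double_quote = "\""        # a double quotation mark inside double quotes
backslash = "\\"           # one backslash character

# Without masking, \t and \n in the path are read as tab and line feed.
unmasked = "C:\temp\new"
masked = "C:\\temp\\new"

print(masked)
```

Only the masked variant contains the two backslashes; in the unmasked variant they have been consumed by the escape sequences \t and \n.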

Mask character to prevent an escape sequence

Because an escape sequence is introduced by at least one of the available characters, that character is no longer available for normal text, since it is assigned to a special function (function character). In the C programming language, this is the backslash. Owing to the influence of C and its widespread use, the same escape sequences are also found in other contexts, e.g. in certain configuration files or on a terminal (e.g. under Unix, Linux or macOS), among other things when specifying file names.

Technically, a mask character also initiates an escape sequence, but the function of this escape sequence is to output the character that follows it. The function therefore enables the original symbol to be used without its assigned function.

Example:

user@computer:~$ touch $HOME/Dokumente/Eine\ Datei\ mit\ Leerzeichen\ und\ einem\ \"Fragezeichen\"\?.txt

Under Unix-like systems (e.g. Linux or macOS), this command creates the file Eine Datei mit Leerzeichen und einem "Fragezeichen"?.txt in the directory Dokumente inside the user's home directory (the directory must already exist). Since the shell interprets a space as a separator, the space must be masked (in English: the character has to be escaped). The same applies to the quotation mark in a file name: since it normally begins and ends a character string, it must be escaped to be used as a plain character. The question mark, in turn, is normally interpreted as a wildcard.
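How the masked spaces and quotation marks are read can be approximated with Python's shlex module, which follows POSIX shell rules for splitting a command line into words. This is only a sketch: shlex models word splitting and quoting, not wildcard expansion or variable substitution, so $HOME stays as written and the question mark would be literal here even without the backslash. The variable names are assumptions of this example.

```python
import shlex

command = (r'touch $HOME/Dokumente/Eine\ Datei\ mit\ Leerzeichen'
           r'\ und\ einem\ \"Fragezeichen\"\?.txt')

# Each backslash masks the character after it, so the whole path
# arrives as one argument with the backslashes removed.
args = shlex.split(command)

# Without masking, every space separates two arguments.
unmasked_args = shlex.split("touch Eine Datei mit Leerzeichen")

print(args)
print(unmasked_args)
```

The masked command yields two words (touch and the complete file name), whereas the unmasked variant yields five.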

On the World Wide Web, the percent sign performs a similar function in URLs, which is why this is also referred to as percent-encoding (% representation).
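A minimal sketch of this representation, using the quote and unquote functions from Python's standard urllib.parse module; the file name is an assumption chosen for illustration.

```python
from urllib.parse import quote, unquote

name = "Eine Datei?.txt"

# Characters with a special meaning in URLs are replaced by a percent
# sign followed by their two-digit hexadecimal code.
encoded = quote(name)
decoded = unquote(encoded)

# The percent sign itself must be encoded as well.
percent = quote("100%")

print(encoded, decoded, percent)
```

The space becomes %20 and the question mark %3F; decoding restores the original name.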
