Control characters
A control character (also called a control code) is a character of a character set that does not represent a displayable character. Displayable characters include letters, digits, and punctuation marks.
Originally, control characters were used to control text output devices such as text printers, automatic typewriters, telegraph equipment, and teleprinters. They make it possible to transmit control commands for the output device within the character stream itself instead of carrying the control information over a separate protocol.
Today only a few control characters still have a meaning (e.g. Line Feed, Form Feed, Carriage Return, Escape); most are practically no longer used. Sometimes they are also used to mark or transmit characters that are not otherwise defined in the character set in use.
A character table usually defines both displayable characters and control characters; in the widely used ASCII code, the control characters are characters 0 to 31 and character 127. To make control characters visible as graphic symbols, e.g. when monitoring a data transmission, Unicode provides the characters of the Control Pictures block (U+2400 to U+243F).
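The two ranges named above, and the mapping onto the Control Pictures block, can be sketched in a few lines of Python. The function names are hypothetical and chosen only for this illustration; the sketch assumes that the pictures for codes 0 to 31 sit at U+2400 plus the code and that the picture for DEL sits at U+2421.

```python
def is_ascii_control(code: int) -> bool:
    """Return True for the ASCII control codes: 0 to 31 and 127."""
    return 0 <= code <= 31 or code == 127


def control_picture(code: int) -> str:
    """Return the Control Pictures symbol for an ASCII control code."""
    if not is_ascii_control(code):
        raise ValueError(f"{code} is not an ASCII control code")
    # U+2400..U+241F mirror codes 0..31; DEL (127) has its symbol at U+2421.
    return chr(0x2421) if code == 127 else chr(0x2400 + code)


def visualize(text: str) -> str:
    """Replace every ASCII control character in text by its picture."""
    return "".join(
        control_picture(ord(ch)) if is_ascii_control(ord(ch)) else ch
        for ch in text
    )


print(visualize("A\tB\r\n"))  # A␉B␍␊
```

Such a helper is mainly useful for logging or debugging, where an invisible tab or carriage return would otherwise go unnoticed.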
C0 control characters
Dec | Code value of the character in the decimal number system |
---|---|
Hex | Code value of the character in the hexadecimal number system |
Ctrl | Usual notation ("caret notation") as a control code. The control character can be entered on the keyboard; the leading ^ stands for the Ctrl key. |
C | The "\" sequences give the spelling of this character in the C programming language and in languages derived from it, such as C++ and Java, and above all in scripting languages, shells, and others. This notation is usually interpreted inside character strings, e.g. printf("A\ttab\nline break\rcarriage return"); |
ISO | Official abbreviation of the control character (according to the ISO 646 standard) |
U | Graphic Unicode symbol from the block U+2400 to U+243F |
Type | Character type: CC = communication control, FE = format effector, IS = information separator |
English | Official name for which the abbreviation stands (according to the ASCII standard) |
German | Unofficial German translation of this English name (rendered here in English) |
(original) meaning | Meaning of the control character. Some of the meanings described are obsolete; they are to be regarded as historical and are no longer in use. |
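The caret notation in the Ctrl column follows a simple rule: the character after the ^ is the one whose code differs from the control code by 0x40. The following Python sketch shows the conversion in both directions. The function names are hypothetical, and the treatment of DEL as ^? is a common convention assumed here, not something the table itself states.

```python
def caret_notation(code: int) -> str:
    """Return the caret notation for a control code, e.g. 7 -> '^G'."""
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"{code} is not an ASCII control code")
    # Flipping bit 0x40 maps 0..31 to '@'..'_' and 127 to '?'.
    return "^" + chr(code ^ 0x40)


def from_caret(notation: str) -> int:
    """Return the control code for a caret notation, e.g. '^G' -> 7."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError(f"not a caret notation: {notation!r}")
    # Accept lowercase letters too, since Ctrl+g and Ctrl+G are the same key.
    code = ord(notation[1].upper()) ^ 0x40
    if not (0 <= code <= 31 or code == 127):
        raise ValueError(f"not a caret notation: {notation!r}")
    return code


print(caret_notation(27))  # ^[
print(from_caret("^M"))    # 13
```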
Dec | Hex | Ctrl | C | ISO | U | Type | English | German | (original) meaning |
---|---|---|---|---|---|---|---|---|---|
0 | 0x00 | ^@ | \0 | NUL | ␀ | | Null | Null character | Character without informational content. Can be inserted into a message at will and is discarded by the recipient. Marks the end of a string in C. |
1 | 0x01 | ^A | | SOH | ␁ | CC | Start of heading | Start of header | Marks the beginning of the machine-readable destination address or routing information. The header is ended by the STX character. |
2 | 0x02 | ^B | | STX | ␂ | CC | Start of text | Start of message | Marks the beginning of the message to be transmitted and thus the end of the header. |
3 | 0x03 | ^C | | ETX | ␃ | CC | End of text | End of message | Marks the end of the message to be transmitted. Used as the "abort" character for terminal input. |
4 | 0x04 | ^D | | EOT | ␄ | CC | End of transmission | End of transmission | Marks the end of the entire transmission, which can consist of several messages including headers. Used to terminate some command interpreters. Used as "end of input" for terminal input. |
5 | 0x05 | ^E | | ENQ | ␅ | CC | Enquiry | Inquiry | A request to the other station in bidirectional communication. The other station can respond with its identification or its status. Usually called "Wer da?" ("Who's there?") on German teleprinters. |
6 | 0x06 | ^F | | ACK | ␆ | CC | Acknowledge | Acknowledgment of receipt | Expresses the positive response to a previous request. |
7 | 0x07 | ^G | \a | BEL | ␇ | | Bell | Beep | Generates an acoustic signal (bell or beep) on the receiving terminal. Used as an alarm or warning signal. |
8 | 0x08 | ^H | \b | BS | ␈ | FE | Backspace | Backward step | Moves the print head / cursor back one position. The sequence e Backspace ´ produces an é on a printer, but often just a ´ on a terminal, where the second character overwrites the first. |
9 | 0x09 | ^I | \t | HT | ␉ | FE | Horizontal tab | Horizontal tab character | Moves the print head / cursor to the next predefined position (tab stop) in the current line. |
10 | 0x0A | ^J | \n | LF | ␊ | FE | Line feed | Line feed | Moves the print head / cursor to the next line. If agreed between sender and recipient, it means "New Line", in which case the first print position of the next line is approached. Used, among other things, as the end-of-line character on Unix systems (Unix, BSD, macOS, Linux). Under MS-DOS or Windows, the combination "Carriage Return" + "Line Feed" ends a line. |
11 | 0x0B | ^K | \v | VT | ␋ | FE | Vertical tab | Vertical tab character | Moves the print head / cursor to the next predefined line. |
12 | 0x0C | ^L | \f | FF | ␌ | FE | Form feed | Form feed | Moves the print head / cursor to the first print position on the next page (page break). Ejects the current page or clears the screen. |
13 | 0x0D | ^M | \r | CR | ␍ | FE | Carriage return | Carriage return | Moves the print head / cursor back to the first print position of the current line. Used as a line break in BASIC. Used in classic Mac OS up to version 9 as the end-of-line character ("New Line"). Under MS-DOS or Windows, the combination "Carriage Return" + "Line Feed" ends a line. On terminals or printers, carriage return can be used to write over the same line several times (e.g. for a progress bar). |
14 | 0x0E | ^N | | SO | ␎ | | Shift Out | Switch over | Switches to a special display mode, e.g. bold on a printer. |
15 | 0x0F | ^O | | SI | ␏ | | Shift In | Switch back | Switches back to the normal display mode. |
16 | 0x10 | ^P | | DLE | ␐ | CC | Data Link Escape | "Data connection escape symbol" (literal translation) | Gives the following characters a special meaning. May only be used for additional protocol characters. |
17 | 0x11 | ^Q | | DC1 | ␑ | | Device Control 1 | Device control character 1 | DC1 to DC4 are device-specific control characters, e.g. for switching certain device functions (such as the font on printers) on and off. ^S (XOFF) and ^Q (XON) are also used for flow control with XON/XOFF. |
18 | 0x12 | ^R | | DC2 | ␒ | | Device Control 2 | Device control character 2 | See DC1. |
19 | 0x13 | ^S | | DC3 | ␓ | | Device Control 3 | Device control character 3 | See DC1. |
20 | 0x14 | ^T | | DC4 | ␔ | | Device Control 4 | Device control character 4 | See DC1. |
21 | 0x15 | ^U | | NAK | ␕ | CC | Negative Acknowledge | Negative acknowledgment | Expresses the negative response to a previous request. |
22 | 0x16 | ^V | | SYN | ␖ | CC | Synchronous idle | Synchronization signal | In synchronous data transmission, enables synchronization even when there are no signals to be transmitted. |
23 | 0x17 | ^W | | ETB | ␗ | CC | End of Transmission Block | End of the transmission block | Indicates the end of a block of transmitted data when the end of the block cannot be recognized from the data itself. |
24 | 0x18 | ^X | | CAN | ␘ | | Cancel | Cancellation | Indicates that the data just transmitted is erroneous and must be discarded. |
25 | 0x19 | ^Y | | EM | ␙ | | End of medium | End of medium | Indicates the end of the storage medium (physical or logical). |
26 | 0x1A | ^Z | | SUB | ␚ | | Substitute | Replacement | Replaces a character that is invalid or erroneous, e.g. because of a parity error during transmission. Used as the end-of-file (EOF) character for text files under CP/M, which did not record byte-exact file lengths; initially also common under DOS, although unnecessary there. |
27 | 0x1B | ^[ | | ESC | ␛ | | Escape | Escape character | Starts an escape sequence, in which the following characters have a special meaning. |
28 | 0x1C | ^\ | | FS | ␜ | IS | File separator | File separator | FS, GS, RS, and US are separators that divide data blocks logically. The exact meaning of the logical units "File", "Group", "Record", and "Unit" is not specified, but they should be ordered from "File" as the highest structural unit down to "Unit" as the lowest. |
29 | 0x1D | ^] | | GS | ␝ | IS | Group separator | Group separator | See FS. |
30 | 0x1E | ^^ | | RS | ␞ | IS | Record separator | Record separator | See FS. |
31 | 0x1F | ^_ | | US | ␟ | IS | Unit separator | Unit separator | See FS. |
127 | 0x7F | | | DEL | ␡ | | Delete | Delete character | The DEL character has a binary code consisting entirely of ones. The reason is historical: a hole once punched into paper tape cannot be refilled. However, all remaining holes of a character can be punched out, which turns it into a non-printing control character (like "Bu" in the 5-channel Baudot code) and thus overwrites an erroneous entry. This is why the character stands for "deleted character" or "deleted". |
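The table mentions three line-ending conventions: LF on Unix systems, CR on classic Mac OS, and CR followed by LF under MS-DOS and Windows. As a minimal sketch, text from any of these sources can be normalized to LF as follows; the function names are hypothetical and the approach is one of several possible.

```python
def normalize_newlines(text: str) -> str:
    """Convert CR LF and lone CR line endings to LF."""
    # Replace the CR LF pair first, so that it does not become two line breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def split_lines(text: str) -> list:
    """Split text into lines regardless of the line-ending convention."""
    return normalize_newlines(text).split("\n")


print(split_lines("unix\nwindows\r\nclassic mac\rend"))
# ['unix', 'windows', 'classic mac', 'end']
```

The order of the two replacements matters: handling the lone CR first would turn every CR LF pair into two line breaks.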
C1 control characters
The C1 control characters, which ISO 8859 added for all of its parts, are rarely used and are now mainly of historical interest. Most Windows character sets, including CP 1252, assign printable characters to these code positions that are not contained in the corresponding ISO standard, e.g. ISO 8859-1.
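The difference between the two interpretations of the range 0x80 to 0x9F can be observed directly by decoding the same bytes twice. The following Python sketch uses the standard codec names "latin-1" (ISO 8859-1) and "cp1252"; the three sample bytes are chosen arbitrarily for illustration.

```python
import unicodedata

raw = bytes([0x80, 0x85, 0x99])

# ISO 8859-1 maps each byte to the code point of the same value,
# so these bytes become C1 control characters.
as_latin1 = raw.decode("latin-1")

# CP 1252 assigns printable characters to the same byte values.
as_cp1252 = raw.decode("cp1252")

for byte, iso_char, win_char in zip(raw, as_latin1, as_cp1252):
    print(
        f"0x{byte:02X}: ISO 8859-1 -> U+{ord(iso_char):04X} "
        f"({unicodedata.category(iso_char)}), "
        f"CP 1252 -> U+{ord(win_char):04X} ({win_char})"
    )
```

Note that Python's cp1252 codec leaves a few byte values in this range undefined, so decoding arbitrary bytes with it can raise a UnicodeDecodeError.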
All C1 control characters can also be expressed with C0 control characters by means of escape sequences; see ANSI escape sequence.
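As a sketch of this equivalence, assuming the usual 7-bit convention in which a C1 code is written as ESC followed by the byte that is 0x40 lower, the conversion can be expressed as follows. The function names are hypothetical.

```python
ESC = 0x1B


def c1_to_escape(code: int) -> bytes:
    """Return the two-byte escape sequence equivalent to a C1 code."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError(f"0x{code:02X} is not a C1 control code")
    # 0x80..0x9F correspond to ESC followed by 0x40..0x5F ('@' to '_').
    return bytes([ESC, code - 0x40])


def escape_to_c1(sequence: bytes) -> int:
    """Return the C1 code equivalent to a two-byte escape sequence."""
    if len(sequence) != 2 or sequence[0] != ESC or not 0x40 <= sequence[1] <= 0x5F:
        raise ValueError(f"not a C1 escape sequence: {sequence!r}")
    return sequence[1] + 0x40


print(c1_to_escape(0x9B))  # b'\x1b['  (CSI is written as ESC [)
```

This is why terminal control sequences are commonly seen starting with ESC [ rather than with the single byte 0x9B.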
Dec | Hex | IETF | ISO | Character name | Comment |
---|---|---|---|---|---|
128 | 0x80 | PA | PAD | Padding Character | Reserved control character; considered in a DIS 10646 draft, but never included in the ISO 10646 standard. Marked as XXX in Unicode. |
129 | 0x81 | HO | HOP | High Octet Preset | |
130 | 0x82 | BH | BPH | Break Permitted Here | A position where a line break may occur. Comparable to the zero-width space, Unicode U+200B zero width space. |
131 | 0x83 | NH | NBH | No Break Here | A position where no line break is wanted. Comparable to Unicode U+2060 word joiner. |
132 | 0x84 | IN | IND | Index | Moves the current position one line down while keeping the horizontal position. The index function was declared obsolete in the 4th edition of ECMA-48 (1986) and was removed in the 5th edition (1991). |
133 | 0x85 | NL | NEL | Next Line | Moves the current position to the beginning of the next line, alternatively to the home or line limit position. NEL corresponds to the EBCDIC character NL (Next Line). |
134 | 0x86 | SA | SSA | Start of Selected Area | |
135 | 0x87 | ES | ESA | End of Selected Area | |
136 | 0x88 | HS | HTS | Character Tabulation Set | Sets a tab stop at the active position. Before ECMA-48 (4th edition, 1986) referred to as "Horizontal Tabulation Set". |
137 | 0x89 | HJ | HTJ | Character Tabulation with Justification | Moves text to the next tab stop position. The text is taken to be the part from the previous tab stop to the active position. Before ECMA-48 (4th edition, 1986) referred to as "Horizontal Tabulation with Justify". |
138 | 0x8A | VS | VTS | Line Tabulation Set | Sets a vertical tab stop at the active line. Before ECMA-48 (4th edition, 1986) referred to as "Vertical Tabulation Set". |
139 | 0x8B | PD | PLD | Partial Line Forward | Before ECMA-48 (5th edition, 1991) referred to as "Partial Line Down". |
140 | 0x8C | PU | PLU | Partial Line Backward | Before ECMA-48 (5th edition, 1991) referred to as "Partial Line Up". |
141 | 0x8D | RI | RI | Reverse Line Feed | Moves the active position to the previous line while keeping the horizontal position. Before ECMA-48 (4th edition, 1986) referred to as "Reverse Index". |
142 | 0x8E | S2 | SS2 | Single Shift 2 | Loads character set G2 into GL for a single character. |
143 | 0x8F | S3 | SS3 | Single Shift 3 | Loads character set G3 into GL for a single character. |
144 | 0x90 | DC | DCS | Device Control String | Start character of a control sequence that ends with ST ("String Terminator"); can contain a command for the receiving device or a status report of the sending device. |
145 | 0x91 | P1 | PU1 | Private Use One | Reserved, no standardized meaning. |
146 | 0x92 | P2 | PU2 | Private Use Two | |
147 | 0x93 | TS | STS | Set Transmit State | |
148 | 0x94 | CC | CCH | Cancel Character | |
149 | 0x95 | MW | MW | Message Waiting | Sets a "message waiting" indicator in the receiving device. |
150 | 0x96 | SG | SPA | Start of Protected Area | Together with the following character string, which contains a list of character positions, defines an area that is protected against manual modification or transmission; deletion protection is optional. The character string must end with EPA ("End Protected Area"). The function is called "Start of Protected Area" according to ANSI X3.64 and ECMA-48 (1979), "Start of Guarded Protected Area" according to ISO 6429 (1983) and ECMA-48 (1984), or "Start of Guarded Area" according to ISO 6429 (1992) and ECMA-48 (1986 and 1991). |
151 | 0x97 | EG | EPA | End of Protected Area | Marks the end of an area that was started with SPA. The function is called "End of Protected Area" according to ANSI X3.64 and ECMA-48 (1979), "End of Guarded Protected Area" according to ISO 6429 (1983) and ECMA-48 (1984), or "End of Guarded Area" according to ISO 6429 (1992) and ECMA-48 (1986 and 1991). |
152 | 0x98 | SS | SOS | Start of String | Marks the beginning of a control string that is ended with ST ("String Terminator"). The string must not contain another SOS character (152 decimal or 98 hexadecimal). The interpretation of the string is up to the respective program. |
153 | 0x99 | GC | SGCI | Single Graphic Character Introducer | Reserved control character; considered in a DIS 10646 draft, but never included in the ISO 10646 standard. Marked as XXX in Unicode. |
154 | 0x9A | SC | SCI | Single Character Introducer | Executes the function defined by a single subsequent byte, which, however, has not been standardized. Also introduces a proprietary VT100 control sequence. |
155 | 0x9B | CI | CSI | Control Sequence Introducer | Introduces a control sequence. See ANSI escape sequence. |
156 | 0x9C | ST | ST | String Terminator | Marks the end of a string that was started with APC, DCS, OSC, PM, or SOS. |
157 | 0x9D | OC | OSC | Operating System Command | Marks the beginning of an "Operating System Command" string, which is ended with ST ("String Terminator"). The interpretation of the string is up to the respective operating system. |
158 | 0x9E | PM | PM | Privacy Message | Marks the beginning of a "Privacy Message", which is ended with ST ("String Terminator"). |
159 | 0x9F | AC | APC | Application Program Command | Marks the beginning of an "Application Program Command" string, which is ended with ST ("String Terminator"). The interpretation of the string is up to the respective program. |
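Several of the characters in the table (DCS, SOS, OSC, PM, APC) open a string that runs until ST. A minimal Python sketch of how a receiver might pick such strings out of an 8-bit byte stream is shown below. The function name is hypothetical, and the sketch deliberately ignores the 7-bit escape-sequence forms, nested openers, and UTF-8 input, in which the byte 0x9C can also occur inside a multi-byte character.

```python
# C1 codes that open a control string, taken from the table above.
STRING_OPENERS = {0x90: "DCS", 0x98: "SOS", 0x9D: "OSC", 0x9E: "PM", 0x9F: "APC"}
ST = 0x9C  # String Terminator


def extract_control_strings(data: bytes) -> list:
    """Return (opener, payload) pairs for every string terminated by ST."""
    found = []
    opener = None
    payload = bytearray()
    for byte in data:
        if opener is None:
            # Outside a control string: only an opener is of interest.
            if byte in STRING_OPENERS:
                opener = STRING_OPENERS[byte]
                payload = bytearray()
        elif byte == ST:
            found.append((opener, bytes(payload)))
            opener = None
        else:
            payload.append(byte)
    # A string that is never terminated is silently dropped.
    return found


print(extract_control_strings(b"ab\x9d0;title\x9ccd"))
# [('OSC', b'0;title')]
```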
Unicode
The control characters of the ASCII range 0x00 to 0x1F can be found in Unicode under C0 Controls (U+0000 to U+001F), those of the ISO 8859 range 0x80 to 0x9F under C1 Controls (U+0080 to U+009F). The first 128 characters in the Unicode encoding UTF-8 correspond to those of ASCII and the ISO 8859 encodings, so this also holds for the control characters in the range 0x00 to 0x1F. In addition to these characters, Unicode contains a number of other control characters.
Graphic symbols for the control characters can be found in the Unicode block Control Pictures (U+2400 to U+243F).
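These relationships can be checked with Python's standard unicodedata module, which reports the general category Cc for control characters. The sketch below uses hypothetical helper names and shows that C0 characters keep their single ASCII byte in UTF-8, whereas C1 characters need two bytes.

```python
import unicodedata


def is_control(ch: str) -> bool:
    """Return True if Unicode classifies ch as a control character (Cc)."""
    return unicodedata.category(ch) == "Cc"


def utf8_length(ch: str) -> int:
    """Return the number of bytes ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))


c0 = [chr(cp) for cp in range(0x00, 0x20)]
c1 = [chr(cp) for cp in range(0x80, 0xA0)]

print(all(is_control(ch) for ch in c0 + c1))  # True
print({utf8_length(ch) for ch in c0})         # {1}
print({utf8_length(ch) for ch in c1})         # {2}
print("\x85".encode("utf-8"))                 # b'\xc2\x85'
```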
Input under MS Windows or DOS
As a test, control characters can also be entered under Windows. Holding down the (left) Alt key and typing the decimal code of a control character on the numeric keypad enters that character at the prompt.
Example: open the command prompt; Alt+(0 and 7 on the numeric keypad) prints the character ^G at the prompt, which also makes clear that Ctrl+G does the same thing. If you now press Enter (or ^M), the control character is executed in the terminal window and a beep sounds from the system speaker (if present), which corresponds to the bell character (BEL; see table above). Likewise, Alt+(0 and 8) deletes a character, just like pressing Backspace (or Ctrl+H). BASIC interpreters that use their own keyboard drivers (e.g. GW-BASIC) also accept hexadecimal ASCII codes in the form &hZZ, where Z stands for a hex digit (e.g. &h0D for carriage return).
See also
- Special Characters - Special letters and text punctuation marks
- Control characters in EBCDIC
- Escape sequences - control commands introduced by the escape character
- ANSI bomb - uses control characters as a security hole