Control characters

from Wikipedia, the free encyclopedia

A control character (also control code) is a character in a character set that does not represent a displayable character. Displayable characters include letters, digits and punctuation marks.

Originally, control characters were used to control text output devices such as text printers, automatic typewriters, teleprinters and telex machines. They make it possible to transfer control commands for the output device as part of the character set itself, instead of transferring the control information via a separate protocol.

Today only a few control characters still have a meaning (e.g. Line Feed, Form Feed, Carriage Return, Escape); most control characters are practically no longer used. Sometimes they are also used to transmit markers that are not otherwise defined in the character set in use.

A character table usually defines both displayable characters and control characters; in ASCII, the most widely used code, the control characters are the characters 0 to 31 and the character 127. To make control characters visible as graphic symbols, e.g. for checking data transmissions, Unicode provides the characters of the Control Pictures block (U+2400 to U+243F).
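The relationship between the ASCII control codes and the Control Pictures block can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names `is_ascii_control`, `control_picture` and `make_visible` are hypothetical and chosen only for this example.

```python
def is_ascii_control(code: int) -> bool:
    """Return True for the ASCII control codes: 0 to 31 and 127."""
    return 0 <= code <= 31 or code == 127


def control_picture(code: int) -> str:
    """Return the Control Pictures symbol for an ASCII control code."""
    if code == 127:
        return chr(0x2421)         # SYMBOL FOR DELETE
    if 0 <= code <= 31:
        return chr(0x2400 + code)  # U+2400 to U+241F mirror codes 0 to 31
    raise ValueError(f"{code} is not an ASCII control code")


def make_visible(text: str) -> str:
    """Replace every ASCII control character in text by its graphic symbol."""
    return "".join(
        control_picture(ord(ch)) if is_ascii_control(ord(ch)) else ch
        for ch in text
    )
```

For example, `make_visible("a\tb\n")` replaces the tab with U+2409 and the line feed with U+240A, so that both become visible in a log or a hex-dump-like display.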

C0 control characters

Legend for the following table 
Dec Code value of the character in the decimal number system
Hex Code value of the character in the hexadecimal number system
Ctrl Usual notation ("caret notation") as a control code

The control character can be entered on the keyboard: the leading ^ stands for the Ctrl (control) key, which is labelled Strg on German keyboards. This key is held down while the second character is typed.

C The backslash notation for this character in the C programming language and in languages derived from it, such as C++ and Java, and above all in scripting languages, shells and others. This notation is usually interpreted inside character strings, e.g.
printf("A\ttab\nline break\rcarriage return");
ISO Official abbreviation for the control character (according to the ISO 646 standard)
U Graphic Unicode symbol from the block U+2400 to U+243F
Type Character type:
  • CC = Communication Control (protocol characters)
  • FE = Format Effector (output formatting characters)
  • IS = Information Separator (separator characters)
English Official name for which the abbreviation stands (according to the ASCII standard)
German Unofficial German name of the character, given here in English translation
(original) meaning Meaning of the control character.

Explanations in italics describe an obsolete meaning that is now regarded as historical and is no longer in use.
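The caret notation from the legend follows a simple rule: the character after the ^ is the one whose code lies 64 (0x40) above the control code, and holding Ctrl while typing a key keeps only the lowest five bits of that key's code. A minimal sketch, with the hypothetical helper names `caret_notation` and `ctrl_code`:

```python
def caret_notation(code: int) -> str:
    """Return the caret notation (e.g. '^G') for a C0 control code 0 to 31."""
    if not 0 <= code <= 31:
        raise ValueError(f"{code} is not a C0 control code")
    # Flipping bit 6 maps 0..31 onto '@', 'A'..'Z', '[', '\\', ']', '^', '_'.
    return "^" + chr(code ^ 0x40)


def ctrl_code(key: str) -> int:
    """Return the control code produced by holding Ctrl and typing key."""
    # Masking with 0x1F keeps the low five bits, so 'g' and 'G' both give 7.
    return ord(key.upper()) & 0x1F
```

With these definitions, `caret_notation(7)` gives `^G` and `ctrl_code("[")` gives 27, the code of ESC.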

ASCII or C0 control characters
Dec | Hex | Ctrl | C | ISO | U | Type | English | German | (Original) meaning
0 | 0x00 | ^@ | \0 | NUL | ␀ |  | Null | Null character | Character without informational content. Can be inserted into a message at will and is discarded by the recipient. Marks the end of a string in C.
1 | 0x01 | ^A |  | SOH | ␁ | CC | Start of heading | Beginning of the header | Marks the beginning of the machine-readable destination address or routing information. The header is ended by the character STX.
2 | 0x02 | ^B |  | STX | ␂ | CC | Start of text | Beginning of the message | Marks the beginning of the message to be transmitted and thus the end of the header.
3 | 0x03 | ^C |  | ETX | ␃ | CC | End of text | End of the message | Marks the end of the message to be transmitted. Used as the "abort" character for terminal input.
4 | 0x04 | ^D |  | EOT | ␄ | CC | End of transmission | End of the transmission | Marks the end of the entire transmission, which can consist of several messages including headers. Used as "program termination" by some command interpreters. Used as "end of input" for terminal input.
5 | 0x05 | ^E |  | ENQ | ␅ | CC | Enquiry | Inquiry | A request to the other station in bidirectional communication. The other station can respond with its identification or its status. Usually called "Wer da?" ("Who is there?") on German teleprinters.
6 | 0x06 | ^F |  | ACK | ␆ | CC | Acknowledge | Acknowledgment of receipt | Control character that expresses the positive response to a previous request.
7 | 0x07 | ^G | \a | BEL | ␇ |  | Bell | Beep | Generates an acoustic signal (bell or beep) on the receiving terminal. Used as an alarm or warning signal.
8 | 0x08 | ^H | \b | BS | ␈ | FE | Backspace | Backward step | Moves the print head/cursor back one position. The sequence e Backspace ´ produces an é on a printer, but often just a ´ on a terminal.
9 | 0x09 | ^I | \t | HT | ␉ | FE | Horizontal tab | Horizontal tab character | Moves the print head/cursor to the next predefined position (tab stop) in the current line.
10 | 0x0A | ^J | \n | LF | ␊ | FE | Line feed | Line feed | Moves the print head/cursor to the next line. If agreed between sender and recipient, it means "New Line", in which case the first print position of the next line is approached. Used, among other things, as the end-of-line character on Unix systems (Unix, BSD, macOS, Linux). Under MS-DOS or Windows, the combination "Carriage Return" + "Line Feed" ends a line.
11 | 0x0B | ^K | \v | VT | ␋ | FE | Vertical tab | Vertical tab character | Moves the print head/cursor to the next predefined line.
12 | 0x0C | ^L | \f | FF | ␌ | FE | Form feed | Form feed | Moves the print head/cursor to the first print position on the next page (page break). (Ejects the current page or clears the screen.)
13 | 0x0D | ^M | \r | CR | ␍ | FE | Carriage return | Carriage return | Moves the print head/cursor back to the first print position of the current line. Used as the line break in BASIC. Used as the end-of-line character ("New Line") in classic Mac OS up to version 9. Under MS-DOS or Windows, the combination "Carriage Return" + "Line Feed" ends a line. On terminals or printers, carriage return can be used to write over the same line several times (e.g. for a progress bar).
14 | 0x0E | ^N |  | SO | ␎ |  | Shift Out | Switching | Switches to a special representation, e.g. bold on a printer.
15 | 0x0F | ^O |  | SI | ␏ |  | Shift In | Switching back | Switches back to the normal representation.
16 | 0x10 | ^P |  | DLE | ␐ | CC | Data Link Escape | "Data link escape symbol" (literal translation) | Gives the following characters a special meaning. May only be used for additional protocol characters.
17 | 0x11 | ^Q |  | DC1 | ␑ |  | Device Control 1 | Device control character 1 | Device-specific control characters, e.g. for switching certain device functions (such as a printer font) on and off. ^S (XOFF) and ^Q (XON) are also used for flow control with XON/XOFF.
18 | 0x12 | ^R |  | DC2 | ␒ |  | Device Control 2 | Device control character 2 | (see DC1)
19 | 0x13 | ^S |  | DC3 | ␓ |  | Device Control 3 | Device control character 3 | (see DC1)
20 | 0x14 | ^T |  | DC4 | ␔ |  | Device Control 4 | Device control character 4 | (see DC1)
21 | 0x15 | ^U |  | NAK | ␕ | CC | Negative Acknowledge | Negative acknowledgment | Expresses the negative response to a previous request.
22 | 0x16 | ^V |  | SYN | ␖ | CC | Synchronous idle | Synchronization signal | In synchronous data transmission, enables synchronization even when there is no signal to be transmitted.
23 | 0x17 | ^W |  | ETB | ␗ | CC | End of Transmission Block | End of the transmission block | Indicates the end of a block of transmitted data if the end of the block cannot be recognized from the data itself.
24 | 0x18 | ^X |  | CAN | ␘ |  | Cancel | Cancellation | Indicates that the data just transmitted is or was erroneous and must be discarded.
25 | 0x19 | ^Y |  | EM | ␙ |  | End of medium | End of the medium | Indicates the end of the storage medium (physical or logical).
26 | 0x1A | ^Z |  | SUB | ␚ |  | Substitute | Replacement | Replaces a character that is invalid or erroneous, e.g. because of a parity error in transmission. End-of-file character (EOF) for text files under CP/M, which lacked byte-exact file lengths; initially also common under DOS, although unnecessary there.
27 | 0x1B | ^[ |  | ESC | ␛ |  | Escape | Escape symbol | Indicates that the following characters have a special meaning; starts an escape sequence.
28 | 0x1C | ^\ |  | FS | ␜ | IS | File separator | File separator | Separators that divide data blocks logically. The exact meaning of the logical units "File", "Group", "Record" and "Unit" is not specified, but they should be ordered from "File" as the highest structural unit to "Unit" as the lowest.
29 | 0x1D | ^] |  | GS | ␝ | IS | Group separator | Group separator | (see FS)
30 | 0x1E | ^^ |  | RS | ␞ | IS | Record separator | Record separator | (see FS)
31 | 0x1F | ^_ |  | US | ␟ | IS | Unit separator | Unit separator | (see FS)
127 | 0x7F |  |  | DEL | ␡ |  | Delete | Delete character | The DEL character has a binary code consisting entirely of ones. There is a historical reason for this: once a hole has been punched in a punched tape, it cannot be filled in again. However, all the remaining holes of a character can be punched out afterwards, turning it into a non-printing control character ('BU' in the 5-channel Baudot code) and thereby overwriting an incorrect entry. This is why the character also stands for "deleted character" or "deleted".
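Two of the uses listed in the table lend themselves to a short sketch: the three line-ending conventions (LF, CR, and CR + LF) and the information separators. The code below is an illustration under stated assumptions. Since the standard leaves the exact meaning of "Record" and "Unit" open, treating RS as a row separator and US as a field separator is an assumption made for this example, and the function names are hypothetical.

```python
# Information separators from the C0 table (codes 30 and 31).
RS, US = "\x1e", "\x1f"


def normalize_newlines(text):
    """Convert CR LF (DOS/Windows) and lone CR (classic Mac OS) to LF (Unix)."""
    # Replace the two-character sequence first so it becomes one LF, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def encode_records(records):
    """Join rows with RS and the fields inside each row with US (assumed layout)."""
    return RS.join(US.join(fields) for fields in records)


def decode_records(data):
    """Split data produced by encode_records back into a list of rows."""
    return [record.split(US) for record in data.split(RS)]
```

Because RS and US do not occur in ordinary text, fields may contain commas, tabs or line breaks without any quoting; the sketch does not handle fields that themselves contain RS or US.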

C1 control characters

The control characters newly defined in ISO 8859 for all of its sub-standards are rarely used and are now only of historical interest. Most Windows character sets, including CP 1252, occupy these code positions with printable characters that are not contained in the corresponding ISO standard, e.g. ISO 8859-1.

All C1 control characters can also be represented using only C0 control characters, by means of escape sequences; see ANSI escape sequence.
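In the usual 7-bit representation, a C1 control character is written as ESC (0x1B) followed by the byte that lies 0x40 below the C1 code, so CSI (0x9B) becomes ESC [. The following sketch shows this mapping in both directions; the function names are hypothetical.

```python
ESC = 0x1B


def c1_to_escape_sequence(code: int) -> bytes:
    """Return the two-byte 7-bit equivalent (ESC + final byte) of a C1 code."""
    if not 0x80 <= code <= 0x9F:
        raise ValueError(f"0x{code:02X} is not a C1 control code")
    # 0x80..0x9F map onto the final bytes 0x40..0x5F ('@' to '_').
    return bytes([ESC, code - 0x40])


def escape_sequence_to_c1(sequence: bytes) -> int:
    """Return the C1 code represented by a two-byte ESC sequence."""
    if len(sequence) != 2 or sequence[0] != ESC or not 0x40 <= sequence[1] <= 0x5F:
        raise ValueError("not a two-byte escape sequence for a C1 control")
    return sequence[1] + 0x40
```

This is why terminal control sequences are commonly seen as `ESC [ ...` in 7-bit and UTF-8 environments rather than starting with the single byte 0x9B.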

ISO 8859 or C1 control characters
Dec | Hex | IETF | ISO | Character name | Comment
128 | 0x80 | PA | PAD | Padding Character | Reserved control character; considered in a DIS 10646 draft, but never included in the ISO 10646 standard. Marked as XXX in Unicode.
129 | 0x81 | HO | HOP | High Octet Preset |
130 | 0x82 | BH | BPH | Break Permitted Here | A position at which a line break may occur. Comparable to the Unicode zero-width space, U+200B.
131 | 0x83 | NH | NBH | No Break Here | A position at which no line break is wanted. Comparable to the Unicode word joiner, U+2060.
132 | 0x84 | IN | IND | Index | Moves the current position one line down while maintaining the horizontal position. The Index function was declared obsolete in the 4th edition of ECMA-48 (1986) and was deleted in the 5th edition (1991).
133 | 0x85 | NL | NEL | Next Line | Moves the current position to the beginning of the next line, alternatively to the home or line limit position. NEL corresponds to the EBCDIC character NL (Next Line).
134 | 0x86 | SA | SSA | Start of Selected Area |
135 | 0x87 | ES | ESA | End of Selected Area |
136 | 0x88 | HS | HTS | Character Tabulation Set | Sets a tab stop at the active position. Referred to as "Horizontal Tabulation Set" before ECMA-48 (4th edition, 1986).
137 | 0x89 | HJ | HTJ | Character Tabulation with Justification | Moves text to the next tab stop position. The text is understood to be the part from the previous tab stop to the active position. Referred to as "Horizontal Tabulation with Justify" before ECMA-48 (4th edition, 1986).
138 | 0x8A | VS | VTS | Line Tabulation Set | Sets a vertical tab stop at the active line. Referred to as "Vertical Tabulation Set" before ECMA-48 (4th edition, 1986).
139 | 0x8B | PD | PLD | Partial Line Forward | Referred to as "Partial Line Down" before ECMA-48 (5th edition, 1991).
140 | 0x8C | PU | PLU | Partial Line Backward | Referred to as "Partial Line Up" before ECMA-48 (5th edition, 1991).
141 | 0x8D | RI | RI | Reverse Line Feed | Moves the active position to the previous line while maintaining the horizontal position. Referred to as "Reverse Index" before ECMA-48 (4th edition, 1986).
142 | 0x8E | S2 | SS2 | Single Shift 2 | Loads character set G2 into GL for one character.
143 | 0x8F | S3 | SS3 | Single Shift 3 | Loads character set G3 into GL for one character.
144 | 0x90 | DC | DCS | Device Control String | Start character of a control sequence that is ended with ST ("String Terminator"); can contain a command for the receiving device or a status report from the sending device.
145 | 0x91 | P1 | PU1 | Private Use One | Reserved, no standardized meaning.
146 | 0x92 | P2 | PU2 | Private Use Two |
147 | 0x93 | TS | STS | Set Transmit State |
148 | 0x94 | CC | CCH | Cancel Character |
149 | 0x95 | MW | MW | Message Waiting | Sets a "message waiting" indicator in the receiving device.
150 | 0x96 | SG | SPA | Start Protected Area | Together with the following character string, which contains a list of character positions, defines an area that is protected against manual modification or transmission; deletion protection is optional. The character string must end with EPA ("End Protected Area"). The function is called "Start of Protected Area" according to ANSI X3.64 and ECMA-48 (1979), "Start of Guarded Protected Area" according to ISO 6429 (1983) and ECMA-48 (1984), and "Start of Guarded Area" according to ISO 6429 (1992) and ECMA-48 (1986 and 1991).
151 | 0x97 | EG | EPA | End Protected Area | Specifies the end of an area that was started with SPA. The function is called "End of Protected Area" according to ANSI X3.64 and ECMA-48 (1979), "End of Guarded Protected Area" according to ISO 6429 (1983) and ECMA-48 (1984), and "End of Guarded Area" according to ISO 6429 (1992) and ECMA-48 (1986 and 1991).
152 | 0x98 | SS | SOS | Start of String | Marks the beginning of a control string that is ended with ST ("String Terminator"). The string must not contain a further SOS character (152 decimal or 98 hexadecimal). The interpretation of the string is up to the respective program.
153 | 0x99 | GC | SGCI | Single Graphic Character Introducer | Reserved control character; considered in a DIS 10646 draft, but never included in the ISO 10646 standard. Marked as XXX in Unicode.
154 | 0x9A | SC | SCI | Single Character Introducer | Executes the function defined by the single byte that follows; these functions, however, have not been standardized. Also introduces a proprietary VT100 control sequence.
155 | 0x9B | CI | CSI | Control Sequence Introducer | Introduces a control sequence. See ANSI escape sequence.
156 | 0x9C | ST | ST | String Terminator | Marks the end of a string that was started with APC, DCS, OSC, PM or SOS.
157 | 0x9D | OC | OSC | Operating System Command | Marks the beginning of an "Operating System Command" string that is ended with ST ("String Terminator"). The interpretation of the string is up to the respective operating system.
158 | 0x9E | PM | PM | Privacy Message | Marks the beginning of a "Privacy Message" that is ended with ST ("String Terminator").
159 | 0x9F | AC | APC | Application Program Command | Marks the beginning of an "Application Program Command" string that is ended with ST ("String Terminator"). The interpretation of the string is up to the respective program.

Unicode

The control characters of the ASCII range 0x00 to 0x1F can be found in Unicode under C0 Controls (U+0000 to U+001F), those of the ISO 8859 range 0x80 to 0x9F under C1 Controls (U+0080 to U+009F). The first 128 characters in the Unicode encoding UTF-8 correspond to those of the ASCII and ISO 8859 encodings, so this also applies to the control characters in the range 0x00 to 0x1F. In addition to these characters, Unicode contains a number of other control characters.
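The difference between the two ranges under UTF-8 can be checked with Python's standard library. The sketch below is illustrative only; `utf8_bytes` and `is_unicode_control` are hypothetical helper names, and the check relies on the Unicode general category `Cc` ("Other, control"), which covers the C0 and C1 ranges and DEL.

```python
import unicodedata


def utf8_bytes(code_point: int) -> bytes:
    """Return the UTF-8 encoding of a single code point."""
    return chr(code_point).encode("utf-8")


def is_unicode_control(ch: str) -> bool:
    """Return True if ch has the Unicode general category Cc (control)."""
    return unicodedata.category(ch) == "Cc"
```

A C0 control such as LF (U+000A) is encoded as the single byte 0x0A, identical to ASCII, whereas a C1 control such as NEL (U+0085) becomes the two bytes 0xC2 0x85. A raw byte 0x85 in a UTF-8 stream is therefore not a C1 control character.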

Graphic symbols for the control characters can be found in the Unicode block Control Pictures (U+2400 to U+243F).

Entry under MS-Windows or DOS

As a test, control characters can also be entered under Windows. Holding down the (left) Alt key and typing the decimal code of a control character on the numeric keypad enters that control character at the prompt.

Example: open the command prompt. Alt+(0 and 7 on the numeric keypad) produces the character ^G at the prompt, which also makes clear that Ctrl+G does the same thing. If you now press Enter (or ^M), this control character is executed in the terminal window and a beep sounds from the system loudspeaker (if present), which corresponds to the bell (BEL) in the table above. Likewise, Alt+(0 and 8) deletes a character, just like pressing Backspace (or Ctrl+H). BASIC interpreters that use their own keyboard drivers (e.g. GW-BASIC) also accept hexadecimal ASCII codes in the form &hZZ, where each Z stands for a hexadecimal digit (e.g. &h0D for carriage return).
