Line break

from Wikipedia, the free encyclopedia

The term line break comes from electronic word processing and denotes the point at which a text moves from one line to the next. It is often referred to simply as a break.

General

On a typewriter, the line break is carried out explicitly by pressing a key or a lever. Two functions are performed:

  • Carriage return - positioning of the writing point at the beginning of the line (far left).
  • Line feed - positioning of the writing point down one line.

When teleprinters were introduced, various control characters (codes for electrical signals) were defined to represent the line break function of a typewriter. Because teleprinters served as the first output devices in computing, these characters were then carried over from telecommunications into electronic data processing.

Plain text files on a computer initially resemble typewritten text when displayed on screen; the control characters are generally invisible to the user. With the scroll bar, the relationship between screen width and line length is lost, and with proportional fonts so is the relationship between the number of characters and line length. The line break characters only acquired more differentiated functions with text markup (Rich Text Format and similar).

Because the control characters were specified in the early days of computing, the later change in their function has made them one of the major sources of incompatibility between different operating systems and application software.

Word processing: new paragraph, new line, hard and soft line break

In the text formatting of word processing systems, a distinction is made between a paragraph break and a line break, as well as between hard (manual) and soft (automatic) line breaks. The following input methods and control characters correspond to the conventions of common word processing programs; operation and display may differ depending on the system.

  1. A paragraph break (new paragraph) is still used by many users today as a line break. It is entered with the Enter key; «¶» (paragraph mark, pilcrow) is often used to display the control character on screen. In current word processing systems, this key should only be used if, in addition to the line break, a change of paragraph formatting (e.g. by changing the style template), the automatic insertion of white space, or some other rich text formatting is desired. Only in systems without these capabilities (pure text editors, plain text) should the Enter key be used simply to end a line. The HTML tags for the beginning and end of a paragraph are <p> and </p>, for paragraph.
  2. A simple line break (new line) is used to start a new line without interrupting the current paragraph formatting, or to create line breaks in tables (where the paragraph mark would terminate the cell). The control character is displayed as «↵» and is entered with ⇧ Shift+↵ Enter or Ctrl+↵ Enter, depending on the system. It is also saved in the file. The HTML tag is <br>, or <br /> in W3C-compliant (valid) XHTML, for (line) break.
  3. A hard line break (hard break) is written "hard" into the edited file as a control character at the end of the preceding line or paragraph (as in points 1 and 2 above), provided the software still stores text as a character stream.
  4. A soft line break (automatic word wrap, soft break), on the other hand, is generated automatically by the software when the text is displayed and is not added to the file. When a certain line length, in particular the window width, is exceeded, the software can automatically move the current word to the beginning of a new line (word wrap). In this way, the entire text can be displayed without the user having to scroll horizontally. Depending on the system, the soft line break is usually not saved in the file. The user is relieved of the need to break lines manually. Many modern text editors have a line break function that automatically re-breaks the paragraph when words are inserted or removed. In web typography (HTML documents), this is the default behavior.
  5. Many programs offer the option of entering non-breaking spaces or optional hyphens, marking places where no automatic break may occur or where a word division is preferred ("soft hyphen", i.e. the conditional hyphen), for example when the word's hyphenation is missing from the built-in dictionary. In HTML there are formatting instructions (such as <pre>) and, in paragraph formatting, the instruction to suppress automatic line breaks (in CSS white-space: nowrap; formerly also the non-standard HTML tag <nobr>).
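
The difference between hard and soft breaks described in points 3 and 4 can be sketched in Python. This is only a minimal illustration, assuming a fixed display width of 20 characters and using the standard textwrap module; the function name render and the sample text are hypothetical. The stored text contains a single hard break, while the soft breaks exist only in the rendered lines.

```python
import textwrap

# Stored text: one hard break (LF) separating two paragraphs.
stored = "The quick brown fox jumps over the lazy dog.\nSecond paragraph."

def render(text, width):
    """Soft-wrap each hard-broken line to the given display width."""
    lines = []
    for paragraph in text.split("\n"):      # hard breaks come from the file
        # textwrap.wrap returns [] for an empty line, so keep it as "".
        lines.extend(textwrap.wrap(paragraph, width=width) or [""])
    return lines

displayed = render(stored, width=20)
for line in displayed:
    print(line)

# The stored text is unchanged: soft breaks were never written into it.
print(stored.count("\n"))
```

Changing the width re-breaks the display without touching the stored text, which is what distinguishes a soft break from a hard one.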

Further break situations arise both when changing pages (page break) and when setting columns (column break).

In printing, breaking the text into lines while taking columns, pages, images, graphics and the like into account is called mettage. In electronic data processing, this is done by the word processing software. The more capable the software, the more attractive and legible the resulting layout.

Coding of the line break

ASCII and EBCDIC

Figure: A text file created with gedit under Linux, viewed in a hex editor. Apart from the text itself, only the encoded line feeds (0A) are visible.

When developing the ASCII character set, two characters were reserved:

  • The control character for the line feed (LF for short) is coded as ASCII character 10 (hexadecimal 0A). Some systems allow the LF character to be entered with the key combination Ctrl+J.
  • The control character for the carriage return (CR for short) is coded as ASCII character 13 (hexadecimal 0D). Some systems allow the CR character to be entered with the key combination Ctrl+M.
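
The two key combinations follow from the way traditional ASCII terminals derive control characters: holding Ctrl keeps only the low five bits of the letter's code. A small sketch of that arithmetic (the helper name ctrl is hypothetical, and not every system maps keys this way):

```python
def ctrl(letter):
    """Return the control character that Ctrl+<letter> yields on an ASCII terminal."""
    # Masking with 0x1F keeps the low five bits of the letter's code.
    return chr(ord(letter.upper()) & 0x1F)

print(repr(ctrl("J")), ord(ctrl("J")))   # line feed, code 10 (0x0A)
print(repr(ctrl("M")), ord(ctrl("M")))   # carriage return, code 13 (0x0D)
```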

There are various standards for explicitly encoding line breaks in a text file:

Operating system | Character set | Abbreviation | Code (hex) | Code (decimal) | Escape sequence
Unix, Linux, Android, macOS, AmigaOS, BSD, others | ASCII | LF | 0A | 10 | \n
Windows, DOS, OS/2, CP/M, TOS (Atari) | ASCII | CR LF | 0D 0A | 13 10 | \r\n
Mac OS Classic, Apple II, C64 | ASCII | CR | 0D | 13 | \r
AIX OS & OS/390 | EBCDIC | NL | 15 | 21 | \025
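
To see which of the ASCII conventions in the table a given file uses, the raw bytes can be inspected. The following is a minimal sketch, not a complete detector: the function name count_line_endings is hypothetical, and the EBCDIC NL convention is not handled.

```python
import re

def count_line_endings(data: bytes) -> dict:
    """Count CR LF pairs, bare LF and bare CR in raw bytes."""
    crlf = len(re.findall(rb"\r\n", data))
    lf = len(re.findall(rb"(?<!\r)\n", data))   # LF not preceded by CR
    cr = len(re.findall(rb"\r(?!\n)", data))    # CR not followed by LF
    return {"CR LF": crlf, "LF": lf, "CR": cr}

sample = b"unix\nwindows\r\nclassic mac\rend"
print(count_line_endings(sample))
```

A file in which more than one counter is non-zero mixes conventions, which is a common result of editing the same file on different systems.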

On IBM mainframes, line breaks in files are not represented by control characters. Instead, the line length is stored in the DCB (record format F or FB) or in a length field at the beginning of the line (record format V or VB).

In Mac OS X, because of its extensive compatibility with its predecessor Mac OS, there are still some text formats that use CR instead of LF as the line separator. Many modern Mac OS X programs can therefore handle both formats in text files. With incorrectly declared files that use CR LF, this leads to doubled line breaks in some programs. Only files that originate from the BSD or Unix world are generally required to use LF as the line separator.

Unicode: additional characters that mark line breaks

For Unicode text, the Unicode standard requires, in the Unicode line breaking algorithm, that Unicode-compliant software recognize the following additional characters as line breaks in Unicode-compliant strings, in addition to CR, LF and CR LF mentioned above:

Abbreviation | Name | Code point
FF | Form feed (with obligatory line break) | U+000C
NEL | Next line | U+0085
LS | Line separator | U+2028
PS | Paragraph separator | U+2029
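
As an illustration of software that recognizes these characters, Python's built-in str.splitlines() treats all four as line boundaries, alongside CR, LF and CR LF (it also splits on a few further control characters that are not listed in the table above). The sample string is made up for this sketch.

```python
text = "form feed\x0cnext line\x85line separator\u2028paragraph separator\u2029end"

# splitlines() breaks at FF, NEL, LS and PS.
print(text.splitlines())

# Splitting on LF alone finds none of these breaks.
print(text.split("\n"))
```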

Programming: coding of the line break

Problems arise when data is exchanged between different systems, because computer systems follow different conventions for encoding line breaks. These conventions arose when the teletypewriter conventions were adopted in electronic text processing.

A well-known example is the function printf() or fprintf() from the C standard library for writing to files. The escape sequence \n (LF) stands for a line break in C. When writing to files, C distinguishes between text mode and binary mode. When a file is opened in text mode, \n is translated into the control characters for line breaks that are customary on the respective system. On Unix-like operating systems no conversion takes place, since LF is already the line break there. Under Windows, by contrast, it is replaced by CR LF. The resulting files are therefore not identical. If the file is opened in binary mode, no translation takes place; an LF is always written to the file.
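
The effect of text mode translation can be reproduced in Python, whose open() function offers a comparable mechanism through its newline parameter. This sketch is an analogue rather than the C library itself: instead of relying on the platform default, it forces each convention explicitly so that the result is the same on every system. The helper name written_bytes is hypothetical.

```python
import os
import tempfile

def written_bytes(newline):
    """Write "a\\nb\\n" in text mode with the given newline setting; return the raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write("a\nb\n")                # the program always writes "\n"
        with open(path, "rb") as f:          # binary mode: no translation on read
            return f.read()
    finally:
        os.remove(path)

print(written_bytes("\n"))     # Unix-like convention: LF stays LF
print(written_bytes("\r\n"))   # Windows convention: LF becomes CR LF
```

The same call to write() produces files of different length, which is why files written in text mode on different systems are not byte-identical.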

In Java, the character constants (escape sequences) \n and \r are available; no conversion takes place, and the platform-dependent characters for the line break can instead be inserted using separate functions. The newer printf function supports the format code %n to output the platform-specific line separator. When reading, the Java library is tolerant: readLine() accepts CR, LF and CR LF as the end of a line. If an EBCDIC code page such as Cp500 is used, the EBCDIC byte NEL (0x15) is mapped to LF (U+000A) and not to NEL (U+0085).
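
Tolerant reading of this kind can be sketched in Python, which offers a comparable "universal newlines" mode; this is an analogue of the behavior described for readLine(), not Java code. The sample bytes are made up, and newline=None is the setting that enables the translation.

```python
import io

# One input that mixes all three conventions: CR, LF and CR LF.
raw = b"first\rsecond\nthird\r\nfourth"

# newline=None: CR, LF and CR LF are all recognized and returned as "\n".
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
lines = [line.rstrip("\n") for line in reader]
print(lines)
```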

Other programming languages such as Visual Basic or Perl provide similar functionality for processing text files correctly.

Numerous network protocols for the transmission of text, e.g. HTTP, SMTP or FTP, define the sequence CR LF for a line break. Some programs, e.g. mail transfer agents, are strict and even refuse to process data containing standalone LFs ("bare LF"). Other protocols, however, recommend interpreting a single LF as a (possibly soft) break. Section 2.11 of the W3C Recommendation on XML defines how line breaks are to be handled. In version 1.1, U+0085 and U+2028 were added.
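
Before text is handed to such a protocol, its line breaks are typically normalized to CR LF. The following is a minimal sketch with hypothetical helper names (to_crlf, has_bare_lf); it does not implement any particular protocol and only shows the normalization and the "bare LF" check.

```python
import re

def to_crlf(text: str) -> str:
    """Normalize CR LF, bare CR and bare LF to CR LF."""
    # "\r\n" is listed first so an existing CR LF pair is matched as one unit.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)

def has_bare_lf(data: bytes) -> bool:
    """Return True if any LF is not immediately preceded by CR."""
    return re.search(rb"(?<!\r)\n", data) is not None

message = "Subject: test\nline one\r\nline two\r"
wire = to_crlf(message).encode("ascii")
print(wire)
print(has_bare_lf(message.encode("ascii")), has_bare_lf(wire))
```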

Identification of unspecified or undesired line breaks

A mark for a suppressed typographical break is used, for example, when quoting lines of verse:

"I sat upon a stone / and crossed one leg over the other, / and on it I propped my elbow; [...]"

- Walther von der Vogelweide

This slash (virgule) marks the rhymes, for example; more pronounced divisions such as stanzas can then be marked with "//".

Conversely, in electronic text processing it may be necessary to mark a line break as unwanted. This arises, for example, in programming languages in which the break is a control character, but also when URLs (web addresses) are given. Depending on what is not otherwise used as a control character in the respective format, "_" (underscore), "\" (backslash) or a character such as «↩» (U+21A9) is used. The character «↩» is here a typographical instruction for print meaning "ignore the break". When such a passage is copied and pasted into the address bar of a browser, for example, some programs ignore the part after the line break, while others rejoin the web link, after which the character «↩» can be removed manually. In a purely electronic medium, the character is therefore more of a nuisance.

When proofreading in printing, separate correction marks are used for missing and for unwanted paragraph breaks ('insert line break' or 'remove line break', i.e. 'append paragraph'):

Figure: Text with correction marks
