Word separation

from Wikipedia, the free encyclopedia

Word separation (in Austria: Abteilen), the division of words, mostly longer ones, is used in alphabetic scripts to make better use of space at line breaks in handwritten and typewritten text. Word separation follows fixed orthographic rules.

The synonymous term "hyphenation" (German: Silbentrennung, literally "syllable separation") is problematic, for example with regard to German, because word division there does not always match the phonological or phonetic division into syllables.

General

Words are split at the end of a line for economic reasons (a word no longer fits completely on the line) and for aesthetic reasons (the page is filled more evenly). In many languages, including German, the main basis for separating words is to break compound words into their components and then break those components into syllables.

Another basis of word separation, used in German and some other languages, is separation according to etymological principles, i.e. separation based on the original composition (the original spoken syllables) in the writer's own language or in the language the word was borrowed from. This type of word division follows the division into word components, which does not always match the division into syllables as phonetic units. Linguistics defines the syllable as the smallest group of sounds in the natural flow of speech. It is a phonetic unit, not a unit of meaning. This means that the division into syllables often does not match the division into meaningful units (morphemes).

Partly because of this unresolvable conflict between morphological and phonetic principles, word separation in English, for example, is so complicated that schools in English-speaking countries explain it only rarely and then only briefly. Even on the Internet there is almost no information on it, apart from the advice, already common in school, to consult a dictionary. There are also differences between British and US customs and rules. Because the correspondence between sounds and letters in English is very weak, word separation cannot be made easier, i.e. more phonetic, without a drastic spelling reform.

Word separation in German

Separator character

(Figure: book from 1862 set with double hyphens.)

In German orthography, a hyphen is used to separate words. Outwardly, this separating hyphen is identical to the hyphen that joins words, which is why the character is referred to as a divis in both cases. In typesetting, the quarter-em dash is used as the divis; historically the double hyphen ("⸗") was used as well. The typewriter introduced the hyphen-minus, which was used there for all mid-height strokes and was longer than a quarter-em dash, but shorter than an en dash and the even longer minus sign. This character persisted on computer keyboards and in the first computer fonts. Today's character sets also contain the other characters, and modern text programs often generate different characters depending on the context when the hyphen-minus key is pressed. The glyph of the hyphen-minus now usually corresponds to the quarter-em dash again.

Principles

Orthographic word separation in German is based on word components, (spoken) syllables, and graphic and aesthetic considerations. The reformed rules are presented in the article on the new German spelling.

Occasionally one encounters separations that distort the meaning, mostly produced by the automatic spell checking of word processing programs, such as Fluch-torte (correct: Flucht-orte, "escape places"), Türk-linke (correct: Tür-klinke, "door handle"), or Urin-stinkt (correct: Ur-instinkt, "primal instinct"). The cause is an incorrect match against morphemes contained in the preinstalled dictionary. This can only be remedied by manually correcting the hyphenation exceptions in the program or by extending the installed dictionary. Likewise, there are sometimes "omitted" separations that distort the meaning, for example "the accidental creation of a half-fallen tree" (better: the accidental creation).

Automatic hyphenation

In addition to a spell checker, today's word processing programs usually also offer automatic hyphenation. To do this, they rely on built-in dictionaries that contain syllable-division data. It makes sense to share the dictionaries between hyphenation and spell checking. In this way, the vast majority of regular and special cases can be covered. The dictionaries are necessarily language-specific; they cannot be used to process texts in other languages.

In earlier times, when the large amounts of data in such dictionaries could not be handled (for reasons of storage space and speed), attempts were made to hyphenate algorithmically, that is, with pure rule logic. The basic approach is that the software starts at the desired hyphenation point (the end of the line), searches the text to the left for the next vowel (umlauts and Y count as vowels), then moves one consonant further to the left and suggests a separation before that consonant. As a refinement, consonant groups such as "ch", "sch" or (under the new spelling) "ck", as well as "st" under the old spelling and, for example, "gn" in foreign words derived from Greek such as Magnet, are counted as a single consonant.

With these very simple rules, which need little program memory and no memory at all for dictionary data, programs for German-language texts find the correct hyphenation point in around 75-80% of cases; in the remaining cases they are usually only one letter off. This is always done interactively, so that the user can shift the suggested hyphenation point before confirming it, or reject hyphenating the word altogether. Because of the exceptional cases mentioned, this approach is also language-specific, but given the manageable amount of data it is comparatively easy to cover several languages with one piece of software.
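
The rule-based search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of any particular program: the function names, the `limit` parameter (the largest number of letters that may stand before the hyphen), the minimum fragment length, and the extra check that the left fragment contains a vowel are all hypothetical choices made for this sketch. The old-spelling group "st" is left out.

```python
from typing import Optional

VOWELS = set("aeiouyäöü")  # umlauts and y count as vowels
# Letter groups counted as a single consonant, longest first.
CONSONANT_GROUPS = ("sch", "ch", "ck", "gn")


def suggest_break(word: str, limit: int, min_fragment: int = 2) -> Optional[int]:
    """Return an index b such that word[:b] + "-" fits before the line end.

    `limit` is the largest number of letters allowed before the hyphen.
    Returns None if the rules find no break point.
    """
    w = word.lower()
    # Start one letter past the limit: a vowel there still gives a break <= limit.
    for v in range(min(limit + 1, len(w) - 1), 0, -1):
        if w[v] not in VOWELS or w[v - 1] in VOWELS:
            continue  # need a vowel with a consonant directly to its left
        size = 1
        for group in CONSONANT_GROUPS:
            if w[:v].endswith(group):
                size = len(group)  # treat the whole group as one consonant
                break
        b = v - size  # break before the consonant (group)
        if b < min_fragment or len(w) - b < min_fragment:
            continue  # avoid fragments that are too small
        if not any(ch in VOWELS for ch in w[:b]):
            continue  # assumption of this sketch: left part needs a vowel
        return b
    return None


def hyphenate(word: str, limit: int) -> str:
    """Insert a hyphen at the suggested break, or return the word unchanged."""
    b = suggest_break(word, limit)
    return word if b is None else word[:b] + "-" + word[b:]


print(hyphenate("Worttrennung", 8))  # Worttren-nung
print(hyphenate("Zucker", 4))        # Zu-cker
print(hyphenate("Worttrennung", 5))  # Wortt-rennung (one letter off from Wort-trennung)
```

The last call shows the typical failure mode: the purely phonetic rule misses the compound boundary by one letter, which is why such a suggestion would be confirmed or shifted by the user.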

One element that helps with both dictionary-based and algorithmic separation is the user's ability to insert conditional hyphens. These are characters that indicate a suitable separation position to the software: if a separation is needed there, the character is replaced by a normal hyphen in the printed output; if not, it remains invisible. In this way the user can, for example, prepare interspersed foreign-language words or special expressions unknown to the dictionary so that they are separated correctly.

Regardless of the basic approach, the software will also follow more general typesetting rules as a further refinement, for example not splitting off fragments of a word that are too small, or using an existing hyphen as the separation point if it lies within a tolerance zone of configurable size. In a further refinement, this tolerance zone is set larger for conditional hyphens than for algorithmically found separation points, because if a conditional separation point has been specified, the software should prefer it over algorithmically found and possibly deviating separation points.
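
As a minimal sketch of how conditional hyphens might be evaluated, the following Python function takes a word containing the Unicode soft hyphen (U+00AD) and a line width, and breaks at the rightmost conditional hyphen that still fits. The function name and the return convention (a pair of line fragments) are assumptions made for this illustration.

```python
from typing import Tuple

SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN, invisible unless used for a break


def break_at_soft_hyphen(word: str, width: int) -> Tuple[str, str]:
    """Split `word` at the rightmost soft hyphen so the head fits in `width`.

    Returns (head, tail). The head ends in a visible hyphen; if no soft
    hyphen fits, the head is empty and the tail is the whole word.
    Unused soft hyphens are removed from the output.
    """
    parts = word.split(SOFT_HYPHEN)
    best = 0      # number of leading parts kept on the current line
    length = 0
    for count, part in enumerate(parts[:-1], start=1):
        length += len(part)
        if length + 1 <= width:  # +1 for the visible hyphen
            best = count
    if best == 0:
        return "", "".join(parts)
    return "".join(parts[:best]) + "-", "".join(parts[best:])


marked = "Wort" + SOFT_HYPHEN + "tren" + SOFT_HYPHEN + "nung"
print(break_at_soft_hyphen(marked, 6))  # ('Wort-', 'trennung')
print(break_at_soft_hyphen(marked, 9))  # ('Worttren-', 'nung')
```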
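
The preference for conditional separation points can be illustrated with a small Python sketch. The function name, the representation of break points as letter positions, and the default zone sizes are hypothetical; the sketch only shows the idea of a larger tolerance zone for user-specified points.

```python
from typing import Iterable, Optional


def choose_break(limit: int,
                 algorithmic: Iterable[int],
                 conditional: Iterable[int],
                 zone_algorithmic: int = 3,
                 zone_conditional: int = 6) -> Optional[int]:
    """Pick a break position at or before `limit`.

    A position counts only if it lies inside its tolerance zone, i.e. no
    more than `zone_*` letters before the limit. Conditional (user-given)
    points get the larger zone and are preferred whenever one qualifies.
    """
    cond = [p for p in conditional if limit - zone_conditional <= p <= limit]
    if cond:
        return max(cond)
    alg = [p for p in algorithmic if limit - zone_algorithmic <= p <= limit]
    return max(alg) if alg else None


# The conditional point at 5 wins although the algorithmic point at 9 is closer.
print(choose_break(10, algorithmic=[9], conditional=[5]))  # 5
# A conditional point outside its zone is ignored.
print(choose_break(10, algorithmic=[9], conditional=[2]))  # 9
```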

Word separation in non-printed texts

In texts that are not, or at least not primarily, intended for printing, word separation is usually dispensed with. This affects most content on the Internet, such as websites or e-mails. Since the display of such texts, and thus the appropriate position for a line break, can vary greatly depending on the device (screen width, font size, etc.), it is usually not possible to fix the word separation, automatically or manually, when the text is written. This task would have to be taken over by the software of the end device, in the case of websites, for example, by the browser.

However, since automatic word division is costly and error-prone, it is often dispensed with: lines are simply broken at a suitable word boundary, and only hyphens already present in the text are considered as additional break points. This usually results in left-justified text that looks more or less ragged on the right. This can be counteracted by justified setting, which, however, has the disadvantage that very wide spaces can arise.

The lack of word separation is particularly problematic for words that are very long compared to the line width: in extreme cases, a single word that exceeds the intended line length or even the available screen width can destroy the layout. Preventing this is in turn a task that the software of the end device fulfils more or less well. If necessary, the rendering software can force a line break at an arbitrary point in the word, disregarding orthographic rules.

The HTML standard provides &shy;, a "conditional hyphen", as a character with which a website author can specify ("soft", i.e. conditional) separation points. CSS also provides ways to handle the problem. However, both mechanisms are not supported by all browsers, are not ignored by all search engines, and are therefore not very common.
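
The behaviour described here, breaking only at word ends and existing hyphens and, as a last resort, forcing a break inside an overlong word, can be tried out with Python's standard `textwrap` module. This is just one convenient illustration, not how any particular browser or mail client works; the sample text and the width of 10 are arbitrary.

```python
import textwrap

text = "a well-known long-term problem"
word = "Donaudampfschifffahrtsgesellschaft"

# Break at spaces and after existing hyphens only; the result is ragged right.
ragged = textwrap.wrap(text, width=10, break_on_hyphens=True)
print(ragged)

# Without forced breaks, an overlong word stays whole and overflows the width.
overflow = textwrap.wrap(word, width=10, break_long_words=False)
print(overflow)  # ['Donaudampfschifffahrtsgesellschaft']

# With forced breaks, the word is cut at arbitrary points, ignoring orthography,
# and no hyphen is inserted.
forced = textwrap.wrap(word, width=10, break_long_words=True)
print(forced)
```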
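
As a small sketch of how a site generator might emit such conditional hyphens, the following Python function inserts the HTML entity `&shy;` at given letter positions. The function name and the idea of passing positions as a list are assumptions for illustration; where the positions come from (dictionary, rules, or manual markup) is left open.

```python
import html
from typing import Iterable


def mark_soft_hyphens(word: str, positions: Iterable[int]) -> str:
    """Return HTML for `word` with &shy; inserted before the given indices."""
    cuts = sorted(p for p in set(positions) if 0 < p < len(word))
    pieces = []
    start = 0
    for p in cuts:
        pieces.append(html.escape(word[start:p]))  # escape &, <, > in the text
        start = p
    pieces.append(html.escape(word[start:]))
    return "&shy;".join(pieces)


print(mark_soft_hyphens("Worttrennung", [4, 8]))  # Wort&shy;tren&shy;nung
```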

“Word Breaking” in URLs

"Word separation" within long URLs poses a particular problem. Since the hyphen is a permitted and frequently used character in URLs, it would not be clear whether a hyphen at the end of a line belongs to the URL or was inserted as a separator. For example, a URL that is broken after "http://de.wikipedia.org/wiki/Wort-" and continues with "trennung" on the next line could stand for either http://de.wikipedia.org/wiki/Wort-trennung or http://de.wikipedia.org/wiki/Worttrennung.

URLs should therefore not be separated with a hyphen, but rather be broken without inserting an (ambiguous) separator. Instead, within texts, a URL should be delimited by unambiguous characters that cannot be part of a URL. RFC 3986, Appendix C, recommends using "double quotation marks" or <angle brackets>, i.e. a less-than and a greater-than sign (see also the section "URLs in texts" of the article URL). Long URLs and the separations they are likely to cause (especially in e-mails) can be avoided by using short URLs.
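
A minimal Python sketch of this recommendation: the URL is wrapped in angle brackets and broken into lines without any inserted hyphen, so a reader or program can rejoin the lines and recover it unambiguously. The function names and the example address under example.org are hypothetical.

```python
from typing import List


def wrap_url(url: str, width: int) -> List[str]:
    """Delimit `url` with angle brackets and cut it into lines of `width`.

    No hyphen is inserted at the cuts, so every hyphen in the output
    belongs to the URL itself.
    """
    text = "<" + url + ">"
    return [text[i:i + width] for i in range(0, len(text), width)]


def unwrap_url(lines: List[str]) -> str:
    """Rejoin the lines and strip the angle-bracket delimiters."""
    joined = "".join(line.strip() for line in lines)
    if not (joined.startswith("<") and joined.endswith(">")):
        raise ValueError("URL is not delimited by angle brackets")
    return joined[1:-1]


url = "http://example.org/wiki/Wort-trennung"  # hypothetical address
lines = wrap_url(url, 20)
print("\n".join(lines))
print(unwrap_url(lines) == url)  # True
```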
