Graphemes (German Graphem or Grafem; from ancient Greek γραφή graphḗ 'writing' and the suffix -eme) are the smallest meaning-distinguishing, but not themselves meaningful, graphic units of the writing system of a particular language. They thus group graphs that are equivalent in meaning into common classes.

A particular sound of a spoken language, a phoneme, can be written in different ways. In the two German words Schrift ('writing') and Sprache ('language'), for example, the phoneme /ʃ/ is represented once by the graph <sch> and once by the graph <s>. The same applies to the words fliegen ('to fly') and Vogel ('bird'): the phoneme /f/ corresponds to the different graphs <f> and <v>. Different graphs that can either be freely exchanged or are distributed in a complementary manner according to deterministic rules are called allographs. A grapheme consists of such allographs.

In its subdiscipline graphemics (also graphematics), linguistics studies the structures and relationships of graphemes in order to arrive, by forming classes and identifying principles of formation, at generalizations about a language and its written form.

Notation

Graphemes from the script of the metalanguage, i.e. Latin letters and, for some authors, also Greek and Cyrillic letters, are usually enclosed in linguistics in angle brackets pointing outwards, or as a makeshift in single guillemets or less-than and greater-than signs. This also applies to grapheme strings, i.e. sequences of letters that normally form orthographic words.

  • ⟨a⟩, ⟨ab⟩ (U+27E8/9, mathematical)
    • HTML representation: ⟨a⟩, ⟨ab⟩ or &#x27E8;a&#x27E9;, &#10216;ab&#10217;
  • 〈a〉, 〈ab〉 (U+2329/A, technical)
  • 〈a〉, 〈ab〉 (U+3008/9, East Asian)
  • <a>, <ab>
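The bracket conventions listed above differ only in the code points used. As a minimal sketch (the helper name `bracket` and the style labels are assumptions made for illustration, not part of any standard), Python's standard `unicodedata` module can show which characters are involved:

```python
import unicodedata

# Bracket pairs from the list above; the dictionary keys are illustrative labels.
BRACKETS = {
    "mathematical": ("\u27e8", "\u27e9"),
    "technical": ("\u2329", "\u232a"),
    "east_asian": ("\u3008", "\u3009"),
    "ascii": ("<", ">"),
}


def bracket(graphemes: str, style: str = "mathematical") -> str:
    """Wrap a grapheme or grapheme string in the chosen bracket pair."""
    left, right = BRACKETS[style]
    return f"{left}{graphemes}{right}"


if __name__ == "__main__":
    for style, (left, _right) in BRACKETS.items():
        # Print the code point and official Unicode name of each opening bracket.
        print(f"U+{ord(left):04X} {unicodedata.name(left)}: {bracket('ab', style)}")
```

Note that U+2329/U+232A are canonically equivalent to U+3008/U+3009, so text that has passed through Unicode normalization will contain the latter pair.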

Graphemes from object languages with a different phonographic script are usually not marked separately, instead their transliteration is bracketed and, if necessary, their name is given:

  • ψ ⟨ps⟩ Psi - a Greek lowercase letter with a transcription and name.
  • か ⟨ka⟩ - a hiragana syllabogram with its transcription, which is also its name.
  • チ ⟨chi⟩ Ti - a katakana character whose transcription and name differ.

The (basic) meaning of functional and logographic graphemes is usually given in small caps. For complex graphemes, the same conventions are used as for lexemes:

  • Γ UPPERCASE - a functional grapheme that becomes visible only in conjunction with other graphemes.
  • 3 THREE - a logographic grapheme with its meaning.
  • & AND ampersand - a logographic grapheme with its meaning and name.
  • 山 ⟨shān / san / yama / …⟩ MOUNTAIN or ⟨…⟩ 'mountain' - a logographic grapheme of pictographic origin with its transcriptions from different languages and its meaning.
  • 山 ⟨SAN-yama⟩ - the imported Sino-Japanese on-reading of the kanji in capital letters and the native kun-reading in lowercase letters; in Japanese itself these are rendered in katakana and hiragana: 山サンやま.
  • 𒇽 ⟨awīlum⟩, 𒇽 ⟨antuhšaš⟩ PERSON - the Sumerogram 𒇽 ⟨lú⟩ in Akkadian and Hittite.
  • 沐 ⟨mù⟩ WATER 'wash one's hair' - a logographic grapheme consisting of the meaning component WATER and the pronunciation component ⟨mù⟩, which together take on the specified lexical meaning.


The term grapheme for the graphic level of language follows the same pattern of formation as phoneme and morpheme. These three emic, language-dependent terms first appeared in the work of Baudouin de Courtenay at the beginning of the 20th century. The term grapheme was, however, coined again by Aarni Penttilä around 1932, presumably independently, and only then gained international acceptance. There is still no complete agreement on what is and what is not a grapheme.

On the basis of these terms, the corresponding terms for etic, language-independent basic units were formed, i.e. phone, morph and graph, as were the names of the specialist disciplines, e.g. phonemics or phonematics (phonology), morphemics or morphematics (morphology) and graphemics or graphematics, and correspondingly phonetics and graphetics.

Some linguists see graphemes as unidirectionally dependent visual representations of spoken phonemes. In common parlance, graphemes are often not distinguished, or not substantially distinguished, from the language-independent character or from the letter, which occurs only in segmental alphabets. Even among grapholinguists, the terminology is inconsistent and sometimes Eurocentric. The differentiation from the graphic units graph and glyph is sometimes difficult and controversial, and for many applications it is also irrelevant.

Spoken language: formation by analogy with the phoneme

The grapheme was conceived by analogy with the phoneme. Both are the smallest meaning-distinguishing units of their medium. As a sound (phonic) unit, the phoneme is the object of study of phonology.

In many cases, with reference to Aristotle, Ferdinand de Saussure or Leonard Bloomfield, written language is understood not merely as historically (phylogenetically) and individually (ontogenetically) secondary to spoken language, but as a system of signs that depends unidirectionally on it. Proponents of this dependence theory, according to which writing is subordinate to speech and not merely secondary to it, accordingly see the grapheme as an image of the phoneme. Accordingly, exactly one phoneme corresponds to each grapheme and vice versa. For supporters of the competing autonomy hypothesis, such as the Prague School, or of the mediating interdependence or correspondence theory, the terms phoneme and grapheme are instead conceived in parallel and on an equal footing: in an orthography they can be assigned to one another on the basis of rules (phonographic principle).

This is expressed, for example, in the terminology of glossematics according to Louis Hjelmslev, in which the smallest units on the expression side are called kenemes and can be both phonemes and graphemes; together with the content-side pleremes, they form the smallest linguistic signs, so-called glossemes. In order to satisfy both camps, John McLaughlin suggested the term graphoneme for the hyponymous group of graphemes that are phoneme-dependent graphic units, while Klaus Heller later instead divided the graphemes into phonematically determined phonographemes, with nested notation as in ⟨sch/ʃ/⟩, and graphematically determined graphographemes as generalized classes of concrete characters. Oliver Rezec analogously separates phoneme images from the actual graphemes.

Both phonemes and graphemes are to be identified as semantically distinctive (i.e. meaning-differentiating but not themselves meaningful) units by means of minimal pair analysis, and they must not be further decomposable on the same level. Since the sound medium is continuous and analog, while the written medium is (approximately) discrete and digital, phonemes are found as components of the larger unit syllable when phonematic words are compared in pairs, whereas graphemes are found as compositions of the smaller unit character. This is because graphematic words in many writing systems already have clear external boundaries and are already segmentally structured internally. Thus the two monosyllabic German words /zaɪn/ and /ʃaɪn/ differ only in the syllable onset, which is occupied by the phonemes /z/ and /ʃ/ respectively. This is reflected in their graphic counterparts ⟨sein⟩ and ⟨Schein⟩, except that the grapheme ⟨Sch⟩ consists of three individual characters which, despite other minimal pairs such as ⟨Stein⟩, are inseparably linked. In both realizations of language, identity occurs as a special case (more or less common depending on the language), in that a phoneme corresponds to a syllable with a vocalic nucleus or a grapheme corresponds to a letter.
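The minimal pair test described above can be sketched in a few lines of Python. The function name and the segmentations of the German words sein and Schein are assumptions chosen for illustration; real analyses depend on the segmentation adopted.

```python
from typing import Optional, Sequence


def minimal_pair_position(a: Sequence[str], b: Sequence[str]) -> Optional[int]:
    """Return the index of the single differing unit, or None if a and b
    are not a minimal pair (different length, identical, or several differences)."""
    if len(a) != len(b):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs[0] if len(diffs) == 1 else None


# Hypothetical segmentations (assumed for this sketch).
phonemes_sein = ["z", "aɪ", "n"]
phonemes_schein = ["ʃ", "aɪ", "n"]
graphemes_sein = ["s", "e", "i", "n"]
graphemes_schein = ["sch", "e", "i", "n"]

if __name__ == "__main__":
    print(minimal_pair_position(phonemes_sein, phonemes_schein))
    print(minimal_pair_position(graphemes_sein, graphemes_schein))
    # Compared character by character, the two words are not a minimal pair.
    print(minimal_pair_position(list("sein"), list("schein")))
```

Segmented into graphemes, the two words differ in exactly one position; segmented into single characters, they do not even have the same length, which is the point the paragraph makes about ⟨sch⟩.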

Recursivity: distinction from the letter

The grapheme (together with the phoneme) was supposed to replace the concept of the letter used before the linguistic turn (in other languages Buchstabe, littera, etc.), as this had been used for both written and spoken language signs. Since its phonic meaning has largely disappeared, some linguists use letter interchangeably with or in place of grapheme, at least as long as they are dealing with segmental, i.e. alphabetic, writing systems.

Martin Neef argues that the graphematics of an alphabetic script can make do with the unit letter alone, since the unit grapheme can be realized both by single letters, e.g. ⟨s⟩, and by groups of letters, e.g. ⟨sch⟩, and since, for example, initial capitalization refers to letters, ⟨Schnee⟩, rather than to graphemes, *⟨SCHnee⟩.

An alphabet is a conventionalized set of letters. The main difference between it and any other closed character set is that it determines the sort order of the letters. The elements of some alphabets therefore also serve as counters or numerals; for example, the koppa ⟨ϟ⟩ has survived in the Greek script only as a numeral sign for 90. Graphemes do not have such properties, but a grapheme can recursively contain other, smaller graphemes.

In some scripts, the concept of the letter encompasses several forms or cases, in particular the distinction between minuscules and majuscules, i.e. lowercase and uppercase letters; the English alphabet therefore has 26 and not 52 letters. Graphemes, on the other hand, although they are graphic units, are not necessarily bound to a visible character body; often either no distinction at all is made between uppercase and lowercase letters, or these letter variants are treated just like two independent graphs.

Language dependency: use of characters

Graphemes are language-dependent, i.e. they must be determined separately for each written language or for each writing system. Characters, as constituents of a script, can by contrast be defined across languages or even independently of language. There are special cases such as the Serbo-Croatian writing system, in which each basic grapheme is realized by both a Latin and a Cyrillic character.

In terms of semantics, characters are, despite what their designation implies, not yet linguistic signs, whereas graphemes are. Both, however, are realized as graphs by means of glyphs.

Abstraction: concretization as a graph

Graphemes are the smallest linguistic units, but they can sometimes be further subdivided. This further analysis is by definition no longer part of (written) linguistics, but rather the task of one of its (paralinguistic) auxiliary sciences, graphetics . The object of consideration is every “concrete, classifiable graphic appearance” and is called a graph . Graphemes are then abstract classes of equivalent concrete graphs, which are called allographs .

Some graphematicians, including Beatrice Primus and Herbert E. Brekle, have analyzed the Latin and sometimes also the Greek minuscules with regard to how their abstract formal components correlate with phonological properties; for example, the normal vowel letters ⟨a, e, i, o, u⟩ have no ascenders or descenders. Since this would justify the view that graphemes can be smaller units than letters, the definition is sometimes supplemented by the requirement that graphemes consist of self-contained (within a frame), unbound, but possibly complexly composed units. Graphemes can therefore be recursive, which is the case both for letter combinations such as ⟨sch⟩ and for most sinograms, since at least some of their constituents may each be graphemes themselves.

Many names of graphic units use the morph { -graph } or { -gram }. The first is more associated with a graphical meaning and the second with a language-dependent graphematic meaning, but the distinction is not made uniformly and systematically.

In Rezec's terminology, basic forms correspond to graphs in this sense, while graph is there equated with glyph.

Virtuality: materialization as a glyph

When it comes to the (prototypical) appearance of a graph in a particular handwriting or typeface, one speaks of glyphs instead. They are, among other things, the object of the work of type designers and calligraphers. This makes them even less an object of linguistics than graphs.

For example, positional or directional allographs appear as several distinct but visually similar glyphs. The position can be at the beginning of the word (initial), in the middle of the word (medial), at the end of the word (final) or standing alone (isolated); the writing direction runs line by line (horizontal) or column by column (vertical), and left to right (dextrograde) or right to left (sinistrograde). Ligatures also have their own glyphs although they consist of several graphs, while individual graphs consisting of a base and a diacritic can be assembled from two glyphs. Glyph boundaries can, but need not, correlate with graph or grapheme boundaries. The same glyph can be used in different writing systems, e.g. for the Latin capital ⟨A⟩ and the Greek capital ⟨Α⟩ alpha, whereas the lowercase letters differ.

Glottographic types

The status of graphic characters in non- and paralinguistic notations, e.g. in mathematical or chemical formulas, is not handled uniformly. In some theories they are simply ignored or overlooked; others try to describe them as a special type of grapheme. Many grapholinguists restrict themselves to glottographic grapheme types, i.e. characters for writing down language. There are basically two different groups:

  1. Open character sets with several thousand logograms or morphograms that are used uniformly across dialects and languages and that, at least in principle, can be constantly expanded with new characters by means of composition rules.
  2. Closed classes of a few dozen to a hundred phonograms, whose graphic form is largely arbitrary and which in the context of a word, depending on sometimes complex, conventionalized orthographic distribution and correspondence rules, have an unambiguous, language-dependent reading with only dialectal variance.

In addition, there is a group of auxiliary graphemes for punctuation, which are usually of importance only at the syntactic level and can influence the pronunciation of the phrase (prosody), but are not themselves pronounced. This also includes the space (spatium) ⟨␣⟩ in its various forms as a "null graph". Most of these characters also indicate word boundaries.

Open character set

Although they are often called logograms, signs of this type generally stand not for complete word forms but for mostly free lexical word paradigms and for mostly bound grammatical affixes or free particles, i.e. for L- and G-morphemes. For this reason the term morphogram is used as an alternative.

The basis of these graphic signs was often originally formed using one of two iconic methods: either concretely depicting, pictographically , or abstractly symbolizing, ideographically . If one observes some graphical conventions that result from the preferred writing medium , this possibility of character genesis still exists in principle, but normally there is a limited (sub) inventory of constituents from which new characters can be composed according to certain combination rules.

In this case the term grapheme is, in deviation from its actual definition, usually applied to the product and not to the meaning-differentiating constituents, which are instead referred to as subgraphemes or the like. This is all the more true if, as is customary with the East Asian sinograms, the product is written in an invisible frame of a fixed, usually square size, regardless of its inherent and combinatorial complexity, i.e. the number of strokes and component characters. Because of this rule of formation, such characters are also called tetragrams; despite their systematic structure, they are perceived as atomic in the normal writing and reading process. Since tetragram is only a graphetic and not a graphemic term, it also covers the characters of the closed phonographic systems of the Japanese kana and the Korean Hangul.

Depending on the script, the subgraphemes can also remain usable independently, so that they are recursive units on different levels of the writing system. They can have various positionally motivated allographs, i.e. they look slightly different depending on position and combination.

The constituents either contribute to the meaning as a pleremic determinative (Δ), also called a signific, or, as a kenemic phonetic (Φ), provide an indication of the pronunciation of the composite object, i.e. of the morphogram. Both are usually imprecise and unambiguous only in the conventionalized combination. If there are only a few possible determinatives, which therefore allow only a rough categorization, one also speaks of taxograms; if they are suited to a more precise semantic classification, also of semagrams. In some scripts some or all of the subgraphemes can play both roles; in others they are limited to one, e.g. in the Egyptian hieroglyphs. The positions in the overall character can be occupied preferentially or exclusively by one or the other type. Depending on the system, only some or all conceivable combinations of the two types occur: ΔΦ, ΦΔ, ΔΔ, ΦΦ and more complex constructs. In the simplest case, a single phonetic stands for a homophone; if several phonetics, and only phonetics, are used, the rebus principle applies, whereby the semantically most fitting one is often chosen from several possible alternative characters, e.g. in the Chinese transcription of foreign place names and personal names.

With a narrow interpretation of the term grapheme, i.e. when the constituents are referred to as graphemes and the products as (two-dimensional) grapheme chains or graphematic words, the difference from the syllabic systems largely disappears. The phonographeme inventory of the phonetics is merely very large, contains duplicates and under certain circumstances overlaps with the graphographeme inventory of the determinatives, which do not exist, or exist only to a very limited extent, in classical syllabaries.

Closed character set

Phonograms are classically divided according to whether they are mainly used to represent syllables or syllable segments (letters).

Syllabic graphemes

[Figure: segmentation of the spoken syllable (σ) into rhyme (ρ), optional consonantal (C*) onset (ω), obligatory vocalic (V+) nucleus (ν), optional consonantal coda (κ) and, where applicable, tone (τ, T?)]

In syllabic scripts, the inventory of characters is called a syllabary and the characters are called syllabograms. However, there are many types of syllabic scripts and syllabic characters, since it is never the case that every possible phonematic syllable of a language is written with its own exclusive syllabogram. Instead, there are orthographic rules that allow a limited, possibly cross-linguistic repertoire to suffice. Because of such character combinations, the concept of the grapheme is sometimes extended to digraphic "syllabogram chains" such as the Japanese yōon (⟨Ci jV⟩ /CjV/).

In a "real" syllabary, the syllabic writing and reading units can be assumed to be graphemes, but many systems are not completely arbitrary:

On the one hand, there are synthetic scripts, sometimes called abugidas, such as those derived from the Indian Brahmi script, in which vowels are either inherent, Ca ⟨Ca⟩, or are attached diacritically to the syllabic bases, Ca e ⟨Ce⟩, in some cases forming complex ligatures. Since consonants and vowels are notated on different levels of the writing system, some linguists regard the resulting syllable ligatures as graphemes; for others this hierarchical segmental principle does not differ significantly from that of segmental scripts in which all segments have equal rank, and accordingly, subject to studies of the individual languages, both phonographic character types are considered graphemes. A special type of grapheme in these scripts is the virama, which is used like a vowel sign but denotes the omission of the inherent vowel.

On the other hand, scripts developed in the modern era, such as that of Cree, are often systematic in that both the consonants and the vowels of the CV syllabograms vary uniformly and geometrically in rows or columns. Here the question arises whether the orientation of the character is graphemic, i.e. whether, for example, a general vowel grapheme TRIANGLE ⟨V⟩ can be postulated, to which the exact value is assigned by a dependent orientation grapheme, i.e. ᐁ ⟨Ve⟩, ᐃ ⟨Vi⟩, ᐅ ⟨Vo⟩, ᐊ ⟨Va⟩, or whether four independent vowel graphemes ᐁ ⟨e⟩, ᐃ ⟨i⟩, ᐅ ⟨o⟩, ᐊ ⟨a⟩ are to be identified.

Most of the so-called syllabic scripts also contain non-syllabic characters, e.g. the Japanese ん ⟨n⟩, which can only be used for the coda. These usually also have grapheme status.

Some characters used with the syllabograms, such as the Japanese ⟨ー⟩ chōon and ⟨っ/ッ⟩ sokuon, have a variable or functional character due to orthographic conventions and do not occur freely, but are bound to the preceding or following syllabogram, whose phonographic quality is thereby changed.

Segmental graphemes

In alphabetic scripts, the individual letters are often taken as graphemes (alphabetic graphemes). This also applies to other segmental scripts, sometimes called abjads, in which (some) vowels are not written on the same level of writing as consonants, or are written only optionally or not at all.

Many linguists understand uppercase and lowercase letters as allographic variants of the same grapheme or as a "functionally connected pair of graphemes", but the distinction can be semantically relevant, especially in German, cf. the adjective ⟨arm⟩ ('poor') vs. the noun ⟨Arm⟩ ('arm') vs. the acronym ⟨ARM⟩. It should also be noted that initial capitalization affects letters in many languages, but in some languages it affects graphemes, e.g. Dutch ⟨IJssel⟩ instead of *⟨Ijssel⟩. Some scholars therefore postulate abstract functional graphemes that reveal their presence only through interaction with other character bodies, e.g. the "supragraphemes" of Gallmann. These functions can be triggered syntagmatically (from the sentence), paradigmatically (from the word) or even purely grapho-stylistically. In this way ⟨ARM⟩, ⟨Arm⟩ and ⟨arm⟩ can each be legitimized as unmarked or marked written forms of arm, Arm and ARM, while ⟨aRM⟩, ⟨arM⟩, ⟨ArM⟩ and ⟨ARm⟩ are excluded.
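The admissible capitalization patterns named above can be illustrated with a short Python sketch. The function name, the pattern labels and the segmentations are assumptions made for this example; the unit of capitalization (letter or grapheme) is passed in by the caller.

```python
from typing import Sequence


def case_pattern(units: Sequence[str]) -> str:
    """Classify a word, given as a sequence of cased units (letters or
    graphemes), as 'lower', 'upper', 'initial' or 'irregular'."""
    if all(u.islower() for u in units):
        return "lower"
    if all(u.isupper() for u in units):
        return "upper"
    if units[0].isupper() and all(u.islower() for u in units[1:]):
        return "initial"
    return "irregular"


if __name__ == "__main__":
    # German: capitalization operates on single letters.
    for word in ["arm", "Arm", "ARM", "aRM", "arM", "ArM", "ARm"]:
        print(word, case_pattern(list(word)))
    # Dutch: capitalization operates on the grapheme <ij> (assumed segmentation).
    print(case_pattern(["IJ", "s", "s", "e", "l"]))
    print(case_pattern(list("IJssel")))
```

With letter units, ⟨IJssel⟩ is classified as irregular, whereas with the digraph treated as one unit it shows ordinary initial capitalization.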

Because the grapheme is the smallest meaning-distinguishing unit, grapheme strings of the same meaning, i.e. graphematic words, are built up from the same graphemes. However, many orthographies allow certain variants, e.g. ⟨Friseur⟩ and ⟨Frisör⟩, which is one reason why it cannot generally be assumed that a grapheme corresponds to a letter (orthographic grapheme). In the terminology of dependence theory, which is often used in the didactics of writing, a distinction can be made between the normally used base grapheme, e.g. /t/ → ⟨t⟩, and its orthographically motivated variants, the orthographemes, e.g. /t/ → ⟨d⟩ because of morpheme constancy.

Native German grapheme inventory
9 vowel graphemes: ⟨a⟩, ⟨ä⟩, ⟨e⟩, ⟨i⟩, ⟨ie⟩, ⟨o⟩, ⟨ö⟩, ⟨u⟩, ⟨ü⟩
20 consonant graphemes: ⟨b⟩, ⟨d⟩, ⟨f⟩, ⟨g⟩, ⟨ch⟩, ⟨h⟩, ⟨j⟩, ⟨k⟩, ⟨l⟩, ⟨m⟩, ⟨n⟩, ⟨p⟩, ⟨qu⟩, ⟨r⟩, ⟨s⟩, ⟨sch⟩, ⟨ß⟩, ⟨t⟩, ⟨w⟩, ⟨z⟩

Since in many writing systems some (fixed) letter combinations occupy positions in minimal pair analysis that can otherwise only be occupied by single letters, such digraphs, trigraphs or plurigraphs are often also regarded as graphemes of the language concerned. In the Latin script, ⟨h⟩ is used particularly frequently as the final component of such combinations. The letter ⟨c⟩ in the German (and not only German) combinations ⟨ck⟩ and ⟨ch⟩ can be interpreted as an allographic variant of the following character, so that the regularly formed (theoretical) graphemes ⟨kk⟩ and ⟨hh⟩ result; the same applies to the relationship of ⟨tz⟩ to ⟨zz⟩.

The diacritical marks , which incidentally also exist in syllabary scripts, can be analyzed graphematically in three different ways according to Gallmann (here using the German umlauts as an example):

  1. ⟨ä⟩, ⟨ö⟩, ⟨ü⟩ are, like ⟨i⟩, independent graphemes. In Austrian telephone books, such an interpretation can be seen in the fact that the umlaut letters are placed after their base letters, while in the Scandinavian languages they are added at the end of the alphabet.
  2. ⟨ä⟩, ⟨ö⟩, ⟨ü⟩ are allographic realizations of the grapheme groups ⟨ae⟩, ⟨oe⟩, ⟨ue⟩, or the umlaut mark ⟨¨⟩ is an allograph of ⟨e⟩. This interpretation underlies crossword puzzles and library sorting, in which names that sound the same but are spelled differently are meant to appear next to one another.
  3. ⟨ä⟩, ⟨ö⟩, ⟨ü⟩ are ⟨a⟩, ⟨o⟩, ⟨u⟩ plus the diacritical mark "umlaut", i.e. they merely form a marked variant of the base letters. Dictionary sorting uses this principle, since lexemes formed by derivation then often appear next to their root words.

The second and third approaches correspond to the colloquial view that the German alphabet has 26 letters.
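The three analyses correspond to three different sort keys. The following Python sketch is only an illustration under simplifying assumptions: it handles lowercase words, ignores ⟨ß⟩, and the function names and the word list are hypothetical. It does not reproduce any official collation standard.

```python
import unicodedata

UMLAUT_EXPANSION = {"\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue"}


def key_independent(word: str) -> list:
    """Approach 1: an umlaut letter is a letter of its own, placed after its base."""
    key = []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        # (base letter, has diacritic): False sorts before True.
        key.append((decomposed[0], len(decomposed) > 1))
    return key


def key_expanded(word: str) -> str:
    """Approach 2: umlaut letters are realizations of <ae>, <oe>, <ue>."""
    word = unicodedata.normalize("NFC", word)
    return "".join(UMLAUT_EXPANSION.get(ch, ch) for ch in word)


def key_base(word: str) -> str:
    """Approach 3: umlaut letters are base letters plus a diacritic (ignored here)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical word list; "\u00fc" is the letter u with umlaut.
WORDS = ["mutz", "m\u00fcller", "mueller", "muller"]

if __name__ == "__main__":
    for key in (key_independent, key_expanded, key_base):
        print(key.__name__, sorted(WORDS, key=key))
```

Because Python's sort is stable, words that receive identical keys under the second and third approach stay in their input order, which is what places the differently spelled variants next to one another.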


Thus, according to traditional German spelling and without Germanized spelling of the French loanwords, one writes ⟨Nuß-Nougat-Crème⟩, whereby the accent is often left out, ⟨…-Creme⟩, and the final ‹e› can remain silent. With the spelling reform, ⟨Nuß⟩ must be replaced by ⟨Nuss⟩, while the Gallicism ⟨Nougat⟩ can also be written ⟨Nugat⟩ and ⟨Creme⟩ can be changed to ⟨Krem⟩ or ⟨Kreme⟩ depending on the intended pronunciation. Without formal changes, this word can thus be divided into the orthographic graphemes ⟨N⟩⟨u⟩⟨ß|ss⟩⟨-⟩⟨N⟩⟨ou|u⟩⟨g⟩⟨a⟩⟨t⟩⟨-⟩⟨C|K⟩⟨r⟩⟨è|e⟩⟨m⟩⟨e|∅⟩, whose combination with one another is, however, not completely free.
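The grapheme sequence with its alternatives can be expanded mechanically. The sketch below (the variable names are illustrative) enumerates all candidate spellings; as the text notes, not every combination is orthographically admissible, so the result is an upper bound rather than a list of valid forms.

```python
from itertools import product

# One tuple of alternatives per orthographic grapheme slot; "" stands for the empty option.
SLOTS = [
    ("N",), ("u",), ("ß", "ss"), ("-",),
    ("N",), ("ou", "u"), ("g",), ("a",), ("t",), ("-",),
    ("C", "K"), ("r",), ("è", "e"), ("m",), ("e", ""),
]

# Cartesian product over all slots, joined into candidate spellings.
CANDIDATES = ["".join(choice) for choice in product(*SLOTS)]

if __name__ == "__main__":
    print(len(CANDIDATES))
    for spelling in CANDIDATES[:4]:
        print(spelling)
```

Five slots with two alternatives each yield 2^5 = 32 candidates, among them both the traditional and the fully Germanized spelling.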

The hyphen (and, in a somewhat different way, the apostrophe) has a special role in European alphabetic scripts: it is not a phonogram, i.e. it is not itself pronounced, but only changes the status of other grapheme chains by joining them into compounds. In addition to this explicit hyphen there is also an implicit hyphen ⟨·⟩ at all possible break points, which becomes visible only at the end of a line. Although this is often referred to sweepingly as syllabification, the world's writing systems have word-division methods that operate at syllable, morpheme and grapheme boundaries, and rarely also at glyph boundaries.

Depending on orthographic conventions, however, several individual words can be combined into a new word not only with a hyphen ⟨x-y⟩ but also by direct concatenation ⟨xy⟩ or (frequent outside running text, though against the rules) with white space, i.e. a horizontal space ⟨x y⟩ or a line break ⟨x↩y⟩. The initial capitalization of the second element need not be preserved in this process, since a word beginning ⟨#⟩ is not necessarily involved.

The grapheme Γ BINDING, or ⟨-⟩, can thus have the values ‹-#›, ‹#›, ‹› (empty), ‹␣#› (space), ‹ › and ‹↩#› (line break). It should be noted that these apparent allographs bind the component words to one another to different degrees and can therefore be used to differentiate meaning, which may make them different graphemes: ⟨Nuss-Nugat-Krem⟩, ⟨NussNugatKrem⟩, ⟨Nuss Nugat Krem⟩ and ⟨Nussnugatkrem⟩ belong to the extension of the same graphematic definition, whereas ⟨Nuss-Nugatkrem⟩ and ⟨Nussnugat-Krem⟩ set different semantically relevant accents, namely, in the first case, in contrast to e.g. ⟨Mandel-Nugatkrem⟩ or ⟨Nuss-Kuchen⟩ and, in the second case, in contrast to e.g. ⟨Nussnugat-Riegel⟩ or ⟨Erdnuss-Krem⟩.

For the first grapheme after the initial word boundary, an allography restriction may apply, motivated syntactically (when the word boundary coincides with the initial sentence boundary), grammatically (for the non-pronominal head of a noun phrase) or lexemically (for names and nouns), under which only a majuscule is permitted for the first letter of the grapheme.

Similarly, special conditions formerly applied, especially in Fraktur typesetting, to the last grapheme before a medial or final word boundary when it ended in the letter ⟨s⟩, which was then rendered not with the graph ‹ſ› but with ‹s›. In addition, an orthographic ligature rule applies to an original ‹ſſ›, which requires the graph ‹ß› instead of ‹ſs›.

This example word, with its diachronic variants, some of which are not yet or no longer orthographically valid, can thus be described graphemically as follows:

Nuss-Nougat-Crème
→ ⟨#⟩nu⟨ss⟩⟨-⟩n⟨ou⟩·gat⟨-⟩⟨c⟩r⟨è⟩⟨me⟩
⟨#⟩ → Γ MAJUSCULE ∨ …
⟨ss⟩ → <ss> ∨ <ß>
⟨c⟩ → <c> ∨ <k>
⟨ou⟩ → <ou> ∨ <u>
⟨è⟩ → <è> ∨ <e>
⟨me⟩ → <·me> ∨ <m>
⟨·⟩ → ‹› ∨ ‹-↩› ∨ …
⟨-⟩ → ⟨·⟩ ∨ Γ BINDING #
Γ BINDING → ‹-› ∨ ‹ › ∨ …

However, this variant analysis does not yet make a decision about graphemes actually present in the German writing system, but only identifies the first candidates.

Syntactic graphemes

Graphic signs of a writing system that are used not for word formation but only for sentence structure are also sometimes regarded as graphemes and are called syntactic graphemes or syngraphemes. In addition to punctuation, this also includes only indirectly visible functional graphemes such as capitalization at the beginning of a sentence, which where necessary ensures that words which by lexical rules begin with a lowercase letter are also written with an initial majuscule.

There are different types of punctuation marks: some can only appear at the beginning or end of a word or sentence, others also or only in the middle (separating or connecting), and still others, e.g. brackets and quotation marks, as a rule occur only in pairs.

Gallmann grapheme classes

Formally defined grapheme classes (graphical means)

  • Graphemes
    • independent: letters, auxiliary characters, spaces, digits, special characters
    • dependent: diacritical marks
  • Supragraphemes
    • linear, concrete: underlining
    • linear, abstract: initial capitalization, general capitalization, small caps, typeface, typographic emphasis, font size, letter-spacing, ligatures, superscript / subscript
    • flat, concrete: border, tint or screen area
    • flat, abstract: line, text block, justification, indent, beginning and end of line

Functionally defined grapheme classes

  • (Basic) graphemes
  • Ideograms
  • Classifiers
  • Boundary signals
  • Sentence intention signals
  • Omission signals

Technical grapheme

Applied linguistics includes the electronic coding of writing and its characters at the interface with computer technology. There are glyph-based approaches such as in 7-bit SMS messages , grapheme / constituent-based approaches or frame-based approaches.

In Unicode, characters are in principle modeled as combinations of encoded basic units, but partly for reasons of compatibility and partly for pragmatic reasons there are many precomposed characters, which makes it necessary to define canonical equivalences and preferred encodings (e.g. NFC). Thus an ⟨ä⟩ can be stored as a combination of ⟨a⟩ U+0061 and ⟨¨⟩ trema U+0308 or directly as ⟨ä⟩ U+00E4. Korean Hangul is encoded not only in the form of 70 combinable individual components (jamo) but also as over 11,000 syllable blocks, whereas sinograms are encoded only as whole units.
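The canonical equivalence described above can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the helper `code_points` and the choice of the Hangul syllable U+D55C as an example are assumptions made for illustration.

```python
import unicodedata

decomposed = "a\u0308"   # <a> U+0061 followed by combining diaeresis U+0308
precomposed = "\u00e4"   # <ä> U+00E4 as a single code point


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # The two strings differ as code point sequences ...
    print(decomposed == precomposed)
    # ... but normalization maps them onto each other.
    print(code_points(unicodedata.normalize("NFC", decomposed)))
    print(code_points(unicodedata.normalize("NFD", precomposed)))
    # A precomposed Hangul syllable block decomposes into its jamo.
    print(code_points(unicodedata.normalize("NFD", "\ud55c")))
```

Comparing or searching text reliably therefore usually requires normalizing both sides to the same form first.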

The character coding does not need to correlate in a simple way with the input via the keyboard or the like, for example accents are typed in before the base letter using dead keys , but they are stored the other way around or as a common unit.

The Unicode standard uses the term grapheme in a simplified and language-independent sense. A grapheme is then either a "minimally distinctive unit of writing in the context of a particular writing system", i.e. a graphic sign by which two words can be distinguished from each other (minimal pair analysis), or "what a user thinks of as a character". In addition, the following terms are defined: grapheme base; grapheme cluster, "a horizontally segmentable text unit, consisting of any grapheme base combined with any number of nonspacing marks"; grapheme extender, covering all nonspacing marks, joiners and non-joiners as well as some spacing marks; and graphic character, i.e. letter, combining mark, number, punctuation, symbol or space.
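The idea of a grapheme cluster as a base followed by marks can be approximated in a few lines. The function below is a deliberately simplified sketch, not the segmentation algorithm of the Unicode standard: it attaches every character of general category M (mark) to the preceding character and ignores joiners, Hangul jamo sequences, emoji sequences and other cases. The name `simple_clusters` is hypothetical.

```python
import unicodedata


def simple_clusters(text: str) -> list:
    """Split text into base characters, each followed by its combining marks.

    Simplification: only general categories Mn, Mc and Me extend a cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


if __name__ == "__main__":
    word = "Cre\u0300me"            # <è> stored as e + combining grave accent
    print(len(word), len(simple_clusters(word)))
```

The example string has six code points but five user-perceived characters, which is the distinction the second Unicode definition of grapheme is aiming at.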



