Greek and Coptic in Unicode

Greek letters are encoded in Unicode in two blocks; a few further characters derived from Greek letters are encoded elsewhere. The encoded letters include the classical Greek alphabet as well as non-classical letters and display variants. For polytonic orthography, the base letters can be combined with diacritics, and precomposed characters are also available. Coptic characters are also encoded together with Greek.

Encoded characters

Greek letters

The Unicode block Greek and Coptic contains the classical letters of the Greek alphabet and a few further characters, such as digamma. Vowels with tonos and dialytika (diaeresis), as used in Modern Greek, are also encoded as separate characters. The block thus largely corresponds to the ISO 8859-7 encoding; some characters were taken from ISO 5428.
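The correspondence with ISO 8859-7 can be checked with a short Python sketch. It is illustrative only: it relies on the standard library codec `iso8859_7`, and the helper name `iso_8859_7_offset` is made up for this example. For the sample letters chosen here, the Unicode code point equals the ISO 8859-7 byte value plus 0x2D0, while digamma has no ISO 8859-7 byte at all.

```python
def iso_8859_7_offset(char):
    """Return code point minus ISO 8859-7 byte value, or None if not encodable.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        byte = char.encode("iso8859_7")[0]
    except UnicodeEncodeError:
        return None
    return ord(char) - byte


# Capital alpha, small omega, alpha with tonos, iota with dialytika, digamma
SAMPLES = "\u0391\u03c9\u03ac\u03ca\u03dd"

for char in SAMPLES:
    offset = iso_8859_7_offset(char)
    label = "not in ISO 8859-7" if offset is None else f"offset {offset:#x}"
    print(f"U+{ord(char):04X} {char}: {label}")
```

The constant offset holds for the letters sampled here; it is not claimed for every position of ISO 8859-7, which also contains punctuation and symbols that map elsewhere.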

Sigma

The lowercase sigma takes different forms depending on whether or not it stands at the end of a word. Although it would have been possible to encode the lowercase sigma only once and to obtain the correct form through contextual rules similar to those used for Arabic in Unicode, the final form was encoded separately. One reason is backward compatibility with earlier character encodings; the other is that the effort required to select the correct form automatically would be disproportionately high.
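The two encodings of the lowercase sigma can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the constant names `SIGMA` and `FINAL_SIGMA` and the sample word are chosen for this example only.

```python
import unicodedata

SIGMA = "\u03c3"        # form used inside a word
FINAL_SIGMA = "\u03c2"  # form used at the end of a word

for char in (SIGMA, FINAL_SIGMA):
    print(f"U+{ord(char):04X} {char} {unicodedata.name(char)}")

# Both lowercase forms share the single capital sigma U+03A3.
print(SIGMA.upper() == FINAL_SIGMA.upper() == "\u03a3")

# Lowercasing a capital sigma therefore has to look at the context
# to decide which of the two lowercase code points to produce.
word = "\u039f\u0394\u039f\u03a3"  # capital omicron, delta, omicron, sigma
lowered = word.lower()
print(lowered, f"ends with U+{ord(lowered[-1]):04X}")
```

Which lowercase form `str.lower()` picks for a word-final capital sigma depends on whether the implementation applies a context rule, so the last line is worth checking on the interpreter in use rather than assuming.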

Further display variants

A number of other letters were encoded a second time as display variants. Besides the usual theta θ (U+03B8) there is also the variant ϑ (U+03D1). Since Unicode does not prescribe glyphs, the theta U+03B8 can take different shapes depending on the font. Fonts that are also intended for mathematical or physical formulas should, however, render U+03B8 as the closed form and U+03D1 as the open form. In Greek text, only U+03B8 should be used; U+03D1 is reserved for formulas, where the two shapes can carry different meanings. Other symbols derived from Greek letters are encoded in several places in Unicode together with mathematical characters.
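The relationship between the letter and its variant can be made visible with Unicode normalization. The sketch below assumes Python's standard `unicodedata` module; the function name `fold_symbol_variants` is hypothetical and only serves the illustration.

```python
import unicodedata

THETA = "\u03b8"         # the letter used in Greek text
THETA_SYMBOL = "\u03d1"  # the variant reserved for formulas

for char in (THETA, THETA_SYMBOL):
    print(f"U+{ord(char):04X} {char} {unicodedata.name(char)}")


def fold_symbol_variants(text):
    """Map compatibility variants such as U+03D1 to their base letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# Compatibility normalization folds the variant onto the ordinary letter ...
print(fold_symbol_variants(THETA_SYMBOL) == THETA)
# ... whereas canonical normalization keeps the two code points distinct.
print(unicodedata.normalize("NFC", THETA_SYMBOL) == THETA_SYMBOL)
```

Folding of this kind may suit searching or comparing Greek text, but it discards exactly the distinction that formulas rely on, and NFKC also changes many unrelated characters, so it should not be applied to mathematical content.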

Polytonic Greek

The characters for polytonic orthography can be represented in two ways:

One possibility is to use combining characters. The same characters from the Unicode block Combining Diacritical Marks are used as for other languages.

The other possibility is the Unicode block Greek Extended, which provides precomposed characters, each combining a base letter with its diacritics. However, font support for these characters is usually worse than for combining characters.
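The two representations are canonically equivalent, which can be shown with a small Python sketch using the standard `unicodedata` module. The example character (capital alpha with rough breathing and acute) and the helper `code_points` are choices made for this illustration.

```python
import unicodedata

# Capital alpha + combining reversed comma above (rough breathing)
# + combining acute accent, all as separate code points
COMBINING = "\u0391\u0314\u0301"
# The same character as one precomposed code point from Greek Extended
PRECOMPOSED = "\u1f0d"


def code_points(text):
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


print(unicodedata.name(PRECOMPOSED))
print("NFD:", code_points(unicodedata.normalize("NFD", PRECOMPOSED)))
print("NFC:", code_points(unicodedata.normalize("NFC", COMBINING)))

# The raw strings differ, but they compare equal after normalization.
print(COMBINING == PRECOMPOSED)
print(unicodedata.normalize("NFC", COMBINING)
      == unicodedata.normalize("NFC", PRECOMPOSED))
```

Normalizing both sides to the same form before comparing or indexing text avoids treating the two spellings as different strings.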

Numerals

The Unicode block Ancient Greek Numbers contains Greek acrophonic and papyrological numerals.

Coptic characters

Originally, the Coptic alphabet was regarded as an extension of the Greek alphabet, so only those characters were included that were not already encoded as Greek letters. They were encoded together with the Greek letters in the Unicode block Greek and Coptic. Since Unicode 4.1, Coptic has been treated as an independent script; the Coptic letters that are derived directly from Greek were encoded in the separate Unicode block Coptic.
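The split of Coptic letters across two blocks can be illustrated with a small lookup. This is a sketch under stated assumptions: Python's standard library offers no block lookup, so the table `BLOCKS` and the function `block_of` are written by hand for this example, with ranges taken from the Unicode block list.

```python
import unicodedata

# (first code point, last code point, block name); hand-written for this sketch
BLOCKS = [
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x1F00, 0x1FFF, "Greek Extended"),
    (0x2C80, 0x2CFF, "Coptic"),
    (0x10140, 0x1018F, "Ancient Greek Numbers"),
]


def block_of(char):
    """Return the name of the listed block containing char, or None."""
    cp = ord(char)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


# A Coptic letter without a Greek counterpart, and one derived from Greek alpha
for char in ("\u03e3", "\u2c81"):
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: {block_of(char)}")
```

Both sample characters carry COPTIC in their names, yet only the second one lives in the Coptic block; the first remains in Greek and Coptic, where it was originally encoded.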

Rendering

Figure: capital alpha with spiritus asper (rough breathing) and acute

Combining characters have to be rendered differently in Greek than in other languages: multiple combining marks are not stacked on top of each other but placed side by side. With capital letters, they are placed in front of the letter rather than above it.

Sources

  • Julie D. Allen et al.: The Unicode Standard. Version 6.2 - Core Specification. The Unicode Consortium, Mountain View, CA, 2012. ISBN 978-1-936213-07-8. Chapter 7.2: Greek. Chapter 7.3: Coptic. (online, PDF)

References

  1. FAQ: Greek Language and Script, accessed on February 19, 2013