Latin characters in Unicode

from Wikipedia, the free encyclopedia

Latin letters, i.e. characters based on the Latin alphabet, are spread across several blocks in Unicode.

The 26 basic letters are encoded, together with digits, punctuation marks and control characters, in the Unicode block Basic Latin. The other blocks contain extensions of the basic alphabet:

  • modified letter forms such as ð, ə or ŋ
  • ligatures such as æ, œ or ƕ
  • additional letters such as þ or ɛ, borrowed from other scripts but used in Latin orthographies
  • diacritical marks that can be combined with basic letters
  • a large number of precomposed combinations of basic letters and diacritical marks, such as ä, ç or č, encoded for compatibility with older code pages
  • individual digraphs such as ij, nj or dz, likewise encoded for compatibility
  • forms of the Latin letters for use with CJK scripts (fullwidth and halfwidth)
  • ornamental and calligraphic variants such as Ⓐ, ⒜, ⒈, ℋ or ℳ
  • symbols based on the Latin alphabet, such as $

Encoded characters

Letters

Up to the code point U+00FF, Unicode follows the Latin-1 character encoding (ISO/IEC 8859-1), and thus also ASCII. The basic letters of the Latin alphabet are therefore encoded, together with other characters, in the Unicode block Basic Latin. The following block, Latin-1 Supplement, contains, among other characters, letters with diacritics and some special letters, notably the German ß.

The next block, Latin Extended-A, contains the remaining Latin letters from the ISO/IEC 8859 encodings 2, 3, 4 and 9, as well as letters encoded in ISO 6937. This block also contains the long s (ſ). The block Latin Extended-B mainly contains phonetic and non-European extensions of the Latin alphabet, including most of the characters of the Africa Alphabet that were still missing. Since Unicode 3.0, the Romanian letters Ș and Ț have also been encoded in this block. The block Latin Extended Additional contains further Latin letters, including those of the Vietnamese alphabet and the capital ß (ẞ). The block Latin Extended-C covers the Uighur alphabet and an extension of the Latin alphabet introduced by Claudius. Further historical letters can be found in the blocks Latin Extended-D and Latin Extended-E.
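The block layout described above can be sketched as a small lookup table. The following is a minimal illustration, not an official API: Python's standard library does not expose the Unicode block property, so the ranges are typed in by hand from the Unicode code charts, and the names LATIN_BLOCKS and latin_block are hypothetical helpers.

```python
# Code point ranges of the Latin letter blocks named in the text,
# copied by hand from the Unicode code charts (hypothetical helper table).
LATIN_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x2C60, 0x2C7F, "Latin Extended-C"),
    (0xA720, 0xA7FF, "Latin Extended-D"),
    (0xAB30, 0xAB6F, "Latin Extended-E"),
]


def latin_block(char):
    """Return the name of the listed block containing char, or None."""
    code_point = ord(char)
    for first, last, name in LATIN_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


# Every Latin-1 byte decodes to the code point with the same number,
# which is what "Unicode follows Latin-1 up to U+00FF" means in practice.
latin1_matches = all(
    bytes([value]).decode("latin-1") == chr(value) for value in range(256)
)

if __name__ == "__main__":
    for ch in ["A", "\u00DF", "\u017F", "\u0218", "\u1E9E"]:
        print(f"U+{ord(ch):04X} {ch} {latin_block(ch)}")
    print("Latin-1 identity holds:", latin1_matches)
```

The sample characters are A, ß, the long s, Ș and the capital ß, one for each of the first five blocks mentioned in the text.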

The Unicode block Alphabetic Presentation Forms encodes some ligatures of Latin letters for compatibility with other standards.
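Because these ligatures exist only for compatibility, they carry a compatibility decomposition into ordinary letters. The sketch below uses Python's standard unicodedata module to show this. The function name expand_ligatures is a hypothetical helper, and NFKC normalization also folds many other compatibility characters, so it is broader than a pure ligature expander.

```python
import unicodedata


def expand_ligatures(text):
    """Apply NFKC normalization, which replaces compatibility ligatures
    (and other compatibility characters) with their plain equivalents."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    fi = "\uFB01"  # a ligature from Alphabetic Presentation Forms
    print(unicodedata.name(fi))
    print(unicodedata.decomposition(fi))
    print(expand_ligatures("\uFB01sh"))
    # æ is an independent letter without a decomposition, so it is kept.
    print(expand_ligatures("\u00E6"))
```

Letters such as æ are encoded as characters in their own right and are therefore left unchanged.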

Letters with diacritical marks that are not encoded in Unicode as single characters can be written as a combination of a basic letter and a combining character. The combining characters are located in the blocks Combining Diacritical Marks, Combining Diacritical Marks Supplement, Combining Half Marks and Combining Diacritical Marks Extended.
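The relationship between precomposed letters and combining sequences can be illustrated with Unicode normalization. The following is a minimal sketch using Python's standard unicodedata module. The variable names are illustrative, and m with diaeresis is chosen as an example of a combination that has no precomposed character.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS, block Combining Diacritical Marks

# "a" + combining diaeresis has a precomposed equivalent (ä, U+00E4),
# so NFC composes the sequence into a single code point.
a_sequence = "a" + DIAERESIS
a_composed = unicodedata.normalize("NFC", a_sequence)

# NFD goes the other way and splits č into "c" + COMBINING CARON.
c_decomposed = unicodedata.normalize("NFD", "\u010D")

# "m" + combining diaeresis has no precomposed character, so the
# sequence stays two code points even after NFC.
m_sequence = unicodedata.normalize("NFC", "m" + DIAERESIS)

if __name__ == "__main__":
    print(len(a_sequence), len(a_composed))
    print([f"U+{ord(c):04X}" for c in c_decomposed])
    print(len(m_sequence))
    # A non-zero combining class marks a combining character.
    print(unicodedata.combining(DIAERESIS))
```

Text that may mix both forms is usually normalized to one form before comparison, since the composed and decomposed spellings are different code point sequences.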

Phonetic transcription

Phonetic transcription systems such as the International Phonetic Alphabet and the Uralic Phonetic Alphabet use Latin and Greek letters, as well as some extensions of their own. In Unicode, these extensions are mostly treated as Latin letters as well. The characters can be found in the blocks IPA Extensions, Spacing Modifier Letters, Phonetic Extensions, Phonetic Extensions Supplement, and Superscripts and Subscripts.
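One rough way to see that phonetic extensions are treated as Latin letters is to look at their official character names. The sketch below relies on an assumption: the first word of a character name is used as a stand-in for the script, which is only a heuristic, since Python's standard library does not expose the Unicode Script property. The function name name_prefix is a hypothetical helper.

```python
import unicodedata


def name_prefix(char):
    """Return the first word of the character's Unicode name.

    Heuristic only: the name prefix is not the Script property.
    """
    return unicodedata.name(char).split()[0]


if __name__ == "__main__":
    # schwa and esh (IPA Extensions), Greek beta, modifier letter small h
    for ch in ["\u0259", "\u0283", "\u03B2", "\u02B0"]:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {name_prefix(ch)}")
```

The IPA letters schwa and esh are named as Latin letters, while beta keeps its Greek name and the superscript h from Spacing Modifier Letters is named as a modifier letter.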

Fullwidth characters

The Unicode block Halfwidth and Fullwidth Forms contains the basic Latin letters in a wide form, in which they are used together with East Asian scripts.
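The fullwidth forms of the printable ASCII characters occupy U+FF01 to U+FF5E in the same order as U+0021 to U+007E, so the two ranges differ by a constant offset. The following sketch converts in both directions. The names FULLWIDTH_OFFSET, to_fullwidth and to_ascii are hypothetical helpers, and the reverse direction uses NFKC normalization, which also folds other compatibility characters.

```python
import unicodedata

# U+FF01..U+FF5E mirror U+0021..U+007E, so the offset is constant.
FULLWIDTH_OFFSET = 0xFF01 - 0x0021


def to_fullwidth(text):
    """Map printable ASCII characters (except space) to fullwidth forms."""
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if 0x21 <= ord(c) <= 0x7E else c
        for c in text
    )


def to_ascii(text):
    """Fold fullwidth forms back via their compatibility decomposition."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    wide = to_fullwidth("Abc")
    print(wide, [f"U+{ord(c):04X}" for c in wide])
    print(to_ascii(wide))
    # "F" is the East Asian Width value for fullwidth characters.
    print(unicodedata.east_asian_width(wide[0]))
```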

Symbols

Unicode also encodes a number of symbols derived from Latin letters. These are in the blocks Letterlike Symbols, Enclosed Alphanumerics and Mathematical Alphanumeric Symbols. The latter in particular are intended for use with the other mathematical characters in Unicode. The characters for Roman numerals in the Unicode block Number Forms are also considered Latin characters.
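Many of these symbols can be traced back to their underlying Latin letters through compatibility normalization. The sketch below is a minimal illustration with Python's standard unicodedata module. The sample characters and the helper name describe are chosen for this example only.

```python
import unicodedata


def describe(char):
    """Return (name, general category, NFKC form) for one character."""
    return (
        unicodedata.name(char),
        unicodedata.category(char),
        unicodedata.normalize("NFKC", char),
    )


SAMPLES = [
    "\u210B",      # script H, block Letterlike Symbols
    "\u24B6",      # circled A, block Enclosed Alphanumerics
    "\U0001D400",  # bold A, block Mathematical Alphanumeric Symbols
    "\u2163",      # Roman numeral four, block Number Forms
]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", *describe(ch))
    # Roman numerals additionally carry a numeric value.
    print(unicodedata.numeric("\u2163"))
```

The general category differs between the samples (letter, symbol, letter number), which is one reason such characters are kept apart from ordinary letters even though they normalize to them.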

Sources

  • Julie D. Allen et al.: The Unicode Standard. Version 6.2 - Core Specification. The Unicode Consortium, Mountain View, CA, 2012, ISBN 978-1-936213-07-8. Chapter 7.1: Latin (online, PDF).
