Numerals in Unicode

from Wikipedia, the free encyclopedia

In addition to letters and other characters, Unicode encodes numeral characters for a variety of numeral systems. Besides the various forms of the decimal digits, Chinese numerals and historical systems such as Roman numerals are encoded, as are fractions and various symbols derived from numerals.

General

To support working with numerals, the Unicode standard defines two properties. Numeric_Type specifies what kind of numeral a character is. The value Decimal identifies a character as a decimal digit, so that programs can easily determine the numerical value of a sequence of such digits. Other numerals, such as Roman numerals, may require more complex conversions. The numerical value of a character can be read from the Numeric_Value property. The encoded numeral characters cover values from −½ (༳, U+0F33, Tibetan) to 1,000,000,000,000 (兆, U+5146, Chinese, and U+16B61, Pahawh Hmong).
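As a minimal sketch, assuming Python's standard `unicodedata` module, the two properties can be approximated as follows. The helper name `numeric_type` is hypothetical and not part of any library; the module exposes three separate lookups rather than Numeric_Type itself, and its data follows whichever Unicode version is bundled with the interpreter.

```python
import unicodedata


def numeric_type(ch: str):
    """Approximate Numeric_Type from the lookups Python exposes.

    Hypothetical helper: returns "Decimal", "Digit", "Numeric" or None.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return None


# DIGIT SEVEN, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO,
# VULGAR FRACTION ONE HALF, TIBETAN DIGIT HALF ZERO, and a plain letter.
for ch in ["7", "\u0663", "\u00b2", "\u00bd", "\u0f33", "a"]:
    # unicodedata.numeric plays the role of Numeric_Value (as a float).
    print(f"U+{ord(ch):04X}", numeric_type(ch), unicodedata.numeric(ch, None))
```

The letter `a` yields no numeric type and no value, which matches the rule that letters used only occasionally as numbers are not treated as numerals.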

Characters that are only sometimes used to represent numbers are not considered numerals. In a list labelled with letters (a), b), c), …), the letters stand for the values 1 to 3, but because this is not their main use, Unicode treats them as letters, not numerals.

Encoded characters

Decimal digits

The decimal digits of Indian origin are used in different forms in many scripts, so Unicode encodes the digits separately for each writing system. “European” refers to the digit forms that developed in Europe and are now used worldwide. There are also digits for Arabic and for various Indic scripts. N'Ko is unusual in that its numbers are written from right to left.

Form Digits Block
European 0123456789 Basic Latin
Arabic ٠١٢٣٤٥٦٧٨٩ Arabic
Arabic (Iran, Pakistan, Afghanistan) ۰۱۲۳۴۵۶۷۸۹ Arabic
Devanagari ०१२३४५६७८९ Devanagari
Bengali ০১২৩৪৫৬৭৮৯ Bengali
Gurmukhi ੦੧੨੩੪੫੬੭੮੯ Gurmukhi
Gujarati ૦૧૨૩૪૫૬૭૮૯ Gujarati
Oriya ୦୧୨୩୪୫୬୭୮୯ Oriya
Tamil ௦௧௨௩௪௫௬௭௮௯ Tamil
Telugu ౦౧౨౩౪౫౬౭౮౯ Telugu
Kannada ೦೧೨೩೪೫೬೭೮೯ Kannada
Malayalam ൦൧൨൩൪൫൬൭൮൯ Malayalam
Tibetan ༠༡༢༣༤༥༦༧༨༩ Tibetan
Lepcha ᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉ Lepcha
Limbu ᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏ Limbu
Saurashtra ꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙ Saurashtra
Sharada U+111D0..U+111D9 Sharada
Takri U+116C0..U+116C9 Takri
Chakma U+11136..U+1113F Chakma
Meetei Mayek ꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹ Meetei Mayek
Ol Chiki ᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙ Ol Chiki
Sora Sompeng U+110F0..U+110F9 Sora Sompeng
Brahmi U+11066..U+1106F Brahmi
Thai ๐๑๒๓๔๕๖๗๘๙ Thai
Lao ໐໑໒໓໔໕໖໗໘໙ Lao
Burmese ၀၁၂၃၄၅၆၇၈၉ Burmese
Burmese (Shan) ႐႑႒႓႔႕႖႗႘႙ Burmese
Khmer ០១២៣៤៥៦៧៨៩ Khmer
New Tai Lue ᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙ New Tai Lue
Lanna (secular) ᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉ Lanna
Lanna (sacred) ᪐᪑᪒᪓᪔᪕᪖᪗᪘᪙ Lanna
Kayah Li ꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉ Kayah Li
Cham ꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙ Cham
Balinese ᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙ Balinese
Javanese ꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙ Javanese
Sundanese ᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹ Sundanese
Mongolian ᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙ Mongolian
Osmanya U+104A0..U+104A9 Osmanya
N'Ko ߀߁߂߃߄߅߆߇߈߉ N'Ko
Vai ꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩ Vai
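Because all of these digits have the Numeric_Type Decimal, a sequence of them can be converted positionally regardless of script. The following is a minimal sketch, assuming Python's standard `unicodedata` module and text stored with the most significant digit first; `decimal_string_to_int` is a hypothetical helper name.

```python
import unicodedata


def decimal_string_to_int(text: str) -> int:
    """Convert a string of Numeric_Type=Decimal characters to an int."""
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            raise ValueError(f"not a decimal digit: U+{ord(ch):04X}")
        # Positional notation: shift the running value and add the digit.
        value = value * 10 + digit
    return value


# Arabic-Indic digits for 2024 (U+0662 U+0660 U+0662 U+0664).
print(decimal_string_to_int("\u0662\u0660\u0662\u0664"))
# Devanagari digits for 2024 (U+0968 U+0966 U+0968 U+096A).
print(decimal_string_to_int("\u0968\u0966\u0968\u096a"))
```

Python's built-in `int()` accepts such decimal digits as well, so the explicit loop mainly serves to show where the property lookup happens.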

Other blocks contain symbols derived from the European digits, such as circled numbers.

Letter-based numerals

Many numeral systems use the ordinary letters of a script to represent numbers. Unicode does not consider such letters numerals, and in most cases does not encode them a second time. Some numeral systems, however, have numerals that are based on letters but differ from them. The Unicode block Ancient Greek Numbers, for example, contains a series of acrophonic numerals for the Greek numeral system.

Roman numerals are a special case. The numbers from 1 to 12, as well as 50 (L), 100 (C), 500 (D) and 1000 (M), are encoded separately in the Unicode block Number Forms, together with characters for 5000 and 10,000. They are intended primarily for use with East Asian scripts, because in vertical layout they are not rotated by 90° as ordinary letters are. In other cases, Roman numerals should be composed of ordinary Latin letters.
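The relationship between the dedicated Roman numeral characters and ordinary Latin letters can be illustrated with compatibility normalization. The sketch below assumes Python's standard `unicodedata` module; `roman_to_int` and `ROMAN_VALUES` are hypothetical names, and the parser does not check that a numeral is well formed. Characters without a Latin-letter decomposition, such as the one for 5000, are rejected.

```python
import unicodedata

# Values of the Latin letters used in Roman numerals.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text: str) -> int:
    """Convert a Roman numeral, in letters or dedicated characters, to an int."""
    # NFKC replaces e.g. U+216B (ROMAN NUMERAL TWELVE) with the letters "XII".
    letters = unicodedata.normalize("NFKC", text).upper()
    try:
        values = [ROMAN_VALUES[ch] for ch in letters]
    except KeyError as exc:
        raise ValueError(f"unsupported character: {exc.args[0]!r}") from None
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (IV = 4, XC = 90).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


print(roman_to_int("\u216b"))   # dedicated character for twelve
print(roman_to_int("MCMXCIV"))  # ordinary Latin letters
```

For a single dedicated character, `unicodedata.numeric` returns the value directly; the parser is only needed once a numeral is spelled with several letters.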

Chinese numerals

The characters for Chinese numerals are encoded together with the other CJK characters in the Unicode block CJK Unified Ideographs. As with the European decimal digits, circled forms are also encoded. The older counting rod numerals have a block of their own, Counting Rod Numerals.
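Chinese numerals are not positional decimal digits, so their per-character values have to be combined with a small amount of logic. The following is a rough sketch, assuming Python's standard `unicodedata` module reports numeric values for the ideographs involved; `chinese_to_int` is a hypothetical helper that only handles simple numerals below 10,000 and does not verify that the input consists of CJK characters.

```python
import unicodedata


def chinese_to_int(text: str) -> int:
    """Parse a simple Chinese numeral below 10,000, e.g. 三百二十一."""
    total = 0
    pending = None  # a digit waiting for a multiplier such as 十 or 百
    for ch in text:
        value = unicodedata.numeric(ch, None)
        if value is None or value != int(value):
            raise ValueError(f"no integer value: U+{ord(ch):04X}")
        value = int(value)
        if value < 10:
            pending = value
        elif value in (10, 100, 1000):
            # A bare multiplier counts once, as in 十五 (15).
            total += (1 if pending is None else pending) * value
            pending = None
        else:
            raise ValueError(f"multiplier not handled: U+{ord(ch):04X}")
    return total + (pending or 0)


print(chinese_to_int("\u4e09\u767e\u4e8c\u5341\u4e00"))  # 三百二十一
print(chinese_to_int("\u5341\u4e94"))                    # 十五
```

Larger units such as 万 are deliberately left out, since they group the digits that precede them and would need a second level of accumulation.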

Other numerals

Other numerals are usually encoded in the same block as the letters of their script. Further blocks dedicated specifically to numerals are Aegean Numbers, Cuneiform Numbers and Punctuation, Coptic Epact Numbers and Sinhala Archaic Numbers.

Fractions

In addition to characters for whole numbers, Unicode contains a number of fractions from various numeral systems. The European ones are mainly in the Unicode block Number Forms. North Indian fractions are in the block Common Indic Number Forms, and ancient Greek fractions are encoded with the other ancient Greek numerals. Here, too, further fractions share a block with the letters of their script.
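The value of a fraction character can be read like that of any other numeral. As a small sketch, assuming Python's standard `unicodedata` and `fractions` modules: `unicodedata.numeric` returns a float, so the hypothetical helper `exact_value` recovers a rational number by limiting the denominator. The bound of 1000 is an assumption chosen for illustration, not a value taken from the standard.

```python
import unicodedata
from fractions import Fraction


def exact_value(ch: str) -> Fraction:
    """Return the numeric value of a character as an exact fraction."""
    # Raises ValueError if the character has no numeric value.
    value = unicodedata.numeric(ch)
    # The float is only an approximation of e.g. 1/3; snap it back to the
    # nearest fraction with a small denominator (assumed bound: 1000).
    return Fraction(value).limit_denominator(1000)


# VULGAR FRACTION ONE THIRD, VULGAR FRACTION THREE QUARTERS,
# TIBETAN DIGIT HALF ZERO.
for ch in ["\u2153", "\u00be", "\u0f33"]:
    print(f"U+{ord(ch):04X}", exact_value(ch))
```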

Sources

  • Julie D. Allen et al.: The Unicode Standard. Version 6.2 - Core Specification. The Unicode Consortium, Mountain View, CA, 2012. ISBN 978-1-936213-07-8. Chapter 15.3: Numerals.

References

  1. DerivedNumericValues.txt, Unicode 7.0