Mathematical characters in Unicode
Unicode provides a wide range of letters and symbols specifically for use in mathematical formulas. These characters allow formulas to be written linearly; superscripts and subscripts for powers, indices or limits of integration are possible only to a limited extent, and multi-line structures such as matrices cannot be built at all. Such exact positioning requires higher-level protocols that carry, in addition to the plain formula text, instructions for its exact layout.
Characters that are used exclusively or mainly in formulas are identified as such by the Math property. Mathematical symbols can also be recognized by their general category.
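The category check can be sketched in Python. This is a minimal illustration, assuming the standard `unicodedata` module, which exposes the general category but not the Math property itself; the helper name `is_math_symbol` is hypothetical. The general category of mathematical symbols is `Sm` (Symbol, math). The Math property covers more characters than `Sm`, for instance the styled letters, so this check is narrower than the property.

```python
import unicodedata


def is_math_symbol(ch: str) -> bool:
    """Return True if ch has general category Sm (Symbol, math)."""
    return unicodedata.category(ch) == "Sm"


# U+2264 and U+221E are math symbols; "A" is a letter, "-" is punctuation.
for ch in ["\u2264", "+", "\u221e", "A", "-"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {is_math_symbol(ch)}")
```

Note that the ASCII hyphen-minus is classified as punctuation, whereas the dedicated minus sign U+2212 is a math symbol.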
In contrast to most other characters, these were encoded according to appearance rather than meaning. U+2264 (≤), U+2266 (≦) and U+2A7D (⩽), for example, are three separate characters for “less than or equal to” that differ only minimally. In addition to the ordinary characters, the Latin letters are encoded as separate characters in 13 different styles. Conversely, characters that look the same but serve different purposes are often encoded only once: U+2206 (∆), for example, is used as the Laplace operator, for the symmetric difference, in the calculus of finite differences, and in physical or chemical formulas for a change in a quantity.
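To see that the three “less than or equal to” characters really are distinct code points, one can list their character names. This is only a sketch using Python's standard `unicodedata` module; the list name `LESS_OR_EQUAL` is made up for the example.

```python
import unicodedata

# The three visually similar relation signs mentioned in the text.
LESS_OR_EQUAL = ["\u2264", "\u2266", "\u2a7d"]

for ch in LESS_OR_EQUAL:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

# The single character that serves several purposes (Laplace operator,
# symmetric difference, finite differences, change in a quantity).
increment = "\u2206"
print(f"U+{ord(increment):04X} {increment} {unicodedata.name(increment)}")
```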
The mathematical characters in Unicode come partly from existing standards such as ISO 9573-13; in addition, characters that were in use in mathematical or physical publications were included.
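In addition to the ordinary Latin letters, letters in special styles are also available.
|Style||Code points||Unicode block|
|normal||U+0041–U+005A, U+0061–U+007A||Basic Latin|
|bold||U+1D400–U+1D433||Mathematical Alphanumeric Symbols|
|italic||U+1D434–U+1D467 *||Mathematical Alphanumeric Symbols|
|bold italic||U+1D468–U+1D49B||Mathematical Alphanumeric Symbols|
|script (calligraphic)||U+1D49C–U+1D4CF *||Mathematical Alphanumeric Symbols|
|bold script (calligraphic)||U+1D4D0–U+1D503||Mathematical Alphanumeric Symbols|
|Fraktur||U+1D504–U+1D537 *||Mathematical Alphanumeric Symbols|
|double-struck||U+1D538–U+1D56B *||Mathematical Alphanumeric Symbols|
|bold Fraktur||U+1D56C–U+1D59F||Mathematical Alphanumeric Symbols|
|sans-serif||U+1D5A0–U+1D5D3||Mathematical Alphanumeric Symbols|
|sans-serif bold||U+1D5D4–U+1D607||Mathematical Alphanumeric Symbols|
|sans-serif italic||U+1D608–U+1D63B||Mathematical Alphanumeric Symbols|
|sans-serif bold italic||U+1D63C–U+1D66F||Mathematical Alphanumeric Symbols|
|monospace||U+1D670–U+1D6A3||Mathematical Alphanumeric Symbols|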
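Because each style occupies a contiguous range, a plain letter can be mapped to its styled counterpart by adding an offset. The following sketch does this for the bold range from the table; the function name `to_math_bold` and the constants are hypothetical, and the assumption that the 26 lowercase letters follow directly after the 26 capitals is taken from the size of the range (U+1D400 to U+1D433 holds 52 code points).

```python
import unicodedata

BOLD_UPPER = 0x1D400  # bold capital A, start of the range in the table
BOLD_LOWER = 0x1D41A  # bold small a, 26 code points later


def to_math_bold(text: str) -> str:
    """Map ASCII letters to mathematical bold; leave everything else alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


bold = to_math_bold("Ax = b")
print(bold)

# Compatibility normalization (NFKC) folds the styled letters back to ASCII.
print(unicodedata.normalize("NFKC", bold))

# Not every range is gap-free: the slot where italic small h would fall
# (U+1D455) is unassigned, so a pure offset scheme needs exceptions there.
print(unicodedata.name("\U0001d455", "<unassigned>"))
```

The offset approach works without exceptions for the bold range; for the ranges marked with an asterisk, a lookup table for the missing positions would be needed.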
In combination with accents, the letters i and j are also used in a dotless variant; these are encoded as italic letters in the Mathematical Alphanumeric Symbols block at code points U+1D6A4 and U+1D6A5.
Small subscript letters for indices are found mostly in the Superscripts and Subscripts block; as superscript letters for powers, that block offers only ⁱ (U+2071) and ⁿ (U+207F). In a mathematical context, these characters are discouraged in favor of proper formatting.
In addition to the ordinary Greek letters, Greek letters are also available in a few selected styles. Some characters, such as the lowercase phi, are encoded in two different glyph variants. There are also some symbols derived from Greek letters, such as nabla.
|Style||Code points||Unicode block|
|normal||U+0391–U+03E1 **||Greek and Coptic|
|bold||U+1D6A8–U+1D6E1, U+1D7CA, U+1D7CB||Mathematical Alphanumeric Symbols|
|italic||U+1D6E2–U+1D71B||Mathematical Alphanumeric Symbols|
|bold italic||U+1D71C–U+1D755||Mathematical Alphanumeric Symbols|
|sans-serif bold||U+1D756–U+1D78F||Mathematical Alphanumeric Symbols|
|sans-serif bold italic||U+1D790–U+1D7C9||Mathematical Alphanumeric Symbols|
|double-struck||only a few characters||Letterlike Symbols|
The Arabic Mathematical Alphabetic Symbols block encodes Arabic letters for use in formulas. Some Hebrew letters used in mathematics are encoded in the Letterlike Symbols block as characters that, unlike the Hebrew letters themselves, do not change the left-to-right writing direction. In individual cases, letters from other alphabets can also appear in mathematical formulas, such as the Cyrillic Ш (U+0428) for the Tate-Shafarevich group.
Letters in mathematical formulas often carry accents and other diacritical marks, such as the circumflex or macron. In physics especially, dots placed above a letter denote time derivatives. There are also accents that are used only in formulas, such as the arrow above a letter that marks a vector. In addition to the Combining Diacritical Marks block, the Combining Diacritical Marks for Symbols block is used for this purpose.
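As an illustration of how such accents are attached, the sketch below appends a combining mark to a base letter. It assumes U+0307 (combining dot above) for the time derivative and U+20D7 (combining right arrow above) for the vector arrow; the constant names are invented for the example.

```python
import unicodedata

DOT_ABOVE = "\u0307"     # from Combining Diacritical Marks
VECTOR_ARROW = "\u20d7"  # from Combining Diacritical Marks for Symbols

x_dot = "x" + DOT_ABOVE     # time derivative of x
v_vec = "v" + VECTOR_ARROW  # vector v

for mark in (DOT_ABOVE, VECTOR_ARROW):
    # A non-zero combining class marks a character that attaches to its base.
    print(unicodedata.name(mark), unicodedata.combining(mark))

# NFC composes base + mark where a precomposed character exists
# and otherwise leaves the two-code-point sequence in place.
print(len(unicodedata.normalize("NFC", x_dot)))
print(len(unicodedata.normalize("NFC", v_vec)))
```

Whether the mark is drawn at the right height and width over the base letter depends on the font and the rendering system.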
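The ordinary digits are also encoded several times, once for each of a number of styles.
|Style||Code points||Unicode block|
|normal||U+0030–U+0039||Basic Latin|
|bold||U+1D7CE–U+1D7D7||Mathematical Alphanumeric Symbols|
|double-struck||U+1D7D8–U+1D7E1||Mathematical Alphanumeric Symbols|
|sans-serif||U+1D7E2–U+1D7EB||Mathematical Alphanumeric Symbols|
|sans-serif bold||U+1D7EC–U+1D7F5||Mathematical Alphanumeric Symbols|
|monospace||U+1D7F6–U+1D7FF||Mathematical Alphanumeric Symbols|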
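The digit ranges can be used in the same offset fashion. The sketch below converts ASCII digits to the double-struck range from the table and shows that Python still treats the results as decimal digits; the function name `to_double_struck_digits` is hypothetical.

```python
import unicodedata

DOUBLE_STRUCK_ZERO = 0x1D7D8  # start of the double-struck digits in the table


def to_double_struck_digits(text: str) -> str:
    """Replace ASCII digits by double-struck digits; keep other characters."""
    return "".join(
        chr(DOUBLE_STRUCK_ZERO + ord(c) - ord("0")) if "0" <= c <= "9" else c
        for c in text
    )


styled = to_double_struck_digits("2024")
print(styled)

# The styled digits carry the same decimal values as the ASCII digits,
# so int() accepts them directly.
print(unicodedata.digit(styled[0]), int(styled))
```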
Small superscript and subscript digits for powers and indices are found in the Superscripts and Subscripts block (superscript 1, 2 and 3 are in the Latin-1 Supplement block). In a mathematical context, these characters are discouraged in favor of proper formatting, although uses such as km² are common.
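The split of the superscript digits across two blocks shows up as soon as one builds a translation table. A small sketch, with the hypothetical names `SUPERSCRIPT_DIGITS` and `to_superscript`:

```python
import unicodedata

# Superscript 1, 2, 3 live in Latin-1 Supplement (U+00B9, U+00B2, U+00B3);
# the remaining digits are in Superscripts and Subscripts (U+2070, U+2074-U+2079).
SUPERSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079",
)


def to_superscript(digits: str) -> str:
    """Map ASCII digits to their superscript forms."""
    return digits.translate(SUPERSCRIPT_DIGITS)


area_unit = "km" + to_superscript("2")
print(area_unit)

# Compatibility normalization discards the superscript distinction,
# which is one reason markup is preferred for real exponents.
print(unicodedata.normalize("NFKC", area_unit))
```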
Some fractions are encoded in the Number Forms and Latin-1 Supplement blocks; other fractions can be built with the fraction slash (U+2044) from the General Punctuation block. The intent is that the rendering system identifies the digits before and after the slash and formats each run as a whole as numerator and denominator. There is no direct support for fractions whose numerator or denominator consists of letters for variables rather than ordinary digits.
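The relationship between the precomposed fractions and the fraction slash can be sketched as follows. The helper `make_fraction` is hypothetical; whether the resulting sequence is actually drawn as a stacked or slanted fraction depends on the font and renderer.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # from General Punctuation


def make_fraction(numerator: int, denominator: int) -> str:
    """Build digits + fraction slash + digits for a renderer to format."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


half = "\u00bd"  # precomposed one half from Latin-1 Supplement
print(unicodedata.numeric(half))

# The compatibility decomposition of the precomposed character is exactly
# the digit, fraction slash, digit sequence.
print(unicodedata.normalize("NFKC", half) == make_fraction(1, 2))

# A fraction with no precomposed character.
print(make_fraction(22, 7))
```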
Operators, relation signs
Four blocks are provided for operators, relation symbols and other mathematical symbols: Mathematical Operators, Miscellaneous Mathematical Symbols-A, Miscellaneous Mathematical Symbols-B and Supplemental Mathematical Operators. Some elementary symbols are in the Basic Latin block. If a negated relation is not encoded separately, it can be formed with a combining character.
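Negation by a combining character can be sketched like this. The example assumes U+0338 (combining long solidus overlay) as the negating mark, and the function name `negate` is made up for the illustration.

```python
import unicodedata

LONG_SOLIDUS_OVERLAY = "\u0338"  # assumed negation mark for this sketch


def negate(relation: str) -> str:
    """Append the overlay and compose to a single character where one exists."""
    return unicodedata.normalize("NFC", relation + LONG_SOLIDUS_OVERLAY)


# Relations with a separately encoded negation compose to one code point.
for rel in ["=", "<", "\u2208"]:
    negated = negate(rel)
    print(rel, negated, [f"U+{ord(c):04X}" for c in negated])

# For other relations the base and the overlay remain separate code points;
# the length shows which case applies.
print(len(negate("\u2a7d")))
```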
Some other blocks also contain symbols that can appear in formulas. These include the Miscellaneous Technical block, which defines, among other things, pieces from which large brackets can be assembled. The Geometric Shapes block contains various triangles, squares, circles and other shapes for general use. In addition to spaces of various widths, the General Punctuation block contains some invisible characters that can structure formulas semantically: an implicit multiplication, for example, can be expressed with the character U+2062.
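A short sketch of the invisible multiplication character: it takes up a code point in the string without contributing a visible glyph. The helper `implicit_product` is hypothetical.

```python
import unicodedata

INVISIBLE_TIMES = "\u2062"  # the character named in the text


def implicit_product(*factors: str) -> str:
    """Join factors with the invisible multiplication character."""
    return INVISIBLE_TIMES.join(factors)


expr = implicit_product("2", "x", "y")

# Three visible characters, but five code points in total.
print(len(expr), [f"U+{ord(c):04X}" for c in expr])

# It is a format character (category Cf), not a symbol or a space.
print(unicodedata.name(INVISIBLE_TIMES), unicodedata.category(INVISIBLE_TIMES))
```

Software that compares or searches formula text may need to strip such characters first, since `expr` is not equal to the plain string `"2xy"`.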
For some symbols, glyph variants can be requested with variation selectors. The negation stroke in U+2268 (≨) is normally slanted, whereas the sequence <U+2268, U+FE00> should be displayed with a vertical stroke.
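The sequence from the text can be built and inspected as follows. This is a sketch only: the helper `strip_variation_selectors` is hypothetical, it handles only the selectors U+FE00 to U+FE0F, and whether the variant glyph appears depends on font support.

```python
import unicodedata

BASE = "\u2268"  # the relation sign from the text
VS1 = "\ufe00"   # the variation selector from the text

variant = BASE + VS1
print(unicodedata.name(BASE))
print(unicodedata.name(VS1))

# The selector is a separate code point that survives normalization.
print(len(variant), unicodedata.normalize("NFC", variant) == variant)


def strip_variation_selectors(text: str) -> str:
    """Drop U+FE00-U+FE0F, e.g. before comparing formula text."""
    return "".join(c for c in text if not ("\ufe00" <= c <= "\ufe0f"))


print(strip_variation_selectors(variant) == BASE)
```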
TeX and LaTeX are older than Unicode and therefore traditionally use specially adapted fonts to render formulas. The LaTeX package unicode-math accepts most Unicode mathematical characters in the input in place of the usual commands, and also uses them in the output. Other formula-formatting systems, such as MathML, use Unicode characters directly throughout and thus benefit from the large repertoire of Unicode characters for formulas.
In some programming languages, preprocessors and similar techniques make it possible, to a certain extent, to use these Unicode characters in program code and thus make formulas more readable.
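Python is one example of the limits involved, shown here without any preprocessor: it accepts letters such as Greek characters in identifiers, but not characters that are classified as mathematical symbols. The sketch below uses the built-in `str.isidentifier` check; the dictionary `candidates` and the variable names are invented for the illustration.

```python
# Candidate names, written with escapes so the code points are unambiguous.
candidates = {
    "Greek capital delta + t": "\u0394t",
    "increment sign + t": "\u2206t",
    "Greek small alpha": "\u03b1",
    "n-ary summation": "\u2211",
}

for label, name in candidates.items():
    print(f"{label:26} {name!r:8} identifier: {name.isidentifier()}")

# Greek letters work as ordinary variable names.
ω = 2.0
Δt = 0.1
φ = ω * Δt
print(φ)
```

The look-alike pair Δ (Greek letter) and ∆ (increment sign) behaves differently here, so copying a formula from a document into code may require replacing some characters.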