Punctuation marks in Unicode

from Wikipedia, the free encyclopedia

Most punctuation marks in Unicode are encoded for use with all writing systems rather than for a single script. Unlike letters and other characters, punctuation marks are encoded according to their appearance, not their function. The ordinary full stop, for example, has a wide variety of functions: it marks the end of a sentence, abbreviations and ordinal numbers, and it serves as a decimal point in English and as a thousands separator in German. The semicolon is used as a question mark in Greek. Depending on the context, a punctuation mark can also be rendered differently: in most languages the full stop is drawn as a round dot, whereas in Armenian it should be square. Only in a few cases are special punctuation marks encoded for particular scripts; these are then usually in the same block as the characters of that script.
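The Greek case can be illustrated with Python's standard `unicodedata` module. The following is only a minimal sketch: Unicode does contain a separate code point named GREEK QUESTION MARK (U+037E), but it is canonically equivalent to the ordinary semicolon (U+003B), so normalization folds the two together. The helper name `describe` is made up for this illustration.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"  # looks like a semicolon
SEMICOLON = "\u003b"


def describe(ch):
    """Return code point, character name and general category of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{unicodedata.category(ch)}]"


# NFC normalization replaces the Greek question mark by the semicolon,
# because the two characters are canonically equivalent.
normalized = unicodedata.normalize("NFC", GREEK_QUESTION_MARK)

print(describe(GREEK_QUESTION_MARK))
print(describe(normalized))
print(normalized == SEMICOLON)
```

After normalization, the function of the mark (question mark or semicolon) can only be recovered from the surrounding text, not from the code point.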

Blocks with punctuation marks

The most important punctuation marks are in the two blocks Basic Latin and Latin-1 Supplement, which were taken over from the ASCII and Latin-1 standards. There are also several blocks devoted to punctuation: the General Punctuation block contains punctuation marks for all writing systems, and the Supplemental Punctuation block contains some rare and historical punctuation marks. The CJK Symbols and Punctuation block contains punctuation marks that are used with the East Asian scripts. Additional punctuation marks for these scripts, which were encoded for compatibility with other standards, are in the blocks Vertical Forms, CJK Compatibility Forms and Small Form Variants.
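Python's standard library does not expose block names, so the sketch below hard-codes the code point ranges of the blocks named above (as published in the Unicode block list). The names `BLOCKS`, `block_of` and `punctuation_in` are chosen for this illustration only.

```python
import unicodedata

# (first code point, last code point, block name)
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x2E00, 0x2E7F, "Supplemental Punctuation"),
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
    (0xFE10, 0xFE1F, "Vertical Forms"),
    (0xFE30, 0xFE4F, "CJK Compatibility Forms"),
    (0xFE50, 0xFE6F, "Small Form Variants"),
]


def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def punctuation_in(start, end):
    """List the characters in [start, end] whose general category is P*."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)).startswith("P")
    ]


for start, end, name in BLOCKS:
    # Counts depend on the Unicode version shipped with the interpreter.
    print(f"{name}: {len(punctuation_in(start, end))} punctuation marks")
```

For example, `block_of("\u3001")` yields "CJK Symbols and Punctuation" for the ideographic comma.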

Encoded characters

Unicode divides punctuation marks into several classes according to their general category.
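The general category of a character can be queried with `unicodedata.category`. The sketch below maps the seven punctuation categories defined by the Unicode Standard to their descriptions; the dictionary and function names are assumptions made for this example.

```python
import unicodedata

# The seven general categories that Unicode assigns to punctuation.
PUNCTUATION_CATEGORIES = {
    "Pc": "connector punctuation",
    "Pd": "dash punctuation",
    "Ps": "open punctuation",
    "Pe": "close punctuation",
    "Pi": "initial quote punctuation",
    "Pf": "final quote punctuation",
    "Po": "other punctuation",
}


def punctuation_class(ch):
    """Return the punctuation class of ch, or None if ch is not punctuation."""
    return PUNCTUATION_CATEGORIES.get(unicodedata.category(ch))


for ch in "_-()\u00ab\u00bb.":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {punctuation_class(ch)}")
```

Characters such as `+` or `$` return `None` here, because Unicode classifies them as symbols rather than punctuation.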

Horizontal strokes

While the ASCII character set defined only one horizontal stroke, Unicode encodes a large number of such strokes with different widths and different behavior in the Unicode line breaking algorithm. Depending on the length, a distinction is made between quarter-em, en (half-em), em and two-em dashes.
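A few of these strokes can be inspected as follows. This is a sketch using a hand-picked selection of code points, not a complete list; the line breaking class is not available in Python's standard library, so only the name and general category are shown.

```python
import unicodedata

# A hand-picked selection of horizontal strokes (illustrative, not complete).
DASHES = [
    "\u002d",  # the single stroke inherited from ASCII
    "\u2010",
    "\u2012",
    "\u2013",
    "\u2014",
    "\u2015",
    "\u2e3a",
]


def dash_info(ch):
    """Return (code point label, character name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in DASHES:
    print(*dash_info(ch))
```

All characters in this selection share the general category Pd (dash punctuation); they differ in width and in how they interact with line breaking.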

Paired punctuation marks

Some punctuation marks usually appear in pairs: brackets and, depending on the language, quotation marks. Most brackets have the special property that their appearance adapts to the writing direction: under the Unicode bidirectional algorithm, they are mirrored in right-to-left text compared with their usual display.
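Whether a character is mirrored in right-to-left text can be checked with `unicodedata.mirrored`. The sketch below also guesses the mirror partner by swapping LEFT and RIGHT in the character name; this is only a heuristic assumed for illustration, since the authoritative pairs are listed in the Unicode data file BidiMirroring.txt. The function name `mirror_partner` is hypothetical.

```python
import unicodedata


def mirror_partner(ch):
    """Guess the mirrored counterpart of ch from its name (heuristic only)."""
    if not unicodedata.mirrored(ch):
        return None
    name = unicodedata.name(ch, "")
    if "LEFT" in name:
        swapped = name.replace("LEFT", "RIGHT")
    elif "RIGHT" in name:
        swapped = name.replace("RIGHT", "LEFT")
    else:
        return None
    try:
        return unicodedata.lookup(swapped)
    except KeyError:
        # No character with the swapped name exists.
        return None


for ch in "([{\u00ab\u201c.":
    print(f"U+{ord(ch):04X} mirrored={bool(unicodedata.mirrored(ch))} "
          f"partner={mirror_partner(ch)!r}")
```

Brackets and the angle quotation marks are reported as mirrored, while curly quotation marks such as U+201C are not, which matches the statement that only most, not all, paired marks adapt to the writing direction.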

Sources

  • Julie D. Allen et al.: The Unicode Standard. Version 6.2 - Core Specification. The Unicode Consortium, Mountain View, CA, 2012, ISBN 978-1-936213-07-8. Chapter 6.2: General Punctuation.