Teletext character sets (ETSI EN 300 706)
The following tables describe the 7-bit character sets defined in ETSI EN 300 706, the teletext standard used in Europe.
General
The first 32 positions (00 hex to 1F hex) of the character sets are not defined. On a simple Level 1 teletext page, however, these codes are used as control characters.
The character 24 hex represents the general currency sign (¤) in the Latin G0 primary character set ("Standard" variant) and the dollar sign ($) in the other G0 primary character sets.
The character 2A hex in the G0 primary character sets represents either the asterisk (*) or the at sign (@), depending on the control.
The filled rectangle at position 7F hex in the G0 primary character sets and in some G2 supplementary character sets is as large as the maximum extent of all letters without descenders. It has no fixed Unicode assignment; it is encoded here like the character FE hex (■) of the DOS character sets, which many software-based decoders also do. How the Unicode character is drawn depends heavily on the font, but at least in the "Courier" font family the black square (■, 25A0 hex) largely matches the example layout given in ETSI EN 300 706. In the Arabic G0 primary character set, however, the rectangle is drawn slightly shorter than the Arabic letter alif maqṣūra (ﻯ) at position 70 hex, although not all decoders do this.
The G2 supplementary character sets and the G3 character set ("high-resolution graphics") are supported from teletext presentation Level 1.5 onwards. Many Level 1.5 decoders, however, still support only a limited repertoire of these sets.
Legend
A | Γ | Basic alphabet letter (Latin / non-Latin script) |
ß | ά | Special letter or addition |
` | ΄ | Diacritical mark (single) |
O | Diacritical mark (combining) | |
2 | ٢ | Digit of the number system |
½ | Numeral | |
@ | ₪ | Punctuation marks or special characters |
O | Combining special character | |
▌ | ◣ | Graphic or frame element ( defined / not defined in Unicode ) |
␠ | RLM | Spaces or control characters |
Undefined character | ||
| ¦ | Characters with layout variations (often due to the low resolution or historical reasons) | |
41 | 41 | See notes on the table (unique / different codings ) |
Α A | ﺏ ﺐ | Context-dependent meaning (identical layout / suitable form ) |
У (Y) | ﺁ (ﺂ) | Context-dependent meaning (different layout / missing form) |
Ë | $ | Different codings ( depending on the control or the decoder) |
Each character is listed with its Unicode number. For characters without a Unicode assignment ("N / A"), a descriptive name is used that is based on the names of similar Unicode characters.
Latin
The Latin G0 ("Standard" variant) and G2 character sets are essentially identical to the 8-bit character set ISO 6937-2:1983/Add 1:1989 (ISO-IR-142), supplemented by the two characters A6 hex (#) and A8 hex (¤) from the equivalent 8-bit character set ITU T.61 (see also the current version, ISO 6937:2001). The G2 supplementary character set corresponds to the characters A0 hex to FF hex.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2_ | ␠ 0020 | ! 0021 | " " 0022 | # ⋕ 0023 | ¤ 00A4 | % 0025 | & 0026 | ' ' 0027 | ( 0028 | ) 0029 | * ∗ / @ | + 002B | , 002C | - 002D | . 002E | / 002F |
| 3_ | 0 0030 | 1 0031 | 2 0032 | 3 0033 | 4 0034 | 5 0035 | 6 0036 | 7 0037 | 8 0038 | 9 0039 | : 003A | ; 003B | < 003C | = 003D | > 003E | ? 003F |
| 4_ | @ 0040 | A 0041 | B 0042 | C 0043 | D 0044 | E 0045 | F 0046 | G 0047 | H 0048 | I 0049 | J 004A | K 004B | L 004C | M 004D | N 004E | O 004F |
| 5_ | P 0050 | Q 0051 | R 0052 | S 0053 | T 0054 | U 0055 | V 0056 | W 0057 | X 0058 | Y 0059 | Z 005A | [ 005B | \ 005C | ] 005D | ^ 005E | _ 005F |
| 6_ | ` ‵ 0060 | a 0061 | b 0062 | c 0063 | d 0064 | e 0065 | f 0066 | g 0067 | h 0068 | i 0069 | j 006A | k 006B | l 006C | m 006D | n 006E | o 006F |
| 7_ | p 0070 | q 0071 | r 0072 | s 0073 | t 0074 | u 0075 | v 0076 | w 0077 | x 0078 | y 0079 | z 007A | { 007B | \| ¦ 007C | } 007D | ~ ~ 007E | ■ 25A0 |
The character 7F hex (■) is coded differently from ISO 6937.
In the example layout of ETSI EN 300 706, the double quotation mark (") at position 22 hex is drawn in its typographic form as an English closing quotation mark (”, 201D hex). The character should nevertheless be encoded as the neutral variant in accordance with ISO 6937, so that it also works visually and semantically as an English opening quotation mark (“). In addition, the typographic variant also appears at position 3A hex of the Latin G2 supplementary character set, with a different example layout that looks more like a closing quotation mark.
In the example layout of ETSI EN 300 706, the number sign (#) at position 23 hex is drawn with upright vertical strokes. This is only a layout variation, probably due to the low resolution.
In the example layout of ETSI EN 300 706, the apostrophe (') at position 27 hex is drawn in its typographic form. It could also be encoded with the visually closer Unicode characters right single quotation mark (’, 2019 hex) or modifier letter apostrophe (ʼ, 02BC hex), but both would deviate from ISO 6937 and would be unsuitable, visually and semantically, when the character is used as an English opening single quotation mark (‘). In addition, the typographic variant also appears at position 39 hex of the Latin G2 supplementary character set, with a different example layout that looks more like a closing quotation mark.
The coding of the character 2A hex depends on the control.
In the example layout of ETSI EN 300 706, the asterisk (*) at position 2A hex is drawn large, six-pointed, standing on one point and vertically centred. It could also be encoded with the visually closer Unicode character asterisk operator (∗, 2217 hex), but this would deviate from ISO 6937.
According to EBU Tech 3232-a and ITU T.61, the hyphen-minus (-) at position 2D hex can also be encoded, depending on context, as a hyphen (‐, 2010 hex) or as a minus sign (−, 2212 hex). The character can also be used as an en dash (–, 2013 hex). For the English em dash (2014 hex), however, it is better to use the horizontal bar (―) at position 60 hex of the "English" variant or at position 50 hex of the Latin G2 supplementary character set, or two consecutive hyphen-minus characters.
The capital letter I at position 49 hex serves as the uppercase form both of the lowercase letter i at position 69 hex and of the dotless lowercase letter ı, which is found at position 60 hex in the "Turkish" variant, at position 5F hex in the "Romanian" variant and at position 75 hex in the Latin G2 supplementary character set. Conversely, the lowercase letter i at position 69 hex serves as the lowercase form both of the capital letter I at position 49 hex and of the dotted capital letter İ, which is found at position 40 hex in the "Turkish" variant and can be formed by the corresponding combination in the Latin G2 supplementary character set. Unicode likewise does not distinguish between the two visually identical characters.
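The case pairing described in the note above can be illustrated with Unicode strings. The following is a minimal sketch, not part of the standard: the helper names `upper_tr` and `lower_tr` are hypothetical, and the sketch assumes the text is known to be Turkish, since the shared code points for I and i carry no language information.

```python
def upper_tr(text: str) -> str:
    """Uppercase with Turkish pairing: i -> İ (U+0130), ı -> I."""
    # Replace the dotted i first; str.upper() already maps ı (U+0131) to I.
    return text.replace("i", "\u0130").upper()


def lower_tr(text: str) -> str:
    """Lowercase with Turkish pairing: I -> ı (U+0131), İ -> i."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


if __name__ == "__main__":
    print(upper_tr("iki"), lower_tr("ISI"))
```

Without such a helper, Python's default `str.upper()` and `str.lower()` pair I with i, which is the behaviour expected for the other national variants.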
In the example layout of ETSI EN 300 706, the circumflex (^) at position 5E hex is drawn large and raised, as is also common in modern print.
In the example layout of ETSI EN 300 706, the underscore (_) at position 5F hex does not join up on the left and right, which is unusual in modern print.
In the example layout of ETSI EN 300 706, the spacing grave accent (`) at position 60 hex has the size and height of a mirrored counterpart to the typographic form of the apostrophe (') at position 27 hex, but it retains the straight stroke and slant of a grave accent. The character could possibly also be used as an English opening single quotation mark (‛, 201B hex), but this would deviate from ISO 6937 and would not fit semantically.
In the example layout of ETSI EN 300 706, the vertical bar (|) at position 7C hex is drawn with a gap in the middle (and does not join up at the top and bottom). It could also be encoded with the visually closer Unicode character broken bar (¦, 00A6 hex), but this would deviate from ISO 6937. Moreover, the broken form is only a layout variation with historical origins.
In the example layout of ETSI EN 300 706, the tilde (~) at position 7E hex is drawn large and raised; Unicode does not define this form as a separate character. The spacing small tilde (˜, 02DC hex) matches in height but is too small. According to EBU Tech 3232-a and ITU T.101, the Unicode character overline (‾, 203E hex) or possibly the spacing macron (¯, 00AF hex) can be used as an alternative coding, but both would deviate from ISO 6937 and, unlike in ITU T.101, usually join up on the left and right.
The coding of the other characters framed in bold depends on the control and the selected national variant.
| Variant | Selection bits | 23 | 24 | 40 | 5B | 5C | 5D | 5E | 5F | 60 | 7B | 7C | 7D | 7E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| default | | # ⋕ 0023 | ¤ 00A4 | @ 0040 | [ 005B | \ 005C | ] 005D | ^ 005E | _ 005F | ` ‵ 0060 | { 007B | \| ¦ 007C | } 007D | ~ ~ 007E |
| Czech / Slovak | 06, 16, 46 | # ⋕ 0023 | ů 016F | č 010D | ť tˇ 0165 | ž 017E | ý 00FD | í 00ED | ř 0159 | é 00E9 | á 00E1 | ě 011B | ú 00FA | š 0161 |
| English | 00, 20, 80 | £ 00A3 | $ 0024 | @ 0040 | ← 2190 | ½ 00BD | → 2192 | ↑ 2191 | # ⋕ 0023 | ― 2015 | ¼ 00BC | ∥ 2225 | ¾ 00BE | ÷ 00F7 |
| Estonian | 42 | # ⋕ 0023 | õ 00F5 | Š 0160 | Ä 00C4 | Ö 00D6 | Ž 017D | Ü 00DC | Õ 00D5 | š 0161 | ä 00E4 | ö 00F6 | ž 017E | ü 00FC |
| French | 04, 14, 24, 84 | é 00E9 | ï 00EF | à 00E0 | ë 00EB | ê 00EA | ù 00F9 | î 00EE | # ⋕ 0023 | è 00E8 | â 00E2 | ô 00F4 | û 00FB | ç 00E7 |
| German | 01, 11, 21, 41 | # ⋕ 0023 | $ 0024 | § 00A7 | Ä 00C4 | Ö 00D6 | Ü 00DC | ^ 005E | _ 005F | ° 00B0 | ä 00E4 | ö 00F6 | ü 00FC | ß 00DF |
| Italian | 03, 13, 23 | £ 00A3 | $ 0024 | é 00E9 | ° 00B0 | ç 00E7 | → 2192 | ↑ 2191 | # ⋕ 0023 | ù 00F9 | à 00E0 | ò 00F2 | è 00E8 | ì 00EC |
| Latvian / Lithuanian | 43 | # ⋕ 0023 | $ 0024 | Š 0160 | ė 0117 | ę 0119 | Ž 017D | č 010D | ū 016B | š 0161 | ą 0105 | ų 0173 | ž 017E | į 012F |
| Polish | 10 | # ⋕ 0023 | ń 0144 | ą 0105 | Ż Ƶ 017B | Ś 015A | Ł 0141 | ć 0107 | ó 00F3 | ę 0119 | ż 017C | ś 015B | ł 0142 | ź 017A |
| Portuguese / Spanish | 05, 25 | ç 00E7 | $ 0024 | ¡ 00A1 | á 00E1 | é 00E9 | í 00ED | ó 00F3 | ú 00FA | ¿ 00BF | ü 00FC | ñ 00F1 | è 00E8 | à 00E0 |
| Romanian | 37 | # ⋕ 0023 | ¤ 00A4 | Ț 021A | Â 00C2 | Ș 0218 | Ă 0102 | Î 00CE | ı 0131 | ț 021B | â 00E2 | ș 0219 | ă 0103 | î 00EE |
| Serbian / Croatian / Slovenian | 35 | # ⋕ 0023 | Ë 00CB | Č 010C | Ć 0106 | Ž 017D | Đ 0110 | Š 0160 | ë 00EB | č 010D | ć 0107 | ž 017E | đ 0111 | š 0161 |
| Swedish / Finnish, Hungarian | 02, 12, 22 | # ⋕ 0023 | ¤ 00A4 | É 00C9 | Ä 00C4 | Ö 00D6 | Å 00C5 | Ü 00DC | _ 005F | é 00E9 | ä 00E4 | ö 00F6 | å 00E5 | ü 00FC |
| Turkish | 26, 66 | Tʟ N / A | ğ 011F | İ 0130 | Ş 015E | Ö 00D6 | Ç 00C7 | Ü 00DC | Ğ 011E | ı 0131 | ş 015F | ö 00F6 | ç 00E7 | ü 00FC |
In the national variants, ETSI EN 300 706 imprecisely draws the háček (ˇ) and the breve (˘) of the special letters in the same way. The languages of the three variants "Czech / Slovak", "Latvian / Lithuanian" and "Serbian / Croatian / Slovenian" use only the háček, whereas the languages of the two variants "Romanian" and "Turkish" use only the breve. The letters concerned are coded accordingly in each variant.
In the "Czech / Slovak" variant, ETSI EN 300 706 draws the lowercase letter t with háček (ť) at position 5B hex with the háček (ˇ) in its normal form. In modern print, however, the háček on the lowercase t often takes a form resembling an apostrophe (ʼ) to the right of the base character. The coding is the same, since this is only a layout variation.
The "English" variant is essentially identical to the 7-bit character set of the British Viewdata standard (ISO-IR-47); only the character 5F hex (#) is coded differently.
In the example layout of ETSI EN 300 706, the leftwards arrow (←) and the rightwards arrow (→) at positions 5B hex and 5D hex are drawn to match the horizontal bar (―) at position 60 hex, so that the bar can be joined to them seamlessly. In such a combination the horizontal bar is semantically best encoded as horizontal line extension (⎯, 23AF hex), although very few fonts currently support this Unicode character (correctly).
ETSI EN 300 706 draws the number sign (#) at position 5F hex in the same way as the number sign at position 23 hex of the "Standard" variant, and it is coded identically. In the Viewdata standard the character is coded as viewdata square (⌗, 2317 hex), which is visually similar but looks different when drawn correctly (see ISO-IR-47) and which has a different semantic meaning, as a terminator for addresses, that does not exist in teletext.
The horizontal bar (―) at position 60 hex can also be used as an English em dash (2014 hex); in the example layout of ETSI EN 300 706 it joins up on the left and right.
The double vertical bar at position 7C hex is coded as parallel to (∥) in accordance with EBU Tech 3232-a; in the example layout of ETSI EN 300 706 it does not join up at the top and bottom. Going by the character's name in the Viewdata standard, the visually identical Unicode character double vertical line (‖, 2016 hex) could also be used for coding. According to RFC 1345, however, the character is coded as parallel to there as well. Regardless of the primary coding, the character can be used equally as a parallel sign and as a double vertical line.
The "German" variant is essentially identical to the German 7-bit character set DIN 66003 (ISO-IR-21); only the character 60 hex (°) is coded differently.
In the "Latvian / Lithuanian" variant, ETSI EN 300 706 draws the two lowercase letters e with ogonek (ę) and i with ogonek (į) at positions 5C hex and 7E hex with a cedilla (¸), probably in error, since in Latvian and Lithuanian these letters are never written with a cedilla but with an ogonek (˛). An alternative coding is unnecessary, because the letters as drawn do not occur in any European language and should therefore never be needed.
In the "Polish" variant, ETSI EN 300 706 draws the capital letter Z with dot above (Ż) at position 5B hex as Z with stroke (Ƶ). It is normally not coded that way, however, because this is only a layout variation. Moreover, the corresponding lowercase letter at position 7B hex is drawn in ETSI EN 300 706 as z with dot above (ż).
In the "Romanian" variant, the letters T with comma below (Ț / ț) at positions 40 hex / 60 hex and S with comma below (Ș / ș) at positions 5C hex / 7C hex are coded with comma below (̦), in line with the Romanian standardisation authority (see also ISO 8859-16). Until the beginning of the 1990s, however, international standards regarded these merely as layout variations of T with cedilla (Ţ / ţ) and S with cedilla (Ş / ş), and ISO 6937 contains only the letters with cedilla (¸).
In the "Serbian / Croatian / Slovenian" variant, on some decoders the character 24 hex represents, instead of the capital letter E with diaeresis (Ë), the dollar sign ($, 0024 hex) or the vulgar fraction one half (½, 00BD hex).
The "Swedish / Finnish, Hungarian" variant is identical to the Swedish 7-bit character set SEN 850200 Annex C (ISO-IR-11).
In the "Turkish" variant, the symbol for the Turkish currency (Tʟ) at position 23 hex occurs in this form only in teletext; elsewhere it is normally written with the two separate capital letters TL. Unicode does, however, offer several currency signs that can be used for the Turkish currency: the Turkish lira sign (₺, 20BA hex), the lira sign (₤, 20A4 hex) and the pound sign (£, 00A3 hex).
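The national-variant substitution in the table above amounts to replacing 13 code positions of the Latin G0 set. The following is a minimal sketch with only three rows copied from the table ("default", "English", "German"). The names `POSITIONS`, `VARIANTS` and `decode_latin_g0` are hypothetical, and two behaviours are assumptions of this sketch rather than statements from the standard: the top bit is simply masked off (parity is assumed to have been checked elsewhere), and the control codes 00 hex to 1F hex are rendered as spaces.

```python
# The 13 code positions that differ between national variants.
POSITIONS = (0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E,
             0x5F, 0x60, 0x7B, 0x7C, 0x7D, 0x7E)

# One 13-character string per variant, in the order of POSITIONS.
VARIANTS = {
    "default": "#\u00a4@[\\]^_`{|}~",
    "english": "\u00a3$@\u2190\u00bd\u2192\u2191#\u2015\u00bc\u2225\u00be\u00f7",
    "german": "#$\u00a7\u00c4\u00d6\u00dc^_\u00b0\u00e4\u00f6\u00fc\u00df",
}


def decode_latin_g0(data: bytes, variant: str = "default") -> str:
    """Map 7-bit Latin G0 codes to Unicode for one national variant."""
    table = dict(zip(POSITIONS, VARIANTS[variant]))
    out = []
    for byte in data:
        code = byte & 0x7F          # assumption: parity bit already verified
        if code < 0x20:
            out.append(" ")         # assumption: control codes shown as space
        elif code == 0x7F:
            out.append("\u25a0")    # filled rectangle, coded as U+25A0
        else:
            out.append(table.get(code, chr(code)))
    return "".join(out)


if __name__ == "__main__":
    print(decode_latin_g0(b"Stra~e", "german"))
```

A table-driven design keeps each variant to a single string, so further rows of the table can be added without touching the decoding loop.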
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2_ | ␠ 0020 | ¡ 00A1 | ¢ 00A2 | £ 00A3 | $ 0024 | ¥ 00A5 | # ⋕ 0023 | § 00A7 | ¤ 00A4 | ‘ 2018 | “ 201C | « 00AB | ← 2190 | ↑ 2191 | → 2192 | ↓ 2193 |
| 3_ | ° 00B0 | ± 00B1 | ² 00B2 | ³ 00B3 | × 00D7 | µ 00B5 | ¶ 00B6 | · 00B7 | ÷ 00F7 | ’ 2019 | ” 201D | » 00BB | ¼ 00BC | ½ 00BD | ¾ 00BE | ¿ 00BF |
| 4_ | | ` 0060 | ´ 00B4 | ˆ 02C6 | ˜ 02DC | ¯ ˉ 00AF | ˘ 02D8 | ˙ 02D9 | ¨ 00A8 | ̣ N / A | ˚ 02DA | ¸ (̦) | _ 005F | ˝ 02DD | ˛ 02DB | ˇ 02C7 |
| Comb. | | ò 0300 | ó (ģ) | ô 0302 | õ 0303 | ō 0304 | ŏ 0306 | ȯ 0307 | ö 0308 | ọ 0323 | å 030A | ç (o̦) | o̲ 0332 | ő 030B | ǫ 0328 | ǒ 030C |
| 5_ | ― 2015 | ¹ 00B9 | ® 00AE | © 00A9 | ™ 2122 | ♪ 266A | ₠ 20A0 | ‰ 2030 | ∝ 221D | | | | ⅛ 215B | ⅜ 215C | ⅝ 215D | ⅞ 215E |
| 6_ | Ω 2126 | Æ 00C6 | Đ Ð | ª 00AA | Ħ 0126 | | Ĳ 0132 | Ŀ 013F | Ł 0141 | Ø 00D8 | Œ 0152 | º 00BA | Þ 00DE | Ŧ 0166 | Ŋ 014A | ŉ 0149 |
| 7_ | ĸ 0138 | æ 00E6 | đ 0111 | ð 00F0 | ħ 0127 | ı 0131 | ĳ 0133 | ŀ 0140 | ł 0142 | ø 00F8 | œ 0153 | ß 00DF | þ 00FE | ŧ 0167 | ŋ 014B | ■ 25A0 |
The six characters 20 hex (space), 49 hex (̣), 56 hex (₠), 57 hex (‰), 58 hex (∝) and 7F hex (■) are coded differently from ISO 6937 and ITU T.61.
In accordance with ISO 6937, the space at position 20 hex can also be coded as a no-break space (00A0 hex). Line-breaking behaviour is irrelevant in teletext, however.
In the example layout of ETSI EN 300 706, the leftwards arrow (←) and the rightwards arrow (→) at positions 2C hex and 2E hex are drawn to match the horizontal bar (―) at position 50 hex, so that the bar can be joined to them seamlessly. In such a combination the horizontal bar is semantically best encoded as horizontal line extension (⎯, 23AF hex), although very few fonts currently support this Unicode character (correctly).
ETSI EN 300 706 draws the spacing grave accent (`) at position 41 hex with a different example layout than in the Latin G0 primary character set ("Standard" variant), so it can also be encoded with the alternative Unicode character modifier letter grave accent (ˋ, 02CB hex). In modern print, however, the two characters are visually identical. The spacing acute accent (´) at position 42 hex could correspondingly be encoded with modifier letter acute accent (ˊ, 02CA hex), but this would deviate from ISO 6937.
Because ETSI EN 300 706 draws the spacing circumflex (ˆ) at position 43 hex and the spacing tilde (˜) at position 44 hex with a different example layout than in the Latin G0 primary character set ("Standard" variant), a more suitable alternative coding is used here than the one in ISO 6937 (see Windows-1252).
How the spacing macron (¯) at position 45 hex is drawn also depends heavily on the font, and it often looks more like the overline (‾). The visually closer Unicode character modifier letter macron (ˉ, 02C9 hex) can therefore be used instead, but this would deviate from ISO 6937.
According to EBU Tech 3232-a and ITU T.61, the diacritical mark in the form of a horizontal colon (¨) at position 48 hex can be used both as a diaeresis (trema) and as umlaut dots. Unicode likewise does not distinguish between these two visually identical marks. Where a semantic distinction is needed, the diaeresis can be encoded as the sequence combining grapheme joiner (034F hex) followed by combining diaeresis (0308 hex), while the umlaut is encoded in the ordinary way, either with combining diaeresis (0308 hex) alone or with the precomposed Unicode characters. The names of the Unicode characters should not cause confusion here.
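The effect of the grapheme joiner sequence mentioned above can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch; the variable names are hypothetical, and whether a consumer of the text actually honours the distinction is outside its scope.

```python
import unicodedata

CGJ = "\u034f"        # combining grapheme joiner
DIAERESIS = "\u0308"  # combining diaeresis

# Umlaut: base letter + combining diaeresis, composes to a single code point.
umlaut = unicodedata.normalize("NFC", "a" + DIAERESIS)

# Diaeresis (trema): the joiner between base and mark blocks composition,
# so the sequence survives normalization unchanged.
trema = unicodedata.normalize("NFC", "a" + CGJ + DIAERESIS)

if __name__ == "__main__":
    print([hex(ord(c)) for c in umlaut])
    print([hex(ord(c)) for c in trema])
```

Both strings render the same in most fonts; only the code point sequence differs.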
For historical reasons, the cedilla (¸) at position 4B hex can also be used as a comma below (̦).
In the example layout of ETSI EN 300 706, the combining underline and the associated spacing underline (_) at position 4C hex do not extend to the left and right edges; they are better implemented with the "underline" text style. Accordingly, the underscore at position 5F hex in the Latin G0 primary character set should also be encoded as a no-break space in the "underline" style, to avoid a double line and to obtain uniform lines. At least in the "Courier" font family, however, the underscore is visually compatible with the "underline" style.
The horizontal bar (―) at position 50 hex can also be used as an English em dash (2014 hex); in the example layout of ETSI EN 300 706 it joins up on the left and right.
The proportional-to sign (∝) at position 58 hex is, probably erroneously, called alpha in EBU Tech 3232-a. It should not be confused with the Greek lowercase letter alpha (α), since ETSI EN 300 706 shows the two characters with different example layouts.
According to EBU Tech 3232-a and ISO 6937, the character 62 hex can be used both as the capital letter D with stroke (Đ), paired with the lowercase letter (đ) at position 72 hex, and as the Icelandic capital letter eth (Ð), paired with the lowercase letter (ð) at position 73 hex. In case of doubt, the first Unicode number, which follows ISO 6937, should be chosen.
The character for the Afrikaans indefinite article (ŉ) at position 6F hex exists only in lowercase and is normally always written in lowercase. In all-capitals text it is written as the capital letter N at position 4E hex preceded by the modifier apostrophe (ʼ) at position 27 hex of the Latin G0 primary character set. Unicode does not define a capital form as a separate character either.
The formerly used Greenlandic letter kra (ĸ) at position 70 hex exists only as a lowercase letter. Its capital form is written as the capital letter K at position 4B hex followed by the modifier apostrophe (ʼ) at position 27 hex of the Latin G0 primary character set; Unicode does not define it as a separate character either.
The capital letter I at position 49 hex in the Latin G0 primary character set is used as the uppercase form of the Turkish dotless lowercase letter ı at position 75 hex. Unicode provides for the same pairing (see also the note on the Latin G0 primary character set).
The German letter Eszett (ß) at position 7B hex exists only as a lowercase letter. It is usually capitalised as two consecutive capital letters S at position 53 hex of the Latin G0 primary character set, a form that Unicode does not define as a separate character. It was not until 2008 that the capital Eszett (ẞ) was added to Unicode as a new character; it has been part of the official German orthography since 2017.
The alternative coding of the characters in the "Comb." row is used depending on the control. Which combinations are supported depends on the decoder; in case of doubt, only the combinations specified in ISO 6937 should be used. Accordingly, and unlike in Unicode, the lowercase letter g with cedilla (ģ) is formed by combining the lowercase letter g with the acute accent (´) at position 42 hex. With the Cyrillic and Greek G2 supplementary character sets, the combining characters should only be used together with the Latin G0 primary character set.
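A decoder that targets Unicode might turn a pair of G2 diacritic code and base letter into one precomposed character. The following is a minimal sketch of that idea. The names `G2_COMBINING` and `compose` are hypothetical; the combining code points for positions 42 hex and 4B hex (0301 and 0327) are assumptions, since the table above lists no number for them; and the sketch deliberately says nothing about how the pair is transported on the page.

```python
import unicodedata

# G2 positions 41 hex to 4F hex mapped to Unicode combining marks.
G2_COMBINING = {
    0x41: "\u0300", 0x42: "\u0301",  # 0301 is an assumption (acute)
    0x43: "\u0302", 0x44: "\u0303", 0x45: "\u0304", 0x46: "\u0306",
    0x47: "\u0307", 0x48: "\u0308", 0x49: "\u0323", 0x4A: "\u030a",
    0x4B: "\u0327",                  # 0327 is an assumption (cedilla)
    0x4C: "\u0332", 0x4D: "\u030b", 0x4E: "\u0328", 0x4F: "\u030c",
}


def compose(diacritic_code: int, base: str) -> str:
    """Combine a G2 diacritic code with a base letter into NFC Unicode."""
    # ISO 6937 convention noted above: g + acute stands for g with cedilla.
    if diacritic_code == 0x42 and base == "g":
        return "\u0123"
    return unicodedata.normalize("NFC", base + G2_COMBINING[diacritic_code])


if __name__ == "__main__":
    print(compose(0x48, "u"), compose(0x4F, "c"), compose(0x42, "g"))
```

Combinations without a precomposed Unicode character simply stay as base letter plus combining mark after normalization.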
Cyrillic
The Cyrillic G0 primary character sets are largely identical to the 7-bit character set GOST 13052 (adopted in ISO-IR-111), except that the uppercase and lowercase letters are swapped and thus arranged as in the other character sets.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2_ | ␠ 0020 | ! 0021 | " " 0022 | # ⋕ 0023 | $ 0024 | % 0025 | & 0026 | ' ' 0027 | ( 0028 | ) 0029 | * ∗ / @ | + 002B | , 002C | - 002D | . 002E | / 002F |
| 3_ | 0 0030 | 1 0031 | 2 0032 | 3 0033 | 4 0034 | 5 0035 | 6 0036 | 7 0037 | 8 0038 | 9 0039 | : 003A | ; 003B | < 003C | = 003D | > 003E | ? 003F |
| 4_ | Ч 0427 | А A | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х X | И 0418 | Ј 0408 | К 041A | Л 041B | М M | Н H | О O |
| 5_ | П 041F | Ќ 040C | Р P | С C | Т T | У (Y) | В B | Ѓ 0403 | Љ 0409 | Њ 040A | З 0417 | Ћ 040B | Ж 0416 | Ђ 0402 | Ш 0428 | Џ 040F |
| 6_ | ч 0447 | а a | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х x | и 0438 | ј 0458 | к 043A | л 043B | м (m) | н (h) | о o |
| 7_ | п 043F | ќ 045C | р p | с c | т (t) | у y | в (b) | ѓ 0453 | љ 0459 | њ 045A | з 0437 | ћ 045B | ж 0436 | ђ 0452 | ш 0448 | ■ 25A0 |
The two characters 24 hex ($) and 7F hex (■) and twelve Cyrillic letter pairs are coded differently from GOST 13052; the letters are arranged as closely as possible to the Latin G0 variant "Serbian / Croatian / Slovenian" (see the Serbian, Serbo-Croatian and Montenegrin Cyrillic alphabets). The Cyrillic letter dzhe (Џ) at position 5F hex exists only as a capital letter.
On some decoders, the character 24 hex represents, instead of the dollar sign ($), the Cyrillic capital letter Yo (Ё, 0401 hex) or the Latin capital letter E with diaeresis (Ë, 00CB hex).
The coding of the character 2A hex depends on the control.
The alternative coding of the other characters framed in bold is needed to complete the Latin alphabet coded in the Cyrillic G2 supplementary character set.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2_ | ␠ 0020 | ! 0021 | " " 0022 | # ⋕ 0023 | $ 0024 | % 0025 | ы 044B | ' ' 0027 | ( 0028 | ) 0029 | * ∗ / @ | + 002B | , 002C | - 002D | . 002E | / 002F |
| 3_ | 0 0030 | 1 0031 | 2 0032 | 3 0033 | 4 0034 | 5 0035 | 6 0036 | 7 0037 | 8 0038 | 9 0039 | : 003A | ; 003B | < 003C | = 003D | > 003E | ? 003F |
| 4_ | Ю 042E | А A | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х X | И 0418 | Й (Ѝ) | К 041A | Л 041B | М M | Н H | О O |
| 5_ | П 041F | Я 042F | Р P | С C | Т T | У (Y) | Ж 0416 | В B | Ь 042C | Ъ 042A | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ы 042B |
| 6_ | ю 044E | а a | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х x | и 0438 | й (ѝ) | к 043A | л 043B | м (m) | н (h) | о o |
| 7_ | п 043F | я 044F | р p | с c | т (t) | у y | ж 0436 | в (b) | ь 044C | ъ 044A | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ■ 25A0 |
The three characters 24 hex ($), 26 hex (ы) and 7F hex (■) are coded differently from GOST 13052. In addition, the two Cyrillic letter pairs at positions 59 hex / 79 hex (Ъ / ъ) and 5F hex / 26 hex (Ы / ы) are swapped, following the Bulgarian variant.
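The relationship described in this note can be sketched in code for the table directly above. The sketch relies on an assumption that is not stated in the source: that the upper half of Python's `koi8_r` codec preserves the GOST 13052 letter order (lowercase at C0 hex to DF hex, uppercase at E0 hex to FF hex). The names `OVERRIDES` and `cyrillic_g0_to_unicode` are hypothetical, and the function always returns the Cyrillic code point, ignoring the context-dependent Latin readings and the 2A hex alternative.

```python
# Positions that differ from the plain GOST-derived mapping in this table.
OVERRIDES = {
    0x26: "\u044b",  # ы moved here because 7F hex is the filled rectangle
    0x59: "\u042a",  # Ъ (swapped with Ы)
    0x5F: "\u042b",  # Ы (swapped with Ъ)
    0x79: "\u044a",  # ъ
    0x7F: "\u25a0",  # filled rectangle
}


def cyrillic_g0_to_unicode(code: int) -> str:
    """Map one 7-bit code of this Cyrillic G0 table to Unicode."""
    if not 0x20 <= code <= 0x7F:
        raise ValueError("expected a code between 20 hex and 7F hex")
    if code in OVERRIDES:
        return OVERRIDES[code]
    if code < 0x40:
        return chr(code)  # punctuation and digits match ASCII here
    # Case swap relative to GOST 13052: teletext uppercase 40-5F hex maps to
    # the KOI8-R uppercase block E0-FF hex, lowercase 60-7E hex to C0-DE hex.
    koi = code + 0xA0 if code < 0x60 else code + 0x60
    return bytes([koi]).decode("koi8_r")


if __name__ == "__main__":
    print("".join(cyrillic_g0_to_unicode(c) for c in range(0x40, 0x80)))
```

Keeping the exceptions in a small override table makes the swapped letter pairs explicit instead of hiding them in the arithmetic.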
The coding of the character 2A hex depends on the control.
For the Cyrillic letters short I (Й / й) at positions 4A hex and 6A hex, ETSI EN 300 706 draws the breve (˘) like a dot above (˙), probably in error. Possibly, however, this was done so that the letters could also serve better as the Cyrillic letter I with grave (Ѝ / ѝ).
The alternative coding of the other characters framed in bold is needed to complete the Latin alphabet coded in the Cyrillic G2 supplementary character set.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2_ | ␠ 0020 | ! 0021 | " " 0022 | # ⋕ 0023 | $ 0024 | % 0025 | ї 0457 | ' ' 0027 | ( 0028 | ) 0029 | * ∗ / @ | + 002B | , 002C | - 002D | . 002E | / 002F |
| 3_ | 0 0030 | 1 0031 | 2 0032 | 3 0033 | 4 0034 | 5 0035 | 6 0036 | 7 0037 | 8 0038 | 9 0039 | : 003A | ; 003B | < 003C | = 003D | > 003E | ? 003F |
| 4_ | Ю 042E | А A | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х X | И 0418 | Й (Ѝ) | К 041A | Л 041B | М M | Н H | О O |
| 5_ | П 041F | Я 042F | Р P | С C | Т T | У (Y) | Ж 0416 | В B | Ь 042C | І 0406 | З 0417 | Ш 0428 | Є 0404 | Щ 0429 | Ч 0427 | Ї 0407 |
| 6_ | ю 044E | а a | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х x | и 0438 | й (ѝ) | к 043A | л 043B | м (m) | н (h) | о o |
| 7_ | п 043F | я 044F | р p | с c | т (t) | у y | ж 0436 | в (b) | ь 044C | і 0456 | з 0437 | ш 0448 | є 0454 | щ 0449 | ч 0447 | ■ 25A0 |
The three characters 24 hex ($), 26 hex (ї) and 7F hex (■) and three Cyrillic letter pairs are coded differently from GOST 13052.
The coding of the character 2A hex depends on the control.
For the Cyrillic letters short I (Й / й) at positions 4A hex and 6A hex, ETSI EN 300 706 draws the breve (˘) like a dot above (˙), probably in error. Possibly, however, this was done so that the letters could also serve better as the Cyrillic letter I with grave (Ѝ / ѝ).
The alternative coding of the other characters framed in bold is needed to complete the Latin alphabet coded in the Cyrillic G2 supplementary character set.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2_ | ␠ 0020 | ¡ 00A1 | ¢ 00A2 | £ 00A3 | $ 0024 | ¥ 00A5 | | § 00A7 | | ‘ 2018 | “ 201C | « 00AB | ← 2190 | ↑ 2191 | → 2192 | ↓ 2193 |
| 3_ | ° 00B0 | ± 00B1 | ² 00B2 | ³ 00B3 | × 00D7 | µ 00B5 | ¶ 00B6 | · 00B7 | ÷ 00F7 | ’ 2019 | ” 201D | » 00BB | ¼ 00BC | ½ 00BD | ¾ 00BE | ¿ 00BF |
| 4_ | | ` 0060 | ´ 00B4 | ˆ 02C6 | ˜ 02DC | ¯ ˉ 00AF | ˘ 02D8 | ˙ 02D9 | ¨ 00A8 | ̣ N / A | ˚ 02DA | ¸ (̦) | _ 005F | ˝ 02DD | ˛ 02DB | ˇ 02C7 |
| Comb. | | ò 0300 | ó (ģ) | ô 0302 | õ 0303 | ō 0304 | ŏ 0306 | ȯ 0307 | ö 0308 | ọ 0323 | å 030A | ç (o̦) | o̲ 0332 | ő 030B | ǫ 0328 | ǒ 030C |
| 5_ | ― 2015 | ¹ 00B9 | ® 00AE | © 00A9 | ™ 2122 | ♪ 266A | ₠ 20A0 | ‰ 2030 | ∝ 221D | Ł 0141 | ł 0142 | ß 00DF | ⅛ 215B | ⅜ 215C | ⅝ 215D | ⅞ 215E |
| 6_ | D 0044 | E 0045 | F 0046 | G 0047 | I І | J Ј | K 004B | L 004C | N 004E | Q 0051 | R 0052 | S Ѕ | U 0055 | V 0056 | W 0057 | Z 005A |
| 7_ | d 0064 | e 0065 | f 0066 | g 0067 | i і | j ј | k 006B | l 006C | n 006E | q 0071 | r 0072 | s ѕ | u 0075 | v 0076 | w 0077 | z 007A |
The characters 20 hex to 5F hex are essentially identical to the Latin G2 supplementary character set without the two additional characters from ITU T.61. The three characters 59 hex to 5B hex are coded with special Latin letters.
The characters 60 hex to 7F hex are coded with Latin letters which, together with the similar-looking letters in the Cyrillic G0 primary character sets, make up the complete Latin alphabet.
The alternative coding of the characters framed in bold can be used to supplement the coded Cyrillic alphabet. The two Cyrillic letters Byelorussian-Ukrainian I (І / і) at positions 64 hex / 74 hex and Je (Ј / ј) at positions 65 hex / 75 hex are, however, already present in Cyrillic G0 variant 3 ("Ukrainian") and variant 1 ("Serbian / Croatian") respectively.
The alternative coding of the characters in the "Comb." row is used depending on the control. As with the Latin G2 supplementary character set, the combining characters should only be used together with the Latin G0 primary character set.
Greek
The Greek G0 primary character set is essentially identical to the characters 20 hex to 3F hex and C0 hex to FE hex of the 8-bit character set ELOT 928 (identical to ISO 8859-7 ).
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ |
␠ 0020 |
! 0021 |
" " 0022 |
# ⋕ 0023 |
$ 0024 |
% 0025 |
& 0026 |
' ' 0027 |
( 0028 |
) 0029 |
* ∗ | @ |
+ 002B |
, 002C |
- 002D |
. 002E |
/ 002F |
3_ |
0 0030 |
1 0031 |
2 0032 |
3 0033 |
4 0034 |
5 0035 |
6 0036 |
7 0037 |
8 0038 |
9 0039 |
: 003A |
; 003B |
« 00AB |
= 003D |
» 00BB |
? 003F |
4_ |
ΐ 0390 |
Α A |
Β B |
Γ 0393 |
Δ 0394 |
Ε E |
Ζ 0396 |
Η H |
Θ 0398 |
Ι I |
Κ K |
Λ 039B |
Μ M |
Ν N |
Ξ 039E |
Ο O |
5_ |
Π 03A0 |
Ρ P |
΄ 0384 |
Σ 03A3 |
Τ T |
Υ 03A5 |
Φ 03A6 |
Χ X |
Ψ 03A8 |
Ω 03A9 |
Ϊ 03AA |
Ϋ 03AB |
ά 03AC |
έ 03AD |
ή 03AE |
ί 03AF |
6_ |
ΰ 03B0 |
α 03B1 |
β 03B2 |
γ 03B3 |
δ 03B4 |
ε 03B5 |
ζ 03B6 |
η 03B7 |
θ 03B8 |
ι 03B9 |
κ 03BA |
λ 03BB |
μ 03BC |
ν 03BD |
ξ 03BE |
ο o |
7_ |
π 03C0 |
ρ 03C1 |
ς 03C2 |
σ 03C3 |
τ 03C4 |
υ 03C5 |
φ 03C6 |
χ 03C7 |
ψ 03C8 |
ω 03C9 |
ϊ 03CA |
ϋ 03CB |
ό 03CC |
ύ 03CD |
ώ 03CE |
■ 25A0 |
The four characters 3C hex («), 3E hex (»), 52 hex (΄) and 7F hex (■) are coded differently from ELOT 928.
The coding of the character 2A hex depends on how the character is addressed.
The standalone tonos (΄) at position 52 hex is shown right-aligned in the example layout of ETSI EN 300 706, so that it is correctly positioned for a following capital letter. This also leaves sufficient space for word separation.
For historical reasons, ETSI EN 300 706 shows the tonos (΄) as a vertical stroke (') both as a standalone character at position 52 hex and in the Greek lowercase letters with dialytika and tonos (΅) at positions 40 hex and 60 hex, and as a dot above (˙) in the Greek lowercase letters with tonos at positions 5C hex to 5F hex and 7C hex to 7E hex.
The Greek small letter iota (ι) at position 69 hex, as well as its forms with diacritics (ΐ, ί and ϊ) at positions 40 hex, 5F hex and 7A hex, is shown imprecisely in ETSI EN 300 706, like the Latin small letter i with serifs (ı).
The word-final variant of the Greek lowercase letter sigma (ς) at position 72 hex is shown imprecisely in ETSI EN 300 706, like the Latin lowercase letter s.
The alternative coding of the other characters framed in bold is needed to complete the Latin alphabet coded in the Greek G2 supplementary character set.
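The relationship between the Greek G0 primary character set and ELOT 928 described above can be sketched as a small lookup. This is only an illustration: the function name `greek_g0_to_unicode` and the table `GREEK_G0_OVERRIDES` are hypothetical, the sketch relies on Python's built-in `iso8859_7` codec, and it ignores the alternative codings shown in the table (for example, 2A hex is always returned as an asterisk).

```python
# Positions that the notes above list as coded differently from ELOT 928.
GREEK_G0_OVERRIDES = {
    0x3C: "\u00AB",  # left-pointing double angle quotation mark
    0x3E: "\u00BB",  # right-pointing double angle quotation mark
    0x52: "\u0384",  # Greek tonos
    0x7F: "\u25A0",  # filled rectangle, coded as black square
}


def greek_g0_to_unicode(code: int) -> str:
    """Map a 7-bit Greek G0 code (20 hex to 7F hex) to a Unicode character."""
    if not 0x20 <= code <= 0x7F:
        raise ValueError("expected a code in the range 20 hex to 7F hex")
    if code in GREEK_G0_OVERRIDES:
        return GREEK_G0_OVERRIDES[code]
    # 20..3F hex correspond to the same bytes in ISO 8859-7,
    # 40..7E hex correspond to the bytes C0..FE hex.
    byte = code if code < 0x40 else code + 0x80
    return bytes([byte]).decode("iso8859_7")
```

The overrides are checked first because the byte D2 hex, which position 52 hex would otherwise map to, is not assigned in ISO 8859-7.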
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ |
␠ 0020 |
a 0061 |
b 0062 |
£ 00A3 |
e 0065 |
h 0068 |
i 0069 |
§ 00A7 |
: 003A |
‘ 2018 |
“ 201C |
k 006B |
← 2190 |
↑ 2191 |
→ 2192 |
↓ 2193 |
3_ |
° 00B0 |
± 00B1 |
² 00B2 |
³ 00B3 |
× 00D7 |
m 006D |
n 006E |
p 0070 |
÷ 00F7 |
’ 2019 |
” 201D |
t 0074 |
¼ 00BC |
½ 00BD |
¾ 00BE |
x 0078 |
4_ |
|
` 0060 |
´ 00B4 |
ˆ 02C6 |
˜ 02DC |
¯ ˉ 00AF |
˘ 02D8 |
˙ 02D9 |
¨ 00A8 |
̣ N / A |
˚ 02DA |
¸ (̦) |
_ 005F |
˝ 02DD |
˛ 02DB |
ˇ 02C7 |
Comb. |
|
ò 0300 |
ó (ģ) |
ô 0302 |
õ 0303 |
ō 0304 |
ŏ 0306 |
ȯ 0307 |
ö 0308 |
ọ 0323 |
å 030A |
ç (o̦) |
o̲ 0332 |
ő 030B |
ǫ 0328 |
ǒ 030C |
5_ |
? 003F |
¹ 00B9 |
® 00AE |
© 00A9 |
™ 2122 |
♪ 266A |
₠ 20A0 |
‰ 2030 |
∝ 221D |
Ί 038A |
Ύ 038E |
Ώ 038F |
⅛ 215B |
⅜ 215C |
⅝ 215D |
⅞ 215E |
6_ |
C 0043 |
D 0044 |
F 0046 |
G 0047 |
J 004A |
L 004C |
Q 0051 |
R 0052 |
S 0053 |
U 0055 |
V 0056 |
W 0057 |
Y 0059 |
Z 005A |
Ά 0386 |
Ή 0389 |
7_ |
c 0063 |
d 0064 |
f 0066 |
g 0067 |
j 006A |
l 006C |
q 0071 |
r 0072 |
s 0073 |
u 0075 |
v 0076 |
w 0077 |
y 0079 |
z 007A |
Έ 0388 |
■ 25A0 |
The characters 20 hex to 5F hex and 7F hex are largely identical to the Latin G2 supplementary character set, but omit the two additional characters from ITU T.61. The three characters 59 hex to 5B hex are coded with special Greek letters, and a further eleven characters with Latin lowercase letters. In addition, the two characters 28 hex and 50 hex are coded differently, as a colon (:) and a question mark (?), although these are already included in the Greek G0 primary character set. This may be for historical reasons, because these two characters are not available in the 7-bit ISO-IR-27 character set.
The characters 60 hex to 7E hex are coded with Latin letters and special Greek letters. The Latin letters, together with similar-looking letters in the Greek G0 primary character set, form the complete Latin alphabet.
For the Greek capital letters with tonos at positions 59 hex to 5B hex, 6E hex, 6F hex and 7E hex, ETSI EN 300 706 shows the tonos (΄) as a vertical stroke (') for historical reasons.
The alternative coding of the characters in the "Comb." (combining) row applies depending on how the character is addressed. As with the Latin G2 supplementary character set, the combining characters should only be used in conjunction with the Latin G0 primary character set.
Arabic
The Arabic G0 primary character set is largely identical to the 7-bit character set ASMO 449 (adopted in ISO 8859-6), except that the Latin G0 variant "English" is used for the special characters and the Arabic letters are shown with their presentation forms. Five special letters have been moved to the Arabic G2 supplementary character set, which also contains additional letters for Persian.
Arabic letters with multiple codings and an optional connection to the right are shown in ETSI EN 300 706 without a connecting line of their own on the right and are accordingly coded primarily as the initial or isolated presentation form. As an exception, the three Arabic letters of the " Ǧīm " family (ﺝ, ﺡ and ﺥ) at positions 4C hex to 4E hex in the Arabic G0 primary character set look more like medial presentation forms (with a straight baseline), but are still coded primarily as the initial presentation form, because the medial presentation forms (without a straight baseline) are also available at positions 5C hex to 5E hex in the Arabic G0 primary character set (see also the note on the table).
In addition, the Arabic letter Yāʾ (ﻱ) at position 27 hex in the Arabic G0 primary character set, and its variant with Hamza above (ﺉ) at position 27 hex in the Arabic G2 supplementary character set, is shown more like a final presentation form, but is coded primarily as the isolated presentation form because it does not visually allow a correct connection to the right.
Arabic letters with several codings and an optional connection to the left are shown in ETSI EN 300 706 with a connecting line on the left and are accordingly coded primarily as the initial presentation form. In contrast, the four Arabic letters of the " Sīn " family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 hex to 56 hex in the Arabic G0 primary character set are shown without a tail or a connecting line of their own on the left, and each must be completed with a second character (see the note on the table).
For Arabic letters with several Unicode numbers, output in Unicode must either select the appropriate Unicode number according to the two neighboring characters on the left and right or, in the simplest case, use the first Unicode number. A Unicode number in bold stands for the actual character. If the actual characters are output instead of the presentation forms, the zero-width non-joiner (ZWNJ, Unicode number 200C hex) or the zero-width joiner (ZWJ, Unicode number 200D hex) may have to be inserted in order to restrict automatic glyph selection to the presentation forms possible for the respective character.
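The use of ZWNJ and ZWJ to pin a base letter to one presentation form can be sketched as follows. This is a minimal illustration, not a complete converter: the helper name `force_form` is hypothetical, the string is built in logical (memory) order, and whether a renderer actually shows the requested form depends on the font and shaping engine.

```python
ZWNJ = "\u200C"  # zero-width non-joiner: suppresses joining on that side
ZWJ = "\u200D"   # zero-width joiner: requests joining on that side

# (character before the letter, character after the letter) in logical order.
_JOIN_CONTEXT = {
    "isolated": (ZWNJ, ZWNJ),
    "initial": (ZWNJ, ZWJ),
    "medial": (ZWJ, ZWJ),
    "final": (ZWJ, ZWNJ),
}


def force_form(letter: str, form: str) -> str:
    """Wrap an Arabic base letter so that shaping is limited to one form."""
    before, after = _JOIN_CONTEXT[form]
    return before + letter + after


# Example: request the initial form of Arabic letter Beh (U+0628).
initial_beh = force_form("\u0628", "initial")
```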
Arabic script is written from right to left, but characters in teletext are arranged from left to right as usual. For this reason, output in Unicode must either apply the Unicode Bidi algorithm in reverse or, in the simplest case, place the bidirectional control character left-to-right override (LRO, Unicode number 202D hex) in front of each line.
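The simplest approach mentioned above, prefixing every line with LRO, might look like this. The function name `force_left_to_right` is an assumption made for illustration, and the input is assumed to be a list of already converted teletext rows in their left-to-right display order.

```python
LRO = "\u202D"  # left-to-right override


def force_left_to_right(rows):
    """Prefix each converted teletext row with LRO to keep its visual order."""
    return [LRO + row for row in rows]


# Two sample rows: Hebrew letters followed by a Latin row.
rows = force_left_to_right(["\u05D0\u05D1\u05D2", "ABC"])
```

The sketch does not handle applying the Bidi algorithm in reverse, which is the other option named in the text.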
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ |
␠ 0020 |
! 0021 |
" " 0022 |
£ 00A3 |
$ 0024 |
% 0025 |
ں FE73 |
ﻲ ﻱ |
) 0029 |
( 0028 |
* ∗ | @ |
+ 002B |
, , |
- 002D |
. 002E |
/ 002F |
3_ |
0 0030 |
1 0031 |
2 0032 |
3 0033 |
4 0034 |
5 0035 |
6 0036 |
7 0037 |
8 0038 |
9 0039 |
: 003A |
؛ 061B |
> 003E |
= 003D |
< 003C |
؟ 061F |
4_ |
ﺔ |
ﺀ |
ﺒ |
ﺏ ﺐ |
ﺘ |
ﺕ ﺖ |
ﺎ |
ﺍ |
ﺑ |
ﺓ |
ﺗ |
ﺛ |
ﺟ ﺠ ﺟ ﺠ |
ﺣ ﺤ ﺣ ﺤ |
ﺧ ﺨ ﺧ ﺨ |
ﺩ ﺪ |
5_ |
ﺫ ﺬ |
ﺭ ﺮ |
ﺯ ﺰ |
ﺳ ﺴ (ﺱ ﺲ) |
ﺷ ﺸ (ﺵ ﺶ) |
ﺻ ﺼ (ﺹ ﺺ) |
ﺿ ﻀ (ﺽ ﺾ) |
ﻃ ﻁ ﻂ ﻄ |
ﻇ ﻅ ﻆ ﻈ |
ﻋ |
ﻏ |
ﺜ |
ﺠ ﺠ |
ﺤ ﺤ |
ﺨ ﺨ |
# ⋕ 0023 |
6_ |
ـ 0640 |
ﻓ |
ﻗ |
ﻛ ﻜ |
ﻟ |
ﻣ |
ﻧ |
ﻫ |
ﻭ ﻮ |
ﻰ |
ﻳ |
ﺙ ﺚ |
ﺝ ﺞ |
ﺡ ﺢ |
ﺥ ﺦ |
ﻴ |
Pers. |
ﯼ |
ﮐ ﮎ ﮏ ﮑ |
ﯽ |
ﯾ |
ﯿ |
|||||||||||
7_ |
ﻯ |
ﻌ |
ﻐ |
ﻔ |
ﻑ ﻒ |
ﻘ |
ﻕ ﻖ |
ﻙ ﻚ |
ﻠ |
ﻝ ﻞ |
ﻤ |
ﻡ ﻢ |
ﻨ |
ﻥ ﻦ |
ﻻ FEFB |
■ 25A0 |
The two characters 26 hex (ﹳ, Unicode number FE73 hex) and 27 hex (ﻱ) are coded differently from ASMO 449. In addition, five special letters and almost all special characters at positions 40 hex to 7E hex have been replaced by further presentation forms of the coded Arabic letters.
The character 26 hex (ﹳ) serves as the tail piece for the isolated and final presentation forms of the four Arabic letters of the " Sīn " family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 hex to 56 hex.
The two round brackets (“)” and “(”) at positions 28 hex and 29 hex, as well as the two comparison signs (> and <) at positions 3C hex and 3E hex, are coded mirrored compared with the other character sets, since all characters in teletext are always arranged from left to right.
The coding of the character 2A hex depends on how the character is addressed.
The Arabic comma (،) at position 2C hex is shown in the example layout of ETSI EN 300 706 in such a way that it can also serve visually as a normal comma (,).
The combined initial and medial presentation forms of the three Arabic letters of the " Ǧīm " family ( ﺟ / ﺠ , ﺣ / ﺤ and ﺧ / ﺨ ) at positions 4C hex to 4E hex are shown in ETSI EN 300 706 with a straight baseline, matching the initial and medial presentation forms of the Persian letter Che ( ﭼ / ﭽ ) at positions 28 hex and 29 hex in the Arabic G2 supplementary character set. However, their coding as medial presentation forms is identical to that of the medial presentation forms without a straight baseline ( ﺠ , ﺤ and ﺨ ) at positions 5C hex to 5E hex, since this is only a layout variation. The same applies to their use as initial presentation forms, although there are no separate characters for the layout variation without a straight baseline ( ﺟ , ﺣ and ﺧ ).
The four Arabic letters of the " Sīn " family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 hex to 56 hex are shown without a tail or a connecting line of their own on the left, and each must be completed with a second character. When used as an isolated or final presentation form, the tail piece (ﹳ) at position 26 hex must be added on the left. When used as an initial or medial presentation form, the modifier character Taṭwīl (ـ) at position 60 hex must be added on the left if the left neighbor has no connecting line of its own to the right or only a very short one.
The alternative coding (with identical layout) of the letters in the "Pers." (Persian) row serves to complete the Persian letters coded in the Arabic G2 supplementary character set.
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ |
␠ 0020 |
ﻉ |
ﺁ (ﺂ) |
ﺃ (ﺄ) |
ﺅ ﺆ |
ﺇ (ﺈ) |
ﺋ |
ﺊ ﺉ |
ﭼ ﭼ |
ﭽ ﭽ |
ﭺ ﭻ |
ﭘ |
ﭙ |
ﭖ ﭗ |
ﮊ ﮋ |
ﮔ ﮒ ﮓ ﮕ |
3_ |
٠ 0660 |
١ 0661 |
٢ 0662 |
٣ 0663 |
٤ 0664 |
٥ 0665 |
٦ 0666 |
٧ 0667 |
٨ 0668 |
٩ 0669 |
ﻎ |
ﻍ |
ﻼ FEFC |
ﻬ |
ﻪ |
ﻩ |
4_ |
à 00E0 |
A 0041 |
B 0042 |
C 0043 |
D 0044 |
E 0045 |
F 0046 |
G 0047 |
H 0048 |
I 0049 |
J 004A |
K 004B |
L 004C |
M 004D |
N 004E |
O 004F |
5_ |
P 0050 |
Q 0051 |
R 0052 |
S 0053 |
T 0054 |
U 0055 |
V 0056 |
W 0057 |
X 0058 |
Y 0059 |
Z 005A |
ë 00EB |
ê 00EA |
ù 00F9 |
î 00EE |
ﻊ |
6_ |
é 00E9 |
a 0061 |
b 0062 |
c 0063 |
d 0064 |
e 0065 |
f 0066 |
g 0067 |
h 0068 |
i 0069 |
j 006A |
k 006B |
l 006C |
m 006D |
n 006E |
o 006F |
7_ |
p 0070 |
q 0071 |
r 0072 |
s 0073 |
t 0074 |
u 0075 |
v 0076 |
w 0077 |
x 0078 |
y 0079 |
z 007A |
â 00E2 |
ô 00F4 |
û 00FB |
ç 00E7 |
|
The character set is partially identical to the Latin G0 primary character set. The digits are coded differently, with their Arabic-Indic variants. In addition, all special characters have been replaced by presentation forms of Arabic letters and by accented Latin lowercase letters for writing French (see Windows-1256).
The alternative coding of the characters framed in bold is needed to complete all presentation forms of the coded Arabic letters.
Hebrew
The Hebrew G0 primary character set is essentially identical to the 7-bit character set SI 960 (adopted in ISO 8859-8), except that the Latin G0 variant "English" is used for the special characters. A Hebrew G2 supplementary character set is not defined; the Arabic G2 supplementary character set is used instead.
Hebrew script is written from right to left, but characters in teletext are arranged from left to right as usual. For this reason, output in Unicode must either apply the Unicode Bidi algorithm in reverse or, in the simplest case, place the bidirectional control character left-to-right override (LRO, Unicode number 202D hex) in front of each line.
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ |
␠ 0020 |
! 0021 |
" " 0022 |
£ 00A3 |
$ 0024 |
% 0025 |
& 0026 |
' ' 0027 |
( 0028 |
) 0029 |
* ∗ | @ |
+ 002B |
, 002C |
- 002D |
. 002E |
/ 002F |
3_ |
0 0030 |
1 0031 |
2 0032 |
3 0033 |
4 0034 |
5 0035 |
6 0036 |
7 0037 |
8th 0038 |
9 0039 |
: 003A |
; 003B |
< 003C |
= 003D |
> 003E |
? 003F |
4_ |
@ 0040 |
A 0041 |
B 0042 |
C 0043 |
D 0044 |
E 0045 |
F 0046 |
G 0047 |
H 0048 |
I 0049 |
J 004A |
K 004B |
L 004C |
M 004D |
N 004E |
O 004F |
5_ |
P 0050 |
Q 0051 |
R 0052 |
S 0053 |
T 0054 |
U 0055 |
V 0056 |
W 0057 |
X 0058 |
Y 0059 |
Z 005A |
← 2190 |
½ 00BD |
→ 2192 |
↑ 2191 |
# ⋕ 0023 |
6_ |
א 05D0 |
ב 05D1 |
ג 05D2 |
ד 05D3 |
ה 05D4 |
ו 05D5 |
ז 05D6 |
ח 05D7 |
ט 05D8 |
י 05D9 |
ך 05DA |
כ 05DB |
ל 05DC |
ם 05DD |
מ 05DE |
ן 05DF |
7_ |
נ 05E0 |
ס 05E1 |
ע 05E2 |
ף 05E3 |
פ 05E4 |
ץ 05E5 |
צ 05E6 |
ק 05E7 |
ר 05E8 |
ש 05E9 |
ת 05EA |
₪ 20AA |
∥ 2225 |
¾ 00BE |
÷ 00F7 |
■ 25A0 |
In contrast to SI 960, the character 7B hex (₪) is coded as the shekel currency symbol (see Windows-1255).
The coding of the character 2A hex depends on the control .
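The Hebrew G0 table above is regular enough to be sketched in a few lines. The following is an illustration only: the names `HEBREW_G0_SPECIAL` and `hebrew_g0_to_unicode` are hypothetical, the code points are read off the table, and 2A hex is always returned as an asterisk even though its coding depends on how the character is addressed.

```python
# Positions whose Unicode number differs from the teletext code itself.
HEBREW_G0_SPECIAL = {
    0x23: "\u00A3",  # pound sign
    0x5B: "\u2190",  # leftwards arrow
    0x5C: "\u00BD",  # one half
    0x5D: "\u2192",  # rightwards arrow
    0x5E: "\u2191",  # upwards arrow
    0x5F: "#",       # number sign
    0x7B: "\u20AA",  # shekel currency symbol
    0x7C: "\u2225",  # parallel to
    0x7D: "\u00BE",  # three quarters
    0x7E: "\u00F7",  # division sign
    0x7F: "\u25A0",  # filled rectangle, coded as black square
}


def hebrew_g0_to_unicode(code: int) -> str:
    """Map a 7-bit Hebrew G0 code (20 hex to 7F hex) to a Unicode character."""
    if not 0x20 <= code <= 0x7F:
        raise ValueError("expected a code in the range 20 hex to 7F hex")
    if code in HEBREW_G0_SPECIAL:
        return HEBREW_G0_SPECIAL[code]
    if 0x60 <= code <= 0x7A:
        # 60..7A hex hold the Hebrew letters 05D0..05EA in order.
        return chr(0x05D0 + code - 0x60)
    return chr(code)
```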
Graphics
The characters with a 6-digit Unicode number (01FBxx hex ) will only be included in a future version of Unicode and may still change.
With normal teletext in 4:3 format, the width-to-height ratio of a character is 4:5. This must be taken into account to display a graphic true to scale.
Since the exact layout of the Unicode characters depends heavily on the font and the characters do not always match one another, it may be necessary to draw all graphic characters yourself.
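When drawing the characters yourself, the 4:5 cell ratio determines the pixel grid. The sketch below derives a page size from a chosen cell height; the function name `page_pixel_size` is hypothetical, and the default grid of 40 columns by 24 rows is an assumption not stated in the text above.

```python
from fractions import Fraction

CELL_ASPECT = Fraction(4, 5)  # character width : character height


def page_pixel_size(cell_height, columns=40, rows=24):
    """Return (width, height) of a page drawn with 4:5 character cells."""
    cell_width = cell_height * CELL_ASPECT
    return columns * cell_width, rows * cell_height


# A cell height of 20 pixels gives 16-pixel-wide cells.
width, height = page_pixel_size(20)
```

With these assumed defaults, a 20-pixel cell height yields a 640 by 480 page, which matches the 4:3 picture format.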
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ |
␠ 0020 |
█▌ 01FB00 |
▐█ 01FB01 |
███ 01FB02 |
01FB03 |
█▌ 01FB04 |
▐█ 01FB05 |
███ 01FB06 |
01FB07 |
█▌ 01FB08 |
▐█ 01FB09 |
███ 01FB0A |
01FB0B |
█▌ 01FB0C |
▐█ 01FB0D |
███ 01FB0E |
||||||
3_ |
01FB0F |
█▌ 01FB10 |
▐█ 01FB11 |
███ 01FB12 |
01FB13 |
▌ 258C |
▐█ 01FB14 |
███ 01FB15 |
01FB16 |
█▌ 01FB17 |
▐█ 01FB18 |
███ 01FB19 |
01FB1A |
█▌ 01FB1B |
▐█ 01FB1C |
███ 01FB1D |
||||||
4_ |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
||||||
5_ |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
[G0] |
||||||
6_ |
01FB1E |
█▌ 01FB1F |
▐█ 01FB20 |
███ 01FB21 |
01FB22 |
█▌ 01FB23 |
▐█ 01FB24 |
███ 01FB25 |
01FB26 |
█▌ 01FB27 |
▐ 2590 |
███ 01FB28 |
01FB29 |
█▌ 01FB2A |
▐█ 01FB2B |
███ 01FB2C |
||||||
7_ |
01FB2D |
█▌ 01FB2E |
▐█ 01FB2F |
███ 01FB30 |
01FB31 |
█▌ 01FB32 |
▐█ 01FB33 |
███ 01FB34 |
01FB35 |
█▌ 01FB36 |
▐█ 01FB37 |
███ 01FB38 |
01FB39 |
█▌ 01FB3A |
▐█ 01FB3B |
█ 2588 |
The graphic space at position 20 hex is as wide as the block elements at positions 21 hex to 3F hex and 60 hex to 7F hex and can be coded as a normal or non-breaking space, since these are just as wide in a fixed-width font. However, coding it as a separate character analogous to the figure space (Unicode number 2007 hex) would be preferable, but no such character is available in Unicode. The attribute "separated block graphics / underline" has no effect on the graphic space.
Depending on the corresponding attribute, the 63 block elements at positions 21 hex to 3F hex and 60 hex to 7F hex are displayed either in contiguous form, as shown, or in separated form, as illustrated to the right of the full block (█) at position 7F hex. In the separated form, the six rectangular blocks that make up these graphic characters are smaller and not connected to each other. The separated forms are not defined as independent characters in Unicode.
The corresponding characters of the selected G0 primary character set are used for the 32 positions 40 hex to 5F hex.
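The Unicode numbers in the G1 table above follow a regular pattern, which the sketch below reproduces. It is only an illustration based on the code points listed in the table: the function name `g1_to_unicode` is hypothetical, positions 40 hex to 5F hex return `None` because they are taken from the G0 set, and the separated forms are not covered.

```python
def g1_to_unicode(code: int):
    """Map a G1 block graphic code to a Unicode character (contiguous form)."""
    if 0x20 <= code <= 0x3F:
        pattern = code - 0x20          # patterns 0..31
    elif 0x60 <= code <= 0x7F:
        pattern = code - 0x40          # patterns 32..63
    elif 0x40 <= code <= 0x5F:
        return None                    # use the selected G0 primary set instead
    else:
        raise ValueError("expected a code in the range 20 hex to 7F hex")

    # Four patterns are coded with characters outside the 01FBxx range.
    special = {0: " ", 21: "\u258C", 42: "\u2590", 63: "\u2588"}
    if pattern in special:
        return special[pattern]

    # The remaining 60 patterns are numbered consecutively from 01FB00 hex,
    # skipping the space and the two half blocks.
    offset = pattern - 1 - (pattern > 21) - (pattern > 42)
    return chr(0x1FB00 + offset)
```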
_0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2_ |
? 01FB3C |
? 01FB3D |
? 01FB3E |
? 01FB3F |
? 01FB40 |
◣ ( 25E3 ) |
? 01FB41 |
? 01FB42 |
? 01FB43 |
? 01FB44 |
? 01FB45 |
? 01FB46 |
? 01FB68 |
? 01FB69 |
▐ |
▒ 2592 |
3_ |
? 01FB47 |
? 01FB48 |
? 01FB49 |
? 01FB4A |
? 01FB4B |
◢ ( 25E2 ) |
? 01FB4C |
? 01FB4D |
? 01FB4E |
? 01FB4F |
? 01FB50 |
? 01FB51 |
? 01FB6A |
? 01FB6B |
▌ |
█ 2588 |
4_ |
▌ ( 2537 ) |
( 252F ) |
▌ ( 251D ) |
▌ ( 2525 ) |
? 01FBA4 |
? 01FBA5 |
? 01FBA6 |
? 01FBA7 |
? 01FBA0 |
? 01FBA1 |
? 01FBA2 |
? 01FBA3 |
▌ ( 253F ) |
⚫ 26AB |
⬤ 2B24 |
◯ 25EF |
5_ |
│ 2502 |
─ | - |
┌ 250C |
┐ 2510 |
└ 2514 |
┘ 2518 |
├ 251C |
┤ 2524 |
┬ 252C |
┴ 2534 |
┼ 253C |
⭢ | → |
⭠ | ← |
⭡ | ↑ |
⭣ 2B63 |
␠ 0020 |
6_ |
? 01FB52 |
? 01FB53 |
? 01FB54 |
? 01FB55 |
? 01FB56 |
◥ ( 25E5 ) |
? 01FB57 |
? 01FB58 |
? 01FB59 |
? 01FB5A |
? 01FB5B |
? 01FB5C |
? 01FB6C |
? 01FB6D |
|
|
7_ |
? 01FB5D |
? 01FB5E |
? 01FB5F |
? 01FB60 |
? 01FB61 |
◤ ( 25E4 ) |
? 01FB62 |
? 01FB63 |
? 01FB64 |
? 01FB65 |
? 01FB66 |
? 01FB67 |
? 01FB6E |
? 01FB6F |
|
|
In some decoders, and depending on the associated attribute, the 57 smoothed block elements at positions 20 hex to 2D hex, 30 hex to 3D hex, 3F hex, 60 hex to 6D hex and 70 hex to 7D hex are displayed either in contiguous form, as shown, or in separated form like the block elements in the G1 block graphic character set (see ITU T.101). The separated forms are not defined as independent characters in Unicode.
For the four triangles at positions 25 hex, 35 hex, 65 hex and 75 hex, the alternatively coded Unicode characters are not graphic elements that join up with the other teletext characters, but geometric shapes aligned on the baseline, each surrounded by space on all four sides.
The left thin vertical frame line ( │ ) at position 2E hex is centered horizontally on the left half block (▌) at position 35 hex in the G1 block graphic character set. The alternatively coded Unicode characters, on the other hand, are not lines but vertical eighth blocks to the left and right of the line position.
The right thin vertical frame line ( │ ) at position 3E hex is centered horizontally on the right half block (▐) at position 6A hex in the G1 block graphic character set. The alternatively coded Unicode characters, however, are not lines but vertical eighth blocks to the right and left of the line position.
For the five frame elements at positions 40 hex to 43 hex and 4C hex, the thick horizontal line corresponds to the middle horizontal third block (🬋) at position 2C hex in the G1 block graphic character set. With the alternatively coded Unicode characters, on the other hand, the thick horizontal line corresponds to the thick horizontal frame line (━) with the Unicode number 2501 hex, which is significantly thinner.
The following three circles have no fixed Unicode assignment and are coded based on Unicode Technical Report #25. The exact layout of the Unicode characters depends heavily on the font, if they are supported at all. For the two large circles of full block width, the largest Unicode circles should fit best, at least in a fixed-width font; even in the proportional font "Arial Unicode MS", the large circle outline ( ◯ ) with the Unicode number 25EF hex is as wide as the full block ( █ ) at position 3F hex.
The filled small circle ( ⚫ ) at position 4D hex is the same size as the one-sixth block (🬃) at position 24 hex in the G1 block graphic character set and is centered.
The filled large circle ( ⬤ ) at position 4E hex and the large circle outline ( ◯ ) at position 4F hex are each as wide as the full block (█) at position 3F hex and vertically centered.
The two arrows pointing right (⭢) and left (⭠) at positions 5B hex and 5C hex match the thin horizontal frame lines (─) of the characters 51 hex to 5A hex and can be seamlessly connected to these at their start. In the example layout of ETSI EN 300 706, these characters are shown with a thicker line width than the three characters with a similar layout (→, ← and ―) at positions 5D hex, 5B hex and 60 hex in the Latin G0 variant "English" and at positions 2E hex, 2C hex and 50 hex in the Latin G2 supplementary character set, and the two groups should not be mixed.
The two arrows pointing up (⭡) and down (⭣) at positions 5D hex and 5E hex match the thin vertical frame lines (│) of the characters 40 hex to 4C hex and 50 hex to 5A hex and can be seamlessly connected to these at their start.
The graphic space at position 5F hex is identical to the graphic space at position 20 hex in the G1 block graphic character set and should therefore be coded identically.
The characters with the Unicode number in brackets are similar to the example layouts given in ETSI EN 300 706, but usually do not match the other graphic characters visually or semantically. However, there is no better Unicode encoding for these characters.
Many Level 1.5 decoders support only the four characters framed in bold. It can therefore be assumed that they use characters with a similar layout from the Latin G0 variant "English", and the characters must then be given the alternative coding accordingly.
Character set selection
With the selection bits in the national G0 character set tables, the associated G2 character set is usually also selected. The first hexadecimal number indicates the four most significant bits (the region) and the second number the three least significant bits (the national variant).
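The split into region and national variant described above amounts to simple bit arithmetic. The sketch below assumes that the seven selection bits are available as one integer with the region in the upper four bits; the function name `split_selection` is hypothetical.

```python
def split_selection(value: int):
    """Split 7 selection bits into (region, national variant)."""
    if not 0 <= value <= 0x7F:
        raise ValueError("expected a 7-bit value")
    region = value >> 3        # four most significant bits
    variant = value & 0b111    # three least significant bits
    return region, variant


# Example: binary 0100 110 is region 4, national variant 6.
region, variant = split_selection(0b0100110)
```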
Notes on the G0 character set:
- For the X / 26 selection and all other X / 26 functions for character selection , Latin (with a green background) always uses the "Standard" variant .
- Icelandic channels use the Latin G0 variant "Portuguese / Spanish" and the Latin G2 supplementary character set .
Notes on the second G0 character set:
Level | priority | Selection bits for standard G0 / G2 | G0 character set | G1 character set | G2 character set | |||||
---|---|---|---|---|---|---|---|---|---|---|
1 = highest |
superior | inferior | default | Second G0 | X / 26 selection | default | default | X / 26 selection | ||
X / 0 (page header) | all | 8 | Decoder 1 | Page header | ● | ○ 2 | ○ 3 (from level 1.5) |
|||
X / 28/1 | ≤ 1.5 4 | 4 | packet | Page header | ● | ○ 5 | ● | ○ 5 (from level 1.5) |
||
M / 29/1 | ≤ 1.5 4 | 7 | packet | Page header | ● | ○ 5 | ● | ○ 5 (from level 1.5) |
||
X / 28/0 format 1 | ≥ 2.5 | 2 | packet | Page header (with some Level 2.5 decoders from the packet) |
● | ● | ● | |||
X / 28/4 | ≥ 3.5 | 3 | packet | Page header | ● | ● | ● | |||
M / 29/0 | ≥ 2.5 | 5 | packet | Page header (with some Level 2.5 decoders from the packet) |
● | ● | ● | |||
M / 29/4 | ≥ 3.5 | 6 | packet | Page header | ● | ● | ● | |||
X / 26 column function…… 08 hex "Modified G0 and G2 Character Set" |
≥ 2.5 | 1 | ● 6 , 7 | ● 7 |
Presettings for each Teletext page:
Notes on packets X / 28/1 and M / 29/1:
Notes on the X / 26 selection:
Level | Control characters 00 hex ..1F hex |
G0 character set | G1 character set | G2 character set | G3 character set | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|
default | Second G0 | X / 26 selection | Character 2A hex | Latin variant | Standard a | default | X / 26 selection | Standard b | |||
X / 0 to X / 25 Simple level 1 teletext page | all | ● 1 | ● 2 , 3 | ● 3 | * | national | ● 4 | ||||
X / 26 column function ... | |||||||||||
... 10 hex "G0 Character" | ≥ 1.5 | ● | ● | @ | default | ||||||
… 09 hex "G0 Character (Levels 2.5 & 3.5)" | ≥ 2.5 | ● | ● | * | default | ||||||
... 11 hex to 1F hex "G0 Character with diacritical mark" | ≥ 1.5 | ● | ● | * | default | combining | combining | ||||
... 01 hex "G1 character" | ≥ 2.5 | ○ 5 | ○ 5 | default | ● 5 | ||||||
… 0F hex "G2 Character" | ≥ 1.5 | ● 6 | ● | ||||||||
... 02 hex "G3 Character (Level 1.5)" | ≥ 1.5 | ● 6 | |||||||||
… 0B hex "G3 Character (Levels 2.5 & 3.5)" | ≥ 2.5 | ● |
Notes on the G1 and G3 character sets :
Notes on the simple level 1 teletext page:
Comment on the X / 26 column function 01 hex "G1 Character":
Comment on the X / 26 column functions 0F hex "G2 Character" and 02 hex "G3 Character (Level 1.5)":
External links
- ETSI EN 300 706 - Enhanced Teletext specification (2003) and ETS 300 706 (1997), ETSI (English)
- ITU-T Recommendation T.101: International interworking for Videotex services (1994) and ITU-T Recommendation T.101, Annex C (1990), ITU (English)
- EBU Tech 3232 - Displayable Character Sets for Broadcast Teletext and EBU Tech 3232-a - Appendices , EBU, 1982 (English)
- STV5348 , STMicroelectronics, 2004 (English)
- Philips SAA5243 (1991), Philips SAA5244A (1992), Philips SAA5249 (1996), Philips SAA5254 (1996), Philips SAA5281 (1996), Philips SAA5288 (1997) and Philips SAA5290 (1995), Philips (English)
- The Cyrillic Charset Soup , Roman Czyborra, 1998 (English)
- Notes on some Unicode Arabic characters: recommendations for usage , Jonathan Kew, Draft 2, 2005
- Unicode 8.0 Character Code Charts , Unicode, 2015 (English)
- Graphic character identifiers , IBM (English)
- RFC 1345 - Character Mnemonics & Character Sets , Keld Simonsen, 1992 (English)
- GOST 13052
- ISO 6937-2: 1983 / Add 1: 1989
- ISO-IR-11
- ISO-IR-21
- ISO-IR-27
- ISO-IR-47
- ISO-IR-142
References
- ↑ a b Philips SAA5246A , Philips, 1993 (English)
-
↑ Character histories: notes on some Ascii code positions , Jukka “Yucca” Korpela, 2006 (English);
7-bit character sets , Aivosto Oy, 2016 (English) -
↑ Quarter-quadrant, hyphen / divis , Wikipedia: “In the older ASCII character set and in the character sets of the ISO 8859 family of standards [...] the hyphen-minus is used, which was introduced with the typewriter as a common character for hyphen, dash and minus sign . ";
IT and communication - Characters and encodings: The ISO Latin 1 character repertoire: Detailed descriptions of the characters, "- HYPHEN, MINUS SIGN (HYPHEN-MINUS) U + 002D" , Jukka "Yucca" Korpela, 2006 (English): "In situations where sufficient support to Unicode can be safely assumed (very rarely at present!), it is best to replace the use of hyphen-minus by Unicode hyphen (U + 2010) or non-breaking hyphen (U + 2011) or minus sign (U + 2212) or, if hyphen-minus had been used eg in place of a dash symbol, some other Unicode character such as en dash (U + 2013) or em dash (U + 2014) or horizontal bar (U + 2015 ). " - ↑ a b c Minus sign, similar signs , U + 2015 horizontal bar , Wikipedia: " (2) This sign generally resembles an em dash in length, shape and altitude and differs from it only in its line break properties."
- ↑ On the use of some MS Windows characters in HTML, Suggested substitutes, Dashes , Jukka "Yucca" Korpela, 2017 (English): "In typewritten material, the em dash is represented by two hyphens with no space around them, and an en dash is represented by a hyphen. "
-
↑ Internationalization for Turkish: Dotted and Dotless Letter "I" , Tex Texin, 2010 (English);
Resolving dotted and dotless "i" , John Cowan, 1997 (English) -
↑ a b circumflex, character sets , Wikipedia: “The ASCII character set only contains the character ^ (in Unicode at position U + 005E), which is now interpreted as a single, universally applicable character. [...] In addition to the universal character ^ (U + 005E), the Unicode standard contains the typographically better character ˆ (U + 02C6) as well as other pre-composed characters with circumflex (e.g. Ẑ, ẑ). “;
ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM43 Arrowhead upwards, circumflex shape" - ^ A b ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM48 Lower bar (not jointive) low line, spacing underline (equivalent to SP09 of ISO 6937) "
-
↑ a b Grave accent, As surrogate of apostrophe or (opening) single quote , Wikipedia (English): "Additionally ASCII grave accent character (U + 0060` Grave accent ) was often used as surrogate of opening single quote, together with ASCII typewriter apostrophe (U + 0027 ' apostrophe ) used as closing single quote; double quotes were sometimes substituted by two consecutive grave accents and two consecutive typewriter apostrophes (`` ... ''). ";
ASCII and Unicode quotation marks , Markus Kuhn, 2007 (English): "Only old X Window System fonts and some old video terminals show ASCII 0x60 / 0x27 as left and right quotation marks, while most modern systems follow the ISO and Unicode standards instead. ";
ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM44 Upper reverse solidus, grave accent shape" - ↑ Character histories: notes on some Ascii code positions, VERTICAL LINE , Jukka "Yucca" Korpela, 2006 (English)
-
↑ a b Tilde, ASCII tilde (U + 007E) , Wikipedia (English): “Most modern proportional fonts align plain spacing tilde at the same level as dashes, or only slightly upper. This distinguishes it from a small tilde (˜), which is always raised. But in some monospace fonts, especially used in text user interfaces, ASCII tilde character is raised too. This apparently is a legacy of typewriters, where pairs of similar spacing and combining characters relied on one glyph. ";
Unicode Explained , Chapter 8: Character Usage, ASCII (Basic Latin), Tilde ~ (U + 007E), p. 401, Jukka K. Korpela, 2006 (English): “As a spacing clone of a diacritic tilde (ie, spacing counterpart of combining tilde U + 0303), use the small tilde ˜ (U + 02CD [correct: U + 02DC]). ";
ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM47 Upper bar (not jointive) bar or tilde shape" -
↑ a b List of Latin-based alphabets, extensions , Wikipedia;
Everything about Unicode, Lithuanian special characters , Jens Meyer, 2007;
Special letters and diacritical marks for the European languages of the Latin alphabet, Wolfgang Hendlmeier and Gerhard Helzel, 2012 -
↑ Hatschek, Usage and Character Sets , Wikipedia: "In modern printed fonts, the character on the uppercase L and on the lowercase d, l and t is often shown in a form similar to a comma at the top right next to the basic character."
And "It should be noted that these codes are also used if the hatschek is displayed on d, l, L and t in comma form. " -
↑ Telephone keypad, recommendation ITU-T E.161, placement, appearance and naming of the symbol ⌗, Wikipedia: “This symbol is contained in Unicode as U+2317 viewdata square [...]. With the square shape, the line ends must protrude between 8% and 18% of the edge line length on each side, with the inclined shape (interior angle 80°) always by 18%.”;
Proposal to incorporate two telephony symbols into Unicode by glyph and annotation changes, Karl Pentzlin, 2013 (English): "The viewdata square, as its name implies, is introduced anyway as a character for “Viewdata” which is an application related to telephony introduced in the 1980s. It can be presumed that it had to be in fact the same symbol as the E.161 symbol. However, the proportions of its representative glyph are not within the constraints given in E.161.";
ITU-T Recommendation E.161: Arrangement of digits, letters and symbols on telephones and other devices that can be used for gaining access to a telephone network, 3.2.2 12 push buttons, symbols, pp. 3-4, ITU, 2001 (English)
- ↑ a b ITU-T Recommendation T.101: International interworking for Videotex services, I.1.2.7 Miscellaneous, p. 76, ITU, 1994 (English): "SM12 Central horizontal bar jointive"
-
↑ ż, Wiktionary: “As a typographical variant there is ƶ / Ƶ. However, this is usually only used if the whole word is written in capitals and there is no longer enough space for the dot above the Z.”;
Teletext mappings, Marcin “Qrczak” Kowalczyk, 2001 (English): “In Polish capital Z with dot above is sometimes rendered with stroke instead of the dot. It's just a glyph variant, the meaning is exactly the same. The letter should be consistently encoded as Z WITH DOT ABOVE even if it's rendered with a stroke.”
-
↑ a b Comma below (diacritic), encoding, Wikipedia: “Until the early 1990s, no distinction was made between the comma and the cedilla in international standards. [...] Only later did the view prevail that these are two different diacritics. Today, Unicode contains both S and T with cedilla and S and T with comma.”;
ISO/IEC 6937:2001, Table 4 - Specification of the repertoire, pp. 15 and 18, ISO/IEC, 2001 (English): "NOTE 2: The letters used in the Romanian language LATIN CAPITAL LETTER S WITH COMMA BELOW and LATIN CAPITAL LETTER T WITH COMMA BELOW are different from the LATIN CAPITAL LETTER S WITH CEDILLA and LATIN CAPITAL LETTER T WITH CEDILLA. However, subject to the agreement of originator and receiver in information interchange, the letters WITH CEDILLA may be used to substitute for the letters WITH COMMA BELOW." and "NOTE 5: The letters used in the Romanian language LATIN SMALL LETTER S WITH COMMA BELOW and LATIN SMALL LETTER T WITH COMMA BELOW are different from the LATIN SMALL LETTER S WITH CEDILLA and LATIN SMALL LETTER T WITH CEDILLA. However, subject to the agreement of originator and receiver in information interchange, the letters WITH CEDILLA may be used to substitute for the letters WITH COMMA BELOW.";
Cedillas and commas below, Eric Muller, Adobe, 2013 (English);
Comments on cedilla and comma below (revision 2), Denis Moyogo Jacquerye, 2013 (English);
Romanian diacritic marks, Cristian Kit Paul, 2008 (English)
- ↑ Overline, Available Characters, Wikipedia: “In several character sets of the ISO 8859 family of standards and, derived from them, in the Unicode standard, there is a character U+00AF (175 dec) that can be used both as an overline and as a macron. [...] This character, which partly for that reason is often incorrectly referred to as a "macron", is not to be confused with the other Unicode characters of that name. The characters at the code points U+02C9 (modifier letter macron) and U+0304 (combining macron) are significantly shorter than their overline counterparts.”
- ↑ The modern library, 10.2.4 and 10.2.5 character set sorting (alphabetization), pp. 229-232, Rudolf Frankenberger and Klaus Haller, 2004
-
↑ Trema, Unicode, Wikipedia: “Most standards for character sets, including Unicode, do not differentiate between umlaut and trema. If a distinction between umlaut and trema is necessary in data processing, ISO/IEC JTC 1/SC 2/WG 2 recommends the following:
- Representation of the trema by: Combining Grapheme Joiner (CGJ, 034F) + Combining Diaeresis (0308)
- Representation of the umlaut by: Combining Diaeresis (0308)”
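The two representations quoted in the note above can be sketched in Python. This is only an illustration: the helper names `with_umlaut` and `with_trema` are hypothetical, and only the code points U+0308 and U+034F are taken from the quoted recommendation. The sketch also shows one practical consequence: NFC normalization composes the plain sequence into a precomposed letter but leaves the CGJ sequence untouched.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"        # U+0308, named in the quoted note
COMBINING_GRAPHEME_JOINER = "\u034F"  # U+034F (CGJ), named in the quoted note


def with_umlaut(base: str) -> str:
    """Hypothetical helper: base letter + U+0308 (umlaut reading)."""
    return base + COMBINING_DIAERESIS


def with_trema(base: str) -> str:
    """Hypothetical helper: base letter + CGJ + U+0308 (trema reading)."""
    return base + COMBINING_GRAPHEME_JOINER + COMBINING_DIAERESIS


def codepoints(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


umlaut_nfc = unicodedata.normalize("NFC", with_umlaut("u"))
trema_nfc = unicodedata.normalize("NFC", with_trema("u"))

# The umlaut sequence composes to a single precomposed letter,
# while the CGJ keeps the trema sequence as three code points.
print(codepoints(umlaut_nfc))
print(codepoints(trema_nfc))
```

Because the CGJ survives normalization, software that needs the distinction can still tell the two readings apart after the text has been normalized.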
- ↑ Unicode Technical Note #27 - Known Anomalies in Unicode Character Names, Unicode, 2017 (English)
- ↑ CCITT Recommendation T.61: Character repertoire and coded character sets for the international teletex service, 3.2.3.9 Non-spacing characters, p. 13, ITU, 1988 (English): "Note - The Non-spacing underline character is never used individually but always in combination with some other graphic character to represent the graphic rendition “underlined” for the associated character. The non-spacing underline character can be used in combination with any graphic character of the repertoire, including an accented letter or an umlaut, or space. It is recommended to implement the “underline” function by means of the control function SGR (4) instead of the “non-spacing underline” graphic character."
- ↑ Proportionality Symbol, Doctor Peterson, 2003 (English): “If you prefer to describe it by its appearance rather than strictly by its usage, you might call it an "open alpha" or "loose alpha," rather than "fishy alpha." People do often describe it (wrongly) as an alpha, but I haven't seen these modifiers used anywhere.”
-
↑ ʼn, Miscellaneous, Wikipedia (English): "The upper case, or majuscule form has never been included in any international keyboards. Therefore, it is decomposable by simply combining ʼ (U+02BC) and N. 〔ʼN〕";
Unicode 10.0 Character Code Charts, Latin Extended-A, 0149 ʼn LATIN SMALL LETTER N PRECEDED BY APOSTROPHE, Unicode, 2017 (English): "uppercase is 02BC ʼ 004E N"
- ↑ Kra (letter), Wikipedia (English): “The letter can be capitalized as Kʼ, but it is not encoded separately as a single letter because it is very similar to the Latin capital letter K followed by an apostrophe, preferably the modifier letter apostrophe, U+02BC ʼ modifier letter apostrophe (HTML &#700;).”;
Status of Mapping between Characters of ISO 5426-2 and ISO/IEC 10646-1 (UCS), 4. ADDITIONAL MAPPINGS, 63 LATIN CAPITAL LETTER KRA, p. 5, Joan M. Aliprand, 2002 (English): “The capital form of the letter kra can be encoded as the sequence U+004B LATIN CAPITAL LETTER K followed by U+02BC MODIFIER LETTER APOSTROPHE.”
- ↑ Unicode 10.0 Character Code Charts, Latin Extended-A, 0131 ı LATIN SMALL LETTER DOTLESS I, Unicode, 2017 (English): "uppercase is 0049 I"
-
↑ ß, capitalization and special features of use, as well as capital ß, capital letters without capital ß, Wikipedia;
Unicode 10.0 Character Code Charts, C1 Controls and Latin-1 Supplement, 00DF ß LATIN SMALL LETTER SHARP S, Unicode, 2017 (English): 'uppercase is “SS”'
- ↑ Capital ß, Wikipedia: “At the beginning of 2008 the capital ß was included as a new character in the international Unicode standard for computer character sets; since June 29, 2017, the ẞ has been part of the official German orthography.”
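The uppercase mappings quoted from the Unicode code charts in the notes above (for ŉ, ı and ß) can be checked with a short Python sketch. It assumes a Python 3 interpreter whose `str.upper()` applies Unicode full case mappings; the dictionary keys and the `codepoints` helper are illustrative names only. Kra (ĸ) is included for contrast: it is expected to come back unchanged, which fits the statement above that its capital form is not encoded as a single letter.

```python
def codepoints(text: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Lowercase letters discussed in the notes above (illustrative labels).
letters = {
    "n preceded by apostrophe": "\u0149",
    "dotless i": "\u0131",
    "sharp s": "\u00DF",
    "kra": "\u0138",
}

# Apply Python's built-in uppercase conversion to each letter.
uppercased = {name: ch.upper() for name, ch in letters.items()}

for name, ch in letters.items():
    print(f"{name}: {codepoints(ch)} -> {codepoints(uppercased[name])}")
```

Two of the results are longer than their input, so code that assumes uppercasing preserves string length (for example, fixed-width teletext rows) would need to account for this.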
- ↑ a b I with grave (Cyrillic), Bulgarian and Macedonian, Wikipedia (English): “When not available, the character ⟨ѝ⟩ is often replaced by an ordinary ⟨и⟩ (not recommended, but still orthographically correct) or in Bulgarian by the letter ⟨й⟩ (formally this is considered a spelling error).”
- ↑ a b Tonos, Wikipedia: “In some fonts the tonos is vertical, that is, in a 'neutral' position in contrast to the acute inclined to the right and the grave inclined to the left; sometimes it is just a dot, a triangle standing on its point, or similar. This custom dates back to the 1970s, i.e. to the time before the official introduction of monotonic orthography by the Greek government, when orthography reformers used a 'neutral' accent of this kind, which had to differ from the existing accents of polytonic orthography. With the official introduction of the monotonic orthography by the Greek government in 1980, however, the distinction between the tonos and the polytonic accents became unnecessary, and all style specifications stipulate that the monotonic tonos is graphically identical to the polytonic acute. This is also what Unicode provides.”
- ↑ a b Arabic character tail for final Seen family (Seen, Sheen, Saad, Daad), IBM Egypt, 2001 (English)
-
↑ The Unicode Consortium on Twitter, Unicode, 2019 (English);
Proposal to add characters from legacy computers and teletext to the UCS, Doug Ewell, Rebecca Bettencourt and others, 2019 (English);
Map from Teletext G1 character set to Unicode, Rebecca Bettencourt, 2018 (English);
Map from Teletext G3 character set to Unicode, Rebecca Bettencourt, 2018 (English)
- ↑ Unicode Technical Report #25 - Unicode Support for Mathematics, 2.11 Geometrical Shapes, Unicode, 2007 (English)
-
↑ Bug Reports DVBViewer Pro / GE - Teletext with Cyrillic, Griga, 2012 (English): "PS The following screenshot from Derrick's sample (see above) shows clearly which characters originate from which source:
- White characters are from the Latin G0 Character Set (identical for all countries with a latin alphabet)
- Red characters are from the Spanish / Portuguese National Option Subset.
- Green characters added by packets X/26 are from the Latin G2 Supplementary Set."
- ↑ Siemens MEGATEXT PLUS SDA 5275-2 Delta Specification / Application Notes, 2.5.2 Example for Russian Market, p. 56, Siemens, 1998 (English): "The bit SEC_LA should be set and the secondary language should be defined to English because currently, no Russian broadcaster transmits packet X/28 or X/29."
- ↑ Philips SAA5x9x family, 9.5 The twist attribute, p. 40, Philips, 1998 (English): “In many of the character sets, the 'twist' serial attribute (code 1BH) can be used to switch to an alternate basic character code table, e.g. to change from the Hebrew alphabet to the Arabic alphabet on an Arab / Hebrew device.”
- ↑ Philips SAA5x9x family, 9.5 The twist attribute, p. 40, Philips, 1998 (English): “In many of the character sets, the 'twist' serial attribute (code 1BH) can be used to switch to an alternate basic character code table [...]. For some national option languages the alternate code table is the default, and a twist control character will switch to the first code table.”