Teletext character sets (ETSI EN 300 706)

The following tables describe the 7-bit character sets defined in ETSI EN 300 706, the teletext standard used in Europe.

General

The first 32 positions (00hex to 1Fhex) of the character sets are not defined. On a basic Level 1 teletext page, however, these codes are used as control characters.

Character 24hex represents the general currency sign (¤) in the Latin G0 "Standard" primary character set and the dollar sign ($) in the other G0 primary character sets.

In the G0 primary character sets, character 2Ahex represents either the asterisk (*) or the at sign (@), depending on the control.

The filled rectangle at position 7Fhex in the G0 primary character sets and in some G2 supplementary character sets is as large as the maximum extent of all letters without descenders. It has no fixed Unicode assignment; in DOS character sets it corresponds to character FEhex (■), which many software decoders also use for it. The exact appearance of the Unicode character depends heavily on the font, but at least in the "Courier" font family the black square (■) with the Unicode number 25A0hex largely matches the example layout given in ETSI EN 300 706. In the Arabic G0 primary character set, however, the rectangle is shown slightly shorter than the Arabic letter alif maqṣūra (ﻯ) at position 70hex, a detail that not all decoders reproduce either.

The G2 supplementary character sets and the G3 character set "high-resolution graphics" are supported from teletext presentation Level 1.5 onwards. Many Level 1.5 decoders still support only part of the repertoire of these character sets.

Legend

A	Γ	Basic alphabet letter (Latin / non-Latin script)
ß	ά	Special letter or addition
`	΄	Diacritical mark (single)
ò		Diacritical mark (combining)
2	٢	Digit of the number system
½		Numeral
@	₪	Punctuation marks or special characters
o̲		Combining special character
▌	◣	Graphic or frame element (defined / not defined in Unicode)
␠	_RLM	Spaces or control characters
		Undefined character
| ¦		Characters with layout variations (often due to the low resolution or for historical reasons)
₄₁	₄₁	See notes on the table (unique / different encodings)
Α A	ﺏ ﺐ	Context-dependent meaning (identical layout / suitable form)
У (Y)	ﺁ (ﺂ)	Context-dependent meaning (different layout / missing form)
Ë | $		Different encodings (depending on the control or the decoder)

For characters without a Unicode assignment ("N / A"), a descriptive name based on the names of similar Unicode characters is used.

Latin

The Latin G0 ("Standard" variant) and G2 character sets are essentially identical to the 8-bit character set ISO 6937-2:1983/Add 1:1989 (ISO-IR-142), supplemented by the two characters A6hex (#) and A8hex (¤) from the equivalent 8-bit character set ITU T.61 (see also the current version, ISO 6937:2001). The G2 supplementary character set corresponds to characters A0hex to FFhex.

Latin G0 primary character set (European)
Selection bits: see national variants
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	! 0021 21	" ” 0022 22	# ⋕ 0023 23	¤ 00A4 24	% 0025 25	& 0026 26	' ’ 0027 27	( 0028 28	) 0029 29	* ∗ | @ 002A | 0040 2A	+ 002B 2B	, 002C 2C	- 002D 2D	. 002E 2E	/ 002F 2F
3_	0 0030 30	1 0031 31	2 0032 32	3 0033 33	4 0034 34	5 0035 35	6 0036 36	7 0037 37	8 0038 38	9 0039 39	: 003A 3A	; 003B 3B	< 003C 3C	= 003D 3D	> 003E 3E	? 003F 3F
4_	@ 0040 40	A 0041 41	B 0042 42	C 0043 43	D 0044 44	E 0045 45	F 0046 46	G 0047 47	H 0048 48	I 0049 49	J 004A 4A	K 004B 4B	L 004C 4C	M 004D 4D	N 004E 4E	O 004F 4F
5_	P 0050 50	Q 0051 51	R 0052 52	S 0053 53	T 0054 54	U 0055 55	V 0056 56	W 0057 57	X 0058 58	Y 0059 59	Z 005A 5A	[ 005B 5B	\ 005C 5C	] 005D 5D	^ 005E 5E	_ 005F 5F
6_	` ‵ 0060 60	a 0061 61	b 0062 62	c 0063 63	d 0064 64	e 0065 65	f 0066 66	g 0067 67	h 0068 68	i 0069 69	j 006A 6A	k 006B 6B	l 006C 6C	m 006D 6D	n 006E 6E	o 006F 6F
7_	p 0070 70	q 0071 71	r 0072 72	s 0073 73	t 0074 74	u 0075 75	v 0076 76	w 0077 77	x 0078 78	y 0079 79	z 007A 7A	{ 007B 7B	| ¦ 007C 7C	} 007D 7D	~ ~ 007E 7E	■ 25A0 7F
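As an illustration, the table above can be turned into a small decoder. This is a minimal sketch, assuming the input is a sequence of 7-bit codes and that codes 00hex to 1Fhex (control characters on a Level 1 page) are simply shown as spaces. The function name `decode_latin_g0_standard` and its `at_sign` flag, which stands in for the control that selects @ at 2Ahex, are hypothetical.

```python
def decode_latin_g0_standard(codes: bytes, at_sign: bool = False) -> str:
    """Map 7-bit teletext codes to Unicode using the Latin G0 "Standard" set.

    Sketch only: control codes 0x00-0x1F are rendered as spaces, and the
    hypothetical flag `at_sign` stands in for the control that selects @ at 0x2A.
    """
    chars = []
    for code in codes:
        if code > 0x7F:
            raise ValueError(f"not a 7-bit code: {code:#04x}")
        if code < 0x20:
            chars.append(" ")         # control character on a Level 1 page
        elif code == 0x24:
            chars.append("\u00a4")    # general currency sign instead of $
        elif code == 0x2A and at_sign:
            chars.append("@")         # 0x2A: asterisk or at sign
        elif code == 0x7F:
            chars.append("\u25a0")    # filled rectangle -> BLACK SQUARE
        else:
            chars.append(chr(code))   # all other positions match ASCII
    return "".join(chars)


print(decode_latin_g0_standard(b"Cost $5 \x7f"))  # Cost ¤5 ■
```

Apart from 24hex and 7Fhex, every position of the "Standard" variant uses the same code point as ASCII, which is why the fallback branch can simply call `chr`.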

Character 7Fhex (■) is encoded differently from ISO 6937.

In the example layout of ETSI EN 300 706, the double quotation mark (") at position 22hex is shown in the typographically correct form of an English closing quotation mark (”), which has the Unicode number 201Dhex. The character should nevertheless be encoded as the neutral variant, as in ISO 6937, so that it remains visually and semantically suitable as an English opening quotation mark (“) as well. In addition, the typographically correct closing quotation mark also appears, with a different example layout, at position 3Ahex in the Latin G2 supplementary character set.

In the example layout of ETSI EN 300 706, the number sign (#) at position 23hex is shown with vertical strokes, although this is only a layout variation, probably due to the low resolution.

In the example layout of ETSI EN 300 706, the apostrophe (') at position 27hex is shown in its typographically correct form. It could therefore also be encoded with the visually more suitable alternative Unicode characters English closing single quotation mark (’) with the Unicode number 2019hex or modifier letter apostrophe (ʼ) with the Unicode number 02BChex, but both would differ from ISO 6937 and would be neither visually nor semantically suitable as an English opening single quotation mark (‘). In addition, the typographically correct closing single quotation mark also appears, with a different example layout, at position 39hex in the Latin G2 supplementary character set.

The encoding of character 2Ahex depends on the control.

In the example layout of ETSI EN 300 706, the asterisk (*) at position 2Ahex is shown large, six-pointed, standing on one of its rays and vertically centered. It could therefore also be encoded with the visually more suitable alternative Unicode character asterisk operator (∗) with the Unicode number 2217hex, which would differ from ISO 6937.

According to EBU Tech 3232-a and ITU T.61, the hyphen-minus (-) at position 2Dhex can also be encoded, depending on context, as a hyphen (‐) with the Unicode number 2010hex or as a minus sign (−) with the Unicode number 2212hex. The character can also be used as an en dash (–) with the Unicode number 2013hex. For the English em dash with the Unicode number 2014hex, however, it is better to use the horizontal bar (―) at position 60hex in the "English" variant or at position 50hex in the Latin G2 supplementary character set, or two consecutive hyphen-minus characters.

The capital letter I at position 49hex can serve as the uppercase form of the lowercase letter i at position 69hex and also of the dotless lowercase letter ı, which is at position 60hex in the "Turkish" variant, at position 5Fhex in the "Romanian" variant and at position 75hex in the Latin G2 supplementary character set. Likewise, the lowercase letter i at position 69hex can serve as the lowercase form of the capital letter I at position 49hex and also of the dotted capital letter İ at position 40hex in the "Turkish" variant, as well as of the corresponding combination in the Latin G2 supplementary character set. Unicode does not distinguish between the two visually identical characters either.
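Python's built-in `str.upper()` and `str.lower()` apply Unicode's default, language-independent case mapping, which shows that I and i are shared by the dotted and the dotless letter. The two helpers `turkish_upper` and `turkish_lower` are hypothetical: a minimal sketch of how text could be case-converted when the Turkish distinction matters. They are not a standard library API.

```python
# Unicode default case mapping: I/i are shared by the dotted and dotless forms.
assert "\u0131".upper() == "I"          # dotless ı -> I
assert "I".lower() == "i"               # I -> dotted i, so ı does not round-trip
assert "\u0130".lower() == "i\u0307"    # İ -> i + COMBINING DOT ABOVE


def turkish_upper(text: str) -> str:
    """Hypothetical helper: uppercase with Turkish rules (i -> İ, ı -> I)."""
    return text.replace("i", "\u0130").upper()


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase with Turkish rules (I -> ı, İ -> i)."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


print(turkish_upper("istanbul ılık"))  # İSTANBUL ILIK
print(turkish_lower("IŞIK İZMİR"))     # ışık izmir
```

Replacing the letters before calling the built-in methods is the simplest approach here; a locale-aware library would be the more robust choice for general text.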

In the example layout of ETSI EN 300 706, the circumflex (^) at position 5Ehex is shown large and superscript, as is also common in modern printed matter.

In the example layout of ETSI EN 300 706, the underscore (_) at position 5Fhex is not shown connecting to its neighbors on the left and right, which is unusual in modern publications.

In the example layout of ETSI EN 300 706, the spacing grave accent (`) at position 60hex has the size and height of a mirrored counterpart to the typographically correct apostrophe (’) at position 27hex, but keeps the straight shape and slant of a grave accent. The character could nevertheless also be used as a single high-reversed-9 quotation mark (‛), a form of opening quotation mark, with the Unicode number 201Bhex, but this would differ from ISO 6937 and would not fit semantically.

In the example layout of ETSI EN 300 706, the vertical line (|) at position 7Chex is shown broken in the middle (and not connecting at the top and bottom). It could therefore also be encoded with the visually more appropriate alternative Unicode character broken bar (¦) with the Unicode number 00A6hex, which would differ from ISO 6937. However, this is only a historically determined layout variation.

In the example layout of ETSI EN 300 706, the tilde (~) at position 7Ehex is shown large and superscript; in this form it is not defined as a separate character in Unicode. The spacing small tilde (˜) with the Unicode number 02DChex matches the height but is too small. According to EBU Tech 3232-a and ITU T.101, the Unicode character overline (‾) with the Unicode number 203Ehex or possibly the spacing macron (¯) with the Unicode number 00AFhex could be used as alternative encodings, but both would differ from ISO 6937 and, unlike in ITU T.101, usually connect to the left and right.

The encoding of the characters at positions 23hex, 24hex, 40hex, 5Bhex to 60hex and 7Bhex to 7Ehex depends on the control and on the selected national variant.

Latin G0 primary character set - national variants
	0_	1_	2_	3_	4_	6_	8_	23	24	40	5B	5C	5D	5E	5F	60	7B	7C	7D	7E
	Selection bits G2 = Arabic G2							23	24	40	5B	5C	5D	5E	5F	60	7B	7C	7D	7E
default								# ⋕ 0023 23	¤ 00A4 24	@ 0040 40	[ 005B 5B	\ 005C 5C	] 005D 5D	^ 005E 5E	_ 005F 5F	` ‵ 0060 60	{ 007B 7B	| ¦ 007C 7C	} 007D 7D	~ ~ 007E 7E
Czech / Slovak	06	16			46			# ⋕ 0023 23	ů 016F 24	č 010D 40	ť tˇ 0165 5B	ž 017E 5C	ý 00FD 5D	í 00ED 5E	ř 0159 5F	é 00E9 60	á 00E1 7B	ě 011B 7C	ú 00FA 7D	š 0161 7E
English	00		20				80 G2	£ 00A3 23	$ 0024 24	@ 0040 40	← 2190 5B	½ 00BD 5C	→ 2192 5D	↑ 2191 5E	# ⋕ 0023 5F	― 2015 60	¼ 00BC 7B	∥ 2225 7C	¾ 00BE 7D	÷ 00F7 7E
Estonian					42			# ⋕ 0023 23	õ 00F5 24	Š 0160 40	Ä 00C4 5B	Ö 00D6 5C	Ž 017D 5D	Ü 00DC 5E	Õ 00D5 5F	š 0161 60	ä 00E4 7B	ö 00F6 7C	ž 017E 7D	ü 00FC 7E
French	04	14	24				84 G2	é 00E9 23	ï 00EF 24	à 00E0 40	ë 00EB 5B	ê 00EA 5C	ù 00F9 5D	î 00EE 5E	# ⋕ 0023 5F	è 00E8 60	â 00E2 7B	ô 00F4 7C	û 00FB 7D	ç 00E7 7E
German	01	11	21		41			# ⋕ 0023 23	$ 0024 24	§ 00A7 40	Ä 00C4 5B	Ö 00D6 5C	Ü 00DC 5D	^ 005E 5E	_ 005F 5F	° 00B0 60	ä 00E4 7B	ö 00F6 7C	ü 00FC 7D	ß 00DF 7E
Italian	03	13	23					£ 00A3 23	$ 0024 24	é 00E9 40	° 00B0 5B	ç 00E7 5C	→ 2192 5D	↑ 2191 5E	# ⋕ 0023 5F	ù 00F9 60	à 00E0 7B	ò 00F2 7C	è 00E8 7D	ì 00EC 7E
Latvian / Lithuanian					43			# ⋕ 0023 23	$ 0024 24	Š 0160 40	ė 0117 5B	ę 0119 5C	Ž 017D 5D	č 010D 5E	ū 016B 5F	š 0161 60	ą 0105 7B	ų 0173 7C	ž 017E 7D	į 012F 7E
Polish		10						# ⋕ 0023 23	ń 0144 24	ą 0105 40	Ż Ƶ 017B 5B	Ś 015A 5C	Ł 0141 5D	ć 0107 5E	ó 00F3 5F	ę 0119 60	ż 017C 7B	ś 015B 7C	ł 0142 7D	ź 017A 7E
Portuguese / Spanish	05		25					ç 00E7 23	$ 0024 24	¡ 00A1 40	á 00E1 5B	é 00E9 5C	í 00ED 5D	ó 00F3 5E	ú 00FA 5F	¿ 00BF 60	ü 00FC 7B	ñ 00F1 7C	è 00E8 7D	à 00E0 7E
Romanian				37				# ⋕ 0023 23	¤ 00A4 24	Ț 021A 40	Â 00C2 5B	Ș 0218 5C	Ă 0102 5D	Î 00CE 5E	ı 0131 5F	ț 021B 60	â 00E2 7B	ș 0219 7C	ă 0103 7D	î 00EE 7E
Serbian / Croatian / Slovenian				35				# ⋕ 0023 23	Ë 00CB 24	Č 010C 40	Ć 0106 5B	Ž 017D 5C	Đ 0110 5D	Š 0160 5E	ë 00EB 5F	č 010D 60	ć 0107 7B	ž 017E 7C	đ 0111 7D	š 0161 7E
Swedish / Finnish, Hungarian	02	12	22					# ⋕ 0023 23	¤ 00A4 24	É 00C9 40	Ä 00C4 5B	Ö 00D6 5C	Å 00C5 5D	Ü 00DC 5E	_ 005F 5F	é 00E9 60	ä 00E4 7B	ö 00F6 7C	å 00E5 7D	ü 00FC 7E
Turkish			26			66		Tʟ N / A 23	ğ 011F 24	İ 0130 40	Ş 015E 5B	Ö 00D6 5C	Ç 00C7 5D	Ü 00DC 5E	Ğ 011E 5F	ı 0131 60	ş 015F 7B	ö 00F6 7C	ç 00E7 7D	ü 00FC 7E
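The national variants can be modeled as an overlay on the 13 positions listed in the table header. This is a minimal sketch that transcribes only four variants from the table; the function `decode_latin_g0` and the variant keys are hypothetical names, and control codes and the 2Ahex switch are deliberately left out.

```python
# The 13 positions whose meaning depends on the national variant.
NATIONAL_POSITIONS = (0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E,
                      0x5F, 0x60, 0x7B, 0x7C, 0x7D, 0x7E)

# Characters per variant, in the order of NATIONAL_POSITIONS
# (transcribed from the table above; the keys are hypothetical names).
NATIONAL_SUBSETS = {
    "czech_slovak": "#ůčťžýířéáěúš",
    "english": "£$@←½→↑#\u2015¼\u2225¾÷",
    "german": "#$§ÄÖÜ^_°äöüß",
    "swedish_finnish_hungarian": "#¤ÉÄÖÅÜ_éäöåü",
}
assert all(len(s) == len(NATIONAL_POSITIONS) for s in NATIONAL_SUBSETS.values())


def decode_latin_g0(codes: bytes, variant: str) -> str:
    """Decode printable codes 0x20-0x7F with the given national variant."""
    overlay = dict(zip(NATIONAL_POSITIONS, NATIONAL_SUBSETS[variant]))
    chars = []
    for code in codes:
        if not 0x20 <= code <= 0x7F:
            raise ValueError(f"not a printable 7-bit code: {code:#04x}")
        if code in overlay:
            chars.append(overlay[code])   # variant-specific character
        elif code == 0x7F:
            chars.append("\u25a0")        # filled rectangle
        else:
            chars.append(chr(code))       # shared with ASCII
    return "".join(chars)


print(decode_latin_g0(b"Gr|~e", "german"))  # Größe
print(decode_latin_g0(b"#5", "english"))    # £5
```

The same byte string therefore decodes to different text depending on the variant, which is why a decoder must know the national variant before it can render a page.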

In the national variants, ETSI EN 300 706 shows the háček (ˇ) and the breve (˘) on the special letters identically and thus imprecisely. The languages of the three variants "Czech / Slovak", "Latvian / Lithuanian" and "Serbian / Croatian / Slovenian" use only the háček, while the languages of the two variants "Romanian" and "Turkish" use only the breve. The letters concerned are encoded accordingly in each variant.

In the "Czech / Slovak" variant, ETSI EN 300 706 shows the lowercase letter t with háček (ť) at position 5Bhex with the háček (ˇ) in its normal form, whereas modern print usually renders it as an apostrophe-like mark (ʼ) to the right of the base letter. The encoding is the same, since this is only a layout variation.

The "English" variant is essentially identical to the 7-bit character set of the British Viewdata standard (ISO-IR-47); only character 5Fhex (#) is encoded differently.

In the example layout of ETSI EN 300 706, the two arrows pointing left (←) and right (→) at positions 5Bhex and 5Dhex are drawn to match the horizontal bar (―) at position 60hex, so that they can be joined to it seamlessly. In such a combination, the horizontal bar should be encoded, semantically more appropriately, as a horizontal line extension (⎯) with the Unicode number 23AFhex, although very few fonts currently support this Unicode character (correctly).

ETSI EN 300 706 shows the double cross (#) at position 5Fhex in the same way as the number sign at position 23hex in the "Standard" variant, and it is therefore encoded identically. In the Viewdata standard, this character is encoded as the viewdata square (⌗) with the Unicode number 2317hex, which looks similar but, when rendered correctly, different (see ISO-IR-47). It also has a different semantic meaning there, as a terminator for addresses, which does not apply in teletext.

The horizontal bar (―) at position 60hex can also be used as an English em dash with the Unicode number 2014hex and is shown connecting left and right in the example layout of ETSI EN 300 706.

The vertical double line at position 7Chex is encoded as the parallel-to sign (∥) in accordance with EBU Tech 3232-a and is not shown connecting at the top and bottom in the example layout of ETSI EN 300 706. Following the character's name in the Viewdata standard, the visually identical Unicode character double vertical line (‖) with the Unicode number 2016hex can also be used; according to RFC 1345, however, the character is encoded as the parallel-to sign there as well. Regardless of the primary encoding, the character can serve equally as a parallel sign and as a double vertical line.

The "German" variant is essentially identical to the German 7-bit character set DIN 66003 (ISO-IR-21); only character 60hex (°) is encoded differently.

In the "Latvian / Lithuanian" variant, ETSI EN 300 706 probably shows the two lowercase letters e with ogonek (ę) and i with ogonek (į) at positions 5Chex and 7Ehex incorrectly with a cedilla (¸), since Latvian and Lithuanian never use these letters with a cedilla, only with an ogonek (˛). No alternative encoding is needed, because the incorrectly depicted letters are not used in any European language and should therefore never occur.

In the "Polish" variant, ETSI EN 300 706 shows the capital letter Z with dot above (Ż) at position 5Bhex as Z with stroke (Ƶ), but it is usually not encoded that way, since this is only a layout variation. Moreover, the corresponding lowercase letter at position 7Bhex is shown in ETSI EN 300 706 as z with dot above (ż).

In the "Romanian" variant, the letters T with comma below (Ț / ț) and S with comma below (Ș / ș) at positions 40hex / 60hex and 5Chex / 7Chex are encoded with the comma below (̦), as specified by the Romanian standardization body (see also ISO 8859-16). Until the early 1990s, however, international standards regarded these letters merely as layout variations of T with cedilla (Ţ / ţ) and S with cedilla (Ş / ş), and ISO 6937 contains only the letters with cedilla (¸).
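The difference between the two forms is visible in Unicode's canonical decompositions, which Python exposes through the standard `unicodedata` module. The conversion table and `to_comma_below` are a hypothetical sketch for Romanian text that arrives in the older cedilla form; they are not part of any standard.

```python
import unicodedata

# Comma-below letters decompose to U+0326, cedilla letters to U+0327.
assert unicodedata.decomposition("\u0218") == "0053 0326"  # Ș
assert unicodedata.decomposition("\u015e") == "0053 0327"  # Ş

# Hypothetical mapping from the cedilla forms to the comma-below forms.
CEDILLA_TO_COMMA_BELOW = str.maketrans({
    "\u015e": "\u0218", "\u015f": "\u0219",  # Ş ş -> Ș ș
    "\u0162": "\u021a", "\u0163": "\u021b",  # Ţ ţ -> Ț ț
})


def to_comma_below(text: str) -> str:
    """Replace S/T with cedilla by S/T with comma below (Romanian usage)."""
    return text.translate(CEDILLA_TO_COMMA_BELOW)


print(to_comma_below("ştiinţă"))  # știință
```

A blanket replacement like this is only appropriate for Romanian text; Turkish, for example, uses Ş / ş with a genuine cedilla.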

In the "Serbian / Croatian / Slovenian" variant, some decoders display character 24hex not as the capital letter E with diaeresis (Ë) but as the dollar sign ($) with the Unicode number 0024hex or as the fraction one half (½) with the Unicode number 00BDhex.

The "Swedish / Finnish, Hungarian" variant is identical to the Swedish 7-bit character set SEN 850200 Annex C (ISO-IR-11).

In the "Turkish" variant, the symbol for the Turkish currency (Tʟ) at position 23hex exists in this form only in teletext; elsewhere it is normally written as the two capital letters TL. Unicode does, however, offer several currency symbols that can be used for the Turkish currency: the Turkish lira sign (₺) with the Unicode number 20BAhex, the lira sign (₤) with the Unicode number 20A4hex and the pound sign (£) with the Unicode number 00A3hex.

Latin G2 Supplementary Character Set (European)
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	¡ 00A1 21	¢ 00A2 22	£ 00A3 23	$ 0024 24	¥ 00A5 25	# ⋕ 0023 26	§ 00A7 27	¤ 00A4 28	‘ 2018 29	“ 201C 2A	« 00AB 2B	← 2190 2C	↑ 2191 2D	→ 2192 2E	↓ 2193 2F
3_	° 00B0 30	± 00B1 31	² 00B2 32	³ 00B3 33	× 00D7 34	µ 00B5 35	¶ 00B6 36	· 00B7 37	÷ 00F7 38	’ 2019 39	” 201D 3A	» 00BB 3B	¼ 00BC 3C	½ 00BD 3D	¾ 00BE 3E	¿ 00BF 3F
4_	40	` 0060 41	´ 00B4 42	ˆ 02C6 43	˜ 02DC 44	¯ ˉ 00AF 45	˘ 02D8 46	˙ 02D9 47	¨ 00A8 48	̣ N / A 49	˚ 02DA 4A	¸ (̦) 00B8 ( N / A ) 4B	_ 005F 4C	˝ 02DD 4D	˛ 02DB 4E	ˇ 02C7 4F
Comb.	40	ò 0300 41	ó (ģ) 0301 ( 0327 ) 42	ô 0302 43	õ 0303 44	ō 0304 45	ŏ 0306 46	ȯ 0307 47	ö 0308 48	ọ 0323 49	å 030A 4A	ç (o̦) 0327 ( 0326 ) 4B	o̲ 0332 4C	ő 030B 4D	ǫ 0328 4E	ǒ 030C 4F
5_	― 2015 50	¹ 00B9 51	® 00AE 52	© 00A9 53	™ 2122 54	♪ 266A 55	₠ 20A0 56	‰ 2030 57	∝ 221D 58	59	5A	5B	⅛ 215B 5C	⅜ 215C 5D	⅝ 215D 5E	⅞ 215E 5F
6_	Ω 2126 60	Æ 00C6 61	Đ Ð 0110 00D0 62	ª 00AA 63	Ħ 0126 64	65	Ĳ 0132 66	Ŀ 013F 67	Ł 0141 68	Ø 00D8 69	Œ 0152 6A	º 00BA 6B	Þ 00DE 6C	Ŧ 0166 6D	Ŋ 014A 6E	ŉ 0149 6F
7_	ĸ 0138 70	æ 00E6 71	đ 0111 72	ð 00F0 73	ħ 0127 74	ı 0131 75	ĳ 0133 76	ŀ 0140 77	ł 0142 78	ø 00F8 79	œ 0153 7A	ß 00DF 7B	þ 00FE 7C	ŧ 0167 7D	ŋ 014B 7E	■ 25A0 7F

The six characters 20hex (space), 49hex (̣), 56hex (₠), 57hex (‰), 58hex (∝) and 7Fhex (■) are encoded differently from ISO 6937 and ITU T.61.

In accordance with ISO 6937, the space at position 20hex can also be encoded as a no-break space with the Unicode number 00A0hex. Line-breaking behavior is, however, irrelevant in teletext.

In the example layout of ETSI EN 300 706, the two arrows pointing left (←) and right (→) at positions 2Chex and 2Ehex are drawn to match the horizontal bar (―) at position 50hex, so that they can be joined to it seamlessly. In such a combination, the horizontal bar should be encoded, semantically more appropriately, as a horizontal line extension (⎯) with the Unicode number 23AFhex, although very few fonts currently support this Unicode character (correctly).

ETSI EN 300 706 shows the spacing grave accent (`) at position 41hex with a different example layout than in the Latin G0 "Standard" primary character set, so it could also be encoded with the alternative Unicode character modifier letter grave accent (ˋ) with the Unicode number 02CBhex. In modern printed matter, however, the two characters look identical. The spacing acute accent (´) at position 42hex could correspondingly be encoded as modifier letter acute accent (ˊ) with the Unicode number 02CAhex, but this would differ from ISO 6937.

Since ETSI EN 300 706 shows the spacing circumflex (ˆ) at position 43hex and the spacing tilde (˜) at position 44hex with a different example layout than the corresponding characters in the Latin G0 "Standard" primary character set, a more suitable alternative encoding is used for them, as in ISO 6937 (see also Windows-1252).

The appearance of the spacing macron (¯) at position 45hex also depends heavily on the font and often resembles the overline (‾). The visually more suitable alternative Unicode character modifier letter macron (ˉ) with the Unicode number 02C9hex can therefore be used, although this would differ from ISO 6937.

According to EBU Tech 3232-a and ITU T.61, the diacritical mark in the form of a horizontal colon (¨) at position 48hex can be used both as a diaeresis (trema) and as umlaut dots. Unicode does not distinguish between these two visually identical marks either. If a semantic distinction is needed, the diaeresis can be encoded as the sequence combining grapheme joiner (Unicode number 034Fhex) followed by combining diaeresis (Unicode number 0308hex), while umlaut dots are encoded in the usual way, either with combining diaeresis (Unicode number 0308hex) alone or with the precomposed Unicode letters with diaeresis. Do not be misled by the names of the Unicode characters.
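The difference shows up under Unicode normalization: a plain combining diaeresis composes with its base letter, while a combining grapheme joiner placed in between blocks the composition and so keeps the diaeresis distinguishable. A minimal sketch using Python's standard `unicodedata` module; the constant names are only for readability.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"
COMBINING_GRAPHEME_JOINER = "\u034f"

# Umlaut dots: base letter + U+0308 composes to the precomposed letter ü.
umlaut = unicodedata.normalize("NFC", "u" + COMBINING_DIAERESIS)

# Diaeresis (trema): CGJ between base and mark prevents composition.
trema = unicodedata.normalize(
    "NFC", "u" + COMBINING_GRAPHEME_JOINER + COMBINING_DIAERESIS)

print(umlaut == "\u00fc", len(umlaut))  # True 1
print(trema == "\u00fc", len(trema))    # False 3
```

Both strings usually render identically; only software that inspects the code points can tell them apart.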

Historically, the cedilla (¸) at position 4Bhex can also be used as a comma below (̦).

In the example layout of ETSI EN 300 706, the combining low line and the corresponding spacing underscore (_) at position 4Chex are not shown connecting left and right; underlining is better implemented with "underline" text formatting. Correspondingly, the underscore at position 5Fhex in the Latin G0 primary character set should be encoded as a no-break space with "underline" formatting, to avoid a doubled line and to obtain even lines. At least in the "Courier" font family, however, the underscore is visually compatible with "underline" formatting.

The horizontal bar (―) at position 50hex can also be used as an English em dash with the Unicode number 2014hex and is shown connecting left and right in the example layout of ETSI EN 300 706.

EBU Tech 3232-a probably names the proportional-to sign (∝) at position 58hex incorrectly as "alpha". It should not be confused with the Greek lowercase letter alpha (α), since ETSI EN 300 706 shows the two characters with different example layouts.

According to EBU Tech 3232-a and ISO 6937, character 62hex can be used both as the capital letter D with stroke (Đ), matching the lowercase letter of the same name (đ) at position 72hex, and as the Icelandic capital letter eth (Ð), matching the lowercase letter of the same name (ð) at position 73hex. If in doubt, the first Unicode number (0110hex), as in ISO 6937, should be chosen.

The Afrikaans character ŉ at position 6Fhex, used for the indefinite article, exists only as a lowercase letter and is normally always written in lowercase. In capitals, it is written as the capital letter N at position 4Ehex preceded by an apostrophe (ʼ) at position 27hex in the Latin G0 primary character set. Unicode does not define the capital form as a separate character either.

The Greenlandic letter kra (ĸ) at position 70hex, used in the former orthography, exists only as a lowercase letter. The corresponding capital is written as the capital letter K at position 4Bhex followed by an apostrophe (ʼ) at position 27hex in the Latin G0 primary character set, and Unicode does not define it as a separate character either.

The capital letter I at position 49hex in the Latin G0 primary character set serves as the uppercase form of the Turkish dotless lowercase letter i (ı) at position 75hex. Unicode provides for this as well (see also the note on the Latin G0 primary character set).

The German letter eszett (ß) at position 7Bhex exists only as a lowercase letter. In capitals, it is normally written as two capital letters S (position 53hex in the Latin G0 primary character set), a form that Unicode does not define as a separate character. Only in 2008 was the capital eszett (ẞ) added to Unicode as a new character, and it has been part of the official German orthography since 2017.
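Python's `str.upper()` and `str.lower()` apply Unicode's full case mappings, which makes the situation described in the notes above easy to check. The `teletext_upper` helper and its workaround table are hypothetical: a sketch, assuming the capitalization workarounds from those notes (SS, 'N and K') are wanted because the Latin G0 and G2 sets have no capital forms of these letters.

```python
# Unicode's full case mappings, as applied by Python's str methods.
assert "\u00df".upper() == "SS"       # ß -> SS
assert "\u1e9e".lower() == "\u00df"   # capital ẞ (added 2008) -> ß
assert "\u0149".upper() == "\u02bcN"  # ŉ -> ʼN (modifier apostrophe + N)
assert "\u0138".upper() == "\u0138"   # ĸ has no uppercase mapping

# Hypothetical: capitals that the teletext repertoire can actually show,
# using the apostrophe at 0x27 (U+0027) instead of U+02BC.
WORKAROUNDS = {"\u00df": "SS", "\u0149": "'N", "\u0138": "K'"}


def teletext_upper(text: str) -> str:
    """Uppercase text, spelling out letters that have no capital in G0/G2."""
    return "".join(WORKAROUNDS.get(ch, ch.upper()) for ch in text)


print(teletext_upper("straße"))  # STRASSE
```

Handling the three letters explicitly keeps the output within the teletext repertoire; for ß, the result happens to agree with Python's own mapping.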

Whether the characters are encoded as in the "Comb." (combining) row depends on the control. The supported combinations depend on the decoder; if in doubt, restrict yourself to the combinations specified in ISO 6937. Accordingly, and unlike in Unicode, the lowercase letter g with cedilla (ģ) is represented by combining the lowercase letter g with the acute accent (´) at position 42hex. In the Cyrillic and Greek G2 supplementary character sets, the combining characters should only be used in conjunction with the Latin G0 primary character set.
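Decoding such a combination into Unicode amounts to appending the combining character listed in the "Comb." row to the base letter and normalizing to NFC, with the ISO 6937 exception for ģ handled separately. A minimal sketch; `G2_COMBINING` and `compose` are hypothetical names, and which combinations a real decoder supports is not modeled.

```python
import unicodedata

# Combining characters for G2 codes 0x41-0x4F (see the "Comb." row above).
G2_COMBINING = {
    0x41: "\u0300", 0x42: "\u0301", 0x43: "\u0302", 0x44: "\u0303",
    0x45: "\u0304", 0x46: "\u0306", 0x47: "\u0307", 0x48: "\u0308",
    0x49: "\u0323", 0x4A: "\u030a", 0x4B: "\u0327", 0x4C: "\u0332",
    0x4D: "\u030b", 0x4E: "\u0328", 0x4F: "\u030c",
}


def compose(base: str, g2_code: int) -> str:
    """Combine a base letter with a G2 diacritic and normalize to NFC."""
    mark = G2_COMBINING[g2_code]
    if base == "g" and g2_code == 0x42:
        mark = "\u0327"  # ISO 6937: g + acute stands for g with cedilla
    return unicodedata.normalize("NFC", base + mark)


print(compose("e", 0x42), compose("c", 0x4F), compose("g", 0x42))  # é č ģ
```

Where Unicode has no precomposed letter (for example o with low line), NFC simply keeps the two-character sequence.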

Cyrillic

The Cyrillic G0 primary character sets are largely identical to the 7-bit character set GOST 13052 (registered as ISO-IR-111), except that the uppercase and lowercase letters are swapped so that they are arranged as in the other character sets.

Cyrillic G0 primary character set - variant 1 - Serbian / Croatian
Selection bits: 40
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	! 0021 21	" ” 0022 22	# ⋕ 0023 23	$ 0024 24	% 0025 25	& 0026 26	' ’ 0027 27	( 0028 28	) 0029 29	* ∗ | @ 002A | 0040 2A	+ 002B 2B	, 002C 2C	- 002D 2D	. 002E 2E	/ 002F 2F
3_	0 0030 30	1 0031 31	2 0032 32	3 0033 33	4 0034 34	5 0035 35	6 0036 36	7 0037 37	8 0038 38	9 0039 39	: 003A 3A	; 003B 3B	< 003C 3C	= 003D 3D	> 003E 3E	? 003F 3F
4_	Ч 0427 40	А A 0410 0041 41	Б 0411 42	Ц 0426 43	Д 0414 44	Е 0415 45	Ф 0424 46	Г 0413 47	Х X 0425 0058 48	И 0418 49	Ј 0408 4A	К 041A 4B	Л 041B 4C	М M 041C 004D 4D	Н H 041D 0048 4E	О O 041E 004F 4F
5_	П 041F 50	Ќ 040C 51	Р P 0420 0050 52	С C 0421 0043 53	Т T 0422 0054 54	У (Y) 0423 ( 0059 ) 55	В B 0412 0042 56	Ѓ 0403 57	Љ 0409 58	Њ 040A 59	З 0417 5A	Ћ 040B 5B	Ж 0416 5C	Ђ 0402 5D	Ш 0428 5E	Џ 040F 5F
6_	ч 0447 60	а a 0430 0061 61	б 0431 62	ц 0446 63	д 0434 64	е 0435 65	ф 0444 66	г 0433 67	х x 0445 0078 68	и 0438 69	ј 0458 6A	к 043A 6B	л 043B 6C	м (m) 043C ( 006D ) 6D	н (h) 043D ( 0068 ) 6E	о o 043E 006F 6F
7_	п 043F 70	ќ 045C 71	р p 0440 0070 72	с c 0441 0063 73	т (t) 0442 ( 0074 ) 74	у y 0443 0079 75	в (b) 0432 ( 0062 ) 76	ѓ 0453 77	љ 0459 78	њ 045A 79	з 0437 7A	ћ 045B 7B	ж 0436 7C	ђ 0452 7D	ш 0448 7E	■ 25A0 7F

The two characters 24hex ($) and 7Fhex (■) and twelve Cyrillic letter pairs are encoded differently from GOST 13052 and are arranged as closely as possible to the Latin G0 variant "Serbian / Croatian / Slovenian" (compare the Serbian, Serbo-Croatian and Montenegrin Cyrillic alphabets). The Cyrillic letter dzhe (Џ) at position 5Fhex is present only as a capital letter.

On some decoders, character 24hex represents, instead of the dollar sign ($), the Cyrillic capital letter io (Ё) with the Unicode number 0401hex or the Latin capital letter E with diaeresis (Ë) with the Unicode number 00CBhex.

The encoding of character 2Ahex depends on the control.

The alternative Latin encodings given for some Cyrillic letters (for example А / A at position 41hex) are needed to complete the Latin alphabet encoded in the Cyrillic G2 supplementary character set.

Cyrillic G0 primary character set - variant 2 - Russian / Bulgarian
Selection bits: 44
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	! 0021 21	" ” 0022 22	# ⋕ 0023 23	$ 0024 24	% 0025 25	ы 044B 26	' ’ 0027 27	( 0028 28	) 0029 29	* ∗ | @ 002A | 0040 2A	+ 002B 2B	, 002C 2C	- 002D 2D	. 002E 2E	/ 002F 2F
3_	0 0030 30	1 0031 31	2 0032 32	3 0033 33	4 0034 34	5 0035 35	6 0036 36	7 0037 37	8 0038 38	9 0039 39	: 003A 3A	; 003B 3B	< 003C 3C	= 003D 3D	> 003E 3E	? 003F 3F
4_	Ю 042E 40	А A 0410 0041 41	Б 0411 42	Ц 0426 43	Д 0414 44	Е 0415 45	Ф 0424 46	Г 0413 47	Х X 0425 0058 48	И 0418 49	Й (Ѝ) 0419 ( 040D ) 4A	К 041A 4B	Л 041B 4C	М M 041C 004D 4D	Н H 041D 0048 4E	О O 041E 004F 4F
5_	П 041F 50	Я 042F 51	Р P 0420 0050 52	С C 0421 0043 53	Т T 0422 0054 54	У (Y) 0423 ( 0059 ) 55	Ж 0416 56	В B 0412 0042 57	Ь 042C 58	Ъ 042A 59	З 0417 5A	Ш 0428 5B	Э 042D 5C	Щ 0429 5D	Ч 0427 5E	Ы 042B 5F
6_	ю 044E 60	а a 0430 0061 61	б 0431 62	ц 0446 63	д 0434 64	е 0435 65	ф 0444 66	г 0433 67	х x 0445 0078 68	и 0438 69	й (ѝ) 0439 ( 045D ) 6A	к 043A 6B	л 043B 6C	м (m) 043C ( 006D ) 6D	н (h) 043D ( 0068 ) 6E	о o 043E 006F 6F
7_	п 043F 70	я 044F 71	р p 0440 0070 72	с c 0441 0063 73	т (t) 0442 ( 0074 ) 74	у y 0443 0079 75	ж 0436 76	в (b) 0432 ( 0062 ) 77	ь 044C 78	ъ 044A 79	з 0437 7A	ш 0448 7B	э 044D 7C	щ 0449 7D	ч 0447 7E	■ 25A0 7F

The three characters 24hex ($), 26hex (ы) and 7Fhex (■) are encoded differently from GOST 13052; in addition, the two Cyrillic letter pairs at positions 59hex / 79hex (Ъ / ъ) and 5Fhex / 26hex (Ы / ы) are swapped, following the Bulgarian variant.
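To illustrate the arrangement, rows 4_ to 7_ of this variant can be transcribed into a lookup string. The cross-check against Python's `koi8_r` codec rests on an assumption: that KOI8-R arranges the letters in the same order as GOST 13052 / ISO-IR-111 (Python has no codec for ISO-IR-111 itself). With the case halves swapped, only the Ъ / Ы pair and the filled rectangle should then differ. The function names are hypothetical.

```python
# Rows 4_-7_ of Cyrillic G0 variant 2 (Russian / Bulgarian), from the table.
ROWS_40_7F = ("ЮАБЦДЕФГХИЙКЛМНО"
              "ПЯРСТУЖВЬЪЗШЭЩЧЫ"
              "юабцдефгхийклмно"
              "пярстужвьъзшэщч\u25a0")
assert len(ROWS_40_7F) == 64
assert all("\u0400" <= ch <= "\u04ff" for ch in ROWS_40_7F[:-1])


def decode_cyrillic_ru_bg(codes: bytes) -> str:
    """Decode printable codes 0x20-0x7F (0x2A is treated as '*')."""
    chars = []
    for code in codes:
        if 0x40 <= code <= 0x7F:
            chars.append(ROWS_40_7F[code - 0x40])
        elif code == 0x26:
            chars.append("\u044b")   # ы replaces & at 0x26
        elif 0x20 <= code < 0x40:
            chars.append(chr(code))  # digits and punctuation as in ASCII
        else:
            raise ValueError(f"not a printable 7-bit code: {code:#04x}")
    return "".join(chars)


def matches_koi8_r(code: int) -> bool:
    """Compare a code 0x40-0x7F with KOI8-R after swapping the case halves."""
    koi8 = bytes([(code ^ 0x20) | 0x80]).decode("koi8_r")
    return ROWS_40_7F[code - 0x40] == koi8


print(decode_cyrillic_ru_bg(bytes([0x50, 0x72, 0x69, 0x77, 0x65, 0x74])))  # Привет
print([hex(c) for c in range(0x40, 0x80) if not matches_koi8_r(c)])
# ['0x59', '0x5f', '0x79', '0x7f']
```

The XOR with 20hex performs the case swap, and setting the high bit moves the code into the letter area of the 8-bit set.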

The encoding of character 2Ahex depends on the control.

For the Cyrillic short I (Й / й) at positions 4Ahex and 6Ahex, ETSI EN 300 706 probably shows the breve (˘) incorrectly, like a dot above (˙). This may, however, have been done so that the letters could also serve better as the Cyrillic I with grave (Ѝ / ѝ).

The alternative Latin encodings given for some Cyrillic letters (for example А / A at position 41hex) are needed to complete the Latin alphabet encoded in the Cyrillic G2 supplementary character set.

Cyrillic G0 primary character set - variant 3 - Ukrainian
Selection bits: 45
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	! 0021 21	" ” 0022 22	# ⋕ 0023 23	$ 0024 24	% 0025 25	ї 0457 26	' ’ 0027 27	( 0028 28	) 0029 29	* ∗ | @ 002A | 0040 2A	+ 002B 2B	, 002C 2C	- 002D 2D	. 002E 2E	/ 002F 2F
3_	0 0030 30	1 0031 31	2 0032 32	3 0033 33	4 0034 34	5 0035 35	6 0036 36	7 0037 37	8 0038 38	9 0039 39	: 003A 3A	; 003B 3B	< 003C 3C	= 003D 3D	> 003E 3E	? 003F 3F
4_	Ю 042E 40	А A 0410 0041 41	Б 0411 42	Ц 0426 43	Д 0414 44	Е 0415 45	Ф 0424 46	Г 0413 47	Х X 0425 0058 48	И 0418 49	Й (Ѝ) 0419 ( 040D ) 4A	К 041A 4B	Л 041B 4C	М M 041C 004D 4D	Н H 041D 0048 4E	О O 041E 004F 4F
5_	П 041F 50	Я 042F 51	Р P 0420 0050 52	С C 0421 0043 53	Т T 0422 0054 54	У (Y) 0423 ( 0059 ) 55	Ж 0416 56	В B 0412 0042 57	Ь 042C 58	І 0406 59	З 0417 5A	Ш 0428 5B	Є 0404 5C	Щ 0429 5D	Ч 0427 5E	Ї 0407 5F
6_	ю 044E 60	а a 0430 0061 61	б 0431 62	ц 0446 63	д 0434 64	е 0435 65	ф 0444 66	г 0433 67	х x 0445 0078 68	и 0438 69	й (ѝ) 0439 ( 045D ) 6A	к 043A 6B	л 043B 6C	м (m) 043C ( 006D ) 6D	н (h) 043D ( 0068 ) 6E	о o 043E 006F 6F
7_	п 043F 70	я 044F 71	р p 0440 0070 72	с c 0441 0063 73	т (t) 0442 ( 0074 ) 74	у y 0443 0079 75	ж 0436 76	в (b) 0432 ( 0062 ) 77	ь 044C 78	і 0456 79	з 0437 7A	ш 0448 7B	є 0454 7C	щ 0449 7D	ч 0447 7E	■ 25A0 7F

The three characters 24hex ($), 26hex (ї) and 7Fhex (■), as well as three Cyrillic letter pairs, are encoded differently from GOST 13052.

The encoding of character 2Ahex depends on the control.

For the Cyrillic short I (Й / й) at positions 4Ahex and 6Ahex, ETSI EN 300 706 probably shows the breve (˘) incorrectly, like a dot above (˙). This may, however, have been done so that the letters could also serve better as the Cyrillic I with grave (Ѝ / ѝ).

The alternative Latin encodings given for some Cyrillic letters (for example А / A at position 41hex) are needed to complete the Latin alphabet encoded in the Cyrillic G2 supplementary character set.

Cyrillic G2 supplementary character set
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	¡ 00A1 21	¢ 00A2 22	£ 00A3 23	$ 0024 24	¥ 00A5 25	26	§ 00A7 27	28	‘ 2018 29	“ 201C 2A	« 00AB 2B	← 2190 2C	↑ 2191 2D	→ 2192 2E	↓ 2193 2F
3_	° 00B0 30	± 00B1 31	² 00B2 32	³ 00B3 33	× 00D7 34	µ 00B5 35	¶ 00B6 36	· 00B7 37	÷ 00F7 38	’ 2019 39	” 201D 3A	» 00BB 3B	¼ 00BC 3C	½ 00BD 3D	¾ 00BE 3E	¿ 00BF 3F
4_	40	` 0060 41	´ 00B4 42	ˆ 02C6 43	˜ 02DC 44	¯ ˉ 00AF 45	˘ 02D8 46	˙ 02D9 47	¨ 00A8 48	̣ N / A 49	˚ 02DA 4A	¸ (̦) 00B8 ( N / A ) 4B	_ 005F 4C	˝ 02DD 4D	˛ 02DB 4E	ˇ 02C7 4F
Comb.	40	O 0300 41	ó (ģ) 0301 ( 0327 ) 42	O 0302 43	O 0303 44	O 0304 45	O 0306 46	ȯ 0307 47	ö 0308 48	O 0323 49	å 030A 4A	ç (o̦) 0327 ( 0326 ) 4B	O 0332 4C	O 030B 4D	ǫ 0328 4E	ǒ 030C 4F
5_	- 2015 50	¹ 00B9 51	® 00AE 52	© 00A9 53	™ 2122 54	♪ 266A 55	₠ 20A0 56	‰ 2030 57	∝ 221D 58	Ł 0141 59	ł 0142 5A	ß 00DF 5B	⅛ 215B 5C	⅜ 215C 5D	⅝ 215D 5E	⅞ 215E 5F
6_	D 0044 60	E 0045 61	F 0046 62	G 0047 63	I І 0049 0406 64	J Ј 004A 0408 65	K 004B 66	L 004C 67	N 004E 68	Q 0051 69	R 0052 6A	S Ѕ 0053 0405 6B	U 0055 6C	V 0056 6D	W 0057 6E	Z 005A 6F
7_	d 0064 70	e 0065 71	f 0066 72	g 0067 73	i і 0069 0456 74	j ј 006A 0458 75	k 006B 76	l 006C 77	n 006E 78	q 0071 79	r 0072 7A	s ѕ 0073 0455 7B	u 0075 7C	v 0076 7D	w 0077 7E	z 007A 7F

The characters 20 _hex to 5F _hex are essentially identical to the Latin G2 supplementary character set without the two additional characters from ITU T.61 . The three characters 59 _hex to 5B _hex are coded with special Latin letters.

The characters 60 _hex to 7F _hex are coded with Latin letters which, together with the similar-looking letters in the Cyrillic G0 primary character sets, form the complete Latin alphabet.

The alternative Cyrillic coding of some of these characters can be used to supplement the coded Cyrillic alphabet; of these, the Belarusian-Ukrainian I (І / і) and the Serbian Je (Ј / ј) at positions 64 _hex / 74 _hex and 65 _hex / 75 _hex already exist in the Cyrillic G0 variants 3 "Ukrainian" and 1 "Serbian / Croatian", respectively.

The alternative coding of the characters in the "Combining" row is used when they are addressed with the X / 26 column functions 11 _hex to 1F _hex "G0 Character with diacritical mark" (see Choice of characters). As with the Latin G2 supplementary character set, the combining characters should only be used in conjunction with the Latin G0 primary character set.

Greek

The Greek G0 primary character set is essentially identical to the characters 20 _hex to 3F _hex and C0 _hex to FE _hex of the 8-bit character set ELOT 928 (identical to ISO 8859-7).

Greek G0 primary character set
Selection bits : 67
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	! 0021 21	" " 0022 22	# ⋕ 0023 23	$ 0024 24	% 0025 25	& 0026 26	' ' 0027 27	( 0028 28	) 0029 29	* ∗ \| @ 002A \| 0040 2A	+ 002B 2B	, 002C 2C	- 002D 2D	. 002E 2E	/ 002F 2F
3_	0 0030 30	1 0031 31	2 0032 32	3 0033 33	4 0034 34	5 0035 35	6 0036 36	7 0037 37	8 0038 38	9 0039 39	: 003A 3A	; 003B 3B	« 00AB 3C	= 003D 3D	» 00BB 3E	? 003F 3F
4_	ΐ 0390 40	Α A 0391 0041 41	Β B 0392 0042 42	Γ 0393 43	Δ 0394 44	Ε E 0395 0045 45	Ζ 0396 46	Η H 0397 0048 47	Θ 0398 48	Ι I 0399 0049 49	Κ K 039A 004B 4A	Λ 039B 4B	Μ M 039C 004D 4C	Ν N 039D 004E 4D	Ξ 039E 4E	Ο O 039F 004F 4F
5_	Π 03A0 50	Ρ P 03A1 0050 51	΄ 0384 52	Σ 03A3 53	Τ T 03A4 0054 54	Υ 03A5 55	Φ 03A6 56	Χ X 03A7 0058 57	Ψ 03A8 58	Ω 03A9 59	Ϊ 03AA 5A	Ϋ 03AB 5B	ά 03AC 5C	έ 03AD 5D	ή 03AE 5E	ί 03AF 5F
6_	ΰ 03B0 60	α 03B1 61	β 03B2 62	γ 03B3 63	δ 03B4 64	ε 03B5 65	ζ 03B6 66	η 03B7 67	θ 03B8 68	ι 03B9 69	κ 03BA 6A	λ 03BB 6B	μ 03BC 6C	ν 03BD 6D	ξ 03BE 6E	ο o 03BF 006F 6F
7_	π 03C0 70	ρ 03C1 71	ς 03C2 72	σ 03C3 73	τ 03C4 74	υ 03C5 75	φ 03C6 76	χ 03C7 77	ψ 03C8 78	ω 03C9 79	ϊ 03CA 7A	ϋ 03CB 7B	ό 03CC 7C	ύ 03CD 7D	ώ 03CE 7E	■ 25A0 7F

The four characters 3C _hex («), 3E _hex (»), 52 _hex (΄) and 7F _hex (■) are coded differently from ELOT 928.

The coding of the character 2A _hex depends on how it is addressed: it is the asterisk (*) on the simple Level 1 teletext page and the at sign (@) when it is addressed with the X / 26 column function 10 _hex "G0 Character" (see Choice of characters).

The standalone tonos (΄) at position 52 _hex is right-aligned in the example layout of ETSI EN 300 706, so that it is correctly positioned in front of a following capital letter. This also leaves enough space to separate it from the preceding word.

For historical reasons, ETSI EN 300 706 draws the tonos (΄) vertically (') both as the standalone character at position 52 _hex and in the Greek small letters with dialytika and tonos (΅) at positions 40 _hex and 60 _hex, whereas in the Greek small letters with tonos at positions 5C _hex to 5F _hex and 7C _hex to 7E _hex it is drawn as a dot above (˙).

In ETSI EN 300 706, the Greek small letter iota (ι) at position 69 _hex, as well as the iota with diacritics (ΐ, ί and ϊ) at positions 40 _hex, 5F _hex and 7A _hex, is drawn imprecisely like a Latin small dotless i with serifs (ı).

The word-final form of the Greek small letter sigma (ς) at position 72 _hex is drawn imprecisely in ETSI EN 300 706, like the Latin small letter s.

The alternative Latin coding given for some of the other characters (listed with a second Unicode number) is needed to complete the Latin alphabet coded in the Greek G2 supplementary character set.
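The relationship to ELOT 928 / ISO 8859-7 described above can be turned into a small decoder. The following Python sketch relies on the standard library's `iso8859_7` codec for positions 40 _hex to 7E _hex and handles the four deviating positions and the address-dependent position 2A _hex separately. The function name and the `at_sign` flag are illustrative assumptions, not part of any teletext API.

```python
def greek_g0_to_unicode(code: int, at_sign: bool = False) -> str:
    """Map a 7-bit Greek G0 position (20-7F hex) to a Unicode character.

    at_sign picks '@' instead of '*' for position 2A hex; which of the
    two applies depends on how the character is addressed.
    """
    if not 0x20 <= code <= 0x7F:
        raise ValueError(f"not a G0 position: {code:#04x}")
    # The four positions that are coded differently from ELOT 928.
    deviations = {0x3C: "\u00ab", 0x3E: "\u00bb", 0x52: "\u0384", 0x7F: "\u25a0"}
    if code in deviations:
        return deviations[code]
    if code == 0x2A:
        return "@" if at_sign else "*"
    if code < 0x40:
        # 20-3F hex: identical to positions 20-3F of ISO 8859-7 (ASCII range).
        return chr(code)
    # 40-7E hex: identical to positions C0-FE of ISO 8859-7.
    return bytes([code + 0x80]).decode("iso8859_7")
```

Position 52 _hex must be special-cased in any case, because D2 _hex is unassigned in ISO 8859-7 and the codec would reject it.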

Greek G2 supplementary character set
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	a 0061 21	b 0062 22	£ 00A3 23	e 0065 24	h 0068 25	i 0069 26	§ 00A7 27	: 003A 28	‘ 2018 29	“ 201C 2A	k 006B 2B	← 2190 2C	↑ 2191 2D	→ 2192 2E	↓ 2193 2F
3_	° 00B0 30	± 00B1 31	² 00B2 32	³ 00B3 33	× 00D7 34	m 006D 35	n 006E 36	p 0070 37	÷ 00F7 38	’ 2019 39	” 201D 3A	t 0074 3B	¼ 00BC 3C	½ 00BD 3D	¾ 00BE 3E	x 0078 3F
4_	40	` 0060 41	´ 00B4 42	ˆ 02C6 43	˜ 02DC 44	¯ ˉ 00AF 45	˘ 02D8 46	˙ 02D9 47	¨ 00A8 48	̣ N / A 49	˚ 02DA 4A	¸ (̦) 00B8 ( N / A ) 4B	_ 005F 4C	˝ 02DD 4D	˛ 02DB 4E	ˇ 02C7 4F
Comb.	40	O 0300 41	ó (ģ) 0301 ( 0327 ) 42	O 0302 43	O 0303 44	O 0304 45	O 0306 46	ȯ 0307 47	ö 0308 48	O 0323 49	å 030A 4A	ç (o̦) 0327 ( 0326 ) 4B	O 0332 4C	O 030B 4D	ǫ 0328 4E	ǒ 030C 4F
5_	? 003F 50	¹ 00B9 51	® 00AE 52	© 00A9 53	™ 2122 54	♪ 266A 55	₠ 20A0 56	‰ 2030 57	∝ 221D 58	Ί 038A 59	Ύ 038E 5A	Ώ 038F 5B	⅛ 215B 5C	⅜ 215C 5D	⅝ 215D 5E	⅞ 215E 5F
6_	C 0043 60	D 0044 61	F 0046 62	G 0047 63	J 004A 64	L 004C 65	Q 0051 66	R 0052 67	S 0053 68	U 0055 69	V 0056 6A	W 0057 6B	Y 0059 6C	Z 005A 6D	Ά 0386 6E	Ή 0389 6F
7_	c 0063 70	d 0064 71	f 0066 72	g 0067 73	j 006A 74	l 006C 75	q 0071 76	r 0072 77	s 0073 78	u 0075 79	v 0076 7A	w 0077 7B	y 0079 7C	z 007A 7D	Έ 0388 7E	■ 25A0 7F

The characters 20 _hex to 5F _hex and 7F _hex are largely identical to the Latin G2 supplementary character set, but without the two additional characters from ITU T.61. The three characters 59 _hex to 5B _hex are coded with special Greek letters, and a further eleven characters with Latin lowercase letters. In addition, the two characters 28 _hex and 50 _hex are coded differently, as a colon (:) and a question mark (?), although these are already included in the Greek G0 primary character set. This may have historical reasons, because these two characters are not available in the 7-bit ISO-IR-27 character set.

The characters 60 _hex to 7E _hex are coded with Latin letters and special Greek letters. The Latin letters together with similar looking letters in the Greek G0 primary character set form the complete Latin alphabet.
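The statement that the Latin letters of the two Greek sets together form the complete Latin alphabet can be checked mechanically. This Python sketch uses letter sets transcribed by hand from the Greek G0 and G2 tables above; the transcription is an assumption worth re-checking against ETSI EN 300 706 itself.

```python
import string
from typing import Set

# Alternative Latin codings in the Greek G0 primary character set
# (capital letters in rows 4_ and 5_, small o at 6F hex).
G0_LATIN = set("ABEHIKMNOPTX") | {"o"}

# Latin letters in the Greek G2 supplementary character set:
# eleven small letters in rows 2_ and 3_, plus rows 6_ and 7_.
G2_LATIN = set("abehikmnptx") | set("CDFGJLQRSUVWYZ") | set("cdfgjlqrsuvwyz")


def missing_latin_letters() -> Set[str]:
    """Return the ASCII letters provided by neither set."""
    return set(string.ascii_letters) - (G0_LATIN | G2_LATIN)


print(sorted(missing_latin_letters()))  # → []
```

The two sets are also disjoint, so each Latin letter has exactly one coding across G0 and G2.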

For the Greek capital letters with tonos in positions 59 _hex to 5B _hex , 6E _hex , 6F _hex and 7E _hex , the tonos (΄) is shown vertically (') in ETSI EN 300 706 for historical reasons.

The alternative coding of the characters in the "Combining" row is used when they are addressed with the X / 26 column functions 11 _hex to 1F _hex "G0 Character with diacritical mark" (see Choice of characters). As with the Latin G2 supplementary character set, the combining characters should only be used in conjunction with the Latin G0 primary character set.

Arabic

The Arabic G0 primary character set is largely identical to the 7-bit character set ASMO 449 (adopted in ISO 8859-6 ), whereby the Latin G0 variant "English" is used for the special characters and the Arabic letters are shown with their presentation forms. Five special letters have been moved to the Arabic G2 supplementary character set , which also contains additional letters for Persian.

The Arabic letters that have several codings and can optionally connect to the right are drawn in ETSI EN 300 706 without a connecting line of their own on the right, and are therefore coded primarily as initial or isolated presentation forms. As an exception, the three Arabic letters of the "Ǧīm" family (ﺝ, ﺡ and ﺥ) at positions 4C _hex to 4E _hex in the Arabic G0 primary character set look more like medial presentation forms (with a straight baseline), but are still coded primarily as initial presentation forms, because the medial presentation forms (without a straight baseline) are also available at positions 5C _hex to 5E _hex in the Arabic G0 primary character set (see also the notes on the table).

In addition, the Arabic letter Yāʾ (ﻱ) at position 27 _hex in the Arabic G0 primary character set and the Yāʾ with Hamza above (ﺉ) at position 27 _hex in the Arabic G2 supplementary character set look more like final presentation forms and are therefore coded primarily as such, because the isolated presentation form does not visually allow a correct connection to the right.

The Arabic letters that have several codings and can optionally connect to the left are drawn in ETSI EN 300 706 with a connecting line on the left and are therefore coded primarily as initial presentation forms. In contrast, the four Arabic letters of the "Sīn" family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 _hex to 56 _hex in the Arabic G0 primary character set are drawn without any termination or connecting line of their own on the left, and must each be completed with a second character (see the notes on the table).

For Arabic letters with several Unicode numbers, output in Unicode requires either selecting the appropriate Unicode number according to the two neighboring characters on the left and right or, in the simplest case, using the first Unicode number. A Unicode number from the Arabic block (06xx _hex), listed last in an entry, stands for the actual character. If the actual characters are output in Unicode instead of the presentation forms, the zero-width non-joiner (ZWNJ, Unicode number 200C _hex) or the zero-width joiner (ZWJ, Unicode number 200D _hex) may have to be inserted in order to restrict the automatic glyph selection to the presentation forms that the respective teletext character can represent.

The Arabic script is written from right to left, but in teletext the characters are arranged from left to right as usual. For this reason, output in Unicode requires either applying the Unicode bidirectional algorithm in reverse or, in the simplest case, placing the bidirectional control character LEFT-TO-RIGHT OVERRIDE (LRO, Unicode number 202D _hex) at the start of each line.
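One possible way to derive the actual character from a presentation form uses Python's `unicodedata` module: every Arabic presentation form has a compatibility decomposition tagged `<initial>`, `<medial>`, `<final>` or `<isolated>`. The sketch below replaces the form by the actual character and adds ZWJ / ZWNJ so that a shaping engine is steered back to the same form. The function name is hypothetical, and the sketch assumes the text is already in logical (right-to-left) order; with the simpler LRO approach described below, where the string keeps teletext's left-to-right order, the joiner placement would have to be reconsidered and checked with the font and shaping engine in use.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def presentation_form_to_actual(ch: str) -> str:
    """Replace an Arabic presentation form by the actual character(s).

    Joiners are placed for logical (right-to-left) order; characters
    without a positional compatibility decomposition are returned as-is.
    """
    # e.g. unicodedata.decomposition("\ufe91") == "<initial> 0628"
    tag, _, code_points = unicodedata.decomposition(ch).partition(" ")
    if tag not in ("<initial>", "<medial>", "<final>", "<isolated>"):
        return ch
    actual = "".join(chr(int(cp, 16)) for cp in code_points.split())
    if tag == "<initial>":   # joins only to the following letter
        return actual + ZWJ
    if tag == "<medial>":    # joins on both sides
        return ZWJ + actual + ZWJ
    if tag == "<final>":     # joins only to the preceding letter
        return ZWJ + actual
    return ZWNJ + actual + ZWNJ  # isolated: joins on neither side
```

Ligatures such as Lām-Alif (FEFB _hex) decompose into two actual characters, and the end piece at position 26 _hex (FE73 _hex) has no decomposition, so it passes through unchanged.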

Arabic G0 primary character set
Selection bits : 87 or A7
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	! 0021 21	" " 0022 22	£ 00A3 23	$ 0024 24	% 0025 25	ﹳ FE73 26	ﻲ ﻱ FEF2 FEF1 064A 27	) 0029 28	( 0028 29	* ∗ \| @ 002A \| 0040 2A	+ 002B 2B	، , 060C 002C 2C	- 002D 2D	. 002E 2E	/ 002F 2F
3_	0 0030 30	1 0031 31	2 0032 32	3 0033 33	4 0034 34	5 0035 35	6 0036 36	7 0037 37	8 0038 38	9 0039 39	: 003A 3A	؛ 061B 3B	> 003E 3C	= 003D 3D	< 003C 3E	؟ 061F 3F
4_	ﺔ FE94 0629 40	ﺀ FE80 0621 41	ﺒ FE92 0628 42	ﺏ ﺐ FE8F FE90 0628 43	ﺘ FE98 062A 44	ﺕ ﺖ FE95 FE96 062A 45	ﺎ FE8E 0627 46	ﺍ FE8D 0627 47	ﺑ FE91 0628 48	ﺓ FE93 0629 49	ﺗ FE97 062A 4A	ﺛ FE9B 062B 4B	ﺟ ﺠ ﺟ ﺠ FE9F FEA0 062C 4C	ﺣ ﺤ ﺣ ﺤ FEA3 FEA4 062D 4D	ﺧ ﺨ ﺧ ﺨ FEA7 FEA8 062E 4E	ﺩ ﺪ FEA9 FEAA 062F 4F
5_	ﺫ ﺬ FEAB FEAC 0630 50	ﺭ ﺮ FEAD FEAE 0631 51	ﺯ ﺰ FEAF FEB0 0632 52	ﺳ ﺴ (ﺱ ﺲ) FEB3 FEB4 ( FEB1 FEB2 ) 0633 53	ﺷ ﺸ (ﺵ ﺶ) FEB7 FEB8 ( FEB5 FEB6 ) 0634 54	ﺻ ﺼ (ﺹ ﺺ) FEBB FEBC ( FEB9 FEBA ) 0635 55	ﺿ ﻀ (ﺽ ﺾ) FEBF FEC0 ( FEBD FEBE ) 0636 56	ﻃ ﻁ ﻂ ﻄ FEC3 FEC1 FEC2 FEC4 0637 57	ﻇ ﻅ ﻆ ﻈ FEC7 FEC5 FEC6 FEC8 0638 58	ﻋ FECB 0639 59	ﻏ FECF 063A 5A	ﺜ FE9C 062B 5B	ﺠ ﺠ FEA0 062C 5C	ﺤ ﺤ FEA4 062D 5D	ﺨ ﺨ FEA8 062E 5E	# ⋕ 0023 5F
6_	ـ 0640 60	ﻓ FED3 0641 61	ﻗ FED7 0642 62	ﻛ ﻜ FEDB FEDC 0643 63	ﻟ FEDF 0644 64	ﻣ FEE3 0645 65	ﻧ FEE7 0646 66	ﻫ FEEB 0647 67	ﻭ ﻮ FEED FEEE 0648 68	ﻰ FEF0 0649 69	ﻳ FEF3 064A 6A	ﺙ ﺚ FE99 FE9A 062B 6B	ﺝ ﺞ FE9D FE9E 062C 6C	ﺡ ﺢ FEA1 FEA2 062D 6D	ﺥ ﺦ FEA5 FEA6 062E 6E	ﻴ FEF4 064A 6F
Pers.	ﯼ FBFC 06CC 70			ﮐ ﮎ ﮏ ﮑ FB90 FB8E FB8F FB91 06A9 63						ﯽ FBFD 06CC 69	ﯾ FBFE 06CC 6A					ﯿ FBFF 06CC 6F
7_	ﻯ FEEF 0649 70	ﻌ FECC 0639 71	ﻐ FED0 063A 72	ﻔ FED4 0641 73	ﻑ ﻒ FED1 FED2 0641 74	ﻘ FED8 0642 75	ﻕ ﻖ FED5 FED6 0642 76	ﻙ ﻚ FED9 FEDA 0643 77	ﻠ FEE0 0644 78	ﻝ ﻞ FEDD FEDE 0644 79	ﻤ FEE4 0645 7A	ﻡ ﻢ FEE1 FEE2 0645 7B	ﻨ FEE8 0646 7C	ﻥ ﻦ FEE5 FEE6 0646 7D	ﻻ FEFB 7E	■ 25A0 7F

The two characters 26 _hex (ﹳ) and 27 _hex (ﻱ) are coded differently from ASMO 449. In addition, five special letters and almost all special characters at positions 40 _hex to 7E _hex have been replaced by other presentation forms of the coded Arabic letters.

The character 26 _hex (ﹳ) serves as the end piece for the isolated and final presentation forms of the four Arabic letters of the "Sīn" family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 _hex to 56 _hex.

The two round brackets (")" and "(") at positions 28 _hex and 29 _hex, as well as the two comparison signs (> and <) at positions 3C _hex and 3E _hex, are coded according to their displayed orientation, as in the other character sets, since all characters in teletext are always arranged from left to right.

The coding of the character 2A _hex depends on how it is addressed: it is the asterisk (*) on the simple Level 1 teletext page and the at sign (@) when it is addressed with the X / 26 column function 10 _hex "G0 Character" (see Choice of characters).

The Arabic comma (،) at position 2C _hex is drawn in the example layout of ETSI EN 300 706 so that it can also serve visually as a normal comma (,).

In ETSI EN 300 706, the combined initial and medial presentation forms of the three Arabic letters of the "Ǧīm" family (ﺟ / ﺠ, ﺣ / ﺤ and ﺧ / ﺨ) at positions 4C _hex to 4E _hex are drawn with a straight baseline, matching the initial and medial presentation forms of the Persian letter Che (ﭼ / ﭽ) at positions 28 _hex and 29 _hex in the Arabic G2 supplementary character set. When used as medial presentation forms, however, they are coded identically to the medial presentation forms without a straight baseline (ﺠ, ﺤ and ﺨ) at positions 5C _hex to 5E _hex, since this is only a layout variant. The same applies when they are used as initial presentation forms, although there are no separate characters for the layout variant without a straight baseline (ﺟ, ﺣ and ﺧ).

The four Arabic letters of the "Sīn" family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 _hex to 56 _hex are drawn without any termination or connecting line of their own on the left, so each must be completed with a second character. For the isolated or final presentation form, the end piece (ﹳ) at position 26 _hex must be added to the left. For the initial or medial presentation form, the modifier letter Taṭwīl (ـ) at position 60 _hex must be added to the left if the left neighbor has no connecting line of its own to the right or if that line is very short.

The alternative coding (with identical layout) of the letters in the line "Persian" serves to complete the Persian letters coded in the Arabic G2 supplementary character set.

Arabic G2 supplementary character set
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	ﻉ FEC9 0639 21	ﺁ (ﺂ) FE81 ( FE82 ) 0622 22	ﺃ (ﺄ) FE83 ( FE84 ) 0623 23	ﺅ ﺆ FE85 FE86 0624 24	ﺇ (ﺈ) FE87 ( FE88 ) 0625 25	ﺋ FE8B 0626 26	ﺊ ﺉ FE8A FE89 0626 27	ﭼ ﭼ FB7C 0686 28	ﭽ ﭽ FB7D 0686 29	ﭺ ﭻ FB7A FB7B 0686 2A	ﭘ FB58 067E 2B	ﭙ FB59 067E 2C	ﭖ ﭗ FB56 FB57 067E 2D	ﮊ ﮋ FB8A FB8B 0698 2E	ﮔ ﮒ ﮓ ﮕ FB94 FB92 FB93 FB95 06AF 2F
3_	٠ 0660 30	١ 0661 31	٢ 0662 32	٣ 0663 33	٤ 0664 34	٥ 0665 35	٦ 0666 36	٧ 0667 37	٨ 0668 38	٩ 0669 39	ﻎ FECE 063A 3A	ﻍ FECD 063A 3B	ﻼ FEFC 3C	ﻬ FEEC 0647 3D	ﻪ FEEA 0647 3E	ﻩ FEE9 0647 3F
4_	à 00E0 40	A 0041 41	B 0042 42	C 0043 43	D 0044 44	E 0045 45	F 0046 46	G 0047 47	H 0048 48	I 0049 49	J 004A 4A	K 004B 4B	L 004C 4C	M 004D 4D	N 004E 4E	O 004F 4F
5_	P 0050 50	Q 0051 51	R 0052 52	S 0053 53	T 0054 54	U 0055 55	V 0056 56	W 0057 57	X 0058 58	Y 0059 59	Z 005A 5A	ë 00EB 5B	ê 00EA 5C	ù 00F9 5D	î 00EE 5E	ﻊ FECA 0639 5F
6_	é 00E9 60	a 0061 61	b 0062 62	c 0063 63	d 0064 64	e 0065 65	f 0066 66	g 0067 67	h 0068 68	i 0069 69	j 006A 6A	k 006B 6B	l 006C 6C	m 006D 6D	n 006E 6E	o 006F 6F
7_	p 0070 70	q 0071 71	r 0072 72	s 0073 73	t 0074 74	u 0075 75	v 0076 76	w 0077 77	x 0078 78	y 0079 79	z 007A 7A	â 00E2 7B	ô 00F4 7C	û 00FB 7D	ç 00E7 7E	7F

The character set is partially identical to the Latin G0 primary character set. The digits are coded differently, with their Arabic-Indic forms. In addition, all special characters have been replaced by presentation forms of Arabic letters and by Latin lowercase letters with diacritics for writing French (see Windows-1256).

The alternative codings (the second presentation form given for some characters) are necessary to complete all presentation forms of the coded Arabic letters.

Hebrew

The Hebrew G0 primary character set is essentially identical to the 7-bit character set SI 960 (adopted in ISO 8859-8 ), whereby the Latin G0 variant "English" is used for the special characters . A Hebrew G2 supplementary character set is not defined; the Arabic G2 supplementary character set is used.

The Hebrew script is written from right to left, but in teletext the characters are arranged from left to right as usual. For this reason, output in Unicode requires either applying the Unicode bidirectional algorithm in reverse or, in the simplest case, placing the bidirectional control character LEFT-TO-RIGHT OVERRIDE (LRO, Unicode number 202D _hex) at the start of each line.

Hebrew G0 primary character set
Selection bits : A5
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	! 0021 21	" " 0022 22	£ 00A3 23	$ 0024 24	% 0025 25	& 0026 26	' ' 0027 27	( 0028 28	) 0029 29	* ∗ \| @ 002A \| 0040 2A	+ 002B 2B	, 002C 2C	- 002D 2D	. 002E 2E	/ 002F 2F
3_	0 0030 30	1 0031 31	2 0032 32	3 0033 33	4 0034 34	5 0035 35	6 0036 36	7 0037 37	8 0038 38	9 0039 39	: 003A 3A	; 003B 3B	< 003C 3C	= 003D 3D	> 003E 3E	? 003F 3F
4_	@ 0040 40	A 0041 41	B 0042 42	C 0043 43	D 0044 44	E 0045 45	F 0046 46	G 0047 47	H 0048 48	I 0049 49	J 004A 4A	K 004B 4B	L 004C 4C	M 004D 4D	N 004E 4E	O 004F 4F
5_	P 0050 50	Q 0051 51	R 0052 52	S 0053 53	T 0054 54	U 0055 55	V 0056 56	W 0057 57	X 0058 58	Y 0059 59	Z 005A 5A	← 2190 5B	½ 00BD 5C	→ 2192 5D	↑ 2191 5E	# ⋕ 0023 5F
6_	א 05D0 60	ב 05D1 61	ג 05D2 62	ד 05D3 63	ה 05D4 64	ו 05D5 65	ז 05D6 66	ח 05D7 67	ט 05D8 68	י 05D9 69	ך 05DA 6A	כ 05DB 6B	ל 05DC 6C	ם 05DD 6D	מ 05DE 6E	ן 05DF 6F
7_	נ 05E0 70	ס 05E1 71	ע 05E2 72	ף 05E3 73	פ 05E4 74	ץ 05E5 75	צ 05E6 76	ק 05E7 77	ר 05E8 78	ש 05E9 79	ת 05EA 7A	₪ 20AA 7B	∥ 2225 7C	¾ 00BE 7D	÷ 00F7 7E	■ 25A0 7F

In contrast to SI 960, the character 7B _hex (₪) is coded as the shekel currency sign (see Windows-1255).

The coding of the character 2A _hex depends on how it is addressed: it is the asterisk (*) on the simple Level 1 teletext page and the at sign (@) when it is addressed with the X / 26 column function 10 _hex "G0 Character" (see Choice of characters).

Graphics

The characters with a six-digit Unicode number (01FBxx _hex) belong to the Symbols for Legacy Computing block, which was only added in Unicode 13.0 (2020), so many fonts do not support them yet.

With normal teletext in 4:3 format, the ratio of width to height of a character is 4:5. This must be taken into account to display graphics with the correct proportions.

Since the exact appearance of the Unicode characters depends heavily on the font and fonts do not always match, it may be necessary to draw all graphic characters yourself.
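For the Hebrew G0 primary character set, the mapping is simple enough to write out in full. The following Python sketch decodes one row using the table above and applies the simplest option described earlier, a LEFT-TO-RIGHT OVERRIDE (202D _hex) at the start of the line. The function name is hypothetical, control characters are rejected rather than interpreted, and 2A _hex is always taken as the asterisk.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE

# Positions of the Hebrew G0 primary character set that differ from ASCII
# (taken from the table above).
HEBREW_G0_SPECIAL = {
    0x23: "\u00a3",  # £
    0x5B: "\u2190",  # ←
    0x5C: "\u00bd",  # ½
    0x5D: "\u2192",  # →
    0x5E: "\u2191",  # ↑
    0x5F: "#",
    0x7B: "\u20aa",  # ₪ shekel sign
    0x7C: "\u2225",  # ∥
    0x7D: "\u00be",  # ¾
    0x7E: "\u00f7",  # ÷
    0x7F: "\u25a0",  # ■
}


def hebrew_g0_row_to_unicode(codes: bytes) -> str:
    """Decode a row of Hebrew G0 codes (20-7F hex), keeping teletext's
    left-to-right cell order by putting LRO in front of the line."""
    chars = []
    for code in codes:
        if not 0x20 <= code <= 0x7F:
            raise ValueError(f"not a G0 position: {code:#04x}")
        if code in HEBREW_G0_SPECIAL:
            chars.append(HEBREW_G0_SPECIAL[code])
        elif 0x60 <= code <= 0x7A:
            chars.append(chr(0x05D0 + code - 0x60))  # alef (60) to tav (7A)
        else:
            chars.append(chr(code))  # same as ASCII; 2A hex taken as '*'
    return LRO + "".join(chars)
```

The 27 Hebrew letters at 60 _hex to 7A _hex are in the same order as in Unicode (05D0 _hex to 05EA _hex), which is why a single offset suffices.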

G1 character set block graphics
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	␠ 0020 20	🬀 01FB00 21	🬁 01FB01 22	🬂 01FB02 23	🬃 01FB03 24	🬄 01FB04 25	🬅 01FB05 26	🬆 01FB06 27	🬇 01FB07 28	🬈 01FB08 29	🬉 01FB09 2A	🬊 01FB0A 2B	🬋 01FB0B 2C	🬌 01FB0C 2D	🬍 01FB0D 2E	🬎 01FB0E 2F
3_	🬏 01FB0F 30	🬐 01FB10 31	🬑 01FB11 32	🬒 01FB12 33	🬓 01FB13 34	▌ 258C 35	🬔 01FB14 36	🬕 01FB15 37	🬖 01FB16 38	🬗 01FB17 39	🬘 01FB18 3A	🬙 01FB19 3B	🬚 01FB1A 3C	🬛 01FB1B 3D	🬜 01FB1C 3E	🬝 01FB1D 3F
4_	[G0] 40	[G0] 41	[G0] 42	[G0] 43	[G0] 44	[G0] 45	[G0] 46	[G0] 47	[G0] 48	[G0] 49	[G0] 4A	[G0] 4B	[G0] 4C	[G0] 4D	[G0] 4E	[G0] 4F
5_	[G0] 50	[G0] 51	[G0] 52	[G0] 53	[G0] 54	[G0] 55	[G0] 56	[G0] 57	[G0] 58	[G0] 59	[G0] 5A	[G0] 5B	[G0] 5C	[G0] 5D	[G0] 5E	[G0] 5F
6_	🬞 01FB1E 60	🬟 01FB1F 61	🬠 01FB20 62	🬡 01FB21 63	🬢 01FB22 64	🬣 01FB23 65	🬤 01FB24 66	🬥 01FB25 67	🬦 01FB26 68	🬧 01FB27 69	▐ 2590 6A	🬨 01FB28 6B	🬩 01FB29 6C	🬪 01FB2A 6D	🬫 01FB2B 6E	🬬 01FB2C 6F
7_	🬭 01FB2D 70	🬮 01FB2E 71	🬯 01FB2F 72	🬰 01FB30 73	🬱 01FB31 74	🬲 01FB32 75	🬳 01FB33 76	🬴 01FB34 77	🬵 01FB35 78	🬶 01FB36 79	🬷 01FB37 7A	🬸 01FB38 7B	🬹 01FB39 7C	🬺 01FB3A 7D	🬻 01FB3B 7E	█ 2588 7F

The graphic space at position 20 _hex is as wide as the block elements at positions 21 _hex to 3F _hex and 60 _hex to 7F _hex and can be coded as a normal or no-break space, since these are just as wide in a font with a fixed character width. A separate character analogous to the figure space (Unicode number 2007 _hex) would be better, but Unicode does not provide one. The attribute "Separated block graphics / underline" has no effect on the graphic space.

Depending on the corresponding attribute, the 63 block elements at positions 21 _hex to 3F _hex and 60 _hex to 7F _hex are displayed either in contiguous form, as shown, or in separated form. In the separated form, the six rectangular blocks that make up these graphic characters are smaller and do not touch each other. The separated forms are not defined as separate characters in Unicode.

The corresponding characters of the selected G0 primary character set are used for the 32 positions 40 _hex to 5F _hex .
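Because the 63 block elements are simply all combinations of six sextants, the Unicode number can be computed instead of looked up. The following Python sketch assumes a bit layout inferred from the table (not stated in the text): code bits 0 to 4 stand for the top-left, top-right, middle-left, middle-right and bottom-left sextants, and code bit 6 for the bottom-right sextant. The left half block (35 _hex), right half block (6A _hex) and full block (7F _hex) are exceptions, as the table shows. The function name is hypothetical.

```python
from typing import Optional


def g1_to_code_point(code: int) -> Optional[int]:
    """Return the Unicode code point for a G1 block graphics position.

    Returns None for 40-5F hex, where the selected G0 character is used.
    """
    if not 0x20 <= code <= 0x7F:
        raise ValueError(f"not a G1 position: {code:#04x}")
    if 0x40 <= code <= 0x5F:
        return None
    # Sextant bits: 1 top left, 2 top right, 4 middle left, 8 middle right,
    # 16 bottom left (code bits 0-4), 32 bottom right (code bit 6).
    sextants = (code & 0x1F) | ((code & 0x40) >> 1)
    special = {0: 0x0020, 21: 0x258C, 42: 0x2590, 63: 0x2588}  # space, ▌, ▐, █
    if sextants in special:
        return special[sextants]
    # U+1FB00 onwards lists the other 60 patterns in ascending order,
    # leaving out the left half (21) and right half (42) blocks.
    return 0x1FB00 + sextants - 1 - (sextants > 21) - (sextants > 42)
```

Using the result with `chr()` gives the contiguous form only; as noted above, Unicode has no separate characters for the separated form.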

G3 character set High resolution graphics
	_0	_1	_2	_3	_4	_5	_6	_7	_8	_9	_A	_B	_C	_D	_E	_F
2_	🬼 01FB3C 20	🬽 01FB3D 21	🬾 01FB3E 22	🬿 01FB3F 23	🭀 01FB40 24	◣ ( 25E3 ) 25	🭁 01FB41 26	🭂 01FB42 27	🭃 01FB43 28	🭄 01FB44 29	🭅 01FB45 2A	🭆 01FB46 2B	🭨 01FB68 2C	🭩 01FB69 2D	▐ ▐ ▐ ( 01FB70 ) ( 01FB71 ) 2E	▒ 2592 2F
3_	🭇 01FB47 30	🭈 01FB48 31	🭉 01FB49 32	🭊 01FB4A 33	🭋 01FB4B 34	◢ ( 25E2 ) 35	🭌 01FB4C 36	🭍 01FB4D 37	🭎 01FB4E 38	🭏 01FB4F 39	🭐 01FB50 3A	🭑 01FB51 3B	🭪 01FB6A 3C	🭫 01FB6B 3D	▌ ▌ ▌ ( 01FB75 ) ( 01FB74 ) 3E	█ 2588 3F
4_	▌ ███ ( 2537 ) 40	███ ▌ ( 252F ) 41	▌ ██ ▌ ( 251D ) 42	▌ █▌ ▌ ( 2525 ) 43	🮤 01FBA4 44	🮥 01FBA5 45	🮦 01FBA6 46	🮧 01FBA7 47	🮠 01FBA0 48	🮡 01FBA1 49	🮢 01FBA2 4A	🮣 01FBA3 4B	▌ ███ ▌ ( 253F ) 4C	⚫ 26AB 4D	⬤ 2B24 4E	◯ 25EF 4F
5_	│ 2502 50	─ \| - 2500 \| 2015 51	┌ 250C 52	┐ 2510 53	└ 2514 54	┘ 2518 55	├ 251C 56	┤ 2524 57	┬ 252C 58	┴ 2534 59	┼ 253C 5A	⭢ \| → 2B62 \| 2192 5B	⭠ \| ← 2B60 \| 2190 5C	⭡ \| ↑ 2B61 \| 2191 5D	⭣ 2B63 5E	␠ 0020 5F
6_	🭒 01FB52 60	🭓 01FB53 61	🭔 01FB54 62	🭕 01FB55 63	🭖 01FB56 64	◥ ( 25E5 ) 65	🭗 01FB57 66	🭘 01FB58 67	🭙 01FB59 68	🭚 01FB5A 69	🭛 01FB5B 6A	🭜 01FB5C 6B	🭬 01FB6C 6C	🭭 01FB6D 6D	6E	6F
7_	🭝 01FB5D 70	🭞 01FB5E 71	🭟 01FB5F 72	🭠 01FB60 73	🭡 01FB61 74	◤ ( 25E4 ) 75	🭢 01FB62 76	🭣 01FB63 77	🭤 01FB64 78	🭥 01FB65 79	🭦 01FB66 7A	🭧 01FB67 7B	🭮 01FB6E 7C	🭯 01FB6F 7D	7E	7F

In some decoders, depending on the associated attribute, the 57 smoothed block elements at positions 20 _hex to 2D _hex, 30 _hex to 3D _hex, 3F _hex, 60 _hex to 6D _hex and 70 _hex to 7D _hex are displayed either in contiguous form, as shown, or in separated form like the block elements of the G1 block graphics character set (see ITU T.101). The separated forms are not defined as separate characters in Unicode.

For the four triangles at positions 25 _hex, 35 _hex, 65 _hex and 75 _hex, the alternatively coded Unicode characters are not graphic elements that join adjacent teletext characters, but geometric shapes aligned on the baseline, each surrounded by space on all four sides.

The left thin vertical frame line ( │ ) at position 2E _hex is centered horizontally in relation to the left half block (▌) at position 35 _hex in the G1 block graphic character set . The alternatively coded Unicode characters, on the other hand, are not lines, but vertical eighth blocks to the left and right of the line position.

The right thin vertical frame line ( │ ) at position 3E _hex is centered horizontally in relation to the right half block (▐) at position 6A _hex in the G1 block graphic character set . The alternatively coded Unicode characters, however, are not lines, but vertical eighth blocks to the right and left of the line position.

For the five frame elements at positions 40 _hex to 43 _hex and 4C _hex, the thick horizontal line corresponds to the middle horizontal third block (🬋) at position 2C _hex in the G1 block graphics character set. In the alternatively coded Unicode characters, on the other hand, the thick horizontal line corresponds to the heavy horizontal box-drawing line (━) with the Unicode number 2501 _hex, which is considerably thinner.

The following three circles have no fixed Unicode assignment and are coded based on Unicode Technical Report #25. The exact appearance of the Unicode characters depends heavily on the font, if they are supported at all. For the two large circles that span the full block width, the largest Unicode circles should fit best, at least in a font with a fixed character width; even in the proportional font "Arial Unicode MS", the large circle (◯) with the Unicode number 25EF _hex is as wide as the full block (█) at position 3F _hex.

The small filled circle (⚫) at position 4D _hex is the same size as the sextant block (🬃) at position 24 _hex in the G1 block graphics character set and is centered.

The large filled circle (⬤) at position 4E _hex and the large circle outline (◯) at position 4F _hex are each as wide as the full block (█) at position 3F _hex and vertically centered.

The two arrows pointing right (⭢) and left (⭠) at positions 5B _hex and 5C _hex match the thin horizontal frame lines (─) of the characters 51 _hex to 5A _hex and can be joined seamlessly to them at their tail end. In the example layout of ETSI EN 300 706, these characters are drawn with a thicker line than the three similar-looking characters (→, ← and ―) at positions 5D _hex, 5B _hex and 60 _hex in the Latin G0 variant "English" and at positions 2E _hex, 2C _hex and 50 _hex in the Latin G2 supplementary character set; the two groups should not be mixed.

The two arrows pointing up (⭡) and down (⭣) at positions 5D _hex and 5E _hex match the thin vertical frame lines (│) of the characters 40 _hex to 4C _hex and 50 _hex to 5A _hex and can be joined seamlessly to them at their tail end.

The graphic space at position 5F _hex is identical to the graphic space at position 20 _hex in the G1 block graphic character set and should therefore be coded identically.

The characters with the Unicode number in brackets are similar to the example layouts given in ETSI EN 300 706 , but usually do not match the other graphic characters visually and semantically. However, there is no better Unicode encoding for these characters .

Many Level 1.5 decoders support only the four characters that have an alternative coding (positions 51 _hex and 5B _hex to 5D _hex). It is therefore assumed that these decoders use the similar-looking characters from the Latin G0 variant "English", so these four characters must be coded with their alternative coding accordingly.

Character set selection

With the selection bits in the national G0 character set tables, the associated G2 character set is usually selected as well. The first hexadecimal digit gives the four most significant bits (the region) and the second digit the three least significant bits (the national variant).

Selection bits of the national G0 / G2 character sets
	Western European	Central European (Polish)	Turkish (Western European)	Southeast European (Romanian)	Eastern European (Cyrillic)	Greek / Turkish	Arabic	Hebrew / Arabic
	0_	1_	2_	3_	4_	6_	8_	A_
_0	English	Polish	English		Cyrillic 1 (Serbian / Croatian)		English
_0	Latin G2 00	Latin G2 10	Latin G2 20		Cyrillic G2 40		Arabic G2 80
_1	German	German	German		German
_1	Latin G2 01	Latin G2 11	Latin G2 21		Latin G2 41
_2	Swedish / Finnish, Hungarian	Swedish / Finnish, Hungarian	Swedish / Finnish, Hungarian		Estonian
_2	Latin G2 02	Latin G2 12	Latin G2 22		Latin G2 42
_3	Italian	Italian	Italian		Latvian / Lithuanian
_3	Latin G2 03	Latin G2 13	Latin G2 23		Latin G2 43
_4	French	French	French		Cyrillic 2 (Russian / Bulgarian)		French
_4	Latin G2 04	Latin G2 14	Latin G2 24		Cyrillic G2 44		Arabic G2 84
_5	Portuguese / Spanish		Portuguese / Spanish	Serbian / Croatian / Slovenian	Cyrillic 3 (Ukrainian)			Hebrew
_5	Latin G2 05		Latin G2 25	Latin G2 35	Cyrillic G2 45			Arabic G2 A5
_6	Czech / Slovak	Czech / Slovak	Turkish		Czech / Slovak	Turkish
_6	Latin G2 06	Latin G2 16	Latin G2 26		Latin G2 46	Latin G2 66
_7				Romanian		Greek	Arabic	Arabic
_7				Latin G2 37		Greek G2 67	Arabic G2 87	Arabic G2 A7

Second G0					English ¹ 4+		English ² 8+	Arabic ³ A +

Notes on the G0 character set:

For the X / 26 selection and all other X / 26 functions for character selection, the Latin G0 character set always uses the "Standard" variant.

Icelandic channels use the Latin G0 variant "Portuguese / Spanish" and the Latin G2 supplementary character set .

Notes on the second G0 character set:

¹With Cyrillic, the second G0 character set for Russian channels must be preset with the Latin variant "English" .

²In Arabic, the second G0 character set for Iranian channels must be preset with the Latin variant "English" .

³In Hebrew, the second G0 character set for Israeli channels must be preset with “ Arabic ”.
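As a sketch of how the selection bits could be handled in software, the following Python code packs a two-digit label such as "A7" into a 7-bit value (four region bits above three variant bits, as described at the start of this section) and looks up the default G0 / G2 pair from the selection-bits table. The label format, the function names and the dictionary are illustrative assumptions; the bit order used when the value is actually transmitted in the page header or in packets X / 28 and M / 29 is not covered here.

```python
from typing import Tuple

LATIN, CYRILLIC, GREEK, ARABIC = "Latin G2", "Cyrillic G2", "Greek G2", "Arabic G2"
SWE = "Swedish / Finnish, Hungarian"

# Default G0 / G2 character sets per selection-bits label (region digit,
# then national-variant digit), transcribed from the table above.
SELECTION = {
    "00": ("English", LATIN), "01": ("German", LATIN), "02": (SWE, LATIN),
    "03": ("Italian", LATIN), "04": ("French", LATIN),
    "05": ("Portuguese / Spanish", LATIN), "06": ("Czech / Slovak", LATIN),
    "10": ("Polish", LATIN), "11": ("German", LATIN), "12": (SWE, LATIN),
    "13": ("Italian", LATIN), "14": ("French", LATIN),
    "16": ("Czech / Slovak", LATIN),
    "20": ("English", LATIN), "21": ("German", LATIN), "22": (SWE, LATIN),
    "23": ("Italian", LATIN), "24": ("French", LATIN),
    "25": ("Portuguese / Spanish", LATIN), "26": ("Turkish", LATIN),
    "35": ("Serbian / Croatian / Slovenian", LATIN), "37": ("Romanian", LATIN),
    "40": ("Cyrillic 1 (Serbian / Croatian)", CYRILLIC),
    "41": ("German", LATIN), "42": ("Estonian", LATIN),
    "43": ("Latvian / Lithuanian", LATIN),
    "44": ("Cyrillic 2 (Russian / Bulgarian)", CYRILLIC),
    "45": ("Cyrillic 3 (Ukrainian)", CYRILLIC), "46": ("Czech / Slovak", LATIN),
    "66": ("Turkish", LATIN), "67": ("Greek", GREEK),
    "80": ("English", ARABIC), "84": ("French", ARABIC), "87": ("Arabic", ARABIC),
    "A5": ("Hebrew", ARABIC), "A7": ("Arabic", ARABIC),
}


def split_selection(label: str) -> Tuple[int, int]:
    """Split a label such as 'A7' into (region, national variant)."""
    if len(label) != 2:
        raise ValueError(f"invalid selection bits: {label!r}")
    region, variant = int(label[0], 16), int(label[1], 16)
    if variant > 7:
        raise ValueError(f"national variant must fit in 3 bits: {label!r}")
    return region, variant


def seven_bit_value(label: str) -> int:
    """Pack the region (4 MSBs) and national variant (3 LSBs) into 7 bits."""
    region, variant = split_selection(label)
    return (region << 3) | variant


def default_charsets(label: str) -> Tuple[str, str]:
    """Return (G0, G2); raises KeyError where the table has no entry."""
    split_selection(label)  # validates the label format
    return SELECTION[label.upper()]
```

Empty cells of the table, such as 07 or 15, deliberately raise a `KeyError` instead of falling back to some default.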

Selection of the national G0 / G2 character sets
		1 = highest	more significant	less significant	default	Second G0	X / 26 selection	default	default	X / 26 selection
	Level	priority	Selection bits for standard G0 / G2		G0 character set			G1 character set	G2 character set
X / 0 (page header)	all	8	Decoder ¹	Page header	●	○ ²			○ ³ (from level 1.5)
X / 28/1	≤ 1.5 ⁴	4	Packet	Page header	●	○ ⁵		●	○ ⁵ (from level 1.5)
M / 29/1	≤ 1.5 ⁴	7	Packet	Page header	●	○ ⁵		●	○ ⁵ (from level 1.5)
X / 28/0 format 1	≥ 2.5	2	Packet	Page header (from the packet with some Level 2.5 decoders)	●	●			●
X / 28/4	≥ 3.5	3	Packet	Page header	●	●			●
M / 29/0	≥ 2.5	5	Packet	Page header (from the packet with some Level 2.5 decoders)	●	●			●
M / 29/4	≥ 3.5	6	Packet	Page header	●	●			●
X / 26 column function 08 _hex "Modified G0 and G2 Character Set"	≥ 2.5	1					● ⁶,⁷			● ⁷

Presettings for each Teletext page:

¹The more significant selection bits for the standard G0 / G2 character sets depend on the decoder and the region set there. From level 2.5, the neutral default setting is 0 (Western European) - Latin .

²The selection of the second G0 character set depends on the decoder and the region set there. Whether the selection of the standard G0 character set should have an influence on the second G0 character set at this point is not specified, but is necessary.

³With many Level 1.5 decoders, the selection and the repertoire of the G2 character set are limited. Whether the selection of the standard G0 character set should have any influence on the G2 character set at this point is not specified, but it would make sense. However, this question only arises for the region values 4 (Eastern European, Cyrillic) and 6 (Greek / Turkish) of the more significant selection bits, for each of which more than one G2 character set is defined.

Notes on packets X / 28/1 and M / 29/1:

⁴ The character set selection functions in these packets are defined in earlier specifications and have been retained for compatibility with corresponding Level 1 and Level 1.5 decoders. They are not intended for use by Level 2.5 and Level 3.5 decoders.

⁵ It is not known whether the selection of the standard G0 character set should have an effect on the second G0 character set and the G2 character set, but it would make sense.

Notes on the X / 26 selection:

⁶ With the X / 26 selection, the Latin variant "Standard" is always used.

⁷ At level 2.5, in addition to the standard G0 / G2 character set pair, only one additional G0 / G2 character set pair is possible for each teletext page; from level 3.5, any number.

Choice of characters
		Control characters 00 _hex ..1F _hex	default	Second G0	X / 26 selection	Character 2A _hex	Latin variant	Standard ᵃ	default	X / 26 selection	Standard ᵇ
	Level	Control characters 00 _hex ..1F _hex	G0 character set					G1 character set	G2 character set		G3 character set
X / 0 to X / 25 Simple level 1 teletext page	all	● ¹	● ²,³	● ³		*	national	● ⁴
X / 26 column function ...
... 10 _hex "G0 Character"	≥ 1.5		●		●	@	default
… 09 _hex "G0 Character (Levels 2.5 & 3.5)"	≥ 2.5		●		●	*	default
... 11 _hex to 1F _hex "G0 Character with diacritical mark"	≥ 1.5		●		●	*	default		combining	combining
... 01 _hex "G1 character"	≥ 2.5		○ ⁵		○ ⁵		default	● ⁵
… 0F _hex "G2 Character"	≥ 1.5								● ⁶	●
... 02 _hex "G3 Character (Level 1.5)"	≥ 1.5										● ⁶
… 0B _hex "G3 Character (Levels 2.5 & 3.5)"	≥ 2.5										●

Notes on the G1 and G3 character sets:

^aFor the G1 character set, the shape of the 63 block elements (positions 21 _hex to 3F _hex and 60 _hex to 7F _hex) can be set as an attribute, contiguous or separated, with the two control characters 19 _hex "Contiguous Mosaic Graphics" and 1A _hex "Separated Mosaic Graphics" and, from Level 2.5, with the X/26 column function 0C _hex "Display attributes". The contiguous shape is preset at the beginning of each line.

^bFor the G3 character set, some decoders apply the same contiguous/separated attribute as for the G1 block elements to the shape of the 57 smoothed block elements (positions 20 _hex to 2D _hex, 30 _hex to 3D _hex, 3F _hex, 60 _hex to 6D _hex and 70 _hex to 7D _hex).
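Note ^a concerns the 63 block elements of the G1 set. The following Python sketch turns a G1 code into its 2 × 3 cell pattern. It assumes the conventional teletext bit assignment, which this article does not state: bits 0 to 4 light the first five cells (left to right, top to bottom) and bit 6 lights the bottom-right cell. The separated form is modelled only as a different preview glyph; the function names are hypothetical.

```python
from __future__ import annotations

# Sketch only. Assumed bit layout of a G1 block element (not given here):
#   bit 0 | bit 1    top row
#   bit 2 | bit 3    middle row
#   bit 4 | bit 6    bottom row
CELL_BITS = ((0, 1), (2, 3), (4, 6))


def is_mosaic_position(code: int) -> bool:
    """G1 positions 20-3F and 60-7F (20 is the empty block, i.e. a space)."""
    return 0x20 <= code <= 0x3F or 0x60 <= code <= 0x7F


def sextant_cells(code: int) -> list[list[bool]]:
    """Return the 3 x 2 grid of lit cells for a G1 block element."""
    if not is_mosaic_position(code):
        raise ValueError(f"{code:#04x} is not a G1 block element position")
    return [[bool((code >> bit) & 1) for bit in row] for row in CELL_BITS]


def preview(code: int, separated: bool = False) -> str:
    """Crude text preview; separated mode just uses a smaller glyph per cell."""
    lit = "▪" if separated else "█"
    return "\n".join("".join(lit if on else " " for on in row)
                     for row in sextant_cells(code))


print(preview(0x35))
```

Under this layout the 31 codes 21 _hex to 3F _hex and the 32 codes 60 _hex to 7F _hex yield 63 distinct non-empty patterns, matching the count given in note ^a.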

Notes on the simple Level 1 teletext page:

¹A control character is normally displayed as the space at position 20 _hex of the selected character set. In hold mode, however, while the G1 character set is selected, the most recent G1 block element or space (positions 20 _hex to 3F _hex and 60 _hex to 7F _hex) is displayed instead. This held character is reset to the space at the beginning of each line, when switching between the G0 and G1 character sets, and when the character size actually changes. Hold mode is switched on with the control character 1E _hex "Hold Mosaics" and off with 1F _hex "Release Mosaics"; at the position of these control characters themselves, the held character is already (1E _hex) or still (1F _hex) displayed. Hold mode is switched off at the beginning of each line.

²The first G0 character set is always selected at the beginning of each line.

³The G0 character set is selected with the eight control characters 00 _hex to 07 _hex "Alpha Color Codes". The control character 1B _hex "ESC" switches between the first and the second G0 character set.

⁴The G1 character set is selected with the eight control characters 10 _hex to 17 _hex "Mosaic Color Codes". For the 32 positions 40 _hex to 5F _hex, the corresponding characters of the selected G0 character set (standard or second G0) are used.
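Notes ¹ to ⁴ together describe how a Level 1 decoder walks through a row. The Python sketch below models only those rules, plus one assumption taken from EN 300 706 rather than from this article: the alpha colour, mosaic colour, ESC and Release Mosaics codes take effect from the next position ("set-after"), while Hold Mosaics takes effect at its own position ("set-at"). `decode_row` and `Cell` are hypothetical names; size changes and all other attributes are left out.

```python
from __future__ import annotations

from dataclasses import dataclass

SPACE = 0x20


@dataclass(frozen=True)
class Cell:
    charset: str  # "first G0", "second G0" or "G1"
    code: int


def decode_row(row: bytes) -> list[Cell]:
    """Hypothetical helper: resolve one Level 1 row to (charset, code) cells.

    Handles only 00-07 (alpha colour), 10-17 (mosaic colour), 1B (ESC),
    1E (Hold Mosaics) and 1F (Release Mosaics). Size changes, which also
    reset the held character, are ignored.
    """
    mosaic = False      # G0 is selected at the start of each row
    second_g0 = False   # the first G0 set is selected at the start of each row
    hold = False        # hold mode is off at the start of each row
    held = SPACE        # held character is reset to space at the start of each row
    cells = []
    for byte in row:
        code = byte & 0x7F                      # strip the parity bit
        g0 = "second G0" if second_g0 else "first G0"
        if code < 0x20:
            if code == 0x1E:                    # Hold Mosaics acts at its own position
                hold = True
            if hold and mosaic:
                cells.append(Cell("G1", held))  # held character replaces the space
            else:
                cells.append(Cell("G1" if mosaic else g0, SPACE))
            # The remaining codes act from the next position on.
            if code <= 0x07 or 0x10 <= code <= 0x17:
                new_mosaic = code >= 0x10
                if new_mosaic != mosaic:
                    held = SPACE                # reset when switching G0 <-> G1
                mosaic = new_mosaic
            elif code == 0x1B:
                second_g0 = not second_g0
            elif code == 0x1F:
                hold = False
        elif mosaic and not 0x40 <= code <= 0x5F:
            held = code                         # last G1 block element or space
            cells.append(Cell("G1", code))
        else:
            cells.append(Cell(g0, code))        # includes 40-5F while G1 is selected
    return cells


for cell in decode_row(bytes([0x11, 0x7F, 0x1E, 0x01, 0x41])):
    print(cell)
```

In this example the block 7F _hex is repeated at the positions of both the Hold Mosaics code and the following alpha colour code, because hold mode is still active there.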

Note on the X/26 column function 01 _hex "G1 Character":

⁵For the G1 character set, the corresponding characters of the selected G0 character set (standard or X/26 selection) are used for the 32 positions 40 _hex to 5F _hex.

Note on the X/26 column functions 0F _hex "G2 Character" and 02 _hex "G3 Character (Level 1.5)":

⁶With many Level 1.5 decoders, the repertoire of the G2 and G3 character sets is limited.
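The X/26 rows of the character selection table, together with note ⁵, can be condensed into a lookup that a decoder might consult for each column function. This Python sketch encodes only those table rows; `resolve`, `Resolved` and `COLUMN_FUNCTIONS` are hypothetical names. The position of the combining mark in the G2 set (41 _hex to 4F _hex for modes 11 _hex to 1F _hex) is an assumption taken from EN 300 706, not from this article.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Resolved(NamedTuple):
    charset: str                 # "G0", "G1", "G2" or "G3"
    code: int                    # position within that set
    diacritic: Optional[int]     # G2 position of a combining mark, if any
    glyph: Optional[str] = None  # fixed glyph that replaces the set's glyph


# mode -> (name, minimum presentation level, addressed set), per the table
COLUMN_FUNCTIONS = {
    0x01: ("G1 Character", 2.5, "G1"),
    0x02: ("G3 Character (Level 1.5)", 1.5, "G3"),
    0x09: ("G0 Character (Levels 2.5 & 3.5)", 2.5, "G0"),
    0x0B: ("G3 Character (Levels 2.5 & 3.5)", 2.5, "G3"),
    0x0F: ("G2 Character", 1.5, "G2"),
    0x10: ("G0 Character", 1.5, "G0"),
    **{mode: ("G0 Character with diacritical mark", 1.5, "G0")
       for mode in range(0x11, 0x20)},
}


def resolve(mode: int, data: int, level: float) -> Optional[Resolved]:
    """Hypothetical helper: what one X/26 column function displays."""
    entry = COLUMN_FUNCTIONS.get(mode)
    if entry is None or level < entry[1]:
        return None                              # not covered here, or level too low
    _, _, charset = entry
    if charset == "G1" and 0x40 <= data <= 0x5F:
        return Resolved("G0", data, None)        # note 5: 40-5F come from G0
    if mode == 0x10 and data == 0x2A:
        return Resolved("G0", data, None, "@")   # table: @ instead of *
    if 0x11 <= mode <= 0x1F:
        # Assumption: mark at G2 position 40 + (mode - 10), e.g. 11 -> 41
        return Resolved("G0", data, 0x40 + (mode - 0x10))
    return Resolved(charset, data, None)


print(resolve(0x10, 0x2A, 1.5))  # G0 position 2A shown as "@"
print(resolve(0x09, 0x2A, 1.5))  # None: mode 09 needs Level 2.5
```

Returning `None` for a too-low level mirrors the "Level" column: a Level 1.5 decoder simply has no meaning for the Level 2.5 column functions.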

External links

ETSI EN 300 706 - Enhanced Teletext specification (2003) and ETS 300 706 (1997), ETSI (English)
ITU-T Recommendation T.101: International interworking for Videotex services (1994) and ITU-T Recommendation T.101, Annex C (1990), ITU (English)
EBU Tech 3232 - Displayable Character Sets for Broadcast Teletext and EBU Tech 3232-a - Appendices, EBU, 1982 (English)
STV5348, STMicroelectronics, 2004 (English)
Philips SAA5243 (1991), Philips SAA5244A (1992), Philips SAA5249 (1996), Philips SAA5254 (1996), Philips SAA5281 (1996), Philips SAA5288 (1997) and Philips SAA5290 (1995), Philips (English)
The Cyrillic Charset Soup, Roman Czyborra, 1998 (English)
Notes on some Unicode Arabic characters: recommendations for usage, Jonathan Kew, Draft 2, 2005 (English)
Unicode 8.0 Character Code Charts, Unicode, 2015 (English)
Graphic character identifiers, IBM (English)
RFC 1345 - Character Mnemonics & Character Sets, Keld Simonsen, 1992 (English)
GOST 13052
ISO 6937-2: 1983 / Add 1: 1989
ISO-IR-11
ISO-IR-21
ISO-IR-27
ISO-IR-47
ISO-IR-142

References

[1] Philips SAA5246A, Philips, 1993 (English)

[2] Character histories: notes on some Ascii code positions, Jukka “Yucca” Korpela, 2006 (English);
7-bit character sets, Aivosto Oy, 2016 (English)

[3] Quarter-em dash, hyphen/divis, Wikipedia: “In the older ASCII character set and in the character sets of the ISO 8859 family of standards [...] the hyphen-minus is used, which was introduced with the typewriter as a common character for hyphen, dash and minus sign.”;
IT and communication - Characters and encodings: The ISO Latin 1 character repertoire: Detailed descriptions of the characters, "- HYPHEN, MINUS SIGN (HYPHEN-MINUS) U+002D", Jukka "Yucca" Korpela, 2006 (English): "In situations where sufficient support to Unicode can be safely assumed (very rarely at present!), it is best to replace the use of hyphen-minus by Unicode hyphen (U+2010) or non-breaking hyphen (U+2011) or minus sign (U+2212) or, if hyphen-minus had been used e.g. in place of a dash symbol, some other Unicode character such as en dash (U+2013) or em dash (U+2014) or horizontal bar (U+2015)."

[4] Minus sign, similar characters, U+2015 horizontal bar, Wikipedia: “⁽²⁾ This sign generally resembles an em dash in length, shape and height and differs from it only in its line-break properties.”

[5] On the use of some MS Windows characters in HTML, Suggested substitutes, Dashes, Jukka "Yucca" Korpela, 2017 (English): "In typewritten material, the em dash is represented by two hyphens with no space around them, and an en dash is represented by a hyphen."

[6] Internationalization for Turkish: Dotted and Dotless Letter "I", Tex Texin, 2010 (English);
Resolving dotted and dotless "i", John Cowan, 1997 (English)

[7] Circumflex, character sets, Wikipedia: “The ASCII character set only contains the character ^ (in Unicode at position U+005E), which is now interpreted as a single, universally applicable character. [...] In addition to the universal character ^ (U+005E), the Unicode standard contains the typographically better character ˆ (U+02C6) as well as other precomposed characters with circumflex (e.g. Ẑ, ẑ).”;
ITU-T Recommendation T.101: International interworking for Videotex services, I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM43 Arrowhead upwards, circumflex shape"

[8] ITU-T Recommendation T.101: International interworking for Videotex services, I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM48 Lower bar (not jointive) low line, spacing underline (equivalent to SP09 of ISO 6937)"

[9] Grave accent, As surrogate of apostrophe or (opening) single quote, Wikipedia (English): "Additionally ASCII grave accent character (U+0060 ` grave accent) was often used as surrogate of opening single quote, together with ASCII typewriter apostrophe (U+0027 ' apostrophe) used as closing single quote; double quotes were sometimes substituted by two consecutive grave accents and two consecutive typewriter apostrophes (`` ... '').";
ASCII and Unicode quotation marks, Markus Kuhn, 2007 (English): "Only old X Window System fonts and some old video terminals show ASCII 0x60/0x27 as left and right quotation marks, while most modern systems follow the ISO and Unicode standards instead.";
ITU-T Recommendation T.101: International interworking for Videotex services, I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM44 Upper reverse solidus, grave accent shape"

[10] Character histories: notes on some Ascii code positions, VERTICAL LINE, Jukka "Yucca" Korpela, 2006 (English)

[11] Tilde, ASCII tilde (U+007E), Wikipedia (English): "Most modern proportional fonts align plain spacing tilde at the same level as dashes, or only slightly upper. This distinguishes it from a small tilde (˜), which is always raised. But in some monospace fonts, especially used in text user interfaces, ASCII tilde character is raised too. This apparently is a legacy of typewriters, where pairs of similar spacing and combining characters relied on one glyph.";
Unicode Explained, Chapter 8: Character Usage, ASCII (Basic Latin), Tilde ~ (U+007E), p. 401, Jukka K. Korpela, 2006 (English): "As a spacing clone of a diacritic tilde (i.e., spacing counterpart of combining tilde U+0303), use the small tilde ˜ (U+02CD [correct: U+02DC]).";
ITU-T Recommendation T.101: International interworking for Videotex services, I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM47 Upper bar (not jointive) bar or tilde shape"

[12] List of Latin-based alphabets, extensions, Wikipedia;
Everything about Unicode, Lithuanian special characters, Jens Meyer, 2007;
Special letters and diacritical marks for the European languages of the Latin alphabet, Wolfgang Hendlmeier and Gerhard Helzel, 2012

[13] Hatschek, Usage and character sets, Wikipedia: “In modern printed fonts, the character on the uppercase L and on the lowercase d, l and t is often shown in a form similar to a comma at the top right next to the base character.”
and “It should be noted that these codes are also used if the hatschek is displayed on d, l, L and t in comma form.”

[14] Telephone keypad, recommendation ITU-T E.161, placement, appearance and naming of the symbol ⌗, Wikipedia: “This symbol is contained in Unicode as U+2317 viewdata square [...]. With the square shape, the line ends must protrude between 8% and 18% of the edge length on each side, with the inclined shape (interior angle 80°) always by 18%.”;
Proposal to incorporate two telephony symbols into Unicode by glyph and annotation changes, Karl Pentzlin, 2013 (English): "The viewdata square, as its name implies, is introduced anyway as a character for "Viewdata" which is an application related to telephony introduced in the 1980s. It can be presumed that it had to be in fact the same symbol as the E.161 symbol.
However, the proportions of its representative glyph are not within the constraints given in E.161.";
ITU-T Recommendation E.161: Arrangement of digits, letters and symbols on telephones and other devices that can be used for gaining access to a telephone network, 3.2.2 12 push buttons, symbols, pp. 3–4, ITU, 2001 (English)

[15] ITU-T Recommendation T.101: International interworking for Videotex services, I.1.2.7 Miscellaneous, p. 76, ITU, 1994 (English): "SM12 Central horizonal bar jointive"

[16] ż, Wiktionary: “As a typographical variant there is ƶ/Ƶ. However, this is usually only used if the whole word is written in capitals and there is no longer enough space for the dot above the Z.”;
Teletext mappings, Marcin “Qrczak” Kowalczyk, 2001 (English): "In Polish capital Z with dot above is sometimes rendered with stroke instead of the dot. It's just a glyph variant, the meaning is exactly the same. The letter should be consistently encoded as Z WITH DOT ABOVE even if it's rendered with a stroke."

[17] Comma below, encoding, Wikipedia: “Until the early 1990s, no distinction was made between the comma and the cedilla in international standards. [...] Only later did the view prevail that these are two different diacritics. Today, Unicode contains both S and T with cedilla and S and T with comma.”;
ISO/IEC 6937:2001, Table 4 - Specification of the repertoire, pp. 15 and 18, ISO/IEC, 2001 (English): "NOTE 2: The letters used in the Romanian language LATIN CAPITAL LETTER S WITH COMMA BELOW and LATIN CAPITAL LETTER T WITH COMMA BELOW are different from the LATIN CAPITAL LETTER S WITH CEDILLA and LATIN CAPITAL LETTER T WITH CEDILLA. However, subject to the agreement of originator and receiver in information interchange, the letters WITH CEDILLA may be used to substitute for the letters WITH COMMA BELOW."
and "NOTE 5: The letters used in the Romanian language LATIN SMALL LETTER S WITH COMMA BELOW and LATIN SMALL LETTER T WITH COMMA BELOW are different from the LATIN SMALL LETTER S WITH CEDILLA and LATIN SMALL LETTER T WITH CEDILLA. However, subject to the agreement of originator and receiver in information interchange, the letters WITH CEDILLA may be used to substitute for the letters WITH COMMA BELOW.";
Cedillas and commas below, Eric Muller, Adobe, 2013 (English);
Comments on cedilla and comma below (revision 2), Denis Moyogo Jacquerye, 2013 (English);
Romanian diacritic marks, Cristian Kit Paul, 2008 (English)

[18] Overline, Available characters, Wikipedia: “In several character sets of the ISO 8859 family of standards, and derived from them in the Unicode standard, there is a character U+00AF (175 decimal) that can be used both as an overline and as a macron. [...] This is one of the reasons why the overline is often incorrectly referred to as a "macron". It should not be confused with the other Unicode characters of this name. The characters at the code points U+02C9 (modifier letter macron) and U+0304 (combining macron) are significantly shorter than the overline.”

[19] The modern library, 10.2.4 Character set and 10.2.5 Sorting (alphabetization), pp. 229–232, Rudolf Frankenberger and Klaus Haller, 2004

[20] Trema, Unicode, Wikipedia: “Most standards for character sets, including Unicode, do not differentiate between umlaut and trema. If a distinction between umlaut and trema is necessary in data processing, ISO/IEC JTC 1/SC 2/WG 2 recommends the following: trema represented by Combining Grapheme Joiner (CGJ, U+034F) + Combining Diaeresis (U+0308); umlaut represented by Combining Diaeresis (U+0308).”;
Frequently Asked Questions, Characters and Combining Marks, "Q: Unicode doesn't seem to distinguish between tréma and umlaut, but I need to distinguish. What shall I do?", Unicode, 2016 (English)

[21] Unicode Technical Note #27 - Known Anomalies in Unicode Character Names, Unicode, 2017 (English)

[22] CCITT Recommendation T.61: Character repertoire and coded character sets for the international teletex service, 3.2.3.9 Non-spacing characters, p. 13, ITU, 1988 (English): "Note - The Non-spacing underline character is never used individually but always in combination with some other graphic character to represent the graphic rendition “underlined” for the associated character. The non-spacing underline character can be used in combination with any graphic character of the repertoire, including an accented letter or an umlaut, or space. It is recommended to implement the "underline" function by means of the control function SGR (4) instead of the "non-spacing underline" graphic character."

[23] Proportionality Symbol, Doctor Peterson, 2003 (English): "If you prefer to describe it by its appearance rather than strictly by its usage, you might call it an "open alpha" or "loose alpha," rather than "fishy alpha." People do often describe it (wrongly) as an alpha, but I haven't seen these modifiers used anywhere."

[24] ŉ, Miscellaneous, Wikipedia (English): "The upper case, or majuscule form has never been included in any international keyboards. Therefore, it is decomposable by simply combining ʼ (U+02BC) and N. 〔ʼN〕";
Unicode 10.0 Character Code Charts, Latin Extended-A, 0149 ŉ LATIN SMALL LETTER N PRECEDED BY APOSTROPHE, Unicode, 2017 (English): "uppercase is 02BC ʼ 004E N"

[25] Kra (letter), Wikipedia (English): "The letter can be capitalized as K', but it is not encoded separately as a single letter because it is very similar to the Latin capital letter K followed by an apostrophe, preferably the modifier letter apostrophe, U+02BC ʼ modifier letter apostrophe (HTML &#700;).";
Status of Mapping between Characters of ISO 5426-2 and ISO/IEC 10646-1 (UCS), 4. ADDITIONAL MAPPINGS, 63 LATIN CAPITAL LETTER KRA, p. 5, Joan M. Aliprand, 2002 (English): "The capital form of the letter kra letter can be encoded as the sequence U+004B LATIN CAPTIAL LETTER K followed by U+02BC MODIFIER LETTER APOSTROPHE."

[26] Unicode 10.0 Character Code Charts, Latin Extended-A, 0131 ı LATIN SMALL LETTER DOTLESS I, Unicode, 2017 (English): "uppercase is 0049 I"

[27] ß, Capitalization and special features of use, as well as Capital ẞ, Capitals without capital ẞ, Wikipedia;
Unicode 10.0 Character Code Charts, C1 Controls and Latin-1 Supplement, 00DF ß LATIN SMALL LETTER SHARP S, Unicode, 2017 (English): "uppercase is “SS”"

[28] Capital ẞ, Wikipedia: “At the beginning of 2008, the capital ẞ was included as a new character in the international Unicode standard for computer character sets; since June 29, 2017, the ẞ has been part of the official German orthography.”

[29] I with grave (Cyrillic), Bulgarian and Macedonian, Wikipedia (English): "When not available, the character ⟨ѝ⟩ is often replaced by an ordinary ⟨и⟩ (not recommended, but still orthographically correct) or in Bulgarian by the letter ⟨й⟩ (formally this is considered a spelling error)."

[30] Tonos, Wikipedia: “In some fonts the tonos is vertical, that is, in a 'neutral' position, in contrast to the acute, which is inclined to the right, and the grave accent, which is inclined to the left; sometimes it is just a dot, a triangle standing on its tip or similar. This custom dates back to the 1970s, i.e. to the time before the official introduction of monotonic orthography by the Greek government, when orthography reformers used a 'neutral' accent in this way, which had to differ from the existing accents of polytonic orthography. With the official introduction of monotonic orthography by the Greek government in 1980, however, the distinction between the tonos and the polytonic accents became unnecessary, and all style specifications stipulate that the monotonic tonos is graphically identical to the polytonic acute. This is also what Unicode provides.”

[31] Arabic character tail for final Seen family (Seen, Sheen, Saad, Daad), IBM Egypt, 2001 (English)

[32] The Unicode Consortium on Twitter, Unicode, 2019 (English);
Proposal to add characters from legacy computers and teletext to the UCS, Doug Ewell, Rebecca Bettencourt and others, 2019 (English);
Map from Teletext G1 character set to Unicode, Rebecca Bettencourt, 2018 (English);
Map from Teletext G3 character set to Unicode, Rebecca Bettencourt, 2018 (English)

[33] Unicode Technical Report #25 - Unicode Support for Mathematics, 2.11 Geometrical Shapes, Unicode, 2007 (English)

[34] Bug Reports DVBViewer Pro/GE - Teletext with Cyrillic, Griga, 2012 (English): "PS The following screenshot from Derrick's sample (see above) shows clearly which characters originate from which source:
- White characters are from the Latin G0 Character Set (identical for all countries with a latin alphabet)
- Red characters are from the Spanish / Portuguese National Option Subset.
- Green characters added by packets X/26 are from the Latin G2 Supplementary Set."

[35] Siemens MEGATEXT PLUS SDA 5275-2 Delta Specification / Application Notes, 2.5.2 Example for Russian Market, p. 56, Siemens, 1998 (English): "The bit SEC_LA should be set and the secondary language should be defined to English because currently, no Russian broadcaster transmits packet X/28 or X/29."

[36] Philips SAA5x9x family, 9.5 The twist attribute, p. 40, Philips, 1998 (English): "In many of the character sets, the 'twist' serial attribute (code 1BH) can be used to switch to an alternate basic character code table, e.g. to change from the Hebrew alphabet to the Arabic alphabet on an Arab/Hebrew device."

[37] Philips SAA5x9x family, 9.5 The twist attribute, p. 40, Philips, 1998 (English): "In many of the character sets, the 'twist' serial attribute (code 1BH) can be used to switch to an alternate basic character code table [...]. For some national option languages the alternate code table is the default, and a twist control character will switch to the first code table."
