Teletext character sets (ETSI EN 300 706)


The following tables describe the 7-bit character sets defined in ETSI EN 300 706, the teletext standard used in Europe.

General

The first 32 positions (00 hex to 1F hex) of the character sets are not defined. In a simple Level 1 teletext page, however, these codes are used as control characters.

The character 24 hex represents the general currency sign (¤) in the standard Latin G0 primary character set and the dollar sign ($) in the other G0 primary character sets.

The character 2A hex in the G0 primary character sets represents either the asterisk (*) or the at sign (@), depending on the control.

The filled rectangle at position 7F hex in the G0 primary character sets and in some G2 supplementary character sets covers the maximum extent of all letters without descenders. It has no fixed Unicode assignment; DOS character sets encode it as the character FE hex (■), and many software-based decoders do the same. The exact appearance of the Unicode character depends heavily on the font, but at least in the “Courier” font family the black square (■) with the Unicode number 25A0 hex largely corresponds to the example layout given in ETSI EN 300 706. In the Arabic G0 primary character set, however, the rectangle is shown slightly shorter than the Arabic letter Alif maqṣūra (ﻯ) at position 70 hex, although not all decoders render it this way.

The G2 supplementary character sets and the G3 character set ("high-resolution graphics") are supported from teletext presentation Level 1.5 onward. Many Level 1.5 decoders, however, still support only part of these character sets.

Legend

A Γ Basic alphabet letter (Latin / non-Latin script)
ß ά Special letter or addition
` ΄ Diacritical mark (standalone)
O Diacritical mark (combining)
2 ٢ Digit of the number system
½ Numeral
@ Punctuation mark or special character
O Combining special character
Graphic or frame element (defined / not defined in Unicode)
RLM Space or control character
Undefined character
| ¦ Character with layout variations (often due to low resolution or for historical reasons)
41 41 See notes on the table (unique / different codings)
Α A ﺏ ﺐ Context-dependent meaning (identical layout / suitable form)
У (Y) ﺁ (ﺂ) Context-dependent meaning (different layout / missing form)
Ë | $ Different codings (depending on the control or the decoder)

Each Unicode number carries the official Unicode name of the character as reference text. For characters without a Unicode assignment ("N / A"), a descriptive name based on the names of similar Unicode characters is used instead.

Latin

The Latin G0 ("standard" variant) and G2 character sets are essentially identical to the 8-bit character set ISO 6937-2:1983 / Add 1:1989 (ISO-IR-142), supplemented by the two characters A6 hex (#) and A8 hex (¤) from the equivalent 8-bit character set ITU T.61 (see also the current version, ISO 6937:2001). The G2 supplementary character set corresponds to the characters A0 hex to FF hex.
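
The correspondence between G2 positions and the upper half of ISO 6937 amounts to setting or clearing the high bit. The following Python sketch illustrates this; the helper names `g2_to_iso6937` and `iso6937_to_g2` are hypothetical and not taken from any standard, and the sketch ignores the few positions that the notes on the Latin G2 table list as coded differently.

```python
def g2_to_iso6937(code: int) -> int:
    """Map a 7-bit Latin G2 position (20..7F hex) to the ISO 6937 byte (A0..FF hex)."""
    if not 0x20 <= code <= 0x7F:
        raise ValueError(f"not a G2 position: {code:#04x}")
    return code | 0x80  # set the high bit


def iso6937_to_g2(byte: int) -> int:
    """Map an ISO 6937 byte (A0..FF hex) back to the 7-bit G2 position."""
    if not 0xA0 <= byte <= 0xFF:
        raise ValueError(f"not in the upper half of ISO 6937: {byte:#04x}")
    return byte & 0x7F  # clear the high bit
```

For example, the G2 position 61 hex (Æ) corresponds to the ISO 6937 byte E1 hex.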

Latin G0 primary character set (European)
Selection bits: see national variants
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

!

0021
21

" "

0022
22

#

0023
23

¤

00A4
24

%

0025
25

&

0026
26

' '

0027
27

(

0028
28

)

0029
29

* | @ 

002A | 0040
2A

+

002B
2B

,

002C
2C

-

002D
2D

.

002E
2E

/

002F
2F

3_

0

0030
30

1

0031
31

2

0032
32

3

0033
33

4

0034
34

5

0035
35

6

0036
36

7

0037
37

8

0038
38

9

0039
39

:

003A
3A

;

003B
3B

<

003C
3C

=

003D
3D

>

003E
3E

?

003F
3F

4_

@

0040
40

A

0041
41

B

0042
42

C

0043
43

D

0044
44

E

0045
45

F

0046
46

G

0047
47

H

0048
48

I

0049
49

J

004A
4A

K

004B
4B

L

004C
4C

M

004D
4D

N

004E
4E

O

004F
4F

5_

P

0050
50

Q

0051
51

R

0052
52

S

0053
53

T

0054
54

U

0055
55

V

0056
56

W

0057
57

X

0058
58

Y

0059
59

Z

005A
5A

[

005B
5B

\

005C
5C

]

005D
5D

^

005E
5E

_

005F
5F

6_

`

0060
60

a

0061
61

b

0062
62

c

0063
63

d

0064
64

e

0065
65

f

0066
66

g

0067
67

h

0068
68

i

0069
69

j

006A
6A

k

006B
6B

l

006C
6C

m

006D
6D

n

006E
6E

o

006F
6F

7_

p

0070
70

q

0071
71

r

0072
72

s

0073
73

t

0074
74

u

0075
75

v

0076
76

w

0077
77

x

0078
78

y

0079
79

z

007A
7A

{

007B
7B

| ¦

007C
7C

}

007D
7D

~ ~

007E
7E

25A0
7F

The character 7F hex (■) is coded differently from ISO 6937.

In the example layout of ETSI EN 300 706, the double quotation mark (") at position 22 hex is drawn typographically as an English closing quotation mark (”) with the Unicode number 201D hex. The character should nevertheless be encoded as the neutral variant in accordance with ISO 6937, so that it also works visually and semantically as an English opening quotation mark (“). In addition, the typographically correct variant is available at position 3A hex in the Latin G2 supplementary character set, where it is shown with a different example layout as a closing quotation mark.

The number sign (#) at position 23 hex is shown in the example layout of ETSI EN 300 706 with vertical lines; this is merely a layout variation, probably due to the low resolution.

In the example layout of ETSI EN 300 706, the apostrophe (') at position 27 hex is drawn in its typographically correct form. It could also be encoded with the visually more suitable alternative Unicode characters right single quotation mark (’) with the Unicode number 2019 hex or modifier letter apostrophe (ʼ) with the Unicode number 02BC hex. Both would deviate from ISO 6937, however, and would be unsuitable visually and semantically when the character is used as an English opening quotation mark (‘). In addition, the typographically correct variant is available at position 39 hex in the Latin G2 supplementary character set, where it is shown with a different example layout as a closing quotation mark.

The coding of the character 2A hex depends on the control.

In the example layout of ETSI EN 300 706, the asterisk (*) at position 2A hex is drawn large, six-pointed, standing on one point, and vertically centered. It could also be encoded with the visually more suitable alternative Unicode character asterisk operator (∗) with the Unicode number 2217 hex, but this would deviate from ISO 6937.

According to EBU Tech 3232-a and ITU T.61, the hyphen-minus (-) at position 2D hex can also be encoded, depending on context, as a hyphen (‐) with the Unicode number 2010 hex or as a minus sign (−) with the Unicode number 2212 hex. The character can also be used as an en dash (–) with the Unicode number 2013 hex. For the English em dash with the Unicode number 2014 hex, however, it is better to use the horizontal bar (―) at position 60 hex in the "English" variant or at position 50 hex in the Latin G2 supplementary character set, or two consecutive hyphen-minus characters.

The capital letter I at position 49 hex serves as the uppercase form both of the small letter i at position 69 hex and of the dotless small letter i (ı), which is found at position 60 hex in the "Turkish" variant, at position 5F hex in the "Romanian" variant, and at position 75 hex in the Latin G2 supplementary character set. Likewise, the small letter i at position 69 hex serves as the lowercase form both of the capital letter I at position 49 hex and of the dotted capital letter I (İ), which is found at position 40 hex in the "Turkish" variant or produced by the corresponding combination in the Latin G2 supplementary character set. Unicode likewise makes no distinction between the visually identical characters in each case.
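
Because the same code serves two different case pairs, case conversion has to know which language the text is in. The following Python sketch shows one way to handle the Turkish pairs I/ı and İ/i before falling back to the default conversion; the function names `turkish_lower` and `turkish_upper` are hypothetical, and the sketch assumes the text has already been decoded to Unicode.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish pairs I -> dotless i and dotted I -> i."""
    # Handle the two Turkish capital forms first, then lowercase the rest.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def turkish_upper(text: str) -> str:
    """Uppercase text using the Turkish pairs i -> dotted I and dotless i -> I."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()
```

With the default conversion, `"I".lower()` yields the dotted `"i"`, which is correct for most languages but not for Turkish text.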

The circumflex (^) at position 5E hex is shown in the example layout of ETSI EN 300 706 large and raised, as is also common in modern printed publications.

The underscore (_) at position 5F hex is shown in the example layout of ETSI EN 300 706 without connecting to the left and right, which is unusual in modern publications.

In the example layout of ETSI EN 300 706, the standalone grave accent (`) at position 60 hex has the size and height of a mirrored counterpart to the typographically correct apostrophe (') at position 27 hex, but retains the straight stroke and slant of a grave accent. The character could therefore possibly also be used as an English opening single quotation mark (‛) with the Unicode number 201B hex, but this would deviate from ISO 6937 and would not fit semantically.

In the example layout of ETSI EN 300 706, the vertical bar (|) at position 7C hex is drawn with a gap in the middle (and without connecting at the top and bottom). It could also be encoded with the visually more suitable alternative Unicode character broken bar (¦) with the Unicode number 00A6 hex, but this would deviate from ISO 6937. Moreover, the gap is only a historical layout variation.

In the example layout of ETSI EN 300 706, the tilde (~) at position 7E hex is drawn large and raised; in this form it is not defined as a separate character in Unicode. The standalone small tilde (˜) with the Unicode number 02DC hex matches the height but is too small. According to EBU Tech 3232-a and ITU T.101, the character can alternatively be encoded as the Unicode character overline (‾) with the Unicode number 203E hex, or possibly as the standalone macron (¯) with the Unicode number 00AF hex. Both would deviate from ISO 6937, however, and, unlike in ITU T.101, usually connect to the left and right.

The coding of the other characters framed in bold depends on the control and the selected national variant.
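
Taken together, the table and notes above say that the standard Latin G0 set is ASCII-like except at a few positions. The following Python sketch decodes a single code under that reading. The function name `decode_latin_g0_default` and the boolean parameter `at_sign_for_2a` (standing in for "the control") are assumptions made for illustration, not part of ETSI EN 300 706.

```python
def decode_latin_g0_default(code: int, at_sign_for_2a: bool = False) -> str:
    """Return the Unicode character for a standard Latin G0 code (20..7F hex)."""
    if not 0x20 <= code <= 0x7F:
        # 00..1F hex are control characters, not part of the character set.
        raise ValueError(f"not a printable G0 position: {code:#04x}")
    if code == 0x24:
        return "\u00a4"  # general currency sign instead of the dollar sign
    if code == 0x2A:
        return "@" if at_sign_for_2a else "*"  # depends on the control
    if code == 0x7F:
        return "\u25a0"  # filled rectangle, coded as black square
    return chr(code)  # all other positions match ASCII
```

The national variants replace further positions; this sketch covers only the default variant.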

Latin G0 primary character set - national variants
Selection bits
G2 = Arabic G2
23 24 40 5B 5C 5D 5E 5F 60 7B 7C 7D 7E
0_ 1_ 2_ 3_ 4_ 6_ 8_
default

#

0023
23

¤

00A4
24

@

0040
40

[

005B
5B

\

005C
5C

]

005D
5D

^

005E
5E

_

005F
5F

`

0060
60

{

007B
7B

| ¦

007C
7C

}

007D
7D

~ ~

007E
7E

Czech / Slovak 06 16 46

#

0023
23

ů

016F
24

č

010D
40

ť

0165
5B

ž

017E
5C

ý

00FD
5D

í

00ED
5E

ř

0159
5F

é

00E9
60

á

00E1
7B

ě

011B
7C

ú

00FA
7D

š

0161
7E

English 00 20 80
G2

£

00A3
23

$

0024
24

@

0040
40

2190
5B

½

00BD
5C

2192
5D

2191
5E

#

0023
5F

―

2015
60

¼

00BC
7B

2225
7C

¾

00BE
7D

÷

00F7
7E

Estonian 42

#

0023
23

õ

00F5
24

Š

0160
40

Ä

00C4
5B

Ö

00D6
5C

Ž

017D
5D

Ü

00DC
5E

Õ

00D5
5F

š

0161
60

ä

00E4
7B

ö

00F6
7C

ž

017E
7D

ü

00FC
7E

French 04 14 24 84
G2

é

00E9
23

ï

00EF
24

à

00E0
40

ë

00EB
5B

ê

00EA
5C

ù

00F9
5D

î

00EE
5E

#

0023
5F

è

00E8
60

â

00E2
7B

ô

00F4
7C

û

00FB
7D

ç

00E7
7E

German 01 11 21 41

#

0023
23

$

0024
24

§

00A7
40

Ä

00C4
5B

Ö

00D6
5C

Ü

00DC
5D

^

005E
5E

_

005F
5F

°

00B0
60

ä

00E4
7B

ö

00F6
7C

ü

00FC
7D

ß

00DF
7E

Italian 03 13 23

£

00A3
23

$

0024
24

é

00E9
40

°

00B0
5B

ç

00E7
5C

2192
5D

2191
5E

#

0023
5F

ù

00F9
60

à

00E0
7B

ò

00F2
7C

è

00E8
7D

ì

00EC
7E

Latvian / Lithuanian 43

#

0023
23

$

0024
24

Š

0160
40

ė

0117
5B

ę

0119
5C

Ž

017D
5D

č

010D
5E

ū

016B
5F

š

0161
60

ą

0105
7B

ų

0173
7C

ž

017E
7D

į

012F
7E

Polish 10

#

0023
23

ń

0144
24

ą

0105
40

Ż Ƶ

017B
5B

Ś

015A
5C

Ł

0141
5D

ć

0107
5E

ó

00F3
5F

ę

0119
60

ż

017C
7B

ś

015B
7C

ł

0142
7D

ź

017A
7E

Portuguese / Spanish 05 25

ç

00E7
23

$

0024
24

¡

00A1
40

á

00E1
5B

é

00E9
5C

í

00ED
5D

ó

00F3
5E

ú

00FA
5F

¿

00BF
60

ü

00FC
7B

ñ

00F1
7C

è

00E8
7D

à

00E0
7E

Romanian 37

#

0023
23

¤

00A4
24

Ț

021A
40

Â

00C2
5B

Ș

0218
5C

Ă

0102
5D

Î

00CE
5E

ı

0131
5F

ț

021B
60

â

00E2
7B

ș

0219
7C

ă

0103
7D

î

00EE
7E

Serbian / Croatian / Slovenian 35

#

0023
23

Ë

00CB
24

Č

010C
40

Ć

0106
5B

Ž

017D
5C

Đ

0110
5D

Š

0160
5E

ë

00EB
5F

č

010D
60

ć

0107
7B

ž

017E
7C

đ

0111
7D

š

0161
7E

Swedish / Finnish, Hungarian 02 12 22

#

0023
23

¤

00A4
24

É

00C9
40

Ä

00C4
5B

Ö

00D6
5C

Å

00C5
5D

Ü

00DC
5E

_

005F
5F

é

00E9
60

ä

00E4
7B

ö

00F6
7C

å

00E5
7D

ü

00FC
7E

Turkish 26 66

N / A
23

ğ

011F
24

İ

0130
40

Ş

015E
5B

Ö

00D6
5C

Ç

00C7
5D

Ü

00DC
5E

Ğ

011E
5F

ı

0131
60

ş

015F
7B

ö

00F6
7C

ç

00E7
7D

ü

00FC
7E

In the example layout of ETSI EN 300 706, the háček (ˇ) and the breve (˘) on the special letters of the national variants are drawn imprecisely and look the same. The languages of the three variants "Czech / Slovak", "Latvian / Lithuanian" and "Serbian / Croatian / Slovenian" use only the háček, while the languages of the two variants "Romanian" and "Turkish" use only the breve. The letters concerned are coded accordingly in each variant.

In the "Czech / Slovak" variant, the lowercase letter t with háček (ť) at position 5B hex is shown in ETSI EN 300 706 with the háček (ˇ) in its normal form, whereas in modern print the háček on the lowercase t usually resembles an apostrophe (ʼ) to the right of the base character. The coding is identical, as this is merely a layout variation.

The "English" variant is essentially identical to the 7-bit character set of the British Viewdata standard (ISO-IR-47); only the character 5F hex (#) is coded differently.

In the example layout of ETSI EN 300 706, the left arrow (←) and the right arrow (→) at positions 5B hex and 5D hex are drawn to match the horizontal bar (―) at position 60 hex, so that the bar can be joined seamlessly to the tail of either arrow. In such a combination, the horizontal bar is semantically best encoded as horizontal line extension (⎯) with the Unicode number 23AF hex, although currently very few fonts support this Unicode character (correctly).

In ETSI EN 300 706, the number sign (#) at position 5F hex is drawn identically to the number sign at position 23 hex in the "standard" variant and is therefore coded identically. In the Viewdata standard, the character is coded as viewdata square (⌗) with the Unicode number 2317 hex. That character is visually similar, but it looks different when rendered correctly (see ISO-IR-47), and as a terminator for addresses it has a different semantic meaning, which does not exist in teletext.

The horizontal bar (―) at position 60 hex can also be used as an English em dash with the Unicode number 2014 hex; in the example layout of ETSI EN 300 706 it is drawn connecting to the left and right.

In accordance with EBU Tech 3232-a, the double vertical line at position 7C hex is coded as the parallel sign (∥); in the example layout of ETSI EN 300 706 it is not drawn connecting at the top and bottom. Following the character name in the Viewdata standard, the visually identical Unicode character double vertical line (‖) with the Unicode number 2016 hex can also be used for coding. According to RFC 1345, however, the character is coded as a parallel sign there as well. Regardless of the primary encoding, the character can be used equally as a parallel sign and as a double vertical line.

The "German" variant is essentially identical to the German 7-bit character set DIN 66003 (ISO-IR-21); only the character 60 hex (°) is coded differently.

In the "Latvian / Lithuanian" variant, the two lowercase letters e with ogonek (ę) and i with ogonek (į) at positions 5C hex and 7E hex are shown in ETSI EN 300 706 with a cedilla (¸), probably in error: in Latvian and Lithuanian these letters never take a cedilla, only an ogonek (˛). An alternative coding is unnecessary, since the letters as drawn do not occur anywhere in Europe and should therefore never be needed.

In the "Polish" variant, the capital letter Z with dot above (Ż) at position 5B hex is shown in ETSI EN 300 706 as Z with stroke (Ƶ), but it is normally not coded that way because this is merely a layout variation. Moreover, the corresponding lowercase letter at position 7B hex is shown in ETSI EN 300 706 as z with dot above (ż).

In the "Romanian" variant, the letters T with comma below (Ț / ț) at positions 40 hex / 60 hex and S with comma below (Ș / ș) at positions 5C hex / 7C hex are coded with a comma below (̦), in line with the Romanian standardization authority (see also ISO 8859-16). Until the early 1990s, however, international standards treated these merely as layout variations of the letters T with cedilla (Ţ / ţ) and S with cedilla (Ş / ş), and ISO 6937 contains only the letters with cedilla (¸).
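
Text decoded via an older cedilla-based mapping can be brought in line with the comma-below coding by a simple translation. The following Python sketch shows the idea; the names `CEDILLA_TO_COMMA` and `normalize_romanian` are hypothetical, and the sketch assumes the input is known to be Romanian, because the same substitution would corrupt Turkish text, where S with cedilla is the correct letter.

```python
# Legacy cedilla letters mapped to the comma-below letters used for Romanian.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})


def normalize_romanian(text: str) -> str:
    """Replace cedilla forms of S and T with their comma-below forms."""
    return text.translate(CEDILLA_TO_COMMA)
```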

In the "Serbian / Croatian / Slovenian" variant, some decoders display the character 24 hex not as the capital letter E with diaeresis (Ë) but as the dollar sign ($) with the Unicode number 0024 hex or as the vulgar fraction one half (½) with the Unicode number 00BD hex.

The "Swedish / Finnish, Hungarian" variant is identical to the Swedish 7-bit character set SEN 850200 Annex C (ISO-IR-11).

In the "Turkish" variant, the symbol for the Turkish currency at position 23 hex exists in this form only in teletext; elsewhere the currency is normally written with the two capital letters TL. Unicode does, however, offer several currency symbols that can be used for the Turkish currency: the Turkish lira sign (₺) with the Unicode number 20BA hex, the lira sign (₤) with the Unicode number 20A4 hex, and the pound sign (£) with the Unicode number 00A3 hex.
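
The national variants differ from one another only at the thirteen positions listed in the header of the national variants table, so a decoder can be sketched as ASCII plus a per-variant override string. The Python sketch below covers four variants, with the characters copied from the table. The names `NATIONAL_POSITIONS`, `VARIANTS` and `decode_latin_g0` are hypothetical, the selection bits are deliberately left out, and rendering control codes as spaces is an assumption of this sketch.

```python
# The thirteen positions that vary between national variants.
NATIONAL_POSITIONS = (0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E,
                      0x5F, 0x60, 0x7B, 0x7C, 0x7D, 0x7E)

# One character per position, in the order of NATIONAL_POSITIONS.
VARIANTS = {
    "default": "#\u00a4@[\\]^_`{|}~",
    "english": "\u00a3$@\u2190\u00bd\u2192\u2191#\u2015\u00bc\u2225\u00be\u00f7",
    "german": "#$\u00a7\u00c4\u00d6\u00dc^_\u00b0\u00e4\u00f6\u00fc\u00df",
    "french": "\u00e9\u00ef\u00e0\u00eb\u00ea\u00f9\u00ee#\u00e8\u00e2\u00f4\u00fb\u00e7",
}


def decode_latin_g0(data: bytes, variant: str = "default") -> str:
    """Decode 7-bit Latin G0 codes to Unicode for the given national variant."""
    overrides = dict(zip(NATIONAL_POSITIONS, VARIANTS[variant]))
    out = []
    for code in data:
        if code in overrides:
            out.append(overrides[code])
        elif code == 0x7F:
            out.append("\u25a0")  # filled rectangle
        elif 0x20 <= code < 0x7F:
            out.append(chr(code))  # remaining positions match ASCII
        else:
            out.append(" ")  # assumption: show control codes as spaces
    return "".join(out)
```

For example, `decode_latin_g0(b"Stra~e", "german")` yields "Straße", because position 7E hex holds ß in the German variant.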

Latin G2 Supplementary Character Set (European)
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

¡

00A1
21

¢

00A2
22

£

00A3
23

$

0024
24

¥

00A5
25

#

0023
26

§

00A7
27

¤

00A4
28

‘

2018
29

“

201C
2A

«

00AB
2B

2190
2C

2191
2D

2192
2E

2193
2F

3_

°

00B0
30

±

00B1
31

²

00B2
32

³

00B3
33

×

00D7
34

µ

00B5
35

00B6
36

·

00B7
37

÷

00F7
38

’

2019
39

201D
3A

»

00BB
3B

¼

00BC
3C

½

00BD
3D

¾

00BE
3E

¿

00BF
3F

4_


40

`

0060
41

´

00B4
42

ˆ

02C6
43

˜ 

02DC
44

¯ ˉ

00AF
45

˘

02D8
46

˙

02D9
47

¨

00A8
48

̣ 

N / A
49

˚

02DA
4A

¸ (̦)

00B8 ( N / A )
4B

_

005F
4C

˝

02DD
4D

˛

02DB
4E

ˇ

02C7
4F

Comb.


40

ò

0300
41

ó (ģ)

0301 ( 0327 )
42

ô

0302
43

õ

0303
44

ō

0304
45

ŏ

0306
46

ȯ

0307
47

ö

0308
48

ọ

0323
49

å

030A
4A

ç (o̦)

0327 ( 0326 )
4B

o̲

0332
4C

ő

030B
4D

ǫ

0328
4E

ǒ

030C
4F

5_

―

2015
50

¹

00B9
51

®

00AE
52

©

00A9
53

2122
54

266A
55

20A0
56

2030
57

221D
58


59


5A


5B

215B
5C

215C
5D

215D
5E

215E
5F

6_

Ω

2126
60

Æ

00C6
61

Đ Ð

0110 00D0
62

ª

00AA
63

Ħ

0126
64


65

IJ

0132
66

Ŀ

013F
67

Ł

0141
68

Ø

00D8
69

Œ

0152
6A

º

00BA
6B

Þ

00DE
6C

Ŧ

0166
6D

Ŋ

014A
6E

ʼn

0149
6F

7_

ĸ

0138
70

æ

00E6
71

đ

0111
72

ð

00F0
73

ħ

0127
74

ı

0131
75

ij

0133
76

ŀ

0140
77

ł

0142
78

ø

00F8
79

œ

0153
7A

ß

00DF
7B

þ

00FE
7C

ŧ

0167
7D

ŋ

014B
7E

25A0
7F

The six characters 20 hex (space), 49 hex (̣), 56 hex (₠), 57 hex (‰), 58 hex (∝) and 7F hex (■) are coded differently from ISO 6937 and ITU T.61.

The space at position 20 hex can also be coded as a no-break space with the Unicode number 00A0 hex in accordance with ISO 6937. Line-breaking behavior is irrelevant in teletext, however.

In the example layout of ETSI EN 300 706, the left arrow (←) and the right arrow (→) at positions 2C hex and 2E hex are drawn to match the horizontal bar (―) at position 50 hex, so that the bar can be joined seamlessly to the tail of either arrow. In such a combination, the horizontal bar is semantically best encoded as horizontal line extension (⎯) with the Unicode number 23AF hex, although currently very few fonts support this Unicode character (correctly).

The standalone grave accent (`) at position 41 hex is shown in ETSI EN 300 706 with a different example layout than in the standard Latin G0 primary character set and can also be encoded with the alternative Unicode character modifier letter grave accent (ˋ) with the Unicode number 02CB hex. In modern print, however, these two characters are visually identical. The standalone acute accent (´) at position 42 hex could correspondingly be encoded with the alternative Unicode character modifier letter acute accent (ˊ) with the Unicode number 02CA hex, but this would deviate from ISO 6937.

Because the standalone circumflex (ˆ) at position 43 hex and the standalone tilde (˜) at position 44 hex are shown in ETSI EN 300 706 with a different example layout than their counterparts in the standard Latin G0 primary character set, a more suitable alternative coding is used for them than the one in ISO 6937 (see Windows-1252).

The appearance of the standalone Unicode character macron (¯) at position 45 hex also depends heavily on the font and often resembles the overline (‾). The visually more suitable alternative Unicode character modifier letter macron (ˉ) with the Unicode number 02C9 hex can therefore be used, but this would deviate from ISO 6937.

According to EBU Tech 3232-a and ITU T.61, the diacritical mark in the form of two horizontal dots (¨) at position 48 hex can be used both as a trema (diaeresis) and as umlaut dots. Unicode likewise makes no distinction between these two visually identical marks. If a semantic distinction is needed, the trema can be encoded as the Unicode sequence combining grapheme joiner (Unicode number 034F hex) followed by combining diaeresis (Unicode number 0308 hex), while umlaut dots are encoded in the usual way, either with combining diaeresis (0308 hex) alone or with the precomposed Unicode characters with diaeresis. The names of the Unicode characters should not cause confusion here.
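
The distinction between the two encodings can be illustrated with Python's standard `unicodedata` module. The helper names `with_umlaut` and `with_trema` are hypothetical; the sketch only demonstrates that the combining grapheme joiner keeps the trema sequence from being merged into a precomposed letter during NFC normalization.

```python
import unicodedata

CGJ = "\u034f"                  # combining grapheme joiner
COMBINING_DIAERESIS = "\u0308"  # combining diaeresis


def with_umlaut(base: str) -> str:
    """Encode umlaut dots: base + U+0308, composed where a precomposed form exists."""
    return unicodedata.normalize("NFC", base + COMBINING_DIAERESIS)


def with_trema(base: str) -> str:
    """Encode a trema: base + U+034F + U+0308, which NFC leaves uncomposed."""
    return base + CGJ + COMBINING_DIAERESIS
```

Both forms render with the same two dots; only the code point sequence differs, so the distinction survives normalization but is invisible on screen.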

Historically, the standalone cedilla (¸) at position 4B hex can also be used as a comma below (̦).

The combining low line (_) and the associated standalone low line at position 4C hex are shown in the example layout of ETSI EN 300 706 without connecting to the left and right, and are better implemented using the "underline" font style. Correspondingly, the underscore at position 5F hex in the Latin G0 primary character set should also be coded as a no-break space in the "underline" font style, in order to avoid a double line and to obtain uniform lines. At least in the “Courier” font family, however, the underscore is visually compatible with the "underline" font style.

The horizontal bar (―) at position 50 hex can also be used as an English em dash with the Unicode number 2014 hex; in the example layout of ETSI EN 300 706 it is drawn connecting to the left and right.

The proportional-to sign (∝) at position 58 hex is called alpha in EBU Tech 3232-a, probably in error. It should not be confused with the Greek lowercase letter alpha (α), since ETSI EN 300 706 shows the two characters with different example layouts.

According to EBU Tech 3232-a and ISO 6937, the character 62 hex can be used both as the capital letter D with stroke (Đ), matching the lowercase letter (đ) at position 72 hex, and as the Icelandic capital letter Eth (Ð), matching the lowercase letter (ð) at position 73 hex. In case of doubt, the first Unicode number should be chosen, in accordance with ISO 6937.

The character for the indefinite article in Afrikaans (ŉ) at position 6F hex exists only in lowercase and is almost always written in lowercase. In all-capitals text, it is represented by the capital letter N at position 4E hex preceded by a modifier letter apostrophe (ʼ) at position 27 hex from the Latin G0 primary character set. The capitalized form is not defined as a separate character in Unicode either.

The formerly used Greenlandic letter kra (ĸ) at position 70 hex exists only as a lowercase letter. The corresponding capital is represented by the capital letter K at position 4B hex followed by a modifier letter apostrophe (ʼ) at position 27 hex from the Latin G0 primary character set; it is not defined as a separate character in Unicode either.

The capital letter I at position 49 hex in the Latin G0 primary character set serves as the uppercase form of the Turkish dotless lowercase letter i (ı) at position 75 hex. Unicode provides for the same (see also the note on the Latin G0 primary character set).

The German letter Eszett (ß) at position 7B hex exists only as a lowercase letter. It is usually capitalized as two consecutive capital letters S at position 53 hex in the Latin G0 primary character set, a form that is not defined as a separate character in Unicode. Only in 2008 was the capital Eszett (ẞ) added to Unicode as a new character; it has been part of official German orthography since 2017.
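
Both capitalization conventions can be reproduced with ordinary string operations. In the Python sketch below, the function name `capitalize_german` and its `capital_eszett` flag are hypothetical; the default follows the traditional "SS" form described above, which is the only one that can be displayed using the character sets listed here.

```python
def capitalize_german(text: str, capital_eszett: bool = False) -> str:
    """Uppercase German text, writing Eszett as 'SS' or, optionally, as U+1E9E."""
    if capital_eszett:
        # Substitute the capital Eszett first so that upper() leaves it alone.
        text = text.replace("\u00df", "\u1e9e")
    return text.upper()  # Python's default maps the lowercase Eszett to "SS"
```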

Which alternative coding of the characters in the "Comb." row is used depends on the control. The supported combinations depend on the decoder. If in doubt, only the combinations specified in ISO 6937 should be used. Accordingly, and unlike in Unicode, the lowercase letter g with cedilla (ģ) is formed by combining the lowercase letter g with the acute accent (´) at position 42 hex. With the Cyrillic and Greek G2 supplementary character sets, the combining characters should be used only in conjunction with the Latin G0 primary character set.
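
The "Comb." row of the table and the special case for ģ can be sketched as a lookup followed by Unicode composition. In the Python sketch below, the names `G2_COMBINING` and `compose` are hypothetical, the primary Unicode numbers from the table are used throughout, and the question of which combinations a given decoder supports is ignored.

```python
import unicodedata

# Combining marks for G2 positions 41..4F hex, taken from the "Comb." row.
G2_COMBINING = {
    0x41: "\u0300",  # grave
    0x42: "\u0301",  # acute
    0x43: "\u0302",  # circumflex
    0x44: "\u0303",  # tilde
    0x45: "\u0304",  # macron
    0x46: "\u0306",  # breve
    0x47: "\u0307",  # dot above
    0x48: "\u0308",  # diaeresis
    0x49: "\u0323",  # dot below
    0x4A: "\u030a",  # ring above
    0x4B: "\u0327",  # cedilla
    0x4C: "\u0332",  # low line
    0x4D: "\u030b",  # double acute
    0x4E: "\u0328",  # ogonek
    0x4F: "\u030c",  # caron (háček)
}


def compose(base: str, g2_code: int) -> str:
    """Combine a base letter with the diacritical mark at a G2 position."""
    mark = G2_COMBINING[g2_code]
    if base == "g" and g2_code == 0x42:
        # ISO 6937 convention: g + acute stands for g with cedilla.
        mark = "\u0327"
    # NFC yields the precomposed character where Unicode defines one.
    return unicodedata.normalize("NFC", base + mark)
```

For instance, `compose("g", 0x42)` returns ģ, while `compose("e", 0x42)` returns é.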

Cyrillic

The Cyrillic G0 primary character sets are for the most part identical to the 7-bit character set GOST 13052 (adopted in ISO-IR-111), except that the uppercase and lowercase letters are swapped and thus arranged as in the other character sets.

Cyrillic G0 primary character set - variant 1 - Serbian / Croatian
Selection bits: 40
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

!

0021
21

" "

0022
22

#

0023
23

$

0024
24

%

0025
25

&

0026
26

' '

0027
27

(

0028
28

)

0029
29

* | @ 

002A | 0040
2A

+

002B
2B

,

002C
2C

-

002D
2D

.

002E
2E

/

002F
2F

3_

0

0030
30

1

0031
31

2

0032
32

3

0033
33

4

0034
34

5

0035
35

6

0036
36

7

0037
37

8

0038
38

9

0039
39

:

003A
3A

;

003B
3B

<

003C
3C

=

003D
3D

>

003E
3E

?

003F
3F

4_

Ч

0427
40

А  A

0410 0041
41

Б

0411
42

Ц

0426
43

Д

0414
44

Е

0415
45

Ф

0424
46

Г

0413
47

Х  X

0425 0058
48

И

0418
49

Ј

0408
4A

К

041A
4B

Л

041B
4C

М  M

041C 004D
4D

Н  H

041D 0048
4E

О  O

041E 004F
4F

5_

П

041F
50

Ќ

040C
51

Р  P

0420 0050
52

С  C

0421 0043
53

Т  T

0422 0054
54

У  (Y)

0423 ( 0059 )
55

В  B

0412 0042
56

Ѓ

0403
57

Љ

0409
58

Њ

040A
59

З

0417
5A

Ћ

040B
5B

Ж

0416
5C

Ђ

0402
5D

Ш

0428
5E

Џ

040F
5F

6_

ч

0447
60

а  a

0430 0061
61

б

0431
62

ц

0446
63

д

0434
64

е

0435
65

ф

0444
66

г

0433
67

х  x

0445 0078
68

и

0438
69

ј

0458
6A

к

043A
6B

л

043B
6C

м  (m)

043C ( 006D )
6D

н  (h)

043D ( 0068 )
6E

о  o

043E 006F
6F

7_

п

043F
70

ќ

045C
71

р  p

0440 0070
72

с  c

0441 0063
73

т  (t)

0442 ( 0074 )
74

у  y

0443 0079
75

в  (b)

0432 ( 0062 )
76

ѓ

0453
77

љ

0459
78

њ

045A
79

з

0437
7A

ћ

045B
7B

ж

0436
7C

ђ

0452
7D

ш

0448
7E

25A0
7F

The two characters 24 hex ($) and 7F hex (■) and twelve Cyrillic letter pairs are coded differently from GOST 13052 and are arranged as closely as possible to the Latin G0 variant "Serbian / Croatian / Slovenian" (see Cyrillic alphabet: Serbian, Serbo-Croatian and Montenegrin). The Cyrillic letter Dže (Џ) at position 5F hex exists only as a capital letter.

On some decoders, the character 24 hex represents, instead of the dollar sign ($), the Cyrillic capital letter Yo (Ё) with the Unicode number 0401 hex or the Latin capital letter E with diaeresis (Ë) with the Unicode number 00CB hex.

The coding of the character 2A hex depends on the control.

The alternative coding of the other characters framed in bold is necessary to complete the Latin alphabet coded in the Cyrillic G2 supplementary character set.

Cyrillic G0 primary character set - variant 2 - Russian / Bulgarian
Selection bits: 44
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

!

0021
21

" "

0022
22

#

0023
23

$

0024
24

%

0025
25

ы

044B
26

' '

0027
27

(

0028
28

)

0029
29

* | @ 

002A | 0040
2A

+

002B
2B

,

002C
2C

-

002D
2D

.

002E
2E

/

002F
2F

3_

0

0030
30

1

0031
31

2

0032
32

3

0033
33

4

0034
34

5

0035
35

6

0036
36

7

0037
37

8

0038
38

9

0039
39

:

003A
3A

;

003B
3B

<

003C
3C

=

003D
3D

>

003E
3E

?

003F
3F

4_

Ю

042E
40

А  A

0410 0041
41

Б

0411
42

Ц

0426
43

Д

0414
44

Е

0415
45

Ф

0424
46

Г

0413
47

Х  X

0425 0058
48

И

0418
49

Й (Ѝ)

0419 ( 040D )
4A

К

041A
4B

Л

041B
4C

М  M

041C 004D
4D

Н  H

041D 0048
4E

О  O

041E 004F
4F

5_

П

041F
50

Я

042F
51

Р  P

0420 0050
52

С  C

0421 0043
53

Т  T

0422 0054
54

У  (Y)

0423 ( 0059 )
55

Ж

0416
56

В  B

0412 0042
57

Ь

042C
58

Ъ

042A
59

З

0417
5A

Ш

0428
5B

Э

042D
5C

Щ

0429
5D

Ч

0427
5E

Ы

042B
5F

6_

ю

044E
60

а  a

0430 0061
61

б

0431
62

ц

0446
63

д

0434
64

е

0435
65

ф

0444
66

г

0433
67

х  x

0445 0078
68

и

0438
69

й (ѝ)

0439 ( 045D )
6A

к

043A
6B

л

043B
6C

м  (m)

043C ( 006D )
6D

н  (h)

043D ( 0068 )
6E

о  o

043E 006F
6F

7_

п

043F
70

я

044F
71

р  p

0440 0070
72

с  c

0441 0063
73

т  (t)

0442 ( 0074 )
74

у  y

0443 0079
75

ж

0436
76

в  (b)

0432 ( 0062 )
77

ь

044C
78

ъ

044A
79

з

0437
7A

ш

0448
7B

э

044D
7C

щ

0449
7D

ч

0447
7E

25A0
7F

The three characters 24 hex ($), 26 hex (ы) and 7F hex (■) are coded differently from GOST 13052; in addition, the two Cyrillic letter pairs at positions 59 hex / 79 hex (Ъ / ъ) and 5F hex / 26 hex (Ы / ы) are swapped in line with the Bulgarian variant.

The coding of the character 2A hex depends on the control.

For the Cyrillic letter short I (Й / й) at positions 4A hex and 6A hex, ETSI EN 300 706 shows the breve (˘) like a dot above (˙), probably in error. Possibly, however, this was done so that the glyph can also serve as the Cyrillic letter I with grave accent (Ѝ / ѝ).

The alternative coding of the other characters framed in bold is necessary to complete the Latin alphabet coded in the Cyrillic G2 supplementary character set.

Cyrillic G0 primary character set - variant 3 - Ukrainian
Selection bits: 45
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

!

0021
21

" "

0022
22

#

0023
23

$

0024
24

%

0025
25

ї

0457
26

' '

0027
27

(

0028
28

)

0029
29

* | @ 

002A | 0040
2A

+

002B
2B

,

002C
2C

-

002D
2D

.

002E
2E

/

002F
2F

3_

0

0030
30

1

0031
31

2

0032
32

3

0033
33

4

0034
34

5

0035
35

6

0036
36

7

0037
37

8

0038
38

9

0039
39

:

003A
3A

;

003B
3B

<

003C
3C

=

003D
3D

>

003E
3E

?

003F
3F

4_

Ю

042E
40

А  A

0410 0041
41

Б

0411
42

Ц

0426
43

Д

0414
44

Е

0415
45

Ф

0424
46

Г

0413
47

Х  X

0425 0058
48

И

0418
49

Й (Ѝ)

0419 ( 040D )
4A

К

041A
4B

Л

041B
4C

М  M

041C 004D
4D

Н  H

041D 0048
4E

О  O

041E 004F
4F

5_

П

041F
50

Я

042F
51

Р  P

0420 0050
52

С  C

0421 0043
53

Т  T

0422 0054
54

У  (Y)

0423 ( 0059 )
55

Ж

0416
56

В  B

0412 0042
57

Ь

042C
58

І

0406
59

З

0417
5A

Ш

0428
5B

Є

0404
5C

Щ

0429
5D

Ч

0427
5E

Ї

0407
5F

6_

ю

044E
60

а  a

0430 0061
61

б

0431
62

ц

0446
63

д

0434
64

е

0435
65

ф

0444
66

г

0433
67

х  x

0445 0078
68

и

0438
69

й (ѝ)

0439 ( 045D )
6A

к

043A
6B

л

043B
6C

м  (m)

043C ( 006D )
6D

н  (h)

043D ( 0068 )
6E

о  o

043E 006F
6F

7_

п

043F
70

я

044F
71

р  p

0440 0070
72

с  c

0441 0063
73

т  (t)

0442 ( 0074 )
74

у  y

0443 0079
75

ж

0436
76

в  (b)

0432 ( 0062 )
77

ь

044C
78

і

0456
79

з

0437
7A

ш

0448
7B

є

0454
7C

щ

0449
7D

ч

0447
7E

25A0
7F

The three characters 24 hex ($), 26 hex (ї), 7F hex (■) and three Cyrillic letter pairs are coded differently from GOST 13052.

The coding of the character 2A hex depends on the control .

For the Cyrillic letter short I (Й / й) at positions 4A hex and 6A hex, ETSI EN 300 706 shows the breve (˘) like a dot above (˙), which is probably an error. It may, however, have been done deliberately so that the glyph can also serve as the Cyrillic letter I with grave accent (Ѝ / ѝ).

The alternative coding of the other characters framed in bold is necessary to complete the Latin alphabet coded in the Cyrillic G2 supplementary character set .

Cyrillic G2 supplementary character set
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

¡

00A1
21

¢

00A2
22

£

00A3
23

$

0024
24

¥

00A5
25


26

§

00A7
27


28

'

2018
29

"

201C
2A

«

00AB
2B

2190
2C

2191
2D

2192
2E

2193
2F

3_

°

00B0
30

±

00B1
31

²

00B2
32

³

00B3
33

×

00D7
34

µ

00B5
35

00B6
36

·

00B7
37

÷

00F7
38

'

2019
39

201D
3A

»

00BB
3B

¼

00BC
3C

½

00BD
3D

¾

00BE
3E

¿

00BF
3F

4_


40

`

0060
41

´

00B4
42

ˆ

02C6
43

˜ 

02DC
44

¯ ˉ

00AF
45

˘

02D8
46

˙

02D9
47

¨

00A8
48

̣ 

N / A
49

˚

02DA
4A

¸ (̦)

00B8 ( N / A )
4B

_

005F
4C

˝

02DD
4D

˛

02DB
4E

ˇ

02C7
4F

Comb.


40

O

0300
41

ó (ģ)

0301 ( 0327 )
42

O

0302
43

O

0303
44

O

0304
45

O

0306
46

ȯ

0307
47

ö

0308
48

O

0323
49

å

030A
4A

ç (o̦)

0327 ( 0326 )
4B

O

0332
4C

O

030B
4D

ǫ

0328
4E

ǒ

030C
4F

5_

-

2015
50

¹

00B9
51

®

00AE
52

©

00A9
53

2122
54

266A
55

20A0
56

2030
57

221D
58

Ł

0141
59

ł

0142
5A

ß

00DF
5B

215B
5C

215C
5D

215D
5E

215E
5F

6_

D

0044
60

E

0045
61

F

0046
62

G

0047
63

І

0049 0406
64

Ј

004A 0408
65

K

004B
66

L

004C
67

N

004E
68

Q

0051
69

R

0052
6A

Ѕ

0053 0405
6B

U

0055
6C

V

0056
6D

W

0057
6E

Z

005A
6F

7_

d

0064
70

e

0065
71

f

0066
72

g

0067
73

і

0069 0456
74

ј

006A 0458
75

k

006B
76

l

006C
77

n

006E
78

q

0071
79

r

0072
7A

ѕ

0073 0455
7B

u

0075
7C

v

0076
7D

w

0077
7E

z

007A
7F

The characters 20 hex to 5F hex are essentially identical to the Latin G2 supplementary character set without the two additional characters from ITU T.61 . The three characters 59 hex to 5B hex are coded with special Latin letters.

The characters 60 hex to 7F hex are coded with Latin letters which, together with similar looking letters in the Cyrillic G0 primary character sets, each represent the complete Latin alphabet.

The alternative coding of the characters framed in bold can be used to supplement the coded Cyrillic alphabet. Two of these letters already exist in a Cyrillic G0 variant: the Belarusian-Ukrainian I (І / і) at positions 64 hex / 74 hex in variant 3 "Ukrainian", and the Serbian Je (Ј / ј) at positions 65 hex / 75 hex in variant 1 "Serbian / Croatian".

Which coding of the characters in the "Combining" line applies depends on the control. As with the Latin G2 supplementary character set, the combining characters should only be used in conjunction with the Latin G0 primary character set.
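As a rough illustration of the "Combining" line, the following Python sketch maps positions 41 hex to 4F hex to the primary combining Unicode numbers listed in the table above and composes them with a Latin base letter. The names `G2_COMBINING` and `compose` are hypothetical, and the sketch assumes that the decoder has already determined which base letter the diacritical mark belongs to.

```python
import unicodedata

# G2 positions 0x41..0x4F -> combining marks
# (primary Unicode numbers from the "Comb." line of the table above)
G2_COMBINING = {
    0x41: 0x0300, 0x42: 0x0301, 0x43: 0x0302, 0x44: 0x0303, 0x45: 0x0304,
    0x46: 0x0306, 0x47: 0x0307, 0x48: 0x0308, 0x49: 0x0323, 0x4A: 0x030A,
    0x4B: 0x0327, 0x4C: 0x0332, 0x4D: 0x030B, 0x4E: 0x0328, 0x4F: 0x030C,
}

def compose(base: str, g2_code: int) -> str:
    """Attach the combining mark at g2_code to base (hypothetical helper)."""
    mark = G2_COMBINING.get(g2_code)
    if mark is None:
        # Position without a combining mark (e.g. 0x40): leave the letter as-is.
        return base
    # NFC yields the precomposed letter where Unicode defines one.
    return unicodedata.normalize("NFC", base + chr(mark))
```

Where Unicode has no precomposed letter, the result simply stays a base letter followed by the combining mark.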

Greek

The Greek G0 primary character set is essentially identical to the characters 20 hex to 3F hex and C0 hex to FE hex of the 8-bit character set ELOT 928 (identical to ISO 8859-7 ).

Greek G0 primary character set
Selection bits : 67
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

!

0021
21

" "

0022
22

#

0023
23

$

0024
24

%

0025
25

&

0026
26

' '

0027
27

(

0028
28

)

0029
29

* | @ 

002A | 0040
2A

+

002B
2B

,

002C
2C

-

002D
2D

.

002E
2E

/

002F
2F

3_

0

0030
30

1

0031
31

2

0032
32

3

0033
33

4

0034
34

5

0035
35

6

0036
36

7

0037
37

8

0038
38

9

0039
39

:

003A
3A

;

003B
3B

«

00AB
3C

=

003D
3D

»

00BB
3E

?

003F
3F

4_

ΐ

0390
40

Α  A

0391 0041
41

Β  B

0392 0042
42

Γ

0393
43

Δ

0394
44

Ε  E

0395 0045
45

Ζ

0396
46

Η  H

0397 0048
47

Θ

0398
48

Ι  I

0399 0049
49

Κ  K

039A 004B
4A

Λ

039B
4B

Μ  M

039C 004D
4C

Ν  N

039D 004E
4D

Ξ

039E
4E

Ο  O

039F 004F
4F

5_

Π

03A0
50

Ρ  P

03A1 0050
51

΄

0384
52

Σ

03A3
53

Τ  T

03A4 0054
54

Υ

03A5
55

Φ

03A6
56

Χ  X

03A7 0058
57

Ψ

03A8
58

Ω

03A9
59

Ϊ

03AA
5A

Ϋ

03AB
5B

ά

03AC
5C

έ

03AD
5D

ή

03AE
5E

ί

03AF
5F

6_

ΰ

03B0
60

α

03B1
61

β

03B2
62

γ

03B3
63

δ

03B4
64

ε

03B5
65

ζ

03B6
66

η

03B7
67

θ

03B8
68

ι

03B9
69

κ

03BA
6A

λ

03BB
6B

μ

03BC
6C

ν

03BD
6D

ξ

03BE
6E

ο  o

03BF 006F
6F

7_

π

03C0
70

ρ

03C1
71

ς

03C2
72

σ

03C3
73

τ

03C4
74

υ

03C5
75

φ

03C6
76

χ

03C7
77

ψ

03C8
78

ω

03C9
79

ϊ

03CA
7A

ϋ

03CB
7B

ό

03CC
7C

ύ

03CD
7D

ώ

03CE
7E

25A0
7F

The four characters 3C hex («), 3E hex (»), 52 hex (΄) and 7F hex (■) are coded differently from ELOT 928.
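The relationship to ELOT 928 / ISO 8859-7 described above can be sketched in Python using the standard `iso8859_7` codec. The function name `greek_g0_to_unicode` and the `at_sign` flag (standing in for the control-dependent coding of 2A hex) are assumptions made for this illustration only.

```python
# Positions that differ from ELOT 928 / ISO 8859-7 according to the table above
GREEK_G0_EXCEPTIONS = {
    0x3C: "\u00ab",  # «
    0x3E: "\u00bb",  # »
    0x52: "\u0384",  # standalone tonos
    0x7F: "\u25a0",  # filled rectangle
}

def greek_g0_to_unicode(code: int, at_sign: bool = False) -> str:
    """Map a 7-bit Greek G0 code (0x20..0x7F) to a Unicode character."""
    if not 0x20 <= code <= 0x7F:
        raise ValueError("not a printable 7-bit teletext code")
    if code == 0x2A and at_sign:
        return "@"  # alternative coding of 2A hex
    if code in GREEK_G0_EXCEPTIONS:
        return GREEK_G0_EXCEPTIONS[code]
    # 0x20..0x3F map directly; 0x40..0x7E correspond to 0xC0..0xFE of ISO 8859-7
    byte = code if code < 0x40 else code + 0x80
    return bytes([byte]).decode("iso8859_7")
```

The exceptions are checked before decoding because 52 hex and 7F hex would otherwise land on positions that ISO 8859-7 leaves undefined.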

Whether position 2A hex is coded as the asterisk (*) or the at sign (@) depends on the control.

The standalone tonos (΄) at position 52 hex is shown right-aligned in the example layout of ETSI EN 300 706, so that it is correctly positioned for a following capital letter. This also leaves sufficient space for word separation.

For historical reasons, ETSI EN 300 706 shows the tonos (΄) as a vertical stroke (') both as the standalone character at position 52 hex and in the Greek lowercase letters with dialytika and tonos (΅) at positions 40 hex and 60 hex, whereas in the Greek lowercase letters with tonos at positions 5C hex to 5F hex and 7C hex to 7E hex it is shown as a dot above (˙).

The Greek lowercase letter iota (ι) at position 69 hex, as well as its forms with diacritics (ΐ, ί and ϊ) at positions 40 hex, 5F hex and 7A hex, is shown imprecisely in ETSI EN 300 706, like a Latin small dotless i (ı) with serifs.

The word-final form of the Greek lowercase letter sigma (ς) at position 72 hex is shown imprecisely in ETSI EN 300 706, like the Latin lowercase letter s.

The alternative coding of the other characters framed in bold is necessary to complete the Latin alphabet coded in the Greek G2 supplementary character set .

Greek G2 supplementary character set
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

a

0061
21

b

0062
22

£

00A3
23

e

0065
24

h

0068
25

i

0069
26

§

00A7
27

:

003A
28

'

2018
29

"

201C
2A

k

006B
2B

2190
2C

2191
2D

2192
2E

2193
2F

3_

°

00B0
30

±

00B1
31

²

00B2
32

³

00B3
33

×

00D7
34

m

006D
35

n

006E
36

p

0070
37

÷

00F7
38

'

2019
39

201D
3A

t

0074
3B

¼

00BC
3C

½

00BD
3D

¾

00BE
3E

x

0078
3F

4_


40

`

0060
41

´

00B4
42

ˆ

02C6
43

˜ 

02DC
44

¯ ˉ

00AF
45

˘

02D8
46

˙

02D9
47

¨

00A8
48

̣ 

N / A
49

˚

02DA
4A

¸ (̦)

00B8 ( N / A )
4B

_

005F
4C

˝

02DD
4D

˛

02DB
4E

ˇ

02C7
4F

Comb.


40

O

0300
41

ó (ģ)

0301 ( 0327 )
42

O

0302
43

O

0303
44

O

0304
45

O

0306
46

ȯ

0307
47

ö

0308
48

O

0323
49

å

030A
4A

ç (o̦)

0327 ( 0326 )
4B

O

0332
4C

O

030B
4D

ǫ

0328
4E

ǒ

030C
4F

5_

?

003F
50

¹

00B9
51

®

00AE
52

©

00A9
53

2122
54

266A
55

20A0
56

2030
57

221D
58

Ί

038A
59

Ύ

038E
5A

Ώ

038F
5B

215B
5C

215C
5D

215D
5E

215E
5F

6_

C

0043
60

D

0044
61

F

0046
62

G

0047
63

J

004A
64

L

004C
65

Q

0051
66

R

0052
67

S

0053
68

U

0055
69

V

0056
6A

W

0057
6B

Y

0059
6C

Z

005A
6D

Ά

0386
6E

Ή

0389
6F

7_

c

0063
70

d

0064
71

f

0066
72

g

0067
73

j

006A
74

l

006C
75

q

0071
76

r

0072
77

s

0073
78

u

0075
79

v

0076
7A

w

0077
7B

y

0079
7C

z

007A
7D

Έ

0388
7E

25A0
7F

The characters 20 hex to 5F hex and 7F hex are largely identical to the Latin G2 supplementary character set without the two additional characters from ITU T.61. The three characters 59 hex to 5B hex are coded with special Greek letters, and a further eleven characters with Latin lowercase letters. In addition, the two characters 28 hex and 50 hex are coded differently, as a colon (:) and a question mark (?), although these are already included in the Greek G0 primary character set. This may have historical reasons, because these two characters are not available in the 7-bit ISO-IR-27 character set.

The characters 60 hex to 7E hex are coded with Latin letters and special Greek letters. The Latin letters together with similar looking letters in the Greek G0 primary character set form the complete Latin alphabet.

For the Greek capital letters with tonos in positions 59 hex to 5B hex , 6E hex , 6F hex and 7E hex , the tonos (΄) is shown vertically (') in ETSI EN 300 706 for historical reasons.

Which coding of the characters in the "Combining" line applies depends on the control. As with the Latin G2 supplementary character set, the combining characters should only be used in conjunction with the Latin G0 primary character set.

Arabic

The Arabic G0 primary character set is largely identical to the 7-bit character set ASMO 449 (adopted in ISO 8859-6 ), whereby the Latin G0 variant "English" is used for the special characters and the Arabic letters are shown with their presentation forms. Five special letters have been moved to the Arabic G2 supplementary character set , which also contains additional letters for Persian.

The Arabic letters with multiple codings and an optional connection to the right are shown in ETSI EN 300 706 without their own connecting line on the right and are accordingly coded primarily as an initial or isolated presentation form. As an exception, the three Arabic letters of the "Ǧīm" family (ﺝ, ﺡ and ﺥ) at positions 4C hex to 4E hex in the Arabic G0 primary character set look more like medial presentation forms (with a straight baseline). They are nevertheless coded primarily as initial presentation forms, because the medial presentation forms (without a straight baseline) are also available at positions 5C hex to 5E hex in the Arabic G0 primary character set (see also the note on the table).

In addition, the Arabic letter Yāʾ (ﻱ) at position 27 hex in the Arabic G0 primary character set, and Yāʾ with Hamza above (ﺉ) at position 27 hex in the Arabic G2 supplementary character set, is shown more like a final presentation form and is therefore coded primarily as such; the isolated presentation form would not visually allow a correct connection to the right.

The Arabic letters with several codings and an optional connection to the left are shown in ETSI EN 300 706 with a connecting line on the left and are accordingly coded primarily as an initial presentation form. In contrast, the four Arabic letters of the "Sīn" family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 hex to 56 hex in the Arabic G0 primary character set are shown without a termination or connecting line of their own on the left and must each be completed with a second character (see the note on the table).

For Arabic letters with several Unicode numbers, output in Unicode must either select the appropriate Unicode number according to the two neighboring characters on the left and right or, in the simplest case, use the first Unicode number. A bold Unicode number stands for the actual character. If the actual characters are used instead of the presentation forms for output in Unicode, the zero-width non-joiner (ZWNJ) with the Unicode number 200C hex or the zero-width joiner (ZWJ) with the Unicode number 200D hex may have to be inserted in order to restrict the automatic glyph selection to the presentation forms that are possible for the respective character.

The Arabic script is written from right to left, but characters in teletext are arranged from left to right as usual. For this reason, output in Unicode must either apply the Unicode Bidi algorithm in reverse or, in the simplest case, place the bidirectional control character left-to-right override (LRO) with the Unicode number 202D hex in front of each line.
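One possible way to replace a presentation form by the actual character plus joiner controls is sketched below in Python. It relies on the compatibility decomposition that the standard `unicodedata` module reports for the Arabic presentation forms (for example `<initial> 0628`). The helper name `to_base_with_joiners` is hypothetical, and always emitting a control on both sides is a deliberately blunt simplification; a real converter would look at the neighboring characters first.

```python
import unicodedata

ZWNJ = "\u200c"  # zero-width non-joiner
ZWJ = "\u200d"   # zero-width joiner

# Controls placed (before, after) the actual character in logical order,
# chosen so that only the named presentation form remains possible.
_JOINERS = {
    "<isolated>": (ZWNJ, ZWNJ),
    "<initial>": (ZWNJ, ZWJ),
    "<medial>": (ZWJ, ZWJ),
    "<final>": (ZWJ, ZWNJ),
}

def to_base_with_joiners(ch: str) -> str:
    """Replace one presentation form by its actual character(s) plus joiners."""
    parts = unicodedata.decomposition(ch).split()
    if not parts or parts[0] not in _JOINERS:
        return ch  # not an Arabic presentation form: keep unchanged
    before, after = _JOINERS[parts[0]]
    base = "".join(chr(int(p, 16)) for p in parts[1:])
    return before + base + after
```

Because the controls are given in logical order, "before" refers to the neighbor on the right in the displayed text.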
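The simplest case mentioned above, forcing left-to-right display for each teletext line, might look as follows in Python. The function name `force_ltr` is hypothetical; closing each line with POP DIRECTIONAL FORMATTING (202C hex) is an addition of this sketch and is not required by the text above.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (assumption: ends the override)

def force_ltr(lines):
    """Prefix every decoded line with LRO so it is laid out left to right."""
    return [LRO + line + PDF for line in lines]
```

The same helper would apply unchanged to lines decoded from the Hebrew G0 primary character set.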

Arabic G0 primary character set
Selection bits : 87 or A7
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

!

0021
21

" "

0022
22

£

00A3
23

$

0024
24

%

0025
25

ں

FE73
26

ﻲ ﻱ

FEF2  FEF1
064A
27

)

0029
28

(

0028
29

* | @ 

002A | 0040
2A

+

002B
2B

,

060C 002C
2C

-

002D
2D

.

002E
2E

/

002F
2F

3_

0

0030
30

1

0031
31

2

0032
32

3

0033
33

4

0034
34

5

0035
35

6

0036
36

7

0037
37

8

0038
38

9

0039
39

:

003A
3A

؛

061B
3B

>

003E
3C

=

003D
3D

<

003C
3E

؟

061F
3F

4_

FE94
0629
40

FE80
0621
41

FE92
0628
42

ﺏ ﺐ

FE8F  FE90
0628
43

FE98
062A
44

ﺕ ﺖ

FE95  FE96
062A
45

FE8E
0627
46

FE8D
0627
47

FE91
0628
48

FE93
0629
49

FE97
062A
4A

FE9B
062B
4B

ﺟ ﺠ  

FE9F  FEA0
062C
4C

ﺣ ﺤ  

FEA3  FEA4
062D
4D

ﺧ ﺨ  

FEA7  FEA8
062E
4E

ﺩ ﺪ

FEA9  FEAA
062F
4F

5_

ﺫ ﺬ

FEAB  FEAC
0630
50

ﺭ ﺮ

FEAD  FEAE
0631
51

ﺯ ﺰ

FEAF  FEB0
0632
52

ﺳ ﺴ (ﺱ ﺲ)

FEB3  FEB4 ( FEB1  FEB2 )
0633
53

ﺷ ﺸ (ﺵ ﺶ)

FEB7  FEB8 ( FEB5  FEB6 )
0634
54

ﺻ ﺼ (ﺹ ﺺ)

FEBB  FEBC ( FEB9  FEBA )
0635
55

ﺿ ﻀ (ﺽ ﺾ)

FEBF  FEC0 ( FEBD  FEBE )
0636
56

ﻃ ﻁ ﻂ ﻄ

FEC3  FEC1 FEC2  FEC4
0637
57

ﻇ ﻅ ﻆ ﻈ

FEC7  FEC5 FEC6  FEC8
0638
58

FECB
0639
59

FECF
063A
5A

FE9C
062B
5B

FEA0
062C
5C

FEA4
062D
5D

FEA8
062E
5E

#

0023
5F

6_

ـ

0640
60

FED3
0641
61

FED7
0642
62

ﻛ ﻜ

FEDB  FEDC
0643
63

FEDF
0644
64

FEE3
0645
65

FEE7
0646
66

FEEB
0647
67

ﻭ ﻮ

FEED  FEEE
0648
68

FEF0
0649
69

FEF3
064A
6A

ﺙ ﺚ

FE99  FE9A
062B
6B

ﺝ ﺞ

FE9D  FE9E
062C
6C

ﺡ ﺢ

FEA1  FEA2
062D
6D

ﺥ ﺦ

FEA5  FEA6
062E
6E

FEF4
064A
6F

Pers.

FBFC
06CC
70

ﮐ ﮎ ﮏ ﮑ

FB90  FB8E FB8F  FB91
06A9
63

FBFD
06CC
69

FBFE
06CC
6A

ﯿ

FBFF
06CC
6F

7_

FEEF
0649
70

FECC
0639
71

FED0
063A
72

FED4
0641
73

ﻑ ﻒ

FED1  FED2
0641
74

FED8
0642
75

ﻕ ﻖ

FED5  FED6
0642
76

ﻙ ﻚ

FED9  FEDA
0643
77

FEE0
0644
78

ﻝ ﻞ

FEDD  FEDE
0644
79

FEE4
0645
7A

ﻡ ﻢ

FEE1  FEE2
0645
7B

FEE8
0646
7C

ﻥ ﻦ

FEE5  FEE6
0646
7D

FEFB
7E

25A0
7F

The two characters 26 hex () and 27 hex (ﻱ) are coded differently to ASMO 449 . In addition, five special letters and almost all special characters in positions 40 hex to 7E hex have been replaced by other forms of presentation of the coded Arabic letters.

The character 26 hex () serves as the final part for the isolated and final forms of presentation of the four Arabic letters of the " Sīn " family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 hex to 56 hex .

The two round brackets (“)” and “(”) at positions 28 hex and 29 hex, as well as the two comparison characters (> and <) at positions 3C hex and 3E hex, are coded mirror-inverted compared with the other character sets, since all characters in teletext are always arranged from left to right.

Whether position 2A hex is coded as the asterisk (*) or the at sign (@) depends on the control.

The Arabic comma (،) at the 2C hex position is shown in ETSI EN 300 706 in the example layout so that it can also be used optically as a normal comma (,).

The combined initial and medial presentation forms of the three Arabic letters of the "Ǧīm" family (ﺟ / ﺠ, ﺣ / ﺤ and ﺧ / ﺨ) at positions 4C hex to 4E hex are shown in ETSI EN 300 706 with a straight baseline, matching the initial and medial presentation forms of the Persian letter Che at positions 28 hex and 29 hex in the Arabic G2 supplementary character set. However, when they are used as medial presentation forms, their coding is identical to that of the medial presentation forms without a straight baseline (ﺠ, ﺤ and ﺨ) at positions 5C hex to 5E hex, since this is only a layout variation. The same applies to their use as initial presentation forms, although there are no separate characters for the layout variation without a straight baseline (ﺟ, ﺣ and ﺧ).

The four Arabic letters of the " Sīn " family (ﺱ, ﺵ, ﺹ and ﺽ) at positions 53 hex to 56 hex are shown on the left without any termination or their own connecting line and each must be completed with a second character. When used as an isolated or final form of presentation, the end piece () must be added to the left at position 26 hex . When used as an initial or medial form of presentation, the modifying character Taṭwīl (ـ) must be added to the left at position 60 hex if the left neighbor does not have its own connecting line to the right or if it is very short.

The alternative coding (with identical layout) of the letters in the line "Persian" serves to complete the Persian letters coded in the Arabic G2 supplementary character set.

Arabic G2 supplementary character set
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

FEC9
0639
21

ﺁ (ﺂ)

FE81 ( FE82 )
0622
22

ﺃ (ﺄ)

FE83 ( FE84 )
0623
23

ﺅ ﺆ

FE85  FE86
0624
24

ﺇ (ﺈ)

FE87 ( FE88 )
0625
25

FE8B
0626
26

ﺊ ﺉ

FE8A  FE89
0626
27

FB7C
0686
28

FB7D
0686
29

ﭺ ﭻ

FB7A  FB7B
0686
2A

FB58
067E
2B

FB59
067E
2C

ﭖ ﭗ

FB56  FB57
067E
2D

ﮊ ﮋ

FB8A  FB8B
0698
2E

ﮔ ﮒ ﮓ ﮕ

FB94  FB92 FB93  FB95
06AF
2F

3_

٠

0660
30

١

0661
31

٢

0662
32

٣

0663
33

٤

0664
34

٥

0665
35

٦

0666
36

٧

0667
37

٨

0668
38

٩

0669
39

FECE
063A
3A

FECD
063A
3B

FEFC
3C

FEEC
0647
3D

FEEA
0647
3E

FEE9
0647
3F

4_

à

00E0
40

A

0041
41

B

0042
42

C

0043
43

D

0044
44

E

0045
45

F

0046
46

G

0047
47

H

0048
48

I

0049
49

J

004A
4A

K

004B
4B

L

004C
4C

M

004D
4D

N

004E
4E

O

004F
4F

5_

P

0050
50

Q

0051
51

R

0052
52

S

0053
53

T

0054
54

U

0055
55

V

0056
56

W

0057
57

X

0058
58

Y

0059
59

Z

005A
5A

ë

00EB
5B

ê

00EA
5C

ù

00F9
5D

î

00EE
5E

FECA
0639
5F

6_

é

00E9
60

a

0061
61

b

0062
62

c

0063
63

d

0064
64

e

0065
65

f

0066
66

g

0067
67

h

0068
68

i

0069
69

j

006A
6A

k

006B
6B

l

006C
6C

m

006D
6D

n

006E
6E

o

006F
6F

7_

p

0070
70

q

0071
71

r

0072
72

s

0073
73

t

0074
74

u

0075
75

v

0076
76

w

0077
77

x

0078
78

y

0079
79

z

007A
7A

â

00E2
7B

ô

00F4
7C

û

00FB
7D

ç

00E7
7E


7F

The character set is partially identical to the Latin G0 primary character set. The digits are coded differently, with their Arabic-Indic variants. In addition, all special characters have been replaced by presentation forms of Arabic letters and by accented Latin lowercase letters for writing French (see Windows-1256).

The alternative coding of the characters framed in bold is necessary to complete all forms of presentation of the coded Arabic letters.

Hebrew

The Hebrew G0 primary character set is essentially identical to the 7-bit character set SI 960 (adopted in ISO 8859-8 ), whereby the Latin G0 variant "English" is used for the special characters . A Hebrew G2 supplementary character set is not defined; the Arabic G2 supplementary character set is used.

The Hebrew script is written from right to left, but characters in teletext are arranged from left to right as usual. For this reason, output in Unicode must either apply the Unicode Bidi algorithm in reverse or, in the simplest case, place the bidirectional control character left-to-right override (LRO) with the Unicode number 202D hex in front of each line.

Hebrew G0 primary character set
selection bits : A5
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

!

0021
21

" "

0022
22

£

00A3
23

$

0024
24

%

0025
25

&

0026
26

' '

0027
27

(

0028
28

)

0029
29

* | @ 

002A | 0040
2A

+

002B
2B

,

002C
2C

-

002D
2D

.

002E
2E

/

002F
2F

3_

0

0030
30

1

0031
31

2

0032
32

3

0033
33

4

0034
34

5

0035
35

6

0036
36

7

0037
37

8

0038
38

9

0039
39

:

003A
3A

;

003B
3B

<

003C
3C

=

003D
3D

>

003E
3E

?

003F
3F

4_

@

0040
40

A

0041
41

B

0042
42

C

0043
43

D

0044
44

E

0045
45

F

0046
46

G

0047
47

H

0048
48

I

0049
49

J

004A
4A

K

004B
4B

L

004C
4C

M

004D
4D

N

004E
4E

O

004F
4F

5_

P

0050
50

Q

0051
51

R

0052
52

S

0053
53

T

0054
54

U

0055
55

V

0056
56

W

0057
57

X

0058
58

Y

0059
59

Z

005A
5A

2190
5B

½

00BD
5C

2192
5D

2191
5E

#

0023
5F

6_

א

05D0
60

ב

05D1
61

ג

05D2
62

ד

05D3
63

ה

05D4
64

ו

05D5
65

ז

05D6
66

ח

05D7
67

ט

05D8
68

י

05D9
69

ך

05DA
6A

כ

05DB
6B

ל

05DC
6C

ם

05DD
6D

מ

05DE
6E

ן

05DF
6F

7_

נ

05E0
70

ס

05E1
71

ע

05E2
72

ף

05E3
73

פ

05E4
74

ץ

05E5
75

צ

05E6
76

ק

05E7
77

ר

05E8
78

ש

05E9
79

ת

05EA
7A

20AA
7B

2225
7C

¾

00BE
7D

÷

00F7
7E

25A0
7F

In contrast to SI 960, the character 7B hex (₪) is coded as the shekel currency symbol (see Windows-1255).

Whether position 2A hex is coded as the asterisk (*) or the at sign (@) depends on the control.

Graphics
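Because the Hebrew letters occupy one contiguous run, the table above can be condensed into a short Python sketch: an offset for 60 hex to 7A hex plus a small dictionary for the positions that are not plain ASCII. The names `HEBREW_G0_SPECIAL` and `hebrew_g0_to_unicode` and the `at_sign` flag are assumptions made for illustration.

```python
# Positions of the Hebrew G0 set that are not plain ASCII (from the table above)
HEBREW_G0_SPECIAL = {
    0x23: 0x00A3, 0x5B: 0x2190, 0x5C: 0x00BD, 0x5D: 0x2192, 0x5E: 0x2191,
    0x5F: 0x0023, 0x7B: 0x20AA, 0x7C: 0x2225, 0x7D: 0x00BE, 0x7E: 0x00F7,
    0x7F: 0x25A0,
}

def hebrew_g0_to_unicode(code: int, at_sign: bool = False) -> str:
    """Map a 7-bit Hebrew G0 code (0x20..0x7F) to a Unicode character."""
    if not 0x20 <= code <= 0x7F:
        raise ValueError("not a printable 7-bit teletext code")
    if code == 0x2A and at_sign:
        return "@"  # alternative coding of 2A hex
    if 0x60 <= code <= 0x7A:
        # Alef (05D0) to Tav (05EA) follow each other without gaps.
        return chr(0x05D0 + code - 0x60)
    return chr(HEBREW_G0_SPECIAL.get(code, code))
```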

The characters with a 6-digit Unicode number (01FBxx hex ) will only be included in a future version of Unicode and may still change.

With normal teletext in 4:3 format, the ratio of width to height of a character is 4:5. This must be taken into account to display a graphic with the correct proportions.

Since the exact layout of the Unicode characters is heavily dependent on the font and these do not always match, you should draw all graphic characters yourself if necessary.

G1 character set block graphics
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

0020
20

█▌ 
   
   

01FB00
21

 ▐█
   
   

01FB01
22

███
   
   

01FB02
23

   
█▌ 
   

01FB03
24

█▌ 
█▌ 
   

01FB04
25

 ▐█
█▌ 
   

01FB05
26

███
█▌ 
   

01FB06
27

   
 ▐█
   

01FB07
28

█▌ 
 ▐█
   

01FB08
29

 ▐█
 ▐█
   

01FB09
2A

███
 ▐█
   

01FB0A
2B

   
███
   

01FB0B
2C

█▌ 
███
   

01FB0C
2D

 ▐█
███
   

01FB0D
2E

███
███
   

01FB0E
2F

3_

   
   
█▌ 

01FB0F
30

█▌ 
   
█▌ 

01FB10
31

 ▐█
   
█▌ 

01FB11
32

███
   
█▌ 

01FB12
33

   
█▌ 
█▌ 

01FB13
34

258C
35

 ▐█
█▌ 
█▌ 

01FB14
36

███
█▌ 
█▌ 

01FB15
37

   
 ▐█
█▌ 

01FB16
38

█▌ 
 ▐█
█▌ 

01FB17
39

 ▐█
 ▐█
█▌ 

01FB18
3A

███
 ▐█
█▌ 

01FB19
3B

   
███
█▌ 

01FB1A
3C

█▌ 
███
█▌ 

01FB1B
3D

 ▐█
███
█▌ 

01FB1C
3E

███
███
█▌ 

01FB1D
3F

4_

[G0]

 
40

[G0]

 
41

[G0]

 
42

[G0]

 
43

[G0]

 
44

[G0]

 
45

[G0]

 
46

[G0]

 
47

[G0]

 
48

[G0]

 
49

[G0]

 
4A

[G0]

 
4B

[G0]

 
4C

[G0]

 
4D

[G0]

 
4E

[G0]

 
4F

5_

[G0]

 
50

[G0]

 
51

[G0]

 
52

[G0]

 
53

[G0]

 
54

[G0]

 
55

[G0]

 
56

[G0]

 
57

[G0]

 
58

[G0]

 
59

[G0]

 
5A

[G0]

 
5B

[G0]

 
5C

[G0]

 
5D

[G0]

 
5E

[G0]

 
5F

6_

   
   
 ▐█

01FB1E
60

█▌ 
   
 ▐█

01FB1F
61

 ▐█
   
 ▐█

01FB20
62

███
   
 ▐█

01FB21
63

   
█▌ 
 ▐█

01FB22
64

█▌ 
█▌ 
 ▐█

01FB23
65

 ▐█
█▌ 
 ▐█

01FB24
66

███
█▌ 
 ▐█

01FB25
67

   
 ▐█
 ▐█

01FB26
68

█▌ 
 ▐█
 ▐█

01FB27
69

2590
6A

███
 ▐█
 ▐█

01FB28
6B

   
███
 ▐█

01FB29
6C

█▌ 
███
 ▐█

01FB2A
6D

 ▐█
███
 ▐█

01FB2B
6E

███
███
 ▐█

01FB2C
6F

7_

   
   
███

01FB2D
70

█▌ 
   
███

01FB2E
71

 ▐█
   
███

01FB2F
72

███
   
███

01FB30
73

   
█▌ 
███

01FB31
74

█▌ 
█▌ 
███

01FB32
75

 ▐█
█▌ 
███

01FB33
76

███
█▌ 
███

01FB34
77

   
 ▐█
███

01FB35
78

█▌ 
 ▐█
███

01FB36
79

 ▐█
 ▐█
███

01FB37
7A

███
 ▐█
███

01FB38
7B

   
███
███

01FB39
7C

█▌ 
███
███

01FB3A
7D

 ▐█
███
███

01FB3B
7E

2588
7F

The graphic space at position 20 hex is as wide as the block elements at positions 21 hex to 3F hex and 60 hex to 7F hex and can be coded as a normal or non-breaking space, since these are just as wide in a font with a fixed character width. However, a separate character similar to the figure space with the Unicode number 2007 hex would be preferable, but no such character is available in Unicode. The attribute "separated block graphics / underline" has no effect on the graphic space.

Depending on the corresponding attribute, the 63 block elements at positions 21 hex to 3F hex and 60 hex to 7F hex are displayed either in contiguous form, as shown, or alternatively in separated form, as illustrated to the right of the full block (█) at position 7F hex. In the separated form, the six rectangular blocks that make up these graphic characters are smaller and not connected to each other. The separated forms are not defined as independent characters in Unicode.

The corresponding characters of the selected G0 primary character set are used for the 32 positions 40 hex to 5F hex .
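The regular structure of the G1 table above lends itself to a computed mapping. The following Python sketch derives the Unicode number from the code position instead of storing all 64 entries; the function name `g1_to_unicode` is hypothetical, and the arithmetic merely reproduces the numbers listed in the table, including the four positions that use older Unicode block elements.

```python
def g1_to_unicode(code: int):
    """Map a G1 code (0x20..0x7F) to its contiguous-form Unicode character.

    Returns None for 0x40..0x5F, which are taken from the selected G0 set.
    """
    if not 0x20 <= code <= 0x7F:
        raise ValueError("not a printable 7-bit teletext code")
    if 0x40 <= code <= 0x5F:
        return None
    # Bits 0..4 and bit 6 of the code form a 6-bit pattern, one bit per cell.
    pattern = (code & 0x1F) | ((code & 0x40) >> 1)
    # Space, left half block, right half block and full block have
    # Unicode numbers outside the 01FB00 hex run.
    fixed = {0: 0x0020, 21: 0x258C, 42: 0x2590, 63: 0x2588}
    if pattern in fixed:
        return chr(fixed[pattern])
    # The run starting at 01FB00 hex skips the two half-block patterns.
    skipped = (pattern > 21) + (pattern > 42)
    return chr(0x1FB00 + pattern - 1 - skipped)
```

Computing the value keeps the special cases visible in one place, at the cost of being less obvious than a literal lookup table.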

G3 character set High resolution graphics
_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _A _B _C _D _E _F
2_

?

01FB3C
20

?

01FB3D
21

?

01FB3E
22

?

01FB3F
23

?

01FB40
24

( 25E3 )
25

?

01FB41
26

?

01FB42
27

?

01FB43
28

?

01FB44
29

?

01FB45
2A

?

01FB46
2B

?

01FB68
2C

?

01FB69
2D

▐  
▐  
▐  

( 01FB70 ) ( 01FB71 )
2E

2592
2F

3_

?

01FB47
30

?

01FB48
31

?

01FB49
32

?

01FB4A
33

?

01FB4B
34

( 25E2 )
35

?

01FB4C
36

?

01FB4D
37

?

01FB4E
38

?

01FB4F
39

?

01FB50
3A

?

01FB51
3B

?

01FB6A
3C

?

01FB6B
3D

  ▌
  ▌
  ▌

( 01FB75 ) ( 01FB74 )
3E

2588
3F

4_

 ▌ 
███
   

( 2537 )
40

   
███
 ▌ 

( 252F )
41

 ▌ 
 ██
 ▌ 

( 251D )
42

 ▌ 
█▌ 
 ▌ 

( 2525 )
43

?

01FBA4
44

?

01FBA5
45

?

01FBA6
46

?

01FBA7
47

?

01FBA0
48

?

01FBA1
49

?

01FBA2
4A

?

01FBA3
4B

 ▌ 
███
 ▌ 

( 253F )
4C

26AB
4D

2B24
4E

25EF
4F

5_

2502
50

| -

2500 | 2015
51

250C
52

2510
53

2514
54

2518
55

251C
56

2524
57

252C
58

2534
59

253C
5A

⭢ | →

2B62 | 2192
5B

⭠ | ←

2B60 | 2190
5C

⭡ | ↑

2B61 | 2191
5D

2B63
5E

0020
5F

6_

?

01FB52
60

?

01FB53
61

?

01FB54
62

?

01FB55
63

?

01FB56
64

( 25E5 )
65

?

01FB57
66

?

01FB58
67

?

01FB59
68

?

01FB5A
69

?

01FB5B
6A

?

01FB5C
6B

?

01FB6C
6C

?

01FB6D
6D


6E


6F

7_

?

01FB5D
70

?

01FB5E
71

?

01FB5F
72

?

01FB60
73

?

01FB61
74

( 25E4 )
75

?

01FB62
76

?

01FB63
77

?

01FB64
78

?

01FB65
79

?

01FB66
7A

?

01FB67
7B

?

01FB6E
7C

?

01FB6F
7D


7E


7F

Depending on the associated attribute, some decoders display the 57 smoothed block elements at positions 20 hex to 2D hex, 30 hex to 3D hex, 3F hex, 60 hex to 6D hex and 70 hex to 7D hex either in contiguous form, as shown, or alternatively in separated form, like the block elements in the G1 block graphics character set (see ITU T.101). The separated forms are not defined as independent characters in Unicode.

For the four triangles at positions 25 hex, 35 hex, 65 hex and 75 hex, the alternatively coded Unicode characters are not graphic elements that connect to the neighboring teletext characters, but geometric shapes that are aligned on the baseline and surrounded by space on all four sides.

The left thin vertical frame line ( ) at position 2E hex is centered horizontally in relation to the left half block (▌) at position 35 hex in the G1 block graphic character set . The alternatively coded Unicode characters, on the other hand, are not lines, but vertical eighth blocks to the left and right of the line position.

The right thin vertical frame line ( ) at position 3E hex is centered horizontally in relation to the right half block (▐) at position 6A hex in the G1 block graphic character set . The alternatively coded Unicode characters, however, are not lines, but vertical eighth blocks to the right and left of the line position.

For the five frame elements at positions 40 hex to 43 hex and 4C hex , the thick horizontal line corresponds to the middle horizontal third block (?) at position 2C hex in the G1 block graphic character set . With the alternatively coded Unicode characters, on the other hand, the thick horizontal line corresponds to the thick horizontal frame line (━) with the Unicode number 2501 hex , which is significantly thinner.

The following three circles have no fixed Unicode assignment and are coded on the basis of Unicode Technical Report #25. The exact layout of the Unicode characters depends heavily on the font, if they are supported at all. For the two large circles of full block width, the largest Unicode circles should fit best, at least in a font with a fixed character width; even in the proportional font "Arial Unicode MS", the large circle (◯) with the Unicode number 25EF hex is as wide as the full block (█) at position 3F hex.

The filled small circle ( ) at position 4D hex is the same size as the sixth block (?) at position 24 hex in the G1 block graphic character set and is centered.

The filled in large circle ( ) at position 4E hex and the large circle line ( ) at position 4F hex are each as wide as the full block (█) at position 3F hex and vertically centered.

The two arrows to the right (⭢) and left (⭠) at positions 5B hex and 5C hex match the thin horizontal frame lines (─) of the characters 51 hex to 5A hex and can be joined to them seamlessly at the start of the arrow. In the example layout of ETSI EN 300 706, these characters are shown with a thicker line width than the three characters with a similar layout (→, ← and -) at positions 5D hex, 5B hex and 60 hex in the Latin G0 variant "English" and at positions 2E hex, 2C hex and 50 hex in the Latin G2 supplementary character set, and the two kinds should not be mixed.

The two arrows up (⭡) and down (⭣) at positions 5D hex and 5E hex match the thin vertical frame lines (│) of the characters 40 hex to 4C hex and 50 hex to 5A hex and can be joined to them seamlessly at the start of the arrow.

The graphic space at position 5F hex is identical to the graphic space at position 20 hex in the G1 block graphic character set and should therefore be coded identically.

The characters with the Unicode number in brackets are similar to the example layouts given in ETSI EN 300 706 , but usually do not match the other graphic characters visually and semantically. However, there is no better Unicode encoding for these characters .

Many Level 1.5 decoders support only the four characters framed in bold, presumably by using characters with a similar layout from the Latin G0 variant "English"; for such decoders these characters must be given the alternative coding accordingly.

Character set selection

With the selection bits in the national G0 character set tables, the associated G2 character set is usually also selected. The first hexadecimal number indicates the four most significant bits (the region) and the second number the three least significant bits (the national variant).
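The two-digit notation described above can be unpacked with a few lines of Python. The function names `split_selection` and `selection_value` are hypothetical, and combining region and variant into a single 7-bit value is an illustration of the bit layout stated in the text, not a statement about how any particular decoder stores it.

```python
def split_selection(two_hex_digits: str):
    """Split a notation such as "45" into (region, national variant)."""
    if len(two_hex_digits) != 2:
        raise ValueError("expected exactly two hexadecimal digits")
    region = int(two_hex_digits[0], 16)   # four most significant bits
    variant = int(two_hex_digits[1], 16)  # three least significant bits
    if variant > 7:
        raise ValueError("the national variant has only three bits")
    return region, variant

def selection_value(two_hex_digits: str) -> int:
    """Combine region and variant into one 7-bit value."""
    region, variant = split_selection(two_hex_digits)
    return (region << 3) | variant
```

For example, the Ukrainian Cyrillic variant with selection bits 45 splits into region 4 (Eastern European, Cyrillic) and variant 5.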

Selection bits of the national G0 / G2 character sets
0_ 1_ 2_ 3_ 4_ 6_ 8_ A_
Western European Central European (Polish) Turkish (Western European) Southeast European (Romanian) Eastern European (Cyrillic) Greek / Turkish Arabic Hebrew / Arabic
_0 English Polish English Cyrillic 1 (Serbian / Croatian) English
Latin G2

00

Latin G2

10

Latin G2

20

Cyrillic G2

40

Arabic G2

80

_1 German German German German
Latin G2

01

Latin G2

11

Latin G2

21

Latin G2

41

_2 Swedish / Finnish, Hungarian Swedish / Finnish, Hungarian Swedish / Finnish, Hungarian Estonian
Latin G2

02

Latin G2

12

Latin G2

22

Latin G2

42

_3 Italian Italian Italian Latvian / Lithuanian
Latin G2

03

Latin G2

13

Latin G2

23

Latin G2

43

_4 French French French Cyrillic 2 (Russian / Bulgarian) French
Latin G2

04

Latin G2

14

Latin G2

24

Cyrillic G2

44

Arabic G2

84

_5 Portuguese / Spanish Portuguese / Spanish Serbian / Croatian / Slovenian Cyrillic 3 (Ukrainian) Hebrew
Latin G2

05

Latin G2

25

Latin G2

35

Cyrillic G2

45

Arabic G2

A5

_6 Czech / Slovak Czech / Slovak Turkish Czech / Slovak Turkish
Latin G2

06

Latin G2

16

Latin G2

26

Latin G2

46

Latin G2

66

_7 Romanian Greek Arabic Arabic
Latin G2

37

Greek G2

67

Arabic G2

87

Arabic G2

A7

Second G0 English 1 

4+

English 2 

8+

Arabic 3 

A +

Notes on the G0 character set:

Notes on the second G0 character set:

1 With Cyrillic, the second G0 character set for Russian channels must be preset with the Latin variant "English".
2 In Arabic, the second G0 character set for Iranian channels must be preset with the Latin variant "English".
3 In Hebrew, the second G0 character set for Israeli channels must be preset with "Arabic".
Selection of the national G0 / G2 character sets
Level priority Selection bits for standard G0 / G2 G0 character set G1 character set G2 character set

1 = highest

superior inferior default Second G0 X / 26 selection default default X / 26 selection
X / 0 (page header) all 8 Decoder 1  Page header 2  3 

(from level 1.5)

X / 28/1 ≤ 1.5 4  4 packet Page header 5  5 

(from level 1.5)

M / 29/1 ≤ 1.5 4  7 packet Page header 5  5 

(from level 1.5)

X / 28/0 format 1 ≥ 2.5 2 packet Page header

(with some Level 2.5 decoders from the packet)

X / 28/4 ≥ 3.5 3 packet Page header
M / 29/0 ≥ 2.5 5 packet Page header

(with some Level 2.5 decoders from the packet)

M / 29/4 ≥ 3.5 6 packet Page header
X / 26 column function……
08 hex "Modified G0 and G2 Character Set"
≥ 2.5 1 67  7 

Presettings for each Teletext page:

1 The more significant selection bits for the standard G0 / G2 character sets depend on the decoder and the region configured in it. From level 2.5, the neutral default setting is 0 (Western European, Latin).
2 The selection of the second G0 character set depends on the decoder and the region configured in it. Whether the selection of the standard G0 character set should influence the second G0 character set at this point is not specified, but in practice it is necessary.
3 With many Level 1.5 decoders, both the selection and the repertoire of the G2 character set are limited. Whether the selection of the standard G0 character set should influence the G2 character set at this point is not specified, but it would make sense. However, this question only arises for the two more significant selection values 4 (Eastern European, Cyrillic) and 6 (Greek / Turkish), for each of which more than one G2 character set is defined.

Notes on packets X / 28/1 and M / 29/1:

4 The character set selection functions in these packets are defined in earlier specifications and have been retained for compatibility with corresponding Level 1 and Level 1.5 decoders. They are not intended for use by Level 2.5 and Level 3.5 decoders.
5 It is not known whether the selection of the standard G0 character set should affect the second G0 character set and the G2 character set, but it would make sense.

Notes on the X / 26 selection:

6 With the X / 26 selection, the Latin variant "Standard" is always used.
7 At level 2.5, only one additional G0 / G2 character set pair besides the standard pair is possible for each teletext page; from level 3.5, any number is possible.
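The priority column of the table above can be sketched as a small selection routine in Python. It picks, for a given decoder level, the highest-priority source of the page-wide default among the packets that were actually received. The X / 26 entry (priority 1) is left out because it acts on individual characters rather than on the page default. The short packet names, the `SOURCES` list and `pick_source` are hypothetical, and the level limits are simply those listed in the table.

```python
from typing import Iterable, Optional

# (priority, packet, minimum level, maximum level); 1 would be the highest.
# None means "no limit" in that direction.
SOURCES = [
    (2, "X/28/0 format 1", 2.5, None),
    (3, "X/28/4", 3.5, None),
    (4, "X/28/1", None, 1.5),
    (5, "M/29/0", 2.5, None),
    (6, "M/29/4", 3.5, None),
    (7, "M/29/1", None, 1.5),
    (8, "X/0", None, None),  # page header, usable at all levels
]


def pick_source(level: float, received: Iterable[str]) -> Optional[str]:
    """Return the received packet with the best priority for this level."""
    received = set(received)
    usable = [
        (priority, name)
        for priority, name, lowest, highest in SOURCES
        if name in received
        and (lowest is None or level >= lowest)
        and (highest is None or level <= highest)
    ]
    # Priorities are unique, so min() on the tuples orders by priority.
    return min(usable)[1] if usable else None
```

With the packets X / 0, M / 29/0 and X / 28/1 received, a Level 2.5 decoder would use M / 29/0 under this reading, while a Level 1.5 decoder would use X / 28/1.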
Choice of characters
Level Control characters

00 hex ..1F hex

G0 character set G1 character set G2 character set G3 character set
default Second G0 X / 26 selection Character 2A hex Latin variant Standard a  default X / 26 selection Standard b 
X / 0 to X / 25 Simple level 1 teletext page all 1 2, 3 3 * national 4
X / 26 column function …
… 10 hex "G0 Character" ≥ 1.5 @ default
… 09 hex "G0 Character (Levels 2.5 & 3.5)" ≥ 2.5 * default
… 11 hex to 1F hex "G0 Character with diacritical mark" ≥ 1.5 * default combining combining
… 01 hex "G1 Character" ≥ 2.5 5 5 default 5
… 0F hex "G2 Character" ≥ 1.5 6
… 02 hex "G3 Character (Level 1.5)" ≥ 1.5 6
… 0B hex "G3 Character (Levels 2.5 & 3.5)" ≥ 2.5

Notes on the G1 and G3 character sets :

a With the G1 character set, the shape of the 63 block elements (positions 21 hex to 3F hex and 60 hex to 7F hex) can be switched between contiguous and separated with the two control characters 19 hex "Contiguous Mosaic Graphics" and 1A hex "Separated Mosaic Graphics" and, from level 2.5, with the X / 26 column function 0C hex "Display attributes". The contiguous shape is preset at the beginning of each line.
b With the G3 character set, some decoders allow the shape of the 57 smoothed block elements (positions 20 hex to 2D hex, 30 hex to 3D hex, 3F hex, 60 hex to 6D hex and 70 hex to 7D hex) to be switched in the same way as for the block elements of the G1 character set.
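Note a can be illustrated with a short Python sketch that checks the G1 position ranges and derives a six-bit cell pattern from a character code. The bit-to-cell assignment used here (bits 0 to 4 and bit 6 of the code, read from top left to bottom right of a 2 × 3 cell) is the conventional teletext mosaic layout and is not stated in this section, so it should be treated as an assumption. The function names are hypothetical.

```python
def is_g1_block(code: int) -> bool:
    """True for the G1 positions 20..3F and 60..7F (space plus 63 block elements)."""
    return 0x20 <= code <= 0x3F or 0x60 <= code <= 0x7F


def sextant_bits(code: int) -> int:
    """Return a 6-bit pattern: bit 0 = top left ... bit 5 = bottom right.

    Assumption: code bits 0..4 map to pattern bits 0..4 and code bit 6
    maps to pattern bit 5; code bit 5 is always set for these positions.
    """
    if not is_g1_block(code):
        raise ValueError("not a G1 block position")
    return (code & 0x1F) | ((code & 0x40) >> 1)


# All non-empty patterns, one per block element.
patterns = {sextant_bits(c) for c in range(0x20, 0x80) if is_g1_block(c)} - {0}
```

Under this mapping, position 20 hex is the empty cell, position 7F hex is the fully filled cell, and the remaining set contains exactly the 63 block elements mentioned in note a.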

Notes on the simple level 1 teletext page:

1 For a control character, the space at position 20 hex of the selected character set is normally displayed. In graphics hold mode, the most recently selected G1 block element or space (positions 20 hex to 3F hex and 60 hex to 7F hex) is displayed instead while the G1 character set is selected. This hold character is reset to the space at the beginning of each line, when switching between the G0 and G1 character sets, or when the character size actually changes. Hold mode is switched on and off with the two control characters 1E hex "Hold Mosaics" and 1F hex "Release Mosaics"; the current hold character is already displayed at 1E hex and still displayed at 1F hex. Hold mode is switched off at the beginning of each line.
2 The first G0 character set is always selected at the beginning of each line.
3 The G0 character set is selected with the eight control characters 00 hex to 07 hex "Alpha Color Codes". The control character 1B hex "ESC" switches between the first and second G0 character sets.
4 The G1 character set is selected with the eight control characters 10 hex to 17 hex "Mosaic Color Codes". For the 32 positions 40 hex to 5F hex, the corresponding characters of the selected G0 character set (standard or second G0) are used.
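Notes 1 to 4 describe a small state machine for each line of a simple level 1 page. The Python sketch below follows that description under several simplifications: colors, the ESC switch between the first and second G0 set, and size changes are ignored, and only the character set and code shown in each cell are tracked. The function name `render_row` and the tuple output format are hypothetical.

```python
SPACE = 0x20


def render_row(codes):
    """Return one (character set, code) tuple per cell of a level 1 row."""
    mosaic = False   # the G0 set is selected at the start of each line
    hold = False     # hold mode is off at the start of each line
    held = SPACE     # the hold character starts as the space
    cells = []
    for c in codes:
        if c >= 0x20:
            # In mosaic mode, positions 40..5F still come from G0.
            block = mosaic and not (0x40 <= c <= 0x5F)
            if block:
                held = c
            cells.append(("G1" if block else "G0", c))
            continue
        if c == 0x1E:            # Hold Mosaics: already applies to this cell
            hold = True
        if mosaic and hold:
            cells.append(("G1", held))
        else:
            cells.append(("G1" if mosaic else "G0", SPACE))
        if c == 0x1F:            # Release Mosaics: applies after this cell
            hold = False
        elif c <= 0x07:          # Alpha Color Codes select G0
            if mosaic:
                held = SPACE     # reset on change of character set
            mosaic = False
        elif 0x10 <= c <= 0x17:  # Mosaic Color Codes select G1
            if not mosaic:
                held = SPACE
            mosaic = True
    return cells
```

For the sequence 17, 7F, 1E, 11, 1F, 01 (hex), this sketch shows a G0 space, the block 7F hex four times (once as itself and three times as the hold character), and finally a G1 space for the cell of the alpha color code.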

Comment on the X / 26 column function 01 hex "G1 Character":

5 With the G1 character set, the corresponding characters of the selected G0 character set (standard or X / 26 selection) are used for the 32 positions 40 hex to 5F hex.

Comment on the X / 26 column functions 0F hex "G2 Character" and 02 hex "G3 Character (Level 1.5)":

6 With many Level 1.5 decoders, the repertoire of the G2 and G3 character sets is limited.
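The X / 26 rows of the table above can be summarized as a lookup from column function to name and minimum presentation level, together with the glyph shown for character 2A hex. The following Python sketch only restates the table entries; the names `X26_CHAR_FUNCTIONS`, `supported` and `glyph_for_2a` are hypothetical.

```python
# Column function -> (name, minimum presentation level), as listed in the table.
X26_CHAR_FUNCTIONS = {
    0x01: ("G1 Character", 2.5),
    0x02: ("G3 Character (Level 1.5)", 1.5),
    0x09: ("G0 Character (Levels 2.5 & 3.5)", 2.5),
    0x0B: ("G3 Character (Levels 2.5 & 3.5)", 2.5),
    0x0F: ("G2 Character", 1.5),
    0x10: ("G0 Character", 1.5),
}
for function in range(0x11, 0x20):
    X26_CHAR_FUNCTIONS[function] = ("G0 Character with diacritical mark", 1.5)

# Column functions that address a G0 character.
G0_FUNCTIONS = {0x09} | set(range(0x10, 0x20))


def supported(function: int, level: float) -> bool:
    """True if the table lists the function for this presentation level."""
    entry = X26_CHAR_FUNCTIONS.get(function)
    return entry is not None and level >= entry[1]


def glyph_for_2a(function: int) -> str:
    """Glyph shown for G0 position 2A hex when addressed via X / 26."""
    if function not in G0_FUNCTIONS:
        raise ValueError("not a G0 character function")
    # Only function 10 hex shows the at sign; the others show the asterisk.
    return "@" if function == 0x10 else "*"
```

A Level 1.5 decoder would, for instance, accept function 0F hex but not 09 hex, and `glyph_for_2a(0x10)` yields the at sign where a plain level 1 page shows the asterisk.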

References

  1. a b Philips SAA5246A , Philips, 1993 (English)
  2. Character histories: notes on some Ascii code positions , Jukka “Yucca” Korpela, 2006 (English);
    7-bit character sets , Aivosto Oy, 2016 (English)
  3. Quarter-quadrant, hyphen / divis , Wikipedia: “In the older ASCII character set and in the character sets of the ISO 8859 family of standards [...] the hyphen-minus is used, which was introduced with the typewriter as a common character for hyphen, dash and minus sign . ";
    IT and communication - Characters and encodings: The ISO Latin 1 character repertoire: Detailed descriptions of the characters, "- HYPHEN, MINUS SIGN (HYPHEN-MINUS) U + 002D" , Jukka "Yucca" Korpela, 2006 (English): "In situations where sufficient support to Unicode can be safely assumed (very rarely at present!), it is best to replace the use of hyphen-minus by Unicode hyphen (U + 2010) or non-breaking hyphen (U + 2011) or minus sign (U + 2212) or, if hyphen-minus had been used eg in place of a dash symbol, some other Unicode character such as en dash (U + 2013) or em dash (U + 2014) or horizontal bar (U + 2015 ). "
  4. a b c Minus sign, similar signs , U + 2015 horizontal bar , Wikipedia: " (2) This sign generally resembles an em dash in length, shape and altitude and differs from it only in its line break properties."
  5. On the use of some MS Windows characters in HTML, Suggested substitutes, Dashes , Jukka "Yucca" Korpela, 2017 (English): "In typewritten material, the em dash is represented by two hyphens with no space around them, and an en dash is represented by a hyphen. "
  6. Internationalization for Turkish: Dotted and Dotless Letter "I" , Tex Texin, 2010 (English);
    Resolving dotted and dotless "i" , John Cowan, 1997 (English)
  7. a b circumflex, character sets , Wikipedia: “The ASCII character set only contains the character ^ (in Unicode at position U + 005E), which is now interpreted as a single, universally applicable character. [...] In addition to the universal character ^ (U + 005E), the Unicode standard contains the typographically better character ˆ (U + 02C6) as well as other pre-composed characters with circumflex (e.g. Ẑ, ẑ). “;
    ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM43 Arrowhead upwards, circumflex shape"
  8. a b ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM48 Lower bar (not jointive) low line, spacing underline (equivalent to SP09 of ISO 6937) "
  9. a b Grave accent, As surrogate of apostrophe or (opening) single quote , Wikipedia (English): "Additionally ASCII grave accent character (U + 0060` Grave accent ) was often used as surrogate of opening single quote, together with ASCII typewriter apostrophe (U + 0027 ' apostrophe ) used as closing single quote; double quotes were sometimes substituted by two consecutive grave accents and two consecutive typewriter apostrophes (`` ... ''). ";
    ASCII and Unicode quotation marks , Markus Kuhn, 2007 (English): "Only old X Window System fonts and some old video terminals show ASCII 0x60 / 0x27 as left and right quotation marks, while most modern systems follow the ISO and Unicode standards instead. ";
    ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM44 Upper reverse solidus, grave accent shape"
  10. Character histories: notes on some Ascii code positions, VERTICAL LINE , Jukka "Yucca" Korpela, 2006 (English)
  11. a b Tilde, ASCII tilde (U + 007E) , Wikipedia (English): “Most modern proportional fonts align plain spacing tilde at the same level as dashes, or only slightly upper. This distinguishes it from a small tilde (˜), which is always raised. But in some monospace fonts, especially used in text user interfaces, ASCII tilde character is raised too. This apparently is a legacy of typewriters, where pairs of similar spacing and combining characters relied on one glyph. ";
    Unicode Explained , Chapter 8: Character Usage, ASCII (Basic Latin), Tilde ~ (U + 007E), p. 401, Jukka K. Korpela, 2006 (English): “As a spacing clone of a diacritic tilde (ie, spacing counterpart of combining tilde U + 0303), use the small tilde ˜ (U + 02CD [correct: U + 02DC]). ";
    ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 77, ITU, 1994 (English): "SM47 Upper bar (not jointive) bar or tilde shape"
  12. a b List of Latin-based alphabets, extensions , Wikipedia;
    Everything about Unicode, Lithuanian special characters , Jens Meyer, 2007;
    Special letters and diacritical marks for the European languages of the Latin alphabet , Wolfgang Hendlmeier and Gerhard Helzel, 2012
  13. Hatschek, Usage and Character Sets , Wikipedia: "In modern printed fonts, the character on the uppercase L and on the lowercase d, l and t is often shown in a form similar to a comma at the top right next to the basic character."
    and "It should be noted that these codes are also used if the hatschek is displayed on d, l, L and t in comma form. "
  14. Telephone keypad, recommendation ITU-T E.161 , placement, appearance and naming of the symbol ⌗, Wikipedia: “This symbol is contained in Unicode as U + 2317 viewdata square [...]. With the square shape, the line ends must protrude between 8% and 18% of the edge line length on each side, with the inclined shape (interior angle 80 °) always by 18%. ";
    Proposal to incorporate two telephony symbols into Unicode by glyph and annotation changes , Karl Pentzlin, 2013 (English): "The viewdata square, as its name implies, is introduced anyway as a character for" Viewdata "which is an application related to telephony introduced in the 1980s. It can be presumed that it had to be in fact the same symbol as the E.161 symbol.
    However, the proportions of its representative glyph are not within the constraints given in E.161. ";
    ITU-T Recommendation E.161: Arrangement of digits, letters and symbols on telephones and other devices that can be used for gaining access to a telephone network , 3.2.2 12 push buttons, symbols, pp. 3 + 4, ITU, 2001 (English)
  15. a b ITU-T Recommendation T.101: International interworking for Videotex services , I.1.2.7 Miscellaneous, p. 76, ITU, 1994 (English): "SM12 Central horizonal bar jointive"
  16. ż , Wiktionary: “As a typographical variant there is ƶ / Ƶ. However, this is usually only used if the whole word is written in capitals and there is no longer enough space for the point above the Z. ";
    Teletext mappings , Marcin “Qrczak” Kowalczyk, 2001 (English): “In Polish capital Z with dot above is sometimes rendered with stroke instead of the dot. It's just a glyph variant, the meaning is exactly the same. The letter should be consistently encoded as Z WITH DOT ABOVE even if it's rendered with a stroke. "
  17. a b comma (undersigned), coding , Wikipedia: “Until the early 1990s, no distinction was made between the comma and the cedilla in international standards. [...] Only later did the view prevail that these are two different diacritics. Today, Unicode contains both S and T with cedilla and S and T with comma. ”;
    ISO / IEC 6937: 2001 , Table 4 - Specification of the repertoire, pp. 15 and 18, ISO / IEC, 2001 (English): "NOTE 2: The letters used in the Romanian language LATIN CAPITAL LETTER S WITH COMMA BELOW and LATIN CAPITAL LETTER T WITH COMMA BELOW are different from the LATIN CAPITAL LETTER S WITH CEDILLA and LATIN CAPITAL LETTER T WITH CEDILLA. However, subject to the agreement of originator and receiver in information interchange, the letters WITH CEDILLA may be used to substitute for the letters WITH COMMA BELOW. "
    and "NOTE 5: The letters used in the Romanian language LATIN SMALL LETTER S WITH COMMA BELOW and LATIN SMALL LETTER T WITH COMMA BELOW are different from the LATIN SMALL LETTER S WITH CEDILLA and LATIN SMALL LETTER T WITH CEDILLA. However, subject to the agreement of originator and receiver in information interchange, the letters WITH CEDILLA may be used to substitute for the letters WITH COMMA BELOW. ";
    Cedillas and commas below , Eric Muller, Adobe, 2013 (English);
    Comments on cedilla and comma below (revision 2) , Denis Moyogo Jacquerye, 2013 (English);
    Romanian diacritic marks , Cristian Kit Paul, 2008 (English)
  18. Overline, Available Characters , Wikipedia: “In several character sets of the ISO 8859 family of standards and derived from it in the Unicode standard, there is a character U + 00AF (175 dec ) that can be used both as an overline and as a macron. [...] One of the reasons why the overline is often incorrectly referred to as a "macron" is not to be confused with the other Unicode characters of this name. The characters at the code points U + 02C9 ( modifier letter macron ) and U + 0304 ( combining macron ) are significantly shorter than their counterparts with overline . "
  19. The modern library , 10.2.4 and 10.2.5 character set sorting (literacy), pp 229-232, Rudolf Frankenberger and Klaus Haller, 2004
  20. Trema, Unicode , Wikipedia: “Most standards for character sets, including Unicode, do not differentiate between umlaut and trema. If a distinction between umlaut and trema is necessary in data processing, ISO / IEC JTC 1 / SC 2 / WG 2 recommends the following:
    • Representation of the trema by: Combining Grapheme Joiner (CGJ, 034F) + Combining Diaeresis (0308)
    • The umlauts are represented by: Combining Diaeresis (0308) “;
    Frequently Asked Questions, Characters and Combining Marks, "Q: Unicode doesn't seem to distinguish between tréma and umlaut, but I need to distinguish. What shall I do? " , Unicode, 2016 (English)
  21. Unicode Technical Note # 27 - Known Anomalies in Unicode Character Names , Unicode, 2017 (English)
  22. CCITT Recommendation T.61: Character repertoire and coded character sets for the international teletex service , 3.2.3.9 Non-spacing characters, p. 13, ITU, 1988 (English): "Note - The Non-spacing underline character is never used individually but always in combination with some other graphic character to represent the graphic rendition “underlined” for the associated character. The non-spacing underline character can be used in combination with any graphic character of the repertoire, including an accented letter or an umlaut, or space. It is recommended to implement the "underline" function by means of the control function SGR (4) instead of the "non-spacing underline" graphic character. "
  23. Proportionality Symbol , Doctor Peterson, 2003 (English): "If you prefer to describe it by its appearance rather than strictly by its usage, you might call it an" open alpha "or" loose alpha, "rather than" fishy alpha. " People do often describe it (wrongly) as an alpha, but I haven't seen these modifiers used anywhere. "
  24. ʼn, Miscellaneous , Wikipedia (English): "The upper case, or majuscule form has never been included in any international keyboards. Therefore, it is decomposable by simply combining ʼ (U + 02BC) and N. 〔ʼN〕";
    Unicode 10.0 Character Code Charts, Latin Extended-A , 0149 ʼn LATIN SMALL LETTER N PRECEDED BY APOSTROPHE, Unicode, 2017 (English): "uppercase is 02BC ʼ 004E N"
  25. Kra (letter) , Wikipedia (English): “The letter can be capitalized as K ' , but it is not encoded separately as a single letter because it is very similar to the Latin capital letter K followed by an apostrophe, preferably the modifier letter apostrophe, U + 02BC ʼ modifier letter apostrophe (HTML & # 700;). “;
    Status of Mapping between Characters of ISO 5426-2 and ISO / IEC 10646-1 (UCS) , 4. ADDITIONAL MAPPINGS, 63 LATIN CAPITAL LETTER KRA, p. 5, Joan M. Aliprand, 2002 (English): “The capital form of the letter kra letter can be encoded as the sequence U + 004B LATIN CAPTIAL LETTER K followed by U + 02BC MODIFIER LETTER APOSTROPHE. "
  26. Unicode 10.0 Character Code Charts, Latin Extended-A , 0131 ı LATIN SMALL LETTER DOTLESS I, Unicode, 2017 (English): "uppercase is 0049 I"
  27. ß, capitalization and special features of use , as well as capital ß, capital letters without capital ß , Wikipedia;
    Unicode 10.0 Character Code Charts, C1 Controls and Latin-1 Supplement , 00DF ß LATIN SMALL LETTER SHARP S, Unicode, 2017 (English): 'uppercase is “SS”'
  28. Capital ß , Wikipedia: “At the beginning of 2008 the capital ß was included as a new character in the international Unicode standard for computer character sets; since June 29, 2017, it has been part of the official German orthography. "
  29. a b I with grave (Cyrillic), Bulgarian and Macedonian , Wikipedia (English): “When not available, the character ⟨ѝ⟩ is often replaced by an ordinary ⟨и⟩ (not recommended, but still orthographically correct) or in Bulgarian by the letter ⟨й⟩ (formally this is considered a spelling error). "
  30. a b Tonos , Wikipedia: “In some fonts the tonos is vertical, that is, in a 'neutral' position in contrast to the acute accent inclined to the right and the grave accent inclined to the left; sometimes it is just a dot, a triangle standing on its point, or similar. This custom dates back to the 1970s, i.e. to the time before the official introduction of monotonic orthography by the Greek government, when orthography reformers used a 'neutral' accent in this way, which had to differ from the existing accents of polytonic orthography. With the official introduction of the monotonic orthography by the Greek government in 1980, however, the distinction between the tonos and the polytonic accents became unnecessary, and all style specifications stipulate that the monotonic tonos is graphically identical to the polytonic acute. This is also what Unicode provides. "
  31. a b Arabic character tail for final Seen family (Seen, Sheen, Saad, Daad) , IBM Egypt, 2001 (English)
  32. The Unicode Consortium on Twitter , Unicode, 2019 (English);
    Proposal to add characters from legacy computers and teletext to the UCS , Doug Ewell, Rebecca Bettencourt and others, 2019 (English);
    Map from Teletext G1 character set to Unicode , Rebecca Bettencourt, 2018 (English);
    Map from Teletext G3 character set to Unicode , Rebecca Bettencourt, 2018 (English)
  33. Unicode Technical Report # 25 - Unicode Support for Mathematics, 2.11 Geometrical Shapes , Unicode, 2007 (English)
  34. Bug Reports DVBViewer Pro / GE - Teletext with Cyrillic , Griga, 2012 (English): "PS The following screenshot from Derrick's sample (see above) shows clearly which characters originate from which source: - White characters are from the Latin G0 Character Set (identical for all countries with a latin alphabet)
    - Red characters are from the Spanish / Portuguese National Option Subset.
    - Green characters added by packets X / 26 are from the Latin G2 Supplementary Set. "
  35. Siemens MEGATEXT PLUS SDA 5275-2 Delta Specification / Application Notes , 2.5.2 Example for Russian Market, p. 56, Siemens, 1998 (English): "The bit SEC_LA should be set and the secondary language should be defined to English because currently, no Russian broadcaster transmits packet X / 28 or X / 29. "
  36. Philips SAA5x9x family , 9.5 The twist attribute, p. 40, Philips, 1998 (English): “In many of the character sets, the 'twist' serial attribute (code 1BH) can be used to switch to an alternate basic character code table, eg to change from the Hebrew alphabet to the Arabic alphabet on an Arab / Hebrew device. "
  37. Philips SAA5x9x family , 9.5 The twist attribute, p. 40, Philips, 1998 (English): “In many of the character sets, the 'twist' serial attribute (code 1BH) can be used to switch to an alternate basic character code table [...]. For some national option languages ​​the alternate code table is the default, and a twist control character will switch to the first code table. "