There are three different ways of encoding texts and data in a GSM short message with a maximum amount of user data of 1120 bits:
7 bits, 160 characters
According to the GSM 03.38 standard, colloquially called the GSM alphabet. Used for SMS text messages where a limited number of characters is sufficient. The text can contain up to 160 characters per message (7 bits/character × 160 characters = 1,120 bits). Each group of 7 bits is interpreted as one character, which limits the number of directly representable characters to 128. These 128 characters are defined in the 7-bit basic character set. There are several mechanisms for extending the set of displayable characters:
Escape: The escape character (ESC, 0x1B) selects the standard character set extension once, for the character immediately following it.
Escape with single shift: Using an element in the user data header of the message, an alternative character set extension can be selected instead of the standard character set extension.
Locking Shift: Another element in the user data header of the message allows an alternative character set to be selected instead of the basic character set.
8 bits, 140 characters
For data messages (binary content) such as logos, picture messages, and ring tones. An 8-bit message can contain up to 140 characters (8 bits/character × 140 characters = 1,120 bits).
16 bits, 70 characters
Unicode UCS-2, i.e. UTF-16 limited to the BMP (Basic Multilingual Plane). Unicode messages are required for all writing systems that are not directly supported, e.g. Arabic, Hebrew, Cyrillic, and Latin with other special characters. A Unicode message is limited to 70 characters (16 bits/character × 70 characters = 1,120 bits).
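The three capacity figures all follow from the same 1,120-bit budget. A minimal sketch of that arithmetic, where the names `MAX_USER_DATA_BITS` and `max_characters` are illustrative and not taken from any standard:

```python
# Maximum user data of one short message, as stated in the text.
MAX_USER_DATA_BITS = 1120


def max_characters(bits_per_character: int) -> int:
    """Return how many characters of the given width fit into one message."""
    return MAX_USER_DATA_BITS // bits_per_character


for bits in (7, 8, 16):
    print(f"{bits:2d} bits per character: {max_characters(bits)} characters")
```

The same budget corresponds to 140 octets, which is why the 8-bit case needs no rounding.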
7 bit
The character set extension tables for 7-bit messages are usually designed so that terminals that do not have these tables, and therefore display the character from the base table, produce a result that looks as similar as possible, e.g. "e" instead of "€".
There are single-shift character set extension tables for Turkish, Spanish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu.
There are locking-shift character set tables for Turkish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu.
The single shift and locking shift mechanisms can be combined with one another.
Examples:
16 bit: 0x0637 results in the Arabic character Tah: "ط"
7 bit: 0x65 results in an "e"
7 bit with Escape: 0x1B followed by 0x65 results in a euro sign "€"
7 bit with single shift: with the setting 'Turkish', 0x1B followed by 0x53 results in an S with cedilla "Ş"
7 bit with locking shift: with the setting 'Turkish', 0x1C results in an S with cedilla "Ş"
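The first three examples can be reproduced with a small decoder. The following is a sketch under stated assumptions: the input is a list of already unpacked 7-bit codes (septet packing is not covered), 16-bit messages are taken to be big-endian, and an escaped code without an extension entry falls back to the basic table. The names `BASIC`, `EXTENSION`, `decode_7bit` and `decode_ucs2` are illustrative.

```python
ESC = 0x1B

# GSM 7-bit basic character set, indexed by code (0x00 to 0x7F).
BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
assert len(BASIC) == 128

# Standard character set extension, reached through ESC.
EXTENSION = {
    0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€",
}


def decode_7bit(septets):
    """Decode unpacked 7-bit codes using the basic table and the extension."""
    out = []
    codes = iter(septets)
    for code in codes:
        if code == ESC:
            following = next(codes, None)
            if following is None:  # dangling ESC at the end of the message
                break
            # Assumption: fall back to the basic table if no extension exists.
            out.append(EXTENSION.get(following, BASIC[following]))
        else:
            out.append(BASIC[code])
    return "".join(out)


def decode_ucs2(data: bytes) -> str:
    """Decode a 16-bit message, assuming big-endian byte order."""
    return data.decode("utf-16-be")


print(decode_7bit([0x65]))               # basic table
print(decode_7bit([0x1B, 0x65]))         # ESC + 0x65 from the extension
print(decode_ucs2(bytes([0x06, 0x37])))  # 16-bit code 0x0637
```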
Character set tables
Basic character set
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00  @     Δ     SP⁴   0     ¡     P     ¿     p
0x01  £     _     !     1     A     Q     a     q
0x02  $     Φ     "     2     B     R     b     r
0x03  ¥     Γ     #     3     C     S     c     s
0x04  è     Λ     ¤     4     D     T     d     t
0x05  é     Ω     %     5     E     U     e     u
0x06  ù     Π     &     6     F     V     f     v
0x07  ì     Ψ     '     7     G     W     g     w
0x08  ò     Σ     (     8     H     X     h     x
0x09  Ç     Θ     )     9     I     Y     i     y
0x0A  LF¹   Ξ     *     :     J     Z     j     z
0x0B  Ø     ESC³  +     ;     K     Ä     k     ä
0x0C  ø     Æ     ,     <     L     Ö     l     ö
0x0D  CR²   æ     -     =     M     Ñ     m     ñ
0x0E  Å     ß     .     >     N     Ü     n     ü
0x0F  å     É     /     ?     O     §     o     à
¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP)
Standard character set extension
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00                          |
0x01
0x02
0x03
0x04        ^
0x05                                      €
0x06
0x07
0x08              {
0x09              }
0x0A  FF¹
0x0B        SS2²
0x0C                    [
0x0D                    ~
0x0E                    ]
0x0F              \
¹ is a page break (FF, Form Feed)
² is another single-shift escape character, reserved for future expansion
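Because a character from the extension table is sent as ESC plus one further code, it uses two of the 160 available 7-bit units. A sketch of a length check built on that assumption follows; `septets_needed` and `fits_single_message` are hypothetical helper names, and national shift tables are not considered.

```python
import string

# Characters of the basic character set (ESC itself is left out).
BASIC_CHARS = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./:;<=>?"
    "¡ÄÖÑÜ§¿äöñüà" + string.digits + string.ascii_letters
)
# Characters of the standard character set extension.
EXTENSION_CHARS = set("\f^{}\\[~]|€")


def septets_needed(text):
    """Return the number of 7-bit units, or None if 7-bit coding is impossible."""
    total = 0
    for char in text:
        if char in BASIC_CHARS:
            total += 1
        elif char in EXTENSION_CHARS:
            total += 2  # ESC plus the code from the extension table
        else:
            return None  # would need a 16-bit (UCS-2) message instead
    return total


def fits_single_message(text):
    """True if the text fits into the 160 units of one 7-bit message."""
    needed = septets_needed(text)
    return needed is not None and needed <= 160


print(septets_needed("10 €"))
print(fits_single_message("€" * 81))
```

A text consisting of 81 euro signs therefore no longer fits, although it has far fewer than 160 characters.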
Turkish locking shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00  @     Δ     SP⁴   0     İ     P     ç     p
0x01  £     _     !     1     A     Q     a     q
0x02  $     Φ     "     2     B     R     b     r
0x03  ¥     Γ     #     3     C     S     c     s
0x04  €     Λ     ¤     4     D     T     d     t
0x05  é     Ω     %     5     E     U     e     u
0x06  ù     Π     &     6     F     V     f     v
0x07  ı     Ψ     '     7     G     W     g     w
0x08  ò     Σ     (     8     H     X     h     x
0x09  Ç     Θ     )     9     I     Y     i     y
0x0A  LF¹   Ξ     *     :     J     Z     j     z
0x0B  Ğ     ESC³  +     ;     K     Ä     k     ä
0x0C  ğ     Ş     ,     <     L     Ö     l     ö
0x0D  CR²   ş     -     =     M     Ñ     m     ñ
0x0E  Å     ß     .     >     N     Ü     n     ü
0x0F  å     É     /     ?     O     §     o     à
¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP)
Turkish single shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00                          |
0x01
0x02
0x03                                Ş     ç     ş
0x04        ^
0x05                                      €
0x06
0x07                          Ğ           ğ
0x08              {
0x09              }           İ           ı
0x0A  FF¹
0x0B        ESC²
0x0C                    [
0x0D  ³                 ~
0x0E                    ]
0x0F              \
¹ is a page break (FF)
² is an escape character (ESC)
³ is a control character. No language-specific characters should be coded at this position.
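The two Turkish examples given earlier can be traced with the tables above. The sketch below assumes that the shift setting has already been read from the user data header and is passed in as plain arguments, and that the single shift table also carries the entries of the standard extension. All names (`TURKISH_LOCKING`, `TURKISH_SINGLE`, `decode`) are illustrative.

```python
ESC = 0x1B

# GSM 7-bit basic character set, indexed by code (0x00 to 0x7F).
BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

STANDARD_EXTENSION = {
    0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€",
}

# Turkish locking shift table: the basic table with eight positions replaced.
TURKISH_LOCKING = list(BASIC)
for code, char in {0x04: "€", 0x07: "ı", 0x0B: "Ğ", 0x0C: "ğ",
                   0x1C: "Ş", 0x1D: "ş", 0x40: "İ", 0x60: "ç"}.items():
    TURKISH_LOCKING[code] = char

# Turkish single shift table: the standard extension plus Turkish letters.
TURKISH_SINGLE = {**STANDARD_EXTENSION, 0x47: "Ğ", 0x49: "İ", 0x53: "Ş",
                  0x63: "ç", 0x67: "ğ", 0x69: "ı", 0x73: "ş"}


def decode(septets, base=BASIC, extension=STANDARD_EXTENSION):
    """Decode unpacked 7-bit codes with the selected base and extension table."""
    out = []
    codes = iter(septets)
    for code in codes:
        if code == ESC:
            following = next(codes, None)
            if following is None:
                break
            # A code missing from the extension is shown from the base table.
            out.append(extension.get(following, base[following]))
        else:
            out.append(base[code])
    return "".join(out)


print(decode([0x1B, 0x53], extension=TURKISH_SINGLE))  # single shift
print(decode([0x1C], base=TURKISH_LOCKING))            # locking shift
print(decode([0x1B, 0x53]))                            # no Turkish tables
```

The last call shows the fallback behaviour described for terminals without the national tables: the escaped 0x53 is displayed as a plain "S" from the base table.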
Portuguese locking shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00  @     Δ     SP⁴   0     Í     P     ~     p
0x01  £     _     !     1     A     Q     a     q
0x02  $     ª     "     2     B     R     b     r
0x03  ¥     Ç     #     3     C     S     c     s
0x04  ê     À     º     4     D     T     d     t
0x05  é     ∞     %     5     E     U     e     u
0x06  ú     ^     &     6     F     V     f     v
0x07  í     \     '     7     G     W     g     w
0x08  ó     €     (     8     H     X     h     x
0x09  ç     Ó     )     9     I     Y     i     y
0x0A  LF¹   |     *     :     J     Z     j     z
0x0B  Ô     ESC³  +     ;     K     Ã     k     ã
0x0C  ô     Â     ,     <     L     Õ     l     õ
0x0D  CR²   â     -     =     M     Ú     m     `
0x0E  Á     Ê     .     >     N     Ü     n     ü
0x0F  á     É     /     ?     O     §     o     à
¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP)
Portuguese single shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00                          |
0x01                          À           Â
0x02        Φ
0x03        Γ
0x04        ^
0x05  ê     Ω                       Ú     €     ú
0x06        Π
0x07        Ψ
0x08        Σ     {
0x09  ç     Θ     }           Í           í
0x0A  FF¹
0x0B  Ô     ESC²                    Ã           ã
0x0C  ô                 [           Õ           õ
0x0D  ³                 ~
0x0E  Á                 ]
0x0F  á     Ê     \           Ó           ó     â
¹ is a page break (FF)
² is an escape character (ESC)
³ is a control character. No language-specific characters should be coded at this position.
Hindi locking shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00  ँ     ऐ     SP⁴   0     ब     ा     ॐ     p
0x01  ं     ऑ     !     1     भ     ि     a     q
0x02  ः     ऒ     ट     2     म     ी     b     r
0x03  अ     ओ     ठ     3     य     ु     c     s
0x04  आ     औ     ड     4     र     ू     d     t
0x05  इ     क     ढ     5     ऱ     ृ     e     u
0x06  ई     ख     ण     6     ल     ॄ     f     v
0x07  उ     ग     त     7     ळ     ॅ     g     w
0x08  ऊ     घ     )     8     ऴ     ॆ     h     x
0x09  ऋ     ङ     (     9     व     े     i     y
0x0A  LF¹   च     थ     :     श     ै     j     z
0x0B  ऌ     ESC³  द     ;     ष     ॉ     k     ॲ
0x0C  ऍ     छ     ,     ऩ     स     ॊ     l     ॻ
0x0D  CR²   ज     ध     प     ह     ो     m     ॼ
0x0E  ऎ     झ     .     फ     ़     ौ     n     ॾ
0x0F  ए     ञ     न     ?     ऽ     ्     o     ॿ
¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP)
Hindi single shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00  @     <     ४     ज़     |     P
0x01  £     =     ५     ड़     A     Q
0x02  $     >     ६     ढ़     B     R
0x03  ¥     ¡     ७     फ़     C     S
0x04  ¿     ^     ८     य़     D     T
0x05  "     ¡     ९     ॠ     E     U     €
0x06  ¤     _     ॑     ॡ     F     V
0x07  %     #     ॒     ॢ     G     W
0x08  &     *     {     ॣ     H     X
0x09  '     ।     }     ॰     I     Y
0x0A  FF¹   ॥     ॓     ॱ     J     Z
0x0B  *     ESC²  ॔           K
0x0C  +     ०     क़     [     L
0x0D  ³     १     ख़     ~     M
0x0E  -     २     ग़     ]     N
0x0F  /     ३     \           O
¹ is a page break (FF)
² is an escape character (ESC)
³ is a control character. No language-specific characters should be coded at this position.
Bengali locking shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00  ঁ     ঐ     SP⁴   0     ব     া     ৎ     p
0x01  ং           !     1     ভ     ি     a     q
0x02  ঃ           ট     2     ম     ী     b     r
0x03  অ     ও     ঠ     3     য     ু     c     s
0x04  আ     ঔ     ড     4     র     ূ     d     t
0x05  ই     ক     ঢ     5           ৃ     e     u
0x06  ঈ     খ     ণ     6     ল     ৄ     f     v
0x07  উ     গ     ত     7                 g     w
0x08  ঊ     ঘ     )     8                 h     x
0x09  ঋ     ঙ     (     9           ে     i     y
0x0A  LF¹   চ     থ     :     শ     ৈ     j     z
0x0B  ঌ     ESC³  দ     ;     ষ           k     ৗ
0x0C        ছ     ,           স           l     ড়
0x0D  CR²   জ     ধ     প     হ     ো     m     ঢ়
0x0E        ঝ     .     ফ     ়     ৌ     n     ৰ
0x0F  এ     ঞ     ন     ?     ঽ     ্     o     ৱ
¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP)
Bengali single shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00  @     <     ৬     ৶     |     P
0x01  £     =     ৭     ৷     A     Q
0x02  $     >     ৮     ৸     B     R
0x03  ¥     ¡     ৯     ৹     C     S
0x04  ¿     ^     য়     ৺     D     T
0x05  "     ¡     ৠ           E     U     €
0x06  ¤     _     ৡ           F     V
0x07  %     #     ৢ           G     W
0x08  &     *     {           H     X
0x09  '     ০     }           I     Y
0x0A  FF¹   ১     ৣ           J     Z
0x0B  *     ESC²  ৲           K
0x0C  +     ২     ৳     [     L
0x0D  ³     ৩     ৴     ~     M
0x0E  -     ৪     ৵     ]     N
0x0F  /     ৫     \           O
¹ is a page break (FF)
² is an escape character (ESC)
³ is a control character. No language-specific characters should be coded at this position.