GSM 03.38

A GSM short message carries at most 1,120 bits (140 octets) of user data. There are three different ways of encoding text and data in it:

7 bits, 160 characters

Encoded according to the GSM 03.38 standard, colloquially known as the GSM alphabet. This encoding is used for SMS text messages where a limited character repertoire is sufficient. A message can contain up to 160 characters (7 bits/character × 160 characters = 1,120 bits). Each 7-bit value is interpreted as one character, which in principle limits the repertoire to 128 (2⁷) characters. These 128 characters are defined in the 7-bit basic character set. Several mechanisms expand the set of displayable characters:

Escape: The escape character (ESC, 0x1B) causes the character immediately following it to be taken from the standard character set extension instead of the basic character set.
Escape with single shift: An information element in the user data header of the message selects an alternative (national) character set extension to be used instead of the standard character set extension.
Locking shift: Another information element in the user data header selects an alternative character set that replaces the basic character set for the message.

8 bits, 140 characters: For data messages (binary content) such as logos, picture messages and ring tones. An 8-bit message can contain up to 140 characters (8 bits/character × 140 characters = 1,120 bits).
16 bits, 70 characters: Unicode UCS-2, i.e. UTF-16 restricted to the Basic Multilingual Plane (BMP). Unicode messages are required for all writing systems that the 7-bit alphabet does not support directly, e.g. Arabic, Hebrew, Cyrillic, and Latin script with additional special characters. A Unicode message is limited to 70 characters (16 bits/character × 70 characters = 1,120 bits).
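The 160-character figure relies on packing the 7-bit values densely, without an unused bit per character. The following Python sketch shows the usual packing (least significant bits first, as we understand 3GPP TS 23.038); the function names are illustrative and not taken from any library.

```python
def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit codes into octets, least significant bits first."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of pending bits in acc
    for s in septets:
        if not 0 <= s < 0x80:
            raise ValueError(f"not a 7-bit code: {s:#x}")
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # last partial octet, high bits zero
    return bytes(out)


def unpack_septets(data: bytes, count: int) -> list[int]:
    """Inverse of pack_septets; count is needed because the last octet may hold padding."""
    septets = []
    acc = 0
    nbits = 0
    pos = 0
    while len(septets) < count:
        if nbits < 7:
            acc |= data[pos] << nbits
            pos += 1
            nbits += 8
        septets.append(acc & 0x7F)
        acc >>= 7
        nbits -= 7
    return septets


# For lowercase letters the GSM 7-bit codes coincide with ASCII.
packed = pack_septets([ord(c) for c in "hellohello"])
print(packed.hex().upper())             # E8329BFD4697D9EC37
print(len(pack_septets([0x65] * 160)))  # 140
```

Ten septets (70 bits) fit into nine octets, and 160 septets fill exactly the 140 octets of user data.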

7 bit

The character set extension tables for 7-bit messages are generally designed so that a terminal that lacks a given table, and therefore displays the character from the basic table instead, still shows something as similar as possible, e.g. "e" instead of "€".
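This fallback can be illustrated by decoding the same escape sequences with and without the standard extension table. The Python sketch assumes that a terminal without the table simply shows the basic-table character for the code that follows ESC; the names `BASIC`, `EXTENSION` and `decode` are hypothetical.

```python
# Basic character set, indexed by 7-bit code 0x00-0x7F.
# Position 0x1B (ESC) holds "\x1b" as a placeholder.
BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
assert len(BASIC) == 128

# Standard character set extension: code following ESC -> character.
EXTENSION = {
    0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€",
}


def decode(codes, extension=EXTENSION):
    """Decode 7-bit codes; extension=None models a terminal without the table."""
    out = []
    i = 0
    while i < len(codes):
        if codes[i] == 0x1B and i + 1 < len(codes):
            nxt = codes[i + 1]
            if extension is not None and nxt in extension:
                out.append(extension[nxt])
            else:
                # Fallback: show the basic-table character with the same code.
                out.append(BASIC[nxt])
            i += 2
        else:
            out.append(BASIC[codes[i]])
            i += 1
    return "".join(out)


seq = [0x1B, 0x28, 0x78, 0x1B, 0x29]  # ESC "{", "x", ESC "}"
print(decode(seq))                    # {x}
print(decode(seq, extension=None))    # (x)
```

The extension positions were chosen so that "{" degrades to "(", "[" to "<", and "€" to "e".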

There are single-shift character set extension tables for Turkish, Spanish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu.

There are locking-shift character set tables for Turkish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu.

The single shift and locking shift mechanisms can be combined with one another.
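Both shift mechanisms are announced in the user data header (UDH), which itself takes space from the 1,120 bits. The Python sketch below builds such a header and computes how many septets remain for text. The information element identifiers (0x24 for single shift, 0x25 for locking shift, per 3GPP TS 23.040) and the language codes (per 3GPP TS 23.038, following the order of the single shift list above) are our reading of the specifications and should be checked before use; the helper names are hypothetical.

```python
# Information element identifiers, assumed from 3GPP TS 23.040.
IEI_SINGLE_SHIFT = 0x24
IEI_LOCKING_SHIFT = 0x25

# National language identifiers, assumed from 3GPP TS 23.038.
LANGUAGE_ID = {
    "Turkish": 1, "Spanish": 2, "Portuguese": 3, "Bengali": 4,
    "Gujarati": 5, "Hindi": 6, "Kannada": 7, "Malayalam": 8,
    "Oriya": 9, "Punjabi": 10, "Tamil": 11, "Telugu": 12, "Urdu": 13,
}


def build_udh(single_shift=None, locking_shift=None) -> bytes:
    """Return the UDH: length octet (UDHL) followed by the elements."""
    elements = bytearray()
    if single_shift is not None:
        elements += bytes([IEI_SINGLE_SHIFT, 1, LANGUAGE_ID[single_shift]])
    if locking_shift is not None:
        if locking_shift == "Spanish":
            raise ValueError("there is no Spanish locking shift table")
        elements += bytes([IEI_LOCKING_SHIFT, 1, LANGUAGE_ID[locking_shift]])
    return bytes([len(elements)]) + bytes(elements) if elements else b""


def septet_capacity(udh: bytes) -> int:
    """Septets left for text in 1,120 bits after the UDH.

    The header is followed by fill bits up to the next septet boundary,
    so its size is rounded up to whole septets.
    """
    header_septets = (len(udh) * 8 + 6) // 7
    return 1120 // 7 - header_septets


print(build_udh(single_shift="Turkish").hex())          # 03240101
print(septet_capacity(b""))                             # 160
print(septet_capacity(build_udh("Turkish")))            # 155
print(septet_capacity(build_udh("Turkish", "Turkish"))) # 152
```

A single 4-octet header costs 5 septets, and combining both shifts (7 octets) costs 8, so the combination leaves 152 characters per message.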

Examples:

16 bit: 0x0637 results in the Arabic letter Tah "ط"
7 bit: 0x65 results in an "e"
7 bit with escape: 0x1B followed by 0x65 results in the euro sign "€"
7 bit with single shift: with the Turkish single shift table selected, 0x1B followed by 0x53 results in S with cedilla "Ş"
7 bit with locking shift: with the Turkish locking shift table selected, 0x1C results in S with cedilla "Ş"
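The examples can be reproduced with a small Python decoder. The Turkish tables are reconstructed from the differences visible in the tables below; the table contents are transcribed by hand, and the function names are hypothetical.

```python
# Basic character set, indexed by 7-bit code (0x1B = ESC placeholder).
BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Standard character set extension: code after ESC -> character.
EXTENSION = {
    0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "€",
}

# Turkish locking shift table: the basic table with eight positions replaced.
_tr = list(BASIC)
for code, ch in {0x04: "€", 0x07: "ı", 0x0B: "Ğ", 0x0C: "ğ",
                 0x1C: "Ş", 0x1D: "ş", 0x40: "İ", 0x60: "ç"}.items():
    _tr[code] = ch
TURKISH_LOCKING = "".join(_tr)

# Turkish single shift table: the standard extension plus Turkish letters.
TURKISH_SINGLE = {**EXTENSION, 0x47: "Ğ", 0x49: "İ", 0x53: "Ş",
                  0x63: "ç", 0x67: "ğ", 0x69: "ı", 0x73: "ş"}


def decode_gsm7(codes, locking=BASIC, single=EXTENSION) -> str:
    """Decode 7-bit codes with the given locking and single shift tables."""
    out, i = [], 0
    while i < len(codes):
        if codes[i] == 0x1B and i + 1 < len(codes):
            # ESC: look the next code up in the single shift table,
            # falling back to the locking shift (or basic) table.
            nxt = codes[i + 1]
            out.append(single.get(nxt, locking[nxt]))
            i += 2
        else:
            out.append(locking[codes[i]])
            i += 1
    return "".join(out)


def decode_ucs2(data: bytes) -> str:
    """UCS-2 is big-endian UTF-16 restricted to the BMP."""
    return data.decode("utf-16-be")


print(decode_ucs2(bytes.fromhex("0637")))               # ط
print(decode_gsm7([0x65]))                              # e
print(decode_gsm7([0x1B, 0x65]))                        # €
print(decode_gsm7([0x1B, 0x53], single=TURKISH_SINGLE)) # Ş
print(decode_gsm7([0x1C], locking=TURKISH_LOCKING))     # Ş
```

Without the Turkish locking shift, the same code 0x1C decodes to "Æ" from the basic table.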

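Character set tables

In each table, the code of a character is the sum of its column and row headings; for example, "e" in the basic character set is at 0x60 + 0x05 = 0x65.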

Basic character set
	0x00	0x10	0x20	0x30	0x40	0x50	0x60	0x70
0x00	@	Δ	SP⁴	0	¡	P	¿	p
0x01	£	_	!	1	A	Q	a	q
0x02	$	Φ	"	2	B	R	b	r
0x03	¥	Γ	#	3	C	S	c	s
0x04	è	Λ	¤	4	D	T	d	t
0x05	é	Ω	%	5	E	U	e	u
0x06	ù	Π	&	6	F	V	f	v
0x07	ì	Ψ	'	7	G	W	g	w
0x08	ò	Σ	(	8	H	X	h	x
0x09	Ç	Θ	)	9	I	Y	i	y
0x0A	LF¹	Ξ	*	:	J	Z	j	z
0x0B	Ø	ESC³	+	;	K	Ä	k	ä
0x0C	ø	Æ	,	<	L	Ö	l	ö
0x0D	CR²	æ	-	=	M	Ñ	m	ñ
0x0E	Å	ß	.	>	N	Ü	n	ü
0x0F	å	É	/	?	O	§	o	à

¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP)

Standard character set extension
	0x00	0x10	0x20	0x30	0x40	0x60
0x00					|
0x01
0x02
0x03
0x04		^
0x05						€
0x06
0x07
0x08			{
0x09			}
0x0A	FF¹
0x0B		SS2²
0x0C				[
0x0D				~
0x0E				]
0x0F			\

¹ is a form feed (FF, page break)
² is another single-shift escape character, reserved for future expansion
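Because each character of the extension table is sent as ESC plus a code, it occupies two of the 160 septets. The following Python sketch counts the septets a text needs with the basic table and the standard extension, and falls back to UCS-2 when a character is in neither; national shift tables are ignored here, and the function names are hypothetical.

```python
# Basic character set, indexed by 7-bit code (0x1B = ESC placeholder).
BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Characters of the standard character set extension (each sent as ESC + code).
EXTENSION_CHARS = "\f^{}\\[~]|€"


def gsm7_septets(text: str):
    """Septets needed with the default tables, or None if UCS-2 is required."""
    total = 0
    for ch in text:
        if ch != "\x1b" and ch in BASIC:
            total += 1
        elif ch in EXTENSION_CHARS:
            total += 2  # ESC plus the code in the extension table
        else:
            return None
    return total


def single_message_encoding(text: str):
    """Return (encoding, fits_in_one_message) for a 1,120-bit message."""
    septets = gsm7_septets(text)
    if septets is not None:
        return "7-bit", septets <= 160
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return "UCS-2", len(text) <= 70


print(gsm7_septets("Price: 5€"))          # 10
print(single_message_encoding("€" * 80))  # ('7-bit', True)
print(single_message_encoding("€" * 81))  # ('7-bit', False)
print(single_message_encoding("ط" * 70))  # ('UCS-2', True)
```

A single character outside both tables, such as a lowercase "ç", is enough to force the whole message into UCS-2 and its 70-character limit.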

Turkish locking shift table
	0x00	0x10	0x20	0x30	0x40	0x50	0x60	0x70
0x00	@	Δ	⁴	0	İ	P	ç	p
0x01	£	_	!	1	A	Q	a	q
0x02	$	Φ	"	2	B	R	b	r
0x03	¥	Γ	#	3	C	S	c	s
0x04	€	Λ	¤	4	D	T	d	t
0x05	é	Ω	%	5	E	U	e	u
0x06	ù	Π	&	6	F	V	f	v
0x07	ı	Ψ	'	7	G	W	g	w
0x08	ò	Σ	(	8	H	X	h	x
0x09	Ç	Θ	)	9	I	Y	i	y
0x0A	¹	Ξ	*	:	J	Z	j	z
0x0B	Ğ	³	+	;	K	Ä	k	ä
0x0C	ğ	Ş	,	<	L	Ö	l	ö
0x0D	²	ş	-	=	M	Ñ	m	ñ
0x0E	Å	ß	.	>	N	Ü	n	ü
0x0F	å	É	/	?	O	§	o	à

¹ is a line feed (LF)
² is a carriage return (CR)
³ is an ESC
⁴ is a space

Turkish single shift table
	0x00	0x10	0x20	0x30	0x40	0x50	0x60	0x70
0x00					|
0x01
0x02
0x03						Ş	ç	ş
0x04		^
0x05							€
0x06
0x07					Ğ		ğ
0x08			{
0x09			}		İ		ı
0x0A	¹
0x0B		²
0x0C				[
0x0D	³			~
0x0E				]
0x0F			\

¹ is a page break
² is an ESC
³ is a control character; no language-specific character should be encoded at this position.

Portuguese locking shift table
	0x00	0x10	0x20	0x30	0x40	0x50	0x60	0x70
0x00	@	Δ	⁴	0	Í	P	~	p
0x01	£	_	!	1	A	Q	a	q
0x02	$	ª	"	2	B	R	b	r
0x03	¥	Ç	#	3	C	S	c	s
0x04	ê	À	º	4	D	T	d	t
0x05	é	∞	%	5	E	U	e	u
0x06	ú	^	&	6	F	V	f	v
0x07	í	\	'	7	G	W	g	w
0x08	ó	€	(	8	H	X	h	x
0x09	ç	Ó	)	9	I	Y	i	y
0x0A	¹	|	*	:	J	Z	j	z
0x0B	Ô	³	+	;	K	Ã	k	ã
0x0C	ô	Â	,	<	L	Õ	l	õ
0x0D	²	â	-	=	M	Ú	m	`
0x0E	Á	Ê	.	>	N	Ü	n	ü
0x0F	á	É	/	?	O	§	o	à

¹ is a line feed (LF)
² is a carriage return (CR)
³ is an ESC
⁴ is a space

Portuguese single shift table
	0x00	0x10	0x20	0x30	0x40	0x50	0x60	0x70
0x00					|
0x01					À		Â
0x02		Φ
0x03		Γ
0x04		^
0x05	ê	Ω				Ú	€	ú
0x06		Π
0x07		Ψ
0x08		Σ	{
0x09	ç	Θ	}		Í		í
0x0A	¹
0x0B	Ô	²				Ã		ã
0x0C	ô			[		Õ		õ
0x0D	³			~
0x0E	Á			]
0x0F	á	Ê	\		Ó		ó	â

¹ is a page break
² is an ESC
³ is a control character; no language-specific character should be encoded at this position.
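The choice between the Portuguese single shift and locking shift tables affects message length: a character from a single shift table costs two septets (ESC plus code), while a character in the active locking shift table costs one. The Python sketch below compares the configurations for the word "ação". The table strings are our transcription of 3GPP TS 23.038 and should be verified before use; the names are hypothetical.

```python
# Basic character set, indexed by 7-bit code (0x1B = ESC placeholder).
BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)

# Portuguese locking shift table (assumed transcription; 0x1B = ESC placeholder).
PT_LOCKING = (
    "@£$¥êéúíóç\nÔô\rÁá"
    "Δ_ªÇÀ∞^\\€Ó|\x1bÂâÊÉ"
    " !\"#º%&'()*+,-./"
    "0123456789:;<=>?"
    "ÍABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÃÕÚÜ§"
    "~abcdefghijklmno"
    "pqrstuvwxyzãõ`üà"
)
assert len(BASIC) == len(PT_LOCKING) == 128

# Characters reachable with ESC in each single shift table.
STANDARD_SINGLE = set("\f^{}\\[~]|€")
PT_SINGLE = set("êçÔôÁáΦΓ^ΩΠΨΣΘÊ{}\\[~]|\fÀÍÓÚÃÕÂ€íóúãõâ")


def septet_cost(text, locking, single):
    """Septets for text, or None if some character is in neither table."""
    total = 0
    for ch in text:
        if ch != "\x1b" and ch in locking:
            total += 1  # directly in the active locking shift table
        elif ch in single:
            total += 2  # ESC + code from the single shift table
        else:
            return None
    return total


word = "ação"
print(septet_cost(word, BASIC, STANDARD_SINGLE))       # None
print(septet_cost(word, BASIC, PT_SINGLE))             # 6
print(septet_cost(word, PT_LOCKING, STANDARD_SINGLE))  # 4
```

The locking shift is cheaper for text rich in Portuguese letters, while the single shift keeps the basic table for everything else; as noted above, the two mechanisms can also be combined.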

Hindi locking shift table
	0x00	0x10	0x20	0x30	0x40	0x50	0x60	0x70
0x00	ँ	ऐ	⁴	0	ब	ा	ॐ	p
0x01	ं	ऑ	!	1	भ	ि	a	q
0x02	ः	ऒ	ट	2	म	ी	b	r
0x03	अ	ओ	ठ	3	य	ु	c	s
0x04	आ	औ	ड	4	र	ू	d	t
0x05	इ	क	ढ	5	ऱ	ृ	e	u
0x06	ई	ख	ण	6	ल	ॄ	f	v
0x07	उ	ग	त	7	ळ	ॅ	g	w
0x08	ऊ	घ	)	8	ऴ	ॆ	h	x
0x09	ऋ	ङ	(	9	व	े	i	y
0x0A	¹	च	थ	:	श	ै	j	z
0x0B	ऌ	³	द	;	ष	ॉ	k	ॲ
0x0C	ऍ	छ	,	ऩ	स	ॊ	l	ॻ
0x0D	²	ज	ध	प	ह	ो	m	ॼ
0x0E	ऎ	झ	.	फ	़	ौ	n	ॾ
0x0F	ए	ञ	न	?	ऽ	्	o	ॿ

¹ is a line feed (LF)
² is a carriage return (CR)
³ is an ESC
⁴ is a space

Hindi single shift table
	0x00	0x10	0x20	0x30	0x40	0x50	0x60
0x00	@	<	४	ज़	|	P
0x01	£	=	५	ड़	A	Q
0x02	$	>	६	ढ़	B	R
0x03	¥	¡	७	फ़	C	S
0x04	¿	^	८	य़	D	T
0x05	"	¡	९	ॠ	E	U	€
0x06	¤	_	॑	ॡ	F	V
0x07	%	#	॒	ॢ	G	W
0x08	&	*	{	ॣ	H	X
0x09	'	।	}	॰	I	Y
0x0A	¹	॥	॓	ॱ	J	Z
0x0B	*	²	॔		K
0x0C	+	०	क़	[	L
0x0D	³	१	ख़	~	M
0x0E	-	२	ग़	]	N
0x0F	/	३	\		O

¹ is a page break
² is an ESC
³ is a control character; no language-specific character should be encoded at this position.
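As a worked example of the Hindi locking shift table, the word "नमस्ते" (six code points) can be encoded with codes read off the table above: न = 0x20 + 0x0F, म = 0x40 + 0x02, and so on. The Python sketch contains only the six entries it needs; a real encoder would need the full table, and the message would also have to announce the Hindi locking shift in its user data header. The names are hypothetical.

```python
# Six entries of the Hindi locking shift table (character -> 7-bit code).
HINDI_LOCKING_SUBSET = {
    "\u0928": 0x2F,  # न NA
    "\u092E": 0x42,  # म MA
    "\u0938": 0x4C,  # स SA
    "\u094D": 0x5F,  # ् virama
    "\u0924": 0x27,  # त TA
    "\u0947": 0x59,  # े vowel sign E
}


def encode_hindi_locking(text: str) -> list[int]:
    """Map each code point to its 7-bit code (KeyError if not in the subset)."""
    return [HINDI_LOCKING_SUBSET[ch] for ch in text]


word = "\u0928\u092E\u0938\u094D\u0924\u0947"  # नमस्ते
codes = encode_hindi_locking(word)
print([hex(c) for c in codes])            # ['0x2f', '0x42', '0x4c', '0x5f', '0x27', '0x59']
print(len(codes) * 7)                     # 42 bits as 7-bit text
print(len(word.encode("utf-16-be")) * 8)  # 96 bits as UCS-2
```

The same six characters need 42 bits with the locking shift table but 96 bits as UCS-2, which is why a Hindi message can hold 160 rather than 70 characters.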

Bengali locking shift table
	0x00	0x10	0x20	0x30	0x40	0x50	0x60	0x70
0x00	ঁ	ঐ	⁴	0	ব	া	ৎ	p
0x01	ং		!	1	ভ	ি	a	q
0x02	ঃ		ট	2	ম	ী	b	r
0x03	অ	ও	ঠ	3	য	ু	c	s
0x04	আ	ঔ	ড	4	র	ূ	d	t
0x05	ই	ক	ঢ	5		ৃ	e	u
0x06	ঈ	খ	ণ	6	ল	ৄ	f	v
0x07	উ	গ	ত	7			g	w
0x08	ঊ	ঘ	)	8			h	x
0x09	ঋ	ঙ	(	9		ে	i	y
0x0A	¹	চ	থ	:	শ	ৈ	j	z
0x0B	ঌ	³	দ	;	ষ		k	ৗ
0x0C		ছ	,		স		l	ড়
0x0D	²	জ	ধ	প	হ	ো	m	ঢ়
0x0E		ঝ	.	ফ	়	ৌ	n	ৰ
0x0F	এ	ঞ	ন	?	ঽ	্	o	ৱ

¹ is a line feed (LF)
² is a carriage return (CR)
³ is an ESC
⁴ is a space

Bengali single shift table
	0x00	0x10	0x20	0x30	0x40	0x50	0x60
0x00	@	<	৬	৶	|	P
0x01	£	=	৭	৷	A	Q
0x02	$	>	৮	৸	B	R
0x03	¥	¡	৯	৹	C	S
0x04	¿	^	য়	৺	D	T
0x05	"	¡	ৠ		E	U	€
0x06	¤	_	ৡ		F	V
0x07	%	#	ৢ		G	W
0x08	&	*	{		H	X
0x09	'	০	}		I	Y
0x0A	¹	১	ৣ		J	Z
0x0B	*	²	৲		K
0x0C	+	২	৳	[	L
0x0D	³	৩	৴	~	M
0x0E	-	৪	৵	]	N
0x0F	/	৫	\		O

¹ is a page break
² is an ESC
³ is a control character; no language-specific character should be encoded at this position.
