GSM 03.38

from Wikipedia, the free encyclopedia

There are three different ways of encoding text and data in a GSM short message, which can carry at most 1,120 bits of user data:

7 bits, 160 characters
Defined in the GSM 03.38 standard and colloquially called the GSM alphabet. Intended for SMS text messages for which a limited character repertoire is sufficient. A message can contain up to 160 characters (7 bits/character × 160 characters = 1,120 bits). Each group of 7 bits is interpreted as one character, which limits the number of directly encodable characters to 128. These 128 characters are defined in the 7-bit basic character set. Several mechanisms extend the range of displayable characters:
  • Escape: The escape character (ESC, 0x1B) switches to the standard character set extension for the single character that immediately follows it.
  • Escape with single shift: Using an element in the user data header of the message, an alternative character set extension can be selected instead of the standard character set extension.
  • Locking Shift: Another element in the user data header of the message allows an alternative character set to be selected instead of the basic character set.
8 bits, 140 characters
For data messages (binary content) such as logos, picture messages and ring tones. An 8-bit message can contain up to 140 characters (8 bits/character × 140 characters = 1,120 bits).
16 bits, 70 characters
Unicode UCS-2, i.e. UTF-16 restricted to the BMP (Basic Multilingual Plane). Unicode messages are required for all writing systems that are not directly supported, e.g. Arabic, Hebrew, Cyrillic, and Latin with additional special characters. A Unicode message is limited to 70 characters (16 bits/character × 70 characters = 1,120 bits).
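
The three character limits all follow from the same 1,120-bit budget. The arithmetic can be sketched as follows; the function name max_characters is chosen here for illustration only and does not come from the standard.

```python
# Maximum user data of one GSM short message, in bits (figure from the text).
MAX_USER_DATA_BITS = 1120


def max_characters(bits_per_character):
    """Return how many fixed-width characters fit into one short message."""
    return MAX_USER_DATA_BITS // bits_per_character


for bits in (7, 8, 16):
    print(f"{bits} bits per character: {max_characters(bits)} characters")
```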

7 bit

The character set extension tables for 7-bit messages are generally designed so that terminals that do not have these tables, and therefore display the character from the basic table instead, produce a result that looks as similar as possible, e.g. "e" instead of "€".

There are single-shift character set extension tables for Turkish, Spanish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu.

There are locking-shift character set tables for Turkish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu.

The single shift and locking shift mechanisms can be combined with one another.
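
As a rough illustration of how the two header elements might be combined, the sketch below builds a user data header that selects a single shift and a locking shift table at once. The information element identifiers (0x24 and 0x25) and the language code for Turkish (0x01) are not given in this article; they are assumed here from 3GPP TS 23.040 and TS 23.038 and should be checked against those documents. The function names are hypothetical.

```python
# Assumed values, not stated in this article (see 3GPP TS 23.040 / 23.038).
IEI_SINGLE_SHIFT = 0x24   # information element: national language single shift
IEI_LOCKING_SHIFT = 0x25  # information element: national language locking shift
LANGUAGE_TURKISH = 0x01   # national language identifier for Turkish

MAX_USER_DATA_BITS = 1120


def build_udh(single_shift=None, locking_shift=None):
    """Build a user data header selecting the given language tables."""
    elements = bytearray()
    if single_shift is not None:
        # Identifier, length of the element data (1 octet), language code.
        elements += bytes([IEI_SINGLE_SHIFT, 0x01, single_shift])
    if locking_shift is not None:
        elements += bytes([IEI_LOCKING_SHIFT, 0x01, locking_shift])
    if not elements:
        return b""
    # The first octet holds the length of the rest of the header.
    return bytes([len(elements)]) + bytes(elements)


def remaining_septets(udh):
    """7-bit characters left once the header octets are subtracted."""
    return (MAX_USER_DATA_BITS - 8 * len(udh)) // 7


udh = build_udh(single_shift=LANGUAGE_TURKISH, locking_shift=LANGUAGE_TURKISH)
print(udh.hex(), remaining_septets(udh))
```

Because the header shares the 1,120 bits with the text, selecting alternative tables reduces the number of characters available in the message.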

Examples:

  • 16 bit: 0x0637 results in the Arabic character Tah: "ط"
  • 7 bit: 0x65 results in an "e"
  • 7 bit with Escape: 0x1B followed by 0x65 results in a euro sign "€"
  • 7 bit with single shift: with the 'Turkish' setting, 0x1B followed by 0x53 results in an S with cedilla "Ş"
  • 7 bit with locking shift: with the 'Turkish' setting, 0x1C results in an S with cedilla "Ş"
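
The 16-bit example can be reproduced with a minimal sketch. It assumes that the 16-bit code units are transmitted in big-endian byte order, which the text above does not state; the function name encode_ucs2 is hypothetical.

```python
MAX_UCS2_CHARACTERS = 70  # limit given in the text above


def encode_ucs2(text):
    """Encode text as 16-bit code units, rejecting anything outside the BMP."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the Basic Multilingual Plane")
    if len(text) > MAX_UCS2_CHARACTERS:
        raise ValueError("more than 70 characters")
    # Assumption: big-endian byte order for each 16-bit code unit.
    return text.encode("utf-16-be")


# 0x0637 is the Arabic letter Tah from the first example.
tah = bytes([0x06, 0x37]).decode("utf-16-be")
print(tah, encode_ucs2(tah).hex())
```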

Character set tables

Basic character set
  0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 @ Δ SP⁴ 0 ¡ P ¿ p
0x01 £ _ ! 1 A Q a q
0x02 $ Φ " 2 B R b r
0x03 ¥ Γ # 3 C S c s
0x04 è Λ ¤ 4 D T d t
0x05 é Ω % 5 E U e u
0x06 ù Π & 6 F V f v
0x07 ì Ψ ' 7 G W g w
0x08 ò Σ ( 8 H X h x
0x09 Ç Θ ) 9 I Y i y
0x0A LF¹ Ξ * : J Z j z
0x0B Ø ESC³ + ; K Ä k ä
0x0C ø Æ , < L Ö l ö
0x0D CR² æ - = M Ñ m ñ
0x0E Å ß . > N Ü n ü
0x0F å É / ? O § o à

¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP, Space)
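
A lookup in the basic character set can be sketched by storing the 128 characters in code order, so that the index of a character is its 7-bit code (column value plus row value). The names BASIC, encode_basic and decode_basic are chosen for this illustration; the sketch deliberately ignores the escape and shift mechanisms.

```python
# Basic character set in code order: one string of 16 characters per column.
BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"      # 0x00-0x0F
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"     # 0x10-0x1F (0x1B is ESC)
    " !\"#¤%&'()*+,-./"       # 0x20-0x2F
    "0123456789:;<=>?"        # 0x30-0x3F
    "¡ABCDEFGHIJKLMNO"        # 0x40-0x4F
    "PQRSTUVWXYZÄÖÑÜ§"        # 0x50-0x5F
    "¿abcdefghijklmno"        # 0x60-0x6F
    "pqrstuvwxyzäöñüà"        # 0x70-0x7F
)
assert len(BASIC) == 128


def encode_basic(text):
    """Return the 7-bit codes of text, or raise ValueError."""
    codes = []
    for ch in text:
        code = BASIC.find(ch)
        if code < 0:
            raise ValueError(f"{ch!r} is not in the basic character set")
        codes.append(code)
    return codes


def decode_basic(codes):
    """Map a sequence of 7-bit codes back to text."""
    return "".join(BASIC[code] for code in codes)


print([hex(code) for code in encode_basic("Grüße")])
```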

Standard character set extension
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00                          |
0x01
0x02
0x03
0x04        ^
0x05                                      €
0x06
0x07
0x08              {
0x09              }
0x0A  FF¹
0x0B        SS2²
0x0C                    [
0x0D                    ~
0x0E                    ]
0x0F              \

¹ is a page break (FF, Form Feed or Page Break)
² is another single-shift escape character, reserved for future expansion
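
Since every character of the extension table is sent as ESC followed by its code, it occupies two 7-bit positions instead of one. The following sketch counts the positions a text needs; it assumes that all other characters of the text belong to the basic character set and does not check this. The names EXTENSION, escape_sequence and septet_count are hypothetical.

```python
ESC = 0x1B

# Printable characters of the standard character set extension and their codes.
EXTENSION = {
    "\f": 0x0A, "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
    "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65,
}


def escape_sequence(ch):
    """Return the two codes that represent an extension character."""
    return [ESC, EXTENSION[ch]]


def septet_count(text):
    """Count 7-bit positions, assuming non-extension characters are basic."""
    return sum(2 if ch in EXTENSION else 1 for ch in text)


def fits_in_one_message(text):
    """Check the text against the 160-position limit of a 7-bit message."""
    return septet_count(text) <= 160


print(escape_sequence("€"), septet_count("5€"))
```

A text consisting only of euro signs therefore fills a message after 80 characters rather than 160.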

Turkish locking shift character table
  0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 @ Δ SP⁴ 0 İ P ç p
0x01 £ _ ! 1 A Q a q
0x02 $ Φ " 2 B R b r
0x03 ¥ Γ # 3 C S c s
0x04 € Λ ¤ 4 D T d t
0x05 é Ω % 5 E U e u
0x06 ù Π & 6 F V f v
0x07 ı Ψ ' 7 G W g w
0x08 ò Σ ( 8 H X h x
0x09 Ç Θ ) 9 I Y i y
0x0A LF¹ Ξ * : J Z j z
0x0B Ğ ESC³ + ; K Ä k ä
0x0C ğ Ş , < L Ö l ö
0x0D CR² ş - = M Ñ m ñ
0x0E Å ß . > N Ü n ü
0x0F å É / ? O § o à

¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP)

Turkish single shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00                          |
0x01
0x02
0x03                                Ş     ç     ş
0x04        ^
0x05                                      €
0x06
0x07                          Ğ           ğ
0x08              {
0x09              }           İ           ı
0x0A  FF¹
0x0B        SS2²
0x0C                    [
0x0D  CR2³              ~
0x0E                    ]
0x0F              \

¹ is a page break (FF, form feed)
² is a further single-shift escape character (SS2)
³ is a control character (CR2). No language-specific character should be encoded at this position.

Portuguese locking shift character table
  0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 @ Δ SP⁴ 0 Í P ~ p
0x01 £ _ ! 1 A Q a q
0x02 $ ª " 2 B R b r
0x03 ¥ Ç # 3 C S c s
0x04 ê À º 4 D T d t
0x05 é ∞ % 5 E U e u
0x06 ú ^ & 6 F V f v
0x07 í \ ' 7 G W g w
0x08 ó € ( 8 H X h x
0x09 ç Ó ) 9 I Y i y
0x0A LF¹ | * : J Z j z
0x0B Ô ESC³ + ; K Ã k ã
0x0C ô Â , < L Õ l õ
0x0D CR² â - = M Ú m `
0x0E Á Ê . > N Ü n ü
0x0F á É / ? O § o à

¹ is a line feed (LF)
² is a carriage return (CR)
³ is an escape character (ESC)
⁴ is a space (SP)

Portuguese single shift character table
      0x00  0x10  0x20  0x30  0x40  0x50  0x60  0x70
0x00                          |
0x01                          À           Â
0x02        Φ
0x03        Γ
0x04        ^
0x05  ê     Ω                       Ú     €     ú
0x06        Π
0x07        Ψ
0x08        Σ     {
0x09  ç     Θ     }           Í           í
0x0A  FF¹
0x0B  Ô     SS2²                    Ã           ã
0x0C  ô                 [           Õ           õ
0x0D  CR2³              ~
0x0E  Á                 ]
0x0F  á     Ê     \           Ó           ó     â

¹ is a page break (FF, form feed)
² is a further single-shift escape character (SS2)
³ is a control character (CR2). No language-specific character should be encoded at this position.

Hindi locking shift character table
  0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 0 p
0x01 ! 1 ि a q
0x02 2 b r
0x03 3 c s
0x04 4 d t
0x05 5 e u
0x06 6 f v
0x07 7 g w
0x08 ) 8 h x
0x09 ( 9 i y
0x0A LF¹ : j z
0x0B ESC³ ; k
0x0C , l
0x0D CR² m
0x0E . n
0x0F ? o ॿ

¹ is a line feed (LF)
² is a carriage return (CR, carriage return)
³ is an ESC
⁴ is a space

Hindi single shift character table
  0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 @ < ज़ | P
0x01 £ = ड़ A Q
0x02 $ > ढ़ B R
0x03 ¥ ¡ फ़ C S
0x04 ¿ ^ य़ D T
0x05 " ¡ E U
0x06 ¤ _ F V
0x07 % # G W
0x08 & * { H X
0x09 ' } I Y
0x0A FF¹ J Z
0x0B * SS2² K
0x0C + क़ [ L
0x0D CR2³ ख़ ~ M
0x0E - ग़ ] N
0x0F / \ O

¹ is a page break
² is an ESC
³ is a control character. No language-specific characters should be coded at this point.

Bengali locking shift character table
  0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 0 p
0x01   ! 1 ি a q
0x02   2 b r
0x03 3 c s
0x04 4 d t
0x05 5   e u
0x06 6 f v
0x07 7     g w
0x08 ) 8     h x
0x09 ( 9   i y
0x0A LF¹ : j z
0x0B ESC³ ;   k
0x0C   ,     l ড়
0x0D CR² m ঢ়
0x0E   . n
0x0F ? o

¹ is a line feed (LF)
² is a carriage return (CR, carriage return)
³ is an ESC
⁴ is a space

Bengali single shift character table
  0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 @ < P
0x01 £ = A Q
0x02 $ > B R
0x03 ¥ ¡ C S
0x04 ¿ ^ য় D T
0x05 " ¡ E U
0x06 ¤ _ F V
0x07 % # G W
0x08 & * { H X
0x09 ' } I Y
0x0A FF¹ J Z
0x0B * SS2² K
0x0C + [ L
0x0D CR2³ ~ M
0x0E - ] N
0x0F / \ O

¹ is a page break
² is an ESC
³ is a control character. No language-specific characters should be coded at this point.

Sources

  1. Mapping of GSM 03.38 characters to Unicode (English, TXT; 9 kB). November 10, 2009. Retrieved November 18, 2009.
  2. 3GPP TS 23.038: Alphabets and language-specific information; Release 9.0.0 (English, ZIP/DOC; 174 kB). September 28, 2009. Retrieved November 16, 2009.