Indian Script Code for Information Interchange
Indian Script Code for Information Interchange (ISCII) is the Indian national standard for encoding the characters of the various Indian scripts, all of which descend from the Brahmi script. These scripts share a very similar structure, but their letter shapes differ considerably. ISCII encodes the logical structure of these scripts, while the selection of particular letter forms is left to a markup language or a font technology such as OpenType.
ISCII covers the following scripts: Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil and Telugu.
When a text is switched to another script, it is transliterated automatically.
ISCII is an 8-bit character set in which, as in ISO 8859 and many other character sets, the lower 128 characters correspond to the ASCII standard.
Unicode largely retains the ISCII encoding scheme. There, however, each script is encoded in its own block of 128 code points, in the range U+0900 to U+0D7F.
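Because the Unicode blocks for these scripts are laid out in parallel, a rough script-to-script transliteration can be sketched by shifting code points from one block to another. The following is a minimal illustration only: the names `BLOCK_START` and `transliterate` are made up for this example, the block start values are those of the Unicode standard, and the function naively assumes that the target script defines a character at every offset used, which is not always the case.

```python
# First code point of each script's 128-code-point Unicode block.
BLOCK_START = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}


def transliterate(text: str, source: str, target: str) -> str:
    """Move every character of the source block to the same offset in the target block.

    Naive sketch: it does not check whether the target position is assigned.
    Characters outside the source block are copied unchanged.
    """
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + 0x80:
            cp = cp - src + dst
        out.append(chr(cp))
    return "".join(out)


# Devanagari ka, ma, la shifted into the Bengali block.
print(transliterate("\u0915\u092E\u0932", "Devanagari", "Bengali"))
```

A real converter would need per-script exception tables, since some positions are unassigned in some scripts and a few punctuation marks are shared between scripts.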
Code | …0 | …1 | …2 | …3 | …4 | …5 | …6 | …7 | …8 | …9 | …A | …B | …C | …D | …E | …F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A… | | ँ | ं | ः | अ | आ | इ | ई | उ | ऊ | ऋ | ऎ | ए | ऐ | ऍ | ऒ |
B… | ओ | औ | ऑ | क | ख | ग | घ | ङ | च | छ | ज | झ | ञ | ट | ठ | ड |
C… | ढ | ण | त | थ | द | ध | न | ऩ | प | फ | ब | भ | म | य | य़ | र |
D… | ऱ | ल | ळ | ऴ | व | श | ष | स | ह | INV | ा | ि | ी | ु | ू | ृ |
E… | ॆ | े | ै | ॅ | ॊ | ो | ौ | ॉ | ् | ़ | । | | | | | ATR |
F… | EXT | ० | १ | २ | ३ | ४ | ५ | ६ | ७ | ८ | ९ | | | | | |
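The code table can be turned into a simple lookup for decoding Devanagari text. The sketch below is an illustration rather than a complete ISCII decoder: the names `_ROWS`, `ISCII_TO_UNICODE` and `decode_simple` are hypothetical, the byte positions assume the chart layout in which A1 hex is the candrabindu and EF hex is ATR, and the control codes INV, ATR and EXT are deliberately left unsupported.

```python
# Unicode code points (hex) for rows A..F of the Devanagari table.
# "----" marks positions that are unassigned or hold a control code (INV, EXT).
_ROWS = {
    0xA0: "---- 0901 0902 0903 0905 0906 0907 0908 0909 090A 090B 090E 090F 0910 090D 0912",
    0xB0: "0913 0914 0911 0915 0916 0917 0918 0919 091A 091B 091C 091D 091E 091F 0920 0921",
    0xC0: "0922 0923 0924 0925 0926 0927 0928 0929 092A 092B 092C 092D 092E 092F 095F 0930",
    0xD0: "0931 0932 0933 0934 0935 0936 0937 0938 0939 ---- 093E 093F 0940 0941 0942 0943",
    0xE0: "0946 0947 0948 0945 094A 094B 094C 0949 094D 093C 0964",
    0xF0: "---- 0966 0967 0968 0969 096A 096B 096C 096D 096E 096F",
}

# Map each assigned ISCII byte to its Unicode character.
ISCII_TO_UNICODE = {
    base + column: chr(int(cell, 16))
    for base, row in _ROWS.items()
    for column, cell in enumerate(row.split())
    if cell != "----"
}


def decode_simple(data: bytes) -> str:
    """Decode ISCII Devanagari bytes one at a time.

    Bytes below 0x80 are ASCII. Bytes missing from the table
    (including INV, ATR and EXT) raise ValueError.
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        elif b in ISCII_TO_UNICODE:
            out.append(ISCII_TO_UNICODE[b])
        else:
            raise ValueError(f"unsupported ISCII byte 0x{b:02X}")
    return "".join(out)


# ka + halant + ta, i.e. the cluster kta
print(decode_simple(bytes([0xB3, 0xE8, 0xC2])))
```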
- D9 hex: INV
  - Invisible character. Together with an isolated halant (see below), it makes it possible to display the so-called half forms of combining characters, e.g. क (ka) + ् (halant) + INV = क्. In Unicode, the halant U+094D is instead followed by ZERO WIDTH JOINER U+200D.
  - INV also serves as an (empty) base character for displaying combining vowel signs on their own. In Unicode, NBSP U+00A0 or the dotted circle ◌ U+25CC is used instead.
- EF hex: ATR
  - Switch that selects a particular font style or language up to the end of the line. It is placed before a byte code that specifies the selection.
- F0 hex: EXT
  - Vedic accents. The accent is selected by the following byte.
- E8 hex: Halant (Virama)
  - Removes the inherent vowel of the preceding consonant and joins consonants into clusters, e.g. क (ka) + ् (halant) + त (ta) = क्त (kta).
  - The sequence ् (halant) + ् (halant) produces an explicit halant, i.e. the halant stays visible instead of forming a conjunct, e.g. क (ka) + ् (halant) + ् (halant) + त (ta) = क्त.
  - The sequence ् (halant) + ़ (nukta) produces the half form of a consonant where one exists, e.g. क (ka) + ् (halant) + ़ (nukta) = क्.

ISCII | Unicode |
---|---|
Halant | Halant |
Halant + Halant | Halant + ZWNJ |
Halant + Nukta | Halant + ZWJ |

- E9 hex: Nukta
  - Produces less common characters that have no code of their own, e.g. क (ka) + ़ (nukta) = क़ (qa).
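The special sequences described in this list can be sketched as a small byte-to-Unicode converter. This is only an illustration under stated assumptions: the names `BASE` and `convert` are hypothetical, `BASE` is a tiny excerpt of the code table rather than the full mapping, a standalone INV is mapped to NBSP (the dotted circle would be the other option mentioned above), and ATR and EXT are not handled.

```python
HALANT, NUKTA, INV = 0xE8, 0xE9, 0xD9
ZWNJ, ZWJ, NBSP = "\u200C", "\u200D", "\u00A0"

# Excerpt of the code table, just enough for the examples in the list above.
BASE = {
    0xB3: "\u0915",    # ka
    0xC2: "\u0924",    # ta
    0xDB: "\u093F",    # vowel sign i
    HALANT: "\u094D",  # halant (virama)
    NUKTA: "\u093C",   # nukta
}


def convert(data: bytes) -> str:
    """Convert ISCII bytes to Unicode, applying the halant and INV rules."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        nxt = data[i + 1] if i + 1 < len(data) else None
        if b == HALANT and nxt == HALANT:
            # Halant + Halant: explicit halant -> Halant + ZWNJ
            out.append(BASE[HALANT] + ZWNJ)
            i += 2
        elif b == HALANT and nxt in (NUKTA, INV):
            # Halant + Nukta or Halant + INV: half form -> Halant + ZWJ
            out.append(BASE[HALANT] + ZWJ)
            i += 2
        elif b == INV:
            # Standalone INV: empty base character (assumption: NBSP)
            out.append(NBSP)
            i += 1
        elif b < 0x80:
            out.append(chr(b))  # lower half is ASCII
            i += 1
        else:
            out.append(BASE[b])  # KeyError for bytes outside the excerpt
            i += 1
    return "".join(out)


# ka + halant + halant + ta: the halant stays visible
print(convert(bytes([0xB3, 0xE8, 0xE8, 0xC2])))
```

Checking the pair rules before the single-byte rules matters here: a halant followed by a nukta must be consumed as one unit, otherwise the nukta would be emitted as an ordinary combining mark.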
See also
- TSCII (alternative standard for Tamil)
External links
- The ISCII standard (PDF, English, 258 kB)
- Further information from the Indian government