KOI8-R

from Wikipedia, the free encyclopedia

KOI8-R, a member of the KOI8 family, is an 8-bit character encoding of the Cyrillic alphabet as used for the Russian language.

KOI8-R is a superset of ASCII and therefore also contains the 26 letters of the Latin alphabet. The encoding can also be used for Bulgarian. A related character encoding, KOI8-U, was designed for Ukrainian and adds the four additional letters that language requires.
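The relationship between ASCII, KOI8-R and KOI8-U can be checked with a short sketch. It assumes the `koi8_r` and `koi8_u` codecs shipped with the Python standard library; the names `differing_bytes`, `is_ascii_compatible` and `koi8_u_changes` are introduced here purely for illustration.

```python
def differing_bytes():
    """Return {byte value: (KOI8-R char, KOI8-U char)} for bytes the two codecs decode differently."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        r = raw.decode("koi8_r")
        u = raw.decode("koi8_u")
        if r != u:
            diffs[value] = (r, u)
    return diffs

# Pure ASCII text should produce identical bytes in ASCII and in KOI8-R.
ascii_text = "Plain ASCII text, 123!"
is_ascii_compatible = ascii_text.encode("koi8_r") == ascii_text.encode("ascii")
print("ASCII-compatible:", is_ascii_compatible)

# List the code points that KOI8-U reassigns (printed as code points so the
# output does not depend on the terminal's encoding).
koi8_u_changes = differing_bytes()
for value, (r, u) in sorted(koi8_u_changes.items()):
    print(f"0x{value:02X}: KOI8-R U+{ord(r):04X} -> KOI8-U U+{ord(u):04X}")
```

With these codecs, every differing byte lies in the upper half (0x80 and above), and each Ukrainian letter appears once in lower case and once in upper case.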

KOI8 is the Russian abbreviation for "Kod Obmena Informatsiey, 8 bit" (Код Обмена Информацией, 8 бит), which translates as "Code for Information Exchange, 8 bit".

KOI8-R is described in RFC 1489, is registered with IANA, and is approved for use with MIME.

The KOI8 character encodings do not arrange the Cyrillic letters in their natural alphabetical order, but in the alphabetical order of the Latin letters that result from a rough transliteration. As a consequence, Cyrillic text remains legible (with some difficulty) as a Latin transliteration when the most significant bit (MSB) of each byte is cleared. The assignment was chosen so that upper and lower case are swapped in the process. For example, Русский Текст becomes rUSSKIJ tEKST when the MSB is cleared.
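The effect of dropping the most significant bit can be reproduced in a few lines. This is a minimal sketch that assumes Python's built-in `koi8_r` codec; the helper name `strip_high_bit` is hypothetical.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear bit 7 of every byte, and read the result as ASCII."""
    encoded = text.encode("koi8_r")
    stripped = bytes(b & 0x7F for b in encoded)  # 0x7F keeps the low 7 bits only
    return stripped.decode("ascii")

print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

For instance, Р is stored as 0xF2, and 0xF2 & 0x7F = 0x72, which is the ASCII code of the lower-case letter r.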

Today this property is little more than a historical curiosity: almost all transmission paths are now 8-bit clean, and the "automatic" transliteration is inferior to a proper one.

Alternatives to KOI8 are Windows-1251, ISO 8859-5 and Unicode.

Table

The row label gives the high nibble of the byte and the column label the low nibble.

      x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x    (control characters)
1x    (control characters)
2x    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3x    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4x    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5x    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6x    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7x    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8x    ─    │    ┌    ┐    └    ┘    ├    ┤    ┬    ┴    ┼    ▀    ▄    █    ▌    ▐
9x    ░    ▒    ▓    ⌠    ■    ∙    √    ≈    ≤    ≥    NBSP ⌡    °    ²    ·    ÷
Ax    ═    ║    ╒    ё    ╓    ╔    ╕    ╖    ╗    ╘    ╙    ╚    ╛    ╜    ╝    ╞
Bx    ╟    ╠    ╡    Ё    ╢    ╣    ╤    ╥    ╦    ╧    ╨    ╩    ╪    ╫    ╬    ©
Cx    ю    а    б    ц    д    е    ф    г    х    и    й    к    л    м    н    о
Dx    п    я    р    с    т    у    ж    в    ь    ы    з    ш    э    щ    ч    ъ
Ex    Ю    А    Б    Ц    Д    Е    Ф    Г    Х    И    Й    К    Л    М    Н    О
Fx    П    Я    Р    С    Т    У    Ж    В    Ь    Ы    З    Ш    Э    Щ    Ч    Ъ

According to RFC 1489, the byte 0x95 should map to Unicode U+2219 (∙). In practice it is often converted to U+2022 (•) instead, for compatibility with code page 1251.
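The two mappings of byte 0x95 can be compared directly. The sketch below assumes Python's built-in `koi8_r` and `cp1251` codecs; `to_koi8_r_lenient` is a hypothetical helper, not part of any library, that shows one way to accept the bullet U+2022 when encoding to KOI8-R.

```python
raw = b"\x95"
koi8_r_char = raw.decode("koi8_r")   # mapping per RFC 1489
cp1251_char = raw.decode("cp1251")   # mapping in code page 1251

# Print code points rather than glyphs so the output is terminal-independent.
print(f"KOI8-R 0x95 -> U+{ord(koi8_r_char):04X}")
print(f"CP1251 0x95 -> U+{ord(cp1251_char):04X}")

def to_koi8_r_lenient(text: str) -> bytes:
    """Encode to KOI8-R, treating U+2022 (bullet) as U+2219 (bullet operator).

    Hypothetical helper: a strict KOI8-R encoder has no byte for U+2022.
    """
    return text.replace("\u2022", "\u2219").encode("koi8_r")
```

A strict `"\u2022".encode("koi8_r")` raises `UnicodeEncodeError`, which is why converters that want to round-trip text from code page 1251 tend to special-case this one character.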
