Drawing salad

from Wikipedia, the free encyclopedia
Example of the incorrect display of umlauts

With alphabet soup of state incorrectly shown will sign designates the mainly instead of the desired character in the Internet occurs. During the early days of the Internet, this condition occurred in almost all languages that use character sets beyond pure ASCII . These are almost all languages ​​except English (provided that foreign language diacritics such as naïve , café or résumé are consistently omitted). In German , for example, the umlauts and the ß were often affected. With the introduction of Unicode in 1991, a technical basis was created to remedy the problem in the long term, but data exchange procedures that do not prescribe a uniform character encoding will still cause such problems even in 2020.

In Japanese the problem is referred to as mojibake (Japanese 文字 化 化 , “letter transformation”), in Russian as krakosjábry ( кракозябры ) and in Chinese as luànmǎ ( 亂碼  /  乱码 , “confused coding”).

Examples

Output coding Web browser setting Display in the web browser
UTF-8 UTF-8 Improper practice of xylophone music torments every taller dwarf.
ISO 8859-1 Improper practice of xylophone music plagues every taller dwarf.
ISO 8859-1 ISO 8859-1 Improper practice of xylophone music torments every taller dwarf.
UTF-8 The wrong practice of xylophone music torments every taller dwarf.
Windows-1251 Windows-1251 Широкая электрификация южных губерний даст мощный толчок подъёму сельского хозяйства.
ISO 8859-1 Øèðîêàÿ ýëåêòðèôèêàöèÿ þæíûõ ãóáåðíèé äàñò ìîùíûé òîë ÷ îê ïîäú¸ìó ñåëüñêîãî õîçÿéñòâà.
KOI8-R ьХПНЙЮЪ ЩКЕЙРПХТХЙЮЖХЪ ЧФМШУ ЦСАЕПМХИ ДЮЯР ЛНЫМШИ РНКВНЙ ОНДЗ╦ЛС ЯЕКЭЯЙНЦН УНГЪИЯРБЮ.
ISO 8859-5 иш confirmation.
Code page 866 ╪шЁюър ¤ыхъЄЁшЇшърЎш ■ цэ√ї уєсхЁэшщ фрёЄ ью ∙ э√щ Єюыўюъ яюф · ╕ьє ёхы№ёъюую їюч щёЄтр.
Shift JIS Shift JIS 文字 化 け (も じ ば け) と は 、 コ ン ピ ュ ー タ で 文字 を 表示 す る 際 際 に 、 正 し く 表示 さ れ な い 現象 の こ と。
Macintosh novel ï∂éöâªÇØÅiÇ ‡ Ç∂ÇŒÇØÅjÇ∆ÇÕÅAÉRÉìÉsÉÖÅ [É ^ Ç≈ï∂éöÇï \ é¶Ç∑ÇÈç € Ç… ÅAê≥ÇµÇ ≠ ï \ é¶Ç≥ÇÍÇ »Ç ¢ åªè € Ç Ç ± B
KOI8 or KOI7 (Russian mode) KOI8 or KOI7 (Russian mode) Русский Текст
ASCII or KOI7 (Latin mode) RUSSKIJ TEKST

The KOI encodings offer a special feature, which the last example shows: If they are incorrectly interpreted as ASCII (and the most significant bit is ignored in the case of KOI8), a coarse Latin transliteration results with swapped upper and lower case letters. Because the Cyrillic alphabet has more letters than the Latin one, some Cyrillic letters become punctuation marks.

Coded data

Salad of characters can also be used deliberately to save or transmit any data in places where only certain characters are possible, for example when using a bank transfer or in Internet addresses .

The Base64 method is often used in Internet addresses for this purpose . It generates a text from any data, which only consists of the letters A – Z, a – z, the digits 0–9 and the special characters +, / and =. Data encoded with Base64 looks like this:

RGllc2VyIFRleHQgaXN0IG5pY2h0IHZlcnNjaGzDvHNzZWx0Lg ==

At first glance, it is not clear what data this mess of characters contains. However, if you apply the Base64 encoding backwards, you get this text:

This text is not encrypted.

Some websites use this coding technique in order not to reveal the actual data obviously in the URL.

Letter salad

Character salad / mojibake can be viewed as a special case of letter salad . These are generally to be understood as meaning strings of characters that are difficult or impossible to decipher and that may have arisen for other reasons besides an incorrect combination of different character encodings .

References and comments

  1. Here at least the replacement character provided by Unicode is used.

Web links