Drawing salad
With alphabet soup of state incorrectly shown will sign designates the mainly instead of the desired character in the Internet occurs. During the early days of the Internet, this condition occurred in almost all languages that use character sets beyond pure ASCII . These are almost all languages except English (provided that foreign language diacritics such as naïve , café or résumé are consistently omitted). In German , for example, the umlauts and the ß were often affected. With the introduction of Unicode in 1991, a technical basis was created to remedy the problem in the long term, but data exchange procedures that do not prescribe a uniform character encoding will still cause such problems even in 2020.
In Japanese the problem is referred to as mojibake (Japanese 文字 化 化 , “letter transformation”), in Russian as krakosjábry ( кракозябры ) and in Chinese as luànmǎ ( 亂碼 / 乱码 , “confused coding”).
Examples
Output coding | Web browser setting | Display in the web browser |
---|---|---|
UTF-8 | UTF-8 | Improper practice of xylophone music torments every taller dwarf. |
ISO 8859-1 | Improper practice of xylophone music plagues every taller dwarf. | |
ISO 8859-1 | ISO 8859-1 | Improper practice of xylophone music torments every taller dwarf. |
UTF-8 | The wrong practice of xylophone music torments every taller dwarf. | |
Windows-1251 | Windows-1251 | Широкая электрификация южных губерний даст мощный толчок подъёму сельского хозяйства. |
ISO 8859-1 | Øèðîêàÿ ýëåêòðèôèêàöèÿ þæíûõ ãóáåðíèé äàñò ìîùíûé òîë ÷ îê ïîäú¸ìó ñåëüñêîãî õîçÿéñòâà. | |
KOI8-R | ьХПНЙЮЪ ЩКЕЙРПХТХЙЮЖХЪ ЧФМШУ ЦСАЕПМХИ ДЮЯР ЛНЫМШИ РНКВНЙ ОНДЗ╦ЛС ЯЕКЭЯЙНЦН УНГЪИЯРБЮ. | |
ISO 8859-5 | иш confirmation. | |
Code page 866 | ╪шЁюър ¤ыхъЄЁшЇшърЎш ■ цэ√ї уєсхЁэшщ фрёЄ ью ∙ э√щ Єюыўюъ яюф · ╕ьє ёхы№ёъюую їюч щёЄтр. | |
Shift JIS | Shift JIS | 文字 化 け (も じ ば け) と は 、 コ ン ピ ュ ー タ で 文字 を 表示 す る 際 際 に 、 正 し く 表示 さ れ な い 現象 の こ と。 |
Macintosh novel | ï∂éöâªÇØÅiÇ ‡ Ç∂ÇŒÇØÅjÇ∆ÇÕÅAÉRÉìÉsÉÖÅ [É ^ Ç≈ï∂éöÇï \ é¶Ç∑ÇÈç € Ç… ÅAê≥ÇµÇ ≠ ï \ é¶Ç≥ÇÍÇ »Ç ¢ åªè € Ç Ç ± B | |
KOI8 or KOI7 (Russian mode) | KOI8 or KOI7 (Russian mode) | Русский Текст |
ASCII or KOI7 (Latin mode) | RUSSKIJ TEKST |
The KOI encodings offer a special feature, which the last example shows: If they are incorrectly interpreted as ASCII (and the most significant bit is ignored in the case of KOI8), a coarse Latin transliteration results with swapped upper and lower case letters. Because the Cyrillic alphabet has more letters than the Latin one, some Cyrillic letters become punctuation marks.
Coded data
Salad of characters can also be used deliberately to save or transmit any data in places where only certain characters are possible, for example when using a bank transfer or in Internet addresses .
The Base64 method is often used in Internet addresses for this purpose . It generates a text from any data, which only consists of the letters A – Z, a – z, the digits 0–9 and the special characters +, / and =. Data encoded with Base64 looks like this:
- RGllc2VyIFRleHQgaXN0IG5pY2h0IHZlcnNjaGzDvHNzZWx0Lg ==
At first glance, it is not clear what data this mess of characters contains. However, if you apply the Base64 encoding backwards, you get this text:
- This text is not encrypted.
Some websites use this coding technique in order not to reveal the actual data obviously in the URL.
Letter salad
Character salad / mojibake can be viewed as a special case of letter salad . These are generally to be understood as meaning strings of characters that are difficult or impossible to decipher and that may have arisen for other reasons besides an incorrect combination of different character encodings .
References and comments
- ↑ Here at least the replacement character provided by Unicode is used.
Web links
- Michael Rollins: Avoiding the 'mojibake' bugaboo . The Japan Times, February 27, 2003 (English)
- Paul Hastings: Do You Want Coffee with That Mojibake? Character encodings and CFMX. Coldfusion Developer's Journal, April 13, 2004
- Tomohiro Kubota: What is Mojibake? ( Memento of May 24, 2008 in the Internet Archive ) with examples from the years 2000 and 2003 (English)
- John de Hoog: Avoiding Mojibake . (English)