Homograph attack

from Wikipedia, the free encyclopedia

A homograph attack (also homographic attack or homograph phishing) is a method of spoofing in which the attacker exploits the similar appearance of different characters to present a false identity to computer users, especially in domain names. The attacker lures the user to a domain name that looks almost exactly like a well-known domain name but leads somewhere else, for example to a phishing website.

With the introduction of internationalized domain names, a large number of scripts beyond the ASCII character set became available for domain names, and many of them contain characters that resemble one another. This increases the possibilities for homograph attacks.

Homographs in ASCII

ASCII contains characters that look similar: the digit 0 resembles the letter O, and the letters l (lowercase L) and I (uppercase i) and the digit 1 resemble one another. (Such similar or identical characters are called homoglyphs; they can be used to write homographs, words that look the same but do not mean the same thing.)

Examples of possible spoofing attacks are the domain G00GLE.COM, which looks similar to GOOGLE.COM in some fonts, and googIe.com with an uppercase i instead of the lowercase L, which looks quite similar to google.com. PayPal was in fact the target of a phishing attack that used the domain PayPaI.com with an uppercase i.
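The ASCII lookalikes above can be checked mechanically. The following Python sketch is illustrative only: the table ASCII_CONFUSABLES and the functions skeleton and looks_like are hypothetical names, and the table covers just the confusions mentioned here (0 and O, and 1, l and I), not font-dependent pairs.

```python
# Hypothetical table: each confusable character maps to one representative.
ASCII_CONFUSABLES = {"0": "o", "1": "l", "I": "l"}


def skeleton(name: str) -> str:
    """Replace confusable characters, then lowercase everything else."""
    return "".join(ASCII_CONFUSABLES.get(ch, ch.lower()) for ch in name)


def looks_like(candidate: str, trusted: str) -> bool:
    """True if candidate is a different name that can render like trusted."""
    same_name = candidate.lower() == trusted.lower()
    return not same_name and skeleton(candidate) == skeleton(trusted)


for name in ("G00GLE.COM", "googIe.com", "GOOGLE.COM"):
    print(name, looks_like(name, "google.com"))
```

The comparison ignores plain differences in case, because domain names are case-insensitive and GOOGLE.COM is the same name as google.com.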

In some proportional fonts such as Tahoma (the default for the address bar in Windows XP), homographs arise when a c is placed in front of a j, l or i: cl resembles d, cj resembles g, and ci resembles a. The long s (ſ) is easily confused with f, but it is now treated as an “s” in URLs.

Homographs in internationalized domain names

In multilingual computer systems, logically different characters can look the same. For example, the Unicode character U+0430, Cyrillic small letter a (“а”), can look the same as the Unicode character U+0061, Latin small letter a (“a”).

The problem arises from the different ways in which the user and the software process the characters. From the user's point of view, a Cyrillic “а” within a Latin character string is simply a Latin “a”. In most fonts, there is no difference between the glyphs for these characters. However, the computer treats the characters as different when it processes the character string as an identifier. The user's assumption that there is a one-to-one relationship between the visual appearance of a name and the named item fails here.
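The difference that the software sees can be made visible with a few lines of Python. This is a minimal sketch using the standard unicodedata module; the helper describe and the constant names are hypothetical and exist only for this illustration.

```python
import unicodedata

LATIN_A = "\u0061"     # Latin small letter a
CYRILLIC_A = "\u0430"  # Cyrillic small letter a


def describe(ch: str) -> str:
    """Return the code point and the Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(LATIN_A))
print(describe(CYRILLIC_A))

# The two strings may render identically, yet compare as different identifiers.
print("p\u0430ypal.com" == "paypal.com")
```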

Internationalized domain names provide a backward-compatible way to use the full Unicode character set for domain names. This standard has already been largely implemented. However, it expands the character set from a few dozen to many thousands of characters, which considerably increases the scope for homograph attacks.
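Backward compatibility is achieved by converting each non-ASCII label into an ASCII form (Punycode with the prefix xn--) before the name is looked up. The following Python sketch shows the idea using the punycode codec from the standard library. It is a simplification: the functions to_ascii_label and to_ascii_domain are hypothetical names, and the normalization and validity checks that a real IDNA implementation performs are left out.

```python
ACE_PREFIX = "xn--"  # marks a label that carries Punycode-encoded Unicode


def to_ascii_label(label: str) -> str:
    """Encode one domain label; pure ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_ascii_domain(domain: str) -> str:
    """Encode every dot-separated label of a domain name."""
    return ".".join(to_ascii_label(label) for label in domain.split("."))


# The first "a" below is U+0430 (Cyrillic), written as an escape for clarity.
print(to_ascii_domain("p\u0430ypal.com"))
print(to_ascii_domain("paypal.com"))
```

In the encoded form the spoofed name no longer resembles the genuine one, which is why displaying Punycode is used as a defense.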

Evgeniy Gabrilovich and Alex Gontmakher from the Technion in Haifa published a paper in 2001 called The Homograph Attack. They describe a spoofing attack with Unicode URLs. To demonstrate the feasibility of this attack method, they successfully registered a variant of the microsoft.com domain that contained Russian letters.

Such problems had been anticipated even before IDN was introduced. Guidelines were issued to help registries avoid or reduce the problem. For example, it was recommended that registries accept only characters from the Latin alphabet and from the script of their own country, not the entire Unicode character set. However, this recommendation has been disregarded by significant top-level domains.

On February 7, 2005, Slashdot reported that the exploit had been disclosed at the Shmoocon hacker meeting. In web browsers that supported IDNA, the URL http://www.pаypal.com/, in which the first a is replaced by a Cyrillic а, appeared to lead to the website of the payment service PayPal, but in reality a different website was accessed.

Cyrillic

The Cyrillic alphabet is the one most commonly used for homograph attacks. The Cyrillic letters а, с, е, о, р, х and у look almost or exactly the same as the Latin letters a, c, e, o, p, x and y. The Cyrillic letters З, Ч and б resemble the digits 3, 4 and 6.
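The seven lowercase pairs listed above are enough to build a simple detector. The Python sketch below is only an illustration: the table CYRILLIC_TO_LATIN and the functions latin_lookalike and imitates are hypothetical names, and the table covers only the lowercase letters named in this paragraph.

```python
# Cyrillic lowercase letters from the list above and their Latin lookalikes.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # Cyrillic a
    "\u0441": "c",  # Cyrillic es
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0445": "x",  # Cyrillic ha
    "\u0443": "y",  # Cyrillic u
}


def latin_lookalike(label: str) -> str:
    """Replace each listed Cyrillic letter with the Latin letter it resembles."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in label)


def imitates(label: str, trusted: str) -> bool:
    """True if label differs from trusted but looks like it after replacement."""
    return label != trusted and latin_lookalike(label) == trusted


print(imitates("p\u0430ypal", "paypal"))
print(imitates("paypal", "paypal"))
```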

Italic typefaces create further possibilities for confusion: italic дтпи (дтпи in upright type) resembles g m n u (in many fonts, however, italic д instead resembles the partial derivative sign, ∂).

If capital letters are taken into account, В Н К М Т can be confused with B H K M T, as can the capital forms of the lowercase Cyrillic homoglyphs mentioned above.

Cyrillic letters not used in Russian, together with the Latin letters they can be mistaken for, are һ and h, і and i, ј and j, ѕ and s, and Ғ and F. The letters ё and ї can be used to imitate ë and ï.

Greek

In the Greek alphabet, only omicron (ο) and sometimes nu (ν) resemble Latin lowercase letters as used in URLs. In italic fonts, the Latin a resembles the Greek alpha (α).

If approximate similarity counts, the Greek letters εικηρτυωχγ can also be confused with eiknptuwxy. If capital letters are used, the list expands considerably: Greek ΑΒΕΗΙΚΜΝΟΡΤΧΥΖ looks identical to Latin ABEHIKMNOPTXYZ.

In some fonts, the Greek beta β can be confused with the German “sharp s” ß. Code page 437 of MS-DOS actually uses the ß in place of the β. The Greek small final sigma ς can be confused with the Latin small c with cedilla ç.

The accented Greek letters όίά look deceptively similar to óíá in many fonts, although the third letter, the alpha, resembles the Latin a only in some italic fonts.

Armenian

The Armenian alphabet also contains letters that are suitable for homograph attacks: ցհոօզս looks like ghnoqu, յ is similar to j (although it has no dot), and ք can look similar to p or f, depending on the font used. Two Armenian letters (Ձշ) may also look similar to the digit 2, and one (վ) sometimes looks similar to the digit 4.

However, using the Armenian alphabet is not easy. Most standard fonts contain Greek and Cyrillic characters but no Armenian ones. Under Windows, Armenian characters are therefore usually rendered in a separate font (Sylfaen), so that the mixture is visible. In addition, the Latin g and the Armenian ց are designed differently from each other in this font.

Hebrew

The Hebrew alphabet is rarely used for spoofing. Three of its characters are sufficiently suitable: samekh (ס) can resemble an o, vav with a diacritical dot (וֹ) resembles an i, and chet (ח) resembles an n. Other Hebrew letters resemble Latin characters less clearly and are therefore better suited to foreign branding than to homograph attacks.

Since Hebrew script is written from right to left, difficulties can arise when combining it in bidirectional text with characters written from left to right.

Cherokee

The Cherokee syllabary contains characters that are confusingly similar to Latin letters and Arabic numerals. The character strings ᎠᎡᎢᎥᎩᎪᎫᎬᎳᎵᎷᎻᏀᏃᏎᏒᏔᏚᏞᏟᏢᏦᏭᏮᏴ and DRTiYAJEWPMHGZ4RWSLCPK96B cannot be distinguished at first glance.

Protection

The simplest protective measure is for a web browser not to support IDNA and similar features, or for the user to switch these features off. This can mean that access to websites with internationalized domain names (IDN) is blocked. Usually, however, browsers allow access and display the URLs in Punycode. In both cases, the use of domain names with non-ASCII characters is prevented.

Opera displays IDNs in Punycode, unless the top-level domain (TLD) fends off homographic attacks by restricting the characters allowed in domain names. The browser allows the user to manually add TLDs to the allowed list.

Firefox from version 22 (2013) displays IDNs in Unicode if either the TLD restricts the characters allowed in domain names or all characters of a label come from a single writing system. Otherwise, IDNs are shown in Punycode.
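A single-script rule of this kind can be sketched in Python. The code below is not the algorithm of any browser. It ignores the TLD condition, and it approximates the writing system of a letter by the first word of its Unicode character name, an assumption that works for Latin, Cyrillic and Greek but is not the official Unicode script property. The functions scripts_in and display_form are hypothetical names.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the writing systems used by the letters of a label."""
    return {
        unicodedata.name(ch, "UNKNOWN").split(" ")[0]
        for ch in label
        if ch.isalpha()  # digits and hyphens belong to no particular script
    }


def display_form(label: str) -> str:
    """Show single-script labels as they are and mixed labels in Punycode."""
    if len(scripts_in(label)) <= 1:
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


print(display_form("paypal"))        # Latin only, shown unchanged
print(display_form("p\u0430ypal"))   # Latin mixed with Cyrillic, shown encoded
```

A rule like this does not catch a label written entirely in one foreign script, which is why browsers combine it with further checks.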

Internet Explorer 7 allows IDNs, but not labels that mix the writing systems of different languages. Such mixed labels are displayed in Punycode. Exceptions are made for locales in which it is common to use ASCII letters mixed with the local writing system.

As an additional measure of protection, Internet Explorer 7, Firefox 2.0 and above, and Opera 9.10 contain phishing filters that attempt to warn users when malicious websites are visited.

One possible protection method that has been proposed in the English-speaking world would be for web browsers to mark non-ASCII characters in URLs, for example with a differently colored background. That would not protect against one non-ASCII character being replaced by another, similar non-ASCII character (e.g. a Greek ο by a Cyrillic о). A more far-reaching solution that avoids this weakness would be to use a different color for each writing system that occurs.
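The per-script marking proposed here amounts to splitting a URL into runs of characters that share a writing system, so that each run can be given its own color. The Python sketch below only computes the runs and leaves the coloring to the user interface. As an assumption for this illustration, the script of a non-ASCII character is taken from the first word of its Unicode name; script_of and script_runs are hypothetical names.

```python
import unicodedata
from itertools import groupby


def script_of(ch: str) -> str:
    """Classify a character as ASCII or by the first word of its Unicode name."""
    if ch.isascii():
        return "ASCII"
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def script_runs(text: str) -> list:
    """Group consecutive characters that belong to the same script."""
    return [
        (script, "".join(chars))
        for script, chars in groupby(text, key=script_of)
    ]


# The second character is U+0430 (Cyrillic a); it forms a run of its own.
for script, run in script_runs("p\u0430ypal.com"):
    print(script, run)
```

Because Greek ο and Cyrillic о fall into different runs, a display built on this grouping would distinguish them as well.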

Certain fonts render homoglyphs differently and can help to identify characters that do not belong in a URL. For example, Courier New distinguishes some characters that look the same in other fonts. However, it is not yet easy for the typical user to change the font of the address bar.

Safari's approach is to display problematic character sets in Punycode. This can be changed by editing settings in the Mac OS X system files.

The introduction of internationalized country code top-level domains (ccTLDs) will make spoofing more difficult. For example, the future Russian TLD “.рф” will accept only Cyrillic domain names and will not allow any mixture with Latin letters. However, the problem persists with generic TLDs such as ".com".

References

  1. Evgeniy Gabrilovich, Alex Gontmakher: The Homograph Attack. (PDF; 73 kB) In: Communications of the ACM, 45 (2), February 2002, p. 128.
  2. IDN spoof demo (memento of the original from March 20, 2005 in the Internet Archive). shmoo.com.
  3. Advisory: Internationalized domain names (IDN) can be used for spoofing. Opera. Retrieved May 8, 2010.
  4. Opera's Settings File Explained: IDNA White List. Opera Software. Retrieved May 8, 2010.
  5. Mozilla: IDN Display Algorithm. Retrieved February 21, 2018.
  6. Bugzilla.Mozilla.org: Bug 722299. Retrieved February 21, 2018.
  7. Changes to IDN in IE7 to now allow mixing of scripts. Microsoft. Retrieved May 8, 2010.
  8. Phishing Filter in IE7. Microsoft. Retrieved May 8, 2010.
  9. Firefox 2 Phishing Protection. Mozilla. Retrieved May 8, 2010.
  10. Opera Fraud Protection. Opera Software. Retrieved May 8, 2010.
  11. About Safari International Domain Name support. Retrieved May 8, 2010.