Internationalized domain name

from Wikipedia, the free encyclopedia

Internationalized domain names (IDN), colloquially also called umlaut domains or special-character domains, are domain names that contain umlauts, diacritics, or characters from alphabets other than the Latin alphabet. Such characters were not originally provided for in the Domain Name System and were made possible later by the Internet standard Internationalizing Domain Names in Applications (IDNA).

In principle, almost all Unicode characters are permitted in IDNs. However, each domain registry decides individually which characters it allows in registrations.

IDNs account for around four percent of all domains registered under .de.

Functionality

Unicode domain names are converted to an ASCII-compatible encoding (ACE). The conversion takes place in the client (e.g. the browser or mail program), so the server infrastructure does not have to be adapted. Instead of the Unicode string, the user can also enter the ACE string directly in the client. This means that clients without IDN support can also work with internationalized domains, provided the user knows the ACE string. However, this is more cumbersome, because the Unicode domain name cannot easily be read from an ACE string.
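The client-side conversion described above can be sketched in Python. This is only an illustration, assuming the standard library's built-in idna codec, which implements the older IDNA2003 rules (RFC 3490). The helper names to_ace and to_unicode are made up for this example.

```python
def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible encoding (ACE)."""
    # The built-in "idna" codec follows IDNA2003 (RFC 3490) and works label by label.
    return domain.encode("idna").decode("ascii")


def to_unicode(ace_domain: str) -> str:
    """Convert an ACE domain name back to its Unicode form."""
    return ace_domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ace("dömäin.example"))              # xn--dmin-moa0i.example
    print(to_unicode("xn--dmin-moa0i.example"))  # dömäin.example
```

Because the result is plain ASCII, it can be passed unchanged to resolvers and servers that know nothing about IDNs.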

In the original IDNA2003 procedure (RFC 3490), domain names were first normalized using the Nameprep procedure. Normalization consisted of replacing all uppercase letters with lowercase letters and substituting equivalent characters. For example, “ß” was defined as equivalent to “ss”, so the domain names “STRaße” and “strasse” were identical. In the newer version IDNA2008, sometimes also called IDNAbis and developed from 2008 to 2010 (RFC 5890, RFC 5891, RFC 5892, RFC 5893, RFC 5894), normalization is no longer part of IDNA but is the responsibility of the user interface. IDNA2008 no longer prescribes normalization; it only recommends a general algorithm that still provides for converting uppercase to lowercase letters and a few other rules. Under .de, it has been possible since November 16, 2010 (even earlier for holders of the corresponding “ss” domain) to register separate domains containing “ß”.
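The effect of IDNA2003 normalization can be observed with a small Python sketch. It assumes the standard library's idna codec, which applies Nameprep before encoding; the function name same_name_idna2003 is hypothetical and chosen only for this illustration.

```python
def same_name_idna2003(a: str, b: str) -> bool:
    """Return True if two names are identical after IDNA2003 processing."""
    # The built-in "idna" codec runs Nameprep first: it lowercases the
    # name and maps equivalent characters such as "ß" to "ss".
    return a.encode("idna") == b.encode("idna")


if __name__ == "__main__":
    print("STRaße.example".encode("idna"))                           # b'strasse.example'
    print(same_name_idna2003("STRaße.example", "strasse.example"))   # True
```

A processor that follows IDNA2008 without this mapping would treat the two names as different.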

After normalization, Punycode removes the non-ASCII characters from the name and appends an ASCII string derived from them to the end of the name; this string encodes the position and identity of each Unicode character. To distinguish an IDN from an ASCII domain name, the Punycode string is given the prefix xn--. The unusual character string xn-- was chosen because it practically never occurs in real words or proper names, so conflicts with ASCII domains are extremely unlikely.
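The Punycode step and the xn-- prefix can be illustrated for a single label as follows. This is a minimal sketch, assuming Python's built-in punycode codec; it deliberately performs no normalization or validity checks, and the names ACE_PREFIX, encode_label and decode_label are invented for the example.

```python
ACE_PREFIX = "xn--"


def encode_label(label: str) -> str:
    """Encode one domain label; pure ASCII labels are returned unchanged."""
    if label.isascii():
        return label
    # Punycode keeps the ASCII characters, then appends a suffix that
    # encodes which non-ASCII characters were removed and where.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def decode_label(label: str) -> str:
    """Decode one label if it carries the ACE prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


if __name__ == "__main__":
    print(encode_label("fußball"))         # xn--fuball-cta
    print(decode_label("xn--fuball-cta"))  # fußball
```

In xn--fuball-cta, the part before the last hyphen holds the ASCII characters of the label, and cta encodes the "ß" and its position.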

Incompatibilities of IDNA2003 and IDNA2008

Unicode Technical Standard #46 describes measures intended to minimize the incompatibilities between IDNA2003 and IDNA2008 in practice and thus ease the transition from IDNA2003 to IDNA2008. But even three years after its introduction, browser support for IDNA2008 was still poor (see also the section Support in the browser): since IDNA2003 converts “ß” to “ss”, the new “ß” domains are often unreachable or resolve to the existing “ss” domains. As long as the “ß” domain and the “ss” domain belong to the same site, the user usually notices nothing; if they belong to different sites, however, this can cause confusion.
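The following Python sketch shows how a client could detect names that are affected by this difference. It is only an approximation: ace_idna2003 relies on the standard library's idna codec (IDNA2003), while ace_without_mapping merely lowercases the name and skips the IDNA2003 character mapping. It is not an IDNA2008 implementation and performs no validity checks. All function names are hypothetical.

```python
def ace_idna2003(domain: str) -> str:
    """ACE form under IDNA2003, using Python's built-in codec."""
    return domain.encode("idna").decode("ascii")


def ace_without_mapping(domain: str) -> str:
    """ACE form when characters such as 'ß' are kept instead of mapped."""
    labels = []
    for label in domain.lower().split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def resolves_differently(domain: str) -> bool:
    """True if the two processing styles yield different ASCII names."""
    return ace_idna2003(domain) != ace_without_mapping(domain)


if __name__ == "__main__":
    print(ace_idna2003("fußball.example"))         # fussball.example
    print(ace_without_mapping("fußball.example"))  # xn--fuball-cta.example
    print(resolves_differently("fußball.example")) # True
```

Names for which resolves_differently returns True are the ones that may lead to different sites depending on which IDNA version the client follows.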

In addition, IDNA2008 no longer allows about 8000 Unicode characters that were valid in domain names under IDNA2003, so previously valid domain names containing these characters become invalid in the switch from IDNA2003 to IDNA2008.

Example domains

dömäin.example           → xn--dmin-moa0i.example
äaaa.example             → xn--aaa-pla.example
aäaa.example             → xn--aaa-qla.example
aaäa.example             → xn--aaa-rla.example
aaaä.example             → xn--aaa-sla.example
déjà.vu.example          → xn--dj-kia8a.vu.example
efraín.example           → xn--efran-2sa.example
ñandú.example            → xn--and-6ma2c.example
foo.âbcdéf.example       → foo.xn--bcdf-9na9b.example
موقع.وزارة-الاتصالات.مصر   → xn--4gbrim.xn----ymcbaaajlc6dj7bxne2c.xn--wgbh1c
☃.example                → xn--n3h.example (allowed under IDNA2003, but not permitted under IDNA2008)
fußball.example          → xn--fuball-cta.example (mandatorily mapped to fussball.example under IDNA2003, but not under IDNA2008)

A Whois query of the form whois -h whois.denic.de -- -C ISO-8859-1 example.com, or whois -h whois.denic.de -- -C UTF-8 example.com on Unicode-based systems, returns, among other things, the Punycode spelling for registered domains.

Character sets

IDN top-level domains have existed since May 2010, making it possible to have complete domain names made up of non-Latin letters. For example, there is the top-level domain .مصر, the Arabic word for Egypt (Misr); the website of the Egyptian Ministry of Communication and Information Technology can be reached via the domain موقع.وزارة-الاتصالات.مصر, which consists entirely of Arabic characters. In keeping with Arabic script, the domain name is read from right to left.
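An internationalized top-level domain is handled like any other label. As a small illustration, the ACE form xn--wgbh1c from the example list above can be decoded with Python's built-in idna codec (IDNA2003); the function name tld_to_unicode is made up for this sketch.

```python
import unicodedata


def tld_to_unicode(ace_tld: str) -> str:
    """Decode the ACE form of a top-level domain to Unicode."""
    return ace_tld.encode("ascii").decode("idna")


if __name__ == "__main__":
    tld = tld_to_unicode("xn--wgbh1c")
    # List the characters in logical (storage) order; a display that
    # supports bidirectional text renders them from right to left.
    for ch in tld:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The characters are stored in logical order; the right-to-left appearance is a matter of rendering, not of the encoding.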

Below is a list of some top-level domains and the non-ASCII characters allowed in their IDN domains:

.com and .net
à á â ã ä å æ ā ă ą ç ć ĉ ċ č ď đ è é ê ë ē ĕ ė ę ě ĝ ğ ġ ģ ĥ ħ ì í î ï ĩ ī ĭ į ı ð ĵ ñ ĸ ĺ ļ ľ ł ł ĸ ĺ ļ ń ņ ň ŋ ò ó ô õ ö ø ō ŏ ő œ ŕ ŗ ř ś ŝ ş š ţ ť ŧ ù ú û ü ũ ū ŭ ů ű ų ŵ ý ŷ ÿ ź ż ž þ
.info
á ä å æ ā ą ć č é ē ė ę ģ í ī į ð ķ ļ ł ñ ń ņ ó ö ø ō ő ŗ ś š ú ü ū ű ų ý ź ż ž þ
.org
ä ö ü
.at
à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø œ š ù ú û ü ý я ž þ
.ch and .li
à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø œ ù ú û ü ý ÿ þ
.de
à á â ã ä å æ ā ă ą ç ć ĉ ċ č ď đ è é ê ë ē ĕ ė ę ě ĝ ğ ġ ģ ĥ ħ ì í î ï ĩ ī ĭ į ı ð ĵ ñ ĸ ĺ ļ ľ ł ł ĸ ĺ ļ ń ņ ň ŋ ò ó ô õ ö ø ō ŏ ő œ ŕ ŗ ř ś ŝ ş š ţ ť ŧ ù ú û ü ũ ū ŭ ů ű ų ŵ ý ŷ ÿ ź ż ž þ ß
.eu
à á â ã ä å æ ā ă ą ç ć ĉ ċ č ď đ è é ê ë ē ĕ ė ę ě ĝ ğ ġ ģ ĥ ħ ì í î ï ĩ ī ĭ į ı ð ĵ ñ ĺ ļ ľ ŀ ł ł ĺ ļ ľ ń ņ ň ʼn ŋ ò ó ô õ ö ø ō ŏ ő œ ŕ ŗ ř ś ŝ š ș ť ŧ ț ù ú û ü ũ ū ŭ ů ű ų ŵ ý ŷ ÿ ź ż ž þ ΐ ά έ ή ί ΰ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω ϊ ϋ ό ύ ώ а б в г д е ж з и й к л м у о п р с т т о п р с т т х ц ч ш щ ъ ы ь э ю я ἀ ἁ ἂ ἃ ἄ ἅ ἆ ἇ ἐ ἑ ἒ ἓ ἔ ἕ ἠ ἡ ἢ ἣ ἤ ἥ ἦ ἧ ἰ ἱ ἲ ἳ ἴ ἵ ἶ ἷ ὀ ὁ ὂ ὃ ὑ ὒ ὐ ὓ ὔ ὕ ὖ ὗ ὠ ὡ ὢ ὣ ὤ ὥ ὦ ὧ ὰ ά ὲ έ ὴ ή ὶ ί ί ὸ ό ὺ ύ ὼ ώ ᾀ ᾀ ᾂ ᾂ ᾃ ᾄ ᾅ ᾆ ᾇ ᾐ ᾑ ᾒ ᾓ ᾔ ᾕ ᾖ ᾗ ᾠ ᾡ ᾢ ᾣ ᾧ ᾰ ᾱ ᾲ ᾳ ᾴ ᾶ ᾷ ῂ ῃ ῄ ῆ ῇ ῐ ῑ ῒ ΐ ῖ ῗ ῠ ῡ ῢ ΰ ῤ ῥ ῥ ῦ ῧ ῲ ῳ ῴ ῶ ῷ

Support in the browser

Support for internationalized domain names is common in current browsers, at least according to IDNA2003. By contrast, as of 2013 hardly any browser supported IDNA2008.

Some IDNA2003-capable browsers:

Some IDNA2008-capable browsers (as of December 2016):

  • Firefox (since Firefox Nightly 46.0a1)
  • Safari from version 10.1 (from Safari Technology Preview 19)

ASCII spoofing problem (→ homograph attack)

The use of Unicode in domain names makes it easier to spoof web pages, because, depending on the character set used, the visual representation of the IDN string in a browser can make it impossible to distinguish a legitimate page from a spoofed one. For example, the Unicode character U+0430, the Cyrillic lowercase а, looks like the Unicode character U+0061, the lowercase letter a of the Latin writing system. This Cyrillic character is, for example, part of the list above of characters permitted under .eu.
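One simple countermeasure is to flag labels that mix several scripts. The sketch below is a rough heuristic, not how any particular browser works: it assumes that the first word of a character's Unicode name (for example LATIN or CYRILLIC) identifies its script. The function names scripts_in and is_mixed_script are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Return the scripts used by the letters of a label (heuristic)."""
    # The Unicode name of a letter starts with its script,
    # e.g. "LATIN SMALL LETTER A" or "CYRILLIC SMALL LETTER A".
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def is_mixed_script(label: str) -> bool:
    """True if the letters of a label come from more than one script."""
    return len(scripts_in(label)) > 1


if __name__ == "__main__":
    spoofed = "ex\u0430mple"  # the third letter is Cyrillic U+0430, not Latin "a"
    print(spoofed == "example")      # False
    print(scripts_in(spoofed))
    print(is_mixed_script(spoofed))  # True
```

A check of this kind does not catch names written entirely in one look-alike script, so it can only complement registry-level character restrictions.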
