Uniform Resource Locator

from Wikipedia, the free encyclopedia

A Uniform Resource Locator (abbreviated URL) identifies and locates a resource, such as a web page, by specifying the access method to be used (for example, the network protocol, such as HTTP or FTP) and the location of the resource in computer networks. The original standard was published in December 1994 as RFC 1738; it has since been made obsolete by several later RFCs. The current RFCs are (as of 2016):

  • RFC 3986: Uniform Resource Identifier (URI): Generic Syntax.
  • RFC 4248: The telnet URI Scheme.
  • RFC 4266: The gopher URI Scheme.
  • RFC 6068: The 'mailto' URI Scheme.
  • RFC 6196: Moving mailserver: URI Scheme to Historic.
  • RFC 6270: The 'tn3270' URI Scheme.

URLs are a subtype of the general identification scheme of Uniform Resource Identifiers (URIs). Since URLs are the first and most common type of URI, the two terms are often used interchangeably. Colloquially, URLs are also called Internet addresses or web addresses, which (following the colloquial equation of Internet and WWW) usually means the URLs of websites specifically.

Structure

The basic structure of a URL consists of a scheme name, which defines the access method, and a scheme-specific part, separated by a colon:

<scheme>:<scheme-specific-part>

where the scheme is often, but not necessarily, the same as the underlying network protocol (this is the case for ftp and http, for example, but not for mailto or file).
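
A minimal sketch of this two-part split in Python, assuming that everything before the first colon is the scheme. The function name split_scheme is hypothetical and chosen only for illustration; the sample URLs are taken from the examples in this article.

```python
def split_scheme(url: str) -> tuple[str, str]:
    """Split a URL into (scheme, scheme-specific part) at the first colon."""
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError(f"no scheme separator found in {url!r}")
    return scheme, rest


for example in (
    "https://www.example.com:8080/index.html",
    "mailto:max@example.org",
    "news:alt.hypertext",
):
    print(split_scheme(example))
```

Only the first colon is significant here, so a later colon (such as the one before a port number) stays in the scheme-specific part.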

Possible URL parts are, for example, for http:

       |-------------------- Scheme-specific part --------------------|

 https://max:muster@www.example.com:8080/index.html?p1=A&p2=B#ressource
 \___/   \_/ \____/ \_____________/ \__/\_________/ \_______/ \_______/
   |      |    |           |         |       |          |         |
Scheme⁺   | Password      Host      Port    Path      Query    Fragment
         User

⁺ (here identical to the network protocol)

for mailto:

mailto:max@example.org
\____/ \______________/
   |          |
Scheme⁺       |
        Email address according to RFC 5322

⁺ (here not a network protocol)

for news (in this example neither a network protocol nor a host address is included):

 news:alt.hypertext
 \__/ \___________/
   |        |
Scheme      |
       Name of the newsgroup

for file:

 file:///verzeichnis/unterverzeichnis/datei
 \__/ \___________________________________/
   |                    |
Scheme                  |
       Path to a local file in the file system of the computer that interprets the URL

Strictly speaking, this scheme has the form file://<host>/<path>, but the host part is rarely used in practice: the file scheme can hardly be used meaningfully over a network because it offers no way to specify a network protocol for accessing the file. File URLs are used in the Java programming language, for example, to access local files. Depending on the browser, file links can often only be opened after special client-side configuration or with the help of add-ons.
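
The component breakdowns shown in the diagrams above can be reproduced with the urlsplit function from Python's standard urllib.parse module. The following is an illustrative sketch rather than a reference implementation; the helper name describe and the EXAMPLES list are assumptions introduced here.

```python
from urllib.parse import urlsplit

# The four example URLs from the diagrams above.
EXAMPLES = [
    "https://max:muster@www.example.com:8080/index.html?p1=A&p2=B#ressource",
    "mailto:max@example.org",
    "news:alt.hypertext",
    "file:///verzeichnis/unterverzeichnis/datei",
]


def describe(url: str) -> dict:
    """Return the named components of a URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


for url in EXAMPLES:
    print(url)
    for name, value in describe(url).items():
        if value:  # skip components that are absent or empty
            print(f"  {name:9} {value}")
```

For the mailto, news, and file examples only the scheme and the path are reported, which matches the diagrams: none of them carries a host.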

Scheme

Specifies the technical method by which the resource is to be addressed. It is usually, but not necessarily, identical to the network protocol through which the resource can be reached. Examples are HTTP, HTTPS, or FTP, but also mailto (for writing an email) or file (for accessing local files).

Scheme-specific part

Depending on the scheme, different information is required and permitted. In most cases this part begins with the string //, but for some schemes only the colon is defined. The following examples relate to the Hypertext Transfer Protocol (HTTP).

User and password

If required, login information consisting of a user name (user) and a password (password) can be transmitted. The two are separated from each other by a colon and placed before the host, followed by an at sign (@).

Although HTTP was chosen for this example, specifying the user name and password as part of the URL is not part of the HTTP specification. Current browsers accept this URL syntax but ask the user whether they really want to log in with the specified data. Internet Explorer 6 (Windows XP SP2) and later are the exception: they flatly reject this URL syntax as invalid. A registry entry can force them to behave like their predecessors up to version 5.5, which accept the login data without asking and transmit it directly to the server.

With some other protocols, such as FTP, specifying the user data in the form shown is, however, entirely correct and covered by the standards.
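
Because clients treat embedded login data so differently, a program may want to separate it from the rest of the URL. The sketch below uses urlsplit and urlunsplit from Python's standard library; the function name split_credentials is hypothetical. Note that the hostname attribute returns the host in lower case.

```python
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (user, password, url_without_credentials)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # an IPv6 literal must be re-wrapped in brackets
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return parts.username, parts.password, clean


print(split_credentials("ftp://max:muster@ftp.example.com"))
# ('max', 'muster', 'ftp://ftp.example.com')
```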

Host

The host component is written either as an IPv4 address in dotted decimal notation, as an IPv6 address in hexadecimal notation separated by colons and enclosed in square brackets, or as an FQDN.
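
The three notations can be told apart programmatically. The following sketch combines urlsplit with the standard ipaddress module; the function name classify_host and the sample addresses (taken from the ranges reserved for documentation) are assumptions made for illustration. Anything that is not an IP literal is simply reported as a name, without checking that it is a valid FQDN.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_host(url):
    """Report whether the host of a URL is an IPv4 literal, an IPv6 literal, or a name."""
    host = urlsplit(url).hostname  # brackets around IPv6 literals are removed
    if host is None:
        return "none"
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return "name"
    return f"IPv{address.version}"


print(classify_host("http://192.0.2.1/"))           # IPv4
print(classify_host("http://[2001:db8::1]:8080/"))  # IPv6
print(classify_host("http://www.example.com/"))     # name
```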

Port

Specifying the port allows a particular TCP port to be addressed. If no port is specified, the default port of the respective protocol is used, for example 80 for HTTP, 443 for HTTPS, and 21 for FTP.
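
A short sketch of this fallback rule, assuming a lookup table that contains only the three default ports named above. The names DEFAULT_PORTS and effective_port are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports for the three protocols mentioned in the text.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url):
    """Return the explicit port if present, otherwise the scheme's default (or None)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://example.com/"))                      # 80
print(effective_port("https://www.example.com:8080/index.html"))  # 8080
print(effective_port("ftp://ftp.example.com"))                    # 21
```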

Path

The path identifies a particular resource on the server, for example a file or a directory (it may, for instance, correspond to the directory structure of the target system). The path can also be empty. An empty path can optionally be replaced by a slash, with the same meaning.

The interpretation (file or directory; deliver a text file or execute a script) is left to the server. A typical example of this freedom of interpretation is the behavior when a client requests the path /: depending on its configuration, the server may deliver the content of a specially designated file (such as /index.html, /README, or /HEADER) without this being apparent to the requesting client. Alternatively, depending on the protocol, the server can explicitly redirect to this resource or output a directory listing.
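
The equivalence of an empty path and a single slash can be used to bring URLs into one form before comparing them. The sketch below is one possible way to do this in Python; the function name normalize_empty_path is hypothetical, and the rule is applied only to URLs that contain a host.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_empty_path(url):
    """Replace an empty path with "/" for URLs that have a host part."""
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(normalize_empty_path("http://de.wikipedia.org"))   # http://de.wikipedia.org/
print(normalize_empty_path("http://de.wikipedia.org/"))  # http://de.wikipedia.org/
```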

Query

In the case of HTTP, a query string can follow the actual resource pointer, separated from it by a question mark. It allows additional information to be transmitted that can be processed further on the server or client side.
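
As an illustration, the query string of the example URL used earlier can be extracted and decoded with Python's standard library. The sketch assumes the widespread key=value&key=value convention, which parse_qs understands; a server is free to interpret the query string differently.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.example.com:8080/index.html?p1=A&p2=B"

query = urlsplit(url).query
params = parse_qs(query)  # each key maps to a list, since keys may repeat

print(query)   # p1=A&p2=B
print(params)  # {'p1': ['A'], 'p2': ['B']}
```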

Fragment

After a hash mark (#), a part of the resource can be referenced, typically an anchor in an HTML page, to which the browser scrolls automatically after loading the page: in this fictitious document, the URL http://example.com/dokument.html#absatz3 would cause the browser to scroll to the beginning of the third paragraph.
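
A brief sketch of how the fragment can be separated from the rest of the URL, using urldefrag from Python's standard urllib.parse module on the fictitious URL from the paragraph above.

```python
from urllib.parse import urldefrag

# urldefrag returns the URL without its fragment, plus the fragment itself.
document, fragment = urldefrag("http://example.com/dokument.html#absatz3")

print(document)  # http://example.com/dokument.html
print(fragment)  # absatz3
```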

Examples

  • ftp://max:muster@ftp.example.com ... FTP with user and password
  • http://de.wikipedia.org ... website without path (calls up the start page)
  • http://de.wikipedia.org/wiki/Uniform_Resource_Locator ... website with path
  • https://de.wikipedia.org ... like calling up the website without a path, but with the encrypted Hypertext Transfer Protocol Secure
  • mailto:hans@example.org ... writes an email to the specified address (opens the default mail client with a new, empty message in which the To address is pre-filled)
  • news:alt.hypertext ... displays a Usenet newsgroup (generic, without specifying the network protocol NNTP)
  • nntp:alt.hypertext ... displays a Usenet newsgroup (with the network protocol NNTP specified)
  • telnet:example.org ... starts a Telnet session
  • file:///foo/bar.txt ... access to a local file

Relative URLs

In addition to the absolute (full) URLs shown so far, there are also relative URLs. They are valid only within a context from which they inherit properties. They lack the specification of the location on the World Wide Web or in an intranet. They are mainly possible with the http, https, and ftp schemes, but also with mailto. This corresponds to a telephone number without an area code (for the country or the local network).

Relative URLs for http, https, ftp
Beginning | Meaning | Note | Example
// | Same protocol | Useful for reusing the http: or https: of the current environment | //example.com/pfad/zu/datei
/ | Same domain (host:port), "root directory" | | /pfad/zu/datei
# | Same resource | Effect via a side effect | #
#fragment | Same resource, jump label | | #knoten
(nothing) | Same resource | |
../ | One path segment up | A server does not have to support /-structured path segmentation. | /pfad/zur/../zur/datei
./ or other | Same path segment | | ./relativer/pfad

Relative URLs are often used so that a group of related resources can be stored unchanged, either in a local file system or at different locations in different network domains, while remaining linked to one another. Incidentally, every server is free in how it interprets the identifier (the string between host:port and #): although the vast majority of servers and all standard software handle it as described above, a server may evaluate / as well as ? % & according to its own rules.

For mailto:, a relative URL would be mailto:Nachbar (without @); it is valid only in the local network.
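
How relative references for http, https, and ftp are resolved against a context can be tried out with urljoin from Python's standard library. The base URL and the relative references below are assumptions chosen to mirror the cases described above; the host example.org and the file name datei2 do not appear in the original text.

```python
from urllib.parse import urljoin

BASE = "http://example.com/pfad/zu/datei"

RELATIVE_REFERENCES = [
    "//example.org/x",   # same protocol, different host
    "/a/b",              # same host, path from the root
    "#knoten",           # same resource, jump label
    "",                  # same resource
    "../datei2",         # one path segment up
    "./relativer/pfad",  # same path segment
]

for reference in RELATIVE_REFERENCES:
    print(f"{reference!r:22} -> {urljoin(BASE, reference)}")
```

With this base, "../datei2" resolves to http://example.com/pfad/datei2 and "./relativer/pfad" to http://example.com/pfad/zu/relativer/pfad.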

List of allowed characters

Reserved characters are:

  • Special characters: / ? # [ ] @ : ! $ & ' ( ) * + , ; =

Unreserved characters are:

  • Special characters: - . _ ~
  • Letters: A–Z, a–z
  • Digits: 0–9

In certain cases, the space (which can alternatively be written as +) and the percent sign (%) must also be represented in percent-encoding.
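
The effect of percent-encoding can be illustrated with the quote functions from Python's standard urllib.parse module. This is only a sketch of the general idea; which characters actually need encoding depends on the URL component in which they appear.

```python
from urllib.parse import quote, quote_plus, unquote

print(quote("-._~"))                # -._~   (unreserved characters stay as they are)
print(quote("a b"))                 # a%20b  (space in percent-encoding)
print(quote_plus("a b"))            # a+b    (space written as +)
print(quote("100%"))                # 100%25 (the percent sign itself)
print(quote("p1=A&p2=B", safe=""))  # p1%3DA%26p2%3DB (reserved characters encoded)
print(unquote("a%20b"))             # a b
```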

Linguistic usage

In German usage, URL often takes the feminine article but is also used with the masculine article. The choice of grammatical gender depends on whether it is derived from the German translation Adresse (address, feminine) or follows the grammatical rule that nouns ending in -or (here Locator or -identifikator) or -er (-bezeichner "designator", -lokalisierer, -anzeiger) are always masculine in German.

URLs in texts

Appendix C of RFC 3986 recommends delimiting URIs (and therefore URLs) in texts from their context, and especially from the punctuation of the sentence, by writing them

  • on a line of their own,
  • in double quotes, "http://example.com/", or
  • in angle brackets, <http://example.com/>.
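
One practical benefit of these delimiters is that a program can pick URLs out of running text without confusing sentence punctuation with the URL itself. The following sketch uses a simple regular expression that is an assumption made for this illustration and is not taken from RFC 3986; it only recognizes strings that start with a scheme name and a colon inside angle brackets or double quotes.

```python
import re

# A scheme (letter, then letters/digits/+/./-), a colon, then any non-space
# characters, enclosed either in <...> or in "...".
SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*:"
DELIMITED_URL = re.compile(
    rf'<({SCHEME}[^<>\s]*)>|"({SCHEME}[^"\s]*)"'
)


def extract_delimited_urls(text):
    """Return all URLs that are delimited by angle brackets or double quotes."""
    return [angle or quoted for angle, quoted in DELIMITED_URL.findall(text)]


sample = 'See <http://example.com/>. Or try "https://example.org/a", not "plain".'
print(extract_delimited_urls(sample))
# ['http://example.com/', 'https://example.org/a']
```

The full stop after the closing angle bracket and the comma after the closing quote are not part of the extracted URLs.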

History

Name and standardization

In the early days of the WWW (from the end of 1990), the documentation on info.cern.ch initially had no specific name for the addressing of websites; the topic was documented only descriptively as “W3 document address”, “W3 name”, “W3 address”, or “Hypertext Name”. The form of addressing specified at that time (and used in the first websites) already corresponds to the form later standardized as “URL”; changes were considered during the standardization process but rejected because the WWW was already in widespread use.

In the summer of 1992, at the IETF meeting in Boston, Tim Berners-Lee tried to set up a working group to standardize access to documents on the web. He proposed the name Universal Document Identifier (UDI), which in his view should define a general Internet standard. The name was criticized as too “arrogant”, mainly because of the word universal. The group proposed the more modest term uniform instead. In addition, “Document” was replaced by “Resource” to emphasize that the web should be integrated with other information systems. The URI working group was eventually formed, and the standard to be defined underwent a further name change: “Identifier” was replaced by “Locator” to emphasize that web addresses are not permanently registered addresses.

Because of the group's conflict-ridden working methods, the first, still informal, draft standard, RFC 1630, was not presented by Berners-Lee until June 1994. Its title uses the name “Universal Resource Identifiers” favored by Berners-Lee, and it already defines the terms URI, URL, and URN. In December 1994 the group published RFC 1738, the standard entitled “Uniform Resource Locators (URL)”.

Components

Berners-Lee deliberately borrowed some of the individual components from existing systems so that web addresses would appear as familiar or logical as possible to new users:

  • The path (http://www.example.com/verzeichnis/unterverzeichnis/datei.html) directly quotes the path syntax of UNIX file systems.
  • The notation of the host, introduced by a double slash, comes from the syntax of the network file system of Apollo Domain/OS, in which paths on remote hosts were addressed according to the pattern //example.com/verzeichnis/unterverzeichnis/….
  • The fragment, marked with a hash sign, is borrowed from the US convention for apartment and suite numbers in postal addresses: 12 Foo Avenue #34 stands for 12 Foo Avenue, Apartment 34. Correspondingly, datei.html#ressource means the part (section, chapter, ...) ressource within the document datei.html.

Literature

  • Tim Berners-Lee, Mark Fischetti: The Web Report. The creator of the World Wide Web on the limitless potential of the Internet. Econ, Munich 1999, ISBN 3-430-11468-3 (English: Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web).

Web links

  • RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. January 2005. (Replaces RFC 2732; updated by RFC 6874.)
  • T. Berners-Lee, L. Masinter, M. McCahill: RFC 1738: Uniform Resource Locators (URL). December 1994. (Updated by RFC 1808.)
  • R. Fielding: RFC 1808: Relative Uniform Resource Locators. June 1995. (Obsoleted by RFC 3986.)
