Uniform Resource Identifier
A Uniform Resource Identifier (URI) is an identifier consisting of a character string that identifies an abstract or physical resource. URIs are used to designate resources (such as websites, other files, web service calls, and also e-mail recipients) on the Internet, and there above all on the WWW. As of 2016, the current specification is published as RFC 3986.
Tim Berners-Lee originally introduced the term in 1994 in RFC 1630 as Universal Resource Identifier. Only later did the expansion Uniform appear in official W3C documents. For this reason, Universal is occasionally still given as the first part of the name, even in specialist literature.
URIs can be embedded as a character string (encoded with a character set) in digital documents, especially those in HTML format, or written down by hand on paper. A reference from one web page to another is called a hyperlink, or “link” for short.
A URI (or, in its extended form, an IRI) is an abstract principle: a syntax for identifiers that is defined by a set of rules. This basic concept is then applied to various specific areas of application, each of which adds its own rules and terms. Examples of such rules:
- “URIs cannot contain spaces.”
- "A URI begins with the name of a scheme, made up of ASCII letters and digits, optionally subdivided by periods and hyphens, beginning with a letter and followed by a colon."
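The two rules quoted above can be checked mechanically. The following Python sketch is an illustration only: the helper name `looks_like_uri_start` is hypothetical, and the pattern follows the scheme grammar of RFC 3986, which also permits a plus sign in addition to the period and hyphen mentioned in the rule. It does not validate the rest of the URI.

```python
import re

# Scheme grammar from RFC 3986: a letter, then letters, digits, "+", "-" or ".",
# terminated by a colon.
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def looks_like_uri_start(text: str) -> bool:
    """Hypothetical helper: check only the two rules quoted above."""
    # Rule 1: a URI cannot contain spaces.
    if " " in text:
        return False
    # Rule 2: the string must begin with a scheme name followed by a colon.
    return SCHEME_PREFIX.match(text) is not None


print(looks_like_uri_start("urn:example:animal:ferret:nose"))  # True
print(looks_like_uri_start("1foo:bar"))                        # False
print(looks_like_uri_start("foo:has space"))                   # False
```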
There are basically three types of application:
- The content of a resource (and thus every copy with the same content) is given a unique identifier.
  - Example: the ISBN of a book. Any number of copies of this book can exist.
- The location of a resource is given by its name. The resource is identified by where it can be found; this does not necessarily determine its content.
  - Example: the current weather report on the Internet. Where it can be found (the URL) is known; the content changes constantly.
  - Example: a book is described by its position in a library: second room, third shelf, fourth compartment from the top, fifth book from the left. That position could hold the current top 5 of the bestseller list, regardless of their content.
- The rules of the URI can also be used when something is not a classic resource at all but still needs to be identified.
  - Initially, “resource” was understood in the IT sense, that is, in the broadest sense, as electronic files that could also be made available on the Internet. RFC 1630 and RFC 1738 of 1994 were based on this understanding. The concept has since been expanded. In 1998, RFC 2396 (Section 1.1) stipulated: “A resource can be anything that has identity.” People, organizations and printed books can therefore also be viewed as resources. The aim is the identification of assignable entities.
  - Examples: an e-mail address, a cell phone number, a passport as well as its legitimate owner, a social security number, a fingerprint as well as the person it belongs to.
In January 2005, with RFC 3986, the concept of the resource in the sense of the URI was expanded to include abstract concepts:
“A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., 'parent' or 'employee'), or numeric values (e.g., zero, one, and infinity).”
According to the current standard, RFC 3986, a URI consists of five parts: scheme (scheme or protocol), authority (provider or server), path, query and fragment (part), of which only scheme and path must be present in every URI. The generic syntax is:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
Here, hier-part (hierarchical part) stands for an optional authority and the path. If an authority is required in order to locate the resource, it is introduced by a double slash, and the path that follows must begin with a slash. The standard illustrates these components with two examples:
   foo://example.com:8042/over/there?name=ferret#nose
   \_/   \______________/\_________/ \_________/ \__/
    |           |            |            |        |
 scheme     authority       path        query   fragment
    |   _____________________|__
   / \ /                        \
   urn:example:animal:ferret:nose
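The generic syntax can be turned into a small parser. The sketch below uses the regular expression given in Appendix B of RFC 3986 to split the two example URIs into the five components. The function name `split_uri` is an assumption made for illustration, and the code only splits the string; it does not validate the individual components.

```python
import re

# Regular expression from RFC 3986, Appendix B. It splits any URI reference
# into its components but does not validate them.
URI_PATTERN = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri: str) -> dict:
    """Hypothetical helper: return the five generic components of a URI."""
    # Every group in the pattern is optional, so the match always succeeds.
    match = URI_PATTERN.match(uri)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


print(split_uri("foo://example.com:8042/over/there?name=ferret#nose"))
print(split_uri("urn:example:animal:ferret:nose"))
```

Components that are absent come back as `None`, whereas the path is always present, possibly as an empty string.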
Scheme (scheme)
The scheme (the part before the colon) defines the context and thus the type of the URI, which determines how the part that follows is interpreted. Well-known schemes include protocols such as ftp and notation concepts such as doi. This first, mandatory part of the URI ends with the colon. If no (active) authority that administers the names is referenced, the colon is followed directly by the path for locating the resource.
Authority (in the sense of jurisdiction)
Many URI schemes have an authority part. The term authority refers to an instance that can centrally manage the names in the interpretation space defined by the scheme. One example is the Domain Name System, which is administered by global and local registrars.
The authority consists of optional user information (followed by an @), the host and an optional port (introduced by a colon). It is preceded by two forward slashes (//) and is delimited by a single forward slash (/), a question mark (?), a hash sign (#) or the end of the URI. The host can be an IPv4 address, an IPv6 address (in square brackets, […]) or a registered name.
Specifying a user name and password in the user information (user:password@…) is deprecated by RFC 3986 (Section 3.2.1) and should no longer be used, because URIs are often transmitted and logged in clear text.
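As an illustration of how the authority breaks down into user information, host and port, the following sketch uses `urlsplit` from Python's standard library. The URI itself, including the user name and the IPv6 address, is an invented example value.

```python
from urllib.parse import urlsplit

# The user name, address and port below are illustrative values only.
parts = urlsplit("foo://user@[2001:db8::7]:8042/over/there")

print(parts.netloc)    # the whole authority: user@[2001:db8::7]:8042
print(parts.username)  # user information before the "@": user
print(parts.hostname)  # host without the square brackets: 2001:db8::7
print(parts.port)      # port as an integer: 8042
```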
Path (path)
The path contains information, often organized hierarchically, that identifies a resource together with the query part. If the URI contains an authority (see the previous section), the path must be empty or begin with a slash (/); if there is no authority, the path must not begin with a double slash (//). This ensures an unambiguous interpretation. The path is delimited by a question mark (?), a hash sign (#) or the end of the URI.
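Why a path without an authority must not begin with a double slash can be seen by letting a parser split two nearly identical strings. This is a minimal sketch using `urlsplit` from Python's standard library; the scheme `foo` and the path are placeholder values.

```python
from urllib.parse import urlsplit

# With a single leading slash there is no authority; everything is path.
single = urlsplit("foo:/over/there")
print(repr(single.netloc), single.path)  # '' /over/there

# With two leading slashes, the first segment is read as the authority.
double = urlsplit("foo://over/there")
print(repr(double.netloc), double.path)  # 'over' /there
```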
Query (query)
The query part (query string) contains data for identifying resources whose location cannot be specified precisely by the path alone. Such a resource must be retrieved from the source indicated by the path by means of the query, like a data record from a database. The query begins with a question mark (?) and is delimited by a hash sign (#) or the end of the URI.
='' play roughly the same role as
:'' in the part for the
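The delimiting of the query can be sketched with Python's standard library. The URI below is an assumed example; the second parameter, `color=brown`, is invented for illustration. Reading the query as key-value pairs is a common convention rather than part of the generic URI syntax.

```python
from urllib.parse import parse_qsl, urlsplit

uri = "foo://example.com:8042/over/there?name=ferret&color=brown#nose"

# The query is everything between "?" and "#" (or the end of the URI).
query = urlsplit(uri).query
print(query)             # name=ferret&color=brown

# Interpreting the query as "&"-separated key=value pairs is a widespread
# convention (for example in HTML forms), not a rule of the generic syntax.
print(parse_qsl(query))  # [('name', 'ferret'), ('color', 'brown')]
```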
Fragment (fragment)
The fragment is the optional fragment identifier and references a position within a resource. The fragment identifier always refers only to the immediately preceding part of the URI and is introduced by a hash sign (#). One example is the anchor in HTML.
An example with very many elements at the same time in the URI:
Applications often do not use the full URI but an abbreviated syntax, for example to save space or to make relocation to other servers easy. Some URI schemes also restrict the syntax to a certain form in their definition. These different notations are grouped under the term URI reference.
An absolute URI identifies a resource regardless of the context in which the URI is used. It consists of at least a scheme and a hier-part (that is, an authority and/or a path).
In contrast to an absolute URI, a relative URI only describes the difference between the absolute URI of a resource and the current context in a hierarchical namespace.
If a URI reference does not start with a scheme, it is assumed to be a relative reference. A relative reference is resolved to an absolute URI depending on the context, according to standardized rules. It consists of a relative part, an optional query and an optional fragment. There are three types of relative references:
- If the path does not begin with a slash, it is a relative-path reference.
- If the path begins with a single slash (/), it is an absolute-path reference.
- If the path begins with a double slash (//), it is a network-path reference.
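The three kinds of relative reference can be tried out with `urljoin` from Python's standard library, which resolves a reference against a base URI. The base URI and the references below are assumed example values.

```python
from urllib.parse import urljoin

base = "http://example.com/a/b/c?q=1"  # assumed context

# Relative-path reference: replaces the last segment of the base path.
print(urljoin(base, "d/e"))              # http://example.com/a/b/d/e

# Absolute-path reference: replaces the whole path, keeps the authority.
print(urljoin(base, "/d/e"))             # http://example.com/d/e

# Network-path reference: replaces the authority, keeps only the scheme.
print(urljoin(base, "//example.org/d"))  # http://example.org/d
```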
Reference within the same document
URI references can point to the same document of which they are part. The most common form is the hash sign (#) followed by a fragment identifier.
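A same-document reference can be resolved in the same way as any other relative reference. The following sketch uses `urljoin` and `urldefrag` from Python's standard library; the URI of the current document and the fragment name are assumptions.

```python
from urllib.parse import urldefrag, urljoin

base = "http://example.com/a/b?q=1"  # assumed URI of the current document

# A reference consisting only of a fragment keeps the rest of the base URI.
target = urljoin(base, "#nose")
print(target)  # http://example.com/a/b?q=1#nose

# Stripping the fragment again yields the URI of the document itself.
print(urldefrag(target).url == base)  # True
```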
URI references on the Internet are sometimes written without naming the protocol (the scheme), for example www.wikipedia.de. Such references can be resolved on the assumption that the protocol (here http) can be deduced from the suffix (in the example www; DNS names are built from right to left). However, this resolution depends on such assumptions and on the software in use. Suffix references should therefore be avoided.
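The following sketch illustrates why suffix references are fragile. A generic parser finds neither a scheme nor an authority in such a string, so any resolution has to rely on a guess. The function `guess_uri` and its mapping from labels to schemes are purely hypothetical and do not describe the behavior of any particular software.

```python
from urllib.parse import urlsplit


def guess_uri(reference: str) -> str:
    """Hypothetical heuristic: prepend a scheme guessed from the first label."""
    if urlsplit(reference).scheme:
        return reference  # already has a scheme, nothing to guess
    first_label = reference.split(".", 1)[0].lower()
    # Assumed mapping; real software may use different or additional rules.
    guesses = {"www": "http", "ftp": "ftp"}
    scheme = guesses.get(first_label, "http")
    return f"{scheme}://{reference}"


# The generic syntax sees no scheme and no authority here, only a path.
print(urlsplit("www.wikipedia.de"))
print(guess_uri("www.wikipedia.de"))  # http://www.wikipedia.de
```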
Among others, the following schemes are defined:
|Scheme|Description|
|data|Data URL: directly embedded data|
|file|Files in the local file system|
|ftp|File Transfer Protocol|
|http|Hypertext Transfer Protocol|
|ldap|Lightweight Directory Access Protocol|
|news|Newsgroup or news article|
|pop|Mailbox access via POP3|
|rsync|Synchronization of data with rsync|
|sip|SIP-based session setup, e.g. for IP telephony|
|urn|Uniform Resource Names (URNs)|
|xmpp|Extensible Messaging and Presence Protocol for Jabber identifiers|
A full list of the official schemes can be found on the website of the Internet Assigned Numbers Authority (IANA).
In addition, some unofficial schemes, which the IANA also lists as "provisional", have become established for individual applications or common protocols:
|Scheme|Description|
|about|Internal browser information|
|afp|Apple Filing Protocol|
|apt|Advanced Packaging Tool|
|callto|Telephone numbers (including Skype and NetMeeting)|
|coffee|Hyper Text Coffee Pot Control Protocol|
|daap|Digital Audio Access Protocol|
|doi|Digital Object Identifier|
|ed2k|ed2k URI scheme from eDonkey2000 / Kademlia|
|fish|Files transferred over Shell protocol|
|irc|Internet Relay Chat|
|mms|Microsoft Media Server|
|rtmp|Real Time Messaging Protocol|
|sftp|SSH File Transfer Protocol|
|skype|Phone numbers (Skype only)|
|smb|Server Message Block|
|view-source|Source code display for a website|
|wyciwyg|What You Cache Is What You Get, Firefox-internal scheme for displaying cached content|
A distinction is made between the following subtypes of URIs:
- Uniform Resource Locator (URL)
  - URLs identify a resource by its primary access mechanism, such as ftp. This is followed by the location of the resource in the network, usually a domain name. URLs were originally the only type of URI, which is why the term URL is often used synonymously with URI.
- Uniform Resource Name (URN)
  - URNs, which use the URI scheme urn, identify a resource by means of an existing or freely assignable name.
Originally, every URI was meant to be assigned to one of these two classes (or to others yet to be defined). This strict division has been abandoned, however, because it is unnecessary and because some schemes (such as data, or mailto, which used to be counted among the URLs) do not fit into either class.
- Digital Object Identifier (DOI)
- Persistent Uniform Resource Locator (PURL)
- Internationalized Resource Identifier (IRI)
- RFC 1630 - Universal Resource Identifiers in WWW (Status: INFORMATIONAL)
- RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax (Status: STANDARD)
- Web Naming and Addressing
- Uniform Resource Identifier (URI) Schemes: list of URI schemes at the Internet Assigned Numbers Authority (IANA)
- Tim Berners-Lee: Cool URIs Don't Change
- Graham Klyne: Uniform Resource Identifier (URI) Schemes. In: https://www.iana.org/. Internet Assigned Numbers Authority (IANA), March 20, 2016, accessed April 8, 2016.