Uniform Resource Identifier

from Wikipedia, the free encyclopedia

A Uniform Resource Identifier (URI) is an identifier consisting of a character string that identifies an abstract or physical resource. URIs are used to designate resources (such as websites, other files, web service calls, and also e-mail recipients) on the Internet, above all on the WWW. The current specification, as of 2016, is published as RFC 3986.

Tim Berners-Lee originally introduced the term in 1994 in RFC 1630 as Universal Resource Identifier. Only later did the expansion Uniform appear in official W3C documents. For this reason, Universal is occasionally given as the first part of the name, even in specialist literature.

URIs can be embedded as character strings (encoded with a character set) in digital documents, especially HTML documents, or written down by hand on paper. A reference from one website to another is called a hyperlink, or “link” for short.

Internationalized Resource Identifiers (IRIs) are an extension of URIs; URIs themselves consist only of printable ASCII characters.

Concept

A URI (or, in its extended form, an IRI) is the abstract principle: a syntax for identifiers that is defined by a set of rules. This basic concept is then applied to various specific areas of application, each with its own additional rules and terms. Examples of such rules:

  • “URIs cannot contain spaces.”
  • “A URI begins with the name of a scheme, which consists of ASCII letters and digits, subdivided if necessary by periods and hyphens, and starts with a letter; the scheme name is followed by a colon.”

There are basically three types of applications:

  • Name
    • The content of a resource (and thus every copy with the same content) is given a unique identifier.
    • Example: The ISBN of a book. There are an unlimited number of copies of this book.
  • Locator
    • The identifier specifies the location of a resource. The resource is thus identified by where it can be found; the identifier does not necessarily determine its content.
    • Example: the current weather report on the Internet. Its location (URL) is known; the content changes constantly.
    • Example: a book is described by its position in a library: second room, third shelf, fourth compartment from the top, fifth book from the left. Whatever is currently in the top 5 of the bestseller list could stand there, regardless of its content.
  • Individual
    • The rules of the URI can also be used if something is not a classic resource at all, but still needs to be identified.
    • Initially, “resource” was understood to be something like resources in the IT sense, that is, in the broadest sense, electronic files that could also be made available on the Internet. In 1994 RFC 1630 and RFC 1738 were based on this. However, this concept has been expanded. In 1998, RFC 2396 (Section 1.1) stipulated: “A resource can be anything that has identity.” People, organizations and printed books could also be viewed as resources. This consideration aims at the identification of assignable entities.
    • Examples: an e-mail address, a cell phone number, a passport as well as its legitimate holder, a social security number, a fingerprint as well as the person it belongs to.

In January 2005, with RFC 3986, the concept of the resource in the sense of the URI was expanded to include abstract concepts:

“A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., 'parent' or 'employee'), or numeric values (e.g., zero, one, and infinity).”

- RFC 3986, section 1.1

Structure

According to the current RFC 3986 standard, a URI consists of five parts: scheme (scheme or protocol), authority (provider or server), path, query, and fragment (part), of which only scheme and path must be present in every URI. The generic syntax is:

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

Here hier-part (hierarchical part) stands for an optional authority followed by a path. If an authority is required to locate the resource, it is introduced by a double slash, and the path that follows must begin with a slash unless it is empty. The standard illustrates these components with two examples:

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
   |   _____________________|__
  / \ /                        \
  urn:example:animal:ferret:nose
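
The decomposition shown in the diagram can be reproduced with a short sketch. It uses the regular expression given in Appendix B of RFC 3986 for splitting a URI reference; the function name split_uri and the dictionary layout are assumptions made for this illustration, and the sketch splits a string without validating it.

```python
import re

# Regular expression from RFC 3986, Appendix B, for splitting a URI reference.
URI_PATTERN = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Return the five generic components; absent components are None."""
    # Every part of the pattern is optional, so it matches any string.
    match = URI_PATTERN.match(uri)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


print(split_uri("foo://example.com:8042/over/there?name=ferret#nose"))
print(split_uri("urn:example:animal:ferret:nose"))
```

For the second example the authority, query, and fragment come back as None, while the path is example:animal:ferret:nose.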

Scheme

The scheme (the part before the colon) defines the context and thus denotes the type of the URI, which determines how the part that follows is interpreted. Well-known schemes include the protocols http and ftp and naming concepts such as urn and doi. The scheme, the first mandatory part of the URI, ends with the colon. If the URI does not refer to an (active) authority that organizes the administration of names, the colon is followed directly by the path for locating the resource.
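
Whether a string is a syntactically valid scheme name can be checked with a small sketch. It follows the grammar in RFC 3986, section 3.1 (a letter followed by letters, digits, plus signs, periods, or hyphens); the function name is_valid_scheme is an assumption for this illustration, and the check says nothing about whether a scheme is actually registered.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
SCHEME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name):
    """Return True if name is a syntactically valid scheme name."""
    return SCHEME_PATTERN.fullmatch(name) is not None


for candidate in ["http", "svn+ssh", "view-source", "3gpp", "my scheme"]:
    print(candidate, is_valid_scheme(candidate))
```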

Authority (in the sense of jurisdiction)

Many URI schemes, such as http or ftp, have an authority part. The term authority refers to an instance that can centrally manage the names in this interpretation space (which is determined by the scheme). One example is the Domain Name System, which is administered by global and local registrars.

The authority consists of optional user information (followed by an @ sign), the host, and an optional port (introduced by a colon). It follows two forward slashes (//) and is delimited by a single forward slash (/), a question mark (?), a hash sign (#), or the end of the URI. The host can be an IPv4 address, an IPv6 address (in square brackets, […]), or a registered name. Valid values are, for example:

  • de.wikipedia.org
  • user@example.com:8080
  • 192.0.2.16:80
  • [2001:db8::7]

Specifying a user name and password in the user information (user:password@…) is deprecated in RFC 3986 (Section 3.2.1) and should no longer be used, because URIs are often transmitted and logged in clear text.

Path

The path contains information, often organized hierarchically, that together with the query part identifies a resource. If the URI contains an authority, the path must begin with a slash (/) unless it is empty; if there is no authority, the path cannot begin with a double slash (//). This ensures an unambiguous interpretation. The path is delimited by a question mark (?), a hash sign (#), or the end of the URI. Valid paths are, for example:

  • /over/there
  • example:animal:ferret:nose
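
The two constraints on the start of the path can be written down as a minimal sketch. The function name path_is_allowed is an assumption for this illustration; the empty-path case follows RFC 3986 (section 3.3), and the characters inside the path are not checked.

```python
def path_is_allowed(path, has_authority):
    """Check only the start-of-path rules that depend on the authority."""
    if has_authority:
        # With an authority, the path is empty or begins with a slash.
        return path == "" or path.startswith("/")
    # Without an authority, the path must not begin with a double slash.
    return not path.startswith("//")


print(path_is_allowed("/over/there", has_authority=True))
print(path_is_allowed("example:animal:ferret:nose", has_authority=False))
print(path_is_allowed("//over/there", has_authority=False))
```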

Query

The query part (query string) contains data for identifying resources whose location cannot be specified precisely by the path alone. Such resources must be retrieved from the source indicated by the path by means of this query, like a data record from a database. The query starts with a question mark (?) and is delimited by a hash sign (#) or the end of the URI. A valid query (the part following the ?) is, for example:

  • title=Uniform_Resource_Identifier&action=submit

Here & and = play roughly the same role as . and : in the authority part.
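
The example query can be split into its name and value pairs with Python's standard urllib.parse module, as a brief sketch. The name=value convention with & as separator comes from common web practice (HTML forms) rather than from the generic URI syntax itself, so this applies only to queries that follow that convention.

```python
from urllib.parse import parse_qsl

query = "title=Uniform_Resource_Identifier&action=submit"

# parse_qsl splits on "&" and then on the first "=" of each piece.
pairs = parse_qsl(query)
print(pairs)
# [('title', 'Uniform_Resource_Identifier'), ('action', 'submit')]
```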

Fragment

The fragment is the optional fragment identifier and references a position within a resource. The fragment identifier always refers only to the immediately preceding part of the URI and is introduced by a hash sign (#). One example is the anchor in HTML.
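
Separating the fragment identifier from the rest of a URI can be sketched with urldefrag from Python's standard urllib.parse module. The address used here is a made-up example.

```python
from urllib.parse import urldefrag

# Hypothetical URI that points at an anchor inside an HTML page.
url, fragment = urldefrag("https://example.org/page.html#section_2")

print(url)       # https://example.org/page.html
print(fragment)  # section_2
```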

Examples

  • https://de.wikipedia.org/wiki/Uniform_Resource_Identifier
  • ftp://ftp.is.co.za/rfc/rfc1808.txt
  • file:///C:/Users/Benutzer/Desktop/Uniform%20Resource%20Identifier.html
  • file:///etc/fstab
  • geo:48.33,14.122;u=22.5
  • ldap://[2001:db8::7]/c=GB?objectClass?one
  • gopher://gopher.floodgap.com
  • mailto:John.Doe@example.com
  • sip:911@pbx.mycompany.com
  • news:comp.infosystems.www.servers.unix
  • data:text/plain;charset=iso-8859-7,%be%fa%be
  • tel:+1-816-555-1212
  • telnet://192.0.2.16:80/
  • urn:oasis:names:specification:docbook:dtd:xml:4.1.2
  • git://github.com/rails/rails.git
  • crid://broadcaster.com/movies/BestActionMovieEver

An example that uses many of these components at once:

  • http://nobody:password@example.org:8080/cgi-bin/script.php?action=submit&pageid=86392001#section_2

URI references

Applications often do not use the full URI but an abbreviated syntax, for example to save space or to make it easy to move content to other servers. Some URI schemes also restrict the syntax to a certain form in their definition. These different notations are grouped under the term URI references.

Absolute URIs

An absolute URI identifies a resource regardless of the context in which the URI is used. It consists of at least a scheme and a hier-part (that is, an authority and/or a path). Examples are:

  • https://de.wikipedia.org
  • file://localhost/var/spool/dump.bin

Relative reference

In contrast to an absolute URI, a relative URI only describes the difference between the absolute URI of a resource and the current context in a hierarchical namespace.

If a URI reference does not start with a scheme, it is assumed to be a relative reference. A relative reference is resolved to an absolute URI depending on the context, according to standardized rules. A relative reference consists of a path and, optionally, a query and a fragment. There are three types of relative references:

  • If the path begins without a slash, it is a relative path reference, for example image.png, ./image.png, and ../images/image.png.
  • If the path begins with a single slash (/), it is an absolute path reference.
  • If the path begins with a double slash (//), it is a network path reference.
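
How the three types resolve against a context can be sketched with urljoin from Python's standard urllib.parse module. The base URI and the host cdn.example.net are assumptions chosen only for this illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URI of the document that contains the references.
BASE = "https://example.org/docs/guide/index.html"

references = [
    "image.png",                    # relative path reference
    "./image.png",                  # relative path reference
    "../images/image.png",          # relative path reference
    "/images/image.png",            # absolute path reference
    "//cdn.example.net/image.png",  # network path reference
]

for reference in references:
    print(f"{reference:30} -> {urljoin(BASE, reference)}")
```

A relative path reference replaces only the last path segment of the base, an absolute path reference replaces the whole path, and a network path reference keeps nothing but the scheme.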

Reference within the same document

URI references can point to the same document of which they are part. The most common form is the hash sign (#) followed by a fragment identifier.

Suffix references

Suffix references are URI references on the Internet that are given without a scheme (protocol), for example www.wikipedia.de. Provided that the protocol (here http) can be deduced from the suffix (www in the example; DNS names are built from right to left), such references can be resolved. However, this resolution depends on such assumptions and on the respective software. Suffix references should therefore be avoided.
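
The ambiguity can be made visible with a small sketch based on Python's standard urllib.parse module. A generic parser sees no scheme and no authority in the suffix reference and treats the whole string as a path. The function guess_absolute_uri is a hypothetical, deliberately naive heuristic of the kind an application might apply; it is not part of any standard.

```python
from urllib.parse import urlsplit


def guess_absolute_uri(reference, default_scheme="http"):
    """Hypothetical heuristic: prepend a default scheme to a suffix reference."""
    if "://" in reference:
        return reference  # already carries a scheme and an authority
    return f"{default_scheme}://{reference}"


# Without a scheme, the generic parser reports only a path.
parts = urlsplit("www.wikipedia.de")
print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))

# The guess depends entirely on the assumed default scheme.
print(guess_absolute_uri("www.wikipedia.de"))
```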

Schemes

Among other things, the following schemes are defined:

Scheme    Description
crid      Television broadcasts
data      Data URL: directly embedded data
file      Files in the local file system
ftp       File Transfer Protocol
geo       Geographic coordinates
gopher    Gopher
http      Hypertext Transfer Protocol
ldap      Lightweight Directory Access Protocol
mailto    E-mail address
news      Newsgroup or news article
pop       Mailbox access via POP3
rsync     Synchronization of data with rsync
sip       SIP-based session setup, e.g. for IP telephony
tel       Phone number
telnet    Telnet
urn       Uniform Resource Names (URNs)
ws, wss   WebSocket
xmpp      Extensible Messaging and Presence Protocol for Jabber identifiers

A full list of the official schemes can be found on the Internet Assigned Numbers Authority (IANA) website.

In addition, some unofficial schemes, also referred to as "provisional" by the IANA, have been established for individual applications or common protocols:

Scheme        Description
about         Internal browser information
afp           Apple Filing Protocol
apt           Advanced Packaging Tool
callto        Telephone numbers (including Skype and NetMeeting)
coffee        Hyper Text Coffee Pot Control Protocol
daap          Digital Audio Access Protocol
doi           Digital Object Identifier
ed2k          ED2k URI scheme from eDonkey2000 / Kademlia
feed          Web feeds
finger        Finger
fish          Files transferred over Shell protocol
git           Git
irc/ircs      Internet Relay Chat
itunes        iTunes
javascript    Execution of JavaScript code
lastfm        Last.fm
magnet        Magnet link
mms           Microsoft Media Server
rtmp          Real Time Messaging Protocol
sftp          SSH File Transfer Protocol
skype         Phone numbers (Skype only)
smb           Server Message Block
ssh           Secure Shell
svn/svn+ssh   Apache Subversion
view-source   Source code display for a website
webcal        iCalendar
wyciwyg       What You Cache Is What You Get, Firefox-internal scheme for displaying cached content
ymsgr         Yahoo Messenger

Subtypes

A distinction is made between the following subtypes of URIs:

Uniform Resource Locator (URL)
URLs identify a resource by its primary access mechanism, such as http or ftp. This is followed by the location of the resource in the network, usually a domain name. URLs were originally the only type of URI, which is why the term URL is often used synonymously with URI.
Uniform Resource Name (URN)
URNs, which use the URI scheme urn, identify a resource by means of an existing or freely assignable name, for example urn:isbn or urn:sha1.

Originally, every URI was supposed to be assigned to one of these two classes (or to others yet to be defined). However, this strict division was abandoned because it is unnecessary and because some schemes (such as data, or mailto, which was previously counted among the URLs) do not fit into either class.

  13. msdn.microsoft.com