Hypertext Transfer Protocol
Hypertext Transfer Protocol | |
---|---|
Family: | Internet protocol family |
Field of application: | Data transfer (hypertext, etc.) on the application layer |
Based on | TCP (transport) |
Introduction: | 1991 |
Current version: | 2 (2015) |
Preliminary version: | 3 |
Standards: | RFC 1945 HTTP/1.0 (1996), RFC 2616 HTTP/1.1 (1999) |
The Hypertext Transfer Protocol (HTTP) is a stateless protocol for transmitting data on the application layer over a computer network. It is mainly used to load web pages (hypertext documents) from the World Wide Web (WWW) into a web browser. In principle, however, it is not restricted to this and is also very common as a general file transfer protocol.
HTTP was standardized by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C). The current version is HTTP/2, which was published as RFC 7540 on May 15, 2015. Further development is organized by the HTTP working group of the IETF (HTTPbis). There are standards that complement and build on HTTP, such as HTTPS for encrypting transmitted content or the WebDAV transmission protocol.
Properties
In the established layer models, which classify network protocols by how fundamental or abstract their tasks are, HTTP is assigned to the application layer. This layer is addressed by application programs; in the case of HTTP this is usually a web browser. In the ISO/OSI layer model, the application layer corresponds to layer 7.
HTTP is a stateless protocol: information from previous requests is lost. Session data can only be carried reliably on the application layer, by means of a session with a session identifier. Using cookies in the header information, applications can nevertheless associate state with a client (user entries, shopping carts), which enables applications that require state or session properties. User authentication is also possible. Normally, information transmitted via HTTP can be read on every computer and router it passes through in the network. However, the transmission can be encrypted via HTTPS.
Because its request methods, header information and status codes have been extended, HTTP is not limited to hypertext but is increasingly used to exchange arbitrary data. It is also the basis of the WebDAV protocol, which specializes in file transfer. For communication, HTTP depends on a reliable transport protocol, which in almost all cases is TCP.
There are currently two main versions of the protocol, HTTP/1.0 and HTTP/1.1. Newer versions of important web browsers such as Chromium, Chrome, Opera, Firefox, Edge and Internet Explorer are also already compatible with the newly introduced version 2 of HTTP (HTTP/2).
Structure
The units of communication between client and server in HTTP are called messages, of which there are two kinds: the request from the client to the server and the response sent back from the server to the client.
Each message consists of two parts, the message header (header or HTTP header for short) and the message body (body for short). The message header contains information about the message body, such as the encoding used or the content type, so that the recipient can interpret it correctly (→ main article: List of HTTP header fields). The message body contains the user data.
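The split into header and body can be illustrated with a minimal Python sketch. The function name `split_message` and the sample response are assumptions made for this illustration, and the parsing is deliberately simplified (for example, repeated header fields are not handled).

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, header fields and body."""
    # The header section ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical sample response, used only to exercise the function.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
start_line, headers, body = split_message(raw)
print(start_line)
print(headers["content-type"])
print(body)
```

The header fields tell the recipient how to interpret the body: here the content type and the body length in bytes.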
Functionality
If the link to the URL http://www.example.net/infotext.html is activated in a web browser, a request is sent to the computer with the host name www.example.net asking it to send back the resource /infotext.html.
The name www.example.net is first converted into an IP address using the DNS protocol. For the transfer, an HTTP GET request is sent via TCP to the standard port 80 of the HTTP server.
Request:
GET /infotext.html HTTP/1.1
Host: www.example.net
If the link contains characters that are not allowed in the request, these are percent-encoded. Additional information, such as details about the browser or the desired language, can be transmitted in the header of every HTTP message. The "Host" field makes it possible to distinguish different DNS names that share the same IP address. It is optional under HTTP/1.0 but required under HTTP/1.1.
As soon as the header is terminated by a blank line (that is, two consecutive line endings), the computer that operates a web server (on port 80) sends back an HTTP response. This consists of the server's header information, a blank line and the actual content of the message, i.e. the content of the file infotext.html. Typically, files in page description languages such as (X)HTML are transferred, along with everything they include, for example images, style sheets (CSS) and scripts (JavaScript), which a browser usually combines into a readable presentation. In principle, any file can be transferred in any format, and the "file" can also be generated dynamically and does not need to exist on the server as a physical file (for example when using CGI, SSI, JSP, PHP or ASP.NET).
Each line in the header is terminated by a line break <CR><LF>. The blank line after the header may consist only of <CR><LF>, without any spaces in between.
Response:
If the information cannot be sent for any reason, the server sends back an error message and an error code. Status codes are also used when the request was successful, in which case the response is (mostly) 200 OK. The exact sequence of this process (request and response) is specified in the HTTP specification.
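The request shown above and the first line of a response can be sketched in Python as follows. The helper names `build_get_request` and `parse_status_line` are hypothetical; the sketch only builds and parses bytes and does not open a network connection.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as it would be sent over TCP."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",  # required under HTTP/1.1
    ]
    # Every header line ends with CRLF; an empty line terminates the header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_status_line(line: str):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    parts = line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


request = build_get_request("www.example.net", "/infotext.html")
print(request)
print(parse_status_line("HTTP/1.1 200 OK"))
```

In a real client these bytes would be written to a TCP socket connected to port 80 of the server, and the status line would be read from the first line of the reply.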
History
From 1989 onwards, Tim Berners-Lee and his team at CERN, the European nuclear research center in Switzerland, developed the Hypertext Transfer Protocol together with the concepts of URL and HTML, laying the foundations for the World Wide Web. The first result of these efforts was version HTTP 0.9 in 1991.
HTTP/1.0
The specification published in May 1996 as RFC 1945 (Request for Comments No. 1945) became known as HTTP/1.0. With HTTP/1.0, a new TCP connection is established before each request and, by default, closed again by the server after the response has been transmitted. If, for example, ten images are embedded in an HTML document, a total of eleven TCP connections are required to build the page in a graphical browser.
HTTP/1.1
In 1999 a second specification was published as RFC 2616, which describes the HTTP/1.1 standard. With HTTP/1.1, a client can use an additional header entry (keep-alive) to ask that the connection not be closed so that it can be reused (persistent connection). However, support on the server side is optional and can cause problems in conjunction with proxies. Version 1.1 allows multiple requests and responses to be sent per TCP connection using HTTP pipelining, so only one TCP connection is required for the HTML document with ten images. Because TCP connections are quite slow at the beginning due to the slow-start algorithm, reusing one connection significantly reduces the loading time for the entire page. In addition, interrupted transmissions can be resumed with HTTP/1.1.
One way to use HTTP/1.1 in chats is the MIME type multipart/replace: after a boundary code, a new Content-Length header field and a new Content-Type header field have been sent, the browser rebuilds the content of the browser window.
With HTTP/1.1 it is possible not only to fetch data but also to transfer data to the server. With the PUT method, web designers can publish their pages directly through the web server via WebDAV, and with the DELETE method they can delete data from the server. In addition, HTTP/1.1 offers a TRACE method with which the path to the web server can be traced and checked to see whether the data is transferred there correctly. This makes it possible to determine the route to the web server across the various proxies, a traceroute at the application level.
Due to inconsistencies and ambiguities, a working group was started in 2007 to improve the specification. The aim was solely a clearer formulation; no new functions were incorporated. This process ended in 2014 and resulted in six new RFCs:
- RFC 7230 - HTTP/1.1: Message Syntax and Routing
- RFC 7231 - HTTP/1.1: Semantics and Content
- RFC 7232 - HTTP/1.1: Conditional Requests
- RFC 7233 - HTTP/1.1: Range Requests
- RFC 7234 - HTTP/1.1: Caching
- RFC 7235 - HTTP/1.1: Authentication
HTTP/2
In May 2015, the IETF adopted HTTP/2 as the successor to HTTP/1.1. The standard is specified by RFC 7540 and RFC 7541. The development was largely driven by Google (SPDY) and Microsoft (HTTP Speed+Mobility), each with its own proposal. A first draft, largely based on SPDY, was published in November 2012 and has since been adapted in several steps.
HTTP/2 is intended to accelerate and optimize transmission. The new standard is, however, meant to be completely backward compatible with HTTP/1.1.
Important new features are
- the possibility of combining several requests,
- further data compression options,
- the binary-coded transmission of content,
- server-initiated data transfers (push procedure), and
- the prioritization of individual streams.
The acceleration results mainly from the new possibility of combining (multiplexing) several requests so that they can be processed over one connection. Data compression can now also include header data (using a new, dedicated algorithm called HPACK). Content can be transmitted in binary form. So that the server does not have to wait for follow-up requests from the client that it can foresee, some data transfers can be initiated by the server (push procedure). By using HTTP/2, website operators can reduce the latency between client and server and relieve the network hardware.
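To give an impression of the binary framing, the following Python sketch decodes the fixed 9-byte frame header described in RFC 7540 (24-bit payload length, 8-bit type, 8-bit flags, one reserved bit and a 31-bit stream identifier). The function name `parse_frame_header` and the sample bytes are assumptions made for this illustration; it is not a complete HTTP/2 implementation.

```python
# Frame type codes as listed in RFC 7540.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def parse_frame_header(header: bytes) -> dict:
    """Decode the 9-byte header that precedes every HTTP/2 frame payload."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes long")
    length = int.from_bytes(header[0:3], "big")   # 24-bit payload length
    frame_type = header[3]                        # 8-bit frame type
    flags = header[4]                             # 8-bit flags
    # The top bit of the last four bytes is reserved; mask it off.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return {
        "length": length,
        "type": FRAME_TYPES.get(frame_type, f"UNKNOWN(0x{frame_type:02x})"),
        "flags": flags,
        "stream_id": stream_id,
    }


# Hypothetical header: a HEADERS frame with a 13-byte payload on stream 1.
example = bytes([0x00, 0x00, 0x0D, 0x01, 0x05, 0x00, 0x00, 0x00, 0x01])
info = parse_frame_header(example)
print(info)
```

Because every frame carries a stream identifier, frames belonging to different requests can be interleaved on one connection, which is what makes multiplexing possible.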
The originally planned option that HTTP/2 would use TLS by default was not implemented. However, the browser manufacturers Google and Mozilla announced that their web browsers would only support encrypted HTTP/2 connections. ALPN is used for this, an encryption extension that requires TLSv1.2 or newer.
HTTP/2 is supported by most browsers, including Google Chrome (including Chrome on iOS and Android) from version 41, Mozilla Firefox from version 36, Internet Explorer 11 under Windows 10, Opera from version 28 (and Opera Mobile from version 24) and Safari from version 8.
HTTP/3
In November 2018 the IETF decided that Google's "HTTP-over-QUIC" should become HTTP/3.
HTTP request methods
- GET
- is the most common method. It is used to request a resource (for example a file) from the server by specifying a URI. Content can also be passed to the server as arguments in the URI; according to the standard, however, a GET request should only retrieve data and have no other effect (such as changing data on the server or logging out). (see below)
- POST
- sends data to the server for further processing; the amount is unlimited in principle and depends only on the physical configuration of the server used. The data is transmitted as the content of the message and can, for example, consist of name-value pairs that come from an HTML form. In this way new resources can be created on the server or existing ones modified. POST data is generally not cached. In addition, with this type of transmission data can also be appended to the URI, as with the GET method. (see below)
- HEAD
- instructs the server to send the same HTTP headers as with GET, but not the message body with the actual document content. For example, the validity of a file in the browser cache can be quickly checked.
- PUT
- is used to upload a resource (for example a file) to a web server by specifying the target URI. If a resource already exists under the specified target URI, it will be replaced, otherwise it will be created.
- PATCH
- changes an existing document without completely replacing it as PUT does. It was specified in RFC 5789.
- DELETE
- deletes the specified resource on the server.
- TRACE
- returns the request as the server received it. In this way it can be checked whether and how the request has been changed on the way to the server - useful for debugging connections.
- OPTIONS
- provides a list of the methods and features supported by the server.
- CONNECT
- is implemented by proxy servers that are able to provide SSL tunnels.
RESTful web services use the different request methods to implement web services. In particular, the HTTP request methods GET, POST, PUT/PATCH and DELETE are used for this.
WebDAV adds the methods PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK and UNLOCK to HTTP.
Argument transfer
Often, a user wants to send information to a website. In principle, HTTP provides two options for this:
- HTTP GET
- The data is part of the URL and is therefore retained when the link is saved or passed on. It has to be URL-encoded, which means that reserved characters are represented as "%<hex value>" and spaces as "+".
- HTTP POST
- The data is transmitted with a request type designed for this purpose in the HTTP message body, so that it is not visible in the URL.
HTTP GET
Here the parameter-value pairs are introduced in the URL by the character ?. This procedure is often chosen to transfer a list of parameters that the remote station should take into account when processing a request. This list often consists of value pairs that are separated from each other by the character &. Each value pair has the form parameter name=parameter value. Less often, the character ; is used to separate entries in the list.
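Building and reading such a query string can be sketched with Python's standard `urllib.parse` module. The parameter names mirror the search example in this section; the second query with the key `q` is a made-up value that only shows how reserved characters and spaces are encoded.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Join parameter=value pairs with "&" and attach them after "?".
query = urlencode({"search": "Katzen", "go": "Artikel"})
url = "http://de.wikipedia.org/wiki/Spezial:Search?" + query
print(url)

# Reserved characters become %<hex value>, spaces become "+".
encoded = urlencode({"q": "a b&c"})
print(encoded)

# The receiving side splits the query back into names and values.
params = parse_qs(urlsplit(url).query)
print(params)
```

`parse_qs` returns a list for every name because a parameter may legitimately appear more than once in a query string.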
An example: On the homepage of the German-language Wikipedia, "Katzen" (cats) is entered in the search field and the "Artikel" (article) button is clicked. The browser sends the following or a similar request to the server:
GET /wiki/Spezial:Search?search=Katzen&go=Artikel HTTP/1.1
Host: de.wikipedia.org
…
Two value pairs are transferred to the Wikipedia server:
argument | value |
---|---|
search | Katzen |
go | Artikel |
These value pairs are appended to the requested page in the form
Parameter1=Wert1&Parameter2=Wert2
with a leading ?.
This tells the server that the user wants to view the article Katzen. The server processes the request but, instead of sending a file, redirects the browser to the correct page with a Location header, for example with:
HTTP/1.0 302 Found
Date: Fri, 13 Jan 2006 15:12:44 GMT
Location: http://de.wikipedia.org/wiki/Katzen
…
The browser follows this instruction and based on the new information sends a new request, for example:
GET /wiki/Katzen HTTP/1.1
Host: de.wikipedia.org
…
The server then responds with the article Katzen, something like:
HTTP/1.1 200 OK
Date: Fri, 13 Jan 2006 15:12:48 GMT
Last-Modified: Tue, 10 Jan 2006 11:18:20 GMT
Content-Language: de
Content-Type: text/html; charset=utf-8

Die Katzen (Felidae) sind eine Familie aus der Ordnung der Raubtiere (Carnivora)
innerhalb der Überfamilie der Katzenartigen (Feloidea).
…
The data part is usually longer; only the protocol is of interest here.
The disadvantage of this method is that the specified parameters are usually saved along with the URL in the browser's history, so personal data can be stored in the browser unintentionally. In such cases the POST method should be used instead.
HTTP POST
Since the data is not in the URL, large amounts of data, for example images, can be transferred via POST.
The following example requests the article Katzen again, but this time the browser uses a POST request because of modified HTML code (method="POST"). The variables are not in the URL but separately in the body, for example:
POST /wiki/Spezial:Search HTTP/1.1
Host: de.wikipedia.org
Content-Type: application/x-www-form-urlencoded
Content-Length: 24

search=Katzen&go=Artikel
The server also understands this and responds with the following text, for example:
HTTP/1.1 302 Found
Date: Fri, 13 Jan 2006 15:32:43 GMT
Location: http://de.wikipedia.org/wiki/Katzen
…
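How the form fields end up in the message body, and how Content-Length follows from them, can be sketched in Python. The helper `build_form_post` is a hypothetical name for this illustration; it only assembles the request bytes and sends nothing.

```python
from urllib.parse import urlencode


def build_form_post(host: str, path: str, fields: dict) -> bytes:
    """Assemble a form-encoded POST request; the fields go into the body."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length is the size of the body in bytes.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separating header and body
    ).encode("ascii")
    return head + body


post = build_form_post(
    "de.wikipedia.org",
    "/wiki/Spezial:Search",
    {"search": "Katzen", "go": "Artikel"},
)
print(post.decode("ascii"))
```

The body `search=Katzen&go=Artikel` is 24 bytes long, which matches the Content-Length value in the example request above.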
HTTP status codes
Every HTTP request is answered by the server with an HTTP status code. It indicates, for example, whether the request was processed successfully or, in the event of an error, tells the client (e.g. the browser) where (for example via a redirect) or how (for example with authentication) it can obtain the desired information, if that is possible.
- 1xx - Informational
- The request is still being processed despite the response. Such an interim response is sometimes necessary because many clients automatically assume after a certain period of time (timeout) that an error occurred during the transmission or processing of the request, and abort with an error message.
- 2xx - Successful operation
- The request has been processed and the response is sent back to the requester.
- 3xx - Redirection
- Further steps are required on the part of the client to ensure that the request is processed successfully. This is the case, for example, when the operator has restructured a website so that a desired file is now in a different location. From the Location header of the server's response, the client can find out where the file is now.
- 4xx - Client error
- An error occurred while processing the request for which the client is responsible. A 404 occurs, for example, if a document was requested that does not exist on the server. A 403 tells the client that it is not allowed to access the document in question, for example because the document is confidential or only accessible via HTTPS.
- 5xx - Server error
- An error has occurred whose cause lies with the server. For example, 501 means that the server does not have the necessary functions (e.g. programs or other files) to handle the request.
In addition to the status code, the header of the server response contains a description of the error in plain English . For example, a 404 error can be recognized by the following header:
HTTP/1.1 404 Not Found
…
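The grouping of status codes by their first digit can be sketched in Python with the standard `http.HTTPStatus` enumeration. The function `describe` and the wording of the class names are assumptions chosen for this illustration.

```python
from http import HTTPStatus

# The first digit of the code determines the class of the response.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its plain-text phrase and its class."""
    category = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not part of Python's built-in list of status codes.
        phrase = "unregistered code"
    return f"{code} {phrase} ({category})"


for code in (200, 302, 404, 501):
    print(describe(code))
```

A client that does not recognise a specific code can still react sensibly by looking only at the class, which is why the first digit carries the meaning.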
HTTP authentication
If the web server determines that a user name or password is required for a requested file, it reports this to the browser with the status code 401 Unauthorized and the WWW-Authenticate header. The browser checks whether the credentials are already available or presents the user with a dialog in which the name and password must be entered, and then transmits them to the server. If the data is correct, the corresponding page is sent to the browser. RFC 2617 distinguishes between:
- Basic authentication
- Basic authentication is the most common type of HTTP authentication. The web server requests authentication; the browser then looks up the user name and password for this file and asks the user if necessary. It then sends the credentials to the server in the Authorization header, in the form username:password, Base64-encoded. Base64 offers no cryptographic protection, so this method can only be regarded as secure when HTTPS is used.
- Digest Access Authentication
- With Digest Access Authentication, the server additionally sends a specially generated random string (nonce) with the WWW-Authenticate header. The browser calculates a hash of all the data (user name, password, received string, HTTP method and requested URI) and sends it back to the server in the Authorization header together with the user name and the random string. The server then compares this with the checksum it has calculated itself. Eavesdropping on the communication is of no use to an attacker here, because the hashing means that the data cannot be reconstructed and is different for each request.
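Both schemes can be sketched in a few lines of Python. The function names are hypothetical, and the digest calculation shows only the simplest variant from RFC 2617 (MD5, without the optional qop extension). It also uses the realm value from the server's challenge, which the description above does not mention. The credentials are sample values.

```python
import base64
import hashlib


def basic_authorization(username: str, password: str) -> str:
    """Value of the Authorization header for Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce) -> str:
    """Simplest RFC 2617 digest: MD5 over credentials, nonce and request."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked for
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server's nonce


basic = basic_authorization("Aladdin", "open sesame")
print(basic)

# Base64 is trivially reversible, so Basic offers no confidentiality.
print(base64.b64decode(basic.split(" ", 1)[1]).decode("utf-8"))

first = digest_response("alice", "example", "secret", "GET", "/infotext.html", "nonce-1")
second = digest_response("alice", "example", "secret", "GET", "/infotext.html", "nonce-2")
print(first != second)  # a new nonce yields a different response value
```

The password itself never appears in the digest response, and because the nonce changes, a captured response value cannot simply be replayed for a later challenge.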
HTTP compression
In order to reduce the amount of data transferred, an HTTP server can compress its responses. When making a request, a client must indicate which compression methods it can process. The Accept-Encoding header is used for this (e.g. Accept-Encoding: gzip, deflate). The server can then compress the response using a method supported by the client and specifies the compression method used in the Content-Encoding header.
HTTP compression saves considerable amounts of data, especially for textual data (HTML, XHTML, CSS, JavaScript code, XML, JSON), since these compress well. For data that is already compressed (e.g. common formats for images, audio and video), compressing again is useless and is therefore usually not done.
In conjunction with TLS-encrypted communication, however, compression enables the BREACH exploit, with which the encryption can be broken.
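The negotiation via Accept-Encoding and Content-Encoding can be sketched in Python with the standard `gzip` module. The function `compress_for_client` is a hypothetical server-side helper; it is simplified in that it only knows gzip and ignores quality values such as `q=0`.

```python
import gzip


def compress_for_client(body: bytes, accept_encoding: str):
    """Compress the body with gzip if the client listed it in Accept-Encoding."""
    # Simplification: parameters such as ";q=0.5" are stripped and not evaluated.
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}  # send the body unchanged


# Repetitive markup, as is typical for HTML, compresses well.
page = b"<li>item</li>\n" * 200

compressed, extra_headers = compress_for_client(page, "gzip, deflate")
print(len(page), len(compressed), extra_headers)

plain, no_headers = compress_for_client(page, "identity")
print(len(plain), no_headers)

# The client reverses the step named in Content-Encoding.
restored = gzip.decompress(compressed)
```

The response headers tell the client which method was applied, so it knows how to restore the original content before interpreting it.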
Applications over HTTP
As a text-based protocol, HTTP is not only used for the transmission of websites; it can also be used in stand-alone applications, the web services. HTTP commands such as GET and POST are then used for CRUD operations. This has the advantage that no separate protocol has to be developed for data transmission. REST is an example of this approach.
See also
- Request cycle
- SOAP
- Extensible Markup Language (XML)
- Character encoding
- Content negotiation
- HTTP ETag
- HTTP caching