Hypertext Markup Language

from Wikipedia, the free encyclopedia
HTML (Hypertext Markup Language)
File extension: .html, .htm
MIME type: text/html
Developed by: World Wide Web Consortium (W3C)
Current version: 5.2 (as of December 14, 2017)
Type: Markup language
Extended to: XHTML, HTML5
Standard(s): ISO/IEC 15445

W3C HTML 5, W3C HTML 4.01, W3C HTML 3.2

Website: www.w3.org/html

The Hypertext Markup Language (HTML) is a text-based markup language for structuring electronic documents such as text with hyperlinks, images and other content. HTML documents are the basis of the World Wide Web and are displayed by web browsers. In addition to the content displayed by the browser, HTML files can contain additional information in the form of metadata, e.g. about the languages used in the text, the author or a summary of the text's content.

HTML is developed further by the World Wide Web Consortium (W3C) and the Web Hypertext Application Technology Working Group (WHATWG). The current version has been HTML 5.2 since December 14, 2017; it is already supported by many current web browsers and other layout engines. The Extensible Hypertext Markup Language (XHTML) is also being replaced by HTML5.

HTML is used as a markup language to structure a text semantically, but not to format it. The visual presentation is not part of the HTML specifications and is determined by the web browser and by style sheets such as CSS. Exceptions are the presentation-related elements, which are marked as deprecated.

Emergence

Before the development of the World Wide Web and its components, including HTML, it was not possible to exchange documents electronically easily, quickly and in a structured manner between several people and to link them efficiently with one another. In addition to transmission protocols, an easy-to-understand markup language was required. This is exactly where the starting point of HTML lay. In order to share research results with other employees of the European Organization for Nuclear Research (CERN) and to make them accessible from the two locations in France and Switzerland, a project was set up at CERN in 1989 that dealt with this task. The first version of the HTML specification appeared on November 3, 1992.

Syntax

The text is given a structure through markup of text parts.

The markup is done using elements defined via SGML. Most of these HTML elements are marked by a pair of tags, i.e. a start tag and an end tag. A start tag always begins with the character <. This is followed by the element name (e.g. p for a paragraph or h1 for a first-level heading) and optionally a list of its attributes (e.g. class="warning" or id="warning"). The start tag is closed with a >. An end tag consists of the characters </, the element name and the closing >. The start and end tags that belong together, along with the content between them, form an element in the sense of the general SGML specification. These elements can be nested according to rules that are specified in a document type definition (DTD):

<p>A paragraph of text that contains an <em>emphasized</em> word.</p>

Certain tags do not need to be written out explicitly. According to the SGML rule "OMITTAG", the end tag may be omitted for some elements (e.g. </p> or </li>). In addition, element and attribute names are not case-sensitive (e.g. <ul>, <UL>, <uL>). By comparison, XHTML defines these rules more strictly.
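
The tag syntax described above can be observed with a small parser. The following is a minimal sketch using Python's standard html.parser module; the class name TagLogger and the helper log_events are illustrative names chosen here, and the sample markup is an assumption modeled on the paragraph example above. It records start tags, end tags and text in document order and shows that upper-case and lower-case tag names are treated alike.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records start tags, end tags and text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports element and attribute names in lower case.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def log_events(markup):
    """Parse a markup string and return the recorded events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


# Sample input written in upper case on purpose (hypothetical example).
events = log_events('<P CLASS="warning">A paragraph with an <EM>emphasized</EM> word.</P>')
for event in events:
    print(event)
```

Because the parser normalizes names to lower case, log_events("<ul></ul>") and log_events("<UL></uL>") return the same list of events.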

In addition to elements with start and end tags, there are also empty elements in HTML, such as line breaks (br) or images (img).

A line of text,<br>which is continued here.
<img src="E-Mail-Button.jpg" alt="E-Mail">

HTML is a descriptive markup language, not a procedural or presentational one, even though HTML was used for presentational markup in earlier versions. HTML elements are not presentation instructions that tell the web browser how to format the text visually. Rather, elements are structural labels that can be used to assign a meaning to text areas, e.g. <h1></h1> for a heading, <p></p> for a paragraph of text and <em></em> for emphasized text. How this meaning is ultimately conveyed to the user (in the case of a heading, e.g. by an enlarged, bold font) is initially up to the web browser and depends on the output environment. Although HTML documents are usually displayed on computer screens, they can also be output on other media, for example on paper or through speech output. CSS style sheets are suitable for influencing the presentation of an HTML document in various media.

Therefore, elements and attributes for presentation such as <font></font>, <u></u> and noshade are considered deprecated and, by general consensus, should be avoided; they should no longer be used in newly developed software and should be replaced when document-generating software is revised.

Reading in the source text and processing the information it contains is referred to in technical terminology as parsing, and preparing it for the output medium as rendering. The HTML language describes how the browser (or another program, such as a text editor) has to "understand" the markup in the text, not how it then converts it into the display. For example, <h1> says that a heading follows, but not in which font size or typeface it is to be displayed; only some customary defaults have become established here, and these are not part of the HTML specification.

Character set

The standard character set, originally based on 7-bit ASCII, was expanded in the early days of the WWW to include numerous special characters, which are coded as HTML entities. Supporting universal character sets for all common languages worldwide required support for Unicode (UTF), which is now implemented in all common browsers. HTML is therefore designed for platform-independent portability, provided the HTML renderer in use supports it. The character set underlying a web document is chosen in the meta elements in the document head; the browser then adapts to it.

Authors of websites whose keyboard may not provide all characters directly, such as German umlauts, can encode special characters in several ways; the umlaut "ä" can be coded as an HTML entity (&auml;), as a decimal Unicode reference (&#228;) or as a hexadecimal Unicode reference (&#x00E4;). Many full-featured website editors encode special characters automatically when the source text is written.
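
That the three notations denote the same character can be checked with Python's standard html module. This is a minimal sketch; the variable names are illustrative, and the last line (escaping markup characters) is an added assumption that goes slightly beyond the umlaut example.

```python
import html

# The three ways of writing "ä" mentioned in the text.
references = ["&auml;", "&#228;", "&#x00E4;"]
decoded = [html.unescape(reference) for reference in references]
print(decoded)

# Going the other way: build numeric references from the code point.
code_point = ord("ä")
decimal_reference = f"&#{code_point};"
hex_reference = f"&#x{code_point:04X};"
print(decimal_reference, hex_reference)

# Characters with a special meaning in markup are escaped in the same manner.
escaped = html.escape("1 < 2 & 3 > 2")
print(escaped)
```

All three references decode to the same single character, so which notation to use is mainly a matter of readability of the source text.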

In address lines (URLs), the procedure is different again; here, characters that are not directly supported are encoded in ASCII characters using percent-encoding (URL encoding), e.g. %20 for a space, for example if it occurs in a file name and has to be distinguished from the regular space that ends the link.
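
The encoding of characters in URLs can be tried out with Python's standard urllib.parse module. This is a minimal sketch; the file name "my file.html" is a made-up example, and the UTF-8 byte encoding of "ä" is the default of the quote function rather than something stated in the text.

```python
from urllib.parse import quote, unquote

# A hypothetical file name containing a space.
file_name = "my file.html"
encoded_name = quote(file_name)
print(encoded_name)

# Decoding restores the original name.
decoded_name = unquote(encoded_name)
print(decoded_name)

# Non-ASCII characters are encoded byte by byte (UTF-8 by default).
encoded_umlaut = quote("ä")
print(encoded_umlaut)
```

Note that this is a different mechanism from the character references used inside HTML text: a space in a URL becomes %20, whereas "ä" in document text becomes &auml; or a numeric reference.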

Language type

HTML is a markup language and as such is usually distinguished from programming languages. One thing it has in common with most programming languages is that no special software (such as a dedicated HTML editor) is required for editing the source documents; any text editor is sufficient.

A concept similar to that of HTML (logical description) underlies the typesetting system TeX/LaTeX, which, in contrast to HTML, targets printed output on paper.

Versions

HTML was first proposed on March 13, 1989 by Tim Berners-Lee at CERN in Geneva.

  • HTML (without version number, November 3, 1992): Original version that was based only on text.
  • HTML (without version number, April 30, 1993): In addition to attributes such as bold or italic text, image embedding was added.
  • HTML+ (November 1993): Planned enhancements that were incorporated into later versions but were never adopted as HTML+.
  • HTML 2.0 (November 1995): The version defined in RFC 1866 introduced, among other things, forms. The status of this standard is "HISTORIC". Its predecessors are also obsolete.
  • HTML 3.0: Never published, because it was already outdated before its planned release when version 3 of the Netscape browser was introduced.
  • HTML 3.2 (January 14, 1997): New features such as tables, text flow around images, integration of applets.
  • HTML 4.0 (December 18, 1997): Introduction of stylesheets, scripts and frames. It was also split into the variants strict, frameset and transitional. A slightly corrected version was published on April 24, 1998.
  • HTML 4.01 (December 24, 1999): Replaced HTML 4.0 with many minor fixes. It remained the standard until 2014.
  • XHTML 1.0 (January 26, 2000): Reformulation of HTML 4.01 using XML. A revised version appeared on August 1, 2002.
  • XHTML 1.1 (May 31, 2001): After XHTML was divided into modules, a strict version was defined with XHTML 1.1, in which the frameset and transitional variants introduced with HTML 4 were omitted.
  • XHTML 2.0 (discontinued, working draft of July 26, 2006): This version was no longer to be based on HTML 4.01 and was to introduce some new elements, such as <nl> for navigation lists. The separation of markup and style was to be completed in this version. The W3C stopped work on XHTML 2.0 in summer 2009 because XHTML was to be replaced by HTML5.
  • HTML5 (Recommendation, October 28, 2014): Created a new vocabulary based on HTML 4.01 and XHTML 1.0. The DOM specification belonging to HTML has also been revised and expanded.
  • HTML 5.1 (Recommendation, November 1, 2016)
  • HTML 5.2 (Recommendation, December 14, 2017): Current version.

HTML structure

General structure

An HTML document consists of three areas:

  1. the document type declaration (doctype) at the very beginning of the file, which specifies the document type definition (DTD) used, e.g. HTML 5,
  2. the HTML head (HEAD), which mainly contains technical or documentary information that is usually not displayed in the browser's display area,
  3. the HTML body (BODY), which contains the information that can usually be seen in the browser's display area.

The basic structure of a web page looks like this:

<!DOCTYPE html>
<html>
  <head>
    <title>Title of the web page</title>
    <!-- further head information -->
    <!-- Comments are not displayed in the browser. -->
  </head>
  <body>
    <p>Content of the web page</p>
  </body>
</html>
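
The three areas of such a document can be told apart programmatically. The following is a minimal sketch using Python's standard html.parser module; the class name StructureScanner and the constant DOCUMENT are illustrative names, and the sample document is an assumption modeled on the basic structure shown above. It records the document type declaration and sorts the visible text by whether it occurs in the head or in the body.

```python
from html.parser import HTMLParser

# Sample document modeled on the basic structure (hypothetical content).
DOCUMENT = """<!DOCTYPE html>
<html>
  <head>
    <title>Title of the web page</title>
    <!-- Comments are not displayed in the browser. -->
  </head>
  <body>
    <p>Content of the web page</p>
  </body>
</html>"""


class StructureScanner(HTMLParser):
    """Collects the doctype and the text found in head and body."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.open_elements = []
        self.text_by_area = {"head": [], "body": []}

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE html>.
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Simplification: only well-nested documents are handled.
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        for area in ("head", "body"):
            if area in self.open_elements:
                self.text_by_area[area].append(text)


scanner = StructureScanner()
scanner.feed(DOCUMENT)
scanner.close()
print(scanner.doctype)
print(scanner.text_by_area)
```

The comment in the head does not appear in either list, because the parser reports comments separately from text.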

HTML head

Seven different elements can be used in the head:

title
specifies the title of the page, which most browsers display in the title bar.
meta
can contain a variety of metadata. See meta element.
base
specifies either a base URI or a base frame.
link
is used to specify logical relationships to other resources. Most often used to include stylesheets.
script
embeds code in a specific scripting language, mainly JavaScript.
style
contains style information, mainly CSS declarations.
object
includes an external file. Browsers are not allowed to display such objects in the document head. As of HTML5, the object tag is no longer allowed in the HTML head.

HTML body

The HTML body contains the actual page content. HTML distinguishes between block and inline elements. The main difference is that block elements generate a separate block in the output that holds their content, while inline elements do not interrupt the flow of text. Put simply, block elements always start a paragraph of their own. With CSS, however, it is possible to display block elements like inline elements and vice versa. In addition, any element can be displayed as an inline block via CSS, so that it has properties of both a block element and an inline element.

A first-level heading is marked up as follows:

<h1>Heading</h1>

h1 stands for Heading 1, so it marks a heading of the first (and in HTML the highest) outline level. Also available are h2 to h6, headings of the second to sixth outline level.
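
Because the heading elements carry the outline level in their name, a document outline can be derived from them mechanically. The following is a minimal sketch using Python's standard html.parser module; the class name OutlineCollector and the sample markup are illustrative assumptions, not part of any specification.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineCollector(HTMLParser):
    """Collects (level, text) pairs for the headings h1 to h6."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._parts).strip()))
            self._level = None


collector = OutlineCollector()
collector.feed("<h1>Heading</h1><p>Text</p><h2>Subheading</h2>")
collector.close()

# Print the outline, indented by level.
for level, text in collector.outline:
    print("  " * (level - 1) + text)
```

The paragraph text is ignored here because it does not belong to a heading element; only the structure expressed by h1 to h6 ends up in the outline.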

A hyperlink:

<a href="http://example.com/">This is a reference to example.com</a>

Hyperlinks are references to other resources, mostly also HTML documents, which can usually be followed in the browser with a click. This link could be rendered like this: This is a reference to example.com. This example also shows that the a element used for links is an inline element and does not start a new line.
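
Since the link target is stored in the href attribute, programs can extract the references from a document without rendering it. The following is a minimal sketch using Python's standard html.parser module; the class name LinkCollector and the surrounding sample sentence are illustrative assumptions.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (target, link text) pairs for all a elements with href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # target of the link currently open, if any
        self._text = []     # text fragments inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


collector = LinkCollector()
collector.feed(
    '<p>See <a href="http://example.com/">This is a reference to example.com</a>'
    " for details.</p>"
)
collector.close()
print(collector.links)
```

The text before and after the link stays part of the same paragraph, which mirrors the inline behavior described above.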

Normal text is marked up with p (for paragraph) by default. Text without p would be possible without any problems, but using p is strongly recommended: it separates source text from output, and the element becomes necessary at the latest when styling with CSS.

This is how a paragraph of text is marked up in HTML:

<p>I am a sample text</p>

For logical markup, for example, the elements strong and em are available, with which strongly emphasized or emphasized text can be marked. By default (according to the W3C recommendation), strong and em elements are rendered in bold and italic type respectively.

The structural description of the text makes it easier to adapt the rendering to the viewer, for example to read the text to a visually impaired person or to output it as Braille .

HTML variants

When HTML 4 was drafted, the fact that many HTML documents still used elements and attributes for presentation was to be taken into account. The result was three variants:

Strict

This DTD comprises the core set of elements and attributes. Most of the elements and attributes that influence the presentation are missing, including the elements font, center and u and attributes such as bgcolor, align and target. Style sheets are meant to take over their role in Strict documents. Text and non-block elements inside the elements body, form, blockquote and noscript must normally be placed within a container element, for example a p element.

Transitional

The transitional variant contains older elements and attributes that also enable physical text markup. This DTD is intended to give web authors who are not yet able to separate logical structure and presentation the opportunity to write standard-compliant HTML. At the same time, it is meant to ensure that existing websites can continue to be displayed by current web browsers.

Frameset

In addition to all the elements of the transitional variant, this variant also contains the elements for generating framesets.

Additional techniques and further developments

Cascading style sheets

Over the years, HTML was extended with elements that serve the visual design of documents. This ran counter to the original idea of system independence. The definition of Cascading Style Sheets (CSS) marked a return to the separation of structure and presentation. The appearance of the document is meant to be specified in a separate file, the stylesheet. This makes it easier to adapt the layout to the respective output device and to special needs of the user, for example a special display for the visually impaired. Nowadays, browser support for CSS is sufficient to implement a sophisticated design.

In the early years of HTML, up to the 2000s, no strict distinction was made between layout and page structure. Design was implemented with the help of layout attributes such as color="color" or layout tags such as <font>, or the appearance of tables was specified directly in the table element. This is now considered outdated and unprofessional. CSS code can also be embedded directly in a page, without an external file.

A CSS file can be integrated in the HTML header using the link element:

<link rel="stylesheet" href="stylesheet.css">

Dynamic HTML

Very early in the history of HTML, additional technologies were invented that allow HTML documents to be changed dynamically while they are displayed in the browser. The most common is JavaScript. Such interactive documents are called dynamic HTML. These techniques were developed independently by various browser manufacturers, above all Microsoft and Netscape. As a result, there were significant problems in getting the techniques to work across the various browsers. All popular JavaScript-enabled browsers now implement the Document Object Model (DOM). This makes it possible to write scripts that run in all browsers. However, there are still differences in support for the DOM standard.

XHTML

XHTML 1.0 was developed on the basis of HTML 4.01 (SGML). XHTML follows the syntactic rules of XML, which are stricter than those of SGML, but its three DTD variants are semantically identical to the corresponding DTD variants of HTML 4.01.

See main article XHTML.

HTML5

The respective advantages of SGML and XML of the previous HTML versions have been combined in HTML5. In contrast to the previous HTML versions, there is no longer a DTD in HTML5.

See main article HTML5.

Ajax

With the Ajax technique, JavaScript can be used to change and reload individual parts of a page that is already loaded in the browser without having to reload the entire page. Because less data is transferred, the web server can respond faster, and the responsiveness of desktop applications can be emulated.

Literature

  • Stefan Münz, Wolfgang Nefzger: HTML manual. Franzis-Verlag, Poing 2005, ISBN 3-7723-6654-6.
  • Stefan Mintert (Ed.): XHTML, CSS & Co. The W3C specifications for web publishing. Addison-Wesley, Munich 2003, ISBN 3-8273-1872-6.
  • Mark Lubkowitz: Programming and designing websites - HTML, CSS, JavaScript, PHP, Perl, MySQL, SVG and newsfeeds, with CD. Galileo Press, Bonn 2004, ISBN 3-89842-557-6.
  • Elisabeth Robson, Eric Freeman: HTML and CSS from head to toe. O'Reilly, Cologne 2012, ISBN 978-3-86899-934-1.
  • Stephan Heller: Workshop HTML5 & CSS3. Implementing web layouts professionally - an introduction to front-end development. 1st edition. dpunkt.verlag, Heidelberg 2012, ISBN 978-3-89864-807-3.
