Extensible Hypertext Markup Language

from Wikipedia, the free encyclopedia
XHTML
Exemplary representation of an XHTML document

File extension: .xhtml, .xht
MIME type: application/xhtml+xml
Developed by: World Wide Web Consortium
Type: Markup language
Extended from: XML, HTML
Standard(s): 1.0 (Recommendation), 1.1 (Recommendation), 1.1 SE (Working Draft), 5 (Working Draft), 2.0 (Working Draft)


The W3C standard Extensible Hypertext Markup Language (extensible HTML; abbreviation XHTML) is a text-based markup language for structuring and semantically marking up content such as text, images and hyperlinks in documents. It is a reformulation of HTML 4.01 in XML: whereas HTML was defined using SGML, XHTML is based on XML, a stricter subset of SGML that is easier to parse. XHTML documents therefore satisfy the syntax rules of XML.

XHTML 1.0: Transition from HTML to XHTML

XHTML 1.0 contains all elements of HTML 4.01, so converting HTML 4.01-compliant pages to XHTML 1.0 is easy. A web browser that does not support XHTML can nevertheless display XHTML documents correctly under certain conditions (see MIME types and HTML compatibility): it processes them as ordinary HTML. This relies on the HTML parsers of popular browsers being tolerant of syntax errors. This fault tolerance arose because numerous HTML documents on the World Wide Web did not conform to the formal standard and users found browser messages about HTML syntax errors annoying. For XHTML, however, the basic XML principle of straightforward data exchange and trouble-free automated processing applies. As a result, programs that process XHTML are no longer that tolerant.
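The difference in tolerance can be illustrated with Python's standard library. This is only a sketch: Python's html.parser stands in for a tolerant browser HTML parser and xml.etree.ElementTree for a strict XML parser; neither is part of any XHTML specification, and the fragment and helper names are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical fragment: unclosed <p> elements and an HTML-style <br>.
SNIPPET = "<body><p>Ein Absatz<p>Noch ein<br>Absatz</body>"


class TagCollector(HTMLParser):
    """Records every start tag that the tolerant HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(markup):
    """Feed markup to html.parser; it does not raise on unclosed elements."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def is_well_formed_xml(markup):
    """An XML parser must reject markup that is not well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(html_start_tags(SNIPPET))     # ['body', 'p', 'p', 'br']
    print(is_well_formed_xml(SNIPPET))  # False
```

html.parser only reports the tags it sees and does not repair the element tree the way a browser does; the point here is merely that it accepts input that the XML parser rejects outright.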

Newer XHTML document types no longer contain presentational markup. XHTML 1.0 Transitional is the last document type that still contains layout elements such as <font> or <b>. Later document types such as XHTML 1.0 Strict still contain a few layout elements, but only for backward compatibility with the Transitional document types. Finally, XHTML Basic and XHTML 2 no longer include layout elements at all. The visual design of XHTML elements should be specified exclusively through external CSS rules.

XHTML modularization

To enable the development of XHTML-based languages, related elements were grouped into so-called modules in XHTML 1.1. From these modules, defined in DTDs and in the future also in XML Schema, you can assemble your own XHTML document types on a building-block principle and combine them with other XML-based languages. Examples of XHTML modularization are XHTML 1.1, XHTML Basic and the combinations with SMIL (multimedia), SVG (vector graphics) and MathML (mathematical notation). The object module is used to embed general objects such as multimedia plug-ins.

Version overview

XHTML is an umbrella term for the various XHTML versions:

  • XHTML 1.0 is the XML-based reformulation of HTML 4.01. It contains the three well-known document types Strict, Transitional and Frameset. XHTML 1.0 was designed so that backward compatibility with popular HTML browsers is possible; at the same time, newer browsers can process it according to the strict XML rules.
  • The current version, XHTML 1.1, drops the deprecated elements and attributes of the Transitional and Frameset variants that directly affect the presentation of the document. The scope of the language largely corresponds to XHTML 1.0 Strict, plus elements for Ruby annotations. XHTML 1.1 is not designed to be compatible with HTML browsers.
  • XHTML Basic is designed for minimal devices such as mobile phones and handhelds and uses only some of the language components (modules) of XHTML. XHTML Basic is the basis for the XHTML Mobile Profile (see WAP 2.0) and for WML 2.0.
  • Modularization has produced further combined versions, such as XHTML 1.1 plus MathML plus SVG.
  • XHTML 2.0, whose development was stopped at the end of 2009 in favor of HTML5, would have broken with the legacy of HTML 4 and introduced fundamental changes.
    Important innovations would have included a simplified hyperlink notation not restricted to particular elements, simpler embedding of other media types (e.g. graphics and videos), expanded options for ensuring accessibility, and more sophisticated specification of metadata. In XHTML 2.0, core functions of HTML and XHTML would have been moved out to other XML languages: XForms for forms, XML Events for integrating scripts, and XFrames for frames.

The main differences between HTML and XHTML

  • Case of element and attribute names: irrelevant in HTML (e.g. <br>, <Br>, <BR>); always lowercase in XHTML (only <br />).
  • Elements without content, e.g. br: in HTML <br>; in XHTML, depending on the DTD, either an empty-element tag (e.g. <br />) or a start tag with an end tag (e.g. <br></br>). The variant <br /> is recommended for compatibility reasons.
  • Omitting start or end tags: partly allowed in HTML; in XHTML both must always be given.
  • Quotation marks around attribute values: optional in HTML as long as the value does not contain certain characters; always required in XHTML.
  • Boolean attributes, e.g. checked: in HTML <input type="radio" checked>; in XHTML the attribute name is given as the attribute value, e.g. <input type="radio" checked="checked" />.
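The rows of this comparison can be checked mechanically with an XML parser. The following minimal Python sketch assumes the standard xml.etree.ElementTree module as a stand-in for a browser's XML parser and tests one fragment per row for well-formedness; it checks only XML syntax, not validity against an XHTML DTD. The helper parse_or_none is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_or_none(markup):
    """Return the parsed root element, or None if the markup is not well-formed XML."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None


# One fragment per rule in the comparison (HTML habits vs. XHTML syntax).
CASES = [
    ("<br>", "empty element written HTML-style, never closed"),
    ("<br />", "empty-element tag"),
    ("<br></br>", "start tag plus end tag"),
    ("<br></BR>", "end tag differs in case from start tag"),
    ('<img src=bild.gif alt="Bildmotiv" />', "unquoted attribute value"),
    ('<input type="radio" checked />', "minimized boolean attribute"),
    ('<input type="radio" checked="checked" />', "boolean attribute, XHTML style"),
]

if __name__ == "__main__":
    for markup, label in CASES:
        verdict = "well-formed" if parse_or_none(markup) is not None else "rejected"
        print(f"{verdict:12} {label}: {markup}")
```

An uppercase tag such as <BR /> is still well-formed XML, but the parser reports it as an element named BR, which is a different element from br and is not defined by any XHTML DTD.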

Also:

  • The start tag of the root element html must always contain the XHTML namespace declaration: <html xmlns="http://www.w3.org/1999/xhtml">
  • In XHTML 1.1, the lang attribute was replaced by XML's xml:lang attribute. XHTML 1.0 recommends specifying both attributes, e.g. <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">.
  • In XHTML 1.0, the id attribute takes over the role of the name attribute on the elements a, frame and map. If backward compatibility is desired, both the name and the id attribute should be given with the same value, and XHTML 1.0 Transitional should be declared. In XHTML 1.1 and XHTML modularization, these elements no longer have a name attribute.
  • The name attribute on the elements form and img is only available in XHTML 1.0 Transitional, not in XHTML 1.0 Strict or XHTML 1.1. This restriction is particularly relevant for DOM access to these elements.
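How an XML parser sees the namespace declaration and the two language attributes can be sketched in Python, assuming the standard xml.etree.ElementTree module. It reports element names in the form {namespace-URI}local-name and resolves the predefined xml prefix to http://www.w3.org/XML/1998/namespace, while the plain lang attribute stays unqualified. The function describe_root is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to the xml: prefix

ROOT_DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
  <head><title>Beispiel</title></head>
  <body><p>Ein Absatz</p></body>
</html>"""


def describe_root(markup):
    """Return (root is the XHTML html element?, xml:lang value, lang value)."""
    root = ET.fromstring(markup)
    is_xhtml_root = root.tag == f"{{{XHTML_NS}}}html"
    return is_xhtml_root, root.get(f"{{{XML_NS}}}lang"), root.get("lang")


if __name__ == "__main__":
    print(describe_root(ROOT_DOC))                           # (True, 'de', 'de')
    print(describe_root('<html lang="de"><body /></html>'))  # (False, None, 'de')
```

Without the xmlns declaration, the root is an element html in no namespace, which an XML-based XHTML processor does not recognize as XHTML.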

Example

This is the source text of a standards-compliant HTML document. The example is deliberately kept as short as possible to show differences in the permitted syntax. Even in HTML, however, it is advisable to write out all necessary elements in full.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<head>
  <title>Beispiel</title>

<h1>Beispielseite</h1>
<p>Ein Absatz
<p>Noch ein<br>
Absatz
<ol>
  <li>Listelement
  <li>Listelement
</ol>
<p><img src=bild.gif alt="Bildmotiv">
</body>

The html element was omitted entirely, the start or end tag is missing for the head and body elements, and the p and li elements were not closed. The src attribute of the image is given without quotation marks.

The same document as valid XHTML 1.1 could look like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de">
  <head>
    <title>Beispiel</title>
  </head>
  <body>
    <h1>Beispielseite</h1>
    <p>Ein Absatz</p>
    <p>Noch ein<br />
    Absatz</p>
    <ol>
      <li>Listelement</li>
      <li>Listelement</li>
    </ol>
    <p>
      <img src="bild.gif" alt="Bildmotiv" />
    </p>
  </body>
</html>

The XML declaration <?xml version="1.0" encoding="UTF-8" ?> is optional, but the W3C recommends it because it tells XML parsers the character encoding of the document. The encoding name UTF-8 should be written in capital letters. At first glance, this seems to contradict the XHTML principle of writing all element and attribute names in lowercase. However, UTF-8 is the official name registered with the IANA, and XML parsers should match encoding names case-insensitively. If the encoding declaration is missing and no encoding is sent in the HTTP header, the browser must, according to the XML standard, use UTF-8 or UTF-16.
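The encoding rules can be tried out with Python's xml.etree.ElementTree, as a sketch assuming that this parser behaves like a browser's XML parser in this respect. It decodes the same document identically whether the encoding name is written UTF-8 or utf-8, and without a declaration it falls back to UTF-8, or to UTF-16 when a byte order mark is present. The helper parse_paragraph and the sample texts are hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT = "Grüße"  # non-ASCII content, so a wrong decoding would be visible


def parse_paragraph(data: bytes) -> str:
    """Parse a one-element document given as raw bytes and return its text."""
    return ET.fromstring(data).text


SAMPLES = {
    # Encoding name in capitals, as recommended.
    "declared UTF-8": f'<?xml version="1.0" encoding="UTF-8"?><p>{TEXT}</p>'.encode("utf-8"),
    # Same name in lower case: matched case-insensitively.
    "declared utf-8": f'<?xml version="1.0" encoding="utf-8"?><p>{TEXT}</p>'.encode("utf-8"),
    # No declaration: the parser assumes UTF-8.
    "no declaration, UTF-8": f"<p>{TEXT}</p>".encode("utf-8"),
    # No declaration, UTF-16 with byte order mark: detected from the BOM.
    "no declaration, UTF-16": f"<p>{TEXT}</p>".encode("utf-16"),
}

if __name__ == "__main__":
    for label, data in SAMPLES.items():
        print(f"{label}: {parse_paragraph(data)}")
```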

Including the XML declaration causes Internet Explorer 6 and Opera 7.0 to 7.03 to switch into so-called quirks mode, which leads to peculiarities when processing style sheets and JavaScript. For this reason, the XML declaration is often omitted when the document is delivered to these browsers as text/html (see the following section on MIME types).

MIME types and HTML compatibility

When HTML and XHTML documents are transmitted, specific MIME types are declared, e.g. in the Content-Type header of emails and especially of HTTP:

  • According to RFC 2854, HTML documents should be sent with the MIME type text/html.
  • According to RFC 3236, an XHTML 1.0 document should normally be sent with the MIME type application/xhtml+xml. If the document follows the guidelines for backward compatibility, it may also be sent as text/html according to RFC 2854 and the XHTML 1.0 standard. Because popular software lacked XHTML support, this latter option is of particular importance.
  • Since XHTML 1.1 is not compatible with ordinary HTML browsers, such documents should, according to a W3C note, only be delivered as application/xhtml+xml. The same applies to the other descendants of XHTML modularization, such as XHTML Basic.

How a browser processes a document depends on its MIME type. Only when an XHTML document is declared with a content type such as application/xhtml+xml do XHTML-capable browsers use their XML parsers, so that the advantages of strict XHTML code, such as ease of processing, can be exploited. In that case, the document can only be displayed if it is well-formed XML. Many current browsers, including Mozilla, Mozilla Firefox, Google Chrome, Opera and Safari, support the MIME type application/xhtml+xml. The widespread Internet Explorer can only handle this MIME type from version 9.0 onward; older versions open a download dialog instead of displaying the document. Therefore text/html should be used unless the browser explicitly states in the Accept header of its request that it supports application/xhtml+xml. The server can evaluate this header to send the appropriate MIME type; if it also honors the wildcard */*, a version check is needed for older versions of Internet Explorer, because they send */* without actually supporting application/xhtml+xml.
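The server-side check can be sketched as a small Python function. It is a simplified, hypothetical helper (choose_media_type is not part of any web framework): it sends application/xhtml+xml only if the Accept header names that type explicitly with a quality value above zero, and deliberately does not treat */* as support, which sidesteps the Internet Explorer problem without a version check. Full HTTP content negotiation, which weighs the quality values of all listed types against each other, is not modeled.

```python
def choose_media_type(accept_header: str) -> str:
    """Pick the Content-Type for an XHTML page from an HTTP Accept header.

    Returns application/xhtml+xml only if the client lists it explicitly
    with q > 0; a bare */* wildcard is NOT taken as XHTML support.
    """
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality value when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        if q > 0:
            return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(choose_media_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_media_type("image/gif, image/jpeg, */*"))
```

With a header that lists application/xhtml+xml, the first call selects it; a header consisting only of concrete image types and */* falls back to text/html.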

XHTML and layout

With HTML 4, the W3C began gradually removing from HTML those elements and attributes that were directly responsible for the presentation of the document and did not express any output-independent structure. Like HTML 4, XHTML 1.0 contains a Transitional variant with these outdated language components. In modern web design, however, it has become established practice to use the Strict variant and to format documents consistently with CSS, so that the structured content and the layout are defined separately. With XHTML 1.1 and the planned XHTML 2.0, the W3C intended to complete this development by allowing only output-independent markup, so that layout would necessarily be implemented with CSS or similar languages.

Extensions

Extensions to HTML

As an SGML language, HTML gives pages a precisely defined structure that is specified in the document type definition (DTD). Without knowledge of the DTD, however, the hierarchical tree structure of a document cannot be determined unambiguously. Some elements have no end tag (such as <br> for a line break) or an optional end tag (such as <p> for a paragraph). Only the DTD determines which elements these are; if the parser does not know it, the document hierarchy is ambiguous. XHTML, as an XML language, remedies this shortcoming.

HTML is not actually extensible, but common browsers behave as follows when processing HTML:

  • Markup with unknown elements is ignored.
  • If there are syntax errors, the browser tries to build a logical element tree anyway. It tries to make the best of a non-compliant page, i.e. a page is displayed in any case.

This makes it possible to process different HTML versions. If a version introduces a new element, older browsers simply ignore it. The same applies to attributes. For example, if a browser that supports HTML 3.2 does not know the acronym element for abbreviations, introduced in HTML 4.0, it skips the markup and the abbreviation appears in normal text formatting. The same applies to browser-specific extensions. The blink element, for example, is not part of any HTML standard. Some browsers, originally only Netscape Navigator, display its text flashing; other browsers display the text normally.

Extensions to XHTML

Unlike HTML, XHTML was designed with extensibility in mind. It uses XML's namespace concept for this purpose: an XHTML version forms one namespace, and other XML languages such as MathML, SVG and RDF form further namespaces. Elements from other namespaces can be used in an XHTML document by declaring the corresponding namespace with the xmlns attribute. For this, a special doctype that defines these elements must be used; for MathML it is:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
"http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd">

One example of using the namespace concept to extend XHTML is embedding MathML:

<p>Dies ist noch ganz normales XHTML</p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <msub>
      <mi>x</mi>
      <mn>1,2</mn>
    </msub>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mrow>
          <mo>-</mo>
          <mi>b</mi>
        </mrow>
        <mo>&PlusMinus;</mo>
        <msqrt>
          <mrow>
            <msup>
              <mi>b</mi>
              <mn>2</mn>
            </msup>
            <mo>-</mo>
            <mrow>
              <mn>4</mn>
              <mo>&InvisibleTimes;</mo>
              <mi>a</mi>
              <mo>&InvisibleTimes;</mo>
              <mi>c</mi>
            </mrow>
          </mrow>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn>
        <mo>&InvisibleTimes;</mo>
        <mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>
</math>
<p>...und hier geht XHTML weiter</p>

A MathML-capable browser could display this document section as follows:

This is still ordinary XHTML

$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

... and here XHTML continues

Extensions are therefore possible by creating new namespaces, without changing the XHTML standards themselves. Namespaces rule out conflicts between elements with the same name from different extensions: each element can always be assigned unambiguously and, for example, addressed via the DOM using its namespace identifier. The extended XHTML versions resulting from XHTML modularization are based on this concept.
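Addressing elements by namespace via the DOM can be sketched with Python's xml.dom.minidom, which implements the DOM method getElementsByTagNameNS. In the hypothetical document below, both XHTML and SVG define an element called title; queried by namespace URI, each is found separately, while a query by name alone returns both. The document content and the helper titles_in are illustrative assumptions.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical mixed document: XHTML and SVG both define an element "title".
MIXED_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Beispiel</title></head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
      <title>Kreis</title>
      <circle cx="5" cy="5" r="4" />
    </svg>
  </body>
</html>"""


def titles_in(namespace_uri, markup=MIXED_DOC):
    """Return the text of every title element in the given namespace."""
    document = minidom.parseString(markup)
    return [node.firstChild.data
            for node in document.getElementsByTagNameNS(namespace_uri, "title")]


if __name__ == "__main__":
    print(titles_in(XHTML_NS))  # ['Beispiel']
    print(titles_in(SVG_NS))    # ['Kreis']
    # Matching by name alone cannot tell the two apart:
    print(len(minidom.parseString(MIXED_DOC).getElementsByTagName("title")))  # 2
```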

The emergence of such extensions creates a situation similar to that of HTML extensions, because not all browsers support embedded extensions such as SVG. A browser has the following options for dealing with elements from unknown namespaces:

  • It can ignore the markup of such elements and simply display their text content (as with HTML).
  • It can ignore all elements of the unknown namespace together with their text content.
  • It can try to load a plug-in for the extension from the web and then display the page correctly.
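The first two options can be sketched as a text-extraction function in Python, assuming xml.etree.ElementTree and a hypothetical renderer that knows only the XHTML namespace; urn:example:unknown stands in for any namespace the renderer does not recognize, and visible_text is an illustrative helper. The third option, loading a plug-in, is not modeled.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical paragraph containing elements from an unrecognized namespace.
SAMPLE = ('<p xmlns="http://www.w3.org/1999/xhtml">a '
          '<x:foo xmlns:x="urn:example:unknown">b <x:bar>c</x:bar></x:foo> d</p>')


def namespace_of(elem):
    """Return the namespace URI of an element, or '' if it has none."""
    if elem.tag.startswith("{"):
        return elem.tag[1:].partition("}")[0]
    return ""


def visible_text(elem, known=frozenset({XHTML_NS}), keep_unknown_text=True):
    """Collect the text a simple renderer would show for elem.

    keep_unknown_text=True: ignore only the markup of unknown elements (option 1).
    keep_unknown_text=False: drop unknown elements with their content (option 2).
    """
    if namespace_of(elem) not in known:
        return "".join(elem.itertext()) if keep_unknown_text else ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(visible_text(child, known, keep_unknown_text))
        parts.append(child.tail or "")  # tail text belongs to the parent, so keep it
    return "".join(parts)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(repr(visible_text(root)))                           # 'a b c d'
    print(repr(visible_text(root, keep_unknown_text=False)))  # 'a  d'
```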

References

  1. RFC 3236.
  2. Bill Wilder: Is “UTF-8” case-sensitive in XML declaration? In: blog.codingoutloud.com. Retrieved October 5, 2019.
  3. iana.org.
  4. w3.org.
  5. XHTML media type test - results. w3.org, March 9, 2006. Retrieved April 3, 2019.
  6. Jens Oliver Meiert: XHTML and the right MIME type. meiert.com, April 5, 2006. Retrieved April 3, 2019.

Web links

Wikibooks: Website development: XHTML  - learning and teaching materials
