Extensible Hypertext Markup Language
|File extension:||
|MIME type:||application/xhtml+xml|
|Developed by:||World Wide Web Consortium|
|Extended from:||XML, HTML|
|Standard(s):||1.0 (Recommendation)|
The W3C standard Extensible Hypertext Markup Language (extensible HTML; abbreviation XHTML) is a text-based markup language for structuring and semantically marking up content such as text, images and hyperlinks in documents. It is a reformulation of HTML 4.01 in XML: in contrast to HTML, which was defined using SGML, XHTML uses the stricter and easier-to-parse SGML subset XML as its language basis. XHTML documents therefore satisfy the syntax rules of XML.
XHTML 1.0: Transition from HTML to XHTML
XHTML 1.0 contains all elements of HTML 4.01, so converting HTML 4.01 compliant pages to XHTML 1.0 is easy. A web browser without XHTML support can nevertheless display XHTML documents correctly under certain conditions (see MIME types and HTML compatibility): it processes them as normal HTML. This takes advantage of the fact that the HTML parsers of the popular browsers are tolerant of syntax errors. This fault tolerance arose because numerous HTML documents on the World Wide Web did not meet the formal standard and users found browser messages about HTML syntax errors annoying. For XHTML, however, the basic XML idea of uncomplicated data exchange and problem-free automated processing applies. As a result, programs that process XHTML are no longer as tolerant.
Newer XHTML document types progressively dispense with layout markup. XHTML 1.0 Transitional is the last document type that still contains the full range of layout elements. More modern document types such as XHTML 1.0 Strict still contain a few layout elements like <b>, but only for reasons of backward compatibility with the Transitional document types. Finally, in XHTML Basic and XHTML 2, layout elements are no longer included. The visual design of XHTML elements should be specified exclusively through external CSS rules.
To enable the development of languages based on XHTML, related elements in XHTML 1.1 have been grouped into so-called modules. Based on these modules, which are defined in DTDs and in the future in XML Schema, authors can assemble their own XHTML document types according to the modular principle and mix them with other XML-based languages. Example applications of XHTML modularization are XHTML 1.1, XHTML Basic and the combinations with SMIL (multimedia), SVG (vector graphics) and MathML (mathematical typesetting). The object module is used to integrate general objects such as multimedia plug-ins.
XHTML is an umbrella term for the various XHTML versions:
- XHTML 1.0 is the XML-based reformulation of HTML 4.01. It contains the three well-known document types Strict, Transitional and Frameset. XHTML 1.0 was designed so that backward compatibility with the popular HTML browsers is possible. At the same time, newer browsers can process it according to the strict XML rules.
- XHTML 1.1, the current version, drops the deprecated elements and attributes of the Transitional and Frameset variants that directly influence the presentation of the document. The scope of the language largely corresponds to XHTML 1.0 Strict, plus elements for ruby annotations. XHTML 1.1 is not designed to be compatible with HTML browsers.
- XHTML Basic is designed for devices with limited capabilities such as cell phones and handhelds and uses only some of the language components (modules) of XHTML. XHTML Basic is the basis for the XHTML Mobile Profile (see WAP 2.0) and for WML 2.0.
- Modularization resulted in further mixed versions, such as XHTML 1.1 plus MathML plus SVG.
- XHTML 2.0, whose development was stopped in favor of HTML5 at the end of 2009, would have broken with the legacy of HTML 4 and introduced fundamental changes.
Important innovations would have been the simplified, unrestricted notation of hyperlinks, the simplified integration of other media types (e.g. graphics and videos), expanded options for ensuring accessibility, and a more sophisticated specification of metadata. Former core functions of HTML and XHTML would have been moved to other XML languages in XHTML 2.0, namely XForms for forms, XML Events for the integration of scripts and XFrames for frames.
The main differences between HTML and XHTML
|Difference||HTML||XHTML|
|Case of element and attribute names||not relevant (e.g. <P> or <p>)||always lowercase (only <p>)|
|Elements without content, e.g. <br>||start tag without end tag (<br>)||either an empty-element tag (<br />) or a start tag immediately followed by an end tag (<br></br>)|
|Start or end tag||omission is permitted in some cases||always specify both|
|Attribute value in quotation marks||optional as long as the attribute value does not contain certain characters||always|
|Boolean attributes, e.g. checked||attribute name alone is sufficient (checked)||specify the attribute name as the attribute value, e.g. checked="checked"|
- The start tag of the root element html must always contain the namespace declaration for XHTML: <html xmlns="http://www.w3.org/1999/xhtml">.
- In XHTML 1.1, the lang attribute was replaced by the xml:lang attribute of XML. XHTML 1.0 recommends specifying both attributes with the same value, e.g. <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">.
- From XHTML 1.0 onward, the id attribute takes over the role of the name attribute in the elements a, applet, form, frame, iframe, img and map. If backward compatibility is desired, both the name and the id attribute should be specified with the same value, and XHTML 1.0 Transitional should be declared. In XHTML 1.1 and XHTML modularization, these elements no longer have a name attribute.
- The name attribute for the elements form and img is only available in XHTML 1.0 Transitional, not in XHTML 1.0 Strict or XHTML 1.1. This restriction is particularly relevant for DOM access to these elements.
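The syntax rules above can be checked mechanically, because every one of the HTML-style shortcuts violates XML well-formedness. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are illustrative assumptions, not part of any XHTML specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Each pair holds an HTML-style notation and its XHTML-style counterpart.
EXAMPLES = {
    "mixed-case tags": ("<P>text</p>", "<p>text</p>"),
    "empty element": ("<p>line<br>break</p>", "<p>line<br />break</p>"),
    "omitted end tags": ("<ul><li>one<li>two</ul>",
                         "<ul><li>one</li><li>two</li></ul>"),
    "unquoted attribute": ('<img src=logo.png alt="Logo" />',
                           '<img src="logo.png" alt="Logo" />'),
    "boolean attribute": ('<input type="checkbox" checked />',
                          '<input type="checkbox" checked="checked" />'),
}

for rule, (html_style, xhtml_style) in EXAMPLES.items():
    print(f"{rule}: HTML style well-formed = {is_well_formed(html_style)}, "
          f"XHTML style well-formed = {is_well_formed(xhtml_style)}")
```

Note that this only tests well-formedness. A fragment such as <P>text</P> is well-formed XML but still not valid XHTML, because the document type defines only lowercase names; detecting that requires validation against the DTD, which this sketch does not attempt.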
Consider the source text of a standards-compliant HTML document that is deliberately kept as short as possible in order to show the differences in permitted syntax. (In practice, it is advisable to write out all necessary elements in full, even in HTML.) In such a document, the html element can be omitted entirely, some start or end tags are missing, and the li elements are not closed. The src attribute of the image is written without quotation marks.
The same document as valid XHTML 1.1 must contain the html element with its namespace declaration, close every element and enclose every attribute value in quotation marks.
The XML declaration <?xml version="1.0" encoding="UTF-8" ?> is optional, but is recommended by the W3C because it tells XML parsers the character encoding of the document. The encoding name UTF-8 should be written in capital letters. At first glance, this contradicts the XHTML principle of writing all elements and attributes in lowercase. However, this is the official name registered with the IANA, and the XML parser may ignore case when interpreting it. If the encoding declaration is missing and no encoding is sent in the HTTP header, the browser must assume UTF-8 or UTF-16 in accordance with the XML standard.
[…] is delivered to this browser as text/html (see the following section on MIME types).
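The effect of the encoding declaration can be illustrated with a small sketch. It uses Python's xml.etree.ElementTree, whose underlying parser falls back to UTF-8 when no declaration is present; the helper text_of and the sample documents are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

BODY = "<p>Gr\u00fc\u00dfe</p>"  # contains the non-ASCII characters "ü" and "ß"


def text_of(document: bytes):
    """Parse the document and return its text, or None if it is rejected."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# The encoding name is matched without regard to case.
utf8_upper = b'<?xml version="1.0" encoding="UTF-8"?>' + BODY.encode("utf-8")
utf8_lower = b'<?xml version="1.0" encoding="utf-8"?>' + BODY.encode("utf-8")

# Without a declaration the parser assumes UTF-8, so UTF-8 bytes still work ...
utf8_undeclared = BODY.encode("utf-8")

# ... but ISO-8859-1 bytes are only readable if the declaration names them.
latin1_declared = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
                   + BODY.encode("iso-8859-1"))
latin1_undeclared = BODY.encode("iso-8859-1")

for label, doc in [("UTF-8 declared", utf8_upper),
                   ("utf-8 declared", utf8_lower),
                   ("UTF-8 undeclared", utf8_undeclared),
                   ("ISO-8859-1 declared", latin1_declared),
                   ("ISO-8859-1 undeclared", latin1_undeclared)]:
    print(label, "->", text_of(doc))
```

The last case is rejected because the bytes are not valid UTF-8, which is why a declaration (or an HTTP header carrying the encoding) is needed for anything other than UTF-8 or UTF-16.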
MIME types and HTML compatibility
- According to RFC 2854, HTML documents should be sent with the MIME type text/html.
- An XHTML 1.0 document should normally be sent with the MIME type application/xhtml+xml according to RFC 3236. If the document adheres to the backward compatibility guidelines, it can be sent as text/html according to RFC 2854 and the XHTML 1.0 standard. Because popular software lacks XHTML support, the latter option is of particular importance.
- Since XHTML 1.1 is not compatible with normal HTML browsers, such documents should only be delivered as application/xhtml+xml, according to a W3C Note. The same applies to the other descendants of XHTML modularization, such as XHTML Basic.
How browsers process the document depends on the MIME type. Only when an XHTML document is declared with the content type application/xhtml+xml, for example, do XHTML-capable browsers use their XML parsers, which exploit the advantages of strict XHTML code such as ease of processing. In that case, the document can only be displayed if it is well-formed XML. Many current browsers, including Mozilla, Mozilla Firefox, Google Chrome, Opera and Safari, support the MIME type application/xhtml+xml. The widespread Internet Explorer can only handle this MIME type from version 7.0: older versions open a download dialog instead of displaying the document. Therefore, text/html should be used unless the browser has explicitly stated in the request header that it supports application/xhtml+xml. This can be determined on the server side in order to send the appropriate MIME type. For Internet Explorer from version 7.0, however, a version check would have to be carried out, since IE still sends only */* as its accepted MIME type.
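The server-side decision described above can be sketched as a small function that inspects the Accept request header. This is a simplified illustration under stated assumptions: the function names parse_accept and choose_content_type are hypothetical, the parser handles only media ranges and q-values, and it does not include the browser version check mentioned in the text.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def parse_accept(header: str) -> dict:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    ranges = {}
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges[media_range] = q
    return ranges


def choose_content_type(accept_header: str) -> str:
    """Send application/xhtml+xml only if the client names it explicitly."""
    ranges = parse_accept(accept_header or "")
    if ranges.get(XHTML_TYPE, 0.0) > 0.0:
        return XHTML_TYPE
    # A bare wildcard such as */* is deliberately not taken as XHTML support.
    return HTML_TYPE


print(choose_content_type(
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("*/*"))
```

Treating the wildcard as insufficient is the conservative choice: a client that sends only */* falls back to text/html, which avoids the download dialog described above.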
XHTML and layout
With HTML 4, the W3C began to gradually remove from HTML those elements and attributes that directly controlled the presentation of the document and expressed no output-independent structure. Like HTML 4, XHTML 1.0 contains a transitional variant with these outdated language components. In modern web design, however, it has become established practice to use the strict variant and to format documents consistently with CSS. The structured content and its layout can thus be defined separately. With XHTML 1.1 and the planned XHTML 2.0, the W3C wanted to conclude this development by allowing only output-independent text markup, so that the layout has to be implemented with CSS or similar languages.
Enhancements to HTML
As an SGML application, HTML has a precisely defined structure that is specified in the document type definition (DTD). Without knowledge of the DTD, however, the hierarchical tree structure of a document cannot be determined unambiguously. Some elements have no end tag (such as <br> for a line break) or an optional end tag (such as <p> for a paragraph of text). Only the DTD determines which elements these are. If the parser does not know the DTD, the document hierarchy is ambiguous. XHTML as an XML language remedies this shortcoming.
HTML is not actually extensible, but common browsers proceed as follows when processing HTML:
- Markup by unknown elements is ignored.
- If there are syntactic errors, the browser still tries to build a logical element tree. It tries to make the best of a non-compliant page, i.e. a page is displayed in any case.
This enables the processing of different HTML versions. If a version introduces a new element, older browsers simply ignore it. The same applies to attributes. For example, if an HTML 3.2-capable browser does not know the acronym element for abbreviations introduced in HTML 4.0, the markup is skipped and the abbreviation appears in normal text formatting. The same applies to browser-specific extensions. The blink element, for example, is not included in any HTML standard. Some browsers, originally only Netscape Navigator, display its text blinking. Other browsers display the text normally.
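The tolerant behaviour described above can be imitated with Python's html.parser module, which, like browser HTML parsers, does not reject unknown or unclosed elements. The set KNOWN_ELEMENTS is a hypothetical stand-in for the vocabulary of an older browser, and the class is only a sketch of the "ignore the markup, keep the text" strategy, not a model of any real browser.

```python
from html.parser import HTMLParser

# Hypothetical subset of elements that an older browser "knows".
KNOWN_ELEMENTS = {"html", "head", "title", "body", "p", "b", "i", "a"}


class TolerantTextRenderer(HTMLParser):
    """Collect all text and merely record which unknown elements were seen."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        # Unknown markup is noted but does not stop processing.
        if tag not in KNOWN_ELEMENTS:
            self.ignored.append(tag)

    def handle_data(self, data):
        # Text content is kept regardless of the enclosing element.
        self.text_parts.append(data)


def render(markup: str):
    """Return the visible text and the list of ignored element names."""
    renderer = TolerantTextRenderer()
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.text_parts), renderer.ignored


text, ignored = render(
    '<p>The <acronym title="World Wide Web">WWW</acronym> '
    'is <blink>big</blink></p>')
print(text)
print(ignored)
```

The same call also accepts input with missing end tags, such as '<p>one<p>two', without raising an error, which mirrors the second rule in the list above.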
Extensions to XHTML
Unlike HTML, XHTML was created with extensibility in mind. XHTML uses the namespace concept of XML for this. Each XHTML version forms such a namespace. Other XML languages such as MathML, SVG and RDF form further namespaces. Elements from other namespaces can be used in an XHTML document by specifying the corresponding namespace with the xmlns attribute. For validation, a special doctype that also defines these elements must be used; such a doctype exists, for example, for the combination with MathML.
An example of using the namespace concept to extend XHTML is the embedding of MathML, which a MathML-capable browser can display as a typeset formula.
Extensions are therefore possible by creating new namespaces without having to change the XHTML standards themselves. By using namespaces, a conflict of elements with the same name in different extensions is excluded. These can always be clearly assigned and, for example, addressed via the DOM with the identifier of the namespace. The extended XHTML versions resulting from the XHTML modularization are based on this concept.
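How elements are assigned to namespaces can be shown with a short sketch based on Python's xml.etree.ElementTree, which reports every element name in the form {namespace}local-name. The sample document and the helper split_name are illustrative assumptions; the two namespace URIs are the ones defined for XHTML and MathML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Embedded MathML</title></head>
  <body>
    <p>A formula:
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <msup><mi>x</mi><mn>2</mn></msup>
      </math>
    </p>
  </body>
</html>"""


def split_name(tag: str):
    """Split ElementTree's '{namespace}local' notation into its two parts."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


root = ET.fromstring(DOCUMENT)

# Group the local names of all elements by the namespace they belong to.
by_namespace = {}
for element in root.iter():
    namespace, local = split_name(element.tag)
    by_namespace.setdefault(namespace, []).append(local)

for namespace, names in by_namespace.items():
    print(namespace, names)

# Elements can be addressed unambiguously via their namespace identifier.
math = root.find(f".//{{{MATHML_NS}}}math")
print(math is not None)
```

Because the namespace is part of every element's identity, an element named title in some other vocabulary could not be confused with the XHTML title element.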
The emergence of such extensions creates a situation similar to that of HTML extensions, because not all browsers support the embedded extensions, as is the case with SVG. A browser has the following options for dealing with elements from unknown namespaces:
- It can ignore the markup of such elements and simply display the text content (as with HTML).
- It can ignore all elements of the unknown namespace together with their text content.
- It can try to load a plug-in for the extension from the web and then display the page correctly.
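The first two options in the list above can be compared in a minimal sketch. It assumes a processor that knows only the XHTML namespace; the function render_text and its keep_unknown_text switch are hypothetical names, and the plug-in option is not modelled.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

SNIPPET = (
    '<p xmlns="http://www.w3.org/1999/xhtml">The value '
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
    ' is unknown.</p>'
)


def namespace_of(element) -> str:
    """Return the namespace URI from ElementTree's '{namespace}local' tag."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def render_text(element, known_namespaces, keep_unknown_text: bool) -> str:
    """Collect text, either keeping or dropping content of unknown namespaces."""
    parts = []
    if namespace_of(element) in known_namespaces or keep_unknown_text:
        parts.append(element.text or "")
        for child in element:
            parts.append(render_text(child, known_namespaces, keep_unknown_text))
            # Text following a child belongs to the parent and is always kept.
            parts.append(child.tail or "")
    return "".join(parts)


def normalize(text: str) -> str:
    """Collapse runs of whitespace, as a renderer would for display."""
    return " ".join(text.split())


root = ET.fromstring(SNIPPET)
known = {XHTML_NS}

# Option 1: ignore the unknown markup but show its text content.
print(normalize(render_text(root, known, keep_unknown_text=True)))
# Option 2: ignore the unknown elements together with their text content.
print(normalize(render_text(root, known, keep_unknown_text=False)))
```

The two calls show the practical difference: with the first option the formula's raw text still appears in the sentence, with the second it disappears entirely.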
- RFC 3236
- Bill Wilder: Is "UTF-8" case-sensitive in XML declaration? In: blog.codingoutloud.com. Retrieved October 5, 2019.
- XHTML media type test - results. w3.org, March 9, 2006. Retrieved April 3, 2019.
- Jens Oliver Meiert: XHTML and the right MIME type. meiert.com, April 5, 2006. Retrieved April 3, 2019.
Specifications related to XHTML
- XHTML itself
- HTML 4.01 (German translation)
- XHTML 1.0 (German translation)
- Modularization of XHTML (German translation)
- Descendants of XHTML modularization
- XHTML 1.1 (German translation)
- XHTML Basic (German translation)
- XHTML + MathML + SVG profiles
- XHTML + RDFa
- XHTML + SMIL profiles
- XHTML 2.0
- Basics for XHTML
XHTML tutorials and tools
- Introduction to XHTML, CSS and web design
- Specialist article in T3N magazine (archived September 28, 2007 in the Internet Archive): XHTML2: From XML hype to application (PDF; 252 kB)
- XHTML overview by Jens Meiert
- HTML and XHTML Frequently Answered Questions
- XHTML 1.0 Schema Validator for checking the syntax of an XHTML document
- W3C Markup Validation Service, also for checking for syntactic errors (English)