Document type definition
A document type definition (DTD) is a set of rules used to declare documents of a particular type. A document type is a class of similar documents, such as telephone books or inventory records. A DTD consists of declarations of element types, attributes of elements, entities and notations. Concretely, it specifies the structure of the document: the order and nesting of the elements and the type of content of the attributes.
The syntax and semantics of DTDs are part of the SGML and XML specifications. In SGML, a DTD must be assigned to every document; in XML, this assignment is optional. If a document declares a DTD (<?xml version="1.0"?><!DOCTYPE ...>), a validating parser checks while reading it that the document conforms to the DTD, i.e. to the intended syntax and semantics. The emphasis here is on the correctness of the data. A document without an external or internal DTD is only checked for well-formedness when it is read. The emphasis here is on fast processing, and the content may deviate from the desired syntax and semantics. In both cases, the quality of the data can also be checked afterwards by additional processes.
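As an illustration of the difference between well-formedness and validity, the following minimal Python sketch uses the standard-library module xml.etree.ElementTree, which is built on the non-validating parser expat. The helper name is_well_formed is hypothetical. Malformed markup is rejected, but an element that the internal DTD does not declare is still accepted, because only well-formedness is checked; validating against a DTD requires a validating parser, which the Python standard library does not include.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses; expat checks well-formedness only."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Overlapping tags: not well-formed, so parsing fails.
print(is_well_formed("<hallo><b>Hallo</hallo></b>"))  # False

# <extra> is not declared in the internal DTD, so the document is not valid,
# yet a non-validating parser accepts it because it is well-formed.
print(is_well_formed(
    '<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>'
    "<hallo><extra/></hallo>"
))  # True
```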
Note: The examples below use XML syntax.
Document type declaration (DOCTYPE)
A document type declaration establishes the connection between a document and the DTD. The document type declaration is given at the beginning of a document before the root element. The DTD can be referenced as an external file (external DTD) or integrated directly into the document (internal DTD).
The syntax for a document type declaration in SGML and XML is:
<!DOCTYPE RootElement SYSTEM "datei.dtd">
<!DOCTYPE RootElement SYSTEM "datei.dtd" [ … ]>
<!DOCTYPE RootElement PUBLIC "Public Identifier" "datei.dtd">
<!DOCTYPE RootElement PUBLIC "Public Identifier" "datei.dtd" [ … ]>
<!DOCTYPE RootElement [ … ]>
In SGML, the following variants without a system identifier are also permitted:
<!DOCTYPE RootElement PUBLIC "Public Identifier">
<!DOCTYPE RootElement PUBLIC "Public Identifier" [ … ]>
In HTML5 there is no longer a DTD, but the document type declaration still exists in a shortened form:
<!DOCTYPE html>
The system identifier (SYSTEM), datei.dtd in the examples above, specifies the location of the external DTD. It can be given as any URI.
The public identifier (PUBLIC) contains a publicly known identifier for the DTD. For example, the identifier "-//W3C//DTD XHTML 1.0 Strict//EN" uniquely identifies the DTD for XHTML 1.0 Strict. If the identifier is known to the system, the system uses the associated DTD and does not load the DTD given by the system identifier. This avoids loading the DTD repeatedly, for example in web browsers.
The part in square brackets, [ … ], contains an internal DTD (the internal subset) or additions to an external DTD.
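The parts of a document type declaration can be inspected with Python's standard-library module xml.parsers.expat, whose StartDoctypeDeclHandler callback receives the root element name, the system identifier, the public identifier and a flag for an internal subset. The following sketch is illustrative only; read_doctype is a hypothetical helper. The external DTD is not downloaded, because pyexpat does not read the external subset unless this is explicitly enabled.

```python
import xml.parsers.expat


def read_doctype(xml_text):
    """Return (root, system_id, public_id, has_internal_subset), or None."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Absent identifiers are reported as None.
        found.append((name, system_id, public_id, bool(has_internal_subset)))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None


xhtml = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
         '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html/>')
print(read_doctype(xhtml))
print(read_doctype('<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]><hallo/>'))
```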
Within a DTD, the document structure is defined by declarations of element types, attribute lists, entities and notations, together with text blocks. Special parameter entities, which contain parts of the DTD, may also be used; they are only allowed within the DTD.
The structural elements (building blocks) are defined using attribute assignments:
CDATA (character data) identifies an unparsed text block. The syntax for a CDATA section is:
<![CDATA[ … ]]>
All characters are permitted inside a CDATA section except the string ]]>, which marks its end. Example:
<![CDATA[ if (a < b && b > c) ]]>
Within an entity definition in XML, the syntax is:
<!ENTITY amp "character data">
In SGML, the keyword CDATA must be specified explicitly:
<!ENTITY amp CDATA "character data">
Within the character data, all characters are allowed except the delimiter that ends the literal (the closing quotation mark). Example:
<!ENTITY amp CDATA "&">
The character data & is not analyzed by the parser.
The keyword #PCDATA is used for PCDATA (parsed character data). It marks a block of text that may also contain further instructions to the parser; the content of this block is analyzed syntactically by the parser. In contrast to CDATA, it may only contain characters that do not introduce tags, declarations or processing instructions. For example, the start character of a tag, <, is not allowed and must be written as &lt;.
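The difference between a CDATA section and parsed character data can be seen with Python's standard-library ElementTree parser. In this minimal sketch (the sample strings and the helper parses are invented for illustration), the characters < and & pass through a CDATA section unchanged, whereas in parsed character data they must be written as &lt; and &amp;; an unescaped < makes the document not well-formed.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, "<" and "&" are ordinary characters.
cdata = ET.fromstring("<code><![CDATA[if (a < b && b > c)]]></code>")

# In parsed character data (#PCDATA), the same text needs entity references.
pcdata = ET.fromstring("<code>if (a &lt; b &amp;&amp; b > c)</code>")

print(cdata.text == pcdata.text)  # True: both yield "if (a < b && b > c)"


def parses(xml_text):
    """Return True if xml_text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# An unescaped "<" in parsed character data is read as the start of a tag.
print(parses("<code>if (a < b)</code>"))  # False
```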
Element declarations (ELEMENT)
An element type declaration is used to define an element and its possible content. A valid document can only contain elements that are defined in the DTD.
The content of an element is specified using other element names together with the following keywords and characters:
- #PCDATA for character content (see PCDATA)
- EMPTY for no content
- ANY for any content
- , (comma) for a sequence: the elements must appear in the given order
- | for alternatives (in the sense of "either ... or")
- * for zero or more occurrences (in succession)
- + for at least one occurrence
- ? for zero or exactly one occurrence
- If no asterisk, plus sign or question mark is given, the element must appear exactly once.
<!ELEMENT html (head, body)>
<!ELEMENT hr EMPTY>
<!ELEMENT div (#PCDATA | p | ul | ol | dl | table | pre | hr | h1 | h2 | h3 | h4 | h5 | h6 | blockquote | address | fieldset)*>
<!ELEMENT dl (dt|dd)+>
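Because a content model is a regular expression over element names, checking the children of an element can be sketched by translating the model into a Python regular expression. This is a simplified illustration with hypothetical helper names: character data is ignored, mixed content of the form (#PCDATA | a | b)* is treated as "any of the listed elements in any order", and a validating parser checks more than this (for example, that no text appears in element-only content).

```python
import re

# Children are encoded as "name " tokens, e.g. ["head", "body"] -> "head body ".
TOKEN = re.compile(r"[^\s(),|*+?]+|[(),|*+?]")


def model_to_regex(model):
    """Translate an element-only model such as '(dt|dd)+' into a compiled regex."""
    parts = []
    for token in TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation
        elif token in {")", "|", "*", "+", "?"}:
            parts.append(token)
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model, children):
    """Check a list of child element names against a content model."""
    model = model.strip()
    if model == "EMPTY":
        return not children
    if model == "ANY":
        return True
    if model.startswith("(#PCDATA"):
        # Mixed content: listed elements in any order and number (text ignored).
        allowed = set(re.findall(r"[^\s()|*]+", model.replace("#PCDATA", "")))
        return all(child in allowed for child in children)
    encoded = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None


print(children_match("(head, body)", ["head", "body"]))  # True
print(children_match("(head, body)", ["body", "head"]))  # False
print(children_match("(dt|dd)+", ["dt", "dd", "dd"]))   # True
print(children_match("EMPTY", ["p"]))                    # False
```

Mapping ",", "|", "*", "+" and "?" directly onto regular-expression operators mirrors the list of operators above; only the element names need to be turned into tokens.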
Attribute declarations (ATTLIST)
The attributes of an element are declared in an attribute list of the form <!ATTLIST ElementName AttributeList>. For each attribute, the attribute list contains its name, its type and its default specification, separated by spaces or line breaks.
Examples of attribute types:
- CDATA
- ID
- IDREF and IDREFS
- NMTOKEN and NMTOKENS
- ENTITY and ENTITIES
- Enumerations (lists of allowed values) and NOTATION enumerations
The default specification states whether an attribute must occur (#REQUIRED), is optional (#IMPLIED) or always has a fixed value (#FIXED), and which value is used as the default if the attribute is not given in a tag.
|Default values for attributes|
|#REQUIRED|The attribute must be specified|
|#IMPLIED|The attribute is optional|
|"value"|Default value if the attribute is omitted|
|#FIXED "value"|The attribute always has a fixed value|
Example of an attribute declaration:
<!ATTLIST img
  id    ID    #IMPLIED
  src   CDATA #REQUIRED
  alt   CDATA #REQUIRED
  ismap IDREF #IMPLIED
>
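Python's pyexpat module reports every attribute declaration through its AttlistDeclHandler callback. The sketch below extends the img declaration with two hypothetical attributes, an enumerated align with a default value and a #FIXED lang, and shows how the default keywords described above are reported: #REQUIRED and #FIXED both set the required flag, and only #FIXED and plain defaults carry a value. The helper read_attlists is hypothetical.

```python
import xml.parsers.expat

# The img declaration from above plus two hypothetical attributes.
DOC = """<!DOCTYPE img [
<!ATTLIST img
  id    ID           #IMPLIED
  src   CDATA        #REQUIRED
  alt   CDATA        #REQUIRED
  ismap IDREF        #IMPLIED
  align (left|right) "left"
  lang  CDATA        #FIXED "de"
>
]>
<img src="image.gif" alt="Image"/>"""


def read_attlists(xml_text):
    """Map (element, attribute) to (type, default, required) as reported by expat."""
    decls = {}

    def on_attlist(elname, attname, att_type, default, required):
        # expat sets `required` for #REQUIRED and #FIXED; `default` is None
        # for #IMPLIED and #REQUIRED.
        decls[(elname, attname)] = (att_type, default, bool(required))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return decls


for key, value in read_attlists(DOC).items():
    print(key, value)
```

Expat only reports these declarations; as a non-validating parser it does not enforce #REQUIRED.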
Entity declarations (ENTITY)
An entity is a named abbreviation for a character string or an external document that can be used within the DTD or the document. A reference of the form &Name; is replaced by the declared content of the entity. (For general usage, see Entity (markup language).)
The replacement text of an entity is a string, which can itself contain entity references and well-formed markup:
<!ENTITY name "Benedikt">
<!ENTITY papst "&name;, der XVI.">
<!ENTITY wplink "<a href='http://de.wikipedia.org'>Wikipedia</a>">
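Python's ElementTree parser (based on expat) expands general entities declared in an internal DTD subset, so the declarations above can be tried out directly. In this sketch the root element text and the two p elements are hypothetical additions. The nested reference &name; is resolved, and the markup in wplink becomes a real a element.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE text [
<!ENTITY name "Benedikt">
<!ENTITY papst "&name;, der XVI.">
<!ENTITY wplink "<a href='http://de.wikipedia.org'>Wikipedia</a>">
]>
<text><p>&papst;</p><p>&wplink;</p></text>"""

root = ET.fromstring(DOC)
first, second = root.findall("p")

# &papst; expands to "&name;, der XVI.", whose &name; is expanded in turn.
print(first.text)

# The markup inside the replacement text of &wplink; is parsed as an element.
link = second.find("a")
print(link.get("href"), link.text)
```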
Entities can also stand for the content of a file. Such external entities are declared with a public or system identifier:
<!ENTITY kapitel1 SYSTEM "kapitel1.xml">
<!ENTITY wichtig PUBLIC "-//privat//WICHTIG//" "wichtig.xml">
For external entities, it can also be specified that the entity is unparsed (NDATA, i.e. data that is not XML or SGML). In this case, a notation must be given (here gif):
<!ENTITY bild SYSTEM "../grafiken/bild.gif" NDATA gif>
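The three kinds of general entities (internal, external parsed, unparsed) can be told apart with pyexpat's EntityDeclHandler: an internal entity has a value, an unparsed one names a notation, and an external parsed entity has only identifiers. In this sketch, the root element doc, the notation declaration for gif and the helper classify_entities are hypothetical. Expat does not open the referenced files, because the entities are declared but never used.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
<!ENTITY name "Benedikt">
<!ENTITY kapitel1 SYSTEM "kapitel1.xml">
<!ENTITY wichtig PUBLIC "-//privat//WICHTIG//" "wichtig.xml">
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY bild SYSTEM "../grafiken/bild.gif" NDATA gif>
]>
<doc/>"""


def classify_entities(xml_text):
    """Classify general entities as internal, external parsed or unparsed."""
    kinds = {}

    def on_entity(name, is_parameter_entity, value, base, system_id,
                  public_id, notation_name):
        if is_parameter_entity:
            return  # only general entities are classified here
        if value is not None:
            kinds[name] = "internal"
        elif notation_name is not None:
            kinds[name] = "unparsed, notation " + notation_name
        else:
            kinds[name] = "external parsed, " + system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.Parse(xml_text, True)
    return kinds


print(classify_entities(DOC))
```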
Notation declarations (NOTATION)
Notations provide information on how to interpret external data that is not processed directly by the parser. For example, a notation can refer to an image file format.
<!NOTATION DataType SYSTEM "URL">
<!NOTATION DataType PUBLIC "Identifier">
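An application that receives an unparsed entity typically looks up its notation to decide how to handle the data. The following sketch, using pyexpat's NotationDeclHandler and EntityDeclHandler, pairs each unparsed entity with the identifiers of its notation. The notation identifiers (a MIME type as system identifier for gif, an invented public identifier for png) and the helper name are hypothetical.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
<!NOTATION gif SYSTEM "image/gif">
<!NOTATION png PUBLIC "-//example//NOTATION PNG//EN">
<!ENTITY bild SYSTEM "../grafiken/bild.gif" NDATA gif>
]>
<doc/>"""


def unparsed_entities_with_notations(xml_text):
    """Return ({entity: (file, notation identifiers)}, {notation: identifiers})."""
    notations = {}  # notation name -> (system id, public id)
    entities = {}   # entity name -> (file, notation name)

    def on_notation(name, base, system_id, public_id):
        notations[name] = (system_id, public_id)

    def on_entity(name, is_parameter_entity, value, base, system_id,
                  public_id, notation_name):
        if notation_name is not None:
            entities[name] = (system_id, notation_name)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.Parse(xml_text, True)
    # Resolve after parsing: a notation may be declared after the entity using it.
    resolved = {name: (path, notations.get(notation))
                for name, (path, notation) in entities.items()}
    return resolved, notations


resolved, notations = unparsed_entities_with_notations(DOC)
print(resolved)
print(notations)
```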
NMTOKEN (name token) is related to a name (identifier) but follows more permissive naming rules. For example, an NMTOKEN may begin with a digit or a period, whereas the first character of a name must be a letter, an ideogram, an underscore or (in XML) a colon. Thus every name is also an NMTOKEN, but not the other way around.
Example of an NMTOKEN attribute:
<!ATTLIST birthdate year NMTOKEN #REQUIRED >
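The two naming rules can be approximated with two regular expressions. This sketch is restricted to ASCII characters, an assumption made for brevity (XML 1.0 also admits many Unicode letters and digits), and is_name and is_nmtoken are hypothetical helpers. It shows why a value such as 1984 is acceptable for the NMTOKEN attribute year but would not be a valid name.

```python
import re

# ASCII-only approximation of the XML 1.0 Name and Nmtoken productions.
NAME_START_CHAR = "[A-Za-z_:]"
NAME_CHAR = r"[A-Za-z0-9._:\-]"
NAME = re.compile(NAME_START_CHAR + NAME_CHAR + "*")
NMTOKEN = re.compile(NAME_CHAR + "+")


def is_name(value):
    """A name must start with a letter, underscore or colon."""
    return NAME.fullmatch(value) is not None


def is_nmtoken(value):
    """An NMTOKEN may start with any name character, e.g. a digit or period."""
    return NMTOKEN.fullmatch(value) is not None


for candidate in ["year", "1984", ".hidden", "two words"]:
    print(candidate, is_name(candidate), is_nmtoken(candidate))
```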
Parameter entities contain a named character string that can be referenced as %Name; in almost all places within a DTD (in the internal subset of an XML document, however, only between declarations, not inside them). In this way, for example, external files can be integrated into a DTD and recurring components can be abbreviated. Parameter entities are declared like general entities, but with a percent sign, followed by a space, in front of the entity name. Example:
<!ENTITY % datei SYSTEM "andere-datei.ent">
%datei;
<!ENTITY % foo.inhalt "(bar|doz)*">
<!ELEMENT foo %foo.inhalt;>
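Parameter-entity substitution is essentially textual replacement inside the DTD. The following Python sketch uses the hypothetical function expand_parameter_entities; it handles only internal parameter entities with double-quoted values, does not expand nested references, and leaves external ones such as %datei; untouched, whereas a real parser would also read the external file.

```python
import re

PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.:-]+)\s+"([^"]*)"\s*>')
PE_REF = re.compile(r"%([\w.:-]+);")


def expand_parameter_entities(dtd):
    """Replace %name; references with the values of internal parameter entities."""
    values = {}
    for name, value in PE_DECL.findall(dtd):
        values.setdefault(name, value)  # in XML, the first declaration is binding
    body = PE_DECL.sub("", dtd)         # remove the declarations themselves

    def substitute(match):
        value = values.get(match.group(1))
        if value is None:
            return match.group(0)       # unknown or external: leave unchanged
        # XML 1.0 pads a parameter-entity reference in the DTD with one space
        # on each side.
        return " " + value + " "

    return PE_REF.sub(substitute, body)


dtd = """<!ENTITY % datei SYSTEM "andere-datei.ent">
%datei;
<!ENTITY % foo.inhalt "(bar|doz)*">
<!ELEMENT foo %foo.inhalt;>"""
print(expand_parameter_entities(dtd))
```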
A conditional section is a construct used to switch declarations on or off. In XML, conditional sections are only allowed in the external DTD subset. Example:
<![INCLUDE[ <!ENTITY hallo "welt"> ]]>
This switches the declaration of the entity hallo on. Correspondingly, the following switches it off:
<![IGNORE[ <!ENTITY hallo "welt"> ]]>
In practice, conditional sections are rarely written with a fixed keyword as above; they are mostly used together with parameter entities:
<!ENTITY % weiche "INCLUDE">
<![%weiche;[ <!ENTITY hallo "welt"> ]]>
The parameter entity %weiche; is set to one of the two keywords INCLUDE or IGNORE. Depending on this value, the entity hallo is declared or not.
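In this way, a conditional section can be switched by redefining the parameter entity. Because the first declaration of an entity is binding, a declaration in the internal subset of a document overrides the one in the external DTD.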
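How INCLUDE and IGNORE act can be sketched as a text transformation on the DTD. The function resolve_conditional_sections below is hypothetical and simplified: it assumes that conditional sections are not nested and that every parameter entity used as a keyword is supplied in a dictionary.

```python
import re

# <![ keyword-or-%entity; [ ... ]]>, non-nested sections only.
COND = re.compile(r"<!\[\s*(%[\w.:-]+;|INCLUDE|IGNORE)\s*\[(.*?)\]\]>", re.S)


def resolve_conditional_sections(dtd, parameter_entities):
    """Keep the content of INCLUDE sections and drop IGNORE sections.

    parameter_entities maps names such as 'weiche' to 'INCLUDE' or 'IGNORE'.
    """
    def replace(match):
        keyword = match.group(1)
        if keyword.startswith("%"):
            keyword = parameter_entities[keyword[1:-1]]  # strip '%' and ';'
        keyword = keyword.strip()
        if keyword == "INCLUDE":
            return match.group(2)
        if keyword == "IGNORE":
            return ""
        raise ValueError("not a conditional-section keyword: " + keyword)

    return COND.sub(replace, dtd)


section = '<![%weiche;[ <!ENTITY hallo "welt"> ]]>'
print(resolve_conditional_sections(section, {"weiche": "INCLUDE"}))
```

Passing {"weiche": "IGNORE"} instead removes the declaration of hallo, which is the switch described above.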
Example of a short document with reference to an external DTD:
<?xml version="1.0"?>
<!DOCTYPE hallo SYSTEM "hallo.dtd">
<hallo>Hallo Welt!</hallo>
In the example, the pseudo-attribute standalone="no" can also be specified in the XML declaration (i.e. the external DTD is required):
<?xml version="1.0" standalone="no"?>
<!DOCTYPE hallo SYSTEM "hallo.dtd">
<hallo>Hallo Welt!</hallo>
The content of hallo.dtd:
<!ELEMENT hallo (#PCDATA)>
Short document with internal DTD:
<?xml version="1.0"?>
<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>
<hallo>Hallo Welt!</hallo>
In the example, the pseudo-attribute standalone="yes" can also be specified in the XML declaration (i.e. no external DTD is required):
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>
<hallo>Hallo Welt!</hallo>
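Even a non-validating parser reports the standalone pseudo-attribute. Python's pyexpat passes it to the XmlDeclHandler callback as 1 (yes), 0 (no) or -1 (not specified); read_standalone is a hypothetical helper for this sketch, and it returns None when there is no XML declaration at all.

```python
import xml.parsers.expat


def read_standalone(xml_text):
    """Return 1, 0 or -1 from the XML declaration, or None if it is missing."""
    result = []

    def on_xml_decl(version, encoding, standalone):
        result.append(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_text, True)
    return result[0] if result else None


print(read_standalone(
    '<?xml version="1.0" standalone="yes"?>'
    '<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>'
    "<hallo>Hallo Welt!</hallo>"
))
```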