XML Schema
XML Schema, or XSD (XML Schema Definition) for short, is a W3C recommendation for defining the structure of XML documents. Unlike classic XML DTDs, the structure is itself described in the form of an XML document. A large number of data types are also supported.
XML Schema describes data types, individual XML Schema instances (documents) and groups of such instances in a complex schema language. A concrete XML schema is also known as an XSD (XML Schema Definition), and the file usually has the extension .xsd. In contrast to DTDs, XML schemas make it possible to distinguish between the name of the XML type and the name of the XML tag used in the instance.
In addition to XML Schema, there are other approaches to defining XML structures, such as RELAX NG or Schematron. DTDs, as a standard component of XML itself, can also be used.
Data types
XML Schema distinguishes between simple (atomic) data types and complex data types. In the following text, the term type denotes the abstract specification of the structure of a section within an XML document. Data types in XML Schema are classified into predefined (built-in) and custom (user-defined) data types.
The W3C specification for XML Schema defines 19 built-in primitive data types (e.g. boolean, string, float, date and NOTATION) and a further 25 built-in derived data types (such as ID and integer).
Simple types
XML Schema provides some basic atomic data types. These include the "classic" types that are also found, in part, in other type systems (e.g. C, Java or SQL):
xs:string
xs:decimal
xs:integer
xs:float
xs:boolean
xs:date
xs:time
There are also other XML-specific atomic types, including:
- QName: qualified name. It consists of an optional prefix and a local name, both of which are NCNames (non-colonized names), joined by a colon (:). The prefix is bound to a namespace; the local name identifies the item within that namespace.
- anyURI: Uniform Resource Identifier (URI)
- language: language identifier, e.g. de-DE, en-US, fr
- ID: identifying attribute within XML elements
- IDREF: reference to an ID value
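As a small illustration of the XML-specific types listed above, the following declarations are a sketch only; the element and attribute names (homepage, sprache, id, ref) are hypothetical and do not appear elsewhere in this article:

```xml
<!-- Hypothetical top-level declarations inside an xs:schema element -->
<xs:element name="homepage" type="xs:anyURI"/>
<xs:element name="sprache" type="xs:language"/>
<xs:attribute name="id" type="xs:ID"/>
<xs:attribute name="ref" type="xs:IDREF"/>
```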
Simple XML data types must neither contain XML child elements nor have XML attributes.
In addition to the atomic data types, lists and unions (consisting of atomic elements and lists) belong to the simple types:
- The following example defines a new XML data type with the name monatInt and a list type monate of this new type:
<xs:simpleType name="monatInt">
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"/>
<xs:maxInclusive value="12"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="monate">
<xs:list itemType="monatInt"/>
</xs:simpleType>
An instance of the new type could look like this:
<monate>
1 2 3 4 5 6 7 8 9 10 11 12
</monate>
The individual elements of a list are separated by whitespace (here: spaces).
- The simple types also include so-called unions. A new type is defined as the union of existing types. Each instance then chooses its type from this set. The following example defines a further type monatsname and a union type monat:
<xs:simpleType name="monatsname">
<xs:restriction base="xs:string">
<xs:enumeration value="Jan"/>
<xs:enumeration value="Feb"/>
<xs:enumeration value="Mär"/>
<!-- and so on … -->
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="monat">
<xs:union memberTypes="monatsname monatInt"/>
</xs:simpleType>
XML elements of the type monat can contain either integer values in the range 1–12 or one of the corresponding month names as a character string. Valid instances are, for example:
<monat>Jan</monat>
<monat>2</monat>
Complex types
In addition to the simple types, complex XML data type definitions make it possible to define element structures. Such structures can contain further elements and attributes.
The following example defines a new type pc-Typ with the child elements name, hersteller etc., as well as an attribute id:
<xs:complexType name="pc-Typ">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="hersteller" type="xs:string"/>
<xs:element name="prozessor" type="xs:string"/>
<xs:element name="mhz" type="xs:integer" minOccurs="0"/>
<xs:element name="kommentar" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="id" type="xs:integer"/>
</xs:complexType>
The options for defining complex types are only explained here as examples. The interested reader is referred to the links to the W3C website given below.
The children of a complex type can be combined in three different ways:
- xs:sequence: A list of child elements is specified. Each of these elements can appear never, once or more than once (attributes minOccurs and maxOccurs). If an occurs attribute is missing, the default value 1 is used in both cases. The elements within a sequence must appear in the specified order. In the example above, the elements name, hersteller and prozessor must occur exactly once, the mhz element can occur zero times or once, and kommentar elements can occur any number of times or not at all.
- xs:choice: One element can be selected from a list of alternatives. The following example defines a new type computer that has either a desktop element (of type pc-Typ) or a laptop element as a child element:
<xs:complexType name="computer">
<xs:choice>
<xs:element name="desktop" type="pc-Typ"/>
<xs:element name="laptop" type="laptop-Typ"/>
</xs:choice>
</xs:complexType>
- xs:all: The xs:all tag can be used to define a group of child elements, each of which may occur at most once (minOccurs and maxOccurs of the child elements may only have the values 0 or 1). The order of the elements is arbitrary.
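A minimal sketch of an xs:all group, assuming a hypothetical type adresse-Typ with invented child elements; it is meant only to show the shape of such a definition:

```xml
<!-- Hypothetical type: strasse and ort are required, plz is optional,
     and the three children may appear in any order -->
<xs:complexType name="adresse-Typ">
  <xs:all>
    <xs:element name="strasse" type="xs:string"/>
    <xs:element name="ort" type="xs:string"/>
    <xs:element name="plz" type="xs:string" minOccurs="0"/>
  </xs:all>
</xs:complexType>
```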
Any content
XML elements with any content can be defined using the base type anyType. The following code specifies a kommentar element with arbitrary content, i.e. complex XML elements as well as text can occur.
<xs:element name="kommentar" type="xs:anyType"/>
If text and tags are to appear in any order in the content, the value for the "mixed" attribute must be set to "true":
<xs:element name="tagname">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" name="child" type="xs:integer"/>
<!-- further elements … -->
</xs:sequence>
</xs:complexType>
</xs:element>
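An instance of this mixed-content definition might look as follows. This is only a sketch that uses the child element declared above; the text fragments are invented:

```xml
<!-- Text and child elements interleaved; each child holds an integer -->
<tagname>Text before <child>1</child> text between <child>2</child> text after</tagname>
```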
Empty elements
An XML element is called empty when it consists of only a single XML tag and encloses neither other XML elements nor text (e.g. the XHTML line break <br />). XML Schema uses a little trick here: a new type is defined with xs:complexType without specifying any child element. Since xs:complexType by default allows only complex XML child elements as content, the respective element remains empty in this case.
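The trick described above can be sketched as follows, here for a br element as in the XHTML example; the declaration is illustrative and is not taken from the actual XHTML schema:

```xml
<!-- A complex type without child elements and without mixed="true":
     the element must therefore be empty -->
<xs:element name="br">
  <xs:complexType/>
</xs:element>
```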
Derivation of new types
New data types can be created by defining a new type (see previous section) or by deriving a new type from existing ones.
The derivation of a new type is not inheritance in the object-oriented sense, since no properties comparable to the methods or attributes of object-oriented classes are inherited. Rather, it is a matter of reusing existing type definitions. Accordingly, there is no implicit substitutability when deriving new types, as is common in other type systems (explicit type conversions are possible, however).
A new type can be derived in two ways: by extension or by restriction.
Extension of a type
When an existing type is extended (extension), further properties are added, i.e. new elements or attributes. In the following example, the type pc-Typ defined above is extended by an element ram:
<xs:complexType name="myPC-Typ">
<xs:complexContent>
<xs:extension base="pc-Typ">
<xs:sequence>
<xs:element name="ram" type="xs:integer"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
The newly defined XML type myPC-Typ consists of all child elements of the type pc-Typ and the element ram. The latter is appended to the previous child elements, as in an xs:sequence definition.
Since there is no substitutability, an element of the type pc-Typ may not simply be used at a point where an element of the type myPC-Typ is expected.
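To illustrate the extended type, assume a hypothetical element declaration <xs:element name="meinPC" type="myPC-Typ"/> (the name meinPC is invented for this sketch). An instance could then look like this, with ram following the children taken over from pc-Typ:

```xml
<meinPC id="1">
  <name>T 42</name>
  <hersteller>IBM</hersteller>
  <prozessor>Intel</prozessor>
  <mhz>1600</mhz>
  <!-- the element added by the extension comes last -->
  <ram>512</ram>
</meinPC>
```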
Restriction of a type
New definitions can also be derived by restricting existing types (restriction). For this purpose, all element definitions of the base type must be repeated, modified by the tighter constraints. The following example derives a new type myPC2-Typ from pc-Typ. In this case, at most one kommentar element may appear (as opposed to any number for the type pc-Typ):
<xs:complexType name="myPC2-Typ">
<xs:complexContent>
<xs:restriction base="pc-Typ">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="hersteller" type="xs:string"/>
<xs:element name="prozessor" type="xs:string"/>
<xs:element name="mhz" type="xs:integer" minOccurs="0"/>
<xs:element name="kommentar" type="xs:string" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
In addition to restricting complex types, it is also possible to define new types as restrictions of simple types. An example of such a definition can be found in the section on simple types: the new type monatInt is defined as a restriction of the type integer to the value range 1–12. The following facets are available to describe restrictions on simple types:
- length, maxLength, minLength: limit the length of a string or a list
- enumeration: restriction by specifying alternative values
- pattern: restriction by specifying a regular expression
- minExclusive, minInclusive, maxExclusive, maxInclusive: limitation of the value range
- totalDigits, fractionDigits: restriction of the number of digits (total number of digits and number of decimal places)
- whiteSpace: handling of spaces and tabs
The following examples illustrate the use of these components:
- Body temperature: 3 digits in total, 1 decimal place, minimum and maximum value
<xs:simpleType name="celsiusKörperTemp">
<xs:restriction base="xs:decimal">
<xs:totalDigits value="3"/>
<xs:fractionDigits value="1"/>
<xs:minInclusive value="35.0"/>
<xs:maxInclusive value="42.5"/>
</xs:restriction>
</xs:simpleType>
- German postcodes: optional "D " followed by five digits
<xs:simpleType name="plz">
<xs:restriction base="xs:string">
<xs:pattern value="(D )?[0-9]{5}"/>
</xs:restriction>
</xs:simpleType>
- Size specification
<xs:simpleType name="size">
<xs:restriction base="xs:string">
<xs:enumeration value="XS"/>
<xs:enumeration value="S"/>
<xs:enumeration value="M"/>
<xs:enumeration value="L"/>
<xs:enumeration value="XL"/>
</xs:restriction>
</xs:simpleType>
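The length and whiteSpace facets can be sketched in the same way. The type name laenderCode is hypothetical; the example assumes a two-character code in which surrounding whitespace is to be ignored:

```xml
<xs:simpleType name="laenderCode">
  <xs:restriction base="xs:string">
    <!-- collapse: trim leading/trailing whitespace and merge runs of whitespace -->
    <xs:whiteSpace value="collapse"/>
    <!-- exactly two characters after whitespace processing -->
    <xs:length value="2"/>
  </xs:restriction>
</xs:simpleType>
```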
When defining a type, it is possible to determine whether and in what way other XML element types may be derived from this type. For example, you can specify that other types can only be derived from the type pc-Typ by setting further restrictions, and not by adding new child elements.
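One way to express such a limit is the final attribute of xs:complexType. The following sketch abbreviates the pc-Typ definition from above and forbids derivation by extension while still allowing restriction:

```xml
<!-- final="extension": no type may be derived from pc-Typ by extension -->
<xs:complexType name="pc-Typ" final="extension">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <!-- remaining child elements as in the definition above -->
  </xs:sequence>
  <xs:attribute name="id" type="xs:integer"/>
</xs:complexType>
```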
Element definition
As explained in the previous section, XML Schema allows you to define new XML data types and use them when defining your own XML elements. The following example illustrates the use of the previously defined type pc-Typ within a list of pc elements:
<xs:element name="pc-liste">
<xs:complexType>
<xs:sequence>
<xs:element name="pc" type="pc-Typ" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
A corresponding XML element could look like this:
<pc-liste>
<pc>
<name>Dimension 3100 </name>
<hersteller>Dell</hersteller>
<prozessor>AMD</prozessor>
<mhz>3060</mhz>
<kommentar>Arbeitsplatzrechner</kommentar>
</pc>
<pc>
<name>T 42</name>
<hersteller>IBM</hersteller>
<prozessor>Intel</prozessor>
<mhz>1600</mhz>
<kommentar>Laptop</kommentar>
</pc>
</pc-liste>
In this example, the anonymous list type is specified directly within the element definition, while the type of the pc elements is specified externally.
When designing a complex XML schema, the reusability and extensibility of the individual XML element types as well as the readability of the schema itself should be taken into account. Using anonymous XML element types as part of larger elements generally makes smaller XML schemas easier to read. The definition and naming of individual, smaller and reusable XML element types, on the other hand, enables greater modularization of the XML schema structure. Due to the large number of possible application scenarios, no generally valid design principles for XML schemas have yet emerged (comparable to the normal forms for relational databases).
Advanced concepts and properties
Unique key
Similar to primary keys in relational databases, unique keys can be defined using XML Schema. XML Schema distinguishes between uniqueness (unique) and the key property (key).
The following example defines a new element pc-list with a list of pc child elements:
<xs:element name="pc-list">
<xs:complexType>
<xs:sequence>
<xs:element name="pc" type="pc-Typ" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:unique name="hersteller-name">
<xs:selector xpath="pc"/>
<xs:field xpath="name"/>
<xs:field xpath="hersteller"/>
</xs:unique>
<xs:key name="idKey">
<xs:selector xpath="pc"/>
<xs:field xpath="@id"/>
</xs:key>
</xs:element>
The two elements unique and key each select a set of pc elements with an XPath path expression (in the example: pc). The respective uniqueness or key condition must be met for this set. In the example above, it is specified that the combination of the elements name and hersteller must be unique for each pc element within this list. The key element specifies that the attribute id must be unique within this list and that it can be referenced from outside.
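A sketch of an instance that would violate the idKey constraint defined above, assuming the pc-Typ definition from the section on complex types; the values are invented for illustration:

```xml
<pc-list>
  <pc id="1">
    <name>T 42</name>
    <hersteller>IBM</hersteller>
    <prozessor>Intel</prozessor>
  </pc>
  <!-- invalid: id="1" is already used by the previous pc element -->
  <pc id="1">
    <name>Dimension 3100</name>
    <hersteller>Dell</hersteller>
    <prozessor>AMD</prozessor>
  </pc>
</pc-list>
```

The combination of name and hersteller differs between the two pc elements, so the unique constraint is satisfied; only the key constraint on the id attribute is broken.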
The following example shows how this key is referenced using the refer attribute and a field @references:
<xs:keyref name="idFremdKey" refer="idKey">
<!-- idKey from the example above -->
<xs:selector xpath="computerFremd"/>
<xs:field xpath="@references"/>
</xs:keyref>
- Note
With refer, you refer to the name attribute of a key constraint, not to the key field. The values in references must therefore always be found among the keys of the computers. (The purpose of this construct is to ensure referential integrity, as known from relational database systems.)
Import, include and redefine
XML Schema allows external schemas to be reused.
Both the include and the import tag are available for this purpose, as well as the option of redefining or adapting external schemas when integrating them.
include
Type definitions within a namespace that are distributed over several files can be combined with include.
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:pcTeile="http://www.example.com/pcTeile"
targetNamespace="http://www.example.com/pcTeile">
…
<include schemaLocation="http://www.example.com/schemata/harddisk.xsd"/>
<include schemaLocation="http://www.example.com/schemata/ram.xsd"/>
…
</schema>
- Multiple schemas can be included.
- The targetNamespace of harddisk.xsd must match that of the including schema.
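For illustration, the included file harddisk.xsd might look like the following sketch. Its content is entirely hypothetical (the type Harddisk and the element kapazitaet are invented); the point is only that it declares the same targetNamespace as the including schema:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:pcTeile="http://www.example.com/pcTeile"
        targetNamespace="http://www.example.com/pcTeile">
  <!-- Hypothetical type definition; it becomes part of the pcTeile namespace -->
  <complexType name="Harddisk">
    <sequence>
      <element name="kapazitaet" type="integer"/>
    </sequence>
  </complexType>
</schema>
```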
redefine
Same example as above. Assume there is a complexType Hersteller in the schema harddisk.xsd.
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:pcTeile="http://www.example.com/pcTeile"
targetNamespace="http://www.example.com/pcTeile">
…
<redefine schemaLocation="http://www.example.com/schemata/harddisk.xsd">
<!-- redefinition of Hersteller -->
<complexType name="Hersteller">
<complexContent>
<!-- redefinition of Hersteller with restriction or extension etc. -->
<restriction base="pcTeile:Hersteller">
<sequence>
<element name="hersteller" type="string" minOccurs="10" maxOccurs="10"/>
</sequence>
</restriction>
</complexContent>
</complexType>
</redefine>
…
<include schemaLocation="http://www.example.com/schemata/ram.xsd"/>
…
</schema>
- redefine can be used in place of include.
- The name of the type does not change.
import
The import tag allows elements from other namespaces to be imported and given a prefix, so that schema components from different namespaces can be reused.
Assume there is a defined type superTyp in pcTeile.
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:pcTeile="http://www.example.com/pcTeile"
targetNamespace="http://www.example.com/firma">
…
<import namespace="http://www.example.com/pcTeile"/>
…
<…
<attribute name="xyz" type="pcTeile:superTyp"/>
…/>
…
</schema>
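The import element can additionally carry a schemaLocation attribute that hints at where the imported schema can be found. In the following sketch the file name pcTeile.xsd is an assumption made for illustration:

```xml
<!-- schemaLocation is an optional hint; the URL below is hypothetical -->
<import namespace="http://www.example.com/pcTeile"
        schemaLocation="http://www.example.com/schemata/pcTeile.xsd"/>
```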
Use of XML Schemas
To use an XML schema in an XML file, the schemaLocation attribute of the schema instance namespace can be used to make the address of the schema known. This enables an application such as an XML parser to load the schema if it does not already know it. Alternatively, the schema can be made known to the application in other ways, e.g. via configuration files. The latter option, however, is not standardized and therefore differs from application to application.
The following example states that the default namespace is http://www.w3.org/1999/xhtml and then specifies that the XML schema for this namespace can be found at www.w3.org/1999/xhtml.xsd.
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/1999/xhtml
http://www.w3.org/1999/xhtml.xsd">
The definition applies to the XML element for which the attributes are specified and all child elements.
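The value of xsi:schemaLocation is a whitespace-separated list of pairs, each consisting of a namespace name and the location of a schema for it, so several namespaces can be covered at once. The following sketch uses hypothetical schema file names under www.example.com and an invented root element doc:

```xml
<doc xmlns="http://www.example.com/firma"
     xmlns:pcTeile="http://www.example.com/pcTeile"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.example.com/firma http://www.example.com/schemata/firma.xsd
                         http://www.example.com/pcTeile http://www.example.com/schemata/pcTeile.xsd">
  <!-- content omitted -->
</doc>
```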
If elements that do not belong to a namespace are to be assigned an XML schema, this is done using the attribute noNamespaceSchemaLocation, as shown in the following example.
<html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.w3.org/1999/xhtml.xsd">
Example
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:bsp="http://de.wikipedia.org/wiki/XML_Schema#Beispiel"
targetNamespace="http://de.wikipedia.org/wiki/XML_Schema#Beispiel"
elementFormDefault="qualified">
<element name="doc">
<complexType>
<sequence>
<element ref="bsp:head"/>
<element name="body" type="string"/>
</sequence>
</complexType>
</element>
<element name="head">
<complexType>
<sequence>
<element name="title" type="string"/>
</sequence>
</complexType>
</element>
</schema>
Apart from the namespace, this corresponds to the following DTD:
<!ELEMENT doc (head, body)>
<!ELEMENT head (title)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (#PCDATA)>
An XML structure that corresponds to the schema is this:
<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns="http://de.wikipedia.org/wiki/XML_Schema#Beispiel">
<head>
<title>
Dies ist der Titel
</title>
</head>
<body>
Dies ist der Text.
</body>
</doc>
Literature
- Alfons Kemper, André Eickler: Database Systems - An Introduction . Oldenbourg Wissenschaftsverlag, Munich 2004, ISBN 3-486-27392-2 .
- Helmut Vonhoegen: Getting started with XML . Current standards: XML Schema, XSL, XLink . 5th edition. Galileo Press, 2009, ISBN 978-3-8362-1367-7 .
- Margit Becher: XML - DTD, XML-Schema, XPath, XQuery, XSLT, XSL-FO, SAX, DOM . W3L Verlag, Witten 2009, ISBN 978-3-937137-69-8 .
- Marco Skulschus, Marcus Wiederstein: XML Schema . Comelio Medien, Berlin 2009, ISBN 978-3-939701-22-4 .
- Eric van der Vlist: XML Schema . O'Reilly, Cologne 2003, ISBN 978-3-89721-345-6 ( online ).
Web links
- W3C XML Schema Specification: Primers, Structures, Datatypes and Miscellaneous; German translations: introduction, structures, data types
- Introduction to XML Schema and Reference
References
- www.w3.org/1999/xhtml.xsd (Memento of November 10, 2000 in the Internet Archive)