XML schema

XML Schema, abbreviated XSD (XML Schema Definition), is a W3C recommendation for defining the structure of XML documents. Unlike classic XML DTDs, a schema is itself written as an XML document. A large number of data types are also supported.

XML Schema describes data types, individual XML Schema instances (documents), and groups of such instances in a complex schema language. A concrete XML schema is also called an XSD (XML Schema Definition), and the file usually has the extension .xsd. In contrast to DTDs, XML Schema makes it possible to distinguish between the name of an XML type and the name of the XML tag used in the instance.

In addition to XML Schema, there are other approaches to defining XML structures, such as RELAX NG or Schematron. DTD, as a standard component of XML itself, can also be used.

Data types

XML Schema distinguishes between simple (atomic) data types and complex data types. In the following text, the term type denotes the abstract specification of the structure of a section within an XML document. Data types in XML Schema are classified into built-in (predefined) and user-defined data types.

The W3C specification for XML Schema defines 19 built-in primitive data types (e.g. boolean, string, float, date and NOTATION) and a further 25 data types derived from them (such as ID and integer).

Simple types

XML Schema provides a number of basic atomic data types. These include the "classic" types that are also found, in part, in other type systems (e.g. C, Java or SQL):

  • xs:string
  • xs:decimal
  • xs:integer
  • xs:float
  • xs:boolean
  • xs:date
  • xs:time

There are also other XML-specific atomic types, including:

  • QName: qualified name. A QName consists of an optional namespace prefix and a local name, both of which are NCNames (non-colonized names, i.e. names containing no colon). The prefix is bound to a namespace, and the local name identifies the item within that namespace. The two parts are joined by a colon (:).
  • anyURI: Uniform Resource Identifier ( URI )
  • language: language code, e.g. de-DE, en-US, fr
  • ID: Identification attribute within XML elements
  • IDREF: Reference to an ID value

Simple XML data types may neither contain XML child elements nor have XML attributes.
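
As a minimal sketch of how the XML-specific types are typically used, the following element declaration attaches attributes of these types to an element. The names verweis, ziel, quelle and sprache are hypothetical and serve only as illustration:

<xs:element name="verweis">
  <xs:complexType>
    <!-- all attribute names here are illustrative assumptions -->
    <xs:attribute name="id"      type="xs:ID"/>
    <xs:attribute name="ziel"    type="xs:IDREF"/>
    <xs:attribute name="quelle"  type="xs:anyURI"/>
    <xs:attribute name="sprache" type="xs:language"/>
  </xs:complexType>
</xs:element>

An instance might then read <verweis id="v2" ziel="v1" quelle="http://www.example.com/" sprache="de-DE"/>, provided that some other element in the same document carries the ID value v1.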

In addition to the atomic data types, lists and unions (composed of atomic types and lists) also belong to the simple types:

  • The following example defines a new XML data type named monatInt and a list type monate whose items are of this new type:
<xs:simpleType name="monatInt">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="1"/>
    <xs:maxInclusive value="12"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="monate">
  <xs:list itemType="monatInt"/>
</xs:simpleType>

An instance of the new type could look like this:

<monate>
   1 2 3 4 5 6 7 8 9 10 11 12
</monate>

The individual items of a list are separated by whitespace (here: spaces).
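
The instance above presupposes an element declaration and a surrounding schema element, neither of which is shown. A minimal, self-contained sketch could look as follows; it assumes the usual binding of the prefix xs to the XML Schema namespace and an element that happens to share the name monate with its type:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- integer restricted to the range 1 to 12 -->
  <xs:simpleType name="monatInt">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="12"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- whitespace-separated list of monatInt values -->
  <xs:simpleType name="monate">
    <xs:list itemType="monatInt"/>
  </xs:simpleType>
  <!-- assumed element declaration; element and type names live in separate symbol spaces -->
  <xs:element name="monate" type="monate"/>
</xs:schema>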

  • The simple types also include so-called unions.

A new type is defined as the union of existing types. Each instance then takes its type from this set. The following example defines a further type monatsname and a union type monat:

<xs:simpleType name="monatsname">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Jan"/>
    <xs:enumeration value="Feb"/>
    <xs:enumeration value="Mär"/>
    <!-- and so on … -->
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="monat">
  <xs:union memberTypes="monatsname monatInt"/>
</xs:simpleType>

XML elements of the type monat can contain either integer values in the range 1–12 or one of the corresponding month names as a character string. Valid instances are, for example:

<monat>Jan</monat>
<monat>2</monat>
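
For contrast, and assuming an element declaration such as <xs:element name="monat" type="monat"/> (which the text does not show), a validating parser would be expected to reject values that match neither member type:

<!-- not valid: 13 lies outside the range 1–12 and is not a month name -->
<monat>13</monat>
<!-- not valid: 0 is below minInclusive and is not a month name -->
<monat>0</monat>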

Complex types

In addition to the simple types, complex type definitions make it possible to define element structures as a whole. Such structures can contain further elements and attributes.

The following example defines a new type pc-Typ with the child elements name, hersteller, etc., as well as an attribute id:

<xs:complexType name="pc-Typ">
  <xs:sequence>
    <xs:element name="name"       type="xs:string"/>
    <xs:element name="hersteller" type="xs:string"/>
    <xs:element name="prozessor"  type="xs:string"/>
    <xs:element name="mhz"        type="xs:integer" minOccurs="0"/>
    <xs:element name="kommentar"  type="xs:string"  minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="id" type="xs:integer"/>
</xs:complexType>

The options for defining complex types are only illustrated here by way of example. The interested reader is referred to the W3C specification.

The children of a complex type can be combined in three different ways:

  • xs:sequence: A list of child elements is specified. Each of these elements can occur zero times, once or several times (attributes minOccurs and maxOccurs). If an occurs attribute is missing, the default value 1 is used in each case. The elements within a sequence must appear in the specified order. In the example above, the elements name, hersteller and prozessor must each occur exactly once, the mhz element can occur zero times or once, and kommentar elements can occur any number of times or not at all.
  • xs:choice: One element is selected from a list of alternatives. The following example defines a new type computer that has either a desktop element (of type pc-Typ) or a laptop element (of type laptop-Typ, which is not defined here) as its child element:
<xs:complexType name="computer">
  <xs:choice>
    <xs:element name="desktop" type="pc-Typ"/>
    <xs:element name="laptop" type="laptop-Typ"/>
  </xs:choice>
</xs:complexType>
  • xs:all: The xs:all tag can be used to define a group of child elements, each of which may occur at most once (minOccurs and maxOccurs of the child elements may only have the values 0 or 1). The order of the elements is arbitrary.
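
The text gives no example for xs:all. A minimal sketch might look like this; the type adresse-Typ and its child elements are hypothetical names chosen for illustration:

<xs:complexType name="adresse-Typ">
  <xs:all>
    <!-- each child may occur at most once, in any order -->
    <xs:element name="strasse"  type="xs:string"/>
    <xs:element name="ort"      type="xs:string"/>
    <xs:element name="postfach" type="xs:string" minOccurs="0"/>
  </xs:all>
</xs:complexType>

Under this definition, an instance may list ort before strasse or the other way round, and postfach may be omitted.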

Any content

XML elements with arbitrary content can be defined using the base type anyType. The following code specifies a kommentar element with arbitrary content, i.e. both complex XML elements and text can occur.

<xs:element name="kommentar" type="xs:anyType"/>

If text and tags are to appear in any order in the content, the value for the "mixed" attribute must be set to "true":

<xs:element name="tagname">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="unbounded" name="child" type="xs:integer"/>
      <!-- further elements … -->
    </xs:sequence>
  </xs:complexType>
</xs:element>
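
An instance of this mixed-content element could look as follows. This is only a sketch and assumes that the further elements elided in the definition above are optional:

<tagname>Text before <child>1</child> text in between <child>2</child> and text after</tagname>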

Empty elements

An XML element is called empty when it consists of only a single XML tag and encloses neither other XML elements nor text (e.g. the XHTML line break <br />). XML Schema uses a small trick here: with xs:complexType, a new type is defined without specifying any child elements. Since xs:complexType by default allows only XML child elements as content, the element remains empty in this case.
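
A minimal sketch of this technique, using a hypothetical element named br in analogy to the XHTML line break:

<xs:element name="br">
  <!-- complex type without any child elements: the content must be empty -->
  <xs:complexType/>
</xs:element>

With this declaration, <br/> is valid, whereas <br>text</br> is not. Attributes could still be declared inside the xs:complexType if needed.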

Derivation of new types

New data types can be created by defining a new type (see previous section) or by deriving a new type from existing ones.

Deriving a new type is not inheritance in the object-oriented sense, since no properties comparable to the methods or attributes of object-oriented classes are inherited. Rather, it is a matter of reusing existing type definitions. Accordingly, deriving new types does not provide the implicit substitutability that is common in other type systems (explicit type conversions are possible, however).

A new type can be derived in two ways: by extension or by restriction.

Extension of a type

Extending an existing type adds further properties to it, i.e. new elements or attributes. In the following example, the type pc-Typ defined above is extended by an element ram:

<xs:complexType name="myPC-Typ">
  <xs:complexContent>
    <xs:extension base="pc-Typ">
      <xs:sequence>
        <xs:element name="ram" type="xs:integer"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

The newly defined XML type myPC-Typ consists of all child elements of the type pc-Typ plus the element ram. The latter is appended after the existing child elements, as in an xs:sequence definition.
Since there is no substitutability, an element of the type pc-Typ may not simply be used where an element of the type myPC-Typ is expected.
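
In the opposite direction, XML Schema offers the xsi:type attribute as an explicit mechanism: an instance can state that it uses a derived type where the base type was declared. The following sketch assumes an element pc declared with the type pc-Typ and no target namespace; the values are made up:

<pc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:type="myPC-Typ" id="1">
  <name>Dimension 3100</name>
  <hersteller>Dell</hersteller>
  <prozessor>AMD</prozessor>
  <!-- ram comes last because the extension appends it to the base sequence -->
  <ram>2048</ram>
</pc>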

Restriction of a type

New definitions can also be derived by restricting existing types. For this purpose, all element definitions of the base type must be repeated, modified by the tighter constraints. The following example derives a new type myPC2-Typ from pc-Typ. In this case, at most one kommentar element may appear (as opposed to any number for the type pc-Typ):

<xs:complexType name="myPC2-Typ">
  <xs:complexContent>
    <xs:restriction base="pc-Typ">
      <xs:sequence>
       <xs:element name="name"       type="xs:string"/>
       <xs:element name="hersteller" type="xs:string"/>
       <xs:element name="prozessor"  type="xs:string"/>
       <xs:element name="mhz"        type="xs:integer" minOccurs="0"/>
       <xs:element name="kommentar"  type="xs:string" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

In addition to restricting complex types, it is also possible to define new types as restrictions of simple types. An example of such a definition can be found in the section on simple types: the type monatInt is defined as a restriction of the integer type to the value range 1–12. The following facets are available for describing restrictions on simple types:

  • length, maxLength, minLength - limit the length of a string or a list
  • enumeration - restriction by listing the permitted values
  • pattern - restriction by specifying a regular expression
  • minExclusive, minInclusive, maxExclusive, maxInclusive - restriction of the value range
  • totalDigits, fractionDigits - restriction of the number of digits (total digits and fractional digits)
  • whiteSpace - handling of spaces and tabs

The following examples illustrate the use of these components:

  • Body temperature: 3 digits in total, 1 fractional digit, minimum and maximum value
<xs:simpleType name="celsiusKörperTemp">
  <xs:restriction base="xs:decimal">
    <xs:totalDigits value="3"/>
    <xs:fractionDigits value="1"/>
    <xs:minInclusive value="35.0"/>
    <xs:maxInclusive value="42.5"/>
  </xs:restriction>
</xs:simpleType>
  • German postcodes: an optional "D " followed by five digits
<xs:simpleType name="plz">
   <xs:restriction base="xs:string">
     <xs:pattern value="(D )?[0-9]{5}"/>
   </xs:restriction>
</xs:simpleType>
  • Size specification
<xs:simpleType name="size">
  <xs:restriction base="xs:string">
    <xs:enumeration value="XS"/>
    <xs:enumeration value="S"/>
    <xs:enumeration value="M"/>
    <xs:enumeration value="L"/>
    <xs:enumeration value="XL"/>
  </xs:restriction>
</xs:simpleType>
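
The length and whiteSpace facets are not covered by the examples above. A minimal sketch, using the hypothetical type name kurzname:

<xs:simpleType name="kurzname">
  <xs:restriction base="xs:string">
    <!-- collapse: trim leading/trailing whitespace and reduce runs to a single space -->
    <xs:whiteSpace value="collapse"/>
    <xs:minLength value="1"/>
    <xs:maxLength value="20"/>
  </xs:restriction>
</xs:simpleType>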

When defining a type, it is possible to specify whether and in what way other types may be derived from it. For example, it can be specified that further types may be derived from the type pc-Typ only by restriction, and not by adding new child elements.
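
In XML Schema this is expressed with the final attribute of the type definition. The following sketch repeats pc-Typ from above with derivation by extension prohibited:

<xs:complexType name="pc-Typ" final="extension">
  <!-- final="extension": other types may restrict pc-Typ but not extend it -->
  <xs:sequence>
    <xs:element name="name"       type="xs:string"/>
    <xs:element name="hersteller" type="xs:string"/>
    <xs:element name="prozessor"  type="xs:string"/>
    <xs:element name="mhz"        type="xs:integer" minOccurs="0"/>
    <xs:element name="kommentar"  type="xs:string"  minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="id" type="xs:integer"/>
</xs:complexType>

The values restriction and #all are also permitted for final.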

Element definition

As explained in the previous section, XML Schema makes it possible to define new XML data types and to use them when defining XML elements. The following example illustrates the use of the previously defined type pc-Typ within a list of pc elements:

<xs:element name="pc-liste">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="pc" type="pc-Typ" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

A corresponding XML element could look like this:

<pc-liste>
  <pc>
    <name>Dimension 3100 </name>
    <hersteller>Dell</hersteller>
    <prozessor>AMD</prozessor>
    <mhz>3060</mhz>
    <kommentar>Arbeitsplatzrechner</kommentar>
  </pc>
  <pc>
    <name>T 42</name>
    <hersteller>IBM</hersteller>
    <prozessor>Intel</prozessor>
    <mhz>1600</mhz>
    <kommentar>Laptop</kommentar>
  </pc>
</pc-liste>

In this example, the anonymous list type is specified directly within the element definition, while the type of the pc elements is specified externally.

When designing a complex XML schema, the reusability and extensibility of the individual XML element types as well as the readability of the schema itself should be taken into account. Using anonymous XML element types as part of larger elements generally makes smaller XML schemas easier to read. The definition and naming of individual, smaller and reusable XML element types, on the other hand, enables greater modularization of the XML schema structure. Due to the large number of possible application scenarios, no generally valid design principles for XML schemas have yet emerged (comparable to the normal forms for relational databases).
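
To illustrate the trade-off, the pc-liste element could also be written with a named instead of an anonymous type. The type name pc-liste-Typ is a hypothetical choice for this sketch:

<!-- named, reusable list type -->
<xs:complexType name="pc-liste-Typ">
  <xs:sequence>
    <xs:element name="pc" type="pc-Typ" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<!-- the element definition now only refers to the type -->
<xs:element name="pc-liste" type="pc-liste-Typ"/>

Both variants accept the same instances; the named variant allows pc-liste-Typ to be reused or derived from elsewhere.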

Advanced concepts and properties

Unique key

Similar to primary keys in relational databases, unique keys can be defined with XML Schema. XML Schema distinguishes between the uniqueness property (unique) and the key property (key).

The following example defines a new element pc-list with a list of pc child elements:

<xs:element name="pc-list">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="pc" type="pc-Typ" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:unique name="hersteller-name">
    <xs:selector xpath="pc"/>
    <xs:field xpath="name"/>
    <xs:field xpath="hersteller"/>
  </xs:unique>
  <xs:key name="idKey">
    <xs:selector xpath="pc"/>
    <xs:field xpath="@id"/>
  </xs:key>
</xs:element>

The two elements unique and key each select a set of elements with an XPath path expression (in the example: pc). The respective uniqueness or key condition must be met for this set. In the example above, it is specified that the combination of the elements name and hersteller must be unique for each pc element within this list. The key element specifies that the attribute id must be unique within this list and that it can be referenced from outside.
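
An instance that satisfies both conditions might look like this (the values are made up for illustration):

<pc-list>
  <pc id="1">
    <name>Dimension 3100</name>
    <hersteller>Dell</hersteller>
    <prozessor>AMD</prozessor>
  </pc>
  <pc id="2">
    <name>T 42</name>
    <hersteller>IBM</hersteller>
    <prozessor>Intel</prozessor>
  </pc>
</pc-list>

A further pc element with id="1" would violate the key condition, and so would a pc element without an id attribute, because every selected element must provide all key fields. A unique condition, by contrast, only applies to elements in which all of its fields are present.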

The following example shows how this key is referenced by means of the refer attribute and an attribute named references (XPath @references).

<xs:keyref name="idFremdKey" refer="idKey">
  <!-- idKey from the example above -->
  <xs:selector xpath="computerFremd"/>
  <xs:field xpath="@references"/>
</xs:keyref>

Note

With refer, you refer to the name attribute of a key condition, not to the key field. The values in references must therefore always be found among the key values of the computers. (The purpose of this construct is to ensure referential integrity, as known from relational database systems.)
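
A keyref has to be declared on an element whose scope contains the referenced key. The following sketch shows one possible arrangement; the container element inventar and the structure of computerFremd are assumptions, since the text does not define them:

<xs:element name="inventar">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="pc" type="pc-Typ" maxOccurs="unbounded"/>
      <!-- hypothetical element that points to a pc via its references attribute -->
      <xs:element name="computerFremd" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:attribute name="references" type="xs:integer" use="required"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <xs:key name="idKey">
    <xs:selector xpath="pc"/>
    <xs:field xpath="@id"/>
  </xs:key>
  <!-- every @references value must equal the @id of some pc -->
  <xs:keyref name="idFremdKey" refer="idKey">
    <xs:selector xpath="computerFremd"/>
    <xs:field xpath="@references"/>
  </xs:keyref>
</xs:element>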

Import, include and redefine

XML Schema allows external schemas to be reused.
Both the include tag and the import tag are available for this purpose, as well as the option of redefining or adapting external schemas when they are included.

include

Type definitions within one namespace that are distributed over several files can be combined with include.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:pcTeile="http://www.example.com/pcTeile"
        targetNamespace="http://www.example.com/pcTeile">
  <include schemaLocation="http://www.example.com/schemata/harddisk.xsd"/>
  <include schemaLocation="http://www.example.com/schemata/ram.xsd"/>
</schema>
  • Multiple schemas can be included.
  • The targetNamespace of harddisk.xsd must match that of the including schema.

redefine

Same example as before. Assume there is a complexType named Hersteller in the schema harddisk.xsd.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:pcTeile="http://www.example.com/pcTeile"
        targetNamespace="http://www.example.com/pcTeile">
  <redefine schemaLocation="http://www.example.com/schemata/harddisk.xsd">
    <!-- redefinition of Hersteller -->
    <complexType name="Hersteller">
      <complexContent>
        <!-- redefinition of Hersteller using restriction or extension, etc. -->
        <restriction base="pcTeile:Hersteller">
          <sequence>
            <element name="hersteller" type="string" minOccurs="10" maxOccurs="10"/>
          </sequence>
        </restriction>
      </complexContent>
    </complexType>
  </redefine>
  <include schemaLocation="http://www.example.com/schemata/ram.xsd"/>
</schema>
  • redefine can be used in place of include.
  • The name of the type does not change.
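
The content of harddisk.xsd is not given in the text. For the restriction above to be valid, the original Hersteller type would have to permit at least what the redefinition demands. One conceivable definition, purely as an assumption:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:pcTeile="http://www.example.com/pcTeile"
        targetNamespace="http://www.example.com/pcTeile">
  <!-- assumed original definition: any number of hersteller elements -->
  <complexType name="Hersteller">
    <sequence>
      <element name="hersteller" type="string" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</schema>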

import

The import tag allows components from other namespaces to be imported and given a prefix, so that schema components from different namespaces can be reused.
Assume that a type superTyp is defined in pcTeile.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:pcTeile="http://www.example.com/pcTeile"
        targetNamespace="http://www.example.com/firma">
  <import namespace="http://www.example.com/pcTeile"/>
  <…
    <attribute name="xyz" type="pcTeile:superTyp"/>
  …/>
  …
</schema>
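
Since parts of the example above are elided, here is a complete sketch of one way the import could be used. It assumes that superTyp is a simple type (it is used as an attribute type); the file name pcTeile.xsd and the type name arbeitsplatz are hypothetical:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:pcTeile="http://www.example.com/pcTeile"
        targetNamespace="http://www.example.com/firma">
  <!-- schemaLocation is only a hint to the processor; the URL is an assumption -->
  <import namespace="http://www.example.com/pcTeile"
          schemaLocation="http://www.example.com/schemata/pcTeile.xsd"/>
  <complexType name="arbeitsplatz">
    <!-- the prefix pcTeile selects the imported namespace -->
    <attribute name="xyz" type="pcTeile:superTyp"/>
  </complexType>
</schema>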

Use of XML Schemas

To use an XML schema in an XML file, the schemaLocation attribute from the schema instance namespace can be used to announce the address of the schema. This enables an application such as an XML parser to load the schema if it does not already know it. Alternatively, the schema can be made known to the application in other ways, e.g. via configuration files. The latter option, however, is not standardized and therefore differs from application to application.

The following example declares http://www.w3.org/1999/xhtml as the default namespace and then specifies that the XML schema for this namespace can be found at www.w3.org/1999/xhtml.xsd.

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/1999/xhtml
                          http://www.w3.org/1999/xhtml.xsd">

The definition applies to the XML element for which the attributes are specified and all child elements.

If an XML schema is to be assigned to elements that do not belong to any namespace, this is done using the noNamespaceSchemaLocation attribute, as shown in the following example.

<html xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="http://www.w3.org/1999/xhtml.xsd">
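
The pc-liste element from the section on element definition has no target namespace, so it would be a natural candidate for this attribute. A sketch, assuming the schema is stored in a file named pc-liste.xsd next to the document:

<?xml version="1.0" encoding="UTF-8"?>
<!-- pc-liste.xsd is a hypothetical file name, resolved relative to this document -->
<pc-liste xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="pc-liste.xsd">
  <pc id="1">
    <name>T 42</name>
    <hersteller>IBM</hersteller>
    <prozessor>Intel</prozessor>
  </pc>
</pc-liste>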

Example

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:bsp="http://de.wikipedia.org/wiki/XML_Schema#Beispiel"
        targetNamespace="http://de.wikipedia.org/wiki/XML_Schema#Beispiel"
        elementFormDefault="qualified">
  <element name="doc">
    <complexType>
      <sequence>
        <element ref="bsp:head"/>
        <element name="body" type="string"/>
      </sequence>
    </complexType>
  </element>
  <element name="head">
    <complexType>
      <sequence>
        <element name="title" type="string"/>
      </sequence>
    </complexType>
  </element>
</schema>

Apart from the namespace, this corresponds to the following DTD:

<!ELEMENT doc (head, body)>
<!ELEMENT head (title)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT body (#PCDATA)>

An XML structure that corresponds to the schema is this:

<?xml version="1.0" encoding="UTF-8"?>
<doc xmlns="http://de.wikipedia.org/wiki/XML_Schema#Beispiel">
  <head>
    <title>
      Dies ist der Titel
    </title>
  </head>
  <body>
    Dies ist der Text.
  </body>
</doc>
