Extensible Markup Language

File extension: .xml
MIME type: application/xml, text/xml
Magic number: 3C 3F 78 6D 6C (hex), i.e. <?xml
Developed by: World Wide Web Consortium
Type: Markup language
Extended from: SGML
Extended to: XHTML, RSS, Atom
Standard: 1.0 (Fifth Edition), 1.1 (Second Edition)


The Extensible Markup Language, abbreviated XML, is a markup language for representing hierarchically structured data in the form of a text file that is both human-readable and machine-readable.

XML is also used for the platform- and implementation-independent exchange of data between computer systems, especially over the Internet. It was published by the World Wide Web Consortium (W3C) on February 10, 1998; the current version is the fifth edition of November 26, 2008. XML is a metalanguage on the basis of which application-specific languages are defined through structural and content restrictions. These restrictions are expressed either by a document type definition (DTD) or by an XML schema. Examples of XML languages are RSS, MathML, GraphML, XHTML, XAML, Scalable Vector Graphics (SVG) and GPX, as well as XML Schema itself.

The default character encoding of an XML document is UTF-8. Systems that process XML must support the UTF-8 and UTF-16 encodings. XML documents that use UTF-8 or UTF-16 can be viewed and edited in any text editor that supports these encodings.

If an XML document is to contain binary data, the data must be re-encoded as text. Base64 encoding, for example, can be used for this purpose.

Technical terms

Element

The most important structural unit of an XML document is the element. Elements can contain text as well as other elements. They form the nodes of the structure tree of an XML document. In XML documents without a document type definition (DTD), element names can be chosen freely. In XML documents with a DTD, the name of an element must be declared in the DTD, and the element must appear at a position in the structure tree that the DTD permits. Among other things, the DTD defines the possible content of each element. Elements are the carriers of information in an XML document.

Tag

Tags are used to mark up elements:

  • a start tag for the beginning of an element: <Elementname>
  • an end tag for the end of an element: </Elementname>
  • an empty-element tag for an element with no content: <Leerelementname/>

See main article: Tag

Well-formedness

An XML document is "well-formed" if it complies with all the rules of XML. Examples of these rules are:

  • The document has exactly one root element. The outermost element is referred to as the root element, e.g. <html> in XHTML.
  • All elements with content have a start tag and an end tag (e.g. <eintrag>Eintrag 1</eintrag>). Elements without content can also be written as an empty-element tag (e.g. <eintrag />).
  • Start and end tags are nested in pairs. This means that every element must be closed before the end tag of its parent element or the start tag of a sibling element appears.
  • An element cannot have multiple attributes with the same name.
  • Attribute values must be enclosed in quotation marks ("..." or '...').
  • Start and end tags are case-sensitive (e.g. <eintrag></Eintrag> is not well-formed).
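The rules listed above can be tried out with a non-validating parser. The following is a minimal sketch using xml.etree.ElementTree from the Python standard library; the helper name is_well_formed is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each sample corresponds to one of the rules listed above.
samples = {
    "<eintrag>Eintrag 1</eintrag>": "start and end tag",
    "<eintrag />": "empty-element tag",
    "<eintrag></Eintrag>": "tag names differ in case",
    "<a><b></a></b>": "tags not nested in pairs",
    '<a x="1" x="2"/>': "duplicate attribute name",
    "<a x=1/>": "attribute value without quotation marks",
    "<a/><b/>": "more than one root element",
}

results = {text: is_well_formed(text) for text in samples}
```

Only the first two samples are accepted; the parser rejects the others with a ParseError.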

Validity

If XML is to be used for data exchange, it is advantageous to define the format by means of a grammar (e.g. a document type definition or an XML schema). The standard defines an XML document as valid if it is well-formed, contains a reference to a grammar, and complies with the format described by that grammar.

Parser

Programs or program parts that read and interpret XML data and, if necessary, check its validity are called XML parsers. A parser that checks validity is called a validating parser.

Structure of an XML document

Example of an XML file

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<verzeichnis>
     <titel>Wikipedia Städteverzeichnis</titel>
     <eintrag>
          <stichwort>Genf</stichwort>
          <eintragstext>Genf ist der Sitz von ...</eintragstext>
     </eintrag>
     <eintrag>
          <stichwort>Köln</stichwort>
          <eintragstext>Köln ist eine Stadt, die ...</eintragstext>
     </eintrag>
</verzeichnis>

XML documents have a physical and a logical structure.

Physical structure

  • The document entity contains the main document.
  • Other possible entities include
    • character strings, and possibly entire files, that are included via entity references (&name; in the document or %name; in the document type definition), as well as individual characters that are included via character references by their number (&#Dezimalzahl; or &#xHexadezimalzahl;).
  • An XML declaration is used to specify the XML version, the character encoding, and whether the document can be processed without a DTD.
  • A document type definition is used to specify entities and the permitted logical structure. The use of a DTD can be opted out of in the XML declaration.
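A short sketch of how a parser resolves such references, using xml.etree.ElementTree from the Python standard library. The entity name sitz is made up for this example; the element names are borrowed from the example file above.

```python
import xml.etree.ElementTree as ET

# Character references by decimal number (&#75; is "K") and by
# hexadecimal number (&#xF6; is "ö"), plus the predefined entity &amp;.
char_refs = ET.fromstring("<stichwort>&#75;&#xF6;ln &amp; Genf</stichwort>")

# An entity declared in an internal document type definition and
# referenced as &sitz; in the document (hypothetical entity name).
with_entity = ET.fromstring(
    '<!DOCTYPE eintragstext [<!ENTITY sitz "Genf ist der Sitz von">]>'
    "<eintragstext>&sitz; ...</eintragstext>"
)

resolved_chars = char_refs.text
resolved_entity = with_entity.text
```

After parsing, the application sees only the replacement text; the references themselves are no longer visible in the tree.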

Logical structure

The logical structure corresponds to a tree structure and is thus organized hierarchically. The tree nodes are:

  • Elements, which are physically marked up by means of
    • a matching pair of start tag and end tag (<Tagname></Tagname>) or
    • an empty-element tag (<Tagname/>),
  • Attributes, additional properties of an element written inside a start tag or empty-element tag (Attributname="Attributwert"),
  • Processing instructions (<?Zielname Daten?>),
  • Comments (<!-- Kommentar-Text -->), and
  • Text, which can appear as normal character data or in the form of a CDATA section (<![CDATA[ beliebiger Text]]>).

An XML document must contain exactly one top-level element. Further elements and text can be nested below this document element.
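The tree structure can be made visible by walking it recursively. The following minimal sketch parses a shortened version of the example file with xml.etree.ElementTree from the Python standard library; the function name outline and the output format are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<verzeichnis>
     <titel>Wikipedia Städteverzeichnis</titel>
     <eintrag>
          <stichwort>Genf</stichwort>
          <eintragstext>Genf ist der Sitz von ...</eintragstext>
     </eintrag>
</verzeichnis>
"""


def outline(element, depth=0):
    """Return one indented line per element, following the tree structure."""
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Parse from bytes so that the encoding named in the XML declaration applies.
root = ET.fromstring(DOCUMENT.encode("utf-8"))
tree_lines = outline(root)
```

The single top-level element verzeichnis ends up at depth 0, and every other element is nested below it.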

DTD

A document type definition (DTD) describes the structure and grammar of documents. It is an integral part of XML and is active by default.

If a document is created with reference to an external document type definition or with an integrated one, the parser checks the document when it is read. A document that is based on a document type definition is therefore always a valid document. The focus is on the conformance of the document content to the rules of the document type definition; technical readability, including the ability to read invalid documents, is of secondary importance. This mode is intended primarily for full-text documents (narrative documents).

Documents without a DTD are better suited to general data exchange. The parser checks these documents only against the rules of well-formedness. Technical readability is the top priority here. The actual information is checked and extracted by downstream processes.

Readability of XML documents

Practically all web browsers, such as Apple Safari, Google Chrome, Microsoft Internet Explorer, Mozilla Firefox and Opera, can display XML documents directly with the help of their built-in XML parser.

Classification of XML documents

XML documents can be divided into document-centric and data-centric documents based on their intended use and degree of structure. However, the boundary between these types of documents is fluid. Mixed forms can be described as semi-structured.

  • document-centric: The document is based on a text document that is largely understandable to a human reader even without the additional meta-information. XML elements are mainly used for the semantic markup of passages in the document; the document is only weakly structured, which makes machine processing difficult.
  • data-centric: The document is primarily intended for machine processing. It follows a schema that describes the entities of a data model and defines the relationships between the entities and the attributes of the entities. The document is thus highly structured and less suitable for direct human use.
  • semi-structured: Semi-structured documents are a hybrid form that is more strongly structured than document-centric documents but more weakly structured than data-centric documents.

It is typical of data-centric XML documents that elements have either element content or text content. The so-called mixed content, in which elements contain both text and child elements, is typical for the other XML documents.
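The distinction can be illustrated with a small Python sketch that classifies the content of an element. The function name content_kind and the element names absatz and hervorhebung are assumptions made for this example; whitespace-only text is ignored here, which is a simplification.

```python
import xml.etree.ElementTree as ET


def content_kind(element):
    """Classify an element by whether it holds child elements, text, or both."""
    has_children = len(element) > 0
    # In ElementTree, text that follows a child element is stored in child.tail.
    has_text = bool((element.text or "").strip()) or any(
        (child.tail or "").strip() for child in element
    )
    if has_children and has_text:
        return "mixed content"
    if has_children:
        return "element content"
    if has_text:
        return "text content"
    return "empty"


data_centric = ET.fromstring("<eintrag><stichwort>Genf</stichwort></eintrag>")
document_centric = ET.fromstring(
    "<absatz>Genf ist der Sitz <hervorhebung>vieler</hervorhebung> Organisationen.</absatz>"
)

kinds = (
    content_kind(data_centric),
    content_kind(data_centric[0]),
    content_kind(document_centric),
)
```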

Processing of XML

Processing criteria

Basically, three aspects are important when accessing an XML document:

  • How is the XML file accessed: sequentially or randomly?
  • How is the process for accessing the XML data designed: "Push" or "Pull"? (Push means that the parser controls the flow of the program. Pull means that the control of the flow is implemented in the code that calls the parser.)
  • How is the tree structure management of the XML data carried out: hierarchical or nested?

Programmatic access to XML documents

XML documents are read at the lowest level by a special program component, an XML processor, also known as an XML parser. It provides an application programming interface (API) through which the application accesses the XML document.

XML processors support three basic processing models.

  • DOM: A DOM API represents an XML document as a tree structure and grants random access to the individual components of the tree. In addition to reading XML documents, DOM also allows the tree to be manipulated and written back to an XML document. Because the entire tree is held in memory, DOM is very memory-intensive.
  • SAX: A SAX API represents an XML document as a sequential data stream and calls callback functions specified in the standard for events. An application that uses SAX can register its own subroutines as callback functions and evaluate the XML data in this way.
  • Pull API: An XML pull API processes data sequentially and offers both event-based processing and an iterator. It is highly memory-efficient and possibly easier to program than the SAX API, since control of the flow lies with the program and not with the parser.
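The three models can be compared side by side with the Python standard library. This is a sketch under assumptions: xml.dom.minidom stands in for DOM, xml.sax for SAX, and ElementTree's XMLPullParser for a pull API. The function and class names are made up for this illustration, and each variant extracts the same keywords from a shortened version of the example file.

```python
import xml.dom.minidom
import xml.sax
import xml.etree.ElementTree as ET

DOC = (
    "<verzeichnis>"
    "<eintrag><stichwort>Genf</stichwort></eintrag>"
    "<eintrag><stichwort>Köln</stichwort></eintrag>"
    "</verzeichnis>"
)


def keywords_dom(text):
    """DOM: build the whole tree in memory, then access nodes at random."""
    dom = xml.dom.minidom.parseString(text.encode("utf-8"))
    return [node.firstChild.data for node in dom.getElementsByTagName("stichwort")]


class KeywordHandler(xml.sax.ContentHandler):
    """SAX: the parser controls the flow and calls these callbacks (push)."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self._buffer = None

    def startElement(self, name, attrs):
        if name == "stichwort":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is collected first.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "stichwort":
            self.keywords.append("".join(self._buffer))
            self._buffer = None


def keywords_sax(text):
    handler = KeywordHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.keywords


def keywords_pull(text):
    """Pull: the calling code controls the flow and fetches events itself."""
    parser = ET.XMLPullParser(events=("end",))
    parser.feed(text)
    parser.close()
    return [elem.text for _, elem in parser.read_events() if elem.tag == "stichwort"]
```

In the SAX variant the application only supplies callbacks, whereas in the pull variant the loop that consumes events belongs to the application code.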

Further processing models:

  • Data binding: This approach makes XML data available as a data structure for direct program access. During unmarshalling, the XML data is converted directly into objects, for example.
  • Non-extracting XML API: The data is processed very efficiently at the byte level.

Often, application code does not access the parser API directly. Instead, XML access is further encapsulated so that the application code works with native objects or data structures that are backed by XML. Examples of such access layers are JAXB in Java, the Data Binding Wizard in Delphi and the XML Schema Definition Toolkit in .NET. The conversion between objects and XML usually works in both directions. This conversion is known as serialization or marshalling.
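The idea of such an access layer can be sketched by hand in a few lines of Python. The class Eintrag and its methods from_xml and to_xml are assumptions for this illustration and do not correspond to any of the tools named above, which typically generate such classes automatically.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Eintrag:
    """Native object that mirrors one <eintrag> element of the example file."""

    stichwort: str
    eintragstext: str

    @classmethod
    def from_xml(cls, element):
        """Unmarshalling: convert an XML element into an object."""
        return cls(
            stichwort=element.findtext("stichwort", default=""),
            eintragstext=element.findtext("eintragstext", default=""),
        )

    def to_xml(self):
        """Marshalling: convert the object back into an XML element."""
        element = ET.Element("eintrag")
        ET.SubElement(element, "stichwort").text = self.stichwort
        ET.SubElement(element, "eintragstext").text = self.eintragstext
        return element


source = (
    "<eintrag><stichwort>Genf</stichwort>"
    "<eintragstext>Genf ist der Sitz von ...</eintragstext></eintrag>"
)
entry = Eintrag.from_xml(ET.fromstring(source))
round_trip = ET.tostring(entry.to_xml(), encoding="unicode")
```

The application code works only with entry.stichwort and entry.eintragstext and never touches the parser API itself.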

XML parser API examples

XML parser APIs are available for various programming languages, e.g. Java, C, C++, C#, Python, Perl and PHP. Examples of parser APIs:

  • XML::Parser (Perl): An XML parser for Perl. The CPAN module XML::Simple, for example, also offers a very simple API.
  • DOM Functions (PHP5): A module in PHP5 for reading XML documents; SimpleXML is an alternative, and DOM XML is available for PHP4.
  • StAX (Java): A highly memory-efficient parser implementation (pull) that is at the same time easy to program. Cursor and iterator processing models are offered.
  • JAXB: Data binding for Java. For example, the corresponding Java class can be generated from an XML schema and vice versa.
  • Apache XMLBeans: A Java data binding framework that can be used with Java 1.4.2 and later.
  • Xerces: A validating XML parser for C++, Java and Perl for a wide variety of platforms.
  • ElementTree iterparse: A parser API for Python that iterates over subtrees. It combines the memory efficiency of a pull parser with the simplicity of a DOM parser.
  • VTD-XML: An example of a non-extracting XML API.
  • MSXML: Microsoft XML Core Services, the Microsoft software library for XML support via DOM, SAX, XSLT, XML Schemas and other XML-related technologies.
  • Pugixml: A DOM XML parser for C++ whose development placed particular emphasis on efficient code.
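As an example of one of the APIs listed above, the following sketch uses iterparse from Python's xml.etree.ElementTree to process one completed subtree at a time. The function name iter_keywords and the in-memory sample data are assumptions; with real data the stream would typically be a large file.

```python
import io
import xml.etree.ElementTree as ET


def iter_keywords(stream):
    """Yield the <stichwort> text of each <eintrag> subtree, one at a time."""
    for _, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "eintrag":
            # The subtree is complete here and can be queried like a small tree.
            yield elem.findtext("stichwort")
            # Discard the finished subtree's content to keep memory use low.
            elem.clear()


data = io.BytesIO(
    "<verzeichnis>"
    "<eintrag><stichwort>Genf</stichwort></eintrag>"
    "<eintrag><stichwort>Köln</stichwort></eintrag>"
    "</verzeichnis>".encode("utf-8")
)
keywords = list(iter_keywords(data))
```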

There are special programs, called XML editors, for creating XML documents, and special programs, called XML databases, for storing and managing them.

Transformation and representation of XML documents

An XML document can be transformed into another document using suitable transformation languages such as XSLT or DSSSL. Transformation is often used to convert a document from one XML language into another, for example into XHTML so that the document can be displayed in a web browser.

Schema languages

Schema languages are used to describe the structure of XML languages.

XML Schema / XSD

XML Schema (or XSD, for XML Schema Definition) is the modern way of describing the structure of XML documents. XML Schema also makes it possible to restrict the content of elements and attributes, e.g. to numbers, dates or text, the latter for instance by means of regular expressions. A schema is itself an XML document, which allows more complex (including content-related) relationships to be described than is possible with a formal DTD.

Other schema languages

Other schema languages are Document Structure Description, RELAX NG and Schematron.

XML family

Infrastructure

In connection with XML, the W3C has defined many languages on the basis of XML that offer XML expressions for frequently required general functions, such as the linking of XML documents. Numerous XML languages use these basic building blocks.

  • Transformation of XML documents: XSLT, STX
  • Addressing parts of an XML tree: XPath
  • Linking of XML resources: XPointer, XLink and XInclude
  • Selection of data from an XML data record: XQuery
  • Data manipulation in an XML data record: XUpdate
  • Design of electronic forms: XForms
  • Definition of XML data structures: XML Schema (= XSD, XML Schema Definition Language), DTD and RELAX NG
  • Signature and encryption of XML nodes: XML Signature and XML Encryption
  • Statements on the formal information content: XML Infoset
  • Formatted representation of XML data: XSL-FO
  • Definition of method or function calls in distributed systems: XML-RPC
  • Standardized attributes: XML Base and ID (DTD)
  • XML-based declarative programming language: MXML
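Addressing parts of an XML tree, as XPath does, can be sketched with Python's xml.etree.ElementTree. Note that ElementTree supports only a limited subset of XPath, so this illustrates the idea rather than the full language; the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<verzeichnis>"
    "<eintrag><stichwort>Genf</stichwort>"
    "<eintragstext>Genf ist der Sitz von ...</eintragstext></eintrag>"
    "<eintrag><stichwort>Köln</stichwort>"
    "<eintragstext>Köln ist eine Stadt, die ...</eintragstext></eintrag>"
    "</verzeichnis>"
)

# Path steps: every <stichwort> that is a child of an <eintrag> child.
all_keywords = [node.text for node in root.findall("./eintrag/stichwort")]

# Predicate on a child's text: the entry whose <stichwort> is "Köln".
koeln_text = root.findtext("./eintrag[stichwort='Köln']/eintragstext")

# Positional predicate: the second <eintrag> (positions start at 1).
second_keyword = root.find("./eintrag[2]/stichwort").text
```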

Languages

Today, many formal languages use the syntax of XML. XML is an essential instrument for creating an open information landscape (semantic web) that is understandable to both humans and machines, as intended by the W3C.

The well-known document language HTML was also brought into this concept after version 4.01 as the "Extensible HyperText Markup Language" (XHTML), so that its definition is now based on XML. Common reasons for using XML are the large number of available parsers and the simple syntax: the definition of SGML comprises 500 pages, that of XML only 26.

The following lists present some of these XML languages.

Text

  • XSL-FO (text formatting)
  • DocBook
  • DITA
  • XHTML (XML-compliant HTML)
  • TEI (Text Encoding Initiative)
  • NITF (News Industry Text Format)
  • OPML (Outline Processor Markup Language)
  • OSIS (Open Scripture Information Standard)

Graphics

  • SVG (vector graphics)
  • X3D (3D modeling language)
  • Collada (exchange format for data between different 3D programs)

Geospatial data

Multimedia

  • MEI (Music Encoding Initiative)
  • MusicXML (sheet music data, recorded music)
  • SMIL (time-synchronized, multimedia content)
  • MPEG-7 (MPEG-7 metadata)
  • Laszlo (LZX)

Security

Engineering

  • AutomationML, a format for storing system planning data
  • CAEX, a format for storing hierarchical object information
  • GSDML, a format for describing automation devices that can communicate with Profinet
  • IODD, a format for describing sensors and actuators
  • PLMXML, a format for describing product data as part of the Siemens PLM software
  • LandXML, a format for storing georeferenced objects
  • RTML (Remote Telescope Markup Language), a format for describing astronomical observation requests

Other

In addition, there are XML languages for web services (e.g. SOAP, WSDL and WS-*), for the integration of Java code in XML documents (XSP), for the synchronization of calendar data (SyncML), for mathematical formulas (MathML), for the representation of graphs (GraphML), for techniques in the field of the semantic web (RDF, OWL, Topic Maps, UOML), for service provisioning (SPML), for the exchange of messages (XMPP) and for financial reports such as annual financial statements (XBRL). Others serve areas of the automotive industry (ODX, MSRSW, AUTOSAR templates, QDX, JADM, OTX), automated testing, e.g. of circuits (ATML), systems biology (SBML), agriculture (AgroXML), publishing (ONIX), chemistry (CIDX) and many more.

A collection of XML languages for office applications can be found in the OpenDocument exchange format (OASIS Open Document Format for Office Applications).

Alternative formats

  • S-expressions (Lisp syntax for lists)
  • JSON (JavaScript Object Notation)
  • YAML (YAML Ain't Markup Language)

Trivia

Linus Torvalds described XML as unsuitable as a markup language (Comment No. 19):

“XML is crap. Really. There are no excuses. XML is nasty to parse for humans, and it's a disaster to parse even for computers. There's just no reason for that horrible crap to exist.”

- Linus Torvalds, 2014

Literature

  • Charles F. Goldfarb, Paul Prescod: XML Handbook. Market and Technology, Munich [et al.] 1999, ISBN 3-8272-9575-0.
  • Wiebke Möhr, Ingrid Schmidt: SGML and XML: Applications and Perspectives. Springer-Verlag, Berlin / Heidelberg / New York [et al.] 1999, ISBN 3-540-65543-3.
  • Robert Eckstein: XML - short & good. O'Reilly Verlag, Cambridge / Cologne [et al.] 2000, ISBN 3-89721-219-6.
  • Henning Lobin: Information modeling in XML and SGML. Springer, Berlin 2000, ISBN 3-540-65356-2.
  • Michael Seeboerger-Weichselbaum: The beginners seminar XML. 2nd, revised edition. BHV Software, Kaarst 2000, ISBN 3-8287-1018-2.
  • Elliotte Rusty Harold: The XML Bible. 2nd updated edition. mitp, Bonn 2002, ISBN 3-8266-0821-6.
  • Stefan Mintert: XML & Co. The W3C specifications for document and data architecture. Addison-Wesley, Munich 2002, ISBN 3-8273-1844-0.
  • Christine Kränzler: XML / XSL -… for professional beginners. for book and web. Markt + Technik, Munich 2002, ISBN 3-8272-6339-5.
  • Frank Bitzer: XML in the company. Briefing for IT management. Galileo Press, Bonn 2002, ISBN 3-89842-288-7.
  • Erik T. Ray: Introduction to XML. O'Reilly, 2004, ISBN 3-89721-286-2.
  • Margit Becher: XML: DTD, XML-Schema, XPath, XQuery, XSLT, XSL-FO, SAX, DOM. W3L Verlag, Witten 2009, ISBN 978-3-937137-69-8.
  • Marco Skulschus, Marcus Wiederstein: XML: Standards and Technologies. Comelio Medien, Berlin 2009, ISBN 978-3-939701-21-7.
  • Helmut Vonhoegen: Getting started with XML. Current standards: XML Schema, XSL, XLink. 8th edition. Rheinwerk, 2015, ISBN 978-3-8362-3798-7.

