XProc

XProc (from XML Processing) is an XML-based language standardized by the W3C for defining processing chains for XML documents (so-called XML pipelines). It has been a W3C Recommendation since May 2010 and addresses the growing need for batch processing of XML-based formats such as docx.

Use

When XML documents are processed, several steps typically follow one another. For example, when a user manual is published, the DocBook source document could first be validated against a RELAX NG schema and then converted into an HTML version and a PDF version using XSLT. With XProc, such processing chains can be described as XML documents, independently of the software used and of the platform. XProc processors then execute the processing chains described in these XProc documents.
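The validation and transformation chain described above could be sketched roughly as follows. This is only an illustration: the pipeline port names schema and stylesheet are chosen here, the PDF branch is left out, and p:validate-with-relax-ng belongs to the optional steps of XProc 1.0, so whether it is available depends on the processor.

```xml
<p:pipeline name="main" xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Extra pipeline inputs; their names are assumptions made for this sketch -->
  <p:input port="schema"/>
  <p:input port="stylesheet"/>

  <!-- Step 1: validate the source document against a RELAX NG schema -->
  <p:validate-with-relax-ng>
    <p:input port="schema">
      <p:pipe step="main" port="schema"/>
    </p:input>
  </p:validate-with-relax-ng>

  <!-- Step 2: transform the validated document with an XSLT stylesheet -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:pipe step="main" port="stylesheet"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```

The source document reaches the first step, and the validated document reaches the second, through the primary ports, so neither connection has to be written out.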

This is also useful if one or more operations, such as renaming an XML element, are to be carried out on a large number of XML documents of the same kind.
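A renaming operation of this kind might look like the following minimal sketch. The element names para and p are hypothetical, and how the pipeline is applied to many documents depends on the XProc processor used.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Rename every <para> element to <p>; both names are placeholders -->
  <p:rename match="para" new-name="p"/>
</p:pipeline>
```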

Building an XProc pipeline

The code of an XProc pipeline is written in XML syntax and is read and executed by an interpreter. Like every well-formed XML document, an XProc pipeline has exactly one root element. This root element declares at least one of the three XProc namespaces. The central elements of the pipeline are the steps, which are enclosed by the root element and are described and processed in sequence. A pipeline can read zero or more XML documents and output zero or more XML documents.
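As an illustration of this structure, the following skeleton has a root element that declares the XProc namespace and encloses a single step. It is a sketch rather than a useful pipeline: both ports are declared with sequence="true", so that zero or more documents can be read and written, and p:identity simply passes them through unchanged.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- The pipeline accepts any number of input documents ... -->
  <p:input port="source" sequence="true"/>
  <!-- ... and may emit any number of output documents -->
  <p:output port="result" sequence="true"/>

  <!-- The only step: copy the input to the output unchanged -->
  <p:identity/>
</p:declare-step>
```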

Steps

Steps are central elements of an XML pipeline described by XProc. There are three types of steps:

Atomic steps
These carry out exactly one operation, such as renaming or deleting an element within the XML document.
Compound steps
Steps can also be combined, which results in a compound step. The pipeline formed by the contained steps is thereby embedded in another pipeline and is referred to as a subpipeline. Compound steps make it possible to build more complex structures such as loops.
Multi-container steps
These steps contain several alternative subpipelines, which makes it possible to describe, among other things, constructs for error handling.
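The three kinds of steps can be seen together in the following sketch, which assumes a source document containing chapter elements and a pipeline input named schema. The element names chapter, draft-note and invalid-chapter are hypothetical. p:for-each is a compound step, p:delete an atomic step, and p:try a multi-container step whose p:group and p:catch hold the two alternative subpipelines.

```xml
<p:declare-step name="main" xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" primary="true"/>
  <p:input port="schema"/>
  <p:output port="result" sequence="true"/>

  <!-- Compound step: loop over every <chapter> element of the source -->
  <p:for-each>
    <p:iteration-source select="//chapter"/>

    <!-- Atomic step: delete all <draft-note> elements in the chapter -->
    <p:delete match="draft-note"/>

    <!-- Multi-container step: try to validate, fall back on failure -->
    <p:try>
      <p:group>
        <p:validate-with-xml-schema>
          <p:input port="schema">
            <p:pipe step="main" port="schema"/>
          </p:input>
        </p:validate-with-xml-schema>
      </p:group>
      <p:catch>
        <!-- Emit a placeholder document instead of the invalid chapter -->
        <p:identity>
          <p:input port="source">
            <p:inline><invalid-chapter/></p:inline>
          </p:input>
        </p:identity>
      </p:catch>
    </p:try>
  </p:for-each>
</p:declare-step>
```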

Ports

Inputs and outputs of the steps in an XProc pipeline are implemented using ports. Primary ports are used to connect the individual steps to each other, or to the pipeline itself (for the first or last step), automatically, and therefore do not necessarily have to be named. When this automatic connection is used, the primary ports are specified implicitly. The opposite case is explicit specification, i.e. the primary port is named. The ports of a step have unique names, such as source for the primary input port or result for the primary output port. Another example is schema, a port for XML Schema files.
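Explicit specification of ports could look like the following sketch, in which every port is declared and every connection is written out with p:pipe. The step names main and validated are chosen for illustration only.

```xml
<p:declare-step name="main" xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Primary input port of the pipeline -->
  <p:input port="source" primary="true"/>
  <!-- Non-primary input port: it must always be connected explicitly -->
  <p:input port="schema"/>
  <!-- Primary output port, explicitly connected to the step below -->
  <p:output port="result" primary="true">
    <p:pipe step="validated" port="result"/>
  </p:output>

  <p:validate-with-xml-schema name="validated">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
    <p:input port="schema">
      <p:pipe step="main" port="schema"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:declare-step>
```

If the connections to source and result were omitted, the processor would connect these primary ports automatically. The schema port would still need its p:pipe.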

Namespaces

XProc itself uses three namespaces. The namespace http://www.w3.org/ns/xproc (by convention with the prefix p:) contains the XML vocabulary of XProc. The namespace http://www.w3.org/ns/xproc-step (by convention with the prefix c:) is used for documents that are created within a processing chain as the defined input or output of individual steps, independently of the namespaces of the external documents being processed. Finally, the namespace http://www.w3.org/ns/xproc-error (by convention with the prefix err:) is used for errors.
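As a small sketch of how the namespaces appear in practice, the following pipeline declares all three on its root element. The p:count step is used here only because its output is a document in the c: namespace. The err: prefix is declared for completeness and is not otherwise used in this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                version="1.0">
  <p:input port="source" sequence="true"/>
  <p:output port="result"/>

  <!-- p:count emits a c:result element holding the number of input documents -->
  <p:count/>
</p:declare-step>
```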

Example

<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="schemas" sequence="true"/>

  <p:xinclude name="included">
    <p:input port="source">
      <p:pipe step="pipeline" port="source"/>
    </p:input>
  </p:xinclude>

  <p:validate-with-xml-schema name="validated">
    <p:input port="source">
      <p:pipe step="included" port="result"/>
    </p:input>
    <p:input port="schema">
      <p:pipe step="pipeline" port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:pipeline>

This is a pipeline that consists of two atomic steps, XInclude and Validate. The pipeline itself has two inputs, source (a source document) and schemas (a sequence of W3C XML Schemas). The XInclude step reads the pipeline input source and produces a result document. The Validate step reads the pipeline input schemas and the result of the XInclude step and in turn produces a result document. The result of the validation, result, is the result of the processing chain.

The same pipeline can be formulated in abbreviated form if its primary ports are specified implicitly:

<p:pipeline name="pipeline" xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="schemas" sequence="true"/>

  <p:xinclude/>

  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:pipe step="pipeline" port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:pipeline>

Implementations

Calabash by Norman Walsh

MorganaXProc

EMC Documentum XProc Engine

QuiXProc from Innovimax

yax, an XProc (XML Pipeline) implementation, still based on an XProc working draft

antillesXML, a free XML toolbox with GUI and built-in Calabash

Web links

XProc: An XML Pipeline Language - W3C Recommendation (English)
XProc introduction in German
XProc reference in German
XFront. XProc Tutorial (English)