XML processor

An XML processor is software that reads and processes XML documents. The term XML parser is often used as a synonym, although strictly speaking the parser is only the module within the XML processor that does the reading.

General

An XML processor essentially has three components:

the parser - it forms the front end to the documents to be processed
the processing component - it implements the actual business logic in terms of a model transformation
the output processor - it ensures the persistence of the target documents in the appropriate format
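
The three components can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree module; the function names parse, transform and serialize, the sample document and the element names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def parse(source_text):
    """Front end (parser): turn XML text into an in-memory tree."""
    return ET.fromstring(source_text)


def transform(source_root):
    """Processing component: map the source model to a target model."""
    target_root = ET.Element("titles")
    for book in source_root.iter("book"):
        item = ET.SubElement(target_root, "title")
        item.text = book.findtext("title", default="")
    return target_root


def serialize(target_root):
    """Output processor: write the target tree as XML text."""
    return ET.tostring(target_root, encoding="unicode")


# Hypothetical sample input, used only to exercise the three stages.
SOURCE = "<library><book><title>XML &amp; Databases</title></book></library>"
result = serialize(transform(parse(SOURCE)))
print(result)
```

Only parse and serialize deal with XML syntax here; transform works purely on the tree model, which mirrors the division of labour described above.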

Strictly speaking, only the parser and the output processor are XML-specific. The processing component can in principle process any model, but it also has XML-specific characteristics, for example

by allowing access to the post-schema-validation infoset (PSVI)
by representing XML constructs (node, element, entity, ...) as concepts in the processing language

XML parser

The parsers used in XML processors can be classified by two criteria:

validating or non-validating
type of interface for accessing the document (as a tree, similar to DOM, or sequentially, as with SAX)
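
The difference between the two interface styles can be sketched with Python's standard library, assuming xml.dom.minidom as a DOM-like tree interface and xml.sax as the sequential one. The sample document and the class name ItemCollector are hypothetical.

```python
import xml.dom.minidom
import xml.sax

DOC = "<order><item>pen</item><item>ink</item></order>"

# Tree interface: the whole document is held in memory and can be
# navigated in any order after parsing has finished.
dom = xml.dom.minidom.parseString(DOC)
tree_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]


# Sequential interface: the parser reports events while reading, and the
# handler must keep whatever state it needs itself.
class ItemCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # collects text while inside an <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None


collector = ItemCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)

print(tree_items)
print(collector.items)
```

Both variants extract the same data; the tree variant trades memory for convenient random access, while the sequential variant never needs more than the current element in memory.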

In principle, parsers that read other formats or even query databases can also be used as the front end. This is useful for migrating legacy data to XML.
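
As an illustration of such a non-XML front end, the following sketch reads CSV text and hands an element tree to the rest of the processor. It assumes Python's csv and xml.etree.ElementTree modules; the function name parse_csv_as_tree, the tag names and the sample data are hypothetical, and the column names are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def parse_csv_as_tree(csv_text, root_tag="records", row_tag="record"):
    """Front end for legacy data: build an element tree from CSV text.

    Assumes every column name is usable as an XML element name.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            ET.SubElement(record, column).text = value
    return root


# Hypothetical legacy data.
LEGACY = "id,name\n1,Ada\n2,Grace\n"
tree = parse_csv_as_tree(LEGACY)
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
```

Because the result is an ordinary element tree, the processing and output components do not need to know that the input was not XML.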

Non-validating parsers only check whether the document is well-formed, i.e. whether it satisfies the syntax rules of the W3C XML specification. Validating parsers additionally check conformance to a DTD or to a schema written in a schema language such as XML Schema or RELAX NG.
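
A well-formedness check can be sketched as follows, assuming Python's xml.etree.ElementTree, whose underlying parser is non-validating. The helper name is_well_formed is hypothetical. Checking against a DTD or schema would require a validating parser from a separate library and is not shown.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))   # properly nested
print(is_well_formed("<a><b></a>"))    # <b> is never closed
print(is_well_formed("<a/><b/>"))      # two root elements
```

A document that passes this check is only known to be syntactically correct; whether its elements and attributes are the ones a particular DTD or schema allows is a separate question.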

Processing component

The processing component usually implements its own programming language, tailored to the processing paradigm (for example DSSSL or XSLT). Two approaches must be distinguished:

Sequential processing - rules can be specified for entering and leaving a node, and the actual processing is formulated in these rules. The content of the document is available only to the extent that it has already been read or processed.
Tree-oriented processing - the processing component automatically traverses the document tree(s) and constructs the tree for the output document. The traversal can be driven by the source tree (for example with XSLT) or by the target tree (for example with MetaMorphosis). One therefore speaks of “source-driven” or “target-driven” processors. Target-driven processors are harder to understand, but they offer considerably more flexibility.
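
The difference between source-driven and target-driven traversal can be illustrated with a deliberately simplified sketch. It assumes Python's xml.etree.ElementTree and is not a description of how XSLT or MetaMorphosis work internally; the rule table, the template and all element names are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = ET.fromstring(
    "<doc><title>Report</title><para>One</para><para>Two</para></doc>"
)

# Source-driven: walk the source tree and look up a rule for each
# source element that is encountered.
SOURCE_RULES = {"title": "h1", "para": "p"}


def source_driven(source_root):
    out = ET.Element("html")
    for node in source_root:
        target_tag = SOURCE_RULES.get(node.tag)
        if target_tag is not None:
            ET.SubElement(out, target_tag).text = node.text
    return out


# Target-driven: walk a description of the target structure; each slot
# pulls the source content it needs.
TARGET_TEMPLATE = [("h1", "title"), ("p", "para")]


def target_driven(source_root):
    out = ET.Element("html")
    for target_tag, source_path in TARGET_TEMPLATE:
        for node in source_root.findall(source_path):
            ET.SubElement(out, target_tag).text = node.text
    return out


source_result = ET.tostring(source_driven(SOURCE), encoding="unicode")
target_result = ET.tostring(target_driven(SOURCE), encoding="unicode")
print(source_result)
print(target_result)
```

For this input both functions yield the same output. They differ in what determines the order of the result: in the source-driven variant it follows the source document, in the target-driven variant it follows the template, so reordering the template reorders the output without touching the source.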

In this approach it is particularly advantageous to formulate the processing (transformation) largely independently of the concrete syntax of the output format. Format-specific details (for example line breaks and indentation) can be handled in the output processor. This sometimes makes it possible to serve several output formats with a single transformation.

In essence, XML processors resemble model-driven development (MDD), although they were defined before the MDA hype: in MDD, too, a formally described model transformation takes place, and the model is read and serialized by dedicated processors. A kinship between XML processors and model transformers can therefore be recognized. The role of the metamodel is played by the DTD or the XML schema, and the models are persisted as XML.

Output component

The output component serializes the document tree provided by the processing component into XML or into another desired text format (for example TeX). The output processor can take care of XML-specific details such as escaping special characters and whitespace handling. In powerful XML processors, this output component can be flexibly configured or programmed.
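
How one document tree can be serialized into two formats with different escaping rules can be sketched as follows. The sketch assumes Python's xml.etree.ElementTree; the function names to_xml and to_tex, the element names and the sample text are hypothetical, and the TeX escaping table is deliberately incomplete.

```python
import xml.etree.ElementTree as ET

# Small, incomplete table of characters that need escaping in TeX text.
TEX_SPECIALS = {"&": r"\&", "%": r"\%", "_": r"\_", "#": r"\#", "$": r"\$"}


def to_xml(root):
    """Serialize as XML; ElementTree escapes &, < and > in text itself."""
    return ET.tostring(root, encoding="unicode")


def to_tex(root):
    """Serialize the <para> elements as TeX paragraphs."""
    paragraphs = []
    for para in root.iter("para"):
        text = "".join(TEX_SPECIALS.get(ch, ch) for ch in (para.text or ""))
        paragraphs.append(text)
    return "\n\n".join(paragraphs)


# The same tree, as a processing component might deliver it.
tree = ET.Element("doc")
ET.SubElement(tree, "para").text = "R&D: 5% <draft>"

xml_output = to_xml(tree)
tex_output = to_tex(tree)
print(xml_output)
print(tex_output)
```

The tree itself holds the unescaped text; each serializer applies the escaping rules of its own format, which is why the transformation that built the tree does not need to know the output format.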

Implementations

Literature

Meike Klettke, Holger Meyer: XML & Databases: Concepts, Languages and Systems. 2003, ISBN 3-89864-148-1.