Simple API for XML

The Simple API for XML (SAX) is a de facto standard that describes an application programming interface (API) for parsing XML data. The current major version, SAX 2.0, was published by David Megginson in 2000 and is in the public domain. A SAX parser reads XML data as a sequential data stream and, for each of a set of defined standard events, calls the callback function specified for that event. An application that uses SAX can register its own subroutines as callback functions and in this way evaluate the XML data.

How SAX works

SAX was originally developed in Java and consists of a number of Java interfaces, but today there are implementations for almost all common programming languages. SAX is not governed by any formal committee or consortium, which is rather atypical for an XML specification, but it has become a de facto standard. It specifies a number of methods for accessing XML documents through a SAX parser. Unlike DOM, SAX works in an event-oriented manner. The processing principle corresponds to the concept of a pipeline. SAX defines a set of events that can occur while an XML document is read sequentially. These events are stateless: they do not refer to earlier events and are otherwise unrelated to other events. When the parser recognizes a syntactic structure, it calls the handler method for that event, which can in turn run an application-specific handling routine if one is needed.

This means that evaluation of the document can begin as soon as the first characters have been read. This shortens the subjectively perceived access time, especially in interactive systems. At the same time, a SAX parser keeps the memory requirement low: apart from the element currently being read, memory holds only the data that a handling routine has explicitly chosen to keep.
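The following Java sketch illustrates the point that evaluation can start before the whole document has been processed. It assumes the JAXP parser bundled with the JDK and the SAX 2 class DefaultHandler; the class name FirstTitle, the handler name and the sample input are made up for this illustration. Stopping the parse by throwing an exception from a handler is a commonly used idiom, not something the SAX specification prescribes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FirstTitle {

    /** Hypothetical handler: keeps only the text of the first <titel> element. */
    static final class TitleHandler extends DefaultHandler {
        final StringBuilder title = new StringBuilder();
        boolean done = false;          // set once the title is complete
        private boolean inTitle = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("titel")) inTitle = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver text in several chunks, so append instead of assigning.
            if (inTitle) title.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("titel")) {
                done = true;
                // Abort the parse: the rest of the document is of no interest here.
                throw new SAXException("title found, stopping early");
            }
        }
    }

    static String firstTitle(String xml) throws Exception {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Only swallow the exception that was thrown on purpose.
            if (!handler.done) throw e;
        }
        return handler.title.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<seminararbeit><titel>DOM, SAX und SOAP</titel>"
                + "<inhalt><kapitel value=\"1\">Einleitung</kapitel></inhalt></seminararbeit>";
        System.out.println(firstTitle(xml));
    }
}
```

The handler stores nothing except the title text, which mirrors the statement that memory holds only what a handling routine explicitly keeps.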

SAX events are emitted by the parser while it is still reading the document. In practice this means that SAX events can already have been delivered for a document that is not well-formed before the parser detects the error. Error handling therefore plays an important role in SAX parsers; corresponding classes are available in Java. Validating an XML document before parsing it with SAX is contrary to the nature of SAX, as this would first require the entire document to be loaded into memory. Nevertheless, there are a large number of validating SAX parsers.
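A minimal Java sketch of this behavior, assuming the JAXP parser bundled with the JDK: the input below is deliberately not well-formed (the second "kapitel" element is never closed), yet several events reach the handler before the error is reported. The class name EventsBeforeError and the log format are invented for this illustration; fatalError and SAXParseException are part of the org.xml.sax API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class EventsBeforeError {

    /** Parses the input and returns a log of element events and errors in arrival order. */
    static List<String> parseAndLog(String xml) throws Exception {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("startElement(" + qName + ")");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("endElement(" + qName + ")");
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                // Well-formedness violations are reported here; rethrow to end the parse.
                log.add("fatalError");
                throw e;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            log.add("parse aborted");
        }
        return log;
    }

    public static void main(String[] args) throws Exception {
        // Not well-formed: the second <kapitel> is never closed.
        String broken = "<inhalt><kapitel>Einleitung</kapitel><kapitel>Hauptteil</inhalt>";
        for (String entry : parseAndLog(broken)) {
            System.out.println(entry);
        }
    }
}
```

An application that builds up results inside its handlers therefore has to be prepared to discard them when the parse ends with an error.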

Events in SAX

The following document is given:

<?xml version="1.0"?>
<seminararbeit>
 <titel>DOM, SAX und SOAP</titel>
 <inhalt>
  <kapitel value="1">Einleitung</kapitel>
  <kapitel value="2">Hauptteil</kapitel>
  <kapitel value="3">Fazit</kapitel>
 </inhalt>
</seminararbeit>

If the XML document shown above is read with a SAX parser, the parser emits the following events in sequence:

SAX event
    Explanation

startDocument()
    The parser has encountered the beginning of the XML document.
startElement("seminararbeit", [])
    An element named "seminararbeit" was found. The second parameter (the square brackets "[]") is the list of all attributes belonging to the element; since this element has no attributes, the list is empty here.
characters("\n ")
    A SAX parser also reports all whitespace that it finds between two element tags; here it is the line break ("\n") followed by one space, which was used in the XML to indent the line for better readability.
startElement("titel", [])
characters("DOM, SAX und SOAP")
    The content of the element "titel".
endElement("titel")
    Indicates that the end of the previously found element has been reached.
characters("\n ")
startElement("inhalt", [])
characters("\n  ")
    Since the indentation in the XML is now two spaces, the event also contains two spaces.
startElement("kapitel", [value="1"])
    The second parameter contains the list of all attributes; here the list has a single entry, namely value="1".
characters("Einleitung")
endElement("kapitel")
characters("\n  ")
startElement("kapitel", [value="2"])
characters("Hauptteil")
endElement("kapitel")
characters("\n  ")
startElement("kapitel", [value="3"])
characters("Fazit")
endElement("kapitel")
characters("\n ")
endElement("inhalt")
characters("\n")
    Since this line is no longer indented in the XML, only "\n" (without trailing spaces) is reported.
endElement("seminararbeit")
endDocument()
    The parser has reached the end of the XML document.
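The event sequence above can be reproduced with a small tracing handler. The following Java sketch assumes the JAXP parser bundled with the JDK; the class name EventTrace and the textual format of the log entries are choices made for this illustration. Note that a parser is allowed to deliver a run of text in more than one characters() call, so the exact number of characters entries can differ from the listing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventTrace {

    // The sample document from this section.
    static final String SAMPLE =
            "<?xml version=\"1.0\"?>\n"
            + "<seminararbeit>\n"
            + " <titel>DOM, SAX und SOAP</titel>\n"
            + " <inhalt>\n"
            + "  <kapitel value=\"1\">Einleitung</kapitel>\n"
            + "  <kapitel value=\"2\">Hauptteil</kapitel>\n"
            + "  <kapitel value=\"3\">Fazit</kapitel>\n"
            + " </inhalt>\n"
            + "</seminararbeit>\n";

    /** Returns one log line per SAX event, in the order the parser delivers them. */
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument()");
            }

            @Override
            public void endDocument() {
                events.add("endDocument()");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Render the attribute list as [name="value", ...].
                StringBuilder list = new StringBuilder("[");
                for (int i = 0; i < atts.getLength(); i++) {
                    if (i > 0) list.append(", ");
                    list.append(atts.getQName(i)).append("=\"").append(atts.getValue(i)).append("\"");
                }
                list.append("]");
                events.add("startElement(\"" + qName + "\", " + list + ")");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement(\"" + qName + "\")");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Make line breaks visible as \n in the log.
                String text = new String(ch, start, length).replace("\n", "\\n");
                events.add("characters(\"" + text + "\")");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```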

Whenever an event occurs, the parser suspends its work and waits until the document handler returns control. In the meantime, the handler can run a handling routine to evaluate the event. Handling routines need to be written only for those events that are relevant to further processing; for all other events, control returns to the parser immediately.

Working with SAX

Example in Java

In the following Java example, the title and the number of chapters are read from the document as part of a document analysis and printed at the end. To do this, the appropriate SAX classes must first be imported and the SAX parser initialized. In addition, a document handler must be created and registered so that the parser notifies it of the events. For this example, the document handler contains the following methods for the events "characters", "startElement" and "endDocument":

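// State of the document handler (fields of the handler class)
public int count = 0;              // number of "kapitel" elements seen so far
public boolean titel = false;      // true while the parser is inside "titel"
public String seminararbeit = "";  // text of the title once it has been read

// Called for character data: stores the text of the first "titel" element
public void characters(char[] ch, int start, int length) throws SAXException {
 if (titel && seminararbeit.isEmpty()) {
  seminararbeit = new String(ch, start, length);
  titel = false;
 }
}

// Called for every start tag (SAX 1 signature with AttributeList):
// counts the chapters and notes when the title begins
public void startElement(String name, AttributeList atts) throws SAXException {
 if (name.equals("kapitel")) ++count;
 if (name.equals("titel")) titel = true;
}

// Called once at the end of the document: prints the result
public void endDocument() throws SAXException {
 System.out.println(seminararbeit + " contains " + count + " chapters");
}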
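The handler methods above use the older SAX 1 signatures. For comparison, here is a self-contained sketch of the same analysis against the SAX 2 API, assuming the JAXP parser bundled with the JDK. The class name ChapterCounter, the method summarize and the English output text are choices made for this illustration, not part of the original example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChapterCounter extends DefaultHandler {
    private int count = 0;                       // number of "kapitel" elements
    private boolean inTitle = false;             // true while inside the first "titel"
    private final StringBuilder title = new StringBuilder();
    private String summary = "";

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("kapitel")) count++;
        // Only the first title is recorded, as in the example above.
        if (qName.equals("titel") && title.length() == 0) inTitle = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so it is accumulated.
        if (inTitle) title.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("titel")) inTitle = false;
    }

    @Override
    public void endDocument() {
        summary = title + " contains " + count + " chapters";
    }

    /** Parses the given XML text and returns the summary line. */
    static String summarize(String xml) throws Exception {
        ChapterCounter handler = new ChapterCounter();
        // Registering the handler happens by passing it to parse().
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.summary;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<seminararbeit><titel>DOM, SAX und SOAP</titel><inhalt>"
                + "<kapitel value=\"1\">Einleitung</kapitel>"
                + "<kapitel value=\"2\">Hauptteil</kapitel>"
                + "<kapitel value=\"3\">Fazit</kapitel>"
                + "</inhalt></seminararbeit>";
        System.out.println(summarize(xml)); // prints "DOM, SAX und SOAP contains 3 chapters"
    }
}
```

Extending DefaultHandler means that only the events of interest need to be overridden; all other events fall through to its empty default implementations.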

Example in Lua

In LuaExpat (a SAX parser for Lua), callbacks are first set up for parsing XML files; the parser uses them to pass its data to the application program that calls it. In the example these are simple calls to output functions. The following program outputs an XML file as plain text.

local lxp = require "lxp"
local xml_file = "irgendeineXMLdatei.xml"   -- placeholder file name

local callbacks
do
  local count = 0   -- current nesting depth, used for indentation
  callbacks = {
    StartElement = function (parser, name, attributes)
      -- print the start tag
      io.write("+ ", string.rep(" ", count*2), name, "\n")
      count = count + 1
      -- print the attributes: the array part of the table holds the
      -- attribute names, the values are stored under those names
      for _, attributename in ipairs(attributes) do
        io.write("  ", string.rep(" ", count*2), attributename, '="', attributes[attributename], '"\n')
      end
    end,
    EndElement = function (parser, name)
      -- print the end tag
      count = count - 1
      io.write("- ", string.rep(" ", count*2), name, "\n")
    end,
    CharacterData = function (parser, text)
      -- print the text
      io.write("----------\n", text, "\n----------\n")
    end
  }
end

local p = lxp.new(callbacks)     -- create a parser instance
local file = assert(io.open(xml_file, "r"))
p:parse(file:read("*all"))       -- parse the whole file
                                 -- (reading line by line would also be possible)
file:close()

p:parse()         -- finishes the document
p:close()         -- closes the parser

Literature

  • David Brownell: SAX2. O'Reilly, ISBN 0-596-00237-8.
  • Helmut Erlenkötter: XML Extensible Markup Language from the beginning. Rowohlt-Taschenbuch-Verlag, Reinbek near Hamburg 2003, ISBN 3-499-61209-7, pp. 211-229.
  • W. Scott Means, Michael A. Bodie: The Book of SAX. No Starch Press, ISBN 1-886411-77-8.
