Analyzed Layout and Text Object

from Wikipedia, the free encyclopedia

ALTO (Analyzed Layout and Text Object) is an open XML schema for describing the layout information of digitized objects.

The standard was originally developed to describe OCR results, that is, the text and layout of digitized materials at the page level. The aim was to describe text and layout in enough detail that the original appearance could be reconstructed from the digitized material.
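
How recorded positions make such a reconstruction possible can be illustrated with a minimal Python sketch. The element names TextBlock, TextLine, String and SP and the attributes CONTENT, HPOS and VPOS come from the ALTO schema. The sample fragment, its coordinate values and the helper names local_name and read_lines are assumptions made up for illustration. Real ALTO files also declare an XML namespace, which the sketch strips before comparing tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment; the words and coordinates are invented.
SAMPLE = """<alto>
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="Analyzed" HPOS="100" VPOS="200"/>
            <SP/>
            <String CONTENT="Layout" HPOS="300" VPOS="200"/>
          </TextLine>
          <TextLine>
            <String CONTENT="and" HPOS="100" VPOS="260"/>
            <SP/>
            <String CONTENT="Text" HPOS="180" VPOS="260"/>
            <SP/>
            <String CONTENT="Object" HPOS="280" VPOS="260"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def read_lines(xml_text):
    """Return one list per TextLine holding (word, hpos, vpos) tuples."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            (child.get("CONTENT", ""),
             float(child.get("HPOS", 0)),
             float(child.get("VPOS", 0)))
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(words)
    return lines


# Rebuild the plain text line by line from the recognized words.
text = "\n".join(
    " ".join(word for word, _, _ in line) for line in read_lines(SAMPLE)
)
print(text)
```

Because every word carries its own position, a viewer could go one step further and place each word at its coordinates on the page image, for example to highlight search hits.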

ALTO is often used in combination with the Metadata Encoding and Transmission Standard (METS), which is used to describe the entire digitized object and to create references into the ALTO files, e.g. to determine the reading sequence.

ALTO was developed in the EU-funded METAe project. Since 2010, the standard has been maintained by the Library of Congress and an editorial team.

Because a guideline of the DFG (German Research Foundation) recommends it, ALTO is a de facto standard for text digitization projects in Germany and is supported, for example, by the DFG Viewer.

Versions

The latest schema version as well as an overview of the older versions can be found on GitHub.

Structure of an ALTO file

An ALTO file consists of three main sections, i.e. children of the root element <alto>:

  • The <Description> section contains metadata on the ALTO file itself and process information on how the file was created.
  • The <Styles> section defines the individual text and paragraph styles used in the file:
    • <TextStyle> describes fonts and font properties
    • <ParagraphStyle> describes properties of a paragraph, e.g. its alignment
  • The <Layout> section contains the actual content, which is subdivided by <Page> elements for individual pages.
    <?xml version="1.0"?>
    <alto>
      <Description>
        <MeasurementUnit/>
        <sourceImageInformation/>
        <Processing/>
      </Description>
      <Styles>
        <TextStyle/>
        <ParagraphStyle/>
      </Styles>
      <Layout>
        <Page>
          <TopMargin/>
          <LeftMargin/>
          <RightMargin/>
          <BottomMargin/>
          <PrintSpace/>
        </Page>
      </Layout>
    </alto>
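
The three-part structure can be checked programmatically. The following Python sketch parses the skeleton above with the standard library and lists the children of each main section. The function names local_name and summarize_alto are hypothetical helpers, not part of any ALTO tooling. Real ALTO files declare an XML namespace on the root element, so the sketch strips namespace prefixes before comparing tag names.

```python
import xml.etree.ElementTree as ET

# The skeleton shown above, reproduced as a string.
SKELETON = """<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def summarize_alto(xml_text):
    """Map each child of <alto> to the names of its own child elements."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "alto":
        raise ValueError("root element is not <alto>")
    return {
        local_name(section.tag): [local_name(child.tag) for child in section]
        for section in root
    }


summary = summarize_alto(SKELETON)
for section, children in summary.items():
    print(section, children)
```

For the skeleton, the summary contains the keys Description, Styles and Layout in document order, and the Layout entry holds a single Page element.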

References

  1. DFG Practical Guidelines "Digitization", p. 37 (dfg.de [PDF]).
  2. https://github.com/altoxml
  3. Structure of ALTO Files