Analyzed Layout and Text Object
ALTO (Analyzed Layout and Text Object) is an open XML schema for describing the layout information of digitized objects.
The standard was originally developed to describe OCR results, that is, the text and layout of digitized material at the page level. The aim was to describe text and layout in such a way that the original could be reconstructed from the digitized material.
ALTO is often used in combination with the Metadata Encoding and Transmission Standard (METS). METS describes the digitized object as a whole and provides references into the ALTO files, for example to determine the reading sequence.
ALTO was developed in the EU-funded METAe project. Since 2010, the standard has been maintained by the Library of Congress and an editorial team.
Because a guideline of the DFG (German Research Foundation) recommends it, ALTO is a de facto standard for text digitization projects in Germany. It is supported by the DFG Viewer, among other tools.
Versions
The latest schema version and an overview of the older versions can be found on GitHub.
Structure of an ALTO file
An ALTO file consists of three main sections, which are children of the root element <alto>:
- The <Description> section contains metadata on the ALTO file itself and process information on how the file was created.
- The <Styles> section contains the text and layout styles, each defined once so that the content can refer to them:
  - <TextStyle> describes fonts and font properties
  - <ParagraphStyle> describes the properties of a paragraph, e.g. its alignment
- The <Layout> section contains the actual content, which is subdivided by <Page> elements for individual pages.
<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit/>
    <sourceImageInformation/>
    <Processing/>
  </Description>
  <Styles>
    <TextStyle/>
    <ParagraphStyle/>
  </Styles>
  <Layout>
    <Page>
      <TopMargin/>
      <LeftMargin/>
      <RightMargin/>
      <BottomMargin/>
      <PrintSpace/>
    </Page>
  </Layout>
</alto>
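The three-section structure can be explored with a few lines of Python. The following is a minimal sketch, not a reference implementation. It assumes a tiny hand-written sample rather than a real scan, and it uses the elements <TextBlock>, <TextLine>, <String> and <SP> and the CONTENT attribute, which come from the ALTO schema but are not shown in the skeleton above. The helper names local_name, main_sections and text_lines are made up for this illustration. Real ALTO files declare an XML namespace on the root element, so the sketch compares local tag names only.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only (an assumption, not real OCR output).
SAMPLE = """<?xml version="1.0"?>
<alto>
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
  </Description>
  <Styles>
    <TextStyle ID="TS1" FONTSIZE="10"/>
  </Styles>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine>
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="ALTO"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if present, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def main_sections(root):
    """Return the names of the direct children of the root element."""
    return [local_name(child.tag) for child in root]


def text_lines(root):
    """Join the CONTENT of the <String> children of every <TextLine>."""
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


root = ET.fromstring(SAMPLE)
print(main_sections(root))  # names of the main sections
print(text_lines(root))     # recognized text, one entry per line
```

Matching on local names keeps the sketch independent of the schema version, since each ALTO version uses its own namespace URI.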
See also
- Metadata Encoding and Transmission Standard (METS)
- Dublin Core, an ISO metadata standard
- Preservation Metadata: Implementation Strategies (PREMIS)
- Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- hOCR
Web links
- ALTO (Analyzed Layout and Text Object) standards on the Library of Congress website
- ALTOxml on GitHub (altoxml.github.io and github.com/altoxml)
- More information on METS / ALTO from CCS GmbH
- An introduction to METS ALTO from CCS GmbH
References
- DFG practical rules "Digitization", p. 37 (dfg.de [PDF]).
- https://github.com/altoxml
- Structure of ALTO Files