PDF/A

from Wikipedia, the free encyclopedia

PDF/A is a file format for the long-term archiving of digital documents. It has been standardized by the International Organization for Standardization (ISO) as a subset of the Portable Document Format (PDF). The standard specifies how the elements of the underlying PDF versions must be used with regard to long-term archiving: some components are mandatory, others are prohibited.

PDF/A-1

The standard was published in 2005 as ISO 19005-1:2005, Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1). It specifies two levels of conformance:

  • PDF/A-1b - Level B (Basic) conformance: unambiguous visual reproducibility
  • PDF/A-1a - Level A (Accessible) conformance: unambiguous visual reproducibility, plus text that can be mapped to Unicode and structuring of the content, so that the document can be read aloud by a screen reader for accessibility.
  • PDF/A-1 is based on PDF 1.4.
  • Features of PDF/A-1b:
    • References to resources outside the file are not permitted. In particular, all images and fonts used must be embedded in the file (fonts may be subset to the characters actually used).
    • Colors must be defined precisely enough to ensure unambiguous color reproduction. This is done either with source profiles or with an “output intent” (a description of the typical output condition by means of an ICC profile, such as sRGB for screen-oriented documents).
    • Transparent elements are not allowed.
    • JavaScript and actions are not permitted, because executing them could change or influence the content or the presentation of the PDF. Audio and video data must not be embedded.
    • Encryption is prohibited, and with it any partial blocking of functions such as printing or copying data out of the file.
    • Components protected by patents are prohibited, in particular compression with the Lempel-Ziv-Welch (LZW) algorithm. Although LZW is presumably no longer encumbered by a patent, ISO ruled it out as a precaution, not least because ZIP (Flate) compression is an equivalent, patent-free alternative.
    • Embedded digital signatures are supported.
    • The file must be marked as PDF/A-1-compliant in its XMP metadata.
    • Any further metadata must also be supplied in XMP format, as an XMP extension schema.
  • Properties of PDF/A-1a, in addition to those of PDF/A-1b:
    • It must be possible to map all text to Unicode.
    • The content structure of the PDF/A file must be expressed by means of tagged PDF.
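
The XMP marking mentioned in the list can be illustrated with a minimal sketch. It assumes the PDF/A identification schema with the namespace http://www.aiim.org/pdfa/ns/id/ and the properties pdfaid:part and pdfaid:conformance. The function name is hypothetical, and the fragment would still have to be placed inside a complete XMP packet, which this sketch omits.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: namespace of the PDF/A identification schema.
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def pdfa1_identification_rdf(conformance: str) -> str:
    """Return an RDF/XML fragment declaring PDF/A-1 level A or B.

    Hypothetical helper: it only builds the identification properties,
    not the surrounding XMP packet.
    """
    level = conformance.upper()
    if level not in ("A", "B"):
        raise ValueError("PDF/A-1 defines only conformance levels A and B")

    # Use readable prefixes instead of auto-generated ns0, ns1, ...
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("pdfaid", PDFAID_NS)

    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""}
    )
    ET.SubElement(desc, f"{{{PDFAID_NS}}}part").text = "1"
    ET.SubElement(desc, f"{{{PDFAID_NS}}}conformance").text = level
    return ET.tostring(rdf, encoding="unicode")


if __name__ == "__main__":
    print(pdfa1_identification_rdf("B"))
```

Writing these properties does not make a file conform; it only states the claim that a validator would then check against the remaining requirements.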

PDF/A-2

The standard was published on June 20, 2011 as ISO 19005-2:2011, Document management - Electronic document file format for long-term preservation - Part 2: Use of ISO 32000-1 (PDF/A-2). It defines three levels of conformance:

  • PDF/A-2b - Level B (Basic) conformance: the minimum requirement for a PDF/A-2 file; guarantees the correct appearance of the document for long-term archiving.
  • PDF/A-2u - Level U (Unicode text semantics) conformance: like 2b, and in addition all text is mapped to Unicode so that it can be indexed and displayed.
  • PDF/A-2a - Level A (Accessible) conformance: meets all requirements of ISO 19005-2, in particular all structural and semantic properties.
  • Major enhancements compared to PDF/A-1:
    • Based on PDF 1.7 (ISO 32000-1)
    • JPEG 2000 compression allowed
    • Transparent elements allowed
    • Layers allowed
    • OpenType fonts can be embedded
    • Digital signatures in accordance with PAdES (PDF Advanced Electronic Signatures, ETSI TS 102 778)
    • Container: PDF/A-1 files can be embedded in PDF/A-2 files
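
Labels such as PDF/A-1b or PDF/A-2u combine a part number with a conformance level. The following sketch parses such a label and checks it against the levels listed above for parts 1 and 2. The function and table names are hypothetical, and other parts are deliberately left out of the table.

```python
import re

# Conformance levels per part, as listed in the text (parts 1 and 2 only).
LEVELS = {1: {"a", "b"}, 2: {"a", "b", "u"}}

_LABEL = re.compile(r"^PDF/A-(\d)([a-z])$", re.IGNORECASE)


def parse_label(label: str) -> tuple[int, str]:
    """Split a label like 'PDF/A-2u' into (part, level) and validate it."""
    # Tolerate stray spaces such as 'PDF / A-1b'.
    match = _LABEL.match(label.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a PDF/A label: {label!r}")
    part, level = int(match.group(1)), match.group(2).lower()
    if level not in LEVELS.get(part, set()):
        raise ValueError(f"part {part} has no level {level!r} in this table")
    return part, level


if __name__ == "__main__":
    print(parse_label("PDF/A-2u"))
```

A label such as PDF/A-1u is rejected, because level U was only introduced with part 2.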

PDF/A-3

The PDF/A-3 specification was published on October 17, 2012. Its essential extension over PDF/A-2 concerns containers: files of any type can be embedded in a PDF/A-3 file. For example, the source data from which a PDF/A-3 document was created can be attached to it. The standard does not regulate the archivability of embedded files that are not themselves PDF/A-compliant.

PDF/A-3 opens up possibilities for electronic invoices, because both the machine-readable data in XML format and the archivable PDF rendition of the invoice can be stored in a single file. The ZUGFeRD standard, which is based on PDF/A-3, was published in June 2014.

Validity

PDF/A-1 remains in force. PDF/A-1-compliant files also meet the requirements of the corresponding PDF/A-2 conformance level. Where the features of PDF/A-1 are sufficient, there is no compelling reason to switch to PDF/A-2.
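
The rule of thumb "use the lowest part whose features suffice" can be written down as a small helper. The feature names are illustrative labels for the enhancements listed in the sections above, and the sketch assumes that each part retains everything the previous part allows.

```python
# Features each part adds over the previous one; names are illustrative.
ADDED_BY_PART = {
    1: frozenset(),
    2: frozenset(
        {"jpeg2000", "transparency", "layers", "opentype_fonts", "embedded_pdfa"}
    ),
    3: frozenset({"arbitrary_attachments"}),
}


def minimal_part(required: set[str]) -> int:
    """Return the lowest PDF/A part that offers all required features."""
    available: set[str] = set()
    for part in sorted(ADDED_BY_PART):
        # Assumption: later parts keep the features of earlier parts.
        available |= ADDED_BY_PART[part]
        if required <= available:
            return part
    missing = sorted(required - available)
    raise ValueError(f"not available in any part: {missing}")


if __name__ == "__main__":
    print(minimal_part({"transparency"}))
```

A document that needs none of the listed enhancements yields part 1, which matches the statement that there is no compelling reason to switch in that case.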

Verification

PDF/A files can be validated with appropriate checking tools (see web links). However, these tools often disagree about whether a given file is valid PDF/A, frequently because they interpret the underlying standards differently.
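
To give an impression of what such tools look for, the following sketch scans the raw bytes of a file for a few name tokens that correspond to features prohibited in PDF/A-1b. It is a crude heuristic and not a validator: it misses anything stored in compressed streams, can report false positives, and covers only a small part of the requirements. The function name and marker table are hypothetical.

```python
# Name tokens for some features that PDF/A-1b prohibits (heuristic only).
PROHIBITED_MARKERS = {
    b"/JavaScript": "JavaScript",
    b"/LZWDecode": "LZW compression",
    b"/Encrypt": "encryption",
}


def quick_precheck(pdf_bytes: bytes) -> list[str]:
    """Return sorted labels of prohibited features whose token occurs."""
    return sorted(
        label
        for marker, label in PROHIBITED_MARKERS.items()
        if marker in pdf_bytes
    )


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj << /Filter /LZWDecode >> endobj\n"
    print(quick_precheck(sample))
```

An empty result says nothing about conformance; a non-empty result is merely a hint that a full validation is likely to fail.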

Criticism

Visual inspection is required when converting documents to PDF/A, because conversion often introduces errors in the visual representation. In one sample, 11 percent of the generated PDF/A-1b documents contained visual artifacts. These reproducibility errors included problems with vector graphics (transparent objects), loss of links, loss of other document content (illegible characters, missing text, missing parts of the document), updated fields (reflecting the time or folder of the conversion), and misspellings. Archives therefore usually do not convert to PDF/A themselves; instead, some ask their users to supply PDF/A documents. Typical computer setups offer several methods of converting documents to PDF/A, each with its own advantages and disadvantages.

Literature

  • ISO 19005-1:2005 - Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1), standard in the ISO Store
  • ISO 19005-2:2011 - Document management - Electronic document file format for long-term preservation - Part 2: Use of ISO 32000-1 (PDF/A-2), standard in the ISO Store
  • PDF/A compact - digital long-term archiving with PDF (ISBN 978-3-9811648-0-0)
