BagIt

from Wikipedia, the free encyclopedia

The BagIt File Packaging Format defines a platform-independent, hierarchical directory structure for storing and transferring digital content. A directory structured according to this format is called a bag. The format was developed at the California Digital Library and the Library of Congress and is currently available in version 1.0, published by the IETF as RFC 8493. The format is also increasingly being used in Germany, for example in the digital archive of the state of North Rhine-Westphalia, in a project at the German Literature Archive in Marbach, and at the Saxon State and University Library Dresden (SLUB).

Specification

A bag must consist of the payload directory "data" and the metadata files "bagit.txt" and "manifest-<alg>.txt". The payload, that is, the content to be stored or transferred, must be placed in the "data" directory. BagIt refers to metadata files as "tag files". The tag file "bagit.txt" always contains two lines: the first names the BagIt version, and the second names the character encoding of the tag files, which must always be UTF-8. The file "manifest-<alg>.txt" lists every file in the payload directory together with its checksum. The name of the manifest file must contain the algorithm used to compute the checksums, for example "manifest-md5.txt" for MD5.

The following example shows a bag whose payload directory contains a JPG image file. The checksum was created using the MD5 algorithm.

bag/
|
|-- data
|   \-- nyancat.jpg
|
|-- manifest-md5.txt
|    +-------------------------------------------------+
|    |51afb385ha019f34b671a3f0a615fae1 data/nyancat.jpg|
|    +-------------------------------------------------+
\-- bagit.txt
     +-------------------------------------------------+
     |BagIt-Version: 0.97                              |
     |Tag-File-Character-Encoding: UTF-8               |
     +-------------------------------------------------+
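
The layout above can be produced with a few lines of code. The following is a minimal sketch, not a reference implementation: the function name `create_bag`, its parameters, and the default version string are assumptions made for illustration, and MD5 is used only to match the example above.

```python
import hashlib
from pathlib import Path


def create_bag(bag_dir: Path, payload: dict[str, bytes], version: str = "1.0") -> None:
    """Write a minimal bag: data/, bagit.txt and manifest-md5.txt.

    `payload` maps file names (relative to data/) to their content.
    This helper and its signature are hypothetical, chosen for illustration.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.md5(content).hexdigest()
        # Manifest entries pair a checksum with a path relative to the bag root.
        manifest_lines.append(f"{digest} data/{name}")

    # The manifest file name carries the checksum algorithm.
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    # bagit.txt has exactly two lines: version and tag file encoding.
    (bag_dir / "bagit.txt").write_text(
        f"BagIt-Version: {version}\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
```

Calling `create_bag(Path("bag"), {"nyancat.jpg": image_bytes})`, where `image_bytes` holds the file content, would yield the same three-part layout as in the diagram.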

In addition to the two mandatory metadata files, the specification names further optional tag files and defines their content. It is also possible to define custom tag files. Up to and including draft version 14, the specification also described the serialization of a bag, which allows a bag to be packed into a single archive file with tar or zip. In newer versions of the draft, serialization is no longer part of the specification, but it remains technically possible.
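
Serialization as described here amounts to packing the bag directory into one archive file. A minimal sketch using Python's standard library follows; the helper name `serialize_bag` is an assumption, and keeping the bag directory as the single top-level entry of the archive is a design choice of this sketch rather than something stated above.

```python
import shutil
from pathlib import Path


def serialize_bag(bag_dir: Path, fmt: str = "zip") -> Path:
    """Pack a bag directory into a single archive placed next to it.

    Hypothetical helper for illustration; `fmt` may be "zip" or "tar".
    """
    bag_dir = bag_dir.resolve()
    archive = shutil.make_archive(
        base_name=str(bag_dir),        # archive is named after the bag directory
        format=fmt,
        root_dir=str(bag_dir.parent),  # paths in the archive are relative to this
        base_dir=bag_dir.name,         # the bag directory is the only top-level entry
    )
    return Path(archive)
```

For a directory named "bag", `serialize_bag(Path("bag"))` creates "bag.zip" beside it; unpacking the archive restores the directory with its payload and tag files.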

Implementations

A bag can be created with tools that almost every operating system provides, and its data integrity can be checked by comparing checksums. In addition to this manual procedure, there are software implementations that automate the process.
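
The checksum comparison can be sketched as follows. This is an illustrative check under simplifying assumptions, not one of the existing implementations: the function name `verify_bag` and its messages are hypothetical, only a single manifest is read, and special cases such as encoded characters in path names are ignored.

```python
import hashlib
from pathlib import Path


def verify_bag(bag_dir: Path, algorithm: str = "md5") -> list[str]:
    """Return a list of problems; an empty list means payload and manifest agree."""
    problems = []
    listed = set()
    manifest = bag_dir / f"manifest-{algorithm}.txt"

    for line in manifest.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        # Each entry is "<checksum> <path relative to the bag root>".
        expected, rel_path = line.split(maxsplit=1)
        listed.add(rel_path)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
            continue
        actual = hashlib.new(algorithm, target.read_bytes()).hexdigest()
        if actual != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    # Every file in the payload directory must appear in the manifest.
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            rel = path.relative_to(bag_dir).as_posix()
            if rel not in listed:
                problems.append(f"not in manifest: {rel}")
    return problems
```

An empty result from `verify_bag(Path("bag"))` indicates that every listed file exists with the recorded checksum and that the payload directory contains no unlisted files.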
