Long-term archiving

from Wikipedia, the free encyclopedia

Long-term archiving (LTA) is defined as the acquisition, long-term storage and preservation of the permanent availability of information. The long-term archiving of digitally available information (digital preservation) in particular raises new problems. For the preservation of digital resources, “long-term” does not mean issuing a guarantee for five or fifty years, but rather the responsible development of strategies that can cope with the constant change caused by the information market.

Definition

A generally applicable definition of the term does not yet exist. Since archives in principle keep their holdings “for eternity”, the term long-term archive is also a pleonasm and, according to Reinhard Altenhöner and Sabine Schrimpf, it suggests a static state. Both therefore advocate the term “long-term availability” (LZV, from the German Langzeitverfügbarkeit).

Many of the problems of digital long-term archiving, such as large version jumps in the software used, only occur after about ten years, so this value is used as the threshold above which one speaks of long-term archiving. Long-term archiving is also to be distinguished from data backup.

Problems

While physical objects have long been kept and preserved in archives, museums and libraries, electronic publications raise completely new problems. If data is stored in analog form, its quality deteriorates as the medium degrades, which is why the focus is on maintaining the medium. Digitally stored data, on the other hand, can be reconstructed in the event of small errors in the medium if it has been encoded suitably (with redundancy), so that constant data quality can be guaranteed despite the deterioration of the medium. If the errors in the medium become too large, the data can no longer be completely reconstructed and is irretrievably lost ("digital forgetting"). In the long-term archiving of digital data, the focus is therefore no longer on preserving the medium but on copying the data in good time, before it is lost. Since the media (e.g. magnetic tape and DVD), formats and read/write devices for digital storage change rapidly, regular testing and continuity across these changes require constant attention and long-term planning. Proprietary formats and copyright restrictions, among other things, cause problems when transferring data to new systems.
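
The idea that redundancy lets digital data survive small media errors, but not large ones, can be illustrated with a minimal sketch. It assumes a simple threefold repetition code with a majority vote per byte; real storage media use considerably stronger error-correcting codes, and the function names here are purely illustrative.

```python
from collections import Counter
from typing import List


def encode(data: bytes, copies: int = 3) -> List[bytes]:
    """Store the payload several times (a simple repetition code)."""
    return [bytes(data) for _ in range(copies)]


def decode(stored: List[bytes]) -> bytes:
    """Rebuild the payload by majority vote at each byte position."""
    result = bytearray()
    for column in zip(*stored):
        value, votes = Counter(column).most_common(1)[0]
        if votes <= len(stored) // 2:
            # No value has a majority: the errors are too large to repair.
            raise ValueError("too many errors: data cannot be reconstructed")
        result.append(value)
    return bytes(result)


def flip_byte(block: bytes, position: int, mask: int) -> bytes:
    """Simulate a media error by altering one byte of a stored copy."""
    damaged = bytearray(block)
    damaged[position] ^= mask
    return bytes(damaged)
```

A single damaged copy is repaired silently, whereas conflicting damage at the same position in two of the three copies raises an error, which corresponds to the irretrievable loss described above.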

Shelf life of the carrier media

While old parchment and paper can be kept for hundreds of years if stored properly, this does not apply to newer storage media. Most publications from the first half of the 20th century are printed on paper that is degraded by acid corrosion. Older printed works and manuscripts have other problems: if iron gall ink was used, an unbalanced mixture of the ink's ingredients can cause ink corrosion. This occurs when the ink contains an excess of gallic acid or of vitriol. The cellulose is attacked in a similar way to acid corrosion, and under varying and changing moisture levels the paper can break along the lines of writing.

Analog films, photos and magnetic tapes also have a limited shelf life. The service life of digital storage media such as floppy disks, hard drives and burned CDs/DVDs is even shorter. Digital data carriers lose their data either through environmental influences (for example, sufficiently strong magnetic fields near floppy disks and magnetic tapes), or because chemical or physical influences alter the structure of the medium so much that no more data can be stored on it or data already written can no longer be read at all (for example, if CD-ROMs have been exposed to UV radiation for long enough). Often, data becomes unreadable only because the necessary reading devices and programs are no longer available later on, because older data format standards can no longer be interpreted, or because the technical interfaces of very old reading devices are no longer supported. To avoid these problems, it can make sense to convert selected electronically stored data (back) into a non-electronic form and, as a modern equivalent of our ancestors' cultural habit of chiseling important data in stone, to engrave it on an almost indestructible nickel plate with an ion beam.

Another method of permanently storing images and texts in analog, legible form is to fire them onto stoneware slabs using ceramic pigments. The Memory of Mankind (MOM) project records images of museum cultural assets as well as everyday cultural products on stoneware slabs and stores them in chambers in the Hallstatt salt mine. The theoretical durability is given as hundreds of thousands of years. The durability of ceramic data carriers has been proven for at least 5,000 years (cuneiform tablets).

Service life of some data carriers at 20 °C and 50% relative humidity

Medium | Expected life | Recording density (kbit/kg)
Ceramic panels | 5,000 years (secured), probably several 10,000 years |
Stoneware panels with fired ceramic color printing | several 100,000 years if protected against erosion (assumed) |
Stone tablets and stone paintings | several 1,000 years (secured) | 1 × 10^-3 to 1
Nickel plate | several 1,000 years (presumed) |
Books and manuscripts made from acid-free paper with acid-free and non-ferrous ink | several 100 years (secured) | 3 × 10^3 to 3 × 10^4
Books and manuscripts made from acidic paper (especially printed works from the 19th and early 20th centuries) | 70 to 100 years |
Newsprint | as for book paper of the same type (acidic or acid-free) |
Films on celluloid (cellulose nitrate) | more than 100 years (secured), probably up to 400 years |
Films on cellulose triacetate | 44 years (secured) |
Films on polyethylene terephthalate (PET) | color film up to 150 years (presumed); black-and-white film up to 700 years (presumed) |
Hard disk drives in operation | 2 to 10 years, depending on the daily operating time, an average of 5 years |
Hard disk drives as archive media (stored) | 10 to 30 years |
USB stick | 30 years |
SD card | 30 years |
Magnetic tapes | more than 30 years (secured) |
Magneto-optical disk (MO disk) | 30 to 50 years |
Iomega REV removable drive | up to 30 years (presumed) |

Optical storage media (burned) (b)
Expected life:
  • CD-R: 5 to 10 years
  • CD-RW: unclear, presumably less than DVD-RAM
  • DVD-ROM: unclear, presumably less than DVD-RAM
  • DVD±R: unclear, presumably less than DVD-RAM
  • DVD±RW: unclear, presumably less than DVD-RAM
  • DVD-RAM: 30 years (presumed)
  • DVD-R with 24k gold reflective layer: advertised for up to 100 years
  • M-Disc (modified DVD, BD, or BDXL): according to the manufacturer, up to 1,000 years
  • BD-R: up to 50 years (according to laboratory tests)
Recording density (kbit/kg):
  • CD: 4 × 10^8
  • DVD: 2 to 4 × 10^9
  • BD: 2 to 4 × 10^10

Optical storage media (pressed)
Expected life:
  • CD: under ideal conditions estimated at 50 to 80 years (a)
  • DVD: at least 100 years (presumed)
  • BD: 82 to 85 years (presumed)
  • GlassMasterDisc (data engraved in glass): more than 1,000 years (manufacturer information)

Diskettes as archive media (stored)
Expected life: 10 to 30 years (depending on data density?)
Recording density (kbit/kg):
  • 5.25" HD disk: 4.80 × 10^5
  • 3.5" DD disk: 2.80 × 10^5
  • 3.5" HD disk: 5.76 × 10^5

(a) In the late 1980s, fungus-prone or oxygen-permeable plastic was sometimes used, or aggressive ink was used for printing, which reduced data stability.
(b) Because double-layer DVD±Rs can cause system-related reading problems, single-layer (4.7 GB) DVD±Rs are recommended.

Rapid media and system change

In the case of digitally stored information in particular, there is the additional problem that the data can become inaccessible even though the medium itself has been preserved.

Readability of the storage medium

To access stored information, the respective carrier medium must be readable. With some media, such as stone tablets or books, a person can do this without aids. Digital storage media usually require an appropriate reading device, often a drive. If reading devices are no longer available, for example due to technological change, the data can no longer be read out, or only with difficulty. One example is outdated tape formats.

Outdated data formats

Even if the storage medium has been preserved and is still readable, it may be impossible to access the stored data. Since digitally stored data is not directly accessible, but is digitally coded and structured in a media-specific manner, it can only be read if a program and an operating system are available that “understand” the content of a file. Since many operating systems and programs use their own (proprietary) methods to encode the data, readability can no longer be guaranteed once an operating system or program is no longer continuously maintained. This problem is exacerbated by the policy of many software manufacturers of publishing new program versions with changed storage formats, versions that can no longer fully use the older storage formats of the same program.

Other restrictions

Proprietary systems and copyright restrictions make it difficult to copy and migrate data, as long-term archiving requires, because the necessary steps are either not known or not permitted. The introduction of digital rights management (DRM) in particular will exacerbate the problem in the future. Such a set of rules for digital data and documents is necessary because, just as with conventional material, copyright issues must be clarified before anything can be archived. The difference between conventional and electronic documents is that, for the latter, the copy and the original are practically indistinguishable. Migrating documents in particular requires making copies and, where necessary, changing the original documents, so the author's consent to such measures must be obtained in advance. Further copies handed out to readers must be appropriately remunerated and, where free forwarding is not permitted, marked with blocking notices.

Finding information

It is not enough simply to copy the original data: it must also be possible to find it again on the new medium. Additional data describing the structure and content of the original data, so-called metadata, must therefore be entered in catalogs, databases or other finding aids so that it is available for later retrieval or search.
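
As a rough illustration of such a finding aid, the following sketch keeps one metadata record per archived object and searches the records by title or keyword. The field names and the example entries are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """Descriptive metadata for one archived object (hypothetical fields)."""
    identifier: str
    title: str
    file_format: str
    location: str  # where the copy is stored on the new medium
    keywords: List[str] = field(default_factory=list)


def find(catalog: List[CatalogEntry], term: str) -> List[CatalogEntry]:
    """Return all entries whose title contains the term or whose keywords match it."""
    term = term.lower()
    return [
        entry for entry in catalog
        if term in entry.title.lower()
        or any(term == keyword.lower() for keyword in entry.keywords)
    ]


# Example catalog with invented entries.
catalog = [
    CatalogEntry("obj-001", "Town council minutes 1998", "PDF/A",
                 "tape-07/minutes.pdf", ["minutes", "council"]),
    CatalogEntry("obj-002", "Harbor photographs", "TIFF",
                 "tape-07/harbor.tif", ["photographs"]),
]
```

Without records of this kind, the copied files would still exist on the new medium but could only be located by inspecting every file.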

Data consistency

An often overlooked problem with long-term archiving as well as with short-term archiving is checking that the data is free of errors. Data can be modified on purpose, but it can also be changed unnoticed by system errors.

A way out could be distributed storage at different locations in different organizations, protected by cryptographic checksums that are themselves stored in a distributed manner. This is practiced, among others, by the open-source solution LOCKSS. In Germany, the LuKII project also meets this requirement.
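
The general idea of checking distributed copies against each other with cryptographic checksums can be sketched as follows. This is a simplified illustration under the assumption that each site can hand over its copy for hashing; it is not the protocol used by LOCKSS or LuKII, and the site names are invented.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def checksum(data: bytes) -> str:
    """Cryptographic checksum (SHA-256) of one stored copy."""
    return hashlib.sha256(data).hexdigest()


def audit(replicas: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Compare the copies held at different sites.

    Returns the checksum shared by the majority of sites and the names
    of the sites whose copy deviates from it.
    """
    digests = {site: checksum(blob) for site, blob in replicas.items()}
    agreed, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority among the copies")
    deviating = sorted(site for site, digest in digests.items() if digest != agreed)
    return agreed, deviating
```

A site reported as deviating would then replace its copy with one from a site in the majority; whether the change was deliberate or caused by a system error makes no difference to the check.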

Methods

In electronic archiving, a basic distinction can be drawn between migration (conversion) and emulation.

The use of open standards, such as graphics formats (TIFF, PNG, JFIF) or free document formats (XML, PDF/A, OpenDocument), which are considered relatively long-lived and whose structure is publicly known, lengthens the cycles after which stored data needs to be reformatted. The probability that systems and programs able to read such data will still exist in a few years' time is therefore significantly higher.
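
Because the structure of such formats is publicly known, an archive can check mechanically which format a file is in. The following minimal sketch recognizes a few of the formats named above by the signature bytes at the start of the file. It is only a first check under simplifying assumptions: it does not validate the file, and it cannot tell PDF/A from ordinary PDF.

```python
from typing import Optional

# Well-known signature bytes at the beginning of a file.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"II*\x00": "TIFF",       # little-endian TIFF
    b"MM\x00*": "TIFF",       # big-endian TIFF
    b"\xff\xd8\xff": "JPEG/JFIF",
    b"%PDF-": "PDF",
}


def identify(header: bytes) -> Optional[str]:
    """Return the format name for the first bytes of a file, or None if unknown."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first bytes of the file at `path` and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Files for which `identify` returns `None` are candidates for closer inspection or for conversion into one of the open formats.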

To prevent data loss through the aging of data carriers, the data must be regularly copied to new carriers within the period for which a medium is guaranteed to retain data. This also makes it possible to switch to a new carrier format as soon as the one previously used has become obsolete due to technical developments.
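
How a copying deadline might be derived from the guaranteed retention period of a medium is sketched below. The safety factor of 0.5, meaning that a carrier is replaced after half of its guaranteed period, is an assumption made for illustration and not a recommendation taken from the literature.

```python
from datetime import date


def add_years(day: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        # 29 February does not exist in the target year.
        return day.replace(year=day.year + years, day=28)


def refresh_deadline(written: date, guaranteed_years: int,
                     safety_factor: float = 0.5) -> date:
    """Latest date by which the data should be copied to a fresh carrier.

    `safety_factor` (assumed value) is the share of the guaranteed period
    that is actually used; at least one full year is always allowed.
    """
    usable_years = max(1, int(guaranteed_years * safety_factor))
    return add_years(written, usable_years)
```

For a carrier written on 1 March 2020 with an assumed guaranteed period of 30 years, `refresh_deadline(date(2020, 3, 1), 30)` gives 1 March 2035.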

However, the high cost of maintaining data holdings in this way means that only the most important data can be preserved. Today's flood of data and metadata, driven not least by the steadily increasing use of digital data processing systems, makes it even harder to decide which data is worth storing. The proportion of data stored long-term will necessarily be relatively small, which places high technical and professional demands on the selection of the information to be preserved. An additional problem is the widening gap between data volume and data bandwidth: the volume grows significantly faster than the bandwidth available to transfer data from one medium to another.
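
The effect of this gap on migration can be made concrete with a short calculation. The volume, bandwidth and growth factors used here are invented for illustration; only the arithmetic is the point. Copying one petabyte (10^15 bytes) at a sustained 100 MB/s (10^8 bytes per second) takes 10^7 seconds, or roughly 116 days.

```python
SECONDS_PER_DAY = 86_400


def migration_days(volume_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Days needed to copy the whole holding once at a sustained transfer rate."""
    return volume_bytes / bandwidth_bytes_per_s / SECONDS_PER_DAY


def projected_migration_days(volume_bytes: float, bandwidth_bytes_per_s: float,
                             volume_growth: float, bandwidth_growth: float,
                             periods: int) -> float:
    """Migration time after `periods` growth periods.

    `volume_growth` and `bandwidth_growth` are growth factors per period
    (assumed values, e.g. 2.0 and 1.5).
    """
    future_volume = volume_bytes * volume_growth ** periods
    future_bandwidth = bandwidth_bytes_per_s * bandwidth_growth ** periods
    return migration_days(future_volume, future_bandwidth)


today = migration_days(1e15, 1e8)  # about 115.7 days
later = projected_migration_days(1e15, 1e8, 2.0, 1.5, periods=3)
```

If the volume doubles in each period while the bandwidth grows only by a factor of 1.5, the migration time is multiplied by 2/1.5 in each period, so each full copy takes longer than the one before it.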

This does not affect only government and commercial data. In the private sphere, too, conventional media, which can often be stored for a long time, are being replaced by more convenient digital media (for example, photographs and negatives by digital images on a CD-ROM).

In Germany, the legal deposit libraries and the archives are responsible for long-term archiving.

References

  1. Ute Schwens, Hans Liegmann: Long-term archiving of digital resources. In: Rainer Kuhlen, Thomas Seeger, Dietmar Strauch (eds.): Basics of practical information and documentation. 5th, completely revised edition. Saur, Munich 2004, p. 567.
  2. Reinhard Altenhöner, Sabine Schrimpf: Preservation and long-term availability of digital resources: strategy, organization and techniques. In: Rolf Griebel, Hildegard Schäffler, Konstanze Söllner (eds.): Praxishandbuch Bibliotheksmanagement. De Gruyter Saur, Berlin 2014, ISBN 978-3-11-030293-6, pp. 850-872.
  3. Lothar Schmitz, Uwe M. Borghoff, Peter Rödig, Jan Scheffczyk: Long-term archiving. In: Computer Science Spectrum. Vol. 28, no. 6, December 1, 2005, ISSN 1432-122X, p. 489, doi:10.1007/s00287-005-0039-7.
  4. Archive DVDs in the long-term test. c't-Archiv, 16/2008, page 116. In: heise.de. August 16, 2011, archived from the original on July 23, 2008; accessed on February 20, 2015.
  5. mp: A uniform standard for the flood of digital data. March 10, 2008, accessed on October 27, 2012.
  6. Michael W. Gilbert: Digital Media Life Expectancy and Care. University of Massachusetts Amherst, 1998, archived from the original on December 22, 2003; accessed on January 4, 2011.
  7. Bit Rot. Software Preservation Society, May 7, 2009, accessed on January 4, 2011.
  8. Google study on the cause of failure of hard drives. In: heise.de. February 16, 2007, accessed on February 20, 2015.
  9. Google study on the durability of hard drives in continuous operation (Memento from February 13, 2009 in the Internet Archive) (PDF; 247 kB): Section 3.1, Figure 2 (English).
  10. UNC: Hard disks & flash memory: fun with risk potential. (No longer available online.) In: speicherguide.de. June 29, 2006, archived from the original on September 24, 2015; accessed on September 17, 2015.
  11. Shelf life of storage media: where data is right. In: netzwelt.de. April 22, 2007, accessed on February 20, 2015.
  12. Andreas Hitzig: Hard disk, SSD, USB & Co. - What is the best memory? In: PC World Online. March 20, 2017, accessed on November 22, 2019.
  13. Henrik Stamm: MO technology. Institute for Computer Science at Humboldt University Berlin, May 26, 2001, accessed on September 17, 2015.
  14. Hartmut Gieselmann: DVDs in the long-term test - c't. In: heise.de. July 21, 2008, accessed on February 20, 2015.
  15. Uwe M. Borghoff et al.: Long-term archiving. Methods for preserving digital documents. dpunkt.-Verl., Heidelberg 2003, p. 21.
  16. Frank Dickmann: Long-term archiving of research data: How do you deal with peta and exabytes? In: Deutsches Ärzteblatt Supplement: Practice. Vol. 108, no. 41, 2011, pp. 6-8 (uni-goettingen.de [accessed on March 24, 2020]).
  17. Heike Neuroth, Stefan Strathmann, Achim Oßwald, Regine Scheffel, Jens Klump, Jens Ludwig (eds.): Long-term archiving of research data. An inventory. Verlag Werner Hülsbusch, Universitätsverlag Göttingen, Boizenburg 2012, ISBN 978-3-86488-008-7, p. 16, urn:nbn:de:hbz:79pbc-opus-4204 (th-koeln.de [PDF]).
  18. Asko Lehmuskallio, Edgar Gómez Cruz: Why material visual practices? In: Digital Photography and Everyday Life: Empirical Studies on Material Visual Practices. Routledge, 2016, ISBN 978-1-317-44778-8, p. 1.
  19. Natascha Schumann: Introduction to digital long-term archiving. Scivero Verl., 2012, ISBN 978-3-944417-00-4, p. 46 (ssoar.info [accessed on March 24, 2020]).