Electronic archiving

from Wikipedia, the free encyclopedia

Electronic archiving generally refers to the storage of information in electronic form. A special case is the audit-proof archiving of documents relevant under commercial and tax law, for which special requirements apply, in particular immutability and long-term availability in accordance with the applicable retention periods.

Electronic archiving is usually carried out with dedicated archiving systems. The term encompasses several components of an enterprise content management system that in Anglo-American usage are referred to separately as "records management", "storage" and "preservation". The scholarly notion of an archive in the sense of long-term archiving does not coincide in content with the term as used by the document management industry.

The term electronic archiving is used in very different senses. While companies already find retention periods of ten years for data and documents relevant under commercial and tax law very difficult to implement, historical archives speak of the secure, orderly and accessible storage of information over periods of several centuries. Given constantly changing technologies and ever new software, formats and standards, this is a major challenge for the information society.

The storage, indexing and provision of information are prerequisites for the functioning of modern companies and administrations. With the exponential growth of electronic information, the problems of long-term retention grow as well, although modern software technologies are much better suited to managing information than paper, files and shelves traditionally were. More and more information is created digitally, and a paper printout is only one possible representation of the original electronic document. Electronic signatures give electronic documents the same legal standing as documents signed by hand; legally, such documents exist only in electronic form. These developments now force every company to engage more intensively with electronic archiving.

Definitions

In Germany, three definitions have been used for electronic archiving:

Electronic archiving

Electronic archiving is the database-supported, long-term, secure and unchangeable storage of electronic information objects that can be reproduced at any time.
Tape library (interior view)

Electronic long-term archiving

Electronic long-term archiving is the storage of electronic information for more than ten years.
The term long-term archiving is strictly speaking a pleonasm, since archiving already implies permanence, but it helps to distinguish it from short-term archiving or backup.

Audit-proof electronic archiving

Audit-proof archiving is the storage of business-relevant electronic information objects in a way that meets the requirements of §§ 239 and 257 of the Commercial Code (HGB), §§ 146, 147 and 200 of the Tax Code (AO) and the GoBD for the secure, proper storage of commercial documents, including retention periods of six to ten years.

The Commercial Code (HGB) and the Tax Code (AO) form the legal basis for retention, regardless of whether documents are kept in conventional paper archives or in electronic systems (see audit security). The requirements were summarized in 1996 in the code of practice "Principles of Electronic Archiving" published by the Association of Organizational and Information Systems (VOI). The definition of audit-proof archiving was formulated by Ulrich Kampffmeyer as early as 1992. Internationally, the functionality and scope of electronic archives are defined in the standard ISO 14721 OAIS (Open Archival Information System), and those of records management systems in ISO 15489-1/-2 (Information and documentation: Records management). In Germany, the IT-Grundschutz Catalogs of the BSI can be consulted regarding the security and testing of archive systems. If e-mails are not archived in an audit-proof manner, managing directors and IT managers may be held personally liable, with penalties of up to two years' imprisonment or fines.

Electronic archiving standards

The most important standard for electronic archiving is OAIS, the "Reference Model for an Open Archival Information System". The reference model describes the functions and components required for long-term electronic archiving. OAIS was developed by the space agencies and adopted as ISO standard 14721 in 2003. Version 2 of OAIS, the so-called "Magenta Book", was adopted as ISO 14721:2012 in August 2012.

Key principles of audit-proof archiving

The following ten principles of audit-proof electronic archiving were formulated by the Association of Organizational and Information Systems (VOI):

  1. Each document must be archived in accordance with legal and organizational requirements
  2. Archiving must be complete: no document may be lost on the way to the archive or within the archive itself
  3. Every document is to be archived at the earliest organizationally feasible point in time
  4. Each document must match its original and be archived in unalterable form
  5. Each document may only be viewed by appropriately authorized users
  6. It must be possible to find and reproduce every document within a reasonable time
  7. No document may be destroyed, i.e. deleted from the archive, before its retention period has expired
  8. Every action that makes changes in the electronic archive system must be logged in a way that authorized persons can follow
  9. The entire organizational and technical archiving process must be verifiable by an expert third party at any time
  10. For all migrations and changes to the archive system, compliance with all of the principles listed above must be ensured
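Several of these principles (4, 7 and 8) can be illustrated in code. The following Python sketch is not taken from the VOI or from any archive product; the names `ArchiveStore` and `AuditLog` are hypothetical. It assumes an in-memory store that refuses to overwrite an archived document, refuses deletion before the retention date, and records every changing action in a log in which each entry carries the hash of its predecessor, so that editing an entry becomes detectable as long as the most recent hash is kept in a trusted place.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AuditLog:
    """Append-only log; each entry stores the hash of the previous entry (hypothetical design)."""

    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, action: str, doc_id: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else ""
        body = {"action": action, "doc_id": doc_id, "prev": prev}
        self.entries.append({**body, "hash": self._digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; an edited entry no longer matches its stored hash."""
        prev = ""
        for entry in self.entries:
            body = {"action": entry["action"], "doc_id": entry["doc_id"], "prev": entry["prev"]}
            if entry["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


class ArchiveStore:
    """In-memory stand-in for an archive (hypothetical); enforces principles 4, 7 and 8."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[bytes, date]] = {}
        self.log = AuditLog()

    def archive(self, doc_id: str, content: bytes, retain_until: date) -> None:
        # Principle 4: an archived document must not be changed, so overwriting is refused.
        if doc_id in self._docs:
            raise PermissionError(f"{doc_id} is already archived and cannot be overwritten")
        self._docs[doc_id] = (content, retain_until)
        self.log.append("archive", doc_id)  # Principle 8: log every change

    def read(self, doc_id: str) -> bytes:
        return self._docs[doc_id][0]

    def delete(self, doc_id: str, today: date) -> None:
        # Principle 7: deletion only after the retention period has expired.
        _, retain_until = self._docs[doc_id]
        if today < retain_until:
            raise PermissionError(f"{doc_id} must be retained until {retain_until}")
        del self._docs[doc_id]
        self.log.append("delete", doc_id)
```

Chaining log entries with SHA-256 is only one possible way to make the audit trail checkable by a third party (principle 9); real systems may rely on WORM media, signed logs or other mechanisms instead.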

Implementation of the requirements in electronic archive systems

To meet these requirements, archive systems consisting of databases, archive software and storage systems were developed; they are offered by numerous manufacturers and system integrators in Germany. Most of these systems use a reference database that holds administrative data and index criteria and points to external storage in which the information objects themselves are kept. This reference database architecture was necessary in order to move large amounts of information from fast but expensive online storage to separate archive storage. Via the index, the database allows a document to be found again at any time and made available to the user in a suitable display program. In the early days of this technology, these were mostly closed, stand-alone systems that effectively formed "islands" in the IT landscape. Today, archive systems are integrated into the IT infrastructure as subordinate services (→ service-oriented architecture): office and line-of-business applications feed them directly, and they in turn supply these applications with the information needed for processing and display. For the user, it is irrelevant where the required information is stored. The question of the "right" storage medium for electronic archiving is usually discussed only by IT specialists, project staff and legal departments when an electronic archiving system is selected and introduced.
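The reference database architecture can be sketched with Python's built-in `sqlite3` module. In this illustrative example, the index database holds only metadata and a reference to the storage location, while a plain dictionary stands in for the external archive storage. The table layout, the column names, the functions `store`, `find` and `retrieve` and the `medium-01/...` addressing scheme are assumptions, not part of any particular product.

```python
import sqlite3

# Hypothetical stand-in for the external archive storage (e.g. optical media).
archive_storage: dict[str, bytes] = {}

# The reference database holds only index criteria and a pointer to the storage location.
db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE doc_index (
           doc_id   TEXT PRIMARY KEY,
           doc_type TEXT,
           customer TEXT,
           doc_date TEXT,
           location TEXT NOT NULL
       )"""
)

INDEX_FIELDS = ("doc_type", "customer", "doc_date")


def store(doc_id: str, doc_type: str, customer: str, doc_date: str, content: bytes) -> None:
    location = f"medium-01/{doc_id}"  # hypothetical addressing scheme on the archive medium
    archive_storage[location] = content
    with db:  # commits the index entry
        db.execute(
            "INSERT INTO doc_index VALUES (?, ?, ?, ?, ?)",
            (doc_id, doc_type, customer, doc_date, location),
        )


def find(**criteria: str) -> list[tuple[str, str]]:
    """Search the index only; the archive storage itself is not touched."""
    unknown = set(criteria) - set(INDEX_FIELDS)
    if unknown:
        raise ValueError(f"not an index field: {sorted(unknown)}")
    # Column names are checked against INDEX_FIELDS; values are bound as parameters.
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1"
    sql = f"SELECT doc_id, location FROM doc_index WHERE {where} ORDER BY doc_id"
    return db.execute(sql, tuple(criteria.values())).fetchall()


def retrieve(doc_id: str) -> bytes:
    """Resolve the reference in the index, then read from the archive storage."""
    row = db.execute("SELECT location FROM doc_index WHERE doc_id = ?", (doc_id,)).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return archive_storage[row[0]]
```

A search such as `find(customer="ACME")` reads only the index; the (possibly slow) archive storage is accessed only when `retrieve` is called for a specific document.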

Functional requirements for an electronic archive system

Electronic archive systems are characterized by the following distinctive features:

  • Program-supported, direct access to individual information objects (commonly called documents) or to information collections, e.g. lists or containers holding several objects
  • Database-supported management of the information objects on the basis of metadata and, if necessary, full-text indexing of the content of the archived information objects
  • Support for various indexing and search strategies in order to access the information sought directly
  • Uniform, shared storage of any kind of information object, from scanned facsimiles, document files and e-mails to complex XML structures, lists, COLD documents or entire database contents
  • Management of storage systems with media that can only be written once, including access to media that are no longer directly in the storage system
  • Ensuring the availability of the stored information over long periods, which can amount to decades
  • Provision of information objects on different clients, independently of the application that originally created them, and transfer to other programs
  • Support for “class concepts” that simplify capture through inheritance of attributes and help structure the information base
  • Converters to generate long-term stable archive formats, and viewers for displaying information objects whose original application is no longer available
  • Protection of the stored information objects against unauthorized access and against alteration
  • Comprehensive management of different storage systems, e.g. to ensure fast access and rapid provision of information through intermediate storage (caches)
  • Standardized interfaces in order to be able to integrate electronic archives as services in any application
  • Built-in recovery functionality that allows inconsistent or malfunctioning systems to be rebuilt without loss
  • Secure logging of all changes to structures and information objects that could endanger the consistency and retrievability and document how the information was processed in the archive system
  • Support of standards for the special recording of information on storage media with WORM processes, for stored documents and for metadata describing information objects in order to guarantee long-term availability and migration security
  • Support of automated, traceable and lossless migration processes
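The intermediate storage (caches) mentioned in this list can be illustrated with a small read-through cache. The following Python sketch assumes a hypothetical function `fetch_from_archive` that reads a document from slow archive media; recently requested documents are served from memory, and the least recently used entry is evicted when the cache is full. LRU eviction is an assumption chosen for simplicity, not a requirement stated in the article.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """LRU cache in front of slow archive storage (illustrative sketch)."""

    def __init__(self, fetch_from_archive: Callable[[str], bytes], capacity: int = 128) -> None:
        self._fetch = fetch_from_archive  # hypothetical slow read from archive media
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, doc_id: str) -> bytes:
        if doc_id in self._entries:
            self._entries.move_to_end(doc_id)  # mark as most recently used
            self.hits += 1
            return self._entries[doc_id]
        self.misses += 1
        content = self._fetch(doc_id)  # slow path: read from the archive medium
        self._entries[doc_id] = content
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return content
```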

These properties make clear that electronic archiving is neither hierarchical storage management nor conventional data backup. Electronic archive systems form a class of their own and belong in every IT infrastructure as subordinate services.

Storage technologies for electronic archiving

Electronic storage technologies require a distinction between the administration and control software on the one hand and the actual storage media on the other. In the past, conventional magnetic storage media were not considered suitable for audit-proof electronic archiving, because the stored information can be changed and overwritten at any time. This applies particularly to hard disks that are dynamically managed by operating systems. Magnetic interference, head crashes and other risks confined hard disks to the role of pure online storage. Magnetic tapes are not only erasable but also subject to heavy loads and wear, as well as to magnetic print-through when stored for too long.

Conventional WORM media

In the 1980s, special digital-optical storage media were developed that are written only once, without contact, by a laser in their drive. This storage technology is called WORM ("Write Once, Read Many"). The physical properties of the media themselves protect them against changes, and they offer a significantly longer service life than the magnetic media known until then.

The following types fall into this category of storage media:

CD-WORM

CD-R media

On the write-once compact disc, which has a storage capacity of around 650 megabytes, the recording surface of the medium is irreversibly changed during writing. The CD file system is standardized in ISO 9660, and the media are inexpensive. However, the quality of some cheap media is not sufficient for long-term archiving. There are numerous suppliers of drives and media, and the drives are supported directly by the operating systems.

DVD-WORM

As with the CD, the recording surface of a write-once DVD is irreversibly changed during writing. DVDs offer storage capacities between 4 and 24 gigabytes. When DVDs are used for archiving, it must be ensured that drive and media meet the requirements of long-term availability. Here, too, there are numerous suppliers, and most drives are supported directly by the common operating systems.

5¼″ WORM

These media and drives are a traditional technology designed specifically for electronic archiving. The media are enclosed in a protective cartridge and are therefore better protected against environmental influences than CDs and DVDs, which were developed for the consumer market. They are written with a laser and offer an extremely high level of protection against corruption. The current state of the art is UDO (Ultra Density Optical) media, which use a blue laser and offer a storage capacity of 60 gigabytes. Significantly higher capacities per medium can be expected in the future. A disadvantage is that media from earlier generations of 5¼″ technology cannot be used in new drives. Connecting 5¼″ drives requires special driver software.

Jukeboxes, i.e. automatic media changers, are used to manage and access the media. With the help of software, they deliver the required information from the media. As a rule, the software can also manage media that are no longer in the jukebox and must be inserted manually on request. The jukebox control software is either integrated directly into the archive software or offered as independent control software. Jukeboxes are usually connected via a dedicated server, which also handles administration and caching. Such systems can now also be used as Network Attached Storage (NAS) or integrated into Storage Area Networks (SAN).
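How archive software might handle media that are no longer in the jukebox can be sketched as a simple media catalog. In the following hypothetical Python example, each medium records either its jukebox slot or, if it has been removed, its shelf location; a read request then results either in an automatic load or in an instruction for an operator to insert the medium manually. The class names, identifiers and messages are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Medium:
    medium_id: str
    shelf_label: str            # where the medium is kept when outside the jukebox
    slot: Optional[int] = None  # jukebox slot, or None if the medium has been removed


class MediaCatalog:
    """Hypothetical catalog of all media known to the archive, inside or outside the jukebox."""

    def __init__(self) -> None:
        self._media: dict[str, Medium] = {}

    def register(self, medium: Medium) -> None:
        self._media[medium.medium_id] = medium

    def request(self, medium_id: str) -> str:
        """Return an instruction for serving a read request from this medium."""
        medium = self._media[medium_id]
        if medium.slot is not None:
            return f"load {medium_id} from slot {medium.slot}"
        # Known to the archive but not in the jukebox: an operator must insert it.
        return f"operator: insert {medium_id} from shelf {medium.shelf_label}"
```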

Newer technologies

In addition to traditional archive storage based on rotating, digital-optical removable media, two other technologies are now emerging:

Content Addressed Storage (CAS)

These are hard disk systems that use special software to achieve the same properties as a conventional WORM medium. The way information is encoded on storage, together with special addressing, prevents stored information from being overwritten or modified. CAS systems are self-contained subsystems that can nevertheless be integrated into the IT environment almost like conventional hard disk systems. They offer high performance and storage capacities in the terabyte range.
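The principle of content addressing can be sketched in a few lines of Python. The example assumes that an object's address is the SHA-256 hash of its content; commercial CAS systems use their own encoding and addressing schemes, so this illustrates the idea rather than any product. Because the address is derived from the content, storing different content can never overwrite an existing object, and a modified object would no longer match its address.

```python
import hashlib


class ContentAddressedStore:
    """Minimal content-addressed store (illustrative; SHA-256 addressing is an assumption)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        address = hashlib.sha256(content).hexdigest()
        # Identical content yields the same address, so storing it again is a no-op;
        # different content yields a different address, so nothing is overwritten.
        self._objects.setdefault(address, content)
        return address

    def get(self, address: str) -> bytes:
        content = self._objects[address]
        # Re-hash on read: a changed object no longer matches its address.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("stored object does not match its address")
        return content
```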

WORM tapes

WORM tapes are magnetic tapes that meet the requirements of a conventional WORM medium through a combination of properties: special tape media, protected cartridges and special drives that ensure the tape is written only once. Especially in data centers where tape robots and library systems are already in place, WORM tapes are an easily integrated component for long-term archiving. The existing control software can handle the media and also automate copying and backup. Hard disk or WORM tape archives are an option especially for larger companies and administrations with data centers, as they can easily be integrated into ongoing operations. Using WORM tapes for online access is questionable, however, because of the waiting times both for loading the tape by robot and for winding it to the required position. If the data is organized in containers, a single data object within a container may even require several winding operations (reading the table of contents, reading the data object, reading a checksum). This also puts corresponding strain on the hardware and on the tapes themselves.
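Why reading a single object from a container can require several winding operations becomes clear from a hypothetical container layout. The following Python sketch assumes a table of contents at the start, the data objects in the middle and a checksum trailer at the end; each `seek` to a distant position corresponds to winding the tape. The format and the functions `write_container` and `read_object` are invented for illustration and do not reproduce any real archive container.

```python
import hashlib
import io
import json
import struct
from typing import BinaryIO

# Hypothetical layout: [4-byte TOC length][TOC JSON][objects...][checksum JSON][4-byte checksum length]


def write_container(objects: dict[str, bytes]) -> bytes:
    toc, offset = {}, 0
    for name, data in objects.items():
        toc[name] = {"offset": offset, "length": len(data)}
        offset += len(data)
    checksums = {name: hashlib.sha256(data).hexdigest() for name, data in objects.items()}
    toc_bytes = json.dumps(toc).encode()
    sum_bytes = json.dumps(checksums).encode()
    return (
        struct.pack(">I", len(toc_bytes)) + toc_bytes
        + b"".join(objects.values())
        + sum_bytes + struct.pack(">I", len(sum_bytes))
    )


def read_object(container: BinaryIO, name: str) -> bytes:
    # Position 1 (start of container): table of contents.
    container.seek(0)
    (toc_len,) = struct.unpack(">I", container.read(4))
    entry = json.loads(container.read(toc_len))[name]
    # Position 2 (somewhere in the middle): the data object itself.
    container.seek(4 + toc_len + entry["offset"])
    data = container.read(entry["length"])
    # Position 3 (end of container): checksum trailer. On tape, each of these
    # jumps means winding the tape, which is what makes online access slow.
    container.seek(-4, io.SEEK_END)
    (sum_len,) = struct.unpack(">I", container.read(4))
    container.seek(-4 - sum_len, io.SEEK_END)
    expected = json.loads(container.read(sum_len))[name]
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError(f"checksum mismatch for {name}")
    return data
```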

Strategies for ensuring the availability of archived information

Standardization
Compliance with standards is an essential prerequisite for the long-term availability of electronic information. Recording formats, metadata, media and the file formats of the information objects themselves must all be taken into account. Long-term preservation should already be considered when data is created, and long-term stable formats should be preferred. Such formats should be widely used, have an open specification (standard), or have been developed specifically for long-term data storage. Examples of standardized formats are XML, TIFF and PDF/A. Various standardized formats exist for metadata. The architecture of archive systems and the structure of information objects have been standardized in ISO 14721 (Open Archival Information System). For connecting archive storage, the SNIA, the umbrella association of storage manufacturers, provides an XML-based interface (XAM).
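As an illustration of a standardized, XML-based metadata record, the following Python sketch uses the standard library's `xml.etree.ElementTree` to write descriptive metadata with Dublin Core element names. Dublin Core is chosen here only as one example of a standardized metadata vocabulary; the article does not prescribe it, and the enclosing `record` element and the function `metadata_record` are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor", "date",
    "type", "format", "identifier", "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def metadata_record(fields: dict[str, str]) -> str:
    """Serialize descriptive metadata as XML; <record> is a hypothetical wrapper element."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not a Dublin Core element: {sorted(unknown)}")
    root = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```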
Migration
One way to ensure availability is to migrate information to a new system environment. Migration carries a risk if the information is not transferred from one system to another in a demonstrably unchanged, complete and unrestricted manner, because originality and authenticity can then be called into question. On the other hand, technological change forces users to switch to new storage and management components in good time to keep the information available. Migration must therefore be planned when an archive and storage system is first set up, so that the changeover can later be carried out with minimal risk and effort. Controlled, lossless "continuous migration" is currently the most important approach for keeping information available for decades and centuries. Because the document management market has consolidated and numerous vendors have disappeared, migration has been discussed frequently. When individual products are discontinued, information must be migrated to other formats, sometimes with the help of a dedicated migration program. Anyone introducing an archive system must therefore address migration planning from the start.
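A migration that is "demonstrably unchanged and complete" can be supported by checksums. The following Python sketch assumes three hypothetical access functions for the old and the new system (`read_old`, `write_new`, `read_new`); it hashes each object before migration, re-reads it from the target after writing, and returns a manifest of document IDs and SHA-256 hashes that could serve as evidence of a lossless migration.

```python
import hashlib
from typing import Callable, Iterable


def migrate(
    doc_ids: Iterable[str],
    read_old: Callable[[str], bytes],
    write_new: Callable[[str, bytes], None],
    read_new: Callable[[str], bytes],
) -> dict[str, str]:
    """Copy documents to a new system and return a manifest of verified SHA-256 hashes."""
    manifest: dict[str, str] = {}
    for doc_id in doc_ids:
        content = read_old(doc_id)
        digest = hashlib.sha256(content).hexdigest()  # fingerprint before migration
        write_new(doc_id, content)
        # Re-read from the target system instead of trusting the write call.
        if hashlib.sha256(read_new(doc_id)).hexdigest() != digest:
            raise RuntimeError(f"{doc_id} was altered during migration")
        manifest[doc_id] = digest
    return manifest
```

Comparing the manifest's keys with the list of documents in the old system gives a simple completeness check in addition to the per-object integrity check.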
Emulation
In the scientific community, a second approach is discussed with similar intensity: emulation. Emulation means reproducing the behavior of an older system in such a way that data from that system can be used again on newer computers and operating systems. There are a few examples, for instance for computer games or older Apple computers. However, this strategy is not yet widely used for long-term data storage. Its disadvantages are that the effort required for future emulation steps cannot be planned, and that if the technological paradigm shift is too great, emulation may one day no longer be feasible. These disadvantages apply in similar form to migrations that were not carried out in time.
Encapsulation
The encapsulation process is particularly suitable as preparation for emulation. In addition to the file or information object to be preserved, the software with which it can be displayed and reproduced and the associated metadata are stored together in a "capsule". All information required for future use is thus kept together in one place. With this method, the objects to be preserved can become very large, and there is still no guarantee that the archived software will run in future operating system environments.
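A capsule in this sense might look like the following Python sketch, which bundles the object, the rendering software and the metadata into a single ZIP file. The use of ZIP, the function `build_capsule`, the directory names `object/` and `software/` and the file `metadata.json` are assumptions made for illustration; the article does not specify a capsule format.

```python
import io
import json
import zipfile


def build_capsule(
    object_name: str,
    object_bytes: bytes,
    metadata: dict,
    software_name: str,
    software_bytes: bytes,
) -> bytes:
    """Bundle object, rendering software and metadata into one ZIP 'capsule' (hypothetical layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as capsule:
        capsule.writestr(f"object/{object_name}", object_bytes)
        capsule.writestr(f"software/{software_name}", software_bytes)
        # Metadata records which file is the object and which software renders it.
        description = {
            **metadata,
            "object": f"object/{object_name}",
            "software": f"software/{software_name}",
        }
        capsule.writestr("metadata.json", json.dumps(description, indent=2))
    return buffer.getvalue()
```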
Conversion at runtime
If the formats of the information objects to be preserved cannot be controlled or restricted to a few long-term formats, the system must keep converters and viewers available that convert older formats into displayable formats when the objects are retrieved. In the medium term, this leads to a large number of converters and viewers, which themselves require separate administration so that the appropriate current converter can be selected for an older information object. Conversion at runtime differs from emulation in that no older environment is recreated; instead, the object is converted for the current environment. As with the other methods, special properties of formats, electronic signatures and digital rights management components can cause problems here.
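Conversion at runtime can be sketched as a registry of converters that is consulted when an object is retrieved. In the following hypothetical Python example, each registered converter turns one format into a newer one, and the registry follows this chain until it reaches a format the current environment can display. The class `ConverterRegistry`, the format names and the converter functions are placeholders, not an existing API.

```python
from typing import Callable

Converter = Callable[[bytes], bytes]


class ConverterRegistry:
    """Maps an older format to a (newer format, converter) pair (hypothetical sketch)."""

    def __init__(self) -> None:
        self._converters: dict[str, tuple[str, Converter]] = {}

    def register(self, source_format: str, target_format: str, converter: Converter) -> None:
        self._converters[source_format] = (target_format, converter)

    def render(self, data: bytes, fmt: str, displayable: set[str]) -> tuple[str, bytes]:
        """Convert data step by step until it is in a displayable format."""
        visited: set[str] = set()
        while fmt not in displayable:
            if fmt in visited or fmt not in self._converters:
                raise LookupError(f"no converter path from {fmt!r} to a displayable format")
            visited.add(fmt)
            fmt, converter = self._converters[fmt]
            data = converter(data)
        return fmt, data
```

Chaining converters keeps each one small, but it also illustrates the administrative burden described above: every format in the chain needs a maintained converter.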

Legal and regulatory requirements for electronic archiving

The topic of archiving and long-term storage has gained in importance in recent years, particularly due to legal and regulatory requirements. The legal equivalence of digital documents bearing an electronic signature with conventional paper documents, the Sarbanes-Oxley Act and other compliance requirements in the USA, and the discussion about archiving tax-relevant data in accordance with the GDPdU in Germany make audit-proof archiving and storage systems necessary. In discussions of the legal requirements, the question of the "right" storage medium often came up. Proponents of traditional WORM media, which can physically be written only once, claimed these were the only suitable storage media, and the manufacturers of hard disk systems and WORM tapes countered. Basically, however, laws and ordinances are (or should be) media-neutral, since technological change must also be taken into account in view of the long retention periods. There is therefore no such thing as the "right" storage medium. The entire archiving process must be self-contained and secure; this goes beyond the question of storage drives and media and also includes organizational processes.

Further development

Electronic archiving corresponds to the “Preserve” component in the Enterprise Content Management model.

Software has since become decisive for the use of archive storage technologies. It ensures that information cannot be changed regardless of the medium, enables fast access and manages huge volumes of storage. Until recently, electronic archives were the special domain of archiving system vendors. Now, however, storage technology itself is becoming increasingly intelligent, and system management and storage management software now also manage electronic archives. A conventional archive, records management or content management system can still be used in addition for structuring, organizing, indexing and providing the information. Storage system vendors, however, are expanding their offerings. Their goal is to provide archival storage as infrastructure, close to the operating system and uniformly for all applications. Since 2003 this trend has been called information lifecycle management (ILM), and it includes electronic archiving. In particular, ILM's promise that migrations become unnecessary or are automated attracts interest from many users. The demands placed on ILM go well beyond conventional hierarchical storage management (HSM): increasingly, the focus is on software for managing the entire life cycle of information rather than on storage hardware alone. Electronic archiving is used as a subordinate service that is integrated into enterprise content management (ECM) solutions but is available as an archiving component to all applications whose information must be stored securely over the long term.

Literature

  • Uwe M. Borghoff, Peter Rödig, Jan Scheffczyk, Lothar Schmitz: Long-term archiving. dPunkt-Verlag, 2003, ISBN 3-89864-245-3.
  • Ulrich Kampffmeyer, Jörg Rogalla: Principles of electronic archiving. VOI Compendium Volume 3. VOI Association of Organizational and Information Systems e. V., Darmstadt 1997, ISBN 3-932898-03-6.
  • Ulrich Kampffmeyer: Fundamentals of document management. Gabler Verlag 1997, ISBN 3-4098794-0-4.
  • Ulrich Kampffmeyer: Electronic archiving and storage technologies. Storage Guide, 2004.
  • Ulrich Kampffmeyer: Document Technologies: Where Are We Heading? Hamburg 2003, 411 pages, ISBN 3-9806756-4-5.

References

  1. Based on: Principles of Electronic Archiving. VOI Association of Organizational and Information Systems e. V., Darmstadt 1997, ISBN 3-932898-03-6.
  2. http://www.finance-magazin.de/lösungen-it/it/finance-ratgeber-revisionssichere-e-mail-archivierung-1352789/
  3. CCSDS, The Consultative Committee for Space Data Systems: Recommendation for Space Data System Practice. Reference Model for an Open Archival Information System (OAIS). Recommended Practice, CCSDS 650.0-M-2, Magenta Book, June 2012.
  4. ISO 14721:2012 Space data and information transfer systems - Open archival information system (OAIS) - Reference model
  5. Source: Verband Organization und Information e. V. (www.voi.de)
  6. Source: AIIM / PROJECT CONSULT 2003
This version was added to the list of articles worth reading on October 27, 2006.