Data backup

from Wikipedia, the free encyclopedia

Data backup, or simply backup [ˈbækʌp], is the copying of data with the intention of being able to copy it back in the event of data loss. Data backup is therefore an elementary measure for data security.

The data saved redundantly on a storage medium is called a backup copy, or backup for short. Restoring the original data from a backup copy is called data recovery, data restoration, or restore.

Implementation

Data backups should be stored away from the IT system and in a secure environment. The data backup can also be created on a different type of medium in order to reduce typical technical risks.

  • External hard drives with FireWire, eSATA or USB connections are well suited for private individuals. They can easily be connected to the system to be backed up and disconnected again, which at least allows the backup to be stored away from the system. Network-attached storage (NAS) and removable disks are likewise easy to connect and remove, which makes them sensible backup targets.
  • For smaller companies, bank safe deposit boxes, for example, are suitable for storing data carriers. As a rule, however, the data carriers cannot be accessed at all times, since the boxes are only open during the bank's opening hours. Online data backup is an alternative: the data is backed up off-site, usually in a data center, and can be accessed at any time. In this case, however, it must be ensured that the data transfer is secure; the external service provider should also not be able to read the content.
  • For larger companies, specially secured safes or rooms (so-called cells) for the fire-proof storage of the tape library can be worthwhile. The backed-up data can also be distributed across multiple locations or data centers.

Legal situation

The obligation to back up data in companies results, among other things, from the statutory provisions on proper, traceable, audit-proof bookkeeping (HGB). Long-term data archiving, which is subject to different principles, must be distinguished from short-term storage (limited to between one day and three or even six months). The principles for archiving and verifiability of digital data have been binding for companies in Germany since January 2002; they are summarized in the principles for the proper management and storage of books, records and documents in electronic form as well as for data access (GoBD), published by the Federal Ministry of Finance.

Documentation

When backing up data, good documentation is very important, because the success and speed of both backup and recovery can depend on it.

The documentation should include:

  • Data backup process
  • Structure of the archiving
  • (Immediate) measures to be taken
  • Competencies (of employees and service providers)
  • Priorities for particularly time-critical data and systems

For a better overview, the documentation for backup and recovery should be set out separately in a backup and recovery plan.

Backup types

Depending on how heavily the data to be backed up changes, different types of backup can be used for a given backup run. Individual backup runs can be divided into full, differential and incremental backups. Differential and incremental backups require at least one full backup. In a normal, file-based backup, specific files and/or directories (folders) are selected whose content is to be backed up. It is also possible to back up only certain file formats. In addition, entire data carriers or partitions can be saved as an image. In all cases it is possible to restore just parts of a complete backup set.

A distinction is made between:

Complete / full backup

The complete or full backup is also referred to as “normal backup” in programs. The data to be backed up (a complete drive, a partition, certain directories and/or certain files, certain file formats) is transferred in full to the backup medium and marked as backed up.

The advantage is that a full backup is technically very simple: copying the data is sufficient, and writing your own backup program is easy. The disadvantage is the very high storage requirement.
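To illustrate how little a home-grown full backup needs, the following is a minimal sketch in Python. The function name `full_backup` and the `full-<timestamp>` directory naming are assumptions chosen for this example; they do not come from any particular backup program.

```python
import shutil
import time
from pathlib import Path


def full_backup(source: Path, backup_root: Path) -> Path:
    """Copy the whole source tree into a new, timestamped directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"full-{stamp}"  # hypothetical naming scheme
    # copytree refuses to write into an existing directory, so a finished
    # backup is never silently modified by a later run.
    shutil.copytree(source, target)
    return target
```

Calling `full_backup(Path("data"), Path("/mnt/backup"))` would copy everything under `data`, which also shows the disadvantage named above: every run needs as much space as the source itself.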

Disk image backup

In an image backup, the entire data carrier (usually the hard drive, but also USB mass storage, optical media or, in some programs, data carriers on the network) or just a partition is backed up as a 1-to-1 image. In this way, not only the user data but the entire file system, including the operating system and user settings, can be saved. The advantage of this backup is that, in the event of a total computer failure, the image can be written back to the data carrier and the state of that data carrier at the time of the backup can be completely restored. With such a restore, either the entire file system is restored in its original structure (in this case no file system driver is required, just a device driver for disk access), or a special driver reads the file system from the image and extracts only the desired directories and files, either integrating them into the current file system as normal directories and files or overwriting the current versions with the older ones (see also “Incremental backup”). For some years now, programs have been available that can also create such backups incrementally.

Differential backup

In a differential backup, all files that have been changed or added since the last full backup are saved. It is thus always based on the last full backup, which saves storage space and time compared with a new full backup. If a file has changed, its current version is saved again with every differential run.

The advantages are the significantly reduced storage requirement and the fact that the current backup is only one step away from the last full backup. The backup software can be relatively simple to program. Another advantage is that backup states that are no longer required can be deleted independently of one another, whereas incremental backups are inevitably chained together. For very large files that change frequently (virtual machines, databases, mailbox files of some e-mail programs), however, differential backup is disadvantageous: it saves the entire file again even after only minor changes.
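The selection rule of a differential backup can be sketched as follows. This is an illustrative example only: it assumes that changes are detected by the modification time alone and that the time of the last full backup is known as a Unix timestamp. The name `differential_backup` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def differential_backup(source: Path, target: Path, last_full_time: float) -> List[Path]:
    """Copy every file modified after the last full backup.

    Returns the relative paths of the copied files.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        # The reference point is always the last FULL backup, never an
        # earlier differential run.
        if path.is_file() and path.stat().st_mtime > last_full_time:
            rel = path.relative_to(source)
            dest = target / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)  # whole file is copied, even for a small change
            copied.append(rel)
    return copied
```

Because the reference time never moves until the next full backup, each differential run is self-contained and can be deleted without affecting the others.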

Incremental backup

With an incremental backup, only those files or parts of files are saved that have been changed or added since the last incremental backup or (in the case of the first incremental backup) since the last full backup. It is thus always based on the last incremental backup. The disadvantage of this method is that, when restoring, the data usually has to be pieced together from several backups. Various techniques (date stamps, checksums) must be used to ensure that the complete chain (full backup, incremental backups 1, 2, 3, etc., original data) can be traced without errors.

It should be noted that the increments can be stored in two ways:

  • Forward deltas are the common case. This corresponds to the procedure described above: the (older) full backup serves as the foundation and is not changed, while the increments build on it. The current state of the data can only be restored by taking the increments into account. Example: duplicity.
  • An incremental backup with reverse deltas reverses this principle. Imagine the edge of a roof with icicles growing down from it. The full backup changes with every backup run and represents the edge of the roof. The growing icicles are the increments. If a file has changed since the last full backup, the previous version of the file is saved as an increment (the icicle grows downwards), while the current version is inserted into the full backup. The full backup can be accessed at any time without any problems, while an older version of a file can only be restored by taking the increments into account. Example: rdiff-backup.

The advantage of the incremental backup is the very low storage requirement; the method is therefore suitable for data backup in networks or in the cloud. On the other hand, all increments are inherently chained together, so removing an increment from between two others, for example to save storage space or to delete private data, requires a great deal of computing effort.
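The difference between forward and reverse deltas can be made concrete with a toy model. The sketch below is a simplification under stated assumptions: a backup is a dictionary mapping file names to contents, increments hold whole files rather than partial deltas, and files are only modified, never deleted. The function names are hypothetical and do not reflect how duplicity or rdiff-backup store data.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # file name -> file content (toy model)


def restore_forward(full: Snapshot, increments: List[Snapshot]) -> Snapshot:
    """Forward deltas: start from the old full backup and apply every
    increment, oldest first, to reach the current state."""
    state = dict(full)
    for increment in increments:
        state.update(increment)
    return state


def restore_reverse(current_full: Snapshot, reverse_increments: List[Snapshot],
                    steps_back: int) -> Snapshot:
    """Reverse deltas: the full backup is already current; older states are
    reached by applying the saved previous versions, newest first."""
    state = dict(current_full)
    for increment in reverse_increments[:steps_back]:
        state.update(increment)
    return state
```

In the forward case the newest state costs the most work to restore, in the reverse case the oldest. In both cases an increment in the middle of the chain cannot simply be dropped.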

Detection of changed files

To distinguish changed files from unchanged files that have already been backed up, some file systems use special file attributes that the system sets automatically when a file changes and that the backup program clears again after a full or incremental backup (e.g. the archive bit in FAT and NTFS). If such attributes are not available, the backup software must keep its own records of the files, for example by comparing the file date, the file size, or checksums such as SHA-1. For example, tar uses only timestamps, rdiff-backup uses timestamps and file sizes, and duplicity additionally uses SHA-1.
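A record-keeping approach based on file size and checksum might look like the following sketch. It is not the mechanism of tar, rdiff-backup or duplicity; the manifest format (relative path mapped to size and SHA-1 digest) and the function names are assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

Fingerprint = Tuple[int, str]  # (size in bytes, SHA-1 hex digest)


def file_fingerprint(path: Path) -> Fingerprint:
    """Return size and SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return path.stat().st_size, digest.hexdigest()


def changed_files(source: Path, manifest: Dict[str, Fingerprint]) -> List[str]:
    """List files that are new or whose fingerprint differs from the manifest
    recorded during the previous backup."""
    changed = []
    for path in sorted(source.rglob("*")):
        if path.is_file():
            rel = path.relative_to(source).as_posix()
            if manifest.get(rel) != file_fingerprint(path):
                changed.append(rel)
    return changed
```

Unlike a timestamp comparison, the checksum also catches a file whose content changed while its size and date stayed the same, at the price of reading every file in full.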

Problem with encrypted data

Backing up encrypted files is problematic in two ways. If the content of a file changes only slightly, an attacker who obtains several backups has different versions of the encrypted file. The assumption that the plaintext changed only slightly from increment to increment, and that the vast majority of it remained identical, can be useful for cryptanalysis.

Some encryption tools (such as TrueCrypt) do not update the modification date and do not change the size of the encrypted file (a container file with a fixed size, or padding that fills the file to the desired fixed size). The main purpose of this is to make it harder for an attacker to locate the secret content. As a consequence, however, the backup software cannot recognize that the file has changed. This leads to the second problem: under certain circumstances, no backups of the encrypted data are made at all unless the backup software works with checksums.

Backup strategies

First in, first out (FIFO)

FIFO is the simplest strategy. As soon as the storage media, or the storage space on a medium, run out, the oldest full backup is deleted, along with all incremental or differential backups based on it.
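The FIFO rule can be sketched with a queue of backup sets. The example assumes that each set (one full backup plus the incremental or differential backups based on it) is described by a dictionary with a `size` entry, and that `capacity` is the total space available; both are illustrative conventions, not part of any specific tool.

```python
from collections import deque
from typing import Deque, Dict, List


def add_backup_set(sets: Deque[Dict], new_set: Dict, capacity: int) -> List[Dict]:
    """Append a new backup set and remove the oldest sets until the total
    size fits into the capacity. Returns the removed sets."""
    sets.append(new_set)
    removed = []
    # The newest set is never removed, even if it alone exceeds the capacity.
    while len(sets) > 1 and sum(s["size"] for s in sets) > capacity:
        removed.append(sets.popleft())  # oldest full backup and its dependents
    return removed
```

Treating the full backup and its dependent backups as one unit matters here: deleting only the full backup would leave increments that can no longer be restored.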

Grandfather-father-son

Also known as the generation principle, “grandfather-father-son” is one of the most common strategies for creating backups.

The “son” backup, the most frequent one, is created every working day, the “father” backup at the end of the week, and the “grandfather” backup at the end of the month. If full backups are used, with one storage medium per full backup, four media are needed for the weekdays (the weekly backup takes the place of the daily backup on the last working day) and five media for the weekly backups made on Fridays, since a month can contain five Fridays. In addition, any number of media are needed to cover the past months.

On macOS, Time Machine uses a similar strategy on a single storage medium: hourly backups are kept for the last 24 hours, daily backups for the last month, and the oldest monthly backup is deleted only when storage space runs out. Since the oldest backup is always a full backup, its data must be transferred to the second-oldest backup before it is deleted.
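Which generation a given day's backup belongs to can be decided with a few date checks. The sketch below rests on assumptions that the text leaves open: a Monday-to-Friday working week, the monthly backup on the last working day of the month, and the monthly backup taking precedence when that day is also a Friday. The function names are hypothetical.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


def last_working_day(year: int, month: int) -> date:
    """Return the last Monday-to-Friday day of the given month."""
    day = date(year, month, calendar.monthrange(year, month)[1])
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return day


def gfs_level(day: date) -> Optional[str]:
    """Classify a day as 'son', 'father' or 'grandfather' backup day.

    Returns None on weekends, when no backup is scheduled.
    """
    if day.weekday() >= 5:
        return None
    if day == last_working_day(day.year, day.month):
        return "grandfather"  # end of the month
    if day.weekday() == 4:
        return "father"       # Friday, end of the working week
    return "son"              # ordinary working day
```

With this rule, `gfs_level(date(2017, 9, 22))` yields `"father"` and `gfs_level(date(2017, 9, 29))` yields `"grandfather"`, since 29 September 2017 was both a Friday and the last working day of the month.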

Towers of Hanoi

To achieve a good compromise between the number of backups kept and the hardware required, the "Towers of Hanoi" rotation scheme is also used. This backup strategy is based on the puzzle of the same name. Each backup medium corresponds to one disc of the puzzle, and every time a disc is moved, a backup is written to the corresponding medium. The first medium is thus used every second day (1, 3, 5, 7, 9, ...), the second every fourth day (2, 6, 10, ...) and the third every eighth day (4, 12, 20, ...).

With n media, 2^(n-1) days pass before the last medium is overwritten. With three media, there is thus still a backup from four days ago, and on the fifth day backup C is overwritten. With four media there are eight days until medium D is overwritten on the ninth day, and with five media there are 16 days until medium E is overwritten on the 17th day, and so on. Depending on the number of media, files can be restored from 1, 2, 4, 8, 16, ..., 2^(n-1) days ago. Mathematically, the medium to be used is determined by the number of trailing zeros in the binary representation of the number of days since the backup rotation began.
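The trailing-zeros rule translates directly into code. The following sketch assumes that days are counted from 1 and that media are named A, B, C and so on; the cap at the last medium is needed because the day number can have more trailing zeros than there are media. The function name `hanoi_medium` is chosen for illustration.

```python
def hanoi_medium(day: int, media: int) -> str:
    """Return the medium (A, B, C, ...) to use on the given day of the rotation."""
    if day < 1 or not 1 <= media <= 26:
        raise ValueError("day must be >= 1 and media between 1 and 26")
    # day & -day isolates the lowest set bit; its position equals the
    # number of trailing zeros in the binary representation of day.
    trailing_zeros = (day & -day).bit_length() - 1
    # Days with more trailing zeros than media fall on the last medium.
    index = min(trailing_zeros, media - 1)
    return chr(ord("A") + index)
```

For three media, `[hanoi_medium(d, 3) for d in range(1, 9)]` gives A, B, A, C, A, B, A, C, which is the eight-day cycle of the scheme.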

The following tables show which medium is used on which day for different numbers of media. Note that with this method the first backup is overwritten after only two days. This can be avoided by starting at the end of the cycle, that is, on the last day shown in each table.

Towers of Hanoi for three media
Day of the cycle
01 02 03 04 05 06 07 08
medium A. A. A. A.
B. B.
C. C.
Towers of Hanoi for four media
Day of the cycle
01 02 03 04 05 06 07 08 09 10 11 12 13 14th 15th 16
medium A. A. A. A. A. A. A. A.
B. B. B. B.
C. C.
D. D.
Towers of Hanoi for five media
Day of the cycle
01 02 03 04 05 06 07 08 09 10 11 12 13 14th 15th 16 17th 18th 19th 20th 21st 22nd 23 24 25th 26th 27 28 29 30th 31 32
medium A. A. A. A. A. A. A. A. A. A. A. A. A. A. A. A.
B. B. B. B. B. B. B. B.
C. C. C. C.
D. D.
E. E.

Special case for private users

For private users, the most sensible type of data backup depends heavily on the available hardware, the available expertise and, not least, on their personal attitude towards the data and its backup. With sufficient commitment, backups can be created by simple means and security can be raised to an industrial level.

Commercial programs, freeware and free software are available on the software market. The best-known commercial offerings include True Image from Acronis, ShadowProtect from StorageCraft, DriveImage XML from Runtime Software, and Carbon Copy Cloner for Mac OS X from Bombich Software. In the freeware area, Cobian and Areca can be mentioned as examples, as well as simple tools such as robocopy or SyncToy from Microsoft. For Unix-like operating systems such as Linux there is a large variety of free backup programs that address very different users and needs. Examples are duplicity, rsnapshot and rdiff-backup, synchronization programs such as rsync and unison, and archiving programs such as tar. Since Mac OS X Leopard (10.5), Time Machine, an automated backup solution for external hard drives (USB/FireWire or network drives), has been integrated into the operating system.

Example

A backup on a separate hard drive makes sense. An external hard drive can be stored in a safe place, separate from the computer, after the backup. With an internal hard drive, one must at least ensure that viruses and other malware do not have write access to the backup medium during normal operation. Hard disks with very large storage capacities are becoming cheaper and cheaper. Backups on a USB memory stick or on DVD/DVD-RW are also practical. Burners have long been standard equipment in notebooks and desktop PCs, and blank media are cheap. The easiest way to create a very good backup without special software and with little background knowledge is to create at least two backups at regular intervals on physically independent data carriers. In this way, the grandfather-father-son principle can be reproduced. With three or more media, this principle can be extended to undo changes in small steps or to keep versions that go further back. Other media can increase speed and capacity.

If the data on the original hard disk is sorted accordingly, current or particularly important data can be backed up at shorter intervals (e.g. daily) than the other data.

History

Back in the 1980s, when work was done mainly on floppy disks, these could be copied quite easily. However, the capacity of the emerging hard disks grew so quickly that backing them up to dozens of floppy disks soon became impractical. Simple tape drives, connected via the floppy disk controller or, more professionally, via SCSI, then appeared as dedicated backup media for private individuals and especially for companies.

In the 1990s, Iomega tried to position its Zip disks, with capacities of 100 and later up to 750 megabytes that were comparatively high for the time, as a data backup solution. From the late 1990s onwards, recordable CDs and later DVDs also became very popular backup media and almost completely supplanted other solutions in the private sector.

Magnetic tapes are nowadays rarely used in the private sector and are in some respects inferior to hard drives in terms of speed and, above all, cost per unit of storage. In terms of energy consumption and durability, however, they are superior, which is why they are still used in companies. Hard drives, with their large capacities and relatively stable prices, now offer an attractive alternative to removable media. Flash memory has also reached practical capacities and can be suitable as a backup medium.

Backup media types

In 2005, most data backups of hard drive-based production systems were made on large-capacity magnetic tape (e.g. Digital Linear Tape, Linear Tape-Open), hard drives or optical storage media such as CD-R, DVD, DVD-RAM and comparable formats. With the spread of inexpensive broadband Internet connections, network and online backups on external servers are becoming more important.

Other backup media are also used in the private sector (see Special case for private users).

Real-time applications

Databases must be backed up in a consistent state (data consistency, see also database archiving). This can be achieved by shutting down the database, performing a data export and then restarting the database. In technical jargon this procedure is called a cold backup because, in contrast to a hot backup, the database is disconnected from the production network (e.g. the Internet or intranet) and operation is interrupted.

Hot backup

A hot backup is a backup of a system (such as a database) that is created while the system is in operation. This process is also called online backup. In most cases the system must support this backup method, since otherwise the backup may contain inconsistencies due to active use. With a hot backup, the backup can be kept as up to date as possible; ideally, it is at the same level as the live system. The advantage of this method is that a current “replacement database” is available that can be used immediately in the event of a system crash. One disadvantage is that errors in a data set are immediately transferred to the backup. To get around this, a slight time offset can be built in, which in turn means that the data generated during this period is missing in the failure scenario.

Cold backup

A cold backup is a backup of a real-time system that is created while the system is not active. This ensures that the data is saved in a consistent state. The disadvantage of this method is that the system is unavailable for the duration of the backup. It is therefore unsuitable for high-availability services. However, it is useful for creating backup copies of environments that, for example, only need to be available during the day. This process is also called offline backup.

A common practice with Oracle databases is to put the database into backup mode when the backup begins and to return it to production mode afterwards.

Various manufacturers of backup programs and other vendors offer online integrations (integration agents) and additional products such as the St. Bernard Open File Manager.

Backup strategy

A data backup strategy can be used wherever there is unique data of a certain value, be it for private users, in projects or in the corporate sector. In the latter case, it can exist as a binding specification in the form of a guideline.

It can specify:

  • How the data should be backed up.
  • Who is responsible for data backup.
  • When to back up data.
  • Which data should be backed up.
  • Which storage medium to use.
  • Where the data backup is kept safe.
  • How data backup is to be secured against data theft (for example through encryption).
  • How long backups are to be kept.
  • When and how data backups are checked for their recoverability.

It should also be determined whether and when (a) a full backup (e.g. on weekends) and/or (b) an incremental or differential backup (e.g. at midnight on weekdays) is performed.

Further points are:

  • If it is necessary to restore data, several employees should be familiar with the procedure. A checklist for this case is very useful, because in an emergency often nobody has the time or nerve to think about what to do next.
  • If possible, the data should not be compressed before the backup. Redundancy can be useful when recovering data.
  • At least one drive must be available that can read the media used.
  • The economic benefit of data backups (costs to restore the data without data backup) must be in a reasonable relationship to the effort made for data backup.
  • The only sure proof of a successful data backup is the proof that the backed up data can be restored completely and within a reasonable period of time. For this reason, restore tests should be carried out at regular intervals.
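A restore test of the kind described in the last point can be partly automated by comparing checksums. The sketch below is one possible approach, not a prescribed procedure: it assumes the original data is still available for comparison (in practice, a manifest of digests recorded at backup time would take its place), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its content."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def restore_problems(original: Path, restored: Path) -> List[str]:
    """Return the relative paths that are missing or differ after a test restore."""
    expected = tree_digests(original)
    actual = tree_digests(restored)
    return sorted(name for name in expected if actual.get(name) != expected[name])
```

An empty result only shows that the content came back intact; whether the restore finished within a reasonable period still has to be measured separately.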

Criteria

The optimal data backup strategy depends on many factors and must therefore be determined anew in each individual case. Important factors to consider are:

The type of data

Machine recoverable data
This includes, for example, installed software, which only needs to be installed again after a data loss. In most cases, backup software that has been tested to function properly is sufficient.
Manually recoverable data
This includes, for example, texts, plans and images that are also available on paper. These can be digitized again by typing or scanning (e.g. text recognition). It should be noted, however, that the complex configuration and administration of installed software also fall under this heading. Likewise, a scanned construction plan must be processed manually so that it can be used again seamlessly in the CAD software. The manual restoration must also be documented so that the restored data meets the required quality.
Irreplaceable data
This includes, for example, digital photos and videos, but also scanned receipts if the originals are no longer available. Since this data is irreplaceable, its backup must meet the highest standards.

The value of the data

A distinction must be made here between three aspects. First, what loss occurs if the data is irretrievably destroyed? If, for example, data in a company is backed up every night and data is lost shortly before the end of the working day, all entries made that day must be repeated. The working hours of the employees concerned provide an indication of the loss. Especially with irreplaceable data, however, the non-material value must often also be taken into account.

Second, how much time is lost during a full recovery, during which work may not be possible? If, for example, installing a PC takes a day, the damage can far exceed the value of the installed software. In this case a backup method should be chosen that allows the installed state to be completely reconstructed very quickly (disk image).

Third, what costs arise from the notification obligation that may apply under the Federal Data Protection Act or the legal provisions of other countries? If certain types of personal data are lost, those affected, the supervisory authorities or the public must be informed of the data breach.

The frequency with which the data changes

This factor has a decisive influence on the application and design of the generation principle. Data that changes rarely, such as the operating system and installed software, does not necessarily have to be backed up regularly. It can be sufficient to back up these areas only before or after changes are made to them.

The faster the data changes, the shorter the backup cycle chosen under the generation principle will be. The expiry period should also be taken into account. While there are legally regulated retention periods for much business data (e.g. invoice data), the current content of websites, for example, can be discarded after a short time when it is no longer needed.

Legal requirements

The data backup strategy must be able to satisfy any applicable legal requirements (e.g. audit-proof storage).

The principles of proper IT-supported accounting systems, in particular paragraphs 5.1 and 5.2, must be observed.

Location

Since there are very different types of data with different requirements for the backup strategy, it is advisable to separate this data in advance into different storage locations (hard drives, partitions). The optimal strategy can then be selected for each storage location. In addition, there are disaster-proof data storage devices. With online data storage, the data is in most cases kept in a data center.

Time spent on data backup

When choosing a suitable concept, the time required for data backup plays an important role, especially from a business perspective. The total effort is made up of the recurring backup effort and the restoration effort in the event of a data loss. The relationship between these two variables depends on the data backup method selected. A low backup effort is particularly desirable when large amounts of data have to be locked during the backup process, although many systems have been able to avoid such locking for decades. For this purpose there is software that can back up the data of a system during operation.

Conditions

The criteria will vary depending on the medium and type of data backup. Most of the time, however, the following points are mentioned:

Regularity
Data backups should be carried out at regular, periodic intervals. These intervals vary depending on the application. A monthly backup of the data on a private PC can be sufficient, while production environments usually require daily backups of the production data. Regular backups increase the reliability of data recovery.
Topicality
How up to date the backup must be depends on how often the data changes. The more often important data is changed, the more often it should be backed up.
Safekeeping
Company data backups contain company secrets or personal data and must be protected from unauthorized access.
Creation of two data backups
Creating two spatially separate backups of a data set increases the reliability of data recovery and minimizes the effects of sudden events such as fire or physical accidents. Data backups should be stored spatially separate from the IT system. The distance should be large enough that a catastrophe (fire, earthquake, flood, ...) that strikes the IT system does not endanger the backed-up data. Alternatively, disaster-proof data storage devices can be used.
Constant checking for completeness and integrity
Data backups and data backup strategies must be regularly checked and adapted. Was the data really completely backed up? Is the strategy used consistent? Was the backup successful?
Regular check for recoverability
It must be possible to restore the data within a specified period of time. To do this, the data recovery procedure must be adequately documented and the required resources (personnel, media, tape drives, storage space on the target drives) must be available.
Data backups should be done automatically
Manual backups can be affected by human error.
Use of standards
Using standards makes data recovery easier.
Data compression
Data compression can save storage space, but its effect depends on the compressibility of the data. Modern drives (e.g. DAT, DLT or LTO) can compress the data during backup. However, uncompressed data may be easier to recover.
Data compression can make the automatic verification of data integrity more difficult and require additional computing time.
Time window
Backup processes can take a long time to complete, which can lead to problems in production environments (impairment of data transfer, accessibility). Compression can also affect the duration of the data backup.
Deletion of outdated data backups
Data backups that are no longer required should be deleted so that the confidentiality of the stored data is preserved.

Literature

  • Klaus-Rainer Müller: IT security with a system. Security Pyramid - Security, Continuity and Risk Management - Norms and Practices - SOA and Software Development. 4th expanded and updated edition. Vieweg, Wiesbaden 2011, ISBN 978-3-8348-1536-1.
  • W. Curtis Preston: Backup and Recovery. 1st edition. O'Reilly Media, Beijing et al. 2007, ISBN 978-0-596-10246-3.
  • Egbert Wald: Backup & Disaster Recovery. (Risk analysis and preventive measures, data crash and data recovery, emergency plans and disaster management). MITP, Bonn 2002, ISBN 3-8266-0585-3 (IT management).

References

  1. GNU: Using tar to Perform Incremental Dumps
  2. There are months with five Fridays, e.g. September and December 2017; hence five, not four, Friday tapes.
  3. San Francisco Computer Repair: Backup Methods. January 13, 2008. Retrieved February 21, 2008.
  4. Alvechurch Data Ltd: Tower of Hanoi pattern for backup. November 27, 2007. Retrieved March 12, 2008.