Replication (data processing)

from Wikipedia, the free encyclopedia

Replication (from the Latin replicare, 'to reply', 'to repeat') is, in the literal sense of the word, simply the production of multiple copies of the same data; in practice it is usually combined with regular synchronization of those copies.

In general, replication is used in data processing to make data available in several places. On the one hand this serves data backup; on the other hand it shortens response times, especially for read access.

The simplest form of data replication is storing a copy of a file (copy); an extended form is the copy-and-paste function of modern operating systems. Replication in the literal sense of the word also includes the duplication of optical data carriers in a pressing plant or with a disc burner.

Multiple storage of data and their comparison

Write accesses are generally more time-consuming because the data is stored multiple times. In the frequently used master/slave replication, a distinction is made between the "original" of the data (primary data) and the dependent copies. With copies of equal rank (as in version management), merge strategies must be used during replication to bring the data sets together (loosely called synchronization, as distinct from true synchronization).
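As an illustration of a merge strategy for copies of equal rank, the following minimal sketch merges two replicas with a "last writer wins" rule. This rule is only one possible strategy and is chosen here as an assumption; the names Replica and merge_last_writer_wins, and the use of integer timestamps per entry, are hypothetical and not taken from any particular product.

```python
from typing import Dict, Tuple

# Each replica maps a key to (logical_timestamp, value).
Replica = Dict[str, Tuple[int, str]]


def merge_last_writer_wins(a: Replica, b: Replica) -> Replica:
    """Merge two peer replicas, keeping the entry with the newer timestamp."""
    merged: Replica = dict(a)
    for key, (ts, value) in b.items():
        # Take b's entry if the key is new or b's version is more recent.
        # On equal timestamps the entry from a is kept.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


replica_a: Replica = {"x": (1, "old"), "y": (5, "from A")}
replica_b: Replica = {"x": (3, "new"), "z": (2, "from B")}

merged = merge_last_writer_wins(replica_a, replica_b)
```

After the merge, both peers would adopt the merged state. A rule like this silently discards the older of two conflicting writes, which is why version management systems typically use more elaborate merge strategies.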

It is sometimes important to know how up to date the replicas must be. Depending on the type of replication, a certain period of time passes between the processing or creation of the primary data and its replication. This time span is referred to as timeliness, but more commonly as latency.

Synchronous replication

Synchronous replication means that a change operation on a data object can complete successfully only if it has also been carried out on the replicas. To implement this technically, a protocol that guarantees the atomicity (indivisibility) of transactions must be used: a commit protocol.
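The all-or-nothing behaviour described above can be sketched with a simplified two-phase commit. Two-phase commit is assumed here as one common commit protocol; the text does not name a specific one. The Replica class, its available flag, and the function replicated_write are hypothetical names, and real implementations also need durable logging and timeout handling, which this in-memory sketch leaves out.

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """A hypothetical in-memory replica that can vote on and apply a write."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self.data: Dict[str, str] = {}
        self._pending: Optional[Tuple[str, str]] = None

    def prepare(self, key: str, value: str) -> bool:
        # Phase 1: stage the write and vote yes only if it can be applied.
        if not self.available:
            return False
        self._pending = (key, value)
        return True

    def commit(self) -> None:
        # Phase 2 (success): make the staged write visible.
        if self._pending is not None:
            key, value = self._pending
            self.data[key] = value
            self._pending = None

    def abort(self) -> None:
        # Phase 2 (failure): discard the staged write.
        self._pending = None


def replicated_write(replicas: List[Replica], key: str, value: str) -> bool:
    """Apply the write on all replicas or on none of them."""
    votes = [r.prepare(key, value) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False


primary, copy = Replica("primary"), Replica("copy")
ok = replicated_write([primary, copy], "k", "v1")

copy.available = False  # simulate an unreachable replica
failed = replicated_write([primary, copy], "k", "v2")
```

In this sketch the second write is rejected as a whole because one replica cannot take part, so the primary keeps the old value as well. This also shows the cost of the synchronous approach: every write waits for all replicas.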

Synchronous replication strategies:

Examples of synchronous replication are:

  • Warm standby replication of Sybase ASE server databases
  • Hot standby replication of Microsoft SQL Server databases

Asynchronous replication

If there is latency between the processing of the primary data and its replication, the replication is called asynchronous. The data is only synchronous (identical) at the moment of replication.

A simple variant of asynchronous replication is "File Transfer Replication", the transfer of files via FTP or SSH.

The data of the replicas therefore represents only a snapshot of the primary data at a specific point in time. At the database level, the transaction logs of the databases can be transported from one server to the other and read into the database at short intervals.

Assuming an intact network, the latency then corresponds to the interval at which the transaction logs are written.
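The transport of transaction logs described above can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the mechanism of any specific database: the classes Primary and Standby and the methods write, take_log and apply_log are hypothetical, and a log entry is reduced to a key and its new value.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[str, str]  # (key, new_value)


class Primary:
    """Holds the primary data and records every change in a log."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.log: List[LogEntry] = []

    def write(self, key: str, value: str) -> None:
        # The write completes immediately; replication happens later.
        self.data[key] = value
        self.log.append((key, value))

    def take_log(self) -> List[LogEntry]:
        # Hand over the accumulated log entries and start a new log.
        shipped, self.log = self.log, []
        return shipped


class Standby:
    """Holds a replica that is updated only by replaying shipped logs."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def apply_log(self, entries: List[LogEntry]) -> None:
        # Replay the changes in the order in which they were made.
        for key, value in entries:
            self.data[key] = value


primary, standby = Primary(), Standby()
primary.write("a", "1")
primary.write("b", "2")
stale_before_shipping = standby.data != primary.data  # replica lags behind

standby.apply_log(primary.take_log())  # one shipping interval elapses
in_sync_after_shipping = standby.data == primary.data
```

Between two shipments the standby holds a snapshot of an earlier state, so the shipping interval bounds how far the replica can lag behind in this model.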

Asynchronous replication strategies:

Advantages and disadvantages of replication

Advantages of replicas in distributed database systems:

Disadvantages:

  • high update effort
  • increased storage space requirements
  • possible redundancy of the data sets with possible networking

Possible applications

See also