Consistency (data storage)

from Wikipedia, the free encyclopedia

In databases, consistency refers to the correctness of the stored data. An inconsistent database can cause serious errors if the application layer above it does not expect inconsistencies. Two basic perspectives on data consistency can be distinguished: that of “classic” relational databases and that of distributed systems.

Distributed Systems

Distributed storage systems have become very popular with the rise of cloud computing. In such systems, data is usually replicated across several servers, both to increase the availability of the data and to reduce access times. The availability gain follows from the fact that the probability of several servers failing at the same time is significantly lower than the probability of a single server failing. The latency gain arises because requests can be sent to geographically closer replicas, and because an overloaded server can be relieved when another server takes over part of its requests. In this context, consistency means that all replicas of a datum are identical. In particular, a distributed storage system can be consistent for a data record A and at the same time inconsistent for a data record B. The term strict consistency is used when all replicas are identical at all times.
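The two ideas in this paragraph can be illustrated with a minimal Python sketch. It assumes that servers fail independently with the same probability and models each replica as a plain dictionary; the names prob_all_replicas_down and is_consistent are hypothetical and chosen only for this illustration.

```python
def prob_all_replicas_down(p_fail: float, replicas: int) -> float:
    """Probability that every replica is down at once.

    Assumption: failures are independent and equally likely on each server.
    """
    if not 0.0 <= p_fail <= 1.0 or replicas < 1:
        raise ValueError("need 0 <= p_fail <= 1 and at least one replica")
    return p_fail ** replicas


def is_consistent(replicas: list, key: str) -> bool:
    """A record is consistent when all replicas hold the same value for it."""
    return len({replica.get(key) for replica in replicas}) == 1


# Three replicas: record A agrees everywhere, record B does not.
replicas = [
    {"A": 1, "B": 7},
    {"A": 1, "B": 7},
    {"A": 1, "B": 8},  # this replica holds a different value for B
]

print(is_consistent(replicas, "A"))  # consistent for A
print(is_consistent(replicas, "B"))  # inconsistent for B
print(prob_all_replicas_down(0.01, 1), prob_all_replicas_down(0.01, 3))
```

Under the independence assumption, going from one replica to three lowers the probability of total unavailability from p to p cubed; correlated failures (for example a shared data centre outage) would make the real gain smaller.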

Since it is not always practical in distributed systems to keep all replicas consistent, there are also weaker models: weak consistency, which gives no consistency guarantees at all, and eventual consistency, which states that a data record will become consistent at some point, provided that there is a sufficiently long period without writes and without failures.
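One common way to obtain eventual consistency is for replicas to exchange their state periodically and keep the newest version of each record. The following Python sketch illustrates that idea under simplifying assumptions: every write carries a unique, totally ordered version number, replicas are dictionaries mapping a key to a (version, value) pair, and no writes or failures occur while the exchange runs. The function names merge, gossip_round and converge are hypothetical.

```python
def merge(a: dict, b: dict) -> dict:
    """Combine two replica states, keeping the higher version for each key."""
    merged = dict(a)
    for key, (version, value) in b.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


def gossip_round(replicas: list) -> None:
    """Each replica exchanges state with its right-hand neighbour, in place."""
    n = len(replicas)
    for i in range(n):
        j = (i + 1) % n
        combined = merge(replicas[i], replicas[j])
        replicas[i] = dict(combined)
        replicas[j] = dict(combined)


def converge(replicas: list, max_rounds: int = 100) -> int:
    """Run gossip rounds until all replicas are identical; return the round count."""
    rounds = 0
    while any(replica != replicas[0] for replica in replicas):
        if rounds == max_rounds:
            raise RuntimeError("replicas did not converge")
        gossip_round(replicas)
        rounds += 1
    return rounds


# Three replicas that have each seen different writes.
replicas = [{"x": (1, "a")}, {"x": (2, "b")}, {"y": (1, "c")}]
converge(replicas)
print(replicas)
```

Until the exchange has finished, a reader may still see the older value of x, which is exactly the window that eventual consistency leaves open.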

Between eventual and strict consistency there are several intermediate levels, which are divided into client-centric consistency and data-centric consistency. The former describes consistency guarantees from the client's point of view; the latter describes the internal consistency guarantees of the storage system.

Client-centric consistency

Monotonic Read Consistency

Once a distributed storage system has responded to a read request from a client for a specific key with version N, all subsequent read accesses by this client will only return versions that are at least as new as N.
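As an illustration, a client library could enforce this guarantee by remembering the highest version it has seen for each key and skipping replicas that are older. The Python sketch below assumes that versions are integers starting at 1 and that every replica can report the version it holds; the classes Replica and MonotonicReadSession are hypothetical and not taken from any particular storage system.

```python
class Replica:
    """A replica holding, per key, a (version, value) pair."""

    def __init__(self, store=None):
        self.store = dict(store or {})

    def read(self, key):
        # Version 0 stands for "this replica has never seen the key".
        return self.store.get(key, (0, None))


class MonotonicReadSession:
    """Client-side session that never returns a version older than one already read."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.last_seen = {}  # key -> highest version returned so far

    def read(self, key):
        floor = self.last_seen.get(key, 0)
        for replica in self.replicas:
            version, value = replica.read(key)
            if version >= floor:
                self.last_seen[key] = version
                return version, value
        raise RuntimeError("no reachable replica is recent enough")


fresh = Replica({"k": (2, "new")})
stale = Replica({"k": (1, "old")})

session = MonotonicReadSession([fresh, stale])
first = session.read("k")          # served by the fresh replica
session.replicas = [stale, fresh]  # simulate the request being routed elsewhere
second = session.read("k")         # the stale replica is skipped
print(first, second)
```

A different client with its own session may still read the old version from the stale replica; the guarantee is per client.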

Monotonic Write Consistency

If a client first writes value 1 and then value 2 for a given key, the system is guaranteed to apply the writes internally in that order. In particular, in the absence of further writes, value 2 will never be overwritten by value 1 on any replica.
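One possible way to provide this guarantee is for each client to number its writes and for each replica to apply a client's writes strictly in that order, buffering any that arrive early. The Python sketch below assumes per-client sequence numbers starting at 0; the class Replica and its receive method are hypothetical names for this illustration.

```python
class Replica:
    """Applies each client's writes in the order of their sequence numbers."""

    def __init__(self):
        self.values = {}    # key -> value
        self.next_seq = {}  # client_id -> next sequence number to apply
        self.pending = {}   # client_id -> {seq: (key, value)} received too early

    def receive(self, client_id, seq, key, value):
        expected = self.next_seq.get(client_id, 0)
        if seq < expected:
            return  # duplicate of a write that was already applied
        queue = self.pending.setdefault(client_id, {})
        queue[seq] = (key, value)
        # Apply every write that is now next in line for this client.
        while expected in queue:
            k, v = queue.pop(expected)
            self.values[k] = v
            expected += 1
        self.next_seq[client_id] = expected


replica = Replica()
# The network delivers the second write (value 2) before the first (value 1).
replica.receive("client-1", 1, "key", "value 2")
buffered = "key" not in replica.values  # nothing applied yet
replica.receive("client-1", 0, "key", "value 1")
print(replica.values["key"])  # value 2 was applied last
```

Without the buffering step, the late arrival of the first write would overwrite value 2 with value 1 on this replica.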

Read Your Writes Consistency

Here the storage system guarantees that a process that has written a datum with version number N will never subsequently read a version older than N. A trivial implementation would be replicas held locally in the client that are never synchronized. However, this would provide only weak consistency and not eventual consistency. In practice, the guarantee is usually implemented as session consistency, in which it applies only for the duration of a session. For example, all requests from a given process (reads and writes alike) can be routed to the same replica. If that replica becomes unavailable, the session ends.
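The session approach described above can be sketched in a few lines of Python. The sketch assumes a session is pinned to exactly one replica and that the replica exposes an availability flag; Replica, StickySession and SessionEnded are hypothetical names.

```python
class SessionEnded(Exception):
    """Raised when the replica a session is pinned to is no longer available."""


class Replica:
    def __init__(self):
        self.data = {}
        self.available = True


class StickySession:
    """Routes every read and write of one process to the same replica."""

    def __init__(self, replica):
        self.replica = replica

    def _require_replica(self):
        if not self.replica.available:
            raise SessionEnded("pinned replica is unavailable; session is over")

    def write(self, key, value):
        self._require_replica()
        self.replica.data[key] = value

    def read(self, key):
        self._require_replica()
        return self.replica.data.get(key)


replica = Replica()
session = StickySession(replica)
session.write("profile", "v2")
print(session.read("profile"))  # the session sees its own write

replica.available = False
try:
    session.read("profile")
except SessionEnded as exc:
    print("session ended:", exc)
```

Because the session never falls back to another replica, it cannot observe a state that predates its own writes; the price is that the guarantee is lost when the replica fails.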

Write Follows Reads Consistency

If a process has read a datum X in version N and the same process then overwrites X, write-follows-reads consistency guarantees that the write is only applied on replicas that hold at least version N of X.
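A minimal Python sketch of this rule follows. It assumes integer version numbers, a client-side session that records the highest version it has read per key, and replicas that refuse a write when they are behind that version; the class and method names are hypothetical.

```python
class Replica:
    """Holds, per key, a (version, value) pair."""

    def __init__(self, store=None):
        self.store = dict(store or {})

    def version(self, key):
        return self.store.get(key, (0, None))[0]


class Session:
    """Tracks what this process has read so that later writes can depend on it."""

    def __init__(self):
        self.read_versions = {}  # key -> highest version read so far

    def read(self, replica, key):
        version, value = replica.store.get(key, (0, None))
        self.read_versions[key] = max(self.read_versions.get(key, 0), version)
        return value

    def write(self, replica, key, value):
        required = self.read_versions.get(key, 0)
        current = replica.version(key)
        if current < required:
            return False  # replica has not yet seen the version that was read
        replica.store[key] = (current + 1, value)
        return True


fresh = Replica({"X": (3, "c")})
stale = Replica({"X": (2, "b")})

session = Session()
session.read(fresh, "X")                         # reads version 3
rejected = not session.write(stale, "X", "new")  # stale replica is refused
accepted = session.write(fresh, "X", "new")      # up-to-date replica accepts
print(rejected, accepted, fresh.store["X"])
```

A real system might instead delay the write on the stale replica until version 3 has arrived; rejecting it outright is simply the shortest way to show the condition.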

Data-centric consistency

Causal Consistency

Causal consistency means that all operations that are causally related must be serialized in the same order on all replicas. An operation O is causally dependent on an operation P if and only if one or more of the following conditions apply:

  1. O and P were both triggered by the same process and P was chronologically before O.
  2. O is a read, P is a write and O has read the result of P.
  3. O is causally dependent on an operation X, which in turn is causally dependent on P (transitivity).
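The three conditions above translate directly into a small reachability check. The Python sketch below is an illustration under assumptions: each operation records its process, its position within that process, and (for reads) the name of the write whose result it returned. The names Op, directly_depends and causally_depends are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    name: str
    process: str
    seq: int                          # position within its own process
    kind: str                         # "read" or "write"
    reads_from: Optional[str] = None  # for reads: name of the write that was read


def directly_depends(o: Op, p: Op) -> bool:
    """Conditions 1 and 2: same process and earlier, or o read what p wrote."""
    same_process_earlier = o.process == p.process and p.seq < o.seq
    read_the_write = o.kind == "read" and p.kind == "write" and o.reads_from == p.name
    return same_process_earlier or read_the_write


def causally_depends(o: Op, p: Op, ops: list) -> bool:
    """Condition 3: follow direct dependencies transitively from o towards p."""
    seen = {o.name}
    stack = [o]
    while stack:
        current = stack.pop()
        for candidate in ops:
            if candidate.name in seen or not directly_depends(current, candidate):
                continue
            if candidate.name == p.name:
                return True
            seen.add(candidate.name)
            stack.append(candidate)
    return False


w1 = Op("w1", "A", 0, "write")
r1 = Op("r1", "B", 0, "read", reads_from="w1")
w2 = Op("w2", "B", 1, "write")
w3 = Op("w3", "C", 0, "write")
ops = [w1, r1, w2, w3]

print(causally_depends(w2, w1, ops))  # w2 follows r1, which read w1
print(causally_depends(w3, w1, ops))  # w3 is concurrent with w1
```

In this example a causally consistent system must apply w1 before w2 on every replica, while w3 may be ordered differently on different replicas.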

Sequential Consistency

Sequential consistency is stricter than causal consistency: the model requires that all operations (not only causally related ones) be serialized in the same order on all replicas, and that the operations of each client process appear in that order in the sequence in which the process issued them.
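For small examples, sequential consistency can be checked by brute force: search for one interleaving of all operations that keeps every process's own order and in which every read returns the most recently written value. The Python sketch below does this for a single shared register; the function name and the encoding of operations as ("w", value) and ("r", value) tuples are assumptions made for this illustration, and the search is exponential in the worst case.

```python
def sequentially_consistent(histories, initial=None):
    """Return True if some interleaving explains all reads.

    histories: one list per process, in program order; each operation is
    ("w", value) for a write or ("r", value) for a read that returned value.
    """

    def search(positions, current):
        if all(pos == len(h) for pos, h in zip(positions, histories)):
            return True  # every operation has been placed
        for i, history in enumerate(histories):
            pos = positions[i]
            if pos == len(history):
                continue
            kind, value = history[pos]
            if kind == "r" and value != current:
                continue  # this read cannot come next
            advanced = positions[:i] + (pos + 1,) + positions[i + 1:]
            if search(advanced, value if kind == "w" else current):
                return True
        return False

    return search(tuple(0 for _ in histories), initial)


# Two writers, two readers that agree on the order of the writes.
agree = [[("w", 1)], [("w", 2)], [("r", 1), ("r", 2)], [("r", 1), ("r", 2)]]
# The readers observe the two writes in opposite orders.
disagree = [[("w", 1)], [("w", 2)], [("r", 1), ("r", 2)], [("r", 2), ("r", 1)]]

print(sequentially_consistent(agree))
print(sequentially_consistent(disagree))
```

The second history is rejected because no single order of the two writes can satisfy both readers, which is the kind of disagreement that sequential consistency rules out.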

Linearizability

Linearizability is stricter than sequential consistency: the model additionally requires that the common order of operations agree with their actual chronological order, and that every request appear to take effect at a single point in time rather than over a time interval.
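The additional real-time requirement can be made concrete with a brute-force check on a single register. In the Python sketch below every operation carries the time interval during which it ran, and a history is accepted if some total order both respects those intervals (an operation that finished before another began must come first) and explains every read. The tuple encoding (kind, value, start, end) and the function name are assumptions for this illustration; the search tries all permutations and is only suitable for tiny histories.

```python
from itertools import permutations


def linearizable(ops, initial=None):
    """ops: list of (kind, value, start, end) with kind "w" or "r"."""
    for order in permutations(ops):
        # Real-time rule: nothing may be placed after an operation that
        # only started once it had already finished.
        respects_time = all(
            not (later[3] < earlier[2])
            for i, earlier in enumerate(order)
            for later in order[i + 1:]
        )
        if not respects_time:
            continue
        current = initial
        for kind, value, _start, _end in order:
            if kind == "w":
                current = value
            elif value != current:
                break  # this read is not explained by the order
        else:
            return True
    return False


# The read starts after the write has finished but still returns the old value.
stale_read = [("w", 1, 0, 1), ("r", None, 2, 3)]
# The read overlaps the write, so it may take effect before it.
overlapping = [("w", 1, 0, 2), ("r", None, 1, 3)]

print(linearizable(stale_read))
print(linearizable(overlapping))
```

The first history would still be sequentially consistent if the two operations came from different processes, since the read could simply be ordered before the write; linearizability forbids that reordering because the write had already completed.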

Consistency in classic relational databases

In relational databases, consistency means the integrity of the data. It is defined by specifying integrity constraints. Several types of integrity constraints are distinguished:

  • Domain integrity: The value of each attribute must lie within a specified range of values.
  • Entity integrity: The primary key of each object must be unique, and the key field must never be empty (SQL: NOT NULL).
  • Referential integrity: The content of a foreign key field must either be empty (NULL) or refer to an existing object with that key.
  • Logical consistency: The user can define additional integrity requirements (e.g. in a family tree database: children must have been born after their parents). As a rule, such conditions cannot be checked by the database system and must therefore be ensured by the user.

A database is consistent only if it satisfies all of its integrity constraints. A state in which at least one constraint is violated is called inconsistent.
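The four kinds of integrity can be tried out with Python's built-in sqlite3 module. The sketch below is only an illustration: the person table, its columns, the year range, and the helper functions rejected and insert_child are hypothetical, and the family tree rule is enforced in application code because it is not expressed as a declarative constraint here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,                    -- entity integrity
        name       TEXT NOT NULL,
        birth_year INTEGER NOT NULL
                   CHECK (birth_year BETWEEN 1000 AND 2100), -- domain integrity
        parent_id  INTEGER REFERENCES person(id)           -- referential integrity
    );
""")


def rejected(sql, params=()):
    """Return True if the database refuses the statement with an integrity error."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


def insert_child(child_id, name, birth_year, parent_id):
    """Logical consistency checked by the application: children are born after parents."""
    row = conn.execute(
        "SELECT birth_year FROM person WHERE id = ?", (parent_id,)
    ).fetchone()
    if row is None or birth_year <= row[0]:
        raise ValueError("child must be born after an existing parent")
    conn.execute(
        "INSERT INTO person VALUES (?, ?, ?, ?)",
        (child_id, name, birth_year, parent_id),
    )


insert = "INSERT INTO person VALUES (?, ?, ?, ?)"
conn.execute(insert, (1, "Ada", 1950, None))

print(rejected(insert, (2, "Bob", 3000, None)))  # domain: year out of range
print(rejected(insert, (1, "Eve", 1960, None)))  # entity: duplicate primary key
print(rejected(insert, (3, "Kim", 1980, 99)))    # referential: parent 99 missing
insert_child(4, "Lee", 1975, 1)                  # accepted: born after the parent
```

The first three violations are caught by the database itself, whereas a child born before its parent would pass every declared constraint and is only stopped by insert_child.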

Consistency in classic relational databases is a superset of the definition of consistency from distributed systems, i.e. if the replicas are not all identical, the integrity constraints cannot all be satisfied.

Consistent transformations

Consistency is one of the four ACID properties required of database transactions. Every transaction must take the database from one consistent state to another. While the transaction is being processed, however, the consistency of the database may be temporarily violated.

After each series of changes to the data (inserts, deletes or updates) made by a transaction, the database is checked against the integrity constraints. If they are not satisfied, the entire transaction must be undone so that the previous (consistent) state is restored (“rollback”).
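The rollback behaviour can be demonstrated with Python's built-in sqlite3 module. In the sketch below, the account table, the non-negative balance rule and the transfer function are hypothetical. Note one difference from the description above: SQLite evaluates this CHECK constraint after each statement rather than once at the end of the transaction, but the effect shown is the same, because the failed transaction is undone as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))


def transfer(src, dst, amount):
    """Move money in one transaction; undo both updates if a constraint fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(1, 2, 30)       # both updates are committed
after_ok = balances()
failed = transfer(1, 2, 500)  # the debit violates the CHECK; the credit is rolled back
after_failed = balances()
print(ok, after_ok, failed, after_failed)
```

Between the two UPDATE statements the total amount of money in the table is temporarily wrong, which is the kind of intermediate inconsistency a transaction is allowed to pass through.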

Transactions running in parallel require particular caution.
