Distributed system

from Wikipedia, the free encyclopedia

According to Andrew S. Tanenbaum's definition, a distributed system is a collection of independent computers that presents itself to its users as a single system. Peter Löhr defines it more fundamentally as "a set of interacting processes (or processors) that have no shared memory and therefore communicate with each other via messages". The subfield of computer science that deals with distributed systems and the algorithms they employ is called distributed computing or distributed processing.

Classifications

A distinction is usually made between:

Reasons for using distributed systems

Distributed systems allow true concurrency; that is, several processes can genuinely execute simultaneously. In addition, a distributed system is usually more scalable than a single computer, since performance can easily be increased by adding further computers.

Another common scenario is the provision of remote resources, as is the case with Wikipedia. Distributed systems are also used to increase reliability: certain functionality is offered by several computers (redundancy), so that if one computer fails, another can provide the same functionality.

In many cases there are also economic reasons to network inexpensive computers instead of buying an expensive supercomputer. Volunteer-computing projects such as SETI@home, for example, take advantage of this by using the idle computing power of single-user computers to solve complex problems.

Additional reasons:

Transparency

For the user, as well as for an application, the kind of distribution in a distributed system is not relevant and ideally not even visible. The system behaves transparently, as if the user were dealing with a single overall system.

Problems

Since partial failures can occur in distributed systems, affecting individual computers or parts of the network, it should be ensured that there is no single point of failure in the system. Note that the probability of some process malfunctioning increases with the number of processes involved (see availability).

An important sub-problem is detecting a partial failure in the first place. There are no fully satisfactory methods that can detect and correct a partial failure. One option is a heartbeat or regular pinging of the systems involved; however, neither approach is perfect.
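A minimal sketch of heartbeat-based failure detection might look as follows (the class and method names are illustrative, not from any particular library). The key limitation described above is visible in the code: a missing heartbeat can only make a node *suspected*, since a slow node is indistinguishable from a crashed one.

```python
import time

class HeartbeatMonitor:
    """Illustrative heartbeat failure detector: a node is *suspected*
    as failed if no heartbeat has arrived within `timeout` seconds.
    A suspicion is not proof of failure -- the node may merely be slow."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node id -> time of last heartbeat

    def heartbeat(self, node, now=None):
        """Record a heartbeat from `node` (`now` overridable for testing)."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def suspected(self, now=None):
        """Return the set of nodes whose heartbeat is overdue."""
        now = time.monotonic() if now is None else now
        return {n for n, t in self.last_seen.items()
                if now - t > self.timeout}
```

In practice the timeout is a trade-off: too short, and slow but healthy nodes are falsely suspected; too long, and real failures go unnoticed.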

True concurrency is possible in distributed systems, but processes may run at different speeds. The strong form of non-determinism this causes raises the requirements for process synchronization. Concurrency control is therefore usually very important: on the one hand with regard to transactions, on the other hand when accessing shared resources (mutex). In addition, deadlocks can always occur in distributed systems.
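One classic way to implement a distributed mutex without shared memory is a token ring: only the node currently holding the token may enter the critical section, so mutual exclusion holds by construction and no deadlock over the lock itself can arise. The following is a simplified sequential simulation of that idea (names are illustrative):

```python
class TokenRingNode:
    """One node in a token-ring mutual-exclusion scheme (sketch):
    only the token holder may enter the critical section."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.wants_cs = False  # does this node want the critical section?

    def receive_token(self, shared_log):
        if self.wants_cs:
            shared_log.append(self.node_id)  # "critical section" work
            self.wants_cs = False
        # afterwards the caller passes the token to the next node

def circulate(nodes, rounds, log):
    """Simulate the token travelling around the ring `rounds` times."""
    for _ in range(rounds):
        for node in nodes:
            node.receive_token(log)
```

The price of this simplicity is that a lost token or a crashed token holder blocks everyone, which ties back to the failure-detection problem above.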

Global states (the sum of the states of all processes involved) and process interleavings in a distributed system can often not be reconstructed in retrospect. This makes diagnosing errors more difficult.

Distributed systems share no common memory and must therefore implement all of their communication by sending and receiving messages. Such communication is error-prone: problems can arise from corruption, duplication, and loss of messages. In addition, message transit times are unpredictable, so one can never say with certainty whether a system has failed or is merely responding slowly.
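The standard remedies for loss and duplication can be sketched in a few lines: the sender retransmits until it receives an acknowledgement (giving at-least-once delivery), and the receiver discards duplicates by sequence number. All names here are hypothetical; real systems delegate this to protocols such as TCP.

```python
def send_reliably(payload, seq, transmit, max_retries=5):
    """Retransmit until acknowledged (at-least-once delivery).
    `transmit` models the lossy network: returns the ack, or None on loss."""
    for _ in range(max_retries):
        ack = transmit(seq, payload)
        if ack == seq:
            return True
    return False  # after max_retries we still cannot tell crash from slowness

class Receiver:
    """Deduplicates by sequence number, so retransmissions are harmless."""

    def __init__(self):
        self.delivered = {}  # seq -> payload, each delivered exactly once

    def receive(self, seq, payload):
        self.delivered.setdefault(seq, payload)  # ignore duplicates
        return seq  # always acknowledge, even for duplicates
```

Note that the sender's `False` case illustrates the fundamental ambiguity from the text: after exhausting its retries it still cannot distinguish a failed receiver from a slow network.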

Another problem with messages is that this kind of communication can be insecure, i.e. it can be eavesdropped on or deliberately manipulated by attackers, and may have to run over an infrastructure that (like the Internet) is not fully suited to group-based communication.

For complex processes it is often necessary to implement a common notion of time (synchronization without process communication). For this, it must be ensured that the times known to the individual processes differ only by small deviations. Distributed transactions can then be carried out safely, since timeouts can be used to discard obsolete messages. (See also "Clock synchronization algorithms" below.)

In addition, distributed systems make (central) administration more difficult, especially with unstructured topologies. Depending on the application, millions of differently configured computers, possibly belonging to complete strangers, may come together.

Models

Various communication models are assumed for distributed systems.

Asynchronous model
In the asynchronous model, processes have only the states active and passive. Only an active process sends messages. An active process can become passive at any time, whereas a passive process can only be reactivated by a message.
Synchronous model
In the synchronous model, messages themselves have no transit time. In practice this behavior is achieved by means of synchronous communication.
Atomic model
In the atomic model, messages have a transit time, but the processes themselves take no time.

Algorithms

Clock synchronization algorithms

Logical clocks
Logical clocks assign unambiguous timestamps to events. In contrast to real-time clocks, the aim is not to measure physical time but only to provide a monotonically increasing time value that makes the causal order of events recognizable.
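The best-known logical clock is Lamport's: a counter that is incremented on every local event and send, and on receipt is set past the timestamp carried by the message. This guarantees that causally related events receive increasing timestamps. A minimal sketch:

```python
class LamportClock:
    """Lamport's logical clock: a monotonically increasing counter whose
    update rules order causally related events."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time  # timestamp attached to the outgoing message

    def receive(self, msg_time):
        # Jump past the sender's timestamp so cause precedes effect.
        self.time = max(self.time, msg_time) + 1
        return self.time
```

The converse does not hold: two events with ordered timestamps need not be causally related, which is why these clocks only make the causal order *recognizable*, not fully recoverable.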
Physical clock synchronization

Broadcast algorithms

The aim of a broadcast is to distribute information throughout the network.

Examples:

Election algorithms

Election algorithms can be divided into two categories: algorithms that select a unique node from a set of identical nodes, and maximum algorithms that select the node with the largest ID from a set of nodes with unique IDs.
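A maximum algorithm of the second kind can be illustrated with a Chang–Roberts-style election on a unidirectional ring: every node sends its own ID around the ring, a node forwards an incoming ID only if it is larger than its own, and a node that receives its own ID back knows it holds the maximum and is the leader. The sequential simulation below assumes unique IDs:

```python
from collections import deque

def ring_election(ids):
    """Chang-Roberts-style maximum election on a unidirectional ring
    (sequential simulation; requires unique IDs).
    `ids[i]` is the ID of node i; node i sends to node (i + 1) % n."""
    n = len(ids)
    # channels[i] holds messages in transit from node i-1 to node i;
    # initially every node has sent its own ID to its successor.
    channels = [deque([ids[(i - 1) % n]]) for i in range(n)]
    leader = None
    while leader is None:
        for i in range(n):
            while channels[i]:
                msg = channels[i].popleft()
                if msg == ids[i]:
                    leader = ids[i]  # own ID returned: node i is leader
                elif msg > ids[i]:
                    channels[(i + 1) % n].append(msg)  # forward larger ID
                # IDs smaller than our own are swallowed
    return leader
```

Swallowing smaller IDs keeps the message count down; only the maximum ID survives a full trip around the ring.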

Examples:

Concurrency control

See also

Literature

Web links
