Redundancy (from the Latin redundare, "to overflow, to pour out abundantly") is the additional presence of functionally identical or comparable resources in a technical system that are not required during normal, fault-free operation. Such resources can be, for example, redundant information, motors, assemblies, complete devices, control lines and power reserves. As a rule, these additional resources serve to increase resistance to failure, functional reliability and operational safety.
Different types of redundancy are distinguished. Functional redundancy aims to implement safety systems several times in parallel so that, if one component fails, the others continue to provide the service. In addition, the redundant systems are separated spatially where possible, which reduces the risk that a single disturbance affects all of them at once. Finally, components from different manufacturers are sometimes used so that a systematic error cannot cause all redundant systems to fail (diverse redundancy). The software of redundant systems should differ as far as possible in the following aspects: specification (different teams), specification language, programming (different teams), programming language and compiler.
Classification of redundancy designs
- Hot redundancy (hot spare) means that several subsystems within the overall system perform the same function in parallel. Usually two units working in parallel are used, each of which can carry out the task on its own if the other fails. It must be ensured that the probability of both units failing at the same time is negligibly small. In the simplest case, if one unit fails, the load otherwise shared between the two is taken over by the remaining working unit, without the need for a separate switching process. In industrial safety technology, a monitoring device detects the failure of an individual component and initiates a suitable error reaction (e.g. an error message or a machine shutdown). The probability of a simultaneous failure of both units is evaluated, for example according to DIN EN ISO 13849, in relation to the risk arising from an error. In electronics, one option is a voter that evaluates the results of at least three parallel systems and passes on the majority result.
- Cold redundancy means that several instances of a function are present in the system in parallel, but only one of them is active. The active instance is monitored, and in the event of an error a switch changes over to the instance present in parallel. The switchover time must be acceptable for the overall task, and the system must operate with predictable tasks. The reliability of the switch must be far greater than that of the functional elements.
- Standby redundancy (passive redundancy): additional resources are switched on or kept available, but they only take part in the intended task in the event of a failure or malfunction.
- N + 1 redundancy, also known as operational redundancy, means that a system consists of n functioning units that are active at any given time plus one passive standby unit. If an active unit fails, the standby unit takes over its function. If a further active unit fails, the system is no longer fully available and is generally considered to have failed. For sufficient maintenance redundancy, at least one further unit must be added, but this excess capacity incurs higher costs. With (n-1) security, on the other hand, also known as the N-1 rule or (n-1) criterion, security of supply in a network is guaranteed even if one component fails, without overloading the remaining resources. The difference between N + 1 and N-1 is that with N + 1 the individual units can be fully utilized in normal operation while the redundant unit remains unloaded; with N-1 there is no dedicated redundant unit, but all units are so lightly loaded in normal operation that together they provide enough spare capacity to compensate for the failure of any one unit.
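To make the difference between N + 1 and N-1 concrete, the following Python sketch computes the per-unit load in normal operation for both arrangements and checks whether the remaining units can still carry the full demand after any single failure. It is a minimal illustration under simplifying assumptions: all units are identical, the load divides evenly among the running units, and the function names (`n_plus_1_loads`, `n_minus_1_loads`, `survives_single_failure`) are hypothetical.

```python
from typing import List

# Hypothetical sketch: identical units, load shared evenly among running units.


def n_plus_1_loads(n_active: int, demand: float) -> List[float]:
    """N + 1: n_active units share the demand; the one standby unit stays unloaded."""
    share = demand / n_active
    return [share] * n_active + [0.0]


def n_minus_1_loads(n_units: int, demand: float) -> List[float]:
    """N-1: every unit runs, each only partially loaded; there is no dedicated standby unit."""
    share = demand / n_units
    return [share] * n_units


def survives_single_failure(n_units: int, unit_capacity: float, demand: float) -> bool:
    """True if any n_units - 1 of the identical units can still carry the whole demand."""
    if n_units < 1:
        raise ValueError("at least one unit is required")
    return (n_units - 1) * unit_capacity >= demand


if __name__ == "__main__":
    # Four units of capacity 100 serving a demand of 300:
    print(n_plus_1_loads(3, 300.0))                  # [100.0, 100.0, 100.0, 0.0]
    print(n_minus_1_loads(4, 300.0))                 # [75.0, 75.0, 75.0, 75.0]
    print(survives_single_failure(4, 100.0, 300.0))  # True
    print(survives_single_failure(4, 100.0, 350.0))  # False
```

In this example the same four units tolerate one failure in both arrangements; they differ only in how the load is distributed before the failure. Raising the demand to 350 shows the point at which four units of capacity 100 are no longer single-failure tolerant.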
When setting up a redundant system, two approaches to the choice of components can be distinguished, as used for example in connection with IEC 61508:
- With homogeneous redundancy, identical components work in parallel. Using identical components reduces the development effort, but this design only protects against random failures, e.g. due to aging, wear or bit errors. With homogeneous redundancy, the probability of a total failure due to systematic errors (e.g. design errors) is higher, since all components share the same error.
- With diverse redundancy, different components from different manufacturers, of different types and/or based on different functional principles work together.
  - In electronic circuits, there is a good chance that, in addition to random failures, systematic errors (e.g. design errors) are also detected during operation. Because development is more complex (for example, differing calculation times must be compensated, different controllers integrated and more tests performed), the effort is correspondingly higher.
  - For example, the Pentium FDIV bug would not be detectable with homogeneous redundancy. If the system has a diverse redundant structure, for example with one Intel and one AMD processor, a voter could recognize differing calculation results as an error. Typical applications are aerospace and industrial safety controllers.
  - By connecting in series two contactors with different current switching capacities, uniform wear of both contactors, and thus a possible simultaneous failure, can be avoided.
  - In fluid power, connecting in series a poppet (seat) valve monitored by a pressure switch and a spool (slide) valve with position monitoring reduces the probability that both valves or their monitoring devices fail due to a common fault.
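The following Python sketch illustrates why homogeneous redundancy cannot reveal a systematic error while diverse redundancy can. `divide_design_a` contains a deliberately injected, hypothetical design error for one divisor. It only loosely imitates the FDIV bug: the operands are the pair often quoted for that bug, but the size of the error is invented. `divide_design_b` computes the same quotient by a different principle, exact rational arithmetic with Python's `fractions.Fraction`. A simple comparator plays the role of the voter; all function names are hypothetical.

```python
import math
from fractions import Fraction


def divide_design_a(x: float, y: float) -> float:
    """Design A: floating-point division with a hypothetical injected design error."""
    q = x / y
    if y == 3145727.0:
        # Hypothetical systematic error: wrong quotient for this particular divisor
        q *= 1.0 + 1e-5
    return q


def divide_design_b(x: float, y: float) -> float:
    """Design B: a different principle, exact rational division rounded to float."""
    return float(Fraction(x) / Fraction(y))


def channels_agree(r1: float, r2: float, rel_tol: float = 1e-12) -> bool:
    """Comparator (voter): True if the two channel results agree within rel_tol."""
    return math.isclose(r1, r2, rel_tol=rel_tol)


x, y = 4195835.0, 3145727.0

# Homogeneous redundancy: two copies of the same design make the same mistake.
homogeneous_ok = channels_agree(divide_design_a(x, y), divide_design_a(x, y))

# Diverse redundancy: the independently designed channel exposes the discrepancy.
diverse_ok = channels_agree(divide_design_a(x, y), divide_design_b(x, y))

print("homogeneous channels agree:", homogeneous_ok)  # True  -> error not detected
print("diverse channels agree:", diverse_ok)          # False -> error detected
```

A two-channel comparison of this kind only shows that the channels disagree, not which one is wrong; identifying the faulty channel requires at least three channels.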
Failure behavior of redundant systems
If an error occurs in a redundant system, its failure behavior is described by the following terms:
- Fail-safe means that, in the event of a fault, the failed system is no longer available and assumes a controllable state. Additional measures in the system must ensure that the failure of a component leads to a controllable end result. One example is a hydraulic cylinder for manual operation that is dimensioned larger than the one used by the automatic system. This guarantees that a faulty automatic system can always be "overruled" by manual intervention.
- Fail-passive means that the system is built from two fail-safe systems and has error detection and error suppression. The two systems must be able to compare their output results. If they come to different results, the resulting output must be zero. The system thus behaves passively.
- Fail-operational means that the system continues to work in the event of a fault. The system does not enter an error state; it remains operational. To achieve this, the system must consist of at least three subsystems, each of which has fault diagnosis and error suppression. By comparing the subsystems with one another, it is possible to determine both that an error has occurred and which subsystem is faulty. This system structure can therefore also be described as fault-tolerant.
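The comparison logic of the fail-passive and fail-operational behaviors can be sketched in Python. `fail_passive` compares two channels and outputs zero on disagreement; `fail_operational` takes a majority vote over at least three channels and reports which channels deviate from the majority. This is a minimal sketch: the function names are hypothetical, 0.0 stands in for whatever the passive "zero" output of a real system is, and results are compared for exact equality, whereas real voters usually compare within a tolerance.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def fail_passive(a: float, b: float) -> float:
    """Two fail-safe channels: pass on the result if they agree, otherwise output zero."""
    return a if a == b else 0.0


def fail_operational(results: Sequence[float]) -> Tuple[Optional[float], List[int]]:
    """Majority vote over at least three channels.

    Returns the majority result and the indices of the channels that deviate
    from it. Without a strict majority, returns (None, []) because the faulty
    channel cannot be identified.
    """
    if len(results) < 3:
        raise ValueError("fail-operational voting needs at least three channels")
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        return None, []
    faulty = [i for i, r in enumerate(results) if r != value]
    return value, faulty


print(fail_passive(3.0, 3.0))             # 3.0
print(fail_passive(3.0, 4.0))             # 0.0
print(fail_operational([5.0, 5.0, 7.0]))  # (5.0, [2])
print(fail_operational([1.0, 2.0, 3.0]))  # (None, [])
```

With three channels, a single faulty channel is outvoted and identified, so the system keeps working; with two channels only a disagreement can be detected, which is why the fail-passive variant falls back to its zero output.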
Redundancies are also required in companies. In production technology, they serve to reduce or eliminate the risk of business interruption. In the context of securing production, risk diversification distinguishes the following types of redundancy:
- Regional diversification: the same product is manufactured at different sites (parallel production);
- Property-related diversification: the risk of failure of technical systems is reduced or eliminated by replacement systems that can be used immediately. This also includes the use of emergency power generators in the event of power outages;
- Personal diversification: for example, several board members travel separately to the same destination.
Operational functions must be examined to determine whether a lack of redundancy can lead to disruptions in the production process. At Volkswagen, for example, the delivery stop by two automotive suppliers in August 2016 exposed the weakness that excessive dependency in the procurement of vehicle parts, combined with just-in-time production, can lead to immediate production stoppages.
See also:
- Fallback level: non-equivalent but stable functions for damage or emergency situations;
- Single point of failure: a lack of redundancy at a single point or element creates a weak point in the system; at mission-critical times, insufficient redundancy can lead to total failure;
- Parallelism, for example of means of transport and of routes;
- Fault-tolerant control system.