Fault-tolerant control system

Architecture of a fault-tolerant control system. It consists of a normal control loop extended by a monitoring level. The monitoring level contains a module for fault diagnosis (fault detection, isolation and identification, FDI), which monitors the control loop by observing its input and output variables and determines the fault status of the controlled system. The fault estimate is passed to a second module, which adapts the controller so that at least the minimum required control objectives are still met.

A fault-tolerant control system is a technical system that continues to fulfil its function after a fault has occurred. Fault-tolerant control is a sub-field of control engineering that adds a monitoring level to the control loop level in order to improve reliability. The monitoring level essentially performs two functions: fault diagnosis and the adaptation of the controller to the current fault status of the controlled system.

Overview of fault-tolerant control systems

Two prerequisites must be met to implement a fault-tolerant control loop. First, the control devices themselves must meet minimum reliability requirements, that is, they must be at least duplicated and monitor each other. Second, the control logic must be able to react to faults in the controlled system. Both aspects aim to reduce the effect of faults: in the first case faults in the controller, in the second case faults in the controlled system, including its actuators and sensors.

At least single redundancy is required to detect faults and react to them. Physical redundancy means that physical components such as measuring devices or actuators are present more than once. Analytical redundancy denotes the possibility of calculating a measured variable, or influencing a manipulated variable, in more than one way. One example is the reconstruction of a missing measured value by an observer after a sensor failure, provided the faulty plant is still observable.
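The observer example can be sketched as follows. This is only an illustration under assumed values: the second-order plant, the sampling time `dt`, the damping `a`, the input `u` and the observer gains `l1`, `l2` are hypothetical and were chosen so that the estimation error decays (both error poles at 0.5). Only the first state is measured; the second state, whose sensor is assumed to have failed, is reconstructed from it.

```python
def estimate_failed_sensor(steps=60, dt=0.1, a=1.0, l1=0.9, l2=1.6, u=0.5):
    """Luenberger-type observer for an assumed plant
        x1[k+1] = x1[k] + dt * x2[k]
        x2[k+1] = (1 - dt * a) * x2[k] + dt * u
    where only x1 is measured. Returns (true x2, estimated x2)."""
    x1, x2 = 0.0, 1.0            # true plant state
    x1_hat, x2_hat = 0.0, 0.0    # observer starts with a wrong x2 estimate
    for _ in range(steps):
        y = x1                   # the only remaining measurement
        innovation = y - x1_hat  # output estimation error
        # Observer: copy of the plant model plus correction by the innovation.
        x1_hat, x2_hat = (
            x1_hat + dt * x2_hat + l1 * innovation,
            (1 - dt * a) * x2_hat + dt * u + l2 * innovation,
        )
        # True plant update.
        x1, x2 = x1 + dt * x2, (1 - dt * a) * x2 + dt * u
    return x2, x2_hat
```

After enough steps the estimate can stand in for the missing measurement; with measurement noise or model mismatch the gains would have to be chosen more conservatively.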

Except in passive methods of fault-tolerant control, a fault must always be diagnosed before it can be responded to. Fault diagnosis comprises the detection (recognition), isolation (determination of the defective component) and identification (quantification) of faults in technical systems. It is not to be confused with error-correction methods in data transmission.

Passive fault-tolerant control

In passive fault-tolerant control, the controller is designed in advance so that certain faults are tolerated at runtime without any intervention in the controller, and a disturbed signal is masked out. Robust control methods in particular are used for this purpose. Methods for simultaneous stabilization and simultaneous observation are also suitable in principle for passive fault-tolerant control.
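The idea of one fixed controller tolerating a whole set of faults can be illustrated with a small check. This is not a robust design method, only a sketch under assumptions: a hypothetical scalar plant x[k+1] = a·x[k] + e·b·u[k] with state feedback u[k] = -gain·x[k], where the factor e models the remaining actuator effectiveness (1.0 = healthy).

```python
def closed_loop_pole(a, b, gain, effectiveness):
    """Pole of x[k+1] = a*x[k] + effectiveness*b*u[k] with u[k] = -gain*x[k]."""
    return a - effectiveness * b * gain


def tolerates(a, b, gain, effectiveness_values):
    """True if the fixed gain keeps the loop stable for every listed fault case."""
    return all(
        abs(closed_loop_pole(a, b, gain, e)) < 1.0 for e in effectiveness_values
    )
```

For the assumed values a = 1.2, b = 1.0 and gain = 1.0, the loop stays stable while the actuator retains between 50 % and 100 % of its effectiveness, but not at 10 %; faults outside the designed range are exactly what passive methods cannot cover.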

Active fault-tolerant control

The term active fault-tolerant control covers methods that change the controller at runtime in order to react to the fault. The prerequisite is a fault model, i.e. a minimum level of knowledge about the fault that has occurred. The advantage is that much more serious faults can be handled than with passive methods.

In fault accommodation, only the controller is adapted to the fault situation, in its structure and its parameters. Reconfiguration additionally allows the control loop structure to be changed, so that completely failed sensors or actuators can be bypassed.
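The difference between the two approaches can be sketched in a few lines. The names `LoopConfig`, `accommodate` and `reconfigure` are hypothetical, and rescaling the gain by the estimated actuator effectiveness is just one simple assumed accommodation rule.

```python
from dataclasses import dataclass


@dataclass
class LoopConfig:
    actuator: str   # actuator currently used in the loop
    gain: float     # controller gain


def accommodate(config, effectiveness):
    """Accommodation: keep the loop structure, adapt only the controller.
    Here the gain is rescaled so that the loop gain stays unchanged."""
    if effectiveness <= 0.0:
        raise ValueError("actuator has failed completely; reconfigure instead")
    return LoopConfig(config.actuator, config.gain / effectiveness)


def reconfigure(config, healthy_actuators, gains):
    """Reconfiguration: change the loop structure by bypassing the failed
    actuator and switching to a healthy one with its own pre-designed gain."""
    for name in healthy_actuators:
        if name != config.actuator:
            return LoopConfig(name, gains[name])
    raise RuntimeError("no redundant actuator available")
```

A valve that has lost half of its effectiveness would be handled by `accommodate`, whereas a valve that is stuck requires `reconfigure` and therefore a redundant actuator.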

Faults

Classification

Actuator faults (e.g. valves, motors), sensor faults (e.g. thermocouples, flow meters) and internal plant faults (e.g. clogged lines, leaking vessels). These faults affect the controlled system in the broader sense, including sensors and actuators. Reacting to such faults is the actual concern of fault-tolerant control systems.

Faults in the control device (e.g. CPU, memory, bus system) or its software. In practice, faults in the computing hardware are handled by arranging critical devices several times in parallel, in different technological implementations and with mutual monitoring. The achievable reliability depends on the number of redundant components and on the evaluation logic. Systematic faults are countered by diversity in the technological implementation. This problem is not a core topic of fault-tolerant control; see also reliability.

Faults are not to be confused with disturbance signals, which do not represent a malfunction of a component. In a room heating system, a jammed thermostatic valve is a fault, whereas an open window is a disturbance. Both must be distinguished from a malfunction, which in engineering denotes a failure or incorrect behaviour.

Standards for classifying faults

The most important classification is described in IEC 61508. AK is the requirement class (Anforderungsklasse) according to the German pre-standard DIN V 19250, which was withdrawn in 2004. SIL stands for Safety Integrity Level, a current scheme for classifying industrial devices (IEC 61508, IEC 61511, VDI/VDE 2180).

SIL1, AK 2 & 3: Minor damage to equipment and property
SIL2, AK 4: Major damage to systems, personal injury
SIL3, AK 5 & 6: Injury to people, some deaths
SIL4, AK 7: Disasters, many deaths and serious environmental pollution
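For use in tooling, the correspondence listed above can be held in a simple lookup table. The names `AK_TO_SIL` and `sil_for_ak` are hypothetical, and only the requirement classes named in the list are included; other classes return `None`.

```python
# Correspondence between DIN V 19250 requirement classes (AK) and SIL,
# restricted to the classes named in the list above.
AK_TO_SIL = {2: 1, 3: 1, 4: 2, 5: 3, 6: 3, 7: 4}


def sil_for_ak(ak):
    """Return the SIL corresponding to a requirement class, or None if the
    class does not appear in the list."""
    return AK_TO_SIL.get(ak)
```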

Fault diagnosis

Fault detection, isolation and identification

Fault diagnosis comprises three steps:

Fault detection
Fault isolation
Fault identification

These are explained below. Fault detection is the binary decision as to whether or not a fault has occurred in a system. Once a fault is known to have occurred, successfully adapting the controller requires at least knowledge of the affected component, i.e. the specific actuator, sensor or plant component. By switching off the affected component and switching over to a redundant one, it is already possible to react to the fault with this coarse knowledge.

However, a defective component that may still be partly functional then goes unused. Only when a quantitative model of the extent of the fault has been identified is it possible to react optimally. In practice, faults are difficult to identify precisely, so it is often necessary to work on the basis of detection and isolation alone.
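The three steps can be illustrated with a deliberately simple sketch. It assumes that a model prediction is available for every sensor, that each residual (measurement minus prediction) is affected by exactly one sensor, and that a fault is a constant offset; the function name `diagnose` and the data layout are hypothetical.

```python
from statistics import mean


def diagnose(measured, predicted, threshold):
    """measured, predicted: dicts mapping a sensor name to a list of samples.
    Returns (detected, faulty_sensors, estimated_offsets)."""
    residuals = {
        name: [m - p for m, p in zip(measured[name], predicted[name])]
        for name in measured
    }
    # Isolation: which sensors show a residual above the threshold?
    faulty = [
        name for name, r in residuals.items()
        if max(abs(v) for v in r) > threshold
    ]
    # Detection: the binary decision whether any fault is present.
    detected = bool(faulty)
    # Identification: quantify the fault, here as the mean residual (offset).
    offsets = {name: mean(residuals[name]) for name in faulty}
    return detected, faulty, offsets
```

With only the first two results, the affected sensor can already be switched off; the estimated offset additionally allows the measurement to be corrected and kept in use.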

Overview of diagnostic procedures

Fault detection and isolation with

Threshold monitoring
Signal model
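Threshold monitoring is the simplest of these procedures: a signal is compared with fixed limits. The sketch below adds a persistence count so that a single outlier does not raise an alarm; this debouncing, the function name and the parameter names are assumptions made for illustration.

```python
def threshold_monitor(samples, low, high, persistence=3):
    """Return the index at which a fault is declared, i.e. the sample that
    completes `persistence` consecutive limit violations, or None."""
    run = 0
    for i, value in enumerate(samples):
        if value < low or value > high:
            run += 1
            if run >= persistence:
                return i
        else:
            run = 0   # violation was not persistent; start counting again
    return None
```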

Fault detection and isolation with a process model

Identification filter
Parity equations
State observer-based diagnosis
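As an illustration of the model-based procedures, a parity equation checks measured data against the process model and yields a residual that is zero in the fault-free case. The sketch assumes a hypothetical first-order model y[k] = a·y[k-1] + b·u[k-1] and an additive sensor offset; `simulate_sensor` only generates test data for it.

```python
def parity_residuals(u, y, a, b):
    """Residual r[k] = y[k] - a*y[k-1] - b*u[k-1] for k = 1 .. len(y)-1."""
    return [y[k] - a * y[k - 1] - b * u[k - 1] for k in range(1, len(y))]


def simulate_sensor(u, a, b, fault_step, offset):
    """Simulate the assumed plant and add a sensor offset from fault_step on."""
    x = 0.0
    y = []
    for k in range(len(u)):
        y.append(x + (offset if k >= fault_step else 0.0))
        x = a * x + b * u[k]
    return y
```

For an offset f that appears at step k0, the residual is f at k0 and (1 - a)·f afterwards, so the fault becomes visible in the residual even though the measured signal itself may stay within its normal range.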

Fault identification

Classification methods (neural networks, Bayesian networks)
Knowledge-based diagnosis based on the inference methods of artificial intelligence

Active adaptation of the controller

Accommodation

Adaptive control

Reconfiguration

Voting

Circuit example for valves in a pipeline for a 1oo3 system (a) and a 3oo3 system (b)

The term voting covers methods that require functionally identical systems to be installed in parallel. In aircraft, for example, several temperature sensors are installed in parallel. An evaluation logic compares the Y measured values, of which at least X must agree. IEC 61508 uses the notation XooY ("X out of Y") for such arrangements. The suffix D indicates self-diagnosis. However, multi-channel systems without the suffix D also require some form of diagnosis, so this distinction is usually hard to draw. Depending on the choice of X and Y, one, two or more faults can be reliably detected and attributed to the defective component. Typical schemes include:

1oo1: single-channel processing
1oo2: Redundant processing with cross-diagnosis. If a fault is detected, the entire system is switched off
2oo2: Redundant processing in which functionality is lost only when both systems fail
2oo3: Triple processing using the majority result: voting ('2 of 3' voters)

In functional safety in mechanical engineering, 1oo1 and 1oo2 systems are common, depending on the required safety level. In the process industry, and in systems that cannot reach a safe state within a short time (e.g. aircraft controls, nuclear power plants, chemical reactors), higher redundancy in the form of 2oo2, 2oo3 or 2oo4 control systems is also common. If a 2oo2 logic detects a fault, it switches over to the intact unit. With a 2oo3 logic, voting can determine with sufficient certainty which unit is faulty. A 2oo4 configuration remains stable even if two units fail at the same time. To maintain operational safety, the signal inputs of the redundant units and the results of their computations must be synchronized in every scan cycle; this is referred to as local simultaneity. Voting implements a reconfiguration of the control, because the signal path is changed when it is used in a control loop.
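A voter can be sketched as follows. The function names, the agreement tolerance `tol` and the choice of averaging the two agreeing channels are assumptions made for illustration; they are not taken from IEC 61508.

```python
from statistics import median


def xooy_trip(demands, x):
    """XooY logic on binary channel outputs: act if at least x channels demand it."""
    return sum(bool(d) for d in demands) >= x


def vote_2oo3(values, tol):
    """2oo3 voter for three analogue channels.
    Returns (voted value, index of the channel judged faulty or None)."""
    if len(values) != 3:
        raise ValueError("exactly three channels expected")

    def agree(i, j):
        return abs(values[i] - values[j]) <= tol

    ab, bc, ac = agree(0, 1), agree(1, 2), agree(0, 2)
    if ab and bc and ac:
        return median(values), None          # all channels agree
    if ab and not bc and not ac:
        return (values[0] + values[1]) / 2, 2
    if bc and not ab and not ac:
        return (values[1] + values[2]) / 2, 0
    if ac and not ab and not bc:
        return (values[0] + values[2]) / 2, 1
    if ab or bc or ac:
        return median(values), None          # borderline case, no clear outlier
    raise ValueError("no two channels agree")
```

For the readings 20.0, 20.1 and 35.0 with a tolerance of 0.5, the voter returns the mean of the first two channels and flags the third as faulty, which is the isolation property that distinguishes 2oo3 from two-channel schemes.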

The availability of the overall system generally decreases with an increasing number of sub-systems.
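The effect can be quantified for the idealized case of independent, identical units. The sketch computes the probability that at least k of n units are working, each with availability `a`; note that k here counts the units needed for operation, which is not necessarily the same as the X in the XooY trip logic, and that independence is an assumption that common-cause failures violate in practice.

```python
from math import comb


def k_out_of_n_availability(k, n, a):
    """Probability that at least k of n independent units, each available
    with probability a, are working (binomial sum)."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))
```

If every unit is required (k = n), the result is a to the power n, so availability falls with each added unit: for a = 0.99 it drops from 0.99 for one unit to about 0.970 for three. If only two of three units are required, it rises to about 0.9997 instead, which shows how strongly the result depends on the evaluation logic.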

Model adaptation

Pseudoinverse method
Perfect model adaptation
Adaptive model adaptation

Applications of fault-tolerant control systems

1oo1: pipelines

1oo2: gate opening systems, automated guided vehicles (AGV) (AK 4)

1oo3, 1oo4, 2oo2, 2oo3: gas turbines (AK 5)

2oo3: Airbus aircraft, chemical plants (AK 6)

2oo4: Space Shuttle (AK 6), nuclear power plants (AK 7)
