Lockstep (computer technology)

from Wikipedia, the free encyclopedia

In computer technology, particularly for processors and microcontrollers, lockstep is a method for fault tolerance and fault detection in hardware. It is achieved by operating several identical or similar units, such as the CPU cores of a multi-core processor, in parallel. The name derives from lockstep marching, a form of marching in close, synchronized step found in the military and in some prisons. Processors that support lockstep operation, such as the ARM Cortex-R designed for this purpose or the processors of the MPC57xx PowerPC series, are used primarily in safety-critical applications such as engine control units (ECUs) in vehicles and full authority digital engine control (FADEC) in aircraft.

Procedure

Simplified function diagram of two CPU cores in lockstep mode with error detection by comparing the results

A lockstep processor is structured like a parallel computer. However, the parallel arrangement does not serve to increase computing power; instead, all CPU cores run exactly the same program or algorithm. The results of the individual cores are checked and evaluated by comparison in discrete, tightly timed, non-interruptible steps. The redundancy achieved in this way allows hardware failures in one of the cores to be detected and responded to, as with a dual core in lockstep mode. With three cores, a single fault can be not only detected by majority decision but also corrected to a certain extent.
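The difference between detection with two cores and correction with three can be illustrated in software. The following is only a minimal sketch: real lockstep comparison is done in hardware on every step, and the names `checksum_step`, `compare_dual`, and `vote_triple`, as well as the injected bit flip, are hypothetical and chosen for illustration.

```python
from collections import Counter


def checksum_step(data, flip_bit=None):
    """One program step executed by a core (hypothetical workload).

    If flip_bit is given, a single bit of the result is inverted to
    imitate a hardware fault in that core.
    """
    total = sum(data) & 0xFFFF
    if flip_bit is not None:
        total ^= 1 << flip_bit
    return total


def compare_dual(result_a, result_b):
    """Dual-core lockstep: a mismatch can be detected, not corrected."""
    return result_a == result_b


def vote_triple(results):
    """Triple-core lockstep: majority vote over three results.

    Returns the majority value and the indices of disagreeing cores.
    Raises RuntimeError if no two cores agree.
    """
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority among cores")
    faulty = [i for i, r in enumerate(results) if r != value]
    return value, faulty


data = [1, 2, 3]
good = checksum_step(data)
bad = checksum_step(data, flip_bit=3)  # fault injected into one core

dual_ok = compare_dual(good, good)
dual_fault = compare_dual(good, bad)
voted_value, faulty_cores = vote_triple([good, bad, good])
```

In the dual-core case the comparison only reports that the two results differ; it cannot tell which core is wrong. In the triple-core case the vote both masks the single fault and identifies the disagreeing core.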

To reliably detect faults that affect all cores operating in parallel, known as common-cause failures, the individual processors are operated with an offset of a few clock cycles relative to one another. As a result, the common cause hits the individual cores in different states, and the fault shows up as a deviation in the subsequent comparison of results.
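The effect of the clock offset can be shown with a small simulation. This is a sketch under simplifying assumptions: the per-step computation, the glitch model (one bit flipped in every core at the same wall-clock cycle), and all function names are hypothetical and not taken from any particular processor.

```python
MASK = 0xFFFFFFFF


def step(acc, i):
    """Deterministic program step (hypothetical workload)."""
    return (acc * 31 + i) & MASK


def run_core(n_steps, start_cycle, glitch_cycle):
    """Run one core and return its output after each program step.

    The core executes program step i at wall-clock cycle start_cycle + i.
    A common-cause glitch at wall-clock cycle glitch_cycle flips bit 0
    of the accumulator just before the step executed in that cycle.
    """
    acc = 0
    outputs = []
    for i in range(n_steps):
        if start_cycle + i == glitch_cycle:
            acc ^= 1
        acc = step(acc, i)
        outputs.append(acc)
    return outputs


def common_cause_detected(offset, n_steps=8, glitch_cycle=4):
    """Compare the outputs of two cores step by step.

    Core B starts `offset` cycles after core A; the comparator is
    assumed to delay core A's outputs accordingly, so results for the
    same program step are compared.
    """
    a = run_core(n_steps, 0, glitch_cycle)
    b = run_core(n_steps, offset, glitch_cycle)
    return any(x != y for x, y in zip(a, b))


detected_without_offset = common_cause_detected(offset=0)
detected_with_offset = common_cause_detected(offset=2)
```

Without an offset, both cores are corrupted in the same program step and keep producing identical (wrong) results, so the comparison sees nothing. With an offset of two cycles, the glitch hits the cores in different program steps and the outputs diverge.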

With this method, error detection is limited to detecting and, where applicable, correcting hardware-related failures in the electronic component, single-event upsets (SEUs), and latch-up effects in one of the CPU cores or in the area of its separate power supply. Systematic errors in the program code or design errors in the algorithm can be neither detected nor compensated for by lockstep.
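Why systematic errors escape detection can be seen in a short sketch: every core runs the same faulty program and therefore produces the same wrong result, so the comparison passes. The functions `buggy_average` and `lockstep_check` are hypothetical and serve only as an illustration.

```python
def buggy_average(values):
    """Program with a systematic bug: divides by 2 instead of len(values)."""
    return sum(values) // 2


def lockstep_check(program, inputs, n_cores=2):
    """Run the same program on every core and compare the results.

    Returns (all_agree, result_of_first_core).
    """
    results = [program(inputs) for _ in range(n_cores)]
    all_agree = all(r == results[0] for r in results)
    return all_agree, results[0]


inputs = [3, 6, 9]
agree, value = lockstep_check(buggy_average, inputs)
expected = sum(inputs) // len(inputs)  # the intended average
```

Here `agree` is true although `value` differs from `expected`: the cores agree with each other, but they agree on a wrong answer.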

Literature

  • Hans-Leo Ross: Functional Safety for Road Vehicles: New Challenges and Solutions for E-mobility and Automated Driving. Springer, 2016, ISBN 978-3-319-33360-1.
  • A. Avizienis, H. Kopetz, J.-C. Laprie: The Evolution of Fault-Tolerant Computing. Springer Science & Business Media, 2012, ISBN 978-3-7091-8871-2.

References

  1. M. Baleani et al.: Fault-Tolerant Platforms for Automotive Safety-Critical Applications. International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), October 30, 2003, archived from the original on August 9, 2017; retrieved March 8, 2018 (Chapter 3: A Survey of Fault-Tolerant Multi-Processor Architectures).
  2. Safety Manual for MPC5744P. NXP (company publication), accessed on March 17, 2020 (Chapter 3.2: Dual-core lockstep).