Soft error

from Wikipedia, the free encyclopedia

In computer science, a soft error is a particular kind of error, i.e. an unexpected and unintended state of a logic circuit or data memory. In contrast to errors that are caused, for example, by hardware defects and permanently alter the system, soft errors cause only temporary changes of state. Once the erroneous data has been corrected, the soft error has no further detectable effect on the system; in particular, the reliability of the system is not affected.

Soft errors are caused primarily by high-energy radiation, e.g. cosmic radiation or ionizing radiation emitted by radioactive substances. In a broader sense, soft errors can also be caused by (external) interference, e.g. signal crosstalk or noise.

History

Soft errors were first observed in early semiconductor memories, especially DRAMs. In these, information is stored as electrical charge, i.e. electrons, on a capacitor. Since each stored bit requires a capacitor with an associated access transistor, the capacitance is kept small so that a large number of memory cells fit on one chip. As integration density has increased from the original 1024 x 1 bit (Intel 1103) in 1970 to the 8 Gbit DRAMs current as of 2011, fewer and fewer electrons are available to distinguish a logical "0" from a "1".
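
To give a sense of scale, the number of electrons on a storage capacitor follows from the charge Q = C * V divided by the elementary charge. The sketch below is illustrative only: the capacitance and voltage values are assumed round numbers, not figures taken from any particular DRAM generation, and the function name is hypothetical.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per electron (exact SI value)


def electrons_on_capacitor(capacitance_farads: float, voltage_volts: float) -> float:
    """Return the number of electrons corresponding to the charge Q = C * V."""
    return capacitance_farads * voltage_volts / ELEMENTARY_CHARGE


if __name__ == "__main__":
    # Assumed, illustrative cell values (hypothetical, not from a datasheet).
    for cap_femtofarads in (50.0, 25.0, 10.0):
        n = electrons_on_capacitor(cap_femtofarads * 1e-15, 1.0)
        print(f"{cap_femtofarads:5.1f} fF at 1.0 V -> about {n:,.0f} electrons")
```

Halving the capacitance or the voltage halves the electron count, so a given amount of charge removed by a particle strike becomes a larger share of the stored signal.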

Flash memories, in which the information is likewise stored as electrons on insulated gates of MOS transistors, are just as sensitive as DRAMs. As semiconductor structures keep shrinking, SRAMs, which are inherently more stable and whose storage cell usually consists of six transistors, are also at risk.

Causes

If a high-energy particle "knocks away" some of the electrons on the storage capacitor, the state of the memory cell can change. The resulting error is reversible, i.e. it can be eliminated by rewriting the memory cell with the correct information.

Even the integrated circuit itself or its package contains a few unavoidable radioactive atoms that emit alpha particles when they decay. These helium nuclei, which consist of two protons and two neutrons, have a relatively large mass and therefore a very short range (a few cm in air, or up to approx. 0.1 mm in solids), as they quickly collide with other atoms along their path. Over this short path, however, an alpha particle can ionize many atoms, i.e. strip electrons from them, and thereby change the information stored in a memory cell.

Alpha radiation can also briefly change the state of a logic circuit; in sequential circuits, this transient can be captured and result in a lasting change of state.

Through the selection of improved materials, the failure rate caused by alpha radiation has been reduced over the last few decades.

Another source of interfering radiation is cosmic radiation, primarily fast neutrons. Being electrically neutral, they penetrate the earth's atmosphere largely unhindered and, through various complex processes such as interaction with the silicon of the semiconductor, generate ionizing particles, which in turn can change the stored information. Since neutrons are difficult to shield (if at all, then only at the system level, not at the IC level), cosmic radiation is now regarded as the main cause of soft errors.

If the high-energy radiation destroys the atomic structure of the circuit, a permanent defect (hard error) can result.

Protection against soft errors

The probability of soft errors occurring is known as the soft error rate (SER). Since it is usually very small, it is difficult to measure. To assess the sensitivity of the semiconductor circuit itself, the exposed chips (with the package etched open if necessary) are irradiated with a standardized alpha emitter and the resulting error rate is measured. This accelerated measurement yields an accelerated soft error rate (ASER).

Since the presence of radioactive atoms and cosmic radiation cannot be completely ruled out, circuit-level measures must be taken to reduce the effects of soft errors. One option is to introduce redundancy, so that errors can at least be reliably detected or, with suitable error correction methods, the failure of one or more (memory) bits can be detected and corrected in hardware.
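
As an illustration of redundancy with error correction, the following minimal sketch uses a Hamming(7,4) code, which adds three parity bits to four data bits and can correct any single flipped bit. It is one well-known example of such a method, chosen here for brevity; the article does not say which codes real memories use, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, syndrome); syndrome 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return c, syndrome


def hamming74_data(codeword):
    """Extract the 4 data bits from a 7-bit codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a soft error in one stored bit
    fixed, position = hamming74_correct(word)
    print("error at position", position, "-> data", hamming74_data(fixed))
```

This code corrects only single-bit errors per codeword; two flipped bits in the same codeword would be miscorrected, which is why practical schemes often add a further parity bit to detect double errors.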

In computer systems, software methods can also be used to check data integrity and, if necessary, restore it.
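
A software-side check might look like the following sketch, which stores a CRC-32 checksum alongside the data and falls back to a redundant copy when the checksum no longer matches. The two-copy layout and the helper names are assumptions made for illustration; only `zlib.crc32` is a real library call.

```python
import zlib


def checksum(payload: bytes) -> int:
    """Compute the CRC-32 checksum stored alongside the data."""
    return zlib.crc32(payload)


def flip_bit(payload: bytes, bit_index: int) -> bytes:
    """Simulate a soft error by flipping one bit of the payload."""
    corrupted = bytearray(payload)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def scrub(primary: bytes, backup: bytes, expected_crc: int) -> bytes:
    """Return an intact copy of the data, preferring the primary copy."""
    if zlib.crc32(primary) == expected_crc:
        return primary
    if zlib.crc32(backup) == expected_crc:
        return backup  # primary was corrupted; restore from the backup
    raise ValueError("both copies fail the integrity check")


if __name__ == "__main__":
    original = b"configuration block"
    crc = checksum(original)
    damaged = flip_bit(original, 13)  # soft error in the primary copy
    restored = scrub(damaged, original, crc)
    print("restored correctly:", restored == original)
```

A checksum alone only detects corruption; restoring the data requires some redundant source, here a second copy.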

For components that are used in the automotive industry and are to be qualified according to AEC-Q100, the current standard recommends tests according to JESD89 if they contain SRAM/DRAM blocks larger than 1 Mbit.

Sources

  1. http://www.aecouncil.com/AECDocuments.html