Reed-Solomon Code

Reed-Solomon codes ( RS codes for short ) are a class of cyclic block codes . They are used in the context of channel coding to detect and correct transmission or storage errors as part of a forward error correction . They are a subclass of the general class of BCH codes . RS codes are MDS codes , which means that they are considered optimal codes within the framework of coding theory .

Reed-Solomon codes were developed around 1960 by Irving S. Reed and Gustave Solomon at MIT Lincoln Laboratory , a research facility of the United States Department of Defense . At that time, however, the practical usability of these codes was limited because no efficient method for decoding was known. In 1969, Elwyn Berlekamp and James Massey presented an efficient decoding algorithm in the form of the Berlekamp-Massey algorithm , which can also be used for BCH codes .

Reed-Solomon codes were first used in the Voyager program of NASA in 1977. The first commercial application they found in 1982 in the error correction of compact discs . Today's applications cover a large area such as the DVB standard for the transmission of digital television signals, various mobile radio standards , digital audio broadcasting (DAB) and file formats such as PAR2 for data storage. Further application examples are two-dimensional barcodes ; so put z. B. the QR code , DataMatrix , Aztec code and the PDF417 Reed-Solomon for error correction of reading errors. In newer fields of application, RS codes are increasingly being replaced by more powerful codes such as the low-density parity check codes (LDPC) or turbo codes (TPC). This is the case, for example, in the DVB-S2 television standard , which uses LDPC for forward error correction.

motivation

Every message, for example a text fragment, can be encoded and transmitted as a sequence of numbers. A typical example for the coding of texts is the ASCII standard . If a coded message is transmitted from a sender to a recipient, there is a risk of transmission errors. This means that some numbers in the message will be deleted or corrupted. The recipient of the message has no way of noticing the transmission error, since a number per se cannot be seen to be correct or incorrect. However, a transmission error can be counteracted on the sender side by transmitting redundant information in addition to the message. By comparing the received message with the redundantly transmitted information, the recipient can then not only verify the integrity of the transmitted message, but also correct errors detected in the message. ${\ displaystyle k}$

In order to add redundancy to the message, the numbers in the message are interpreted as values of a polynomial at firmly agreed support points. A polynomial of degree or less can be represented as the sum of monomials . The coefficients of these monomials result from the solution of a linear system of equations. Due to the special form of this system, there is a solution formula, the Lagrange interpolation . The polynomial obtained in this way is extrapolated to further support points, so that the coded message consists of numbers. ${\ displaystyle k}$ ${\ displaystyle k-1}$ ${\ displaystyle k}$ ${\ displaystyle n> k}$

If a few numbers are deleted during the transmission, so that more than the numbers are retained, the polynomial can again be reconstructed from the correctly transmitted numbers by interpolation, and thus the original message can also be reconstructed by evaluating in the first interpolation points. In the case of an error-prone transmission with errors in only a few places, the original message can still be reliably reconstructed using a somewhat more complicated approach. The more redundancy selected, the more errors can be corrected. Twice as many erasures (namely ) can be corrected as falsifications , so reading systems that recognize erasures when the message is received and output with the user data usually lead to an improved correction capability. ${\ displaystyle k}$ ${\ displaystyle k}$ ${\ displaystyle nk}$ ${\ displaystyle (nk) / 2}$

The expressions occurring in the interpolation contain divisions and must therefore be carried out over a field . If the numbers - or symbols - of the message are chosen from the whole numbers , then the calculations take place in the rational numbers. In addition, the extrapolated values can become very large, which may not be able to be transmitted in the present transmission channel. To overcome these disadvantages, the calculations are carried out in a finite field . This has a finite number of elements that can be numbered in order to link them to the symbols of the message. The division - except by zero - can be carried out without restriction, and thus also the interpolation.

Reed-Solomon codes are suitable for correcting burst errors in data transmission. In the case of burst errors, faulty ("flipped") bits often appear as a coherent chain of errors in the data stream. For example, by a number of consecutive bits do not read correctly scratches on a CD with every revolution. With the CD, however, the data is still interlaced so that the correction capability remains as high as possible even in the event of burst errors.

definition

Let be a finite field with elements ( is then necessarily a prime power , prime). Different elements are now selected and fixed in pairs . ${\ displaystyle \ mathbb {F} _ {p}}$ ${\ displaystyle p}$ ${\ displaystyle p = q ^ {m}}$ ${\ displaystyle q}$ ${\ displaystyle n}$ ${\ displaystyle u_ {1}, \ dots, u_ {n} \ in \ mathbb {F} _ {p}}$

The set of code words of a Reed-Solomon code of length for messages of length over results from the value tuples of all polynomials with degrees smaller at the selected support points: ${\ displaystyle {\ text {RS}} (p, k, n)}$ ${\ displaystyle n}$ ${\ displaystyle k}$ ${\ displaystyle \ mathbb {F} _ {p}}$ ${\ displaystyle \ mathbb {F} _ {p} [x]}$ ${\ displaystyle k}$

{\ displaystyle C = \ left \ {a = (a_ {1}, \ dots, a_ {n}) \ in \ mathbb {F} _ {p} {} ^ {n} \; {\ Big |} \ ; a_ {j} = f (u_ {j}), \; j = 1, \ dots, n; {\ mathsf {where}} \ f \ in \ mathbb {F} _ {p} [x] \ { \ mathsf {with}} \ \ deg (f) <k \ right \}}

Support point sets

RS codes for various permissible sets of interpolation points are linear isomorphic. The bijective linear mapping that mediates the isomorphism results from Lagrange interpolation with respect to the first set of support points and evaluation in the second set of support points. In the first step, code words are converted into polynomials of a lower degree, so that the second step results in a code word again. ${\ displaystyle k}$

If an element is of the order or larger, for example ${\ displaystyle \ alpha \ in \ mathbb {F} _ {p}}$ ${\ displaystyle n}$

{\ displaystyle u_ {1} = 1, \, u_ {2} = \ alpha, \, \ dots, u_ {j} = \ alpha ^ {j-1}, \ dots, u_ {n} = \ alpha ^ {n-1}}

to get voted. Every finite field contains a generating or primitive element of the multiplicative group , that is, an element of order . Hence, this particular choice is forever possible. ${\ displaystyle \ mathbb {F} _ {p} {} ^ {*} = \ mathbb {F} _ {p} \ setminus \ {0 \}}$ ${\ displaystyle p-1}$ ${\ displaystyle n = p-1}$

Are the reference points exactly the powers of an element of order , so the RS code is a cyclic code . Because the code word for the polynomial results from rotating the code word to by digits to the left. This variant is generally preferred because it is easier to implement cyclic codes. ${\ displaystyle u_ {1} = 1, \; u_ {j} = \ alpha ^ {j-1} \ neq 1, \; j = 2, \ dots, n,}$ ${\ displaystyle \ alpha \ in \ mathbb {F} _ {p}}$ ${\ displaystyle n}$ ${\ displaystyle \ alpha ^ {n} = 1}$ ${\ displaystyle f_ {j} (x) = f (\ alpha ^ {j} x)}$ ${\ displaystyle f (x)}$ ${\ displaystyle j}$

Encoding messages

One can convert a message directly into a codeword by using the components as coefficients of a polynomial ${\ displaystyle (a_ {1}, a_ {2}, \ dots, a_ {k}) \ in \ mathbb {F} _ {p} {} ^ {k}}$

{\ displaystyle f (x) = a_ {1} + a_ {2} \, x + a_ {3} \, x ^ {2} + \ dots + a_ {k} \, x ^ {k-1} = \ sum _ {i = 1} ^ {k} a_ {i} \, x ^ {i-1} \ in \ mathbb {F} _ {p} [x]}

uses and evaluates this at the support points . This results in a code word ${\ displaystyle u_ {1}, u_ {2}, \ dots, u_ {n} \ in \ {0,1, ..., p-1 \}}$

{\ displaystyle c = (c_ {1}, c_ {2}, \ dots, c_ {n}) = {\ Big (} f (u_ {1}), f (u_ {2}), \ dots, f (u_ {n}) {\ Big)} \ in \ mathbb {F} _ {p} {} ^ {n}}

the length . ${\ displaystyle n}$

A systematic coding is obtained , in which the message is contained in the first components in "clear text", through a preparatory transformation of the message. The polynomial leading to the code word results here as the interpolation polynomial of the pairs ${\ displaystyle k}$ ${\ displaystyle f (x)}$

{\ displaystyle {\ Big (} (u_ {1}, a_ {1}), \, (u_ {2}, a_ {2}), \, \ ldots, \, (u_ {k}, a_ {k }) {\ Big)}}

,

according to the formula of the Lagrange interpolation

{\ displaystyle f (x) = \ sum _ {i = 1} ^ {k} \ left (a_ {i} \ cdot \ prod _ {j \ neq i} ^ {k} {\ frac {x-u_ { j}} {u_ {i} -u_ {j}}} \ right)}

.

Because for results from the code word ${\ displaystyle f (u_ {i}) = a_ {i}}$ ${\ displaystyle i = 1, \ dots, k}$ ${\ displaystyle f (x)}$

{\ displaystyle c = (c_ {1}, c_ {2}, \ dots, c_ {n}) = {\ Big (} a_ {1}, a_ {2}, \ ldots, a_ {k}, f ( u_ {k + 1}), \ dots, f (u_ {n}) {\ Big)}}

.

Both variants use the same set of code words and therefore have the same error correction properties.

properties

The definition immediately results in the following properties:

Code word length: ${\ displaystyle n}$
Dimension of the code: ${\ displaystyle | C | = | f | = q ^ {k}}$
Code rate: ${\ displaystyle R_ {c} = k / n}$

The minimum distance is and thus fulfills the singleton limit . Codes with this property are also called MDS codes . ${\ displaystyle d _ {\ text {min}} = n-k + 1}$

Explanation

Since it can have a maximum of zeros (limited by the degree of the polynomial), a maximum of digits appear in the corresponding code word that become 0. This is the Hamming weight and, because of the linearity, also the minimum distance.

{\ displaystyle f}

{\ displaystyle k-1}

{\ displaystyle k-1}

{\ displaystyle wt (C) \ geqq n-k + 1}

Together with the singleton bound, this results in equality.

{\ displaystyle d _ {\ text {min}} \ leqq n-k + 1}

literature

Stephen B. Wicker, Vijay K. Bhargava: Reed Solomon Codes Applications . Wiley, 1999, ISBN 0-7803-5391-9 ( ieeexplore.ieee.org ).

Individual evidence

↑ Irving S. Reed, Gustave Solomon: Polynomial codes over certain finite fields . In: Journal of the Society for Industrial and Applied Mathematics, SIAM J. Band 8 , 1960, ISSN 0036-1399 , pp. 300-304 .

Web links

Coding and decoding algorithms for Reed-Solomon and other codes in C (Robert Morelos-Zaragoza) (en)
James S. Plank: A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems , Software - Practice & Experience, 27 (9), September, 1997, pp. 995-1012

[irvi1-1] Irving S. Reed, Gustave Solomon: Polynomial codes over certain finite fields . In: Journal of the Society for Industrial and Applied Mathematics, SIAM J. Band 8 , 1960, ISSN 0036-1399 , pp. 300-304 .