Hamming distance

The Hamming distance (even Hamming distance ) and the Hamming weight , named after the American mathematician Richard Wesley Hamming ( 1915 - 1998 ), are measures of the diversity of strings . The Hamming distance between two blocks with a fixed length (so-called code words) is the number of different places.

The Hamming distance is used for error detection and correction by comparing data units received via a transmission link with valid characters. Any correction of the characters is based on the principle of probability. Whether error detection or correction can take place depends on the Hamming distance.

Often these are numbers represented in binary form, for example in coding theory . In this case, the comparison can be made mathematically by an XOR operation and counting the resulting ones. However, there are also important applications for other number systems or alphabets.

definition

${\ displaystyle \ Sigma}$ is a finite alphabet as well , and two of-character words . The Hamming distance between and is defined as: ${\ displaystyle x = (x_ {1}, \ dots, x_ {n})}$ ${\ displaystyle y = (y_ {1}, \ dots, y_ {n})}$ ${\ displaystyle n}$ ${\ displaystyle \ Sigma ^ {n}}$ ${\ displaystyle x}$ ${\ displaystyle y}$

{\ displaystyle \ Delta (x, y): = | \ left \ {j \ in \ {1, \ dots, n \} \ mid x_ {j} \ neq y_ {j} \ right \} |}

It should be noted that the Hamming distance is also a metric in the code space.

Examples

001 1 0 and 001 0 0	→ Hamming distance = 1
1 2 34 5 and 1 3 34 4	→ Hamming distance = 2
H au s and B au m	→ Hamming distance = 2

Hamming weight

The Hamming weight of a character string is defined as the number of characters other than the null character of the alphabet used . This is also the Hamming distance to the zero vector (a character string of the same length that only consists of zero characters).

Hamming distance of a code

The Hamming distance of a code is the minimum of all distances between different words within the code.

Example:

A code consists of the following three words:

{\ displaystyle x = 00110,}

{\ displaystyle y = 00101,}

{\ displaystyle z = 01110.}

The Hamming distance between and is 2.

{\ displaystyle x}

{\ displaystyle y}

To generate, you have to change two bits (from right to left the first and second bit): y = x XOR 00011.

{\ displaystyle y}

The Hamming distance between and is 1.

{\ displaystyle x}

{\ displaystyle z}

To generate, you have to change one bit (the fourth): z = x XOR 01000.

{\ displaystyle z}

The Hamming distance between and is 3.

{\ displaystyle y}

{\ displaystyle z}

To generate, you have to change three bits: z = y XOR 01011.

{\ displaystyle z}

The smallest of the three distances is 1, so the Hamming distance of the code is also 1.

The Hamming distance is important if you want to develop codes that enable error detection ( EDC ) or correction ( ECC ).

In the case of codes with a Hamming distance , all -bit errors can be detected. In the example with , not even every 1-bit error can be detected (x↔z is not noticed, all other 1-bit errors generate invalid codes, e.g. 00111 from x or y). ${\ displaystyle h}$ ${\ displaystyle (h-1)}$ ${\ displaystyle h = 1}$

For every 1-bit error can be detected. In order to also be able to correct the errors, the Hamming distance must be increased to at least , where stands for the number of correctable bit errors. ${\ displaystyle h = 2}$ ${\ displaystyle 2r + 1}$ ${\ displaystyle r}$

With , all 1-bit errors can be recognized and corrected. If 2-bit errors occur, these may be incorrectly “corrected”, since the incorrect word may have a distance of 1 from another valid code word. ${\ displaystyle h = 3}$

With , all 1-bit errors can also be recognized and corrected. If 2-bit errors occur, they can be recognized, but no longer corrected. ${\ displaystyle h = 4}$

The Hamming distance of a code is necessarily a positive natural number . A code with a Hamming distance of 0 is not possible because in this case two code words could not be distinguished.

Finding the Hamming distance of a code

Karnaugh-Veitch diagram
4th	3	2	3	4th
3	2	1	2	3
2	1	X	1	2
3	2	1	2	3
4th	3	2	3	4th

The manual determination is best done with the Karnaugh-Veitch diagram . There you enter a cross for each code value that occurs. If at least two crosses are next to each other horizontally or vertically, with opposite edges coinciding, then the Hamming distance = 1.If two crosses are either only diagonally next to each other or horizontally or vertically to each other with a field in between, the Hamming distance = 2 The adjacent Karnaugh-Veitch diagram for 4 bits (gray fields are cyclical repetitions) shows the distance between a code value and a given (cross). So you can z. B. recognize that with 4 bits there are only two values with Hamming distance = 4, namely a complementary pair.

In binary code, the Hamming distance between two code words can and the ones in the result are determined by (a XOR b) and the counting. ${\ displaystyle a}$ ${\ displaystyle b}$

Application example

When transferring data, it must be ensured that information is not falsified or that changes to the data are at least noticed (detection of n-fold errors) and perhaps can be corrected.

In the following example, a rotary switch has four setting options. These are transmitted electronically as a binary number (code word) to a recipient: 00, 01, 10, 11; The receiver receives the code word, but has no other means of checking or recognizing the switch position. In technical applications, this is already the case when the receiver is a microcontroller and the transmitter consists of the sensors within a switch.

In this scenario, the receiver has no way of recognizing corruption in the transmission or a defect in the switch (e.g. defective sensors in the switch). With the help of the Hamming distance and corresponding codes, a way is to be found to detect errors in the transmitter or in the line.

The Hamming distance between the four mentioned values 00, 01, 10, 11 is in each case 1, i.e. H. if only one bit is reversed due to an error, the receiver will receive a different but equally valid code word. If a 00 to 01 is falsified, the recipient cannot recognize the error from the message alone, because both the intended and the falsified value describe a valid position of the switch.

To improve the situation, the sender and receiver first agree to only use certain (but longer) code words and to define their meaning in a table. For this purpose, both can select the code words 001, 010, 100, 111, for example, which each have the Hamming distance of 2 to one another - the other four code words with a length of three bits are not used.

In the case of a single erroneous bit (single error), none of these four code words 001, 010, 100, 111 changes into one of the other three valid code words. The recipient recognizes when z. B. a 011 arrives that an error must have occurred. A code with the Hamming distance 2 cannot be corrected reliably, as this example shows: The 011 could have been created by reversing just one bit from one of the three valid code words 001, 010, 111.

If the receiver assumes that only single errors occur and the receiver wants to correct them, it must agree with the sender code words that each have a Hamming distance ≥ 3, e.g. E.g. 01011, 01100, 10010, 10101.

If the receiver now receives 01111 and it assumes that a single error has occurred, then 01111 can only have arisen from the valid code word 01011 in which the middle bit was changed.
A double fault can also be detected. Since the sender and receiver know that they are only using certain code words that differ by at least three bits (Hamming distance ≥ 3), a double error (only two bits changed) is detected, which, however, cannot be corrected with the information sent .
Triple errors can no longer be recognized, but the relevance of multiple errors decreases in technical systems, since the simultaneous occurrence of several errors becomes less and less likely the more errors are supposed to coincide.

The double error opens up the possibility of an error, as can be shown with the example of 01111: If 01111 should have arisen from a double error from 01100, but the receiver considers it to be a single error and corrects it, then the 01100 actually intended by the sender becomes 01100 due to the double error a 01111 and incorrectly a 01011 due to the correction of the recipient (because of the assumption of a single error).

Because of the already mentioned decreasing probability of multiple errors (n-fold errors) with increasing n, a Hamming distance of 4 (detection of triple errors) to 5 (correction of double errors) is sufficient in most applications.

The required length of the code word depends on the required Hamming distance and the number of possible switch positions and is shown in the table above. There you can see, for example, that at least 8 bits have to be transmitted for 20 different positions of a switch if all 20 code words are to achieve at least the Hamming distance ≥ 3 from one another.

Representation of the bit string in a hypercube

The idea of the Hamming distance can be well represented with the help of hypercubes . A hypercube is the generalization of a three-dimensional cube on the dimension . Each node in the figure corresponds to a bit combination that can also be understood as a coordinate specification in space. The minimum number of edges that have to be traversed in order to get from one valid word of a code to another valid word of the code corresponds to the Hamming distance. ${\ displaystyle d}$

example

Hypercube with d = 1 to d = 4

If the two words {101, 010} are selected for a code in the adjacent cube with , the minimum Hamming distance is 3. This means that in a sphere with a distance of 1, a point with a valid word (e.g. for the valid code word 010) all errors (1-bit errors) are recognized and corrected {000, 110, 011}. ${\ displaystyle d = 3}$

If a code with the words {000, 101, 110, 011} is selected, the minimum Hamming distance is 2. With a Hamming distance of 2, 1-bit errors can only be recognized, but not corrected (for example recognize that 111 represents an incorrect word, but not whether it should be corrected after 110 or 011 or 101).

Minimum distance

The minimum distance between two adjacent code words is of interest for the construction of a code which can correct errors in bit positions for useful information . In the case of block codes with a fixed alphabet, the singleton limit , the Hamming limit (keyword t-perfect) and the Plotkin limit provide more general information about the maximum minimum distance. ${\ displaystyle m}$ ${\ displaystyle k}$

For a code with a minimum spacing it applies that errors can be corrected and errors can be recognized. ${\ displaystyle h}$ ${\ displaystyle k <{\ tfrac {h} {2}}}$ ${\ displaystyle h-1}$

example

If all 1-bit errors are to be correctable, that is , it follows by inserting and changing . With you can only correct 1-bit errors, you can recognize 2-bit errors as errors, but neither correct them nor reliably differentiate them from 1-bit errors. Error correction usually turns 2-bit errors into 3-bit errors. ${\ displaystyle k = 1}$ ${\ displaystyle h> 2}$ ${\ displaystyle h = 3}$

Inference

For each code, the Hamming distance must be at least 3 so that errors can be corrected at all. ${\ displaystyle h}$

literature

Richard W. Hamming: Error-detecting and error-correcting codes . In: Bell System Technical Journal , XXIX (2), 1950, pp. 147-160.
Karsten Weicker: Evolutionary Algorithms. 2nd edition, BG Teubner Verlag, Wiesbaden 2007, ISBN 978-3-8351-0219-4 .

Web links

Explanation and online visualization, also compared to the Levenshtein distance
Technical basics of computer science (accessed on February 12, 2018)
Error detection and correction in codes (accessed February 12, 2018)
Hamming Codes (accessed February 12, 2018)

Hamming distance

contents

definition

Examples

Hamming weight

Hamming distance of a code

Finding the Hamming distance of a code

Application example

Representation of the bit string in a hypercube

example

Minimum distance

example

Inference

See also

literature

Web links