# Redundancy (information theory)

The concept of redundancy (from the Latin redundare, overflowing, abundantly pouring out) describes in information theory that information that is present several times in one information source. An information unit is redundant if it can be omitted without loss of information. Identifying and removing such redundancies is called deduplication .

## Transmission of messages and information

Redundant is the part of a message that does not contain any information . The redundant part of the message can be a function of the information contained in the message. In information technology and communications technology applications, redundancy is used specifically to detect errors. A stronger redundancy enables not only the detection of errors but also their correction. Redundancy thus allows an increase in quality (fewer errors) at the expense of quantity (higher data rate). The strength of the redundancy to be used in each case depends on the fault tolerance of the respective application - in banking and space travel a single overturned bit could cost a lot of money, while in Internet telephony or DVB even the constant loss of entire packets is irrelevant.

### Fault tolerance

A communication can be carried out error-tolerant through redundant information via an information channel , since under certain circumstances lost or falsified partial information can be reconstructed by the recipient from its context . The Hamming distance is a measure of the error tolerance .

### Average code word length

Let Z be an alphabet and zZ
C (z) denotes the code word belonging to z
l (z) denotes the length of C (z)

The mean code word length L (C) of a source code C (z) with the probability distribution p (z) is given by:

${\ displaystyle L (C) = \ sum _ {i = 1} ^ {| Z |} {l (z_ {i}) p (z_ {i})}}$

### Redundancy of a code

The redundancy of the code is the difference between the mean code word length and entropy . (Example: Huffman coding for optimal (= minimal) ). ${\ displaystyle L (C)}$ ${\ displaystyle H (X)}$${\ displaystyle L (C)}$

${\ displaystyle R _ {\ text {Code}} = L (C) -H (X) \,}$

The redundancy of the source is the difference between maximum entropy and the entropy of the message source. ${\ displaystyle H _ {\ text {max}} (X) = \ log _ {2} | Z |}$${\ displaystyle H (X)}$

${\ displaystyle R _ {\ text {source}} = \ log _ {2} | Z | -H (X) \,}$

Since the code word length cannot be less than the entropy, the redundancy is never negative.

## Coding

Coding theory distinguishes between two manifestations of redundancy:

• The distribution redundancy lies in the different probability of occurrence of the individual characters of the alphabet.
• The tie redundancy is that after certain characters, the occurrence of a certain other character is particularly likely. For example, in a German text, a q is almost always followed by a u.

## Databases and data structures

In database development and in the data structures of programs, it is important to avoid redundancies as completely as possible, as these can lead to higher memory requirements and inconsistencies . Redundancies are therefore counted among the anomalies . Freedom from redundancy is a basic principle for a logical data model.

Redundancies can be largely avoided by normalizing the database schema. There are also redundancies that are unavoidable (for example key redundancies ) and are therefore accepted as a necessary evil . There are also redundancies that are accepted because avoiding them would represent too much effort in relation to their problem, such as multiple occurrences of an attribute value or the double storage of the name Müller for Mr. Müller and for Ms. Müller.

The deliberate acceptance of redundancy to achieve better reading performance is called denormalization .

Redundancies in the data structures of programs and databases can lead to program errors. The programmer must ensure that he also keeps the redundant data consistent with all changes . This requires a high level of synchronization effort. The larger the project and the longer the project is being developed, the more difficult this is. When multiple programmers unknowingly work independently on redundant data, it is almost impossible to keep the changes consistent.