Entropy (cryptology)

from Wikipedia, the free encyclopedia

Entropy (from Ancient Greek ἐντροπία entropía, from ἐν en 'in' and τροπή tropḗ 'turn, reversal') is a term also used in cryptology. It is a measure of the "disorder" in texts. Entropy is usually abbreviated with the Greek capital letter Η ("Eta").

In the cryptanalysis of ciphertexts, determining the entropy is useful for gaining knowledge about the structure of the text and, supplemented by the index of coincidence, if possible also about the underlying language, with the aim of breaking (deciphering) the ciphertext.

The term "entropy" is also used in thermodynamics, into which it was introduced in 1865 by the German physicist Rudolf Clausius (1822–1888).

Definition

The entropy Η of a text in which the individual characters are numbered consecutively with i and in which the character at position i occurs with the probability pᵢ is:

    Η = − Σᵢ pᵢ · ld(pᵢ)

In this formula:

  • i is an index over the characters of the text
  • pᵢ is the probability of occurrence of the character at position i
  • ld is the logarithmus dualis, i.e. the logarithm to base 2

The text can consist of any characters, symbols or numbers, the main thing is that they are distinguishable.

Since the individual probabilities pᵢ are between 0 and 1, the expression pᵢ · ld(pᵢ) is always negative or zero. The minus sign in front of the summation sign ensures that each individual summand is a positive number or zero. As a result, the entire entropy is always positive or zero.

The minus sign could also have been placed inside the sum, directly in front of each summand, but then the formula would have needed additional brackets, which would have been less aesthetic notation. For the calculation it is irrelevant whether the minus stands in front of the summation sign or in front of each summand (see equivalence transformation).

Intuitively, the entropy of a text corresponds to the number of yes/no questions that one has to ask in order to guess the entire text. However, entropy is not restricted to whole numbers but is a real number.
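The definition above can be sketched in Python (a minimal illustration, not part of the original article; the function estimates the probabilities pᵢ from the character counts of the text itself):

```python
from collections import Counter
from math import log2

def entropy(text: str) -> float:
    """Shannon entropy of a text in bits per character:
    H = -sum(p_i * log2(p_i)) over the observed characters."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

For example, entropy("ABAB") yields 1.0 bit/character, since both characters occur with probability 1/2; a text using only one character yields 0.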

Examples

Depending on the language used, plain texts have slightly different letter frequencies and thus also different entropy values. Based on the usual 26 capital letters of the Latin alphabet, the entropy can be calculated from the letter frequencies. A count of the typical letter frequencies for some European languages yields the following values (in percent in each case). In addition to the individual frequencies for German, English, Dutch, Spanish, French and Italian, the value 3.85 % has been added on the left (in the second column) for comparison. This is the quotient 1/26 for an ideal random text in which all letters occur with the same probability.

    Rnd    Ger    Eng    Dut    Spa    Fre    Ita
A   3.85   5.45   7.19   7.17   6.69   6.82  10.73
B   3.85   1.75   1.58   1.41   0.71   0.70   0.89
C   3.85   3.37   4.05   1.78   3.52   3.30   5.05
D   3.85   5.11   3.11   6.85   4.03   3.71   3.57
E   3.85  16.89  13.05  18.84  15.92  15.61  13.19
F   3.85   1.28   2.42   0.78   1.10   1.13   1.31
G   3.85   3.76   2.34   2.94   1.57   0.84   1.05
H   3.85   5.26   4.71   2.75   1.22   0.59   1.50
I   3.85   8.51   7.71   6.87   7.32   7.11   9.80
J   3.85   0.18   0.09   1.50   0.16   0.19   0.01
K   3.85   1.51   0.58   1.92   0.05   0.01   0.01
L   3.85   3.77   3.72   4.15   5.31   4.85   5.76
M   3.85   2.22   2.54   1.88   2.56   3.22   2.98
N   3.85  10.42   7.81   9.91   7.14   9.42   7.57
O   3.85   3.11   7.52   5.85   6.01   6.08   9.66
P   3.85   0.63   2.30   1.36   3.53   3.21   2.63
Q   3.85   0.01   0.10   0.02   1.36   1.74   0.69
R   3.85   7.14   6.41   6.50   7.03   5.81   6.09
S   3.85   6.24   6.49   4.45   9.44   9.53   5.94
T   3.85   6.08   9.22   6.02   7.31   7.32   5.90
U   3.85   3.40   2.83   1.77   5.72   6.92   2.95
V   3.85   0.89   0.86   2.66   1.12   1.06   1.64
W   3.85   1.64   1.07   1.40   0.05   0.01   0.01
X   3.85   0.02   0.45   0.02   0.71   0.42   0.02
Y   3.85   0.07   1.73   0.09   0.36   0.36   0.01
Z   3.85   1.27   0.10   1.12   0.06   0.03   1.04

From this, the entropy values (in bits/character) for the various languages can easily be calculated using the definition equation given above. In the following table they are again supplemented by the value for ideal random texts (in the second column).

    Rnd   Ger    Eng    Dut    Spa    Fre    Ita
Η   4.7   4.07   4.16   4.06   4.03   3.98   3.99
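As a sketch, the German entry of this table can be reproduced by inserting the frequencies from the first table into the definition equation (the list below copies the German column; small deviations stem from rounding in the table):

```python
from math import log2

# Letter frequencies for German in percent (A..Z), taken from the table above
german_pct = [5.45, 1.75, 3.37, 5.11, 16.89, 1.28, 3.76, 5.26, 8.51,
              0.18, 1.51, 3.77, 2.22, 10.42, 3.11, 0.63, 0.01, 7.14,
              6.24, 6.08, 3.40, 0.89, 1.64, 0.02, 0.07, 1.27]

# Normalize the percentages to probabilities and apply the definition equation
total = sum(german_pct)
H_german = -sum((p / total) * log2(p / total) for p in german_pct)
```

The result is approximately 4.07 bits/character, matching the table entry for German.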

The value for the random texts is obtained by inserting the value pᵢ = 1/26 for each character in the definition equation. This means that all 26 summands are equal, and the entropy is Η = −26 · (1/26) · ld(1/26) = ld(26), or about 4.7 bits/character. This is also the maximum entropy that a text over 26 characters can have.
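This maximum can be checked directly (a small numeric sketch): with all 26 summands equal, the sum collapses to the binary logarithm of 26.

```python
from math import log2

# Ideal random text: all 26 letters equally likely, p_i = 1/26 for every i
p = 1 / 26
H_random = -26 * p * log2(p)  # collapses to log2(26), about 4.7 bits/character
```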

The minimum entropy is achieved when a text (of any length) only ever uses a single letter, for example the (meaningless) sequence "AAAAA…". In this case the entropy is composed of the partial entropy of the "A" and that of the other letters. The partial entropy of the "A" is −1 · ld(1) = 0, i.e. 0 bits/character. The partial entropy of the remaining characters, which occur with probability 0, is also 0 bits/character (using the convention 0 · ld(0) = 0). The total entropy is thus 0 bits/character.
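The two kinds of partial entropy can be sketched as follows (an illustration of the 0 · ld(0) = 0 convention, not part of the original article):

```python
from math import log2

def term(p: float) -> float:
    """One summand -p * log2(p), with the convention 0 * log2(0) = 0."""
    return 0.0 if p == 0 else -p * log2(p)

# "AAAAA...": p(A) = 1, and each of the other 25 letters has p = 0
H_mono = term(1.0) + 25 * term(0.0)
```

Both kinds of summand vanish, so H_mono is 0 bits/character, the minimum entropy.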

Redundancy

The term redundancy R (also in bits/character) is closely linked to entropy. It denotes the difference between the maximum entropy and the entropy of the text or language under consideration: R = Η_max − Η. A random text has no redundancy whatsoever (R = 0), while natural languages are all more or less redundant.

    Rnd   Ger    Eng    Dut    Spa    Fre    Ita
R    0    0.63   0.54   0.64   0.67   0.72   0.71
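The redundancy values follow directly from the entropy table, shown here for English (a minimal sketch; the entropy value 4.16 is taken from the table above):

```python
from math import log2

H_max = log2(26)   # maximum entropy of a 26-letter alphabet, about 4.7
H_english = 4.16   # entropy of English from the table above
R_english = H_max - H_english  # redundancy, about 0.54 bits/character
```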

Literature

  • Friedrich L. Bauer: Decrypted Secrets – Methods and Maxims of Cryptology. Springer, Berlin 2000 (3rd edition), ISBN 3-540-67931-6.
  • C. A. Deavours: Unicity Points in Cryptanalysis. Cryptologia, 1 (1), 1977, pp. 46–68.
  • Michael Miller: Symmetric Encryption Methods. Teubner, 2003, pp. 88–105, ISBN 3-519-02399-7.
  • Claude Shannon: Communication Theory of Secrecy Systems. Bell System Technical Journal, 28 (Oct), 1949, pp. 656–715. PDF; 0.6 MB. Retrieved September 15, 2016.
  • Ronald Wick: European Alphabets. PDF; 1.9 MB. Retrieved September 15, 2016.

Web links

  • Video: Prof. Craig Bauer presents the entropy for different languages (after 7 minutes; English). Retrieved September 15, 2016.
