Kasiski test

The kasiski examination is in the cryptanalysis a tool for deciphering of ciphertext associated with the -Vigen'ere method were produced. It can be used to determine the length of the keyword used .

history

In 1854, the British Charles Babbage (1791–1871) succeeded in deciphering a Vigenère-encrypted text . However, he kept his method a secret. In 1863 the Prussian infantry major Friedrich Wilhelm Kasiski (1805–1881) published this method in the book "The Secret Scripts and the Deciphering Art", which he invented independently of Babbage. In his honor, the procedure is called the Kasiski test.

General procedure

Given is the cryptogram , a Vigenère-encrypted text. First you search the ciphertext for letter sequences of length 2 or longer that occur several times. Then you determine the distance between each 2 identical sequences, that is, you count the letters from the first letter of the first sequence (including) to the first letter of the second sequence (excluding). This is how you proceed with all the sequences found and write down the distances. A list of natural numbers is obtained. These are now broken down into prime factors . Identical factors can thus be found quickly. Coincidences that occur by chance are then easily recognizable because they fall out of line. However, the exact key length is not known because the Kasiski test only provides multiples of the key length. The Friedman test can then be used for a more precise examination , which also provides an indication of whether the encryption is mono- or polyalphabetic.

Idea of the Kasiski test

Why does the Kasiski test provide quite reliable statements about the keyword length? Let us consider the following encryptions:

The plain text (1st line) is Vigenère-coded with the keyword PLUTO (length 5). The ciphertext is on the 3rd line.

                   DER KLARTEXT WIRD ZUM GEHEIMTEXT
                   PLU TOPLUTOP LUTO PLU TOPLUTOPLU
                   SPL DZPCNXLI HCKR OFG ZSWPCFHTIN

The string TEXT occurs twice in plain text . Nevertheless, the corresponding strings in the ciphertext differ. The reason for this is that TEXT for the first time with utop , the second time but with OPLU is coded. This happens because the distance between TEXT and TEXT is 17 letters. However, the keyword has 5 letters, and because 5 is not a divisor of 17, both text passages are not encoded with the same part of the keyword, so that the same letter sequences are not to be expected in the ciphertext. Let's change the little example a little.

                   DER KLARTEXT WERDE GEHEIMTEXT
                   PLU TOPLUTOP LUTOP LUTOPLUTOP
                   SPL DZPCNXLI HYKRT RYASXXNXLI

This time, TEXT is encrypted twice with UTOP ; therefore the consequences in the cryptogram also match. If you determine the distance between TEXT and TEXT here too , you get 15, a multiple of 5, of the keyword length. In summary, it can be stated that the same letter sequences (words, syllables, word stems etc.) only result in the same letter sequences in the cryptogram if the distance between them is a multiple of the keyword length. In other words: If a sequence of letters occurs twice in the cryptogram and the same word was encrypted with it, the distance between the two sequences is a multiple of the keyword length. The Kasiski test searches for the same sequence of letters in the cryptogram. It is now assumed that they encode the same word. If this is correct, the distance is a multiple of the keyword length. But if the same word was not encrypted, the distance is not a multiple of the keyword length, and the two places in the ciphertext are only the same by chance. Of course, you cannot immediately tell whether the same string of characters was created “by chance” or whether the same word was actually encrypted. Therefore, at the end of the day, common factors are sought in order to find the "unsuitable" distances. Of course, especially with short episodes, it happens that they appear twice, even though the same word was not encrypted. That is also the reason why one does not usually look for the same episodes of length 2. The probability that the letter sequences do not match in plain text is simply too great.

Examples

Let the following Vigenère-encrypted ciphertext be given.

                   SPL  DZPCNXLI  HYKRT  RYASXXNXLI

The sequence NXLI appears twice in the ciphertext. The distance between these two text passages is 15 characters. 15 can be broken down into the prime factors 3 and 5. Assuming that it is not a random occurrence, one can say that the same word (or syllable, word beginning or similar) was encrypted. One will assume here that the keyword has the length 3, 5 or 15.

Of course, with longer ciphertexts, more precise statements can be made about the length of the keyword. The main reasons for this are as follows: Several letter sequences appear twice . A sequence of letters (especially with frequently occurring words, e.g. articles, pronouns, conjunctions) even occurs three or more times in the cryptogram.

The following Vigenère-encrypted ciphertext is given (Genesis, chapter 1, verses 1–4 was encrypted with the key word OLD TESTAMENT (14 letters)). The Kasiski test is used to determine the length of the keyword.

                   AXTRX TRYLC TYSZO EMLAF QWEUZ HRKDP NRVWM WXRPI
                   JTRHN IKMYF WLQIE NNOXW OTVXB NEXRK AFYHW KXAXF
                   QYAWD PKKWB WLZOF XRLSN AAWUX WTURH RFWLL WWKYF
                   WGAXG LPCTG ZXWOX RPIYB CSMYF WIKPA DHYBC SMYFW
                   KGMTE EUWAD LHSLP AVHFK HMWLK

Procedure: Search for identical text strings of at least length 3, mark them and determine the distances.

                   AXTRX TRYLC TYSZO EMLAF QWEUZ HRKDP NRVWM WXRPI 
                   JTRHN IKMYF WLQIE NNOXW OTVXB NEXRK AFYHW KXAXF 
                   QYAWD PKKWB WLZOF XRLSN AAWUX WTURH RFWLL WWKYF 
                   WGAXG LPCTG ZXWOX RPIYB CSMYF WIKPA DHYBC SMYFW
                   KGMTE EUWAD LHSLP AVHFK HMWLK

                   XTR:         Abstand 3
                   XRPI:        Abstand 98
                   YFW:         Abstand 70
                   YBCSMYFW:    Abstand 14

Decomposition of the distances into prime factors.

                  3   =      3
                 98   = 2 x         7 × 7
                 70   = 2 x     5 × 7
                 14   = 2 x         7

evaluation

As you can see from the prime factorization, all distances (except the first) are multiples of 14. The distance 3 is probably a coincidence. This makes the following guesses for the keyword length: 2, 7, or 14.

literature

Albrecht Beutelspacher : Cryptology. An introduction to the science of encryption, concealment, and concealment. Without any secrecy, but not without deceitful rogue, presented for the benefit and delight of the general public. 2nd considerably expanded and hopefully improved edition. Vieweg, Braunschweig 1991, ISBN 3-528-18990-8 .