
Decrypting the ciphertext knowing the length of the codeword

Once we've worked out the length of the codeword, how can this help us to break the Vigenère cipher?

How does knowledge of the codeword length help us to break the Vigenère cipher?

Recap: we’ve seen that the distances between occurrences of the most common trigram are very likely to be multiples of the length of the codeword. By taking the greatest common divisor of these distances we can therefore deduce the length of the codeword.
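As a rough illustration of the recap, here is a minimal Python sketch of the greatest-common-divisor step. The function name `likely_codeword_length` and the three distances are made up for this illustration and are not taken from the course example. Strictly, the result is only guaranteed to be a multiple of the codeword length, and a trigram that repeats by coincidence can spoil it.

```python
from functools import reduce
from math import gcd


def likely_codeword_length(distances):
    """Return the greatest common divisor of all the given distances."""
    return reduce(gcd, distances)


# Illustrative (made-up) distances between repeats of one trigram.
distances = [21, 35, 77]
print(likely_codeword_length(distances))  # 7
```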

Once we know the length of the codeword, we can go back to using normal frequency analysis.

Remember that in our example the codeword had length 7. This means that the 1st, 8th, 15th and 22nd letters of the ciphertext, and so on, are all encoded with a shift cipher using the same shift, namely the shift given by the first letter of the codeword.

So now we do a frequency analysis on just these letters to deduce which shift was used. Subtracting that shift from each of them decrypts the 1st letter, the 8th letter, the 15th letter, the 22nd letter, and so on.
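The step just described might look something like the following in Python. This is only a sketch under two assumptions that the text does not spell out: the ciphertext contains only the capital letters A to Z, and the most frequent letter among those extracted stands for plaintext E. The helper names `column`, `guess_shift` and `unshift` are hypothetical.

```python
from collections import Counter

A = ord("A")


def column(ciphertext, start, key_length):
    """Every key_length-th letter, beginning at index `start` (0-based)."""
    return ciphertext[start::key_length]


def guess_shift(letters):
    """Guess the shift, assuming the most common letter encodes E."""
    most_common = Counter(letters).most_common(1)[0][0]
    return (ord(most_common) - ord("E")) % 26


def unshift(letters, shift):
    """Subtract `shift` from every letter, wrapping around the alphabet."""
    return "".join(chr((ord(c) - A - shift) % 26 + A) for c in letters)
```

With a codeword of length 7, `column(ciphertext, 0, 7)` picks out the 1st, 8th, 15th, ... letters. On short or unusual texts the "most common letter is E" guess can fail, in which case comparing the whole frequency profile against English is the more careful option.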

We then do the same for the 2nd letter, the 9th letter, the 16th letter, the 23rd letter, and so on. These are also all encrypted with a single shift, but this time the shift is given by the second letter of the codeword. Again, it can be found using frequency analysis on those extracted letters.

And we just repeat this process until we have recovered the entire plaintext. If the codeword has seven letters, then we need to repeat this process seven times.
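Putting the repetitions together, a minimal sketch of the whole procedure could look like this. It assumes an A to Z ciphertext, the convention that codeword letter A means a shift of 0, and the simple heuristic that the most common letter in each group stands for E. None of the function names come from the course.

```python
from collections import Counter

A = ord("A")


def guess_shift(letters):
    """Assumption: the most frequent letter in the group encodes E."""
    top = Counter(letters).most_common(1)[0][0]
    return (ord(top) - ord("E")) % 26


def recover_codeword(ciphertext, key_length):
    """Run the frequency analysis once per codeword position."""
    shifts = [guess_shift(ciphertext[i::key_length]) for i in range(key_length)]
    return "".join(chr(s + A) for s in shifts)


def vigenere_decrypt(ciphertext, codeword):
    """Subtract the repeating codeword from the ciphertext, letter by letter."""
    out = []
    for i, c in enumerate(ciphertext):
        shift = ord(codeword[i % len(codeword)]) - A
        out.append(chr((ord(c) - A - shift) % 26 + A))
    return "".join(out)
```

For a codeword of length 7 one would call `recover_codeword(ciphertext, 7)` and pass the result to `vigenere_decrypt`. If a letter of the recovered codeword looks wrong, that position is the one where the E heuristic failed and is worth rechecking by hand.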

Remark: Our short example (BOTHER THE THEOREM) was specially constructed to have a trigram appearing twice in the ciphertext. Typically one would need a longer ciphertext for this to happen at random. But in a ciphertext of several thousand characters, the method outlined here will break the encryption.

An obvious way to strengthen the cipher would be to use a codeword that is as long as the plaintext message, rather than a repeating one. If that codeword were in English, the ciphertext would still be susceptible to a more advanced form of frequency analysis. However, if the long codeword is completely random and is never reused, then the ciphertext is provably unbreakable. Such a system is better known as a one-time pad.
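To make the idea of a random codeword as long as the message concrete, here is a small Python sketch. The message and the helper names `random_pad` and `shift_text` are illustrative assumptions. Python's `secrets` module is used as the source of randomness, and the pad must be used for one message only.

```python
import secrets
import string

A = ord("A")


def random_pad(length):
    """A random A-Z codeword of the given length."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))


def shift_text(text, pad, sign):
    """Add (sign=+1) or subtract (sign=-1) the pad, letter by letter."""
    return "".join(
        chr((ord(t) - A + sign * (ord(p) - A)) % 26 + A)
        for t, p in zip(text, pad)
    )


message = "ATTACKATDAWN"            # illustrative plaintext
pad = random_pad(len(message))      # codeword as long as the message
ciphertext = shift_text(message, pad, +1)
recovered = shift_text(ciphertext, pad, -1)
```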

Example

At the bottom of this page is a link to a book encrypted with the Vigenère cipher. See if you can use the methods outlined here to decrypt it. (We have stripped out all non-alphabetic characters, such as spaces and punctuation.)

Hint: You should find that the most common trigram is NQL, occurring 566 times. The three most common gaps between occurrences of NQL are 2,735 characters, 7,390 characters and 11,225 characters (each of these gaps occurs 26 times). Even if you don’t wish to analyse the ciphertext, you should be able to make a good guess at the length of the codeword using this information.
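If you do want to analyse the ciphertext yourself, the counting could be sketched in Python along these lines. The function names are hypothetical. The sketch also assumes that a "gap" means the distance between any two occurrences of the trigram, not only consecutive ones, which the hint does not make explicit.

```python
from collections import Counter
from itertools import combinations


def most_common_trigram(text):
    """Return (trigram, count) for the most frequent three-letter run."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return counts.most_common(1)[0]


def gap_counts(text, trigram):
    """Count how often each gap between two occurrences of `trigram` appears."""
    positions = [i for i in range(len(text) - 2) if text[i:i + 3] == trigram]
    return Counter(b - a for a, b in combinations(positions, 2))
```

Calling `gap_counts(ciphertext, "NQL").most_common(3)` would then list the three most frequent gaps, ready for the greatest-common-divisor step.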

© University of York
This article is from the free online

The Mathematics of Cryptography: From Ancient Rome to a Quantum Future

