Skip main navigation

Hurry, only 3 days left to get one year of Unlimited learning for £249.99 £174.99. New subscribers only. T&Cs apply

Find out more

How the Vigenère cipher was broken

So how was the Vigenère cipher broken? Guesswork, frequency analysis, and some more simple maths!
A pocket calculator
© University of York

The key flaw in the Vigenère cipher was the repetition of the codeword.

In our example above, where the plaintext BOTHER THE THEOREM was encrypted using the seven-letter codeword KASISKI to obtain the ciphertext MPMQXCCSFMQXZAPN, notice how the THE in BOTHER and in THEOREM both get enciphered to MQX, and notice that the distance between these two occurrences of MQX is 7, a multiple of the length of the codeword: MPMQXCCSFMQXZAPN

How was it broken?

Just as with frequency analysis, where some letters in English are more common than others, so it is that some combinations of letters are more common than others. For example, the three most common trigrams (that is, three consecutive characters) in English would be THE, AND, and ING.

So it might be that one of these common trigrams just happens to occur under the same three letters of the repeated codeword. And when that happens, if you think about how the cipher works, they will get encrypted to the same three letters in the ciphertext.

So if you search the ciphertext for all possible combinations of three letter characters, it’s not implausible that you will find some trigrams are more common than others. Taking the most common of these ciphertext trigrams, maybe it comes from the same three letters of plaintext encrypted under the same three letters of the codeword.

Under this assumption, the distance between any two occurrences of this common ciphertext trigrams will be some multiple of the length of the codeword.

Thus to find the length of the unknown codeword, find the most common trigram in the ciphertext, and find the distance between each occurrence, and then see if those distances have a common factor – that number is likely the length of the codeword.

The next step will show how to use knowledge about the length of the keyword to decrypt the ciphertext. But first we need an easy way of finding this common factor between the distances.

Greatest common divisors

Given two integers (n) and (m), the greatest common divisor (gcd) or largest common factor is the largest integer (d) such that (d) divides (n) and also (d) divides (m).

For small numbers, this can often be found by finding the prime factorisations of (n) and (m). For example, the greatest common divisor of 72 and 90 can easily be seen to be 18, since (72 = 2^3 times 3^2) and (90 = 2 times 3^2 times 5) and the largest factor in common with both numbers is (2 times 3^2 = 18).

For larger numbers, factorising them can be very hard. However, finding the greatest common divisor between two numbers is relatively simple, even for very big numbers. This is due to something called Euclid’s algorithm.

It relies on the fact that if (d) divides both (n) and (m) then (d) also divides any integer linear combination of (m) and (n). For simplicity, let’s take (n > m > 0). The remainder term of (n/m) is the integer (r) such that (n = q times m + r) with (q) an integer (called the quotient) and (0 leq r < m). This means that (r=n – q times m) is an integer linear combination of (m) and (n), and therefore it is divisible by any common divisor of those two numbers.

In particular, taking the largest such (d) shows that (gcd(n,m)) divides (r).

Reversing the argument shows that any divisor of (m) and (r) must also divide (n). Together that shows that (gcd(n,m) = gcd(m,r)).

We now repeat this process with (m) and (r) taking the roles of (n) and (m). Since the remainder terms are strictly decreasing in this process, the last non-zero term must be the greatest common divisor of the original numbers. For large numbers, this process is much more efficient than trying to factor those numbers, as finding the remainder term upon division is straightforward.

Example

To find the greatest common divisor of (n=48,929) and (m=16,021) using Euclid’s algorithm we first divide (n) by (m) and find the remainder. This is (r=866) since (48,929 = 3 times 16,021 + 866).

Next we do the same, but for (16,021) and (866). We find that (16,021 = 18 times 866 + 433).

Now we do the same with (866) and (433). But it is clear that (433) exactly divides (866), so this tells us that (433) is the gcd of (866) and (433). Working back up the chain, this tells us that (433) is the gcd of (16,021) and (866), and hence is also the gcd of (48,929) and (16,021). (One can easily check to see (433) exactly divides both these numbers, and the argument above shows that there can be no larger integer that exactly divides these two numbers).

© University of York
This article is from the free online

The Mathematics of Cryptography: From Ancient Rome to a Quantum Future

Created by
FutureLearn - Learning For Life

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now