
Frequency analysis

Despite the huge number of possible substitution ciphers, they're very easy to break using frequency analysis. Let's see how.

We’ve seen that there are over 403 septillion ways of permuting the 26 letters of the English alphabet. That’s roughly 403 followed by 24 zeros. Checking one permutation per second to see if it yields an English decipherment would take about 13 billion billion years.
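These figures can be checked with a few lines of Python. This is only a sanity check: the variable names are ours, and it assumes a year of 365.25 days.

```python
import math

# Number of ways to permute the 26 letters of the alphabet (26 factorial).
permutations = math.factorial(26)

# Assumption: one permutation checked per second, and a year of 365.25 days.
seconds_per_year = 365.25 * 24 * 60 * 60
years = permutations / seconds_per_year

print(permutations)    # 403291461126605635584000000
print(f"{years:.2e}")  # 1.28e+19, i.e. about 13 billion billion years
```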

However, we know that substitution ciphers were being broken as long ago as the 9th Century AD thanks to al-Kindi’s method based on frequency analysis. This attack was known in Europe by the 1400s.

It comes down to the fact that some letters typically appear more often in English than others do: for example, the letter E gets used much more than the letter X. Since each plaintext letter gets enciphered to the same ciphertext letter, frequency analysis allows us to make a good guess at the permutation that has been used.
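As a rough illustration, the counting step might be sketched in Python as follows. The function name `letter_frequencies` is our own, and the sketch assumes the text uses only the 26 unaccented letters A to Z; everything else (spaces, digits, punctuation) is ignored.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, share) pairs, most common letter first.

    Assumption: only the letters A-Z count; case is ignored.
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, count / total) for letter, count in counts.most_common()]

# Usage: the same counting works on plaintext or ciphertext.
for letter, share in letter_frequencies("Attack at dawn"):
    print(letter, round(share, 2))
```

Because a substitution cipher only relabels the letters, the list of shares for a ciphertext is the same as for its plaintext; only the letters attached to them differ.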

Here are the relative frequencies with which each letter of the alphabet appears in written English.

English letter frequencies

We can see that E is the most common letter, followed by T and then A. So, if we’ve been given some ciphertext which we think may have been produced by applying a substitution cipher to some English plaintext, we can try to decode it by replacing the most common letter in the ciphertext with E, the next most common letter with T, and so on. Once the most common letters are in place, we can hopefully work out the less common letters “by eye”.
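The first-guess procedure described above can be sketched in Python. This is a minimal sketch rather than a complete attack: the names `ENGLISH_ORDER`, `guess_key` and `apply_key` are ours, and the frequency ordering used is one commonly quoted ordering, which we assume here; the text above only confirms that E, T and A come first.

```python
from collections import Counter

# Assumption: a commonly quoted ordering of English letters, from most to
# least frequent. Other sources give slightly different orderings.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def guess_key(ciphertext):
    """Map each ciphertext letter to a guessed plaintext letter by rank."""
    counts = Counter(c for c in ciphertext.upper() if "A" <= c <= "Z")
    ranked = [letter for letter, _ in counts.most_common()]
    # Most common ciphertext letter -> E, next most common -> T, and so on.
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_key(ciphertext, key):
    """Replace every letter that has a guess; leave other characters alone."""
    return "".join(key.get(c, c) for c in ciphertext.upper())
```

On a short ciphertext this first guess will usually be wrong in many places, because the observed frequencies only roughly match the expected ones. It is meant as a starting point: the guessed key can then be corrected letter by letter, which is the “by eye” step.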

In some sense this shouldn’t be a surprise. Children have long been playing hangman, and it’s well known that a good strategy is to try common letters such as E before less common letters such as Z.

For this to work, you need to know what the underlying language of the plaintext is. The distribution of letters in English is very different from the distribution of letters in Welsh:

Welsh letter frequencies

Finally, it’s worth noting some peculiar examples of texts for which the letter frequencies are far from what would be expected in a piece of English writing.

  • Gadsby: A Story of Over 50,000 Words Without Using the Letter “E”, by Ernest Vincent Wright, was published in 1939. More recently, the 1969 novel La Disparition by Georges Perec (which was translated into English by Gilbert Adair as A Void in 1995) tells the story of some friends searching for their missing colleague, Anton Vowl, and does so without any use of the letter E whatsoever.
  • Eunoia by Christian Bök is a book where each vowel appears by itself in its own chapter, so in the first chapter every word contains no vowel other than “A”. The word “Eunoia” is the shortest English word containing all five vowels, and it comes from the Greek for “beautiful thinking”. In rhetoric, the term refers to the goodwill a speaker builds with the audience.
© University of York
This article is from the free online course The Mathematics of Cryptography: From Ancient Rome to a Quantum Future

Created by
FutureLearn - Learning For Life
