This content is taken from the Coventry University & Institute of Coding's online course, An Introduction to Cryptography.


For something to be truly random, it must be non-deterministic and totally unpredictable. For something to be chaotic, it will be well defined and deterministic; its unpredictability arises from a lack of precision. At some point, our understanding of deterministic behaviour may be inadequate to provide a complete model of something, and consequently we see a blurring of the boundary between deterministic chaotic behaviour and pure random behaviour. A coin toss is an execution of an incredibly complex physical model that might be regarded as deterministic and chaotic, yet completely modelling the physical processes involved, the interactions between many fundamental forces, is so far beyond our capabilities that the behaviour of the coin toss could be deemed random.

The outcome of flipping an unbiased coin, heads or tails, cannot be predicted. However many times we flip the coin, we will never learn anything that gives us the ability to predict the outcome of the next toss. This is an example of non-deterministic, random behaviour. Pure randomness is a very useful concept that forms an important component of our cryptographic algorithms: it represents a hypothetical benchmark against which the output of those algorithms can be tested. What, then, is order? Order is represented by a lack of surprise in outcomes.

If you see a binary stream as a repetitive sequence of ones and noughts, you might hypothesise that the next bit in the sequence will be a one. If this is subsequently observed, we may assume some order to the sequence. However, sequences like this occur perfectly naturally in a random generation of bits: apparent order in randomness. With a big enough experiment, such ordered sequences can become arbitrarily long purely as a consequence of the random process. Back in the 1940s, Claude Shannon recognised similarities between these concepts and thermodynamics, and consequently devised a measure called information entropy to represent disorder, uncertainty or surprise.
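This apparent order in randomness is easy to demonstrate with a short simulation: generate a long stream of fair random bits and measure the longest run of identical bits that occurs by chance. A minimal sketch, in which the stream length and seed are arbitrary choices:

```python
import random

random.seed(1)  # fixed seed so the experiment is repeatable
bits = [random.randint(0, 1) for _ in range(100_000)]

# Scan for the longest run of identical bits in the stream.
longest = run = 1
for prev, cur in zip(bits, bits[1:]):
    run = run + 1 if cur == prev else 1
    longest = max(longest, run)

print(longest)  # runs well over 10 identical bits are expected at this length
```

Seeing a long run of identical bits is therefore not, by itself, evidence that the stream is ordered rather than random.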

If we look at a language alphabet comprising 26 letters, it has a potential information content of log to the base 2 of 26, or approximately 4.7 bits. If a letter from the alphabet is selected at random, this uncertainty, or entropy, is 4.7 bits. If a second letter is selected, the entropy for that letter, given the selection of the first, is also 4.7 bits, and the entropy for both letters selected at random is twice 4.7, or 9.4 bits. The information potential of an alphabet is constrained by language, however, as there are rules governing word construction: the use of vowels, double-letter pairs, triples and so on.
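The arithmetic here can be checked directly: the entropy of a uniform choice from 26 symbols is log2(26), and independent choices add, so two random letters carry twice that. A quick sketch:

```python
import math

# Entropy of one letter drawn uniformly from a 26-letter alphabet.
h_one = math.log2(26)
print(round(h_one, 1))   # ≈ 4.7 bits

# Independent selections add: two random letters give log2(26^2) bits,
# which is exactly twice the entropy of one letter.
h_two = math.log2(26 ** 2)
print(round(h_two, 1))   # ≈ 9.4 bits
```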

There are also many arrangements, such as QF or AAE, that are unacceptable within the language's lexicon. As a result, the English language has a per-character entropy of just 1.5 bits, and the difference between the alphabet entropy and the language entropy is called the redundancy, or compressibility, of the language. For a perfect cryptographic algorithm, the per-character entropy should equal the alphabet entropy, i.e. the redundancy should be zero. This is the theoretical ideal: perfect security. You are no more likely to correctly identify the plain text for a perfect cryptographic algorithm with sight of the cypher text than if you had no knowledge of the cypher text at all.
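Using the figures above, the redundancy of English works out as the gap between the two entropies. A small sketch of that calculation (the 1.5 bits per character is the estimate quoted in the transcript):

```python
import math

alphabet_entropy = math.log2(26)   # ≈ 4.7 bits per character
language_entropy = 1.5             # estimate for English, as quoted above

# Redundancy is the entropy 'lost' to the rules of the language.
redundancy = alphabet_entropy - language_entropy
print(round(redundancy, 1))        # ≈ 3.2 bits per character

# Expressed as a fraction of the alphabet entropy, most of each
# English character is redundant, which is why English compresses well.
print(round(redundancy / alphabet_entropy, 2))
```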

# Chaos and order

Can we tell the difference between encrypted text, which looks random to a human observer, and truly random sequences of characters? Here, we investigate ideas of entropy and information.

Humans are good at seeing patterns. In fact, we can visually distinguish between two textures even when they are statistically similar – differences just ‘pop out’ (Julesz 1962).

The kinds of patterns in streams of digits that could tell us the difference between random digits and a sequence containing information are much harder to find, and detecting them requires more than visual inspection.
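One simple statistical tool that goes beyond visual inspection is the empirical Shannon entropy of a byte stream: text carrying linguistic structure scores well below the 8-bits-per-byte maximum, while a truly random stream scores close to it. A minimal sketch, using a repeated English phrase as a stand-in for structured text:

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

english = b"to be or not to be that is the question " * 200
random_stream = os.urandom(8000)

print(round(byte_entropy(english), 2))        # low: structured English text
print(round(byte_entropy(random_stream), 2))  # close to 8 bits per byte
```

A single statistic like this is only a first test; sequences can pass an entropy check while failing others, which is why practical randomness testing uses whole batteries of statistical tests.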