3.14

# Modular Exponentiation

The key to Shor’s algorithm is the insight that the function $f(x) = a^x \bmod N$ is periodic, and that we can extract that period using the quantum Fourier transform. But this requires us to calculate $f(x)$ in superposition. Can we do that efficiently?

## The State

The state we need is the superposition of $|x\rangle|f(x)\rangle$ for all of the possible values of $x$.

For reasons related to cleanly finding the period, we need the number of qubits in $|x\rangle$ to be twice the number of bits in the number $N$. If we want to find the factors of a 1,024-bit number (substantially larger than the 768-bit number that is the best solved classically), we should use 2,048 qubits to represent $|x\rangle$.

This means that the exponent – the power we are raising to – is as much as $2^{2^{2049}-1}$ (all ones in a 2,048-bit number). You might guess that the numbers grow very quickly, and indeed they do. Let’s pick $a = 3$. The “small” number $3^{1024} = 373391848741020043532959754184866588225409776783734...$, and that’s still just the beginning; to factor a 1,024-bit number, we would need to calculate the enormous number $3^{2^{2048}}$, which has more than $10^{600}$ digits in it!

Fortunately, we don’t have to keep all of the digits, since we are working with modulo arithmetic. (Good thing, too, since writing down such a number would take more than all of the atoms in the Universe, even if we could write one decimal digit in each atom.) Rather than try to calculate the whole thing, then take the modulo, instead we do all of multiplications modulo $N$ as we go along, and the math works out just fine.

So, to write down our state, if $N$ is a 1,024-bit number, we need 2,048 qubits to hold $|x\rangle$ and 1,024 to hold $|f(x)\rangle$. We will also need various temporary variables during the computation, so we might need a total of five or six thousand qubits in the fastest, most straightforward form of the computation.

## The Algorithmic Idea

If you are asked to calculate $3^{1024}$, how will you do it? The most straightforward way is to keep multipling by three: $3^1, 3^2, 3^3, 3^4, ...$ which gives us $3, 9, 27, 81, 243, ...$. Repeat that a full 1,024 times. But that is extremely tedious even for exponents as small as a thousand, and instead we need to do this for exponents that are many, many digits long when we write them out. We would be stuck with the same problem of the exponential growth in computation that we’re trying to avoid. Above we solved the space problem by only keeping the modulo results; now we need to solve the time problem.

There is a simple fix: repeated squaring. Instead of multiplying by 3 each time, square the result of the last computation:

$3\times 3 = 9 = 3^2$
$9\times 9 = 81 = 3^4$
$81\times 81 = 6561 = 3^8$
$\vdots$
$3^{512}\times3^{512} = 3^{1024}$
$\textrm{(a really big number)}\times\textrm{(same really big number)}$
$= \textrm{(that enormous number above)}$
$\vdots$

and so on. This will require $2n$ repetitions for $n$ bits.

## The Arithmetic

The modular exponentiation clearly requires $2n$ modulo multiplications. The most straightforward way to multiply is just the way you learned in school: compute all of the partial products, one digit at a time, then sum them. Thus, we create $n$ partial products of $n$ qubits each, and summing them will take $n$ additions. Each of those $n$ additions will require $O(n)$ gates, using the CNOT and CCNOT gates we discussed earlier in this course.

But it’s worse than that: performing the modulo operations requires a lot of work to decide if we have grown bigger than $N$, and if so then work to subtract $N$ from the register. This multiplies our work by a factor of five or so, depending on the details of the algorithm we choose to implement. However, it is still $O(n)$ operations.

Our complete modular exponention, then, requires

So our execution time for this phase of the algorithm is $O(n^3)$ to factor an $n$-bit number, which is polynomial rather than exponential.

## Comments

Space (number of qubits), time and fidelity (probability of no errors occurring) are our biggest worries when trying to build a quantum computer to run a particular application.

Various optimizations have reduced the number of qubits from the $5n$ or more that we mentioned above to about $2n$ qubit to factor an $n$-bit number, but they are complex and generally slow. At any rate, thousands of extremely high-quality qubits are necessary before Shor’s algorithm solve problems that we can’t solve classically. This is a significant engineering challenge, as we will see later in the course.

The performance depends on the circuit depth: how many gates in a row do we have to execute? Sometimes, we can reduce the amount of time necessary by executing some gates in parallel (at the same time). This depends on both the algorithm we are running, and the structure of the quantum computer itself, most importantly, how easy it is to move data from one side of the quantum computer to the other.

The required fidelity depends on the number of qubits in our system and that circuit depth. To figure out how good our quantum computer has to be, multiply the number of qubits times the number of time steps, and we are allowed less than one error, on average, in that many gate operations. Let’s say that we need $5n = 5120$ qubits, and we require about $5n^3 \approx 5\times 10^9$ time steps, so we can have less than one error in about $25n^4 \approx 2.5\times 10^{13}$ gates. That’s pretty high precision! (This computation isn’t exact, but gives you the idea.) Other algorithms require fewer qubits and fewer timesteps, and may be more tolerant of errors; finding those boundaries is an important area of research.