# Modular Exponentiation

Modular Exponentiation

The key to Shor’s algorithm is the insight that the function (f(x) = a^x bmod N) is periodic, and that we can extract that period using the quantum Fourier transform. But this requires us to calculate (f(x)) in superposition. Can we do that efficiently?

## The State

The state we need is the superposition of (|xrangle|f(x)rangle) for all of the possible values of (x).

For reasons related to cleanly finding the period, we need the number of qubits in (|xrangle) to be twice the number of bits in the number (N). If we want to find the factors of a 1,024-bit number (substantially larger than the 768-bit number that is the best solved classically), we should use 2,048 qubits to represent (|xrangle).

This means that the exponent – the power we are raising to – is as much as (2^{2^{2049}-1}) (all ones in a 2,048-bit number). You might guess that the numbers grow very quickly, and indeed they do. Let’s pick (a = 3). The “small” number (3^{1024} = 373391848741020043532959754184866588225409776783734…), and that’s still just the beginning; to factor a 1,024-bit number, we would need to calculate the enormous number (3^{2^{2048}}), which has more than (10^{600}) digits in it!

Fortunately, we don’t have to keep all of the digits, since we are working with modulo arithmetic. (Good thing, too, since writing down such a number would take more than all of the atoms in the Universe, even if we could write one decimal digit in each atom.) Rather than try to calculate the whole thing, then take the modulo, instead we do all of multiplications modulo (N) as we go along, and the math works out just fine.

So, to write down our state, if (N) is a 1,024-bit number, we need 2,048 qubits to hold (|xrangle) and 1,024 to hold (|f(x)rangle). We will also need various temporary variables during the computation, so we might need a total of five or six thousand qubits in the fastest, most straightforward form of the computation.

## The Algorithmic Idea

If you are asked to calculate (3^{1024}), how will you do it? The most straightforward way is to keep multipling by three: (3^1, 3^2, 3^3, 3^4, …) which gives us (3, 9, 27, 81, 243, …). Repeat that a full 1,024 times. But that is extremely tedious even for exponents as small as a thousand, and instead we need to do this for exponents that are many, many digits long when we write them out. We would be stuck with the same problem of the exponential growth in computation that we’re trying to avoid. Above we solved the space problem by only keeping the modulo results; now we need to solve the time problem.

There is a simple fix: repeated squaring. Instead of multiplying by 3 each time, square the result of the last computation:

(3times 3 = 9 = 3^2)
(9times 9 = 81 = 3^4)
(81times 81 = 6561 = 3^8)
(vdots)
(3^{512}times3^{512} = 3^{1024})
(textrm{(a really big number)}timestextrm{(same really big number)})
(= textrm{(that enormous number above)})
(vdots)

and so on. This will require (2^n) repetitions for (n) bits.

## The Arithmetic

The modular exponentiation clearly requires (2^n) modulo multiplications. The most straightforward way to multiply is just the way you learned in school: compute all of the partial products, one digit at a time, then sum them. Thus, we create (n) partial products of (n) qubits each, and summing them will take (n) additions. Each of those (n) additions will require (O(n)) gates, using the CNOT and CCNOT gates we discussed earlier in this course.

But it’s worse than that: performing the modulo operations requires a lot of work to decide if we have grown bigger than (N), and if so then work to subtract (N) from the register. This multiplies our work by a factor of five or so, depending on the details of the algorithm we choose to implement. However, it is still (O(n)) operations.

Our complete modular exponention, then, requires

So our execution time for this phase of the algorithm is (O(n^3)) to factor an (n)-bit number, which is polynomial rather than exponential.

Space (number of qubits), time and fidelity (probability of no errors occurring) are our biggest worries when trying to build a quantum computer to run a particular application.

Various optimizations have reduced the number of qubits from the (5n) or more that we mentioned above to about (2n) qubit to factor an (n)-bit number, but they are complex and generally slow. At any rate, thousands of extremely high-quality qubits are necessary before Shor’s algorithm solve problems that we can’t solve classically. This is a significant engineering challenge, as we will see later in the course.

The performance depends on the circuit depth: how many gates in a row do we have to execute? Sometimes, we can reduce the amount of time necessary by executing some gates in parallel (at the same time). This depends on both the algorithm we are running, and the structure of the quantum computer itself, most importantly, how easy it is to move data from one side of the quantum computer to the other.

The required fidelity depends on the number of qubits in our system and that circuit depth. To figure out how good our quantum computer has to be, multiply the number of qubits times the number of time steps, and we are allowed less than one error, on average, in that many gate operations. Let’s say that we need (5n = 5120) qubits, and we require about (5n^3 approx 5times 10^9) time steps, so we can have less than one error in about (25n^4 approx 2.5times 10^{13}) gates. That’s pretty high precision! (This computation isn’t exact, but gives you the idea.) Other algorithms require fewer qubits and fewer timesteps, and may be more tolerant of errors; finding those boundaries is an important area of research.

# 冪乗余

Shorのアルゴリズムにとって重要なポイントは、関数(f(x) = a^x bmod N)が周期的であり、量子フーリエ変換を用いるとその周期を抽出することができるという点にあります。しかし、これを求めるには重ね合わせ状態の(f(x))を計算することが必要になります。効率よく計算するにはどうすれば良いでしょうか？

## 状態について

ここで、取りうる全ての(x)に関する重ね合わせ状態(vert xranglevert f(x)rangle)が必要になります。

これは、指数が(2^{2^{2049}-1})よりも大きいということを意味します。数字が非常に急速に伸びると推測でき、実際そうなっています。 例えば、(a=3)とすると(3^{1024} = 373391848741020043532959754184866588225409776783734…)となります。1,024ビットでいえば(3^{2^{2048}})もの膨大な数の計算が必要となり、これはなんと(10^{600})桁以上にものぼります！

しかしながら幸運なことにこの場合モジュロ演算で作業しているのですべての桁を書き留める必要はありません。(このような数を書き留めていくと宇宙のすべての原子の数より多くの数が取り込まれるため、各原子に10進数の1桁を書き込むこともできます）。つまり全体を計算するのではなく、代わりにモジュロを取ることで(N)を法としてすべての乗法を行うという方法にすれば膨大な計算もやりやすくなります。

したがって、状態を書き留めるには(N)が 1,024ビットの数であれば2,048量子ビット、(vert xrangle)を保持するには1,024が必要です。計算中には一時的に様々な変数が必要になるため、場合によっては最も速くで最も単純な形式の計算でも合計5〜6000量子ビットが必要になります。

## アルゴリズムの発想

3の1024乗を計算する時、皆さんならどうするでしょうか？例えば(3^1, 3^2, 3^3, 3^4, …)が (3, 9, 27, 81, 243, …)という調子で1024回繰り返すという方法があるでしょう。しかしこの方法だとたとえ1000ほどの指数であったとしても非常に煩雑になりますし、大きな桁の指数を書くのもなかなか大変です。同様に我々はしばしばこの計算の指数関数的増加という問題によく悩まされます。上記の説明では空間的問題を解決しましたが今度は時間的問題を解決する必要があります。

ただし次のように計算を工夫することができます。毎回3を掛ける代わりに、毎回の計算の結果を2乗していくという方法です。

(3times 3 = 9 = 3^2)
(9times 9 = 81 = 3^4)
(81times 81 = 6561 = 3^8)
(vdots)
(3^{512}times3^{512} = 3^{1024})
(textrm{(a really big number)}timestextrm{(same really big number)})
(= textrm{(that enormous number above)})
(vdots)

この調子で(2^n)を(n)ビット分繰り返していきます。

## 数学的解釈

しかし残念なことに実際にモジュロ演算を実行するには、計算結果が(N)より大きくなったかどうかを判断した後にもし大きければ レジスタから(N)減算するというかなり煩雑な行程が必要となってしまいます。もちろん実装しようとしているアルゴリズムにもよりますがこれでは計算量は5倍程度に増加してしまいます。

(n)ビットの因数分解を指数関数ではなく多項式で行うアルゴリズムのこの段階での実行時間は (O(n^3))になります。

## まとめ

このコースの後半でも説明しますが、あらゆる最適化によって、(n)ビットの数の因数分解に必要な量子ビットは、(5n)以上の量子ビットから約(2n)量子ビットに減少しました。しかしながら、依然このプロセスは複雑でもちろん計算速度も十分に早いとは言えません。 そのためShorのアルゴリズムが古典的には解決できない問題を解決するためには何千もの非常に高品質な量子ビットが必要になってくるという重要な工学的課題をクリアしていく必要があります。