IBM Quantum Challenge 2020 - Part 5
This post is part of a serie of posts on my experience with the IBM Quantum Challenge 2020.
The serie is organised as follow:
- Introduction to the problem.
- Mathematical reformulation of the problem.
- Unoptimised implementation of the solution.
- How to optimise a quantum circuit?.
- Optimising multi-controlled
X
gates (this post). - Optimising the
QRAM
(Quantum Random Access Memory).
In the last blog post. I showed how to benchmark a quantum circuit to isolate the optimisation cadidates.
In this post I will use the call-graph generated in the previous blog post (re-included below) to perform the first optimisation.
65k - Improving multi-controlled X gates
The call-graph obtained in the last section is very useful in
understanding the first optimisation target: the mcx_gray
procedure accounts for more than 95% of the total cost!
This basically means that if I am able to halve the cost of implementing a multi-controlled X gate, I will nearly halve the total cost!
The question is now “how to reduce the multi-controlled X gate cost?”.
The first step to answer this question is to understand that a
multi-controlled X gate should be decomposed with only 1- and 2-
qubit gates. Within the context of the challenge, this restriction
comes from the fact that the cost of a circuit can only be
computed if the circuit is exclusively composed of cx
and u3
gates.
The decomposition step can be performed by following different algorithms that are more or less efficient.
Benchmarking mcx
algorithms
It is easy to see from the documentation of QuantumCircuit.mcx that qiskit implements several construction algorithms that have different behaviours:
- “noancilla”, the default construction algorithm, implements a multi-controlled X gate without needing any additional ancilla qubits.
- “recursion” needs 1 ancilla qubit if we need more than 4 control qubits.
- “v-chain” needs $c - 2$ ancilla qubits, $c$ being the number of control qubits.
The benchmark of these different methods is quite straightforward:
import matplotlib.pyplot as plt
from qiskit import QuantumCircuit, QuantumRegister
# Warning: the "noancilla" method generates circuits with an exponential
# number of gates. Setting this value to something higher will
# likely increase by a lot the computational time.
max_qubits = 15
for method in ["noancilla", "recursion", "v-chain"]:
costs = list()
for ctrl_number in range(3, max_qubits):
qctrl = QuantumRegister(ctrl_number)
qtarget = QuantumRegister(1)
qancilla = QuantumRegister(ctrl_number - 2)
circuit = QuantumCircuit(qctrl, qtarget, qancilla)
circuit.mcx(qctrl, qtarget, ancilla_qubits=qancilla, mode=method)
costs.append(cost(circuit))
print(ctrl_number, method, costs[-1])
plt.semilogy(range(3, max_qubits), costs, label=method)
plt.title("Evolution of mcx cost for different construction algorithms")
plt.xlabel("Number of controls")
plt.ylabel("Cost of the multi-controlled X gate")
plt.grid(which="both")
plt.legend()
plt.show()
And here is the graph produced by the previous code snippet:
The “v-chain” method is a clear winner here, even if it requires more ancilla qubits. So let’s change the non-optimised code to use the “v-chain” method instead of the default “noancilla” one.
Changing the code to use the best mcx
implementation
The code resulting from applying this optimisation is available in this Gist. With this small change the overall circuit cost went down from 144 423 to 65 429!
Re-using qprof
, the following call-graph is generated:
The overall number of gates droped significantly but the most
costly procedure is still the mcx
gate. This is good hint for
what should be the next optimisation: as I already optimised each
mcx
gate, the only optimisation left is to try to lower down
the number of mcx
gates used.
According to the call-graph, the mcx
gates are called by the 2
main quantum circuits:
- The QRAM (
qram
andqram_dg
) that accounts for more than 60% of the overall cost. - The oracle (
check_if_match_any_permutation
) that represents 37% of the overall cost.
Because both the QRAM and the oracle takes a large portion of the overall cost, I will eventually need to optimise both of them.
The next blog post will
be about QRAM
optimisation. The oracle optimisation will be
discussed in another blog post.