# IBM Quantum Challenge 2020 - Part 5

This post is part of a serie of posts on my experience with the IBM Quantum Challenge 2020.

The serie is organised as follow:

- Introduction to the problem.
- Mathematical reformulation of the problem.
- Unoptimised implementation of the solution.
- How to optimise a quantum circuit?.
- Optimising multi-controlled
`X`

gates (this post). - Optimising the QRAM (Quantum Random Access Memory).
- Optimising the oracle

In the last blog post. I showed how to benchmark a quantum circuit to isolate the optimisation cadidates.

In this post I will use the call-graph generated in the previous blog post (re-included below) to perform the first optimisation.

## 65k - Improving multi-controlled X gates

The call-graph obtained in the last section is very useful in
understanding the first optimisation target: the `mcx_gray`

procedure accounts for more than 95% of the total cost!

This basically means that if I am able to halve the cost of implementing a multi-controlled X gate, I will nearly halve the total cost!

The question is now “how to reduce the multi-controlled X gate cost?”.

The first step to answer this question is to understand that a
multi-controlled X gate should be decomposed with only 1- and 2-
qubit gates. Within the context of the challenge, this restriction
comes from the fact that the cost of a circuit can only be
computed if the circuit is exclusively composed of `cx`

and `u3`

gates.

The decomposition step can be performed by following different algorithms that are more or less efficient.

**excellent**blog post from Craig Gidney: Constructing Large Controlled Nots.

### Benchmarking `mcx`

algorithms

It is easy to see from the documentation of QuantumCircuit.mcx that qiskit implements several construction algorithms that have different behaviours:

- “noancilla”, the default construction algorithm, implements a multi-controlled X gate without needing any additional ancilla qubits.
- “recursion” needs 1 ancilla qubit if we need more than 4 control qubits.
- “v-chain” needs \(c - 2\) ancilla qubits, \(c\) being the number of control qubits.

The benchmark of these different methods is quite straightforward:

```
import matplotlib.pyplot as plt
from qiskit import QuantumCircuit, QuantumRegister
# Warning: the "noancilla" method generates circuits with an exponential
# number of gates. Setting this value to something higher will
# likely increase by a lot the computational time.
max_qubits = 15
for method in ["noancilla", "recursion", "v-chain"]:
costs = list()
for ctrl_number in range(3, max_qubits):
qctrl = QuantumRegister(ctrl_number)
qtarget = QuantumRegister(1)
qancilla = QuantumRegister(ctrl_number - 2)
circuit = QuantumCircuit(qctrl, qtarget, qancilla)
circuit.mcx(qctrl, qtarget, ancilla_qubits=qancilla, mode=method)
costs.append(cost(circuit))
print(ctrl_number, method, costs[-1])
plt.semilogy(range(3, max_qubits), costs, label=method)
plt.title("Evolution of mcx cost for different construction algorithms")
plt.xlabel("Number of controls")
plt.ylabel("Cost of the multi-controlled X gate")
plt.grid(which="both")
plt.legend()
plt.show()
```

And here is the graph produced by the previous code snippet:

The “v-chain” method is a clear winner here, even if it requires more ancilla qubits. So let’s change the non-optimised code to use the “v-chain” method instead of the default “noancilla” one.

### Changing the code to use the best `mcx`

implementation

The code resulting from applying this optimisation is available in this Gist. With this small change the overall circuit cost went down from 144 423 to 65 429!

Re-using `qprof`

, the following call-graph is generated:

The overall number of gates droped significantly but the most
costly procedure is still the `mcx`

gate. This is good hint for
what should be the next optimisation: as I already optimised each
`mcx`

gate, the only optimisation left is to try to lower down
the number of `mcx`

gates used.

According to the call-graph, the `mcx`

gates are called by the 2
main quantum circuits:

- The QRAM (
`qram`

and`qram_dg`

) that accounts for more than 60% of the overall cost. - The oracle (
`check_if_match_any_permutation`

) that represents 37% of the overall cost.

Because both the QRAM and the oracle takes a large portion of the overall cost, I will eventually need to optimise both of them.

The next blog post will
be about `QRAM`

optimisation. The oracle optimisation will be
discussed in another blog post.