On Trial: P versus NP and the Complexity of Complexity

Benjamin Skuse

If I asked you to prove in a court of law whether sorting a long list of numbers in order from lowest to highest was as complex a task as, say, solving a massive Sudoku puzzle, you might think I had lost my marbles. You would certainly question why taxpayers’ money was being squandered on a trial about a frivolous topic.

Yet, there may be more merit to bringing the case to court than first impressions would suggest. The relative difficulty of tasks like these is the fundamental question underlying one of the most pernicious problems in mathematics and computer science, one that has remained unsolved since its formulation in 1971: the P versus NP problem. A solution to this problem would have huge ramifications in the real world, impacting medicine, artificial intelligence, internet security and a host of other areas. For these reasons, P versus NP is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute as the most important to solve in our time.

The Civil Case

The “P” of P versus NP stands for “polynomial time.” A computer program runs in polynomial time if, when you increase the size of the input, the time the (idealised version of a) computer takes to complete its given task grows no faster than some fixed power of the input size; it gets slower, but at a manageable rate. List-sorting is a perfect example of a P problem: there are known and straightforward methods of sorting the list, and of verifying that the list is sorted correctly, that don’t scale at some absurd rate as the length of the list increases.
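To make this concrete, here is a minimal Python sketch (an illustration added here, not taken from the article) of both halves of the claim: sorting a list and verifying that it is sorted, each in polynomial time.

```python
# Rough sketch: list-sorting lives comfortably in P.
# Sorting takes on the order of n log n comparisons; verifying takes a
# single linear pass. Neither blows up as the list grows.

def sort_list(numbers):
    """Return the list ordered from lowest to highest (polynomial time)."""
    return sorted(numbers)

def is_sorted(numbers):
    """Check the list is in non-decreasing order with one pass (linear time)."""
    return all(numbers[i] <= numbers[i + 1] for i in range(len(numbers) - 1))

data = [42, 7, 19, 3, 23]
ordered = sort_list(data)
print(ordered)             # [3, 7, 19, 23, 42]
print(is_sorted(ordered))  # True
```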

Illustration of how the difficulty grows at a reasonable rate as the input grows larger for problems that can be solved in polynomial time (e.g. O(n), O(n²)), and how the difficulty of solving other problems grows at an accelerated rate (e.g. O(2ⁿ), O(n!)).

Expanding the “NP” of P versus NP is less informative (for the record, it stands for “nondeterministic polynomial time”), but in essence, NP problems are hard to solve and easy to verify. A generalised version of Sudoku – where instead of having a 9 × 9 grid you allow the grid to be arbitrarily large (n² × n²) – is an NP problem. There is no known way to solve it that is significantly faster than brute-force checking of every possible combination of numbers to fill the board. As a result, when you increase n, the time to solve the puzzle scales exponentially or worse; i.e. it gets really hard to solve, really quickly. However, verifying a solution remains polynomial in time, as you can run a relatively simple and quick algorithm to check that the digits entered in all rows, columns and boxes follow all the rules; i.e. it remains easy to verify.
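The “easy to verify” half is worth seeing in code. The sketch below is another added illustration, assuming the grid arrives as a list of rows of integers; it checks a candidate solution to an n² × n² Sudoku in time polynomial in the grid size.

```python
# Rough sketch: verifying a generalised Sudoku solution is polynomial-time.
# The grid is n^2 x n^2 and every row, column and n x n box must contain
# each of the symbols 1..n^2 exactly once.

def verify_sudoku(grid, n):
    size = n * n
    symbols = set(range(1, size + 1))
    rows = grid
    cols = [[grid[r][c] for r in range(size)] for c in range(size)]
    boxes = [
        [grid[br * n + r][bc * n + c] for r in range(n) for c in range(n)]
        for br in range(n) for bc in range(n)
    ]
    # Each group must be exactly the full set of symbols.
    return all(set(group) == symbols for group in rows + cols + boxes)

# A valid 4 x 4 solution (n = 2).
solution = [
    [1, 2, 3, 4],
    [3, 4, 1, 2],
    [2, 1, 4, 3],
    [4, 3, 2, 1],
]
print(verify_sudoku(solution, 2))  # True
```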

Relating what you now know about the P versus NP problem to list-sorting and taking on a massive Sudoku puzzle, the answer to the question of which is easier to solve seems even more obvious – list-sorting, hands-down! If you then applied the same train of thought to a large selection of P problems and NP problems and presented your findings in a civil court of law, it would be easy to argue that – on the balance of probabilities – the class of list-sorting-type problems is not equivalent to the class of massive Sudoku-like problems, i.e. P ≠ NP, and you would win your case easily.

The Criminal Case

Image credits: Nick Youngson CC BY-SA 3.0 Alpha Stock Images.

But mathematicians and computer scientists require a higher standard of proof than that. In fact, society demands a higher standard of proof than that, because if P does turn out to equal NP then, in principle, currently intractable problems would become easy. And this fact could be used for good – such as optimising transport routing and finding new medicines – or ill, including hacking bank accounts and government websites. Like juries in a criminal court determining the guilt of defendants, we all need to know beyond reasonable doubt whether or not P = NP. And that is a much tougher proposition.

Towards this aim, researchers have shown that a huge number of seemingly hard NP problems are “NP-complete,” which means that an efficient solution to any one of them could be adapted to solve every other NP problem quickly too. For example, generalised Sudoku is NP-complete, because other classic hard problems that are known to be NP-complete – including the (clearly similar) Latin square completion problem and the (on the face of it, very dissimilar) Hamiltonian cycle problem – can be reduced to it. They are – in a precise mathematical sense – equivalent.

Therefore, to prove whether P does or does not equal NP, all a plucky researcher has to do is discover a clever algorithmic trick that solves an NP-complete problem in polynomial time (P = NP) or, alternatively, find just one NP-complete problem that they can prove is not solvable quickly by any computer program (P ≠ NP). However, despite half a century of effort from some of the brightest minds, proof – one way or another – remains elusive.

Complexity of Complex Problems

In recent years, many researchers have taken a step back from attempting to directly solve P versus NP. Instead, they have been questioning why solving P versus NP and similar problems is difficult in the first place. They have been asking questions such as “how hard is it to determine the difficulty of various computational problems?” and “why is it hard to determine how difficult they are?” These and other “meta” problems are the foundations for an area of mathematics and computer science called “meta-complexity”; both a topic of study and a tool that researchers are attempting to use in order to attack problems such as P versus NP.

An important focus for meta-complexity researchers at the moment is a particular problem with a long history that continues to defy clear classification: the minimum circuit size problem (MCSP). MCSP is interesting for several reasons, not least because it is one of the few challenging computational complexity problems left where it remains unclear whether it is NP-complete or not.

Found to play a central role in diverse areas such as cryptography, proof complexity, learning theory and circuit lower bounds, MCSP asks: given only the input-output behaviour of a Boolean function, can you determine whether it has high or low circuit complexity? A Boolean function takes as input a string of zeros and ones and outputs a single zero or one (false or true). From this you can construct a truth table: a tabular representation of all the combinations of values for inputs and their corresponding outputs. Essentially, the truth table provides the input-output behaviour of the function, but gives no direct information about the function’s computational complexity. That complexity is captured by circuit complexity, defined as the total number of logic gates needed to build the smallest circuit that can compute the given function.

With these clarifications, the problem can be presented more precisely: MCSP takes as input the description of a Boolean function f as a truth table, together with a circuit size parameter s, and asks “is there a circuit of size ≤ s that computes f?” As MCSP is both computationally complex and about computational complexity, it is unusual in being a meta-, meta-complexity problem, or a meta²-complexity problem. The fact that researchers are trying to figure out why it is hard to determine how difficult MCSP is to solve means it could be argued to be even more meta; a meta³-complexity problem, perhaps.
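As a rough illustration of what an MCSP instance looks like (added here; the majority-of-three function and the size bound s = 4 are made-up examples), the snippet below writes out a truth table, which together with a size parameter forms the input to MCSP.

```python
# Rough sketch: the input to MCSP is a truth table plus a size bound s.
from itertools import product

def truth_table(f, k):
    """All 2^k input/output pairs of a k-input Boolean function."""
    return [(bits, f(*bits)) for bits in product([0, 1], repeat=k)]

def majority3(a, b, c):
    """Example function: outputs 1 iff at least two of the three inputs are 1."""
    return int(a + b + c >= 2)

table = truth_table(majority3, 3)
for inputs, output in table:
    print(inputs, "->", output)

# The MCSP question for this instance would then be, e.g.:
# "is there a circuit with at most s = 4 gates whose behaviour matches this table?"
```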

For the simplest Boolean functions, MCSP can be solved by an algorithm that runs in polynomial time. But the vast majority of functions seem to require an exponentially increasing number of gates as the functions get larger. Like generalised Sudoku, MCSP seems impossible to solve without a brute-force search, but is easy to verify if you are given a solution: you can guess a circuit, run every possible input and see if the output matches that of the given Boolean function. It is therefore clearly an NP problem. But is it also in P? Is it an NP-complete problem, i.e. would a fast way of solving it mean that all problems in NP are actually in P? Many in the community suspect MCSP is NP-complete and not in P – and proving that it lies outside P would definitively mean that P ≠ NP.
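Here is a small sketch of that verification step (another added illustration; the gate-list circuit format and the XOR example are assumptions made for the demo, not a standard from the article). Given a guessed circuit, it simply evaluates it on every input and compares the results against the target truth table.

```python
# Rough sketch: verifying a guessed circuit against a truth table.
from itertools import product

# A circuit is a list of gates; wires 0..k-1 are the inputs and each gate
# appends a new wire. The last wire is the circuit's output.
def evaluate(circuit, bits):
    wires = list(bits)
    for op, args in circuit:
        if op == "NOT":
            wires.append(1 - wires[args[0]])
        elif op == "AND":
            wires.append(wires[args[0]] & wires[args[1]])
        elif op == "OR":
            wires.append(wires[args[0]] | wires[args[1]])
    return wires[-1]

def computes(circuit, table):
    """Does the circuit reproduce every row of the truth table?"""
    return all(evaluate(circuit, bits) == out for bits, out in table)

# A guessed 4-gate circuit for XOR: (a OR b) AND NOT(a AND b).
xor_circuit = [("OR", (0, 1)), ("AND", (0, 1)), ("NOT", (3,)), ("AND", (2, 4))]
xor_table = [(bits, bits[0] ^ bits[1]) for bits in product([0, 1], repeat=2)]
print(computes(xor_circuit, xor_table))  # True
```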

Progress Towards a Resolution?

In recent years, researchers have made significant progress towards showing that MCSP is NP-complete. For example, a string of proofs, many from the past six years, have shown that variants of MCSP are NP-complete. Perhaps some of the most interesting progress has been made by Shuichi Hirahara from the National Institute of Informatics in Tokyo.

In 2018, Hirahara derived a worst-case to average-case reduction for MCSP. Here, “average-case” complexity is a gauge of how easy or hard a problem is for most inputs, whereas “worst-case” complexity is the standard approach, considering the maximum complexity of the problem over all possible inputs. Problems where average-case and worst-case complexity may differ are interesting, as they can provide insights into the foundational complexity of these problems. Hirahara’s result implies that a hypothetical average-case MCSP algorithm would actually be powerful enough to solve, without making any mistakes, a slightly different version of MCSP in which the types of circuits allowed are restricted. This result is exciting because no NP-complete problem is known to have such a worst-case to average-case reduction. Given the equivalence of all NP-complete problems, it therefore provides a new avenue for research into the P versus NP problem.

Next, in 2022, using a blend of complexity theory and advanced information-theoretic cryptography, Hirahara proved that MCSP on partial Boolean functions is NP-hard (NP-hard problems are at least as hard as NP-complete problems, but do not necessarily have to be in NP). A partial Boolean function has a truth table that contains the usual zeros and ones, but also some “don’t know” entries: rows where the output could be either zero or one, because you do not care what the circuit does there. Somehow, adding this “don’t know” ambiguity allows a host of useful techniques to be deployed to resolve the partial-function version of MCSP, but so far it is unclear whether the same techniques can be applied to resolve the original MCSP problem.
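A tiny sketch (again an added illustration, with a made-up two-input example) shows how the “don’t know” entries relax the verification: a circuit only needs to agree with the rows that are actually specified.

```python
# Rough sketch: checking a circuit's outputs against a *partial* truth table,
# where None marks the "don't know" rows the circuit is free to fill either way.

def consistent_with(outputs, partial_table):
    return all(
        wanted is None or outputs[bits] == wanted
        for bits, wanted in partial_table.items()
    )

# Two-input partial table: the (1, 0) row is left unspecified.
partial_table = {(0, 0): 0, (0, 1): 1, (1, 0): None, (1, 1): 1}

# The outputs of a plain OR gate happen to be consistent with it.
or_outputs = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
print(consistent_with(or_outputs, partial_table))  # True
```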

Euler diagram for P, NP, NP-complete and NP-hard set of problems. The left side is valid under the assumption that P ≠ NP, while the right side is valid under the assumption that P = NP. Image credits: Behnam Esfahbod.

Another researcher, Rahul Ilango from the Massachusetts Institute of Technology, has been working to prove MCSP’s NP-completeness in multiple ways, looking at both simpler and more complicated versions of MCSP as entry points from which to attack the main problem. His most recent meta-complexity result succeeds in linking MCSP with the seemingly disparate set of Boolean satisfiability problems (SATs), where you are given a representation of a Boolean function as a formula or circuit and asked questions about its behaviour (e.g. whether it has a one in its truth table, or whether it outputs a certain pattern of zeros and ones).
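To pin down what such a question looks like, here is a sketch of the most basic SAT-style question (an added illustration; the three-variable formula is invented for the example): given the formula itself, decide by brute force whether any input makes it output one.

```python
# Rough sketch: a "given the representation" question about a Boolean function.
from itertools import product

def formula(a, b, c):
    # Explicit representation: (a OR b) AND (NOT a OR c)
    return (a | b) & ((1 - a) | c)

def has_a_one(f, k):
    """Does any of the 2^k inputs make the formula output 1?"""
    return any(f(*bits) == 1 for bits in product([0, 1], repeat=k))

print(has_a_one(formula, 3))  # True, e.g. a=0, b=1, c=1
```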

MCSP is what is known as a “black box problem”, because you want to know something about a function’s possible representations (its circuits) while being given only black-box access to its input-output behaviour. SATs are the opposite. Known as “white box problems”, SATs aim to say something about a function’s behaviour, given its representation. In 2023, Ilango and colleagues made the startling discovery that SAT reduces to MCSP with a random oracle, i.e. when a perfectly random function is made accessible to every circuit and algorithm in the picture. As SAT is well known to be NP-complete, MCSP with a random oracle must be NP-complete in that setting too.

These and other recent results add to the case that the original MCSP is in fact NP-complete too. And, if this is true, MCSP could hide the missing piece of evidence that proves beyond reasonable doubt whether P does or does not equal NP.

Much like the problem itself, the trial of P versus NP has certainly been complex and lengthy, but progress is being made, and it feels like researchers are finally inching closer to a verdict.


The post On Trial: P versus NP and the Complexity of Complexity originally appeared on the HLFF SciLogs blog.