“OK Google, Prove the Riemann Hypothesis”

Benjamin Skuse

Since Ngô Bảo Châu’s inaugural talk in 2012, Fields Medal Symposia have explored and celebrated the current and potential impact of an individual Fields Medallist’s work. So when in 2022 Akshay Venkatesh (Fields Medal – 2018) said “we have enough math talks” and suggested using his Symposium to instead “reflect on the ways in which technology is and will affect mathematics,” it turned heads.

Akshay Venkatesh delivering his Fields Medal Symposium talk in 2022. Image credit: Fields Institute.

Where previous Symposia performed a type of public outreach role, Venkatesh’s was a wake-up call for mathematicians to ask deep existential questions about the nature of mathematics and the role of the human mathematician in light of rapidly evolving AI.

More than two years on, the machines have thankfully not fully taken over (yet). But sceptics who suggest that AI is all hype and will have little bearing on how mathematics research is carried out are no longer the dominant voices in the community. Depending on your viewpoint, intriguing, exciting, frightening and depressing are all fitting adjectives for what could be described as a bifurcation point in mathematics, where decisions made right now about how AI is applied will determine the subject’s future direction.

What’s All the Fuss About?

Putting aside the obvious – that calculating tools and computers have become more and more critical in mathematics over a period of centuries – recent developments in computing have seen computers go beyond mechanical computation and start to perform higher-level activities. Formal proof assistants, neural networks and large language models have each shown real potential in assisting in mathematical research.

The most obviously useful of the trio are formal proof assistants. These software tools are built around a special type of computer language designed specifically to verify that a mathematical argument is correct. They translate each step of a mathematical proof into several lines of code, and then attempt to compile the overall result. If the code compiles, the proof is valid. If not, the proof needs some work. Modern proof assistants, such as Coq, Isabelle and Lean, mimic the language and structure of mathematical writing, employing a number of tactics to complete many of the tedious steps in proofs in the background, hidden from the user.
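To give a flavour of what this looks like in practice, here is a minimal illustrative sketch in Lean 4 (assuming Mathlib’s tactic library is available; the theorem name and the choice of tactics are purely for illustration) proving that the sum of two even natural numbers is even. If the file compiles, the proof is accepted; change a single step and it will not:

import Mathlib.Tactic

-- Statement: the sum of two even natural numbers is even.
theorem even_add_even (m n : Nat)
    (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
    ∃ k, m + n = 2 * k := by
  obtain ⟨a, ha⟩ := hm     -- a witness a with m = 2 * a
  obtain ⟨b, hb⟩ := hn     -- a witness b with n = 2 * b
  refine ⟨a + b, ?_⟩       -- propose a + b as the witness for the sum
  rw [ha, hb]              -- goal is now 2 * a + 2 * b = 2 * (a + b)
  ring                     -- routine algebra, discharged by a tactic

Tactics such as ring are exactly the kind of background machinery that quietly dispatches the routine algebra a human author would never bother writing out.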

The first major achievement in serious mathematics by a formal proof assistant was the formal verification of the prime number theorem in Isabelle in 2004. More challenging Fields Medal-level proof verifications ensued, including the four-colour theorem in 2005 and the Feit–Thompson odd order theorem in 2012, both using the Coq theorem prover. A milestone result came in 2022, when researchers formalised a proof in Lean of sphere eversion, i.e. the ability to turn a sphere inside out in 3-space (allowing it to pass through itself, but without creasing or cutting), showing that formal proof assistants are not restricted to algebraic applications. More recently, researchers have even formalised their conjectures and resolutions in ‘real time’, i.e. as part of their research before publication.

The four-colour theorem, whose proof was verified in 2005 by a formal proof assistant, states that no more than four colours are required to colour the regions of any map so that no two adjacent regions have the same colour. Image credit: Strangerpete.

Casting the Net Wide

Finding countless uses in recommendation systems, image recognition, fraud detection and the like, neural networks are a world away from formal proof assistants. Broadly speaking, they spot links between the swathes of data they are trained on and build functions interpolating that data; they then apply these functions to any new data to which they are exposed. Given that they are never 100% accurate and have a certain ‘black box’ character, it is hard to see at face value how neural networks can contribute to the business of research mathematics, whose prime objectives are building rigorous proofs and providing deep understanding of complex mathematics.
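As a toy illustration of that interpolation idea (a minimal sketch in Python, not taken from any of the research described here), a tiny neural network can be fitted to noisy samples of a known function and then queried at points it has never seen:

# Minimal sketch: fit a tiny one-hidden-layer network to noisy samples of
# sin(x) by gradient descent, then query it at inputs not seen in training.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))              # training inputs
y = np.sin(x) + 0.05 * rng.normal(size=x.shape)    # noisy training targets

W1, b1 = 0.5 * rng.normal(size=(1, 32)), np.zeros(32)   # 32 tanh hidden units
W2, b2 = 0.5 * rng.normal(size=(32, 1)), np.zeros(1)
for _ in range(5000):                              # plain full-batch gradient descent
    h = np.tanh(x @ W1 + b1)                       # forward pass
    err = (h @ W2 + b2) - y                        # prediction error (mean-squared loss)
    gW2, gb2 = h.T @ err / len(x), err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)               # back-propagate through tanh
    gW1, gb1 = x.T @ dh / len(x), dh.mean(0)
    W1, b1 = W1 - 0.1 * gW1, b1 - 0.1 * gb1
    W2, b2 = W2 - 0.1 * gW2, b2 - 0.1 * gb2

x_new = np.array([[0.5], [2.0]])                   # points never seen during training
print(np.tanh(x_new @ W1 + b1) @ W2 + b2)          # roughly sin(0.5) and sin(2.0)

The knot result described below follows the same basic pattern at vastly larger scale: train on known examples, then probe the fitted function to discover which inputs actually matter.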

Yet neural networks are beginning to make a mark on the mathematics community. Arguably the biggest success so far came in 2021, when a team trained a neural network on a database of nearly 2 million known knots (together with a million randomly generated additional knots). From this, the model could predict the signature of a knot from a number of invariants (properties used to address the problem of distinguishing knots from each other). The researchers then used these results to home in on just three invariants that appeared to be important in defining the signature of a given knot. This eventually led to a conjecture – which the researchers have since proven rigorously – that a knot’s signature \( \sigma(K) \) is related to its slope, volume and injectivity radius by:

\[ |2\sigma(K) - \textrm{slope}(K)| \le c\, \textrm{vol}(K)\, \textrm{inj}(K)^{-3}. \]

A mathematical knot table exhibit created by Estes Objethos Atelier, as part of a collection by Matemateca (IME-USP), photographed by Rodrigo.Argenton.

An AI Mathematics Assistant

The third and final new mathematical helper on the block is the large language model (LLM). Trained on broad and large datasets of natural language text, these neural network variants weigh each word (or letter, number, etc.) in a string of text against its context in order to predict the next word or sentence. A text prompt from a user thereby generates a text response from the LLM, simulating a conversation (which, as an aside, is also why you can never have the last word with an LLM).
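The core trick of predicting what comes next can be caricatured with a deliberately crude toy (a minimal sketch in Python; nothing like a real LLM, which learns its probabilities with billions of neural network weights over long contexts):

# Toy next-word predictor: count which word follows which in a tiny corpus,
# then repeatedly emit the most likely continuation.
from collections import Counter, defaultdict

corpus = "the proof is valid . the proof needs work . the proof is valid".split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1                     # how often nxt follows current

def continue_text(prompt, length=3):
    words = prompt.split()
    for _ in range(length):
        candidates = follows[words[-1]].most_common(1)
        if not candidates:                         # no known continuation: stop
            break
        words.append(candidates[0][0])             # greedily pick the likeliest word
    return " ".join(words)

print(continue_text("the proof"))                  # -> "the proof is valid ."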

LLMs trained on vast swathes of the internet, e.g. Google’s Gemini, OpenAI’s ChatGPT and DeepSeek, are delivering sophisticated human-like responses. However, although mathematicians are already using these LLMs for literature searches, brainstorming and tasks like writing optimised code, they are currently nowhere near the level required to be useful for serious mathematics. The greatest achievement so far came from an LLM-based system designed by Google DeepMind that generated mathematical proofs to silver-medal standard at the 2024 International Mathematical Olympiad (IMO), a competition pitting the world’s brightest high school mathematics students against one another.

Are We There Yet?

Right now, formal proof assistants, neural networks and LLMs – AI technology, for brevity – are just tools. Neural networks are becoming critical for searching solution spaces in some areas of mathematical research, formal proof assistants offer superhuman accuracy when it comes to verifying proofs, and LLMs are used more and more for bouncing around ideas to help mathematicians solve problems.

Already, researchers are exploring how to combine these tools, with some, for example, developing LLMs that read human-written theorems and proofs and translate them into Lean or another formal proof assistant’s language for verification. But all of these tools still require a healthy dose of human guidance.
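A hypothetical, deliberately trivial example of that translation step (the theorem name here is invented for illustration; real targets are far harder) would take the English sentence “addition of natural numbers is commutative” and emit Lean code for the proof assistant to check:

-- English input: "Addition of natural numbers is commutative."
-- Lean statement and proof proposed by the model, then checked by the assistant:
theorem my_add_comm (m n : Nat) : m + n = n + m := Nat.add_comm m n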

Will this remain the case and, to paraphrase Venkatesh, how will all of this affect mathematics? Inspired by the wake-up call of Venkatesh’s 2022 Symposium, the Bulletin of the American Mathematical Society commissioned a special issue on this topic, soliciting opinions from across mathematics, machine learning and related disciplines.

Kevin Buzzard, a prominent mathematician at Imperial College London, UK, was one contributor. “At the moment, they [AI tools] are machines that can do undergraduate-level mathematics,” he says. “Why would a research mathematician care about that? We’re surrounded by things that can do undergraduate mathematics: hundreds of undergraduates.”

Kevin Buzzard. Image credit: Thomas Angus/Imperial College London.

Buzzard adds that he has not seen any evidence that AI technology can actually do any research-level mathematics, like derive hypotheses or build complex proofs, autonomously. “If you just read the hype, you would think that they did the IMO in 2024, so they’ll be getting PhDs in 2025 and proving the Riemann hypothesis in 2026,” he remarks. “But I personally am slightly sceptical about whether the entire thing is going to work beyond some point.”

Similarly, Ernest Davis of the Courant Institute of Mathematical Sciences, New York University, USA, suggested in his special issue article that human commonsense reasoning is an often ignored but centrally important aspect of mathematical practice, and that this could “raise challenges to building an AI that can read and understand (human-written) mathematical articles.”

Others are less sceptical of the potential of AI in mathematics. Maia Fraser from the University of Ottawa, Canada, an editor of the same special issue, works in both pure mathematics and machine learning. She is unequivocal in her response: “I think it’s quite likely that most things that pure mathematicians can do in terms of proving theorems will be able to be done by machines,” she says. “And I do absolutely see this as an existential threat to mathematicians, in the sense that people are really asleep at the wheel, either assuming that AI will not make these big changes, or that everything will just evolve somehow on its own, and it’ll be fine.”

Maia Fraser. Image credit: University of Ottawa.

Shaping Mathematics for the Future

Putting aside the very deep philosophical debates surrounding, for example, what constitutes a proof in a potential future where automated proofs that no human can understand become the norm, the near-term “big changes” Fraser alludes to boil down to two questions: what will be the role of the human mathematician, and how can the community ensure equal access to the AI tools that will be needed to conduct mathematical research?

Some in the community, such as world-renowned mathematician and 2006 Fields Medallist Terence Tao, see a future where humans and machines could work in tandem to conduct mathematics at currently inconceivable scales. This could, for example, allow analyses of large families of related equations in order to build deep fundamental insights, or the division of labour on huge mathematical problems, such as the Langlands Program, into multiple manageable chunks via large human–AI collaborative efforts.

Others urge caution. “These tools have a learning rate that far exceeds that of humans, so already we’re in a critical stage where it’s important to ask what kind of interaction with machines do we want to have,” Fraser warns. “We tend to neglect that mathematics is an inter-human activity – even if a machine could produce a proof of a theorem, that doesn’t necessarily mean we would want a machine to do that.” She likens this to how society still values sports such as athletics, even though cars and motorcycles can obviously travel far faster than humans can run.

Another special issue contributor, anthropologist Rodrigo Ochigame of Leiden University, the Netherlands, has thought deeply about the ramifications of increasingly leaving mathematical graft to the machines. “The character of mathematical research may change so that the parts of it that are most enjoyable to many mathematicians – gaining what they call a deep understanding – might become a secondary part of the work,” he predicts. “If the everyday work of doing math research becomes training machine learning models and then repeatedly prompting them, a lot of people will find this unsatisfying.”

Rodrigo Ochigame. Image credit: Suédy Mauricio/Leiden University.

Though Buzzard, Fraser and Ochigame have unique outlooks on the future of mathematics, they are completely aligned on one aspect: the need for transparency and equal access to AI tools. Buzzard perhaps sums up the mood best. “What do we know about this DeepMind system that got the silver medal on the IMO? We have a blog post – there’s not even a research paper,” he laments. “It’s extremely frustrating; people in industry are all absolutely keeping their cards close to their chests.”

“I think this is troubling to a lot of mathematicians,” agrees Ochigame. “If raw computational power comes to constitute a major source of advantage for proving theorems, and if mathematicians maintain their first-past-the-finish-line criteria for attributing credit for proving theorems, research mathematics will increasingly depend on large funding for access to supercomputers – and academic departments will find it harder and harder to compete with big tech firms and military agencies.”

Fraser similarly worries about equal access and the effects of AI on human activities. “These tools can be used for things that strengthen what we value, and they can also be used in ways that really damage things that we value,” she explains. “We need to ask what actions we need to take right now before these very quickly developing AI tools become quite dangerous, and hard to regulate appropriately.” 

The post “OK Google, Prove the Riemann Hypothesis” originally appeared on the HLFF SciLogs blog.