From "Searching" to "Reasoning": My Thoughts on the New Frontier of AI Math
By Pagurad
I’ve been following the recent shifts in AI research, and we’re finally moving past the era of the "stochastic parrot." We’re seeing the birth of what researchers call "System 2" thinking - where the AI doesn’t just guess the next word but actually builds a chain of logical reasoning.
Here are a few things that caught my eye this month:
The "First Proof" Experiment
A group of mathematicians just released 10 research-level problems that have never been seen online. It’s a "clean" test designed to stop AI from simply searching its training data. Early results show that while current models are great at "competition" math, they still struggle when there's no internet "cheat sheet" to rely on.
Professional-Grade Discovery
Google DeepMind’s new agent, Aletheia, just hit 95.1% accuracy on the advanced IMO-ProofBench. More impressively, it’s moving into actual autonomous research - calculating complex structures in arithmetic geometry that were previously unsolved by humans.
Solving Erdős Conjectures
We’ve seen GPT-5.2 solve (and in one case, disprove) long-standing Erdős conjectures. The cool part? These aren’t just text outputs; they’re being formalized in Lean (a proof assistant, so every step is machine-checked) and reviewed by experts like Terence Tao.
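To give a feel for what "formalized in Lean" means: you state a theorem in Lean's language and supply a proof term, and the kernel refuses to accept it unless every step checks. Here's a deliberately tiny toy (not one of the Erdős results) in Lean 4:

```lean
-- Toy illustration of Lean formalization: commutativity of addition
-- on the natural numbers. The kernel verifies the proof mechanically;
-- if the proof term were wrong, the file simply wouldn't compile.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The real formalizations are vastly longer, but the guarantee is the same: "verified" means the proof compiled, not that a human skimmed it and nodded.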
The "Slow Thinking" Shift
The secret isn't just bigger models anymore. It’s "inference scaling" - letting the model "think" longer at the moment you ask the question. By scaling compute during the query, the 2026 version of Deep Think has already cut the energy needed for complex problems by 100x compared to last year.
The Bottom Line
We’re transitioning from AI as a high-speed library to AI as an autonomous researcher. It’s a quieter revolution than the "safety" headlines, but for the future of STEM, it might be the most important one.