FormalInception-ATP is an agent system built for automated theorem proving. We ran it over the miniF2F benchmark and it closed every one of the 244 problems, with each proof checked by the Lean 4 compiler.* As far as we know, nobody has reported a perfect score on this set before.
Each problem was given to the prover with only its informal and formal statements as input. The system relies solely on the Lean compiler, Mathlib search, and deterministic tactic search. It has no access to the web. We used Gemini 3.5 Flash and Gemini 3 Flash as the underlying models.
Every proof was checked with Comparator on Lean 4.29.0, and all of them passed. The full set of proofs is below, together with a script that reproduces the verification, so anyone can re-run it and confirm the claim for themselves.
We plan to open-source the pipeline soon, and will write up how it works in more detail shortly after.
Authors
Cite this result
* Disclaimer: The 244/244 result was not obtained in a single run of the system. Due to a limited compute budget, it is the union of several runs with different agent settings. Cost is at Google Cloud flex pricing.