In 1998 Thomas Hales claimed a proof of the Kepler conjecture: that hexagonal stacking, in which each sphere touches six neighbours in its own layer and sits in the hollow formed by three spheres in the layer below, is the densest possible packing of equal spheres. Peers accepted the proof only after more than a decade of rechecking. The episode highlights a central bottleneck in mathematics, which is trust: a proof is accepted only once every step, symbol by symbol and proposition by proposition, has been validated.
Several AI projects now target this bottleneck. They translate natural-language arguments into formal languages such as Lean and generate checkable proofs for optimisation problems. The efforts include a DARPA research programme, DeepMind’s AlphaEvolve and DeepThink, Harmonic’s Aristotle and Math, Inc.’s Gauss, which completed formalisations of sphere packing in 8 and 24 dimensions within weeks. AlphaEvolve can be prompted in natural language, but users must still be able to tell whether each step it returns is correct.
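To make "formal proof" concrete, here is a hypothetical illustration that does not come from the article. It takes the informal claim "the sum of two even natural numbers is even", states it in Lean 4 with explicit witnesses `m` and `n`, and proves it. The theorem name and formulation are assumptions chosen for readability. The sketch uses only Lean 4 core and the built-in `omega` tactic, with no Mathlib.

```lean
-- Hypothetical illustration (not from the article): the informal claim
-- "the sum of two even natural numbers is even", with explicit witnesses.
-- Uses Lean 4 core only; no Mathlib import is needed.
theorem even_add_even (a b m n : Nat)
    (ha : a = 2 * m) (hb : b = 2 * n) :
    a + b = 2 * (m + n) := by
  -- `omega` decides linear arithmetic over Nat and Int;
  -- it uses the hypotheses ha and hb to close the goal.
  omega
```

If the conclusion were changed to something false, such as `a + b = 2 * (m + n) + 1`, `omega` would fail and Lean would reject the proof. This is the kind of mechanical, step-by-step check that the projects above aim to produce at scale.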
Researchers also report structural limits. LLMs tend to reason as an improvised stream rather than follow a planned proof strategy the way a human would. They also lag behind people at making non-obvious links across fields, at simplifying proofs into elegant forms and at transferring techniques from one type of problem to another, and they can break down on edge cases. On one travelling-salesman-style problem, Claude solved the odd-numbered cases but failed on the even-numbered ones. ChatGPT 5.4 Pro was reported to complete those cases only a few weeks later. Progress is fast but uneven, and mathematicians remain essential for oversight and for framing problems.
Source: AI models could offer mathematicians a common language
Subtitle: Some hope they will simplify the process of verifying proofs
Dateline: April 9, 2026, 04:24 AM