
David Silver, who developed AlphaGo at Google DeepMind, the system that in 2016 demonstrated Go mastery beyond imitation, now argues that current AI is on the wrong path: superintelligence should be pursued through reinforcement learning rather than LLM-style imitation. He compares human data to a fossil-fuel shortcut, whereas systems that learn for themselves can, like renewable energy, iterate without limit. LLMs, he claims, are trained on books and web text and so remain constrained by human priors, a limitation he illustrates with a thought experiment about a world that still believes the Earth is flat. His new firm, Ineffable Intelligence, therefore aims to build AI agents that can discover new science, technology, governance, and economics on their own and move beyond human intelligence.

Quantitatively, Ineffable Intelligence has raised $1.1 billion in seed funding at a $5.1 billion valuation, a sum described as exceptional by European AI standards. The company plans to train agents in simulated environments, learning through trial and error and through collaboration, to build broader general intelligence. Silver has recruited researchers from Google DeepMind and other frontier labs, and he says any returns from his equity, potentially billions of dollars if the company succeeds, will be donated in full to high-impact charities. Backers such as Ravi Mhatre and Sonya Huang frame Silver as a rare, world-class researcher whose trajectory has been consistent: scaling intelligence with fewer human priors. They also credit his reputation and research record as key to attracting top talent.

The article places the claim in historical context: the idea of machines learning through experience predates modern AI, reaching back to Alan Turing's era, and it gained fresh theoretical weight when Rich Sutton and Andrew Barto won the 2025 Turing Award for their early reinforcement learning work. The current AI race is dominated by heavy spending on LLM-related compute and infrastructure, accompanied by warnings of a bubble; investors, however, point to maturing simulation technology and growing compute as reasons Silver's approach is now feasible. The key risk remains alignment: agents trained in simulation may learn behavior that is optimal for their objective function yet misaligned with human preferences. Silver counters that simulation permits visible behavioral testing, observing how an agent interacts with lower-capability agents, as an early mechanism for safety and value calibration.

2026-04-29 (Wednesday) · 703b0005bc449268031c5a3208735e9aaec62b98