A new approach called Absolute Zero Reasoner (AZR) shows how AI can learn by asking itself questions rather than relying on human examples or labeled tasks. Developed by researchers from Tsinghua University, the Beijing Institute for General Artificial Intelligence, and Pennsylvania State University, the system uses a single large language model to generate challenging but checkable Python coding problems, solve them, verify the solutions by running code, and then use success or failure as feedback to refine the model.
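
To make the propose-solve-verify idea concrete, here is a minimal Python sketch of the execution-based checking step. The hard-coded task and answer strings stand in for model generations, and the (program, input, output) task layout is an assumption; the article only states that problems are verified by running code and that success or failure serves as the feedback signal.

```python
"""
Minimal sketch of an execution-checked task, as described above.
`proposed_program`, `proposed_input`, and `solver_answer` are placeholders
for model generations; the (program, input, output) format is an assumption.
"""

import subprocess
import sys


def run_snippet(source: str, timeout_s: int = 5) -> tuple[bool, str]:
    """Run generated code in a subprocess and report success plus its output."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode == 0, proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return False, ""


# Stand-ins for a model-proposed task: a small program plus an input.
proposed_program = "def f(xs):\n    return sorted(set(xs))"
proposed_input = "[3, 1, 3, 2]"

# Executing the proposed program on the proposed input yields the
# ground-truth answer; tasks that fail to execute would simply be discarded.
ok, expected = run_snippet(proposed_program + f"\nprint(f({proposed_input}))")

# Stand-in for the solver's answer to "what does f return on this input?".
solver_answer = "[1, 2, 3]"

# Binary reward from execution alone: no human labels anywhere in the loop.
reward = 1.0 if ok and solver_answer == expected else 0.0
print(f"task valid: {ok}, expected: {expected}, reward: {reward}")
```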

Experiments show that this self-play style of learning significantly improves the coding and reasoning abilities of the open-source Qwen models at both 7 billion and 14 billion parameters, even outperforming some models trained on human-curated data. A key feature is scalability: as the model becomes more capable, the difficulty of the problems it generates rises with it, resembling how humans move from imitation to independent inquiry.
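
As a rough illustration of how difficulty can keep pace with capability, the snippet below rewards proposed tasks that the current solver gets right only some of the time, so trivial and impossible tasks earn nothing and generation drifts toward the solver's frontier. The specific formula is an assumption for illustration, not taken from the article, which only states that generated problems get harder as the model gets stronger.

```python
"""
Sketch of a "learnability"-style reward for the proposing role.
The formula is illustrative, not the article's.
"""


def proposer_reward(solver_successes: list[bool]) -> float:
    """Score a proposed task by how learnable it is for the current solver.

    solver_successes: outcomes of several solver attempts on the task.
    Tasks the solver always or never solves score zero; tasks near the
    edge of its ability score highest.
    """
    if not solver_successes:
        return 0.0
    solve_rate = sum(solver_successes) / len(solver_successes)
    if solve_rate in (0.0, 1.0):
        return 0.0  # too easy or too hard: nothing to learn from
    return 1.0 - solve_rate  # harder-but-solvable tasks score higher


# A task solved 1 time in 4 is worth more to the proposer than one
# solved 3 times in 4, nudging new tasks toward greater difficulty.
print(proposer_reward([True, False, False, False]))  # 0.75
print(proposer_reward([True, True, True, False]))    # 0.25
print(proposer_reward([True, True, True, True]))     # 0.0
```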

For now, the method works best in domains with clear verification, such as math and programming, but researchers believe it could extend to more complex agent-based tasks. As conventional training data grows scarcer and more expensive, approaches that allow models to learn without human input are gaining attention at major AI labs. Proponents argue that such systems could eventually surpass human teaching and point toward a path to superintelligence.

2026-01-11 (Sunday) · b1ba0ad7f8a56153eafb494ef9acda669053ad56