← 返回 Avalaches

Anthropic 在 2026 年 6 月 10 日撤回 Claude Fable 5 的一项隐蔽防护政策;该政策原本会在使用者以模型进行前沿 LLM 开发时,无提示地降低性能。公司承认「取舍错误」,并改为让防护可见:若侦测到使用者尝试建构高能力 AI,系统将明示拒绝或转接至较低能力模型。

原政策与其他安全措施不同:网路安全、生物学、化学查询会被转接以降低攻击或生物武器风险,但 AI 研究用途则会被秘密降级。批评者如 Dean Ball 指出,对机器学习研究「不告知使用者」地降低性能极具敌意;Will Brown 则称此举可能让少数顶尖 AI 实验室垄断先进研究,并削弱开源与第三方评测生态。

Anthropic 辩称,Claude 已愈来愈能加速 AI 研究,因此防护旨在防止外国对手利用最强模型侵蚀美国及其盟友在前沿晶片与最佳化软体上的优势。公司同时承认,防护改为可见后必须涵盖更广范围,可能使更多良性请求被误触发,并表示正快速提升分类器精准度。

Anthropic reversed a hidden safeguard policy for Claude Fable 5 on June 10, 2026; the policy would have silently degraded performance when users employed the model for frontier LLM development. The company admitted it made the wrong trade-off and changed the safeguard to be visible: if it detects attempts to build highly capable AI, the system will explicitly refuse or reroute the request to a less capable model.

The disputed policy differed from other safety measures: cybersecurity, biology, and chemistry queries would be rerouted to reduce attack or bioweapon risks, but AI research use would be secretly degraded. Critics such as Dean Ball said degrading machine-learning research without telling users was hostile; Will Brown argued it could let a small number of leading AI labs dominate advanced research and weaken open-source and third-party evaluation ecosystems.

Anthropic argued that Claude has become increasingly able to accelerate AI research, so the safeguard was meant to stop foreign adversaries from using its strongest models to erode the US and allied advantage in frontier chips and optimized software. The company also acknowledged that making the safeguard visible requires a wider net, may trigger more benign requests, and said it is rapidly improving classifier precision.

2026-06-14 (Sunday) · aa68e2e3bd8a54215acfaf52b579c7720fc56cf2