Anthropic faces a central paradox: among the major AI firms it invests the most in safety, yet it is accelerating toward a more dangerous next stage of AI. Two documents released in January 2026 acknowledge this tension. The first, “The Adolescence of Technology,” is a 20,000-plus-word essay by CEO Dario Amodei. It stresses the severity of AI risks, including what Amodei treats as a high probability of authoritarian abuse, while devoting comparatively little space to solutions; its tone shifts from his earlier optimism to something darker, though it still closes on the claim that humanity has always prevailed.
The second document, “Claude’s Constitution,” points the solution inward, toward the chatbot Claude itself. It updates Anthropic’s Constitutional AI method, moving from an external list of rules to a long, prompt-style ethical framework that instructs the model to exercise “independent judgment” when balancing helpfulness, safety, and honesty. The revision, led by Amanda Askell, who holds a PhD in philosophy, argues that understanding why rules exist is more robust than rote compliance, and it explicitly aims to give the model fast, intuitive ethical judgment.
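The structural shift the document describes can be sketched in code. The fragment below is a hypothetical illustration, not Anthropic’s implementation: OLD_RULES, CONSTITUTION, and build_messages are invented names, and the “constitution” text is a few-line paraphrase of the approach described above. It contrasts a fixed external checklist with a single long system prompt that asks the model to weigh principles itself.

```python
# Hypothetical sketch of the shift "Claude's Constitution" describes.
# All names and text here are illustrative assumptions, not Anthropic's code.

# The older pattern: a fixed checklist of external rules, applied mechanically.
OLD_RULES = [
    "Refuse requests for dangerous information.",
    "Never reveal system internals.",
    # ... more standalone rules, each matched without regard to its rationale
]

# The newer pattern: one long, prompt-style framework that explains the
# reasoning behind the rules and asks the model to exercise judgment.
CONSTITUTION = """\
You are a helpful, honest, and safe assistant.
Rather than matching requests against a fixed rule list, use independent
judgment: weigh helpfulness to the user against potential harm and against
honesty, and explain your reasoning when these values conflict.
Understand why a rule exists before applying it.
"""

def build_messages(user_prompt: str) -> list[dict]:
    """Assemble a chat transcript with the constitution as the system prompt."""
    return [
        {"role": "system", "content": CONSTITUTION},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    for m in build_messages("How should I respond to a risky request?"):
        print(f"{m['role']}: {m['content']}")
```

In the checklist pattern, safety lives in matching logic wrapped around the model; in the prompt-style constitution, it lives in text the model is asked to interpret, which is why the document bets on understanding over rote compliance.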
Anthropic’s wager is that Claude can match and eventually surpass the best human judgment, dissolving the industry’s standing dilemma: if the technology is dangerous, why build it? Similar thinking appears in statements from Sam Altman, who has suggested that OpenAI may one day hand leadership to an AI. In the optimistic scenario, AI leaders manage institutions more efficiently and more empathetically than humans; in the pessimistic one, they are manipulated by bad actors or abuse their autonomy. Either way, Anthropic has at least offered a plan, one centered on the model’s own growth in “wisdom.”