RunSybil cofounders Vlad Ionescu and Ariel Herbert-Voss say their AI tool Sybil flagged a weakness in a customer’s federated GraphQL deployment in Nov 2025 (original: “last November”), potentially exposing confidential data via an API access path. They were struck that finding it required deep knowledge of multiple systems and their interactions. They also claim they could not find the issue documented online and later saw the same pattern in other GraphQL deployments, treating it as a step change in model reasoning.
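The article does not describe the flaw itself, but the general risk in federated GraphQL is that a gateway stitches types from separate services, so a query path through one service can reach fields another service would not expose directly. A minimal sketch of that failure mode, with entirely hypothetical services, fields, and an invented authorization bug (not RunSybil's actual finding):

```python
# Hypothetical sketch: a federated gateway joins types across services,
# so an access path through one service can reach fields another
# service guards. All names and data here are invented.

ORDERS = {  # "orders" subgraph: public-facing data
    "o1": {"id": "o1", "customer_id": "c9"},
}

CUSTOMERS = {  # "customers" subgraph: holds a sensitive field
    "c9": {"id": "c9", "name": "Alice", "internal_notes": "credit hold"},
}

def resolve_customer_direct(customer_id, caller):
    """Direct access to the customers service enforces its own check."""
    if caller != "customers-admin":
        raise PermissionError("not allowed")
    return CUSTOMERS[customer_id]

def resolve_order(order_id, caller):
    """Gateway resolver joining order -> customer by ID.

    Bug: the cross-service hop runs as a trusted internal caller,
    so the customers service never sees the real caller's identity.
    """
    order = ORDERS[order_id]
    customer = resolve_customer_direct(order["customer_id"],
                                       caller="customers-admin")  # trusted hop
    return {**order, "customer": customer}

# An ordinary API caller reaches the guarded field via the join:
leaked = resolve_order("o1", caller="anonymous")["customer"]["internal_notes"]
print(leaked)  # the sensitive field escapes through the order -> customer path
```

The point of the sketch is why the bug is hard to spot: each service's authorization is correct in isolation, and the leak only appears in the interaction between them, which matches the article's claim that finding it required knowledge of multiple systems at once.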

The article argues this progress raises dual-use risk: the same intelligence that detects vulnerabilities can help exploit them. UC Berkeley computer scientist Dawn Song says frontier-model cyber capabilities have increased drastically in the last few months (original phrasing), driven by simulated reasoning (decomposing a problem into its parts) and agentic AI (for example, searching the web and installing and running software tools). She calls the moment an “inflection point.” The piece is dated Jan 14, 2026, 2:00 PM (timezone not stated in the original, so the corresponding UTC+8 time cannot be determined).

Song’s lab co-created the CyberGym benchmark with 1,507 known vulnerabilities across 188 large open-source projects to measure how well models find flaws. Reported results: Anthropic’s Claude Sonnet 4 found about 20% in Jul 2025; Claude Sonnet 4.5 identified 30% in Oct 2025, a +10 percentage-point gain (~50% relative). Proposed defenses include pre-release model sharing with security researchers and secure-by-design AI-generated code, while RunSybil warns near-term offense may outpace defense as models automate computer actions and code generation.
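The reported jump can be checked directly: moving from 20% to 30% of the benchmark is a 10-percentage-point absolute gain, which is a ~50% improvement relative to the earlier model's score. A quick arithmetic check:

```python
# Check the reported CyberGym numbers: absolute vs. relative improvement.
total = 1507           # known vulnerabilities in the benchmark
sonnet_4 = 0.20        # share found by Claude Sonnet 4 (Jul 2025)
sonnet_4_5 = 0.30      # share found by Claude Sonnet 4.5 (Oct 2025)

abs_gain_pp = (sonnet_4_5 - sonnet_4) * 100    # percentage points
rel_gain = (sonnet_4_5 - sonnet_4) / sonnet_4  # gain relative to the old score

print(f"{abs_gain_pp:.0f} pp absolute, {rel_gain:.0%} relative")
# -> 10 pp absolute, 50% relative

# Implied (not stated in the article) counts at the benchmark's scale:
print(f"~{round(sonnet_4_5 * total)} of {total} vulnerabilities identified")
```

This is why the same change reads as a modest "+10 points" or a dramatic "+50%": the first divides by the whole benchmark, the second by the earlier model's score.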

2026-01-16 (Friday) · fe52fa4f33e5206c1187ff57b1726d2e9e5e8b57