OpenAI released GPT‑5.2 on December 11, 2025, calling it its “best model yet.” The launch came after the CEO declared an internal “code red” to refocus resources on ChatGPT amid intensifying competition. GPT‑5.2 ships in three variants: Instant (faster, geared toward information-seeking), Thinking (stronger at coding, math, and planning), and Pro (the most capable, with higher accuracy on hard problems). Much of the pressure comes from Google: Gemini 3 was well received, and the Gemini app has grown rapidly to more than 650 million monthly active users, compared with OpenAI’s roughly 800 million weekly active users.

On OpenAI’s GDPval benchmark, which spans 44 occupations, GPT‑5.2 Thinking achieved the highest scores so far; OpenAI says it beat human professionals on more than 70% of tasks while finishing them about 11× faster. OpenAI also reports fewer factual errors: on factual-question benchmarks, GPT‑5.2 Thinking hallucinated 38% less than GPT‑5.1. The models are available both in ChatGPT and through the OpenAI API, with the aim of improving everyday professional writing, coding, and reasoning.
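To make the 38% figure concrete, here is a short worked example. It assumes the figure is a relative reduction rather than a drop in percentage points (the wording suggests this, but the exact definition is not given here). Writing $r_{5.1}$ for GPT‑5.1’s hallucination rate on a given factual-question benchmark and $r_{5.2}$ for GPT‑5.2 Thinking’s rate on the same benchmark:

$$
r_{5.2} = (1 - 0.38)\, r_{5.1} = 0.62\, r_{5.1}
$$

For instance, a purely illustrative (not reported) GPT‑5.1 rate of 10% would correspond to about 6.2% for GPT‑5.2 Thinking, which is a drop of 3.8 percentage points, not 38.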

Benchmark scores don’t fully capture how a chat model feels to use. After GPT‑5 launched in 2025, users complained that its replies were “colder,” and within days OpenAI shipped an update to make it “warmer,” while still trying to avoid sycophancy (excessive flattery of, or agreement with, the user). The tension between safety and growth remains acute: an October report said more than 1 million people discuss suicide with ChatGPT each week, and related leadership changes followed. An internal memo set a goal of raising daily active users by 5% before 2026. In parallel, OpenAI is rolling out age-prediction protections for users presumed to be under 18 and plans to launch an “adult mode” in Q1 2026.

2025-12-14 (Sunday) · 4e059be1ea6bb5832820e09ba7208b17732af13b