OpenAI launched its new image-generation model, ChatGPT Images 2.0, on Tuesday, April 21, 2026, at 3:00 PM. The model can produce multiple images from a single prompt, including a set resembling a full study booklet, and can render text, including non-English scripts such as Chinese and Hindi. It is available globally to ChatGPT and Codex users, with a more powerful version reserved for paid subscribers. The article frames the update as part of the ongoing cycle of model competition: as with earlier major image-model launches, meme-style sharing by users could drive its spread across social media. The new version can also use its reasoning capabilities to search the web, and its knowledge cutoff has moved to December 2025. It offers finer output control as well, supporting aspect ratios from 3:1 (wide) to 1:3 (tall) and letting users specify dimensions in the prompt.
In testing, images were generally more detailed than before, and English text rendering improved in particular. The model also handled a prompt for an infographic showing the next day's weather forecast and recommended activities in San Francisco, producing visual elements that looked coherent and matched local landmarks. The article notes that the improvement is marked compared with older systems, where text often came out as malformed characters or stray symbols. Two years ago, ChatGPT was unreliable at labeling text in images; this release is a clear step forward in readability and layout, though it remains only one stage in the continuing evolution of image generation.
A multilingual check exposed the current limits. The author requested a Timothée Chalamet-themed collage inspired by the aesthetics of his Chinese fan base and received an image with more than 20 text snippets mixed with imagery such as dumplings, boba tea, and a panda. The style was maximalist and the composition remained coherent, but when asked to translate the text, ChatGPT reported a large number of fabricated or semi-garbled words, along with passages that mixed Chinese and Japanese or contained malformed characters. By its own assessment, many of the text regions were "fake" or "semi-gibberish" imitations of Chinese rather than complete, correct sentences. The takeaway is that English text performance has clearly improved while multilingual parity has not caught up; broader global usage data may still drive faster progress in future versions.