World models are emerging as a distinct AI frontier alongside chatbots such as ChatGPT and Claude, because they can understand 3D space and physics in ways language models cannot. The article says Nvidia Corp., Alibaba Group, and Tencent Holdings Ltd. each released a world model in the past two weeks, signaling new competition around physical AI for robots, smart glasses, self-driving cars, and other systems. The landscape is more commercially varied than the large language model market: Tencent's HY-World 2.0 is open source, Nvidia's model is limited to researchers, and China appears to trail far less than it did in the LLM race.
The technical gap is that chatbots are strong text imitators but lack object permanence and real-world grounding. They can describe a room yet still fail on tasks such as judging whether a sofa fits through a doorway or predicting a ball's path after it hits a wall. Companies are trying different ways to collect real-world data: Niantic Spatial is turning Pokemon Go's mapping data into a Large Geospatial Model, DoorDash is paying gig workers to film chores such as folding laundry and washing dishes, and Instacart is working with Nvidia on a sensor-equipped shopping cart that gathers data for advertising and inventory management. Google DeepMind is also betting on world models with Genie 3, while some researchers frame the goal as adding action and simulation capability to language intelligence.
The investment and market picture remains unsettled, but the stakes are large. World Labs, founded by Fei-Fei Li, raised $1 billion in February and is building Marble for gaming, virtual reality, and robotics training. Li says the path to profitability is still not obvious, and Wall Street wants to see the technology mature into clear use cases. She argues synthetic data will be critical because rich 3D data is scarce online, which implies future systems may train heavily on AI-generated footage. The broader outlook is more open and geographically diverse than the closed American language-model market. China could matter especially: Barclays researchers say it shipped roughly 85% to 90% of the world's humanoid robots last year, giving its firms a potential advantage in the next decade of physical AI.