SenseTime (founded in 2014) open-sourced an image generation and understanding model, SenseNova U1, on Tuesday, April 29, 2026, claiming markedly faster performance than top models from US competitors. The key is that U1 can read images natively without first translating them into text, which shortens inference and reduces compute needs. Cofounder and chief scientist Dahua Lin said the reasoning process is no longer limited to text because the model can reason directly with images, and he argued that such direct visual processing could improve robots' understanding of the physical world and their response speed.
With US export controls limiting Chinese firms' access to the most advanced AI chips (especially training chips, largely supplied by Western companies such as Nvidia), SenseTime says U1 can run on Chinese-made chips. On release day, ten Chinese chip designers, including Cambricon and Biren Technology, announced hardware support for U1. Lin said SenseTime will keep pushing to train on a wider range of chips, while acknowledging it may still need the best chips to maintain iteration speed.
U1 was released for free on Hugging Face and GitHub, in line with a broader rise in Chinese participation in open-source AI; SenseTime hopes the public release will help it catch up with domestic and Western rivals after falling behind newer Chinese startups such as DeepSeek and MiniMax. In its technical report, SenseTime claims U1 produces higher-quality images than other open-source models and is comparable to leading Chinese closed models such as Alibaba's Qwen and ByteDance's Seedream, though it still trails GPT-Image-2.0, released a week earlier. Its main selling point is much faster image generation, enabled by a new architecture called NEO-Unify. Hugging Face researcher Adina Yakefu called the approach more ambitious but practically challenging. The model is also small enough to run on PCs and phones, and it is positioned for robotics use cases (including work with ACE Robotics) as well as geospatial understanding and real-world simulation.