Looking back at bipedal humanoid robots in 2015, the picture was closer to an "Orwellian" imbalance: Boston Dynamics' Spot (a quadruped) was stable on stairs and recovered well from external pushes, while humanoids still fell routinely. By 2026, even though Tesla had shifted resources toward Optimus and startups were still selling Android-butler concepts, Scott Kuindersma and Jonathan Hurst pointed out that even Atlas and Digit, the field's most representative systems, had "not reliably" solved every stairs-and-doorway problem. In other words, the past decade brought significant progress, but everyday, general-purpose bipedal operation remains far from solved.
What explains this coexistence of progress and stagnation is three paradigm shifts. First, deep learning (paired with fast GPUs) dramatically accelerated computer vision and reinforcement learning. Second, an actuation revolution after 2016 replaced heavy hydraulics with smaller proprioceptive electric motors. Third, in 2023 DeepMind's vision-language-action (VLA) models tied perception, planning, and control into a single pipeline. Together, these took Atlas from the stumbling machines of the 2015 DARPA Robotics Challenge to fluid dance routines and autonomous object handling, a night-and-day difference in coordination and speed, yet still not a general-purpose solution.
The core bottleneck is physics, and above all force control. Researchers broadly agree that human-level dexterity requires mastering the mechanical control of force and inertia. Classical force control has a history of more than 40 years, yet it has rarely been integrated naturally with modern machine learning. Although quasi-direct-drive actuators improve transparency and make control more stable, reinforcement learning still runs on the order of a million iterations in simulation, and what it learns are mostly position policies rather than explicit force policies; meanwhile, most video and demonstration data lack the crucial force signals. The machines themselves remain heavy and insufficiently compliant, so delicate objects (an egg, say) cannot be handled safely without slowing down to compensate. Kuindersma, Hurst, Parada, and Tedrake all agree that position-based control alone cannot support a useful, reliable, general-purpose humanoid; what is needed is a new combination of hardware and software, of sensing and data collection, and no one can say when the problem will truly be solved.
When I last reviewed humanoid robotics in 2015, the field looked almost "Orwellian" in its imbalance: Boston Dynamics' Spot, a quadruped, was stable on stairs and under pushes, while bipeds frequently fell. By 2026, despite headlines about Tesla prioritizing Optimus and startups marketing Android butlers, experts said the flagship systems still had real limits. In interviews, Scott Kuindersma and Jonathan Hurst stated that neither Atlas nor Digit can reliably handle every stairs-and-door case. So the decade from 2015 to 2026 brought major gains, yet routine, household-scale bipedal behavior is still not robustly solved.
The transition came through three technological shifts. First, deep learning on fast GPUs improved computer vision and reinforcement learning, making both perception and control trainable at practical speeds. Second, after 2016, actuators shifted from heavy hydraulic systems to smaller proprioceptive electric motors with better compliance-like behavior. Third, in 2023 DeepMind introduced vision-language-action (VLA) models, integrating perception, planning, and control in one multimodal pipeline. These changes explain why Atlas evolved from the unstable "Running Man" style gait of the 2015 DARPA era to demonstrations of smoother movement, breakdancing, and object transfer under disturbance. The gains are large, but not complete.
The remaining barrier is physics, especially force control. Multiple researchers argued that true human-like capability requires mastering force and inertia, not just pose tracking. Force control has a history of more than 40 years in robotics, but machine-learning systems still mostly learn positional policies. Kuindersma, Kim, and others noted that quasi-direct-drive designs simplify hardware and actuation, and that reinforcement learning now iterates policies across millions of simulation runs; yet force remains only implicit in the learned behavior, because video and demonstration data usually omit force signals. Heavy, relatively stiff machines still struggle with delicate manipulation unless they slow down. All interviewees agreed that purely position-based control cannot deliver reliable, all-purpose dexterity. Progress likely needs a new blend of hardware, tactile sensing, and learning that treats force as a first-class variable, and no one expects it to arrive soon.
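The stiffness problem described above can be illustrated with a toy simulation. The sketch below is a hedged, minimal 1-D model of my own construction (the masses, gains, and rigid-wall contact model are all illustrative assumptions, not taken from any real robot): the same spring-damper control law, run with stiff position-tracking gains and then with compliant impedance-style gains, produces two very different peak contact forces when a commanded target sits slightly past a hard surface, which is exactly the egg-crushing failure mode of position-only control.

```python
# Toy 1-D contrast between stiff position control and compliant,
# impedance-style control at a contact. All parameters are
# illustrative assumptions for the sketch, not real robot values.

def contact_force(stiffness, damping, target, pos, vel):
    """Spring-damper control law: F = K*(x_d - x) + D*(0 - v)."""
    return stiffness * (target - pos) + damping * (-vel)

def simulate(stiffness, damping, steps=2000, dt=1e-3):
    """Drive a 1 kg fingertip toward a rigid wall at x = 0 and
    return the peak force pressed into the contact."""
    mass, pos, vel = 1.0, -0.05, 0.0   # start 5 cm from the wall
    target = 0.01                      # commanded 1 cm *past* the surface
    peak = 0.0
    for _ in range(steps):
        f = contact_force(stiffness, damping, target, pos, vel)
        if pos >= 0.0:
            # Idealized rigid wall: motion stops, the wall takes the load
            peak = max(peak, f)
            pos, vel = 0.0, 0.0
        else:
            # Semi-implicit Euler integration in free space
            vel += (f / mass) * dt
            pos += vel * dt
    return peak

stiff_peak = simulate(stiffness=20000.0, damping=50.0)  # position-tracking gains
soft_peak = simulate(stiffness=200.0, damping=20.0)     # impedance-style gains
print(f"stiff controller peak contact force:     {stiff_peak:.0f} N")
print(f"compliant controller peak contact force: {soft_peak:.0f} N")
```

With these made-up numbers, the stiff controller presses about two orders of magnitude harder into the wall than the compliant one, even though both track the same target. Real impedance and force controllers are far richer than this, but the sketch shows why a policy that only commands positions, with no notion of contact force, cannot be trusted around fragile objects.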