In 2015, humanoid robots were known mainly for falling over, while quadrupeds such as Boston Dynamics’ Spot looked far more stable. A decade later, by 2026, humanoids have improved markedly: companies are preselling robotic assistants, and Tesla has even shifted its product strategy toward its Optimus robot. Yet core capabilities remain unsolved. Researchers note that even the most advanced humanoids, Boston Dynamics’ Atlas and Agility Robotics’ Digit, cannot reliably handle every staircase or door. The bulky Atlas placed second in the 2015 DARPA Robotics Challenge; the latest version can dance and move objects autonomously, but general robustness in real-world environments remains limited.

This progress stems from three technological shifts. The first is deep learning on GPUs, which markedly improved robot vision and reinforcement learning. Researchers train neural-network policies over millions of simulated trials, so that a single “whole-body controller” handles balance, collision avoidance, and coordination together. The second, around 2016, was the arrival of proprioceptive actuators (quasi-direct-drive actuators), which replaced hydraulic systems with lighter, more compliant electric motors and let robots tolerate errors and impacts. The third is the vision-language-action (VLA) model, introduced in 2023, which combines video input with natural language to generate action steps directly, enabling robots to plan multistep tasks such as emptying a dishwasher or preparing food.
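To make the idea of training a controller in simulation concrete, here is a deliberately tiny sketch. It is not the deep reinforcement learning pipeline described above: the "robot" is a one-joint inverted pendulum, the policy is two feedback gains rather than a neural network, and plain random search stands in for a reinforcement learning algorithm. All names and constants (`rollout`, `random_search`, `FALL_ANGLE`, and so on) are assumptions chosen for illustration.

```python
import math
import random

# All constants below are illustrative assumptions, not values from any real robot.
G_OVER_L = 9.81    # gravity divided by pendulum length (a 1 m pole)
DT = 0.01          # simulation time step in seconds
MAX_STEPS = 500    # episode cap: 5 simulated seconds
FALL_ANGLE = 0.8   # tilt (radians from upright) that counts as a fall
U_MAX = 20.0       # actuator limit on the normalized torque


def rollout(gains, theta0):
    """Simulate one trial; return how many steps the pendulum stayed up."""
    k_theta, k_omega = gains
    theta, omega = theta0, 0.0
    for step in range(MAX_STEPS):
        if abs(theta) > FALL_ANGLE:
            return step
        # Linear feedback policy, clipped to the actuator limit.
        u = -(k_theta * theta + k_omega * omega)
        u = max(-U_MAX, min(U_MAX, u))
        # Semi-implicit Euler step of the inverted-pendulum dynamics.
        omega += (G_OVER_L * math.sin(theta) + u) * DT
        theta += omega * DT
    return MAX_STEPS


START_ANGLES = (-0.2, -0.1, 0.1, 0.2)  # varied initial tilts


def evaluate(gains):
    """Average survival time over several start conditions."""
    return sum(rollout(gains, a) for a in START_ANGLES) / len(START_ANGLES)


def random_search(iterations=300, noise=5.0, seed=0):
    """Hill climbing: keep a random perturbation only if it scores higher."""
    rng = random.Random(seed)
    best_gains = (0.0, 0.0)
    best_score = evaluate(best_gains)
    for _ in range(iterations):
        candidate = tuple(g + rng.gauss(0.0, noise) for g in best_gains)
        score = evaluate(candidate)
        if score > best_score:
            best_gains, best_score = candidate, score
    return best_gains, best_score


baseline_score = evaluate((0.0, 0.0))          # no control at all
learned_gains, learned_score = random_search()
print(f"no control: {baseline_score:.0f} steps, after search: {learned_score:.0f} steps")
```

The loop of simulating, scoring, and keeping what works is the same pattern that large-scale training follows, only with far richer simulators, policies, and optimizers.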

Despite these advances, humanoid robotics is not yet a scientifically solved problem; the key bottleneck is control of force and inertia. Classical robotics achieved precise force control more than 40 years ago using spring-damper models, but that approach requires extensive prior knowledge of the robot, environment, and task, so it generalizes poorly. Modern reinforcement learning mostly learns position policies, and force regulation tends to emerge only as an indirect side effect. Because robot bodies are typically stiffer and have higher inertia than human bodies, they are prone to errors in delicate contact tasks. Current systems often sidestep the problem by moving slowly. Many researchers argue that truly general humanoid robots will require better tactile sensors, more data, and learning methods that treat force as a primary variable.
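The spring-damper idea, and the reason slow motion helps, can be illustrated with a one-dimensional sketch. It assumes a point mass that touches a stiff surface with some approach speed while a virtual spring-damper pulls it toward a set-point 5 mm inside the surface. The function names, the lumped `contact_stiffness`, and every numeric value are hypothetical choices for illustration, not parameters of any real robot.

```python
def spring_damper_force(stiffness, damping, x_goal, x, v):
    """Commanded force of a virtual spring-damper pulling toward x_goal."""
    return stiffness * (x_goal - x) - damping * v


def peak_contact_force(mass, approach_speed, contact_stiffness=50_000.0,
                       stiffness=500.0, damping=20.0,
                       x_wall=0.0, x_goal=0.005, dt=1e-4, steps=5000):
    """Simulate from the instant of contact; return the largest wall force (N).

    contact_stiffness lumps together how stiff the robot link and the
    surface are (an assumption of this toy model).
    """
    x, v = x_wall, approach_speed
    peak = 0.0
    for _ in range(steps):
        # The wall can only push back, never pull.
        penetration = max(0.0, x - x_wall)
        contact = contact_stiffness * penetration
        peak = max(peak, contact)
        command = spring_damper_force(stiffness, damping, x_goal, x, v)
        # Semi-implicit Euler integration of the point mass.
        v += (command - contact) / mass * dt
        x += v * dt
    return peak


light_slow = peak_contact_force(mass=1.0, approach_speed=0.05)
light_fast = peak_contact_force(mass=1.0, approach_speed=0.5)
heavy_fast = peak_contact_force(mass=5.0, approach_speed=0.5)
print(f"light/slow: {light_slow:.1f} N, light/fast: {light_fast:.1f} N, "
      f"heavy/fast: {heavy_fast:.1f} N")
```

In this toy model the peak contact force grows with both approach speed and moving mass, which is consistent with the observation that stiff, high-inertia robots fall back on moving slowly. Note that the controller needs the set-point and a model of the contact in advance, which reflects the prior-knowledge requirement that limits classical methods.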