Paper Title
Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition
Paper Authors
Paper Abstract
We study the problem of realizing the full spectrum of bipedal locomotion on a real robot with sim-to-real reinforcement learning (RL). A key challenge of learning legged locomotion is describing different gaits, via reward functions, in a way that is intuitive for the designer and specific enough to reliably learn the gait across different initial random seeds or hyperparameters. A common approach is to use reference motions (e.g., trajectories of joint positions) to guide learning. However, finding high-quality reference motions can be difficult, and the trajectories themselves narrowly constrain the space of learned motion. At the other extreme, reference-free reward functions are often underspecified (e.g., "move forward"), leading to massive variance in policy behavior, or are the product of significant reward shaping via trial and error, making them exclusive to specific gaits. In this work, we propose a reward-specification framework based on composing simple probabilistic periodic costs on basic forces and velocities. We instantiate this framework to define a parametric reward function with intuitive settings for all common bipedal gaits: standing, walking, hopping, running, and skipping. Using this function, we demonstrate successful sim-to-real transfer of the learned gaits to the bipedal robot Cassie, as well as a generic policy that can transition between all of the two-beat gaits.
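To make the idea of composing periodic costs on forces and velocities more concrete, the following is a minimal Python sketch. It is not the authors' formulation: the abstract does not give the reward equations, so the smooth phase window, the function names (`swing_coefficient`, `periodic_reward`), the `sharpness` parameter, the per-foot phase offsets, and the assumption that foot force and foot speed are normalized to [0, 1] are all hypothetical choices made for illustration. The sketch penalizes foot force while a foot is scheduled to swing and foot speed while it is scheduled to be in stance.

```python
import math


def swing_coefficient(phi, swing_ratio, sharpness=20.0):
    """Smooth, periodic weight that is close to 1 while the cycle phase `phi`
    (in [0, 1)) lies inside the swing window and close to 0 otherwise.

    Assumption: a logistic function of a cosine margin stands in for the
    probabilistic phase boundaries mentioned in the abstract.
    """
    center = swing_ratio / 2.0  # swing window is [0, swing_ratio)
    margin = math.cos(2.0 * math.pi * (phi - center)) - math.cos(math.pi * swing_ratio)
    return 1.0 / (1.0 + math.exp(-sharpness * margin))


def periodic_reward(phi, swing_ratio, offsets, foot_force, foot_speed):
    """Compose per-foot periodic costs into one scalar reward (at most 0).

    offsets:    hypothetical per-foot phase shifts, e.g. {"left": 0.0, "right": 0.5}
    foot_force: normalized ground-reaction force magnitude per foot, in [0, 1]
    foot_speed: normalized foot speed per foot, in [0, 1]
    """
    reward = 0.0
    for foot, offset in offsets.items():
        foot_phi = (phi + offset) % 1.0
        in_swing = swing_coefficient(foot_phi, swing_ratio)
        in_stance = 1.0 - in_swing
        reward -= in_swing * foot_force[foot]   # no ground force while swinging
        reward -= in_stance * foot_speed[foot]  # no foot motion while in stance
    return reward


# Illustrative (assumed) gait settings: alternating feet vs. feet in phase.
WALK_OFFSETS = {"left": 0.0, "right": 0.5}
HOP_OFFSETS = {"left": 0.0, "right": 0.0}

# Left foot swinging (no force, moving), right foot planted (loaded, still).
matching = periodic_reward(
    0.25, 0.5, WALK_OFFSETS,
    foot_force={"left": 0.0, "right": 1.0},
    foot_speed={"left": 1.0, "right": 0.0},
)
# The opposite contact pattern at the same phase.
opposite = periodic_reward(
    0.25, 0.5, WALK_OFFSETS,
    foot_force={"left": 1.0, "right": 0.0},
    foot_speed={"left": 0.0, "right": 1.0},
)
```

Under these assumptions, changing only `swing_ratio` and the per-foot offsets changes which contact pattern is rewarded, which is the sense in which a single parametric function could cover several gaits. In this toy setup, `matching` is close to 0 and `opposite` is close to -2.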
