Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking

Stand and Walk Controller Videos

Full video

Summary video

Abstract

A necessary capability for humanoid robots is the ability to stand and walk while rejecting natural disturbances. Recent progress has been made using sim-to-real reinforcement learning (RL) to train such locomotion controllers, with approaches differing mainly in their reward functions. However, prior works lack a clear method to systematically test new reward functions and compare controller performance through repeatable experiments. This limits our understanding of the trade-offs between approaches and hinders progress.

To address this, we propose a low-cost, quantitative benchmarking method to evaluate and compare the real-world performance of standing and walking (SaW) controllers on metrics like command following, disturbance recovery, and energy efficiency. We also revisit reward function design and construct a minimally constraining reward function to train SaW controllers.

We experimentally verify that our benchmarking framework can identify areas for improvement, which can be systematically addressed to enhance the policies. We also compare our new controller to state-of-the-art controllers on the Digit humanoid robot. The results provide clear quantitative trade-offs among the controllers and suggest directions for future improvements to the reward functions and expansion of the benchmarks.

Clips

Transitioning between commands

Shaking the robot

Pulling the robot

Random pushes

Strong push backwards

Lateral disturbance rejection

Impulse Application Device

Parts List:

2x round electromagnets (700 N, 12 V, 59x34 mm) - $22.99 on Amazon
USB relay board (12 V, 8 channels) - $14.22 on Amazon
Linear solenoid electromagnet (60 N, 12 V, 10 mm) - $24.49 on Amazon
Latch mechanism
Rope + rollers
Mounting hardware (L-shaped main steel plate and 2x L-bracket)

Construction and Operation:

The impulse application device uses two electromagnets switched by a relay board. While energized, the magnets hold a weight in place; when their power is cut, the weight drops and starts the pull on the robot. A solenoid-operated latch, switched by the same relay board, then disconnects the rope so that the robot can move freely again once the impulse has been applied. The relay board is controlled by a Python script running on a lab computer.
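The release sequence described above could be scripted roughly as follows. This is a minimal sketch, not the script used in the experiments: the `RelayBoard` interface, the `LoggingRelayBoard` stand-in, the channel numbers, and the timing values are all assumptions made for illustration. A real USB relay board would need its own driver behind `set_channel`.

```python
import time
from typing import Callable, Protocol


class RelayBoard(Protocol):
    """Hypothetical interface; a real USB relay board needs its own driver."""

    def set_channel(self, channel: int, on: bool) -> None: ...


class LoggingRelayBoard:
    """Stand-in board that records commands instead of switching hardware."""

    def __init__(self) -> None:
        self.log = []  # list of (channel, on) tuples, in the order issued

    def set_channel(self, channel: int, on: bool) -> None:
        self.log.append((channel, on))


# Assumed wiring: these channel numbers are illustrative only.
MAGNET_CHANNELS = (1, 2)   # the two round electromagnets holding the weight
SOLENOID_CHANNEL = 3       # the linear solenoid that trips the latch


def arm(board: RelayBoard) -> None:
    """Energize both electromagnets so they hold the weight in place."""
    for channel in MAGNET_CHANNELS:
        board.set_channel(channel, True)


def apply_impulse(
    board: RelayBoard,
    pull_duration_s: float = 0.5,    # assumed time between drop and release
    solenoid_pulse_s: float = 0.2,   # assumed time the solenoid stays powered
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Drop the weight, wait, then release the rope via the latch."""
    # Cutting power to the magnets lets the weight fall and starts the pull.
    for channel in MAGNET_CHANNELS:
        board.set_channel(channel, False)
    sleep(pull_duration_s)
    # Pulse the solenoid to open the latch and disconnect the rope.
    board.set_channel(SOLENOID_CHANNEL, True)
    sleep(solenoid_pulse_s)
    board.set_channel(SOLENOID_CHANNEL, False)
```

Calling `arm(board)` followed by `apply_impulse(board)` on a `LoggingRelayBoard` records the command order without touching hardware. Passing the `sleep` function as a parameter keeps the timing replaceable, so the sequence can be checked without waiting.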

BibTeX

@article{bart2024robustSaW,
  author    = {Bart van Marum and Aayam Shrestha and Helei Duan and Pranay Dugar and Jeremy Dao and Alan Fern},
  title     = {Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking},
  journal   = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year      = {2024},
}