URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, it provides twelve continuous control tasks from three domains for evaluation.
26 PAPERS • 8 BENCHMARKS
The bipedal skills benchmark is a suite of reinforcement learning environments implemented for the MuJoCo physics simulator. It aims to provide a set of tasks that demand a variety of motor skills beyond locomotion, and is intended for evaluating skill discovery and hierarchical learning methods. The majority of tasks exhibit a sparse reward structure.
2 PAPERS • NO BENCHMARKS YET