PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

Learning good feature representations is important for deep reinforcement learning (RL). However, with limited experience, RL often suffers from data inefficiency during training. For trajectories (i.e., state-action sequences) that the agent has never or rarely experienced, the lack of data limits their use for better feature learning. In this work, we propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency of RL feature representation learning. Specifically, PlayVirtual predicts future states in the latent space with a forward dynamics model, conditioned on the current state and action, and then predicts the previous states with a backward dynamics model, which forms a trajectory cycle. Based on this, we augment the actions to generate a large number of virtual state-action trajectories. Since these virtual trajectories are free of ground-truth state supervision, we instead enforce each trajectory to meet a cycle-consistency constraint, which significantly enhances data efficiency. We validate the effectiveness of our designs on the Atari and DeepMind Control Suite benchmarks. Our method achieves state-of-the-art performance on both benchmarks.
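
As a rough illustration of the trajectory cycle described above, the sketch below rolls a latent state forward through a forward dynamics model under sampled virtual actions, rolls it back through a backward dynamics model along the same actions, and penalizes the mismatch with the starting latent. The class and function names, the MSE distance, the uniform action sampler, and the module interfaces are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class VirtualCycle(nn.Module):
    """Sketch of a cycle-consistency loss over virtual trajectories.

    `forward_model` maps (z_t, a_t) -> z_{t+1} in latent space;
    `backward_model` maps (z_{t+1}, a_t) -> z_t. Both interfaces are
    assumptions for illustration, not the paper's architecture.
    """

    def __init__(self, forward_model: nn.Module, backward_model: nn.Module,
                 horizon: int = 5):
        super().__init__()
        self.forward_model = forward_model
        self.backward_model = backward_model
        self.horizon = horizon  # virtual trajectory length K

    def cycle_loss(self, z0: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        """z0: (B, D) latent states; actions: (B, K, A) virtual actions.

        Roll K steps forward, then K steps backward along the same actions,
        and penalize the gap to the starting latent. No ground-truth future
        states are needed, so the actions can be freely sampled.
        """
        z = z0
        for t in range(self.horizon):            # forward rollout
            z = self.forward_model(z, actions[:, t])
        for t in reversed(range(self.horizon)):  # backward rollout
            z = self.backward_model(z, actions[:, t])
        return ((z - z0) ** 2).mean()            # cycle-consistency penalty


def virtual_trajectory_loss(cycle: VirtualCycle, z0: torch.Tensor,
                            action_dim: int, num_virtual: int = 4) -> torch.Tensor:
    """Augment each real state with `num_virtual` random action sequences
    (uniform in [-1, 1] here, a placeholder sampler) and average the loss."""
    batch = z0.shape[0]
    losses = [
        cycle.cycle_loss(z0, torch.rand(batch, cycle.horizon, action_dim) * 2 - 1)
        for _ in range(num_virtual)
    ]
    return torch.stack(losses).mean()
```

In the full method, a term like this would be combined with the agent's RL objective and its self-supervised representation losses; only the cycle term is sketched here.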

NeurIPS 2021: PDF | Abstract
| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Continuous Control (100k environment steps) | DeepMind Ball in cup Catch (Images) | PlayVirtual | Return | 926 | #1 |
| Continuous Control (500k environment steps) | DeepMind Ball in cup Catch (Images) | PlayVirtual | Return | 967 | #1 |
| Continuous Control (100k environment steps) | DeepMind Cartpole Swingup (Images) | PlayVirtual | Return | 816 | #1 |
| Continuous Control (500k environment steps) | DeepMind Cartpole Swingup (Images) | PlayVirtual | Return | 865 | #1 |
| Continuous Control (100k environment steps) | DeepMind Cheetah Run (Images) | PlayVirtual | Return | 474 | #1 |
| Continuous Control (500k environment steps) | DeepMind Cheetah Run (Images) | PlayVirtual | Return | 719 | #1 |
| Continuous Control (100k environment steps) | DeepMind Finger Spin (Images) | PlayVirtual | Return | 915 | #1 |
| Continuous Control (500k environment steps) | DeepMind Finger Spin (Images) | PlayVirtual | Return | 963 | #1 |
| Continuous Control (100k environment steps) | DeepMind Reacher Easy (Images) | PlayVirtual | Return | 785 | #1 |
| Continuous Control (500k environment steps) | DeepMind Reacher Easy (Images) | PlayVirtual | Return | 942 | #1 |
| Continuous Control (100k environment steps) | DeepMind Walker Walk (Images) | PlayVirtual | Return | 460 | #1 |
| Continuous Control (500k environment steps) | DeepMind Walker Walk (Images) | PlayVirtual | Return | 928 | #1 |
