1 code implementation • 19 Oct 2022 • Chengqian Gao, Ke Xu, Liu Liu, Deheng Ye, Peilin Zhao, Zhiqiang Xu
A promising paradigm for offline reinforcement learning (RL) is to constrain the learned policy to stay close to the dataset behaviors, known as policy constraint offline RL.
no code implementations • 15 Oct 2021 • Chengqian Gao, Ke Xu, Kuangqi Zhou, Lanqing Li, Xueqian Wang, Bo Yuan, Peilin Zhao
To alleviate the action distribution shift problem in extracting RL policy from static trajectories, we propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm.